
SPEECH INTERACTIVE SYSTEM

SIS
Version 7.x

Speech signal editing, analysis and noise reduction software

User Manual
Dear Customer,
Thank you for purchasing this product!
For optimum performance and safety
please read this User Manual carefully.

Copyright
Copyright © 1993 - 2009 by Speech Technology Center Limited (STC Ltd.). All rights
reserved. The SIS Speech Interactive Software, in part or as a whole, may be used in
accordance with the corresponding license. To receive more copies or other information,
please contact STC.

Disclaimer
Speech Technology Center accepts no liability whatsoever for any loss or injury
incurred by the owner or by any third party while using this SIS software, and
specifically disclaims any warranties of merchantability or fitness for any particular
purpose.

The contents of the SIS software and User Manual are subject to change without
notice.
TABLE OF CONTENTS

SIS 7.0.X END-USER LICENSE

1 INTRODUCTION
2 SYSTEM CAPABILITIES
3 SYSTEM INSTALLATION AND ADJUSTMENT
3.1 System requirements
3.2 Installing SIS hardware
3.3 Installing SIS software
3.4 Adjusting display color
3.5 Configuring I/O sound cards
4 GETTING STARTED
4.1 Loading system configuration
4.2 Choosing interface language
4.3 Changing font size of dialog windows
4.4 Changing buttons size and style
4.5 SIS main screen
4.6 Using source-destination menus
5 WORKING WITH DATA WINDOWS
5.1 Creating new window
5.2 Contextual menu of active window
5.3 Switching between windows
5.3.1 Activating window using the mouse
5.3.2 Switching between windows using the menu
5.3.3 Switching between windows using the windows list
5.3.4 Appending comments to data box
5.3.5 Appending marks text panel to data box
5.3.6 Linking windows
5.4 Moving and resizing windows
5.5 Shifting menu boxes
5.6 Setting window parameters
5.6.1 Viewing information about window axes
5.6.2 Viewing detailed information about active window
5.6.3 Changing the scale type in active window
5.6.4 Segments list
5.6.5 Changing background color
5.7 Individual selection of fragments in SIS windows
5.8 Copying window to clipboard
6 READING FILES FROM DISK
6.1 Reading sound data
6.2 Standard file extensions
6.3 Reading text files
6.3.1 Text file headings
6.3.2 Reading text files without headings
7 DATA INPUT FROM SOUNDCARD
7.1 Sound signal presentation by SIS and its relation with sound card bit range
7.2 Sound input parameters
7.3 Digital waveform viewer
7.4 Visible speech of currently input signal
8 PLAYING SOUND DATA
9 SAVING DATA
9.1 Saving data segments and fragments to disk
9.2 Saving data in the text form
9.3 Saving current state of processing
10 ZOOMING AND SCROLLING DATA IN THE WINDOW
10.1 Data scrolling
10.2 Horizontal data zooming
10.3 Selecting data fragment for display
10.4 Vertical data zooming
10.5 Shifting and resizing the data box
10.6 Zoom mode
10.7 Editing the signal
10.8 Visualization options
11 OPERATIONS WITH CURSORS USING TEMPORARY AND CONSTANT MARKS FOR SELECTING DATA FRAGMENTS
11.1 Creating, shifting and deleting the vertical cursor
11.2 Temporary marks
11.3 Constant marks
11.4 Marks list
11.5 Horizontal cursor and horizontal temporary marks
11.6 Saving the marks together with data
12 GENERATING A TEST SIGNAL
13 DATA PROCESSING
13.1 Selecting source segment/fragment of data
13.2 Signal normalization
13.3 Operations with constants
13.4 Linear signal transformation
13.5 Deleting a signal
13.6 Appending a fragment to the end of the segment
13.7 Inserting a fragment of the current segment into another segment
13.8 Copying the data
13.9 Moving a signal
13.10 Using the contextual menu of an active window
13.11 Shifting the current segment
13.12 Data smoothing
13.13 Mixing the segments
13.14 Clipping
13.15 Changing signal sampling rate
13.16 Reverse
13.17 Inversion
13.18 Modification of a signal tempo without voice pitch distortions
13.19 Modulation
13.20 Waveform accuracy transformation
13.21 Mono/stereo signal transformation
13.22 Operations with a service time channel
14 NOISE REDUCTION
14.1 Adaptive inverse filtering
14.1.1 Method overview
14.1.2 Method parameters
14.2 Adaptive wideband noise filtering
14.2.1 Method overview
14.2.2 Method parameters
14.3 Adaptive filtering of stationary wideband noise
14.3.1 Method overview
14.3.2 Method parameters
14.4 Adaptive filtration of tonal and regular noise
14.4.1 Method overview
14.4.2 Method parameters
14.5 Filtering of impulse noise
14.6 Dynamic filtering
14.7 Filtering stereo signals
14.7.1 Method overview
14.7.2 Method parameters
14.7.3 Using stereo filter
14.8 Signal processing with DirectX plug-ins
14.8.1 Sound Cleaner plug-in
14.8.2 Other DirectX plug-ins
14.9 Equalizer
14.9.1 Equalizer controls
14.9.2 Equalizer toolbar
14.9.3 Equalizer options
14.9.4 Zooming and scrolling
14.9.5 Adjusting filter FC
14.9.6 Elastic mode
14.9.7 Additional FC adjustment
14.9.8 Equalizer operation modes
14.9.9 Stereo signal filtering
15 ANALYSIS OF SPEECH SIGNALS
15.1 Producing 3-dimensional representations for 3-D data
15.1.1 3-D scaling options
15.1.2 Scaling the third dimension for not yet calculated 3-D data
15.1.3 Changing the image limits
15.1.4 Adjusting brightness and contrast of the image
15.1.5 Visible speech normalization
15.1.6 Changing the graphic representation
15.2 Windowing for FFT
15.2.1 Theoretical substantiation of windowing
15.2.2 Description of 5 most commonly used weighting windows
15.2.3 Equi-periodic window
15.2.4 Window selection tips
15.3 Calculating a dynamic spectrogram
15.3.1 Parameter settings
15.4 Calculating periodicity function (dynamic cepstrogram)
15.4.1 Parameter settings
15.5 Calculating dynamic characteristics of the autoregressive model of the speech signal
15.5.1 Parameter settings
15.6 Spectral analysis based on linear prediction of speech. Obtaining a graphical representation of LPC frequency response
15.6.1 Parameter settings
15.7 Calculating the energy (root-mean-square)
15.8 Calculating zero crossing frequency
15.9 Averaging
15.10 Pitch extraction
15.10.1 General information
15.10.2 LLK method
15.10.3 Pitch extraction step by step
15.10.4 Spectral method
15.10.5 Speaker comparison
15.10.6 Controlling the correctness of the pitch extraction
15.10.7 Editing pitch over cepstrum image
15.11 Calculating instantaneous Fourier spectrum
15.11.1 Parameter settings
15.11.2 Spectrum calculation process
15.12 Calculating average Fourier spectrum
15.12.1 Parameter settings
15.13 Formants analysis
15.13.1 Formants calculation
15.13.2 Editing formants over spectrum image
15.14 Histogram calculation
15.15 Comparing histograms and calculating their parameters
15.16 EdiTracker module
16 USING PLUG-INS
17 TESTING THE INPUT/OUTPUT CHANNEL
17.1 Testing the input channel
17.2 Testing the output channel
17.3 Input/output channel loop test
GLOSSARY

SIS 7.0.X END-USER LICENSE

The Software, in accordance with the existing international and Russian copyright laws
and treaties, may not be installed and used on more than one computer simultaneously.

You may install the Software on a second computer provided that the Software is first
uninstalled (all its files are deleted) from the hard disk of the first computer.

Copies of the Software can be stored on other storage devices only for backup or
archival purposes to prevent software loss as a result of disk failure.

If the user violates the terms of this license in any way, Speech Technology Center will
refuse to fulfill its obligations regarding warranty service, technical assistance and
privileged installation of subsequent versions of the Software. Speech Technology Center
also reserves the right to take other measures to protect its copyright in accordance
with software copyright law.

For all information about the software and hardware produced by STC refer to:

Tel: +7 (812) 331-0665
Fax: +7 (812) 327-9297
Mail: Russia, 196084, St. Petersburg, P.O. Box 515
Office: 4 Krasutskogo str., St. Petersburg
E-mail: [email protected]
Web-site: http://www.speechpro.com/

When requesting assistance, you should have the following information readily
available:

• the name of the product and its version number;
• the type of computer and information about its configuration;
• the name of the operating system being used and its version number;
• a precise description of the problem.

1 INTRODUCTION

The SIS software package produced by Speech Technology Center (STC) is intended for
speech signal editing and analysis, noise reduction and speaker identification. It is a
powerful multi-purpose tool that allows you to improve speech intelligibility, enhance
signal quality and reduce noise in recorded speech. It provides the operator with
extensive facilities for the input/output, editing, analysis, display and processing of
speech and other low-frequency signals.

SIS allows experts to perform additional operations at the following stages of digital
signal examination:

• Auditory and linguistic analysis.

• Instrumental analysis: calculating and comparing the parameters of speech recordings.

• Verification of recording authenticity.

• Visual analysis: displaying the required images of parametric representations of the
  selected signals.

SIS can be delivered together with the STC-H246 sound I/O device, which is intended for
measuring the characteristics of electrical signals in the audio frequency range and for
generating such signals. This device provides:

• Analog-to-digital and digital-to-analog signal conversion.

• Copying of signals onto a computer's hard disk without distortion, with all signal
  properties relevant for forensic examination preserved.

• Measurement of the amplitudes and frequencies of alternating electrical signals, and
  determination of voltage, phase and time intervals.

It should be noted that this software was designed especially for experts in low
frequency signal processing, linguistics and forensic audio. Please contact Speech
Technology Center specialists to find out more about training and consultation.

2 SYSTEM CAPABILITIES

1. Signal input, segmentation and editing.

• High-accuracy signal input:
  ▪ Using the STC-H246 measuring device, with a sampling rate of up to 200 kHz and
    24-bit resolution.
  ▪ Using the STC-H216 device, with a sampling rate of up to 48 kHz and 16-bit
    resolution.

• Assured accuracy of signal parameter calculation.

• Display of the dynamic spectrogram in real time during signal input.

• Loading several signals into one window, including signals with different sampling
  rates and resolutions.

• Typical signal editing functions.

• Synchronizing several signals by shifting them relative to each other by a specified
  time interval.

• Combining two mono signals into one stereo signal and vice versa; producing a
  composite stereo signal.

• Various signal editing aids:
  ▪ Temporary marks.
  ▪ An unlimited number of constant marks with captions.
  ▪ Simultaneous selection of any number of fragments between neighbouring constant
    marks for subsequent processing.
  ▪ Individual text comments for every mark.

2. Signal visualization and analysis.

• Multi-window interface.

• Windows with different kinds of visualization of the same or different signals can be
  placed in the SIS working area for subsequent comparison:
  ▪ Waveform.
  ▪ Dynamic spectrogram (sonogram).
  ▪ Average spectrum (FFT).
  ▪ Cepstrogram.
  ▪ Autocorrelogram.
  ▪ Pitch curve.
  ▪ LPC frequency response (Linear Prediction Coefficients).
  ▪ Histogram.
  ▪ Partial correlation.
  ▪ Zero crossing frequency.

• Built-in presets are available for calculating characteristics and analyzing various
  types of voices and recording channels:
  ▪ Male voice: Tenor, Baritone, Basso, ContrBasso.
  ▪ Female voice: Soprano, Mezzo Soprano, Contralto.
  ▪ Child (teenager).
  ▪ Microphone or telephone line.

• The main characteristics of the speech signal can be displayed in one data window
  (by superimposing the images) to check calculation accuracy:
  ▪ Formant trajectories, pitch curve and dynamic spectrogram.
  ▪ Cepstrogram and pitch curve.
  ▪ Waveforms of several signals with different sampling rates and resolutions.

• Manual correction of the pitch curve and formant trajectories using different types
  of visualization and calculation.

• Synchronizing several windows with different visualization types for comparative
  analysis.

• Automatic search for comparable fragments for subsequent instrumental analysis
  (including automatic extraction of vowels for subsequent formant analysis).

• Automatic calculation and comparison of pitch statistics, with the results presented
  as text.

3. Noise reduction and signal pre-processing.

• Noise reduction and preparation of speech recordings for comparative instrumental
  analysis using built-in tools.

4. Using plug-ins.

• DirectX plug-ins from other vendors are supported.

• The EdiTracker software module, designed for detecting signs of tampering in analog
  and digital recordings, can be attached.

3 SYSTEM INSTALLATION AND ADJUSTMENT

3.1 System requirements


The SIS hardware and software should be installed on a PC meeting the following minimum
requirements:
• CPU: Intel Pentium 300 MHz or faster;
• RAM: 512 MB or more;
• Video adapter and SVGA monitor (resolution of at least 1024×768, 32-bit color
  quality);
• OS: Microsoft Windows 2000/XP;
• I/O sound card (STC-H216, STC-H246 or an integrated sound card);
• Headphones;
• Keyboard and mouse;
• Free hard disk space (at least 10 GB recommended).
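
To put the disk space recommendation in perspective, the following Python sketch estimates how much raw audio fits on a disk at the sampling rates and resolutions quoted in this manual. It is an illustration only: it assumes uncompressed PCM storage with whole bytes per sample and decimal gigabytes, neither of which this manual states about SIS, and the function names are hypothetical.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Data rate of raw, uncompressed PCM audio in bytes per second.

    Assumption: each sample occupies bits_per_sample // 8 whole bytes.
    """
    return sample_rate_hz * (bits_per_sample // 8) * channels


def hours_in(disk_bytes, sample_rate_hz, bits_per_sample, channels=1):
    """Hours of raw PCM audio that fit into disk_bytes."""
    rate = pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels)
    return disk_bytes / rate / 3600


TEN_GB = 10 * 10**9  # assumption: decimal gigabytes

# STC-H246 upper limit quoted in this manual: 200 kHz, 24 bit (mono assumed)
print(round(hours_in(TEN_GB, 200_000, 24), 2))
# STC-H216 upper limit quoted in this manual: 48 kHz, 16 bit (mono assumed)
print(round(hours_in(TEN_GB, 48_000, 16), 2))
```

Under these assumptions, 10 GB holds roughly 4.6 hours of mono signal at 200 kHz and 24 bit, and roughly 29 hours at 48 kHz and 16 bit; stereo input halves both figures.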

3.2 Installing SIS hardware


To install and adjust the I/O sound cards used with SIS, refer to the operating
instructions delivered with them.

3.3 Installing SIS software


To install the software, run the Setup.exe file located in the SIS_SETUP folder on the
installation CD. If a version of the SIS software is already installed on your PC, the
installer will prompt you to uninstall it first.

Once the uninstallation has been completed, run Setup.exe again and follow the
installation wizard instructions appearing on the screen.

The software is protected with a HASP software protection key. Therefore, at the final
stage of the installation process the installer will install a HASP device driver. When
the corresponding message appears on the screen, check that your HASP device is
connected to your PC and press OK.

After the installation has completed successfully, press Finish to quit the installer.

3.4 Adjusting display color


Please note that in order to operate properly, SIS requires your PC's display to be set
to True Color mode. To set your display to True Color mode:

• Right-click anywhere on your desktop background and select Properties from the
  context menu.

• In the pop-up window, click the Settings tab.

• In the Color quality menu, choose True Color (32 bit).

If your display is in a different mode, you will not be able to work with SIS, and a
message to that effect will appear at program start.

3.5 Configuring I/O sound cards


Before running the program for the first time, it is necessary to edit the sis_70.ini file
located in the same folder as the sis_70.exe file. The default file content is given
below:
[SETTINGS]
YMS_DISK = C
SOUND_CARD = STC-H216
CARD_NUM=1
[SOUND CARD]
BIT24_MODE = 0

To operate properly, SIS requires 2 GB of disk space for its virtual memory on the drive
specified in the YMS_DISK = line. By default this is drive C. If there is not enough free
space available on this drive, specify another drive for reading and writing data by
changing the drive letter (for example, YMS_DISK = E). When started, SIS checks the
amount of available free disk space and, if necessary, prompts the user to change the
drive.
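
The free-space requirement can be checked before editing YMS_DISK. The following Python
sketch is only an illustration and is not part of SIS: the function name
has_enough_space and the injectable usage parameter are assumptions made for this
example, and the 2 GB threshold is the figure quoted above.

```python
import shutil

# 2 GB, the amount of disk space the manual says SIS needs on the YMS_DISK drive.
REQUIRED_BYTES = 2 * 1024**3


def has_enough_space(drive_letter, required=REQUIRED_BYTES, usage=shutil.disk_usage):
    """Return True if the given drive has at least `required` free bytes.

    `usage` defaults to shutil.disk_usage; it is a parameter only so that the
    check can be exercised without touching a real drive.
    """
    free = usage(f"{drive_letter}:\\").free
    return free >= required
```

On a Windows PC, has_enough_space("E") would indicate whether drive E is a suitable
value for YMS_DISK.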

The SOUND_CARD = line specifies the sound card you use. If you use a 24-bit sound
card STC-H246 (default setting):

SOUND_CARD = STC-H246

If you use a 16-bit sound card STC-H216:

SOUND_CARD = STC-H216

Delete the SOUND_CARD = line if you use a sound card integrated in your PC.

The BIT24_MODE parameter specifies the allowed operating mode. If 24-bit mode is
not allowed:
BIT24_MODE = 0

If 24-bit mode is allowed (for STC-H246):


BIT24_MODE = 24
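
The edits described above can also be scripted. The sketch below uses Python's standard
configparser module on the default file content shown earlier. It is an illustration
under assumptions: the helper name configure_for_card is hypothetical, setting
BIT24_MODE = 0 for every card other than STC-H246 is an assumption, and whether SIS
accepts a file rewritten with slightly different spacing has not been verified.

```python
import configparser
import io

DEFAULT_INI = """\
[SETTINGS]
YMS_DISK = C
SOUND_CARD = STC-H216
CARD_NUM=1
[SOUND CARD]
BIT24_MODE = 0
"""


def configure_for_card(ini_text, card, disk="C"):
    """Return ini text adjusted for `card`.

    `card` is 'STC-H216', 'STC-H246', or None for an integrated sound card
    (in which case the SOUND_CARD line is removed, as the manual instructs).
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names in upper case
    parser.read_string(ini_text)

    parser["SETTINGS"]["YMS_DISK"] = disk
    if card is None:
        parser.remove_option("SETTINGS", "SOUND_CARD")
    else:
        parser["SETTINGS"]["SOUND_CARD"] = card
    # 24-bit mode is allowed only for STC-H246 (assumption for all other cards: 0).
    parser["SOUND CARD"]["BIT24_MODE"] = "24" if card == "STC-H246" else "0"

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

It is advisable to keep a backup copy of the original sis_70.ini before writing the
result back to disk.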
Note:
1. Software supplied for the STC sound card operates only with this card.

2. Software supplied for the other cards (with HASP key) operates only with the Windows
preferred device for playback and recording. It is also advisable to select the
specified sound card as the preferred device for playback and recording. You can do
this via Start ► Settings ► Control Panel ► (Sounds and) Multimedia ► Audio tab.

4 GETTING STARTED

4.1 Loading system configuration


System configuration is automatically loaded at program start from the Sis.cfg file. All
parameters entered by the user are stored in this file and saved by the system on exit.

If you wish to have several configuration files for different purposes, save each
configuration under a different name using the Options►Save configuration
command. You can subsequently restore the desired configuration with the
Options►Load configuration command. A dialog window will be displayed where you
can choose any of the existing configuration files.

Figure 1. The Options menu

4.2 Choosing interface language


To change the software language, use the Options►Change interface language
command. You can choose between Russian, English, Chinese and Spanish. In the dialog
window select the desired option by double-clicking it with the mouse. All texts and
messages will be translated immediately.

Figure 2. The interface language selection window

Do not confuse the interface language with the keyboard input language. The input
language is changed with the key combination you normally use on your computer for this
purpose (e.g. <Ctrl> + <Shift>).

Note: To be able to use the Chinese language, you need to have Chinese characters
included in your Windows fonts.

4.3 Changing font size of dialog windows
The size of dialog windows can vary depending on the screen size and resolution. To
change the size of dialog windows (as well as the font size), use the Options►Font
size menu item. A dialog window will appear on the screen (Fig. 3).

Figure 3. The Font size window

Neither option affects the font size of the main menu (this parameter can be changed
using standard Windows procedures), but both change the font size of all dialog windows.
The size of the windows changes in proportion to the font size used.

The Small font size is recommended for a 1024*768 screen resolution, and the Normal size
for 1280*1024 and higher.

4.4 Changing buttons size and style


To change the size and the style of buttons, use the Options►Buttons size menu item.
A dialog window containing two submenus, Buttons size and Buttons style, will appear on
the screen.

Figure 4. The Buttons size window

The Small buttons size is recommended for a 1024*768 screen resolution, and the Normal
size for 1280*1024 and higher. The Buttons style submenu lets you choose the Plastic or
Metallic style.

4.5 SIS main screen


When you start the program, the main screen will appear with the user menu and
toolbar at the top (Fig. 5).

The menu bar contains the menus of all functions and operations provided in the system.

Figure 5. SIS main screen

The main toolbar below the menu bar consists of the following toolbars:

File
• Open new window
• Open file
• Save fragments
• Copy image

Sound
• Start playback
• Pause playback
• Go to play point
• Loop playback mode
• Stop playback
• Input from sound card

DirectX plug-ins
• Sound Cleaner
• EdiTracker
• Other DirectX plug-ins

Information
• Window information
• Segments list in the currently active window
• Marks list in the currently active window
• Windows list

Show
• Logarithmic/linear scale
• Show all data
• Show highlighted data
• Show data between temporary marks
• Vertical self-scaling

Edit
• Add data (to the end of destination segment)
• Copy data
• Insert data
• Move data
• Delete data
• Interrupt operation

The appearance of the main toolbar can be customized by the user. Double-click a toolbar
to open its Customize Toolbar window. Each toolbar has its own customizing window.
Figure 6 shows the Customize Toolbar window of the File toolbar.

The right area of the window contains the list of current toolbar buttons. The left area
contains the list of buttons that can be added to the toolbar (the Available toolbar
buttons list). By default all available buttons are placed on the toolbar.

To remove a button from the toolbar, select it in the Current toolbar buttons list and
press <-Remove. To place a removed button back on the toolbar, select it in the
Available toolbar buttons list and press Add->. A separator between buttons on a toolbar
can be added or removed in the same way.

Figure 6. The Customize Toolbar window

To change the order in which buttons are displayed on the toolbar, use Move Up and
Move Down.

To restore the default settings, press Reset. To close the Customize Toolbar window,
press Close.

Some of the most common commands can also be executed with hotkeys:

Key                          Function
Ctrl/O                       Open file
Ctrl/R                       Input from sound card
Ctrl/T                       Input/output test
Ctrl/Shift/D                 Delete all windows
Ctrl/D                       Delete current window
Ctrl/V                       Visible in window (processing interval)
Ctrl/A                       All data (processing interval)
<F2>                         Save fragment
<F3>                         Open file
<F5>                         Replace a linear (vertical) scale by the decibel scale and back
<F6>                         Speech playback
Shift/F6                     Playback of the speech signal between temporary marks
Ctrl/F6                      Playback of the speech signal visible in window
Alt/F6                       Playback of visible data and further to the end
<Home>                       Repeat playback
<Backspace>                  Set the Loop mode on
<Esc>                        Set the Loop mode off
<F7>                         Current window information
<5> on the number pad        Signal autoscaling
Ctrl/<5> on the number pad   Autoscaling of signals in the window relative to the active signal
<F8>                         Display all data in window
Shift/F8                     Display data between temporary marks in window
Ctrl/F8                      Display highlighted data in window
<F11>                        Match signals in linked windows to current window X zoom rate
Ctrl/Q                       FFT power spectrum average
Ctrl/2                       Split stereo signal into 2 mono signals
Ctrl/M                       Data smoothing (filter length)

The Data►Undo option, available in SIS software versions starting from v. 7.0.1, allows
you to cancel the last signal processing operation and return to the previous signal
state. You can use the Undo command for most data processing operations. The undo depth,
i.e. the number of times you can use the Undo operation in succession, is limited by the
disk space allowed by the user. You can change this parameter in the sis_70.ini file (by
default the Depth value is 8):

[Data Storage]
Undo Depth = 8
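
The effect of a limited undo depth can be pictured with a bounded history. The Python
sketch below is not the SIS implementation (which stores signal states on disk); the
class name UndoHistory and its methods are hypothetical and serve only to show how the
oldest state is dropped once the depth is exceeded.

```python
from collections import deque


class UndoHistory:
    """Keeps at most `depth` previous states; older ones are discarded."""

    def __init__(self, depth=8):
        # A deque with maxlen silently drops the oldest entry when full.
        self._states = deque(maxlen=depth)

    def record(self, state):
        """Store the state that existed before a processing operation."""
        self._states.append(state)

    def undo(self):
        """Return the most recently recorded state, or None if none is left."""
        return self._states.pop() if self._states else None
```

With a depth of 8, the ninth recorded operation pushes the first one out of the
history, so only the last eight operations can be undone.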

4.6 Using source-destination menus


When performing data processing and analysis operations, a standard source-
destination menu appears (Fig. 7).

All processing and analysis operations require a destination window to be specified.


By clicking the arrow to the right of the Source-destination field (at the top), you can
choose a destination window from those already existing (indicated by letters) or choose
a new window to be created (indicated by the "_" sign). However, the arrow lets you
browse through the first 5 suitable windows only. To see the full list of available
windows, press Choose destination window at the bottom of the menu and select an
appropriate window from the displayed list.

Figure 7. The source-destination menu type

To create a new destination window, press Create destination window. The menu will
disappear and a dashed rectangular frame specifying the dimensions of the prospective
window will be displayed on the screen. The mouse cursor will also disappear; by moving
the mouse or pressing the arrow keys you can shift the window frame to the desired
position. By moving the mouse while holding down the left mouse button, or by pressing
the arrow keys while holding down <Shift>, you can change the frame dimensions. The top
left corner stays fixed, while the bottom right corner can be moved as desired.

To display a new window as limited by the rectangular frame, press <Enter> or click
the mouse button. Once the new window has been displayed, the dialog window will
appear again on top of it.

The OPTIONS button opens a window where you can set the parameters of the selected
operation.

If no destination window has been chosen, the system will automatically create a new
empty window and place it below the source window, with the same left and right borders.

When the All segments in window option is checked, the selected operation is applied to
all segments in the current window, not only to the active one.

The source-destination menu also allows you to specify which part of the speech signal
should be processed. The following options are available:

• All data. All sound data in the current window will be processed.

• Highlighted. Only highlighted data (a highlighted interval between constant marks)
will be processed.

• Between temporary marks. Only data located between temporary marks will be
processed (there should be two of them).

• Visible in window. Only currently visible data will be processed.

To start an operation, click the field of the desired data part type.
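
The four options above amount to choosing a start and end position for processing. The
Python sketch below illustrates that choice. The function name processing_interval, the
mode strings and the (start, end) tuple representation are assumptions made for this
example and do not describe SIS internals.

```python
def processing_interval(mode, data_len, highlighted=None, temp_marks=(), visible=None):
    """Return the (start, end) sample range to process for the chosen mode."""
    if mode == "all":
        return (0, data_len)
    if mode == "highlighted":
        if highlighted is None:
            raise ValueError("no highlighted interval in the window")
        return highlighted
    if mode == "between_marks":
        # The manual requires exactly two temporary marks.
        if len(temp_marks) != 2:
            raise ValueError("exactly two temporary marks are required")
        return (min(temp_marks), max(temp_marks))
    if mode == "visible":
        if visible is None:
            raise ValueError("visible range is unknown")
        return visible
    raise ValueError(f"unknown mode: {mode}")
```

For example, processing_interval("between_marks", 1000, temp_marks=(700, 200)) yields
(200, 700) regardless of the order in which the marks were set.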

5 WORKING WITH DATA WINDOWS

Data windows are used to display graphic data and text.

If a window is deleted, all data displayed in it that are not also displayed in at least
one other window will be deleted as well. Windows may overlap, and they can be moved,
resized and reshaped.

At least one window should be created to start working with the data.

5.1 Creating new window


To create a new window, use either the Windows►Open new window command or the
Open new window toolbar button. A newly created window is automatically assigned a name
(one character), which is displayed in the window corner.

The newly created window will be of the UNI (universal) type and can be used for any
kind of input data.

An unlimited number of windows can be created in the program, but once their number
exceeds 107, the short window names (consisting of one symbol) begin to repeat, which
makes linking windows and some other operations difficult to perform. It is therefore
undesirable to have more than 108 windows open simultaneously.

A newly created window always appears on top of all previously opened or created
windows. It becomes current (active), i.e. all operations (editing, analysis, processing,
etc.) will be related to the data contained in this window.

Several different segments of data can be loaded into one window. The names of all
segments currently contained in the data box will be displayed in the window heading,
as long as its width allows. Each segment name will be displayed in the same colour as
its data. The heading background colour is black if the window is active and grey if it is
inactive.

5.2 Contextual menu of active window


The contextual menu appears when you right-click anywhere in the working area of an
active window (Fig. 8).

The contextual menu contains commands from the Show and Windows menus of the main
screen and the copy/paste fragment commands. Depending on the data type, it may also
contain the Background color or Drawing type selection items. If the window contains
constant marks, the contextual menu has the Make highlighted option.

If the data in the window represent the results of calculations, the Recalculate option
is available. It allows you to repeat the calculations with new parameters.

Figure 8. Context menu of an active window

5.3 Switching between windows

5.3.1 Activating window using the mouse


The quickest way to switch to another window, i.e. to make it active, is to click with the
mouse anywhere in the visible part of this window. Once you activate a window, it will
be displayed on top of all other windows.

Switching between windows does not require closing previously opened dialog windows:
after you activate another window, any currently open submenu will be redrawn and
appear again on the screen.

Note: Keep in mind that any modifications of values in a menu field are considered
temporary until you leave this field. That is, if you activate another window while the
modified field is still active, the changes will be lost and the previous settings will
be restored after the menu is redrawn.

5.3.2 Switching between windows using the menu


Use the Windows►Activate previous command to switch to the previous window and
Windows►Activate next to switch to the next window. The second window from the
top is considered "previous", while the bottom window is considered "next".
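
The meaning of "previous" and "next" can be illustrated with a list ordered from the
top window to the bottom one. This Python sketch is only a model: the function names are
hypothetical, and it assumes that an activated window moves to the top while the other
windows keep their relative order.

```python
def activate(z_order, name):
    """Return a new top-to-bottom order with `name` moved to the top."""
    return [name] + [w for w in z_order if w != name]


def activate_previous(z_order):
    """Activate the second window from the top."""
    return activate(z_order, z_order[1]) if len(z_order) > 1 else list(z_order)


def activate_next(z_order):
    """Activate the bottom window."""
    return activate(z_order, z_order[-1]) if len(z_order) > 1 else list(z_order)
```

Under these assumptions, repeating Activate next cycles through all open windows,
whereas repeating Activate previous alternates between the two topmost windows.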

5.3.3 Switching between windows using the windows list
Using the menu Windows►Windows list you can view the list of all windows currently
opened in the program. The list looks as shown in Figure 9; it contains window activity
indicator, window selection indicator, window name and finally window type.

Figure 9. The Windows list window

Use the mouse or the <Tab> or <Shift>/<Tab> keys to switch between the columns.
To move up or down in a column, use either the mouse or the arrow keys.

You can make any window active by clicking on the activity indicator or by pressing the
<Spacebar> key. Only one window at a time can be active.

The menu is scrollable, so if you do not see the desired item in the visible part of the
list, browse through the list using the scroll bar at the right (or by pressing the
arrow keys or <Page Up>/<Page Down> on the keyboard).

To move the active window to the center of the main window, press Move to center.

To quit the windows list, press OK (to apply the changes) or Cancel (to quit without
applying the changes).

5.3.4 Appending comments to data box


To append comments to the data box, use the Windows►Append comment panel menu option
or the corresponding option of the contextual menu. The system will ask whether the
comments field should be appended above (Upper data box) or below (Under data box) the
data and, once you select the desired option, will redraw the window with an empty line
added where specified.

To make the comments field active, click there with the mouse – its background colour
will get lighter and a blinking blue cursor will appear in it. To quit the comments line,
press <Enter> or click with the mouse anywhere outside it.

A comments line can contain up to 256 characters.

You can append as many comments lines to the window as you like, but if the vertical
size of the window is insufficient to display all the lines, some of them will not be
displayed. As you reduce the size of the window, the lines located above the data will
disappear first (starting with the lowest), then the lines below the data will be hidden
(also starting with the lowest). As you increase the window size, the lines will come up
again.

To delete a comments field, make it active and press the <Ctrl>/<Delete> key
combination.

5.3.5 Appending marks text panel to data box


Every permanent vertical mark can be provided with a text line of up to 40 characters.
This text can be seen in the Marks►Marks list menu item. To display the mark texts
directly under the marks in the signal, use the Windows►Append viewer of marks text
menu item or the corresponding option of the contextual menu. The mark text viewer will
immediately appear under the data box as a large field. If the viewer does not appear,
increase the vertical window size. If you try to add the viewer panel when it is already
present, the system message "The required object is already present in window" will
appear and your request will be ignored.

You can also add a mark text using the contextual menu of the active window. To do this,
move the mouse pointer to the constant vertical mark. When the pointer changes its
shape, press the right mouse button and select the Mark text item in the contextual
menu. The viewer panel will appear below the data box (if it has not been created
before) and the Mark text dialog window will open. Enter the mark text and press OK.

The marks in the viewer panel are the same as in the data box. If a mark is provided
with an inscription, the text is positioned over the mark (bottom to top, left to
right), but regardless of the text length at least one dash of the mark remains at the
bottom of the viewer panel.

If the text does not fit onto one vertical line, it is displayed in several lines. If
different marks are placed too close to each other, their texts will overlap. To display
the text as completely as possible, move the vertical borders of the comment fields
apart or zoom in on the data box.

The viewer is refreshed every time the horizontal size of the data box is changed, as
well as when segments are deleted or inserted.

The text of any mark and the viewer height can be changed without entering the main
menu. To change the mark text, move the mouse pointer to the corresponding vertical
line (+/- 5 pixels) until the pointer changes its shape, and press the left mouse
button. A pop-up menu containing the Mark text, Copy and Delete items will appear on the
screen. To edit the mark text, select the Mark text item.

To change the viewer panel height, right-click within the viewer panel area and choose
The panel height in the contextual menu that appears. Enter the panel height value. It
should be even (SIS will adjust an odd value anyway) and lie in the range from 4 to 42
(symbols).
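
The limits stated in this section can be collected in a small validation sketch. The
Python code below is illustrative only: the function names are hypothetical, rounding an
odd height down to the nearest even value is an assumption (the manual does not say in
which direction SIS adjusts it), and rejecting out-of-range values with an error is
likewise an assumption.

```python
MAX_MARK_TEXT_CHARS = 40        # maximum length of a mark text line
PANEL_HEIGHT_RANGE = (4, 42)    # allowed viewer panel height, in symbols


def check_mark_text(text):
    """Return the text unchanged if it fits the 40-character limit."""
    if len(text) > MAX_MARK_TEXT_CHARS:
        raise ValueError("mark text must not exceed 40 characters")
    return text


def normalize_panel_height(value):
    """Validate the panel height and make it even (assumption: round down)."""
    low, high = PANEL_HEIGHT_RANGE
    if not low <= value <= high:
        raise ValueError("panel height must be in the range 4..42")
    return value - (value % 2)
```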

New values will be applied after pressing OK and will be ignored if you press Cancel.

If you choose Close panel, the marks viewer panel will be deleted from the window. All
texts will remain unchanged.

5.3.6 Linking windows


SIS allows several data windows to be linked with each other. In this case all operations
with the cursor, marks, zoom control buttons etc. in these windows are synchronized. To
set or modify the links of the current window, open the Windows►Link window menu and
modify the contents of the active field appropriately. Enter the short names of all
windows you want to link to the current window. A short name consists of one symbol and
is displayed in the window corner. The names are case-sensitive, so make sure that you
do not confuse upper- and lower-case characters.

To dissociate the current window from any of the linked windows, just remove its name
from the link field.

The name of the current window itself is not entered into the field.

After you have entered the names of the windows you want to link, press OK. The
specified windows will be linked. If for some reason linking is impossible, the system will
display a message “Link is not set” in the message line.

Once the link has been established, all operations with the horizontal cursor and
constant/temporary marks will be performed simultaneously in all linked windows.

To make the visible area along the horizontal axis in all linked windows match that of
the active window, press <F11>. To make the system synchronize the horizontal zoom rate
and data box boundaries in linked windows automatically, select Synchronize linked
windows in the Options►Extra options menu.
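
The rules for the link field (one-symbol names, case sensitivity, the current window's
own name not entered) can be sketched as follows. This Python example is an
illustration, not the SIS code: the function name parse_link_field is hypothetical, and
it assumes that the names are typed one after another and that whitespace is ignored.

```python
def parse_link_field(field, current, open_windows):
    """Return the list of window names to link to the `current` window.

    Names are single, case-sensitive characters, so 'a' and 'A' are
    different windows.
    """
    linked = []
    for name in field:
        if name.isspace() or name == current:
            continue  # the current window's own name is not part of the link
        if name not in open_windows:
            # SIS reports this case with the "Link is not set" message.
            raise ValueError("Link is not set")
        if name not in linked:
            linked.append(name)
    return linked
```

For example, with windows A, a, B and b open and A active, the field "aB" links windows
a and B to window A.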

5.4 Moving and resizing windows


To move or resize a window, use the standard Windows procedures.

5.5 Shifting menu boxes


If a dialog menu box (e.g. of the source-destination type) overlaps the image in the
current window or some other part of the screen but does not occupy the whole workspace,
it can be moved (dragged) elsewhere. To do this, place the mouse cursor on the box
heading, press the left mouse button and hold it down. A dashed frame will show up
indicating the menu box boundaries. You cannot change its size, but by moving the mouse
while holding the left button down you can drag this frame to any desired position on
the screen.

The whole SIS main window area is available for moving, except the menu line, the
toolbar and the bottom line (reserved for system messages).

5.6 Setting window parameters

5.6.1 Viewing information about window axes


There is a simple and quick way to view information about the horizontal and vertical
coordinate axes of each window: click with the left mouse button anywhere on the axis
values area and hold the mouse button down. The numbering on the horizontal axis will
disappear, and in its place you will see the information about the axis you are pointing
at. This information includes the physical parameter measured, the measurement units,
the scale factor (if it is not equal to 1) and the scale type. For example: “Time, sec,
Lin”, “Frequency, Hz, Log”, “Value, arbitrary units, Lin” etc. The scale type can be:

• Lin. Linear scale.

• Log. Logarithmic scale.

• Bark. A scale of barks (only for frequency; piecewise linear up to 1200 Hz,
logarithmic above).

5.6.2 Viewing detailed information about active window


To view current window information in more detail, press the <F7> key or use the
Windows►Window information command. An information box will be displayed (see
Fig. 10).

The box contains the following fields:

Type - contains information about the window type (e.g. Waveform, Cepstrum,
Spectrogram, etc.);

X axis: - contains information about the physical parameter, measurement units and
scale type represented along the horizontal axis (see section 5.6.1);

Y axis: - contains information about the physical parameter, measurement units and
scale type represented along the vertical axis (see section 5.6.1);

Z axis: - contains information about the third axis (see section 5.6.1). Not used
for 2-D data.

Figure 10. Window information box

For some types of windows you can click X axis or Y axis to open the following choice
lists:

Linear, Logarithmic, Barks – scale choice for the Y-axis of the spectrogram and of the
zero crossing frequency, and for the X-axis of the FFT spectrum;

Time, Frequency - for the Y-axis of cepstrum, autocorrelation, pitch.

When you press Data drawing mode, an option list appears allowing you to choose the
method of data representation in the box. It contains the following options:

Copy - new data will be drawn on top of the previous data in an opaque way;

OR - new data will be drawn on top of the previous data with the colours combined by
the “logical OR” method (bit by bit: 0 OR 0 = 0; 0 OR 1 = 1; 1 OR 0 = 1; 1 OR 1 = 1);

XOR - new data will be drawn on top of the previous data with the colours combined by
the “exclusive OR” method (bit by bit: 0 XOR 0 = 0; 0 XOR 1 = 1; 1 XOR 0 = 1;
1 XOR 1 = 0);
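
The three drawing modes differ only in how the colour already on the screen is combined
with the new colour. The Python sketch below applies the bitwise rules given above to
colours written as 24-bit RGB integers. The function name combine and this colour
representation are assumptions made for illustration.

```python
def combine(old, new, mode):
    """Combine the existing pixel colour `old` with the drawn colour `new`."""
    if mode == "copy":
        return new          # opaque: the new colour replaces the old one
    if mode == "or":
        return old | new    # bitwise OR of the colour bits
    if mode == "xor":
        return old ^ new    # bitwise exclusive OR of the colour bits
    raise ValueError(f"unknown drawing mode: {mode}")
```

For example, drawing green (0x00FF00) over red (0xFF0000) in OR mode gives yellow
(0xFFFF00), while drawing a colour over itself in XOR mode gives black (0x000000).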

Draw axes: Yes – purely for information.

Free memory (Mb): XXXX - an information field showing the amount of free virtual
memory in the SIS system in Mbytes. All data about all signals are stored in the virtual
memory, and once it is exhausted, signal analysis and filtering become impossible.

Segments list - contains the list of segments displayed in the current window. Each
line corresponds to one segment and has two fields, one containing the segment name, the
other the segment type. The colour of the text corresponds to the colour of the segment.
The name of the segment can be edited like ordinary text.

OK - press to apply the changes and exit the menu.

Cancel - press to discard all changes and exit the menu.

All changes in the active window will be applied only after you quit the described menu.

To get information about another window, just click on the desired window without
closing the information box. The box will disappear for a moment, the new active
window will be displayed on top, and then the information box will pop up again, now
providing information about the new active window.

5.6.3 Changing the scale type in active window


To change the scale type of the axis, click with the right mouse button anywhere on axis
values area and in the pop-up menu select the desired scale type:

• Linear.

• Logarithm.

• Barks (bark scale).

The pop-up menu also contains the information on the axis: physical parameter
measured, measure units, scale type.

The <F5> key quickly switches the scale type of the vertical axis from linear to dB and
vice versa. For information: the signal value in dB is equal to the decimal logarithm of
the value multiplied by 20 (for negative values the absolute value is taken).
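
The conversion described above can be written as a short Python function. This is a
sketch of the formula, 20 times the decimal logarithm of the absolute value. The
function name to_db is hypothetical, and returning minus infinity for a zero sample is
an assumption, since the manual does not say how SIS treats zero.

```python
import math


def to_db(value):
    """Convert a linear signal value to decibels: 20 * log10(|value|)."""
    magnitude = abs(value)  # for negative values the absolute value is taken
    if magnitude == 0:
        return float("-inf")  # assumption: zero maps to minus infinity
    return 20.0 * math.log10(magnitude)
```

For example, a sample value of 100 corresponds to 40 dB, and so does -100.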

5.6.4 Segments list


The program allows changing the color and name of an already created segment.

To change the name and color of an existing segment, you can use the
Show►Segments list menu item. The Segments list window will pop up on the
screen (see Fig. 11). It contains two submenus: the list of colors and the list of
segments.

In the list of colors, each line corresponds to one color and is displayed in that
color. In the list of segments, each segment is described by one line containing its
name, type, stereo-segment indicator, sampling rate and a color-change field marked with
the letter C. The segment name can be edited like an ordinary text line of up to 12
symbols. If the segment is a stereo segment, the letter S appears after its type. The
list contains information about all (up to 255) segments of the active window.

If you make another window active by clicking on it with the mouse without leaving the
menu, the menu will disappear and then appear again with the list of segments of the
new active window.

Figure 11. The Segments list window

To change the color of a segment, select the desired color in the color-choice menu
(double-click on the respective color field). Then enter the right submenu (by clicking
anywhere in it with the left mouse button) and, in the Color column, click with the left
mouse button on the field C in the line of the corresponding segment. The segment name
and the letter C will immediately change their colour. After you quit the Segments list
window (by pressing OK), the active window will be displayed with the new names and
colors of the segments.

For “visible speech”, the selected color is applied only if a segment is represented by
deviation right/up.

5.6.5 Changing background color


For 2D data you can change the background color of the box using the Background
color item of the contextual menu.

5.7 Individual selection of fragments in SIS windows


In the SIS program, the selected fragment type can be saved for each window
individually. For example, if the user has chosen to process the Between temporary marks
data fragment in one window and the Highlighted fragment in another, these settings will
be used for each window separately until the user changes them.

The processing fragment type is indicated, and can be changed, by the button at the
right of the signal window. Left-clicking this button displays a set of control buttons
specifying the data fragment type:

- all data;
- highlighted;
- between temporary marks;
- visible in window ;
- selected.

To choose a new type or confirm the old one, click one of these buttons. The button in
the signal window will change its appearance accordingly.

5.8 Copying window to clipboard


Using the Copy image toolbar button, you can copy an image of the current window to the
clipboard. If necessary, you can enable black color inversion during copying. To do so,
check the Invert black color when copying window to clipboard option in the
Options►Extra options menu.

6 READING FILES FROM DISK

SIS allows you to read various file types, generated both by the SIS system itself (e.g.
*.DAT files, also from previous versions of the system), and by systems supporting
*.WAV, *.MP3, *.WMA, *.AVI file formats.

The actual file extension is of no consequence, since the specified file types contain, in
addition to the sound data proper, all associated information. Thus, if you saved a
spectrogram as a *.DAT file, it will still be read as a spectrogram, not as a waveform.
However, it is advisable to make use of the typical extension types given below, to
facilitate file search.

6.1 Reading sound data


To read a sound file from disk, use the Data►Open file menu item (you can also use
the Open file toolbar button or the hotkeys <F3> or <Ctrl>/O). A standard open file
dialog box will appear for file selection. You can choose one of the standard sound file
types indicated in the box, or you can try to open any unformatted or undefined sound
file, in which case the system will read it as a waveform (mono, 16 bit).

It should be noted that if the file was generated by the SIS system or by some other
software in the Windows-compatible format (*.WAV), the type of data will be checked
for compatibility with the type of the window used for data input. For example, if the
file contains a Fourier spectrum, the data will be entered as a Fourier spectrum,
irrespective of the user's wishes and the file extension.

Files with the .DAT extension can be read in part rather than in full. To activate this
option, clear the Read whole file check box in the Open file dialog box. After this, the
Big files window will appear each time a .DAT file is opened:

Figure 12. The Big files window

In this window you have to enter the beginning position and the length (in seconds) of
the fragment to read.

To disable this option check Read whole file in the Open file dialog box.

If the file has an .ALW extension (A-law waveform), SIS will read it as an 8-bit file
in A-law format. The sampling rate is 8 kHz.

If the file has a .SND extension, SIS will read it as an 8-bit file in mu-law format.
The sampling rate is 8 kHz.

If the file has a .VOX extension (waveform in covox format), SIS will read it as an
8-bit file in ADPCM format. The sampling rate is 8 kHz.
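
The extension rules above can be summarized as a lookup table. The Python sketch below
merely restates them; the names RAW_FORMATS and describe_raw_format are hypothetical,
and treating the extension case-insensitively is an assumption.

```python
import os

# extension -> (encoding, bits per sample, sampling rate in Hz), as listed above
RAW_FORMATS = {
    ".alw": ("A-law", 8, 8000),
    ".snd": ("mu-law", 8, 8000),
    ".vox": ("ADPCM", 8, 8000),
}


def describe_raw_format(filename):
    """Return the (encoding, bits, rate) tuple for the file, or None if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return RAW_FORMATS.get(extension)
```

A result of None corresponds to files that are handled in some other way, for example
standard formats such as *.WAV or files opened as an unidentified format.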

The Data►Open file command also allows you to open a file of any unidentified format
as an oscillogram (mono, 16 bits, PCM). In this case the sampling rate can be assigned
arbitrarily:

Figure 13. The Sampling rate selection window
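
Interpreting a file as headerless mono 16-bit PCM with a user-chosen sampling rate can
be sketched as follows. This Python example is an illustration, not the SIS reader: the
function names are hypothetical, little-endian byte order is an assumption, and the
optional start and length parameters (in seconds) are added only to show how the
sampling rate converts time to sample positions.

```python
import array
import sys


def slice_pcm16(raw, sample_rate, start_sec=0.0, length_sec=None):
    """Decode raw bytes as mono 16-bit PCM samples (assumed little-endian)."""
    first = int(start_sec * sample_rate) * 2  # 2 bytes per sample
    last = len(raw) if length_sec is None else first + int(length_sec * sample_rate) * 2
    chunk = raw[first:last]
    chunk = chunk[: len(chunk) - len(chunk) % 2]  # drop a trailing odd byte
    samples = array.array("h")
    samples.frombytes(chunk)
    if sys.byteorder == "big":
        samples.byteswap()
    return samples


def read_raw_pcm16(path, sample_rate, start_sec=0.0, length_sec=None):
    """Read a whole file and decode the requested part of it."""
    with open(path, "rb") as f:
        return slice_pcm16(f.read(), sample_rate, start_sec, length_sec)
```

Choosing a different sampling rate does not change the sample values, only the time
positions assigned to them.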

The entered data will be immediately displayed in the current window. A waveform is
typically displayed in counts – arbitrary units (along the Y axis); the time is represented
in seconds (along the X axis). If desired, you can change the mode of signal
representation to dB instead of counts. To do this, use the Show►Linear/dB scaling
menu or press the <F5> key.

Note: To read compressed sound WAV (RIFF) files and MP3, AVI, MPEG and other files of
this kind, the system uses the DirectShow filters of the operating system to build a
file reading graph. The Speech Technology Center does not supply DirectShow filters and
is not responsible for their incorrect operation. The filters are installed and
configured by the user's administrative and technical support services. Free filter
packages can be found on the Internet (for instance, K-Lite Mega Codec pack). Their
documentation usually describes their composition and range of application. If, on an
attempt to open such a file, the program opens the unidentified format window (see
Fig. 13), it means that the program failed to open the file through a DirectShow filter
and suggests opening it as an unidentified format file.

Several sound signals can be displayed in the same window. Each signal (called a
segment) is displayed in a different color (see Fig. 14). The names of the segments are
shown at the top of the window in the same colors as the corresponding segments.

Figure 14. The sound window with several opened segments

In the case of stereo-segments the right channel data is represented by a darker shade
of the same color as the left channel data (for example, left channel data is light-blue,
right channel data is dark-blue).

You can change the color of the segments using the Show►Segments list menu (see
section 5.6.4).

Playing a file from the file open menu before loading it into the program is not supported
in the current version.

In any mode, the file open menus allow you to delete one or several files from disk (for
instance, when free disk space is required). Just right-click the file name and press
<Delete> (or select the Delete option in the context menu). The system will ask for
confirmation and the file will be sent to the recycle bin.

6.2 Standard file extensions
The following file extensions are supported:

• .ACA - average autocorrelation,
• .AUC - autocorrelation,
• .CEA - average cepstrum ('A'-accumulated),
• .CEP - cepstrum,
• .DAT - waveform,
• .ENE - energy,
• .LPA - average LPC,
• .LPC - linear prediction coefficients,
• .PCA - average coefficient of local correlation,
• .PCR - partial correlation,
• .PIT - pitch,
• .SPA - nominal Fourier spectrum (amplitude),
• .SPE - spectrogram,
• .ZFR - zero crossing frequency.

6.3 Reading text files


SIS allows reading text files containing waveforms, pitch and Fourier spectra, both
created by SIS and by other systems.

6.3.1 Text file headings


If you create a text file yourself, you can give it a heading of 3-4 lines:

• First line - STCautoidentification Text segment.
• Second line - FFT spectrum/Waveform/Pitch, where FFT spectrum stands for
  a Fourier spectrum (fast Fourier transform).
• Third line describes the data format and should contain one of the following:
    - X,Y
    - F=%%%%%%
    - S=%%%%%%

If the third line contains X,Y, each following line contains two numeric values: the
first is an abscissa value (X), the second is an ordinate value (Y). Please note that
the abscissas must be arranged in ascending order and change with a constant step. The
abscissa and ordinate values must be separated by a space.

Example:
STCautoidentification Text segment
Waveform
X,Y
0 523
6.25e-005 459
0.000125 463
0.0001875 527
0.00025 602
0.0003125 536

If the third line contains F=%%%%%%, then the fourth line should contain the Y
symbol. F means the signal sampling frequency value in Hz, e.g. F=11025. Then each
subsequent line should contain one numeric value (signal value in the next point). The
first signal value corresponds to abscissa (time) value equal to zero.

Example:
STCautoidentification Text segment
Waveform
F=11025
Y
-87
-78
-88
-97
-111

If the third line contains S=%%%%%%, then the fourth line should contain Y. S
means the step with which the abscissa value (X) changes, e.g. S=10.7666.

Then each subsequent line contains one numeric value (for the next point). The step for
each signal should be written in standard units, e.g. in Hz for an FFT spectrum type
and in seconds for a Waveform. The first signal value corresponds to abscissa (time)
value equal to zero.

Example:
STCautoidentification Text segment
Spectrum
S=10.7666
Y
34.1674
59.8291
85.2392
96.8834
104.305
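
The three heading variants described above can be read by a short script. The following
is a minimal sketch for illustration only, not the parser used by SIS: the function name
parse_sis_text is hypothetical, the segment type on the second line is returned without
validation, and the conversion of F= into a step of 1/F seconds is an assumption based on
the description of the sampling frequency.

```python
def parse_sis_text(text):
    """Parse a text segment that starts with an STCautoidentification heading.

    Returns (segment_type, xs, ys), where xs are abscissas and ys are signal values.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3 or lines[0] != "STCautoidentification Text segment":
        raise ValueError("missing STCautoidentification heading")
    segment_type = lines[1]            # e.g. Waveform, Pitch, FFT spectrum
    fmt = lines[2]

    if fmt == "X,Y":
        # Two values per line: abscissa and ordinate, separated by a space.
        pairs = [ln.split() for ln in lines[3:]]
        xs = [float(p[0]) for p in pairs]
        ys = [float(p[1]) for p in pairs]
        return segment_type, xs, ys

    if fmt[:2] not in ("F=", "S="):
        raise ValueError(f"unknown format line: {fmt!r}")
    if len(lines) < 4 or lines[3] != "Y":
        raise ValueError("the fourth line must contain Y")
    value = float(fmt[2:])
    # F= gives a sampling frequency in Hz (step assumed to be 1/F); S= gives the step.
    step = 1.0 / value if fmt[0] == "F" else value
    ys = [float(ln) for ln in lines[4:]]
    xs = [i * step for i in range(len(ys))]   # the first value is at abscissa zero
    return segment_type, xs, ys
```

For the F= and S= variants the abscissas are generated rather than read, so the first
signal value always lands at zero, as the description requires.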

6.3.2 Reading text files without headings
The user can read text files containing one or two columns with numeric values. Signal
values with a constant step are placed in one column, while two-column arrangement is
used to represent abscissa and ordinate values.

The file should have a .TXT extension. When you open such a file from SIS, the system
will display the following query: "File type is detected by extension as text file. Do you
agree?" If you choose No, the system will read the file in binary format, PCM 16 bit. If
you choose OK, the system will start setting the signal parameters. You will first see a
dialog window shown in Fig. 15.

Figure 15. The segment type selection window

Under First three the actual first three lines of the processed file are displayed – for the
user to make sure this is the right file. Up to 30 characters can be displayed from each
line.

Clicking with the mouse on one of the last four lines, the user can either cancel the
procedure (pressing Cancel) or select the desired segment type (Waveform, Pitch or
FFT spectrum). When you click OK, a format selection window will appear (see
Fig. 16).

Figure 16. The format selection window

Under First three the actual first three lines of the processed file are displayed – for the
user to make sure this is the right file. Up to 30 characters can be displayed from each
line.

Selecting X,Y means that each line contains two numeric values, the first considered as
a coordinate, the second – as a signal value. The coordinates (abscissas) must change
with a constant step. If not, SIS will calculate the mean step value and will use it for this
segment.

Y means that each line contains one numeric value standing for signal value in the
specified point.
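
When X,Y is selected and the coordinates do not change with a constant step, a mean step is
used instead. The sketch below shows one way such a value could be computed. It is an
assumption, not the documented SIS formula: the mean of the successive differences is
taken, which equals the total span divided by the number of intervals. The function names
are hypothetical.

```python
def mean_step(xs):
    """Mean distance between successive abscissas (assumed definition)."""
    if len(xs) < 2:
        raise ValueError("at least two abscissa values are required")
    return (xs[-1] - xs[0]) / (len(xs) - 1)


def regularize(xs):
    """Replace uneven abscissas by an evenly spaced grid with the mean step."""
    step = mean_step(xs)
    return [xs[0] + i * step for i in range(len(xs))]
```

With this definition the first and last coordinates of the segment are preserved, and only
the intermediate points move.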

Clicking Cancel, you will cancel the operation and the file will not be read.

After you have selected one of the suggested formats, the following window will be
displayed:

Figure 17. The sampling rate and step setting window

One of the values, sampling rate or step, has to be entered. If you enter a non-zero
sampling rate, the step will be calculated from the sampling rate. If you enter a non-zero
step while the sampling rate is zero or not entered, the system will use
this step value. If both values, sampling rate and step, are zero or not entered, the
system will calculate the step for a sampling rate of 11025 Hz.
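
The priority between the two fields can be summarized in a few lines. This is a sketch of
the rule as described, with one assumption: the step derived from a sampling rate is taken
to be 1/rate (in seconds). The function name resolve_step is hypothetical.

```python
DEFAULT_RATE_HZ = 11025   # used when neither value is entered


def resolve_step(sampling_rate=0.0, step=0.0):
    """Return the abscissa step following the priority described in the text."""
    if sampling_rate:                  # a non-zero sampling rate takes priority
        return 1.0 / sampling_rate
    if step:                           # otherwise a non-zero step is used as entered
        return step
    return 1.0 / DEFAULT_RATE_HZ       # both empty or zero: default sampling rate
```

Note that when both fields are filled in, the entered step is ignored in favor of the
sampling rate.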

After you press OK, the system will read the values until the file ends, 100 format
errors are detected, or SIS virtual memory overflows.

7 DATA INPUT FROM SOUNDCARD

When recording data via the line input or from a microphone, you do not need to
synchronize the start of the signal on your input device with the moment you press the
record button in the program menu. After you press the record button (or select
Data►Input from sound card), a settings dialog box is displayed and sound
input starts in the so-called “tuning mode”, so that you can adjust your incoming signal
level. To start signal input, press <spacebar>.

In the tuning mode you can control the volume of the input sound with the Windows
mixer. During recording you can open the mixer window and adjust the volume to the
desired level using the sliders.

Note: Signal input can be performed only if the sound window is active. Thus, after you
have adjusted the signal level in the mixer window, activate the sound window again
by clicking on it with the mouse pointer.

7.1 Sound signal presentation by SIS and its relation with sound card bit range

The SIS program can represent a sound signal in two formats, 16-bit integer (int16) and
32-bit floating-point number (float), within the following ranges:

• for int16 format: [-32768; +32767];
• for float format: [-10^38; +10^38].

When a 24-bit signal is input through an analog-to-digital converter, the program
receives a 24-bit integer from the sound card, converts it to a floating-point number
(float) and divides the result by 256 (i.e. normalizes the signal to the 32767 limit). As a
result, the amplitude of a 24-bit signal will not exceed 32767, but its resolution is
256 times finer than that of an int16 signal. This allows you to use signals of both
formats in the same window.
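
The normalization of 24-bit input can be shown with a small numeric sketch. It only
restates the arithmetic described above (division by 256); the function name
int24_to_sis_float is hypothetical and the code is not taken from SIS.

```python
def int24_to_sis_float(sample):
    """Scale a signed 24-bit sample to the int16-compatible range used for display."""
    if not -(1 << 23) <= sample < (1 << 23):
        raise ValueError("not a signed 24-bit value")
    return sample / 256.0   # 256 input steps fit between two neighboring int16 counts


# A 24-bit sample of 256 corresponds to 1 count on the int16 scale, and a
# 24-bit sample of 1 corresponds to 1/256 of a count.
```

This is why a 24-bit segment and a 16-bit segment can share one vertical axis: both are
expressed in the same counts, and the 24-bit one simply has fractional values.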

At the same time, the amplitude of a float signal may exceed that of an int16 signal by
several orders of magnitude.

SIS does not forbid operating on a float signal whose amplitude is greater than 32767;
a tenfold reserve is kept, so the amplitude is limited by the value 320000.
This reserve is needed if a signal value exceeds 32767 after some kind of processing
(e.g. filtering). In this case, the result will not be clipped and the useful signal will
remain unaffected. To avoid degrading signal quality after such processing, it is
recommended to normalize the signal so that it stays within 32767.

Note: SIS does not prohibit generating signals with an amplitude greater than 32767.
However, we do not recommend working with such signals, as the sound card cannot
reproduce them correctly.

When a float signal is output to a 24-bit card, the program multiplies the signal
amplitude by 256 and converts the result to a 24-bit integer. Thus, if the signal
amplitude exceeds 32767, the bit range will overflow and the signal (sound) will
be clipped. That is, as mentioned above, signals whose amplitude is greater
than 32767 cannot be reproduced correctly even by a 24-bit sound card.
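
The output conversion and the resulting clipping can be sketched as follows. This is an
illustration of the arithmetic only. Saturation at the 24-bit limits is an assumption based
on the word "clipped", and the function name sis_float_to_int24 is hypothetical.

```python
INT24_MIN = -(1 << 23)       # -8388608
INT24_MAX = (1 << 23) - 1    #  8388607


def sis_float_to_int24(value):
    """Scale a float sample by 256 and saturate it to the signed 24-bit range.

    Returns the 24-bit integer and a flag telling whether clipping occurred.
    """
    scaled = int(round(value * 256.0))
    clipped = max(INT24_MIN, min(INT24_MAX, scaled))
    return clipped, clipped != scaled
```

A sample of 40000.0, for example, scales to a value beyond the 24-bit limit and is
reported as clipped, which is the distortion the recommendation above is meant to avoid.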

7.2 Sound input parameters


To input data from the sound card, use the Data►Input from sound card menu item
(<Ctrl>/R) or the record button. A dialog box with the following sound recording options
will be displayed:

Figure 18. The Sound Record Option window

In the Sampling rate field, specify the sampling rate (frequency) at which the sound
will be recorded. The default is 11025 Hz. Any value in the range [4001 - 192000] can be
entered here.

Then the user has to choose a mode of input:

• Mono - mono input (only the left channel is recorded);
• Stereo - stereo input;
• Mixed mono - the sound is recorded as a stereo signal from two channels and
  the channels are mixed (added up) during the input.
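
The difference between the three input modes can be expressed on two lists of samples.
This is a minimal sketch under stated assumptions: the function name apply_input_mode and
the mode strings are hypothetical, and the mixed mode is a plain sum of the channels, as
the words "added up" suggest (the manual does not say whether the sum is rescaled).

```python
def apply_input_mode(left, right, mode):
    """Combine left/right channel samples according to the selected input mode."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    if mode == "mono":
        return list(left)                        # only the left channel is kept
    if mode == "stereo":
        return list(zip(left, right))            # (left, right) pairs
    if mode == "mixed mono":
        return [l + r for l, r in zip(left, right)]   # channels are added up
    raise ValueError(f"unknown input mode: {mode!r}")
```

Since the mixed signal is a sum, its amplitude can be up to twice that of a single
channel, so the input level may need to be lower in this mode.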

The file format of the recorded signal depends on the input accuracy (16- or 24-bit).
The recorded signal can be saved either as 16-bit or as 24-bit data, regardless of the
formats supported by your sound card. The maximum sound card resolution is shown
below the Input signal resolution group.

The program automatically checks whether the sound card supports 24-bit data. To suppress
this check, add the following line to the sis_70.ini file:
[SOUND CARD]
BIT24_MODE = 0
If the program is supplied for operation with third-party sound cards, the sis_70.ini file
always contains this line.

To suppress signal output during signal recording, check Mute output. In this case, you
can estimate a signal level by its waveform.

Having set the necessary parameters, press OK. Then a new window will be created, the
system will pass into the digital waveform viewer mode and wait for the user’s request
to start the recording of the sound signal.

Pressing Cancel you will close the menu and return to the previously active window or
menu.

7.3 Digital waveform viewer


If the current (top) window is not empty when the system passes into the digital waveform
viewer mode, the system will create a new window, placed whenever possible in the least
occupied screen area.

In the digital waveform viewer mode the user has an opportunity to control the source
signal and to start/suspend/stop sound recording into computer memory.

Once the signal input mode is activated, a box appears in the current window with the
input signal displayed in yellow. To start recording, press the <spacebar>.
The signal will change its color from yellow to blue. When you press the <spacebar>
a second time, the recording process will be suspended. To resume recording,
press the <spacebar> again. To quit the digital waveform viewer mode, press <ESC>.

The input signal will be displayed in the window and automatically assigned a name
NONAME1. The next such window will be called NONAME2 etc. You can subsequently
edit its name in the usual way.

The following key combinations are available for data zooming and shifting during sound
input:

• <Home> - 2x vertical zoom-in;
• <End> - 2x vertical zoom-out;
• <PgUp> - visible signal upward shift by 20%;
• <PgDn> - visible signal downward shift by 20%.

7.4 Visible speech of currently input signal
If during signal input from the sound card you press the <F4> key, a new window will
appear under the waveform window. It will contain visible speech (spectrum) of the
input signal displayed continuously and simultaneously with the signal waveform.

The spectrum is calculated with the following parameters:

• Gaussian weighting window (Gauss);
• Number of counts depending on the sampling rate:
    - 2048 - for sampling rates less than or equal to 32000 Hz;
    - 4096 - for sampling rates from 32000 to 64000 Hz;
    - 8192 - for sampling rates from 64000 to 128000 Hz;
    - 16384 - for sampling rates from 128000 to 200000 Hz.
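
The dependence of the number of counts on the sampling rate can be written as a lookup.
The sketch below is illustrative: the function names are hypothetical, and the treatment of
the boundary values (each range is taken to include its upper limit) is an assumption,
since the list above states this only for 32000 Hz.

```python
def spectrum_frame_size(sampling_rate_hz):
    """Number of counts used for the visible speech spectrum at a given rate."""
    if sampling_rate_hz <= 32000:
        return 2048
    if sampling_rate_hz <= 64000:      # upper limits assumed inclusive
        return 4096
    if sampling_rate_hz <= 128000:
        return 8192
    if sampling_rate_hz <= 200000:
        return 16384
    raise ValueError("sampling rate above 200000 Hz is not covered")


def bin_width_hz(sampling_rate_hz):
    """Spacing between neighboring spectrum bins for the chosen frame size."""
    return sampling_rate_hz / spectrum_frame_size(sampling_rate_hz)
```

Doubling the number of counts each time the sampling rate doubles keeps the spacing between
spectrum bins roughly the same across the supported rates.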

The following key combinations are available for data zooming and shifting in the visible
speech window:

• <Home> - 2x vertical zoom-in;
• <End> - 2x vertical zoom-out;
• <PgUp> - increase the image contrast;
• <PgDn> - decrease the image contrast.

Press <TAB> to move to the source signal (waveform) window and back.

To stop visible speech display, make the visible speech window active and press
<ESC>. The source signal will still be continuously displayed in the waveform window.
To suspend/resume signal input press <spacebar>, to quit signal input mode press
<ESC>.

8 PLAYING SOUND DATA

To be played, the speech signal (waveform) should be located in the currently active
window. If the currently active window is of the “visible speech” type and the source
speech signal (waveform) has not been deleted or changed, the system will
automatically detect the source waveform and start the playback.

During playback the cursor stays fixed at the center of the window, and the
“visible speech” moves relative to the cursor. Pressing the <spacebar> stops
sound playback and “visible speech” movement simultaneously.

While playback is paused (after pressing the <spacebar> or the corresponding button on
the toolbar), you can set one or several constant marks by pressing the <Insert> key;
each mark appears at the current cursor position. Press the <spacebar> once more to
resume sound playback from where it stopped.

In the Playback menu you can choose the desired fragment to be played:

• All data. All loaded sound data will be played.
• Highlighted. Only highlighted data (a highlighted interval between constant
  marks) will be played.
• Between temporary marks. Only data located between temporary marks will be
  played (there should be two of them).
• From temporary marks to end. Sound data from the second (right) temporary
  mark to the end will be played.
• From start to temporary marks. Sound data from the beginning to the first
  (left) temporary mark will be played.
• Visible in window. Only data currently visible in the window will be played.
• Visible data and later. Data from the part currently visible in the window through
  to the end will be played.
• All segments in window. All segments in the active window will be played one
  by one, beginning with the top one. The system makes a pause between different
  segments; the pause duration can be specified in the Playback►Options menu.
• Selected. Only data fragments marked by selected (ticked) marks will be played.
  Marks can be selected in the Marks list.

If the loop mode button is not pressed, playback starts from the beginning of
the fragment and stops at its end. Playback can be interrupted only by pressing
<Esc> or the corresponding button on the toolbar. Other user actions, except playback
mode operations, will be ignored.

If the loop mode button is pressed, the sound fragment will be played repeatedly.
To play the fragment again, press <Home>. To return to the loop mode, press
<spacebar>.

Note: While playback is paused, pressing <Home> switches the system
into the one-time playback mode, while pressing the <spacebar>
returns the system to the loop mode.

To start playback, press <F6> on the keyboard. Also, the following key combinations
can be used for playback of speech fragments:

• Shift/F6 - speech between temporary marks;
• Ctrl/F6 - speech visible in window;
• Alt/F6 - speech visible in window and to the end.

You can change signal playback frequency in the Playback►Options menu. After you
press OK, the signal in the current window will be played at a specified frequency. To
reset the frequency, enter the Playback►Options menu again and quit it by pressing
Cancel (or <ESC>).

The Options menu also allows you to set the time interval (pause) between the
playback of segments in the current window. Press the <PLAY> button to start
playback right from the Playback►Options menu.

Without quitting the playback mode (in pause of playback) you can also set one or
several constant marks by pressing <Insert>, scroll image, change size and location of
boxes. To assign names to the marks, use the Marks►Marks list menu. To resume
playback, just press the <spacebar> again.

The default window length is 5 sec (2.5 sec to the left and 2.5 sec to the right of the
active cursor) and can be changed within the 0.1 to 100 sec range in the
Options►Extra options menu (Data view size (sec) for speech visualization). To
resume playback, just press the <spacebar> again.

Using the button on the right side of each signal window, you can control the playback
speed. When you place the mouse cursor on this button, a pop-up box shows the current
value of the playback speed coefficient (1.00 corresponds to the original playback
speed).

To change the playback speed, left-click the speed control button and, without releasing
the button, move the mouse cursor to the right (to speed up playback) or to the left (to
slow down playback). The playback speed coefficient can be chosen from the 0.33..3.03
range. When you release the left mouse button, the current coefficient value will apply
to playback until you change it again.

9 SAVING DATA

9.1 Saving data segments and fragments to disk


A segment with data of any type and any fragment of a segment can be saved.

When you select the Data►Save command:

• if a *.DAT file was loaded, the whole data segment with marks and comments is
  saved as a *.DAT file;
• if a *.WAV file was loaded, the whole data segment is saved without marks and
  comments;
• if another sound file was loaded, the Data►Save as menu opens.

A window can include several segments, but only the top (active) segment is saved. The
name of the active segment comes first (on the left) in the window name. To make any
other segment in the window active, right-click its name.

When you select the Data►Save as command, a Save As dialog box will appear on the
screen, where you have to specify the location, name and type of the file to be
saved. In addition to the standard options, it has the Now you work with group box,
which specifies the type of data fragment to be saved. You can change the currently
indicated type using the buttons under the fragment type field:

- all data;
- highlighted;
- between temporary marks;
- visible in window;
- selected.

The file types offered for saving correspond to the type of data being saved (see the
standard extensions in section 6.2).

The type of the fragment being saved is shown under the Now you work with field. It can
be changed with the five buttons below it. To save a whole data segment, choose All
data as the fragment type.

To the right of the fragment type selection area, there are two check boxes: Save
comments and Save marks. If the signal contains no marks or comments, these
options will be dimmed.

Remember that only marks inside the saved fragment or at its boundaries are saved.

If the signal contains constant marks and/or textual comments, these check-boxes will
be accessible and the user can tick them if he wants this information to be stored with
the sound data.

Remember that marks, comments and other overhead information are saved only in
*.DAT files.

9.2 Saving data in the text form


All kinds of 2-dimensional data (waveform, pitch, spectrum, histogram etc.) can be
saved in the text form.

To save the data (fragment or segment) as text, use the Data►Text export option
available in the main menu. The Save as window described in section 9.1 will
appear on the screen.

Each file is saved in the text format described in section 6.3.1. The data will be
converted into text without loss of accuracy.

9.3 Saving current state of processing


The SIS program can save the current state of processing (the current state, or
dispatch state) and return to that state later.

If several empty windows or windows with different types of visualization are opened,
you can save this state of processing with all its settings using the Data►Save current
state menu command.

Later you can continue working with the saved settings using the Data►Load saved
state menu command.

10 ZOOMING AND SCROLLING DATA IN THE WINDOW

As soon as the data box appears in the window, you can make use of the zooming and
scrolling tools: a data scroll bar, buttons for horizontal and vertical zooming (double-
headed-arrows) and the third axis zoom button for 3-dimensional data.

DEFINITIONS:

A segment is a chunk of data which forms an entity and is not connected with other
data. For example, data read from a file will form a segment. Similarly, all data written
from the sound card within one session, upon the completion of sound input, will form
one segment. Each new segment in the window is represented by a different color, as
long as the number of colors is sufficient.

A fragment is a part of data which is singled out in some way from the segment but
has not lost its connection with the rest of the data. It can be, for example, some part of
a segment limited by temporary marks, or part of a segment in the highlighted interval
between constant marks, or part of a segment visible in the box.

10.1 Data scrolling


Once new data is read from disk, the data box width is adjusted according to the value
specified in the Windows►Options menu (Waveform window length; "Visible
Speech" window length). The window length is 5 sec by default and it is not affected
by signal duration. The system provides three ways of controlling horizontal and vertical
data display in the box:

(1) Select the Show►Change box position/length command in the menu and
modify the box left and right boundaries in seconds. This may change the position and
the width of the box. If, contrary to your expectations, the data disappears from the
box, use Show►All data (or press <F8>) to display all loaded data. If this does not
help, expand the upper and lower limits of the box (Show►Change box
position/height) or adjust them using the Show►Y-autoscaling command or press
<5> on the number pad.

(2-3) Use the horizontal scroll bar. The functions of the SIS scroll bar are similar to
those of the standard Windows scroll bar, but also have some specific features. The
scroll bar is located at the bottom of the data box.

If you activate the left or the right arrow of the scroll bar, the coordinate
boundaries (seconds) of the horizontal axis in the box will be shifted by 3/4 of the
current box width and the data will be redrawn (the 3/4 value can be changed in the
Options►Extra options menu item).

The black marker in the middle part of the scroll bar can also be used to shift
the data. For smooth scrolling, drag the marker with the left mouse button pressed.

If the horizontal box boundaries don't exceed the data boundaries, the total width of the
scroll bar corresponds to the total length of data; and the width of its black part
(marker) corresponds to the width and position of the box respectively. If you activate
any point on the scroll bar middle part using the mouse, the box boundaries will be
changed so that the left border of the black marker will be precisely in this activated
point. You should activate the left border of the scroll bar middle part to see the
beginning of the data. If the horizontal box boundaries exceed the data boundaries,
then the left border of the scroll bar middle part corresponds to the minimum of the left
box boundary and the left data boundary in the box; and the right border of the scroll
bar middle part corresponds to the maximum of the right box boundary and the right
data boundary in the box.

If there is an area without sound data inside the windowed selection, it will be marked
by a thin horizontal line at half height of the marker between the scroll arrows.

You can change the shift step of the visible area in the box using the Options►Extra
options menu (Data view shift). The shift value is always set in the proportion of the
visible area width and can be changed in the limits from 0.001 to 1.0.
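
The effect of the shift value on the box boundaries can be illustrated numerically. This
is a sketch of the arithmetic described in this section, not SIS code: the function name
shift_box and its parameters are hypothetical.

```python
def shift_box(left, right, fraction=0.75, direction=1):
    """Shift the visible interval [left, right] (seconds) by a share of its width.

    fraction is the Data view shift value (0.001 to 1.0, 3/4 in the text above);
    direction is +1 for the right arrow and -1 for the left arrow.
    """
    if not 0.001 <= fraction <= 1.0:
        raise ValueError("shift value must be within 0.001 to 1.0")
    delta = direction * fraction * (right - left)
    return left + delta, right + delta
```

With the 3/4 value, one quarter of the previously visible data stays on screen after each
step, which helps to keep track of the position while scrolling; a value of 1.0 shows
an entirely new stretch of data each time.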

10.2 Horizontal data zooming


One way of rescaling the image in the data box was described in section 10.1. Another
way is to use the mouse and the horizontal zoom button located at the bottom left of the
window, next to the window name. It looks like a double-headed horizontal arrow.
Using this button you can either spread or compress the signal image in the box. If you
click this button with the mouse, a rectangle will appear above the button. It is
divided into two parts: its left part is black, while its right part is grey.
The mouse cursor will be located inside the rectangle at the border separating the two
areas.

If you press <ESC>, the rectangle will disappear without any changes.

If you click with the mouse on the border between the black and grey fields, no changes
will take place.

If you click on the black area of the rectangle, the width of the box (in seconds) will be
reduced according to the proportion between the distance from the left boundary of the
black area to the cursor and the whole length of the black area.

If you click on the grey area, the width of the box (in seconds) will be enlarged
according to the proportion between the distance from the left boundary of the black
area to the cursor and the whole length of the black area.

Thus, without entering the menu, you can increase the width of the box by up to 3 times
or reduce it by up to 15 times. You can make several clicks to obtain the desired degree
of zoom.
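
The proportion rule for the horizontal zoom button can be put into numbers. The sketch
below is an interpretation of the description, not SIS code: the function name
zoom_box_width and the pixel-based parameters are hypothetical, and the limits of 3 times
and 1/15 are taken from the statement above and applied as a clamp.

```python
def zoom_box_width(width_s, click_px, black_len_px):
    """New box width after one click in the zoom rectangle.

    click_px is the distance from the left boundary of the black area to the
    click position; black_len_px is the length of the black area.
    """
    if black_len_px <= 0 or click_px <= 0:
        raise ValueError("distances must be positive")
    factor = click_px / black_len_px            # below 1 in the black area, above 1 in the grey
    factor = max(1 / 15, min(3.0, factor))      # assumed limits: 15x reduction, 3x enlargement
    return width_s * factor
```

A click in the middle of the black area halves the box width, a click at the border
between the two areas leaves it unchanged, and a click in the grey area at twice the
black length doubles it.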

The third way of rescaling the image in the data box is to use the mouse wheel. Move
the mouse pointer to the horizontal axis until the pointer takes the form of the double-
headed arrow, and rotate the mouse wheel to spread or compress the signal image in
the box.

10.3 Selecting data fragment for display


The commands under the Show menu allow you to specify which data fragment is to be
displayed in the window:

• All data. The whole current segment will be displayed. You can alternatively press
  <F8>.
• Highlighted. Only highlighted data (a highlighted interval between constant
  marks) will be displayed. You can alternatively use the <Ctrl/F8> key combination.
• Between temporary marks. Data located between temporary marks will be
  displayed (there should be two of them). You can alternatively use the
  <Shift/F8> key combination.
• From temporary marks to end. Data from the right temporary mark to the end
  will be displayed.
• From start to temporary marks. Data from the beginning to the left temporary
  mark will be displayed.

10.4 Vertical data zooming


For a quick change of the vertical zoom rate, use the vertical data zoom button
located to the right of the data box. With the help of this button you can either enlarge
or shrink the image in the data box along the vertical axis. Once you click this button
with the mouse, a two-color rectangle will appear to the left of the button.

If you press <ESC>, the rectangle will disappear without any changes.

If you click with the mouse on the border between the black and grey fields, no changes
will take place either.

If you click on the black area of the rectangle, the height of the box will be reduced (i.e.
the image will be enlarged) according to the proportion between the distance from the
left boundary of the black area to the cursor and the whole length of the black area.

If you click on the grey area, the height of the box will be enlarged (i.e. the image will
shrink) according to the proportion between the distance from the left boundary of the
black area to the cursor and the whole length of the black area.

Another way to modify the top and bottom borders of the data box is by using the
Show►Change box position/height menu. The values of the top and bottom box
boundaries are given in the same coordinates as those currently used for data display in
the box.

You can also use the Show►Y-autoscaling command or press <5> on the number pad
to adjust the bottom and top borders of the box to the minimum and maximum signal
amplitude in the window.

Another way of rescaling the image in the data box is to use the mouse wheel. Move the
mouse pointer to the vertical axis until the pointer takes the form of the double-headed
arrow, and rotate the mouse wheel to spread or compress the signal image in the box.

10.5 Shifting and resizing the data box


You can change the size of the data box by modifying its height and length with the
button located in each window to the right of the data box. After you click this button,
the mouse cursor disappears and a dashed frame appears along the data box
borders, with a dashed horizontal line in the middle. You can perform the following
operations:

• Shift the data box upwards/downwards without resizing it. If one of the horizontal
  lines of the frame goes beyond the window boundaries during the shift, it
  becomes invisible on the screen.
• Resize the data box by modifying its bottom and right boundaries. To do this,
  press and hold down the left mouse button. Now you can move the mouse up to
  reduce the data box height, or down to enlarge it. Similarly, without
  releasing the mouse button, you can modify the data box length by dragging its
  right boundary. In this mode you can only reduce the box length relative to its
  current length. To enlarge it, use the horizontal zoom control described in section
  10.2. You can modify the box height and length simultaneously by dragging the
  bottom right corner of the frame diagonally across the window, while the
  top left corner of the frame remains fixed.
• After you have sized the data box as desired, you can change its position. Release
  the left mouse button and move the frame with the mouse until it occupies the desired
  position on the screen. The left and right boundaries will always remain within
  the current data box limits, while the top and bottom coordinates can exceed them.
  After you have positioned the frame as desired, click the right mouse button
  and the data box will be modified accordingly.

If you press <ESC>, you will quit the mode without any changes.

10.6 Zoom mode
The Zoom mode is intended for a detailed examination of any part of the data in the
box. To use this mode, choose the Show►Zoom menu item.

If the active window contains 3-D data (spectrogram, cepstra, etc.), set at least one
temporary mark in the area of interest.

The Zoom mode is not used for spatial representations of data (axonometry,
L right/up deviation).

If the active window contains 2-D data (for example, a waveform), it is not necessary to
set marks in it. After you select Show►Zoom, a dashed frame will appear
on the screen. You can resize and position it to include the desired piece of data using
the mouse and arrow keys (similarly to the sizing of a newly created window, see 4.5
for details). After you have adjusted the frame size and position as desired, click the
right mouse button. A new window will appear containing the zoomed data. All
segments located inside the frame will be represented in the zoom window.

If you zoom on 3-D data (e.g. spectrogram) no frame will appear. To specify the point
you wish to zoom on, set one or two temporary marks there. You can move along the
segment in the zoom window (up, down, forward, backward) using the arrow keys. If
you press the <+> key on the number pad, the previous slice will be displayed in the
zoom window. No more than 4 slices, each in a different color, can be displayed in the
window simultaneously. The slice shift interval relative to the first slice (in msec) is used
for the slice name.

For stereo segments both channels will be represented (the right one has a darker
color). You can move along the segment (up, down, forward, backward) using arrow
keys. Each pressing will shift the cursor by one frame size.

While working in the Zoom mode, the mouse cursor is located within the zoom window
and you can't actually use the main menu. Therefore SIS provides a special set of hot
keys and key combinations for basic menu commands. Using these keys you can do the
following:

Key                     Function

<←>                     Shifts the frame by a quarter of its width (or by one slice
                        of "visible speech") to the left

<Ctrl>/<←>              Shifts the frame by 3/4 of its width to the left (not used
                        for "visible speech")

<→>                     Shifts the frame by a quarter of its width (or by one slice
                        of "visible speech") to the right

<Ctrl>/<→>              Shifts the frame by 3/4 of its width to the right (not used
                        for "visible speech")

<Shift>/<←>             The frame width is reduced by half (the data is zoomed in)
                        due to right boundary shift along the horizontal axis (not
                        used for "visible speech")

<Shift>/<→>             The frame width is enlarged by half (the data is zoomed
                        out) due to right boundary shift (not used for "visible
                        speech")

<↑>                     Shifts the frame up by 1/10 of its height

<↓>                     Shifts the frame down by 1/10 of its height

<PgUp>                  Shifts the frame up by 1/2 of its height

<PgDn>                  Shifts the frame down by 1/2 of its height

<Home>                  The image is made symmetric relative to the zero axis
                        within current data box dimensions

<F6>                    Plays fragment

<F5>                    Changes the data representation scale (linear/logarithmic)

<5>                     Performs autoscaling along the Y-axis, with the top box
on the number pad       boundary equal to the maximum signal value in the current
                        window and the bottom box boundary equal to the minimum
                        signal value in the current window

<+>                     Only for "visible speech" in the "zoom" window:
on the number pad       One more slice is added, preceding the slice marked by the
                        cursor in the source window (but not more than four slices
                        in all). The displacement of the previous slice relative to
                        the last slice (in ms) is used as their names.

<->                     For "visible speech" only:
on the number pad       The last added slice is deleted from the "zoom" window.

<Insert>                Sets a constant mark in the source window.

<ESC>                   Quits the "zoom" mode

<F9>                    For waveforms etc.:
                        - quit the "zoom" mode and display the zoomed data in
                          the source window at the same zoom rate.
                        For "visible speech":
                        - save a "zoom" window together with all data in it after
                          quitting the "Zoom" mode.

The message line (at the bottom of the screen) lists the most frequently used hotkeys.

While in the "zoom" window, you can make use of the buttons available for other
window types as well. Thus, when viewing 2-D data (waveform, pitch) you can press the
cursor source button, move the cursor to the required place and set a constant mark
(pressing <Insert>). The constant mark will appear both in the zoom and in the source
window. For 3-D data the mark will indicate the current section of the "visible speech".
In 2-D data windows the mark will appear in the position corresponding to the middle
point of the zoom window.

Using the contextual menu (opened with the right mouse button), you can switch to the main
window to perform standard actions (listening, scaling, etc.) without closing the “zoom”
window and losing its settings (size, shift). Later you can return from the
main window to the “zoom” window.

“Zoomed” fragments can be played in the following modes (the 0.5 sec margin can be changed
in the options that appear when you press the right mouse button):

• The data visible in the “zoom” window (plus 0.5 sec before and after, if necessary).

• The data between temporary marks in the “zoom” window (plus 0.5 sec before and
0.5 sec after the selection, if necessary).

In the Options►Extra options you can change the step of visible data shift performed
by pressing right/left arrow keys in the Zoom mode. By default this value is 0.25 of the
visible area.
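
The frame-shift arithmetic described above can be sketched as follows. This is only an illustration, not SIS code: the function name `shift_frame` and its parameters are hypothetical, and the frame is modelled simply as a left edge plus a width on the time axis.

```python
def shift_frame(left, width, direction, step_fraction=0.25):
    """Return the new left edge of the visible frame after one key press.

    direction is -1 for the left arrow and +1 for the right arrow.
    step_fraction is the shift step as a fraction of the visible width
    (0.25 by default; 0.75 corresponds to the Ctrl + arrow combination).
    """
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 (left) or +1 (right)")
    return left + direction * step_fraction * width


# A 4-second frame starting at 10 s moves to 11 s on a right-arrow press.
new_left = shift_frame(10.0, 4.0, +1)
```

The frame width itself does not change during a shift; only the left edge moves.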

10.7 Editing the signal


The SIS system allows you to change the amplitude value of the signal. This can be
done using the signal editing option available in the Zoom mode window. Changes
made during the signal editing can be undone with the help of the Data►Undo
command.

To enable the editing option, open the signal in the Zoom mode and press the right mouse
button. Then select Edit mode►Draw or Edit mode►Erase in the pop-up menu.

Once in the Draw mode, you can change the amplitude value of the signal while holding down
the left mouse button. The signal is corrected by linear
interpolation.

Once in the Erase mode, you can use the left mouse button to erase the signal. Erasing
means that the amplitude value is being set to zero. To adjust the size of the eraser,
rotate the mouse wheel.
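
A minimal sketch of the two editing modes, under stated assumptions: the manual does not say between which points the Draw mode interpolates, so the example assumes a straight line between two consecutive pointer positions; the eraser is assumed to be centered on the pointer. The names `draw_segment` and `erase` are hypothetical.

```python
def draw_segment(samples, i0, v0, i1, v1):
    """Overwrite samples[i0..i1] with a straight line from (i0, v0) to (i1, v1)."""
    if i0 > i1:
        i0, v0, i1, v1 = i1, v1, i0, v0
    if i0 == i1:
        samples[i0] = v1
        return samples
    for i in range(i0, i1 + 1):
        t = (i - i0) / (i1 - i0)          # 0.0 at the start, 1.0 at the end
        samples[i] = v0 + t * (v1 - v0)   # linear interpolation
    return samples


def erase(samples, center, eraser_width):
    """Set to zero the samples covered by an eraser centered on `center`."""
    half = eraser_width // 2
    first = max(0, center - half)
    last = min(len(samples), center + half + 1)
    for i in range(first, last):
        samples[i] = 0
    return samples
```

Both functions modify the list in place, which mirrors the fact that the edit changes the signal itself and is reverted only through Data►Undo.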

The waveform and the pitch must be shown in the linear scale (along the vertical axis),
while spectra and other signals may be shown in the dB scale; the signal value in the
count is shown in dB and ordinary units.

To disable the editing option, deselect the corresponding item (Draw or Erase) in the
pop-up menu. To quit the Zoom mode, press <ESC>.

10.8 Visualization options
If you enter the Options►Extra options menu, you will be able to change the
following parameters:

• Visible data shift step (upon clicking on the arrows of the scrolling
indicator/control in any window). The default value is 0.75 of the visible area
width. The value is always set proportionally to the visible area width in the
[0.001 - 1.0] range.

• Visible data shift step for the Zoom mode (upon pressing the arrow keys). The
default value is 0.25 of the visible area width.

• Visible data size (upon pressing <F9> in the playback mode). The default value is
5 sec (2.5 sec to the left and 2.5 sec to the right of the stop point) and can be
changed in the [0.1 - 100 sec] range.

• Number of horizontal marks. The default value is 2 and can be increased up to 5.

• Scaling width is fixed. In this mode the scale width is the same for all
windows. The scale divisions of the vertical axis expand until the numbers at
the graduations exceed 5 digits (with floating point); then the scale width
increases again.

• Shift data to zero after copying. When data is copied from one window to another,
it will be shifted to zero. When data is deleted from the beginning of a segment,
the remaining data will also be shifted to zero. If this option is not selected, the
coordinate of the beginning of the data does not change.

• Playback process visualization. The cursor will follow the point of playback.

• Change box limits after copying. Each time a new file is read, the data will
be automatically scaled along the vertical axis. If this option is not selected,
autoscaling will be performed only when the first file is read.

• Mode COPY for drawing. In this mode upper segments will cover lower segments
during drawing. Alternatively, in the OR mode, the upper segments will be
transparent.

• Add comments to "visible speech". Coordinates of the vertical and horizontal
temporary marks will be displayed in the comment line at the top of the window.
The comment line will be created simultaneously with the visible speech
(spectrogram, cepstrum, autocorrelation).

• To cycle highlighted data copying. The highlighted data copying dialog will appear
again after copying. The highlighted region between the constant marks can be edited
without exiting the copying dialog window.

• Invert black color when copying window to clipboard.

• Use hours and minutes for time scaling.

• Synchronize linked windows. Enable this option to make the system automatically
synchronize the horizontal zoom rate in linked windows.

• Save configuration on exit. The configuration will be saved automatically each time
you exit the program. By default this mode is switched on.

11 OPERATIONS WITH CURSORS USING TEMPORARY
AND CONSTANT MARKS FOR SELECTING
DATA FRAGMENTS

11.1 Creating, shifting and deleting the vertical cursor


Apart from the signals and calculation results there can be objects in the box for
selecting and marking points and fragments of the signal - a vertical cursor, temporary
marks and constant marks (see section 11.5 on using the horizontal cursor). The
function of the cursor is to mark a point in the signal and to control the marks. There
can be just one cursor in the current box.

To invoke the cursor, either click with the left mouse button on the cursor source button
located in the top right corner of the current window, or double-click with the left
mouse button anywhere in the working area of the window. The mouse cursor will
disappear, and a yellow dashed line (cursor) will appear in the middle of the box. This
line will follow the horizontal mouse movements. A message line displaying the X-
coordinate of the window cursor will appear at the top of the box. When a message line
appears, the data box will be reduced in size and redrawn.

"Visible speech" (spectrogram, cepstrum etc.) may be redrawn rather slowly when
inserting an empty message line. To speed up working, select Add comments to
"Visible Speech" in the menu Options►Extra options. An empty message line will
be created simultaneously with the creation of visible speech.

To quit the mode, press the middle (right) mouse button or <ESC>. The standard
mouse cursor will appear on the screen, and the vertical cursor in the box will
disappear.

You can alternatively invoke the vertical cursor using the Marks►Set cursor/mark
command. The cursor position along the X axis will be shown in the top left corner
of the window. If there are suitable signals in the box, the value of the current
signal at the given point will also be shown, separated by a colon.

To quit the mode, press <ESC>. The standard mouse cursor will appear on the screen,
and the vertical cursor in the box will disappear.

11.2 Temporary marks


Temporary marks are used for marking certain points or fragments of interest in the
signal. There can be no more than two temporary marks in one box.

To set a temporary mark, invoke the cursor by pressing the cursor source button, move the
cursor to the desired position and click the left mouse button or press the
<Ctrl>/<Insert> key combination. The X-coordinate of the mark will be displayed in
the comments line.

If there are two temporary marks already set in the window and you set a new one, the
older mark will disappear. If you press <Ctrl>/<Delete>, the temporary mark nearest to
the cursor will disappear.
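
The behavior of temporary marks (at most two per box, the oldest one replaced, the nearest one deleted) can be sketched as follows. The class `TemporaryMarks` and its method names are hypothetical and serve only to illustrate the rules stated above.

```python
class TemporaryMarks:
    """Holds at most two temporary marks, as described in this section."""

    MAX_MARKS = 2

    def __init__(self):
        self._marks = []  # X-coordinates, oldest first

    def set(self, x):
        """Set a new mark; if two marks already exist, the older one disappears."""
        if len(self._marks) == self.MAX_MARKS:
            self._marks.pop(0)
        self._marks.append(x)

    def delete_nearest(self, cursor_x):
        """Remove the mark nearest to the cursor (Ctrl/Delete)."""
        if self._marks:
            nearest = min(self._marks, key=lambda m: abs(m - cursor_x))
            self._marks.remove(nearest)

    def positions(self):
        """Return the mark positions in ascending order."""
        return sorted(self._marks)


marks = TemporaryMarks()
for x in (1.0, 5.0, 3.0):   # the third mark replaces the oldest one (1.0)
    marks.set(x)
```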

Beginning from version 6.2.1, the SIS system offers a new mode for operating with
temporary marks. To enable this mode, in the Options►Extra options menu tick the
Select signal between temporary marks item. To set a temporary mark, just move
the mouse pointer to the desired position and click the left mouse button. The second
mark can be set in the same way. If the data box already contains two temporary
marks, the older one will disappear. The selected fragment between temporary marks is
displayed as a contrasting area.

To select a fragment, you can also press and hold down the left mouse button and drag the
pointer to the desired position. The selected area is highlighted in color. The left and
right edges of the selection correspond to the positions of the temporary marks. The
positions of the edges of the selected fragment can be changed by dragging them with
the left mouse button.

To remove selection, press <ESC>.

11.3 Constant marks


Constant marks are used to mark certain points and fragments in the signal. The system
allows setting up to 500 constant marks.

To set a constant mark, invoke the cursor, move it to the desired position and press
<Insert>.

If you press <Delete>, the constant mark nearest to the cursor will disappear. To
delete all marks simultaneously, select Marks►Delete all marks.

Beginning from version 6.2.1, the SIS system offers a new mode for operating with
constant marks. To enable this mode in the Options►Extra options menu, tick the
Select signal between temporary marks item. To set a constant mark, move the
mouse pointer to the desired position and press <Insert> key.

Any fragment between two constant marks can be highlighted. For that, just click with
the right mouse button anywhere within the marked region and select Make
highlighted item in the context menu. You will see yellow highlighting appear above
the selected fragment. To remove this highlighting, move the cursor to any place of the
highlighted fragment, press the right mouse button and select Make highlighted menu
item once more.

11.4 Marks list
All constant marks in the window are entered into the marks list. You can view this list
via the Marks►Marks list menu. A window containing the marks list and an additional menu
will appear on the screen (see Fig. 19).

Figure 19. The Marks list window

The marks list provides the following information:

• Mark order number.

• Mark position along the X axis (abscissa coordinate).

• Highlighting index (for the fragment from the current to the following mark).

• Signature – an arbitrary textual commentary assigned to the mark.

• Length of the fragment between the current and the following mark.

• Indicator of the mark visibility in the window.

Apart from the marks list proper, there are some additional fields allowing you to select
and deselect certain marks. If you have a highlighted fragment in your data, the
coordinates of the beginning and end points of the highlighted area, as well as its overall
duration will be indicated.

Besides, each mark within the highlighted fragment will be denoted with one of the
following characters (depending on its position in the highlighted area): s (start) –
starting mark, h (highlighted) – all marks between the highlighted fragment’s
boundaries, f (finish) – final mark. These symbols appear between the mark’s
coordinate and signature fields in the marks list.
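
As an illustration of how the list columns relate to the mark positions, the sketch below derives the order number, the length to the following mark, and the s/h/f character for each mark. It is an assumption-based model, not the SIS implementation: `build_marks_list` is a hypothetical name, and the highlighted area is assumed to start and end exactly on constant marks.

```python
def build_marks_list(positions, highlight=None):
    """Return rows of (order number, position, flag, length to next mark).

    highlight is an optional (start, end) pair of mark positions that
    bound the highlighted area; flag is "s", "h", "f" or "".
    """
    marks = sorted(positions)
    rows = []
    for number, x in enumerate(marks, start=1):
        # `number` is 1-based, so marks[number] is the following mark.
        length = marks[number] - x if number < len(marks) else None
        flag = ""
        if highlight is not None:
            start, end = highlight
            if x == start:
                flag = "s"      # starting mark
            elif x == end:
                flag = "f"      # final mark
            elif start < x < end:
                flag = "h"      # mark inside the highlighted area
        rows.append((number, x, flag, length))
    return rows


rows = build_marks_list([0.5, 2.0, 1.0, 3.5], highlight=(1.0, 3.5))
```

The last mark has no following mark, so its fragment length is left undefined (`None`) in this sketch.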

If the mark is currently inside the data box, you will see the V symbol in the View field.
If you click on this symbol with the left mouse button, the corresponding data fragment
(from the current to the following mark) will be displayed.

To select a mark in the marks list, activate a check-box in the desired line pressing
<spacebar> or clicking the left mouse button.

To save the marks list to a text file, use the <Export marks list> button. The following
dialog window will appear:

Figure 20. The Export marks list dialog window

This window contains a hint on how to convert the text file into a table using
Microsoft Word. By checking Save selected only, you can exclude unneeded marks
from saving. After pressing OK, you will be asked to choose a name and destination
folder for the text file.

11.5 Horizontal cursor and horizontal temporary marks


The horizontal cursor is used to find out the exact signal value at any point of interest.
To invoke the horizontal cursor, click on the cursor source button with the right
(middle) mouse button. The mouse cursor will disappear and a solid horizontal line will
appear in the window. It will follow all vertical moves of the mouse within the box limits.
The same horizontal cursors will appear in all linked windows (if there are any). The
signal value corresponding to the horizontal cursor position is shown in the
message line above the data box after the Y axis name. The value is shown in the
current measurement units (counts, seconds or Hertz).

“Visible speech” (spectrogram, cepstrum etc.) may be redrawn rather slowly when
inserting an empty message line. To speed up working, select Add comments to
“Visible Speech” in the Options►Extra options menu. An empty message line
will be created simultaneously with the creation of visible speech.

If you press the left mouse button, the temporary horizontal mark will appear in the
cursor position, and its coordinate position will be shown in the message line to the right
of the current cursor position value.

If you press the left mouse button once more, the second temporary horizontal mark
will appear.

New marks are added until their total number reaches the Number of horizontal marks
parameter value. This parameter can be varied in the range from 2 to 5 using the
Options►Extra options menu. If you then press the left mouse button again, the nearest
mark will be replaced by the new one, and the other marks will remain unchanged.
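
The rule for horizontal temporary marks differs from that for vertical ones: once the limit is reached, the nearest mark (not the oldest) is replaced. A minimal sketch, with the hypothetical function name `add_horizontal_mark`:

```python
def add_horizontal_mark(marks, y, max_marks=2):
    """Return the mark values after a left click at level y.

    max_marks models the Number of horizontal marks parameter (2 to 5).
    The result is sorted, since mark values are displayed in ascending order.
    """
    if not 2 <= max_marks <= 5:
        raise ValueError("max_marks must be in the range 2..5")
    marks = list(marks)
    if len(marks) < max_marks:
        marks.append(y)
    else:
        # The limit is reached: replace the mark nearest to the new position.
        nearest = min(range(len(marks)), key=lambda i: abs(marks[i] - y))
        marks[nearest] = y
    return sorted(marks)
```

For example, with the default limit of two marks at 10 and 50, a click at 45 replaces the mark at 50 and leaves the mark at 10 unchanged.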

Coordinate positions of the marks will be displayed in the message line to the right of
the current cursor position value in ascending order. Temporary horizontal marks are
set in the linked windows simultaneously, but their values are represented only if a
suitable comment line already exists.

After pressing the right (middle) mouse button or <ESC>, the horizontal cursor
disappears and the mouse cursor appears again. The comments line and temporary
mark values will remain in the window.

11.6 Saving the marks together with data


If you want constant marks to be stored together with the data, in the Save As window
(see section 9.1) select the *.DAT data format and check the Save marks option. Now
all vertical constant marks within the selected fragment will be saved together with the
corresponding text. At each subsequent opening of this file the marks will be loaded
together with the data automatically.

It should be kept in mind, however, that marks and comments (as well as all other
service information) can be saved only for *.DAT sound files.

12 GENERATING A TEST SIGNAL

The system provides an opportunity to form test signals of several types with various
parameters (amplitude, period, frequency). To generate a test waveform, select
Data►Generate test waveform. The following menu will appear on the screen
(Fig. 21):

Figure 21. The Generate test waveform window

This menu allows the user to:

• Choose the waveform type: sine wave, rectangles, saw (sawtooth), delta-impulse,
white noise.

• Set signal parameters: amplitude (magnitude) and period/frequency.

• Set the signal sampling rate.

• Set the signal length (in seconds).

• Choose between mono/stereo data format (by either enabling or disabling the Stereo
option).

• Specify the signal bit capacity (by either enabling or disabling 24-bits signal).

Note that waveform type selection is performed by setting its magnitude to any value
above 0. You can generate a mixed signal by selecting several waveform types. All
signals with amplitudes above zero will be mixed during generation.

Having selected the desired signal type and having set all necessary parameters for
signal generation, press OK. The generated signal will be drawn in the active window.
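
The mixing rule (every waveform type with an amplitude above zero contributes to the result) can be sketched as follows. This is an illustration under assumptions, not the SIS generator: the function name, the dictionary keys, the 50% duty cycle of the rectangles, one delta-impulse per period, and uniformly distributed noise are all choices made for the example.

```python
import math
import random


def generate_test_waveform(amplitudes, frequency, sample_rate, length_sec, seed=None):
    """Mix all waveform types whose amplitude is above zero.

    amplitudes maps "sine", "rectangles", "saw", "delta" and "noise"
    to an amplitude; missing or zero entries are not generated.
    """
    rng = random.Random(seed)
    n = int(round(length_sec * sample_rate))
    period = sample_rate / frequency            # period in samples
    impulse_step = max(1, round(period))        # one delta-impulse per period
    signal = []
    for i in range(n):
        phase = (i % period) / period           # position in the period, [0, 1)
        value = 0.0
        if amplitudes.get("sine", 0) > 0:
            value += amplitudes["sine"] * math.sin(2 * math.pi * phase)
        if amplitudes.get("rectangles", 0) > 0:
            value += amplitudes["rectangles"] * (1 if phase < 0.5 else -1)
        if amplitudes.get("saw", 0) > 0:
            value += amplitudes["saw"] * (2 * phase - 1)
        if amplitudes.get("delta", 0) > 0 and i % impulse_step == 0:
            value += amplitudes["delta"]
        if amplitudes.get("noise", 0) > 0:
            value += amplitudes["noise"] * rng.uniform(-1, 1)
        signal.append(value)
    return signal


# One period of a 1 kHz rectangular wave sampled at 8 kHz.
square = generate_test_waveform({"rectangles": 100}, 1000, 8000, 0.001)
```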

13 DATA PROCESSING

To edit the data, enter the Edit menu using the keyboard or the mouse. The drop-
down menu lists all available operations.

The Now you work with item at the top of the list allows you to choose the data fragment
to be processed. The currently selected type is indicated below.

13.1 Selecting source segment / fragment of data


The active (top) segment of the active (top) window always serves as the data source in
all kinds of editing procedures. If there are several segments in the window and you
don't want to work with the top one (its name is the leftmost in the window), you can
make any other segment active in one of the following ways:

1. Move the mouse cursor to the name of the necessary segment in the active
window and click the second (right) mouse button. The selected segment will
become active and the window name will be redrawn.

2. Move the mouse cursor to the symbol in the active window name line and click
the right (middle) mouse button. The bottom segment in the window will become
active, while the order of other segments will not change.

3. Select Data►Activate next and thus activate the next segment (second from
above) or select the menu Data►Activate previous to activate the previous
(bottom) segment.

The user can choose to edit not the whole segment, but some part of it. The currently
specified data type to be processed is indicated under the Now you work with entry at
the top of the drop-down menu. To change fragment type, enter this field and choose
the desired fragment type from the list. The following options are available:

• All data. The whole current segment of the active window will be processed.

• Highlighted. The highlighted area between constant marks will be processed.

• Between temporary marks. The part of the current segment between temporary marks
(there should be two of them) will be processed.

• Visible in window. The part of the current segment displayed in the box will be
processed.

• Selected. Fragments of data located within selected intervals between constant
marks will be processed (marks can be selected in the Marks►Marks list menu).

You can also change the fragment type by pressing the Data interval selection button
located at the right part of the window.

The specified fragment type will be immediately displayed under the Now you work
with entry.

13.2 Signal normalization


Normalization means multiplying by a constant and shifting the signal in each point so
that either the maximum amplitude of the signal becomes equal to the given value or
the entire signal values fall within the given interval.

Normalization can be used in the following cases:

• Before playback – to bring the signal to conformity with the bit range of the
digital-analog converter or to increase the volume.

• Before filtering – to reduce the effect of round-off errors.

To perform signal normalization, select the Edit►Normalisation command. An
additional menu will appear (see Fig. 22):

Figure 22. The Normalization menu

Here you should set the following normalization parameters:

• Choose the normalization type from the two available, By magnitude or In interval,
and set the necessary values for Maximum level (magnitude) and Minimum level.
If normalization by magnitude is selected, the last field (Minimum level) is not
used. The values are always set in the same units as those of the vertical axis in
the active window (counts, as a rule).

• Specify the data fragment type for processing. This can be done by pressing the Now
you work with field at the top of the menu. A list of available options will appear,
identical to that described in section 13.1. Fragment selection can also be
performed using the buttons below:
- all data;
- highlighted;
- between temporary marks;
- visible in window;
- selected.

During stereo signal normalization both channels will change proportionally.

To apply normalization to all segments of the active window, check the All segments in
window option.

After pressing OK, normalization will be performed and the active window will be redrawn.
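
The two normalization types can be sketched as follows. The exact formulas used by SIS are not given in this manual, so the example assumes the most common interpretation: by magnitude, the signal is multiplied by a constant so that its peak absolute value equals the Maximum level; in interval, it is scaled and shifted so that its minimum and maximum map onto Minimum level and Maximum level. The function names are hypothetical.

```python
def normalize_by_magnitude(samples, max_level):
    """Scale the signal so that its peak absolute value equals max_level."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)            # empty or silent signal: nothing to scale
    k = max_level / peak
    return [s * k for s in samples]


def normalize_in_interval(samples, min_level, max_level):
    """Scale and shift the signal so that it spans [min_level, max_level]."""
    if not samples:
        return []
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # Constant signal: place it in the middle of the interval.
        return [(min_level + max_level) / 2] * len(samples)
    k = (max_level - min_level) / (hi - lo)
    return [min_level + (s - lo) * k for s in samples]
```

For a stereo signal, applying the same factor to both channels keeps them proportional, which matches the behavior described above.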

13.3 Operations with constants


To use operations with constants, enter the Edit►Operations with constant menu
item. The menu for setting the operation parameters will appear on the screen (see
Fig. 23).

Figure 23. The Operations with constant menu

The top fields are intended for data fragment selection and are described in sections
13.1 and 13.2.

Then you should specify the type of operation:

" + " – addition of constant (the constant is added to the signal),

" - " – subtraction of constant (the constant is subtracted from the signal),

" * " – multiplication by constant,

" / " – division by constant.

You should choose one of the operations and enter the necessary constant value in the
field located to the right of the Signal field under the Constant field.

Then, if you process a stereo signal, there will be the To be processed field with a
button next to it for channel selection. By pressing this button you can change the
processing mode. The one currently selected appears on the button. Three modes are
available:

• Both channels.

• Left channel.

• Right channel.

Channel selection option is not available for mono signals – you will see dashed lines in
place of these fields.

To apply the chosen operation with constant to all segments of the active window, check
the All segments in window option.

Press OK to start the process. After the operation has been performed, the active
window will be redrawn.

When processing 16-bit signals you should keep in mind that if the constant value is too
big, so that the result of the operation in one of the counts exceeds the bit range of the
integer (-32767, 32767), you will see the message: "Overflow. Request killed"
displayed in the message line, and the operation will not run. This doesn’t apply to
24-bit signals. For signal bit accuracy transformation see section 13.19.
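
A minimal sketch of an operation with a constant on a 16-bit signal, including the overflow check. It uses the limits quoted above (-32767, 32767) and raises an error instead of changing any data, which mirrors the "operation will not run" behavior. The function name and the rounding of the result to whole counts are assumptions made for the example.

```python
import operator

# Limits of the 16-bit range as quoted in this section.
MIN_COUNT, MAX_COUNT = -32767, 32767

OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def apply_constant(samples, operation, constant):
    """Return a new signal with the operation applied to every count.

    If any result leaves the 16-bit range, nothing is changed and an
    OverflowError is raised.
    """
    op = OPERATIONS[operation]
    result = [round(op(s, constant)) for s in samples]
    if any(not MIN_COUNT <= v <= MAX_COUNT for v in result):
        raise OverflowError("Overflow. Request killed")
    return result
```

Because the result is built as a new list and checked before being returned, the source signal stays intact when the check fails.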

13.4 Linear signal transformation


To run linear signal transformation, enter the Edit►Linear transformation
menu. The menu for setting the operation parameters will appear on the screen (see
Fig. 24).

The top fields are intended for data fragment selection and are described in sections
13.1 and 13.2.

Linear transformation consists in multiplying the signal by a linear function that is
defined by its values on the left and right edges of the fragment. These values are
set in the Multiplier on left edge and Multiplier on right edge fields.

To apply linear transformation to all segments of the active window, check the
All segments in window option.

Press OK to start the process. After the operation has been performed, the active
window will be redrawn.

When processing 16-bit signals it should be kept in mind that if the multiplier values
are too big, so that the result of the operation in one of the counts exceeds the bit
range of the integer (-32767, 32767), you will see the message: "Overflow. Request killed"
displayed in the message line, and the operation will not run. This doesn’t apply to
24-bit signals. For signal bit accuracy transformation see section 13.19.
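
The transformation can be sketched as a multiplier that changes linearly from the left-edge value to the right-edge value across the fragment. The function name `linear_transform` is hypothetical, and the overflow check for 16-bit data is omitted to keep the example short.

```python
def linear_transform(samples, left_multiplier, right_multiplier):
    """Multiply the signal by a multiplier that varies linearly across it."""
    n = len(samples)
    if n == 0:
        return []
    if n == 1:
        return [samples[0] * left_multiplier]
    result = []
    for i, s in enumerate(samples):
        t = i / (n - 1)   # 0.0 on the left edge, 1.0 on the right edge
        multiplier = left_multiplier + t * (right_multiplier - left_multiplier)
        result.append(s * multiplier)
    return result


# Multipliers 0 and 1 produce a linear fade-in.
faded = linear_transform([10, 10, 10, 10, 10], 0.0, 1.0)
```

Swapping the two multipliers (1 on the left, 0 on the right) gives a fade-out in the same way.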

Figure 24. The Linear transformation menu

13.5 Deleting a signal


To delete the signal, enter the menu Edit►Delete. You will see a warning message
informing you that the selected fragment will be deleted. Either confirm the
deletion by pressing OK or cancel the operation by pressing Cancel. If the signal
window contains constant marks, you will be offered the To process marks option.

If you press OK, the fragment or the whole current segment will be immediately deleted
and the window will be redrawn. If a fragment is deleted, the data that follows it will
be moved into the freed space, so the segment becomes continuous again. The
position of the temporary marks will not change, so they will mark other data in
the current segment. Constant marks will be processed along with the signal if the To
process marks option is selected. If this option is not selected, the position of the
constant marks will not change.
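
The effect of deletion on the data and on the constant marks can be sketched as follows. The manual does not specify what "processing" the marks involves, so this example assumes that marks after the deleted fragment move left together with their data and that marks inside the deleted fragment are removed. Positions are given as sample indices, and the function name is hypothetical.

```python
def delete_fragment(samples, marks, start, end, process_marks):
    """Delete samples[start:end] and return (remaining samples, marks)."""
    remaining = samples[:start] + samples[end:]   # following data closes the gap
    if not process_marks:
        return remaining, list(marks)             # mark positions do not change
    removed = end - start
    new_marks = []
    for m in marks:
        if m < start:
            new_marks.append(m)                   # before the fragment: unchanged
        elif m >= end:
            new_marks.append(m - removed)         # after the fragment: moves left
        # Marks inside the deleted fragment are dropped (assumption).
    return remaining, new_marks
```

With the option switched off, the marks keep their coordinates and therefore point at different data after the deletion, just as the temporary marks do.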

If you press Cancel, you will return to the Edit menu.

13.6 Appending a fragment to the end of the segment


This operation is used to append the selected fragment of the current segment in the
active window to the end of the current segment in the other window. The active
window is always on the top and the destination window is the second from the top by
default, if it has suitable or UNIVERSAL type (all empty windows have UNIVERSAL type).

To perform this operation, enter the Edit►Append menu. The operation menu will
appear on the screen (Fig. 25).

The top fields are intended for data fragment selection and are described in sections
13.1 and 13.2.

The lower part of the menu is intended for destination window selection and is similar to
the source–destination menu type described in section 4.6.

Figure 25. The Append menu

After pressing OK the operation will be performed and the destination window will be
updated. The only thing that can prevent this operation from being performed is an
unsuitable destination window type.

If you append a stereo signal to a mono signal, the data of the left and right channels
will be mixed. If you append a mono signal to a stereo signal, the same data will be
appended to both channels.
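
The channel conversion during appending can be sketched as follows. How SIS mixes the two channels is not stated here, so averaging them is an assumption made for this example; the function name and the representation of a signal as a list of channels are also hypothetical.

```python
def append_fragment(destination, source):
    """Append `source` to `destination`, converting the channel count.

    Each signal is a list of channels: one list of samples for mono,
    two lists (left, right) for stereo.
    """
    if len(destination) == len(source):
        converted = source
    elif len(destination) == 1:
        # Stereo appended to mono: mix the channels (averaging is an assumption).
        left, right = source
        converted = [[(l + r) / 2 for l, r in zip(left, right)]]
    else:
        # Mono appended to stereo: the same data goes to both channels.
        converted = [source[0], source[0]]
    return [dst + src for dst, src in zip(destination, converted)]
```

The function returns new lists, so the source data is left untouched, as in the Append operation itself.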

13.7 Inserting a fragment of the current segment into another segment

To insert (a fragment of) data from one segment into another, select the Edit►Insert
command. The operation menu will appear on the screen. It is identical to that
described in the previous section (see 13.6). Once a destination window has been
created or chosen, press OK. You will see a destination choice submenu displayed on
the screen (Fig. 26).

Figure 26. The Destination choice menu

The menu contains 5 fields for selecting the place in the target signal for inserting the
specified data fragment:

• Before temporary mark. If there are two temporary marks in the destination
window, the data will be inserted before the left temporary mark.

• Before highlighted.

• After highlighted.

• To signal start.

• Before first mark.

Select the required option with the mouse cursor or using the arrow keys and press OK.

The insertion will then be performed, provided that the destination segment in the
destination window has the same type as the source segment. The data inserted in the
destination window will not be deleted from the source window. All the data following
the point of insertion will be moved to the right (to greater abscissa values) to make
room for the insertion, but the position of the temporary and constant marks will not
change, so they will mark other data of the destination segment.

If you insert stereo signal into mono signal, the data of the left and right channels will
be mixed. If you insert mono signal into stereo signal, the same data will be inserted
into both channels.

13.8 Copying the data


To copy the data, enter the menu Edit►Copy. The operation menu will appear on the
screen. It is identical to that described in section 13.6.

After you press OK, the operation will be performed. Regardless of the source data type,
the copied data will form a uniform segment with the destination window segment. All
constant marks will be copied with the data and placed in the new segment at the same
points as in the old one.

Fragments of waveforms and pitch curves are copied with shifting to zero by
default (the left border of each copied fragment is equal to zero). You can change this
option by entering the Options►Extra options menu and changing the Shift
data to zero after copying field. If this option is not checked, the fragment’s horizontal
borders will not be changed after copying. This may be convenient, but scattered
segments can be hard to find. To display the upper segment in full, use the <F8> hot key.

13.9 Moving a signal


To move a signal from one window to another, enter the Edit►Move menu item. The
operation menu will appear on the screen (Fig. 27).

Figure 27. The Move menu

Here you have to create or choose a destination window and then press OK.

“Moving” consists in re-assigning the current segment to another window; data is not
copied. It saves a lot of time, especially when you work with long segments. The
destination window becomes active after segment moving.

This operation is not applicable to fragments of data – only entire segments can be
moved.

13.10 Using the contextual menu of an active window


Besides the operations described above, the SIS program allows you to delete, copy, cut
and paste data fragments using the contextual menu commands.

For example, to copy (cut) and subsequently paste the data fragment, you can do the
following:

• Select the signal fragment.

• Right-click it and choose the Copy fragment (Cut fragment) command in the
contextual menu that appears.

• Right-click in the destination window where the copied (cut) fragment is to be
pasted and choose the Paste fragment command in the contextual menu.

• In the dialog window, choose the point of fragment insertion (see Figure 25) and
confirm the command.

The signal is displayed with temporary marks, and the inserted fragment is shown in the
center of the window. The image scale is adjusted so that the inserted fragment occupies
50% of the window length.

13.11 Shifting the current segment


To shift the current segment to the right or to the left, set two temporary marks in the
active window so that the distance between the marks is exactly equal to the desired
shift value. Then select the Edit►Segment shift menu command. A menu containing
two fields, Shift left and Shift right, will appear on the screen. Choose one of these
options by clicking it. The menu will disappear, the segment will be shifted and the
active window will be redrawn.

13.12 Data smoothing


Data smoothing is available for average and instantaneous Fourier spectra, average
cepstra, average autocorrelation, average partial correlation, average LPC.

To smooth the data, select the Edit►Data smoothing menu item. The operation menu
will appear on the screen (Fig. 28).

The top fields are intended for data fragment selection and are described in sections
13.1 and 13.2.

Figure 28. The Data smoothing menu

Using the Create new segment field, you can enable or disable the creation of a new
segment. If the field is ticked, the smoothed spectrum is drawn in the window in a
different color together with the processed segment. If it is not ticked, only the
smoothed spectrum is drawn in the active window and the original spectrum is no longer
shown.

Using the Geometrical average field, you can switch the smoothing from the linear scale
to the logarithmic one. If this mode is enabled (ticked), the logarithm of the data is
taken before smoothing and the inverse operation is performed afterwards. The data must
not contain negative values in this mode.

The Filter length submenu lets you choose the smoothing window length from a list. The
window length is specified both in counts and in Hertz. With the minimum setting the
spectrum remains practically unchanged; with the maximum setting the spectrum is reduced
to a second-degree polynomial. The approximation is performed by the least-squares method
using a sliding polynomial.

The calculation starts after you press OK. The smoothed spectrum or its fragment will be
drawn in the active window in another color.
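
The effect of the Geometrical average field can be illustrated with a short sketch. It is not the SIS algorithm: for brevity it swaps the sliding least-squares polynomial for a plain centered moving average, and the function names are hypothetical. What it shows is the log, smooth, exponentiate sequence. Because the logarithm of zero is undefined, the sketch requires strictly positive data, which is slightly stricter than the rule stated above.

```python
import math


def moving_average(values, window):
    """Centered moving average of odd length `window`.

    Near the edges the window is truncated to the available samples.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def smooth(values, window, geometrical=False):
    """Smooth on the linear scale, or on the log scale if geometrical=True."""
    if not geometrical:
        return moving_average(values, window)
    if any(v <= 0 for v in values):
        raise ValueError("log-domain smoothing needs strictly positive data")
    logs = [math.log(v) for v in values]
    return [math.exp(v) for v in moving_average(logs, window)]


spectrum = [1.0, 10.0, 100.0]
linear = smooth(spectrum, 3)                    # arithmetic mean in the middle
geometric = smooth(spectrum, 3, geometrical=True)  # geometric mean in the middle
```

For the three-point example, the middle value of the linear result is the arithmetic mean of the inputs, while the log-domain result is their geometric mean, which is much less dominated by the largest peak.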

13.13 Mixing the segments


Mixing may be required when creating sound effects or test signals (for example, a
speech signal with noise of a certain type). When segments are mixed, the corresponding
counts of the first and second signals are summed and the results are recorded. You can
mix signals located in the same window.

To mix the signals, load them into the active window and select Edit►Mixing. The
operation menu will appear on the screen (Fig. 29).

Using this menu you can:

• Choose the processed fragment type: all data, highlighted data, data between
temporary marks, data visible in window, selected data.

• Set the Segment result type: current or new. If the current segment is chosen
as the result, the mixing result will be drawn instead of the current segment; if a
new segment is chosen as the result, the mixing result will be drawn on top of the
mixed signals in a different color.

• Set the length of the resulting segment (Result length): it can be set equal
either to the length of the current segment, or to the mixing interval (for example,
between temporary marks).

• Assign the desired weights to each mixed signal. There is a table in the menu
which contains all signals to be mixed and you can enter the necessary weights for
each signal using the mouse and keyboard.

• Specify the result name.

• Set the result signal bit capacity (Result type): 16-bit or 24-bit signal.

Figure 29. The Mixing menu


After setting all the parameters, press OK. Signal mixing will be performed, and the
active window will be redrawn.
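
As a rough illustration of weighted mixing, the sketch below sums the corresponding counts of several signals after multiplying each by its weight. The function name and the decision to truncate the result to the shortest input are assumptions made for this example; conversion to a 16-bit or 24-bit result is left out.

```python
def mix(signals, weights):
    """Weighted sum of corresponding counts of several signals."""
    if len(signals) != len(weights):
        raise ValueError("one weight is needed per signal")
    # Assumption: the result is as long as the shortest input signal.
    length = min(len(s) for s in signals)
    return [
        sum(w * s[i] for s, w in zip(signals, weights))
        for i in range(length)
    ]


speech = [1, 2, 3]
noise = [10, 20, 30]
mixed = mix([speech, noise], [1.0, 0.5])  # speech at full level, noise halved
```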

13.14 Clipping
Clipping is often applied for the partial reduction of extended impulse noise (pulse
bursts of long duration). To perform clipping, select the Edit►Clipping menu item. The
operation menu will appear on the screen (Fig. 30).

Figure 30. The Clipping menu


You should choose the desired clipping type (by magnitude or in interval) and the level
of the maximum and minimum values.

The top fields are intended for data fragment selection and are described in sections
13.1 and 13.2.

To apply clipping to all segments of the active window, check the All segments in
window option.

After pressing OK, the operation will be performed, and the active window will be
redrawn.
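
The manual does not define the two clipping types in detail. The sketch below shows one plausible reading, stated here as an assumption: "in interval" limits every count to a range between a minimum and a maximum value, and "by magnitude" limits the absolute value of every count to a single level. The function names are hypothetical.

```python
def clip_to_interval(samples, minimum, maximum):
    """Limit every count to the range [minimum, maximum]."""
    return [min(max(s, minimum), maximum) for s in samples]


def clip_by_magnitude(samples, level):
    """Limit the absolute value of every count to `level`."""
    return clip_to_interval(samples, -level, level)


burst = [100, -2500, 3000, 40]
print(clip_by_magnitude(burst, 1000))
print(clip_to_interval(burst, -500, 2000))
```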

13.15 Changing signal sampling rate


Lowering the sampling rate (dividing it) is usually done to increase the spectral
resolution.

In the average spectrum of a signal with a 10000 Hz sampling rate, two spectral peaks
can be seen separately if the distance between them is at least 10 Hz (after applying
the Analysis►FFT power spectrum operation with a frame size of 2048 counts and a
Hann window). If you lower the sampling rate by a factor of 80 (down to 125 Hz), peaks
separated by as little as 0.12 Hz can be distinguished.
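
The figures in this example can be reproduced with simple arithmetic. The sketch assumes the common rule of thumb that two peaks are resolved with a Hann window when they are about two FFT bins apart; the manual does not state which criterion it uses, and the function name is hypothetical.

```python
def resolvable_separation(sampling_rate_hz, frame_size, bins=2):
    """Approximate peak separation (Hz) that an FFT frame can resolve.

    One bin is sampling_rate / frame_size wide; `bins` is the assumed
    number of bins needed between two peaks (2 for a Hann window).
    """
    return bins * sampling_rate_hz / frame_size


before = resolvable_separation(10000, 2048)  # about 10 Hz
after = resolvable_separation(125, 2048)     # about 0.12 Hz
print(round(before, 2), round(after, 2))
```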

To change the sampling rate of a signal, select the Edit►Sampling rate command. An
intermediate menu will appear on the screen, where you have to choose one of two fields:
divide by integer or set to arbitrary (Fig. 31). If you need to divide the sampling rate
by a large number, the first field is preferable, because it starts a high-speed
procedure. The second field allows you to set an arbitrary sampling rate, lower or higher
than the initial one. This procedure is slower than the first.

Figure 31. The Sampling rate menu

If you select Divide by integer, the following Sampling rate menu will appear on the
screen. This is a standard menu of the source-destination type described in section 4.6.

However, if the creation of a new segment is disabled in the options, as described below,
you can only choose All data for processing. By pressing the OPTIONS field, you can open
the specific menu for this operation (Fig. 32).

Figure 32. The Sampling rate options menu (Divide by integer)


Here you can modify the following parameter values:

• Divisor [2-102] (the value can range from 2 to 102). The resulting value (in Hz)
appears in the New sampling rate field. Remember that the spectral range is limited
to half of the sampling rate. When the sampling rate is divided, the whole spectrum
in the range from half of the new sampling rate up to half of the previous sampling
rate is suppressed by more than 72 dB. The upper 10% of the remaining signal
spectrum falls into the transition region and will be distorted. The maximum
undistorted frequency is therefore shown in the Pass band frequency field (half of
the new sampling rate minus 10%).

• Create new segment. If this parameter is enabled (checked), a new segment will be
created; otherwise, the data will be written into the initial segment.

Having set necessary parameters, press OK. Operation will be performed. The active
window will be redrawn with the result segment. Pressing <ESC> you can interrupt the
operation (only after the new segment has been created).
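
The relation between the divisor, the New sampling rate field and the Pass band frequency field can be written out as follows. This is a sketch of the arithmetic described above, not of the resampling filter itself; the function name is hypothetical.

```python
def divide_sampling_rate(sampling_rate_hz, divisor):
    """Return (new sampling rate, pass band frequency) in Hz."""
    if divisor != int(divisor) or not 2 <= divisor <= 102:
        raise ValueError("divisor must be an integer from 2 to 102")
    new_rate = sampling_rate_hz / divisor
    # Half of the new sampling rate minus 10% (the transition region).
    pass_band = 0.9 * (new_rate / 2)
    return new_rate, pass_band


new_rate, pass_band = divide_sampling_rate(10000, 80)
print(new_rate, pass_band)
```

For a 10000 Hz signal and a divisor of 80, the new sampling rate is 125 Hz and the undistorted band extends to about 56 Hz.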

If you select the Set to arbitrary option, the following Sampling rate menu will appear
on the screen. This is a standard menu of the source-destination type described in
section 4.6. By pressing the OPTIONS field, you can enter the specific menu for this
operation (Fig. 33).

Figure 33. The Sampling rate options menu (Set to arbitrary)


Here you can modify the following parameter values:

• New sampling rate. The new sampling rate value, in the range from 100 to
48000 Hz. The value must be an integer; it is set with an accuracy of 1 Hz.

• Submenu items: Fast transformation or Best accuracy. If you choose the Fast
transformation option, the processing will be performed four times faster, but
the suppressed frequency components will be attenuated by only 55 dB. If you
choose Best accuracy, they will be attenuated by more than 75 dB.

• Create a new segment. If this parameter is enabled, a new segment will be
created; otherwise, the data will be written into the initial segment.

Having set the necessary parameters, press OK. The operation will be performed. The
active window will be redrawn with the resulting segment. Pressing <ESC> you can
interrupt the operation (only after the new segment has been created).

13.16 Reverse
To reverse a signal, select the Edit►Reverse menu command. The operation will be
performed immediately and the active window will be redrawn.

Reversing inverts the order of the data points in the waveform within the given interval.
For example, if you reverse the part of the signal between 1 and 2 sec, with data counts
from 10000 to 20000, the counts are swapped as follows: 10000 with 20000, 10001 with
19999, and so on.

It should be noted that if several neighboring intervals between the marks are selected
continuously, they will be interpreted by the reverse operation as one large interval in
the Selected data mode. And so the values will be swapped only for this large interval
but not within each of the small selected intervals.
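
A minimal sketch of reversing an interval, assuming the signal is a list of counts and the interval borders are given as inclusive indices (the function name is hypothetical):

```python
def reverse_interval(samples, first, last):
    """Reverse the order of counts first..last (inclusive); leave the rest."""
    return samples[:first] + samples[first:last + 1][::-1] + samples[last + 1:]


print(reverse_interval([0, 1, 2, 3, 4, 5], 1, 4))
```

Count 1 changes places with count 4 and count 2 with count 3, while the counts outside the interval keep their positions.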

13.17 Inversion
To perform the inversion, select the Edit►Negative menu command.

For signals having the integer format (waveforms, pitch, energy), this operation means
that all the signal values within the given interval will change their sign to the opposite
(i.e. they are multiplied by (-1)).

For an average spectrum, this operation means that each value within the given interval
is replaced by its reciprocal (i.e. 1 divided by the initial value), which yields the
inverse spectrum.

This operation is inapplicable to "visible speech".
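
The two meanings of inversion can be contrasted in a short sketch. The function names are hypothetical, and the check for zero values is an addition made for this example.

```python
def negate(values):
    """Integer-format signals: every value changes its sign."""
    return [-v for v in values]


def reciprocal_spectrum(values):
    """Average spectrum: every value is replaced by 1 / value."""
    if any(v == 0 for v in values):
        raise ValueError("a zero value has no reciprocal")
    return [1.0 / v for v in values]


print(negate([1, -2, 0]))
print(reciprocal_spectrum([2.0, 0.5]))
```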

13.18 Modification of a signal tempo without voice pitch distortions

This function allows you to obtain a slowed or accelerated signal without changing the
pitch.

To obtain a tempocorrected signal from the initial one, enter the menu
Edit►Tempocorrection. The Tempocorrection window will appear on the screen
(Fig. 34). This is the standard menu of the source-destination type. The work with this
menu is described in section 4.6.

Figure 34. The Tempocorrection options menu


By pressing the OPTIONS field, you can open the options menu specific to this operation
and modify the following parameter values:

• Moderation coefficient. You can choose any value in the range [0.33..3].

• Period. An assumed pitch period (in sec). Increasing this parameter makes the
envelope of the output signal sawtooth-shaped and produces specific extra sounds;
decreasing it produces clicking.

• Submenu items: Tempo accuracy or Result quality. To keep the quality of the
output signal high, some speech sections must not be duplicated or deleted. If you
choose the Result quality option, such sections will be kept intact. However, the
specified moderation coefficient will then be applied only approximately (to within
about 0.01). If you choose the Tempo accuracy option, the exact moderation
coefficient value will be used, even at the expense of quality.

Having set the necessary parameters, press OK. The operation will be performed, and
the active window will be redrawn.

13.19 Modulation
To perform signal modulation, select Edit►Modulation. Modulation consists in point-
by-point multiplication of two signals with a floating normalization of the result.

Modulation can be applied to 16-bit or 24-bit waveforms recorded in the mono format
and having the same sampling frequency. A 24-bit number is represented in SIS as a
32-bit number, where 24 bits are used for the mantissa and 8 bits for the exponent.

After you have selected this operation, a dialog window with a notification "You
modulate the active segment by second segment" will be displayed.

To start operation, press OK.
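
Point-by-point multiplication can be sketched as shown below. How SIS performs its "floating normalization" is not described in this manual, so the sketch simply rescales the product to a chosen peak value; that choice, like the function name, is an assumption made for illustration.

```python
def modulate(active, second, peak=1.0):
    """Multiply two signals point by point and rescale the result.

    Assumption: normalization means scaling so that the largest absolute
    value of the product equals `peak`.
    """
    if len(active) != len(second):
        raise ValueError("signals must have the same length")
    product = [a * b for a, b in zip(active, second)]
    largest = max((abs(p) for p in product), default=0.0)
    if largest == 0:
        return product
    scale = peak / largest
    return [p * scale for p in product]


result = modulate([1.0, -2.0, 3.0], [2.0, 2.0, 2.0])
```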

13.20 Waveform accuracy transformation


This operation is used to obtain a signal of required accuracy and is applicable to
waveforms only (mono or stereo).

This option allows you to convert a 16-bit signal into a 24-bit signal or vice versa.

To perform accuracy transformation, select the Edit►Accuracy transformation command.
Once the operation is started, the system detects the accuracy of the top (active)
segment and suggests the target accuracy (different from the source signal accuracy). The
following dialog window will be displayed (Fig. 35):

Figure 35. The Waveform accuracy transformation menu


If the Create new segment option is checked, a new segment will be created.
Otherwise, the processed data will be written into the source segment, but its color will
be changed.

13.21 Mono/stereo signal transformation


In addition to the operations of stereo signal editing (copying, deleting, normalization,
filtration etc.), the SIS program allows you to perform the following:

• To separate a stereo segment into two mono segments.

• To swap the channels of a stereo segment.

• To merge two mono segments into one stereo segment (provided they have the
same sampling and bit rates).

To use these features, enter the menu Edit►Mono/stereo operations and select the
desired operation type.

The operation of merging two mono signals may be lengthy if signals start from different
time points, because the part of the data from the earlier signal will be deleted.

Other operations are pretty straightforward and don't require any further explanation.
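
For readers who prefer to see the three operations spelled out, here is a small sketch. It assumes a stereo segment is stored as a list of (left, right) pairs and that the two mono signals being merged start at the same time point and have the same length; the function names are hypothetical.

```python
def split_stereo(stereo):
    """Separate a stereo segment into two mono segments."""
    left = [l for l, _ in stereo]
    right = [r for _, r in stereo]
    return left, right


def swap_channels(stereo):
    """Exchange the left and right channels."""
    return [(r, l) for l, r in stereo]


def merge_mono(left, right):
    """Merge two mono segments of equal length into one stereo segment."""
    if len(left) != len(right):
        raise ValueError("mono segments must have the same length")
    return list(zip(left, right))


stereo = merge_mono([1, 2, 3], [10, 20, 30])
print(swap_channels(stereo))
print(split_stereo(stereo))
```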

13.22 Operations with a service time channel


The commands contained in the Edit►Service time marks menu are intended for
processing sound files recorded with the help of dedicated equipment.

14 NOISE REDUCTION

The Noise reduction menu is intended for filtering sound signals. Depending on the
type of noise present in the signal, the user can choose one of the options listed in the
menu:

• Adaptive filtering:

  - Inverse filtering;

  - Wideband noise;

  - Tonal and regular noise;

  - Stereo noise.

• Stationary wideband noise.

• Filtering of IMPULSE noise.

• DYNAMIC filtering.

14.1 Adaptive inverse filtering

14.1.1 Method overview


Adaptive filtering algorithms are used to suppress strong spectral components and thus
unmask the speech signal, improve its intelligibility and reduce operator fatigue while
listening. At the same time, this processing smooths the frequency components of the
signal. However, spectrum smoothing also enhances wideband interference, which may
degrade speech perception (because of "rustling" or "crackling" noise arising in the
output signal).

Gain range reduction leads to the weakening of this effect, but it also leads to the
weakening of the useful signal and the suppressed (residual) interference sounds
become more distinct. So, useful signal extraction requires a compromise between the
gain level of speech and noise on the one hand and the rejected interference on the
other.

Partial cancellation of the unmasked noise can be performed by signal bandwidth
limitation or by amplifying the most informative spectrum area using a timbre correction
filter.

14.1.2 Method parameters


To perform adaptive inverse filtering, select Noise reduction►Inverse filtering. The
Inverse filtering menu will appear on the screen. This is a standard menu of the
source-destination type described in section 4.6. By pressing the OPTIONS field you
can open parameter menu for this operation (Fig. 36):

Figure 36. The Inverse filtering options menu

Time delay of filter, sec – [1..1000] sec – sets the time the correcting filter takes to
adapt to changes in the signal spectrum. The recommended value is 3-4 sec. For
non-stationary noise and music, 1-2 sec is recommended.

Maximum gain for one frequency, dB – [10..40] dB - limits the maximum amplification of
weak (speech and noise) spectrum components. The recommended value: 20-30 dB.

Gain, dB – [0..10] dB - sets the signal amplitude at the filter output. The recommended
value: 3-9 dB. Adjusted by ear.

DEFAULT SETTINGS – restores the default parameter settings. Recommended when the method
has stopped working as a result of manipulations with parameter values.

Harmonics suppression - enables/disables intensive reduction of very strong harmonic
interferences. It doesn't affect weak and medium harmonics.

Frame length (counts) – 16, 32, ...2048 - sets the spectrum resolution, the number
of spectrum bands and the duration of the processed fragment of data. For large
numbers of narrowband interferences (tonal pulses etc.) the recommended value is
1024..2048.

Method version:

• Inversion.

• Contrasting.

The Inversion method version cancels high-power tonal noises, while the Contrasting
version, conversely, emphasizes frequency maxima in speech (if tonal interferences are
absent).

Timbre correction parameters:

• Timbre correction - enables/disables a timbre correction filter.

• Inverse timbre – if enabled, creates a complement to the timbre correction filter
that passes the signal outside the timbre correction filter passband. This allows you
to hear which part of the signal is not covered by the filter band.

Low frequency, Hz – [100..700] Hz - sets the filter lower passband frequency. The
recommended value is 200 Hz. In case of intensive low-frequency noise (hum) this
parameter should be increased.

High frequency, Hz – [2500..Fmax] Hz - sets the filter higher passband frequency. The
recommended value is 3600 Hz. In case of intensive broadband noise this parameter
should be decreased.

Low frequency gain, dB – [-24..+24] dB - amplifies the output signal in the low-
frequency area, to obtain more natural timbre.

High frequency gain, dB – [-24..+24] dB - amplifies or attenuates the output signal in
the high-frequency area. As a rule, attenuating the high-frequency part of the signal
(+10, +20 dB) improves signal perception. But in some cases, conversely, its
amplification can increase speech intelligibility.

14.2 Adaptive wideband noise filtering


The following processing modes belong to adaptive broadband noise filtering:

• Speech processing.

• Music processing.

• Noise cancellation in the pauses.

Timbre correction, harmonics cancellation and background enhancement methods can be used
as additional options in each of the modes listed above.

14.2.1 Method overview


The main purpose of the adaptive wideband noise filtration is to facilitate speech signal
perception in the presence of random broadband additive noise. This type of noise
includes rumbling, hissing, buzzing, roar sounds (conditioner, sea, wind, street). It can’t
be removed using one-channel adaptive filtration methods, spectrum smoothing or
equalizing because the noise spectrum is dispersed.

Adaptive wideband noise filtration cancels noise both in speech and in pauses. This does
not improve speech intelligibility, but considerably reduces the tiring effect during
listening. This method of adaptive filtration can cancel narrowband noise types as well,
which makes it quite a universal noise cancelling tool. Adaptive background enhancement
is used for unmasking the background acoustic environment.

The Speech treatment mode is used for speech signal enhancement against a background of
unsteady broadband and tonal noise related to industrial electromagnetic hum, mechanical
vibrations, apartment and street noise, and noise of communication channels or recording
devices. It allows speech signal unmasking at signal-to-noise ratios in the range from
-5 down to -10 dB.

The Music treatment mode is used for removing unsteady broadband noise from the
music signal.

The Removing noise in pauses mode differs from the Speech treatment mode in
that the initial signal remains unchanged in the speech activity areas.

The Extract background mode is used for background acoustic environment extraction
(i.e. the result will be the very components which are usually suppressed).

The Harmonics suppression mode is used for additional suppression of high-power
narrowband noise by 30-40 dB. This mode can be used independently (at 0 dB reduction
depth).

The Timbre correction mode is used for providing convenient speech sounding,
reduction of strong noise in the high-frequency and low-frequency areas, smoothing of
channel and recording devices frequency response function, enhancement of weak
speech signals and signals after filtration in the most informative spectrum band. This
mode allows you to "select" the most informative spectral area by suppressing the
signal in HF or LF area, where the noise exceeds useful signal.

One of the features of filtering based on adaptive spectral subtraction is the appearance
of "bells", "clicking" and "purling" in the output signal, which is caused by lowering
the noise level of the initial signal and unmasking some of its peaks. Increasing the
noise reduction level makes these sounds disappear, but the useful signal is suppressed
at the same time. So a compromise between noise reduction and speech distortion is needed
for useful signal extraction. Unmasked noise can be "masked" once more by increasing the
reduction depth. Moreover, partial "bells" reduction can be achieved by increasing the
smoothing values (Spectrum smoothing and Smoothing by time in the data smoothing
parameters menu).

A specific feature of adaptive spectral subtraction at small signal-to-noise ratios (less
than 0 dB) is sound distortion and suppression of the useful signal. One way to
counteract this is to amplify the most informative spectrum area using a timbre
correction filter.

Finally, when adjusting adaptive spectral subtraction settings, keep in mind that
background sounds can change dramatically after noise cancellation has been performed.
For example, the roar of a passing car can sound like "purling" after processing. If you
want to rely on your ear to make out well-known sounds, decrease the suppression range to
25-15 dB. This will enhance the background and the acoustic environment. Then, slowly
increasing the reduction range, you can achieve the desired compromise between the noise
reduction level and suppression of the useful speech signal.

Adaptive spectral subtraction can be used in combination with timbre correction methods
(after enabling the respective fields). Besides, the Spectrum smoothing and Smoothing by
time methods additionally perform timbre correction by smoothing the spectrum and in that
way enhancing the spectral area where the signal is weakened.

If signal-to-noise ratio is small, the filtered useful signal will be weak. To increase its
amplitude it may be useful to apply additional amplification, set by the Maximum
amplification for one frequency, dB parameter.

The parameters are adjusted by ear. The influence of each parameter and
recommended values are listed below.

14.2.2 Method parameters


To perform adaptive wideband noise filtering, select Noise reduction►Wideband
noise. The Wideband noise menu will appear on the screen. This is a standard menu
of the source-destination type described in section 4.6. By pressing the OPTIONS field
you can open options menu for this operation (Fig. 37).

Figure 37. The Wideband noise options menu

The options menu allows you to control the filtration process by manipulating the
following parameters.

Method version – select one of the suggested methods:

• Speech treatment.

• Music treatment.

• Removing noise in pauses.

Reduction intensity – allows you to set one of the 10 available reduction degrees
(from Weak,1 to Strong,10). You can disable this parameter by choosing Turn off.
Disabling is necessary to enter the Advanced options menu (see Fig. 38 below),
otherwise, a warning message will appear at the bottom of the screen: "Turn off
DEFAULT SETTINGS to get access to advanced options", and access will be denied.

Timbre correction - enables/disables the timbre correction filter.

Low frequency, Hz – [100..700] Hz - sets the filter lower passband frequency. The
recommended value is 200 Hz. In case of intensive low-frequency noise (hum) this
parameter should be increased.

High frequency, Hz – [2500..Fmax] Hz - sets the filter higher passband frequency. The
recommended value is 3600 Hz. In case of intensive broadband noise this parameter
should be decreased.

Low frequency gain, dB – [-24..+24] dB - amplifies the output signal in the low-
frequency area. Can be used to obtain more natural timbre.

High frequency gain, dB – [-24..+24] dB - amplifies or attenuates the output signal in
the high-frequency area. As a rule, attenuating the high-frequency part of the signal
(+10, +20 dB) improves signal perception. But in some cases, conversely, its
amplification can increase speech intelligibility.

Advanced options - this field provides access to the additional options menu (Fig. 38).

Note: the Advanced options menu is available only if the Reduction intensity parameter is
turned off.

This menu is used when the standard set of options proves insufficient.

Figure 38. The Wideband noise advanced options menu

Time delay of filter, sec – [1..1000] sec - sets time of correcting filter adaptation to
signal spectrum changes. The recommended value is 3-4 sec. For non-stationary noise
1-2 sec is recommended.

Extract background mode is used for background acoustic environment extraction. If this
parameter is enabled (ticked), the processing result will be the noise rather than the
useful signal (i.e. the very components which are usually suppressed). This is sometimes
useful for estimating the existing noise.

Frame length (counts) – 16, 32, ...2048 - sets the spectrum resolution, the number
of spectrum bands and the duration of the processed fragment of data. As you increase
the frame length, an ever larger number of different sounds become mixed in one
frame, and speech-free bands are eliminated. For spectral subtraction the recommended
value is 128..2048.

Suppression [1-40] - controls noise and, to some extent, speech suppression level in
the signal. Increasing this value will increase the suppression level.

Contrast [10-100] - controls the transition level between suppression and passing
areas in each spectral component of the signal. Increasing this value will lead to larger
speech signal amplitude and sharper residual noise sound. The recommended value is
80.

Reduction diapason, dB – [1..24] dB - sets the maximum reduction depth (ignoring timbre
correction) of the spectral components during processing. Setting very high values will
result in the "music" sound effect in the output signal. If you set the reduction depth
to 0 dB, the output signal will be equal to the input (if harmonics reduction is
disabled).

Harmonics suppression - enables/disables intensive reduction of very strong narrowband
harmonics.

Best accuracy – when enabled, this parameter increases filter performance and
accelerates its adaptation to noise, at the cost of roughly halving the processing
speed.

Spectrum smoothing – [0..9] - decreases the residual "music" noise by smoothing the
spectrum along the frequency axis. It somewhat decreases the useful signal amplitude
level.

Smoothing by time – [0..5] - decreases the residual "music" noise by smoothing the
spectrum over time. It somewhat decreases the useful signal amplitude and edge levels.
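
To make the roles of the suppression level and the reduction depth more tangible, here is a textbook-style sketch of a spectral subtraction gain rule. It is not the SIS algorithm, which is not documented here. The parameter names suppression and depth_db are loosely modelled on the Suppression and Reduction diapason fields, and their mapping to the real controls is an assumption.

```python
def subtraction_gains(signal_mag, noise_mag, suppression=1.0, depth_db=12.0):
    """Per-band gains for magnitude spectral subtraction.

    signal_mag: magnitude spectrum of the current frame
    noise_mag:  estimated noise magnitude in the same bands
    depth_db:   maximum attenuation; gains never fall below this floor
    """
    floor = 10 ** (-depth_db / 20)
    gains = []
    for s, n in zip(signal_mag, noise_mag):
        if s <= 0:
            gains.append(floor)
            continue
        gain = 1.0 - suppression * n / s  # subtract the noise estimate
        gains.append(max(gain, floor))    # limit the reduction depth
    return gains


frame = [10.0, 1.0]   # a strong speech band and a noise-only band
noise = [1.0, 1.0]
gains = subtraction_gains(frame, noise)
unchanged = subtraction_gains(frame, noise, depth_db=0.0)
```

In this sketch the strong band is barely attenuated, the noise-only band is held at the floor set by the reduction depth, and a depth of 0 dB leaves every band at unity gain, which matches the statement that the output then equals the input.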

14.3 Adaptive filtering of stationary wideband noise

14.3.1 Method overview


The proposed algorithms of stationary wideband noise reduction are based on different
variants of the spectral subtraction method (MSS). The MSS method differs from the
adaptive frequency subtraction method described in section 14.2 in that the noise
spectrum is estimated once before the filtration is started, which makes this method
applicable to stationary noise only. At the same time, since no adaptive methods are
used, this algorithm avoids adaptation noise.

The user should mark in the signal an area of pure noise with temporary marks. This
fragment will be used as a sample.

If the noise is stationary or almost stationary (the noise of a record, street, hall etc.),
this method is preferable to the adaptive ones, since the latter cannot accurately
estimate the background noise and, in addition to that, produce noise themselves. This
noise, called the adaptation noise, is perceived by human ear and worsens the quality
and even speech intelligibility in some nontrivial cases.

Using manual mode of average noise spectrum determination, the user has the
possibility to specify the signal fragment containing noise without accurately marking its
borders (Noise print autodetection).
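
Estimating the noise spectrum once from a marked fragment can be pictured as averaging the magnitude spectra of the frames inside that fragment. The sketch below assumes the frames have already been transformed to magnitude spectra; the function name is hypothetical and the actual SIS procedure may differ.

```python
def noise_print(frames):
    """Average magnitude spectrum of noise-only frames, band by band."""
    if not frames:
        raise ValueError("at least one noise frame is required")
    bands = len(frames[0])
    return [sum(frame[k] for frame in frames) / len(frames) for k in range(bands)]


marked_noise = [[1.0, 2.0], [3.0, 4.0]]  # two frames, two bands each
print(noise_print(marked_noise))
```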

14.3.2 Method parameters


To perform stationary wideband noise filtering, select Noise reduction►Stationary
wideband noise. The Wideband noise filtering menu will appear on the screen. This
is a standard menu of the source-destination type described in section 4.6. By pressing
the OPTIONS field you can open the parameters menu for this operation (Fig. 39).

Figure 39. The Stationary wideband noise options menu

The Options menu allows you to control the filtration process by manipulating the
following parameters.

Method version – select one of the suggested methods:

• Denoiser by noise print.

• Noise print autodetection.

Both methods reduce noise, but if you select the first method, you should accurately
mark the noise sample in the signal using temporary marks while the second method
allows the selected fragment to contain some useful signal as well (i.e. the sample can
be marked with overlapping).

Reduction intensity – allows you to set one of the 10 available reduction degrees
(from Weak,1 to Strong,10). Increasing this value will steadily increase noise
reduction degree. You can disable this parameter by choosing Turn off. Disabling is
necessary to enter the Advanced options menu; otherwise, a warning message will
appear at the bottom of the screen: "Turn off DEFAULT SETTINGS to get access to
advanced options", and access will be denied.

Timbre correction - enables/disables the timbre correction filter.

Low frequency, Hz – [100..700] Hz - sets the filter lower passband frequency. The
recommended value is 200 Hz. In case of intensive low-frequency noise (hum) this
parameter should be increased.

High frequency, Hz – [2500..Fmax] Hz - sets the filter higher passband frequency. The
recommended value is 3600 Hz. In case of intensive broadband noise this parameter
should be decreased.

Low frequency gain, dB – [-24..+24] - amplifies the output signal in the low-
frequency area. Can be used to obtain more natural timbre.

High frequency gain, dB – [-24..+24] - amplifies or attenuates the output signal in
the high-frequency area. As a rule, attenuating the high-frequency part of the signal
(+10, +20 dB) improves signal perception. But in some cases, conversely, its
amplification can increase speech intelligibility.

Advanced options - this field provides access to the additional options menu
completely identical to that used in the Noise reduction►Wideband noise operation
(see section 14.2.2, Fig. 37).

14.4 Adaptive filtration of tonal and regular noise

14.4.1 Method overview


Adaptive signal filtering based on the Widrow method is used to remove narrowband
stationary noise and various regular interferences (e.g. car engine or falling water
sounds) from the signal.

The one-channel adaptive noise filtration mode works with mono signals and is used to
cancel periodic or nearly periodic interference (vibrations, power-line hum, household
device noises, slow music, cars, etc.). This mode can be used for unmasking the speech
signal (suppressing the tonal noise by 20..40 dB) or, in some cases, for concert hall
noise cancellation.

There are four method implementations: spectral, temporal, echo cancellation and
harmonics suppression.

The temporal method (In time domain) provides better rate of convergence, but
requires more computing power at an equal number of coefficients.

The main difference between these noise cancellation methods and others is that, for
certain noise types, they preserve the speech signal far better. This follows from the
filter's principle of operation: the noise is compensated (subtracted) rather than
multiplied by zero. In this case, the useful signal remains unaffected, provided the
adaptation is good.

However, the application of this method has some restrictions. For a mono signal,
adaptive noise reduction can suppress only harmonics with a relatively stable phase.
For stereo signals, this mode is effective only when the noise ratio in the primary and
reference channels changes relatively slowly.

The Echo cancellation method implementation is used to reduce echo and
reverberation.

The Harmonics suppression method implementation provides high calculation speed
and quickly adjusts to changes in the spectral composition of the interference. However,
along with harmonics, it also suppresses useful signal at the same frequencies. Unlike
the temporal and spectral methods, this mode allows you to suppress harmonic
interference of mechanical origin.

The main parameters defining noise reduction level are the number of filter coefficients
N (the Frame length parameter in the options) and delay time. These parameters are
different for mono and stereo modes.

You should set the delay time to less than N/2. Increasing the number of coefficients
makes it possible to cancel more spectral peaks of the noise (the spectral peaks of
tonal noise are spaced at an interval equal to the fundamental frequency). For 50 Hz
interference, set 512-2048 coefficients. To prevent speech signal distortion, set the
delay time to at least 25 msec (250 counts at a 10000 Hz sampling rate).
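The roles of the number of coefficients N and the delay can be illustrated with a minimal sketch of a one-channel Widrow-type canceller. This is not the SIS implementation: the normalized LMS update, the function names and the step size `mu` are assumptions chosen for illustration. The filter predicts the periodic part of the signal from its own delayed past and subtracts the prediction; the delay keeps the (short-term correlated) speech from being predicted and cancelled along with the noise.

```python
import math

def delay_in_counts(delay_ms, sample_rate_hz):
    """Convert a delay in milliseconds to a whole number of samples (counts)."""
    return round(delay_ms * sample_rate_hz / 1000.0)

def cancel_periodic_noise(x, n_coeffs, delay, mu=0.5):
    """Predict the periodic part of x from its own delayed past (normalized
    LMS) and subtract the prediction; returns the residual signal."""
    w = [0.0] * n_coeffs
    out = []
    for n in range(len(x)):
        # Reference vector: x[n-delay], x[n-delay-1], ... (zeros before the start).
        ref = [x[n - delay - k] if n - delay - k >= 0 else 0.0
               for k in range(n_coeffs)]
        predicted = sum(wk * rk for wk, rk in zip(w, ref))
        residual = x[n] - predicted          # compensation by subtraction
        power = sum(r * r for r in ref) + 1e-9
        step = mu * residual / power
        w = [wk + step * rk for wk, rk in zip(w, ref)]
        out.append(residual)
    return out

# Demo (assumed test signal): a pure 50 Hz hum sampled at 1000 Hz.
hum = [math.sin(2 * math.pi * 50 * n / 1000.0) for n in range(1000)]
cleaned = cancel_periodic_noise(hum, n_coeffs=8, delay=5)
```

Here `delay_in_counts(25, 10000)` gives the 250 counts quoted above. The demo uses far fewer coefficients than the 512-2048 recommended for real recordings, only to keep the example fast.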

The filter adjustment time is defined by the adaptation rate. The rate value is set in the
16..29 range for unstable noise types and in the 2..15 range for stable (slowly
changing) noise types.

If tonal noise is cancelled insufficiently, repeat the processing once more.

Sometimes it is necessary to use a timbre correction filter to improve speech signal.

14.4.2 Method parameters

To perform adaptive filtering of tonal and regular noise, select the Noise reduction►
Tonal and regular noise menu. The Tonal and regular noise menu will appear on
the screen. This is a standard menu of the source-destination type described in section
4.6. By pressing the OPTIONS field you can open parameter menu for this operation
(Fig. 40).

Figure 40. The Tonal and regular noise options menu

The Options menu allows you to control the filtration process by adjusting the following
parameters.

Frame length (counts) – 16, 32, ...2048 - sets the number of spectrum bands and the
duration of the processed fragment of data. 256-2048 is recommended.

DEFAULT SETTINGS – restores default parameter settings.

Method version:

• Spectral,

• In time domain,

• Echo cancellation,

• Harmonics suppression.

Delay (points) - ranges from 0 to 1024. A value of 250 or more, but less than the
number of filter coefficients, is recommended. With a delay of 0.02 sec (200 counts
at a sampling rate of 10000 Hz), the quality of the processed speech signal will degrade.

Adaptation (on/off) - enables/disables the adjustment of filter coefficients. If
Adaptation is disabled, filtration will be performed with the initial coefficient values.
If the coefficients were not adjusted and saved previously and Adaptation is disabled,
the output signal will be equal to the input. To obtain fixed coefficients, choose a
signal fragment (if possible, without useful signal), enable Adaptation and Save
coefficients, and run the filtration. Once the filtration is finished, disable Adaptation
and process the signal using the saved (fixed) filter coefficients. If the noise features
are stable or the noise source is stationary, the noise will be cancelled (weakened) and
the speech will be preserved.

Save coefficients - if this mode is enabled, filter coefficients will be saved after
filtration and can be used for subsequent filtering, when the adaptation mode is
disabled. Otherwise, these values will be equal to zero after processing.

Show deleting noise - if enabled, this parameter will output cancelled noise without
signal. This parameter is useful to estimate how much of the useful signal is mistakenly
cancelled.

Adaptation rate – [2..30] - sets filter adjustment (convergence) time to the changes in
the noise spectrum. The recommended speed is 10-15. For quickly changing noise types
20-25 is recommended. The quality of the processed speech signal will become worse at
high rates.

Adaptation threshold – [0..32000] counts - sets the waveform amplitude above which
the filter coefficients are adjusted. This parameter protects the filter coefficient
values while low-amplitude waveforms are processed (for example, in noise pauses).

High freq. contrasting – when enabled, greatly increases high frequencies, which
sometimes leads to intelligibility improvement.

Timbre correction - enables/disables a timbre correction filter.

Low frequency, Hz – [100..700] Hz - sets the filter lower passband frequency. The
recommended value is 200 Hz. In the case of intensive low-frequency noise (hum) this
parameter should be increased.

High frequency, Hz – [2500..Fmax] Hz - sets the filter higher passband frequency. The
recommended value is 3600 Hz. In case of intensive broadband noise this parameter
should be decreased.

Low frequency gain, dB – [-24..+24] dB - amplifies the output signal in the low-
frequency area. Can be used to obtain more natural timbre.

High frequency gain, dB – [-24..+24] dB - amplifies or attenuates the output signal in
the high-frequency area. As a rule, attenuating the high-frequency part of the signal
(+10, +20 dB) improves signal perception. But in some cases, conversely, its
amplification can increase speech intelligibility.

Having set the necessary parameters, press OK. Choose the processed fragment type in
the displayed menu and filtration will start. Pressing <ESC> will interrupt the filtration.
If the filtration was interrupted, part of the data will remain unprocessed. Once the
filtration is completed, the filtered segment will be displayed in the active window on top
of the source segment.

14.5 Filtering of impulse noise


This method allows you to remove clicks, pulsed radio interference, etc. Removing
powerful impulses reduces listener fatigue and increases intelligibility.

The impulse noise filtering method replaces the pulse areas of the signal with
interpolated and smoothed values. The signal remains unchanged in the areas where
pulses are not detected. The effectiveness of the filtration depends on the correct
choice of the processing parameters.
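The substitution principle can be sketched as follows. This is only an illustration under stated assumptions, not the SIS algorithm: the detector compares each sample with a local median, the threshold is expressed in sample counts rather than in the relative units used by the options menu, and the function names are hypothetical.

```python
from statistics import median

def detect_impulses(x, threshold, half_window=3):
    """Flag samples that deviate from the local median by more than threshold."""
    flags = []
    for i in range(len(x)):
        lo = max(0, i - half_window)
        hi = min(len(x), i + half_window + 1)
        flags.append(abs(x[i] - median(x[lo:hi])) > threshold)
    return flags

def interpolate_impulses(x, flags):
    """Replace each flagged run by a straight line between its clean neighbours."""
    y = list(x)
    n = len(x)
    i = 0
    while i < n:
        if not flags[i]:
            i += 1
            continue
        start = i
        while i < n and flags[i]:
            i += 1
        end = i                                  # first clean index after the run
        right = x[end] if end < n else None
        left = x[start - 1] if start > 0 else right
        if left is None:                         # everything flagged: no anchor points
            break
        if right is None:
            right = left
        span = end - start + 1
        for k in range(start, end):
            t = (k - start + 1) / span
            y[k] = left + t * (right - left)
    return y

# Demo: a single click on a slowly rising signal.
signal = [0, 1, 2, 3, 100, 5, 6, 7, 8]
flags = detect_impulses(signal, threshold=10)
restored = interpolate_impulses(signal, flags)
```

Only the flagged sample is rewritten; all other samples pass through unchanged, which mirrors the behaviour described above.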

To run the filtration, enter the Noise reduction►Filtering of IMPULSE noise menu.
The operation menu will appear on the screen. This is a standard menu of the source-
destination type described in section 4.6. By pressing the OPTIONS field you can open
parameter menu for this operation (Fig. 41).

Figure 41. The Filtering of IMPULSE noise options menu

The Options menu allows you to control the filtration process by adjusting the following
parameters:

DEFAULT SETTINGS – restores default method settings.

Impulse length [1-5] - controls pulse detection and localization based on pulse
duration. It is set in relative units. Decreasing this parameter improves the detection
of weak and short pulses but worsens the detection of long ones; increasing it has the
opposite effect. The default value is 3.

Detecting threshold [1-10] - performs pulse detection and localization based on its
energy. It is set in relative units and is equal to 6 by default. If this parameter is
decreased, weak and short pulses will be detected well, but useful signal can be
damaged.

Contrast [-9,9] – similarly to the detecting threshold, performs pulse detection and
localization based on its energy. The value is set in relative units and is equal to 0 by
default. Increasing the contrast leads to a greater number of detected pulses. The
contrast stretches pulse energy axis and thus influences the pulse width estimation.

Calculation rate [1, 2, 4] - defines the compromise between speed and quality of
processing. The processing speed increases proportionally to step increase, which
results in some quality loss. The default value is 1.

Method version - allows you to choose one of three methods of signal processing:

• Impulse interpolation – the signal is restored by interpolation in the pulse
detection areas.

• Impulse smoothing - detected impulses are smoothed. This method is effective
only when impulse interpolation is not applied.

• Impulse detection - used for filtration parameters estimation. Outputs the
detection function: 32000 – if no impulse was detected and 0 – each time an
impulse is detected with probability 1.

Pressing the field corresponding to the type of processed fragment will initiate the
filtration process. The filtered fragment will be drawn in the destination window or, if it
is not available, in the active window above the initial fragment in another color.

14.6 Dynamic filtering


Methods of the dynamic filtration are used for speech intelligibility improvement under
conditions of large fluctuations in the signal level, e.g. caused by knocks (pulses of long
duration).

To run dynamic filtration, select the Noise reduction►DYNAMIC filtering menu item.
The operation menu will appear on the screen. This is a standard menu of the source-
destination type described in section 4.6. After pressing the OPTIONS field you can
choose one of the following methods:

• AGC (Auto Gain Control),

• Compressor,

• Limiter,

• Removing noise in pauses.

The AGC (Auto Gain Control) and the Compressor methods compress the dynamic
range of the signal, which weakens impulse noise and equalizes signal levels for speakers
talking at different loudness. Both methods perform the same function by different
means.

The Limiter method weakens strong lengthy impulses and weak background noise.

The Removing noise in pauses mode suppresses background noise in pauses.

The single parameter used for all these methods is Threshold. Once the current mean
amplitude of the signal starts to differ from the specified threshold level, the signal will
be processed. The signal above the threshold level will be weakened in the AGC (Auto
Gain Control), Compressor and Limiter methods. The signal below the threshold
level will be weakened if the Removing noise in pauses method is selected. The
threshold level is 2000 (counts) by default.
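The role of the threshold can be sketched in a few lines. The gain laws used by SIS are not documented here, so everything below is an assumption for illustration: a short running mean of the absolute amplitude, a limiter that pulls loud passages down to the threshold, and a fixed attenuation `pause_gain` for quiet passages. The mode names `"limiter"` and `"pauses"` are hypothetical.

```python
def dynamic_filter(x, threshold=2000.0, mode="limiter", window=4, pause_gain=0.1):
    """Weaken samples whose running mean amplitude is above the threshold
    (mode="limiter") or below it (mode="pauses")."""
    out = []
    for n in range(len(x)):
        frame = x[max(0, n - window + 1):n + 1]
        mean_amp = sum(abs(v) for v in frame) / len(frame)
        gain = 1.0
        if mode == "limiter" and mean_amp > threshold:
            gain = threshold / mean_amp      # pull loud passages down to the threshold
        elif mode == "pauses" and mean_amp < threshold:
            gain = pause_gain                # attenuate quiet passages (pauses)
        out.append(x[n] * gain)
    return out
```

For example, a constant level of 8000 counts is reduced to 2000 counts in limiter mode, while a level of 100 counts passes through the limiter unchanged and is attenuated only in the pauses mode.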

14.7 Filtering stereo signals

14.7.1 Method overview


This operation processes stereo segments using the Widrow method. One stereo
channel contains a noisy signal, while the other contains only noise without the signal.
The noise in both channels should be correlated. The "signal + noise" channel (LEFT) is
displayed in a lighter color.

If the noise to be cancelled comes from a stationary source, both channels can contain
signal and noise, but then there must be an area in both channels without useful signal
(only source noise). The effectiveness of cancellation in standard acoustic conditions is
12-25 dB in this case.

14.7.2 Method parameters


To run the filtration, enter the Noise reduction►Stereo noise menu. The operation
menu will appear on the screen. This is a standard menu of the source-destination type
described in section 4.6. By pressing the OPTIONS field you can open parameter menu
for this operation (Fig. 42).

Figure 42. The Stereo noise compensation options menu

The Options menu allows you to control the filtration process by adjusting the following
parameters.

DEFAULT SETTINGS – restores default method settings.

Frame length (counts) – 16, 32, ...2048 - sets the number of filter coefficients.
Theoretically, the necessary number of the filter coefficients should correspond to
reverberation (echo) time in the room where the signal was recorded. In practice,
however, increasing the frame size results in slower processing and is sometimes
inefficient, because the noise to be cancelled is exposed to nonlinear distortions before
recording.

Adaptation (on/off) - enables/disables the adjustment of filter coefficients. You can
adjust filter coefficients before processing the signal, then disable Adaptation and
process the signal using the saved (fixed) filter coefficients.

Save coefficients - if this mode is enabled, filter coefficients will be saved after
filtration and can be used for subsequent filtering, when the Adaptation mode is
disabled.

Show deleting noise - if this parameter is disabled, the filtered signal will be at the
output; otherwise, the output will contain the difference between the input and the
filtered signal.

Delay (points) - sets the shifting value of the input signal relative to input noise (the
left channel relative to the right channel). For stereo signal this value should be set to 0.

Method version - allows you to choose one of the following processing methods:

• Spectral - noise search and suppression is performed during signal spectra
comparison.

• In time domain – noise search and suppression is performed during initial
signals comparison. The temporal method provides a much better rate of
convergence, but requires more computing power at an equal number of
coefficients.

• Delay estimation - the signal is not processed; the delay value (in counts) is
estimated and set.

• Space reflection - allows you to isolate the useful signal if it arrives in both
channels in phase. This happens if the speaker is located symmetrically with regard to
both microphones (delay = 0). An asymmetrical position can be taken into account by
setting a nonzero delay.
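How a delay between the two channels might be estimated can be sketched with a plain cross-correlation search. The manual does not describe the algorithm behind the Delay estimation method, so this is an assumed technique shown for illustration only; the function name and the sign convention (a positive result means the left channel lags the right one) are hypothetical.

```python
import random

def estimate_delay(left, right, max_lag):
    """Return the lag (in counts) at which `left` best matches a delayed `right`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[n] * right[n - lag]
                    for n in range(len(left))
                    if 0 <= n - lag < len(right))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Demo with an assumed test signal: the left channel lags the right one by 3 counts.
rng = random.Random(1)
right = [rng.uniform(-1.0, 1.0) for _ in range(300)]
left = [0.0] * 3 + right[:-3]
lag = estimate_delay(left, right, max_lag=10)
```

The search is quadratic in the signal length, which is acceptable for a short fragment; a real implementation would more likely use an FFT-based correlation.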

Adaptation rate [2..30] - sets adaptation speed. If the parameter value is too big, the
filter will become unstable. If the value is too small, the noise will be suppressed
weakly.

Adaptation threshold - [0..32000] counts - sets the waveform amplitude threshold.
Once this value is exceeded, the filter coefficients are adjusted. This parameter
protects the filter coefficient values while low-amplitude waveforms are processed
(for example, in noise pauses).

Timbre correction - enables/disables a timbre correction filter.

Low frequency, Hz – [100..700] Hz - sets the filter lower passband frequency. The
recommended value is 200 Hz. In the case of intensive low-frequency noise (hum) this
parameter should be increased.

High frequency, Hz – [2500..Fmax] Hz - sets the filter higher passband frequency. The
recommended value is 3600 Hz. In case of intensive broadband noise this parameter
should be decreased.

Low frequency gain, dB – [-24..+24] dB - amplifies the output signal in the low-
frequency area. Can be used to obtain more natural timbre.

High frequency gain, dB – [-24..+24] dB - amplifies or attenuates the output signal in
the high-frequency area. As a rule, attenuating the high-frequency part of the signal
(+10, +20 dB) improves signal perception. But in some cases, conversely, its
amplification can increase speech intelligibility.

To start adaptive filtration, press OK. To cancel operation, use the Cancel button.

14.7.3 Using stereo filter


To process a stereo segment containing signal and noise, activate the window containing
this segment. Then in the Options menu set the parameters of adaptive stereo filtering
(see section 14.7.2). Having set the necessary parameters, press OK.

Having returned to the source-destination menu, choose the processed fragment type.
It is recommended to specify All data if the noise is stationary throughout the whole
speech signal. Then the adaptive filtration of the specified signal fragment will start.
Pressing <ESC>, you can interrupt filtering.

Reduction of noise produced by a stationary noise source is usually performed in two
steps. First the Adaptation and Save coefficients options are enabled and the signal
fragment containing noise from the stationary source (without the useful signal) is
processed. The filter is thus adjusted to the noise source. Then the Adaptation option
is disabled and the whole signal is processed.
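The two-step procedure can be sketched with a generic two-channel Widrow-type canceller. This is an illustration under assumptions, not the SIS code: the normalized LMS update, the step size `mu`, the function names and the simulated acoustic path `room_path` are all hypothetical. The `adapt` flag plays the role of the Adaptation option, and the returned weights play the role of the saved coefficients.

```python
import math
import random

def stereo_cancel(primary, reference, weights, adapt=True, mu=0.5):
    """Subtract from `primary` the noise predicted from `reference`.
    Returns (output, weights); weights change only when adapt is True."""
    w = list(weights)
    out = []
    for n in range(len(primary)):
        ref = [reference[n - k] if n - k >= 0 else 0.0 for k in range(len(w))]
        noise_estimate = sum(wk * rk for wk, rk in zip(w, ref))
        residual = primary[n] - noise_estimate
        if adapt:
            power = sum(r * r for r in ref) + 1e-9
            w = [wk + mu * residual * rk / power for wk, rk in zip(w, ref)]
        out.append(residual)
    return out, w

def room_path(noise):
    """Hypothetical acoustic path from the noise source to the primary microphone."""
    return [0.8 * noise[n] + (0.3 * noise[n - 1] if n > 0 else 0.0)
            for n in range(len(noise))]

rng = random.Random(0)

# Step 1: adapt on a fragment that contains only noise, and keep the coefficients.
noise_a = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
_, saved = stereo_cancel(room_path(noise_a), noise_a, [0.0] * 4, adapt=True)

# Step 2: process speech + noise with adaptation disabled (fixed coefficients).
noise_b = [rng.uniform(-1.0, 1.0) for _ in range(500)]
speech = [math.sin(2 * math.pi * n / 40.0) for n in range(500)]
noisy = [s + v for s, v in zip(speech, room_path(noise_b))]
cleaned, _ = stereo_cancel(noisy, noise_b, saved, adapt=False)
```

Because the coefficients are frozen in the second step, the useful signal cannot disturb them, which is the point of adapting on a noise-only fragment first.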

14.8 Signal processing with DirectX plug-ins

14.8.1 Sound Cleaner plug-in


SIS allows you to use the Sound Cleaner software for signal processing as a plug-in,
provided you have previously installed Sound Cleaner (preferably version 6.03 or
higher) and DirectX (v. 9.0 or higher) on your PC.

Note: You are strongly recommended to load Sound Cleaner first and convert it to
DX-plug-in mode before running it as a plug-in from SIS.

To run Sound Cleaner as a plug-in, press the toolbar button in the SIS main window.

A menu containing three options will be displayed:

Preview. This option is used to adjust Sound Cleaner filtering parameters for current
signal processing. The signal will be transferred from SIS to Sound Cleaner for
processing and then back to SIS for playback. Selecting this option, you can adjust the
Sound Cleaner filter using all capabilities available there with constant audio control of
the filter performance. Having achieved the desired effect, return to the SIS window,
press <ESC> and select the next menu item.

Process. After you select this option, you will see a standard menu of the source-
destination type described in section 4.6. The Options menu includes two processing
options:

• Mute processing,

• Edit signal mode.

If you tick Mute processing, the signal will be filtered without being played, which will
significantly increase the processing speed. If this option is disabled, processing will be
performed simultaneously with playback in real time.

If Edit signal mode is enabled, the source signal will be replaced with the result of the
filtering. If you disable this option, SIS will create a new segment for the processed
data.

Friendly advice. Displays the following reminder:

Figure 43. The Friendly advice window

Keep in mind that Sound Cleaner version 6.02 and below will not be able to process
24-bit signals – you are recommended to upgrade the Sound Cleaner software to
version 6.03.

See the Sound Cleaner User Manual for more information about working with its
different sound filters.

14.8.2 Other DirectX plug-ins

SIS allows you to use any DirectX modules installed on your computer. To call an
external filter, press the button on the toolbar. The DX filters window shown in
Figure 44 will appear on the screen.

Figure 44. The DX filters window

The DX filters window contains a toolbar and a filters list. The functions of the
toolbar buttons are described in the table below.

Button Action

Preview mode. Playback of processed signal without saving it.

Process mode. Process signal and transfer it to SIS.

Stop. Stop preview or processing.

Add. Add filter to the list.

Mute. Processing without playback.

Edit mode. The source signal will be replaced by processed signal.

To add a filter to the filters list, press the button on the toolbar and select the filter
in the drop-down list. To include a filter in processing, tick its name in the filters list.
Filters which are not selected will not be included in filtering.

During processing, filters will be applied in the same order as they appear in the list.

To open the filter settings window, double-click the filter name in the list.

To delete a filter from the list, click its name in the list with the left mouse button and,
holding it down, press <Delete>.

Two modes of DirectX filtering are available:

• Preview mode is used to adjust filtering parameters for current signal
processing. Press to start preview. The signal will be processed and played
back.

• Process mode is used to process the signal and save the result of filtering in the
SIS signal window.

If the (Mute) option is selected, the signal will be filtered without being played, which
will significantly increase the processing speed. If this option is disabled, processing will
be performed simultaneously with playback in real time. If (Edit mode) is selected,
the source signal will be replaced with the result of the filtering. If you disable this
option, SIS will create a new segment for the processed data.

14.9 Equalizer

This process displays the signal spectrum and enables the user to correct it via inverse
filtering and filter contrasting. The equalizer may work in automatic or semi-automatic
mode; it is also possible to tune the filter manually to make fine spectrum adjustments.
This module may suppress any stationary components of a signal regardless of their
frequency and location; it may also be used to raise the amplitude in a chosen spectral
band. This filter works well for phonograms containing considerable stationary noises
such as power-line noise, mechanical and engine noises and so on.

To launch equalizer, select the Noise reduction►Equalizer menu item.

The number displayed in the window header is the current FFT window size. The larger
it is, the larger the number of equalizer bands and the finer and more precise the
adjustments that can be made. To select the number of bands, open the Options dialog
box (described below). Try to set the largest possible number of bands to achieve the
best filtering quality and precision, but remember that this increases system load as
well. The FFT window size is strictly determined by the number of bands (in fact, it is
equal to the number of bands multiplied by four).
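The relationship between the number of bands, the FFT window size and the frequency resolution can be written down directly. The first function follows the rule stated above; the second assumes, for illustration only, that the bands divide the range from 0 Hz to half the sampling rate evenly. Both function names are hypothetical.

```python
def fft_window_size(n_bands):
    """FFT window size implied by the number of equalizer bands (bands x 4)."""
    return 4 * n_bands

def band_width_hz(sample_rate_hz, n_bands):
    """Width of one band, assuming bands split 0..sample_rate/2 evenly."""
    return (sample_rate_hz / 2.0) / n_bands
```

Under that assumption, doubling the number of bands halves the width of each band, which is why more bands allow finer adjustments at the cost of a larger FFT.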

14.9.1 Equalizer controls


Figure 45 shows a standard equalizer window.
You can see the toolbar below the window header and a black window, where the current
signal (yellow) and filter (green) spectra are displayed. The X-axis zoom bar is just under
this window; the currently visible area is marked in blue. Two red markers at the upper
edge of the spectrum window indicate the passband borders (you may drag them to
change the borders). The lower half of the window is occupied by the filter band
adjustment sliders. At the bottom of the window are the Elastic mode indicator and the
Additional FC adjustment sliders.

Figure 45. The Equalizer window

14.9.2 Equalizer toolbar


Equalizer toolbar contains most important and frequently used process controls:

Set maximum horizontal zoom, i.e. each adjustment slider corresponds to a
single filter band.
Zoom in, increasing X-axis scale two times.
Zoom out, decreasing X-axis scale two times.
Zoom out, showing all the spectrum from 0 Hz up to half of the sampling rate.
Reset filter, placing all the adjustment sliders to zero.
Autoscale Y-axis according to maximum and minimum spectrum values.

Select a rectangular area to be displayed in the spectrum window. Select this tool and
press the left mouse button (a dashed rectangle appears), then drag to specify the area
you wish to view, and release the mouse button. The selected area will be enlarged to fit
the spectrum window.

Switch Y-axis from linear to logarithmic (dB) scale. When this button is pressed,
equalizer displays signal in logarithmic scale, otherwise the scale is linear.

Turn spectrum accumulation on and off. While this button is pressed, the program
accumulates the signal, calculating and displaying the average spectrum. Release it to
switch back to the instant spectrum; this automatically stops spectrum accumulation.

Build inverse or harmonic filter (selected by user) based on the current spectrum (either
instant or average, if spectrum accumulation is on). Pressing this button automatically
stops spectrum accumulation. Note that either filter is calculated within passband
borders only.

Contrast the filter. See section 14.9.3 for more information about filter contrasting.

Note: If you have turned on Filter contrasting in the Additional Options menu,
contrasting will be done automatically during inverse filter calculation and/or automatic
filtration. In this case, pressing this button will contrast the filter once more.

Automatic filtration button starts spectrum accumulation and then calculates the
inverse or harmonic filter (filter type is selected by user). You may set the time of
automatic spectrum accumulation in the Options menu.

Options. Displays the Additional options window.

Save settings. Allows you to save equalizer settings to file with .eq_cfg extension.

Load settings. Allows you to load previously saved equalizer settings from .eq_cfg
file.

This indicator button turns red if a signal was over-amplified which caused an
overflow during equalizer output. Press the button to bring the indicator back to passive
state until the next overflow.
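The idea behind the inverse filter button can be sketched as follows. The manual does not give the formula, so this is an assumed, simplified version: each band inside the passband receives the gain that brings its averaged level to the passband mean, limited to the +20/-72 dB slider range, while bands outside the passband are left at 0 dB. The function and parameter names are hypothetical.

```python
def inverse_filter_db(avg_spectrum_db, first_band, last_band,
                      min_db=-72.0, max_db=20.0):
    """Per-band gains (dB) that flatten the averaged spectrum inside the
    passband [first_band, last_band]; other bands are left untouched."""
    band = avg_spectrum_db[first_band:last_band + 1]
    reference = sum(band) / len(band)          # target level: passband mean
    gains = [0.0] * len(avg_spectrum_db)
    for k in range(first_band, last_band + 1):
        gain = reference - avg_spectrum_db[k]  # peaks get negative gain
        gains[k] = max(min_db, min(max_db, gain))
    return gains
```

For an averaged spectrum of `[10, 20, 50, 20, 10]` dB with a passband covering bands 1 to 3, the 50 dB peak receives -20 dB and its neighbours +10 dB, so a stationary peak is pushed down toward the level of the rest of the passband.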

14.9.3 Equalizer options


Press button to open the Additional Options window (see Figure 46).

There you may choose the number of bands for the equalizer. Generally speaking, setting
a greater number of bands will increase filter performance as well as your PC load.

Figure 46. The Equalizer Additional Options window

The Filter Contrasting Options field occupies the center of the window. Contrasting
means that the program will automatically detect narrow gaps in the filter FC and then
broaden and deepen them. This operation may considerably improve filtering quality,
especially if there are clear local noise peaks in the spectrum of a signal. To enable filter
contrasting, check the respective flag in the bottom part of the window. Then adjust the
contrasting options:

• Discrimination Level [0-1] parameter represents the relation between the filter
value in the gap and on its edge. The program uses it to determine which filter gaps
are to be contrasted. A value of 1 means that it will contrast all the local minimums;
with 0 the filter won’t change. Values around 0.5 will make the program skip small
(usually, natural) minimums while contrasting large and distinct gaps.

• Analysis Window Width value determines the maximum width of a gap (in Hz)
which will be considered "narrow" and, therefore, will be subject to contrasting.
The default value is 70 Hz.

• High Intensity flag, when checked, will slightly broaden the gaps in addition to
the common contrasting effect.

At the lower part of the screen you may see a group of additional controls.
Accumulation time value is a duration of spectrum accumulation for automatic filter
calculation. Filter contrasting flag turns on respective operation, as was mentioned
earlier; Graphics drawing may be turned off to disable drawing of signal spectrum
and, thus, to enhance performance of weak PCs.

OK button will close the window saving all the changes, Cancel will discard them.

14.9.4 Zooming and scrolling

To zoom the signal spectrum window in and out, you may use the appropriate toolbar
buttons, buttons and the horizontal zoom bar located below the spectrum area. The
blue-marked fragment of this bar indicates which part of the whole spectrum is currently
displayed. If the whole bar is blue, then the whole spectrum is displayed (if, for example,
button was pressed).

Click the left or the right mouse button over the bar to set respective borders of
displayed spectrum. You may also scroll it with arrow buttons to the left of the bar or
with arrow keys. In the latter case you have to set the focus on the bar. Pressing the
button or key moves the displayed area 1/16 (1/32 for large window) part of the whole
signal area to the right or left.

14.9.5 Adjusting filter FC


There are 16 spectrum adjustment sliders in the standard equalizer window. The current
level adjustment (in dB) for the controlled band is indicated above each slider. The
highest and lowest possible values (+20/-72 dB), which correspond to the extreme slider
positions, are given near the left edge of the window. There you can also see the number
of spectral bands currently controlled by a single slider. If you zoom in or out, this
number will change, reaching 1 at the maximum X-axis zoom level.
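How slider settings might translate into per-band gains can be sketched as below. This is an illustrative assumption, not the documented SIS behaviour: each slider is taken to control an equal share of the visible bands, values are limited to the +20/-72 dB range quoted above, and decibels are treated as amplitude ratios (20·log10). The function names are hypothetical.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def expand_sliders(slider_db, n_bands):
    """Spread slider settings (dB) over n_bands bands, each slider
    controlling an equal share, with values limited to -72..+20 dB."""
    per_slider = n_bands // len(slider_db)
    gains = []
    for value in slider_db:
        clamped = max(-72.0, min(20.0, value))
        gains.extend([clamped] * per_slider)
    return gains
```

With 16 sliders and 1024 visible bands, each slider would control 64 bands; at maximum zoom only 16 bands are visible and each slider controls exactly one.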

14.9.6 Elastic mode


The Elastic mode enables you to simultaneously adjust several sliders as if they were
bound together with an elastic thread. In this mode it is much easier to change filter FC
smoothly.

To include sliders in an “elastic” group, you should click left and right mouse button on
the bar below the sliders, thus setting left and right border of the group. Selected group
will be marked with blue indicator in the bar.

Note, that only sliders (not signal bands!) may be grouped and bound together. This
means, that if you zoom in or out, those sliders, which you have previously included in a
group will control other signal bands.

The button to the left of the Elastic mode bar turns linearization (smoothing) on and
off. Outlines of a filter built in Elastic mode will be smooth (not jagged) if this mode is
on.

14.9.7 Additional FC adjustment


Sliders Q1 and Q2 located at the very bottom of the screen provide additional filter FC
adjustment. This extra adjustment makes speech sound more natural and more
comfortable for the listener.

The Q1 slider adjusts FC convexity within the 100-800 Hz frequency band. The Q2 slider
changes the FC increase/decrease for every 1000 Hz step starting from 1000 Hz.

Both sliders work within a -18/+18 dB range, with the current value indicated to the left
of each slider.

14.9.8 Equalizer operation modes


To start processing, select the Noise reduction►Equalizer menu command. This will
open the Equalizer options window (Figure 47).

Figure 47. The Equalizer options window

The Play sound while processing parameter, if enabled, allows playing signal during
filtering. If it is disabled, the signal will be filtered without being played, which will
significantly increase the processing speed.

The buttons at the top of the options window allow you to select the data fragment to
be processed by the equalizer (all data, highlighted, between temporary marks, visible in
window). The box located below these buttons contains the list of channels which are
available for processing (left channel, right channel, both channels).

Equalizer provides two operation modes:

z Preview mode – press the Preview button. The filtered signal will not be saved
in SIS window. This mode is used to adjust filtering parameters or to accumulate
spectrum.

z Process mode – press the Process button. This mode is used to filter the signal
and to save the result of filtering in SIS signal window. By default the filtered
signal is saved in the same window as the source signal. You can select another
window or create a new one in source-destination menu.

Press the Stop button to interrupt equalizer operation.

14.9.9 Stereo signal filtering


The noise in the left and right channels of a stereo signal may differ. The following
sequence of operations is recommended for stereo signal processing:

1. Adjust filter for the left channel.

2. Process the left channel. To do this, select Left channel in the Equalizer options
window.

3. Adjust filter for the right channel.

4. Process the right channel. To do this, select Right channel in the Equalizer
options window.

If there is no need to adjust filters for each channel separately, then after adjusting a
left channel filter select Both channels in the Equalizer options window. In this case
channels will be processed one after another without playback.

Note: In the Equalizer options window, the Both channels and Play sound while
processing options cannot be selected simultaneously.

15 ANALYSIS OF SPEECH SIGNALS

SIS provides various methods of speech signal analysis. Enter the Analysis menu and
select one of the analysis options listed below:

z Spectrogram,

z Cepstrum,

z Autocorrelation,

z LPC Spectrogram,

z Average…,

z Pitch extraction (Spectrum),

z Formants analysis,

z Energy (r.m.s.),

z Average spectrum,

z Histogram,

z Histogram parameters.

Selecting Analysis►More… you can use the following analysis methods:

z Linear Prediction Coefficients,

z Partial correlation,

z Zero crossing frequency,

z FFT power spectrum,

z Pitch extraction (LLK),

z Compare speakers.

To use any of the methods listed above, select the respective Analysis menu item. After
you choose a method, the operation menu will appear on the screen. This is a standard
menu of the source-destination type described in section 4.6.

3-D images (time-frequency-intensity) are used to visualize the 3-dimensional
characteristics listed above. The third axis is perpendicular to the screen plane, so
3-D data are represented in a special way, for example, by deviation to the right, by
hues of grey, or by axonometry.

15.1 Producing 3-dimensional representations for 3-D data
To receive 3-D images of the specified characteristics, enter the Analysis menu item
and choose the required characteristic. The operation menu will appear on the screen.
This is a standard menu of the source-destination type described in section 4.6.

First you should enter the field Options and set the required parameters for calculation.
The third dimension can be represented in various ways, namely: by deviation to the
right and upwards, using axonometry, different colors, hues of grey.

The first two modes (right deviation and up deviation) produce practically the same
images.

Unlike the first two modes, the axonometric mode introduces a third axis to represent
intensity in spectrum images.

The main advantage of these three representation modes (as compared to colored and
grey hues), is that they cover a wider dynamic range. The dynamic range is limited, on
the one hand, by the screen ability to represent and, on the other hand, by the ability of
the user to distinguish the hues of grey and different colors.

Enter the field Drawing type in the Options menu for the specific operation and
choose the desired representation type in the list of the specified modes of data
representation (by clicking on the selected option with the mouse or by pressing
<Enter>). Your choice will be reflected in the corresponding field of the Options menu.

Having set the necessary parameters, press OK and you will return to the standard
menu. There you can create or choose a destination window for the output. Start the
procedure by pressing on the field with the desired data fragment type for processing.

15.1.1 3-D scaling options


The minimum value of the third dimension in the data box is always 0. Therefore the
third-dimension image is completely determined by a single scale factor, by which the
data are multiplied before representation. The higher this factor, the more discernible
the weaker signal features become.
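
The effect of the multiplier can be illustrated with a short sketch. It assumes, purely for illustration, that scaled values are rounded and clipped to 256 grey levels; the manual does not state how SIS maps values to the display, and `to_grey_levels` is a hypothetical name.

```python
def to_grey_levels(values, multiplier=1.0, levels=256):
    """Scale non-negative 3-D values and clip them to the displayable range.

    Assumption: the display offers `levels` grey levels and larger values saturate.
    """
    top = levels - 1
    return [min(top, max(0, round(v * multiplier))) for v in values]


intensities = [0.2, 3.0, 40.0, 200.0]
print(to_grey_levels(intensities, multiplier=1))   # weak features stay near level 0
print(to_grey_levels(intensities, multiplier=10))  # weak features spread out, strong ones saturate
```

With a larger multiplier the weak values occupy distinguishable levels, at the cost of saturating the strongest ones.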

To change this factor, enter the Show►Redraw 3-Dimensional data menu and
modify the value in the Multiplier for the 3-rd dimension field. The multiplier value
shown there is the one used for the currently displayed 3-D data. If there are no 3-D
data in the active window, this menu is not available.

15.1.2 Scaling the third dimension for not yet calculated 3-D data
Everything described in the previous section applies to 3-D data that are already
displayed. The scale multiplier for newly calculated and displayed data is 1 by default.
To change this multiplier, select the Show►3-D drawing options menu item and modify
the value in the Multiplier for the 3-rd dimension field. When you enter this menu
item, you will see the factor value as set for the last representation. The modified value
of the multiplier will be used for the next 3-D calculations.

15.1.3 Changing the image limits


To change the image boundaries you can either use the scroll-bar located at the bottom
of any window, or set the image limits directly, using the Show menu.

The Show menu contains the Change box position/length item for changing the
image boundaries along the horizontal axis. Select this item and in the displayed Box
length window enter the values for the left and right box boundaries into the
corresponding fields. Press OK and the 3-D image will be redrawn within the new
boundaries.

The Show menu also contains the Change box position/height item for changing the
image boundaries along the vertical axis. To modify the image upper and lower limits,
select this item and in the displayed Box height window enter the desired values for
the top and bottom box boundaries into the corresponding fields. Press OK and the 3-D
image will be redrawn within the new boundaries.

Besides changing the data box location and size, you can also adjust the frequency
band when calculating a spectrogram. To set the frequency band, enter Show►3-D
drawing options and select the Lower frequency limit and Upper frequency limit
values.

15.1.4 Adjusting brightness and contrast of the image


To control brightness and contrast of the 3-D image, use the button to the right of
the 3-D data box.

After left-clicking the button, you can control the brightness using the slider that
appears in the box. The button located above the slider switches between brightness
adjustment and contrast adjustment.

To exit the mode, press the <Esc> key, press the button once more, or click with the
left mouse button anywhere in the SIS workspace.

You can also change the brightness smoothly by pointing the mouse at the button and
turning the wheel.

15.1.5 Visible speech normalization


For normalizing “visible speech” (spectrogram, cepstrum etc.) use the button to the
right of the 3-D data box.

When you press the button, an additional red axis and a red circle (indicating the
current normalization value) will appear at the top of the window. Drag the red circle
to the required point while holding the left mouse button pressed. The data will be
redrawn after you release the mouse button.

To exit the mode, press the <Esc> key or press the button once more or click with
the left mouse button anywhere in the SIS workspace.

15.1.6 Changing the graphic representation


To redraw a 3-D image in another representation mode, enter the Show►Redraw 3-
Dimensional data menu item. The Redraw 3-Dimensional data window will appear on
the screen. This menu contains the Drawing type field. Click on this field and in the list
of possible 3-D data representation modes select the one you need and press OK. The
3-D data image will be immediately redrawn.

The following drawing types are available:

z Right deviation,

z Up deviation,

z Axonometry,

z Colour,

z Grey scale.

15.2 Windowing for FFT

15.2.1 Theoretical substantiation of windowing


Signal decomposition into sine/cosine basis functions (Fourier transform) is justified only
for signals of infinite duration. However, since real signals are finite, and in most
situations we should consider time-varying characteristics of spectra, the calculations
are applied to finite components of a given signal. The analysis of a finite signal
component corresponds to the use of an infinite-duration signal multiplied by a
rectangular function, which is equal to “1” within the given interval and to “0” outside it.
This procedure is usually called “multiplying by a window” or “windowing”, and the
function by which the signal is multiplied is termed a “weighting (window) function” or a
“window”.

When a rectangular window is applied to the signal, the edges of a windowed area may
be very abrupt; this may result in distortions of spectrum structure, with spectral peaks
determined by the form and position of the window, rather than by the signal
properties. To reduce this effect, signal edges are commonly smoothed within the
analysis area, which is implemented by using a roll-off window function with values
decaying from the window centre to the borders.

The application of such windows results in smoothed spectrum values and suppressed
amplitude spikes in the spectrum domain. However, this somewhat degrades the
spectrum resolution.

The application of analysis windows in the time domain corresponds (in the spectrum
domain) to the convolution of the signal spectrum with the window function spectrum.
Thus, the application of a rectangular window (i.e. when no weighting window is applied)
corresponds to the convolution of the signal spectrum with the spectrum of a rectangular
function (known as the Dirichlet kernel). This convolution yields the so-called “window”
effects, which consist in smoothing the spectra of closely located signals and in
increasing the influence of interference that is distant in frequency but powerful.

Each window function spectrum contains a main lobe and a number of side-lobes -
parasitic additional lobes affecting each spectrum value and thus degrading the initial
spectrum calculations. Side-lobes having large amplitudes may significantly affect a
given spectral count, even if they are quite distant. The side-lobe amplitudes of the
window function can be reduced only at the cost of increasing the width of the main
lobe, that is, at the cost of spectral resolution degradation. The choice of the weighting
window is used for controlling the effects related to the presence of side-lobes in the
computed spectrum.

The minimum width of spectral peaks in the data fragment weighted by the window is
determined by the width of the main lobe of this window and does not depend on the
initial data.

The side-lobes, whose effect is sometimes referred to as “spectral leakage”, affect the
amplitudes of adjacent spectral peaks. Since the discrete-time Fourier transform is a
periodic function, the superposition of side-lobes from adjacent spectral periods may
cause further shifts in the frequencies of spectral peaks. Leakage leads not only to peak
amplitude errors in discrete signal spectra, but can also mask the presence of weak
signals against the background of strong ones (formants with weak amplitudes against
strong ones) and, hence, hamper their detection.
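
The masking effect can be reproduced with a small sketch. It is not SIS code: the frame length (128 counts), the tone frequency (10.5 cycles per frame) and the probed bin (20) are arbitrary illustrative choices, and a slow direct DFT is used so that only the Python standard library is needed.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def leakage_db(window, tone_cycles, probe_bin):
    """Level at probe_bin, in dB relative to the spectral peak, for a single tone."""
    n = len(window)
    tone = [math.cos(2 * math.pi * tone_cycles * i / n) for i in range(n)]
    mags = dft_magnitudes([w * x for w, x in zip(window, tone)])
    return 20 * math.log10(mags[probe_bin] / max(mags))


N = 128
rectangle = [1.0] * N
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]

# A strong tone lies between bins 10 and 11; a weak component is imagined at bin 20.
rect_db = leakage_db(rectangle, 10.5, 20)
hann_db = leakage_db(hann, 10.5, 20)
print(f"leakage at bin 20: rectangle {rect_db:.1f} dB, Hann {hann_db:.1f} dB")
```

In this setup a component 40 dB below the strong tone would be buried under the leakage of the rectangular window, but would stand clear of the leakage left by the Hann window.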

A number of window functions can be suggested to reduce side-lobe level as compared
to the rectangular window (when no weighting window is applied). Reducing side-lobe
level will lessen the shifts in spectral values. This, however, can only be achieved at the
cost of increasing the width of the main lobe, which inevitably results in the degradation
of spectrum resolution. Consequently, some trade-off is to be found between the width
of the main lobe and the side-lobe suppression degree.

A number of parameters are used for the classification of window functions and
evaluation of their quality.

The main lobe bandwidth allows for frequency resolution assessment. Two parameters
are used to quantitatively assess the main lobe width. One conventional parameter is
the bandwidth at the half-power point, i.e. 3 dB below the main lobe maximum. The
other parameter is the equivalent bandwidth.

Likewise, two parameters are used for the evaluation of side-lobes. One of them is the
peak (maximum) side-lobe level, which shows how effectively the window suppresses
leakage. The other is the side-lobe roll-off, i.e. the rate at which the level of the
side-lobes drops, starting from those nearest to the main lobe. This value depends on
the number of counts N and, as N increases, tends to an asymptotic value expressed in
decibels per octave of frequency.

15.2.2 Description of 5 most commonly used weighting windows


Below you will find a brief description of the 5 most commonly used discrete-time
window functions suggested at different times for spectrum estimation.

z RECTANGLE – no windowing applied – the signal remains unchanged.

z HAMMING – Hamming window, $i = 0, \dots, N-1$:

$$\mathrm{SIGNAL}[i] = \mathrm{SIGNAL}[i] \cdot \left(0.54 - 0.46 \cos\left(\frac{2\pi}{N}\, i\right)\right).$$

z HANN – Hann window, $i = 0, \dots, N-1$:

$$\mathrm{SIGNAL}[i] = \mathrm{SIGNAL}[i] \cdot \left(0.5 - 0.5 \cos\left(\frac{2\pi}{N}\, i\right)\right).$$

z NUTTALL – Nuttall window, $i = 0, \dots, N-1$:

$$\mathrm{ARG} = \frac{i - (N-1)/2}{N-1},$$

$$\mathrm{SIGNAL}[i] = \mathrm{SIGNAL}[i] \cdot \bigl(0.3635819 + 0.4891775 \cos(2\pi\,\mathrm{ARG}) + 0.1365995 \cos(4\pi\,\mathrm{ARG}) + 0.0106411 \cos(6\pi\,\mathrm{ARG})\bigr).$$

z GAUSS – Gaussian window, $i = 0, \dots, N-1$:

$$\mathrm{ARG} = \frac{i - (N-1)/2}{N-1},$$

$$\mathrm{SIGNAL}[i] = \mathrm{SIGNAL}[i] \cdot \exp\left(-\frac{1}{2}\,(2\alpha\,\mathrm{ARG})^{2}\right), \quad \alpha \ge 2.$$
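
The window formulas above translate directly into code. The following is a minimal sketch using only the Python standard library; the function names are illustrative and are not part of SIS, and the default α = 2 for the Gaussian window is simply the smallest value the formula allows.

```python
import math


def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / n) for i in range(n)]


def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def _centred_arg(i, n):
    """ARG from the formulas: runs from -0.5 to +0.5 across the frame (needs n >= 2)."""
    return (i - (n - 1) / 2) / (n - 1)


def nuttall(n):
    coeffs = (0.3635819, 0.4891775, 0.1365995, 0.0106411)
    return [
        sum(c * math.cos(2 * math.pi * k * _centred_arg(i, n)) for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def gauss(n, alpha=2.0):
    return [math.exp(-0.5 * (2 * alpha * _centred_arg(i, n)) ** 2) for i in range(n)]


def apply_window(signal, window):
    """Multiply the frame by the window, count by count."""
    return [s * w for s, w in zip(signal, window)]


frame = [1.0] * 9
print([round(v, 4) for v in apply_window(frame, nuttall(9))])
```

Note that the Hamming and Hann formulas divide by N, while the Nuttall and Gaussian formulas are centred on the frame and divide by N − 1; the sketch follows the formulas as printed.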

These windows have the following properties:

Window                                      Rectangle  Hamming  Hann    Nuttall  Gauss
Maximum side-lobe level (dB)                -13.3      -43      -31.5   -98      -115
Asymptotic roll-off of side-lobes (dB/oct)  -6         -6       -18     -6       (-139)*
Equivalent bandwidth                        1.00       1.36     1.50    1.80     3.50
Half-power bandwidth (3 dB)                 0.89       1.30     1.44    1.70     3.0

* Gaussian windows applied to 16-bit and 24-bit signals have different widths, so the
side-lobe level equals -139.
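
The maximum side-lobe levels in the table can be checked numerically. The sketch below is an illustration rather than the procedure used to produce the table: it zero-pads a 64-count window (an arbitrary choice), evaluates its spectrum with a slow direct DFT, walks down the main lobe to its first minimum and reports the highest level beyond it.

```python
import cmath
import math


def peak_sidelobe_db(window, pad_factor=16):
    """Highest side-lobe of a window spectrum, in dB relative to the main-lobe peak."""
    n = len(window)
    size = n * pad_factor                      # zero-padding gives a finer frequency grid
    mags = [
        abs(sum(w * cmath.exp(-2j * math.pi * k * i / size) for i, w in enumerate(window)))
        for k in range(size // 2)
    ]
    k = 1
    while k < len(mags) - 1 and mags[k] < mags[k - 1]:
        k += 1                                 # walk down the main lobe past its first minimum
    return 20 * math.log10(max(mags[k:]) / mags[0])


N = 64
rect_level = peak_sidelobe_db([1.0] * N)
hann_level = peak_sidelobe_db([0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)])
hamming_level = peak_sidelobe_db([0.54 - 0.46 * math.cos(2 * math.pi * i / N) for i in range(N)])
print(f"rectangle {rect_level:.1f} dB, Hann {hann_level:.1f} dB, Hamming {hamming_level:.1f} dB")
```

For short frames the measured values differ slightly from the asymptotic figures in the table, so agreement to within a fraction of a decibel is all that should be expected.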

Of all the windows listed in the table, the rectangular window has the narrowest main
lobe and the highest level of side-lobes.

The "squared cosine" type of window is named after the Austrian meteorologist Julius von
Hann. It is often mistakenly called the "Hanning" window.

The Gaussian window side-lobes in the double logarithmic scale do not tend to a straight
line, but the side slopes drop much faster than in any other described window.

The window of the "raised cosine" type was introduced by R. W. Hamming and, thus, is
referred to as the Hamming window. The multipliers 0.54 and 0.46 were
chosen to nearly completely eliminate the maximal side-lobe.

15.2.3 Equi-periodic window


The equi-periodic window is applied in some analysis types provided in SIS.

When the ordinary spectrum and its associated functions are computed, a different
number of periods is taken into account for each harmonic amplitude. Thus, if a
spectrum is computed over a 256-count window, the first harmonic, with a period of 256
counts, fits into the window only once, while the last one, with a period of 2 counts,
fits as many as 128 times. As a result, high-frequency harmonics (with small periods)
are considerably averaged out, while low-frequency ones are not.

For the equi-periodic window an autocovariance function is computed so that the
analysis window length increases as frequency decreases, thus compensating for the
above-mentioned effect. The spectrum, cepstrum and autocorrelation are subsequently
computed via this autocovariance function.

The window length increases linearly with the harmonic period (the period is
inversely proportional to frequency), without exceeding the frame limits. The
proportionality coefficient is specified in the interval between 0.3 and 9. Click with the
mouse button anywhere outside the window selection menu. The Enter
period multiplier [0.3,…,9] dialog will appear on the screen. Enter the desired value
and press OK. Pressing Cancel will restore the previous value.
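
The idea behind the equi-periodic window can be sketched in a few lines. The first function reproduces the arithmetic of the 256-count example; the second is an assumption, since the manual does not give the exact rule SIS uses: it takes the analysis length to be the period multiplier times the harmonic period, capped at the frame length. Both function names are hypothetical.

```python
def periods_in_window(window_len, period):
    """How many whole periods of a harmonic fit into a fixed analysis window."""
    return window_len // period


def equi_periodic_length(period, multiplier=2.0, frame_len=256):
    """Analysis length proportional to the harmonic period, capped at the frame length.

    Assumption: length = multiplier * period; the real SIS rule may differ.
    """
    if not 0.3 <= multiplier <= 9:
        raise ValueError("period multiplier must lie in [0.3, 9]")
    return min(frame_len, max(1, round(multiplier * period)))


for period in (2, 16, 64, 256):
    print(period, periods_in_window(256, period), equi_periodic_length(period))
```

With a fixed window the number of periods ranges from 128 down to 1, whereas with the proportional length every harmonic is analysed over roughly the same number of its own periods until the frame limit is reached.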

You are recommended to set a small frame shift step to prevent skipping high-frequency
data.

15.2.4 Window selection tips


The strategy of window type selection is dictated by the compromise between the
spectrum distortions caused by adjacent side-lobes on the one hand (spectrum blurring)
and by distant side-lobes on the other (appearance of false peaks).

Thus, if strong signal components are located both close to and at a distance from weak
signal components, you should choose an analysis window with an even level of
side-lobes near the main lobe, to keep the shift of spectral peaks small.

If there is only one strong component far removed from the weak signal components,
you should choose a window with a quickly decreasing level of side-lobes (high roll-off
rate); whereas their level in the immediate proximity to the main lobe will be of no
consequence in this case.

If high resolution is required between closely located signal components, and there are
no distant components, the optimal choice would be a window with a very narrow main
lobe, while even a rising level of side-lobes may prove acceptable in this case.

If the signal has a restricted dynamic range, the characteristics of side-lobes are
irrelevant.

If the signal spectrum is relatively smooth, you can do without windowing.

To obtain a clearer representation of the signal, it is recommended to choose one of the
first three window types. The windows are listed in decreasing order of side-lobe
level and increasing order of main-lobe width: Rectangular (none), Hamming, Hann,
Nuttall, Gauss. The effective window width as compared to the rectangular window is
reduced accordingly by a factor of 1.36, 1.5, 1.8 and 3.5.

15.3 Calculating a dynamic spectrogram


To characterize any complex sound acoustically, you first need information about its
pitch, the frequencies of the pitch harmonics, and the relative intensity of all its
frequency components (i.e. how the pitch and harmonics relate to each other in
intensity). This data can be obtained by spectral analysis of the sound. A dynamic
spectrogram gives a continuous picture of how the spectral characteristics change over
sound fragments of different duration. To get an image of the dynamic spectrogram,
select the Analysis►Spectrogram menu item. The operation menu will appear on the
screen. This is a standard menu of the source-destination type described in section 4.6.

A progress bar for the current process will be displayed at the bottom of the screen in
the message line. You can interrupt the process by pressing the <ESC> key.

Press the OPTIONS field to set the necessary parameters. After setting the parameters
you should create or choose a destination window and start the calculating process by
pressing on the desired data fragment type.

15.3.1 Parameter settings


You can set the necessary parameters in the Options menu (Fig. 48).

Figure 48. The Spectrogram options window

By adjusting these parameters you can obtain the most suitable visible speech
representation. The parameters included in this menu (and in the menus nested in it)
are listed below.

Frame length. The user can choose the frame length from the list: 16, 32, ... 16384
counts. When the frame length is set in counts, it is displayed in milliseconds in the
adjoining field. Depending on the frame size, you obtain a narrowband or a broadband
spectrum. The narrowband spectrum shows a more detailed spectral picture; the
broadband spectrum shows a more general one. To obtain the narrowband spectrum, the
frame size should exceed the maximum pitch period: 256 counts or more for a male
voice and 128 counts or more for a female voice. To obtain the broadband spectrum, the
frame size should be smaller than the maximum pitch period: 64 counts for a male voice
and 32 counts for a female voice.
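
The relation between frame length, sampling rate and pitch period can be worked through in a short sketch. The 11025 Hz sampling rate is an assumption (the manual does not say which rate its recommended frame sizes refer to), and the lowest pitch values of 80 Hz and 165 Hz are borrowed from the Basso and Contralto preset ranges; the function names are illustrative.

```python
def frame_ms(counts, sample_rate):
    """Frame duration in milliseconds for a length given in counts."""
    return 1000.0 * counts / sample_rate


def narrowband_frame(sample_rate, min_pitch_hz, choices=None):
    """Smallest listed frame length (counts) that exceeds the longest pitch period."""
    if choices is None:
        choices = [2 ** p for p in range(4, 15)]      # 16, 32, ..., 16384
    max_period = sample_rate / min_pitch_hz           # longest pitch period in counts
    for counts in choices:
        if counts > max_period:
            return counts
    raise ValueError("pitch period is longer than the largest available frame")


RATE = 11025  # assumed sampling rate, Hz
print(narrowband_frame(RATE, 80), narrowband_frame(RATE, 165))
print(round(frame_ms(256, RATE), 1), "ms")
```

Under these assumptions the sketch arrives at the same 256-count and 128-count narrowband frames that are recommended for male and female voices; at other sampling rates the appropriate sizes shift accordingly.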

Frame shift. The frame shift step is the number of counts by which the window
advances along the signal. Keep in mind that if the frame shift step exceeds the frame
size, not all signal points will be involved in the spectrogram calculation. The optimum
frame shift step lies between 1/4 and 1/2 of the frame length. When the frame shift
step is set in counts, it is displayed in milliseconds in the adjoining field.
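
The consequence of an oversized shift step is easy to see in a sketch. The helper names are hypothetical, and the sketch assumes that only complete frames are analysed, which the manual does not specify.

```python
def frame_starts(n_samples, frame_len, shift):
    """Start positions (in counts) of all complete frames."""
    return list(range(0, n_samples - frame_len + 1, shift))


def skipped_counts(n_samples, frame_len, shift):
    """Counts lying between consecutive frames that no frame covers."""
    starts = frame_starts(n_samples, frame_len, shift)
    gap = max(0, shift - frame_len)
    return gap * max(0, len(starts) - 1)


print(len(frame_starts(1024, 256, 128)), skipped_counts(1024, 256, 128))  # shift = 1/2 frame
print(len(frame_starts(1024, 256, 384)), skipped_counts(1024, 256, 384))  # shift > frame
```

With a shift of half the frame every count is covered by at least one frame; once the shift exceeds the frame length, gaps appear between consecutive frames.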

Weighting window type. The following options are available:

z Hamming,

z Hann,

z Nuttall,

z None (rectangular),

z Equi-periodic, 2.0.

See section 15.2 for the description of window types and window selection tips.

Use result filtering. You can choose a filter of 0 to 55 points. When the filter is used,
the image is geometrically averaged over the chosen number of points. Averaging gives
more accurate formant imaging and smooths the pitch harmonics. If you want to see
individual harmonics, do not use averaging.

Use normalization. If you enter this field, the additional menu will appear on the
screen. There you should set the following parameters:

z Ascending from ... Hz (for spectrogram).

z Ascending from ... points (for other).

z Ascending with... dB/oct.

Using these parameters, you set the starting frequency for the spectrum amplitude
increase and the rate of the rise. For example, a spectrum amplitude increase from
200 Hz at 6 dB/oct means that the spectrum amplitude is raised by 6 dB at 400 Hz in
comparison with 200 Hz, by 12 dB at 800 Hz, and so on. By changing the spectrum
increase you can get the clearest spectrum image at high frequencies. The optimum
amplitude increase is found by trial and error for each particular signal.

z Normalization to slice maximum:


ƒ None (normalization is disabled),
ƒ All signal (the whole signal is normalized),
ƒ Highest level only (fragments exceeding the set level are normalized).

z Normalize signal higher than ... % from maximum.

z Null frequency consideration. If this option is enabled, filtration is performed
with consideration of the zero (constant) component.

Having set the normalization parameters, press OK and the Options menu will appear
on the screen again.
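
The "Ascending from ... Hz" and "Ascending with ... dB/oct" parameters describe a gain that grows by a fixed number of decibels per octave. A minimal sketch of that arithmetic follows; it assumes that no gain is applied below the starting frequency, which the manual does not state, and `emphasis_db` is an illustrative name.

```python
import math


def emphasis_db(freq_hz, start_hz=200.0, slope_db_per_oct=6.0):
    """Gain at freq_hz: zero up to start_hz, then rising by the slope for every octave."""
    if freq_hz <= start_hz:
        return 0.0
    return slope_db_per_oct * math.log2(freq_hz / start_hz)


for f in (100, 200, 400, 800, 1600):
    print(f, "Hz:", round(emphasis_db(f), 1), "dB")
```

Each doubling of frequency above 200 Hz adds another 6 dB, which is how the 6 dB rise at 400 Hz in the example is obtained.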

Drawing type. See section 15.1.

Drawing options. See section 15.1.

Number of formants. If the specified value is negative (below zero), the system itself
determines the number of formants to be represented. If the value is zero, the
formants are neither calculated nor represented. If you enter a positive value, the
specified number of formants is represented, but not more than 10.

Presets. Allows using standard parameters sets:

z Formants:
ƒ Male voice: Tenor (130-520 Hz), Baritone (110-390 Hz), Basso (80-350 Hz)
or ContrBasso (50-220 Hz).
ƒ Female voice: Soprano (260-1050 Hz), Mezzo Soprano (220-880 Hz) or
Contralto (165-700 Hz).
ƒ Child.

z Broadband:
ƒ Male voice.
ƒ Female voice.
ƒ Child.

z Harmonics:
ƒ Tone harmonics: High pitch (>220 Hz) or Low pitch (<220 Hz).
ƒ Technical harmonics: Maximal, Medium or Minimal resolution.

15.4 Calculating periodicity function (dynamic cepstrogram)


To obtain a dynamic cepstrogram, select the Analysis►Cepstrum menu item. The
operation menu will appear on the screen. This is a standard menu of the source-
destination type described in section 4.6.

Using the OPTIONS field you can set the necessary parameters. After setting the
parameters you should create or choose a destination window and start the calculating
process by pressing on the desired data fragment type. The only reason this
operation may fail to run is an unsuitable type of the source window.

A progress bar for the current process will be displayed at the bottom of the screen in
the message line. You can interrupt the process by pressing the <ESC> key.

15.4.1 Parameter settings
You can set the necessary parameters in the Options menu (Fig. 49).

Figure 49. The Cepstrum options window

By adjusting these parameters you can obtain the most suitable visible speech
representation. The parameters included in this menu (and in the menus nested in it)
are listed below:

Frame length. The user can choose the frame length from the list: 16, 32, ... 16384
counts. When the frame length is set in counts, it is displayed in milliseconds in the
adjoining field. For cepstrum calculation, the frame size is typically 256 counts for male
voices and 128 counts for female voices.

Frame shift. The optimum frame shift step lies between 1/4 and 1/2 of the analysis
frame length. When the frame shift step is set in counts, it is immediately displayed in
milliseconds in the adjoining field.

Weighting window type. The following options are available:

z Hamming,

z Hann,

z Nuttall,

z None (rectangular),

z Equi-periodic, 2.0.

See section 15.2 for the description of window types and window selection tips.

Use result filtering. You can choose a filter of 0 to 55 points. When the filter is used,
sliding averaging of the image over the chosen number of points is performed. As a
rule, averaging is not used for cepstrum calculation.

Use normalisation. If you enter this field, the additional menu will appear on the
screen. There you should set the following parameters:

z Ascending from ... Hz (for spectrogram).

z Ascending from ... points (for other).

z Ascending with ... dB/oct.

For cepstrum calculation, the rise from 1 count by 3 dB is usually specified. More
accurate values can be obtained manually using a dynamic spectrogram.

z Normalization to slice maximum:


ƒ None (normalization is disabled),
ƒ All signal (the whole signal is normalized),
ƒ Highest level only (fragments exceeding the set level are normalized).

z Normalize signal higher than ...% from maximum.

z Null frequency consideration. If this option is enabled, filtration is performed with
consideration of the zero (constant) component.

Having set the normalization parameters, press OK. The Options menu will appear on
the screen again.

Drawing type. See section 15.1.

Drawing options. See section 15.1.

Presets. This field allows using the sets of predefined parameters:

z Male voice:
ƒ Tenor (130-520 Hz).
ƒ Baritone (110-390 Hz).
ƒ Basso (80-350 Hz).

z Female voice:
ƒ Soprano (260-1050 Hz).
ƒ Contralto (165-700 Hz).

z Additional:
ƒ Singing (male voice).
ƒ Noised signal.
ƒ Infrasound.
ƒ Constant Harmonics (max resolution).

15.5 Calculating dynamic characteristics of the autoregressive
model of the speech signal
The following dynamic characteristics of the autoregressive model of the speech signal
can be calculated in the system:

z Autocorrelation coefficients (Analysis►Autocorrelation).

z Partial correlation coefficients (Analysis►More…►Partial correlation).

z Linear Prediction Coefficients (Analysis►More…►Linear Prediction Coefficients).

To obtain a 3-D representation of one of the specified characteristics, enter the
Analysis menu and select the required characteristic. An operation menu for the
corresponding operation will appear on the screen. This is a standard menu of the
source-destination type described in section 4.6.

Using the OPTIONS field you can set the necessary parameters. After setting the
parameters you should create or choose a destination window and start the calculating
process by pressing on the desired data fragment type. The only reason this
operation may fail to run is an unsuitable type of the source window. A
progress bar for the current process will be displayed at the bottom of the screen in the
message line. You can interrupt the process by pressing the <ESC> key.

15.5.1 Parameter settings


You can set the necessary parameters in the Options menu. The parameters included in
this menu (and other menus enclosed in it) are listed below:

Frame length. The user can choose the frame length: 16, 32, ...16384 counts. When
the frame length is set in counts, it is immediately displayed in milliseconds in the
adjoining field.

Frame shift. The user can set the frame shift step value. When the frame shift step is
set in counts, it is immediately displayed in milliseconds in the adjoining field.

Weighting window type. The following options are available:

z Hamming,

z Hann,

z Nuttall,

z None (rectangular),

z Equi-periodic, 2.0.

See section 15.2 for the description of window types and window selection tips.

Use result filtering. You can use a filter of 0 to 55 points. If enabled, the filter
performs sliding averaging of the image over the chosen number of points.

Use normalisation. If you enter this field, the additional menu will appear on the
screen. There you should set the following parameters:

z Ascending from ... Hz (for spectrogram).

z Ascending from ... points (for other).

z Ascending with... dB/oct.

For the calculation of dynamic characteristics of the autoregressive model of the speech
signal, the rise is set in counts (points).

z Normalization to slice maximum:


ƒ None (normalization is disabled),
ƒ All signal (the whole signal is normalized),
ƒ Highest level only (fragments exceeding the set level are normalized).

z Normalize signal higher than... % from maximum.

z Null frequency consideration. If this option is enabled, filtration is performed
with consideration of the zero (constant) component.

Having set the normalization parameters, press OK, and the Options menu will appear
again on the screen.

Drawing type. See section 15.1.

Drawing options. See section 15.1.

Presets. This field allows using the sets of predefined parameters:

z Male voice:
ƒ Tenor (130-520 Hz).
ƒ Baritone (110-390 Hz).
ƒ Basso (80-350 Hz).

z Female voice:
ƒ Soprano (260-1050 Hz).
ƒ Contralto (165-700 Hz).

z Additional:
ƒ Singing (male voice).
ƒ Noised signal.
ƒ InfraSound.
ƒ Constant Harmonics (max resolution).

15.6 Spectral analysis based on linear prediction of speech.
Obtaining a graphical representation of LPC frequency
response
The spectrum analysis based on the Linear Prediction Coefficients (LPC) is performed in
the following way. The LPC are calculated using the Levinson-Durbin method, and the
autocorrelation coefficients are calculated at the same time in a time window. The
window size cannot be less than the number of coefficients or more than 1024. The
speech signal is weighted in this window by a window function (determined by the
chosen weighting window type).

When the LPC are determined, a sequence is formed in which the first (m+1) members
are equal to the LPC (m is the order of the all-pole model) and the remaining members
are zeros. The length of the sequence is selected with regard to the required frequency
resolution: at a 10000 Hz sampling rate and a resolution no coarser than 30 Hz, the
length N must be no less than 10000/30, and the sequence length is the nearest power
of 2 exceeding N.
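
The sequence length rule can be written as a one-line calculation. The sketch below is an illustration with a hypothetical function name; it returns the smallest power of two that is not less than the sampling rate divided by the required resolution.

```python
import math


def lpc_fft_length(sample_rate, resolution_hz):
    """Smallest power of two not less than sample_rate / resolution_hz."""
    n_min = math.ceil(sample_rate / resolution_hz)
    return 1 << (n_min - 1).bit_length()


print(lpc_fft_length(10000, 30))
```

For the example in the text, 10000/30 is about 333.3, so the sequence is padded with zeros to 512 members, which gives an actual resolution of 10000/512, or roughly 19.5 Hz.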

Calculating the FFT of this sequence and taking its reciprocal gives a smoothed
spectrum of the speech signal, in which the search for maxima is performed. This
method of spectral representation gives a clearer representation of the formant
structure than the common analysis methods.
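
The procedure described above can be sketched as follows. This is a textbook Levinson-Durbin recursion followed by a direct DFT of the zero-padded coefficients, written with the Python standard library only; it is an illustration under those assumptions, not the SIS implementation, and the function names are hypothetical.

```python
import cmath
import math


def levinson_durbin(r, order):
    """Solve the autocorrelation equations; returns ([1, a1, ..., am], residual energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                         # reflection coefficient of stage m
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def lpc_spectrum(a, n_fft):
    """Reciprocal magnitude of the DFT of the zero-padded coefficient sequence."""
    padded = a + [0.0] * (n_fft - len(a))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        value = sum(c * cmath.exp(-2j * math.pi * k * i / n_fft) for i, c in enumerate(padded))
        spectrum.append(1.0 / abs(value))
    return spectrum


# Autocorrelation of a first-order process with coefficient 0.5 (illustrative data).
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
print([round(c, 3) for c in coeffs], round(residual, 3))
print([round(v, 3) for v in lpc_spectrum(coeffs, 8)])
```

The maxima of the resulting curve correspond to the poles of the model; a silent frame (zero energy) would need to be skipped before calling the recursion, because it divides by the residual energy.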

To obtain a 3-D image of the LPC frequency response, select the Analysis►LPC
Spectrogram menu item. The operation menu will appear on the screen. This is a
standard menu of the source-destination type described in section 4.6.

Using the OPTIONS field you can set the necessary parameters. After setting the
parameters you should create or choose a destination-window and start the calculating
process by pressing on the desired data fragment type. The only reason this
operation may fail to run is an unsuitable type of the source window. A
progress bar for the current process will be displayed at the bottom of the screen in the
message line. You can interrupt the process by pressing the <ESC> key.

15.6.1 Parameter settings


You can set the necessary parameters in the Options menu (Fig. 50).

By adjusting these parameters you can obtain the most suitable visible speech
representation. The parameters included in this menu (and in the menus nested in it)
are listed below:

Frame length (counts). The user can set the desired frame length in counts. The
default value is 512.

Frame shift step (counts). The user can set the desired frame shift step in counts. The
optimum frame shift step value lies between 1/4 and 1/2 of the frame length. The
default value is 128.

Frequency resolution (Hz). The value is selected from the list.

Number of LPC coefficients. The user can set the number of LPC coefficients to be
used for obtaining frequency response.

Figure 50. The LPC frequency response options window

Weighting window type. The following options are available:

• Hamming,

• Hann,

• Nuttall,

• None (rectangular).

See section 15.2 for the description of window types and window selection tips.

Use result filtering. A filter of 0 to 55 points can be applied. If enabled, the filter
performs sliding averaging of the image over the chosen number of points.

Use normalisation. If you enter this field, an additional menu will appear on the
screen. There you should set the following parameters:

• Ascending from ... Hz (for spectrogram).

• Ascending from ... points (for other).

• Ascending with ... dB/oct.

• Normalization to slice maximum:
  - None (normalization is disabled),
  - All signal (the whole signal is normalized),
  - Highest level only (fragments exceeding the set level are normalized).

• Normalize signal higher than ... % from maximum.

• Null frequency consideration. If this option is enabled, filtration is performed
  with consideration of the zero (constant) component.

Having set the normalization parameters, press OK. The Options menu will appear
again on the screen.

Drawing type. See section 15.1.

Drawing options. See section 15.1.

Number of formants. If the specified value is negative (below 0), the system itself will
determine the number of formants to be represented. If the value is zero, the formants
will not be calculated or represented. If you enter a positive value, the specified
number of formants will be represented, but not more than 10.

Presets. This field allows using the sets of predefined parameters:

• Formants:
  - Male voice: Tenor (130-520 Hz), Baritone (110-390 Hz), Basso (80-350 Hz)
    or ContrBasso (50-220 Hz).
  - Female voice: Soprano (260-1050 Hz), Mezzo Soprano (220-880 Hz) or
    Contralto (165-700 Hz).
  - Child.

• Smoothen formants:
  - Male voice, Basso (80-350 Hz).
  - Male voice, Tenor (130-520 Hz).
  - Female voice.

• Harmonics:
  - Tone harmonics: High pitch (>220 Hz) or Low pitch (<220 Hz).
  - Technical harmonics: Maximal or Minimal resolution.

15.7 Calculating the energy (root-mean-square)


Energy calculation in SIS means the calculation of the square root of the sliding
average of the squared signal. You should specify the length of the frame (in
milliseconds) over which the averaging is performed.

Thus, if you set the frame length equal to 10 ms and the active segment sampling rate is
10000 Hz, the frame corresponds to 100 counts. Then, to obtain an energy value at any
point of the signal, the system takes 50 values to the left of the point and 50 to the
right (together with the value at the point itself), squares each value, sums the
results, divides the sum by 101 and extracts the square root of the result. In all cases
the frame where the averaging is performed shifts along the signal with a step of 1.
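
A minimal Python sketch of this calculation is given below, assuming NumPy. The
function name and the treatment of the signal edges (counts outside the signal are
taken as zeros) are assumptions for illustration; the manual does not state how SIS
handles the edges.

```python
import numpy as np

def sliding_rms(signal, fs, frame_ms):
    """Square root of the sliding average of the squared signal, step of 1 count."""
    x = np.asarray(signal, dtype=float)
    half = int(round(frame_ms * fs / 1000.0)) // 2      # 100 counts -> 50 per side
    width = 2 * half + 1                                 # 101 values per window
    # Assumption: counts beyond the signal edges are treated as zeros.
    squared = np.pad(x ** 2, half, mode="constant")
    mean_square = np.convolve(squared, np.ones(width) / width, mode="valid")
    return np.sqrt(mean_square)

# Example: a constant signal of amplitude 3 has an r.m.s. of 3 away from the edges.
energy = sliding_rms(np.full(1000, 3.0), fs=10000, frame_ms=10)
```

The result has one value per count of the source signal, which matches the shift step
of 1 described above.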

To calculate the signal energy, select the Analysis►Energy (r.m.s) menu item. A
standard menu of the source-destination type will appear on the screen (described in
detail in section 4.6).

After pressing the OPTIONS field a dialog window will be displayed where the user can
specify the Frame length, ms parameter.

Having set the desired frame length, press OK. To start the process, click on the field
with the desired data fragment type.

15.8 Calculating zero crossing frequency


This operation allows you to calculate average zero crossing frequency for the interval
whose length (in milliseconds) is specified by the user. The interval keeps shifting by
one point for the calculation of each subsequent value. The result is represented in Hz.
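
The calculation can be illustrated with the following Python sketch (NumPy assumed).
The function name is hypothetical, and the conversion to Hz is an assumption: two sign
changes are counted as one period, so that a pure tone yields approximately its own
frequency. The manual does not specify which convention SIS uses.

```python
import numpy as np

def zero_crossing_frequency(signal, fs, frame_ms):
    """Average zero crossing frequency (Hz) for an interval shifted by one count."""
    x = np.asarray(signal, dtype=float)
    n = max(2, int(round(frame_ms * fs / 1000.0)))       # counts per interval
    negative = np.signbit(x)
    crossings = (negative[1:] != negative[:-1]).astype(float)
    # Average number of sign changes per count within each interval.
    per_count = np.convolve(crossings, np.ones(n - 1) / (n - 1), mode="valid")
    # Assumption: two sign changes make one period, so halve the crossing rate.
    return per_count * fs / 2.0

fs = 10000
t = np.arange(2000) / fs
zcf = zero_crossing_frequency(np.sin(2 * np.pi * 500.0 * t + 0.1), fs, frame_ms=20)
```

For the 500 Hz test tone every value of zcf stays close to 500 Hz; the small deviation
comes from the interval not holding a whole number of half-periods.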

To calculate zero crossing frequency, select Analysis►More… and choose Zero crossing
frequency from the drop-down list. The operation menu will appear on the
screen. This is a standard menu of the source-destination type described in section 4.6.
Click on the OPTIONS field to set the operation parameters. For this operation only one
parameter should be specified: Frame length, ms.

Having set the required value, press OK. To start the process, click on the field with the
desired data fragment type. Once the calculations are completed, the curve of zero
crossing frequency will be drawn in the destination window.

15.9 Averaging
Averaging is performed for previously calculated 3-D data.

To average the data, select Analysis►Average…. Then the Average menu will appear
on the screen. This is a standard menu of the source-destination type described in
section 4.6.

Along with averaging, calculation of cross-correlation and variance can be performed. To
calculate these parameters, click on the OPTIONS field in the operation menu and
enable the corresponding options in the displayed dialog window (Calculate
cross-correlation and Calculate variance). Cross-correlation is the average correlation
between two nearest frames.

To start the process, click on the field with the desired data fragment type. Once the
calculations are completed, the curve of the averaged characteristic will be drawn in the
destination window.
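
As an illustration, the three quantities can be sketched in Python as follows (NumPy
assumed). The 3-D data are represented here as a 2-D array with one row per frame. The
function name, the per-bin variance and the use of the Pearson correlation coefficient
are assumptions; the manual does not define the exact formulas used by SIS.

```python
import numpy as np

def average_frames(frames, with_variance=False, with_cross_correlation=False):
    """Average a (n_frames, n_bins) array over frames, with optional extras."""
    f = np.asarray(frames, dtype=float)
    result = {"average": f.mean(axis=0)}
    if with_variance:
        # Assumption: variance of each bin across the frames.
        result["variance"] = f.var(axis=0)
    if with_cross_correlation:
        # Assumption: Pearson correlation of each pair of neighbouring frames,
        # averaged over all pairs (undefined for constant frames).
        pairs = [np.corrcoef(f[i], f[i + 1])[0, 1] for i in range(len(f) - 1)]
        result["cross_correlation"] = float(np.mean(pairs))
    return result

frames = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
stats = average_frames(frames, with_variance=True, with_cross_correlation=True)
```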

15.10 Pitch extraction
15.10.1 General information
Pitch extraction is performed for voiced (tonal) areas of the speech signal. The signal
is considered tonal if, in the analyzed frame (window), the whole signal, or at least
its low-frequency part, is periodic.

The pitch period is the duration of the signal periodicity interval. The period length is
measured in counts or milliseconds. At a sampling rate of 10000 Hz the average period
length is about 70-80 counts for male voices and 40-50 counts for female voices.

The pitch (or fundamental frequency, FF) is the reciprocal of the pitch period duration
(1000 divided by the period in msec gives the value in Hz), or equivalently the sampling
rate divided by the period length in counts. The fundamental frequency is on average
120-140 Hz for male voices and 200-240 Hz for female voices.
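
The relation between the pitch period and the fundamental frequency can be checked
with a short Python sketch. The function names are chosen for this example only.

```python
def pitch_from_period_counts(period_counts, fs):
    """Fundamental frequency in Hz from a period length in counts."""
    return fs / period_counts

def pitch_from_period_ms(period_ms):
    """Fundamental frequency in Hz from a period duration in milliseconds."""
    return 1000.0 / period_ms

# Period lengths quoted above for a 10000 Hz sampling rate.
male = [pitch_from_period_counts(n, 10000) for n in (80, 70)]     # 125.0 and about 142.9 Hz
female = [pitch_from_period_counts(n, 10000) for n in (50, 40)]   # 200.0 and 250.0 Hz
```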

Pitch extraction is implemented for mono signals. For a stereo signal, pitch extraction
is performed for the left channel only.

15.10.2 LLK method


To perform pitch extraction using the LLK method, select the Analysis►More...►Pitch
extraction (LLK) menu item. The operation menu will appear on the screen. This is a
standard menu of the source-destination type described in section 4.6. Click on the
OPTIONS field to set the operation parameters in the options submenu (Fig. 51).

Figure 51. The Pitch extraction (LLK) options window

Here you can choose pitch extraction options:

• You can apply standard parameter settings by selecting Default for men or
  Default for women.

• Alternatively, you can set the desired parameters manually by selecting the Set
  options manually field. The initial parameter values will correspond to those
  currently stored in the configuration file, but they can be modified with the Copy
  default options to manual field.

When you copy parameter values for manual parameter settings, standard values
specified in the Options menu are used: Default for men or Default for women,
depending on which of the options is currently active.

Once you have made manual settings, ensure that the Manually set field is selected
(highlighted in blue).

To set the parameters manually, press Set options manually. The menu containing
three active fields (Signal options, Noise and pauses options and Cepstrum
specific options), as well as a number of information fields, will appear on the screen
(Fig. 52).

To change the currently displayed values, press one of the active fields listed above.
The parameter value "-1" means that the parameter is not used in this method.

Figure 52. The Set options manually window

Signal options:
• Abs. min. (Hz) – absolute pitch minimum – sets the lower threshold for pitch
  extraction, so that pitch frequency will be computed only above this value. The
  absolute minimum value of the pitch frequency can be read from the cepstrogram
  as the lowest value of the pitch contour.

• Abs. max. (Hz) – absolute pitch maximum – sets the upper threshold for pitch
  extraction, so that pitch frequency will be computed only below this value. The
  absolute maximum value of the pitch frequency can be read from the cepstrogram
  as the highest value of the pitch contour.

Setting minimum and maximum pitch values allows you to get a more accurate result
when you deal with a noisy signal. If you notice after the first pitch extraction that
the method calculates frequencies corresponding to tonal noise located outside the range
of the real pitch frequencies, try to set the threshold pitch values so that the real
pitch range lies within them and the tonal noise frequency lies outside these limits.

Noise and pauses options:
• Frame length, ms. Analysis frame length for detecting noise/pause intervals.

• Frame shift, ms. Frame shift step for identifying frames as noise/pause intervals.

• Pause threshold. Magnitude threshold for pause detection (in counts). A frame
  whose energy level (to be accurate: the square root of the energy) is below this
  threshold is considered a pause. The threshold value is set by trial and error: the
  correctness of the selected threshold is checked by ear. It makes sense to change
  this parameter if you deal with a noisy signal and some of the noisy pauses were
  interpreted by the system as tonal fragments at the first pitch extraction.

• Noise zero cross. Threshold value for zero crossing frequency (in Hz). A frame
  whose zero crossing frequency is above this threshold is considered noise. The
  threshold value is set by trial and error: the correctness of the selected threshold
  is checked by ear. It makes sense to change this parameter if your signal contains
  high-frequency noise components. The default parameters are optimal for
  processing clean signals.

• Autocorrelation. This parameter is calculated for tonal fragments before pitch
  extraction using the LLK method. 15 consecutively calculated autocorrelation
  values (window length is 0.02 sec) are used to calculate the presumable pitch
  frequency value.
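
The role of the Pause threshold and Noise zero cross parameters can be illustrated
with the Python sketch below (NumPy assumed). It is not the LLK algorithm itself, only
a hypothetical frame labelling that follows the descriptions above; the function name,
the label names and the halving of the crossing rate to obtain Hz are assumptions.

```python
import numpy as np

def classify_frames(signal, fs, frame_ms, shift_ms, pause_threshold, noise_zero_cross_hz):
    """Label each analysis frame as 'pause', 'noise' or 'tonal' (candidate)."""
    x = np.asarray(signal, dtype=float)
    n = max(2, int(round(frame_ms * fs / 1000.0)))
    step = max(1, int(round(shift_ms * fs / 1000.0)))
    labels = []
    for start in range(0, len(x) - n + 1, step):
        frame = x[start:start + n]
        rms = np.sqrt(np.mean(frame ** 2))               # square root of the energy
        negative = np.signbit(frame)
        sign_changes = np.count_nonzero(negative[1:] != negative[:-1])
        zc_hz = sign_changes * fs / (2.0 * (n - 1))       # assumption: 2 changes per period
        if rms < pause_threshold:
            labels.append("pause")
        elif zc_hz > noise_zero_cross_hz:
            labels.append("noise")
        else:
            labels.append("tonal")
    return labels

fs = 10000
t = np.arange(1000) / fs
silence = np.zeros(1000)
voiced = 1000.0 * np.sin(2 * np.pi * 150.0 * t + 0.1)
hiss = 1000.0 * np.sin(2 * np.pi * 4000.0 * t + 0.1)
labels = [set(classify_frames(s, fs, 20, 10, 100.0, 2000.0)) for s in (silence, voiced, hiss)]
```

Raising the pause threshold moves quiet, noisy frames into the pause class, and
lowering the zero crossing threshold moves high-frequency frames into the noise class,
which is the effect the two parameters are tuned for by ear.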

Cepstrum specific options:

• Ascent of cepstrum (dB/oct) [0,9].

• Initial frequency (Hz) [0, max/2].

• Final frequency (Hz).

15.10.3 Pitch extraction step by step


To choose the settings for pitch extraction accurately, you should do the following:

1. Calculate the dynamic cepstrogram of the analyzed speech sample and use the
horizontal cursor to define the pitch minimum and maximum values.

2. Calculate the narrow-band spectrum of the analyzed signal and use the horizontal
cursor to determine the bottom and top borders of the signal periodicity
(banding).

3. Determine the threshold amplitude value to cut off the pauses; this value is
adjusted manually while listening to the signal.

4. Determine noise cut-off threshold value using the ZC parameter – zero crossing
frequency, which is adjusted manually while listening to the signal.

One of the ways to check pitch extraction accuracy is to superimpose the obtained pitch
on the cepstrum. Therefore, before pitch extraction, calculate the signal cepstrum and
use the cepstrum window as the destination window.

Having set the desired parameters, press OK. To start pitch extraction, click on the field
with the desired data fragment type.

A progress bar for the current process will be displayed at the bottom of the screen in
the message line. You can interrupt the process by pressing the <ESC> key.

Once the calculations are completed, the calculated pitch will be drawn in the
destination window. Time is plotted along the horizontal axis from left to right, in the
same scale as the source waveform used for pitch extraction. Pitch frequency in Hz is
plotted along the vertical axis. Remember that the selected destination window should
either contain no segments or contain segments of the same type.

15.10.4 Spectral method


To perform pitch extraction using the spectral method, select the Analysis►Pitch
extraction (Spectrum) menu item. The operation menu will appear on the screen.
This is a standard menu of the source-destination type described in section 4.6. Click on
the OPTIONS field to set the operation parameters in the Pitch calculation options
submenu (Fig. 53).

Figure 53. The Pitch calculation options window

Spectral method uses the following options:

• Female voice or Male voice.

• Microphone or Phone channel.

Spectral method is applicable only to signals with an 11025 Hz sample rate and a length
of 10 sec. This method underlies the speaker comparison procedure.

15.10.5 Speaker comparison


To compare speakers using their pitch statistics, open in one data window either the
two sound fragments or the two pitch fragments (spectrum method) that you intend to
compare. The fragment length should not be less than 10 sec.

To perform speaker comparison, select the Analysis►More…►Compare Speakers
menu command. The Comparing pitch statistics window will appear (Fig. 54).

Figure 54. The Comparing pitch statistics window

The following options can be selected:

• Female voice.

• Phone channel.

• Equal time. Equal lengths of the signals under consideration will be used for
  comparison.

When START is pressed, the program performs pitch extraction (if sound data are
open) and compares the pitch statistics. The results of the comparison will be
represented in the summary table (Fig. 54).

If you need to save the result of the comparison as text, use the COPY button.
The result is copied to the clipboard and can then be pasted into an MS Word or
MS Excel file.

15.10.6 Controlling the correctness of the pitch extraction
You can control the correctness of the pitch extraction in several ways:

• By superposing the pitch curve upon the cepstrogram.

• By comparing pitch extraction results obtained using various methods and taking
  pitch values on different fragments.

• By measuring the distance between adjacent periods on tonal fragments of the
  waveform.

The first way is the most convenient and obvious. To use it, calculate the pitch curve
into the same window as the cepstrum. Now you can easily check the correctness of
pitch extraction: the pitch is extracted correctly if it coincides with the maximum
blackness of the dynamic cepstrogram on the absolute majority of the fragments.
Otherwise, calculate the pitch again, having modified the boundary frequency values
(see section 15.10.2).

If you still see a discrepancy after performing pitch extraction repeatedly, apply
the method of step-by-step approximation to achieve correct pitch extraction. SIS
provides several ways to achieve this:

• Increase or reduce the average (initial) pitch in the settings so that this value
  +/- the initial smoothness allows "cutting off" undesirable errors on partials
  (harmonics).

• Reduce the initial curve smoothness, for example, from 40 to 20.

• Increase the absolute minimum or reduce the absolute maximum a little, thus
  cutting off undesirable frequency ranges.

• Change the pause and noise cut-off settings.

Having specified the final settings, perform pitch extraction throughout the whole signal.

If different settings are required for different signal fragments, these fragments should
be processed separately and the obtained results should be saved into a file.

In such cases it is recommended to process overlapping fragments, as pitch extraction
stops before reaching the last mark (once the number of counts in the fragment gets too
small to form an analysis window). You can also use the function of manual pitch
editing over the cepstrum image.

15.10.7 Editing pitch over cepstrum image


SIS allows you to edit pitch manually over the cepstrum image in the Zoom mode.

To do this, load the pitch and the cepstrum into one window and open the fragment to be
corrected in the Zoom mode (see section 10.6). Then edit the pitch by the usual means
of SIS (for a more detailed description see section 10.7).

15.11 Calculating instantaneous Fourier spectrum


If you mark any point in the signal with temporary marks (one or two), a spectrum
analysis (FFT, Fast Fourier Transform) of 32 to 131072 counts can be performed using
one of the 4 available weighting windows (Hamming, Hann, Nuttall or None (rectangular)).

To calculate an instantaneous spectrum, select the Analysis►More…►FFT power
spectrum menu item. The FFT power spectrum menu will appear on the screen
(Fig. 55) for the user to set the necessary parameters for spectrum calculation.

Figure 55. The FFT power spectrum window

15.11.1 Parameter settings


For spectrum calculation you should set the following parameters:

Frame length (counts) – set the required size of the Fast Fourier Transform.

Weighting window type – choose the required type of the weighting window:

• Hamming,

• Hann,

• Nuttall,

• None.

See section 15.2 for the description of window types and window selection tips.

15.11.2 Spectrum calculation process


Once you have set all the parameters, start the calculation of the instantaneous
spectrum by pressing OK. When the calculation is completed, the calculated spectrum
will be drawn in the destination window. Before running the operation, make sure that
the destination window you specified is empty or contains segments of the same type.
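
For reference, an instantaneous power spectrum of this kind can be sketched in Python
as follows (NumPy assumed). The function name, the centring of the frame on the marked
point and the output in dB are assumptions for illustration; the Nuttall window is
omitted because NumPy itself does not provide it.

```python
import numpy as np

WINDOWS = {"hamming": np.hamming, "hann": np.hanning, "none": np.ones}

def instantaneous_spectrum(signal, mark, frame_len, fs, window="hamming"):
    """Power spectrum (dB) of one frame of frame_len counts centred on the mark."""
    start = mark - frame_len // 2
    if start < 0 or start + frame_len > len(signal):
        raise ValueError("frame does not fit inside the signal")
    frame = np.asarray(signal[start:start + frame_len], dtype=float)
    power = np.abs(np.fft.rfft(frame * WINDOWS[window](frame_len))) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, 10.0 * np.log10(power + 1e-12)         # small offset avoids log(0)

fs = 10000
tone = np.sin(2 * np.pi * 1250.0 * np.arange(2048) / fs)
freqs, level_db = instantaneous_spectrum(tone, mark=1024, frame_len=512, fs=fs)
peak_hz = freqs[np.argmax(level_db)]
```

With a frame of 512 counts at 10000 Hz the spacing between spectrum points is about
19.5 Hz; a longer frame gives a finer frequency grid at the cost of time localisation.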

15.12 Calculating average Fourier spectrum


During the calculation of the average signal spectrum all spectra calculated on separate
signal fragments are accumulated. This can be pictured as follows: a window of the
specified size (frame length) is placed on the signal, and the spectrum is calculated
for the signal fragment limited by this window. Then the window is shifted by some value
and the spectrum is calculated for the following signal fragment, and so on. All spectra
calculated during signal processing are then accumulated and an average spectrum is
calculated.

To calculate the average power spectrum, select the Analysis►Average spectrum menu
item. The operation menu will appear on the screen. This is a standard menu of the
source-destination type described in section 4.6.

Using the OPTIONS field you can set the necessary parameters. After setting the
parameters you should create or choose a destination window and start the calculation
(spectrum accumulation) by clicking on the desired data fragment type. A
progress bar for the current process will be displayed at the bottom of the screen in the
message line. You can interrupt the process by pressing the <ESC> key.

15.12.1 Parameter settings


The necessary parameters can be set in the Options menu (Fig. 56).

Figure 56. The Average spectrum options window

For spectrum calculation you should set the following parameters:

Frame length (counts). The user can choose the frame length: 32..16384 counts.
When the frame length is set in counts, it is immediately displayed in milliseconds
above.

Frame shift step (counts). The user can set the frame shift step value. Frame shift
step determines the size by which the window shifts along the signal for the next FFT
calculation. It should be kept in mind that if the frame shift step exceeds the frame size
itself, not all the signal points will be included in the spectrum accumulation process.
The optimum frame shift step value should be set within the limits from 1/4 to 1/2 of
the frame length.

When the frame shift step is set in counts, it is displayed in milliseconds above.

Weighting window type - choose the required type of the weighting window:

• Hamming,

• Hann,

• Nuttall,

• Gauss,

• None.

See section 15.2 for the description of window types and window selection tips.

To obtain a clearer signal image representation, it is recommended to choose one of the
first three window types. In order of decreasing side-lobe level and increasing
main-lobe width of the window's spectral characteristic, the window types are arranged
as follows: None (rectangular), Hamming, Hann, Nuttall.

Normalization. If this option is enabled, the signal in each frame will be normalized to
the specified amplitude value before FFT is performed. After normalization parts of the
signal recorded with different gain levels (differing in amplitude maxima) will contribute
evenly to the averaging of the spectral components. However, enabling only this option
can result in the accumulation of undesirable distortions, because noise regions located
in the current frame will be amplified as well. To avoid this effect, enable the option
Skip pauses along with Normalization.

Skip pauses. Detects pauses, i.e. signal fragments with amplitude below the
specified threshold. If this mode is enabled and the maximum amplitude value in the
current FFT frame does not exceed the threshold value, the fragment will be skipped.

Geometrical average. If this option is enabled, the geometrical average spectrum is
accumulated instead of the arithmetical average (i.e. the N-th root of the product of N
factors).

Normalization level. Sets the amplitude value for frame normalization when the
Normalization option is enabled. The maximum real amplitude of the signal within the
frame is scaled to the specified normalization level. The remaining amplitudes are
scaled proportionally.

Magnitude for pause detection. Maximum amplitude value for pause detection. This
parameter is used only if the option Skip pauses is enabled.

Having set all the necessary parameters, press OK to return to the Average spectrum
menu.
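
The accumulation scheme and the effect of the Normalization, Skip pauses and
Geometrical average options can be sketched in Python as follows (NumPy assumed). The
function and parameter names are hypothetical and only mirror the menu options; the
sketch does not claim to reproduce the SIS implementation.

```python
import numpy as np

def average_spectrum(signal, frame_len, shift, window=np.hamming,
                     normalize_to=None, pause_magnitude=None, geometric=False):
    """Average power spectrum over frames of frame_len counts shifted by shift counts."""
    x = np.asarray(signal, dtype=float)
    w = window(frame_len)
    acc = np.zeros(frame_len // 2 + 1)
    used = 0
    for start in range(0, len(x) - frame_len + 1, shift):
        frame = x[start:start + frame_len]
        peak = np.max(np.abs(frame))
        if pause_magnitude is not None and peak <= pause_magnitude:
            continue                                     # Skip pauses
        if normalize_to is not None and peak > 0:
            frame = frame * (normalize_to / peak)        # Normalization
        power = np.abs(np.fft.rfft(frame * w)) ** 2
        # The geometric mean is accumulated as a mean of logarithms.
        acc += np.log(power + 1e-20) if geometric else power
        used += 1
    if used == 0:
        raise ValueError("all frames were skipped as pauses")
    mean = acc / used
    return np.exp(mean) if geometric else mean

fs = 10000
tone = 1000.0 * np.sin(2 * np.pi * 1250.0 * np.arange(2048) / fs)
recording = np.concatenate([tone, np.zeros(2048)])
with_pauses = average_spectrum(recording, 512, 256)
without_pauses = average_spectrum(recording, 512, 256, pause_magnitude=10.0)
```

In this example the silent second half of the recording lowers the average when the
pauses are kept, while skipping them leaves only the frames that contain the tone.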

15.13 Formants analysis

15.13.1 Formants calculation


Formant analysis complements the formants calculation for a dynamic spectrogram
described in section 15.3. It allows you to calculate only the formants (without
spectrograms) and to display the formants of different signals on top of each other.

Using the Edit►Segment shift option you can shift formants to the left or to the right
to check the coincidence or non-coincidence of formant structures in the speech of
different speakers.

Two different methods of calculation are provided in the system: LPC (based on the
model of linear prediction coefficients) and Spectral (based on smoothed Fourier
spectra with noise subtraction). The LPC method produces more accurate results on
tonal sounds, while the Spectral method is more effective for non-tonal sounds and
noisy signals. The Spectral method determines the number of formants itself; if this
number exceeds the number specified in the menu, the extra (high-frequency) formants
are discarded. The values of non-existing formants are set to the invalid value (-1)
and are not displayed. The LPC method always calculates as many formants as specified
in the menu.

To analyze the signal formant structure, select the Analysis►Formants analysis item.
The operation menu will appear on the screen. This is a standard menu of the source-
destination type described in section 4.6.

Pressing the OPTIONS field, you can open the options menu (Fig. 57) where the
necessary parameters are to be set. After setting the parameters you should create or
choose a destination window and start the calculation by clicking on the desired data
fragment type. The only reason why this operation may not be performed is an
unsuitable type of the source window. A progress indicator for the current process will
be displayed at the bottom of the screen in the message line. It allows you to monitor
the calculation. You can interrupt the process by pressing the <ESC> key.

Figure 57. The Formant analysis options menu

When you change the method version (by pressing the respective field) the menu will
look slightly different, as the methods use different sets of parameters.

Frame length (counts):

• for the LPC method: an arbitrary value exceeding twice the number of formants,

• for the Spectral method: any power of 2 from 16 to 2048 (as FFT is used for the
  analysis).

Frame shift step (counts). An arbitrary value for both methods.

Frequency resolution (Hz). Used only in the LPC method; ranges from 5.3 to
689.0 Hz. The finer the resolution, the slower the formants calculation.

Weighting window type: Hamming, Hann, Nuttall, Gauss, None (rectangular).

Number of formants. The number of formants expected in the analyzed frequency band
(any positive number from 1 to 16). The parameter is used for increasing the calculation
accuracy.

Method version. Used for changing the calculation method. To change the method,
press the Method version field. The selected method is shown one line lower.

Presets. This field allows using the sets of predefined parameters:

• Male voice:
  - Tenor (130-520 Hz).
  - Baritone (110-390 Hz).
  - Basso (80-350 Hz).

• Female voice:
  - Soprano (260-1050 Hz).
  - Mezzo Soprano (220-880 Hz).
  - Contralto (165-700 Hz).

15.13.2 Editing formants over spectrum image


SIS allows you to edit formants manually in the Zoom mode. To do this, load formants
and spectrogram into the same window and open the fragment to be corrected in Zoom
mode.

In the Zoom mode each formant will be displayed in its own color (for example, the first
– green, the second – red, etc.).

Then open the contextual menu of the "zoom" window by pressing the right mouse
button (Figure 58).

Figure 58. The contextual menu of the formants "zoom" window

To start editing formants, select the Editing formant X item in the contextual menu,
where X is the number of the formant. After that the corresponding inscription will
appear at the bottom of the "zoom" window, and the cursor will take the form of a
pencil. Press and hold down the left mouse button, and move the mouse to edit the
formant. You can change the formant number by rotating the mouse wheel.

To erase formants, select the Erasing formants or Erasing formants in region (all) item.

The Erasing formants mode allows you to delete formants by pressing the left mouse
button. Adjust the size of the eraser by rotating the mouse wheel.

After selecting the Erasing formants in region (all) item you will see a dashed frame
appear on the screen. You can resize and position it using the mouse. After you have
set the frame size and position as desired, click the right mouse button. ALL
formants that fall within the dashed frame along the X axis will be deleted.

To exit Editing formant or Erasing formants mode, select the Zoom formants item
in the contextual menu.

To quit the Zoom mode, press <ESC>.

15.14 Histogram calculation
SIS allows calculation of histograms for waveforms and pitch.

To calculate a histogram, select the Analysis►Histogram menu item. The operation
menu will appear on the screen. This is a standard menu of the source-destination type
described in section 4.6.

In the Options menu, called by pressing the OPTIONS button, you can set the
necessary parameters.

For histogram calculation the entire interval between the lower and the upper boundary
is divided into sub-intervals at a specified step. Each sub-interval is related to a specific
histogram value. If the analyzed signal value falls within the specified interval, the
histogram value there is increased by 1. Having analyzed all source signal values in this
way, we can obtain a histogram. The histogram is then normalized so that the sum of all
its values multiplied by the interval length is equal to 1. Thus, the actual histogram
value after normalization equals the probability density of detecting the specified signal
value. If the histogram is smooth, it is almost independent from the step value. This can
be easily verified by computing a histogram for a loud speech signal waveform in the
interval between -500 and 500 at a step of 2, 5 or 10 counts.
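
The counting and normalization described above can be sketched in Python as follows
(NumPy assumed). The function name is hypothetical. Values outside the boundaries are
placed in the first or last sub-interval, as stated for the Low edge and High edge
parameters of this operation.

```python
import numpy as np

def normalized_histogram(values, low, high, step):
    """Histogram whose values times the interval length sum to 1."""
    v = np.clip(np.asarray(values, dtype=float), low, high)
    n_bins = int(np.ceil((high - low) / step))
    # Index of the sub-interval for each value; the upper edge goes to the last one.
    index = np.minimum(((v - low) / step).astype(int), n_bins - 1)
    counts = np.bincount(index, minlength=n_bins).astype(float)
    density = counts / (counts.sum() * step)
    centers = low + (np.arange(n_bins) + 0.5) * step
    return centers, density

centers, density = normalized_histogram([-1000.0, 0.0, 1000.0], low=-500, high=500, step=10)
```

Because the result is a probability density, its values for a smooth distribution
change little when the step is changed; only the number of sub-intervals differs.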

This operation requires three parameters to be set:

Low edge. All values below the specified threshold fall into the first count of the
histogram.

High edge. All values above the specified threshold fall into the last count of the
histogram.

Step. Step value at which the interval between the lower and upper thresholds is
divided into sub-intervals.

All parameters are set in counts for waveforms and in Hz for pitch.

If the program runs out of memory while creating a histogram, the system will inform
you that the histogram step is too small. In this case you should either increase the step
or modify the upper and lower threshold values.

SIS allows for reading any type of source data as text. Once read, this data can be
subsequently used to compute a histogram.

15.15 Comparing histograms and calculating their parameters


Comparison and parameter estimation can be performed for histograms displayed in the
same window. These can be similar histograms or a "same/impostor" pair of histograms.

To perform histogram comparison, select Analysis►Histogram parameters. You will
see a histogram type selection menu:

• Same/impostor pair,

• Similar histograms.

If the first option is selected, the following menu will be displayed upon the completion
of calculations (Fig. 59):

Figure 59. The Histogram parameters window (same/impostor pair)

The menu contains four columns. The first column lists the calculated parameters. The
second and third columns show the names of the two compared files and the estimated
values for their histograms. The fourth column (Ratio) displays the ratio of the
parameter values for the first and second files.

The estimated parameters include:

Median – coordinate of the point to the right and to the left of which the areas
under the histogram are equal.

Center of gravity – the first moment of the histogram. For symmetrical histograms it
coincides with the median.

Equal Error Rate (EER) – at a certain abscissa value, the probability of false rejection
equals the probability of false acceptance. This value is called Equal Error Rate.

EER coordinate – coordinate (abscissa value) of the Equal Error Rate point.
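
The parameters listed above can be illustrated with the following Python sketch (NumPy
assumed). The function names are hypothetical. For the EER it is assumed that the
"same" histogram lies at lower abscissa values than the "impostor" histogram, so that
false rejection is the part of the "same" histogram above the threshold and false
acceptance is the part of the "impostor" histogram at or below it; the manual does not
state the orientation used by SIS.

```python
import numpy as np

def histogram_median(centers, density):
    """Abscissa that splits the area under the histogram into two equal parts."""
    cdf = np.cumsum(density)
    cdf = cdf / cdf[-1]
    return centers[np.searchsorted(cdf, 0.5)]

def center_of_gravity(centers, density):
    """First moment of the histogram."""
    return np.sum(centers * density) / np.sum(density)

def equal_error_rate(centers, same, impostor):
    """Return (EER, EER coordinate) for a same/impostor pair of histograms."""
    false_rejection = 1.0 - np.cumsum(same) / np.sum(same)
    false_acceptance = np.cumsum(impostor) / np.sum(impostor)
    i = np.argmin(np.abs(false_rejection - false_acceptance))
    return 0.5 * (false_rejection[i] + false_acceptance[i]), centers[i]

# Two bell-shaped histograms centred at 3 and 7 on a common axis.
centers = np.linspace(0.0, 10.0, 101)
same = np.exp(-0.5 * (centers - 3.0) ** 2)
impostor = np.exp(-0.5 * (centers - 7.0) ** 2)
eer, eer_coordinate = equal_error_rate(centers, same, impostor)
```

For this symmetric pair the EER coordinate falls midway between the two histograms,
and the median and centre of gravity of each histogram nearly coincide.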

If the second option is selected, the following menu will be displayed upon the
completion of calculations (Fig. 60):

Figure 60. The Histogram parameters window (similar histograms)

This menu also contains four columns. The first column lists the calculated parameters.
The second and third columns show the names of the two compared files and the
estimated values for their histograms. The fourth column (Ratio) displays the ratio of
the parameter values for the first and second files.

Root mean square – square root of the histogram dispersion value (second moment).

Histogram asymmetry – cube root of the histogram third moment.

Correlation – correlation between the segments of the histograms between temporary
marks. If temporary marks are absent, correlation between the whole histograms will be
found. Correlation varies from 0 (no correlation) to 1 (similar histograms).
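
The three "similar histograms" parameters can be sketched in Python as follows. This is an illustration under stated assumptions, not the SIS code: moments are taken about the mean, and the correlation is computed as a normalized dot product of bin counts, chosen here only because it stays within the 0 to 1 range described above. All function names are hypothetical.

```python
import math


def central_moment(centers, counts, order):
    """Count-weighted central moment of the given order."""
    total = sum(counts)
    mean = sum(x * c for x, c in zip(centers, counts)) / total
    return sum(c * (x - mean) ** order for x, c in zip(centers, counts)) / total


def root_mean_square(centers, counts):
    """Square root of the second central moment (dispersion)."""
    return math.sqrt(central_moment(centers, counts, 2))


def asymmetry(centers, counts):
    """Sign-preserving cube root of the third central moment."""
    m3 = central_moment(centers, counts, 3)
    return math.copysign(abs(m3) ** (1.0 / 3.0), m3)


def correlation(counts_a, counts_b):
    """Normalized dot product of two histograms with non-negative counts (0..1)."""
    dot = sum(a * b for a, b in zip(counts_a, counts_b))
    norm = math.sqrt(sum(a * a for a in counts_a)) * math.sqrt(sum(b * b for b in counts_b))
    return dot / norm
```

To restrict the correlation to the interval between temporary marks, pass only the corresponding slices of both count lists.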

15.16 EdiTracker module


SIS allows you to use the dedicated EdiTracker software module for detecting signs of
tampering in audio recordings. To run EdiTracker, select the Research►EdiTracker
menu item or press the corresponding button on the toolbar. For detailed instructions
on operating the EdiTracker software module, refer to its User Manual.

16 USING PLUG-INS

SIS allows you to use plug-ins: separately compiled program modules that are
attached dynamically to SIS to extend its functions.

To use plug-ins successfully, you may need to update some of the dynamic libraries
included in SIS.

To attach a plug-in to the SIS software, do the following:

1. Copy the folder with the module files to the plugins folder inside the SIS
software folder. By default the SIS software folder is C:\Program Files\Speech
Technology Center\SIS.

2. Choose the Module►Module registration menu item. In the Plugins
registration dialog window, set the flag next to the module to be attached (Speech
splitter in Figure 61) and press the OK button.

Figure 61. The Plugins registration window

If the necessary module is not listed in the Plugins registration window, make sure that
the folder with the module files is in the plugins folder; if it is not, copy it there.
If the folder contents are incomplete, copy the entire folder to the plugins folder again.

Then open the Plugins registration window again (Module►Module registration)
and press the Refresh button. When the necessary module appears in the list, set the
flag next to it and press the OK button.

If the folder with the module files is stored in a folder other than the one mentioned
above, you can define the path to the module files. Press the Paths button in the
Plugins registration window (see Fig. 61). The Paths to find modules window will
appear (Fig. 62). Press the New button, choose the necessary folder in the standard
Browser window and press the OK button.

Figure 62. The Paths to find modules window


17 TESTING THE INPUT/OUTPUT CHANNEL

Analog-to-digital and digital-to-analog converters used for digital signal processing are
characterized most completely by dynamic parameters such as the signal-to-noise ratio
(SNR), total harmonic distortion (THD) and effective bit resolution (Nd) of the ADC/DAC.
These are obtained by converting a sample sine wave signal and analyzing its spectrum.
Signal distortions arising in the course of the conversion are reflected in the
conversion noise level and in the level of the first and the second harmonics in the
main signal.

Effective bit resolution (Nd) of the ADC/DAC is calculated as:

$$N_d = \frac{\mathrm{SNR} - 1.76}{6.02}.$$

The total harmonic distortion (THD) is calculated by the following formula:

$$\mathrm{THD} = 10 \cdot \lg \frac{V[2] + V[3] + \dots + V[k]}{V[1]}\ \text{dB}$$

or

$$\mathrm{THD} = \frac{V[2] + V[3] + \dots + V[k]}{V[1]} \cdot 100\%,$$

where V[1], V[2], ..., V[k] stand for the power of the first and subsequent harmonics,
and k equals the ratio of the maximum spectrum frequency to the frequency of the
first harmonic (fundamental frequency).

The signal-to-noise ratio (SNR) is calculated by the formula:

$$\mathrm{SNR} = \frac{V[1]}{\text{total sum of harmonic powers} - V[1] - V[2] - \dots - V[k]}.$$
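
The three formulas above can be tried out with a short Python sketch. It is an illustration, not the SIS code: the function names are hypothetical, the harmonic powers are passed as a list whose first element is V[1], and the SNR is converted to decibels (10·lg of the ratio) on the assumption that the Nd formula expects SNR in dB.

```python
import math


def thd_ratio(harmonic_powers):
    """(V[2] + ... + V[k]) / V[1]; harmonic_powers[0] is V[1], the fundamental."""
    return sum(harmonic_powers[1:]) / harmonic_powers[0]


def thd_db(harmonic_powers):
    """THD in dB, following the first THD formula."""
    return 10 * math.log10(thd_ratio(harmonic_powers))


def thd_percent(harmonic_powers):
    """THD in percent, following the second THD formula."""
    return 100 * thd_ratio(harmonic_powers)


def snr_db(total_power, harmonic_powers):
    """SNR in dB: V[1] over the power left after removing all harmonics."""
    noise_power = total_power - sum(harmonic_powers)
    return 10 * math.log10(harmonic_powers[0] / noise_power)


def effective_bits(snr_db_value):
    """Effective bit resolution Nd = (SNR - 1.76) / 6.02, with SNR in dB."""
    return (snr_db_value - 1.76) / 6.02
```

For example, an SNR of 98.08 dB corresponds to an effective resolution of 16 bits.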

17.1 Testing the input channel


To obtain these values for the sound input channel, perform the following:

1. Input a sample sine wave signal (for example, from any signal generator) into the
computer. The signal frequency should be at least 500 Hz and should not
exceed a quarter of the sampling rate; the duration should be at least 5 sec; the
amplitude should be close to the maximum value (but without overload). Typically
you should input a signal of 6-7 seconds duration with a frequency of about 900 Hz.

2. Set two temporary marks to cut off fragments of 1 sec length at the beginning and
at the end of the input signal.

3. Select the Analysis►Average spectrum menu item.

4. Click on the OPTIONS field.

5. Set the following parameters:


• Frame length (counts): 2048 (approximately 10% of the sampling
frequency in Hz),
• Weighting window type: Gauss,
• Frame shift step (counts): 401 (about 20% of the frame length),
• Normalization and Skip pauses fields should be disabled.

Then return to the previous menu by pressing OK.

6. Create a destination window of any height, but spanning the entire screen width
using the field Create destination window. The window width doesn't influence
the result, but it influences the representation.

7. Press the All data field. Then the Fourier signal spectrum will be calculated and
the result will be written into the destination window.

8. Enter the menu Data►Input/output test. First the system will inquire: "Have
you inputted in the computer an ordinary sine wave signal and calculated its
Fourier spectrum by 2048 points with the Gauss weighting window?". If all steps,
described above, were performed, you should answer OK by pressing the
corresponding field. Otherwise, you should press Cancel and perform all the
necessary actions exactly as described above.

9. If you have answered OK to the previous question, a vertical multi-cursor, which is
a system of connected vertical cursors, will appear in the active window. Each cursor
corresponds to a frequency obtained by multiplying a base frequency F0 by an
integer multiplier equal to or greater than 1. The system treats the signal (active
segment) values at these frequencies as the square roots of V[0], V[1],..., V[k]
(see the formula for THD given at the beginning of this section). The system
automatically takes the finite width of each signal harmonic into account and does
not mix it with the noise.

10. If you move the mouse to the left/right or press the arrow keys on the
keyboard, the cursor system will move and the following text will appear in the
Output window at the bottom of the screen:

SNR=.... dB, THD=.... dB, Nd=...., f=.... Hz,

where SNR stands for the signal-to-noise ratio, THD for the total harmonic distortion,
Nd for the effective resolution of the input/output cards (bit), and f for the
frequency corresponding to the first cursor (F0).

All these values are meaningless until you match the first cursor with the
main peak of the sample sine wave. You can try to do it manually, but it is rather
difficult since even one screen pixel corresponds to a large frequency shift.
Therefore it is recommended to use the option described below.

11. Press the <F3> key. An informational menu will then appear on the screen
(Fig. 63). It contains the Signal/noise ratio and Total Harmonic Distortions
fields. The harmonic distortion value is given both in dB and in percent.

Figure 63. The Test i/o soundcard window

If you are satisfied with the measured values, the test may be considered
complete. If you are not satisfied with the results, try to repeat the test. But
before that:
• Replace the signal generator.
• Make sure that the external input/output unit does not lie on the monitor,
computer or any other electronic device.

17.2 Testing the output channel


To obtain the values of the signal-to-noise ratio, total harmonic distortion and DAC
effective bit resolution for the signal output channel, do the following:

1. Select the Data►Generate test waveform menu.

2. Set the sine amplitude close to the maximum value (e.g. 30000, since 32767 is
the maximum regardless of the I/O board bit capacity). Set the sine frequency
to 500 Hz or more, but not greater than a quarter of the sampling rate. Set the
amplitudes of the other test signal types to zero.

3. Set the desired signal duration (about 30 sec) in the field Signal length.

4. Set signal bit capacity in the 24-bit signal field. If this option is enabled, the
result will be a 24-bit signal, otherwise a 16-bit signal will be generated.

5. Generate a test signal by pressing OK.

6. For the obtained test sine wave, repeat operations 2-10 described in
section 17.1. The obtained values of the signal-to-noise ratio (105-140 dB) and
total harmonic distortion (0.0013% - 0% for a signal with an amplitude of 30000)
are the best attainable for the given signal size (determined by the card capacity)
and the given analysis method (Gauss window).

7. Output the sound through the line output of the board using the menu
Playback►All data and analyze its parameters with a high-quality spectrum
analyzer. The values obtained with this type of analysis will be no better than
those described in item 6 above or than the values specified in the spectrum
analyzer manual. If you are satisfied with the measured values, the test can be
considered complete. If you are not satisfied with the results, repeat the test. But
before that:
• Replace the signal generator.
• Make sure that the external input/output unit does not lie on the monitor,
computer or any other electronic device.
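
The test signal parameters from items 2-4 can be checked with a small Python sketch before the signal is generated in SIS. This is only an illustration of the constraints stated above (amplitude not above 32767, frequency between 500 Hz and a quarter of the sampling rate); the function name and its arguments are hypothetical and do not correspond to any SIS interface.

```python
import math

MAX_AMPLITUDE = 32767  # maximum amplitude stated in item 2


def generate_test_sine(amplitude, frequency_hz, sample_rate_hz, seconds):
    """Return integer samples of a test sine wave, validating the manual's limits."""
    if not 0 < amplitude <= MAX_AMPLITUDE:
        raise ValueError("amplitude must be in the range 1..32767")
    if not 500 <= frequency_hz <= sample_rate_hz / 4:
        raise ValueError("frequency must be between 500 Hz and a quarter of the sampling rate")
    count = int(round(sample_rate_hz * seconds))
    step = 2 * math.pi * frequency_hz / sample_rate_hz
    # Round each sample to the nearest integer code.
    return [int(round(amplitude * math.sin(step * i))) for i in range(count)]
```

For instance, `generate_test_sine(30000, 1000, 44100, 30)` follows the recommended amplitude and duration, while a 20000 Hz request at the same sampling rate is rejected.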

17.3 Input/output channel loop test


The menu option Data►In/Out Loop Test allows a user to test the sound card input-
output channel, as well as external equipment parameters, without using signal
generators and spectrum analyzers. To do it, connect the sound card input with the
output via a cable or via the tested equipment (amplifier, equalizer etc.). The current
signal in the window will be played and recorded at the same time during this test. The
result will be written into a new "noname" window.

You should take into account that SIS obtains device technical data without
A-weighting (a weighting function that imitates the frequency sensitivity of the
human ear). Therefore technical data obtained with SIS software differ from the
A-weighted data given in the device's user manual. The difference between these
values should not exceed 10 dB.
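
For reference, the A-weighting curve mentioned above can be sketched in Python. The formula below is the commonly published analytic approximation of the A-weighting curve (constants 20.6, 107.7, 737.9 and 12194 Hz with a +2.00 dB normalization); it is not part of SIS, and the function name is hypothetical.

```python
import math


def a_weighting_db(frequency_hz):
    """A-weighting gain in dB at the given frequency (analytic approximation)."""
    f2 = frequency_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB term normalizes the curve to about 0 dB at 1000 Hz.
    return 20 * math.log10(numerator / denominator) + 2.00
```

The curve is close to 0 dB at 1000 Hz and attenuates low frequencies strongly (about 19 dB at 100 Hz), which is one reason why unweighted and A-weighted noise figures differ.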

Generate 10 seconds of a white noise signal (mono or stereo) via Data►Generate
test waveform with an amplitude of 5000 and a sampling rate between 10000 and
30000 Hz. Run the In/Out Loop Test command and calculate the spectrum of the
result (items 2-6 from section 17.1). The resulting signal will be mono or stereo,
depending on the initial one. The spectrum will be proportional to the frequency
response function. Deviations of about 1.5 dB may appear in the spectrum because
the white noise generator within the system is not ideal.

To measure signal-to-noise ratio and total harmonic distortion, you will need to generate
a sine wave signal with the 1000 Hz frequency at sampling rate between 10000 and
192000 Hz and with the amplitude equal to 30000.

Then divide stereo signals into two mono signals (Edit►Mono/stereo operations
►Separate stereo-signal to 2 mono). After that select the Data►In/Out Loop Test
and follow steps 2-10 from section 17.1 for the obtained result.

Note that before testing any external equipment you should test the
input/output channel. You should avoid signal overload, but remember that
low-amplitude signals degrade the measured parameters of electronic devices as a
result of round-off errors. To adjust the signal level, use software mixers.

GLOSSARY

A
Acoustics
(gr. akustikos – of or for hearing, ready to hear)

1. The science of sound, studying elastic vibrations and waves.
2. The sound characteristics of an enclosed space or an object (audio device).
3. Acoustic level of (speech) signal – a description of the characteristics of a signal
(especially a speech signal) as a whole and of its elements as a physical sound
process, without regard to the information carried by the signal. A spectral
description of the signal is usually used at the acoustic level.
4. Speech acoustics – a part of general acoustics, studying speech signal structure,
processes of speech production and speech perception. It is concerned with developing
methods and means of analysis, as well as with speech modeling, identification,
synthesis and compression.
Acoustic and phonetic attributes of oral speech
The attributes reflecting acoustic qualities of the vocal tract and articulation skills of the
person. These attributes are perceived and revealed with the help of technical means
and form the basis of instrumental analysis of speech signals; the attributes can be
evaluated quantitatively.
Acoustic depth of sound record
The distance between the microphone and the sound source, estimated by ear. Such
estimation is possible mainly because the timbre and loudness of a sound change
gradually with the distance to its source, as does the ratio between the sound level
of the given source and the surrounding acoustic noise.
Acoustic event
A single, relatively independent, short- or long-term event being heard in real time or
on record. The term is commonly used to indicate sound aspect of events happening
simultaneously with main speech signal sounding (e.g. knock, music, sound of passing
car or TV set, etc.).
Adaptive noise reduction (adaptive filter, adaptive signal extraction)
An algorithm of signal extraction/noise reduction. An adaptive filter is a filter that
self-adjusts to the noise characteristics within the noise reduction method being
used. An adaptive filter is usually characterized by an adaptation time constant: the
time the filter takes to respond to a changing input signal. Adaptive signal extraction
is a procedure that extracts a useful signal from a processed input signal; background
noise and interfering signals are suppressed by continuously self-adjusting the
procedure parameters to the characteristics of this particular useful signal.
AGC (Automatic gain control)
A device or software tool used to level the input speech signal automatically.
The AGC reduces the volume when the signal is strong and raises it when the signal is
weak. The AGC is typically characterized by the following variables: AGC range, attack
time (for weak signals), decay time (for strong signals). For communication
equipment, the standard values are: 12..20 dB, 20 ms, 500 ms.
Amplitude (magnitude)
(lat. amplitudo – size)

The maximum deviation value (from the equilibrium position) of an oscillating quantity,
for example, the deviation from zero of an in-circuit electric current voltage, sound
pressure intensity, etc. It represents the size of vibration (deviation value). In strictly
periodic vibrations, the amplitude is a constant.
In the research of harmonic sound vibrations, the amplitude means sound pressure in a
signal expressed by the amplitude of a current, voltage or other electrical quantity on
the output of sound converting equipment (microphone). In the signal waveform figure,
the amplitude represents the deviation size of an image up or down from zero position.
Analog-to-digital converter (ADC)
A device which converts continuous analog signals to digital numbers. The signal is
digitized (that is, the signal values at equal time intervals are measured and stored)
with the fixed sampling rate (Fs), while its amplitude is converted to a sequence of
digital codes (is quantized). The reverse operation is performed by a digital-to-analog
converter (DAC). Typical Fs values are: 11025 Hz – for speech, and 8000 Hz – for
telephone digital channels.
Antiformant
The suppression of the sound spectral envelope, resulting from the resonant properties
of the sound source. The effect of antiformants is typical for nasalized sounds.
Articulation
(lat. articulo – enunciate clearly)

1. The adjustments and movements of speech organs when pronouncing a particular
sound, taken as a whole.
2. A measure of the intelligibility of speech in a communication channel or of the
signal of audio equipment under test.

Articulation organs
The organs involved in the articulation process, such as lungs, bronchi, windpipe, larynx
with vocal folds, gullet, oral cavity, soft palate, nasal cavity, lips, teeth, tongue (root,
back, forepart and tip). Sinuses of the skull, soft tissues of the face and neck, as well
as the whole thorax have an indirect influence on the speech signal.
Auditory estimation of a speaker
A subjective expert evaluation of a speaker's identification (stable) characteristics,
based on (human) listener perception. An integral auditory estimation is obtained from
estimates of the speaker's characteristics on narrow scales, which must be as simple
("one-dimensional") as possible in order to improve accuracy.
Authenticity
(gr. authentikos – reliable)

The reliability and full conformity in all significant aspects; the absence of accidental
or intentional distortions that matter for the problem concerned.
Authentic audio record
A record whose sound content fully conforms to the original sound at the place where
the recording was made. An authentic record does not contain blank areas or areas of
removal, erasure, insertion, addition or overlay of other audio records, cutting or
selective audio recording, and is not obtained by staged recording.
Synonyms of the term: genuine, reliable, identical with the original.

B
Beating
A periodic change of oscillation amplitude that occurs when two harmonic vibrations with
close frequencies are added together. Beating is commonly heard as interferences in the
radio-electronic devices.
Bel (B)
A logarithmic unit of measurement that expresses the base-10 logarithm of the ratio of
two like physical quantities. The "bel" is named in honour of the American scientist
Alexander Graham Bell. For measurements of power or intensity, the ratio in bels is
lg(P2/P1), so 1 B corresponds to P2 = 10·P1 (P1 and P2 are the power or intensity
quantities). For measurements of amplitude, voltage or current, the ratio in bels is
2·lg(F2/F1), so 1 B corresponds to F2 = √10·F1 ≈ 3.16·F1 (F1 and F2 are the
amplitude, voltage or current quantities). The commonly used decibel is one tenth of a
bel (B).
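
The relation between ratios and decibels can be sketched in a few lines of Python. The function names are hypothetical; the factors 10 and 20 follow from the decibel being one tenth of a bel and from power being proportional to the square of amplitude.

```python
import math


def power_ratio_db(p2, p1):
    """Level difference in dB for two power or intensity values."""
    return 10 * math.log10(p2 / p1)


def amplitude_ratio_db(f2, f1):
    """Level difference in dB for two amplitude, voltage or current values."""
    return 20 * math.log10(f2 / f1)
```

A tenfold power increase gives 10 dB (1 B), while a tenfold amplitude increase gives 20 dB (2 B).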

C
Cepstrum
A representation of the speech signal in the form of a set of coefficients obtained by
taking the Fourier transform of the decibel spectrum of the given signal. This
primary representation of the speech signal is used in automatic speech and speaker
recognition systems. A typical use of the cepstrum is MFCC (Mel Frequency Cepstral
Coefficients), in which cepstral coefficients are calculated on a nonlinear mel scale
of frequency. The mel scale is considered to approximate the human auditory system's
response more closely.
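
As an illustration of the nonlinear mel scale, the sketch below uses one widely used conversion formula, mel = 2595·lg(1 + f/700). The manual does not state which variant SIS uses, so both the formula choice and the function names are assumptions.

```python
import math


def hz_to_mel(frequency_hz):
    """Convert frequency in Hz to mels (common 2595/700 formula)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel):
    """Inverse conversion from mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

With this formula 1000 Hz maps to about 1000 mel, and equal steps in mel correspond to progressively wider steps in Hz at higher frequencies.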
Composite stereo
The mode in which two mono signals are reproduced simultaneously: one noise reduction
method is applied to the signal in the left channel, while another noise reduction
method is applied to the same signal in the right channel. Composite stereo
mode can increase speech intelligibility.
Constant mark
A white vertical dashed line in the data box. It is used for demarcating fragments of
data. Marks can be assigned textual inscriptions (via the Marks list menu) and selected
(checked). When a constant mark is deleted, all data in the window will be redrawn.
Current (active) window
A graphic window which serves as a source of data at the current moment. It is always
located above all other windows. A short name of the current window is outlined in the
top left corner of the window.

D
(Data) box
In SIS, a black rectangle in the graphic window with numbered coordinate axes. If
data is loaded in the window, it will be represented in the data box.
Decibel (dB)
In acoustics, a relative unit used to quantify sound pressure level; it equals one tenth
of a bel (B).
Diagnostic attributes of oral speech
The attributes which allow one to determine accent/dialect, social, psychological,
physiological and other characteristics of the speaker.

Dropout
A momentary loss or a considerable weakening of signal without changing time of its
sounding. Dropout may be caused by recording medium malfunctions or features of
recording and reproducing device.
Dynamic cepstrogram
A flat graphic representation of the speech signal in a 3-dimensional rectangular
system of coordinates: time, frequency and intensity (it shows how the sound (speech)
cepstrum changes with time). A cepstrogram is a synchronous graphical representation
of a group of successively calculated instantaneous cepstra. Time is commonly laid off
along the horizontal axis, while the vertical axis shows the frequency of cepstral
components (Hz). The cepstral amplitude for a given frequency at the current time point
is shown in the image as darkening of varying saturation or by means of a specially
created colour spectrum. A cepstrogram is commonly used to visualize the degree of
voice periodicity and the fundamental frequency of periodicity: pitch and its harmonics.
Dynamic range
A system operational characteristic calculated as the ratio of the maximum operational
intensity of the input quantity to the minimum intensity at which this quantity can
still be discerned against the background noise of the system; it is measured in dB.
For linear systems, dynamic range is practically equal to the SNR, while in real
systems the noise level increases in the presence of the input signal. The term is
often used to describe the ratio of the signal's maximum undistorted amplitude to the
root-mean-square amplitude of the background noise in the presence of a weak signal
(usually 60 dB below the allowable maximum).
Discreteness
(lat. discretus – separated)

A discontinuity; opposite to continuity. For example, a discrete change of some quantity
with time is a change happening in the defined time periods (in discrete steps); the
system of integers is discrete contrary to the system of real numbers.
Digital-to-analog converter (DAC)
A device which converts a digital code stored in PC memory to an analog signal. A DAC
converts a digital signal, sampled in time and quantized in amplitude, to a
continuously varying signal. The reverse operation is performed by an
analog-to-digital converter (ADC).

E
Echo cancellation
The means and methods of removing echo interference (repetition of a signal and
superposition of a secondary, usually a bit weakened copy (copies) of an initial signal)
from a useful sound signal. For automatic echo cancellation, adaptive filters are
commonly used. Echo arises in communication channels due to electromagnetic signal
reflections at different nodes in a distributed system, as well as a result of an acoustic
reflection of a signal and its reappearing in a microphone.

EER (Equal Error Rate)
A reliability index of the object (speaker) recognition systems; computed at the point
where both FAR and FRR errors are equal. When quick comparison of two systems is
required, the EER is commonly used. The lower the EER, the more accurate the system
is considered to be.
Extraction of a (useful) signal/speech
A signal processing procedure that improves how the signal is perceived against
background noise or surrounding sounds. Extraction provides greater signal
intelligibility, better speaker recognition and speech/non-speech discrimination,
more comfortable listening, etc.

F
FAR (False Acceptance Rate)
A parameter used for assessing reliability of the object (speaker) recognition systems;
defined in % as the probability that the system incorrectly declares a successful match
between the input pattern and a non-matching pattern in the database. It measures the
percent of invalid matches.
FFT (fast Fourier transform)
A simple and efficient algorithm for computing the signal spectrum via the discrete
Fourier transform. The FFT commonly requires that the number of signal points
processed in a single analysis frame be a power of 2: 16, 32, 64, 128, 256, 512, 1024,
etc. In speech technologies, the following types of spectrum are typically used. The
instantaneous signal spectrum is calculated in a single analysis frame. The average
signal spectrum is calculated by averaging the instantaneous spectra within the
specified fragment of a speech signal (e.g. within the whole signal) taken at fixed
time intervals (e.g. one fourth of the analysis frame length).
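
The instantaneous and average spectra described above can be sketched with a textbook radix-2 FFT in pure Python. This is an illustration only and not the SIS algorithm: no weighting window is applied, and the function names are hypothetical.

```python
import cmath


def fft(samples):
    """Recursive radix-2 FFT; the number of samples must be a power of 2."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of 2")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def average_power_spectrum(signal, frame_length, shift):
    """Average the instantaneous power spectra of frames taken every `shift` samples."""
    bins = frame_length // 2 + 1
    accumulated = [0.0] * bins
    frames = 0
    for start in range(0, len(signal) - frame_length + 1, shift):
        spectrum = fft(signal[start:start + frame_length])
        for k in range(bins):
            accumulated[k] += abs(spectrum[k]) ** 2
        frames += 1
    if frames == 0:
        raise ValueError("signal is shorter than one frame")
    return [value / frames for value in accumulated]
```

Calling `average_power_spectrum(signal, 64, 16)` uses a shift of one fourth of the frame length, as in the example given in the definition.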
Filter
An electronic device or mathematical software algorithm used to remove vibrations of
certain frequencies from a composite signal with a wideband spectrum while allowing
narrower-band vibrations to pass. A high-pass filter attenuates low frequencies and
lets the high ones pass through. A low-pass filter does the opposite. In a broader
sense, a filter is any means of linear modification of the input signal spectrum.
Formant
The amplitude maximum, area of energy concentration in the speech sound spectrum,
determined by the resonant properties of the vocal tract. In the speech sound 3-6
formants are commonly distinguished within the frequency range from 250 to 5000
Hz. Formant is a phonetic characteristic of sound; it contains information about the
speaker’s individual speech features. Formant with the lowest frequency is denoted F1,
the second F2, and so on to the highest frequencies.

Formant bandwidth
An interval on the frequency axis occupied by a formant; denoted as B1, B2, etc.
depending on the number of the formant.
Fourier transform
The main subject of the spectral (frequency) analysis of signals. It is based on the
assumption that all signals (processes) under consideration consist of a certain number
of harmonic (sine and/or cosine) components (called harmonics) with different
frequencies; each component has its amplitude and initial phase angle (phase).
Fragment
In SIS, the part of data which is singled out in some way from the segment, but has not
lost its connection with the remaining data. It can be, for example, part of a segment
limited by temporary marks or part of a segment included in the highlighted interval
between permanent marks or part of a segment visible in the box.
FRR (False Rejection Rate)
A parameter used for assessing the reliability of the object (speaker) recognition
systems; defined in % as the probability that the system incorrectly declares failure of
match between the input pattern and the matching template in the database. It
measures the percent of valid inputs being rejected.

G
Gain-frequency characteristic (Frequency response)
The gain-frequency characteristic of a sound data transmission channel is the dependence
of the signal level at the output on the frequency of a constant-amplitude input signal.
It is typically characterized by two parameters: passband and gain flatness. For an
ideal acoustic system, the gain-frequency characteristic must be flat.

H
Hamming window
A weighting (window) function applied in spectral analysis of the signals and filter
design. The following window functions also exist: Hann (Hanning) window, Blackman
window, Nuttoll window, Gauss window, etc.
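
The window functions named above can be sketched in Python as follows. The Hamming and Hann formulas are the standard textbook definitions; the Gauss window is written with a width parameter `sigma` whose default value is an arbitrary assumption, since the manual does not state the parameter SIS uses. All function names are hypothetical, and the window length must be at least 2.

```python
import math


def hamming(length):
    """Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def hann(length):
    """Hann window: 0.5 - 0.5 * cos(2*pi*n / (N - 1))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def gauss(length, sigma=0.4):
    """Gauss window; sigma is the relative width (assumed value, not from SIS)."""
    half = (length - 1) / 2
    return [math.exp(-0.5 * ((n - half) / (sigma * half)) ** 2) for n in range(length)]
```

Each frame is multiplied sample by sample by the window before the Fourier transform is taken.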
Harmonic components (harmonics, overtones)
The simple harmonic (sinusoidal) oscillations which form together the complex sound
oscillations. Harmonic is a component frequency of these oscillations that is an integer
multiple of the fundamental frequency. The totality of harmonic components values
defines a voice timbre and is individual for every speaker.
Harmonic distortion
A result of various nonlinear transformations affecting the properties of the useful
signal, for example amplitude limiting of a signal, signal compression using
specific encoding algorithms, etc. In most cases, harmonic distortion destroys the
useful information in a signal irrecoverably. Total harmonic distortion (THD) is a
measure of harmonic distortion, defined as the ratio of the sum of the powers of all
harmonic frequencies above the fundamental frequency to the power of the
fundamental. THD is measured in % or dB. For ADC/DAC systems, a THD value of less
than 0.01% corresponds to a high quality recorder, less than 0.1% to a good quality
recorder, 1% to an average quality recorder, and 10% to a bad quality recorder.
Harmonic (sinusoidal) oscillation
An oscillation of a physical (or any other) quantity whose value changes with time
according to the sinusoidal law X=Asin(ωt+φ), where X is the displacement (oscillating
quantity value) at the current point of time (t), A is the amplitude of oscillation, ω is
the angular frequency, (ωt+φ) is the current oscillation phase, φ is the initial
oscillation phase. Any non-harmonic oscillation may be uniquely represented as the sum
of different harmonic oscillations, i.e. as the spectrum of harmonic oscillations
(harmonic expansion, Fourier expansion).
Hertz (Hz)
A unit of frequency, defined as the number of cycles of a periodic process per
second. One hertz means "one cycle per second". It is named after the German physicist
Heinrich Hertz. Commonly used multiples of Hz are: kHz (kilohertz, 10^3 Hz), MHz
(megahertz, 10^6 Hz), etc.

Histogram (bar diagram)
One of the most common ways of graphic data representation. A histogram reflects the
statistical distribution of a numerical value. It is shown as a row of adjacent
vertical rectangles (bars) drawn along a straight line. The width of each bar
represents the interval over which it is drawn, and its area is proportional to the
frequency with which the corresponding value appears within this interval.

I
Impulse noise
A noise occurring as short signals with sharply increasing and decreasing amplitudes.
Interference
A process that hinders the auditory perception of useful signals during record
playback, appearing in the form of different kinds of noise, background and other
signals carrying no useful information.
Inverse spectrum
A spectrum whose maxima are converted into minima of the same value and vice
versa.
Inversion
A reversal of a set (succession) of any elements. Reverse order of words.

L
Linear prediction coefficients (LPC)
One of the methods of primary speech signal representation frequently used in the
compression, recognition and synthesis systems.
LPC spectrum
The kind of speech spectrum analysis based on speech linear prediction methods (or
equivalently, autoregressive model of speech signal). LPC spectrum sometimes results
in more effective representation of the signal’s spectral characteristics than the classic
signal spectrum calculated with the help of the Fourier transform.

M
Masking of sound (speech)
A property of human hearing, a psycho-physiological phenomenon, which means that
some components of one signal (e.g. speech) are not heard or are heard weakened in
the presence of another (masking) signal in a signal mixture. For example, in the
frequency neighbourhood of the large-amplitude harmonic, the weaker harmonic signals
are not heard. Speech masking by noise is one of the main reasons for decreased
speech intelligibility in noise. Sound masking is quantified as the number of decibels
by which the hearing threshold rises in the presence of noise. Low frequency tones
have a stronger masking effect than high frequency tones. In some cases, some
speech components can be masked by other speech components of the same speech
signal and, as a result, speech intelligibility decreases. This phenomenon is called
self-masking of speech.
Mu-law (μ-law, u-law) algorithm
A method of time-domain (speech) signal processing that transforms the instantaneous
amplitude according to a law similar to a logarithmic one; it allows compressing input
speech data with little perceptible quality loss; primarily used in telephony (see the
G.711 standard).
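
The near-logarithmic amplitude transformation can be sketched with the continuous μ-law curve, F(x) = sgn(x)·ln(1 + μ|x|)/ln(1 + μ), for samples normalized to the range −1..1 and μ = 255. The sketch shows only this curve and its inverse; it is not an implementation of the segmented 8-bit G.711 encoding, and the function names are hypothetical.

```python
import math

MU = 255  # value commonly used in telephony


def mu_law_compress(x, mu=MU):
    """Compress a normalized sample (-1..1) along the mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

Small amplitudes are stretched (an input of 0.01 maps to roughly 0.23), so that a coarse quantizer applied after compression preserves quiet speech better than uniform quantization would.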

N
Noise
1. Disorderly oscillations of a different physical nature, having continuous spectrum in a
sound frequency range.
2. Unwanted sound that hinders the detection and use of the useful signal. Any
oscillation in solids, liquids and gases can be a source of audible or inaudible
noise. Radio-electronic (electromagnetic) noise is a random variation of current or
voltage in radio-electronic devices (for example, audio recording and reproducing
equipment).
Noise reduction (noise cancellation)
The process of removing unwanted noise (background noise) from a signal.

O
Octave
The interval comprising 6 tones, or 12 semitones; the interval between two frequencies
having a ratio of 2 to 1. It is used when specifying the roll-off slope of a filter’s frequency
response and when normalizing the signal in spectral analysis. For instance, 12 dB per
octave means that the amplitude falls by 12 dB (to approximately one quarter) each time
the frequency doubles.
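The arithmetic behind the "12 dB per octave" example can be checked with a short sketch. The function names are hypothetical, and the attenuation model is an idealized straight-line slope above an assumed cutoff frequency, not the response of any particular SIS filter.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in decibels to an amplitude ratio."""
    return 10 ** (db / 20)


def octaves_between(f_low_hz: float, f_high_hz: float) -> float:
    """Number of octaves (frequency doublings) from f_low_hz up to f_high_hz."""
    return math.log2(f_high_hz / f_low_hz)


def attenuation_db(f_hz: float, cutoff_hz: float, slope_db_per_octave: float) -> float:
    """Idealized attenuation of a filter with a constant slope above its cutoff."""
    return slope_db_per_octave * max(0.0, octaves_between(cutoff_hz, f_hz))


if __name__ == "__main__":
    print(db_to_amplitude_ratio(12))        # amplitude ratio for 12 dB
    print(attenuation_db(4000, 1000, 12))   # two octaves above the cutoff
```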
Oscillation frequency
A quantity defined as the number of oscillation periods (the number of oscillations)
occurring per unit time; it is the reciprocal of the oscillation period (the duration of
one cycle in an oscillating process).

P
Pause
(lat. pausa, gr. pausis – stop, termination)

A break in speech, which acoustically corresponds to the absence of sound, and
physiologically to a stop in the activity of the speech organs.
Pitch (fundamental frequency, pitch of sound/voice)
A perceived quality of sound that is most closely related to the frequency of the first
harmonic (fundamental frequency) in a discrete spectrum and depends on the size and
speed of vocal cord vibrations. In oral speech, this feature determines the voice type
(bass, tenor, descant, etc.).
Pitch of voice (sound)
A property of voice measured by the oscillation frequency of the vocal folds: the more
oscillations occur per unit of time, the higher the pitch.
Pixel
In digital imaging, the smallest element of an image.
Pseudo stereo
A mode of stereo playback of a mono signal in which the signal in one playback channel
is delayed by a certain time and shifted in phase relative to the other playback channel.
Using this mode, it is possible to reduce operator fatigue during listening and enhance
speech intelligibility in noise.
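The delay part of this mode can be sketched as follows. This is an illustration only: the manual does not state how SIS derives the second channel, so the function names, the choice of delaying the right channel, and the delay value are assumptions.

```python
def delay_in_samples(delay_ms: float, sample_rate_hz: int) -> int:
    """Convert a delay given in milliseconds to a whole number of samples."""
    return int(round(delay_ms * sample_rate_hz / 1000))


def pseudo_stereo(mono, delay_samples: int):
    """Return (left, right) sample pairs; the right channel is the delayed mono signal."""
    # Pad the start with silence, then trim so both channels have equal length.
    delayed = ([0.0] * delay_samples + list(mono))[:len(mono)]
    return list(zip(mono, delayed))


if __name__ == "__main__":
    frames = pseudo_stereo([1.0, 2.0, 3.0, 4.0], delay_in_samples(0.25, 8000))
    print(frames)
```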

R
Range
A quantity that sets the extreme limits within which an attribute varies (e.g. attributes
of spoken speech); the difference between the maximum and minimum values of the attribute.

Relative noise level
The ratio in decibels of the noise value to the useful signal value corresponding to the
maximum recording level.
Reverberation
(lat. reverbero – reflect, cast)

The process of gradual sound attenuation in an enclosed space after the source is
removed. Each frequency component keeps sounding for some period of time, which
depends on the absorption, at that frequency, of the sound reflected from the walls and
objects inside the room. Vibrations with different frequencies, excited by a sound
source, decay non-simultaneously and add to the original signal, distorting its properties.
Reverberation strongly affects the intelligibility of speech and music in an enclosed space.
The length of reverberation is characterized by the reverberation time, i.e. the period of
time during which the sound pressure decreases by a factor of 1000 (the sound level
drops by 60 dB). Reverberation time is given special consideration when assessing the
acoustic quality of an enclosure. The greater the room volume (or the sound free path
time) and the smaller the absorption by the bounding surfaces, the longer the
reverberation time. To measure the reverberation time, you need to record the decay of
the sound pressure level after the source stops.
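One way to turn such a recorded decay into a reverberation time is to fit a straight line to the level readings and extrapolate. The sketch below assumes the common convention that reverberation time corresponds to a 60 dB drop in sound pressure level, and that the decay is roughly linear in decibels; the function name and the sample readings are hypothetical.

```python
def estimate_reverberation_time(times_s, levels_db, drop_db: float = 60.0) -> float:
    """Estimate the time for the level to fall by drop_db from a measured decay.

    times_s   -- measurement times in seconds after the source stops
    levels_db -- sound pressure level in dB at each of those times
    """
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_l = sum(levels_db) / n
    # Least-squares slope of level versus time, in dB per second (negative for a decay).
    cov = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_s, levels_db))
    var = sum((t - mean_t) ** 2 for t in times_s)
    slope = cov / var
    return -drop_db / slope


if __name__ == "__main__":
    # Hypothetical readings: the level falls by 10 dB every 0.1 s.
    print(estimate_reverberation_time([0.0, 0.1, 0.2, 0.3], [90.0, 80.0, 70.0, 60.0]))
```

Fitting over only part of the decay and extrapolating is useful when background noise prevents observing the full drop.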

S
Sampling rate (sampling frequency)
The number of samples per time unit taken from a continuous signal to make a discrete
signal. For time-domain signals, it can be measured in hertz (Hz). The inverse of the
sampling frequency is the sampling period or sampling interval, which is the time
between samples.
Segment
In SIS, a chunk of data that forms a single entity and is not connected with other data.
For example, data read from a file or input from a sound card form one segment.
During recording via the sound card, all captured data form a single segment. Each new
segment in the data box is shown in a different color.
Signal distortion
Refers to any modification (losses or additions) in the sound signal affecting its
significant characteristics and worsening aural perception parameters (intelligibility,
naturalness, speaker recognizability, etc.). Typical distortions: limitation of the level of
weak and strong sounds, clipping, limitation of the frequency range, uneven gain across
frequencies, and frequency shift caused by heterodyning.

Signal energy
A root-mean-square signal value in a frame of a set width (in milliseconds), centered
on the current point in the signal.
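The definition can be sketched as a root-mean-square computed over a frame placed around the current sample. The function name, the rounding of the frame width to whole samples, and the clipping of the frame at the signal boundaries are assumptions made for illustration; the manual does not describe how SIS handles these details.

```python
import math


def frame_rms(samples, center: int, frame_ms: float, sample_rate_hz: int) -> float:
    """RMS value of the frame of width frame_ms centered on index `center`."""
    half = int(round(frame_ms / 1000 * sample_rate_hz / 2))
    # Clip the frame to the available data near the start and end of the signal.
    start = max(0, center - half)
    stop = min(len(samples), center + half + 1)
    frame = samples[start:stop]
    return math.sqrt(sum(s * s for s in frame) / len(frame))


if __name__ == "__main__":
    signal = [1.0, -1.0] * 50
    print(frame_rms(signal, center=50, frame_ms=2.0, sample_rate_hz=8000))
```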
Signal/sound power
A quantity defined as the energy transported by the sound wave through a given area per
unit time. The time-averaged power per unit area is called sound intensity.
Signal-to-noise ratio (SNR)
A measurement defined as the ratio in decibels of the average (or other normalized)
level of a useful signal to the average (or other normalized) level of background noise.
An audio component with a high signal-to-noise ratio (>30 dB) has relatively little
background noise accompanying the signal; a component with a low signal-to-noise
ratio (<10 dB) is noisy.
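A minimal sketch of the calculation, assuming the "average level" is taken as the root-mean-square value and that separate signal-only and noise-only sample sequences are available. The function names are hypothetical.

```python
import math


def rms(samples) -> float:
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal_samples, noise_samples) -> float:
    """Signal-to-noise ratio in decibels, computed from RMS amplitudes."""
    return 20 * math.log10(rms(signal_samples) / rms(noise_samples))


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    hiss = [0.1, -0.1, 0.1, -0.1]
    print(snr_db(speech, hiss))
```

The factor of 20 applies because the ratio is taken between amplitudes; for a ratio of powers the factor is 10.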
Sound
A mechanical oscillation travelling through elastic media or bodies (solids, liquids and
gases), composed of frequencies within the limits of human hearing (between about
17-20 Hz and 20 000 Hz). The human ear is most sensitive in the frequency range from
1 kHz to 5 kHz. A mechanical oscillation lower in frequency than 17 Hz is called
infrasound, while ultrasound is an oscillation with a frequency greater than the upper
limit of human hearing (20 000 Hz).
Sound intensity
A quantity defined as the time-average energy transported by the sound wave per unit
time through a unit area normal to the direction of wave propagation. Sound intensity is
measured by an intensity level, expressed in logarithmic units (decibels).
Sound spectrum
An acoustic representation of complex sound providing information about the frequency
of sound source, pitch harmonics and relative intensity of all its frequency components.
Speaker identification
The process of comparing the speech of an unknown speaker against a database of the
speech samples of known speakers to determine whether it matches any of the
templates or not, i.e. to identify the submitted unknown speaker with any of known
speakers.
Speaker identification characteristics
The stable individual characteristics of a speaker that are obtained from his speech:
appearance and speech characteristics, as well as subjective auditory estimation of a
speaker.
Speaker recognition
A generalized term including identification, verification and speaker separation.

Speaker verification
A procedure of checking whether the speaker whose speech is analyzed is the person
he or she claims to be (e.g. by entering a specific PIN code). Verification itself is one of
the pattern recognition problems, in which it is necessary to accept or reject a hypothesis
of identity of the two given classes (patterns).
Spectral range
The frequency range of the spectrum within which the given event or object (e.g. the
signal) is considered. It is commonly specified by the upper and lower frequency bounds
in Hz. For example, the spectral range of the standard telephone channel is 300-3400 Hz.
Spectrogram
A graphic representation of the results of spectral analysis of sound vibrations.
Spectro-temporal analysis of speech recording
The instrumental method of speech signal analysis used to establish dependences
between the frequency and peak characteristics of speech spectrum and the duration of
the speech process. Spectro-temporal analysis provides the most complete
representation of speech in the form of a continuously changing spectrum of sound
vibrations produced by the resonator parameters of the vocal tract constantly varying in
the time domain.
Spectrum (frequency, harmonic, Fourier spectrum)
A parametric representation of a signal as a set of coefficients of its decomposition into
sinusoids with fixed frequencies, called harmonics or Fourier functions. In most devices
and algorithms that process or use the speech signal for recognition, synthesis,
compression and cleaning, the signal is first converted from the time domain to a
spectral representation (the spectrum is calculated) by means of the Fourier transform
or its implementation optimized for computer calculations, the fast Fourier transform
(FFT). Spectral analysis is the procedure of obtaining the spectrum; it is carried out for a
sequence of signal points, called an analysis frame or a signal sample. From the
computational standpoint, it is important how many points this procedure uses in one
analysis frame. A typical number of points is 256. In the graphic representation of the
spectrum, the distance in Hz between adjacent points of the spectrum is called spectral
resolution. For each spectral component, the width of the spectrum analysis band is
called spectral width. Spectral analysis of a speech signal is the decomposition of the
signal into resonant frequencies with certain amplitudes by means of a set of filters
(filter bank) or fast Fourier transform algorithms.
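The relation between the number of points in the analysis frame and the spacing of the spectrum points can be illustrated as follows. This sketch assumes the standard discrete Fourier transform relation, in which N points sampled at rate fs give spectrum points spaced fs / N apart. It uses a direct (slow) transform for clarity rather than an FFT, and the function names and the 8000 Hz sampling rate are hypothetical choices, not SIS settings.

```python
import cmath
import math


def spectral_resolution_hz(sample_rate_hz: float, frame_points: int = 256) -> float:
    """Distance in Hz between adjacent spectrum points for a given frame size."""
    return sample_rate_hz / frame_points


def dft_magnitudes(frame):
    """Magnitudes of the spectrum points from 0 Hz up to half the sampling rate."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


if __name__ == "__main__":
    fs, n = 8000, 256
    tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(n)]
    mags = dft_magnitudes(tone)
    peak = max(range(len(mags)), key=mags.__getitem__)
    print(spectral_resolution_hz(fs, n), peak * spectral_resolution_hz(fs, n))
```

Doubling the number of points halves the spacing between spectrum points, at the cost of a longer frame and therefore coarser resolution in time.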
Speech intelligibility
A measure of sound clarity that indicates the ease of speech understanding. It is a
composite function of the speaker’s clarity of articulation, the properties of the room
acoustics, and the quality of the transmission channel and recording/playback
equipment. Quantitatively, intelligibility is defined as the ratio of the number of
phrases (words, syllables or sounds) accurately understood by a listener to their
total number. Accordingly, phrase, word, syllable and sound intelligibility are
distinguished.
Speech perception
Refers to the processes by which humans are able to interpret and understand the
speech sound signal. Speech perception includes the initial auditory analysis, acoustic
features extraction, as well as their phonetic, prosodic and semantic representation.
Speech sound
A minimum unit of speech flow resulting from human articulation activity. Speech sound
is characterized by specific acoustic and perceptive properties.
Speech tempo
The speed of pronouncing speech elements: sounds, syllables and words. It is
measured either by the number of sounds, syllables, etc. pronounced per unit of time,
or by their average length.

T
Temporary mark
A yellow vertical dashed line in the data box used for temporarily marking fragments of
data. There can be from 0 to 2 temporary marks in the box. If you try to set a third
mark, the first of the two existing marks disappears.
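The "at most two marks, oldest dropped first" behavior can be modeled with a bounded queue. This is only a conceptual model of the rule described above, not SIS code; the names and the use of time positions in seconds are assumptions.

```python
from collections import deque

# A queue limited to two entries: appending a third discards the oldest one.
temporary_marks = deque(maxlen=2)


def set_temporary_mark(position_s: float) -> list:
    """Add a mark at the given position and return the marks currently shown."""
    temporary_marks.append(position_s)
    return list(temporary_marks)


if __name__ == "__main__":
    set_temporary_mark(1.0)
    set_temporary_mark(2.0)
    print(set_temporary_mark(3.0))  # the mark at 1.0 s is no longer present
```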
Threshold of hearing
The minimum sound level that an average person can hear in a noiseless environment.
This point varies from person to person, but is generally reported as an RMS
(root-mean-square) sound pressure of 20 micropascals, or 2×10^-4 dynes per square
centimeter, at a frequency of 1 kHz.
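The 20 micropascal value is also the usual reference for expressing sound pressure level in decibels, so the threshold of hearing corresponds to 0 dB. A short sketch of that conversion and of the unit equivalence quoted above follows; the function names are hypothetical.

```python
import math

P_REF_PA = 20e-6  # reference RMS sound pressure: 20 micropascals


def sound_pressure_level_db(p_rms_pa: float) -> float:
    """Sound pressure level in dB relative to 20 micropascals."""
    return 20 * math.log10(p_rms_pa / P_REF_PA)


def pascals_to_dynes_per_cm2(p_pa: float) -> float:
    """Convert pressure from pascals to dynes per square centimeter (1 Pa = 10 dyn/cm^2)."""
    return p_pa * 10


if __name__ == "__main__":
    print(sound_pressure_level_db(P_REF_PA))   # level at the threshold of hearing
    print(pascals_to_dynes_per_cm2(P_REF_PA))  # the same pressure in dyn/cm^2
```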
Timbre
A subjective quality or color of a speech sound perceived as the impression of the
totality and ratio of the spectral component levels. The following timbre types are
distinguished: epiglottal timbre (also called sound timbre) – the sound quality
depending on location of different articulation organs, not participating in the voice
production, and acoustic processes pertaining to them; glottal timbre – the sound
quality determined by operation of organs involved in the voice production.
Transcription
(lat. transcriptio – rewriting)

The process of matching the sound units of human speech (segmental transcription) and
intonation units (suprasegmental transcription) to special written symbols, using a set of
exact rules, so that these sounds can be reproduced later. There are two types of
transcription: phonetic and phonemic. Phonetic transcription renders spoken speech in
writing, taking into account all its sound and intonation features. Commonly, special
symbols based on the Roman and Cyrillic alphabets, together with additional superscript
and interlinear diacritical marks, are used. For linguistic purposes, the IPA (International
Phonetic Alphabet) standard transcription scheme is frequently used.

V
Visible speech (sonogram, dynamic spectrogram)
A flat graphic representation of the speech signal in a 3-dimensional rectangular
system of coordinates: time, frequency and intensity (it shows how the spectrum of
complex sounds, including speech, changes with time). A spectrogram is a synchronous
graphical representation of a group of successively calculated instantaneous spectra.
Time is commonly plotted along the horizontal axis, while the vertical axis is used for
frequency (Hz). The intensity of every frequency component at the current time point is
shown in the image as shading of varying darkness or by means of a specially designed
colour scale. The visible speech representation describes almost uniquely the key
characteristics of the speech signal (its formant and harmonic components).
Vocal tract
The totality of articulation organs.

W
Waveform
Waveform of the speech signal is a graphic representation of the signal vibration
amplitude as a function of time. Waveforms can be obtained using signal processing
equipment: loop waveform viewers, signal level recorders and electronic waveform
viewers. Waveforms can be used to extract fragments of data for further research.
Window
In SIS, a rectangular area displayed on the main screen, together with the graphical
representation in it and the data contained only within this window.
White noise
A noise that contains equally represented sound vibrations of different frequencies,
that is, the signal contains equal power within a fixed bandwidth at any center
frequency (e.g. the noise of a waterfall). White noise draws its name from white light.
The amplitudes of a white noise signal at successive points in time are independent of
each other. To the ear, white noise sounds like a uniform hiss. It typically occurs as
tape noise, amplifier background noise, etc. Widely available hardware and software
generators of white noise are intended for testing audio equipment and signal
processing algorithms.
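A software generator of the kind mentioned can be sketched in a few lines. This example assumes Gaussian-distributed samples, which is one common choice; the function names are hypothetical. The correlation check illustrates the independence of successive samples.

```python
import random


def white_noise(n_samples: int, seed=None):
    """Generate n_samples of zero-mean, unit-variance Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def autocorrelation(samples, lag: int) -> float:
    """Normalized autocorrelation of the samples at the given lag."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    num = sum(centered[i] * centered[i + lag] for i in range(len(centered) - lag))
    den = sum(c * c for c in centered)
    return num / den


if __name__ == "__main__":
    noise = white_noise(10000, seed=1)
    # Values near zero at non-zero lags indicate uncorrelated successive samples.
    print(autocorrelation(noise, 1), autocorrelation(noise, 10))
```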
Wow (and flutter)
A signal distortion resulting from parasitic frequency modulation at frequencies within
the 0.2-200 Hz range, which is caused by irregular tape motion during recording or
playback. The “wow” effect is typical of analog tape recorders.

02-090609–7.0.1.
