Sphinx-4 Application Programmer's Guide - CMUSphinx Wiki
CMUSphinx
Open Source Toolkit For Speech Recognition Project by Carnegie Mellon University
sphinx4/src/apps/edu/cmu/sphinx/demo/helloworld/HelloWorld.java:

    package edu.cmu.sphinx.demo.helloworld;

    import edu.cmu.sphinx.frontend.util.Microphone;
    import edu.cmu.sphinx.recognizer.Recognizer;
    import edu.cmu.sphinx.result.Result;
    import edu.cmu.sphinx.util.props.ConfigurationManager;

    /**
     * A simple HelloWorld demo showing a simple speech application built using Sphinx-4. This application uses the Sphinx-4
     * endpointer, which automatically segments incoming audio into utterances and silences.
     */
    public class HelloWorld {

        public static void main(String[] args) {
            ConfigurationManager cm;

            if (args.length > 0) {
                cm = new ConfigurationManager(args[0]);
            } else {
                cm = new ConfigurationManager(HelloWorld.class.getResource("helloworld.config.xml"));
            }

            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
            recognizer.allocate();

            // start the microphone or exit the program if this is not possible
            Microphone microphone = (Microphone) cm.lookup("microphone");
            if (!microphone.startRecording()) {
                System.out.println("Cannot start microphone.");
                recognizer.deallocate();
                System.exit(1);
            }

            System.out.println("Say: (Good morning | Hello) ( Bhiksha | Evandro | Paul | Philip | Rita | Will )");

            // loop the recognition until the program exits.
            while (true) {
                System.out.println("Start speaking. Press Ctrl-C to quit.\n");

                Result result = recognizer.recognize();

                if (result != null) {
                    String resultText = result.getBestFinalResultNoFiller();
                    System.out.println("You said: " + resultText + '\n');
                } else {
                    System.out.println("I can't hear what you said.\n");
                }
            }
        }
    }
edu.cmu.sphinx.recognizer.Recognizer [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/recognizer/Recognizer.html]
edu.cmu.sphinx.result.Result [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/result/Result.html]
edu.cmu.sphinx.util.props.ConfigurationManager [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/util/props/ConfigurationManager.html]
cmusphinx.sourceforge.net/wiki/tutorialsphinx4
1/15
4/1/12
The Recognizer is the main class any application should interact with. The Result is returned by the Recognizer to the application after recognition completes. The ConfigurationManager creates the entire Sphinx-4 system according to the configuration specified by the user. Let's look at the main() method. The first few lines create the URL of the XML-based configuration file. A ConfigurationManager is then created using that URL. The ConfigurationManager then reads in the file internally. Since the configuration file specifies the components 'recognizer' and 'microphone' (we will look at the configuration file next), we perform a lookup() [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/util/props/ConfigurationManager.html#lookup(java] in the ConfigurationManager to obtain these components. The allocate() [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/recognizer/Recognizer.html#allocate()] method of the Recognizer is then called to allocate the resources needed for the recognizer.
The Microphone class is used for capturing live audio from the system audio device. Both the Recognizer and the Microphone are configured as specified in the configuration file. Once all the necessary components are created, we can start running the demo. The program first turns on the Microphone (microphone.startRecording() [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/util/Microphone.html#startRecording()]). After the microphone is turned on successfully, the program enters a loop that repeats the following: it tries to recognize what the user is saying, using the Recognizer.recognize() [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/recognizer/Recognizer.html#recognize()] method. Recognition stops when the user stops speaking, which is detected by the endpointer built into the front end by configuration. Once an utterance is recognized, the recognized text, which is returned by the method Result.getBestResultNoFiller() [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/result/Result.html#getBestResultNoFiller()], is printed out. If the Recognizer recognized nothing (i.e., result is null), then it will print out a message saying that. Finally, if the demo program cannot turn on the microphone in the first place, the Recognizer will be deallocated, and the program exits. It is generally a good practice to call the method deallocate() [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/recognizer/Recognizer.html#deallocate()] after the work is done to release all the resources. Note that several exceptions are thrown. These exceptions should be caught and handled appropriately. Hopefully, by this point, you will have some idea of how to write a simple Sphinx-4 application. We will now turn to the harder part, understanding the various components necessary to create a grammar-based recognizer. These components are specified in the configuration file, which we will now explain in depth.
Recognizer
The lines below define the recognizer component that performs speech recognition. They define the name and class of the recognizer, Recognizer. This is the class that any application should interact with. If you look at the javadoc of the Recognizer class, you will see that it has two properties, 'decoder' and 'monitors'. This configuration file is where the values of these properties are defined.

    <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
        <propertylist name="monitors">
            <item>accuracyTracker</item>
            <item>speedTracker</item>
            <item>memoryTracker</item>
        </propertylist>
    </component>
We will explain the monitors later. For now, let's look at the decoder.
Decoder
The 'decoder' property of the recognizer is set to the component called 'decoder', which is defined as:
    <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="searchManager"/>
    </component>
The decoder component is of class edu.cmu.sphinx.decoder.Decoder [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/Decoder.html]. Its property 'searchManager' is set to the component 'searchManager', defined as:

    <component name="searchManager"
               type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="flatLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>

The searchManager is of class edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/search/SimpleBreadthFirstSearchManager.h].
This class performs a simple breadth-first search through the search graph during the decoding process to find the best path. This search manager is suitable for small to medium sized vocabulary decoding.
The logMath property is the log math that is used for calculation of scores during the search process. It is defined as having the log base of 1.0001. Note that typically the same log base should be used throughout all components, and therefore there should only be one logMath definition in a configuration file:

    <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
        <property name="logBase" value="1.0001"/>
        <property name="useAddTable" value="true"/>
    </component>
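To make the role of the log base concrete, here is a minimal Java sketch of log-domain arithmetic with base 1.0001. It does not use the Sphinx-4 LogMath class: the class and method names (LogBaseSketch, linearToLog, logToLinear) are hypothetical and only illustrate why every component has to agree on one base before scores can be added together.

```java
/** Hypothetical helper, not part of Sphinx-4: log-domain scores with a fixed base. */
final class LogBaseSketch {
    private final double naturalLogOfBase;

    LogBaseSketch(double logBase) {
        this.naturalLogOfBase = Math.log(logBase);
    }

    /** Converts a linear probability into the log domain of this base. */
    double linearToLog(double probability) {
        return Math.log(probability) / naturalLogOfBase;
    }

    /** Converts a log-domain score back into a linear probability. */
    double logToLinear(double logScore) {
        return Math.exp(logScore * naturalLogOfBase);
    }

    public static void main(String[] args) {
        LogBaseSketch math = new LogBaseSketch(1.0001);
        double a = math.linearToLog(0.5);
        double b = math.linearToLog(0.25);
        // Multiplying probabilities becomes adding log scores.
        System.out.println(math.logToLinear(a + b)); // approximately 0.125
    }
}
```

Two scores computed with different bases cannot be added meaningfully, which is the practical reason for keeping a single logMath definition.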
The linguist of the searchManager is set to the component 'flatLinguist' (which we will look at later), which again is suitable for small to medium sized vocabulary decoding. The pruner is set to the 'trivialPruner':

    <component name="trivialPruner" type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>
which is of class edu.cmu.sphinx.decoder.pruner.SimplePruner [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/pruner/SimplePruner.html]. This pruner performs simple absolute beam and relative beam pruning based on the scores of the tokens. The scorer of the searchManager is set to the component 'threadedScorer', which is of class edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/scorer/ThreadedAcousticScorer.html].
It can use multiple threads (usually one per CPU) to score the tokens in the active list. Scoring is one of the most time-consuming steps of the decoding process. Tokens can be scored independently of each other, so using multiple CPUs will speed things up. The threadedScorer is defined as follows:

    <component name="threadedScorer"
               type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
        <property name="frontend" value="${frontend}"/>
        <property name="isCpuRelative" value="true"/>
        <property name="numThreads" value="0"/>
        <property name="minScoreablesPerThread" value="10"/>
        <property name="scoreablesKeepFeature" value="true"/>
    </component>
The 'frontend' property is the front end from which features are obtained. For details about the other properties of the threadedScorer, please refer to the javadoc for ThreadedAcousticScorer [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/scorer/ThreadedAcousticScorer.html].
Finally, the activeListFactory property of the searchManager is set to the component 'activeList', which is defined as follows:

    <component name="activeList"
               type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>
It is of class edu.cmu.sphinx.decoder.search.PartitionActiveListFactory [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/search/PartitionActiveListFactory.html]. It uses a partitioning algorithm to select the top N highest scoring tokens when performing absolute beam pruning. The 'logMath' property specifies the logMath used for score calculation, which is the same LogMath used in the searchManager. The property 'absoluteBeamWidth' is set to the value given at the very top of the configuration file using ${absoluteBeamWidth}. The same holds for ${relativeBeamWidth}.
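The two kinds of beam can be illustrated with a small Java sketch. It is an assumption-laden illustration rather than the Sphinx-4 code: BeamPruningSketch and prune are hypothetical names, the relative beam is passed directly as a log-domain distance below the best score, and a full sort stands in for the partitioning algorithm that the real factory uses to find the top N tokens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch, not the Sphinx-4 implementation: beam pruning over log-domain token scores. */
final class BeamPruningSketch {

    /**
     * Keeps at most absoluteBeamWidth of the best scores (absolute beam), then drops
     * every score that lies more than logRelativeBeam below the best one (relative beam).
     */
    static List<Double> prune(List<Double> scores, int absoluteBeamWidth, double logRelativeBeam) {
        List<Double> sorted = new ArrayList<>(scores);
        sorted.sort(Collections.reverseOrder()); // best (highest) score first
        if (sorted.isEmpty()) {
            return sorted;
        }
        int limit = Math.min(absoluteBeamWidth, sorted.size());
        double threshold = sorted.get(0) - logRelativeBeam;
        List<Double> kept = new ArrayList<>();
        for (int i = 0; i < limit; i++) {
            if (sorted.get(i) >= threshold) {
                kept.add(sorted.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Double> scores = List.of(-10.0, -12.0, -50.0, -11.0, -300.0);
        // Absolute beam of 3 tokens, relative beam of 5 log units below the best score.
        System.out.println(prune(scores, 3, 5.0));
    }
}
```

A wider beam keeps more tokens alive, which tends to improve accuracy at the cost of speed; a narrower beam does the opposite.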
Linguist
Now let's look at the flatLinguist component (a component inside the searchManager). The linguist is the component that generates the search graph using the guidance from the grammar, and knowledge from the dictionary, acoustic model, and language model.

    <component name="flatLinguist"
               type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
        <property name="logMath" value="logMath"/>
        <property name="grammar" value="jsgfGrammar"/>
        <property name="acousticModel" value="wsj"/>
        <property name="wordInsertionProbability"
                  value="${wordInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
    </component>
It also uses the logMath that we've seen already. The grammar used is the component called 'jsgfGrammar', which is a BNF-style grammar:

    <component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
        <property name="grammarLocation" value="resource:/demo/sphinx/helloworld/"/>
        <property name="dictionary" value="dictionary"/>
        <property name="grammarName" value="hello"/>
        <property name="logMath" value="logMath"/>
    </component>
JSGF grammars are defined in JSAPI [http://java.sun.com/products/java-media/speech/]. The class that translates JSGF into a form that Sphinx-4 understands is edu.cmu.sphinx.jsapi.JSGFGrammar [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/jsgf/JSGFGrammar.html]. (Note that this link to the javadoc also describes the limitations of the current implementation.) The property 'grammarLocation' can take two kinds of values. If it is a URL, it specifies the URL of the directory where JSGF grammar files are to be found. Otherwise, it is interpreted as a resource locator. In our example, the HelloWorld demo is being deployed as a JAR file. The 'grammarLocation' property is therefore used to specify the location of the resource hello.gram [http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/helloworld/hello.gram] within the JAR file. Note that it is not necessary to specify the JAR file within which to search. The 'grammarName' property specifies the grammar to use when creating the search graph. 'logMath' is the same log math as the other components. The 'dictionary' is the component that maps words to their phonemes. It is almost always the dictionary of the acoustic model, which lists all the words that were used to train the acoustic model:
    <component name="dictionary"
               type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
                  value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
        <property name="fillerPath"
                  value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
        <property name="addSilEndingPronunciation" value="false"/>
        <property name="wordReplacement" value="&lt;sil&gt;"/>
    </component>
The locations of these dictionary files are specified using the Sphinx-4 resource mechanism. The dictionary for filler words like BREATH and LIP_SMACK is the file fillerdict. For details about the other possible properties, please refer to the javadoc for FastDictionary [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/linguist/dictionary/FastDictionary.html].
Acoustic Model
The next important property of the flatLinguist is the acoustic model which describes sounds of the language. It is defined as:

    <component name="dictionary"
               type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
                  value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
        <property name="fillerPath"
                  value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
        <property name="addSilEndingPronunciation" value="false"/>
        <property name="wordReplacement" value="&lt;sil&gt;"/>
    </component>
'wsj' stands for the Wall Street Journal acoustic models. Sphinx-4 can load acoustic models trained by Sphinxtrain. Common models are packed into JAR files during the build and located in the lib folder. The Sphinx3Loader class [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/linguist/acoustic/tiedstate/Sphinx3Loader.html] is used to load them. The JAR needs to be included in the classpath. The JAR file for the WSJ models is called WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar, and is in the sphinx4/lib directory. As a programmer, all you need to do is to specify the class of the AcousticModel, and the loader of the AcousticModel, as shown above (note that if you are using the WSJ model in other applications, these lines should be the same, except that you might have called your 'logMath' component something else). The acoustic model could also be located in the filesystem or on any other resource. In that case you need to specify the model location in the 'location' property.

The next properties of the flatLinguist are the 'wordInsertionProbability' and 'languageWeight'. These properties are usually for fine tuning the system. Below are the default values we used for the various tasks. You can tune your system accordingly:
    Vocabulary Size                Word Insertion Probability    Language Weight
    Digits (11 words - TIDIGITS)   1E-36                         8
    Small (80 words - AN4)         1E-26                         7
    Medium (1000 words - RM1)      1E-10                         7
    Large (64000 words - HUB4)     0.2                           10.5
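As a rough illustration of what these two knobs do, the Java sketch below combines them into a per-word score the way many decoders do: the language weight scales the language model log probability, and the word insertion probability adds a fixed log penalty for every word entered. This formula is a common textbook formulation and an assumption here, not a copy of the Sphinx-4 internals; WordScoreSketch and wordTransitionScore are hypothetical names, and natural logarithms are used for simplicity.

```java
/** Hypothetical sketch of how a language weight and a word insertion probability shape a path score. */
final class WordScoreSketch {

    /** Combines log-domain scores for one word transition (natural logs, for simplicity). */
    static double wordTransitionScore(double acousticLogScore,
                                      double languageLogProbability,
                                      double languageWeight,
                                      double wordInsertionProbability) {
        return acousticLogScore
                + languageWeight * languageLogProbability
                + Math.log(wordInsertionProbability);
    }

    public static void main(String[] args) {
        // Same acoustic and language model evidence, scored with the digits and large-vocabulary settings.
        double digits = wordTransitionScore(-100.0, Math.log(0.1), 8.0, 1e-36);
        double large = wordTransitionScore(-100.0, Math.log(0.1), 10.5, 0.2);
        System.out.println(digits + " vs " + large);
    }
}
```

A very small word insertion probability, as in the digits row, penalizes each additional word heavily and so discourages the decoder from inserting spurious short words.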
Front End
The last big piece in the configuration file is the front end. There are two different front ends listed in the configuration file: 'frontend' and 'epFrontEnd'. The 'frontend' is good for batch mode decoding (or decoding without endpointing), while 'epFrontEnd' is good for live mode decoding with endpointing. Note that you can also perform live mode decoding with the 'frontend' (i.e., without endpointing), but that you need to explicitly signal the start and end of speech (e.g., by asking the user to explicitly turn on/off the microphone). The definitions for these front ends are:
    <!-- ******************************************************** -->
    <!-- The frontend configuration                               -->
    <!-- ******************************************************** -->
    <component name="frontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>microphone</item>
            <item>premphasizer</item>
            <item>windower</item>
            <item>fft</item>
            <item>melFilterBank</item>
            <item>dct</item>
            <item>liveCMN</item>
            <item>featureExtraction</item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The live frontend configuration                          -->
    <!-- ******************************************************** -->
    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>microphone</item>
            <item>speechClassifier</item>
            <item>speechMarker</item>
            <item>nonSpeechDataFilter</item>
            <item>premphasizer</item>
            <item>windower</item>
            <item>fft</item>
            <item>melFilterBank</item>
            <item>dct</item>
            <item>liveCMN</item>
            <item>featureExtraction</item>
        </propertylist>
    </component>
As you might notice, the only difference between these two front ends is that the live front end (epFrontEnd) has the additional components speechClassifier, speechMarker and nonSpeechDataFilter. These three components make up the default endpointer of Sphinx-4. Below is a listing of all the components of both front ends, and those properties which have values different from the default:
    <component name="speechClassifier"
               type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
        <property name="threshold" value="13"/>
    </component>

    <component name="nonSpeechDataFilter"
               type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

    <component name="speechMarker"
               type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker">
        <property name="speechTrailer" value="50"/>
    </component>

    <component name="premphasizer"
               type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

    <component name="windower"
               type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
    </component>

    <component name="fft"
               type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform"/>

    <component name="melFilterBank"
               type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    </component>

    <component name="dct"
               type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

    <component name="batchCMN"
               type="edu.cmu.sphinx.frontend.feature.BatchCMN"/>

    <component name="liveCMN"
               type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

    <component name="featureExtraction"
               type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

    <component name="microphone"
               type="edu.cmu.sphinx.frontend.util.Microphone">
        <property name="msecPerRead" value="10"/>
        <property name="closeBetweenUtterances" value="false"/>
    </component>
Let's explain some of the properties set here that have values different from the default. The property 'threshold [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/endpoint/SpeechClassifier.html#PROP_THRESHOLD]' of the SpeechClassifier specifies the minimum difference between the input signal level and the background signal level in order that the input signal is classified as speech. Therefore, the smaller this number, the more sensitive the endpointer, and vice versa. The property 'speechTrailer [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/endpoint/SpeechMarker.html#PROP_SPEECH_TRAILER]' of the SpeechMarker specifies the length of non-speech signal to be included after the end of speech to make sure that no speech signal is lost. Here, it is set at 50 milliseconds.
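The effect of the threshold can be sketched in a few lines of Java. This is a simplified illustration, not the SpeechClassifier implementation: the class ThresholdClassifierSketch, the way the background level is tracked (a slow running average over non-speech frames), and the sample levels are all assumptions made for the example.

```java
/** Hypothetical sketch, not the Sphinx-4 SpeechClassifier: threshold-based speech detection. */
final class ThresholdClassifierSketch {
    private final double threshold;
    private double backgroundLevel;

    ThresholdClassifierSketch(double threshold, double initialBackgroundLevel) {
        this.threshold = threshold;
        this.backgroundLevel = initialBackgroundLevel;
    }

    /** Returns true when the frame level exceeds the background level by more than the threshold. */
    boolean isSpeech(double frameLevel) {
        boolean speech = frameLevel - backgroundLevel > threshold;
        if (!speech) {
            // Track the background level slowly, using non-speech frames only.
            backgroundLevel = 0.9 * backgroundLevel + 0.1 * frameLevel;
        }
        return speech;
    }

    public static void main(String[] args) {
        ThresholdClassifierSketch classifier = new ThresholdClassifierSketch(13.0, 30.0);
        System.out.println(classifier.isSpeech(35.0)); // only 5 above background: not speech
        System.out.println(classifier.isSpeech(60.0)); // far above background: speech
    }
}
```

With a smaller threshold the first frame in this example would already count as speech, which is what "more sensitive" means in the paragraph above.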
The property 'msecPerRead [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/util/Microphone.html#PROP_MSEC_PER_READ]' of the Microphone specifies the number of milliseconds of data to read at a time from the system audio device. The value specified here is 10ms. The property 'closeBetweenUtterances [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/util/Microphone.html#PROP_CLOSE_BETWEEN_UTTERANCES]' specifies whether the system audio device should be released between utterances. It is set to false here, meaning that the system audio device will not be released between utterances. It is set this way because on certain systems (Linux for one), closing and reopening the audio device does not work well.
Instrumentation
Finally, we will explain the various monitors which make up the instrumentation package. These monitors are components of the recognizer (see above). They are responsible for tracking the accuracy, speed and memory usage of Sphinx-4.

    <component name="accuracyTracker"
               type="edu.cmu.sphinx.instrumentation.BestPathAccuracyTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showAlignedResults" value="false"/>
        <property name="showRawResults" value="false"/>
    </component>

    <component name="memoryTracker"
               type="edu.cmu.sphinx.instrumentation.MemoryTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showSummary" value="false"/>
        <property name="showDetails" value="false"/>
    </component>

    <component name="speedTracker"
               type="edu.cmu.sphinx.instrumentation.SpeedTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="frontend" value="${frontend}"/>
        <property name="showSummary" value="true"/>
        <property name="showDetails" value="false"/>
    </component>
The various knobs of these monitors mainly control whether statistical information about accuracy, speed and memory usage should be printed out. Moreover, the monitors monitor the behavior of a recognizer, so they need a reference to the recognizer that they are monitoring.
The above lines define frequently tuned properties. They are located at the top of the configuration file so that they can be edited quickly.
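Components pick up these top-level values through references such as ${absoluteBeamWidth}. The Java sketch below shows one way such references could be resolved against a map of global properties. It is only an illustration of the idea, not the ConfigurationManager code: GlobalPropertySketch and resolve are hypothetical names, and the value 500 is made up for the example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not the Sphinx-4 ConfigurationManager: resolving ${name} references. */
final class GlobalPropertySketch {
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces every ${name} in value with its entry in globals; unknown names are left untouched. */
    static String resolve(String value, Map<String, String> globals) {
        Matcher matcher = REFERENCE.matcher(value);
        StringBuilder resolved = new StringBuilder();
        while (matcher.find()) {
            String replacement = globals.getOrDefault(matcher.group(1), matcher.group());
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        // Illustrative value only; the real settings live at the top of the configuration file.
        Map<String, String> globals = Map.of("absoluteBeamWidth", "500");
        System.out.println(resolve("${absoluteBeamWidth}", globals));
    }
}
```

Keeping the tunable numbers in one place means a single edit changes every component that refers to them.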
Recognizer
    <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
        <propertylist name="monitors">
            <item>accuracyTracker</item>
            <item>speedTracker</item>
            <item>memoryTracker</item>
            <item>recognizerMonitor</item>
        </propertylist>
    </component>
The above lines define the recognizer component that performs speech recognition. They define the name and class of the recognizer. This is the class that any application should interact with. If you look at the javadoc of the Recognizer class [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/recognizer/Recognizer.html], you will see that it has two properties, 'decoder' and 'monitors'. This configuration file is where the values of these properties are defined.
Decoder
The 'decoder' property of the recognizer is set to the component called 'decoder':
The decoder component is defined to be of class edu.cmu.sphinx.decoder.Decoder. Its property 'searchManager' is set to the component 'wordPruningSearchManager':
    <component name="wordPruningSearchManager"
               type="edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="lexTreeLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListManager" value="activeListManager"/>
        <property name="growSkipInterval" value="0"/>
        <property name="checkStateOrder" value="false"/>
        <property name="buildWordLattice" value="false"/>
        <property name="acousticLookaheadFrames" value="1.7"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>
The searchManager is of class edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager. It is better suited than the SimpleBreadthFirstSearchManager for larger vocabulary recognition. This class also performs a simple breadth-first search through the search graph, but at each frame it also prunes the different types of states separately. The logMath property is the log math that is used for calculation of scores during the search process. It is defined as having the log base of 1.0001. Note that typically the same log base should be used throughout all components, and therefore there should only be one logMath definition:

    <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
        <property name="logBase" value="1.0001"/>
        <property name="useAddTable" value="true"/>
    </component>
The linguist of the searchManager is set to the component 'lexTreeLinguist' (which we will look at later), which again is suitable for large vocabulary recognition. The pruner is set to the 'trivialPruner':

    <component name="trivialPruner" type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>
which is of class edu.cmu.sphinx.decoder.pruner.SimplePruner. This pruner performs simple absolute beam and relative beam pruning based on the scores of the tokens. The scorer of the searchManager is set to the component 'threadedScorer', which is of class edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer. It can use multiple threads (usually one per CPU) to score the tokens in the active list. Scoring is one of the most time-consuming steps of the decoding process. Tokens can be scored independently of each other, so using multiple CPUs will speed things up. The threadedScorer is defined as follows:
    <component name="threadedScorer"
               type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
        <property name="frontend" value="${frontend}"/>
        <property name="isCpuRelative" value="true"/>
        <property name="numThreads" value="0"/>
        <property name="minScoreablesPerThread" value="10"/>
        <property name="scoreablesKeepFeature" value="true"/>
    </component>
The 'frontend' property is the front end from which features are obtained. For details about the other properties, please refer to the javadoc for ThreadedAcousticScorer [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/scorer/ThreadedAcousticScorer.html].
Finally, the 'activeListManager' property of the wordPruningSearchManager is set to the component 'activeListManager', which is defined as follows:

    <component name="activeListManager"
               type="edu.cmu.sphinx.decoder.search.SimpleActiveListManager">
        <propertylist name="activeListFactories">
            <item>standardActiveListFactory</item>
            <item>wordActiveListFactory</item>
            <item>wordActiveListFactory</item>
            <item>standardActiveListFactory</item>
            <item>standardActiveListFactory</item>
            <item>standardActiveListFactory</item>
        </propertylist>
    </component>

    <component name="standardActiveListFactory"
               type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

    <component name="wordActiveListFactory"
               type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteWordBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeWordBeamWidth}"/>
    </component>
The SimpleActiveListManager is of class edu.cmu.sphinx.decoder.search.SimpleActiveListManager [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/search/SimpleActiveListManager.html].
Since the word-pruning search manager performs pruning on different search state types separately, we need a different active list for each state type. Therefore, you see different active list factories being listed in the SimpleActiveListManager, one for each type. So how do we know which active list factory is for which state type? It depends on the 'search order' as returned by the search graph (which in this case is generated by the LexTreeLinguist). The search state order and active list factory used here are:
    State Type                    ActiveListFactory
    LexTreeNonEmittingHMMState    standardActiveListFactory
    LexTreeWordState              wordActiveListFactory
    LexTreeEndWordState           wordActiveListFactory
    LexTreeEndUnitState           standardActiveListFactory
    LexTreeUnitState              standardActiveListFactory
    LexTreeHMMState               standardActiveListFactory
There are two types of active list factories used here, the standard and the word. If you look at the 'frequently tuned properties' above, you will find that the word active list has a much smaller beam size than the standard active list. The beam size for the word active list is set by 'absoluteWordBeamWidth' and 'relativeWordBeamWidth', while the beam size for the standard active list is set by 'absoluteBeamWidth' and 'relativeBeamWidth'.
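The idea of choosing a beam by state order can be sketched as follows. This Java example is a simplified illustration, not the SimpleActiveListManager code: ActiveListOrderSketch and absoluteBeamFor are hypothetical names, the list simply mirrors the order of the activeListFactories entries in the configuration, and the beam values passed in main are made up.

```java
import java.util.List;

/** Hypothetical sketch: one active list factory per search-state order. */
final class ActiveListOrderSketch {
    enum Factory { STANDARD, WORD }

    /** Same order as the activeListFactories list: the index is the search state order. */
    static final List<Factory> FACTORY_BY_ORDER = List.of(
            Factory.STANDARD, Factory.WORD, Factory.WORD,
            Factory.STANDARD, Factory.STANDARD, Factory.STANDARD);

    /** Picks the absolute beam for a state from its order; beam values are supplied by the caller. */
    static int absoluteBeamFor(int stateOrder, int standardBeam, int wordBeam) {
        return FACTORY_BY_ORDER.get(stateOrder) == Factory.WORD ? wordBeam : standardBeam;
    }

    public static void main(String[] args) {
        // Illustrative beam sizes only: a wide standard beam and a much narrower word beam.
        for (int order = 0; order < FACTORY_BY_ORDER.size(); order++) {
            System.out.println(order + " -> " + absoluteBeamFor(order, 5000, 20));
        }
    }
}
```

Because word states use their own, smaller beam, far fewer word hypotheses survive each frame than unit or HMM state hypotheses.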
The SimpleActiveListManager allows us to control the beam size of different types of states.
Linguist
Let's look at the 'lexTreeLinguist' (a component inside the wordPruningSearchManager). The linguist is the component that generates the search graph using the guidance from the grammar, and knowledge from the dictionary, acoustic model, and language model.

    <component name="lexTreeLinguist"
               type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
        <property name="logMath" value="logMath"/>
        <property name="acousticModel" value="wsj"/>
        <property name="languageModel" value="trigramModel"/>
        <property name="dictionary" value="dictionary"/>
        <property name="addFillerWords" value="false"/>
        <property name="fillerInsertionProbability" value="1E-10"/>
        <property name="generateUnitStates" value="false"/>
        <property name="wantUnigramSmear" value="true"/>
        <property name="unigramSmearWeight" value="1"/>
        <property name="wordInsertionProbability"
                  value="${wordInsertionProbability}"/>
        <property name="silenceInsertionProbability"
                  value="${silenceInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
    </component>
For details about the LexTreeLinguist, please refer to the Javadocs of the LexTreeLinguist [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/linguist/lextree/LexTreeLinguist.html].
In general, the LexTreeLinguist is the one to use for large vocabulary speech recognition, and the FlatLinguist is the one to use for small vocabulary speech recognition. The LexTreeLinguist has a lot of properties that can be set, but the ones that must be set are the 'logMath', the 'acousticModel', the 'languageModel', and the 'dictionary'. These properties are the necessary sources of information for the LexTreeLinguist to build the search graph. The rest of the properties control the speed and accuracy of the linguist, and you can read more about them in the Javadocs of the LexTreeLinguist [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/linguist/lextree/LexTreeLinguist.html].
Acoustic Model
The 'acousticModel' is where the LexTreeLinguist obtains the HMM for the words or units. For the HelloNGram demo it's the same wsj model as for HelloDigits:
    <component name="wsj"
               type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
        <property name="loader" value="wsjLoader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="wsjLoader"
               type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
        <property name="location"
                  value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz"/>
        <property name="modelDefinition"
                  value="etc/WSJ_clean_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef"/>
        <property name="dataLocation" value="cd_continuous_8gau/"/>
    </component>
Language Model
The 'languageModel' component of the lexTreeLinguist is called the 'trigramModel', because it is a trigram language model. It is defined as follows:
    <component name="trigramModel"
               type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
        <property name="location"
                  value="resource:/edu/cmu/sphinx/demo/hellongram/hellongram.trigram.lm"/>
        <property name="logMath" value="logMath"/>
        <property name="dictionary" value="dictionary"/>
        <property name="maxDepth" value="3"/>
        <property name="unigramWeight" value=".7"/>
    </component>
The language model is generated by the CMU Statistical Language Modeling Toolkit. It is in text format, which can be loaded by the SimpleNGramModel [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/linguist/language/ngram/SimpleNGramModel.html] class. For this class, you also need to specify the dictionary that you are using, which is the same as the one used by the lexTreeLinguist. The same applies to 'logMath' (note that the same logMath component should be used throughout the system). The 'maxDepth' property is 3, since this is a trigram language model. The 'unigramWeight' should normally be set to 0.7.
Dictionary
The last important component of the LexTreeLinguist is the 'dictionary', which is defined as follows:
    <component name="dictionary"
               type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
                  value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
        <property name="fillerPath"
                  value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
        <property name="addSilEndingPronunciation" value="false"/>
        <property name="wordReplacement" value="&lt;sil&gt;"/>
    </component>
As you might realize, it is using the dictionary inside the JAR file of the Wall Street Journal acoustic model. The main dictionary for words is the WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d file inside the JAR file, and the dictionary for filler words like BREATH and LIP_SMACK is WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict. You can inspect the contents of a JAR file with the following command (assuming your JAR file is called myJar.jar):
    jar tvf myJar.jar
You can see the contents of the WSJ JAR file by:
    sphinx4> jar tvf lib/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar
    (entry paths only; the size and timestamp columns are not shown)
    META-INF/
    META-INF/MANIFEST.MF
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/cd_continuous_8gau/
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/etc/
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/README
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/cd_continuous_8gau/means
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/cd_continuous_8gau/mixture_weights
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/cd_continuous_8gau/transition_matrices
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/cd_continuous_8gau/variances
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/alpha.dict
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/digits.dict
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/etc/WSJ_clean_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/etc/WSJ_clean_13dCep_16k_40mel_130Hz_6800Hz.ci.mdef
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/etc/variables.def
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/license.terms
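The same inspection can be done from Java code. The following is a minimal sketch that uses only the standard java.util.jar API; the class name JarLister and its method are hypothetical names chosen for this illustration and are not part of Sphinx-4.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Sphinx-4: lists the entries of a JAR file.
public class JarLister {

    /** Returns the entry names of the given JAR file in the order the archive reports them. */
    public static List<String> listEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        // try-with-resources closes the JAR file even if reading fails
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a JAR file, for example the WSJ acoustic model JAR
        for (String name : listEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```

Running it with the path of the WSJ JAR file as the only argument prints one entry path per line, which is handy when you need to check the exact path to use in a 'dictionaryPath' or 'fillerPath' property.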
The locations of the dictionary files within the JAR file are specified using the Sphinx-4 resource mechanism. In short, this mechanism searches the available JAR files for the specified resource path. The general syntax is:
    resource:/{location in the JAR file of the desired resource}
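Take the 'dictionaryPath' property, for example. The location in the JAR file is WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d, so the property value is resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d.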
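To make the resource syntax concrete, here is a minimal sketch of how such a location could be turned into a classpath lookup with the standard Java API. It is an illustration only: the class ResourceLocator and its methods are hypothetical, and the code is not taken from Sphinx-4, whose own resolution logic may differ.

```java
import java.net.URL;

// Hypothetical helper, not part of Sphinx-4: resolves "resource:/..." locations.
public class ResourceLocator {

    private static final String PREFIX = "resource:/";

    /** Strips the "resource:/" prefix and returns the path inside the JAR file. */
    public static String pathInJar(String location) {
        if (!location.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a resource location: " + location);
        }
        return location.substring(PREFIX.length());
    }

    /** Looks the path up on the classpath; returns null if no JAR or directory contains it. */
    public static URL resolve(String location) {
        return ResourceLocator.class.getClassLoader().getResource(pathInJar(location));
    }

    public static void main(String[] args) {
        String location =
            "resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d";
        System.out.println("Path in JAR: " + pathInJar(location));
        // Prints a URL only if the WSJ model JAR is on the classpath, otherwise null
        System.out.println("Resolved to: " + resolve(location));
    }
}
```

The point of the sketch is that the part after resource:/ is simply the entry path inside the JAR file, so the JAR file itself never has to be named in the configuration.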
The rest of the configuration file, which includes the front end configuration and the configuration of the monitors, is the same as in the HelloWorld demo. Therefore, please refer to those sections for explanations. This concludes the walk-through of the simple HelloNGram example.
Configuration Management
With configuration management, the configuration is described by an XML file that is interpreted when the application initializes. It is described in detail in Sphinx-4 Configuration Management [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/util/props/doc-files/ConfigurationManagement.html]. Configuration management keeps the configuration and the code separate, so you can alter the configuration without touching the application code. Here is an example of the front end configuration described by XML:
    <!-- ******************************************************** -->
    <!-- The live frontend configuration                          -->
    <!-- ******************************************************** -->
    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>audioFileDataSource</item>
            <item>dataBlocker</item>
            <item>speechClassifier</item>
            <item>speechMarker</item>
            <item>nonSpeechDataFilter</item>
            <item>preemphasizer</item>
            <item>windower</item>
            <item>fft</item>
            <item>melFilterBank</item>
            <item>dct</item>
            <item>liveCMN</item>
            <item>featureExtraction</item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The frontend pipelines                                   -->
    <!-- ******************************************************** -->
    <component name="audioFileDataSource"
               type="edu.cmu.sphinx.frontend.util.AudioFileDataSource"/>
    <component name="dataBlocker"
               type="edu.cmu.sphinx.frontend.DataBlocker"/>
    <component name="speechClassifier"
               type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier"/>
    <component name="nonSpeechDataFilter"
               type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>
    <component name="speechMarker"
               type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker"/>
    <component name="preemphasizer"
               type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>
    <component name="windower"
               type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
    </component>
    <component name="fft"
               type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
    </component>
    <component name="melFilterBank"
               type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    </component>
    <component name="dct"
               type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>
    <component name="liveCMN"
               type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>
    <component name="featureExtraction"
               type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>
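The 'pipeline' property list is what fixes the order of the front end processors. The sketch below reads the item names back out of such a file with the standard Java DOM parser, to show that the order of the item elements is the order of the pipeline. It is not how Sphinx-4 itself loads the file; the class PipelineReader is a hypothetical name, and the config root element in the sample string is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Sphinx-4: extracts the items of a property list.
public class PipelineReader {

    /** Returns the item values of the named propertylist, in document order. */
    public static List<String> readItems(String xml, String listName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> items = new ArrayList<>();
        NodeList lists = doc.getElementsByTagName("propertylist");
        for (int i = 0; i < lists.getLength(); i++) {
            Element list = (Element) lists.item(i);
            if (!listName.equals(list.getAttribute("name"))) {
                continue; // a different property list, skip it
            }
            NodeList children = list.getElementsByTagName("item");
            for (int j = 0; j < children.getLength(); j++) {
                items.add(children.item(j).getTextContent().trim());
            }
        }
        return items;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample; the <config> root element is an assumption of this sketch
        String xml = "<config>"
                + "<component name=\"epFrontEnd\" type=\"edu.cmu.sphinx.frontend.FrontEnd\">"
                + "<propertylist name=\"pipeline\">"
                + "<item>audioFileDataSource</item>"
                + "<item>dataBlocker</item>"
                + "<item>fft</item>"
                + "</propertylist></component></config>";
        System.out.println(readItems(xml, "pipeline"));
    }
}
```

Each name returned here refers to a component element defined elsewhere in the same file, which is why reordering or removing a processor only requires editing the item list.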
Raw Configuration
The other configuration option is to call the constructors directly. This is referred to as raw configuration. Raw configuration is useful when the configuration is not easily described by a static XML structure, which happens in applications that require extremely complex or dynamic configuration. Raw configuration is also preferred when writing scripts, where it is not desirable to separate the configuration from the code. Here is an example of raw configuration:
    protected void initFrontEnd() {
        this.dataBlocker = new DataBlocker(
            10 // blockSizeMs
        );
        this.speechClassifier = new SpeechClassifier(
            10,    // frameLengthMs,
            0.003, // adjustment,
            10,    // threshold,
            0      // minSignal
        );
        this.speechMarker = new SpeechMarker(
            200, // startSpeechTime,
            500, // endSilenceTime,
            100, // speechLeader,
            50,  // speechLeaderFrames
            100  // speechTrailer
        );
        this.nonSpeechDataFilter = new NonSpeechDataFilter();
        this.premphasizer = new Preemphasizer(
            0.97 // preemphasisFactor
        );
        this.windower = new RaisedCosineWindower(
            0.46,    // double alpha
            25.625f, // windowSizeInMs
            10.0f    // windowShiftInMs
        );
        this.fft = new DiscreteFourierTransform(
            -1,   // numberFftPoints
            false // invert
        );
        this.melFilterBank = new MelFrequencyFilterBank(
            130.0,  // minFreq,
            6800.0, // maxFreq,
            40      // numberFilters
        );
        this.dct = new DiscreteCosineTransform(
            40, // numberMelFilters,
            13  // cepstrumSize
        );
        this.cmn = new LiveCMN(
            12.0, // initialMean,
            100,  // cmnWindow,
            160   // cmnShiftWindow
        );
        this.featureExtraction = new DeltasFeatureExtractor(
            3 // window
        );

        ArrayList pipeline = new ArrayList();
        pipeline.add(audioDataSource);
        pipeline.add(dataBlocker);
        pipeline.add(speechClassifier);
        pipeline.add(speechMarker);
        pipeline.add(nonSpeechDataFilter);
        pipeline.add(premphasizer);
        pipeline.add(windower);
        pipeline.add(fft);
        pipeline.add(melFilterBank);
        pipeline.add(dct);
        pipeline.add(cmn);
        pipeline.add(featureExtraction);

        this.frontend = new FrontEnd(pipeline);
    }
This example was taken from the RawTranscriber demo; see RawTranscriber.java, TranscriberConfiguration.java, and CommonConfiguration.java.
Use the Result.getBestResultNoFiller [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/result/Result.html#getBestResultNoFiller()] method to obtain a string of the best result that has no filler words like ++SMACK++. This method first attempts to return the best path that has reached the final state. If no paths have reached the final state, it returns the best path out of the paths that have not reached the final state. If you only want to return those paths that have reached the final state, call the Result.getBestFinalResultNoFiller [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/result/Result.html#getBestFinalResultNoFiller()] method instead. For example, the HelloWorld demo uses this method to avoid treating any partial sentence in the grammar as the result.
There are other methods in the Result object that can give you more information, e.g., the N-best results. You will also notice a number of methods that return Tokens. Tokens are objects along a search path that record where the search currently is and the various scores at that particular location. For example, the Token object has a getWord method that tells you which word the search is in. For details about the Token object, please refer to the javadoc for Token [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/search/Token.html]. For details about the Result object, please refer to the javadoc for Result [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/result/Result.html].
Writing Scripts
One of the big advantages of working in Java is the wealth of scripting options. These include Groovy, Ruby, Python, Clojure, and many other choices to suit every programming taste and philosophy. All of these languages have implementations that run on the JVM and can call Java code directly, so Sphinx4 can be scripted in any of them. While the XML configuration files can be used with scripting languages, it is generally more elegant and readable to call Java constructors directly. Compare the following front end setup from the Groovy, Python and Clojure examples to the XML and raw configurations described above.
Groovy Example
GroovyTranscriber.groovy
    // init audio data
    def audioSource = new AudioFileDataSource(3200, null)
    def audioURL = (args.length > 1) ?
        new File(args[0]).toURI().toURL() :
        new URL("file:" + root + "/src/apps/edu/cmu/sphinx/demo/transcriber/10001-90210-01803.wav")
    audioSource.setAudioFile(audioURL, null)

    // init front end
    def dataBlocker = new DataBlocker(
        10 // blockSizeMs
    )
    def speechClassifier = new SpeechClassifier(
        10,    // frameLengthMs,
        0.003, // adjustment,
        10,    // threshold,
        0      // minSignal
    )
    def speechMarker = new SpeechMarker(
        200, // startSpeechTime,
        500, // endSilenceTime,
        100, // speechLeader,
        50,  // speechLeaderFrames
        100  // speechTrailer
    )
    def nonSpeechDataFilter = new NonSpeechDataFilter()
    def premphasizer = new Preemphasizer(
        0.97 // preemphasisFactor
    )
    def windower = new RaisedCosineWindower(
        0.46,    // double alpha
        25.625f, // windowSizeInMs
        10.0f    // windowShiftInMs
    )
    def fft = new DiscreteFourierTransform(
        -1,   // numberFftPoints
        false // invert
    )
    def melFilterBank = new MelFrequencyFilterBank(
        130.0,  // minFreq,
        6800.0, // maxFreq,
        40      // numberFilters
    )
    def dct = new DiscreteCosineTransform(
        40, // numberMelFilters,
        13  // cepstrumSize
    )
    def cmn = new LiveCMN(
        12.0, // initialMean,
        100,  // cmnWindow,
        160   // cmnShiftWindow
    )
    def featureExtraction = new DeltasFeatureExtractor(
        3 // window
    )
    def pipeline = [
        audioSource,
        dataBlocker,
        speechClassifier,
        speechMarker,
        nonSpeechDataFilter,
        premphasizer,
        windower,
        fft,
        melFilterBank,
        dct,
        cmn,
        featureExtraction
    ]
    def frontend = new FrontEnd(pipeline)
Python Example
PythonTranscriber.py
    # init audio data
    audioSource = AudioFileDataSource(3200, None)
    audioURL = URL("file:" + root + "/src/apps/edu/cmu/sphinx/demo/transcriber/10001-90210-01803.wav")
    audioSource.setAudioFile(audioURL, None)

    # init front end
    dataBlocker = DataBlocker(
        10  # blockSizeMs
    )
    speechClassifier = SpeechClassifier(
        10,     # frameLengthMs,
        0.003,  # adjustment,
        10,     # threshold,
        0       # minSignal
    )
    speechMarker = SpeechMarker(
        200,  # startSpeechTime,
        500,  # endSilenceTime,
        100,  # speechLeader,
        50,   # speechLeaderFrames
        100   # speechTrailer
    )
    nonSpeechDataFilter = NonSpeechDataFilter()
    premphasizer = Preemphasizer(
        0.97  # preemphasisFactor
    )
    windower = RaisedCosineWindower(
        0.46,    # double alpha
        25.625,  # windowSizeInMs
        10.0     # windowShiftInMs
    )
    fft = DiscreteFourierTransform(
        -1,    # numberFftPoints
        False  # invert
    )
    melFilterBank = MelFrequencyFilterBank(
        130.0,   # minFreq,
        6800.0,  # maxFreq,
        40       # numberFilters
    )
    dct = DiscreteCosineTransform(
        40,  # numberMelFilters,
        13   # cepstrumSize
    )
    cmn = LiveCMN(
        12.0,  # initialMean,
        100,   # cmnWindow,
        160    # cmnShiftWindow
    )
    featureExtraction = DeltasFeatureExtractor(
        3  # window
    )
    pipeline = [
        audioSource,
        dataBlocker,
        speechClassifier,
        speechMarker,
        nonSpeechDataFilter,
        premphasizer,
        windower,
        fft,
        melFilterBank,
        dct,
        cmn,
        featureExtraction
    ]
    frontend = FrontEnd(pipeline)
Clojure Example
ClojureTranscriber.clj
    ; init audio data
    (def audioSource (new AudioFileDataSource 3200 nil))
    (def audioURL (new URL (str "file:" root "/src/apps/edu/cmu/sphinx/demo/transcriber/10001-90210-01803.wav")))
    (.setAudioFile audioSource audioURL nil)

    ; init front end
    (def dataBlocker (new DataBlocker
        10)) ; blockSizeMs
    (def speechClassifier (new SpeechClassifier
        10    ; frameLengthMs
        0.003 ; adjustment
        10    ; threshold
        0))   ; minSignal
    (def speechMarker (new SpeechMarker
        200   ; startSpeechTime
        500   ; endSilenceTime
        100   ; speechLeader
        50    ; speechLeaderFrames
        100)) ; speechTrailer
    (def nonSpeechDataFilter (new NonSpeechDataFilter))
    (def premphasizer (new Preemphasizer
        0.97)) ; preemphasisFactor
    (def windower (new RaisedCosineWindower
        0.46   ; double alpha
        25.625 ; windowSizeInMs
        10.0)) ; windowShiftInMs
    (def fft (new DiscreteFourierTransform
        -1      ; numberFftPoints
        false)) ; invert
    (def melFilterBank (new MelFrequencyFilterBank
        130.0  ; minFreq
        6800.0 ; maxFreq
        40))   ; numberFilters
    (def dct (new DiscreteCosineTransform
        40   ; numberMelFilters
        13)) ; cepstrumSize
    (def cmn (new LiveCMN
        12.0  ; initialMean
        100   ; cmnWindow
        160)) ; cmnShiftWindow
    (def featureExtraction (new DeltasFeatureExtractor
        3)) ; window
    (def pipeline [
        audioSource
        dataBlocker
        speechClassifier
        speechMarker
        nonSpeechDataFilter
        premphasizer
        windower
        fft
        melFilterBank
        dct
        cmn
        featureExtraction])
    (def frontend (new FrontEnd pipeline))
Additional Information
Non-wiki Sphinx-4 Documentation home page [http://cmusphinx.sourceforge.net/sphinx4/index.html]
Sphinx4 Configuration management [http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/util/props/doc-files/ConfigurationManagement.html]
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported [http://creativecommons.org/licenses/by-nc-sa/3.0/]