Various DNN-HMM architectures used in acoustic modeling with single-speaker and single-channel

Psutka, Josef; Vaněk, Jan; Pražák, Aleš

Title:	Various DNN-HMM architectures used in acoustic modeling with single-speaker and single-channel
Other Titles:	Různé architektury DNN-HMM používané v akustickém modelování s jedním mluvčím a jedním kanálem
Authors:	Psutka, Josef; Vaněk, Jan; Pražák, Aleš
Citation:	PSUTKA, J., VANĚK, J., PRAŽÁK, A. Various DNN-HMM architectures used in acoustic modeling with single-speaker and single-channel. In Statistical Language and Speech Processing, 9th International Conference, SLSP 2021, Cardiff, UK, November 23–25, 2021, Proceedings. Cham: Springer, 2021. pp. 85-96. ISBN: 978-3-030-89578-5, ISSN: 0302-9743
Issue Date:	2021
Publisher:	Springer
Document type:	conference paper (ConferenceObject)
URI:	2-s2.0-85118136456 http://hdl.handle.net/11025/47266
ISBN:	978-3-030-89578-5
ISSN:	0302-9743
Keywords:	speech recognition; acoustic modeling; HMM topology; Lattice-free MMI; single-speaker
Abstract:	This paper discusses several notable aspects of training a dedicated acoustic model for a single speaker with a constant acoustic background (acoustic channel). The LF-MMI method currently achieves the best results in many speech recognition tasks. A typical LF-MMI training procedure uses a special 1-state HMM topology with different pdfs on the self-loop and forward transitions. We discuss replacing this typical LF-MMI HMM with other HMM topologies (1-, 2- and 3-state topologies whose outputs are associated with states). We then discuss the advantages of biphone context modeling over triphone context and over the even simpler context-free monophone. We also examine how the amount of training data and the DNN context affect WER, all with regard to a dedicated acoustic model for one speaker and an almost constant acoustic channel.
Rights:	Full text is not accessible. © Springer
Appears in Collections:	Konferenční příspěvky / Conference Papers (KKY) OBD

Files in This Item:

File	Size	Format
Psutka2021_Chapter_VariousDNN-HMMArchitecturesUse.pdf	390.6 kB	Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/47266
