Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech

Lehečka, Jan; Švec, Jan; Pražák, Aleš; Psutka, Josef

Title:	Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech
Authors:	Lehečka, Jan; Švec, Jan; Pražák, Aleš; Psutka, Josef
Citation:	LEHEČKA, J., ŠVEC, J., PRAŽÁK, A., PSUTKA, J. Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. New York: Red Hook, 2022. pp. 1831-1835. ISBN: not stated, ISSN: 2308-457X
Issue Date:	2022
Publisher:	International Speech Communication Association
Document type:	conference paper (ConferenceObject)
URI:	2-s2.0-85139048808 http://hdl.handle.net/11025/51163
ISBN:	not stated
ISSN:	2308-457X
Keywords in different language:	speech recognition, audio transformers, Wav2Vec
Abstract in different language:	In this paper, we present our progress in pretraining Czech monolingual audio transformers from a large dataset containing more than 80 thousand hours of unlabeled speech, and subsequently fine-tuning the model on automatic speech recognition tasks using a combination of in-domain data and almost 6 thousand hours of out-of-domain transcribed speech. We are presenting a large palette of experiments with various fine-tuning setups evaluated on two public datasets (CommonVoice and VoxPopuli) and one extremely challenging dataset from the MALACH project. Our results show that monolingual Wav2Vec 2.0 models are robust ASR systems, which can take advantage of large labeled and unlabeled datasets and successfully compete with state-of-the-art LVCSR systems. Moreover, Wav2Vec models proved to be good zero-shot learners when no training data are available for the target ASR task.
Rights:	Full text is not accessible. © 2022 ISCA
Appears in Collections:	Articles (NTIS); Articles (KKY); OBD

Files in This Item:

File	Size	Format
Lehecka_Svec_Prazak_PsutkaJV-Exploring_Capabilties_Interspeech_2022.pdf	197,58 kB	Adobe PDF


Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/51163
