Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings

Švec, Jan; Šmídl, Luboš; Psutka, Josef; Pražák, Aleš

Full metadata record

DC field	Value	Language
dc.contributor.author	Švec, Jan
dc.contributor.author	Šmídl, Luboš
dc.contributor.author	Psutka, Josef
dc.contributor.author	Pražák, Aleš
dc.date.accessioned	2022-03-28T10:00:28Z	-
dc.date.available	2022-03-28T10:00:28Z	-
dc.date.issued	2021
dc.identifier.citation	ŠVEC, J., ŠMÍDL, L., PSUTKA, J., PRAŽÁK, A. Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Red Hook, NY: International Speech Communication Association, 2021. pp. 851-855. ISBN: 978-1-71383-690-2, ISSN: 2308-457X	cs
dc.identifier.isbn	978-1-71383-690-2
dc.identifier.issn	2308-457X
dc.identifier.uri	2-s2.0-85119207187
dc.identifier.uri	http://hdl.handle.net/11025/47251
dc.format	5 pp.	cs
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	International Speech Communication Association	en
dc.relation.ispartofseries	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	en
dc.rights	Full text is not accessible.	cs
dc.rights	© ISCA	en
dc.title	Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings	en
dc.type	conference paper	cs
dc.type	ConferenceObject	en
dc.rights.access	closedAccess	en
dc.type.version	publishedVersion	en
dc.description.abstract-translated	The paper describes a novel approach to Spoken Term Detection (STD) in large spoken archives using deep LSTM networks. The work builds on a previous approach that used Siamese neural networks for STD and extends it to directly localize a spoken term and estimate its relevance score. The phoneme confusion network generated by a phoneme recognizer is processed by a deep LSTM network, which projects each segment of the confusion network into an embedding space. The searched term is projected into the same embedding space using another deep LSTM network. The relevance score is then computed using a simple dot-product in the embedding space and calibrated using a sigmoid function to predict the probability of occurrence. The location of the searched term is then estimated from the sequence of output probabilities. The deep LSTM networks are trained in a self-supervised manner from paired recognition hypotheses on the word and phoneme levels. The method is experimentally evaluated on MALACH data in English and Czech.	en
dc.subject.translated	spoken term detection	en
dc.subject.translated	relevance-score estimation	en
dc.subject.translated	speech embeddings	en
dc.identifier.doi	10.21437/Interspeech.2021-1704
dc.type.status	Peer-reviewed	en
dc.identifier.obd	43933416
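dc.project.ID	VJ01010108/Robust Processing of Recordings for Operations and Security (Robustní zpracování nahrávek pro operativu a bezpečnost)	cs
dc.project.ID	90140/Large Research Infrastructure_(J) - e-INFRA CZ (Velká výzkumná infrastruktura_(J) - e-INFRA CZ)	cs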
Appears in collections:	Konferenční příspěvky / Conference Papers (KKY) OBD
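
The scoring step summarized in the abstract (dot-product in the embedding space, sigmoid calibration, localization from the sequence of probabilities) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the embeddings are toy vectors rather than LSTM outputs, the calibration parameters `scale` and `bias` are hypothetical, and locating a term by thresholding contiguous runs of probabilities is one plausible reading of "estimated from the sequence of output probabilities", not the authors' confirmed procedure.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping real-valued scores to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def relevance_probabilities(segment_emb, query_emb, scale=1.0, bias=0.0):
    """Dot-product of each segment embedding with the query embedding,
    calibrated by a sigmoid. `scale` and `bias` are hypothetical
    calibration parameters (assumption, not from the paper)."""
    logits = segment_emb @ query_emb  # shape: (num_segments,)
    return sigmoid(scale * logits + bias)


def locate_hits(probs, threshold=0.5):
    """Group consecutive segments with probability >= threshold into hits.
    Returns (first_segment, last_segment, max_probability) tuples.
    The thresholding rule is an assumption made for this sketch."""
    hits = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            hits.append((start, i - 1, float(probs[start:i].max())))
            start = None
    if start is not None:
        hits.append((start, len(probs) - 1, float(probs[start:].max())))
    return hits


# Toy data: four segment embeddings and one query embedding (2-D).
segment_emb = np.array([[-2.0, 0.5], [3.0, 0.0], [2.0, 1.0], [-1.0, 0.0]])
query_emb = np.array([1.0, 0.0])

probs = relevance_probabilities(segment_emb, query_emb)
hits = locate_hits(probs)
print(hits)
```

In this toy input, only the second and third segments score above the threshold, so they are merged into a single hit whose score is the highest probability in the run.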

Files attached to this record:

File	Size	Format
svec21_interspeech.pdf	307.53 kB	Adobe PDF

Use this identifier to cite or link to this record: http://hdl.handle.net/11025/47251

All records in DSpace are protected by copyright, all rights reserved.
