Structural metadata annotation of speech corpora: comparing broadcast news and broadcast conversations

Kolář, Jáchym; Švec, Jan

Full metadata record

DC field	Value	Language
dc.contributor.author	Kolář, Jáchym
dc.contributor.author	Švec, Jan
dc.date.accessioned	2016-01-06T07:50:06Z
dc.date.available	2016-01-06T07:50:06Z
dc.date.issued	2008
dc.identifier.citation	KOLÁŘ, Jáchym; ŠVEC, Jan. Structural metadata annotation of speech corpora: comparing broadcast news and broadcast conversations. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08): 28-29-30 May 2008. Marrakech: ELRA, 2008, p. [1-6]. ISBN 2-9517408-4-0.	en
dc.identifier.isbn	2-9517408-4-0
dc.identifier.uri	http://www.kky.zcu.cz/cs/publications/KolarJ_2008_StructuralMetadata
dc.identifier.uri	http://hdl.handle.net/11025/17111
dc.format	6 s.	cs
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	ELRA	en
dc.rights	© Jáchym Kolář - Jan Švec	cs
dc.subject	extrakce strukturálních metadat	cs
dc.subject	automatická konverze řeči	cs
dc.subject	řečový korpus	cs
dc.title	Structural metadata annotation of speech corpora: comparing broadcast news and broadcast conversations	en
dc.type	článek	cs
dc.type	article	en
dc.rights.access	openAccess	en
dc.type.version	publishedVersion	en
dc.description.abstract-translated	Structural metadata extraction (MDE) research aims to develop techniques for automatic conversion of raw speech recognition output to forms that are more useful to humans and to downstream automatic processes. It may be achieved by inserting boundaries of syntactic/semantic units into the flow of speech, labeling non-content words like filled pauses and discourse markers for optional removal, and identifying sections of disfluent speech. This paper compares two Czech MDE speech corpora – one in the domain of broadcast news and the other in the domain of broadcast conversations. A variety of statistics about fillers, edit disfluencies, and syntactic/semantic units are presented. Among many others, we report the statistics indicating that disfluent portions of speech show differences in the distribution of parts of speech (POS) of their word content in comparison with the overall POS distribution. The two Czech corpora are not only compared with each other, but also with available statistics relating to English MDE corpora of broadcast news and telephone conversations.	en
dc.subject.translated	structural metadata extraction	en
dc.subject.translated	automatic conversion of speech	en
dc.subject.translated	speech corpora	en
dc.type.status	Peer-reviewed	en
Appears in collections:	Články / Articles (KKY)

Files in this item:

File	Description	Size	Format
KolarJ_2008_StructuralMetadata.pdf	Full text	80.14 kB	Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/17111

All items in DSpace are protected by copyright, with all rights reserved.
