The Czech Broadcast Conversation Corpus

Kolář, Jáchym; Švec, Jan

Full metadata record

DC field	Value	Language
dc.contributor.author	Kolář, Jáchym
dc.contributor.author	Švec, Jan
dc.date.accessioned	2016-01-08T07:03:40Z
dc.date.available	2016-01-08T07:03:40Z
dc.date.issued	2009
dc.identifier.citation	KOLÁŘ, Jáchym; ŠVEC, Jan. The Czech Broadcast Conversation Corpus. In: Text, Speech and Dialogue. Berlin: Springer, 2009, pp. 101-108. (Lecture Notes in Computer Science; 5729). ISBN 978-3-642-04207-2.	en
dc.identifier.isbn	978-3-642-04207-2
dc.identifier.uri	http://www.kky.zcu.cz/cs/publications/JachymKolar_2009_TheCzechBroadcast
dc.identifier.uri	http://hdl.handle.net/11025/17175
dc.format	9 pages	en
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	Springer	en
dc.relation.ispartofseries	Lecture Notes in Computer Science; 5729	en
dc.rights	© Jáchym Kolář - Jan Švec	cs
dc.subject	rozhlasové zprávy (radio news)	cs
dc.subject	rozpoznávání řeči (speech recognition)	cs
dc.subject	lingvistická analýza (linguistic analysis)	cs
dc.title	The Czech Broadcast Conversation Corpus	en
dc.type	článek (article)	cs
dc.type	article	en
dc.rights.access	openAccess	en
dc.type.version	publishedVersion	en
dc.description.abstract-translated	This paper presents the final version of the Czech Broadcast Conversation Corpus, which will shortly be released by the Linguistic Data Consortium (LDC). The corpus contains 72 recordings of a radio discussion program, yielding about 33 hours of transcribed conversational speech from 128 speakers. The release includes not only verbatim transcripts and speaker information but also structural metadata (MDE) annotation, which involves labeling sentence-like unit boundaries, marking non-content words such as filled pauses and discourse markers, and annotating speech disfluencies. The MDE annotation is based on the LDC's annotation standard for English, with changes applied to accommodate phenomena specific to Czech. In addition to its importance for speech recognition, speaker diarization, and structural metadata extraction research, the corpus is also useful for linguistic analysis of conversational Czech.	en
dc.subject.translated	broadcast news	en
dc.subject.translated	speech recognition	en
dc.subject.translated	linguistic analysis	en
dc.type.status	Peer-reviewed	en
Appears in collections:	Články / Articles (KKY)

Files in this item:

File	Description	Size	Format
JachymKolar_2009_TheCzechBroadcast.pdf	Full text	179.85 kB	Adobe PDF

Use this identifier to cite or link to this item: http://hdl.handle.net/11025/17175

All items in DSpace are protected by copyright, with all rights reserved.