Full metadata record
DC Field | Value | Language
dc.contributor.author | Kolář, Jáchym
dc.contributor.author | Švec, Jan
dc.contributor.author | Psutka, Josef
dc.date.accessioned | 2016-01-06T09:07:40Z
dc.date.available | 2016-01-06T09:07:40Z
dc.date.issued | 2004
dc.identifier.citation | KOLÁŘ, Jáchym; ŠVEC, Jan; PSUTKA, Josef. Automatic punctuation annotation in Czech broadcast news speech. In: SPECOM 2004 Proceedings. St. Petersburg: Institute for Informatics and Automation of RAS (SPIIRAS), 2004, p. 319-325. ISBN 5-7452-0110-X. | en
dc.identifier.isbn | 5-7452-0110-X
dc.identifier.uri | http://www.kky.zcu.cz/cs/publications/KolarJ_2004_Automaticpunctuation
dc.identifier.uri | http://hdl.handle.net/11025/17116
dc.description.abstract | This paper deals with our initial experiments with automatic punctuation annotation in spoken Czech. We used two statistical models: a prosodic model and a language model. Two implementations of the prosodic model were tested: CART and MLP. A hidden-event N-gram model was used for language modeling. On reference transcripts, the combined model achieved an accuracy of 95.2% and an F-measure of 78.2%. | cs
dc.format | 7 p.
dc.format.mimetype | application/pdf
dc.language.iso | en | en
dc.publisher | SPIIRAS | en
dc.rights | © Jáchym Kolář - Jan Švec - Josef Psutka | cs
dc.subject | automatic punctuation | cs
dc.subject | prosody | cs
dc.subject | sentence boundaries | cs
dc.subject | broadcast news | cs
dc.subject | morphological tagging | cs
dc.title | Automatic punctuation annotation in Czech broadcast news speech | en
dc.title.alternative | Automatic punctuation annotation in Czech broadcast news speech recordings | cs
dc.type | article | cs
dc.type | article | en
dc.rights.access | openAccess | en
dc.type.version | publishedVersion | en
dc.description.abstract-translated | This paper reports our initial experiments with automatic punctuation annotation from speech, focusing on Czech broadcast news. We employed two statistical models: a prosodic model and a language model. The prosodic model expresses relationships between prosodic quantities (such as pitch, speaking rate, or loudness) and punctuation marks. We tested two implementations of this model: a decision tree and a multi-layer perceptron. Hidden-event N-gram models were employed for language modeling. Instead of using an ordinary word-based model, we replaced infrequent word forms with their morphological tags and trained a mixed model. Scores from both models can be combined; the model combining the language model with the decision tree yielded superior results. Testing on true words, we achieved a classification accuracy of 95.2% and an F-measure of 78.2%. | en
dc.subject.translated | automatic punctuation | en
dc.subject.translated | prosody | en
dc.subject.translated | sentence boundary | en
dc.subject.translated | broadcast news | en
dc.subject.translated | tag-based models | en
dc.type.status | Peer-reviewed | en
Appears in Collections: Články / Articles (KKY)

Files in This Item:
File | Description | Size | Format
KolarJ_2004_Automaticpunctuation.pdf | Full text | 94,95 kB | Adobe PDF
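The hidden-event N-gram idea described in the abstract, where punctuation marks are treated as hidden tokens interleaved with the word stream, can be illustrated with a short sketch. Everything below (the toy corpus, the add-alpha smoothing, and the greedy per-boundary decoder) is an illustrative assumption, not the authors' actual implementation, which would use real training data and full-sequence decoding:

```python
import math
from collections import defaultdict

# Toy corpus in which punctuation marks appear as explicit "hidden event"
# tokens interleaved with the words. Illustrative data only, not the
# authors' training setup.
train = ("the government approved the budget <period> "
         "as the agency reports <comma> taxes will rise <period>").split()

EVENTS = ("<comma>", "<period>")
VOCAB = set(train) | set(EVENTS)

# Bigram counts over the mixed stream of words and event tokens.
bigram = defaultdict(lambda: defaultdict(int))
unigram = defaultdict(int)
for a, b in zip(train, train[1:]):
    bigram[a][b] += 1
    unigram[a] += 1

ALPHA = 0.1  # smoothing constant, chosen for this toy example

def logp(a, b):
    """Add-alpha smoothed bigram log-probability log P(b | a)."""
    v = len(VOCAB) + 1  # +1 slot for unseen tokens
    return math.log((bigram[a][b] + ALPHA) / (unigram[a] + ALPHA * v))

def annotate(words):
    """Greedily insert the best-scoring event (or none) after each word.

    At each word boundary, the no-event path P(next | prev) competes with
    the event path P(event | prev) * P(next | event). A full system would
    decode the whole sequence with Viterbi or forward-backward instead of
    deciding each boundary independently.
    """
    out = [words[0]]
    for prev, nxt in zip(words, words[1:]):
        best_ev, best = None, logp(prev, nxt)
        for ev in EVENTS:
            score = logp(prev, ev) + logp(ev, nxt)
            if score > best:
                best_ev, best = ev, score
        if best_ev:
            out.append(best_ev)
        out.append(nxt)
    return out

result = annotate(
    "the government approved the budget as the agency reports "
    "taxes will rise".split())
# -> the government approved the budget <period> as the agency
#    reports <comma> taxes will rise
print(" ".join(result))
```

The paper's mixed model additionally replaces infrequent word forms with their morphological tags before counting N-grams, which combats data sparsity in an inflective language like Czech; in the sketch above that would amount to mapping rare tokens to tag strings before filling `bigram`.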

