Title: Dynamic threshold selection method for multi-label newspaper topic identification
Authors: Skorkovská, Lucie
Citation: SKORKOVSKÁ, Lucie. Dynamic threshold selection method for multi-label newspaper topic identification. In: International conference on image analysis and recognition. Berlin: Springer, 2013, p. 209-216. (Lecture notes in computer science; 8082). ISBN 978-3-642-40584-6.
Issue Date: 2013
Publisher: Springer
Document type: article
URI: http://www.kky.zcu.cz/cs/publications/SkorkovskaL_2013_DynamicThreshold
ISBN: 978-3-642-40584-6
Keywords: topic identification; multi-label text classification; language modelling; naive Bayes classification
Abstract: Multi-label classification is increasingly required in modern categorization systems, and it is especially important for identifying the topics of newspaper articles. This paper presents a method, based on general topic model normalisation, for finding a threshold that defines the boundary between the "correct" and the "incorrect" topics of a newspaper article. The proposed method is used to improve the topic identification algorithm that is part of a complex system for acquiring and storing large volumes of text data. The topic identification module uses a Naive Bayes classifier for the multiclass, multi-label classification problem and assigns to each article topics from a fairly extensive predefined hierarchy of about 450 topics and topic categories. The paper presents the results of experiments with the improved topic identification algorithm.
Rights: © Lucie Skorkovská
Appears in Collections: Články / Articles (NTIS)

Files in This Item:
File | Description | Size | Format
SkorkovskaL_2013_DynamicThreshold.pdf | Full text | 170.16 kB | Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/16982
