Title: Shlukovací metody v data miningu
Other Titles: Data mining with clustering
Authors: Klímek, Petr
Citation: E+M. Ekonomie a Management = Economics and Management. 2008, č. 2, s. 120-126.
Issue Date: 2008
Publisher: Technická univerzita v Liberci
Document type: article
URI: http://www.ekonomie-management.cz/download/1331826675_2e7a/11_klimek.pdf
ISSN: 1212-3609 (Print)
2336-5604 (Online)
Keywords: data mining;clustering;metoda nejbližšího souseda;dendrogram
Keywords in different language: data mining;clustering;nearest neighbour method;dendrogram
Abstract in different language: Data mining is a new discipline lying at the interface of statistics, database technology, pattern recognition, machine learning, and other areas. It is concerned with the secondary analysis of large databases in order to find previously unsuspected relationships which are of interest or value to the database owners. There are two keys to success in data mining. The first is coming up with a precise formulation of the problem you are trying to solve; a focused statement usually results in the best payoff. The second is using the right data. After choosing from the data available to you, or perhaps buying external data, you may need to transform and combine it in significant ways. New problems arise, partly as a consequence of the sheer size of the data sets involved, and partly because of issues of pattern matching. However, since statistics provides the intellectual glue underlying the effort, it is important for statisticians to become involved, and there are very real opportunities for them to make significant contributions. The first part of this paper gives the main definition of data mining and outlines the specific data mining tasks. Data mining was also discussed in previous issues of E+M. This article focuses on one method, clustering. Cluster analysis, which belongs to the unsupervised methods of data mining, is one way to gain knowledge from data. Its basic principles are described in the second part of the paper. The method is then demonstrated on two examples from the marketing field. The first example uses Statgraphics 5.0 Plus (www.statgraphics.com) to solve a clustering problem with the nearest neighbour algorithm and Euclidean distance; the second uses Statistica 6.0 Cz (from StatSoft, Inc., www.statsoft.com or www.statsoft.cz).
Building models, however, is only one step in knowledge discovery. It is vital to properly collect and prepare the data, and to check the models against the real world. The "best" model is often found after building models of several different types, or by trying different technologies or algorithms.
Rights: © Technická univerzita v Liberci
CC BY-NC 4.0
Appears in Collections: Číslo 2 (2008)

Files in This Item:
File            Description   Size        Format
11_klimek.pdf   Full text     124.46 kB   Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/17234