Return to search

Přístupy k shlukování funkčních dat / Approaches to Functional Data Clustering

Classification is a very common task in information processing and important problem in many sectors of science and industry. In the case of data measured as a function of a dependent variable such as time, the most used algorithms may not pattern each of the individual shapes properly, because they are interested only in the choiced measurements. For the reason, the presented paper focuses on the specific techniques that directly address the curve clustering problem and classifying new individuals. The main goal of this work is to develop alternative methodologies through the extension to various statistical approaches, consolidate already established algorithms, expose their modified forms fitted to demands of clustering issue and compare some efficient curve clustering methods thanks to reported extensive simulated data experiments. Last but not least is made, for the sake of executed experiments, comprehensive confrontation of effectual utility. Proposed clustering algorithms are based on two principles. Firstly, it is presumed that the set of trajectories may be probabilistic modelled as sequences of points generated from a finite mixture model consisting of regression components and hence the density-based clustering methods using the Maximum Likehood Estimation are investigated to recognize the most homogenous partitioning. Attention is paid to both the Maximum Likehood Approach, which assumes the cluster memberships to be some of the model parameters, and the probabilistic model with the iterative Expectation-Maximization algorithm, that assumes them to be random variables. To deal with the hidden data problem both Gaussian and less conventional gamma mixtures are comprehended with arranging for use in two dimensions. To cope with data with high variability within each subpopulation it is introduced two-level random effects regression mixture with the ability to let an individual vary from the template for its group. Secondly, it is taken advantage of well known K-Means algorithm applied to the estimated regression coefficients, though. The task of the optimal data fitting is devoted, because K-Means is not invariant to linear transformations. In order to overcome this problem it is suggested integrating clustering issue with the Markov Chain Monte Carlo approaches. What is more, this paper is concerned in functional discriminant analysis including linear and quadratic scores and their modified probabilistic forms by using random mixtures. Alike in K-Means it is shown how to apply Fisher's method of canonical scores to the regression coefficients. Experiments of simulated datasets are made that demonstrate the performance of all mentioned methods and enable to choose those with the most result and time efficiency. Considerable boon is the facture of new advisable application advances. Implementation is processed in Mathematica 4.0. Finally, the possibilities offered by the development of curve clustering algorithms in vast research areas of modern science are examined, like neurology, genome studies, speech and image recognition systems, and future investigation with incorporation with ubiquitous computing is not forbidden. Utility in economy is illustrated with executed application in claims analysis of some life insurance products. The goals of the thesis have been achieved.

Identiferoai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:77066
Date January 2007
CreatorsPešout, Pavel
ContributorsMarek, Luboš, Trešl, Jiří, Palát, Milan
PublisherVysoká škola ekonomická v Praze
Source SetsCzech ETDs
LanguageCzech
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/doctoralThesis
Rightsinfo:eu-repo/semantics/restrictedAccess

Page generated in 0.0021 seconds