1 |
Feature extraction via dependence structure optimization / Požymių išskyrimas optimizuojant priklausomumo struktūrą
Daniušis, Povilas, 01 October 2012
In many important real-world applications, the initial representation of the data is
inconvenient, or even prohibitive, for further analysis. For example, in image analysis,
text analysis, and computational genetics, high-dimensional, massive, structured,
incomplete, and noisy data sets are common. Feature extraction, the discovery of
informative features in raw data, is therefore one of the fundamental problems of
machine learning. Efficient feature extraction helps to understand the data and the
process that generates it, and reduces the cost of future measurements and data analysis.
Representing structured data as a compact set of informative numeric features makes it
possible to apply well-studied machine learning techniques instead of developing new ones.
The dissertation focuses on supervised and semi-supervised feature extraction methods
that optimize the dependence structure of the features. Dependence is measured with
the kernel estimator of the Hilbert-Schmidt norm of the covariance operator (the HSIC
measure). Two dependence structures are investigated: in the first, we seek features
that maximize the dependence on the dependent variable; in the second, we additionally
minimize the mutual dependence among the features. Linear and kernel formulations of
HBFE and HSCA are provided. Using the Laplacian regularization framework, we construct
semi-supervised variants of HBFE and HSCA.
The suggested algorithms were investigated experimentally using conventional and
multilabel classification data... / Many practically important machine learning problems require the ability to use high-dimensional, structured, nonlinear data. Image and text analysis, the analysis of social and business relations, and various bioinformatics problems are examples. Feature extraction is therefore often the first step of data analysis, and the success of the final result depends on it. The object of study of this dissertation is feature extraction algorithms based on the notion of dependence. The dependence considered is defined by the kernel estimator of the Hilbert-Schmidt norm of the covariance operator (the HSIC measure). The proposed HBFE and HSCA algorithms, which are based on this estimator, can work with data of any structure, are formulated in terms of eigenvectors (so standard packages can be used for the optimization), and apply not only to supervised but also to semi-supervised training samples. In the latter case, the modifications of HBFE and HSCA rely on Laplacian regularization. Experiments with classification and multilabel classification data show that the proposed algorithms improve classification performance compared with PCA or LDA.
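As a rough illustration of the dependence measure named in this abstract, the sketch below computes the widely used biased empirical HSIC estimate, trace(K H L H) / (n - 1)^2, where K and L are kernel Gram matrices of the two samples and H = I - (1/n) 1 1^T is the centering matrix. The Gaussian kernels, the bandwidths, the toy data, and all function names are assumptions made for this example. It is not the dissertation's HBFE or HSCA algorithm, only the kind of estimator such methods build on, and it uses plain Python lists to stay dependency-free.

```python
import math
import random


def gaussian_gram(points, sigma=1.0):
    """Gram matrix with entries exp(-||a - b||^2 / (2 * sigma^2))."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [[math.exp(-sqdist(a, b) / (2.0 * sigma ** 2)) for b in points]
            for a in points]


def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(xs, ys, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, sigma_x))
    l_gram = gaussian_gram(ys, sigma_y)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i]
                for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Toy data (assumed): y_dep depends nonlinearly on x, y_ind is drawn independently.
rng = random.Random(0)
x = [[rng.gauss(0, 1)] for _ in range(150)]
y_dep = [[xi[0] ** 2 + 0.1 * rng.gauss(0, 1)] for xi in x]
y_ind = [[rng.gauss(0, 1)] for _ in range(150)]

dependent_score = hsic(x, y_dep)
independent_score = hsic(x, y_ind)
print(dependent_score, independent_score)
```

On data like this the dependent pair should receive the clearly larger score. A feature extractor in the spirit of the abstract would search for a transformation of x that increases such a score with respect to the labels.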
|
2 |
Statistinė medicininių duomenų analizė taikant dimensijos mažinimo metodus / Statistical analysis of medical data using methods of dimension reduction
Šlepikaitė, Laura, 25 November 2010
This work describes several dimension reduction methods. Principal component analysis and singular value decomposition are introduced first. Non-negative matrix factorization is then examined in more detail, and algorithms based on multiplicative update rules are described. Next, algorithms are proposed for transforming data newly appended to the data matrix. All algorithms were implemented in the SAS statistical package. Data from patients with heart failure were used in the study. The results show that dimension reduction can be an effective tool for working with high-dimensional data sets. / This work deals with several dimensionality reduction techniques and their application to real medical problems. First, the classical dimension reduction methods, principal component analysis and singular value decomposition, are introduced. Non-negative matrix factorization (NMF) is then presented, together with algorithms for computing it. In addition, two ways of applying NMF-based dimensionality reduction to feature extraction followed by pattern recognition are presented. All algorithms were run using the SAS statistical package. Data from patients with heart failure were used. It was shown that dimensionality reduction can be an effective tool for multidimensional data analysis and classification problems.
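The multiplicative-update idea mentioned in this abstract can be sketched as follows. The thesis itself used SAS; Python appears here only for illustration. The sketch assumes the standard Frobenius-loss update rules, H <- H * (W^T V) / (W^T W H) and W <- W * (V H^T) / (W H H^T), and assumes that samples are stored as columns of V. The function project_new shows one common way to encode newly appended columns, by holding W fixed and updating only H; it is not necessarily one of the algorithms proposed in the thesis. All names and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def update_h(v, w, h, eps=1e-9):
    """One multiplicative update of H for V ~ W H; eps avoids division by zero."""
    wt = transpose(w)
    num = matmul(wt, v)
    den = matmul(matmul(wt, w), h)
    return [[h[i][j] * num[i][j] / (den[i][j] + eps)
             for j in range(len(h[0]))] for i in range(len(h))]


def update_w(v, w, h, eps=1e-9):
    """One multiplicative update of W for V ~ W H."""
    ht = transpose(h)
    num = matmul(v, ht)
    den = matmul(w, matmul(h, ht))
    return [[w[i][j] * num[i][j] / (den[i][j] + eps)
             for j in range(len(w[0]))] for i in range(len(w))]


def nmf(v, rank, iters=200, seed=0):
    """Factor a non-negative matrix V (m x n) into W (m x rank) and H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        h = update_h(v, w, h)
        w = update_w(v, w, h)
    return w, h


def project_new(v_new, w, iters=200, seed=1):
    """Encode newly appended columns with W held fixed: only H is updated."""
    rng = random.Random(seed)
    h = [[rng.random() + 0.1 for _ in range(len(v_new[0]))]
         for _ in range(len(w[0]))]
    for _ in range(iters):
        h = update_h(v_new, w, h)
    return h


def frob_error(v, w, h):
    """Frobenius norm of the residual V - W H."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb)) ** 0.5


# Toy non-negative matrix of exact rank 2 (assumed data, 4 features x 5 samples).
w_true = [[1, 0], [0, 1], [1, 1], [2, 1]]
h_true = [[1, 2, 0, 1, 3], [0, 1, 2, 1, 1]]
v = matmul(w_true, h_true)

w1, h1 = nmf(v, rank=2, iters=1)
w, h = nmf(v, rank=2, iters=200)
err_start = frob_error(v, w1, h1)
err_after = frob_error(v, w, h)

# Two newly appended samples, encoded in the basis learned above.
v_new = matmul(w_true, [[2, 0], [1, 3]])
h_new = project_new(v_new, w)
print(err_start, err_after)
```

Because both factors start positive and are only ever multiplied by non-negative ratios, W and H stay non-negative without any explicit projection step, which is the main appeal of this family of updates.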
|