Return to search

分類蛋白質質譜資料變數選取的探討 / On Variable Selection of Classifying Proteomic Spectra Data

本研究所利用的資料是來自美國東維吉尼亞醫學院所提供的攝護腺癌蛋白質質譜資料,其資料有原始資料和另一筆經過事前處理過的資料,而本研究是利用事前處理過的資料來作實証分析。由於此種資料通常都是屬於高維度資料,故變數間具有高度相關的現象也很常見,因此從大量的特徵變數中選取到重要的特徵變數來準確的判斷攝護腺的病變程度成為一個非常普遍且重要的課題。那麼本研究的目的是欲探討各(具有懲罰項)迴歸模型對於分類蛋白質質譜資料之變數選取結果,藉由LARS、Stagewise、LASSO、Group LASSO和Elastic Net各(具有懲罰項)迴歸模型將變數選入的先後順序當作其排序所產生的判別結果與利用「統計量排序」(t檢定、ANOVA F檢定以及Kruskal-Wallis檢定)以及SVM「分錯率排序」的判別結果相比較。而分析的結果顯示,Group LASSO對於六種兩兩分類的分錯率,其分錯率趨勢的表現都較其他方法穩定,並不會有大起大落的現象發生,且最小分錯率也幾乎較其他方法理想。此外Group LASSO在四分類的判別結果在與其他方法相較下也顯出此法可得出最低的分錯率,亦表示若須同時判別四種類別時,相較於其他方法之下Group LASSO的判別準確度最優。 / Our research uses the prostate proteomic spectra data which is offered by Eastern Virginia Medical School. The materials have raw data and preprocessed data. Our research uses the preprocessed data to do the analysis of real example. Because this kind of materials usually have high dimension, so it maybe has highly correlation between variables very common, therefore choose from a large number of characteristic variables to accurately determine the pathological change degree of the Prostate is become a very general and important subject. Then the purpose of our research wants to discuss every (penalized) regression model in variable selection results for classifying the proteomic spectra data. With LARS, Stagewise, LASSO, Group LASSO and Elastic Net, each variable is chosen successively by each (penalized) regression model, and it is regarded as each variable’s order then produce discrimination results. After that, we use their results to compare with using statistic order (t-test, ANOVA F-test and Kruskal-Wallis test) and SVM fault rate order. And the result of analyzing reveals Group LASSO to two by two of six kinds of rate by mistake that classify, the mistake rate behavior of trend is more stable than other ways, it doesn’t appear big rise or big fall phenomenon. Furthermore, this way’s mistake rate is almostly more ideal than other ways. Moreover, using Group LASSO to get the discrimination result of four classifications has the lowest mistake rate under comparing with other methods. In other words, when must distinguish four classifications in the same time, Group LASSO’s discrimination accuracy is optimum.

Identiferoai:union.ndltd.org:CHENGCHI/G0098354021
Creators林婷婷
Publisher國立政治大學
Source SetsNational Chengchi University Libraries
Language中文
Detected LanguageEnglish
Typetext
RightsCopyright © nccu library on behalf of the copyright holders

Page generated in 0.0022 seconds