The product partition model was recently extended to covariate-dependent random partitions of subjects, but the covariates are limited to properties of individual subjects. In many clustering problems in biomedical or social studies, additional clustering information is available from pairwise associations among subjects, such as regulatory relationships between genes or social networks among people. Here we propose a model-based method for clustering with network information that combines a modified product partition model with a hidden Markov random field. Statistical inference follows the Bayesian approach, and the model is computed with Markov chain Monte Carlo algorithms. To improve the mixing of the chain, the Sequentially-Allocated Merge-Split Sampler is adapted to perform group moves, in an effort to lower the chance of being trapped in local modes.

The new method is tested on two synthesized data sets to evaluate its performance across different types of response variables, covariates, and networks. The correct clustering rate is high under most of the conditions tested. We also apply the method to two real data sets. The first is journal data, in which cross-citation information among journals is used to group journals into categories. The second combines gene expression, motif binding (transcription factor binding sites), and the gene regulatory network of yeast, with the goal of finding detailed functional groups of genes. Both experiments yielded interesting results.
Detailed summary in vernacular field only.
Fung, Ling Hiu.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2012.
Abstracts also in Chinese.

Abstract --- p.i
Acknowledgement --- p.iv
Chapter 1 --- Introduction --- p.1
Chapter 2 --- Technical Background --- p.7
Chapter 2.1 --- Variable notation --- p.8
Chapter 2.2 --- Two exemplary models for the response variable --- p.10
Chapter 2.3 --- PPMx --- p.12
Chapter 2.3.1 --- PPM - definition and its equivalence to DPM --- p.12
Chapter 2.3.2 --- PPMx - extension with covariates --- p.15
Chapter 2.3.3 --- Posterior inference --- p.18
Chapter 2.4 --- HMRF --- p.19
Chapter 2.4.1 --- Definition --- p.19
Chapter 2.4.2 --- Constrained Dirichlet Process Mixture --- p.21
Chapter 3 --- Model-based Clustering with Network Covariates --- p.27
Chapter 3.1 --- Design of the model --- p.27
Chapter 3.2 --- The Bayesian MCNC model --- p.30
Chapter 3.3 --- MCMC computing --- p.31
Chapter 3.4 --- Performance evaluation criteria --- p.37
Chapter 4 --- Simulation study --- p.39
Chapter 4.1 --- Network --- p.39
Chapter 4.2 --- Covariates --- p.41
Chapter 4.3 --- The Phase model (M1) --- p.42
Chapter 4.4 --- The Normal model (M2) --- p.52
Chapter 4.5 --- Comparing correct clustering percentage and correct co-occurrence percentage --- p.62
Chapter 5 --- Real data --- p.68
Chapter 5.1 --- Journal cross-citation data --- p.68
Chapter 5.2 --- Gene Network of yeast data --- p.76
Chapter 6 --- Conclusions --- p.89
Chapter A --- p.91
Chapter A.1 --- Covariates --- p.91
Chapter A.1.1 --- Continuous covariates --- p.91
Chapter A.1.2 --- Categorical covariates --- p.94
Chapter A.1.3 --- Count covariates --- p.96
Chapter A.2 --- Phase model --- p.98
Chapter A.2.1 --- Prior specification --- p.99
Chapter A.2.2 --- Data generation --- p.99
Chapter A.2.3 --- Posterior estimation --- p.100
Chapter A.3 --- Normal model --- p.111
Chapter A.3.1 --- Prior specification --- p.111
Chapter A.3.2 --- Data generation --- p.112
Chapter A.3.3 --- Posterior estimation --- p.112
Chapter A.4 --- Journal dataset --- p.115
Identifier | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328725 |
Date | January 2012 |
Contributors | Fung, Ling Hiu., Chinese University of Hong Kong Graduate School. Division of Statistics. |
Source Sets | The Chinese University of Hong Kong |
Language | English, Chinese |
Detected Language | English |
Type | Text, theses |
Format | electronic resource, remote, 1 online resource (xi, 118 leaves) : ill. (some col.) |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |