1 |
From scenario association to categorical data clustering / Pan, Yuanyi. January 2005
Thesis (M.Sc.)--York University, 2005. Graduate Programme in Mathematics and Statistics. / Typescript. Includes bibliographical references (leaves 61-62). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url%5Fver=Z39.88-2004&res%5Fdat=xri:pqdiss &rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR11874
|
2 |
The effects of clustering on the medium and large-scale capacitated location-routing problem / Buhrmann, Jacoba Hendrina. 26 July 2016
A thesis submitted to the Faculty of Engineering and the Built Environment,
University of the Witwatersrand, Johannesburg, in fulfilment of the
requirements for the degree of Doctor of Philosophy.
February 23, 2016 / This work investigates the effectiveness of using clustering methods in solving various
capacitated location-routing problems (CLRP) for medium- and large-scale datasets,
with up to 20 000 datapoints. Different clustering methods as well as hybrid clustering
methods are tested and compared.
A new problem called the planar CLRP (plCLRP) is introduced. Based on the results
from the clustering methods, cluster-based approaches are suggested to solve
variants of the CLRP. These include the Hamiltonian p–median problem (HpMP),
the planar CLRP (plCLRP), the concentrator discrete CLRP (cdCLRP) and the
standard discrete CLRP (sdCLRP). A new method called the two-phased proportional
regret ordering based unconstrained to constrained (PROBUC) method is
also proposed to create capacitated clusters.
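The abstract does not spell out how PROBUC orders and assigns points, so what follows is only a minimal sketch of the general idea behind regret-ordered capacitated clustering. It is not the thesis's method. It assumes the cluster centres (`seeds`) are already fixed, for example by an unconstrained clustering step, and that every point carries a demand. The function name and parameters are hypothetical. Points are assigned greedily, most-regretted first. A point's regret is the extra distance it would incur if its nearest feasible centre filled up before it was placed.

```python
import math


def regret_capacitated_assign(points, demands, seeds, capacity):
    """Greedily assign points to fixed cluster centres under a capacity limit.

    At every step the unassigned point with the largest regret is placed at
    its nearest centre that still has room. Regret is the distance gap
    between its second-nearest and nearest feasible centre (infinite when
    only one centre can still take it).
    """
    remaining = [capacity] * len(seeds)   # spare capacity per cluster
    assignment = [None] * len(points)     # cluster index chosen for each point
    unassigned = set(range(len(points)))

    while unassigned:
        best = None  # (regret, point index, centre index)
        for i in sorted(unassigned):  # sorted for deterministic tie-breaking
            # Centres that can still absorb this point's demand, nearest first.
            feasible = sorted(
                (math.dist(points[i], seeds[s]), s)
                for s in range(len(seeds))
                if remaining[s] >= demands[i]
            )
            if not feasible:
                # The greedy order can dead-end even if a feasible split exists.
                raise ValueError(f"point {i} fits in no remaining cluster")
            if len(feasible) > 1:
                regret = feasible[1][0] - feasible[0][0]
            else:
                regret = math.inf
            if best is None or regret > best[0]:
                best = (regret, i, feasible[0][1])

        _, i, s = best
        assignment[i] = s
        remaining[s] -= demands[i]
        unassigned.remove(i)

    return assignment
```

For four unit-demand points on a line at x = 0, 1, 10 and 11, with centres at x = 0 and x = 10 and a capacity of 2, this returns `[0, 0, 1, 1]`. Ordering by regret rather than by raw distance is what lets a point with only one good option claim it before the option fills up.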
The focus falls on finding effective non-exponential time algorithms that can be used
to solve large-scale problems with good results. A full set of results for each problem
is presented, and comparisons are made with known results from the literature
where possible.
The PLRP (periodic location-routing problem), introduced by Prodhon and Prins
(2008), is also investigated. A change to the current problem formulation, as provided
by Prodhon (2011), is proposed to enforce single-source constraints across the
time horizon and to limit the maximum number of vehicles.
An approach to solve the PLRP, based on the cluster-based approaches to solve
the discrete CLRPs, is suggested. The results of the cluster-based approach are
compared to best-known solutions for existing PLRP instances given by Prodhon
(2009a). A set of large-scale PLRP instances is introduced, based on instances
generated by Harks et al. (2013) for the sdCLRP.
|
3 |
Model-based clustering with network covariates by combining a modified product partition model with hidden Markov random field. January 2012
The product partition model was recently extended to the covariate-dependent random partition of subjects, where the covariates are limited to properties of individual subjects. In many clustering problems in biomedical or social studies, we also have clustering information from pairwise associations among subjects, such as the regulatory relationships between genes or the social network among people. Here we propose a model-based method for clustering with network information by combining a modified product partition model with a hidden Markov random field. Statistical inference is Bayesian, and the model is computed with Markov chain Monte Carlo algorithms. To improve the mixing of the chain, the Sequentially-Allocated Merge-Split Sampler is adapted to perform group moves, in an effort to lower the chance of the chain becoming trapped in local modes. / The new method is tested on two synthesized data sets to evaluate its performance on different types of response variables, covariates and networks. The correct clustering rate is satisfactory under a wide range of conditions. We also applied the new method to two real data sets. The first is the journal data, where the cross-citation information among journals is used to group journals into different categories. The second involves the gene expression, motif binding and gene network of yeast, where the goal is to find detailed gene functional groups. Both experiments yielded interesting results.
/ Detailed summary in vernacular field only. / Fung, Ling Hiu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2012. / Abstracts also in Chinese.
Abstract --- p.i
Acknowledgement --- p.iv
Chapter 1 --- Introduction --- p.1
Chapter 2 --- Technical Background --- p.7
Chapter 2.1 --- Variable notation --- p.8
Chapter 2.2 --- Two exemplary models for the response variable --- p.10
Chapter 2.3 --- PPMx --- p.12
Chapter 2.3.1 --- PPM - definition and its equivalence to DPM --- p.12
Chapter 2.3.2 --- PPMx - extension with covariates --- p.15
Chapter 2.3.3 --- Posterior inference --- p.18
Chapter 2.4 --- HMRF --- p.19
Chapter 2.4.1 --- Definition --- p.19
Chapter 2.4.2 --- Constrained Dirichlet Process Mixture --- p.21
Chapter 3 --- Model-based Clustering with Network Covariates --- p.27
Chapter 3.1 --- Design of the model --- p.27
Chapter 3.2 --- The Bayesian MCNC model --- p.30
Chapter 3.3 --- MCMC computing --- p.31
Chapter 3.4 --- Performance evaluation criteria --- p.37
Chapter 4 --- Simulation study --- p.39
Chapter 4.1 --- Network --- p.39
Chapter 4.2 --- Covariates --- p.41
Chapter 4.3 --- The Phase model (M1) --- p.42
Chapter 4.4 --- The Normal model (M2) --- p.52
Chapter 4.5 --- Comparing correct clustering percentage and correct co-occurrence percentage --- p.62
Chapter 5 --- Real data --- p.68
Chapter 5.1 --- Journal cross-citation data --- p.68
Chapter 5.2 --- Gene Network of yeast data --- p.76
Chapter 6 --- Conclusions --- p.89
Chapter A --- p.91
Chapter A.1 --- Covariates --- p.91
Chapter A.1.1 --- Continuous covariates --- p.91
Chapter A.1.2 --- Categorical covariates --- p.94
Chapter A.1.3 --- Count covariates --- p.96
Chapter A.2 --- Phase model --- p.98
Chapter A.2.1 --- Prior specification --- p.99
Chapter A.2.2 --- Data generation --- p.99
Chapter A.2.3 --- Posterior estimation --- p.100
Chapter A.3 --- Normal model --- p.111
Chapter A.3.1 --- Prior specification --- p.111
Chapter A.3.2 --- Data generation --- p.112
Chapter A.3.3 --- Posterior estimation --- p.112
Chapter A.4 --- Journal dataset --- p.115
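The abstract combines a product partition model (PPM) with a hidden Markov random field (HMRF) but does not give the modified form it uses. As a rough sketch only, the function below scores a clustering using two assumed components. The first is the standard Dirichlet-process cohesion c(S) = M·(|S| − 1)!, the PPM form that the table of contents notes is equivalent to a DPM. The second is a Potts-type HMRF term that rewards network edges falling inside one cluster. The PPMx covariate similarity term and the likelihood are omitted. The names `mass` (for M) and `beta` (the interaction strength) are assumptions.

```python
import math
from collections import Counter


def log_prior_partition(labels, edges, mass=1.0, beta=1.0):
    """Unnormalised log prior of a clustering under a PPM times a Potts HMRF.

    labels: cluster label for each subject (list index = subject id).
    edges:  network edges given as (i, j) pairs of subject ids.
    """
    # PPM part: each cluster S contributes log c(S) = log(mass) + log((|S|-1)!);
    # math.lgamma(n) equals log((n-1)!) for a positive integer n.
    sizes = Counter(labels)
    log_ppm = sum(math.log(mass) + math.lgamma(n) for n in sizes.values())
    # HMRF part: add beta for every edge whose two ends share a cluster.
    within = sum(1 for i, j in edges if labels[i] == labels[j])
    return log_ppm + beta * within
```

In an MCMC sampler, moves such as the merge-split proposals mentioned in the abstract would compare differences of this quantity, plus the log likelihood, between the current and proposed partitions. For fixed `mass` and `beta`, the normalising constants cancel in those comparisons.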
|
4 |
Hyperplane based efficient clustering and searching / Chan, Alton Kam Fai. January 2003
Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2003. / Includes bibliographical references (leaves 55-57). Also available in electronic version. Access restricted to campus users.
|
5 |
Model based and hybrid clustering of large datasets / Tantrum, Jeremy. January 2003
Thesis (Ph. D.)--University of Washington, 2003. / Vita. Includes bibliographical references (p. 93-96).
|
6 |
An Alternative Approach to Visualizing Stock Market Correlation Matrices - An Empirical study of forming portfolios that contain only small numbers of stocks using both existing and newly discovered visualization methods / Zhan, Cheng Juan. January 2014
The core of stock portfolio diversification is to pick stocks from different correlation clusters when forming portfolios, so that the chosen stocks are only weakly correlated with each other. However, because correlation matrices are high-dimensional, it is nearly impossible to determine correlation clusters simply by looking at a correlation matrix. It is therefore common to treat industry groups as correlation clusters. In this thesis, we used three visualization methods, namely Hierarchical Cluster Trees, Minimum Spanning Trees and neighbor-Net splits graphs, to “collapse” the high-dimensional structure of correlation matrices onto two-dimensional planes, and then assigned stocks to clusters to create the correlation clusters. We then simulated sets of 1000 portfolios each, in which the stocks of every portfolio were picked from the correlation clusters suggested by one of the three visualization methods or by industry groups (another way of determining correlation clusters). The distribution of the means and variances within each set of 1000 simulated portfolios indicates how well the clusters were determined.
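Two of the methods named above are closely linked: cutting the longest edges of a minimum spanning tree gives single-linkage hierarchical clusters. The sketch below relies on that link and on several further assumptions:

- the common distance d = √(2(1 − ρ)) between two stocks with correlation ρ;
- a fixed number of clusters `k`;
- portfolios built by drawing one stock at random from each cluster, with equal weights.

The thesis's exact clustering rules, weights and portfolio sizes may differ, and all function names are hypothetical.

```python
import math
import random


def correlation_distance(rho):
    """Metric distance sqrt(2 * (1 - rho)) for a correlation rho in [-1, 1]."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))  # clamp rounding noise


def mst_clusters(corr, k):
    """Split n assets into k clusters (1 <= k <= n): build a minimum spanning
    tree on correlation distances (Kruskal) and cut its k - 1 longest edges,
    which matches single-linkage hierarchical clustering cut at k groups."""
    n = len(corr)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (correlation_distance(corr[i][j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    tree = []  # MST edges, collected in increasing order of distance
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((d, i, j))

    # Rebuild components from the n - k shortest tree edges only.
    parent[:] = range(n)
    for _, i, j in tree[: n - k]:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def simulate_portfolios(clusters, mu, cov, n_portfolios=1000, seed=0):
    """Return (mean, variance) of equal-weight portfolios that each hold one
    randomly drawn stock from every cluster."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_portfolios):
        picks = [rng.choice(c) for c in clusters]
        w = 1.0 / len(picks)
        mean = sum(w * mu[i] for i in picks)
        var = sum(w * w * cov[i][j] for i in picks for j in picks)  # w' Σ w
        results.append((mean, var))
    return results
```

Running `simulate_portfolios` on clusters from `mst_clusters` and on industry groups, then comparing the spread of the resulting (mean, variance) pairs, mirrors the comparison described above. Expected returns `mu` and covariances `cov` would have to be estimated from return data, which this sketch does not do.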
The examinations were conducted on two sets of financial data. The first is the 30 stocks in the Dow Jones Industrial Average, a relatively small number of stocks; the second is the ASX 200, which contains a relatively large number of stocks. We found that none of the methods studied consistently defined correlation clusters more efficiently than the others in out-of-sample testing.
The thesis contributes to the finance literature in two ways. Firstly, it introduces the neighbor-Net method as an alternative way to visualize the underlying structure of financial data. Secondly, it uses a novel “visualization
|
7 |
Probabilistic model-based clustering of complex data / Zhong, Shi. January 2003
Thesis (Ph. D.)--University of Texas at Austin, 2003. / Vita. Includes bibliographical references. Available also from UMI Company.
|
8 |
An optimization algorithm for clustering using weighted dissimilarity measures / Chan, Yat-ling, 陳逸靈. January 2003
Mathematics / Master of Philosophy
|
9 |
Hierarchical clustering using dynamic self organising neural networks / Butchart, Kate. January 1996
No description available.
|
10 |
Selective isolation classification and ecology of nocardiae from soil, water and biodeteriorating rubber / Hookey, J. V. January 1983
No description available.
|