1 
From scenario association to categorical data clustering / Pan, Yuanyi. January 2005
Thesis (M.Sc.), York University, 2005. Graduate Programme in Mathematics and Statistics. / Typescript. Includes bibliographical references (leaves 61-62). Also available on the Internet. Mode of access: via web browser by entering the following URL: http://gateway.proquest.com/openurl?url%5Fver=Z39.88-2004&res%5Fdat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR11874

2 
An Alternative Approach to Visualizing Stock Market Correlation Matrices: An Empirical study of forming portfolios that contain only small numbers of stocks using both existing and newly discovered visualization methods / Zhan, Cheng Juan. January 2014
The core of stock portfolio diversification is to pick stocks from different correlation clusters when forming portfolios, so that the chosen stocks are only weakly correlated with each other. However, because correlation matrices are high-dimensional, it is close to impossible to identify correlation clusters simply by looking at a correlation matrix, and industry groups are therefore commonly treated as correlation clusters. In this thesis, we used three visualization methods, namely Hierarchical Cluster Trees, Minimum Spanning Trees and neighborNet splits graphs, to “collapse” the high-dimensional structure of correlation matrices onto two-dimensional planes, and then assigned stocks to clusters to form the correlation clusters. We then simulated sets of 1000 portfolios each, where the stocks in each portfolio were picked from the correlation clusters suggested by one of the three visualization methods or by industry groups (another way of determining correlation clusters). The distribution of the means and variances across each set of 1000 simulated portfolios indicates how well the corresponding clusters were determined.
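As a rough illustration of the minimum-spanning-tree route to correlation clusters, the Python sketch below converts correlations to distances, builds an MST with Kruskal's algorithm, cuts its longest edges to obtain clusters, and then draws portfolios holding one stock per cluster. The distance d = sqrt(2(1 - rho)), the rule of cutting the k - 1 longest edges, and the helper names (`mst_clusters`, `simulate_portfolios`) are assumptions made for this example, not the thesis's procedure; neighborNet splits graphs are not covered.

```python
import math
import random


def correlation_distance(rho):
    """Map a correlation in [-1, 1] to a distance in [0, 2] (assumed metric)."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def mst_clusters(corr, k):
    """Build an MST (Kruskal) on correlation distances, then drop its k-1
    longest edges so that exactly k clusters remain."""
    n = len(corr)
    edges = sorted(
        (correlation_distance(corr[i][j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))
    tree = []  # MST edges, collected in increasing order of distance
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            tree.append((d, i, j))
    # Re-join the stocks using only the n-k shortest tree edges.
    parent = list(range(n))
    for _, i, j in tree[: n - k]:
        parent[_find(parent, i)] = _find(parent, j)
    groups = {}
    for s in range(n):
        groups.setdefault(_find(parent, s), []).append(s)
    return sorted(groups.values())


def simulate_portfolios(clusters, n_portfolios=1000, seed=0):
    """Draw portfolios that hold exactly one randomly chosen stock per cluster."""
    rng = random.Random(seed)
    return [[rng.choice(c) for c in clusters] for _ in range(n_portfolios)]


def mean_pairwise_correlation(corr, portfolio):
    """Average correlation over all distinct pairs of stocks in a portfolio."""
    pairs = [(a, b) for idx, a in enumerate(portfolio) for b in portfolio[idx + 1:]]
    return sum(corr[a][b] for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy correlation matrix: stocks 0-1 and 2-3 move together.
    corr = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.15, 0.1],
        [0.1, 0.15, 1.0, 0.85],
        [0.2, 0.1, 0.85, 1.0],
    ]
    clusters = mst_clusters(corr, 2)
    print(clusters)  # [[0, 1], [2, 3]]
    portfolios = simulate_portfolios(clusters)
    print(max(mean_pairwise_correlation(corr, p) for p in portfolios))
```

Ties aside, cutting the k - 1 longest MST edges yields the same partition as single-linkage hierarchical clustering cut at k clusters, so the sketch also hints at how a hierarchical cluster tree can be turned into groups.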
The examinations were conducted on two financial data sets: the 30 stocks of the Dow Jones Industrial Average, a relatively small set, and the ASX 200, a relatively large one. We found that none of the methods studied consistently defined correlation clusters more efficiently than the others in out-of-sample testing.
The thesis contributes to the finance literature in two ways. Firstly, it introduces the neighborNet method as an alternative way to visualize the underlying structure of financial data. Secondly, it used a novel “visualization

3 
Model-based clustering with network covariates by combining a modified product partition model with hidden Markov random field. January 2012
The product partition model was recently extended for the covariate-dependent random partition of subjects, but the covariates are limited to properties of individual subjects. For many clustering problems in biomedical or social studies, extra clustering information is available from pairwise associations among subjects, such as the regulatory relationships between genes or the social network among people. Here we propose a model-based method for clustering with network information by combining a modified product partition model with a hidden Markov random field. Statistical inference follows the Bayesian approach, and the model is computed with Markov chain Monte Carlo algorithms. To improve the mixing of the chain, the Sequentially-Allocated Merge-Split Sampler is adapted to perform group moves, in an effort to lower the chance of being trapped in local modes. / The new method is tested on two synthesized data sets to evaluate its performance on different types of response variables, covariates and networks. The correct clustering rate is satisfactory under a wide range of conditions. We also applied the method to two real data sets. In the first, the journal data, cross-citation information among journals is used to group journals into categories. The second combines gene expression, motif (transcription factor binding site) data and the gene regulatory network of yeast, with the goal of finding detailed gene functional groups. Both experiments yielded interesting results.
/ Detailed summary in vernacular field only. / Fung, Ling Hiu. / Thesis (M.Phil.), Chinese University of Hong Kong, 2012. / Abstracts also in Chinese.
Contents:
Abstract, p. i
Acknowledgement, p. iv
Chapter 1 Introduction, p. 1
Chapter 2 Technical Background, p. 7
Chapter 2.1 Variable notation, p. 8
Chapter 2.2 Two exemplary models for the response variable, p. 10
Chapter 2.3 PPMx, p. 12
Chapter 2.3.1 PPM: definition and its equivalence to DPM, p. 12
Chapter 2.3.2 PPMx: extension with covariates, p. 15
Chapter 2.3.3 Posterior inference, p. 18
Chapter 2.4 HMRF, p. 19
Chapter 2.4.1 Definition, p. 19
Chapter 2.4.2 Constrained Dirichlet Process Mixture, p. 21
Chapter 3 Model-based Clustering with Network Covariates, p. 27
Chapter 3.1 Design of the model, p. 27
Chapter 3.2 The Bayesian MCNC model, p. 30
Chapter 3.3 MCMC computing, p. 31
Chapter 3.4 Performance evaluation criteria, p. 37
Chapter 4 Simulation study, p. 39
Chapter 4.1 Network, p. 39
Chapter 4.2 Covariates, p. 41
Chapter 4.3 The Phase model (M1), p. 42
Chapter 4.4 The Normal model (M2), p. 52
Chapter 4.5 Comparing correct clustering percentage and correct co-occurrence percentage, p. 62
Chapter 5 Real data, p. 68
Chapter 5.1 Journal cross-citation data, p. 68
Chapter 5.2 Gene Network of yeast data, p. 76
Chapter 6 Conclusions, p. 89
Chapter A, p. 91
Chapter A.1 Covariates, p. 91
Chapter A.1.1 Continuous covariates, p. 91
Chapter A.1.2 Categorical covariates, p. 94
Chapter A.1.3 Count covariates, p. 96
Chapter A.2 Phase model, p. 98
Chapter A.2.1 Prior specification, p. 99
Chapter A.2.2 Data generation, p. 99
Chapter A.2.3 Posterior estimation, p. 100
Chapter A.3 Normal model, p. 111
Chapter A.3.1 Prior specification, p. 111
Chapter A.3.2 Data generation, p. 112
Chapter A.3.3 Posterior estimation, p. 112
Chapter A.4 Journal dataset, p. 115
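The abstract combines a product partition model with a hidden Markov random field. One simple way to see how the two pieces interact in a prior is sketched below in Python, under stated assumptions: a Dirichlet-process-style cohesion c(S) = alpha (|S| - 1)! stands in for the modified PPM, and a Potts-type term rewards each network edge whose endpoints share a cluster with weight beta. The likelihood, covariate similarity and merge-split moves of the actual MCNC model are omitted; `log_prior` and `reassignment_probs` are hypothetical names.

```python
import math


def log_prior(labels, edges, alpha=1.0, beta=1.0):
    """Unnormalised log prior of a partition (assumed form, for illustration).

    labels[i] is subject i's cluster label; edges lists network pairs (i, j).
    Cohesion: c(S) = alpha * (|S| - 1)!, i.e. log(alpha) + lgamma(|S|) per cluster.
    HMRF term: beta times the number of edges whose endpoints share a cluster.
    """
    sizes = {}
    for label in labels:
        sizes[label] = sizes.get(label, 0) + 1
    cohesion = sum(math.log(alpha) + math.lgamma(n) for n in sizes.values())
    agreement = sum(1 for i, j in edges if labels[i] == labels[j])
    return cohesion + beta * agreement


def reassignment_probs(i, labels, edges, alpha=1.0, beta=1.0):
    """Prior-only Gibbs conditional for subject i given everyone else's labels.
    A full sampler would also multiply in the likelihood and covariate terms."""
    existing = sorted({lab for s, lab in enumerate(labels) if s != i})
    candidates = existing + [max(labels) + 1]  # last entry opens a new cluster
    scores = []
    for lab in candidates:
        trial = list(labels)
        trial[i] = lab
        scores.append(log_prior(trial, edges, alpha, beta))
    top = max(scores)  # subtract the maximum before exp() for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {lab: w / total for lab, w in zip(candidates, weights)}


if __name__ == "__main__":
    edges = [(0, 1), (2, 3), (0, 2)]
    print(reassignment_probs(0, [0, 0, 1, 1], edges, alpha=1.0, beta=0.0))
    print(reassignment_probs(0, [0, 0, 1, 1], edges, alpha=1.0, beta=1.0))
```

With beta = 0 the conditional reduces to the Chinese restaurant process (an existing cluster is chosen in proportion to its size, a new cluster in proportion to alpha); a positive beta pulls a subject toward the clusters of its network neighbours.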

4 
The effects of clustering on the medium and large-scale capacitated location-routing problem / Buhrmann, Jacoba Hendrina. 26 July 2016
A thesis submitted to the Faculty of Engineering and the Built Environment,
University of the Witwatersrand, Johannesburg, in fulfilment of the
requirements for the degree of Doctor of Philosophy.
February 23, 2016 / This work investigates the effectiveness of using clustering methods in solving various
capacitated location-routing problems (CLRP) for medium and large-scale datasets,
with up to 20 000 data points. Different clustering methods as well as hybrid clustering
methods are tested and compared.
A new problem called the planar CLRP (plCLRP) is introduced. Based on the results
from the clustering methods, cluster-based approaches are suggested to solve
variants of the CLRP. These include the Hamiltonian p–median problem (HpMP),
the planar CLRP (plCLRP), the concentrator discrete CLRP (cdCLRP) and the
standard discrete CLRP (sdCLRP). A new two-phased method, called the proportional
regret ordering based unconstrained to constrained (PROBUC) method, is
also proposed to create capacitated clusters.
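PROBUC itself is only named in the abstract, so the Python sketch below shows a generic two-phase, regret-ordered capacitated assignment in the same spirit rather than the thesis's method: phase one computes each point's regret from its unconstrained nearest and second-nearest centres, and phase two assigns points in decreasing regret order to the nearest centre with spare capacity. The function name `regret_assign`, the Euclidean distance, and the absolute (rather than proportional) regret are assumptions.

```python
import math


def regret_assign(points, demands, centers, capacity):
    """Capacitated assignment of 2-D points to centres (illustrative heuristic).

    Phase 1 (unconstrained): regret = distance to the second-nearest centre
    minus distance to the nearest centre.
    Phase 2 (constrained): visit points in decreasing regret order and give each
    the nearest centre that still has room. Points that fit nowhere stay None.
    """
    def dist(p, c):
        return math.hypot(p[0] - c[0], p[1] - c[1])

    order = []
    for idx, p in enumerate(points):
        d = sorted(dist(p, c) for c in centers)
        regret = d[1] - d[0] if len(d) > 1 else 0.0
        order.append((regret, idx))
    order.sort(reverse=True)  # highest regret is served first

    remaining = [capacity] * len(centers)
    assignment = [None] * len(points)
    for _, idx in order:
        by_distance = sorted(range(len(centers)),
                             key=lambda k: dist(points[idx], centers[k]))
        for k in by_distance:
            if remaining[k] >= demands[idx]:
                assignment[idx] = k
                remaining[k] -= demands[idx]
                break
    return assignment


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 0)]
    centres = [(0, 0), (10, 0)]
    print(regret_assign(pts, [1] * len(pts), centres, capacity=3))  # [0, 0, 1, 1, 0]
```

Ordering by regret gives first choice to the points that would lose the most if pushed to their second-best centre; a point equidistant between centres (regret 0) is placed last.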
The focus falls on finding effective non-exponential-time algorithms that can be used
to solve large-scale problems with good results. A full set of results for each problem
is presented, and comparisons are made with known results from the literature
where possible.
The PLRP (periodic location-routing problem) introduced by Prodhon and Prins
(2008) is also investigated. A change to the current problem formulation, as provided
by Prodhon (2011), is proposed to enforce single-source constraints across the
time horizon and to limit the maximum number of vehicles.
An approach to solve the PLRP, based on the cluster-based approaches for
the discrete CLRPs, is suggested. The results of the cluster-based approach are
compared to best-known solutions for existing PLRP instances given by Prodhon
(2009a). A set of large-scale PLRP instances is introduced, based on instances
generated by Harks et al. (2013) for the sdCLRP.

5 
Probabilistic model-based clustering of complex data / Zhong, Shi. January 2003
Thesis (Ph. D.), University of Texas at Austin, 2003. / Vita. Includes bibliographical references. Available also from UMI Company.

6 
Hyperplane based efficient clustering and searching / Chan, Alton Kam Fai. January 2003
Thesis (M.Phil.), Hong Kong University of Science and Technology, 2003. / Includes bibliographical references (leaves 55-57). Also available in electronic version. Access restricted to campus users.

7 
Model based and hybrid clustering of large datasets / Tantrum, Jeremy. January 2003
Thesis (Ph. D.), University of Washington, 2003. / Vita. Includes bibliographical references (p. 93-96).

8 
Design of a cluster analysis heuristic for the configuration and capacity management of manufacturing cells / Shim, Young Hak. 17 September 2007
This dissertation presents the configuration and capacity management of manufacturing cells using cluster analysis. A heuristic based on cluster analysis is developed to solve cell formation in cellular manufacturing systems (CMS). The clustering heuristic is applied to cell formation considering processing requirements (CFOPR) as well as to cell formation considering various manufacturing factors (CFVMF). It is built on a new solution structure that incorporates both hierarchical and non-hierarchical clustering methods. A new similarity measure is constructed by modifying the Jaccard similarity, and a new assignment algorithm is proposed based on a new pairwise exchange method.
In CFOPR, the clustering heuristic is modified by adding a feedback step and more exact allocation rules. Grouping efficacy is used to evaluate the solutions obtained by the heuristic. The CFOPR heuristic was evaluated on 23 test problems taken from the literature and compared with other approaches: it produced the best solution in 18 of the 23 problems and the second-best solution in the remaining five. These solutions were obtained in a short time; even the largest test problem was solved in about one and a half seconds.
In CFVMF, machine capacity is ensured first, and manufacturing cells are then configured to minimize intercellular movements. To ensure machine capacity, machines may be duplicated and operations split, and operations are assigned to duplicated machines by the largest-first rule. The CFVMF heuristic uses a new similarity measure that incorporates processing requirements, material flow and machine workload, and a new machine-part matrix that represents material flow and the processing time assigned to multiple identical machines. Setup time, which has not been clearly addressed in existing research, is also considered in the solution procedure.
The CFVMF heuristic employs two evaluation measures: the number of intercellular movements and grouping efficacy. On two test problems taken from the literature, the CFVMF heuristic produced the same results, and a trade-off problem between the two evaluation measures is proposed for judging the goodness of a grouping.
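Two of the ingredients named above have standard textbook forms, sketched here in Python for a 0/1 machine-part incidence matrix: the unmodified Jaccard similarity between two machines (parts used by both, divided by parts used by either) and grouping efficacy, (e - e_0)/(e + e_v), where e is the number of 1s, e_0 the exceptional elements outside the diagonal blocks and e_v the voids inside them. The thesis's modified Jaccard measure is not reproduced; the function names and the toy matrix are illustrative.

```python
def jaccard(row_a, row_b):
    """Unmodified Jaccard similarity of two machines' 0/1 part-incidence rows."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


def grouping_efficacy(matrix, machine_cell, part_cell):
    """Grouping efficacy (e - e0) / (e + ev) for a given cell assignment."""
    e = e0 = ev = 0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            in_block = machine_cell[i] == part_cell[j]
            if value:
                e += 1          # every operation (a 1) counts
                if not in_block:
                    e0 += 1     # exceptional element: operation outside its cell
            elif in_block:
                ev += 1         # void: empty entry inside a cell's diagonal block
    return (e - e0) / (e + ev)


if __name__ == "__main__":
    m = [
        [1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 1, 1, 1],
    ]
    print(jaccard(m[0], m[1]))                                  # 2/3
    print(grouping_efficacy(m, [0, 0, 1, 1], [0, 0, 0, 1, 1]))  # 9/11, about 0.818
```

In the toy matrix, machine 3's use of part 2 is the single exceptional element and the empty entry for machine 0 and part 2 is the single void, which is why the efficacy falls just short of 1.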

9 
Singles purchasing behavior in bride cake market / Liu, Juichin. 15 June 2004
The purpose of this study is to analyze consumer behavior in the bride cake market. The study is divided into two parts. The first targets consumers who married before 1996 and are over 40 years old; through in-depth interviews, we profile consumer behavior in the bride cake market before 1996. The sample of the second study is singles aged 18 to 35. Following the decision process of the EKB (Engel-Kollat-Blackwell) model, this research seeks to understand consumer purchasing behavior.
The results of this research are as follows:
1. The scale of the bride cake industry will not shrink for the time being.
2. The acceptable price is rising.
3. Western bride cakes are consumers’ favorite, followed by the mixed form.
4. Well-known bride cake chain stores are the most preferred place of purchase.
5. Credit is the preferred method of payment.
6. Valuable information comes from relatives, friends, and store sales staff.
7. In terms of product attributes, consumers emphasize “good taste”, “good services”, and “fair price”.
8. Consumer behavior does not differ significantly across geographical variables.
9. Cluster analysis divides the sample into three groups, namely “Fashion Seeker”, “Innovator and Adventurer”, and “Realist and economist”. Among demographic variables, significant differences exist only in sex and age. There are more females in “Fashion Seeker” and “Realist and economist”, and most members of these two groups are aged 18 to 25, while most members of “Innovator and Adventurer” are aged 26 to 30. “Fashion Seeker” values “vogue”, “beautiful looking” and “notable brand”, and pays attention to “TV advertisement”.

10 
Extending linear grouping analysis and robust estimators for very large data sets / Harrington, Justin. 11 1900
Cluster analysis is the study of how to partition data into homogeneous subsets so that the partitioned data share some common characteristic. In one to three dimensions, the human eye can distinguish well between clusters of data if they are clearly separated. However, when there are more than three dimensions and/or the data are not clearly separated, an algorithm is required, and it needs a similarity metric that quantitatively measures the characteristic of interest.
Linear Grouping Analysis (LGA, Van Aelst et al. 2006) is an algorithm for clustering data around hyperplanes, and is most appropriate when:
1) the variables are related/correlated, which results in clusters with an approximately linear structure; and
2) it is not natural to assume that one variable is a “response”, and the remainder the “explanatories”.
LGA measures the compactness within each cluster via the sum of squared orthogonal distances to hyperplanes formed from the data.
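To make the objective concrete, the Python sketch below runs an LGA-style k-lines search in two dimensions: each line is fitted by orthogonal (total least squares) regression, points are reassigned to the line with the smallest squared orthogonal distance, and the best of several random starts is kept. Restricting to 2-D lines, starting from random point pairs, and the name `lga_2d` are simplifications assumed for illustration; LGA itself works with hyperplanes in general dimension.

```python
import math
import random


def fit_line(points):
    """Total-least-squares line through 2-D points, as (centroid, unit normal)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # direction of largest spread
    return (mx, my), (-math.sin(theta), math.cos(theta))


def orth_dist2(p, line):
    """Squared orthogonal distance from point p to a line (centroid, normal)."""
    (cx, cy), (nx, ny) = line
    return ((p[0] - cx) * nx + (p[1] - cy) * ny) ** 2


def lga_2d(points, k, iters=50, n_starts=10, seed=0):
    """Alternate nearest-line assignment and orthogonal-regression refits;
    return (labels, objective) for the best of n_starts random starts."""
    rng = random.Random(seed)
    best = None
    for _start in range(n_starts):
        lines = [fit_line(rng.sample(points, 2)) for _ in range(k)]
        labels = None
        for _step in range(iters):
            new = [min(range(k), key=lambda j: orth_dist2(p, lines[j])) for p in points]
            if new == labels:
                break  # assignment is stable: a local optimum
            labels = new
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if len(members) >= 2:  # keep the old line if a cluster empties
                    lines[j] = fit_line(members)
        objective = sum(orth_dist2(p, lines[lab]) for p, lab in zip(points, labels))
        if best is None or objective < best[1]:
            best = (labels, objective)
    return best


if __name__ == "__main__":
    data = [(x, x) for x in range(10)] + [(x, 30.0) for x in range(10)]
    labels, objective = lga_2d(data, 2)
    print(labels, objective)
```

The returned objective is the sum of squared orthogonal distances described above; as with k-means, each start only reaches a local optimum, which is why several starts are tried.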
In this dissertation, we extend the scope of problems to which LGA can be applied. The first extension relates to the linearity requirement inherent within LGA, and proposes a new method of nonlinearly transforming the data into a Feature Space, using the Kernel Trick, such that in this space the data might then form linear clusters. A possible side effect of this transformation is that the dimension of the transformed space is significantly larger than the number of observations in a given cluster, which causes problems with orthogonal regression. Therefore, we also introduce a new method for calculating the distance of an observation to a cluster when its covariance matrix is rank deficient.
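The kernel-trick idea can be seen in a small case. The Python sketch below uses a degree-2 polynomial kernel, an illustrative choice since the abstract does not name a kernel: its explicit feature map sends every point on a circle of radius r to the plane z1 + z3 = r^2, so concentric circles become linear clusters, while the kernel returns the same inner product without forming the features. The names `phi`, `poly2_kernel` and `circle` are hypothetical.

```python
import math


def phi(p):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel in 2-D."""
    x, y = p
    return (x * x, math.sqrt(2.0) * x * y, y * y)


def poly2_kernel(p, q):
    """k(p, q) = (p . q)^2, which equals the inner product <phi(p), phi(q)>."""
    return (p[0] * q[0] + p[1] * q[1]) ** 2


def circle(radius, n=12):
    """n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * t / n),
             radius * math.sin(2 * math.pi * t / n)) for t in range(n)]


if __name__ == "__main__":
    # Concentric circles are not linear clusters in the plane, but in feature
    # space each circle lies on the plane z1 + z3 = r^2.
    for r in (1.0, 2.0):
        print(r, [round(phi(p)[0] + phi(p)[2], 9) for p in circle(r, n=4)])
    p, q = (1.0, 2.0), (3.0, -1.0)
    print(poly2_kernel(p, q), sum(a * b for a, b in zip(phi(p), phi(q))))
```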
The second extension concerns the combinatorial problem of optimizing an LGA objective function, and adapts an existing algorithm, BIRCH, to provide fast, approximate solutions, particularly when the data do not fit in memory. We also provide BIRCH-based solutions for two other challenging optimization problems in robust statistics, and demonstrate, via a simulation study as well as application to real data sets, that the BIRCH solution compares favourably with existing state-of-the-art alternatives and in many cases finds a better solution.
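BIRCH summarises each sub-cluster by a clustering feature (N, LS, SS) that can be merged by simple addition. The Python sketch below assumes an extension of that idea suited to LGA-type objectives: for 2-D points it stores the full second-moment sums, so the orthogonal-regression residual of a merged sub-cluster can be computed from the summary without rereading the data. The class `ScatterCF` is hypothetical and is not taken from the dissertation.

```python
import math
from dataclasses import dataclass


@dataclass
class ScatterCF:
    """BIRCH-style clustering feature for 2-D points, extended (an assumption)
    from (N, LS, SS) to the full second-moment sums needed to fit a line."""
    n: int = 0
    sx: float = 0
    sy: float = 0
    sxx: float = 0
    syy: float = 0
    sxy: float = 0

    @classmethod
    def from_points(cls, points):
        return cls(
            n=len(points),
            sx=sum(x for x, _ in points),
            sy=sum(y for _, y in points),
            sxx=sum(x * x for x, _ in points),
            syy=sum(y * y for _, y in points),
            sxy=sum(x * y for x, y in points),
        )

    def merge(self, other):
        """Features are additive, so sub-clusters merge without revisiting data."""
        return ScatterCF(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                         self.sxx + other.sxx, self.syy + other.syy,
                         self.sxy + other.sxy)

    def residual(self):
        """Sum of squared orthogonal distances to the best-fitting line:
        n times the smaller eigenvalue of the (1/n) covariance matrix."""
        mx, my = self.sx / self.n, self.sy / self.n
        cxx = self.sxx / self.n - mx * mx
        cyy = self.syy / self.n - my * my
        cxy = self.sxy / self.n - mx * my
        lam_min = (cxx + cyy) / 2 - math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
        return self.n * max(0.0, lam_min)  # clamp tiny negative rounding error


if __name__ == "__main__":
    left = ScatterCF.from_points([(0, 0), (2, 0)])
    right = ScatterCF.from_points([(0, 2), (2, 2)])
    print(left.merge(right).residual())  # 4.0 for the corners of a 2 x 2 square
```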
