11 |
Unsupervised asset cluster analysis implemented with parallel genetic algorithms on the NVIDIA CUDA platform / Cieslakiewicz, Dariusz 01 July 2014
During times of stock market turbulence and crises, monitoring the clustering behaviour
of financial instruments allows one to better understand the behaviour of the stock market
and the associated systemic risks. In the study undertaken, I apply an effective and
performant approach to classify data clusters in order to better understand correlations
between stocks. The novel methods aim to address the lack of effective algorithms to
deal with high-performance cluster analysis in the context of large complex real-time
low-latency data-sets. I apply an efficient and novel data clustering approach, namely
the Giada and Marsili log-likelihood function derived from the Noh model and use a Parallel
Genetic Algorithm in order to isolate residual data clusters. Genetic Algorithms
(GAs) are a very versatile methodology for scientific computing, while the application
of Parallel Genetic Algorithms (PGAs) further increases the computational efficiency.
They are an effective vehicle to mine data sets for information and traits. However,
the traditional parallel computing environment can be expensive. I focused on adopting
NVIDIA's Compute Unified Device Architecture (CUDA) programming model in order
to develop a PGA framework for my computation solution, where I aim to efficiently
filter out residual clusters. The results show that applying the PGA with
the novel clustering function on the CUDA platform is quite effective at improving the
computational efficiency of parallel data cluster analysis.
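
The abstract names the Giada and Marsili log-likelihood as the function the genetic algorithm optimises but does not state it. The sketch below uses the form commonly quoted from the Giada and Marsili literature, L = 1/2 * sum over clusters with n > 1 of [ ln(n/c) + (n - 1) ln((n^2 - n)/(n^2 - c)) ], where n is the cluster size and c is the sum of all correlations inside the cluster. That formula, the function name `cluster_log_likelihood`, the toy correlation matrix, and the rule of skipping clusters where the logarithms are undefined are all assumptions made for illustration; none of it is taken from the thesis, and it runs on the CPU rather than on CUDA.

```python
import math


def cluster_log_likelihood(corr, labels):
    """Log-likelihood of a cluster assignment (Giada-Marsili style form).

    corr   : square correlation matrix as a list of lists, diagonal = 1
    labels : labels[i] is the cluster id assigned to asset i
    """
    # Group asset indices by cluster label.
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    total = 0.0
    for members in clusters.values():
        n = len(members)
        if n < 2:
            continue  # singleton clusters contribute nothing
        # c: sum of all pairwise correlations inside the cluster,
        # diagonal terms included.
        c = sum(corr[i][j] for i in members for j in members)
        if c <= n or c >= n * n:
            # Assumption: clusters with no net positive internal correlation
            # (c <= n) or perfect correlation (c >= n^2) are skipped, since
            # the logarithms below are not usable there.
            continue
        total += math.log(n / c) + (n - 1) * math.log((n * n - n) / (n * n - c))
    return 0.5 * total


# Toy example (hypothetical data): two pairs of strongly correlated assets.
corr = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]
good = cluster_log_likelihood(corr, [0, 0, 1, 1])       # matches the blocks
merged = cluster_log_likelihood(corr, [0, 0, 0, 0])     # everything in one cluster
scrambled = cluster_log_likelihood(corr, [0, 1, 0, 1])  # pairs uncorrelated assets
print(good, merged, scrambled)
```

In a genetic algorithm of the kind the abstract describes, a function like this would serve as the fitness of each candidate label vector. Each individual can be scored independently of the others, which is what makes the evaluation step a natural candidate for parallel execution.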
|
12 |
Genetic based clustering algorithms and applications. January 2000
by Lee Wing Kin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 81-90). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgments --- p.iii / List of Figures --- p.vii / List of Tables --- p.viii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Clustering --- p.1 / Chapter 1.1.1 --- Hierarchical Classification --- p.2 / Chapter 1.1.2 --- Partitional Classification --- p.3 / Chapter 1.1.3 --- Comparative Analysis --- p.4 / Chapter 1.2 --- Cluster Analysis and Traveling Salesman Problem --- p.5 / Chapter 1.3 --- Solving Clustering Problem --- p.7 / Chapter 1.4 --- Genetic Algorithms --- p.9 / Chapter 1.5 --- Outline of Work --- p.11 / Chapter 2 --- The Clustering Algorithms and Applications --- p.13 / Chapter 2.1 --- Introduction --- p.13 / Chapter 2.2 --- Traveling Salesman Problem --- p.14 / Chapter 2.2.1 --- Related Work on TSP --- p.14 / Chapter 2.2.2 --- Solving TSP using Genetic Algorithm --- p.15 / Chapter 2.3 --- Applications --- p.22 / Chapter 2.3.1 --- Clustering for Vertical Partitioning Design --- p.22 / Chapter 2.3.2 --- Horizontal Partitioning a Relational Database --- p.36 / Chapter 2.3.3 --- Object-Oriented Database Design --- p.42 / Chapter 2.3.4 --- Document Database Design --- p.49 / Chapter 2.4 --- Conclusions --- p.53 / Chapter 3 --- The Experiments for Vertical Partitioning Problem --- p.55 / Chapter 3.1 --- Introduction --- p.55 / Chapter 3.2 --- Comparative Study --- p.56 / Chapter 3.3 --- Experimental Results --- p.59 / Chapter 3.4 --- Conclusions --- p.61 / Chapter 4 --- Three New Operators for TSP --- p.62 / Chapter 4.1 --- Introduction --- p.62 / Chapter 4.2 --- Enhanced Cost Edge Recombination Operator --- p.63 / Chapter 4.3 --- Shortest Path Operator --- p.66 / Chapter 4.4 --- Shortest Edge Operator --- p.69 / Chapter 4.5 --- The Experiments --- p.71 / Chapter 4.5.1 --- Experimental Results for a 48-city TSP --- p.71 / Chapter 4.5.2 --- Experimental Results 
for Problems in TSPLIB --- p.73 / Chapter 4.6 --- Conclusions --- p.77 / Chapter 5 --- Conclusions --- p.78 / Chapter 5.1 --- Summary of Achievements --- p.78 / Chapter 5.2 --- Future Development --- p.80 / Bibliography --- p.81
|
13 |
A study of two problems in data mining: projective clustering and multiple tables association rules mining. January 2002
Ng Ka Ka. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (leaves 114-120). / Abstracts in English and Chinese. / Abstract --- p.ii / Acknowledgement --- p.vii / Chapter I --- Projective Clustering --- p.1 / Chapter 1 --- Introduction to Projective Clustering --- p.2 / Chapter 2 --- Related Work to Projective Clustering --- p.7 / Chapter 2.1 --- CLARANS - Graph Abstraction and Bounded Optimization --- p.8 / Chapter 2.1.1 --- Graph Abstraction --- p.8 / Chapter 2.1.2 --- Bounded Optimized Random Search --- p.9 / Chapter 2.2 --- OptiGrid - Grid Partitioning Approach and Density Estimation Function --- p.9 / Chapter 2.2.1 --- Empty Space Phenomenon --- p.10 / Chapter 2.2.2 --- Density Estimation Function --- p.11 / Chapter 2.2.3 --- Upper Bound Property --- p.12 / Chapter 2.3 --- CLIQUE and ENCLUS - Subspace Clustering --- p.13 / Chapter 2.3.1 --- Monotonicity Property of Subspaces --- p.14 / Chapter 2.4 --- PROCLUS Projective Clustering --- p.15 / Chapter 2.5 --- ORCLUS - Generalized Projective Clustering --- p.16 / Chapter 2.5.1 --- Singular Value Decomposition SVD --- p.17 / Chapter 2.6 --- An "Optimal" Projective Clustering --- p.17 / Chapter 3 --- EPC : Efficient Projective Clustering --- p.19 / Chapter 3.1 --- Motivation --- p.19 / Chapter 3.2 --- Notations and Definitions --- p.21 / Chapter 3.2.1 --- Density Estimation Function --- p.22 / Chapter 3.2.2 --- 1-d Histogram --- p.23 / Chapter 3.2.3 --- 1-d Dense Region --- p.25 / Chapter 3.2.4 --- Signature Q --- p.26 / Chapter 3.3 --- The overall framework --- p.28 / Chapter 3.4 --- Major Steps --- p.30 / Chapter 3.4.1 --- Histogram Generation --- p.30 / Chapter 3.4.2 --- Adaptive discovery of dense regions --- p.31 / Chapter 3.4.3 --- Count the occurrences of signatures --- p.36 / Chapter 3.4.4 --- Find the most frequent signatures --- p.36 / Chapter 3.4.5 --- Refine the top 3m signatures --- p.37 / Chapter 3.5 --- Time and Space Complexity --- p.38 / Chapter 
4 --- EPCH: An extension and generalization of EPC --- p.40 / Chapter 4.1 --- Motivation of the extension --- p.40 / Chapter 4.2 --- Distinguish clusters by their projections in different subspaces --- p.43 / Chapter 4.3 --- EPCH: a generalization of EPC by building histogram with higher dimensionality --- p.46 / Chapter 4.3.1 --- Multidimensional histograms construction and dense regions detection --- p.46 / Chapter 4.3.2 --- Compressing data objects to signatures --- p.47 / Chapter 4.3.3 --- Merging Similar Signature Entries --- p.49 / Chapter 4.3.4 --- Associating membership degree --- p.51 / Chapter 4.3.5 --- The choice of Dimensionality d of the Histogram --- p.52 / Chapter 4.4 --- Implementation of EPC2 --- p.53 / Chapter 4.5 --- Time and Space Complexity of EPCH --- p.54 / Chapter 5 --- Experimental Results --- p.56 / Chapter 5.1 --- Clustering Quality Measurement --- p.56 / Chapter 5.2 --- Synthetic Data Generation --- p.58 / Chapter 5.3 --- Experimental setup --- p.59 / Chapter 5.4 --- Comparison between EPC and PROCLUS --- p.60 / Chapter 5.5 --- Comparison between EPCH and ORCLUS --- p.62 / Chapter 5.5.1 --- Dimensionality of the original space and the associated subspace --- p.65 / Chapter 5.5.2 --- Projection not parallel to original axes --- p.66 / Chapter 5.5.3 --- Data objects belong to more than one cluster under fuzzy clustering --- p.67 / Chapter 5.6 --- Scalability of EPC --- p.68 / Chapter 5.7 --- Scalability of EPC2 --- p.69 / Chapter 6 --- Conclusion --- p.71 / Chapter II --- Multiple Tables Association Rules Mining --- p.74 / Chapter 7 --- Introduction to Multiple Tables Association Rule Mining --- p.75 / Chapter 7.1 --- Problem Statement --- p.77 / Chapter 8 --- Related Work to Multiple Tables Association Rules Mining --- p.80 / Chapter 8.1 --- Apriori - A Bottom-up approach to generate candidate sets --- p.80 / Chapter 8.2 --- VIPER - Vertical Mining with various optimization techniques --- p.81 / Chapter 8.2.1 --- Vertical TID 
Representation and Mining --- p.82 / Chapter 8.2.2 --- FORC --- p.83 / Chapter 8.3 --- Frequent Itemset Counting across Multiple Tables --- p.84 / Chapter 9 --- The Proposed Method --- p.85 / Chapter 9.1 --- Notations --- p.85 / Chapter 9.2 --- Converting Dimension Tables to internal representation --- p.87 / Chapter 9.3 --- The idea of discovering frequent itemsets without joining --- p.89 / Chapter 9.4 --- Overall Steps --- p.91 / Chapter 9.5 --- Binding multiple Dimension Tables --- p.92 / Chapter 9.6 --- Prefix Tree for FT --- p.94 / Chapter 9.7 --- Maintaining frequent itemsets in FI-trees --- p.96 / Chapter 9.8 --- Frequency Counting --- p.99 / Chapter 10 --- Experiments --- p.102 / Chapter 10.1 --- Synthetic Data Generation --- p.102 / Chapter 10.2 --- Experimental Findings --- p.106 / Chapter 11 --- Conclusion and Future Works --- p.112 / Bibliography --- p.114
|
14 |
A new approach to clustering large databases in data mining. January 2004
Lau Hei Yuet. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 74-76). / Abstracts in English and Chinese. / Abstract --- p.i / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Cluster Analysis --- p.1 / Chapter 1.2 --- Dissimilarity Measures --- p.3 / Chapter 1.2.1 --- Continuous Data --- p.4 / Chapter 1.2.2 --- Categorical and Nominal Data --- p.4 / Chapter 1.2.3 --- Mixed Data --- p.5 / Chapter 1.2.4 --- Missing Data --- p.6 / Chapter 1.3 --- Outline of the thesis --- p.6 / Chapter 2 --- Clustering Algorithms --- p.9 / Chapter 2.1 --- The k-means Algorithm Family --- p.9 / Chapter 2.1.1 --- The Algorithms --- p.9 / Chapter 2.1.2 --- Choosing the Number of Clusters - the MaxMin Algorithm --- p.12 / Chapter 2.1.3 --- Starting Configuration - the MaxMin Algorithm --- p.16 / Chapter 2.2 --- Clustering Using Unidimensional Scaling --- p.16 / Chapter 2.2.1 --- Unidimensional Scaling --- p.16 / Chapter 2.2.2 --- Procedures --- p.17 / Chapter 2.2.3 --- Guttman's Updating Algorithm --- p.18 / Chapter 2.2.4 --- Pliner's Smoothing Algorithm --- p.18 / Chapter 2.2.5 --- Starting Configuration --- p.19 / Chapter 2.2.6 --- Choosing the Number of Clusters --- p.21 / Chapter 2.3 --- Cluster Validation --- p.23 / Chapter 2.3.1 --- Continuous Data --- p.23 / Chapter 2.3.2 --- Nominal Data --- p.24 / Chapter 2.3.3 --- Resampling Method --- p.25 / Chapter 2.4 --- Conclusion --- p.27 / Chapter 3 --- Experimental Results --- p.29 / Chapter 3.1 --- Simulated Data 1 --- p.29 / Chapter 3.2 --- Simulated Data 2 --- p.35 / Chapter 3.3 --- Iris Data --- p.41 / Chapter 3.4 --- Wine Data --- p.47 / Chapter 3.5 --- Mushroom Data --- p.53 / Chapter 3.6 --- Conclusion --- p.59 / Chapter 4 --- Large Database --- p.61 / Chapter 4.1 --- Sliding Windows Algorithm --- p.61 / Chapter 4.2 --- Two-stage Algorithm --- p.63 / Chapter 4.3 --- Three-stage Algorithm --- p.65 / Chapter 4.4 --- Experimental Results --- p.66 / Chapter 4.5 --- 
Conclusion --- p.68 / Chapter A --- Algorithms --- p.69 / Chapter A.1 --- MaxMin Algorithm --- p.69 / Chapter A.2 --- Sliding Windows Algorithm --- p.70 / Chapter A.3 --- Two-stage Algorithm - Stage One --- p.72 / Chapter A.4 --- Two-stage Algorithm - Stage Two --- p.73 / Bibliography --- p.74
|
15 |
Learning by propagation. / CUHK electronic theses & dissertations collection January 2008
In this thesis, we consider the general problem of classifying a data set into a number of subsets, which has been one of the most fundamental problems in machine learning. Specifically, we mainly address the following four common learning problems in three active research fields: semi-supervised classification, semi-supervised clustering, and unsupervised clustering. The first problem we consider is semi-supervised classification from both unlabeled data and pairwise constraints. The pairwise constraints specify which two objects belong to the same class or not. Our aim is to propagate the pairwise constraints to the entire data set. We formulate the propagation model as a semidefinite programming (SDP) problem, which can be globally solved reliably. Our approach is applicable to multi-class problems and handles class labels, pairwise constraints, or a mixture of them in a unified framework. / The second problem is semi-supervised clustering with pairwise constraints. We present a principled framework for learning a data-driven and constraint-consistent nonlinear mapping to reshape the data in a feature space. We formulate the problem as a small-scale SDP problem, whose size is independent of the numbers of the objects and the constraints. Thus it can be globally solved efficiently. Our framework has several attractive features. First, it can effectively propagate pairwise constraints, when available, to the entire data set. Second, it scales well to large-scale problems. Third, it can effectively handle noisy constraints. 
Fourth, in the absence of constraints, it becomes a novel kernel-based clustering algorithm that can discover linearly non-separable clusters. / Third, we deal with noise robust clustering. Many clustering algorithms, including spectral clustering, often fail on noisy data. We propose a data warping model to map the data into a new space. During the warping, each object spreads its spatial information smoothly over the data graph to other objects. After the warping, hopefully each cluster becomes compact and different clusters become well-separated, including the noise cluster that is formed by the noise objects. The proposed clustering algorithm can handle significantly noisy data, and can find the number of clusters automatically. / Finally, we study how to construct an appropriate graph for spectral clustering. Given a local similarity matrix (a graph), we propose an iterative regularization procedure to iteratively enhance its cluster structure, leading to a global similarity matrix. Significant improvement of clustering performance is observed when the new graph is used for spectral clustering. / Li, Zhenguo. / Adviser: Liu Jianzhuang. / Source: Dissertation Abstracts International, Volume: 70-06, Section: B, page: 3604. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2008. / Includes bibliographical references (leaves 121-131). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
|
16 |
A study of cluster identification approaches for the group technology problem. January 2003
Chu Pok Nang. / Thesis submitted on: October 2002. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 69-73). / Abstracts in English and Chinese. / Chapter 1. --- Introduction / Group Technology --- p.6 / Purposes of Research --- p.10 / The Outline of this Thesis --- p.13 / Chapter 2. --- Literature Review / Algorithms for Group Technology --- p.14 / Hierarchical Clustering Approaches --- p.17 / Sorting Based Approaches --- p.18 / Heuristic Exchange Approaches --- p.19 / Seed Based Approaches --- p.20 / Simulated Annealing Approaches --- p.20 / Tabu Search Approaches --- p.21 / Genetic Algorithm Approaches --- p.21 / Neural Network Approaches --- p.22 / Cluster Identification Approaches --- p.22 / Chapter 3. --- The Group Technology Problem / Representing a Manufacturing System --- p.25 / Machine-Part Incidence Matrix --- p.26 / Chapter 4. --- The Improved Cluster Identification Algorithm / Cluster Identification --- p.34 / Formulation --- p.35 / Branch-and-Bound Method --- p.37 / Original Cluster Identification Algorithm --- p.39 / Branching Rule --- p.44 / Chapter 5. --- Computational Studies / Plans for Comparative Studies --- p.49 / Comparison with Existing Cluster Identification Approaches --- p.51 / Solutions to Some Well-known Problems --- p.53 / Comparison with an Optimal Method --- p.60 / Chapter 6. --- Conclusion --- p.63 / Reference --- p.69
|
17 |
Clustering of categorical and numerical data without knowing cluster number / Jia, Hong 01 January 2013
No description available.
|
18 |
Center-based cluster analysis using inter-point distances. January 2009
Law, Shu Kei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 39-40). / Abstract also in Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Basic concept of clustering --- p.1 / Chapter 1.2 --- Main problems --- p.2 / Chapter 1.3 --- Review --- p.3 / Chapter 1.4 --- Newly proposed method --- p.7 / Chapter 1.5 --- Summary --- p.7 / Chapter 2 --- k-means clustering --- p.9 / Chapter 2.1 --- Algorithm of k-means clustering --- p.9 / Chapter 2.2 --- Selecting k in k-means clustering --- p.11 / Chapter 2.3 --- Disadvantages of k-means clustering --- p.12 / Chapter 3 --- Methodology and Algorithm --- p.14 / Chapter 3.1 --- Methodology and Algorithm --- p.14 / Chapter 3.2 --- Illustrative Example --- p.20 / Chapter 4 --- Simulation Study --- p.25 / Chapter 4.1 --- Simulation Plan --- p.25 / Chapter 4.2 --- Simulation Details --- p.27 / Chapter 4.3 --- Simulation Result --- p.30 / Chapter 4.4 --- Summary --- p.34 / Chapter 5 --- Conclusion and Further research --- p.36 / Bibliography --- p.38
|
19 |
Subspace clustering for high dimensional categorical data / Gan, Guojun. January 2003
Thesis (M.Sc.)--York University, 2003. Graduate Programme in Mathematics and Statistics. / Typescript. Includes bibliographical references (leaves 112-121) and index. Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL:http://gateway.proquest.com/openurl?url%5Fver=Z39.88-2004&res%5Fdat=xri:pqdiss&rft%5Fval%5Ffmt=info:ofi/fmt:kev:mtx:dissertation&rft%5Fdat=xri:pqdiss:MQ99310
|
20 |
Design of a cluster analysis heuristic for the configuration and capacity management of manufacturing cells / Shim, Young Hak 17 September 2007
This dissertation presents the configuration and capacity management of manufacturing cells using cluster analysis. A heuristic based on cluster analysis is developed to solve cell formation in cellular manufacturing systems (CMS). The clustering heuristic is applied for cell formation considering processing requirement (CFOPR) as well as various manufacturing factors (CFVMF). The proposed clustering heuristic is developed by employing a new solving structure incorporating hierarchical and non-hierarchical clustering methods. A new similarity measure is constructed by modifying the Jaccard similarity and a new assignment algorithm is proposed by employing the new pairwise exchange method. In CFOPR, the clustering heuristic is modified by adding a feedback step and more exact allocation rules. Grouping efficacy is employed as a measure to evaluate solutions obtained from the heuristic. The clustering heuristic for CFOPR was evaluated on 23 test problems taken from the literature in order to compare with other approaches and produced the best solution in 18 out of 23 and the second best in the remaining problems. These solutions were obtained in a very short time and even the largest test problem was solved in around one and a half seconds. In CFVMF, the machine capacity was first ensured, and then manufacturing cells were configured to minimize intercellular movements. In order to ensure the machine capacity, the duplication of machines and the split of operations are allowed and operations are assigned to duplicated machines by the largest-first rule. The clustering heuristic for CFVMF proposes a new similarity measure incorporating processing requirement, material flow and machine workload and a new machine-part matrix representing material flow and processing time assigned to multiple identical machines. Also, setup time, which has not been clearly addressed in existing research, is discussed in the solving procedure. 
The clustering heuristic for CFVMF employs two evaluation measures: the number of intercellular movements and grouping efficacy. In two test problems taken from the literature, the heuristic for CFVMF produced the same results, but the trade-off between the two evaluation measures is proposed as a way to assess the goodness of grouping.
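
The abstract refers to the Jaccard similarity and to grouping efficacy without defining them. The sketch below uses the standard textbook forms, which are an assumption here: the dissertation's modified similarity measure is not reproduced. Jaccard similarity between two machines is the number of parts both process divided by the number of parts either processes. Grouping efficacy is (e - e_o) / (e + e_v), where e is the number of 1s in the machine-part matrix, e_o the number of 1s outside the diagonal blocks (exceptional elements) and e_v the number of 0s inside them (voids). The function names and the 4x4 matrix are hypothetical.

```python
def jaccard(row_a, row_b):
    """Standard Jaccard similarity between two binary machine rows."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


def grouping_efficacy(matrix, machine_cell, part_cell):
    """Grouping efficacy of a cell assignment on a binary machine-part matrix.

    matrix[m][p] is 1 if machine m processes part p, else 0.
    machine_cell[m] and part_cell[p] give the cell of each machine and part.
    """
    ones_total = 0   # e: all operations in the matrix
    exceptional = 0  # e_o: 1s that fall outside the diagonal blocks
    voids = 0        # e_v: 0s that fall inside the diagonal blocks
    for m, row in enumerate(matrix):
        for p, value in enumerate(row):
            inside = machine_cell[m] == part_cell[p]
            if value:
                ones_total += 1
                if not inside:
                    exceptional += 1
            elif inside:
                voids += 1
    return (ones_total - exceptional) / (ones_total + voids)


# Hypothetical 4-machine, 4-part incidence matrix with two cells.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],  # the last 1 is an exceptional element
    [0, 0, 1, 1],
    [0, 0, 1, 0],  # the last 0 is a void
]
machine_cell = [0, 0, 1, 1]
part_cell = [0, 0, 1, 1]

efficacy = grouping_efficacy(matrix, machine_cell, part_cell)
similar = jaccard(matrix[0], matrix[1])
dissimilar = jaccard(matrix[0], matrix[2])
print(efficacy, similar, dissimilar)
```

A value of 1 for grouping efficacy means a perfect block-diagonal structure, with no intercellular movements and no voids. The measure falls as either kind of imperfection grows, which is why it is often reported alongside a count of intercellular movements, as the abstract does.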
|