1 |
A reversible jump MCMC for mixture of t factor analyzers
Fu, Gujie January 2024
This thesis explores the integration of the multivariate t-distribution into factor analysis and its extension through mixture models, emphasizing robust statistical methodologies for complex data analysis. We employ reversible jump Markov chain Monte Carlo for model selection, addressing the challenges of non-normal data behaviors such as outliers and heavy tails. The research contributes to the statistical field by enhancing model accuracy and flexibility, particularly in clustering and Bayesian inference. Through theoretical development and practical applications, including simulations and real-world datasets (wine and olive oil data), this study demonstrates the efficacy of these methodologies in uncovering latent structures and provides a comprehensive toolkit for advanced data analysis. / Thesis / Master of Science (MSc)
|
2 |
A Novel Multiobjective EA-based Clustering Algorithm with Automatic Determination of the Number of Clusters
Chen, Wen-Ling 07 September 2012
Automatically determining the number of clusters without a priori knowledge is a difficult research problem in data clustering. This study proposes an effective clustering algorithm based on a multiobjective evolutionary algorithm that not only overcomes this problem but also provides a better clustering result. The proposed algorithm differs from the traditional evolutionary algorithm in that, instead of a single crossover operator and a single mutation operator, it uses a pool of crossover operators and a pool of mutation operators, selected at random, to increase search diversity. Several well-known datasets are used to evaluate its performance. The simulation results show that the proposed algorithm can not only determine the number of clusters automatically but also provide a better clustering result.
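The operator-pool idea can be sketched roughly as follows. This is a minimal illustration, not the algorithm from the thesis: the label-vector encoding, the fixed number of clusters `k`, the four operators, and every function name here are assumptions chosen only to show how one crossover and one mutation operator might be drawn at random for each child.

```python
import random

# Hypothetical encoding: an individual is a list of cluster labels, one per data point.

def one_point_crossover(p1, p2, rng):
    # Take a prefix from the first parent and the remaining suffix from the second.
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def uniform_crossover(p1, p2, rng):
    # Choose each label independently from either parent.
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

def reassign_mutation(child, rng, k):
    # Move one randomly chosen point to a randomly chosen cluster.
    out = list(child)
    out[rng.randrange(len(out))] = rng.randrange(k)
    return out

def swap_mutation(child, rng, k):
    # Swap the labels of two randomly chosen points (k is unused here).
    out = list(child)
    i, j = rng.randrange(len(out)), rng.randrange(len(out))
    out[i], out[j] = out[j], out[i]
    return out

CROSSOVER_POOL = [one_point_crossover, uniform_crossover]
MUTATION_POOL = [reassign_mutation, swap_mutation]

def make_child(p1, p2, k, rng):
    # Draw one operator from each pool at random, then apply them in sequence.
    crossover = rng.choice(CROSSOVER_POOL)
    mutation = rng.choice(MUTATION_POOL)
    return mutation(crossover(p1, p2, rng), rng, k)
```

Called as `make_child([0, 0, 1, 1], [1, 1, 0, 0], 2, random.Random(0))`, this returns a new label vector of the same length. A fixed `k` is used only to keep the sketch short; the thesis determines the number of clusters automatically.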
|
3 |
Intra-topic clustering for social media
Gondhi, Uttej Reddy 28 August 2020
With social media platforms leading the internet in terms of user base and average time spent, a significant amount of data is generated by these platforms every day. This makes social media platforms a go-to place for understanding people's reviews, trends, and opinions. A regular search for a popular topic returns an abundance of information, so it is impossible to go through such large amounts of data manually to understand the trends.
This thesis discusses techniques for the intra-topic clustering of such social media data and how social media noise increases the redundancy of search results. Our goal is to reduce the amount of redundant information an end user must review from a regular social media search. The research proposes clustering models based on two string similarity measures: Jaccard word token and T-Information distance. Evaluation parameters are introduced, and the models are evaluated on clustering a set of current and historical topics to determine which techniques are the most effective. / Graduate
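As a rough illustration of the first measure, Jaccard similarity over word tokens can be sketched as below. This is a minimal sketch rather than the thesis's implementation: the whitespace tokenization, the lowercasing, the handling of two empty strings, and the name `jaccard_word_tokens` are all assumptions.

```python
def jaccard_word_tokens(a: str, b: str) -> float:
    """Jaccard similarity between the sets of lowercase word tokens of two strings."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # assumption: two empty posts are treated as identical
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

Two posts that share two of four distinct words, such as "a b c" and "b c d", score 0.5 under this definition; near-duplicate posts score close to 1 and could be grouped into the same cluster.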
|
4 |
Evaluating Clustering Techniques over Big Data in Distributed Infrastructures
Shetty, Kartik 25 April 2018
Clustering is defined as the process of grouping a set of objects so that objects in the same group are more similar, in some sense, to each other than to those in other groups. It is used in many fields, including machine learning, image recognition, pattern recognition, and knowledge discovery. In this era of Big Data, we can leverage the computing power of a distributed environment to cluster large datasets. Clustering can be achieved through various algorithms, but in general they have high time complexity. For large datasets, scalability and the parameters of the environment in which the algorithm runs become issues that need to be addressed. A brute-force implementation is therefore not scalable over large datasets even in a distributed environment, which calls for an approximation technique or optimization to make it scalable. We study three clustering techniques, CURE, DBSCAN, and k-means, over a distributed environment such as Hadoop. For each of these algorithms we examine the performance trade-offs and bottlenecks and then propose enhancements, optimizations, or an approximation technique to make it scalable in Hadoop. Finally, we evaluate its performance and suitability for datasets of different sizes and distributions.
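To make the scalability concern concrete, one k-means iteration can be phrased as a map step (assign each point to its nearest centroid) followed by a reduce step (average the points in each group). The sketch below runs in a single process and is only an assumption about how such a decomposition might look; it is not the thesis's Hadoop implementation, and the function names are hypothetical.

```python
from collections import defaultdict

def nearest_centroid(point, centroids):
    """Map step: index of the closest centroid by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )

def kmeans_iteration(points, centroids):
    """One k-means iteration written as a map step followed by a reduce step."""
    groups = defaultdict(list)
    for pt in points:  # map: emit (centroid index, point)
        groups[nearest_centroid(pt, centroids)].append(pt)
    new_centroids = []
    for i, old in enumerate(centroids):  # reduce: average the points in each group
        members = groups.get(i)
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its previous centroid
        else:
            new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
    return new_centroids
```

Every point is compared against every centroid in each iteration, which is the kind of cost that grows with dataset size and motivates the approximations and optimizations studied in the thesis.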
|
5 |
A combined clustering and placement algorithm for FPGAs
Yamashita, Mark 05 1900
One of the major drawbacks of reprogrammable microchips, such as field-programmable gate arrays (FPGAs), is an inherent speed disadvantage over ASIC technologies. To mitigate this speed disadvantage, this thesis presents a novel algorithm to improve timing performance at the possible expense of area and runtime. The algorithm presented leverages node duplication and a depth-optimal initial clustering to provide a starting point for a non-greedy, iterative optimization technique using detailed placement and timing information to develop the final clustering and placement solutions.
For a set of benchmarks commonly used in FPGA research, the proposed algorithm achieves an 11% critical-path delay improvement compared to the VPR academic tool flow. This performance improvement is obtained at the expense of a 44% increase in area usage and a 26x increase in maximum runtime. Techniques have also been implemented to sacrifice performance to moderate the area or runtime increases. For a 1% critical-path delay penalty, the runtime can be improved by a factor of 4. The algorithm also provides facilities to impose area restrictions, in which case timing degradation is proportional to the area saved.
|
6 |
Overlapping clustering
Krumpelman, Chase Serhur 13 December 2010
Analysis of large collections of data has become inescapable in many areas of scientific and commercial endeavor. As the size and dimensionality of these collections exceed the pattern recognition capability of the human mind, computational analysis tools become a necessity for interpretation. Clustering algorithms, which aim to find interesting groupings within collections of data, are one such tool. Each algorithm incorporates into its design an inherent definition of “interesting” intended to capture nonrandom data groupings likely to have some interpretation to human users. Most existing algorithms include as part of their definition of “interesting” an assumption that each data point can belong to at most one grouping. While this assumption allows for algorithmic convenience and ease of analysis, it is often an artificial imposition on the true underlying data structure. The idea of allowing points to belong to multiple groupings - known as “overlapping” or “multiple membership” clustering - has emerged in several domains in ad hoc solutions lacking conceptual unity in approach, interpretation, and analysis. This dissertation proposes general, domain-independent elucidations and practical techniques which address each of these.
We begin by positing overlapping clustering’s role specifically, and clustering’s role in general, as assistive technological tools allowing human minds to represent and interpret structures in data beyond the capability of our innate senses. With this guiding purpose clarified, we provide a catalog of existing techniques. We then address the issue of objectively comparing the results of different algorithms, specifically examining the previously defined Omega index, as well as multiple membership generalizations of normalized mutual information. Following that comparison, we propose a novel approach to comparing clusterings called cluster alignment. By combining a sorting algorithm with a greedy matching algorithm, we produce comparably organized membership matrices and a means for both numerically and visually comparing multiple-membership assignments. With overlapping clustering’s purpose defined, and the means to analyze results, we move on to presenting algorithms for efficiently discovering overlapping clusters in data. First, we present a generalization of one of the common themes in the ad hoc approaches: additive clustering. Starting with a previously developed structural model of additive clustering, we generalize it to be applicable to any regular exponential family distribution, thereby extending its utility into several domains, notably high-dimensional sparse domains including text and recommender systems. Finally, we address overlapping clustering by examining the properties of data in similarity spaces. We develop a probabilistic generative model of overlapping data in similarity spaces, and then develop two conceptual approaches to discovering overlapping clustering in similarity spaces. The first of these is the conceptual multiple-membership generalization of hierarchical agglomerative clustering, and the second is an iterative density hill-climbing algorithm. / text
|
8 |
Duomenų grupavimo taikymai transporto sistemose / Data Clustering Usage in Transport Systems
Penikas, Marius 04 March 2009
Transport information systems must quickly process huge and ever-growing amounts of data. Because the amount of data needed to reach different goals or draw different conclusions can differ by a factor of several dozen or even several hundred, optimizing it would save a great deal of time and resources. To reduce the amount of data without losing important information, data clustering is used: the assignment of objects to groups according to common features. The aim of this work is to analyze and evaluate classes of clustering algorithms, their typical representatives, and their application to clustering transport system data. / The object of investigation of this paper is data clustering and the adaptation of data clustering algorithms to traffic flow control systems. The main goal is to analyze which class of clustering algorithms performs better with specific traffic data, how much of this data is enough to forecast precise results, and how far the data can be reduced while still producing correct results.
|
9 |
FP-growth approach for document clustering
Akbar, Monika. January 2008
Thesis (MS)--Montana State University--Bozeman, 2008. / Typescript. Chairperson, Graduate Committee: Rafal A. Angryk. Includes bibliographical references (leaves 58-61).
|
10 |
A combined clustering and placement algorithm for FPGAs
Yamashita, Mark 05 1900
One of the major drawbacks of reprogrammable microchips, such as field-programmable gate arrays (FPGAs), is an inherent speed disadvantage over ASIC technologies. To mitigate this speed disadvantage, this thesis presents a novel algorithm to improve timing performance at the possible expense of area and runtime. The algorithm presented leverages node duplication and a depth-optimal initial clustering to provide a starting point for a non-greedy, iterative optimization technique using detailed placement and timing information to develop the final clustering and placement solutions.
For a set of benchmarks commonly used in FPGA research, the proposed algorithm achieves an 11% critical-path delay improvement compared to the VPR academic tool flow. This performance improvement is obtained at the expense of a 44% increase in area usage and a 26x increase in maximum runtime. Techniques have also been implemented to sacrifice performance to moderate the area or runtime increases. For a 1% critical-path delay penalty, the runtime can be improved by a factor of 4. The algorithm also provides facilities to impose area restrictions, in which case timing degradation is proportional to the area saved. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate
|