141 |
Unsupervised Classification of Music Signals: Strategies Using Timbre and Rhythm
Bond, Zachary 06 February 2007
This thesis describes the ideal properties of an adaptable music classification system based on unsupervised machine learning, and argues that such a system should be based on the fundamental musical properties of timbre, rhythm, melody and harmony. The first two properties and the signal features associated with them are then explored in more depth. In the area of timbre, the relationship between musical style and commonly-extracted signal features within a broad range of piano music is explored, in an effort to identify features which are consistent among all piano music but different for other instruments. The effect of lossy compression on these same timbre features is also investigated. In the area of rhythm, a new tempo tracking tool is provided which produces a series of histograms containing beat and sub-beat information throughout the course of a musical recording. These histograms are then shown to be useful in the analysis of synthesized rhythms and real music. Additionally, a novel method based on the Expectation-Maximization algorithm is used to extract features for classification from the histograms. / Master of Science
|
142 |
Extensions to the OCLUST Algorithm
Clark, Katharine M January 2024
OCLUST is a clustering algorithm that trims outliers in Gaussian mixture models. While mixtures of multivariate Gaussian distributions are a useful way to model heterogeneity in data, it is not always an appropriate assumption that the data arise from a finite mixture of Gaussian distributions. This thesis extends the OCLUST algorithm to three types of data which depart from the multivariate Gaussian distribution. The first extension, called funOCLUST, is developed for data which exist in functional form. Next, MVN-OCLUST applies outlier trimming to matrix-variate normal data. Finally, the skewOCLUST algorithm is formulated for skewed data by applying a transformation to normality. However, this final extension occurs after a brief detour in Chapter 5 to establish a foundation for the final chapter. / Thesis / Doctor of Philosophy (PhD)
|
143 |
Quantifying Changes in Social Polarization Over Time and Region
Edwards, David Linville 29 July 2024
Recent studies indicate that Americans have grown increasingly divided and polarized in recent years \cite{boxell2022cross}, \cite{hawdon2020social}. This research aims to describe and measure polarization trends across a historical archive of US-based, primarily regional, newspapers. The newspapers chosen are from various US markets to capture any regional differences in the discussion of issues/topics. Our modeling approach employs the Structural Topic Model (STM) to identify topics within a given corpus and measure the tonal differences of articles discussing the same topic. Specifically, we use the STM to infer potentially related articles and a sentiment analyzer called VADER to identify topics with a high level of semantic disparity. Using this method, we assess the polarization of developing and evolving topics, such as sports, politics, and entertainment, and compare how polarization between and within these topics has changed over time. Through this, we create topic-specific sentiment distributions, referred to as polarization distributions. We conclude by demonstrating the usefulness of these distributions in identifying polarization and showing how high polarization aligns with significant social events. / Doctor of Philosophy / Most Americans have a sense that their nation is becoming more socially polarized. Numerous studies and anecdotal evidence support this. Our aim with this work is to develop a method to quantify polarization in text media and apply this method to news articles published in local and national newspapers. Using a statistical model, we are able to group articles based on a common shared topic. We then analyze the sentiment of each article and evaluate how sentiments for a particular topic change over time. We then compare newspapers based on location, political endorsements, and ownership groups.
|
144 |
Skin Detection in Image and Video Founded in Clustering and Region Growing
Islam, A B M Rezbaul 08 1900
Researchers have searched for an efficient skin detection method for decades, yet current methods have not overcome the major limitations. To overcome these limitations, this dissertation proposes a skin detection method based on clustering and region growing. These methods, together with a significant insight, result in a more effective algorithm. The insight concerns a capability to define dynamically the number of clusters in a collection of pixels organized as an image. In clustering for most problem domains, the number of clusters is fixed a priori, an approach that does not perform effectively over a wide variety of data contents. Therefore, in this dissertation, a skin detection method has been proposed using the above findings and validated. This method assigns the number of clusters based on image properties and ultimately allows freedom from manual thresholding or other manual operations. The dynamic determination of clustering outcomes allows for greater automation of skin detection when dealing with uncertain real-world conditions.
|
145 |
Clustering Articles in a Literature Digital Library Based on Content and Usage
Ting, Kang-Di 10 August 2004
A literature digital library is one of the most important resources for preserving the assets of civilization. To provide more effective and efficient information search, many systems are equipped with a browsing interface that aims to ease the article searching task. A browsing interface is associated with a subject directory, which guides users to identify articles that meet their information need. A subject directory contains a set (or a hierarchy) of subject categories, each containing a number of similar articles. How to group articles in a literature digital library is the theme of this thesis.
Previous work used either document classification or document clustering approaches to dispatch articles into a set of article clusters based on their content. We observed that articles that meet a single user's information need may not necessarily fall in a single cluster. In this thesis, we propose to make use of both the Web log and article content in clustering articles. We propose two hybrid approaches, namely a document categorization based method and a document clustering based method. These alternatives were compared to other content-based methods. It has been found that the document categorization based method effectively reduces the number of required click-throughs at the expense of a slight increase in entropy, which measures the content heterogeneity of each generated cluster.
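The entropy measure mentioned in this abstract can be illustrated with a short sketch. It assumes that every article in a generated cluster carries a known subject category label and computes the Shannon entropy of the label proportions. The function name `cluster_entropy` and the sample labels are hypothetical, and the thesis may define its entropy measure differently.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the category labels inside one cluster.

    Assumption: each article has exactly one category label. A value of 0
    means the cluster is homogeneous; larger values mean more mixed content.
    """
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # Sum p * log2(1/p) over the observed categories.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical clusters: one pure, one evenly split between two categories.
pure_cluster = ["databases", "databases", "databases"]
mixed_cluster = ["databases", "networks", "databases", "networks"]
print(cluster_entropy(pure_cluster), cluster_entropy(mixed_cluster))
```

Under this reading, a method that lowers the number of click-throughs but mixes categories slightly more would show a small rise in the per-cluster value.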
|
146 |
Preference-Anchored Document Clustering Technique for Supporting Effective Knowledge and Document Management
Wang, Shin 03 August 2005
Effective knowledge management of the proliferating volume of documents within a knowledge repository is vital to knowledge sharing, reuse, and assimilation. In order to facilitate access to documents in a knowledge repository, use of a knowledge map to organize these documents represents a prevailing approach. Document clustering techniques typically are employed to produce knowledge maps. However, existing document clustering techniques are not tailored to individuals' preferences and therefore are unable to facilitate the generation of knowledge maps from various preferential perspectives. In response, we propose the Preference-Anchored Document Clustering (PAC) technique that takes a user's categorization preference (represented as a list of anchoring terms) into consideration to generate a knowledge map (or a set of document clusters) from this specific preferential perspective. Our empirical evaluation results show that our proposed technique outperforms the traditional content-based document clustering technique in the high cluster precision area. Furthermore, benchmarked with Oracle Categorizer, our proposed technique also achieves better clustering effectiveness in the high cluster precision area. Overall, our evaluation results demonstrate the feasibility and potential superiority of the proposed PAC technique.
|
147 |
Personalized and Context-aware Document Clustering
Yang, Chin-Sheng 15 July 2007
To manage the ever-increasing volume of documents, organizations and individuals typically organize documents into categories (or category hierarchies) to facilitate their document management and support subsequent document retrieval and access. Document clustering is an intentional act that should reflect individuals' preferences with regard to the semantic coherency or relevant categorization of documents and should conform to the context of a target task under investigation. Thus, effective document clustering techniques need to take into account a user's categorization context defined by or relevant to the target task under consideration. However, existing document clustering techniques generally anchor in pure content-based analysis and therefore are not able to facilitate personalized or context-aware document clustering. In response, we design, implement and empirically evaluate three document clustering techniques capable of facilitating personalized or contextual document clustering. First, we extend an existing document clustering technique (specifically, the partial-clustering-based personalized document-clustering (PEC) approach) and propose the Collaborative Filtering-based personalized document-Clustering (CFC) technique to overcome the problem of small-sized partial clustering encountered by the PEC technique. Particularly, the CFC technique expands the size of a user's partial clustering based on the partial clusterings of other users with similar categorization preferences. Second, to support contextual document clustering, we design and implement a Context-Aware document-Clustering (CAC) technique by taking into consideration a user's categorization preference (i.e., a set of anchoring terms) relevant to the context of a target task and a statistical-based thesaurus constructed from the World Wide Web (WWW) via a search engine.
Third, in response to the problem of a small-sized set of anchoring terms, which can greatly degrade the effectiveness of the CAC technique, we extend CAC and propose a Collaborative Filtering-based Context-Aware document Clustering (CF-CAC) technique. Our empirical evaluation results suggest that our proposed CFC, CAC, and CF-CAC techniques better support the need for personalized and contextual document clustering than do their benchmark techniques.
|
148 |
An Efficient Hilbert Curve-based Clustering Strategy for Large Spatial Databases
Lu, Yun-Tai 25 July 2003
Recently, millions of databases have been used and we need a new technique that can automatically transform the processed data into useful information and knowledge. Data mining is the technique of analyzing data to discover previously unknown information, and spatial data mining is the branch of data mining that deals with spatial data. In spatial data mining, clustering is one of the useful techniques for discovering interesting data in the underlying data objects. The clustering problem is as follows: given n data points in a d-dimensional metric space, partition the data points into k clusters such that the data points within a cluster are more similar to each other than to data points in different clusters. Cluster analysis has been widely applied in many areas such as medicine, social studies, bioinformatics, map regions and GIS. In recent years, many researchers have focused on finding efficient methods for the clustering problem. In general, we can classify these clustering algorithms into four approaches: partition, hierarchical, density-based, and grid-based approaches. The k-means algorithm, which is based on the partitioning approach, is probably the most widely applied clustering method. But a major drawback of the k-means algorithm is that it is difficult to determine the value of the parameter k that represents the "natural" clusters, and it is only suitable for convex spherical clusters. The k-means algorithm has high computational complexity and is unable to handle large databases. Therefore, in this thesis, we present an efficient clustering algorithm for large spatial databases. It combines the hierarchical approach with the grid-based approach. We apply the grid-based approach because it is efficient for large spatial databases. Moreover, we apply the hierarchical approach to find the genuine clusters by repeatedly combining these blocks. Basically, we make use of the Hilbert curve to provide a way to linearly order the points of a grid.
Note that the Hilbert curve is a kind of space-filling curve, where a space-filling curve is a continuous path which passes through every point in a space once to form a one-to-one correspondence between the coordinates of the points and the one-dimensional sequence numbers of the points on the curve. The goal of using a space-filling curve is to preserve locality: points which are close in 2-D space and represent similar data should be stored close together in the linear order. This kind of mapping can also minimize the disk access effort and provide high speed for clustering. This new algorithm requires only one input parameter and supports the user in determining an appropriate value for it. In our simulation, we have shown that our proposed clustering algorithm can have a shorter execution time than other algorithms for large databases. As the number of data points increases, the execution time of our algorithm increases slowly. Moreover, our algorithm can deal with clusters of arbitrary shape, which the k-means algorithm cannot discover.
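The linear ordering of grid cells described in this abstract can be sketched with the widely used iterative conversion from 2-D cell coordinates to a position along the Hilbert curve. This is an illustration only: it assumes a square grid whose side `n` is a power of two, the function name `hilbert_index` and the sample points are hypothetical, and the thesis may use a different curve orientation or construction.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid to its position on the Hilbert curve.

    Assumption: n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant of the current sub-square holds the cell?
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-square is in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Hypothetical grid cells, sorted into their one-dimensional Hilbert order.
cells = [(3, 0), (0, 0), (2, 2), (0, 3), (1, 1)]
ordered = sorted(cells, key=lambda c: hilbert_index(4, c[0], c[1]))
print(ordered)
```

Because consecutive positions on the curve are always neighboring cells, sorting by this index tends to keep spatially close cells close together in storage, which is the locality property the abstract relies on.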
|
149 |
Stability Selection of the Number of Clusters
Reizer, Gabriella v 18 April 2011
Selecting the number of clusters is one of the greatest challenges in clustering analysis. In this thesis, we propose a variety of stability selection criteria based on cross validation for determining the number of clusters. Clustering stability measures the agreement of clusterings obtained by applying the same clustering algorithm on multiple independent and identically distributed samples. We propose to measure the clustering stability by the correlation between two clustering functions. These criteria are motivated by the concept of clustering instability proposed by Wang (2010), which is based on a form of clustering distance. In addition, the effectiveness and robustness of the proposed methods are numerically demonstrated on a variety of simulated and real world samples.
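As a rough illustration of how agreement between two clusterings can be quantified, the sketch below computes a pairwise disagreement rate: the fraction of point pairs that one clustering puts in the same cluster and the other separates. This is one simple form of clustering distance, not the correlation-based criterion proposed in the thesis, and whether it matches the exact definition in Wang (2010) is an assumption. The function name `pairwise_disagreement` and the label vectors are hypothetical.

```python
from itertools import combinations


def pairwise_disagreement(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings disagree.

    A pair counts as a disagreement when exactly one of the clusterings
    places both points in the same cluster. The result is 0 when the two
    clusterings are identical up to a renaming of the cluster labels.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 0.0
    disagreements = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


# Hypothetical labelings of the same four points from two fitted clusterings.
print(pairwise_disagreement([0, 0, 1, 1], [1, 1, 0, 0]))
print(pairwise_disagreement([0, 0, 1, 1], [0, 1, 0, 1]))
```

Averaging such a quantity over repeated sample splits for each candidate number of clusters, and choosing the number with the lowest average, is the general shape of a stability-based selection rule.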
|
150 |
Hyperplane Clustering: A New Divisive Clustering Algorithm
Yogananda, A P 01 1900
No description available.
|