  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

Unsupervised Classification of Music Signals: Strategies Using Timbre and Rhythm

Bond, Zachary 06 February 2007
This thesis describes the ideal properties of an adaptable music classification system based on unsupervised machine learning, and argues that such a system should be based on the fundamental musical properties of timbre, rhythm, melody and harmony. The first two properties and the signal features associated with them are then explored in more depth. In the area of timbre, the relationship between musical style and commonly-extracted signal features within a broad range of piano music is explored, in an effort to identify features which are consistent among all piano music but different for other instruments. The effect of lossy compression on these same timbre features is also investigated. In the area of rhythm, a new tempo tracking tool is provided which produces a series of histograms containing beat and sub-beat information throughout the course of a musical recording. These histograms are then shown to be useful in the analysis of synthesized rhythms and real music. Additionally, a novel method based on the Expectation-Maximization algorithm is used to extract features for classification from the histograms. / Master of Science
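As a rough illustration of how Expectation-Maximization can summarize a histogram, the sketch below fits a one-dimensional Gaussian mixture to binned counts. It is not the thesis's method: the function name `em_histogram_gmm`, the choice of Gaussian components, and the idea of using the fitted weights, means, and variances as features are all assumptions made for this example.

```python
import math


def em_histogram_gmm(centers, counts, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture to histogram bins with weighted EM.

    centers    -- bin centers (hypothetical units, e.g. beat period)
    counts     -- non-negative weight of each bin
    init_means -- starting guess for each component mean
    Returns (weights, means, variances), one entry per component.
    """
    k = len(init_means)
    total = float(sum(counts))
    means = list(init_means)
    spread = (max(centers) - min(centers)) / (2.0 * k)
    variances = [max(spread * spread, min_var)] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each bin center.
        resp = []
        for x in centers:
            dens = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            s = sum(dens)
            resp.append([d / s if s > 0 else 1.0 / k for d in dens])

        # M-step: re-estimate parameters, weighting each bin by its count.
        for j in range(k):
            nj = sum(c * r[j] for c, r in zip(counts, resp))
            if nj <= 0:
                continue  # leave an empty component unchanged
            means[j] = sum(c * r[j] * x for x, c, r in zip(centers, counts, resp)) / nj
            var = sum(
                c * r[j] * (x - means[j]) ** 2 for x, c, r in zip(centers, counts, resp)
            ) / nj
            variances[j] = max(var, min_var)
            weights[j] = nj / total

    return weights, means, variances
```

The returned triples could serve as a compact feature vector for a histogram; whether that matches the features used in the thesis is not stated in the abstract.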
142

Extensions to the OCLUST Algorithm

Clark, Katharine M January 2024
OCLUST is a clustering algorithm that trims outliers in Gaussian mixture models. While mixtures of multivariate Gaussian distributions are a useful way to model heterogeneity in data, it is not always an appropriate assumption that the data arise from a finite mixture of Gaussian distributions. This thesis extends the OCLUST algorithm to three types of data which depart from the multivariate Gaussian distribution. The first extension, called funOCLUST, is developed for data which exist in functional form. Next, MVN-OCLUST applies outlier trimming to matrix-variate normal data. Finally, the skewOCLUST algorithm is formulated for skewed data by applying a transformation to normality. However, this final extension occurs after a brief detour in Chapter 5 to establish a foundation for the final chapter. / Thesis / Doctor of Philosophy (PhD)
143

Quantifying Changes in Social Polarization Over Time and Region

Edwards, David Linville 29 July 2024
Recent studies indicate that Americans have grown increasingly divided and polarized \cite{boxell2022cross}, \cite{hawdon2020social}. This research aims to describe and measure polarization trends across a historical archive of US-based, primarily regional, newspapers. The newspapers chosen are from various US markets to capture any regional differences in the discussion of issues and topics. Our modeling approach employs the Structural Topic Model (STM) to identify topics within a given corpus and measure the tonal differences of articles discussing the same topic. Specifically, we use the STM to infer potentially related articles and a sentiment analyzer called VADER to identify topics with a high level of semantic disparity. Using this method, we assess the polarization of developing and evolving topics, such as sports, politics, and entertainment, and compare how polarization between and within these topics has changed over time. Through this, we create topic-specific sentiment distributions, referred to as polarization distributions. We conclude by demonstrating the usefulness of these distributions in identifying polarization and showing how high polarization aligns with significant social events. / Doctor of Philosophy / Most Americans have a sense that their nation is becoming more socially polarized. Numerous studies and anecdotal evidence support this. Our aim with this work is to develop a method to quantify polarization in text media and apply this method to news articles published in local and national newspapers. Using a statistical model, we are able to group articles based on a common shared topic. We then analyze the sentiment of each article and evaluate how sentiments for a particular topic change over time. We then compare newspapers based on location, political endorsements, and ownership groups.
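The idea of a topic-specific sentiment distribution can be sketched as follows. The example assumes each article has already been assigned a topic (for instance by a topic model) and a sentiment score in the range [-1, 1], which is the range of VADER's compound score. The function name `polarization_distributions`, the bin count, and the use of the standard deviation as a summary of spread are illustrative assumptions, not the thesis's definitions.

```python
from collections import defaultdict
from statistics import pstdev


def polarization_distributions(scored_articles, n_bins=4):
    """Group sentiment scores by topic and summarize each group.

    scored_articles -- iterable of (topic, score) pairs, score in [-1, 1]
    Returns {topic: {"histogram": [...], "spread": float}}.
    """
    by_topic = defaultdict(list)
    for topic, score in scored_articles:
        by_topic[topic].append(score)

    result = {}
    for topic, scores in by_topic.items():
        hist = [0] * n_bins
        for s in scores:
            # Map [-1, 1] onto bins 0..n_bins-1; a score of 1.0 lands in the last bin.
            idx = min(int((s + 1.0) / 2.0 * n_bins), n_bins - 1)
            hist[idx] += 1
        # Population standard deviation as a simple (assumed) measure of disagreement.
        result[topic] = {"histogram": hist, "spread": pstdev(scores)}
    return result
```

Under this sketch, a topic whose articles cluster at both ends of the sentiment scale yields a wide, bimodal histogram and a large spread, while a topic with uniformly similar tone yields a narrow one.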
144

Clustering Articles in a Literature Digital Library Based on Content and Usage

Ting, Kang-Di 10 August 2004
Literature digital libraries are among the most important resources for preserving the assets of civilization. To provide more effective and efficient information search, many systems are equipped with a browsing interface that aims to ease the article searching task. A browsing interface is associated with a subject directory, which guides users to articles that meet their information needs. A subject directory contains a set (or a hierarchy) of subject categories, each containing a number of similar articles. How to group articles in a literature digital library is the theme of this thesis. Previous work used either document classification or document clustering approaches to dispatch articles into a set of article clusters based on their content. We observed that articles that meet a single user's information need may not necessarily fall in a single cluster. In this thesis, we propose to make use of both Web logs and article content in clustering articles. We propose two hybrid approaches, namely a document categorization based method and a document clustering based method. These alternatives were compared to other content-based methods. It was found that the document categorization based method effectively reduces the number of required click-throughs at the expense of a slight increase in entropy, which measures the content heterogeneity of each generated cluster.
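Entropy as a measure of content heterogeneity can be illustrated with a short sketch. It assumes each article in a cluster carries a known category label and uses Shannon entropy in bits, averaged over clusters in proportion to their size; the abstract does not give the exact formula used in the thesis, so the definitions and the function names below are assumptions.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (bits) of the category labels inside one cluster."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def weighted_entropy(clusters):
    """Size-weighted average entropy over a list of clusters (lists of labels)."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters if c)
```

A cluster whose articles all share one category scores 0, and the score grows as categories mix, so lower values indicate more homogeneous clusters.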
145

Preference-Anchored Document clustering Technique for Supporting Effective Knowledge and Document Management

Wang, Shin 03 August 2005
Effective knowledge management of the proliferating volume of documents within a knowledge repository is vital to knowledge sharing, reuse, and assimilation. To facilitate access to documents in a knowledge repository, using a knowledge map to organize these documents is a prevailing approach. Document clustering techniques are typically employed to produce knowledge maps. However, existing document clustering techniques are not tailored to individuals' preferences and therefore are unable to facilitate the generation of knowledge maps from various preferential perspectives. In response, we propose the Preference-Anchored Document Clustering (PAC) technique, which takes a user's categorization preference (represented as a list of anchoring terms) into consideration to generate a knowledge map (or a set of document clusters) from this specific preferential perspective. Our empirical evaluation results show that our proposed technique outperforms the traditional content-based document clustering technique in the high cluster precision area. Furthermore, benchmarked against Oracle Categorizer, our proposed technique also achieves better clustering effectiveness in the high cluster precision area. Overall, our evaluation results demonstrate the feasibility and potential superiority of the proposed PAC technique.
146

Personalized and Context-aware Document Clustering

Yang, Chin-Sheng 15 July 2007
To manage the ever-increasing volume of documents, organizations and individuals typically organize documents into categories (or category hierarchies) to facilitate their document management and support subsequent document retrieval and access. Document clustering is an intentional act that should reflect individuals' preferences with regard to the semantic coherency or relevant categorization of documents and should conform to the context of a target task under investigation. Thus, effective document clustering techniques need to take into account a user's categorization context defined by or relevant to the target task under consideration. However, existing document clustering techniques generally anchor in pure content-based analysis and therefore are not able to facilitate personalized or context-aware document clustering. In response, we design, implement and empirically evaluate three document clustering techniques capable of facilitating personalized or contextual document clustering. First, we extend an existing document clustering technique (specifically, the partial-clustering-based personalized document-clustering (PEC) approach) and propose the Collaborative Filtering-based personalized document-Clustering (CFC) technique to overcome the problem of small-sized partial clustering encountered by the PEC technique. Particularly, the CFC technique expands the size of a user's partial clustering based on the partial clusterings of other users with similar categorization preferences. Second, to support contextual document clustering, we design and implement a Context-Aware document-Clustering (CAC) technique by taking into consideration a user's categorization preference (i.e., a set of anchoring terms) relevant to the context of a target task and a statistical-based thesaurus constructed from the World Wide Web (WWW) via a search engine.
Third, in response to the problem of small-sized set of anchoring terms which can greatly degrade the effectiveness of the CAC technique, we extend CAC and propose a Collaborative Filtering-based Context-Aware document Clustering (CF-CAC) technique. Our empirical evaluation results suggest that our proposed CFC, CAC, and CF-CAC techniques better support the need of personalized and contextual document clustering than do their benchmark techniques.
147

An Efficient Hilbert Curve-based Clustering Strategy for Large Spatial Databases

Lu, Yun-Tai 25 July 2003
Millions of databases are now in use, and we need techniques that can automatically transform the stored data into useful information and knowledge. Data mining is the technique of analyzing data to discover previously unknown information, and spatial data mining is the branch of data mining that deals with spatial data. In spatial data mining, clustering is one of the useful techniques for discovering interesting data in the underlying data objects. The clustering problem is as follows: given n data points in a d-dimensional metric space, partition the data points into k clusters such that the data points within a cluster are more similar to each other than to data points in different clusters. Cluster analysis has been widely applied in many areas, such as medicine, social studies, bioinformatics, map regions, and GIS. In recent years, many researchers have focused on finding efficient methods for the clustering problem. In general, we can classify these clustering algorithms into four approaches: partitioning, hierarchical, density-based, and grid-based. The k-means algorithm, which is based on the partitioning approach, is probably the most widely applied clustering method. A major drawback of the k-means algorithm, however, is that it is difficult to determine the parameter k that represents the "natural" clusters, and the algorithm is only suitable for convex, spherical clusters. The k-means algorithm also has high computational complexity and is unable to handle large databases. Therefore, in this thesis, we present an efficient clustering algorithm for large spatial databases. It combines the hierarchical approach with the grid-based approach. We apply the grid-based approach because it is efficient for large spatial databases. Moreover, we apply the hierarchical approach to find the genuine clusters by repeatedly combining grid blocks. Basically, we make use of the Hilbert curve to provide a way to linearly order the points of a grid.
Note that the Hilbert curve is a kind of space-filling curve, where a space-filling curve is a continuous path which passes through every point in a space once to form a one-to-one correspondence between the coordinates of the points and the one-dimensional sequence numbers of the points on the curve. The goal of using a space-filling curve is to preserve locality: points that are close in 2-D space, and thus represent similar data, should be stored close together in the linear order. This kind of mapping can also minimize the disk access effort and provide high speed for clustering. The new algorithm requires only one input parameter and supports the user in determining an appropriate value for it. In our simulation, we have shown that our proposed clustering algorithm can have a shorter execution time than other algorithms for large databases. As the number of data points increases, the execution time of our algorithm increases slowly. Moreover, our algorithm can deal with clusters of arbitrary shape, which the k-means algorithm cannot discover.
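The linear ordering of grid cells along a Hilbert curve can be sketched with the standard bitwise conversion from cell coordinates to curve position. This shows only the ordering step, not the thesis's clustering algorithm; the function name `hilbert_index` and the `order` parameter (the grid is 2^order cells per side) are choices made for this example.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve.

    The grid has 2**order cells per side, and 0 <= x, y < 2**order.
    Returns an integer in the range 0 .. 4**order - 1.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes s*s cells; (3*rx) ^ ry is its visiting order.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def sort_by_hilbert(points, order):
    """Return integer grid points sorted by their position on the curve."""
    return sorted(points, key=lambda p: hilbert_index(order, p[0], p[1]))
```

Sorting cells by this index places consecutive curve positions in neighboring grid cells, which is the locality property the abstract relies on when storing nearby points close together.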
148

Stability Selection of the Number of Clusters

Reizer, Gabriella v 18 April 2011
Selecting the number of clusters is one of the greatest challenges in clustering analysis. In this thesis, we propose a variety of stability selection criteria based on cross validation for determining the number of clusters. Clustering stability measures the agreement of clusterings obtained by applying the same clustering algorithm on multiple independent and identically distributed samples. We propose to measure the clustering stability by the correlation between two clustering functions. These criteria are motivated by the concept of clustering instability proposed by Wang (2010), which is based on a form of clustering distance. In addition, the effectiveness and robustness of the proposed methods are numerically demonstrated on a variety of simulated and real-world samples.
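One way to read "correlation between two clustering functions" is sketched below: each clustering of the same points is turned into a pairwise co-membership indicator (1 if two points share a cluster, 0 otherwise), and the Pearson correlation of the two indicator vectors is taken as the agreement. This reading, and the function names `co_membership` and `clustering_correlation`, are assumptions for illustration; the thesis's exact criteria and its cross-validation scheme are not given in the abstract.

```python
import math
from itertools import combinations


def co_membership(labels):
    """1.0 for every pair of points placed in the same cluster, else 0.0."""
    return [1.0 if a == b else 0.0 for a, b in combinations(labels, 2)]


def clustering_correlation(labels_a, labels_b):
    """Pearson correlation between the co-membership indicators of two clusterings.

    Both label lists must describe the same points in the same order.
    Returns 0.0 when either clustering puts all pairs together or all apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same points")
    u, v = co_membership(labels_a), co_membership(labels_b)
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    var_u = sum((x - mu) ** 2 for x in u)
    var_v = sum((y - mv) ** 2 for y in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)
```

Because the measure depends only on which points are grouped together, it is unaffected by how cluster labels are numbered. A stability-based choice of the number of clusters would then favor the value whose clusterings agree most across resampled data.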
149

Hyperplane Clustering : A New Divisive Clustering Algorithm

Yogananda, A P 01 1900
No description available.
150

Clustering users based on the user’s photo library / Gruppering av användare baserat på användarens fotobibliotek

Bergholm, Marcus January 2018 (has links)
For any user-adaptive system, the most important task is to provide users with what they want and need without their asking for it explicitly. This process can be called personalisation and is done by tailoring the service or product for individual users or user groups. In this thesis, we explore the possibilities of building a model that clusters users based on the user’s photo library. The aim was to create a better personalised experience within a service called Degoo. The model used to perform the clustering is called Deep Embedding Clustering and was evaluated on several internal indices alongside an automated categorization model to get an indication of what type of images the clusters had. The user clustering was later evaluated based on split-tests running within the Degoo service. The results show that four out of five clusters had some general indication of types such as vacation photos, clothes, text, and people. The evaluation of the clustering impact on the split-tests shows that we could see patterns that indicated optimal attribute values for certain user clusters. / The ultimate goal of any user-adaptive system is to give users what they need without their asking for it explicitly. This process can be called personalisation and is done by tailoring the service or product for individual users or user groups. In this thesis, we explore the possibilities of building a model that groups users based on their photo data. The motivation behind this was to create a better personalised experience within a service called Degoo. The model used to perform the grouping is called Deep Embedding Clustering and was evaluated on several internal indices together with an automated categorization model to get an indication of what type of images the groups had. The user grouping was later evaluated based on several split-tests running within the Degoo service.
The results show that four out of five groups had a general indication of types such as vacation photos, clothes, text, and people. The evaluation of the grouping effect on the split-tests shows that we could see patterns indicating optimal attribute values for certain groups.
