1 |
Clustering of Oceanographic Measurements in the Baltic Sea (Klustring för Oceanografiska Mätningar i Östersjön)
Derksen, Filip, Woxenius, Olof January 2023
The aim of this study was to investigate the effect of noise, the normalization of the Laplacian, the number of clusters k, and the number of neighbors knn in the nearest-neighbor graph on an implementation of spectral clustering. The resulting clustering was then used to evaluate how suitable spectral clustering is for an oceanographic application. The study was carried out on SMHI data from two weather stations at two different times: Vinga (July 2019) and Visby (July 1987). The data were preprocessed with MATLAB's zscore function and then passed to the spectral clustering algorithm. The quality of the clustering was assessed by examining the spectral density, computing the average variance between clusters, and inspecting the magnitude of the eigenvalues. The results showed that the noise could be neglected, that the unnormalized Laplacian was preferable, and that k = 12 and knn = 15 were an optimal parameter choice for Vinga 2019. In addition, certain oceanographic phenomena, such as tides and the Ekman effect, appeared to be reflected in the clustering. Finally, spectral clustering appears to be a suitable method for simpler oceanographic applications, although the choice of parameters must be tested for each application of the algorithm.
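The preprocessing and graph-construction steps named in this abstract can be sketched roughly as follows. This is a minimal illustration in Python rather than the authors' MATLAB code: the function names, the 0/1 edge weights, the "either neighbor" symmetrization, and the toy measurements are all assumptions, and the eigenvector and k-means steps that would complete spectral clustering are left out.

```python
import math

def zscore(column):
    """Standardize one feature to zero mean and unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / std for x in column]

def knn_adjacency(points, knn):
    """Symmetric 0/1 adjacency: i and j are linked if either one is among
    the other's knn nearest neighbors (an assumed symmetrization rule)."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:knn]:
            adj[i][j] = adj[j][i] = 1
    return adj

def unnormalized_laplacian(adj):
    """L = D - W, where D is the diagonal matrix of vertex degrees."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical toy measurements (temperature, salinity); not SMHI data.
temperature = [4.0, 4.2, 4.1, 15.0, 15.3, 14.8]
salinity = [7.1, 7.0, 7.2, 6.1, 6.0, 6.2]
points = list(zip(zscore(temperature), zscore(salinity)))

W = knn_adjacency(points, knn=2)
L = unnormalized_laplacian(W)
```

In a full pipeline, the eigenvectors belonging to the k smallest eigenvalues of L would be computed and their rows clustered, for example with k-means; the values k = 12 and knn = 15 reported above apply to the Vinga 2019 data only.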
|
2 |
Efficient Community Detection for Large Scale Networks via Sub-sampling
Bellam, Venkata Pavan Kumar 18 January 2018
Many real-world systems can be represented as network graphs. Some of these networks have an inherent community structure based on interactions. The problem of identifying this grouping structure in a given graph is termed the community detection problem, for which several algorithms exist. This thesis contributes specific improvements to community detection algorithms such as spectral clustering and the extreme points algorithm. One of the main contributions is a new sub-sampling method that makes the existing spectral clustering method scalable by reducing its computational complexity. We have also implemented the extreme points algorithm for the general case of detecting multiple communities, along with a sub-sampling-based version that reduces the computational complexity. Finally, we have developed a spectral clustering algorithm for graphs based on the popularity-adjusted block model (PABM) that makes the algorithm exact, thus improving its accuracy. / Master of Science / We live in an increasingly interconnected world, where agents constantly interact with each other. This general agent-interaction framework describes many important systems, such as social interpersonal systems, protein interaction systems, trade and financial systems, power grids, and the World Wide Web, to name a few. By denoting agents as nodes and their interconnections as links, any such system can be represented as a network. Such networks or graphs provide a powerful and universal representation for analyzing a wide variety of systems spanning a remarkable range of scientific disciplines. Networks act as conduits for many kinds of transmissions. For instance, they are influential in the dissemination of ideas, the adoption of technologies, job searches, and the spread of diseases. Thus networks play a critical role both in providing information and in helping make decisions, making them a crucial part of the Data and Decisions Destination Area.
A well-known feature of many networks is community structure. Nodes in a network are often found to belong to groups or communities that exhibit similar behavior. The identification of this community structure, called community detection, is an important problem with many critical applications. For example, communities in a protein interaction network often correspond to functional groups. This thesis focuses on cutting-edge methods for community detection in networks. The main approach is efficient community detection via sub-sampling, which is applied to two different methods. The first is optimization of a modularity function using a low-rank approximation for multiple communities. The second is spectral clustering, where we aim to formulate an algorithm for community detection by exploiting the eigenvectors of the network adjacency matrix.
|
3 |
Graph analysis combining numerical, statistical, and streaming techniques
Fairbanks, James Paul 27 May 2016
Graph analysis uses graph data collected on a physical, biological, or social phenomenon to shed light on the underlying dynamics and behavior of the agents in that system. Many fields contribute to this topic, including graph theory, algorithms, statistics, machine learning, and linear algebra. This dissertation advances a novel framework for dynamic graph analysis that combines numerical, statistical, and streaming algorithms to provide deep insight into evolving networks. For example, one can be interested in the changing influence structure over time. These disparate techniques each contribute a fragment to understanding the graph; however, their combination allows us to understand dynamic behavior and graph structure.
Spectral partitioning methods rely on eigenvectors for solving data analysis problems such as clustering. Eigenvectors of large sparse systems must be approximated with iterative methods. This dissertation analyzes how data analysis accuracy depends on the numerical accuracy of the eigensolver. This leads to new bounds on the residual tolerance necessary to guarantee correct partitioning. We present a novel stopping criterion for spectral partitioning guaranteed to satisfy the Cheeger inequality, along with an empirical study of the performance on real-world networks such as web, social, and e-commerce networks. This work bridges the gap between numerical analysis and computational data analysis.
|
4 |
Inquiry into the nature and causes of individual differences in economics
Brocklebank, Sean January 2012
The thesis contains four chapters on the structure and predictability of individual differences. Chapter 1. Re-analyses data from Holt and Laury's (2002) risk aversion experiments. Shows that big-stakes hypothetical payoffs are better than small-stakes real-money payoffs for predicting choices in big-stakes real-money gambles (in spite of the presence of hypothetical bias). Argues that hypothetical bias is a problem for calibration of mean preferences but not for prediction of the rank order of subjects' preferences. Chapter 2. Describes an experiment in which participants were given personality tests and played a series of dictator and response games over a two-week period. It was found that social preferences are one-dimensional, stable across a two-week interval, and significantly related to the Big Five personality traits. Suggestions are given about ways to modify existing theories of social preference to accommodate these findings. Chapter 3. Applies a novel statistical technique (spectral clustering) to a personality data set for the first time. Finds the HEXACO six-factor structure in an English-language five-factor questionnaire for the first time. Argues that the emphasis placed on weak relationships is critical to settling the dimensionality debate within personality theory, and that spectral clustering provides a more useful perspective on personality data than does traditional factor analysis. Chapter 4. Outlines the relevance of extraversion for economics, and sets up a model to argue that personality differences in extraversion may have evolved through something akin to a war of attrition. This model implies a positive relationship between extraversion and risk aversion, and a U-shaped relationship between extraversion and loss aversion.
|
5 |
Clustering Methods and Their Applications to Adolescent Healthcare Data
Mayer-Jochimsen, Morgan 01 April 2013
Clustering is a mathematical method of data analysis that identifies trends in data by efficiently separating the data into a specified number of clusters; it is therefore widely applicable to questions about the interrelatedness of data. Two methods of clustering are considered here. K-means clustering defines clusters in relation to the centroid, or center, of a cluster. Spectral clustering establishes connections between all of the data points to be clustered, then eliminates those connections that link dissimilar points. This is represented as an eigenvector problem whose solution is given by the eigenvectors of the normalized graph Laplacian. Spectral clustering establishes groups so that the similarity between points of the same cluster is stronger than the similarity between points of different clusters. K-means and spectral clustering are used to analyze adolescent data from the 2009 California Health Interview Survey. Differences were observed between the results of the clustering methods on 3294 individuals and 22 health-related attributes. K-means clustered the adolescents by exercise, poverty, and variables related to psychological health, while spectral clustering groups were informed by smoking, alcohol use, low exercise, psychological distress, low parental involvement, and poverty. We posit some guesses as to this difference, observe characteristics of the clustering methods, and comment on the viability of spectral clustering on healthcare data.
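The centroid-based idea behind K-means described in this abstract can be illustrated with a short sketch of Lloyd's algorithm. It is not the thesis's implementation: the function name, the fixed iteration count, the choice of initial centroids, and the two-dimensional toy points are assumptions made for illustration only.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: recompute each centroid; keep the old one if it is empty.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])
        centroids = updated
    return labels, centroids

# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

Spectral clustering typically runs this same procedure on the rows of a matrix of Laplacian eigenvectors instead of on the raw attributes, which is one reason the two methods can group the same individuals differently.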
|
6 |
Clustering Methods and Their Applications to Adolescent Healthcare Data
Mayer-Jochimsen, Morgan 01 January 2013
Clustering is a mathematical method of data analysis that identifies trends in data by efficiently separating the data into a specified number of clusters; it is therefore widely applicable to questions about the interrelatedness of data. Two methods of clustering are considered here. K-means clustering defines clusters in relation to the centroid, or center, of a cluster. Spectral clustering establishes connections between all of the data points to be clustered, then eliminates those connections that link dissimilar points. This is represented as an eigenvector problem whose solution is given by the eigenvectors of the normalized graph Laplacian. Spectral clustering establishes groups so that the similarity between points of the same cluster is stronger than the similarity between points of different clusters. K-means and spectral clustering are used to analyze adolescent data from the 2009 California Health Interview Survey. Differences were observed between the results of the clustering methods on 3294 individuals and 22 health-related attributes. K-means clustered the adolescents by exercise, poverty, and variables related to psychological health, while spectral clustering groups were informed by smoking, alcohol use, low exercise, psychological distress, low parental involvement, and poverty. We posit some guesses as to this difference, observe characteristics of the clustering methods, and comment on the viability of spectral clustering on healthcare data.
|
7 |
Quantifying the Structure of Misfolded Proteins Using Graph Theory
Witt, Walter G 01 May 2017
The structure of a protein molecule is highly correlated with its function. Some diseases, such as cystic fibrosis, result from a change in the structure of a protein that interferes with or inhibits its function. Often these changes in structure are caused by a misfolding of the protein molecule. To assist computational biologists, there is a database of proteins together with their misfolded versions, called decoys, that can be used to test the accuracy of protein structure prediction algorithms. In our work we use a nested graph model to quantify a selected set of proteins that have two single-misfold decoys. The graph-theoretic model used is a three-tiered nested graph. Measures based on the vertex weights are calculated, and we compare the quantification of the proteins with that of their decoys. Our method is able to separate the misfolded proteins from the correctly folded proteins.
|
8 |
A Clustering Method For The Problem Of Protein Subcellular Localization
Bezek, Perit 01 December 2006
In this study, the focus is on predicting the subcellular localization of a protein, since subcellular localization is helpful in understanding a protein's functions. The function of a protein may be estimated from its sequence. Motifs, or conserved subsequences, are strong indicators of function. In a given sample set of protein sequences known to perform the same function, a certain subsequence or group of subsequences should be common; that is, the occurrence (frequency) of common subsequences should be high.
Our idea is to find the common subsequences through clustering and use these common groups (implicit motifs) to classify proteins. To calculate the distance between two subsequences, the traditional string edit distance is modified so that only replacement is allowed and the cost of replacement is related to an amino acid substitution matrix. Based on the modified string edit distance, spectral clustering embeds the subsequences into a transformed space in which the clustering problem is expected to become easier to solve. For a given protein sequence, the distribution of its subsequences over the clusters is the feature vector, which is subsequently fed to a classifier. The most important aspect of this approach is the use of spectral clustering based on the modified string edit distance.
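A replacement-only edit distance of the kind described here reduces, for two subsequences of equal length, to a sum of per-position substitution costs. The sketch below illustrates that idea only: the cost table is a hypothetical toy stand-in for a real amino acid substitution matrix, and the function names and the default cost of 1.0 are assumptions, not details taken from the thesis.

```python
# Hypothetical toy costs for a few similar amino acid pairs. A real system
# would derive costs from an amino acid substitution matrix instead.
TOY_COST = {
    frozenset("LI"): 0.2,  # leucine <-> isoleucine
    frozenset("DE"): 0.3,  # aspartate <-> glutamate
    frozenset("KR"): 0.3,  # lysine <-> arginine
}
DEFAULT_COST = 1.0  # assumed cost for any pair not listed above

def substitution_cost(a, b):
    """Cost of replacing residue a with residue b (symmetric, zero if equal)."""
    if a == b:
        return 0.0
    return TOY_COST.get(frozenset((a, b)), DEFAULT_COST)

def replacement_only_distance(s, t):
    """Edit distance with replacement as the only allowed operation.
    Without insertions or deletions, only equal-length strings are comparable."""
    if len(s) != len(t):
        raise ValueError("subsequences must have the same length")
    return sum(substitution_cost(a, b) for a, b in zip(s, t))

d_similar = replacement_only_distance("LKD", "IKE")    # two cheap replacements
d_different = replacement_only_distance("LKD", "WWW")  # three default-cost replacements
```

Pairwise distances computed this way could then be turned into similarities, for example with a Gaussian kernel, to build the graph that spectral clustering operates on; that conversion is likewise an assumption rather than something the abstract specifies.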
|
9 |
Stability Selection of the Number of Clusters
Reizer, Gabriella v 18 April 2011
Selecting the number of clusters is one of the greatest challenges in clustering analysis. In this thesis, we propose a variety of stability selection criteria based on cross validation for determining the number of clusters. Clustering stability measures the agreement of clusterings obtained by applying the same clustering algorithm on multiple independent and identically distributed samples. We propose to measure the clustering stability by the correlation between two clustering functions. These criteria are motivated by the concept of clustering instability proposed by Wang (2010), which is based on a form of clustering distance. In addition, the effectiveness and robustness of the proposed methods are numerically demonstrated on a variety of simulated and real world samples.
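One simple way to score the agreement between two clusterings with a correlation is to correlate their pairwise co-membership indicators. The sketch below shows that idea under stated assumptions: it is not necessarily the criterion proposed in the thesis, and the function names and toy labelings are hypothetical.

```python
import math
from itertools import combinations

def comembership(labels):
    """For every pair of points, 1 if both share a cluster and 0 otherwise."""
    return [1 if labels[i] == labels[j] else 0
            for i, j in combinations(range(len(labels)), 2)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant indicator vector")
    return cov / math.sqrt(vx * vy)

def clustering_agreement(labels_a, labels_b):
    """Correlation between the co-membership indicators of two labelings.
    The score does not depend on how the clusters are numbered."""
    return pearson(comembership(labels_a), comembership(labels_b))

# Hypothetical labelings of the same four points.
same_up_to_renaming = clustering_agreement([0, 0, 1, 1], [1, 1, 0, 0])
conflicting = clustering_agreement([0, 0, 1, 1], [0, 1, 0, 1])
```

In a stability setting, the two labelings would come from the same algorithm fitted on two independent samples and evaluated on a common set of points; the number of clusters with the highest average agreement would then be preferred.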
|
10 |
The quantification and visualisation of human flourishing.
Henley, Lisa January 2015
Economic indicators such as GDP have been a main indicator of human progress since the first half of the last century. There is concern that continuing to measure our progress and/or wellbeing using measures that encourage consumption on a planet with limited resources may not be ideal.
Alternative measures of human progress have a top-down approach in which the creators decide what the measure will contain.
This work defines a 'bottom-up' methodology, with an example of measuring human progress, that doesn't require manual data reduction. The technique allows visual overlay of other 'factors' that users may feel are particularly important.
I designed and wrote a genetic algorithm, which, in conjunction with regression analysis, was used to select the 'most important' variables from a large range of variables loosely associated with the topic. This approach could be applied in many areas where there are a lot of data from which an analyst must choose.
Next I designed and wrote a genetic algorithm to explore the evolution of a spectral clustering solution over time. Additionally, I designed and wrote a genetic algorithm with a multi-faceted fitness function which I used to select the most appropriate clustering procedure from a range of hierarchical agglomerative methods. Evolving the algorithm over time was not successful in this instance, but the approach holds a lot of promise as an alternative to 'scoring' new data based on an original solution, and as a method for using alternate procedural options to those an analyst might normally select.
The final solution allowed an evolution of the number of clusters with a fixed clustering method and variable selection over time. Profiling with various external data sources gave consistent and interesting interpretations to the clusters.
|