  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Graph analysis combining numerical, statistical, and streaming techniques

Fairbanks, James Paul 27 May 2016 (has links)
Graph analysis uses graph data collected on physical, biological, or social phenomena to shed light on the underlying dynamics and behavior of the agents in those systems. Many fields contribute to this topic, including graph theory, algorithms, statistics, machine learning, and linear algebra. This dissertation advances a novel framework for dynamic graph analysis that combines numerical, statistical, and streaming algorithms to provide a deep understanding of evolving networks. For example, one might be interested in how the influence structure changes over time. These disparate techniques each contribute a fragment to understanding the graph; their combination, however, allows us to understand dynamic behavior and graph structure. Spectral partitioning methods rely on eigenvectors for solving data analysis problems such as clustering. Eigenvectors of large sparse systems must be approximated with iterative methods. This dissertation analyzes how data analysis accuracy depends on the numerical accuracy of the eigensolver, which leads to new bounds on the residual tolerance necessary to guarantee correct partitioning. We present a novel stopping criterion for spectral partitioning that is guaranteed to satisfy the Cheeger inequality, along with an empirical study of its performance on real-world networks such as web, social, and e-commerce networks. This work bridges the gap between numerical analysis and computational data analysis.
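The pipeline this abstract describes, approximating an eigenvector and then partitioning on it, can be illustrated with a minimal sweep-cut sketch. This is a generic demonstration of spectral partitioning and the Cheeger guarantee, not the dissertation's algorithm or benchmarks; the toy graph is an assumption:

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sweep_cut(A, v):
    """Scan prefixes of vertices ordered by v; return the minimum-conductance cut."""
    order = np.argsort(v)
    d = A.sum(axis=1)
    vol_total = d.sum()
    best_phi, best_k = np.inf, 0
    for k in range(1, len(v)):
        mask = np.zeros(len(v), dtype=bool)
        mask[order[:k]] = True
        cut = A[mask][:, ~mask].sum()                 # weight crossing the cut
        vol = min(d[mask].sum(), vol_total - d[mask].sum())
        phi = cut / vol
        if phi < best_phi:
            best_phi, best_k = phi, k
    return best_phi, set(order[:best_k].tolist())

# Two triangles joined by a single edge: the natural cut has conductance 1/7.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

lam, V = np.linalg.eigh(normalized_laplacian(A))
phi, S = sweep_cut(A, V[:, 1])          # sweep over the Fiedler vector
assert phi <= np.sqrt(2 * lam[1])       # Cheeger inequality: sweep cut is within sqrt(2*lambda_2)
```

An eigensolver stopping criterion of the kind the abstract mentions would replace the exact `eigh` call with an iterative solve terminated once the residual is small enough that this Cheeger-style guarantee still holds.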
2

Inquiry into the nature and causes of individual differences in economics

Brocklebank, Sean January 2012 (has links)
The thesis contains four chapters on the structure and predictability of individual differences. Chapter 1 re-analyses data from Holt and Laury's (2002) risk aversion experiments. It shows that big-stakes hypothetical payoffs are better than small-stakes real-money payoffs for predicting choices in big-stakes real-money gambles (in spite of the presence of hypothetical bias), and argues that hypothetical bias is a problem for the calibration of mean preferences but not for the prediction of the rank order of subjects' preferences. Chapter 2 describes an experiment in which participants were given personality tests and played a series of dictator and response games over a two-week period. It was found that social preferences are one-dimensional, stable across a two-week interval, and significantly related to the Big Five personality traits. Suggestions are given for ways to modify existing theories of social preference to accommodate these findings. Chapter 3 applies a novel statistical technique, spectral clustering, to a personality data set for the first time, and finds the HEXACO six-factor structure in an English-language five-factor questionnaire for the first time. It argues that the emphasis placed on weak relationships is critical to settling the dimensionality debate within personality theory, and that spectral clustering provides a more useful perspective on personality data than does traditional factor analysis. Chapter 4 outlines the relevance of extraversion for economics and sets up a model to argue that personality differences in extraversion may have evolved through something akin to a war of attrition. This model implies a positive relationship between extraversion and risk aversion, and a U-shaped relationship between extraversion and loss aversion.
3

Clustering Methods and Their Applications to Adolescent Healthcare Data

Mayer-Jochimsen, Morgan 01 April 2013 (has links)
Clustering is a mathematical method of data analysis that identifies trends by separating data into a specified number of clusters, which makes it widely applicable to questions about the interrelatedness of data. Two methods of clustering are considered here. K-means clustering defines clusters in relation to the centroid, or center, of a cluster. Spectral clustering establishes connections between all of the data points to be clustered, then eliminates the connections that link dissimilar points; this is represented as an eigenvector problem whose solution is given by the eigenvectors of the normalized graph Laplacian. Spectral clustering establishes groups so that the similarity between points of the same cluster is stronger than the similarity between points of different clusters. K-means and spectral clustering are used to analyze adolescent data from the 2009 California Health Interview Survey. Differences were observed between the results of the two clustering methods on 3294 individuals and 22 health-related attributes. K-means clustered the adolescents by exercise, poverty, and variables related to psychological health, while the spectral clustering groups were informed by smoking, alcohol use, low exercise, psychological distress, low parental involvement, and poverty. We offer some conjectures about this difference, observe characteristics of the clustering methods, and comment on the viability of spectral clustering for healthcare data.
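The contrast this abstract draws, centroid-based k-means versus Laplacian-based spectral clustering, shows up clearly on data with non-convex clusters. A self-contained numpy sketch (synthetic concentric rings, not the CHIS survey data; the RBF kernel width is an assumption):

```python
import numpy as np

rng = np.random.default_rng(7)

def ring(radius, n):
    """Noisy points on a circle of the given radius."""
    theta = rng.uniform(0, 2 * np.pi, n)
    pts = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
    return pts + rng.normal(0, 0.05, (n, 2))

X = np.vstack([ring(1.0, 100), ring(4.0, 100)])
truth = np.repeat([0, 1], 100)

def two_means(Y, iters=100):
    """Plain 2-means, initialised with Y[0] and the point farthest from it."""
    C = Y[[0, int(np.argmax(((Y - Y[0]) ** 2).sum(1)))]].copy()
    for _ in range(iters):
        lab = np.argmin(((Y[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for k in range(2):
            if (lab == k).any():
                C[k] = Y[lab == k].mean(0)
    return lab

def spectral_two_way(X, sigma=0.5):
    """RBF affinity -> normalized graph Laplacian -> 2-D embedding -> 2-means."""
    D2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    W = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_is = 1.0 / np.sqrt(W.sum(1))
    L = np.eye(len(X)) - W * d_is[:, None] * d_is[None, :]
    _, V = np.linalg.eigh(L)
    Y = V[:, :2]                                    # two smallest eigenvectors
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # row-normalize the embedding
    return two_means(Y)

def accuracy(pred, truth):
    a = (pred == truth).mean()
    return max(a, 1 - a)  # cluster labels are arbitrary up to a swap

acc_raw = accuracy(two_means(X), truth)          # k-means cuts across both rings
acc_spec = accuracy(spectral_two_way(X), truth)  # spectral recovers the rings
```

On survey data the same asymmetry applies: k-means groups respondents around attribute centroids, while spectral clustering groups them by graph connectivity, which is one plausible reading of why the two methods emphasised different health variables.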
5

Quantifying the Structure of Misfolded Proteins Using Graph Theory

Witt, Walter G 01 May 2017 (has links)
The structure of a protein molecule is highly correlated with its function. Some diseases, such as cystic fibrosis, are the result of a change in the structure of a protein that interferes with or inhibits its function. Often these changes in structure are caused by a misfolding of the protein molecule. To assist computational biologists, there is a database of proteins together with their misfolded versions, called decoys, that can be used to test the accuracy of protein structure prediction algorithms. In our work we use a nested graph model to quantify a selected set of proteins that each have two single-misfold decoys. The graph-theoretic model used is a three-tiered nested graph. Measures based on the vertex weights are calculated, and we compare the quantification of the proteins with that of their decoys. Our method is able to separate the misfolded proteins from the correctly folded proteins.
6

A Clustering Method For The Problem Of Protein Subcellular Localization

Bezek, Perit 01 December 2006 (has links) (PDF)
In this study, the focus is on predicting the subcellular localization of a protein, since subcellular localization is helpful in understanding a protein's functions. The function of a protein may be estimated from its sequence; motifs, or conserved subsequences, are strong indicators of function. In a given sample set of protein sequences known to perform the same function, a certain subsequence or group of subsequences should be common; that is, the occurrence (frequency) of common subsequences should be high. Our idea is to find the common subsequences through clustering and use these common groups (implicit motifs) to classify proteins. To calculate the distance between two subsequences, the traditional string edit distance is modified so that only replacement is allowed, and the cost of replacement is related to an amino acid substitution matrix. Based on the modified string edit distance, spectral clustering embeds the subsequences into a transformed space in which the clustering problem is expected to become easier to solve. For a given protein sequence, the distribution of its subsequences over the clusters is the feature vector, which is subsequently fed to a classifier. The most important aspect of this approach is the use of spectral clustering based on the modified string edit distance.
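Because only replacement is allowed, the modified distance compares equal-length subsequences position by position, with each mismatch priced by a substitution matrix. A minimal sketch; the chemical groupings and costs below are a hypothetical stand-in for a real amino acid substitution matrix such as BLOSUM62:

```python
# Hypothetical amino acid groupings; a real system would derive costs from BLOSUM/PAM.
GROUPS = {
    "hydrophobic": set("AVLIMFW"),
    "polar": set("STNQYC"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "special": set("GP"),
}

def sub_cost(a, b):
    """0 for identity, 0.5 within a chemical group, 1.0 across groups (toy costs)."""
    if a == b:
        return 0.0
    for members in GROUPS.values():
        if a in members and b in members:
            return 0.5
    return 1.0

def replacement_distance(s, t):
    """Replacement-only edit distance: defined only for equal-length subsequences."""
    if len(s) != len(t):
        raise ValueError("replacement-only distance needs equal-length strings")
    return sum(sub_cost(a, b) for a, b in zip(s, t))
```

A pairwise matrix of these distances over the extracted subsequences would then be converted to an affinity (e.g., `exp(-d)`) to serve as input to the spectral embedding the abstract describes.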
7

Stability Selection of the Number of Clusters

Reizer, Gabriella v 18 April 2011 (has links)
Selecting the number of clusters is one of the greatest challenges in clustering analysis. In this thesis, we propose a variety of stability selection criteria based on cross validation for determining the number of clusters. Clustering stability measures the agreement of clusterings obtained by applying the same clustering algorithm on multiple independent and identically distributed samples. We propose to measure the clustering stability by the correlation between two clustering functions. These criteria are motivated by the concept of clustering instability proposed by Wang (2010), which is based on a form of clustering distance. In addition, the effectiveness and robustness of the proposed methods are numerically demonstrated on a variety of simulated and real world samples.
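The core idea, cluster two independent samples the same way and correlate the resulting cluster assignments, can be sketched as follows. This is an illustration of correlation-based stability for choosing k, not the thesis's exact criteria; the data, the k-means initialisation, and the evaluation grid are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample():
    """Three well-separated 1-D Gaussian clusters (the true k is 3)."""
    return np.concatenate([rng.normal(c, 0.4, 60) for c in (0.0, 5.0, 10.0)])

def kmeans(x, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(k - 1):
        dists = np.min(np.abs(x[:, None] - np.array(centers)[None]), axis=1)
        centers.append(x[np.argmax(dists)])
    C = np.array(centers, dtype=float)
    for _ in range(iters):
        lab = np.argmin(np.abs(x[:, None] - C[None]), axis=1)
        for j in range(k):
            if (lab == j).any():
                C[j] = x[lab == j].mean()
    return C

def stability(k, grid):
    """Correlation between co-membership patterns from clusterings of two samples."""
    co = []
    for _ in range(2):
        C = kmeans(sample(), k)
        lab = np.argmin(np.abs(grid[:, None] - C[None]), axis=1)
        # co-membership is invariant to label permutation, so no matching step needed
        co.append((lab[:, None] == lab[None, :]).astype(float).ravel())
    return np.corrcoef(co[0], co[1])[0, 1]

# evaluate near the cluster cores, away from decision boundaries
grid = np.concatenate([np.linspace(c - 1.0, c + 1.0, 25) for c in (0.0, 5.0, 10.0)])
scores = {k: stability(k, grid) for k in (2, 3, 4, 5)}
```

With the true k, both clusterings induce the same co-membership pattern and the correlation is essentially 1; an overfitted k splits a cluster at a sample-dependent point, so the two patterns disagree and the score drops.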
8

The quantification and visualisation of human flourishing.

Henley, Lisa January 2015 (has links)
Economic indicators such as GDP have been the main measures of human progress since the first half of the last century. There is concern that continuing to measure our progress and/or wellbeing using measures that encourage consumption on a planet with limited resources may not be ideal. Alternative measures of human progress tend to take a top-down approach, in which their creators decide what the measure will contain. This work defines a 'bottom-up' methodology, an example of measuring human progress that doesn't require manual data reduction. The technique allows visual overlay of other 'factors' that users may feel are particularly important. I designed and wrote a genetic algorithm which, in conjunction with regression analysis, was used to select the 'most important' variables from a large range of variables loosely associated with the topic. This approach could be applied in many areas where there are a lot of data from which an analyst must choose. Next, I designed and wrote a genetic algorithm to explore the evolution of a spectral clustering solution over time. Additionally, I designed and wrote a genetic algorithm with a multi-faceted fitness function, which I used to select the most appropriate clustering procedure from a range of hierarchical agglomerative methods. Evolving the algorithm over time was not successful in this instance, but the approach holds a lot of promise as an alternative to 'scoring' new data based on an original solution, and as a method for using alternative procedural options to those an analyst might normally select. The final solution allowed the number of clusters to evolve over time with a fixed clustering method and variable selection. Profiling with various external data sources gave consistent and interesting interpretations of the clusters.
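The variable-selection step can be sketched as a genetic algorithm over binary inclusion masks, scored by the adjusted R² of an ordinary least-squares fit. This is a generic illustration of GA-plus-regression selection, not the thesis's implementation; the synthetic data and GA settings are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: 10 candidate variables, only 3 of which drive the response.
n, p, informative = 200, 10, (1, 4, 7)
X = rng.normal(size=(n, p))
y = X[:, list(informative)] @ np.array([3.0, -2.0, 4.0]) + rng.normal(0, 0.5, n)

def fitness(mask):
    """Adjusted R^2 of an OLS fit on the selected columns (penalises extras)."""
    k = int(mask.sum())
    if k == 0:
        return -np.inf
    Xs = np.c_[np.ones(n), X[:, mask.astype(bool)]]
    resid = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def evolve(pop_size=40, generations=30, mut_rate=0.1):
    pop = rng.integers(0, 2, (pop_size, p))
    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
        children = []
        for _ in range(pop_size - len(elite)):
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, p)                 # one-point crossover
            child = np.r_[a[:cut], b[cut:]]
            child[rng.random(p) < mut_rate] ^= 1     # bit-flip mutation
            children.append(child)
        pop = np.vstack([elite, children])
    scores = np.array([fitness(m) for m in pop])
    return pop[int(np.argmax(scores))]

best = evolve()
selected = tuple(np.flatnonzero(best))
```

The adjusted-R² fitness trades fit quality against mask size, so the GA is pushed toward small masks that still explain the response, which is the role regression analysis plays alongside the GA in the abstract.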
9

Text Clustering with String Kernels in R

Karatzoglou, Alexandros, Feinerer, Ingo January 2006 (has links) (PDF)
We present a package which provides a general framework, including tools and algorithms, for text mining in R using the S4 class system. Using this package and the kernlab R package we explore the use of kernel methods for clustering (e.g., kernel k-means and spectral clustering) on a set of text documents, using string kernels. We compare these methods to a more traditional clustering technique, k-means on a bag-of-words representation of the text, and evaluate the viability of kernel-based methods as a text clustering technique. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
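The same pipeline can be sketched outside R: a p-spectrum string kernel (documents are similar when they share many length-p substrings) followed by kernel k-means on the Gram matrix. A numpy illustration on made-up toy sentences; the report itself works with R's kernlab:

```python
import numpy as np
from collections import Counter

def spectrum_kernel(s, t, p=3):
    """p-spectrum string kernel: inner product of length-p substring counts."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(c * ct[u] for u, c in cs.items())

docs = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a black cat on the mat",
    "stock market prices fell sharply",
    "investors traded stock shares",
    "market shares fell as stock prices dropped",
]
n = len(docs)
K = np.array([[spectrum_kernel(a, b) for b in docs] for a in docs], dtype=float)
K /= np.sqrt(np.outer(np.diag(K), np.diag(K)))   # cosine-normalise the Gram matrix

def kernel_two_means(K, iters=20):
    """2-means in feature space, using only the Gram matrix K."""
    i, j = np.unravel_index(np.argmin(K), K.shape)  # seed with least similar pair
    lab = np.where(K[:, i] >= K[:, j], 0, 1)
    for _ in range(iters):
        d = np.empty((len(K), 2))
        for c in (0, 1):
            idx = np.flatnonzero(lab == c)
            if len(idx) == 0:
                d[:, c] = np.inf
                continue
            # ||phi(x) - mu_c||^2 expanded in kernel terms
            d[:, c] = np.diag(K) - 2 * K[:, idx].mean(1) + K[np.ix_(idx, idx)].mean()
        new = d.argmin(1)
        if (new == lab).all():
            break
        lab = new
    return lab

labels = kernel_two_means(K)
```

No explicit bag-of-words vectors are ever built: both the similarity computation and the k-means distance updates run entirely through the kernel matrix, which is what makes string kernels practical for clustering.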
10

Improving Search Results with Automated Summarization and Sentence Clustering

Cotter, Steven 23 March 2012 (has links)
Have you ever searched for something on the web and been overloaded with irrelevant results? Many search engines cast a very wide net and rely on ranking to show you the relevant results first, but this doesn't always work. Perhaps the occurrence of irrelevant results could be reduced if we could eliminate the unimportant content from each webpage while indexing: instead of casting a wide net, maybe we can make the net smarter. Here, I investigate the feasibility of using automated document summarization and clustering to do just that. The results indicate that such methods can make search engines more precise, more efficient, and faster, but not without costs. / McAnulty College and Graduate School of Liberal Arts / Computational Mathematics / MS / Thesis
