Global ETD Search

1	Consistency of the Spectral Seriation Algorithm Natik, Amine 01 October 2019 (has links) Consider n arbitrary objects x_1, x_2, ..., x_n and a similarity matrix P = (p_{i,j})_{1≤i,j≤n}, where p_{i,j} measures the similarity between x_i and x_j. If the objects can be ordered along a linear chain so that the similarity decreases as the distance within this chain increases, then the goal of the seriation problem is to recover this ordering π given only the similarity matrix. When the data matrix P is completely accurate, the true relative order can be recovered by the spectral seriation algorithm [1]. In most applications the matrix P is noisy, but the basic spectral seriation algorithm remains very popular. In this thesis, we study this algorithm for a wide variety of statistical models, establishing both consistency and bounds on the convergence rates. More specifically, we consider a model matrix P satisfying certain assumptions and construct a noisy matrix P̂ whose (i, j) entry is a coin flip with probability p_{i,j}. We show that the output π̂ of the spectral seriation algorithm applied to the random matrix is very close to the true ordering π. seriation Robinson similarities
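The setting in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's own code: it assumes the common Fiedler-vector formulation of spectral seriation (sort the objects by the eigenvector of the second-smallest eigenvalue of the graph Laplacian), and the toy similarity model, the function names `spectral_seriation` and `noisy_copy`, and all parameter values are hypothetical.

```python
import numpy as np


def spectral_seriation(similarity):
    """Return an ordering of the objects from the Fiedler vector.

    Assumed formulation: sort by the eigenvector belonging to the
    second-smallest eigenvalue of the graph Laplacian L = D - S.
    """
    S = np.asarray(similarity, dtype=float)
    S = (S + S.T) / 2.0                      # enforce symmetry
    laplacian = np.diag(S.sum(axis=1)) - S   # unnormalized Laplacian
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return np.argsort(fiedler)


def noisy_copy(P, rng):
    """Symmetric 0/1 matrix whose (i, j) entry is a coin flip with probability P[i, j]."""
    flips = rng.random(P.shape) < P
    upper = np.triu(flips, k=1).astype(float)  # draw each pair once, skip the diagonal
    return upper + upper.T


# Hypothetical toy model: similarity decays with distance along the chain 0..n-1.
n = 30
idx = np.arange(n)
P = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 5.0)

rng = np.random.default_rng(0)
shuffle = rng.permutation(n)                 # hide the true order
P_shuffled = P[np.ix_(shuffle, shuffle)]

order = spectral_seriation(P_shuffled)
recovered = shuffle[order]                   # chain positions in the recovered order

P_hat = noisy_copy(P_shuffled, rng)          # noisy observation of the model matrix
noisy_order = spectral_seriation(P_hat)
```

Because the sign of an eigenvector is arbitrary, an ordering is recovered only up to reversal. How close `noisy_order` comes to the true ordering is the question the thesis addresses; the sketch makes no claim about it.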
2	Dissimilarity Plots. A Visual Exploration Tool for Partitional Clustering. Hahsler, Michael, Hornik, Kurt January 2009 (has links) (PDF) For hierarchical clustering, dendrograms provide a convenient and powerful visualization. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data, and/or they fail to represent structure between and within clusters simultaneously. In this paper we extend (dissimilarity) matrix shading with several reordering steps based on seriation. Both methods, matrix shading and seriation, have been well known for a long time. However, only recent algorithmic improvements allow seriation to be used for larger problems. Furthermore, seriation is used in a novel stepwise process (within each cluster and between clusters), which leads to a visualization technique that is independent of the dimensionality of the data. A big advantage is that it presents the structure between clusters and the micro-structure within clusters in one concise plot. This not only allows cluster quality to be judged but also makes misspecification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples. / Series: Research Report Series / Department of Statistics and Mathematics
3	Getting Things in Order: An Introduction to the R Package seriation Hahsler, Michael, Hornik, Kurt, Buchta, Christian 18 March 2008 (has links) (PDF) Seriation, i.e., finding a suitable linear order for a set of objects given data and a loss or merit function, is a basic problem in data analysis. Because of the problem's combinatorial nature, it is hard to solve for all but very small sets. Nevertheless, both exact solution methods and heuristics are available. In this paper we present the package seriation, which provides an infrastructure for seriation with R. The infrastructure comprises data structures to represent linear orders as permutation vectors, a wide array of seriation methods using a consistent interface, a method to calculate the value of various loss and merit functions, and several visualization techniques which build on seriation. To illustrate how easily the package can be applied in a variety of applications, a comprehensive collection of examples is presented.
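The two ideas named in the abstract above, a linear order stored as a permutation vector and a loss function evaluated on it, can be illustrated with a small Python sketch. It does not reproduce the R package's interface: the function names `is_permutation` and `path_length` are hypothetical, and the loss shown (the sum of dissimilarities between neighbours in the order) is just one simple criterion chosen for illustration.

```python
import numpy as np


def is_permutation(order, n):
    """Check that `order` lists each of the indices 0..n-1 exactly once."""
    return sorted(int(i) for i in order) == list(range(n))


def path_length(dissimilarity, order):
    """Illustrative loss: sum of dissimilarities between consecutive objects."""
    D = np.asarray(dissimilarity, dtype=float)
    order = np.asarray(order)
    return float(D[order[:-1], order[1:]].sum())


# Hypothetical data: four points on a line, dissimilarity = absolute distance.
x = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(x[:, None] - x[None, :])

good_order = [0, 1, 2, 3]   # follows the line from left to right
bad_order = [2, 0, 3, 1]    # jumps back and forth

good_loss = path_length(D, good_order)
bad_loss = path_length(D, bad_order)
```

A lower value means neighbouring objects in the order are more alike, so `good_order` scores better than `bad_order` under this criterion.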
4	Sparse and discriminative clustering for complex data : application to cytology / Classification non supervisée discriminante et parcimonieuse pour des données complexes : une application à la cytologie Brunet, Camille 01 December 2011 (has links) The main topics of this manuscript are sparsity and discrimination for modeling complex data. In the first part, we work in the Gaussian mixture model (GMM) context: we introduce a new family of probabilistic models which simultaneously clusters the data and finds a discriminative subspace chosen so that it best discriminates the groups. A family of 12 DLM models is introduced, based on three ideas: firstly, the actual data live in a latent subspace with an intrinsic dimension lower than the dimension of the observed space; secondly, a subspace of K-1 dimensions is theoretically sufficient to discriminate K groups; thirdly, the observed and latent spaces are linked by a linear transformation. An estimation procedure, named Fisher-EM, is proposed; it improves clustering performance most of the time, owing to the use of the discriminative subspace. Since each axis spanning the discriminative subspace is a linear combination of all the original variables, we propose three different methods based on penalized criteria in order to ease the interpretation of the results. In particular, these methods introduce sparsity directly into the loadings of the projection matrix, which also amounts to selecting discriminative variables for clustering. In the second part, we turn to the seriation context. We propose a dissimilarity measure based on common neighborhoods, which introduces sparsity into the data and copes with noisy data and overlapping groups. A forward stepwise seriation algorithm, called the PB-Clus algorithm, is introduced and yields a block-diagonal representation of the data. This tool reveals the intrinsic structure of the data even in the presence of noise, outliers, and overlapping or non-Gaussian groups. Both methods have been validated on a biological application concerning the detection of cancer cells. Sparse seriation method
5	A connectionist model of the development of children's seriation abilities / Mareschal, Denis January 1992 (has links) No description available. Seriation by children (Psychology) Cognition in children.
6	A connectionist model of the development of children's seriation abilities / Mareschal, Denis January 1992 (has links) This study presents a modular connectionist model of the development of seriation in children. The model makes use of the cascade-correlation generative algorithm. The algorithm builds its own network topology as is required to solve the task. This model develops in a stage-like manner and goes beyond the previous rule based models by successfully capturing both the variability in strategies used and the sensitivity to differences in size increments. The application of a systematic operational procedure to a subset of the elements in the series is identified as a source of empirical seriation. Finally, the model predicts that the degree of disorder of the array under construction is a significant factor in determining the observed seriating behavior. A follow-up study involving 4- to 7-year-old children finds that the degree of disorder is a significant factor in children's abilities to recognize a completed series. Seriation by children (Psychology) Cognition in children.
7	A Study of the Psychological Relationships Among the Piagetian Operations of Transitivity, Seriation and Classification Brennan, Wendy Margaret 08 1900 (has links) Five- to eight-year-old children were subjects in this study, which examined relationships among the Piagetian operations of class inclusion, transitivity and seriation. Behavioral and verbal measures were taken of the latter two, and the stimuli varied along size and weight dimensions. Transitivity and seriation were related on the verbal level only. Behavioral measures showed less relationship across operations and dimensions than verbal measures, and doubts were raised as to the adequacy of using behavioral measures alone in assessing operational understanding. Only one measure of class inclusion showed any relationship to transitivity and seriation. Independent tests revealed that while simultaneous combinativity may be a component ability of all these operations, successive combinativity definitely was not. / Thesis / Doctor of Philosophy (PhD)
8	Visualization and Unsupervised Pattern Recognition in Multidimensional Data Using a New Heuristic for Linear Data Ordering Aliyev, Denis Aliyevich 30 November 2016 (has links) No description available. Statistics seriation heat map tree-penalized TSP tpTSP visualization
9	Getting Things in Order: An Introduction to the R package seriation Hahsler, Michael, Hornik, Kurt, Buchta, Christian January 2007 (has links) (PDF) Seriation, i.e., finding a linear order for a set of objects given data and a loss or merit function, is a basic problem in data analysis. Because of the problem's combinatorial nature, it is hard to solve for all but very small sets. Nevertheless, both exact solution methods and heuristics are available. In this paper we present the package seriation, which provides the infrastructure for seriation with R. The infrastructure comprises data structures to represent linear orders as permutation vectors, a wide array of seriation methods using a consistent interface, a method to calculate the value of various loss and merit functions, and several visualization techniques which build on seriation. To illustrate how easily the package can be applied in a variety of applications, a comprehensive collection of examples is presented. / Series: Research Report Series / Department of Statistics and Mathematics RVK ST 600 ; MSC_05A05, CCS_G.2.1
10	The archaeology and ethnohistory of the Hasinai Caddo : material culture and the course of European contact Marceaux, Paul Shawn Joseph 01 June 2011 (has links) This dissertation compiles information related to Caddo archaeology and history and examines in detail the collections from various Historic Caddo sites and Spanish missions. The study uses materials from these sites, along with the archival records from early European expeditions and colonization efforts, to try to identify archaeological correlates of the groups that constituted the Hasinai Caddo. The objective is to determine if specific attributes of ceramic style and technology reflect the position and geographical extent of the principal tribes of the Hasinai Caddo as indicated by the historical records. To accomplish this I examined numerous collections from clusters of historic period sites in the Neches and Angelina River valleys of east Texas, including sites occupied by the Hasinai Caddo and two of the three Spanish missions discovered in east Texas. The study analyzes, organizes, and characterizes distinct ceramic assemblages and other artifacts in the collections. Another goal of this research is to better define the periods of use and chronological relationships of Historic Caddo sites. Ceramic frequency seriations of established types, supported by other evidence, demonstrate chronological orderings reflected in the collections. The cultural landscape of the Hasinai Caddo, broadly characterized, consisted of sedentary groups living in dispersed farmsteads as thriving agriculturalists, organized in a complex hierarchy of social and spiritual leaders. Sustained contact with Spanish missionaries brought trade materials and technology in tandem with social objectives and policies, many aimed at replacing Caddo cultural identity under the guise of religious conversion, relocation, and trade. 
While the number of Caddo groups identified in the ethnohistoric record decreased as time passed, it is clear from the archives that groups of the Hasinai endured and maintained distinct affiliations during the contact period. The ceramic analyses support the historic record on this point and demonstrate how assemblages are part of well-established and persistent ceramic traditions. At the same time, the study documents distinct archaeological signatures that may represent socio-cultural, political, and/or economic differences in the Hasinai Caddo. Evidence also demonstrates how the Hasinai Caddo were both willing participants in, and at the same time rejected, the Spanish mission system. / text Caddo archaeology Archeology East Texas Ceramic seriation Hasinai Hasinai Caddo archaeology Neches River
