Global ETD Search

361	Sparse Similarity and Network Navigability for Markov Clustering Enhancement Durán Cancino, Claudio Patricio 29 September 2021 Markov clustering (MCL) is an effective unsupervised pattern recognition algorithm for data clustering in high-dimensional feature space. It simulates stochastic flows on a network of sample similarities to detect the structural organization of clusters in the data. However, it has two main drawbacks: (1) its community detection performance in complex networks has fallen well short of state-of-the-art methods such as Infomap and Louvain, and (2) it has never been generalized to deal with data nonlinearity. In this work the two aspects, although closely related, are treated as separate issues and addressed as such. Regarding community detection, a field under the umbrella of network science, the crucial issue is to convert the unweighted network topology into a ‘smart enough’ pre-weighted connectivity that adequately steers the stochastic flow procedure behind Markov clustering. Here a conceptual innovation is introduced and discussed, focusing on how to leverage notions of network latent geometry to design similarity measures for pre-weighting the adjacency matrix used in Markov clustering community detection. The results demonstrate that the proposed strategy improves Markov clustering significantly, to the extent that it is often close to the performance of current state-of-the-art methods for community detection. These findings hold for both synthetic ‘realistic’ networks (with known ground-truth communities) and real networks (with community metadata), even when the real network connectivity is corrupted by noise artificially induced by missing or spurious links. Regarding the nonlinearity aspect, the development of algorithms for unsupervised pattern recognition by nonlinear clustering is a notable problem in data science. 
Minimum Curvilinearity (MC) is a principle that approximates nonlinear sample distances in the high-dimensional feature space by curvilinear distances, which are computed as transversal paths over the minimum spanning tree of the samples and then stored in a kernel. Here, a nonlinear MCL algorithm termed MC-MCL is proposed, which is the first nonlinear kernel extension of MCL and exploits Minimum Curvilinearity to enhance the performance of MCL on real and synthetic high-dimensional data with underlying nonlinear patterns. Furthermore, improvements in the design of the so-called MC-kernel, obtained by applying base modifications to better approximate the hidden geometry of the data, have been evaluated with positive outcomes. Different nonlinear MCL versions are then compared with baseline and state-of-the-art clustering methods, including DBSCAN, K-means, affinity propagation, density peaks, and deep clustering. As a result, the design of a suitable nonlinear kernel provides a valuable framework for estimating nonlinear distances when the kernel is applied in combination with MCL. Indeed, nonlinear MCL variants outperform classical MCL and even state-of-the-art clustering algorithms on several nonlinear datasets. This dissertation discusses these enhancements and a generalized understanding of how network geometry plays a fundamental role in designing algorithms based on network navigability. info:eu-repo/classification/ddc/004 ddc:004
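The Minimum Curvilinearity idea described in the record above (distances measured along the minimum spanning tree rather than straight through feature space) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names `minimum_spanning_tree` and `mc_distances` are hypothetical, Euclidean distance is assumed as the base metric, and both the kernel construction and the MCL step are omitted.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns adjacency lists {node: [(neighbour, edge_weight), ...]}.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    adj = {i: [] for i in range(n)}
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append((parent[u], best[u]))
            adj[parent[u]].append((u, best[u]))
        # Relax the attachment cost of the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return adj


def mc_distances(points):
    """Pairwise distances measured along the tree (sum of edge weights on the path)."""
    adj = minimum_spanning_tree(points)
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for source in range(n):
        # Depth-first walk of the tree, accumulating path length from the source.
        stack = [(source, -1, 0.0)]
        while stack:
            node, prev, d = stack.pop()
            dist[source][node] = d
            for nxt, w in adj[node]:
                if nxt != prev:
                    stack.append((nxt, node, d + w))
    return dist
```

For the three points (0, 0), (1, 0) and (1, 1), the tree path from the first point to the third has length 2, whereas the straight-line distance is about 1.41; this gap is what lets a tree-based distance follow a curved data manifold.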
362	Clustering Techniques for Mining and Analysis of Evolving Data Devagiri, Vishnu Manasa January 2021 The amount of data generated is on the rise due to increased demand in fields such as IoT and smart monitoring applications. Data generated through such systems have many distinct characteristics, such as continuous generation, evolution over time, a multi-source nature, and heterogeneity. In addition, the real-world data generated in these fields are largely unlabelled. Clustering is an unsupervised learning technique used to group, analyze and interpret unlabelled data. Conventional clustering algorithms are not suitable for data with these characteristics because of memory and computational constraints, their inability to handle concept drift, and the distributed location of the data. Novel clustering approaches capable of analyzing and interpreting evolving and/or multi-source streaming data are therefore needed. The thesis focuses on building evolutionary clustering algorithms for data that evolve over time. We first proposed an evolutionary clustering approach, entitled Split-Merge Clustering (Paper I), capable of continuously updating the generated clustering solution in the presence of new data. As the work progressed, new challenges were studied and addressed. Namely, the Split-Merge Clustering algorithm was enhanced in Paper II with new capabilities to deal with the challenges of multi-view data applications. Multi-view or multi-source data present the studied phenomenon or system from different perspectives (views), and can reveal interesting knowledge that is not visible when only one view is considered and analyzed. This motivated us to continue in this direction by designing two other novel multi-view data stream clustering algorithms. The algorithm proposed in Paper III improves the performance and interpretability of the algorithm proposed in Paper II. 
Paper IV introduces a minimum spanning tree based multi-view clustering algorithm capable of transferring knowledge between consecutive data chunks, and it is also enriched with a post-clustering pattern-labeling procedure. The proposed and studied evolutionary clustering algorithms are evaluated on various data sets. The obtained results have demonstrated the robustness of the algorithms for modeling, analyzing, and mining evolving data streams. They are able to adequately adapt single and multi-view clustering models by continuously integrating newly arriving data. Clustering analysis Concept drift Evolutionary clustering Machine learning Streaming data Computer Sciences Datavetenskap (datalogi)
363	Clustering dans les noyaux légers : une approche multi-méthodique / Clustering in light nuclear systems : a multi-methodic approach Dell'aquila, Daniele 15 January 2018 Clustering phenomena characterise several fields of natural sciences and sociology. They consist in the self-organisation of groups of objects into correlated sub-groups, introducing symmetries and, in some cases, a certain degree of order in the overall system. In nuclear physics, these aspects represent one of the most fascinating effects induced by the Pauli principle in nuclei. Their investigation is an extremely powerful tool for understanding the behaviour of nuclear forces in N-body interacting systems. In this thesis, I discuss the results of an experimental campaign that explores clustering in light nuclear systems through a multi-methodic approach, using different and complementary techniques. The work starts with the 10Be nucleus, predicted to have a molecular cluster structure of two alpha particles bound by the two extra valence neutrons. The experiment was performed with 10Be beams produced at the INFN-LNS laboratory with the FRIBs projectile fragmentation technique. By means of particle-particle correlation techniques, signals of a new state possibly belonging to the 10Be molecular rotational band were observed. Other nuclei along the carbon isotopic chain were also investigated to understand how clustering phenomena evolve with neutron excess. For 11C and 13C we used the 10B(p,alpha) and 9Be(alpha,alpha) nuclear reactions, respectively, at low energies. These measurements were made at the tandem accelerator in Naples. 
Measured differential cross sections and angular distributions, together with other data available in the literature, were reproduced by R-matrix calculations, which allowed us to refine the spectroscopy of these nuclei and to suggest the existence of cluster states, possibly members of molecular rotational bands. The 16C nucleus was investigated with the same setup used in the 10Be case, with a very intense secondary beam. I observed non-vanishing yields in both the two-body and three-body cluster disintegration channels for 16C, which represent extremely rare decays. Finally, the Hoyle state in 12C (7.654 MeV, 0+) was investigated in a high-precision experiment using the 14N(d,alpha) reaction at 10.5 MeV at INFN-LNS. The study provided an upper limit on the direct three-alpha decay of this state with unprecedented precision. This result, which improves on the existing state of the art by a factor of 5, provides an important constraint on theoretical structure models as well as on stellar nucleosynthesis calculations that aim to reveal the origin of elements in the universe. Clustering phenomena were also studied in the 19F and 20Ne nuclei with the 19F(p,alpha) reaction at very low energies at the AN-2000 accelerator of INFN-LNL. An R-matrix analysis of the integrated cross section was used to provide information on the structure of the 20Ne compound nucleus, with astrophysical implications for the CNO cycle in stars. I also used heavy-ion collisions at intermediate energies to explore clustering phenomena in dilute and hot nuclear matter. I developed a thermal model of particle-particle correlations with the aim of describing the population of decaying unbound states produced during the evolution of violent Ar+Ni collisions at 32-95 MeV per nucleon. 
The limitations of a purely thermal approach in such a dynamical system are discussed, with possible ideas to explain the mechanism that populates internal states in 8Be nuclei, accounting for the interplay of thermodynamics with final-state interaction effects. Such studies are relevant to modelling cluster formation in nuclear matter. Clustering Nuclei Nuclear Matter Nuclear Reactions
364	Assessment of photos in albums based on aesthetics and context / Évaluation de photos dans des albums basée sur l'esthétique et le contexte Kuzovkin, Dmitry 21 June 2019 Automatic photo assessment can significantly aid the process of photo selection within photo collections. However, existing computational methods approach this problem in an independent manner, by evaluating each image apart from the other images in a photo album. In this thesis, we explore the modeling of photo context via a clustering approach for photo collections and the possibility of applying such context information in photo assessment. To better understand user actions within photo albums, we conduct experimental user studies, in which we study how users cluster and select photos in photo collections. We estimate the level of agreement between users and investigate how the context, defined by similar photos in corresponding clusters, influences their decisions. After studying the nature of user decisions, we propose a computational approach to model user behavior. First, we introduce a hierarchical clustering method that groups similar photos according to a multi-level similarity structure based on visual descriptors. Then, the photo context information is extracted from the obtained cluster data and used to adapt a pre-computed independent photo score, using statistics-based data and a machine learning approach. In addition, as the majority of recent methods for photo assessment are based on convolutional neural networks, we explore and visualize the aesthetic characteristics learned by such methods. Image assessment Photo selection Clustering Photo collection organization
365	Topological Hierarchies and Decomposition: From Clustering to Persistence Brown, Kyle A. 27 May 2022 No description available. Computer Science topological data analysis hierarchical clustering exploratory data analysis topology clustering data science
366	Clustering High-dimensional Noisy Categorical and Mixed Data Zhiyi Tian (10925280) 27 July 2021 Clustering is an unsupervised learning technique widely used to group data into homogeneous clusters. For many real-world data sets containing categorical values, existing algorithms are often computationally costly in high dimensions, do not work well on noisy data with missing values, and rarely provide theoretical guarantees on clustering accuracy. In this thesis, we propose a general categorical data encoding method and a computationally efficient spectral-based algorithm to cluster high-dimensional noisy categorical (nominal or ordinal) data. Under a statistical model for data on m attributes from n subjects in r clusters with missing probability epsilon, we show that our algorithm exactly recovers the true clusters with high probability when mn(1-epsilon) >= C M r^2 log^3 M, with M = max(n, m) and a fixed constant C. Moreover, we show that mn(1-epsilon)^2 >= r delta/2 with 0 < delta < 1 is necessary for any algorithm to succeed with probability at least (1+delta)/2. In the case where m = n and r is fixed, for example, the sufficient condition matches the necessary condition up to a polylog(n) factor, showing that our proposed algorithm is nearly optimal. We also show that our algorithm outperforms several existing algorithms in both clustering accuracy and computational efficiency, both theoretically and numerically. In addition, we propose a spectral algorithm with standardization to cluster mixed data. This algorithm is computationally efficient, and its clustering accuracy has been evaluated numerically on both real-world and synthetic data. Statistics Clustering Categorical values Noisy and sparse data High dimensions Spectral algorithm Clustering accuracy
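To make the near-optimality remark in the record above concrete, the short derivation below specializes the two stated conditions to m = n with r fixed. It uses only the symbols given in the abstract (C, r, delta, epsilon, n) and is a reader's sketch of the comparison, not a derivation taken from the thesis.

```latex
% Sufficient condition with m = n, so that M = \max(n, m) = n:
n^2 (1-\epsilon) \ge C\, n\, r^2 \log^3 n
\quad\Longleftrightarrow\quad
n (1-\epsilon) \ge C\, r^2 \log^3 n .

% Necessary condition with m = n (taking the nonnegative square root):
n^2 (1-\epsilon)^2 \ge \frac{r\delta}{2}
\quad\Longleftrightarrow\quad
n (1-\epsilon) \ge \sqrt{\frac{r\delta}{2}} .
```

For fixed r, the necessary bound on n(1-epsilon) is a constant while the sufficient bound is a constant times log^3 n, so the two differ by a polylogarithmic factor in n.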
367	Apprentissage de structures dans les valeurs extrêmes en grande dimension / Discovering patterns in high-dimensional extremes Chiapino, Maël 28 June 2018 We present and study unsupervised learning methods for multivariate extreme phenomena in high dimension. When each marginal distribution of a random vector is heavy-tailed, its behavior in extreme regions (i.e. far from the origin) can no longer be studied with the usual methods, which assume finite means and variances. Multivariate extreme value theory provides a suitable framework for this study. In particular, it gives a theoretical basis for dimension reduction through the angular measure. The thesis is divided into two main parts:
- Reduce the dimension by finding a simplified dependence structure in extreme regions. This step aims to recover subgroups of features that are likely to exceed large thresholds simultaneously.
- Model the angular measure with a mixture distribution that follows a predefined dependence structure.
These steps make it possible to develop new clustering methods for extreme points in high dimension, through the construction of a similarity matrix for the extreme points. Extreme value theory Unsupervised learning Dimension reduction Clustering
368	Bio-inspired Solutions for Optimal Management in Wireless Sensor Networks / Intégration des Solutions Bio-inspirées pour une Gestion optimale dans les Réseaux de Capteur sans Fils Abba Ari, Ado adamou 12 July 2016 During the past few years, wireless sensor networks have attracted increasing interest in both the industrial and scientific communities because of their wide range of potential applications. However, sensor components are designed with extreme resource constraints, especially a limited power supply. It is therefore necessary to design low-power, scalable and energy-efficient protocols in order to extend the lifetime of such networks. Clustering is one of the most popular approaches for optimizing the energy consumption of sensor nodes, and it strongly influences the overall performance of the network. In addition, routing involves non-negligible operations that considerably affect the network lifetime and throughput. 
In this thesis, we addressed the clustering and routing problems with optimization methods from biologically inspired computing, which provides powerful models in which a global intelligence emerges from simple local behaviors. We proposed a distributed clustering approach based on the nest-site selection process of a honeybee swarm. We formulated the distributed clustering problem as a social decision-making process in which sensors act collectively to choose their cluster heads. To make this choice, we proposed a multi-objective cost-based fitness function. In the design of the proposed algorithm, we focused on balancing the load among the members of each cluster in order to extend network lifetime, making a tradeoff between energy consumption and the quality of the communication link among sensors. Then, we proposed a centralized cluster-based routing protocol for wireless sensor networks that uses the fast and efficient search features of the artificial bee colony algorithm. We formulated the clustering as a linear programming problem and solved the routing problem with a cost-based function. We designed a multi-objective fitness function that uses the weighted-sum approach to assign sensors to a cluster. The clustering algorithm builds clusters efficiently by making a tradeoff between energy consumption and the quality of the communication link within clusters, while the routing is carried out in a distributed manner. The proposed protocols were tested extensively on a number of topologies in various network scenarios, and the results were compared with well-known cluster-based routing protocols. The results demonstrated the effectiveness of the proposed protocols. WSNs Routing Clustering Bio-Inspired computing 006.3
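The weighted-sum fitness idea mentioned in the record above, trading off energy consumption against communication link quality, can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the names `fitness` and `pick_cluster_head`, the normalization of both inputs to [0, 1], the weight `w_energy`, and the sample values are hypothetical and are not taken from the thesis.

```python
def fitness(energy_cost, link_quality, w_energy=0.5):
    """Weighted sum of two objectives, both assumed normalized to [0, 1].

    Lower energy cost and higher link quality both raise the score.
    """
    return w_energy * (1.0 - energy_cost) + (1.0 - w_energy) * link_quality


def pick_cluster_head(candidates, w_energy=0.5):
    """Return the candidate with the highest weighted-sum fitness.

    `candidates` maps a node name to an (energy_cost, link_quality) pair.
    """
    return max(candidates, key=lambda name: fitness(*candidates[name], w_energy))


# Hypothetical nodes: (normalized energy cost, normalized link quality).
nodes = {"n1": (0.2, 0.7), "n2": (0.5, 0.9), "n3": (0.8, 0.95)}
balanced_choice = pick_cluster_head(nodes)                    # equal weights
link_first_choice = pick_cluster_head(nodes, w_energy=0.1)    # favour link quality
```

With equal weights the low-cost node `n1` scores highest, while shifting the weight towards link quality selects `n3`; a single scalar weight is the simplest way to express such a tradeoff, at the cost of having to tune it per deployment.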
369	Contributions to variable selection, clustering and statistical estimation in high dimension / Quelques contributions à la sélection de variables, au clustering et à l’estimation statistique en grande dimension Ndaoud, Mohamed 03 July 2019 This PhD thesis deals with the following statistical problems: variable selection in high-dimensional linear regression, clustering in the Gaussian mixture model, some effects of adaptivity under sparsity, and simulation of Gaussian processes. Under the sparsity assumption, variable selection corresponds to recovering the "small" set of significant variables. We study non-asymptotic properties of this problem in high-dimensional linear regression. Moreover, we characterize optimal necessary and sufficient conditions for variable selection in this model. We also study some effects of adaptation under sparsity. Namely, in the sparse vector model, we investigate the changes in the estimation rates of some of the model parameters when the noise level or its nominal law is unknown. Clustering is an unsupervised machine learning task that aims to group observations that are close to each other in some sense. We study the problem of community detection in the Gaussian mixture model with two components, and characterize precisely the sharp separation between clusters required to recover them exactly. We also provide a fast polynomial-time procedure achieving optimal recovery. Gaussian processes are extremely useful in practice, for instance when modelling price fluctuations. Nevertheless, their simulation is not easy in general. We propose and study a new rate-optimal series expansion to simulate a large class of Gaussian processes. Linear regression High dimension Clustering Phase transition 510
370	Vocation Clustering for Heavy-Duty Vehicles Kobold, Daniel, Jr. 12 1900 Indiana University-Purdue University Indianapolis (IUPUI) / The identification of the vocation of an unknown heavy-duty vehicle is valuable to parts manufacturers who may not otherwise have access to this information on a consistent basis. This study proposes a methodology for vocation identification that is based on clustering techniques. Two clustering algorithms are considered: K-Means and Expectation Maximization. These algorithms are first used to construct the operating profile of each vocation from a set of vehicles with known vocations. The vocation of an unknown vehicle is then determined using different assignment methods. These methods fall under two main categories: one-versus-all and one-versus-one. The one-versus-all approach compares an unknown vehicle to all potential vocations. The one-versus-one approach compares the unknown vehicle to two vocations at a time in a tournament fashion. Two types of tournaments are investigated: round-robin and bracket. The accuracy and efficiency of each of the methods are evaluated using the NREL FleetDNA dataset. The study revealed that some vocations may have unique operating profiles and are therefore easily distinguishable from others. Other vocations, however, can have confounding profiles. This indicates that different vocations may benefit from profiles with varying numbers of clusters. Determining the optimal number of clusters for each vocation can not only improve the assignment accuracy, but also enhance the computational efficiency of the application. The optimal number of clusters for each vocation is determined using both static and dynamic techniques. Static approaches refer to methods that are completed prior to training and may require multiple iterations. Dynamic techniques involve clusters being split or removed during training. 
The results show that the accuracy of dynamic techniques is comparable to that of static approaches while benefiting from a reduced computational time. Heavy-Duty Vehicles Vocation Clustering Classification Expectation-Maximization K-Means Clustering Vocation
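The one-versus-all and round-robin assignment schemes named in the record above can be sketched as follows. This is a hedged illustration, not the study's method: it assumes each vocation profile is a list of cluster centroids (for example from K-Means), and the scoring rule in `profile_distance`, the function names, the vocation labels, and the sample data are all hypothetical.

```python
import math
from itertools import combinations


def profile_distance(samples, centroids):
    """Mean distance from each operating sample to its nearest centroid (assumed score)."""
    return sum(min(math.dist(s, c) for c in centroids) for s in samples) / len(samples)


def one_versus_all(samples, profiles):
    """Compare the unknown vehicle to every vocation at once; the closest profile wins."""
    return min(profiles, key=lambda v: profile_distance(samples, profiles[v]))


def round_robin(samples, profiles):
    """Play every pair of vocations against each other; the most pairwise wins takes it."""
    wins = {v: 0 for v in profiles}
    for a, b in combinations(profiles, 2):
        da = profile_distance(samples, profiles[a])
        db = profile_distance(samples, profiles[b])
        wins[a if da <= db else b] += 1
    return max(wins, key=wins.get)


# Hypothetical profiles: vocation -> centroids in a 2-D feature space.
profiles = {
    "transit_bus": [(10.0, 5.0), (20.0, 8.0)],
    "refuse_truck": [(5.0, 30.0), (8.0, 40.0)],
    "line_haul": [(60.0, 2.0), (65.0, 3.0)],
}
unknown_vehicle = [(11.0, 6.0), (19.0, 7.0)]
ova_label = one_versus_all(unknown_vehicle, profiles)
rr_label = round_robin(unknown_vehicle, profiles)
```

Because this sketch uses one shared distance score, the two schemes necessarily agree; they can differ when each pairwise match uses a criterion specific to that pair of vocations, which is where a tournament becomes informative.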
