Global ETD Search

251	Galaxy Cluster Detection using Nonparametric Maximum Likelihood Estimation of Features in Voronoi Tessellations Pizarro Pizarro, Daniel Iván January 2007 No description available. Computing Clustering Clusters Galaxies Voronoi Likelihood
252	La visualisation d’information à l’ère du Big Data : résoudre les problèmes de scalabilité par l’abstraction multi-échelle / Information Visualization in the Big Data era: tackling scalability issues using multiscale abstractions Perrot, Alexandre 27 November 2017 With the advent of the Big Data era come new challenges for Information Visualization. First, the amount of data to be visualized exceeds the available screen space, which causes occlusion. Second, the data cannot be stored and processed on a conventional computer. To address both problems, a Big Data visualization system must provide perceptual and performance scalability. In this thesis, we propose multi-scale abstraction of the data as a solution to both issues.
Several levels of detail are precomputed on a Big Data infrastructure in order to visualize large datasets of up to several billion points. To that end, we propose two approaches to implementing the canopy clustering algorithm on a distributed computing platform. We present applications of our method to geolocated data visualized as a heatmap and to large graphs. Both applications use the dynamic visualization library Fatum, which is also presented in this thesis. Big Data Clustering Visualization
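The abstract above names canopy clustering but does not describe its steps. As a minimal single-machine sketch (not the distributed implementations proposed in the thesis), the textbook form of the algorithm uses a loose threshold `t1` and a tight threshold `t2`; the function name and the thresholds chosen in the usage note are illustrative assumptions.

```python
import math


def canopy_clustering(points, t1, t2):
    """Group points into possibly overlapping canopies.

    t1 is the loose threshold (canopy membership) and t2 the tight
    threshold (removal from the candidate list); t1 must exceed t2.
    This is a sequential illustration, not a distributed version.
    """
    if t1 <= t2:
        raise ValueError("t1 (loose) must be greater than t2 (tight)")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next candidate as the center of a new canopy.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)          # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(p)  # not tightly close: may seed or join others
        remaining = still_remaining
        canopies.append((center, members))
    return canopies
```

For example, `canopy_clustering([(0, 0), (1, 0), (5, 0), (6, 0), (3, 0)], t1=3, t2=1.5)` yields three canopies, and the point `(3, 0)` belongs to two of them, which is the overlap that makes canopies a cheap pre-grouping step.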
253	Facing the real challenges in wireless sensor network-based applications: an adaptative cross-layer self-organization WSN protocol / Se confronter aux exigences des applications à base de réseaux de capteurs en environnement réel : une approche cross-layer adaptative et auto-configurante Guzzo, Natale 15 December 2015 The wireless sensor network (WSN) is one of the key technologies contributing to the evolution and development of the Internet of Things (IoT). Use cases can be found in many fields of modern technology, including the container shipping industry, where containerized cargo accounts for about 60 percent of all world seaborne trade.
In this context, TRAXENS developed a battery-powered device named TRAX-BOX, designed to be attached to freight containers in order to track and monitor the shipped goods along the whole supply chain. In this thesis, we present a new energy-efficient, self-organizing WSN protocol stack named TRAX-NET, designed to allow the TRAX-BOX devices to cooperate in delivering the sensed data to the TRAXENS platform. The results of simulations and field tests show that TRAX-NET performs well in the different scenarios in which it is supposed to operate and fulfils the requirements of the target application better than the existing schemes. TRAX-NET is a complete solution suited to tracking freight containers worldwide. Container tracking Protocol stack Clustering 004.65
254	Learning and identification of fuzzy systems Lee, Shin-Jye January 2011 This thesis concentrates on the learning and identification of fuzzy systems, specifically on learning fuzzy systems from data for regression and function approximation by constructing complete, compact, and consistent fuzzy systems. Fuzzy systems are widely used to solve pattern recognition and function approximation problems because of their good knowledge representation. As fuzzy systems have developed, many sophisticated methods based on them have tried to solve these problems by constructing a great diversity of mathematical models. In general fuzzy systems, however, the degree of interpretability conflicts with the accuracy of the approximation, so finding the best compromise between the two across the entire system is a significant subject of study. The first work of this research concerns the clustering technique used to construct fuzzy models in fuzzy system identification, and it forms part of the clustering-based learning of fuzzy systems. Because determining the proper number and the appropriate location of clusters is one of the primary considerations in constructing an effective fuzzy model, the clustering technique aims to recognize the proper number of clusters and their appropriate locations as far as possible, which prepares the ground for the construction of fuzzy models. To obtain mutually exclusive performance from effectively constructed fuzzy models, a modular method for fuzzy system identification based on a hybrid clustering technique has been considered. Accordingly, this work introduces a hybrid clustering algorithm concerning input, output, generalization, and specialization.
Thus, the primary advantage of this work is that the proposed clustering technique integrates a variety of clustering properties to identify the proper number and the appropriate location of clusters by accurately recognizing the position of each dataset, which makes fuzzy systems more complete. The second work of this research extends the first and considers two improvements: a pruning strategy for simplifying the structure of fuzzy systems and an optimization scheme for tuning parameters. The pruning strategy aims to refine the rule base through similarity analysis of fuzzy sets, fuzzy numbers, fuzzy membership functions, or fuzzy rules. Through this similarity analysis, the complete rules can be kept and the redundant rules can be reduced in the rule base of fuzzy systems. The optimization scheme can be regarded as a two-layer parameter optimization, because the parameters of the initial fuzzy model are fine-tuned in two successive phases. Hence, the extended work focuses primarily on enhancing the performance of the initial fuzzy models to improve the reliability of the final fuzzy models.
Thus, the primary advantages of this work are the simplification of the fuzzy rule base by the similarity-based pruning strategy and the greater accuracy achieved by the two-layer optimization scheme, which make fuzzy systems more compact and precise. An ideal modular method for fuzzy system identification should, in addition to solving pattern recognition and function approximation problems, offer good interpretability, low dimensionality, high reliability, robustness, high approximation accuracy, low computational cost, and maximum performance. However, it is extremely difficult to meet all of these conditions. To achieve as many of these features as possible, the research in this thesis presents a modular method that addresses a variety of requirements for fuzzy system identification. 006.3
255	Optimization Frameworks for Graph Clustering Luke N Veldt 15 May 2019 In graph theory and network analysis, communities or clusters are sets of nodes in a graph that share many internal connections with each other, but are only sparsely connected to nodes outside the set. Graph clustering, the computational task of detecting these communities, has been studied extensively due to its widespread applications and its theoretical richness as a mathematical problem. This thesis presents novel optimization tools for addressing two major challenges associated with graph clustering. The first major challenge is that there already exists a plethora of algorithms and objective functions for graph clustering. The relationship between different methods is often unclear, and it can be very difficult to determine in practice which approach is the best to use for a specific application. To address this challenge, we introduce a generalized discrete optimization framework for graph clustering called LambdaCC, which relies on a single tunable parameter. The value of this parameter controls the balance between the internal density and external sparsity of clusters that are formed by optimizing an underlying objective function. LambdaCC unifies the landscape of graph clustering techniques, as a large number of previously developed approaches can be recovered as special cases for a fixed value of the LambdaCC input parameter. The second major challenge of graph clustering is the computational intractability of detecting the best way to cluster a graph with respect to a given NP-hard objective function. To address this intractability, we present new optimization tools and results which apply to LambdaCC as well as a broader class of graph clustering problems. In particular, we develop polynomial-time approximation algorithms for LambdaCC and other more generalized clustering objectives.
For example, we show how to obtain a polynomial-time 2-approximation for cluster deletion, which improves upon the previous best approximation factor of 3. We also present a new optimization framework for solving convex relaxations of NP-hard graph clustering problems, which are frequently used in the design of approximation algorithms. Finally, we develop a new framework for efficiently setting tunable parameters for graph clustering objective functions, so that practitioners can work with graph clustering techniques that are especially well suited to their application. Computation Theory and Mathematics Graph clustering optimization algorithms
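The abstract describes LambdaCC as an objective with one parameter that trades internal density against external sparsity, without writing the objective out. The sketch below assumes the unweighted correlation-clustering form, in which each edge cut between clusters costs 1 - λ and each non-adjacent pair placed in the same cluster costs λ; that formula, the function name, and the toy graph are assumptions for illustration, not text taken from the listing.

```python
from itertools import combinations


def lambdacc_cost(nodes, edges, labels, lam):
    """Disagreement cost of a clustering under an assumed LambdaCC form.

    nodes:  iterable of node identifiers
    edges:  iterable of (u, v) pairs, treated as undirected
    labels: dict mapping each node to its cluster label
    lam:    the tunable parameter, assumed to lie in (0, 1)
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        same_cluster = labels[u] == labels[v]
        is_edge = frozenset((u, v)) in edge_set
        if is_edge and not same_cluster:
            cost += 1.0 - lam   # an edge separated by the clustering
        elif not is_edge and same_cluster:
            cost += lam         # a non-edge kept inside one cluster
    return cost
```

On the path graph 0-1-2-3 with `lam=0.5`, splitting into {0, 1} and {2, 3} costs 0.5, while one big cluster or four singletons each cost 1.5. Raising `lam` penalizes sparse clusters more heavily, so smaller and denser clusters become cheaper.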
256	Enhancing preprocessing and clustering of single-cell RNA sequencing data Wang, Zhe 04 October 2021 Single-cell RNA sequencing (scRNA-seq) is the leading technique for characterizing cellular heterogeneity in biological samples. Various scRNA-seq protocols have been developed that can measure the transcriptome from thousands of cells in a single experiment. With these methods readily available, the ability to transform raw data into biological understanding of complex systems is now a rate-limiting step. In this dissertation, I introduce novel computational software and tools which enhance preprocessing and clustering of scRNA-seq data and evaluate their performance compared to existing methods. First, I present scruff, an R/Bioconductor package that preprocesses data generated from scRNA-seq protocols including CEL-Seq or CEL-Seq2 and reports comprehensive data quality metrics and visualizations. scruff rapidly demultiplexes, aligns, and counts the reads mapped to genomic features with deduplication of unique molecular identifier (UMI) tags and provides novel and extensive functions to visualize both pre- and post-alignment data quality metrics for cells from multiple experiments. Second, I present Celda, a novel Bayesian hierarchical model that can perform simultaneous co-clustering of genes into transcriptional modules and cells into subpopulations for scRNA-seq data. Celda identified novel cell subpopulations in a publicly available peripheral blood mononuclear cell (PBMC) dataset and outperformed a PCA-based approach for gene clustering on simulated data. Third, I extend the application of Celda by developing a multimodal clustering method that utilizes both mRNA and protein expression information generated from single-cell sequencing datasets with multiple modalities, and demonstrate that Celda multimodal clustering captured meaningful biological patterns that are missed by transcriptome- or protein-only clustering methods.
Collectively, this work addresses limitations present in the computational analyses of scRNA-seq data by providing novel methods and solutions that enhance scRNA-seq data preprocessing and clustering. Bioinformatics Clustering scRNA-seq Single-cell sequencing
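The abstract mentions counting reads "with deduplication of unique molecular identifier (UMI) tags". As a rough illustration of that idea only, and not of scruff's actual R interface, the sketch below counts each distinct UMI once per cell and gene; the function name and the `(cell, gene, umi)` tuple layout are hypothetical.

```python
from collections import defaultdict


def count_unique_umis(reads):
    """Count distinct UMIs per (cell, gene) pair.

    reads: iterable of (cell_barcode, gene, umi) tuples, one per aligned read.
    Reads sharing cell, gene, and UMI are treated as copies of one molecule.
    """
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        umis_seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}
```

Given the reads `("c1", "g1", "AAA")` twice, `("c1", "g1", "CCC")` once, and `("c2", "g1", "AAA")` once, the count is 2 for `("c1", "g1")` and 1 for `("c2", "g1")`. Exact matching is the simplest policy; tools may also merge UMIs that differ by sequencing errors, which this sketch does not attempt.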
257	Graph clustering as a method to investigate riboswitch variation Crum, Matthew January 2021 Thesis advisor: Michelle M. Meyer / Non-coding RNAs (ncRNAs) perform vital functions in cells, but the impact of diversity across the structure and function of homologous motifs has yet to be fully investigated. One reason for this is that the standard phylogenetic analysis used to address these questions in proteins cannot easily be applied to ncRNA due to their inherent characteristics. Compared to proteins, ncRNAs have shorter sequence lengths, lower sequence conservation, and secondary structures that need to be incorporated into the analysis. This has necessitated an effort to develop methodology for investigating the evolutionary and functional relationship between sets of ncRNA. In this pursuit, I studied closely related riboswitches. Riboswitches are structured ncRNAs found in bacterial mRNA that regulate gene expression using their two major components: the aptamer and the expression platform. The aptamer of a riboswitch is able to bind a specific small molecule (ligand), and the bound/unbound state of the aptamer influences conformational changes in the expression platform that can lead to increased or decreased downstream gene expression. Utilizing sequence and structural similarity metrics combined with graph clustering and de novo community detection algorithms, I have developed a methodology for investigating the functional and evolutionary relationship between closely related riboswitches, and other ncRNA by extension, that are found across a range of diverse phyla. / Thesis (PhD), Boston College, 2021. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Biology. Graph Clustering ncRNA Riboswitch Structural Homology
258	POPULATION STRUCTURE INFERENCE USING PCA AND CLUSTERING ALGORITHMS Rimal, Suraj 01 September 2021 Genotype data, consisting of large numbers of markers, are used in demographic and association studies to identify genes related to specific traits or diseases. Processing these datasets for population structure inference usually takes a significant amount of time. We therefore propose applying PCA to genotype data and then using clustering algorithms to assign individuals to their subpopulations. We collected both real and simulated datasets for this study. We applied PCA and selected the significant features, then applied five different clustering techniques to obtain better results. Furthermore, we studied three different methods for predicting the optimal number of subpopulations in a collected dataset. The results on four simulated datasets and two real human genotype datasets show that our approach performs well in the inference of population structure. NbClust is more effective than the other methods studied at inferring the subpopulations in a population. We also show that centroid-based clustering algorithms, such as k-means and PAM, perform better than model-based, spectral, and hierarchical clustering algorithms. This approach also has the benefit of being fast and flexible in the inference of population structure. Clustering Data Genotype PCA Population Structure
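The abstract reports that centroid-based methods such as k-means worked well on PCA-reduced genotype data. The sketch below shows only the k-means step (Lloyd's algorithm) on points assumed to be already projected onto a few principal components; the function name, the explicit initial centroids, and the toy coordinates are illustrative assumptions, and the thesis's own pipeline may differ.

```python
import math


def kmeans(points, initial_centroids, max_iterations=20):
    """Plain Lloyd's k-means on tuples of floats (e.g. PCA scores).

    Returns (labels, centroids). Initial centroids are passed in explicitly
    so that the result is deterministic.
    """
    centroids = [tuple(c) for c in initial_centroids]

    def assign(current):
        # Label each point with the index of its nearest centroid.
        return [
            min(range(len(current)), key=lambda k: math.dist(p, current[k]))
            for p in points
        ]

    for _ in range(max_iterations):
        labels = assign(centroids)
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return assign(centroids), centroids
```

With points `(0, 0)`, `(0, 1)`, `(10, 10)`, `(10, 11)` and initial centroids `(0, 0)` and `(10, 10)`, the two pairs end up in separate clusters. In the setting described above, each point would be one individual's scores on the selected principal components.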
259	Internetové souřadnicové systémy / Internet coordinating systems Krajčír, Martin January 2009 A network coordinates (NC) system is an efficient mechanism for predicting Internet distance with a limited number of measurements. This work focuses on a distributed coordinate system, which is evaluated by relative error. Based on experimental results from a simulated application, a new algorithm for computing network coordinates was created. The algorithm was tested on a simulated network as well as on RTT values from the PlanetLab network. Experiments show that clustered nodes achieve good synthetic-coordinate results with limited connections between nodes. This work proposes an implementation of the new NC system in a network with hierarchical aggregation. The resulting application was placed on the research projects web page of the Department of Telecommunications.
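The abstract says the coordinate system is evaluated by relative error but gives no formula. A minimal sketch, assuming Euclidean coordinates and the definition |predicted - measured| / measured (other definitions, such as dividing by the smaller of the two values, are also in use); the function name and the sample values are illustrative.

```python
import math


def relative_error(coord_a, coord_b, measured_rtt):
    """Relative error of a coordinate-based distance prediction.

    coord_a, coord_b: synthetic coordinates of two nodes (tuples of floats)
    measured_rtt:     the measured round-trip time between them, > 0
    """
    if measured_rtt <= 0:
        raise ValueError("measured_rtt must be positive")
    predicted = math.dist(coord_a, coord_b)  # predicted distance from coordinates
    return abs(predicted - measured_rtt) / measured_rtt
```

For nodes at `(0, 0)` and `(3, 4)` the predicted distance is 5, so a measured RTT of 4 gives a relative error of 0.25.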
260	Spectral methods for the detection and characterization of Topologically Associated Domains Cresswell, Kellen Garrison 01 January 2019 The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops that is relatively stable across cell lines and even across species. These TADs dynamically reorganize during the development of disease and exhibit cell- and condition-specific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for identification of TADs and even fewer for identification of hierarchies. Additionally, there are no publicly available tools for comparison of TADs across datasets. These tools are necessary to conduct large-scale genome-wide analysis and comparison of 3D structure. To address the challenge of TAD identification, we developed a novel sliding window-based spectral clustering framework that uses gaps between consecutive eigenvectors for TAD boundary identification. Our method, implemented in an R package, SpectralTAD, has automatic parameter selection, is robust to sequencing depth, resolution and sparsity of Hi-C data, and detects hierarchical, biologically relevant TADs. SpectralTAD outperforms four state-of-the-art TAD callers in simulated and experimental settings. We demonstrate that TAD boundaries shared among multiple levels of the TAD hierarchy were more enriched in classical boundary marks and more conserved across cell lines and tissues. SpectralTAD is available at http://bioconductor.org/packages/SpectralTAD/. To address the problem of TAD comparison, we developed TADCompare.
TADCompare is based on a spectral clustering-derived measure called the eigenvector gap, which enables a loci-by-loci comparison of TAD boundary differences between datasets. Using this measure, we introduce methods for identifying differential and consensus TAD boundaries and tracking TAD boundary changes over time. We further propose a novel framework for the systematic classification of TAD boundary changes. Colocalization and gene enrichment analyses of different types of TAD boundary changes revealed distinct biological functionality associated with them. TADCompare is available at https://github.com/dozmorovlab/TADCompare. Genomics Spectral Genetics Biostatistics Statistical Clustering Biostatistics
