Global ETD Search

1	Assessment of aCGH Clustering Methodologies Baker, Serena F. 18 October 2010 Array comparative genomic hybridization (aCGH) is a technique for identifying duplications and deletions of DNA at specific locations across a genome. Potential objectives of aCGH analysis are the identification of (1) altered regions for a given subject, (2) altered regions across a set of individuals, and (3) clinically relevant clusters of hybridizations. aCGH analysis can be particularly useful when it identifies previously unknown clusters with clinical relevance. This project focuses on the assessment of existing aCGH clustering methodologies. Three methodologies are considered: hierarchical clustering, weighted clustering of called aCGH data, and clustering based on probabilistic recurrent regions of alteration within subsets of individuals. Assessment is conducted first through the analysis of aCGH data obtained from patients with ovarian cancer and then through simulations. Performance assessment for the data analysis is based on cluster assignment correlation with clinical outcomes (e.g., survival). For each method, 1,000 simulations are summarized with Cohen's kappa coefficient, interpreted as the proportion of correct cluster assignments beyond random chance. Both the data analysis and the simulation results suggest that hierarchical clustering tends to find more clinically relevant clusters when compared to the other methods. Additionally, these clusters are composed of more patients who belong in the clusters to which they are assigned. array CGH hierarchical clustering WECCA Statistics and Probability
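As an illustration of the kappa summary mentioned in entry 1, the following is a minimal sketch of Cohen's kappa for two label vectors. It is not taken from the thesis: the function name `cohens_kappa` is hypothetical, and the sketch assumes the cluster labels have already been matched to the reference labels (clustering output is only defined up to a relabelling).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement).

    Assumes both sequences use the same label names for the same groups.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two labelings agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Degenerate case: both labelings put everything in one class.
        return 1.0
    return (observed - expected) / (1.0 - expected)


truth = [0, 0, 1, 1]
print(cohens_kappa(truth, [0, 0, 1, 1]))  # perfect agreement
print(cohens_kappa(truth, [0, 1, 0, 1]))  # agreement at chance level
```

A value near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance.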
2	Design and implementation of scalable hierarchical density based clustering Dhandapani, Sankari 09 November 2010 Clustering is a useful technique that divides data points into groups, also known as clusters, such that the data points of the same cluster exhibit similar properties. Typical clustering algorithms assign each data point to at least one cluster. However, in practical datasets such as microarray gene datasets, only a subset of the genes is highly correlated, and the dataset is often polluted with a huge volume of irrelevant genes. In such cases, it is important to ignore the poorly correlated genes and cluster only the highly correlated genes. Automated Hierarchical Density Shaving (Auto-HDS) is a non-parametric density based technique that partitions only the relevant subset of the dataset into multiple clusters while pruning the rest. Auto-HDS performs a hierarchical clustering that identifies dense clusters of different densities and finds a compact hierarchy of the clusters identified. Key features of Auto-HDS include selection and ranking of clusters using a custom stability criterion, and a topologically meaningful 2D projection and visualization of the clusters discovered in the higher dimensional original space. However, a key limitation of Auto-HDS is that it requires O(n^2) storage and O(n^2 log n) computation, so it scales to only a few tens of thousands of points. In this thesis, two extensions to Auto-HDS are presented for lower dimensional datasets that can generate clustering identical to Auto-HDS but can scale to much larger datasets. We first introduce Partitioned Auto-HDS, which provides a significant reduction in time and space complexity and makes it possible to generate the Auto-HDS cluster hierarchy on much larger datasets with hundreds of millions of data points. Then, we describe Parallel Auto-HDS, which takes advantage of the inherent parallelism available in Partitioned Auto-HDS to scale to even larger datasets without a corresponding increase in actual run time when a group of processors is available for parallel execution. Partitioned Auto-HDS is implemented on top of GeneDIVER, a previously existing Java based streaming implementation of Auto-HDS, and thus it retains all the key features of Auto-HDS, including ranking, automatic selection of clusters, and 2D visualization of the discovered cluster topology. Density based clustering Hierarchical clustering Hadoop Map-reduce
3	Visualization of gene ontology and cluster analysis results Aleksakhin, Vladyslav January 2012 The purpose of the thesis is to develop a new visualization method for Gene Ontologies and hierarchical clustering. These are both important tools in biology and medicine to study high-throughput data such as transcriptomics and metabolomics data. Enrichment of ontology terms in the data is used to identify statistically overrepresented ontology terms that give insight into relevant biological processes or functional modules. Hierarchical clustering is a standard method to analyze and visualize data to find relatively homogeneous clusters of experimental data points. Both methods support the analysis of the same data set, but are usually considered independently. However, a combined view is often desirable, such as visualizing a large data set in the context of an ontology while taking a clustering of the data into account. The result of the current work is a user-friendly program that combines two different views for analysing Gene Ontology and clustering simultaneously. To make exploration of such large data sets possible, we developed a new visualization approach. Graph Visualization Gene Ontology Hierarchical Clustering Mappings Interaction
4	Development of a hierarchical k-selecting clustering algorithm – application to allergy. Malm, Patrik January 2007 The objective of this Master’s thesis was to develop, implement and evaluate an iterative procedure for hierarchical clustering with good overall performance which also merges features of certain already described algorithms into a single integrated package. An accordingly built tool was then applied to an allergen IgE-reactivity data set. The finally implemented algorithm uses a hierarchical approach which illustrates the emergence of patterns in the data. At each level of the hierarchical tree a partitional clustering method is used to divide data into k groups, where the number k is decided through application of cluster validation techniques. The cross-reactivity analysis, by means of the new algorithm, largely arrives at anticipated cluster formations in the allergen data, which strengthens results obtained through previous studies on the subject. Notably, though, certain unexpected findings presented in the former analysis were aggregated differently, and more in line with phylogenetic and protein family relationships, by the novel clustering package. bioinformatics partitional clustering hierarchical clustering allergy crossreactivity Bioinformatics Bioinformatik
5	Functional Mixed Data Clustering with Fourier Basis Smoothing Amartey, Ishmael 01 December 2021 Clustering is an important analytical technique that has proven to affect human life positively through its application in cancer research, market segmentation, city planning, etc. With the growth of technological systems, mixed data has taken on longitudinal, directional, and functional attributes that are worth analyzing. Previous research on clustering relied largely on the inverse weight technique and B-splines for smoothing data and assessing the performance of various clustering algorithms. In 1971, Gower proposed a method of clustering for mixed variable types, which has been extended to include functional and directional variables by Hendrickson (2014). In this study, we carry out a comparative analysis of the performance of hierarchical clustering methods using simulated functional data with a mixed structure. We adopt the Fourier basis smoothing procedure and use the Rand index (Rand 1971) and the adjusted Rand index to compare the various clustering algorithms. Hierarchical clustering Mixed data Gower coefficient Functional Data. Multivariate Analysis
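To make the two comparison measures named in entry 5 concrete, here is a minimal sketch of the Rand index and the adjusted Rand index computed from two label vectors. The function names are hypothetical and the code is an illustration only, not the procedure used in the thesis.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which the two clusterings agree."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two label sequences of equal length >= 2")
    agree = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # A pair agrees if it is together in both or apart in both.
        agree += same_a == same_b
    return agree / comb(n, 2)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance agreement (pair-counting form)."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two label sequences of equal length >= 2")
    # Pairs placed together in both clusterings, from the contingency table.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both clusterings consist of a single cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


a = [0, 0, 1, 1]
print(rand_index(a, [1, 1, 0, 0]), adjusted_rand_index(a, [1, 1, 0, 0]))
print(rand_index(a, [0, 1, 0, 1]), adjusted_rand_index(a, [0, 1, 0, 1]))
```

Both measures ignore the label names themselves, so a clustering and its relabelled copy score as identical. The adjusted version can be negative when agreement is worse than expected by chance.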
6	Discovering Subclones and Their Driver Genes in Tumors Sequenced at Standard Depths January 2019 Understanding intratumor heterogeneity and its driver genes is critical to designing personalized treatments and improving clinical outcomes of cancers. Such investigations require accurate delineation of the subclonal composition of a tumor, which to date can only be reliably inferred from deep-sequencing data (>300x depth). The algorithm resulting from the work presented here incorporates an adaptive error model into statistical decomposition of mixed populations, which corrects the mean-variance dependency of sequencing data at the subclonal level and enables accurate subclonal discovery in tumors sequenced at standard depths (30-50x). Tested on extensive computer simulations and real-world data, this new method, named model-based adaptive grouping of subclones (MAGOS), consistently outperforms existing methods on minimum sequencing depth, decomposition accuracy, and computational efficiency. MAGOS supports subclone analysis using single nucleotide variants and copy number variants from one or more samples of an individual tumor. The GUST algorithm, on the other hand, is a novel method for detecting cancer-type-specific driver genes. Combining MAGOS and GUST results can provide insights into cancer progression. Applying MAGOS and GUST to whole-exome sequencing data from samples of 33 different cancer types revealed a significant association between subclonal diversity, its drivers, and patient overall survival. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2019 Bioinformatics Cancer Evolution Model Based Hierarchical Clustering Tumor Heterogeneity
7	Mécanismes pour la cohérence, l'atomicité et les communications au niveau des clusters : application au clustering hiérarchique distribué adaptatif / Mechanism for coherence, atomicity and communications at clusters level : application to adaptative distributed hierarchical clustering Avril, François 29 September 2015 To manage large-scale dynamic distributed systems, made up of communicating devices that can connect or disconnect at any time, we propose to compute connected subgraphs of the system, called clusters. We propose to compute a hierarchical structure in which clusters of one level are grouped into clusters of the next higher level. To achieve this goal, we introduce mechanisms that allow clusters to be the nodes of a distinct distributed system that executes an algorithm. In particular, we need mechanisms to maintain the coherence of the behavior among the nodes of a cluster with regard to the higher level. By allowing clusters to be nodes of a distributed system that executes a clustering algorithm, we compute a nested hierarchical clustering by a bottom-up approach. We formally define the distributed system of clusters and prove that any execution of our algorithm induces an execution of the higher-level algorithm on the distributed system of clusters. Then, we prove by induction that our algorithm computes a nested hierarchical clustering of the system. Last, we apply this approach to a problem that appears in sensor networks: collisions. To avoid collisions, we propose to compute a clustering of the system. This clustering is then used to compute a communication schedule in which two messages cannot be sent at the same time within the range of a sensor. Clustering Hierarchical clustering Random walk
8	Distributed Hierarchical Clustering Loganathan, Satish Kumar January 2018 No description available. Computer Science Distributed Data Mining Distributed Clustering Hierarchical Clustering
9	Scalable Clustering for Immune Repertoire Sequence Analysis Bhusal, Prem 24 May 2019 No description available. Computer Science Clustering Immune-Repertoire Sequence Hierarchical Clustering
10	The development and application of metaheuristics for problems in graph theory : a computational study Consoli, Sergio January 2008 It is known that graph theoretic models have extensive application to real-life discrete optimization problems. Many of these models are NP-hard and, as a result, exact methods may be impractical for large scale problem instances. Consequently, there is great interest in developing efficient approximate methods that yield near-optimal solutions in acceptable computational times. A class of such methods, known as metaheuristics, has been proposed with success. This thesis considers some recently proposed NP-hard combinatorial optimization problems formulated on graphs. In particular, the minimum labelling spanning tree problem, the minimum labelling Steiner tree problem, and the minimum quartet tree cost problem are investigated. Several metaheuristics are proposed for each problem, from classical approximation algorithms to novel approaches. A comprehensive computational investigation is reported in which the proposed methods are compared with other algorithms recommended in the literature. The results show that the proposed metaheuristics outperform the algorithms recommended in the literature, obtaining optimal or near-optimal solutions in short computational running times. In addition, a thorough analysis of the implementation of these methods provides insights for the implementation of metaheuristic strategies for other graph theoretic problems. 519
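For readers unfamiliar with the minimum labelling spanning tree problem named in entry 10, the sketch below shows a simple greedy heuristic: repeatedly add the edge label that leaves the fewest connected components until the graph is connected. This is a generic illustration under our own assumptions (hypothetical function names, edges given as `(u, v, label)` triples on vertices `0..n-1`), not one of the metaheuristics proposed in the thesis, and it does not guarantee an optimal label set.

```python
def count_components(n, edges):
    """Number of connected components of an n-vertex graph (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return components


def greedy_label_cover(n, labelled_edges):
    """Greedily pick labels until the edges carrying them connect all vertices."""
    all_labels = sorted({label for _, _, label in labelled_edges})
    chosen = []

    def components_with(labels):
        allowed = set(labels)
        return count_components(
            n, [(u, v) for u, v, label in labelled_edges if label in allowed]
        )

    current = n
    while current > 1:
        remaining = [label for label in all_labels if label not in chosen]
        if not remaining:
            raise ValueError("graph is not connected")
        # Pick the label whose addition leaves the fewest components.
        best = min(remaining, key=lambda label: components_with(chosen + [label]))
        best_count = components_with(chosen + [best])
        if best_count == current:
            raise ValueError("graph is not connected")
        chosen.append(best)
        current = best_count
    return chosen


edges = [(0, 1, "a"), (2, 3, "a"), (1, 2, "b"), (0, 3, "c")]
print(greedy_label_cover(4, edges))
```

Any spanning tree of the subgraph induced by the returned labels uses only those labels, so the number of labels returned is an upper bound on the optimum for that instance.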