Global ETD Search

1	Solving Intelligence Analysis Problems using Biclusters Fiaux, Patrick O. 09 March 2012 Analysts must filter through an ever-growing amount of data to obtain information relevant to their investigations. In many cases it is not feasible to look at every piece of information individually, so there is a growing need for new filtering tools and techniques that improve the analyst's process with large datasets. We present MineVis, an analytics system that integrates biclustering algorithms and visual analytics tools in one seamless environment. The combination of biclusters and visual data glyphs in a visual analytics spatial environment enables a novel type of filtering. This design allows for rapid exploration and navigation across connected documents. Through a user study, we conclude that our system has the potential to help analysts filter data by allowing them to i) form hypotheses before reading documents and subsequently ii) validate them by reading a reduced and focused set of documents. / Master of Science Visual Analytics Biclustering
2	Detection of Similarly-structured Anomalous sets of nodes in Graphs Sharma, Nikita 04 October 2021 No description available. Computer Science Anomaly Detection Biclustering TRIMAX Biclustering Graph attributes
3	Biologically-Interpretable Disease Classification Based on Gene Expression Data Grothaus, Gregory 14 June 2005 Classification of tissues and diseases based on gene expression data is a powerful application of DNA microarrays. Many popular classifiers, such as support vector machines, nearest-neighbour methods, and boosting, have been applied successfully to this problem. However, it is difficult to determine from these classifiers which genes are responsible for the distinctions between the diseases. We propose a novel framework for classifying gene expression data based on the notion of condition-specific clusters of co-expressed genes, called xMotifs. Our xMotif-based classifier is biologically interpretable: we show how to detect relationships between xMotifs and gene functional annotations. Our classifier achieves high accuracy in leave-one-out cross-validation on both two-class and multi-class data. Our technique has the potential to be the method of choice for researchers interested in disease and tissue classification. / Master of Science Classification Biclustering Gene Expression Microarrays
4	Semi-Supervised Learning Algorithm for Large Datasets Using Spark Environment Kacheria, Amar January 2021 No description available. Computer Science Semi-supervised learning Real-Valued Biclustering Minimizing Labelling Cost SSL using Biclustering Distributed biclustering Spark
5	Optimization Techniques for Protein-Protein Co-Regulation and Interaction Prediction Gremalschi, Stefan 01 December 2009 The availability of large gene expression microarray data has brought along many challenges for biological data mining. Many different clustering methods have been proposed and widely used to analyze gene expression data. The underlying concept of biclustering makes it possible to identify sets of genes that share similar expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and data sets. Several biclustering methods based on different techniques currently exist; however, it is not clear how to compare the resulting biclusters with respect to biological relevance, and there are so far no guidelines for choosing among the available biclustering techniques. In this work, we propose two new Mean Squared Residue (MSR) based biclustering methods. The first method is a dual biclustering algorithm that finds a set of biclusters using a greedy approach. The second method combines the dual biclustering algorithm with quadratic programming: the dual biclustering algorithm reduces the size of the matrix, so that the quadratic program can find an optimal bicluster reasonably fast. We also describe the comparison method and explain how we handle bicluster overlap and missing data. Mean Squared Residue Biclustering Computer Sciences
6	Mining Genome-Scale Growth Phenotype Data through Constant-Column Biclustering Alzahrani, Majed A. 10 July 2017 Growth phenotype profiling of genome-wide gene-deletion strains under stress conditions can clearly show that the essentiality of genes depends on environmental conditions. Systematically identifying, in such recently emerging high-throughput data, groups of genes that share similar patterns of conditional essentiality and dispensability under various environmental conditions can elucidate how genetic interactions of the growth phenotype are regulated in response to the environment. In this dissertation, we first demonstrate that detecting such "co-fit" gene groups can be cast as a less well-studied problem in biclustering, namely constant-column biclustering. Despite significant advances in biclustering techniques, very few were designed for mining growth phenotype data. Here, we propose Gracob, a novel, efficient graph-based method that casts and solves the constant-column biclustering problem as a maximal clique finding problem in a multipartite graph. We compared Gracob with a large collection of widely used biclustering methods that cover different types of algorithms designed to detect different types of biclusters. Gracob outperformed all the existing methods in finding co-fit genes, both on a variety of synthetic data sets with a wide range of settings and on three real growth phenotype data sets for E. coli, proteobacteria, and yeast. data mining biclustering phenotype profiling data
7	Role of Weather in The Occurrence of Migraine: Personalized Prediction of Onset of Headache Hsiao, Hung-I January 2021 No description available. Biostatistics Headache Migraine Biclustering Weather Factor
8	Biclustering and Visualization of High Dimensional Data using VIsual Statistical Data Analyzer Blake, Patrick Michael 31 January 2019 Many data sets have too many features for conventional pattern recognition techniques to work properly. This thesis investigates techniques that alleviate these difficulties. One such technique, biclustering, clusters data in both dimensions and is inherently resistant to the challenges posed by having too many features. However, the algorithms that implement biclustering have limitations in that the user must know at least the structure of the data and how many biclusters to expect. This is where the VIsual Statistical Data Analyzer, or VISDA, can help. It is a visualization tool that successively and progressively explores the structure of the data, identifying clusters along the way. This thesis proposes coupling VISDA with biclustering to overcome some of the challenges of data sets with too many features. Further, to increase the performance, usability, and maintainability as well as reduce costs, VISDA was translated from Matlab to a Python version called VISDApy. Both VISDApy and the overall process were demonstrated with real and synthetic data sets. The results of this work have the potential to improve analysts' understanding of the relationships within complex data sets and their ability to make informed decisions from such data. / Master of Science high-dimensional data biclustering VISDA VISDApy
9	Framework para classificação das mutações de vírus HIV / HIV mutation classification framework Ozahata, Mina Cintho 15 May 2014 Many drugs used in HIV treatment aim to inhibit the reverse transcriptase and protease proteins. Mutations in the sequences of these proteins can be related to drug resistance and can reduce treatment efficacy. Studying the viral genotype may help in choosing specific treatments for each patient, increasing the chance of success. As genotyping tests become more accessible, a large number of viral sequences, containing a great deal of information, is now available. Mutation patterns are one example of the information contained in these sequences, and they are important because they are related to drug resistance. One approach that may lead to an understanding of these mutation patterns is the application of clustering and biclustering techniques. These techniques generate clusters or biclusters of data that share common properties, and they are used when little prior information is available and there are few hypotheses about the data. Therefore, applying clustering and biclustering to protease and reverse transcriptase sequences may reveal the mutation patterns that occur in these sequences and make it possible to relate them to drug resistance. Some systems already predict drug resistance or susceptibility from the viral sequence; however, because this relationship is highly complex, the link between combinations of mutations and levels of phenotypic resistance still needs to be clarified. Accordingly, the main contribution of this work is a framework that applies the K-means and Bimax algorithms to binary-encoded reverse transcriptase and protease sequences from HIV-infected patients. This work also introduces a visual representation of the clusters and biclusters, based on microarray data and suited to large data volumes, which makes the extracted information easier to visualize and helps characterize the clusters and biclusters in the disease domain. Biclustering Clustering HIV Mutations Protease Reverse transcriptase
10	Optimisation combinatoire et extraction de connaissances sur données hétérogènes et temporelles : application à l’identification de parcours patients / Combinatorial optimization and knowledge extraction on heterogeneous and temporal data : application to patients profiles discovery Vandromme, Maxence 30 May 2017 Hospital data exhibit numerous specificities that make traditional data mining tools hard to apply. In this thesis, we focus on the heterogeneity of hospital data and on their temporal aspect. This work was carried out within the framework of the ANR ClinMine research project and a CIFRE partnership with the company Alicante. We propose two new knowledge discovery methods suited to hospital data, each able to perform a variety of tasks: classification, prediction, discovery of patient profiles, etc. In the first part, we introduce MOSC (Multi-Objective Sequence Classification), an algorithm for supervised classification on heterogeneous, numeric and temporal data. In addition to binary and symbolic terms, this method uses numeric terms and sequences of temporal events to form sets of classification rules. MOSC is the first classification algorithm able to handle these types of data simultaneously. In the second part, we introduce HBC (Heterogeneous BiClustering), a biclustering algorithm for heterogeneous data, a problem that, to our knowledge, has never been studied before. This algorithm is extended to support temporal data of various types: temporal events and unevenly sampled time series. HBC is used in a case study on a set of hospital data, whose goal is to identify groups of patients sharing similar profiles. The results make sense from a medical viewpoint; they indicate that relevant, and sometimes new, knowledge is extracted from the data. These results also lead to further, more precise case studies. Integration of HBC into a software solution is also under way, with a parallel implementation and a visualization tool for biclustering results. Biclustering Double classification Heterogeneous data Temporal data 006.31
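Entry 5 (Gremalschi) builds both of its methods on the Mean Squared Residue (MSR) score. The definition used in the sketch below is the standard one from Cheng and Church's biclustering work; the thesis's dual greedy algorithm and its quadratic-programming formulation are not reproduced. For a bicluster with row set I and column set J, H(I, J) = (1 / (|I| |J|)) * sum over i in I, j in J of (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ is the mean of row i over J, a_Ij the mean of column j over I, and a_IJ the mean of the whole submatrix. The function name `mean_squared_residue` is our own, chosen for illustration.

```python
from typing import Sequence


def mean_squared_residue(
    matrix: Sequence[Sequence[float]], rows: Sequence[int], cols: Sequence[int]
) -> float:
    """Cheng-Church mean squared residue H(I, J) of the submatrix rows x cols.

    Assumes rows and cols are non-empty and every row has the same length.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]                       # a_iJ
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]                             # a_Ij
    overall = sum(row_means) / n_rows                                # a_IJ
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Rows that differ only by an additive constant score (numerically) zero.
additive = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(mean_squared_residue(additive, [0, 1, 2], [0, 1, 2]))
print(mean_squared_residue([[1.0, 0.0], [0.0, 1.0]], [0, 1], [0, 1]))  # 0.25
```

A low score means the submatrix is well explained by a row effect plus a column effect, which is why MSR-based methods search for large submatrices whose score stays below a threshold.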

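Entry 6 (Alzahrani) casts co-fit gene detection as constant-column biclustering: in a genes-by-conditions matrix, a constant-column bicluster is a submatrix in which each selected column holds a single value across the selected rows. Gracob's own formulation, maximal clique finding in a multipartite graph, is not shown here. The sketch below only illustrates the target structure, under two assumptions of ours: the fitness values have already been discretized (for example into levels such as -1, 0, 1), and the column set is given in advance. Both helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def is_constant_column(
    matrix: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
    tol: float = 0.0,
) -> bool:
    """True if, in every column of cols, the selected rows agree within tol."""
    if not rows:
        return False
    for j in cols:
        values = [matrix[i][j] for i in rows]
        if max(values) - min(values) > tol:
            return False
    return True


def constant_column_groups(
    matrix: Sequence[Sequence[int]], cols: Sequence[int], min_rows: int = 2
) -> List[List[int]]:
    """Group rows (genes) that take identical discretized values on all of cols.

    Each returned row group, together with cols, is a constant-column bicluster.
    """
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(matrix):
        groups[tuple(row[j] for j in cols)].append(i)
    return [rows for rows in groups.values() if len(rows) >= min_rows]


fitness = [[1, -1, 0], [1, -1, 1], [0, -1, 0], [1, -1, 0]]
print(constant_column_groups(fitness, [0, 1]))     # [[0, 1, 3]]
print(constant_column_groups(fitness, [0, 1, 2]))  # [[0, 3]]
```

Note that the values need not be equal across columns, only within each column; that is what distinguishes this pattern from a constant-row or constant-value bicluster.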
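Entry 9 (Ozahata) applies Bimax to binary-encoded protease and reverse transcriptase sequences. Bimax searches a binary matrix for inclusion-maximal biclusters that contain only 1s. The sketch below enumerates the same objects by brute force instead of Bimax's divide-and-conquer strategy, so it is only practical for a handful of columns. The encoding is our assumption, not taken from the thesis: one row per sequence, one column per mutation, and a 1 where the mutation is present. The function name is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def maximal_ones_biclusters(
    matrix: Sequence[Sequence[int]], min_rows: int = 1, min_cols: int = 1
) -> List[Tuple[List[int], List[int]]]:
    """Enumerate inclusion-maximal all-ones biclusters by brute force.

    Exponential in the number of columns: a sketch, not Bimax itself.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    found = set()
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            # Rows that carry a 1 in every chosen column.
            rows = frozenset(
                i for i in range(n_rows) if all(matrix[i][j] == 1 for j in cols)
            )
            if not rows:
                continue
            # Close the column set: every column that is 1 in all of those rows.
            # The closed pair cannot be enlarged, so it is inclusion-maximal.
            closed = frozenset(
                j for j in range(n_cols) if all(matrix[i][j] == 1 for i in rows)
            )
            if len(rows) >= min_rows and len(closed) >= min_cols:
                found.add((rows, closed))
    return sorted((sorted(r), sorted(c)) for r, c in found)


mutations = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
print(maximal_ones_biclusters(mutations, min_rows=2, min_cols=2))
# [([0, 1], [0, 1]), ([1, 3], [1, 2]), ([2, 3], [2, 3])]
```

Each result reads as a group of sequences sharing a set of co-occurring mutations; the minimum-size filters play the role of Bimax's minimum bicluster size parameters.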