Global ETD Search

11	High Performance and Scalable Matching and Assembly of Biological Sequences. Abu Doleh, Anas. 21 December 2016. No description available. Keywords: Computer Engineering; Bioinformatics; sequence similarity indexing; graphical processing unit; Apache Spark; de Bruijn graph; de novo assembly; metagenomics
12	Efficient algorithms for de novo assembly of alternative splicing events from RNA-seq data / Algorithmes efficaces pour l’assemblage de novo d’événements d’épissage alternatif dans des données de RNA-seq. Tominaga Sacomoto, Gustavo Akio. 06 March 2014. In this thesis, we address the problem of identifying and quantifying variants (alternative splicing and genomic polymorphism) in RNA-seq data when no reference genome is available, without assembling the full transcripts. Based on the idea that each variant corresponds to a recognizable pattern, a bubble, in a de Bruijn graph constructed from the RNA-seq reads, we propose a general model for all variants in such graphs. We then introduce an exact method, called KisSplice, to extract alternative splicing events and show that it outperforms general-purpose transcriptome assemblers. We put extra effort into making KisSplice as scalable as possible. To improve the running time, we propose a new polynomial delay algorithm to enumerate bubbles. We show that it is several orders of magnitude faster than previous approaches. To reduce its memory consumption, we propose a new compact way to build and represent a de Bruijn graph. We show that our approach uses 30% to 40% less memory than the state of the art, with an insignificant impact on the construction time. Additionally, we apply the techniques developed for listing bubbles to two classical problems: cycle enumeration and the K-shortest paths problem. We give the first optimal algorithm to list cycles in undirected graphs, improving over Johnson’s algorithm. This is the first improvement to this problem in almost 40 years. We then consider a different parameterization of the K-shortest (simple) paths problem: instead of bounding the number of st-paths, we bound the weight of the st-paths. We present new algorithms using exponentially less memory than previous approaches. Keywords: Algorithm; Enumeration; Data structure; RNA-seq; Alternative splicing; De Bruijn graph; Bloom filter; NGS; 572.8
13	Graph-Based Whole Genome Phylogenomics. Fujimoto, Masaki Stanley. 01 June 2020. Understanding others is a deeply human urge, basic to our existential quest. It requires knowing where someone has come from and where they sit amongst peers. Phylogenetic analysis and genome-wide association studies seek to tell us where we’ve come from and where we are relative to one another through evolutionary history and genetic makeup. Current methods do not address the computational complexity caused by new forms of genomic data, namely long-read DNA sequencing and the growing number of assembled genomes. To address this, we explore specialized data structures for storing and comparing genomic information. This work resulted in the creation of novel data structures for storing multiple genomes that can be used for identifying structural variations and other types of polymorphisms. Using these methods, we illuminate the genetic history of organisms in our efforts to understand the world around us. Keywords: Genomics; Next-Gen Sequencing; Parallel Programming; Data Structures; Phylogenetics; Phylogenomics; de Bruijn Graph; NGS Read Mapping; Whole Genome Alignment; Whole Genome Analysis; Physical Sciences and Mathematics
14	Vers une cartographie fine des polymorphismes liés à la résistance aux antimicrobiens / Fine mapping of antibiotic resistance determinants. Jaillard Dancette, Magali. 12 December 2018. The emergence and spread of multi-drug resistance has become a major worldwide public health concern, calling for a better understanding of the underlying resistance mechanisms. Genome-wide association studies are powerful tools to finely map the genetic polymorphisms linked to the phenotypic variability observed in a population. Although well documented for eukaryotic genome analysis, these studies have only recently been applied to prokaryotes. Through this PhD project, I investigated how to better adapt these tools to the highly plastic bacterial genomes, mainly by working on the representation of the genetic variations in these genomes. Because bacteria can acquire genetic material by means other than direct inheritance from a parent cell, their genomes can differ too much within a species to be aligned against a reference. A representation using sequence fragments of length k, the so-called k-mers, offers the required flexibility but generates redundancy and does not allow for a direct interpretation of the identified associations. The method we set up tests the association of these k-mers with the phenotype, and takes advantage of a De Bruijn graph (DBG) built over all genomes to remove the local redundancy of k-mers and to offer a visualisation of the genomic context of the k-mers identified by the test. This synthetic view as DBG subgraphs informs on the nature of the identified sequence, e.g. a local polymorphism in a gene or a gene acquired through a plasmid. The type of variant can be predicted correctly in 96% of the cases from descriptors of the subgraphs, providing a tractable framework for k-mer-based association studies. Keywords: Genome-wide association study; Antibiotic resistance; De Bruijn graph; Genome variation; K-mers; Decorated graph; Prokaryotic genetics; Bacterial genome; 570
15	Efficient algorithms for de novo assembly of alternative splicing events from RNA-seq data. Tominaga Sacomoto, Gustavo Akio. 06 March 2014. In this thesis, we address the problem of identifying and quantifying variants (alternative splicing and genomic polymorphism) in RNA-seq data when no reference genome is available, without assembling the full transcripts. Based on the idea that each variant corresponds to a recognizable pattern, a bubble, in a de Bruijn graph constructed from the RNA-seq reads, we propose a general model for all variants in such graphs. We then introduce an exact method, called KisSplice, to extract alternative splicing events and show that it outperforms general-purpose transcriptome assemblers. We put extra effort into making KisSplice as scalable as possible. To improve the running time, we propose a new polynomial delay algorithm to enumerate bubbles. We show that it is several orders of magnitude faster than previous approaches. To reduce its memory consumption, we propose a new compact way to build and represent a de Bruijn graph. We show that our approach uses 30% to 40% less memory than the state of the art, with an insignificant impact on the construction time. Additionally, we apply the techniques developed for listing bubbles to two classical problems: cycle enumeration and the K-shortest paths problem. We give the first optimal algorithm to list cycles in undirected graphs, improving over Johnson's algorithm. This is the first improvement to this problem in almost 40 years. We then consider a different parameterization of the K-shortest (simple) paths problem: instead of bounding the number of st-paths, we bound the weight of the st-paths. We present new algorithms using exponentially less memory than previous approaches. Keywords: Algorithm; Enumeration; Data structure; RNA-seq; Alternative splicing; De Bruijn graph; Bloom filter; NGS
