Global ETD Search

1	De Bruijn Graphs and Lamplighter Groups Alharthy, Shathaa 20 February 2019 (has links) De Bruijn graphs were originally introduced for finding a superstring representation for all fixed length words of a given finite alphabet. Later they found numerous applications, for instance, in DNA sequencing. Here we study a relationship between de Bruijn graphs and the family of lamplighter groups (a particular class of wreath products). We show how de Bruijn graphs and their generalizations can be presented as Cayley and Schreier graphs of lamplighter groups. De Bruijn Graph Languages Group Wreath Product Lamplighter
2	The Coloring and Routing Problems on de Bruijn Interconnection Networks Mao, Jyh-Wen 01 September 2003 (has links) De Bruijn graphs are attractive because of the simplicity of routing messages between two nodes and their capability for fault tolerance. The shortest path from a node V to a node W in the directed binary de Bruijn graph can be obtained by first determining the longest substring common to the right/left of V and to the left/right of W. Then L-operations/R-operations are performed to finish the routing. However, this method does not always find the shortest path in the undirected binary de Bruijn graph. In this dissertation, we propose a shortest-path routing algorithm that requires O(m^2) time. We also design a fault-tolerant routing algorithm that provides the shortest path and another node-disjoint path of length at most m + log_2 m + 4. Our algorithm can tolerate one node failure in the m-dimensional binary de Bruijn network. In concurrent systems, a 1-fair alternator design is optimal if each processor can execute the critical step once in the fewest steps. This problem corresponds to coloring the processors in the system with the minimum number of colors. Thus, the optimal design of a 1-fair alternator can be transformed into a coloring problem. We propose a simple and fast algorithm to solve the node coloring problem on the undirected binary de Bruijn graph. Our algorithm uses 3 colors, which is optimal. We also extend our method to solve the coloring problem on k-ary de Bruijn graphs. We first present a simple algorithm that needs 2k colors. With a slight improvement, the number of required colors is reduced to k+1. routing coloring fault-tolerant de Bruijn graph
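The overlap-based routing idea in the abstract above can be illustrated with a minimal sketch. It covers only the directed binary de Bruijn graph with left shifts (drop the first bit, append a new one); the function name `route_directed` and the bit-string node encoding are assumptions made for this illustration, and it is not the dissertation's undirected or fault-tolerant algorithm.

```python
def route_directed(v: str, w: str) -> list[str]:
    """Path from node v to node w using left shifts only.

    Nodes are bit strings of equal length m. The longest suffix of v that
    is also a prefix of w is reused; the remaining bits of w are shifted in.
    """
    m = len(v)
    if len(w) != m:
        raise ValueError("nodes must have the same length")
    # Find the longest overlap between the right end of v and the left end of w.
    # overlap = 0 always matches (empty strings), so the loop always breaks.
    for overlap in range(m, -1, -1):
        if v[m - overlap:] == w[:overlap]:
            break
    path = [v]
    current = v
    for bit in w[overlap:]:
        current = current[1:] + bit  # one left-shift operation
        path.append(current)
    return path


if __name__ == "__main__":
    print(route_directed("0110", "1011"))
```

For `"0110"` and `"1011"` the overlap is `"10"`, so two shifts suffice and the path visits `0110`, `1101`, `1011`.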
3	1-Fair Alternator Designs for the de Bruijn Network Lin, Hsu-Shen 01 September 2006 (has links) An alternator is a self-stabilizing system which consists of a network of concurrent processors. One of its properties is that any two adjacent processors of an alternator system cannot execute the critical step at the same time. This exclusion property transforms the alternator design problem into the coloring problem. An alternator is said to be 1-fair if no processor executes the critical step twice while one or more other processors have not yet executed it. The simplicity of message routing and the fault tolerance of de Bruijn networks motivate us to design 1-fair alternators on them. In this thesis, two algorithms are proposed to solve the coloring problem on the de Bruijn network. The first one uses $2\lceil \log_2 k \rceil + 1$ colors to color the $k$-ary de Bruijn graph with two digits, while the second one uses only $p+1$ colors, where $\binom{p-1}{\lfloor (p-1)/2 \rfloor} < k \leq \binom{p}{\lfloor p/2 \rfloor}$. We also prove that the second coloring method is optimal when $k = \binom{p}{\lfloor p/2 \rfloor}$. In other words, the chromatic number of the $k$-ary de Bruijn graph with two digits is $p+1$, where $k = \binom{p}{\lfloor p/2 \rfloor}$. Furthermore, the extension of our coloring method can be applied to the $k$-ary de Bruijn graph with three or more digits. coloring de Bruijn graph alternator 1-fair
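To make the two color counts above concrete, the following sketch evaluates both formulas for a given alphabet size $k$. The function names are hypothetical, and the code only evaluates the bounds stated in the abstract; it does not construct a coloring.

```python
from math import comb


def first_method_colors(k: int) -> int:
    """2 * ceil(log2 k) + 1, computed with integer arithmetic."""
    if k < 2:
        raise ValueError("k must be at least 2")
    ceil_log2 = (k - 1).bit_length()  # equals ceil(log2 k) for k >= 1
    return 2 * ceil_log2 + 1


def second_method_colors(k: int) -> int:
    """p + 1, where p is the smallest integer with k <= C(p, floor(p/2))."""
    if k < 2:
        raise ValueError("k must be at least 2")
    p = 1
    while comb(p, p // 2) < k:
        p += 1
    return p + 1


if __name__ == "__main__":
    for k in (2, 3, 6, 10):
        print(k, first_method_colors(k), second_method_colors(k))
```

For example, $k = 6 = \binom{4}{2}$ gives $p = 4$, so the second formula yields 5 colors while the first yields $2 \cdot 3 + 1 = 7$.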
4	Méthodes bioinformatiques pour l'analyse de données de séquençage dans le contexte du cancer / Bioinformatics methods for cancer sequencing data analysis Rudewicz, Justine 30 June 2017 (has links) Cancer results from the excessive proliferation of cells descending from the same founder cell and following a Darwinian process of diversification and selection. This process is defined by the accumulation of genetic and epigenetic alterations whose characterization is a key element for establishing a therapy that would specifically target tumor cells. 
The advent of new high-throughput sequencing technologies enables this characterization at the molecular level. This technological revolution has led to the development of numerous bioinformatics methods. In this thesis, we are particularly interested in the development of new computational methods for the analysis of sequencing data from tumor samples, allowing precise identification of tumor-specific alterations and an accurate description of tumor subpopulations. In the first chapter, we explore methods for identifying single nucleotide alterations in targeted sequencing data and apply them to a cohort of breast cancer patients. We introduce two new methods of analysis, each tailored to a particular sequencing technology, namely Roche 454 and Pacific Biosciences. In the first case, we adapted existing approaches to the particular case of transcript sequencing. In the second case, conventional approaches were confronted with high background noise, resulting in a high rate of false positives. We have developed a new method, MICADo, based on De Bruijn graphs, which effectively distinguishes between patient-specific alterations and alterations common to the cohort and thus makes the results usable in a clinical context. The second chapter deals with the identification of copy number alterations. We describe the approach put in place for their efficient identification from very low coverage data. The main contribution of this work is a statistical analysis strategy that highlights local and global changes in the genome that occurred during the treatment administered to patients with breast cancer. Our method is based on the construction of a linear model to establish difference scores between samples before and after treatment. In the third chapter, we focus on the problem of clonal reconstruction. This problem has recently attracted a lot of interest, but it still lacks a well-established formal framework. 
We first propose a formalization of the clonal reconstruction problem. Then we use this formalism to put in place a method based on Gaussian mixture models. Our method uses single nucleotide and copy number alterations, such as those discussed in the previous two chapters, to characterize and quantify the different clonal populations present in a tumor sample. Cancer Bioinformatics NGS TGS De Bruijn graphs Mixture models
5	Určování genetických variant z masivně paralelních sekvenačních dat pomocí lokálních reassembly / Variant calling using local reference-helped assemblies Dráb, Martin January 2017 (has links) Despite active development in recent years, sequencing a genome remains a challenge. Current technologies are not able to read the whole genome in one piece. Instead, the target genome is shattered into a large number of small pieces that are then sequenced separately. The process of assembling these small pieces in order to obtain the sequence of the whole genome is painful and resource-consuming. Multiple algorithms have been developed to solve the assembly problem. This thesis presents yet another assembly algorithm, based on de Bruijn graphs and focused on sequencing short genome regions. The algorithm is compared to well-known solutions in the field.
6	[en] A NOVEL APPROACH FOR DE BRUIJN GRAPH CONSTRUCTION IN DE NOVO GENOME FRAGMENT ASSEMBLY / [pt] UMA NOVA ABORDAGEM PARA A CONSTRUÇÃO DO GRAFO DE BRUIJN NA MONTAGEM DE NOVO DE FRAGMENTOS DE GENOMA ELVISMARY MOLINA DE ARMAS 04 May 2020 (has links) [en] Fragment assembly is a fundamental problem in bioinformatics. In de novo assembly, where no reference genome is available to guide the process, the de Bruijn graph data structure is used to support the computational processing. In particular, a large set of k-mers (substrings of the biological sequences) has to be considered. However, constructing the graph has a high computational cost, primarily in main memory consumption, which makes it infeasible for large sets of k-mers. Some solutions in the literature use external memory processing to achieve feasibility. However, these solutions all involve highly redundant computation over the k-mers, which considerably increases the amount of data handled and, consequently, the number of I/O operations. This thesis proposes a new approach for de Bruijn graph construction that does not need to generate all k-mers. The solution reduces computational requirements and makes execution feasible, as confirmed by the experimental results. [en] GENOME ASSEMBLY [en] K MER [en] DE BRUIJN GRAPH
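For orientation, here is a minimal sketch of the naive, in-memory construction that the abstract above contrasts itself with: every k-mer of every read is generated, and each one contributes an arc between its (k-1)-length prefix and suffix. The function name `build_de_bruijn` and the dictionary-of-sets representation are assumptions for this illustration; this is the baseline, not the thesis's approach.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Naive de Bruijn graph: nodes are (k-1)-mers, arcs come from k-mers."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        # A read of length L yields L - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return dict(graph)


if __name__ == "__main__":
    print(build_de_bruijn(["ACGT", "CGTA"], 3))
```

Consecutive k-mers of a read share k-1 characters, which is the redundancy the abstract refers to when all k-mers are materialized.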
7	Graphes et cycles de de Bruijn dans des langages avec des restrictions / De Bruijn graphs and cycles in languages with restrictions Eduardo, Moreno 30 May 2005 (has links) (PDF) Consider the language consisting of all words of a given length $n$. A de Bruijn cycle of order $n$ is a cyclic word such that every word of the language appears exactly once as a factor of the cycle. An algorithm for constructing the lexicographically minimal de Bruijn cycle is due to Fredricksen and Maiorana; it uses the Lyndon words of the language. This thesis studies how to generalize the concept of de Bruijn cycles to a language consisting of a subset of the words of length $n$, in particular the languages of all words of length $n$ that avoid every factor in a list of forbidden factors. First, we study the case of words without the factor 11. We give new proofs of the Fredricksen and Maiorana algorithm that allow us to extend this result to words without the factor $1^i$ for any $i$. We characterize the languages of words of length $n$ for which a de Bruijn cycle exists, and we also study some symbolic-dynamics properties of these languages, in particular of languages defined by forbidden factors. For these kinds of languages, we present an algorithm that produces a de Bruijn cycle using the Lyndon words of the language. These results use the notion of the de Bruijn graph and reduce the problem to constructing an Eulerian cycle in this graph. We study the problem of constructing the minimal cycle in a language with forbidden factors by means of the de Bruijn graph. We study two algorithms: a simple and efficient greedy algorithm that works for some families of languages, and a more complex algorithm that solves the problem for any Eulerian graph. [INFO] Computer Science formal languages graph theory de Bruijn cycle
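The classical unrestricted case mentioned above can be sketched as follows: the lexicographically minimal de Bruijn cycle is obtained by concatenating, in lexicographic order, the Lyndon words whose length divides $n$. This is the widely used recursive formulation of that construction; the function name `de_bruijn_fkm` and the integer alphabet `0..k-1` are choices made for this illustration, and the restricted languages studied in the thesis are not covered.

```python
def de_bruijn_fkm(k: int, n: int) -> list[int]:
    """Lexicographically minimal de Bruijn cycle over {0, ..., k-1} of order n."""
    a = [0] * (n + 1)          # a[1..n] holds the current candidate word
    sequence: list[int] = []

    def extend(t: int, p: int) -> None:
        if t > n:
            # a[1..p] is a Lyndon word exactly when p divides n here.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
            return
        a[t] = a[t - p]
        extend(t + 1, p)
        for symbol in range(a[t - p] + 1, k):
            a[t] = symbol
            extend(t + 1, t)

    extend(1, 1)
    return sequence


if __name__ == "__main__":
    print("".join(map(str, de_bruijn_fkm(2, 3))))  # prints 00010111
```

Read cyclically, `00010111` contains each of the eight binary words of length 3 exactly once.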
8	de Bruijn-sekvenser – Det effektiva paketbudet / de Bruijn Sequences – The Effective Courier Löthgren, Anders January 2014 (has links) This thesis deals with special cases of de Bruijn sequences in which every subsequence of length n in the de Bruijn sequence contains all k distinct elements of an alphabet A_k. The thesis demonstrates how de Bruijn sequences can be generated using Euler cycles. It therefore also gives background on Euler cycles and presents a method for determining the number of unique cycles. de Bruijn sequence graph Euler walk Hamiltonian cycle Stirling numbers of the second kind
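The Euler-cycle construction referred to above can be sketched for ordinary de Bruijn sequences: nodes are words of length n-1, each arc appends one symbol, and an Eulerian circuit (found here with Hierholzer's algorithm) spells out the sequence. The function name `de_bruijn_via_euler` and the integer alphabet are assumptions for this illustration; the special case studied in the thesis, where every window must contain all k symbols, is not implemented.

```python
from itertools import product


def de_bruijn_via_euler(k: int, n: int) -> list[int]:
    """A de Bruijn sequence over {0, ..., k-1} of order n >= 2 via an Euler circuit."""
    if n < 2 or k < 1:
        raise ValueError("requires n >= 2 and k >= 1")
    # next_symbol[node] is the next unused outgoing arc label of that node.
    next_symbol = {node: 0 for node in product(range(k), repeat=n - 1)}
    start = (0,) * (n - 1)
    stack = [start]
    circuit = []
    while stack:
        node = stack[-1]
        symbol = next_symbol[node]
        if symbol < k:
            next_symbol[node] = symbol + 1
            stack.append(node[1:] + (symbol,))  # follow the arc labelled `symbol`
        else:
            circuit.append(stack.pop())         # no unused arcs left at this node
    circuit.reverse()
    # Each arc contributes the last symbol of the node it enters.
    return [node[-1] for node in circuit[1:]]


if __name__ == "__main__":
    print(de_bruijn_via_euler(2, 3))
```

The graph has k^(n-1) nodes with equal in-degree and out-degree k and is connected, so an Eulerian circuit always exists and has exactly k^n arcs.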
9	Omnitig listing and contig assembly for genomic De Bruijn graphs Zirondelli, Elia Carlo 11 February 2022 (has links) Genome assembly asks to reconstruct an unknown string from many shorter substrings of it. Its hardness stems both from practical issues (size and errors of real data), and from the fact that problem formulations inherently admit multiple solutions. Given these, at their core, most state-of-the-art assemblers are based on finding non-branching paths (unitigs) in an assembly graph. If one defines a genome assembly solution as a closed arc-covering walk of the graph, then unitigs appear in all solutions, being thus safe partial solutions. All such safe walks were recently characterized as omnitigs, leading to the first safe and complete genome assembly algorithm. Even if omnitig finding was improved to quadratic time, it remained open whether the crucial linear-time feature of finding unitigs can be attained with omnitigs. We describe an O(m)-time algorithm to identify all maximal omnitigs of a graph with n nodes and m arcs, notwithstanding the existence of families of graphs with Θ(mn) total maximal omnitig size. This is based on the discovery of a family of walks (macrotigs) with the property that all the non-trivial omnitigs are univocal extensions of subwalks of a macrotig, with two consequences: a linear-time output sensitive algorithm enumerating all maximal omnitigs and a compact O(m) representation of all maximal omnitigs. This safe and complete genome assembly algorithm was followed by other works improving the time bounds, as well as extending the results for different notions of assembly solution. But it remained open whether one can be complete also for models of genome assembly of practical applicability. In this dissertation, we also present a universal framework for obtaining safe and complete algorithms which unify the previous results, while also allowing to characterize different assembly problems. 
This is based on a novel graph structure, called the hydrostructure of a walk, which highlights the reachability properties of the graph from the perspective of the walk. Almost all of our characterizations are directly adaptable to optimal verification algorithms, and simple enumeration algorithms. Most of these algorithms are also improved to optimality using an incremental computation procedure and a previous optimal algorithm of a specific model.
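As a point of reference for the unitigs mentioned in the abstract above, the sketch below lists the maximal non-branching paths of a directed graph given as a list of arcs. It illustrates unitigs only; omnitigs, macrotigs, and the hydrostructure are not implemented, and the function name and arc-list input format are assumptions for this illustration.

```python
from collections import defaultdict


def maximal_nonbranching_paths(arcs: list[tuple[str, str]]) -> list[list[str]]:
    """Maximal paths whose internal nodes all have in-degree 1 and out-degree 1."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    def is_through(v: str) -> bool:
        return indeg[v] == 1 and len(succ[v]) == 1

    paths = []
    visited = set()
    # Start a path at every arc leaving a branching (or source/sink) node.
    for v in sorted(nodes):
        if is_through(v):
            continue
        for w in succ[v]:
            path = [v, w]
            while is_through(w):
                visited.add(w)
                w = succ[w][0]
                path.append(w)
            paths.append(path)
    # Remaining unvisited 1-in-1-out nodes lie on isolated cycles.
    for v in sorted(nodes):
        if is_through(v) and v not in visited:
            cycle = [v]
            visited.add(v)
            w = succ[v][0]
            while w != v:
                visited.add(w)
                cycle.append(w)
                w = succ[w][0]
            cycle.append(v)
            paths.append(cycle)
    return paths


if __name__ == "__main__":
    print(maximal_nonbranching_paths([("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")]))
```

Each arc is visited a constant number of times, so this runs in time linear in the number of arcs, which is the property of unitig finding that the abstract contrasts with omnitig finding.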
10	Graph-based genomic signatures Pati, Amrita 14 May 2008 (has links) Genomes have both deterministic and random aspects, with the underlying DNA sequences exhibiting features at numerous scales, from codons to regions of conserved or divergent gene order. Genomic signatures work by capturing one or more such features efficiently into a compact mathematical structure. This work examines the unique manner in which oligonucleotides fit together to comprise a genome, within a graph-theoretic setting. A de Bruijn chain (DBC) is a marriage of a de Bruijn graph and a finite Markov chain. By representing a DNA sequence as a walk over a DBC and retaining specific information at nodes and edges, we are able to obtain the de Bruijn chain genomic signature (DBCGS), based on both graph structure and the stationary distribution of the DBC. We demonstrate that DBCGS is information-rich, efficient, sufficiently representative of the sequence from which it is derived, and superior to existing genomic signatures such as the dinucleotides odds ratio and word frequency based signatures. We develop a mathematical framework to elucidate the power of the DBCGS signature to distinguish between sequences hypothesized to be generated by DBCs of distinct parameters. We study the effect of order of the DBCGS signature on accuracy while presenting relationships with genome size and genome variety. We illustrate its practical value in distinguishing genomic sequences and predicting the origin of short DNA sequences of unknown origin, while highlighting its superior performance compared to existing genomic signatures including the dinucleotides odds ratio. Additionally, we describe details of the CMGS database, a centralized repository for raw and value-added data particular to C. elegans. / Ph. D. Markov chains de Bruijn graphs Genomic signatures DNA words
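The combination of a de Bruijn graph with a Markov chain described above can be illustrated with a small sketch: k-mers are the states, transition probabilities are estimated from how often one k-mer follows another, and the sequence is treated as circular so that every state has an outgoing transition. This is an illustration under those assumptions, with hypothetical function names; it is not the DBCGS signature from the dissertation.

```python
from collections import Counter, defaultdict


def de_bruijn_chain(seq: str, k: int):
    """Empirical transitions between overlapping k-mers of a circular sequence."""
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(seq)")
    circ = seq + seq[:k]  # wrap around so that all n positions start a k-mer
    node_counts = Counter(circ[i:i + k] for i in range(n))
    arc_counts = Counter((circ[i:i + k], circ[i + 1:i + k + 1]) for i in range(n))
    transitions = defaultdict(dict)
    for (u, v), count in arc_counts.items():
        transitions[u][v] = count / node_counts[u]
    frequencies = {u: count / n for u, count in node_counts.items()}
    return dict(transitions), frequencies


def step(dist: dict, transitions: dict) -> dict:
    """Apply one step of the chain to a distribution over k-mers."""
    out = defaultdict(float)
    for u, p in dist.items():
        for v, q in transitions[u].items():
            out[v] += p * q
    return dict(out)


if __name__ == "__main__":
    P, freq = de_bruijn_chain("ACGTACGGA", 2)
    print(freq)
    print(step(freq, P))
```

Because the sequence is circular, each k-mer is entered as often as it is left, so the empirical k-mer frequencies are a stationary distribution of the estimated chain; the two printed dictionaries agree up to floating-point rounding.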

Search results