1 |
De Bruijn Graphs and Lamplighter Groups
Alharthy, Shathaa (20 February 2019)
De Bruijn graphs were originally introduced for finding a superstring representation for all fixed length words of a given finite alphabet. Later they found numerous applications, for instance, in DNA sequencing. Here we study a relationship between de Bruijn graphs and the family of lamplighter groups (a particular class of wreath products). We show how de Bruijn graphs and their generalizations can be presented as Cayley and Schreier graphs of lamplighter groups.
|
2 |
The Coloring and Routing Problems on de Bruijn Interconnection Networks
Mao, Jyh-Wen (01 September 2003)
De Bruijn graphs are attractive because routing messages between two nodes is simple and because they tolerate faults. The shortest path from a node V to a node W in the directed binary de Bruijn graph can be obtained by first determining the longest substring common to the right (or left) end of V and the left (or right) end of W, and then performing L-operations (or R-operations) to complete the route. However, this method does not always find the shortest path in the undirected binary de Bruijn graph. In this dissertation, we propose a shortest-path routing algorithm that requires O(m^2) time. We also design a fault-tolerant routing algorithm that provides the shortest path and another node-disjoint path of length at most m + log_2 m + 4. Our algorithm can tolerate one node failure in the m-dimensional binary de Bruijn network.
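The directed routing rule described above can be sketched in a few lines. This is a minimal illustration, not the dissertation's algorithm: it assumes nodes are binary strings of length m, that an L-operation drops the leftmost bit and appends a new bit on the right, and the function name `directed_route` is hypothetical. It handles only the directed case; as the abstract notes, the same rule is not always optimal in the undirected graph.

```python
def directed_route(v: str, w: str) -> list[str]:
    """Route from v to w in the directed binary de Bruijn graph.

    Assumes an edge goes from x to x[1:] + b for b in {"0", "1"} (an L-operation).
    """
    if len(v) != len(w):
        raise ValueError("v and w must have the same length")
    m = len(v)
    # Longest suffix of v that is also a prefix of w (length 0 always matches).
    overlap = next(l for l in range(m, -1, -1) if v[m - l:] == w[:l])
    path = [v]
    # Shift in the remaining bits of w, one L-operation per bit.
    for bit in w[overlap:]:
        path.append(path[-1][1:] + bit)
    return path


if __name__ == "__main__":
    print(directed_route("0110", "1011"))
```

The path length is m minus the overlap length, so identical endpoints give a path with no hops and fully disjoint endpoints need m hops.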
In concurrent systems, a 1-fair alternator design is optimal if each processor can execute the critical step once in the fewest steps. This problem corresponds to using the minimum number of colors to color the processors in the system. Thus, the optimal design of a 1-fair alternator can be transformed into a coloring problem. We propose a simple and fast algorithm to solve the node coloring problem on the undirected binary de Bruijn graph. Our algorithm uses 3 colors, which is optimal. We also extend our method to the coloring problem on k-ary de Bruijn graphs. We first present a simple algorithm that needs 2k colors; with a slight improvement, the number of required colors is reduced to k+1.
|
3 |
1-Fair Alternator Designs for the de Bruijn Network
Lin, Hsu-Shen (01 September 2006)
An alternator is a self-stabilizing system that consists of a network of concurrent processors. One of its properties is that no two adjacent processors of an alternator system can execute the critical step at the same time. This exclusion property transforms the alternator design problem into a coloring problem. An alternator is said to be 1-fair if no processor executes the critical step twice while one or more other processors have not yet executed it. The simplicity of message routing and the fault tolerance of de Bruijn networks motivate us to design 1-fair alternators on them.
In this thesis, two algorithms are proposed to solve the coloring problem on the de Bruijn network. The first uses $2\lceil \log_2 k \rceil + 1$ colors to color the $k$-ary de Bruijn graph with two digits, while the second uses only $p+1$ colors, where $\binom{p-1}{\lfloor (p-1)/2 \rfloor} < k \leq \binom{p}{\lfloor p/2 \rfloor}$. We also prove that the second coloring method is optimal when $k = \binom{p}{\lfloor p/2 \rfloor}$. In other words, the chromatic number of the $k$-ary de Bruijn graph with two digits is $p+1$ when $k = \binom{p}{\lfloor p/2 \rfloor}$. Furthermore, our coloring method can be extended to the $k$-ary de Bruijn graph with three or more digits.
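To make the two color counts concrete, the sketch below evaluates both formulas for a given alphabet size k. It only computes the counts stated in the abstract; it does not construct either coloring. The function names are hypothetical, and the restriction to k >= 2 is an assumption made here to avoid the degenerate one-letter alphabet.

```python
from math import comb


def first_method_colors(k: int) -> int:
    """Color count of the first method: 2 * ceil(log2 k) + 1."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # For integers k >= 1, ceil(log2 k) equals (k - 1).bit_length().
    return 2 * (k - 1).bit_length() + 1


def second_method_colors(k: int) -> int:
    """Color count of the second method: p + 1, where p is the smallest
    integer with C(p, floor(p/2)) >= k."""
    if k < 2:
        raise ValueError("k must be at least 2")
    p = 0
    while comb(p, p // 2) < k:
        p += 1
    return p + 1


if __name__ == "__main__":
    for k in (2, 3, 6, 7, 10):
        print(k, first_method_colors(k), second_method_colors(k))
```

For k = 2 both formulas give 3, which agrees with the three colors needed for the binary case.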
|
4 |
Omnitig listing and contig assembly for genomic De Bruijn graphs
Zirondelli, Elia Carlo (11 February 2022)
Genome assembly asks to reconstruct an unknown string from many shorter substrings of it. Its hardness stems both from practical issues (size and errors of real data), and from the fact that problem formulations inherently admit multiple solutions. Given these, at their core, most state-of-the-art assemblers are based on finding non-branching paths (unitigs) in an assembly graph. If one defines a genome assembly solution as a closed arc-covering walk of the graph, then unitigs appear in all solutions, being thus safe partial solutions. All such safe walks were recently characterized as omnitigs, leading to the first safe and complete genome assembly algorithm. Even if omnitig finding was improved to quadratic time, it remained open whether the crucial linear-time feature of finding unitigs can be attained with omnitigs. We describe an O(m)-time algorithm to identify all maximal omnitigs of a graph with n nodes and m arcs, notwithstanding the existence of families of graphs with Θ(mn) total maximal omnitig size. This is based on the discovery of a family of walks (macrotigs) with the property that all the non-trivial omnitigs are univocal extensions of subwalks of a macrotig, with two consequences: a linear-time output sensitive algorithm enumerating all maximal omnitigs and a compact O(m) representation of all maximal omnitigs.
This safe and complete genome assembly algorithm was followed by other works improving the time bounds, as well as extending the results to different notions of assembly solution. But it remained open whether one can also be complete for models of genome assembly of practical applicability.
In this dissertation, we also present a universal framework for obtaining safe and complete algorithms, which unifies the previous results while also allowing different assembly problems to be characterized. This is based on a novel graph structure, called the hydrostructure of a walk, which highlights the reachability properties of the graph from the perspective of the walk. Almost all of our characterizations are directly adaptable to optimal verification algorithms and simple enumeration algorithms. Most of these algorithms are also improved to optimality using an incremental computation procedure and a previous optimal algorithm for a specific model.
|
5 |
On the chromatic number of the AO(2, k, k-1) graphs
Arora, Navya (06 May 2006)
The alphabet overlap graph is a modification of the well-known de Bruijn graph. De Bruijn graphs have been studied extensively, and hence many of their properties have been determined. However, very little is known about alphabet overlap graphs. In this work we determine the chromatic number for a special case of these graphs.
We define the alphabet overlap graph by G = AO(a, k, t), where a, k and t are positive integers such that 0 ≤ t ≤ k. The vertex set of G is the set of all k-letter sequences over an alphabet of size a. There is an edge between vertices u, v if and only if the last t letters in u match the first t letters in v or the first t letters in u match the last t letters in v. We consider the chromatic number of the AO(a, k, t) graphs when k > 2, t = k - 1 and a = 2.
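The definition above translates directly into a small construction, shown here with a brute-force chromatic number search for very small parameters. This is an illustrative sketch, not the method of the thesis: the function names are hypothetical, letters are written as digits (so it assumes a ≤ 10), and self-loops are ignored on the assumption that they are irrelevant to proper coloring.

```python
from itertools import product


def alphabet_overlap_graph(a: int, k: int, t: int) -> dict[str, set[str]]:
    """Adjacency sets of AO(a, k, t); vertices are length-k digit strings."""
    vertices = ["".join(map(str, w)) for w in product(range(a), repeat=k)]

    def overlaps(u: str, v: str) -> bool:
        # The last t letters of u match the first t letters of v.
        return u[k - t:] == v[:t]

    adj: dict[str, set[str]] = {u: set() for u in vertices}
    for u in vertices:
        for v in vertices:
            if u != v and (overlaps(u, v) or overlaps(v, u)):
                adj[u].add(v)
    return adj


def chromatic_number(adj: dict[str, set[str]]) -> int:
    """Smallest number of colors, found by backtracking (small graphs only)."""
    order = sorted(adj, key=lambda u: -len(adj[u]))

    def colorable(c: int) -> bool:
        coloring: dict[str, int] = {}

        def place(i: int) -> bool:
            if i == len(order):
                return True
            u = order[i]
            used = {coloring[v] for v in adj[u] if v in coloring}
            # Symmetry breaking: introduce at most one new color per step.
            limit = min(c, max(coloring.values(), default=-1) + 2)
            for col in range(limit):
                if col not in used:
                    coloring[u] = col
                    if place(i + 1):
                        return True
                    del coloring[u]
            return False

        return place(0)

    c = 1
    while not colorable(c):
        c += 1
    return c


if __name__ == "__main__":
    print(chromatic_number(alphabet_overlap_graph(2, 3, 2)))
```

The search is exponential in the worst case, so it is useful only for checking small instances by hand.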
|
6 |
The Frobenius Problem in a Free Monoid
Xu, Zhi (January 2009)
Given positive integers $c_1, c_2, \ldots, c_k$ with $\gcd(c_1, c_2, \ldots, c_k) = 1$, the Frobenius problem (FP) is to compute the largest integer $g(c_1, c_2, \ldots, c_k)$ that cannot be written as a non-negative integer linear combination of $c_1, c_2, \ldots, c_k$. The Frobenius problem in a free monoid (FPFM) is a non-commutative generalization of the Frobenius problem. Given words $x_1, x_2, \ldots, x_k$ such that there are only finitely many words that cannot be written as concatenations of words in $\{x_1, x_2, \ldots, x_k\}$, the FPFM is to find the longest such words. Unlike the FP, where the upper bound $g(c_1, c_2, \ldots, c_k) \leq \max_{1 \leq i \leq k} c_i^2$ is quadratic, the upper bound on the length of the longest words in the FPFM can be exponential in certain measures, and some of the exponential upper bounds are tight. For the 2FPFM, where the given words over $\Sigma$ are of only two distinct lengths $m$ and $n$ with $1 < m < n$, the length of the longest omitted words is $\leq g(m, m|\Sigma|^{n-m} + n - m)$.
In Chapter 1, I give the definition of the FP in integers and summarize some of the interesting properties of the FP. In Chapter 2, I give the definition of the FPFM and discuss some general properties of the FPFM. Then I mainly focus on the 2FPFM. I discuss the 2FPFM from different points of view and present two equivalent problems, one of which is about combinatorics on words and the other is about the word graph. In Chapter 3, I discuss some variations on the FPFM and related problems, including input in other forms, bases with constant size, the case of infinite words, the case of concatenation with overlap, and the generalization of the local postage-stamp problem in a free monoid. In Chapter 4, I present the construction of some essential examples to complement the theory of the 2FPFM discussed in Chapter 2. The theory and examples of the 2FPFM are the main contribution of the thesis. In Chapter 5, I discuss the algorithms for and computational complexity of the FPFM and related problems. In the last chapter, I summarize the main results and list some open problems.
Part of my work in the thesis has appeared in the papers.
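As a concrete illustration of the integer Frobenius problem defined in this abstract, the sketch below computes g by dynamic programming. It is not taken from the thesis: the function name is hypothetical, the search range relies on the quadratic upper bound quoted in the abstract, and returning -1 when every non-negative integer is representable is a convention chosen here.

```python
from functools import reduce
from math import gcd


def frobenius_number(cs: list[int]) -> int:
    """Largest integer that is not a non-negative integer combination of cs."""
    if not cs or any(c <= 0 for c in cs):
        raise ValueError("cs must be a non-empty list of positive integers")
    if reduce(gcd, cs) != 1:
        raise ValueError("gcd of cs must be 1, otherwise no largest gap exists")
    # Every non-representable integer lies below max(cs)**2.
    limit = max(cs) ** 2
    reachable = [False] * (limit + 1)
    reachable[0] = True
    for n in range(1, limit + 1):
        reachable[n] = any(c <= n and reachable[n - c] for c in cs)
    # Assumed convention: -1 if nothing is missing (for example, when 1 is in cs).
    return max((n for n in range(limit + 1) if not reachable[n]), default=-1)


if __name__ == "__main__":
    print(frobenius_number([3, 5]))
    print(frobenius_number([6, 9, 20]))
```

For two coprime values the result matches the closed form c_1 c_2 - c_1 - c_2, which gives a quick sanity check.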
|
8 |
Chromatic Number of the Alphabet Overlap Graph, G(2, k, k-2)
Farley, Jerry Brent (15 December 2007)
A graph G(a, k, t) is called an alphabet overlap graph, where a, k, and t are positive integers such that 0 ≤ t < k and the vertex set V of G is defined as V = {v : v = (v1 v2 ... vk), vi ∈ {1, 2, ..., a}, 1 ≤ i ≤ k}. That is, each vertex v is a word of length k over an alphabet of size a. There exists an edge between two vertices u, v if and only if the last t letters in u equal the first t letters in v or the first t letters in u equal the last t letters in v. We determine the chromatic number of G(a, k, t) for all k ≥ 3, t = k − 2, and a = 2, except when k = 7, 8, 9, and 11.
|
9 |
[en] A NOVEL APPROACH FOR DE BRUIJN GRAPH CONSTRUCTION IN DE NOVO GENOME FRAGMENT ASSEMBLY / [pt] UMA NOVA ABORDAGEM PARA A CONSTRUÇÃO DO GRAFO DE BRUIJN NA MONTAGEM DE NOVO DE FRAGMENTOS DE GENOMA
ELVISMARY MOLINA DE ARMAS (04 May 2020)
Fragment assembly is a fundamental problem in bioinformatics. In de novo assembly, where no reference genome is available to guide the process, the de Bruijn graph data structure is used to support the computational processing. In particular, a large set of k-mers (substrings of the biological sequences) must be considered. However, constructing the graph has a high computational cost, primarily in main memory consumption, which makes it infeasible for large sets of k-mers. Some approaches in the literature use the external memory model to make the procedure feasible. These solutions, however, all generate k-mers with high redundancy, which increases the amount of data to manage and, consequently, the number of I/O operations. This thesis proposes a new approach to de Bruijn graph construction that does not need to generate all k-mers. The solution reduces computational requirements and makes execution feasible, as confirmed by the experimental results.
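For context, the conventional in-memory construction that enumerates every k-mer can be sketched as follows. This is the baseline whose cost the abstract describes, not the approach proposed in the thesis. The function name is hypothetical, and the choice of (k-1)-mers as nodes with one edge per distinct k-mer is one common convention assumed here.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Naive de Bruijn graph: nodes are (k-1)-mers, and each distinct k-mer
    adds an edge from its prefix to its suffix."""
    if k < 2:
        raise ValueError("k must be at least 2")
    edges: defaultdict[str, set[str]] = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read; every window is one k-mer.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in edges.items()}


if __name__ == "__main__":
    print(de_bruijn_graph(["ACGT", "CGTA"], 3))
```

Overlapping reads produce the same k-mer many times, which is the redundancy the abstract identifies as the source of extra memory use and I/O.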
|
10 |
Rekonstrukce opakujících se segmentů DNA / Reconstruction of Repetitive DNA Segments
Bikár, Robert (January 2016)
The main motivation of this master's thesis was to find a suitable algorithm that builds a graph representation of NGS sequencing data in linear time. The chosen representation is the de Bruijn graph. In the next part of the work, a tool was designed that can transform the graph into a form suitable for drawing and can also remove errors that arise during graph construction. The aim of the work is to create a tool that reconstructs repetitive segments in DNA. The implemented tool was tested and, on simpler genomes, is able to identify repetitive segments, determine their types, visualize them, and assemble their sequences with high accuracy. On more complex genomes, the tool finds only fragments of the repetitive segments.
|