
Computational Methods for the Analysis of Mitochondrial Genomes: Using Annotated de Bruijn Graphs

Much of our understanding of eukaryotic life has come from studying mitochondrial DNA, which has given rise to leading hypotheses in evolution. To enable these studies, efficient algorithms are needed to interpret, analyze, and draw relevant conclusions from the available mitochondrial sequence data. The central theme of this work is to provide such algorithms for two biological problems in mitogenomes. The key element of both methods is the de Bruijn graph. Its vertices are the k-mers, that is, the short sequence segments of length k, of the genomes represented in the graph. Two vertices are connected if the suffix of length (k-1) of the first vertex is equal to the prefix of length (k-1) of the second vertex. Each edge is thus specified by the (k+1)-mer consisting of the first vertex followed by the last character of the second vertex.
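The graph definition above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function name `build_de_bruijn` is hypothetical, and sequences are treated as linear strings for simplicity, although mitochondrial genomes are typically circular.

```python
def build_de_bruijn(sequences, k):
    """Build a de Bruijn graph from linear sequences (illustrative sketch).

    Returns (vertices, edges): vertices is the set of all k-mers, and edges
    maps each (k+1)-mer to the pair (source k-mer, target k-mer).
    """
    vertices = set()
    edges = {}
    for seq in sequences:
        # Every window of length k is a vertex.
        for i in range(len(seq) - k + 1):
            vertices.add(seq[i:i + k])
        # Every window of length k+1 is an edge: its k-prefix is the source
        # vertex and its k-suffix is the target vertex, so the two vertices
        # overlap in k-1 characters.
        for i in range(len(seq) - k):
            kp1 = seq[i:i + k + 1]
            edges[kp1] = (kp1[:k], kp1[1:])
    return vertices, edges


if __name__ == "__main__":
    vertices, edges = build_de_bruijn(["ACGTA"], k=3)
    print(sorted(vertices))
    print(edges)
```

For the toy sequence `ACGTA` with k = 3, the edge `ACGT` connects the vertices `ACG` and `CGT`, which share the overlap `CG` of length k-1.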

The first problem is the automated accurate annotation of genes in complete mitochondrial sequences. For this purpose, a new method, called DeGeCI, is presented. The method uses a large collection of mitogenomes whose sequence data is represented as an annotated de Bruijn graph. To annotate an input genome sequence, initially, a subgraph induced by all (k+1)-mers of the sequence is constructed. Unmapped parts of the sequence result in disconnected components in this subgraph, which are bridged in the next step. For this purpose, alternative trails with a high sequence similarity to the respective unmapped subsequences of the input genome are identified in the database graph and added to the subgraph. Using a clustering approach, DeGeCI aggregates annotations contained in the resulting subgraph to obtain gene predictions for the input sequence.
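The first step, locating the parts of an input sequence that map to the database graph, can be sketched as follows. This is a simplified illustration under stated assumptions, not the DeGeCI implementation: the helper names `kplus1_mers` and `mapped_runs` are hypothetical, the database graph is reduced to a plain set of (k+1)-mers, sequences are treated as linear, and the bridging and clustering steps are not covered. Gaps between consecutive runs correspond to the unmapped subsequences that would have to be bridged.

```python
def kplus1_mers(sequences, k):
    """Collect all (k+1)-mers (the edges) of the given linear sequences."""
    return {seq[i:i + k + 1] for seq in sequences for i in range(len(seq) - k)}


def mapped_runs(query, db_edges, k):
    """Return maximal runs (start, end) of consecutive (k+1)-mer positions
    of `query` whose (k+1)-mer occurs in `db_edges`; `end` is exclusive."""
    runs = []
    start = None
    n = len(query) - k  # number of (k+1)-mers in the query
    for i in range(n):
        hit = query[i:i + k + 1] in db_edges
        if hit and start is None:
            start = i                # a new mapped run begins
        elif not hit and start is not None:
            runs.append((start, i))  # the current run ends before position i
            start = None
    if start is not None:
        runs.append((start, n))      # close a run that reaches the sequence end
    return runs


if __name__ == "__main__":
    db = kplus1_mers(["ACGTT", "GGCAT"], k=3)
    print(mapped_runs("ACGTTAGGCAT", db, k=3))
```

In a real graph, two runs can still be connected if they share a k-mer, so the runs reported here are only an approximation of the disconnected components described above.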

The thesis also presents a follow-up version of DeGeCI, which offers additional features and, in contrast to the original version, can be used via a web server front-end.

Genome rearrangements, which change the arrangement of the genes in the genome, are particularly common in mitogenomes. The locations in the genomes where the gene order differs are called breakpoints. The second objective of this thesis is to localize these breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possibly high substitution rates. A novel method, DeBBI, is presented to address this task. The method constructs a colored de Bruijn graph of the input sequences, where each color is associated with one of the sequences. This graph is searched for certain structures that can be associated with the breakpoint locations. These so-called breakpoint bulges are common paths that branch into two separate paths and rejoin at another location. One of the branches is short and of a single color, while the other branch is long and color-alternating. Sequence dissimilarities distort these structures by introducing additional branches. To identify the bulges despite these distortions, DeBBI uses a heuristic algorithm.
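A colored de Bruijn graph and the branch points at which a common path splits can be sketched in a few lines of Python. This is only an illustration under assumptions, not the DeBBI heuristic: the function names `build_colored_graph` and `branch_vertices` are hypothetical, sequences are treated as linear, and the sketch stops at reporting branching vertices, which are merely candidate starting points of a bulge.

```python
from collections import defaultdict


def build_colored_graph(sequences, k):
    """Map each (k+1)-mer edge to the set of colors (sequence names)
    whose sequence contains it. `sequences` maps color -> sequence."""
    edge_colors = defaultdict(set)
    for color, seq in sequences.items():
        for i in range(len(seq) - k):
            edge_colors[seq[i:i + k + 1]].add(color)
    return dict(edge_colors)


def branch_vertices(edge_colors, k):
    """Return the k-mers with more than one outgoing edge, each mapped to
    its outgoing edges and the colors carried by every such edge."""
    outgoing = defaultdict(dict)
    for edge, colors in edge_colors.items():
        outgoing[edge[:k]][edge] = colors  # the k-prefix is the source vertex
    return {v: e for v, e in outgoing.items() if len(e) > 1}


if __name__ == "__main__":
    graph = build_colored_graph({"x": "ACGTAC", "y": "ACGTTG"}, k=3)
    print(branch_vertices(graph, k=3))
```

In the toy input, both sequences share the edge `ACGT` and then diverge at the vertex `CGT`, where one outgoing edge carries only color `x` and the other only color `y`.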

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:91193
Date: 02 May 2024
Creators: Fiedler, Lisa
Contributors: Universität Leipzig
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/publishedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
