In recent years, the amount of available data in computational biology has increased steadily. To analyze these data efficiently, bioinformaticians have developed a wide range of algorithms. In addition, computational models have been developed to predict, for instance, the relationships between species, and complex algorithms are used to predict properties such as the structure of certain biological molecules. Despite these advances in handling such complicated tasks with automated workflows and a huge variety of freely available tools, an expert still needs to supervise the data analysis pipeline and inspect the quality of both the input data and the results. Choosing appropriate model parameters is also far from trivial.
Visual support puts the expert into the data analysis loop by providing visual encodings of the data and the analysis results together with interaction facilities. To meet the experts' requirements, existing visualizations usually have to be adapted to the application, or completely new representations have to be developed. Furthermore, these visualizations must be coupled with the algorithms the experts use to prepare the data. Such in-situ visualizations are necessary because of the amount of data handled within the analysis pipelines of this domain.
In this thesis, algorithms and visualizations are presented that were developed in two different research areas of computational biology. On the one hand, the multi-replicate peak caller Sierra Platinum was developed, which predicts significant regions of histone modifications in genomes from experimentally generated input data. The algorithm can process several input data sets simultaneously to calculate statistically meaningful results. Multiple quality measures and visualizations were integrated into the data analysis pipeline to support the analyst. Based on these in-situ visualizations, the analyst can adjust the parameters of the algorithm to obtain the best results for a given input data set. Furthermore, Sierra Platinum and related algorithms were benchmarked on an artificial data set to evaluate their performance under specific conditions of the input data, e.g., low read quality or undersequenced data. Sierra Platinum achieved the best results in every test scenario. Additionally, the performance of Sierra Platinum was evaluated on experimental data, where its results confirmed existing knowledge, whereas the results of the other algorithms appeared to contradict it.
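The abstract does not describe Sierra Platinum's statistical model, so the following is only a minimal sketch of the general idea behind combining replicates in peak calling. It assumes fixed genomic windows, equal sequencing depth for all libraries, a Poisson background estimated from a matched control sample, and Fisher's method for merging the per-replicate p-values. The names `poisson_upper_tail`, `fisher_combine`, `call_windows` and the threshold `alpha` are hypothetical and not taken from Sierra Platinum.

```python
import math
from typing import List, Sequence


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # For k <= lam, P(X <= k - 1) stays well below 1, so 1 - CDF loses little precision.
        term = math.exp(-lam)  # P(X = 0)
        cdf = term
        for i in range(1, k):
            term *= lam / i  # P(X = i) from P(X = i - 1)
            cdf += term
        return max(0.0, 1.0 - cdf)
    # Above the mean, sum the decreasing upper-tail terms directly to avoid cancellation.
    term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    i = k
    while term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i
    return total


def fisher_combine(p_values: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under H0.

    For even df the survival function is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(max(p, 1e-300)) for p in p_values)  # X / 2; floor avoids log(0)
    term = math.exp(-half_x)
    total = term
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, total)


def call_windows(treatment: Sequence[Sequence[int]], control: Sequence[Sequence[int]],
                 alpha: float = 1e-3) -> List[int]:
    """Indices of windows whose combined replicate p-value is below alpha.

    treatment[r][w] / control[r][w]: read count of replicate r in window w,
    assuming all libraries were sequenced to the same depth.
    """
    significant = []
    for w in range(len(treatment[0])):
        p_vals = [
            # Background rate from the matched control, floored at 1 read.
            poisson_upper_tail(t[w], max(float(c[w]), 1.0))
            for t, c in zip(treatment, control)
        ]
        if fisher_combine(p_vals) < alpha:
            significant.append(w)
    return significant


if __name__ == "__main__":
    treatment = [[5, 40, 6], [4, 35, 5]]  # two replicates, three windows
    control = [[5, 6, 5], [4, 5, 6]]
    print(call_windows(treatment, control))  # [1]
```

With these toy counts only window 1 is reported, since both replicates show far more reads than their controls there. Fisher's method is used here because its null distribution has a closed form for even degrees of freedom; it weights all replicates equally, whereas a dedicated multi-replicate caller may weight replicates by their quality.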
On the other hand, this thesis describes two new visualizations for RNA secondary structures. First, the interactive dot plot viewer iDotter is described, a web service that visualizes RNA secondary structure predictions. Several interaction techniques were implemented to support the analyst in exploring RNA secondary structure dot plots. iDotter provides an API for sharing and archiving annotated dot plots, which also enables iDotter to be embedded in existing data analysis pipelines.
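A dot plot is a square matrix over the sequence positions. A common layout, used for example in ViennaRNA's dot plots, shows base-pair probabilities in the upper triangle and the pairs of one predicted structure (such as the minimum free energy structure) in the lower triangle. The sketch below builds such a matrix from a dot-bracket string and a dictionary of pair probabilities; the input format and the functions `parse_dot_bracket` and `dot_plot_matrix` are assumptions for illustration, not iDotter's data model or API.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def parse_dot_bracket(structure: str) -> List[Pair]:
    """Return the 0-based base pairs (i, j) with i < j encoded in a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Pair] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)  # remember the opening base
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # close the most recent open base
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def dot_plot_matrix(structure: str, pair_probs: Dict[Pair, float]) -> List[List[float]]:
    """Upper triangle: pair probabilities; lower triangle: 1.0 for pairs of `structure`."""
    n = len(structure)
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), p in pair_probs.items():
        if not (0 <= i < j < n):
            raise ValueError(f"pair {(i, j)} must satisfy 0 <= i < j < {n}")
        matrix[i][j] = p
    for i, j in parse_dot_bracket(structure):
        matrix[j][i] = 1.0  # mirror the structure's pairs into the lower triangle
    return matrix


if __name__ == "__main__":
    probs = {(0, 5): 0.9, (1, 4): 0.8, (0, 4): 0.05}
    for row in dot_plot_matrix("((..))", probs):
        print(" ".join(f"{v:4.2f}" for v in row))
```

A viewer would then typically render each nonzero entry as a square whose size grows with its value, so that alternative pairings with low probability remain visible next to the predicted structure.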
Second, the algorithm RNApuzzler is presented, which generates (outer-)planar graph drawings for all RNA secondary structures. Previously published algorithms do not always produce crossing-free drawings. To address this, several drawing constraints were first derived from the literature. Based on these, the algorithm RNAturtle was developed, which did not always produce planar drawings. Therefore, some drawing constraints were relaxed and additional ones were established. RNApuzzler was built on these modified constraints: it takes the drawing generated by RNAturtle as input and resolves any intersections in the graph. Because of this resolution mechanism, modified loops can become very large during the intersection-resolving step. Therefore, an optimization was developed: in a post-processing step, the radii of heavily modified loops are reduced to a minimum. Based on the constraints and the intersection-resolving mechanism, it can be shown that RNApuzzler produces planar drawings for any RNA secondary structure. Finally, the results of RNApuzzler are compared with those of other algorithms.
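Whether a finished layout is crossing-free can be checked independently of how it was produced. The following minimal sketch assumes that each base is a point in the plane and that every backbone link and base pair is drawn as a straight segment; it tests all pairs of segments that share no base with the standard orientation test. It only verifies a drawing and is not RNApuzzler's intersection-resolving mechanism; `find_crossings` and the coordinate format are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int]


def orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (> 0: counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def on_segment(a: Point, b: Point, c: Point) -> bool:
    """For collinear c: True if c lies within the bounding box of segment ab."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    d1, d2 = orient(q1, q2, p1), orient(q1, q2, p2)
    d3, d4 = orient(p1, p2, q1), orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    # Touching or overlapping collinear cases.
    return ((d1 == 0 and on_segment(q1, q2, p1)) or (d2 == 0 and on_segment(q1, q2, p2))
            or (d3 == 0 and on_segment(p1, p2, q1)) or (d4 == 0 and on_segment(p1, p2, q2)))


def find_crossings(coords: Sequence[Point], pairs: Sequence[Edge]) -> List[Tuple[Edge, Edge]]:
    """All pairs of backbone/base-pair segments that intersect without sharing a base."""
    edges = [(i, i + 1) for i in range(len(coords) - 1)] + list(pairs)
    crossings = []
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            continue  # segments sharing a base meet there by construction
        if segments_intersect(coords[e[0]], coords[e[1]], coords[f[0]], coords[f[1]]):
            crossings.append((e, f))
    return crossings


if __name__ == "__main__":
    # Four bases with the pair (0, 3), drawn as a unit square.
    square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    print(find_crossings(square, [(0, 3)]))  # []
    # Swapping the positions of bases 1 and 2 makes the backbone cross itself.
    twisted = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
    print(find_crossings(twisted, [(0, 3)]))  # [((0, 1), (2, 3))]
```

The all-pairs test is quadratic in the number of segments, which is acceptable for checking individual drawings; a sweep-line algorithm would be the usual choice for very large structures.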
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:34380 |
Date | 02 July 2019 |
Creators | Wiegreffe, Daniel |
Contributors | Universität Leipzig |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/acceptedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |
Relation | urn:nbn:de:bsz:15-qucosa2-344026, qucosa:34402 |