RNA-seq has revolutionized whole-transcriptome sequencing and analysis. Because it sequences at high throughput and low cost, it has produced an ever-increasing volume of RNA-seq reads, which are a treasure trove for biological and therapeutic studies. However, because transcription is complex and the knowledge base describing it is still relatively undeveloped, many challenges remain in modeling and investigating RNA-seq read data. Efficient computational tools are needed to meet these challenges.
The first part of this thesis concentrates on algorithms for both upstream and downstream analysis of RNA-seq data. For the upstream analysis, we tackle the problem of RNA-seq read alignment, where segmental alignment causes the major difficulty. By rigorously and exhaustively trying the segmentation indices of each read, we implemented an accurate algorithm that returns two-segmental alignments based on a bi-directional Burrows-Wheeler transform (BWT). For the downstream analysis, we study two types of gene fusion events, which play a critical role in the formation of cancers. Unlike previous down-scoping search methods, we applied a search-validate approach to design the framework. By introducing key techniques such as masking, two-segmental alignment and retention of multiple mappings, we developed an efficient and robust tool for detecting gene fusions, whose high accuracy is demonstrated by extensive tests on simulated and real data.
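To make the idea of trying every segmentation index concrete, here is a minimal sketch of two-segment alignment. It is not the thesis algorithm: it uses plain exact substring search in place of a bi-directional BWT index, allows no mismatches, and the names `find_all`, `two_segment_alignments` and `min_segment` are hypothetical, chosen only for illustration.

```python
def find_all(text, pattern):
    """Return every start position of pattern in text (overlaps included)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def two_segment_alignments(reference, read, min_segment=3):
    """Try every split index and report exact two-segment placements.

    Returns (split, left_pos, right_pos) tuples where read[:split] matches
    the reference at left_pos and read[split:] matches at right_pos, with
    the left segment lying upstream of the right one.
    Assumption: exact matching only; a real aligner would tolerate errors
    and would query an index rather than scan the reference.
    """
    hits = []
    for split in range(min_segment, len(read) - min_segment + 1):
        left, right = read[:split], read[split:]
        for left_pos in find_all(reference, left):
            for right_pos in find_all(reference, right):
                # Require a gap (for example an intron) between the segments.
                if right_pos > left_pos + split:
                    hits.append((split, left_pos, right_pos))
    return hits
```

Calling `two_segment_alignments(reference, read)` on a read that spans a splice junction returns the split index together with the reference positions of the two segments. An index such as the one described in the abstract would replace the linear scans in `find_all` with much faster lookups.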
Optical mapping is a cutting-edge technique for studying genomic structural variations that addresses the defects and limitations of paired-end sequencing. It was designed to offer great improvements in accuracy, resolution and throughput over current techniques. It also produces much longer molecules, which enable us to explore genomic regions rich in repetitive sequences. Optical mapping has the potential to give a complete picture of genome structural polymorphism, so it is important to design tools for analyzing the data.
The second part of the thesis is dedicated to algorithms for both upstream and downstream analysis of optical map data. For the upstream analysis, we formulated a robust scoring function that combines the effectiveness of heuristic functions with the accuracy of statistical functions. Based on it, we implemented the high-performance OMDP algorithm. For the downstream analysis, we developed BP-OMDP, which uses both split-mapping and disparity in coverage depth to call inversions in the NA12878 human genome sample. / published_or_final_version / Computer science / Doctoral / Doctor of Philosophy
Identifier | oai:union.ndltd.org:HKU/oai:hub.hku.hk:10722/195960 |
Date | January 2013 |
Creators | Wu, Jikun, 武继坤 |
Contributors | Lam, TW, Yiu, SM |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Source Sets | Hong Kong University Theses |
Language | English |
Detected Language | English |
Type | PG_Thesis |
Rights | Creative Commons: Attribution 3.0 Hong Kong License, The author retains all proprietary rights, (such as patent rights) and the right to use in future works. |
Relation | HKU Theses Online (HKUTO) |