1 |
Efficient algorithms for optimizing whole genome alignment
Lu, Ning, 陸宁 January 2004
published_or_final_version / abstract / toc / Computer Science / Master / Master of Philosophy
|
2 |
Gene fusion discovery through RNA-seq and inversion detection via optical mapping
Wu, Jikun, 武继坤 January 2013
RNA-seq has revolutionized whole-transcriptome sequencing and analysis. Because it sequences at high throughput and low cost, it produces an ever-increasing volume of reads that are a rich resource for biological and therapeutic studies. However, owing to the complexity of the transcription process and the relatively undeveloped knowledge base about it, many challenges remain in modeling and investigating RNA-seq read data. Efficient computational tools are needed to meet these challenges.
The first part of this thesis concentrates on algorithms for both upstream and downstream analysis of RNA-seq data. For the upstream analysis, we tackle RNA-seq read alignment, where segmental alignment is the major difficulty. By extensively trying read segmentation indices, we implemented an accurate algorithm that returns two-segment alignments based on a bi-directional BWT. For the downstream analysis, we study two types of gene fusion events, which play a critical role in the formation of cancers. Unlike previous down-scoping-search methods, we designed the framework around a search-and-validate approach. By introducing key techniques such as masking, two-segment alignment, and retention of multiple mappings, we developed an efficient and robust tool that detects gene fusions with high accuracy, as shown by extensive tests on simulated and real data.
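The idea of trying every segmentation index of a read can be illustrated with a minimal sketch. It is not the thesis's algorithm: plain exact string search stands in for the bi-directional BWT index, mismatches are not handled, and the names `find_all`, `two_segment_alignments`, and `min_seg` are hypothetical.

```python
def find_all(text, pattern):
    """Return every start position of pattern in text (overlaps allowed)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def two_segment_alignments(read, reference, min_seg=3):
    """Try every split index and report exact two-segment alignments.

    Returns (split, pos1, pos2) tuples where read[:split] matches the
    reference at pos1 and read[split:] matches at pos2, with a gap
    between the two segments on the reference.
    Assumption: exact matching via str.find replaces the BWT index.
    """
    hits = []
    for split in range(min_seg, len(read) - min_seg + 1):
        left, right = read[:split], read[split:]
        for pos1 in find_all(reference, left):
            for pos2 in find_all(reference, right):
                # Require the second segment to lie strictly downstream
                # of the first, leaving a gap (e.g. an intron).
                if pos2 > pos1 + len(left):
                    hits.append((split, pos1, pos2))
    return hits


# Toy reference: two "exons" separated by a run of N characters.
reference = "GATTACA" + "NNNNN" + "CTGAGT"
read = "TTACACTGA"  # spans the junction between the two exons
print(two_segment_alignments(read, reference))
```

Exhaustive splitting costs one pair of searches per split index, which is why an index structure such as a BWT matters for real read sets.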
Optical mapping is a cutting-edge technique for studying genomic structural variation that addresses the defects and limitations of paired-end sequencing. It was designed to improve considerably on current techniques in accuracy, resolution, and throughput. It also produces much longer molecules, which enable us to explore genomic regions rich in repetitive sequences. Optical mapping has the potential to give a complete picture of genome structural polymorphism, so it is important to design tools for analyzing the data.
The second part of the thesis is dedicated to algorithms for both upstream and downstream analysis of optical map data. For the upstream analysis, we formulated a robust scoring function that combines the effectiveness of heuristic functions with the accuracy of statistical functions. Based on it, we implemented the high-performance OMDP algorithm. For the downstream analysis, we developed BP-OMDP, which uses both split-mapping and disparity in coverage depth to call inversions in the NA12878 human genome sample.
published_or_final_version / Computer science / Doctoral / Doctor of Philosophy
|
3 |
Large-scale phylogenetic analysis
Wang, Li-san 28 August 2008
Not available / text
|
4 |
The development and application of informatics-based systems for the analysis of the human transcriptome.
Kelso, Janet January 2003
Although the sequence of the human genome is now complete, it has become clear that elucidating the transcriptome is more complicated than previously expected. There is mounting evidence for unexpected and previously underestimated phenomena in the transcriptome, such as alternative splicing. As a result, the identification of novel transcripts arising from the genome continues. Furthermore, as the volume of transcript data grows, it is becoming increasingly difficult to integrate expression information that comes from different sources, is stored in disparate locations, and is described using differing terminologies. Determining the function of translated transcripts also remains a complex task. Information about the expression profile (the location and timing of transcript expression) provides evidence that can be used to understand the role of the expressed transcript in the organ or tissue under study, or in the developmental pathways or disease phenotype observed.
In this dissertation I present novel computational approaches, with direct biological applications, to two distinct but increasingly important areas of gene expression research. The first addresses the detection and characterisation of alternatively spliced transcripts. The second is the construction of a hierarchical controlled vocabulary for gene expression data and the annotation of expression libraries with controlled terms from the hierarchies. In the final chapter, the biological questions that can be approached and the discoveries that can be made using these systems are illustrated, with a view to demonstrating how the application of informatics can both enable and accelerate biological insight into the human transcriptome.
|
5 |
A study on predicting gene relationship from a computational perspective
Chan, Pui-yee., 陳沛儀. January 2004
published_or_final_version / abstract / toc / Computer Science and Information Systems / Master / Master of Philosophy
|
6 |
Making sense of cDNA : automated annotation, storing in an interactive database, mapping to genomic DNA
Shmeleva, Nataliya V. 08 1900
No description available.
|
8 |
Quantitative trait variation and adaptation in contemporary humans
Mostafavi, Hakhamanesh January 2019
Human genomic data sets are now reaching sample sizes on the order of hundreds of thousands and will soon exceed millions, providing unprecedented opportunities to understand human evolution. Most studies of human adaptation so far have focused on selection that has acted over the past million to few thousand years. However, powered by large data sets, it is now feasible to study allele frequency changes that occur within the short timescale of a few generations, directly observing selection acting in contemporary humans. I take this approach in the work presented in Chapter 1 of this thesis, where we performed a genome-wide scan to identify a set of genetic variants that influence age-specific mortality in present-day samples. Our findings include two variants in the APOE and CHRNA3 loci, as well as sets of variants contributing to a number of traits, including coronary artery disease and cholesterol levels, and, intriguingly, to the timing of puberty and childbirth.
New research directions have also opened up with the advent of large-scale genome-wide association studies (GWAS), which have begun to uncover genetic variants underlying a number of human traits, ranging from disease susceptibility to social and behavioral traits such as educational attainment and neuroticism. One such direction is the use of polygenic scores (PGS), which aggregate GWAS findings into one score as a measure of genetic propensity for traits, for phenotypic prediction. A major obstacle to this application is that the prediction accuracy of PGS drops in samples that have a different genetic ancestry than the GWAS sample. Our work, presented in Chapter 2, demonstrates that PGS prediction accuracy is also variable within genetic ancestries depending on factors such as age, sex, and socioeconomic status, as well as GWAS study design. These findings have important implications for the increasing use of these measures in diverse disciplines such as social sciences and human genetics.
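As a minimal sketch of how a polygenic score aggregates GWAS findings into one number, the standard additive form is a weighted sum of effect-allele counts, with GWAS effect sizes as weights. The function name `polygenic_score` and the example values are hypothetical and are not taken from the thesis.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages for one individual.

    dosages: copies of the effect allele at each variant (0, 1, or 2).
    effect_sizes: per-variant GWAS effect estimates, in the same order.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical effect sizes for three variants and one person's genotypes.
effect_sizes = [0.5, -0.25, 1.0]
dosages = [2, 1, 0]
print(polygenic_score(dosages, effect_sizes))  # 2*0.5 + 1*(-0.25) + 0*1.0 = 0.75
```

Because the weights are estimated in one GWAS sample, the same score can predict less well in samples that differ from it, which is the portability problem the abstract describes.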
|
9 |
Non-coding RNA identification along genome
Wong, king-fung., 黃景峰. January 2011
published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
|
10 |
Genomic data mining for the computational prediction of small non-coding RNA genes
Tran, Thao Thanh Thi 20 January 2009
The objective of this research is to develop a novel computational prediction algorithm for non-coding RNA (ncRNA) genes using features computable for any genomic sequence without the need for comparative analysis. Existing comparative methods require knowledge of closely related organisms in order to search for sequence and structural similarities. This requirement constrains the types of ncRNAs, the organisms, and the regions in which ncRNAs can be found. We have developed a novel approach to ncRNA gene prediction that avoids the limitations of current comparative methods. Our work has established an ncRNA database required for subsequent feature and genomic analysis. Furthermore, we have identified significant features from folding-, structural-, and ensemble-based statistics for use in ncRNA prediction. We have also examined higher-order gene structures, namely operons, to discover potential insights into how ncRNAs are transcribed. Automatic identification of ncRNAs on a genome-wide scale is a powerful capability that can be incorporated into a pipeline for large-scale genome annotation. This work will contribute to a more comprehensive annotation of ncRNA genes in microbial genomes to meet the demands of functional and regulatory genomic studies.
|