1 |
Workflows for identifying differentially expressed small RNAs and detection of low copy repeats in human / Liu, Xuan, 刘璇 / January 2014
With the rapid development of next-generation sequencing (NGS) technology, we are able to investigate various aspects of biological problems, including genome and transcriptome sequencing, genomic structural variation and the mechanisms of regulatory small RNAs. A large number of computational methods have been proposed to study these problems using NGS reads, at low cost in time and expense. Regulatory small RNAs and genomic structural variations are the two main problems we studied.
In the area of regulatory small RNAs, various computational tools have been designed, ranging from small RNA prediction to target prediction. Regulatory small RNAs play essential roles in plants and bacteria, for example in responses to environmental stresses. We focused on sRNAs that act by complementary base pairing with target mRNAs. No comprehensive analysis workflow had yet been designed that integrates sRNA-Seq and RNA-Seq analysis and generates a regulatory network. We therefore proposed and implemented two small RNA analysis workflows, one for plants and one for bacteria.
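The base-pairing mechanism described above can be illustrated with a toy target-site scan. This is a minimal sketch, not the thesis workflow, and it rests on several assumptions. Sequences are RNA strings over A, C, G and U. Only Watson-Crick pairs count, so G:U wobble is ignored. A site is accepted when the number of mismatches is at most the hypothetical parameter `max_mismatches`. The function names `reverse_complement` and `find_target_sites` are illustrative only.

```python
from __future__ import annotations

# Watson-Crick pairing partners for RNA bases (G:U wobble is ignored in this toy).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence over A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_target_sites(srna: str, mrna: str, max_mismatches: int = 0) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every mRNA window that can pair with srna.

    The sRNA pairs antiparallel with its target, so each mRNA window is
    compared position by position with the reverse complement of the sRNA.
    """
    site = reverse_complement(srna)
    width = len(site)
    hits = []
    for start in range(len(mrna) - width + 1):
        window = mrna[start:start + width]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

For example, `find_target_sites("ACGUAC", "AAGUACGUAA")` reports a perfectly complementary site at position 2. Real plant and bacterial sRNA target predictors also score wobble pairs, seed regions and duplex energy, which this sketch deliberately leaves out.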
In the area of genomic structural variations (SVs), we investigated two types of disease-related SVs: complex low copy repeats (LCRs, also termed segmental duplications) and tandem duplications (TDs). LCRs provide a structural basis for forming other SVs, which may in turn lead to serious genetic diseases, and TDs in specific regions have been reported in patients. Locating LCRs and TDs in the human genome can help researchers further investigate the mechanisms of related diseases. We therefore proposed two computational methods to predict novel LCRs and TDs in the human genome. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
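A tandem duplication places a copy of a segment directly next to the original. The following minimal sketch finds exact, directly adjacent copies in an assembled sequence string, which makes that definition concrete. It is an assumption-laden toy and not either of the thesis methods. Detecting TDs from NGS data typically relies on read-level signals such as read depth or discordant read pairs, and this code does not model those signals. The function name `find_tandem_duplications` and the `min_unit` parameter are hypothetical.

```python
from __future__ import annotations


def find_tandem_duplications(seq: str, min_unit: int = 2) -> list[tuple[int, int]]:
    """Return (start, unit_length) for every exact, directly adjacent copy.

    A hit means seq[start:start + L] == seq[start + L:start + 2L], i.e. the
    segment of length L is immediately followed by an identical copy.
    Brute force and roughly cubic in len(seq): a toy for short sequences only.
    """
    hits = []
    n = len(seq)
    for start in range(n):
        for unit in range(min_unit, (n - start) // 2 + 1):
            if seq[start:start + unit] == seq[start + unit:start + 2 * unit]:
                hits.append((start, unit))
    return hits
```

For instance, in `"TTACGGACGGTT"` with `min_unit=4` the only hit is the duplicated `ACGG` unit starting at position 2. Genome-scale tools need approximate matching and far better data structures than this exhaustive comparison.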
|
2 |
Filtering of false positive microRNA candidates by a clustering-based approach / Leung, Wing-sze, 梁穎思 / January 2009
published_or_final_version / Computer Science / Master / Master of Philosophy
|
3 |
Reconstructing gene regulatory networks with new datasets. / CUHK electronic theses & dissertations collection / January 2013
The competing endogenous RNA (ceRNA) hypothesis has recently become one of the hottest topics in bioinformatics research. Four papers on the ceRNA hypothesis were published simultaneously in the same 2011 issue of Cell, a widely cited top journal in the life sciences. Most of these studies validated the hypothesis with individual examples, and a large-scale, comprehensive analysis has been lacking. / In my Master of Philosophy study, a novel concept, the miRNA Target Bicluster (MTB), is introduced to model the ceRNA hypothesis. MTBs are identified computationally from validated and/or predicted miRNA-mRNA interaction pairs, and the MTB models were tested with mRNA and miRNA expression data from the GENCODE Project. Statistically significant miRNA-mRNA anti-correlation, mRNA-mRNA correlation and miRNA-miRNA correlation are found in the expression data. These results support, with large-scale data, the correlation relations among mRNAs and miRNAs stated in the ceRNA hypothesis. Moreover, a novel large-scale functional enrichment analysis is performed, and the mRNAs selected by the MTBs are found to be biologically relevant. Finally, as another application of MTBs, new miRNA target prediction algorithms are suggested.
Overall, the MTB concept defines simple and robust modules within the complex and noisy miRNA-mRNA network, suggesting ways to perform systems biology analyses of miRNA-mediated regulation. / Detailed summary in vernacular field only. / Yip, Kit Sang Danny. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves [117]-126). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese.
Abstract / Acknowledgement / Chapter 1 --- Introduction / Chapter 1.1 --- Contributions / Chapter 1.2 --- Thesis Outline / Chapter 2 --- Background / Chapter 2.1 --- Bioinformatics / Chapter 2.2 --- Biological Background / Chapter 2.2.1 --- The Central Dogma of Molecular Biology / Chapter 2.2.2 --- RNAs / Chapter 2.2.3 --- Competing Endogenous RNA (ceRNA) hypothesis / Chapter 2.2.4 --- Biological Considerations in Functional Enrichment Analysis / Chapter 2.3 --- Computational Background / Chapter 2.3.1 --- miRNA Genomic Annotation Prediction / Chapter 2.3.2 --- miRNA Target Interaction Prediction / Chapter 2.3.3 --- Applying Computational Algorithms on Related Problems / Chapter 2.3.4 --- Algorithms in Functional Enrichment Analysis / Chapter 2.4 --- Experiments and Data / Chapter 2.4.1 --- miRNA Target Interactions / Chapter 2.4.2 --- Expression Data / Chapter 2.4.3 --- Annotation Datasets / Chapter 2.5 --- Research Motivations / Chapter 3 --- Definitions of miRNA Target Biclusters (MTB) / Chapter 3.1 --- Representations / Chapter 3.1.1 --- Binary Association Matrix Representation / Chapter 3.1.2 --- Bipartite Graph Representation / Chapter 3.1.3 --- Mathematical Representation / Chapter 3.2 --- Concept of MTB / Chapter 3.2.1 --- MTB Restrictive Type (Type R) / Chapter 3.2.2 --- MTB Restrictive Type on miRNA (Type Rmi) / Chapter 3.2.3 --- MTB Restrictive Type on mRNA (Type Rm) / Chapter 3.2.4 --- MTB Restrictive and General Type (Type Rgen) / Chapter 3.2.5 --- MTB Loose Type (Type L) / Chapter 3.2.6 --- MTB Loose Type but restricts on miRNA (Type Lmi) / Chapter 3.2.7 --- MTB Loose Type but restricts on mRNA (Type Lm) / Chapter 3.2.8 --- MTB Loose and General Type (Type Lgen) / Chapter 3.2.9 --- A General Definition on all Eight Types / Chapter 3.2.10 --- Discussions / Chapter 4 --- MTB Workflow in Checking Correlation Relations / Chapter 4.1 --- MTB Workflow in Checking Correlation Relations / Chapter 4.1.1 --- MTB Identification / Chapter 4.1.2 --- Correlation Coefficients / Chapter 4.1.3 --- Scoring Scheme / Chapter 4.1.4 --- Background Construction / Chapter 4.1.5 --- Wilcoxon Rank-sum Test / Chapter 4.1.6 --- Preliminary Studies / Chapter 4.2 --- miRNA-mRNA Anti-correlation in Expression Data / Chapter 4.2.1 --- Interaction Datasets / Chapter 4.2.2 --- Expression Datasets / Chapter 4.2.3 --- Independence of the Choices of Datasets / Chapter 4.2.4 --- Independence of the Types of MTBs / Chapter 4.2.5 --- Independence of the Choices of Correlation Coefficients / Chapter 4.2.6 --- Dependence on the Way to Score / Chapter 4.2.7 --- Independence of the Way to Construct Background / Chapter 4.2.8 --- Independence of Natural Bias in Datasets / Chapter 4.3 --- mRNA-mRNA Correlation in Expression Data / Chapter 4.3.1 --- Variations in the Analysis / Chapter 4.3.2 --- Discussions / Chapter 4.4 --- miRNA-miRNA Correlation in Expression Data / Chapter 4.4.1 --- Variations in the Analysis / Chapter 4.4.2 --- Discussions / Chapter 5 --- Target Prediction Aided by MTB / Chapter 5.1 --- Workflow in Target Prediction / Chapter 5.2 --- Contingency Table Approach / Chapter 5.2.1 --- One-tailed Hypothesis Testing / Chapter 5.3 --- Ranked List Approach / Chapter 5.3.1 --- Wilcoxon Signed Rank Test / Chapter 5.4 --- Results and Discussions / Chapter 6 --- Large-scale Functional Enrichment Analysis / Chapter 6.1 --- Principles in Functional Enrichment Analysis / Chapter 6.1.1 --- Annotation Files / Chapter 6.1.2 --- Functional Enrichment Analysis on a gene set / Chapter 6.1.3 --- Functional Enrichment Analysis on many gene sets / Chapter 6.2 --- Results and Discussions / Chapter 7 --- Future Perspectives and Conclusions / Chapter 7.1 --- Applying MTB definition on other problems / Chapter 7.2 --- Matrix Definitions and Optimization Problems / Chapter 7.3 --- Non-binary association matrix problem settings / Chapter 7.4 --- Limitations / Chapter 7.5 --- Conclusions / Bibliography / Chapter A --- Publications / Chapter A.1 --- Publications
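The MTB abstract above describes two steps. First, biclusters are identified in a binary miRNA-mRNA association matrix. Second, miRNA-mRNA expression correlation is checked within them. The minimal sketch below illustrates both steps under a loudly stated assumption: an MTB is treated as a complete all-ones submatrix. The thesis defines eight MTB types, including "loose" ones, and their exact definitions are not reproduced here. The helper names `is_complete_bicluster`, `pearson` and `mean_mirna_mrna_correlation` are hypothetical. The background construction and Wilcoxon rank-sum test listed in Chapter 4 are omitted.

```python
from __future__ import annotations

import math
from itertools import product


def is_complete_bicluster(assoc: list[list[int]], mirnas: list[int], mrnas: list[int]) -> bool:
    """True if every selected miRNA (row) has a recorded interaction with every selected mRNA (column)."""
    return all(assoc[i][j] == 1 for i, j in product(mirnas, mrnas))


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_mirna_mrna_correlation(mirna_expr: list[list[float]], mrna_expr: list[list[float]],
                                mirnas: list[int], mrnas: list[int]) -> float:
    """Average miRNA-mRNA correlation over all pairs in the bicluster.

    The ceRNA hypothesis predicts miRNA-mRNA anti-correlation, so a
    negative value is the expected direction.
    """
    values = [pearson(mirna_expr[i], mrna_expr[j]) for i, j in product(mirnas, mrnas)]
    return sum(values) / len(values)
```

In this sketch the rows of `assoc` index miRNAs and the columns index mRNAs. A real analysis would compare the bicluster's correlation scores against a background distribution rather than inspect the mean alone.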
|
4 |
Comparison and improvement of siRNA design tools / Mui, Yuen-chi, 梅宛芝 / January 2004
published_or_final_version / abstract / toc / Computer Science and Information Systems / Master / Master of Philosophy
|
5 |
Database construction and computational analysis of bacterial small regulatory RNAs. / CUHK electronic theses & dissertations collection / January 2013
Li, Lei. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 85-91). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese.
|
6 |
Non-coding RNA identification along genome / Wong, King-fung, 黃景峰 / January 2011
published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
|
7 |
Low dimensional structure in single cell data / Kunes, Russell Allen Zhang / January 2024
This thesis presents the development of three methods, each concerned with estimating interpretable low-dimensional representations of high-dimensional data. The first two chapters consider methods for fitting low-dimensional nonlinear representations: Chapter 1 discusses the deterministic input, noisy "and" gate (DINA) model, and Chapter 2 discusses binary variational autoencoders. We present an application to single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data, where the DINA model uncovers meaningful discrete representations of cell state. In many scientific applications, practitioners have substantial prior knowledge of the latent components driving variation in the data. The third chapter develops a supervised matrix factorization method, Spectra, that leverages annotations from experts and previous biological experiments to uncover latent representations of single-cell RNA-seq data.
Variational inference for the DINA model:
The deterministic input, noisy "and" gate (DINA) model gives a matrix decomposition in which latent factors interact via an "and" relationship. We develop a variational inference approach for estimating the parameters of the DINA model. Previous variational approaches enumerate the space of latent binary parameters, which requires a number of parameters that grows exponentially with the number of latent components, and they cannot fit an unknown number of latent components. Here, we report that a practical mean-field variational inference approach achieves results competitive with existing methods while simultaneously determining the number of latent components. It relies on a nonparametric cumulative shrinkage process prior and stochastic coordinate ascent updates. This approach allows exploratory Q-matrix estimation to scale to datasets of practical size with minimal hyperparameter tuning.
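The "and" gate at the heart of the DINA model can be written down directly. Each sample i has a binary latent profile α_i, and each item (data column) j has a Q-matrix row q_j marking the latent attributes it requires. The ideal response η_ij = ∏_k α_ik^{q_jk} is 1 only when every required attribute is present. The observed entry is then 1 with probability 1 − s_j if η_ij = 1 (slip parameter s_j), and with probability g_j otherwise (guess parameter g_j).

The sketch below evaluates this standard DINA likelihood for given parameters. It does not implement the thesis's variational inference or cumulative shrinkage prior, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def ideal_response(alpha: list[int], q_row: list[int]) -> int:
    """DINA 'and' gate: 1 only if the profile has every attribute the item requires."""
    return int(all(a == 1 for a, q in zip(alpha, q_row) if q == 1))


def response_probability(alpha: list[int], q_row: list[int], slip: float, guess: float) -> float:
    """P(X_ij = 1 | alpha_i) = (1 - slip)^eta * guess^(1 - eta)."""
    eta = ideal_response(alpha, q_row)
    return (1.0 - slip) if eta == 1 else guess


def log_likelihood(X: list[list[int]], A: list[list[int]], Q: list[list[int]],
                   slip: list[float], guess: list[float]) -> float:
    """Bernoulli log-likelihood of binary data X (rows = samples, columns = items)."""
    total = 0.0
    for i, x_row in enumerate(X):
        for j, x in enumerate(x_row):
            p = response_probability(A[i], Q[j], slip[j], guess[j])
            total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total
```

Seen as a decomposition, X is approximated by a Boolean "and" product of the latent matrix A and the Q-matrix Q, with the slip and guess parameters absorbing noise.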
Gradient estimation for binary latent variable models:
Fitting binary variational autoencoders requires estimating the gradient of the objective function. More generally, gradient estimation is often necessary for fitting generative models with discrete latent variables, for example in reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state-of-the-art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators can have exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space. Because bitflip-1 has properties complementary to those of existing estimators, we introduce an aggregated estimator, unbiased gradient variance clipping (UGC), that uses either a bitflip-1 or a DisARM gradient update for each coordinate. We prove that UGC has uniformly lower variance than DisARM. Empirically, we observe that UGC achieves the optimal value of the optimization objective in toy experiments, in discrete VAE training, and in a best subset selection problem.
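The DisARM baseline mentioned above can be sketched for independent Bernoulli variables parameterized by logits θ. The sketch follows our understanding of Dong, Mnih, and Tucker (2020). One uniform draw u per coordinate yields the antithetic pair b_i = 1[u_i < σ(θ_i)] and b̃_i = 1[1 − u_i < σ(θ_i)]. Coordinate i then receives ½ (f(b) − f(b̃)) (−1)^{b̃_i} 1[b_i ≠ b̃_i] σ(|θ_i|).

The bitflip-1 and UGC estimators are not sketched because their definitions are not given here. The helper `exact_gradient` is a hypothetical brute-force check added only for small dimensions.

```python
from __future__ import annotations

import math
import random
from itertools import product
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def disarm_gradient(f: Callable[[list[int]], float], logits: Sequence[float],
                    rng: random.Random) -> list[float]:
    """One DisARM sample of d/d(logits) E_{b ~ Bernoulli(sigmoid(logits))}[f(b)]."""
    probs = [sigmoid(t) for t in logits]
    u = [rng.random() for _ in logits]
    # Antithetic pair: b uses u, b_tilde uses 1 - u, with the same probabilities.
    b = [int(ui < p) for ui, p in zip(u, probs)]
    b_tilde = [int(1.0 - ui < p) for ui, p in zip(u, probs)]
    diff = f(b) - f(b_tilde)
    grad = []
    for bi, bti, t in zip(b, b_tilde, logits):
        if bi == bti:
            grad.append(0.0)  # coordinates where the pair agrees contribute nothing
        else:
            sign = -1.0 if bti == 1 else 1.0  # (-1)^{b_tilde_i}
            grad.append(0.5 * diff * sign * sigmoid(abs(t)))
    return grad


def exact_gradient(f: Callable[[list[int]], float], logits: Sequence[float]) -> list[float]:
    """Exact gradient by enumerating all 2^d outcomes (only feasible for small d)."""
    probs = [sigmoid(t) for t in logits]
    grad = [0.0] * len(logits)
    for b in product((0, 1), repeat=len(logits)):
        p_b = 1.0
        for bi, p in zip(b, probs):
            p_b *= p if bi else 1.0 - p
        value = f(list(b))
        for i, p in enumerate(probs):
            # Score function: d/d(logit_i) log p(b) = b_i - sigmoid(logit_i).
            grad[i] += p_b * value * (b[i] - p)
    return grad
```

Averaging many `disarm_gradient` samples should approach `exact_gradient`, since the estimator is unbiased. When |θ_i| is large, the pair rarely disagrees in coordinate i, so most samples are zero and a few are large. This is the boundary behavior the thesis targets.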
The Spectra model for supervised matrix decomposition:
Factor analysis decomposes single-cell gene expression data into a minimal set of gene programs that correspond to processes executed by cells in a sample. However, matrix factorization methods are prone to technical artifacts and poor factor interpretability. We address these concerns with Spectra, an algorithm that combines user-provided gene programs with the detection of novel programs that together best explain expression covariation. Spectra incorporates existing gene sets and cell type labels as prior biological information. It explicitly models cell type and represents input gene sets as a gene-gene knowledge graph, using a penalty function to guide factorization towards the input graph. We show that Spectra outperforms existing approaches in challenging tumor immune contexts: it finds factors that change under immune checkpoint therapy, disentangles the highly correlated features of CD8+ T-cell tumor reactivity and exhaustion, finds a program that explains continuous macrophage state changes under therapy, and identifies cell-type-specific immune metabolic programs.
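The abstract says Spectra represents input gene sets as a gene-gene knowledge graph. The minimal sketch below shows one plausible encoding: two genes are linked whenever they co-occur in at least one input gene set. This construction is an assumption. Spectra's actual graph may weight edges or use cell-type-specific sets, and its penalty function is not reproduced. The function name `gene_sets_to_graph` is hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def gene_sets_to_graph(gene_sets: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build an undirected gene-gene graph from named gene sets.

    Every gene appears as a node; two genes share an edge if they co-occur
    in any input gene set. Returned as an adjacency map gene -> neighbours.
    """
    adjacency: dict[str, set[str]] = {}
    for genes in gene_sets.values():
        for gene in genes:
            adjacency.setdefault(gene, set())
        for a, b in combinations(sorted(genes), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency
```

A factorization penalty could then reward factors whose high-loading genes are connected in this graph. That step is where the real method's specifics matter, and it is left out here.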
|