Global ETD Search

1	A two-step integrated approach to detect differentially expressed genes in RNA-Seq data / Two step integrated approach to detect differentially expressed genes in RNA-Seq data Mahi, Naim Al 03 May 2014 Motivation: RNA-Sequence (RNA-Seq) experiments produce millions of discrete DNA sequence reads as a measure of gene expression levels. They enable researchers to investigate complex aspects of genomic studies. These include, but are not limited to, identification of differentially expressed (DE) genes in two or more treatment conditions and detection of novel transcripts. A common assumption about RNA-Seq data is that all gene counts follow an overdispersed Poisson or negative binomial (NB) distribution, which is sometimes misleading because some genes may have stable transcription levels with no overdispersion. Thus, a more realistic assumption for RNA-Seq data is to consider two sets of genes: overdispersed and non-overdispersed. Method: We propose a new two-step integrated approach to detect DE genes in RNA-Seq data using the standard Poisson model for non-overdispersed genes and the NB model for overdispersed genes. This is an integrated approach because the method can be combined with any other NB-based method for detecting DE genes. Results: We evaluate the proposed approach using two simulated and two real RNA-Seq data sets. We compare the performance of our proposed method combined with the four popular R packages edgeR, DESeq, sSeq, and NBPSeq with their default settings. For both the simulated and real data sets, the integrated approaches perform better than, or at least as well as, the regular methods embedded in these R packages. / Access to thesis permanently restricted to Ball State community only. Gene expression
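As an illustration of the first step described in entry 1 (separating overdispersed from non-overdispersed genes), the following is a minimal sketch only. The abstract does not say how the thesis makes this split; the variance-to-mean ratio, the `ratio_threshold` value, the function name, and the toy counts are all assumptions introduced here.

```python
from statistics import mean, variance

def split_by_overdispersion(counts_by_gene, ratio_threshold=1.5):
    """Partition genes by the sample variance-to-mean ratio of their counts.

    Assumption: a simple ratio cutoff stands in for whatever test the
    thesis actually uses. Each gene needs at least two replicate counts.
    """
    overdispersed, stable = [], []
    for gene, counts in counts_by_gene.items():
        m = mean(counts)
        # A Poisson variable has variance equal to its mean, so a ratio
        # well above 1 suggests overdispersion (an NB model may fit better).
        ratio = variance(counts) / m if m > 0 else 0.0
        if ratio > ratio_threshold:
            overdispersed.append(gene)
        else:
            stable.append(gene)
    return overdispersed, stable

# Hypothetical read counts across five replicates.
counts = {
    "geneA": [10, 11, 9, 10, 10],   # tight around the mean
    "geneB": [2, 40, 5, 60, 1],     # highly variable
}
over, stable = split_by_overdispersion(counts)
```

In this sketch, genes in `stable` would go to a Poisson-based test and genes in `over` to an NB-based method such as those named in the abstract.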
2	Computational Methods for Inferring Mechanisms of Biological Heterogeneity in Single-Cell Data Persad, Sitara Camini January 2024 Single-cell sequencing techniques, such as single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq), have revolutionized our understanding of cellular diversity and function. Genetic and epigenetic factors influence phenotypic heterogeneity in ways that are just beginning to be understood. In this work, we develop methods for inferring mechanisms of biological heterogeneity in single-cell data, with particular applications to cancer biology. First, we develop a kernel archetype analysis method for overcoming noise and sparsity in single-cell data by aggregating single cells into high-resolution cell states. We show that the proposed approach captures robust and biologically meaningful cell states and enables the inference of epigenetic regulation of phenotypic heterogeneity. In the second part of this thesis, we develop methods for linking genotypic and phenotypic information, first by using aggregated single-cell RNA sequencing and a hidden Markov model to infer copy number variation. We demonstrate that aggregation improves copy number inference over existing approaches. We then integrate DNA sequencing with single-cell RNA sequencing to infer copy number profiles in a rapid autopsy of a patient with metastatic pancreatic cancer. We develop a scalable algorithm for inferring phylogenetic relationships between cells from noisy copy number profiles. We show that our approach more accurately recovers phylogenetic relationships between cells and apply it to understand the relationship between genotype and phenotype in metastatic cancer. Finally, we develop a metric for quantifying the extent to which genotype determines phenotype in lineage tracing data. We show that it more accurately quantifies phenotypic plasticity compared to existing approaches.
Altogether, these methods can be used to help uncover the mechanisms underlying phenotypic heterogeneity in biological systems. Computer science; Biometry; Biology; Nucleotide sequence--Computer programs; Nucleotide sequence--Mathematical models; RNA; Phenotype; Pancreas--Cancer; Cancer--Research
3	Graphical representation of biological sequences and its applications. / CUHK electronic theses & dissertations collection / Digital dissertation consortium January 2010 Among all existing alignment-free methods for comparing biological sequences, the sequence graphical representation provides a simple approach to view, sort, and compare gene structures. The aim of graphical representation is to display DNA or protein sequences graphically so that we can easily see how similar or how different they are. Visual comparison of sequences alone, however, is not enough for follow-up research work; a more accurate comparison is needed. This leads us to develop applications of the graphical representation for biological sequences. / In this thesis, we have two main contributions: (1) We construct a protein map with the help of our proposed new graphical representation for protein sequences. Each protein sequence can be represented as a point in this map, and cluster analysis of proteins can be performed for comparison between the points. This protein map can be used to mathematically specify the similarity of two proteins and predict properties of an unknown protein based on its amino acid sequence. (2) We construct a novel genome space with biological geometry, which is a subspace in R^N. In this space each point corresponds to a genome. The natural distance between two points in the genome space reflects the biological distance between these two genomes. Our genome space will provide a new powerful tool for analyzing the classification of genomes and their phylogenetic relationships. / Yu, Chenglong. / Adviser: Luk Hing Sun. / Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 59-64). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader.
Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. Amino acid sequence--Mathematical models; Computational biology; Nucleotide sequence--Mathematical models; Base Sequence; Computational Biology; Mathematics; Sequence Alignment; Sequence Analysis, DNA; Sequence Analysis, Protein
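To make the idea of a graphical representation in entry 3 concrete, here is a small sketch of a generic two-dimensional "DNA walk". It is not the representation proposed in the thesis, which the abstract does not specify; the nucleotide-to-step mapping and the function name `dna_walk` are assumptions chosen only for illustration.

```python
# Hypothetical step assignment: each nucleotide moves the point one unit.
STEPS = {
    "A": (-1, 0),
    "G": (1, 0),
    "C": (0, 1),
    "T": (0, -1),
}

def dna_walk(sequence):
    """Return the list of (x, y) points visited by a 2D walk over a DNA string."""
    x, y = 0, 0
    points = [(x, y)]  # the walk starts at the origin
    for base in sequence.upper():
        dx, dy = STEPS[base]  # raises KeyError on symbols outside A, C, G, T
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

path = dna_walk("ACGT")
```

Plotting the points for two sequences gives a quick visual comparison; a numeric comparison, as the abstract argues, would need a distance defined on such representations.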
