Global ETD Search

1	Model based approaches to array CGH data analysis
Shah, Sohrab P. 05 1900
DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. CNAs are segments of DNA that have been deleted or amplified and can range in size from one kilobase to whole chromosome arms. Development of array comparative genomic hybridization (aCGH) technology enables CNAs to be measured at sub-megabase resolution using tens of thousands of probes. However, aCGH data are noisy and result in continuous-valued measurements of the discrete CNAs. Consequently, the data must be processed through algorithmic and statistical techniques in order to derive meaningful biological insights. We introduce model-based approaches to analysis of aCGH data and develop state-of-the-art solutions to three distinct analytical problems. In the simplest scenario, the task is to infer CNAs from a single aCGH experiment. We apply a hidden Markov model (HMM) to accurately identify CNAs from aCGH data. We show that borrowing statistical strength across chromosomes and explicitly modeling outliers in the data improves on baseline models. In the second scenario, we wish to identify recurrent CNAs in a set of aCGH data derived from a patient cohort. These are locations in the genome altered in many patients, providing evidence for CNAs that may be playing important molecular roles in the disease. We develop a novel hierarchical HMM profiling method that explicitly models both statistical and biological noise in the data and is capable of producing a representative profile for a set of aCGH experiments. We demonstrate that our method is more accurate than simpler baselines on synthetic data, and show our model produces output that is more interpretable than other methods. Finally, we develop a model-based clustering framework to stratify a patient cohort, expected to be composed of a fixed set of molecular subtypes. We introduce a model that jointly infers CNAs, assigns patients to subgroups, and infers the profiles that represent each subgroup. We show our model to be more accurate on synthetic data, and show in two patient cohorts how the model discovers putative novel subtypes and clinically relevant subgroups.
/ Science, Faculty of / Computer Science, Department of / Graduate
Array CGH HMM DNA copy number
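As a rough illustration of the single-experiment scenario described in this abstract, the sketch below decodes a sequence of noisy, continuous log-ratio measurements into discrete copy number states with a three-state Gaussian HMM and the Viterbi algorithm. It is not the thesis's model: the state means, noise level, and self-transition probability are assumed values chosen for illustration, and the sketch does not pool information across chromosomes or model outliers explicitly.

```python
import math

# Assumed, illustrative parameters (not taken from the thesis).
STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -0.5, "neutral": 0.0, "gain": 0.5}  # assumed log-ratio means
SIGMA = 0.2   # assumed measurement noise (standard deviation)
STAY = 0.98   # assumed probability of keeping the same state at the next probe


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def viterbi(log_ratios):
    """Return the most probable state label for each probe."""
    if not log_ratios:
        return []
    switch = (1 - STAY) / (len(STATES) - 1)
    log_trans = {
        (a, b): math.log(STAY if a == b else switch)
        for a in STATES
        for b in STATES
    }
    # Uniform prior over the initial state.
    score = {
        s: math.log(1 / len(STATES)) + log_gauss(log_ratios[0], MEANS[s], SIGMA)
        for s in STATES
    }
    back = []  # back[t][s] = best predecessor of state s at probe t + 1
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans[(p, s)])
            pointers[s] = prev
            new_score[s] = score[prev] + log_trans[(prev, s)] + log_gauss(x, MEANS[s], SIGMA)
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    probes = [0.02, -0.05, 0.01, 0.55, 0.48, 0.61, 0.52, -0.03, 0.04]
    print(viterbi(probes))
```

Under these assumed parameters, a run of four elevated probes is called as a gain, while a single elevated probe surrounded by neutral ones is smoothed over, because two state switches cost more than one poorly fitting measurement.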
4	Bayesian Hidden Markov Models for finding DNA Copy Number Changes from SNP Genotyping Arrays
Kowgier, Matthew 31 August 2012
DNA copy number variations (CNVs), which involve the deletion or duplication of subchromosomal segments of the genome, have become a focus of genetics research. This dissertation develops Bayesian HMMs for finding CNVs from single nucleotide polymorphism (SNP) arrays. A Bayesian framework to reconstruct the DNA copy number sequence from the observed sequence of SNP array measurements is proposed. A Markov chain Monte Carlo (MCMC) algorithm, with a forward-backward stochastic algorithm for sampling DNA copy number sequences, is developed for estimating model parameters. Numerous versions of Bayesian HMMs are explored, including a discrete-time model and different models for the instantaneous transition rates of change among copy number states of a continuous-time HMM. The most general model proposed makes no restrictions and assumes the rate of transition depends on the current state, whereas the nested model fixes some of these rates by assuming that the rate of transition is independent of the current state. Each model is assessed using a subset of the HapMap data. More general parameterizations of the transition intensity matrix of the continuous-time Markov process produced more accurate inference with respect to the length of CNV regions. The observed SNP array measurements are assumed to be stochastic with distribution determined by the underlying DNA copy number. Copy-number-specific distributions, including a non-symmetric distribution for the 0-copy state (homozygous deletions) and mixture distributions for the 2-copy state (normal), are developed and shown to be more appropriate than existing implementations, which lead to biologically implausible results. Compared to existing HMMs for SNP array data, this approach is more flexible in that model parameters are estimated from the data rather than set to a priori values. Measures of uncertainty, computed as simulation-based probabilities, can be determined for putative CNVs detected by the HMM. Finally, the dissertation concludes with a discussion of future work, with special attention given to model extensions for multiple sample analysis and family trio data.
DNA copy number Bayesian HMM SNP array 0308 0715
6	Copy number variations in hepatocellular carcinoma / CUHK electronic theses & dissertations collection
January 2016
Chan, Ho Ching. / Thesis M.Phil. Chinese University of Hong Kong 2016. / Includes bibliographical references (leaves 159-166). / Abstracts also in Chinese. / Title from PDF title page (viewed on 15 September 2016).
Liver--Cancer--Genetic aspects Carcinoma, Hepatocellular--genetics DNA Copy Number Variations
7	Mitochondrial DNA Copy Number, Insulinemic Potential of Lifestyle, and Colorectal Cancer
Yang, Keming 03 1900
Indiana University-Purdue University Indianapolis (IUPUI) /
Because colorectal cancer (CRC) is the fourth most common cancer and the second leading cause of cancer death in the US, identifying biomarkers that might inform disease prevention and early diagnosis is of great public health importance. Mitochondria are key cytoplasmic organelles containing an independent genome, i.e., mitochondrial DNA (mtDNA). It has been increasingly recognized that mtDNA copy number (mtDNAcn) is a biomarker for mitochondrial function and cellular oxidative stress. To date, the few studies that have assessed associations between mtDNAcn and CRC outcomes have yielded inconsistent findings. Further, no epidemiologic study has examined the relationship between insulinemic potential of lifestyle and mtDNAcn. Therefore, in this dissertation, three studies were conducted using data from the Nurses’ Health Study and the Health Professionals Follow-Up Study. First, the association between pre-diagnostic leukocyte mtDNAcn and CRC risk was studied in a nested case-control study (324 cases/658 controls). Lower mtDNAcn was significantly associated with increased risk of CRC and proximal colon cancer. That inverse association remained significant among individuals with ≥ 8 years’ follow-up since blood collection, suggesting that mtDNAcn might serve as a long-term predictor of CRC risk. Second, possible associations of pre-diagnostic mtDNAcn with overall and CRC-specific survival were examined among 587 CRC patients. MtDNAcn was not significantly associated with survival overall or in subgroups by cancer location, grade, or stage. Among current smokers, there was an inverse association between a one standard deviation (SD) decrease in mtDNAcn and increased overall death risk. Among patients diagnosed at or before 70.5 years of age and those with anti-inflammatory diets, reduced mtDNAcn was associated with lower CRC-specific death risk. Lastly, the cross-sectional association between the empirical lifestyle index for hyperinsulinemia (ELIH) and mtDNAcn was investigated among 2,835 subjects without major chronic diseases (cancers, diabetes, and cardiovascular diseases). A significant inverse association was found: least-squares means ± SD of mtDNAcn z-score decreased dramatically across ELIH quintiles. Overall, the findings from this dissertation will contribute to the evaluation of mtDNAcn as a potential biomarker for CRC risk and prognosis, and inform future interventions designed to reduce the insulinemic potential of lifestyle factors to preserve mitochondrial function.
/ 2022-04-06
carcinogenesis colorectal cancer insulin sensitivity lifestyle mitochondrial DNA mitochondrial DNA copy number
8	Distinguishing Melanocytic Nevi From Melanoma by DNA Copy Number Changes: Array-Comparative Genomic Hybridization As a Research Tool
Mahas, Ahmed Ibrahim 07 August 2015
No description available.
Molecular Biology Genetics Oncology melanoma benign nevi DNA copy number changes SNP microarrays GISTIC Genepattern
9	Modeling and Characterization of Dynamic Changes in Biological Systems from Multi-platform Genomic Data
Zhang, Bai 30 September 2011
Biological systems constantly evolve and adapt in response to changing environments and external stimuli at the molecular and genomic levels. Building statistical models that characterize such dynamic changes in biological systems is one of the key objectives in bioinformatics and computational biology. Recent advances in high-throughput genomic and molecular profiling technologies such as gene expression and copy number microarrays provide ample opportunities to study cellular activities at the individual gene and network levels. The aim of this dissertation is to mathematically formulate dynamic changes in biological networks and DNA copy numbers, to develop machine learning algorithms to learn these statistical models from high-throughput biological data, and to demonstrate their applications in systems biological studies. The first part (Chapters 2-4) of the dissertation focuses on the dynamic changes taking place at the biological network level. Biological networks are context-specific and dynamic in nature. Under different conditions, different regulatory components and mechanisms are activated and the topology of the underlying gene regulatory network changes. We report a differential dependency network (DDN) analysis to detect statistically significant topological changes in the transcriptional networks between two biological conditions. Further, we formalize and extend the DDN approach to an effective learning strategy to extract structural changes in graphical models using l1-regularization based convex optimization. We discuss the key properties of this formulation and introduce an efficient implementation by the block coordinate descent algorithm. Another type of dynamic change in biological networks is the observation that a group of genes involved in certain biological functions or processes coordinate to respond to outside stimuli, producing distinct time course patterns. We apply the echo state network, a new architecture of recurrent neural networks, to model temporal gene expression patterns and analyze the theoretical properties of echo state networks with random matrix theory. The second part (Chapter 5) of the dissertation focuses on the changes at the DNA copy number level, especially in cancer cells. Somatic DNA copy number alterations (CNAs) are key genetic events in the development and progression of human cancers, and frequently contribute to tumorigenesis. We propose a statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately detect genomic deletion type, estimate normal tissue contamination, and accordingly recover the true copy number profile in cancer cells.
/ Ph. D.
differential dependency networks biological networks echo state networks DNA copy number changes structural changes in graphical models
10	Detection and Characterization of Multilevel Genomic Patterns
Feng, Yuanjian 28 June 2010
The DNA microarray has become a powerful tool in genetics, molecular biology, and biomedical research. DNA microarrays can be used for measuring the genotypes, structural changes, and gene expressions of human genomes. Detection and characterization of multilevel, high-throughput microarray genomic data pose new challenges to statistical pattern recognition and machine learning research. In this dissertation, we propose novel computational methods for analyzing DNA copy number changes and learning the trees of phenotypes using DNA microarray data. DNA copy number change is an important form of structural variation in human genomes. The copy number signals measured by high-density DNA microarrays usually have low signal-to-noise ratios and complex patterns due to inhomogeneous composition of tissue samples. We propose a robust detection method for extracting copy number changes in a single signal profile and consensus copy number changes in the signal profiles of a population. We adapt a solution-path algorithm to efficiently solve the optimization problems associated with the proposed method. We tested the proposed method on both simulated and real CGH and SNP microarray datasets, and observed competitively improved performance as compared to several widely adopted copy number change detection methods. We also propose a chromosome instability measure to summarize the extracted copy number changes for assessing chromosomal instabilities of tumor genomes. The proposed measure demonstrates distinct patterns between different subtypes of ovarian serous carcinomas and normal samples. Among active research on complex human diseases using genomic data, little effort and progress have been made in discovering the relational structural information embedded in the molecular data. We propose two stability analysis based methods to learn stable and highly resolved trees of phenotypes using microarray gene expression data of heterogeneous diseases. In the first method, we use a hierarchical, divisive visualization approach to explore the tree of phenotypes and a leave-one-out cross validation to select stable tree structures. In the second method, we propose a node bandwidth constraint to construct stable trees that can balance the descriptive power and reproducibility of tree structures. Using a top-down merging procedure, we modify the binary tree structures learned by hierarchical group clustering methods to achieve a given node bandwidth. We use a bootstrap-based stability analysis to select stable tree structures under different node bandwidth constraints. The experimental results on two microarray gene expression datasets of human diseases show that the proposed methods can discover stable trees of phenotypes that reveal the relationships between multiple diseases with biological plausibility.
/ Ph. D.
Gene Expressions DNA Copy Number Changes Stability Analysis Regression Analysis Tree of Phenotypes
