1 |
Contributions to Gene Set Analysis of Correlated, Paired-Sample Transcriptome Data to Enable Precision Medicine
Schissler, Alfred Grant. January 2017
This dissertation serves as a unifying document for three related articles developed during my dissertation research. The projects involve the development of single-subject transcriptome (i.e. gene expression data) methodology for precision medicine and related applications. Traditional statistical approaches are largely unavailable in this setting due to prohibitive sample size and lack of independent replication. This leads one to rely on informatic devices including knowledgebase integration (e.g., gene set annotations) and external data sources (e.g., estimation of inter-gene correlation). Common statistical themes include multivariate statistics (such as Mahalanobis distance and copulas) and large-scale significance testing. Briefly, the first work describes the development of clinically relevant single-subject metrics of gene set (pathway) differential expression, N-of-1-pathways Mahalanobis distance (MD) scores. Next, the second article describes a method which overcomes a major shortcoming of the MD framework by accounting for inter-gene correlation. Lastly, the statistics developed in the previous works are re-purposed to analyze single-cell RNA-sequencing data derived from rare cells. Importantly, these works represent an interdisciplinary effort and show that creative solutions for pressing issues become possible at the intersection of statistics, biology, medicine, and computer science.
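For readers unfamiliar with the Mahalanobis distance mentioned above, the following is a minimal generic sketch in two dimensions. It is not the N-of-1-pathways MD score from the dissertation; the function name `mahalanobis_2d` and the example covariance values are assumptions chosen only for illustration.

```python
from math import sqrt


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2-D point from `mean` under covariance `cov`."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return sqrt(q)


# With an identity covariance the distance reduces to the Euclidean distance.
identity = ((1.0, 0.0), (0.0, 1.0))
d_euclid = mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity)

# Hypothetical covariance with positive correlation (0.8) between two measurements.
correlated = ((1.0, 0.8), (0.8, 1.0))
d_along = mahalanobis_2d((1.0, 1.0), (0.0, 0.0), correlated)     # follows the correlation
d_against = mahalanobis_2d((1.0, -1.0), (0.0, 0.0), correlated)  # goes against it
```

Under positive correlation, a point that moves against the correlation is farther away in Mahalanobis terms than one that moves along it, even though both are equally far in Euclidean terms.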
|
2 |
Gene-pair based statistical methods for testing gene set enrichment in microarray gene expression studies
Zhao, Kaiqiong. 16 September 2016
Gene set enrichment analysis aims to discover sets of genes, such as biological pathways or protein complexes, which may show moderate but coordinated differentiation across experimental conditions. Existing gene set enrichment approaches use a single-gene statistic as the measure of differentiation for individual genes.
These approaches do not use inter-gene correlations, even though genes in a pathway are known to interact with each other.
Motivated by the need for taking gene dependence into account, we propose a novel gene set enrichment algorithm, where the gene-gene correlation is addressed via a gene-pair representation strategy. Relying on an appropriately defined gene pair statistic, the gene set statistic is formulated using a competitive null hypothesis.
Extensive simulation studies show that our proposed approach can correctly control the type I error (false positive rate), and retain good statistical power for detecting true differential expression. The new method is also applied to analyze several gene expression datasets. / October 2016
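The competitive null hypothesis mentioned above asks whether genes in a set are more differentially expressed than genes outside it. A minimal sketch of that idea follows, using single-gene scores and random gene sets of the same size. It is not the gene-pair statistic proposed in the thesis, and the function name, gene labels, and scores are all hypothetical.

```python
import random


def competitive_p_value(scores, gene_set, n_perm=2000, seed=0):
    """Compare the mean score of a gene set with random gene sets of equal size."""
    rng = random.Random(seed)
    genes = list(scores)
    members = [g for g in genes if g in gene_set]
    observed = sum(scores[g] for g in members) / len(members)
    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size from all measured genes.
        sample = rng.sample(genes, len(members))
        stat = sum(scores[g] for g in sample) / len(sample)
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-gene scores: the first ten genes are strongly differential.
scores = {f"g{i}": 3.0 if i < 10 else 0.5 + (i % 5) * 0.1 for i in range(100)}
p_enriched = competitive_p_value(scores, {f"g{i}" for i in range(10)})
p_background = competitive_p_value(scores, {f"g{i}" for i in range(10, 20)})
```

Sampling genes at random treats them as independent, which is exactly the simplification that motivates correlation-aware approaches such as the one described in this abstract.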
|
3 |
Delving into gene-set multiplex networks facilitated by a k-nearest neighbor-based measure of similarity
Zheng, Cheng. 25 March 2024
Kyoto University / New system, doctorate by coursework / Doctor of Medical Science / Kō No. 25192 / Medical Doctorate No. 5078 / 新制||医||1072 (University Library) / Department of Medicine, Graduate School of Medicine, Kyoto University / (Chief examiner) Professor Yasuhiro Murakawa, Professor Mitinori Saitou, Professor Seirin Lee / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Kyoto University / DFAM
|
4 |
Comparing Performance of Gene Set Test Methods Using Biologically Relevant Simulated Data
Lambert, Richard M. 01 December 2018
Today we know that there are many genetically driven diseases and health conditions. These problems often manifest only when a set of genes is either active or inactive. Recent technology allows us to measure the activity level of genes in cells, which we call gene expression. It is of great interest to society to be able to statistically compare the gene expression of a large number of genes between two or more groups. For example, we may want to compare the gene expression of a group of cancer patients with that of a group of non-cancer patients to better understand the genetic causes of that particular cancer. Understanding these genetic causes could potentially lead to improved treatment options.
Initially, gene expression was tested on a per gene level for statistical difference. In more recent years, it has been determined that grouping genes together by biological processes into gene sets and comparing groups at the gene set level probably makes more sense biologically. A number of gene set test methods have since been developed. It is critically important that we know if these gene set test methods are accurate.
In this research, we compare the accuracy of a group of popular gene set test methods across a range of biologically realistic scenarios. In order to measure accuracy, we need to know whether each gene set is differentially expressed or not. Since this is not possible in real gene expression data, we use simulated data. We develop a simulation framework that generates gene expression data that is representative of actual gene expression data and use it to test each gene set method over a range of biologically relevant scenarios. We then compare the power and false discovery rate of each method across these scenarios.
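Because simulated data come with known ground truth, power and false discovery rate can be computed directly from which gene sets a method flags. The sketch below shows that bookkeeping for a single simulated run; the function name and the gene set labels are hypothetical, and the value computed in one run is strictly the observed false discovery proportion, which would be averaged over replicates to estimate the FDR.

```python
def power_and_fdr(truth, called):
    """Return (power, false discovery proportion) for one simulated run.

    truth  -- dict mapping gene set name to True if it is truly differential
    called -- set of gene set names a test method declared significant
    """
    true_positives = sum(1 for s in called if truth[s])
    false_positives = len(called) - true_positives
    n_truly_differential = sum(1 for is_de in truth.values() if is_de)
    power = true_positives / n_truly_differential if n_truly_differential else float("nan")
    fdp = false_positives / len(called) if called else 0.0
    return power, fdp


# Hypothetical run: four truly differential sets, two null sets.
truth = {"A": True, "B": True, "C": True, "D": True, "E": False, "F": False}
called = {"A", "B", "C", "E"}
power, fdp = power_and_fdr(truth, called)
```

In this toy run the method finds three of the four truly differential sets and makes one false call among four declarations.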
|
5 |
Knowledge Based Gene Set analysis (KB-GSA) : A novel method for gene expression analysis
Jadhav, Trishul. January 2010
Microarray technology allows measurement of the expression levels of thousands of genes simultaneously. Several gene set analysis (GSA) methods are widely used for extracting useful information from microarrays, for example identifying differentially expressed pathways associated with a particular biological process or disease phenotype. Though GSA methods like Gene Set Enrichment Analysis (GSEA) are widely used for pathway analysis, these methods are based solely on statistics. Such methods can be awkward to use if the aim of the study is knowledge of specific pathways involved in particular biological processes. Here we present a novel method (Knowledge Based Gene Set Analysis: KB-GSA) which integrates knowledge about user-selected pathways that are known to be involved in specific biological processes. The method generates an easy-to-understand graphical visualization of the changes in expression of the genes, complemented with some common statistics about the pathway of particular interest.
|
7 |
Identificação de cascatas gênicas com base na modulação transcricional de células sanguíneas mononucleares periféricas de pacientes com diabetes mellitus do tipo 1 / Identification of gene cascades based on the transcriptional modulation of peripheral blood mononuclear cells from type 1 diabetes mellitus patients
Arns, Thais Cristine. 15 March 2013
Type 1 diabetes mellitus (T1DM) is a chronic autoimmune disease in which the insulin-secreting pancreatic beta cells are selectively destroyed. The disease develops from genetic predisposition combined with largely unknown environmental factors and stochastic events. This work compares large-scale transcriptional gene expression (the transcriptome) between T1DM patients and healthy controls, using samples of peripheral blood mononuclear cells (PBMCs). Disease-related changes in gene expression can be sampled in PBMCs because immune effector cells are presumably in equilibrium with the circulating cell population. To identify changes in gene expression, we used microarray technology and the Pearson correlation coefficient, which made it possible to observe increases or decreases in gene expression as well as the magnitude of the change. We also performed gene set analysis (GSA), a method based on the significance of predefined gene sets rather than individual genes, which is better suited to a polygenic disease such as T1DM. GSA enabled the selection of genes involved in, for example, the following pathways: the I-kappaB kinase/NF-kappaB cascade, regulation of the TGF-β receptor signaling pathway, regulation of the JAK-STAT cascade, and the cytokine- and chemokine-mediated signaling pathway, from which transcriptional markers can be identified. This unbiased transcriptome analysis of PBMCs identified gene sets and genes associated with T1DM, their preferential expression profiles in immune cell types, and their modulation patterns.
|
|
9 |
Bayesian pathway analysis in epigenetics
Wright, Alan. January 2013
A typical gene expression data set consists of expression measurements for a large number of genes on a relatively small number of subjects, classified according to two or more outcomes, for example cancer or non-cancer. The identification of associations between gene expression and outcome is a huge multiple testing problem. Early approaches to this problem involved the application of thousands of univariate tests with corrections for multiplicity. Over the past decade, numerous studies have demonstrated that analyzing gene expression data structured into predefined gene sets can produce benefits in terms of statistical power and robustness when compared to alternative approaches. This thesis presents the results of research on gene set analysis. In particular, it examines the properties of some existing methods for the analysis of gene sets and introduces novel Bayesian methods for gene set analysis. A distinguishing feature of these methods is that the model is specified conditionally on the expression data, whereas other methods of gene set analysis and individual gene analysis (IGA) generally make inferences conditionally on the outcome. Computer simulation is used to compare three common established methods for gene set analysis. In this simulation study a new procedure for the simulation of gene expression data is introduced. The simulation studies are used to identify situations in which the established methods perform poorly. The Bayesian approaches developed in this thesis apply reversible jump Markov chain Monte Carlo (RJMCMC) techniques to model gene expression effects on phenotype. The reversible jump step in the modelling procedure allows posterior probabilities for the activeness of each gene set to be produced. These mixture models reverse the generally accepted conditionality and model outcome given gene expression, which is a more intuitive assumption when modelling the pathway to phenotype.
It is demonstrated that the two models proposed may be superior to the established methods studied. There is considerable scope for further development of this line of research, which is appealing in terms of the use of mixture model priors that reflect the belief that a relatively small number of genes, restricted to a small number of gene sets, are associated with the outcome.
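The "thousands of univariate tests with corrections for multiplicity" mentioned in this abstract are commonly handled with a procedure such as Benjamini-Hochberg. The sketch below illustrates that general idea only; the abstract does not say which correction the early approaches used, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from four univariate tests.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
```

Genes whose adjusted value falls below a chosen threshold would be declared significant while controlling the false discovery rate at that level, under the usual independence or positive-dependence assumptions.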
|
10 |
Network-based strategies for discovering functional associations of uncharacterized genes and gene sets
Wang, Peggy I. 12 November 2013
High-throughput technology is changing the face of research biology, generating an ever-growing number of large-scale data sets. With experiments utilizing next-generation gene sequencing, mass spectrometry, and various other global surveys of proteins, translating the plethora of data into biology has become a daunting task. In response, functional networks have been developed as a means of integrating the data into models of proteomic organization. In these networks, proteins are linked if there is evidence that they operate together in the same function, facilitating predictions about the functions, phenotypes, and disease associations of uncharacterized genes. In this body of work, we explore different applications of this so-called "guilt-by-association" concept to predict loss-of-function phenotypes and diseases associated with genes in yeast, worm, and human. We also scrutinize certain limitations associated with the functional networks, predictive methods, and measures of performance used in our studies. Importantly, the predictive method and performance measure, if not chosen appropriately for the biological objective at hand, can substantially distort the results and interpretation of a study. These findings are incorporated in the development of RIDDLE, a method for characterizing whole sets of genes. This machine learning-based method provides a measure of network distance, and thus functional association, between two sets of genes. RIDDLE can be applied to a wide range of problems, as we demonstrate with several biological examples, including linking microRNA-450a to ocular development and disease. In the last decade, functional networks have proven to be a useful strategy for interpreting large-scale proteomic and genomic data sets. With the continued growth of genome coverage in networks and the innovation of predictive methods, we will surely advance towards our ultimate goal of understanding the genetic changes that underlie disease.
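The "guilt-by-association" idea described above can be illustrated with a very small sketch: a candidate gene is scored by the total weight of its network links to genes already known to be associated with a phenotype. This is a simplified illustration, not RIDDLE or any method from the dissertation; the network, gene labels, and weights are hypothetical.

```python
def guilt_by_association(network, seeds):
    """Rank non-seed genes by summed edge weight to the seed genes.

    network -- dict mapping gene -> dict of neighbor -> edge weight
    seeds   -- set of genes already associated with the phenotype
    """
    scores = {}
    for gene, neighbors in network.items():
        if gene in seeds:
            continue
        # Only links that lead to a seed gene contribute evidence.
        scores[gene] = sum(w for nb, w in neighbors.items() if nb in seeds)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical functional network with symmetric weighted links.
network = {
    "A": {"B": 0.9, "C": 0.2},
    "B": {"A": 0.9, "D": 0.7},
    "C": {"A": 0.2, "E": 0.4},
    "D": {"B": 0.7},
    "E": {"C": 0.4},
}
ranked = guilt_by_association(network, seeds={"A", "B"})
```

Here gene D ranks first because it is strongly linked to a seed gene, while E has no direct link to any seed and scores zero. As the abstract cautions, the choice of scoring rule and performance measure can strongly affect conclusions.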
|