Spelling suggestions: "subject:"gene tet enrichment 2analysis"" "subject:"gene tet enrichment 3analysis""
1 |
Gene-pair based statistical methods for testing gene set enrichment in microarray gene expression studiesZhao, Kaiqiong 16 September 2016 (has links)
Gene set enrichment analysis aims to discover sets of genes, such as biological pathways or protein complexes, which may show moderate but coordinated differentiation across experimental conditions. The existing gene set enrichment approaches utilize single gene statistic as a measure of differentiation for individual genes.
These approaches do not utilize any inter-gene correlations, but it has been known that genes in a pathway often interact with each other.
Motivated by the need for taking gene dependence into account, we propose a novel gene set enrichment algorithm, where the gene-gene correlation is addressed via a gene-pair representation strategy. Relying on an appropriately defined gene pair statistic, the gene set statistic is formulated using a competitive null hypothesis.
Extensive simulation studies show that our proposed approach can correctly control the type I error (false positive rate), and retain good statistical power for detecting true differential expression. The new method is also applied to analyze several gene expression datasets. / October 2016
|
2 |
Analyzing Gene Expression Data in Terms of Gene Sets: Gene Set Enrichment AnalysisLi, Wei 01 December 2009 (has links)
The DNA microarray biotechnology simultaneously monitors the expression of thousands of genes and aims to identify genes that are differently expressed under different conditions. From the statistical point of view, it can be restated as identify genes strongly associated with the response or covariant of interest. The Gene Set Enrichment Analysis (GSEA) method is one method which focuses the analysis at the functional related gene sets level instead of single genes. It helps biologists to interpret the DNA microarray data by their previous biological knowledge of the genes in a gene set. GSEA has been shown to efficiently identify gene sets containing known disease-related genes in the real experiments. Here we want to evaluate the statistical power of this method by simulation studies. The results show that the the power of GSEA is good enough to identify the gene sets highly associated with the response or covariant of interest.
|
3 |
Multi-omics Data Integration for Identifying Disease Specific Biological PathwaysLu, Yingzhou 05 June 2018 (has links)
Pathway analysis is an important task for gaining novel insights into the molecular architecture of many complex diseases. With the advancement of new sequencing technologies, a large amount of quantitative gene expression data have been continuously acquired. The springing up omics data sets such as proteomics has facilitated the investigation on disease relevant pathways.
Although much work has previously been done to explore the single omics data, little work has been reported using multi-omics data integration, mainly due to methodological and technological limitations. While a single omic data can provide useful information about the underlying biological processes, multi-omics data integration would be much more comprehensive about the cause-effect processes responsible for diseases and their subtypes.
This project investigates the combination of miRNAseq, proteomics, and RNAseq data on seven types of muscular dystrophies and control group. These unique multi-omics data sets provide us with the opportunity to identify disease-specific and most relevant biological pathways. We first perform t-test and OVEPUG test separately to define the differential expressed genes in protein and mRNA data sets. In multi-omics data sets, miRNA also plays a significant role in muscle development by regulating their target genes in mRNA dataset. To exploit the relationship between miRNA and gene expression, we consult with the commonly used gene library - Targetscan to collect all paired miRNA-mRNA and miRNA-protein co-expression pairs. Next, by conducting statistical analysis such as Pearson's correlation coefficient or t-test, we measured the biologically expected correlation of each gene with its upstream miRNAs and identify those showing negative correlation between the aforementioned miRNA-mRNA and miRNA-protein pairs. Furthermore, we identify and assess the most relevant disease-specific pathways by inputting the differential expressed genes and negative correlated genes into the gene-set libraries respectively, and further characterize these prioritized marker subsets using IPA (Ingenuity Pathway Analysis) or KEGG. We will then use Fisher method to combine all these p-values derived from separate gene sets into a joint significance test assessing common pathway relevance. In conclusion, we will find all negative correlated paired miRNA-mRNA and miRNA-protein, and identifying several pathophysiological pathways related to muscular dystrophies by gene set enrichment analysis.
This novel multi-omics data integration study and subsequent pathway identification will shed new light on pathophysiological processes in muscular dystrophies and improve our understanding on the molecular pathophysiology of muscle disorders, preventing and treating disease, and make people become healthier in the long term. / Master of Science / Identification of biological pathways play a central role in understanding both human health and diseases. A biological pathway is a series of information processing steps via interactions among molecules in a cell that partially determines the phenotype of a cell. Specifically, identifying disease-specific pathway will guide focused studies on complex diseases, thus potentially improve the prevention and treatment of diseases.
To identify disease-specific pathways, it is crucial to develop computational methods and statistical tests that can integrate multi-omics (multiple omes such as genome, proteome, etc) data. Compared to single omics data, multi-omics data will help gaining a more comprehensive understanding on the molecular architecture of disease processes.
In this thesis, we propose a novel data analytics pipeline for multi-omics data integration. We test and apply our method on/to the real proteomics data sets on muscular dystrophy subtypes, and identify several biologically plausible pathways related to muscular dystrophies.
|
4 |
TESTING FOR DIFFERENTIALLY EXPRESSED GENES AND KEY BIOLOGICAL CATEGORIES IN DNA MICROARRAY ANALYSISSARTOR, MAUREEN A. January 2007 (has links)
No description available.
|
5 |
Delving into gene-set multiplex networks facilitated by a k-nearest neighbor-based measure of similarity / k-最近傍法に基づく類似性尺度による、遺伝子セットの多重ネットワーク解析Zheng, Cheng 25 March 2024 (has links)
京都大学 / 新制・課程博士 / 博士(医学) / 甲第25192号 / 医博第5078号 / 新制||医||1072(附属図書館) / 京都大学大学院医学研究科医学専攻 / (主査)教授 村川 泰裕, 教授 斎藤 通紀, 教授 李 聖林 / 学位規則第4条第1項該当 / Doctor of Agricultural Science / Kyoto University / DFAM
|
6 |
Pathway-centric approaches to the analysis of high-throughput genomics dataHänzelmann, Sonja, 1981- 11 October 2012 (has links)
In the last decade, molecular biology has expanded from a reductionist view to a systems-wide view that tries to unravel the complex interactions of cellular components. Owing to the emergence of high-throughput technology it is now possible to interrogate entire genomes at an unprecedented resolution. The dimension and unstructured nature of these data made it evident that new methodologies and tools are needed to turn data into biological knowledge. To contribute to this challenge we exploited the wealth of publicly available high-throughput genomics data and developed bioinformatics methodologies focused on extracting information at the pathway rather than the single gene level. First, we developed Gene Set Variation Analysis (GSVA), a method that facilitates the organization and condensation of gene expression profiles into gene sets. GSVA enables pathway-centric downstream analyses of microarray and RNA-seq gene expression data. The method estimates sample-wise pathway variation over a population and allows for the integration of heterogeneous biological data sources with pathway-level expression measurements. To illustrate the features of GSVA, we applied it to several use-cases employing different data types and addressing biological questions. GSVA is made available as an R package within the Bioconductor project.
Secondly, we developed a pathway-centric genome-based strategy to reposition drugs in type 2 diabetes (T2D). This strategy consists of two steps, first a regulatory network is constructed that is used to identify disease driving modules and then these modules are searched for compounds that might target them. Our strategy is motivated by the observation that disease genes tend to group together in the same neighborhood forming disease modules and that multiple genes might have to be targeted simultaneously to attain an effect on the pathophenotype. To find potential compounds, we used compound exposed genomics data deposited in public databases. We collected about 20,000 samples that have been exposed to about 1,800 compounds. Gene expression can be seen as an intermediate phenotype reflecting underlying dysregulatory pathways in a disease. Hence, genes contained in the disease modules that elicit similar transcriptional responses upon compound exposure are assumed to have a potential therapeutic effect. We applied the strategy to gene expression data of human islets from diabetic and healthy individuals and identified four potential compounds, methimazole, pantoprazole, bitter orange extract and torcetrapib that might have a positive effect on insulin secretion. This is the first time a regulatory network of human islets has been used to reposition compounds for T2D.
In conclusion, this thesis contributes with two pathway-centric approaches to important bioinformatic problems, such as the assessment of biological function and in silico drug repositioning. These contributions demonstrate the central role of pathway-based analyses in interpreting high-throughput genomics data. / En l'última dècada, la biologia molecular ha evolucionat des d'una perspectiva reduccionista cap a una perspectiva a nivell de sistemes que intenta desxifrar les complexes interaccions entre els components cel•lulars. Amb l'aparició de les tecnologies d'alt rendiment actualment és possible interrogar genomes sencers amb una resolució sense precedents. La dimensió i la naturalesa desestructurada d'aquestes dades ha posat de manifest la necessitat de desenvolupar noves eines i metodologies per a convertir aquestes dades en coneixement biològic. Per contribuir a aquest repte hem explotat l'abundància de dades genòmiques procedents d'instruments d'alt rendiment i disponibles públicament, i hem desenvolupat mètodes bioinformàtics focalitzats en l'extracció d'informació a nivell de via molecular en comptes de fer-ho al nivell individual de cada gen. En primer lloc, hem desenvolupat GSVA (Gene Set Variation Analysis), un mètode que facilita l'organització i la condensació de perfils d'expressió dels gens en conjunts. GSVA possibilita anàlisis posteriors en termes de vies moleculars amb dades d'expressió gènica provinents de microarrays i RNA-seq. Aquest mètode estima la variació de les vies moleculars a través d'una població de mostres i permet la integració de fonts heterogènies de dades biològiques amb mesures d'expressió a nivell de via molecular. Per il•lustrar les característiques de GSVA, l'hem aplicat a diversos casos usant diferents tipus de dades i adreçant qüestions biològiques. GSVA està disponible com a paquet de programari lliure per R dins el projecte Bioconductor.
En segon lloc, hem desenvolupat una estratègia centrada en vies moleculars basada en el
genoma per reposicionar fàrmacs per la diabetis tipus 2 (T2D). Aquesta estratègia consisteix
en dues fases: primer es construeix una xarxa reguladora que s'utilitza per identificar mòduls
de regulació gènica que condueixen a la malaltia; després, a partir d'aquests mòduls es busquen compostos que els podrien afectar. La nostra estratègia ve motivada per l'observació que els gens que provoquen una malaltia tendeixen a agrupar-se, formant mòduls patogènics, i pel fet que podria caldre una actuació simultània sobre múltiples gens per assolir un efecte en el fenotipus de la malaltia. Per trobar compostos potencials, hem usat dades genòmiques exposades a compostos dipositades en bases de dades públiques. Hem recollit unes 20.000 mostres que han estat exposades a uns 1.800 compostos. L'expressió gènica es pot interpretar com un fenotip intermedi que reflecteix les vies moleculars desregulades subjacents a una malaltia. Per tant, considerem que els gens d'un mòdul patològic que responen, a nivell transcripcional, d'una manera similar a l'exposició del medicament tenen potencialment un efecte terapèutic. Hem aplicat aquesta estratègia a dades d'expressió gènica en illots pancreàtics humans corresponents a individus sans i diabètics, i hem identificat quatre compostos potencials (methimazole, pantoprazole, extracte de taronja amarga i torcetrapib) que podrien tenir un efecte positiu sobre la secreció de la insulina. Aquest és el primer cop que una xarxa reguladora d'illots pancreàtics humans s'ha utilitzat per reposicionar compostos per a T2D.
En conclusió, aquesta tesi aporta dos enfocaments diferents en termes de vies moleculars
a problemes bioinformàtics importants, com ho son el contrast de la funció biològica i el
reposicionament de fàrmacs "in silico". Aquestes contribucions demostren el paper central
de les anàlisis basades en vies moleculars a l'hora d'interpretar dades genòmiques procedents
d'instruments d'alt rendiment.
|
Page generated in 0.1181 seconds