21 |
Learning gene interactions from gene expression data dynamic Bayesian networks
Sigursteinsdottir, Gudrun, January 2004
Microarray experiments generate vast amounts of data that reflect many aspects of the underlying biological processes. A major challenge in computational biology is to extract from such data significant information and knowledge about the complex interplay between genes and proteins. One analytical approach that has recently gained much interest is reverse engineering of genetic networks. This approach is very challenging, primarily because of the dimensionality of gene expression data (many genes, few time points) and the potentially low information content of the data. Bayesian networks (BNs) and their extension, dynamic Bayesian networks (DBNs), are statistical machine learning approaches that have become popular for reverse engineering. In the present study, a DBN learning algorithm was applied to gene expression data from experiments aimed at studying the etiology of necrotizing enterocolitis (NEC), an inflammatory gastrointestinal (GI) disease that is the most common GI emergency in neonates. The data sets were particularly challenging for the DBN learning algorithm because they contain gene expression measurements for relatively few time points, separated by long sampling intervals. The aim of this study was therefore to evaluate the applicability of DBNs for learning genetic networks for NEC from these data sets, and to use biological knowledge to assess the hypothesized gene interactions. From the results, it was concluded that the NEC gene expression data sets were not informative enough for effective derivation of genetic networks for NEC with DBNs and Bayesian learning.
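To make the "few time points" problem concrete, here is a minimal sketch of structure learning for a first-order discrete DBN, assuming expression values have already been discretized (e.g. down/unchanged/up) and that each gene's parents at time t are chosen by exhaustive BIC scoring. This is not the learning algorithm used in the thesis, whose details are not given here; the function name and toy data are hypothetical. With T time points there are only T − 1 transitions to score each candidate family, while the BIC penalty grows with `arity ** k`, so short, sparsely sampled series provide little evidence for any edge.

```python
import itertools
import math
from collections import Counter


def learn_dbn_parents(series, max_parents=1, arity=3):
    """Exhaustive BIC parent search for a first-order discrete DBN (sketch).

    series: list of time points; each is a tuple of discretized gene states,
            one entry per gene.
    Returns a dict mapping each gene index to its best-scoring parent tuple,
    where parents are genes at time t and the child is the gene at time t + 1.
    """
    n_genes = len(series[0])
    transitions = list(zip(series[:-1], series[1:]))  # (x_t, x_{t+1}) pairs
    n = len(transitions)                              # only T - 1 observations
    parents = {}
    for g in range(n_genes):
        child = [nxt[g] for _, nxt in transitions]
        best_set, best_score = (), -math.inf
        for k in range(max_parents + 1):
            for cand in itertools.combinations(range(n_genes), k):
                pa = [tuple(prev[j] for j in cand) for prev, _ in transitions]
                joint = Counter(zip(pa, child))
                pa_counts = Counter(pa)
                # Maximum-likelihood log-likelihood of P(child | parents)
                loglik = sum(c * math.log(c / pa_counts[p]) for (p, _), c in joint.items())
                # BIC penalty: free parameters of a full conditional probability table
                penalty = 0.5 * math.log(n) * (arity ** k) * (arity - 1)
                score = loglik - penalty
                if score > best_score:
                    best_set, best_score = cand, score
        parents[g] = best_set
    return parents


# Toy example (hypothetical data): gene 1 copies gene 0 with a one-step delay
g0 = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
g1 = [0] + g0[:-1]
series = list(zip(g0, g1))
learned = learn_dbn_parents(series, max_parents=1, arity=2)
```

With 30 time points the delayed dependency is recovered; truncating `series` to a handful of points makes the penalty compete directly with the likelihood gain.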
|
22 |
Deriving Genetic Networks from Gene Expression Data and Prior Knowledge
Lindlöf, Angelica, January 2001
In this work, three different approaches for deriving genetic association networks were tested: Pearson correlation, an algorithm based on the Boolean network approach, and prior knowledge. Pearson correlation and the Boolean-network-based algorithm derived associations from gene expression data. In the third approach, prior knowledge from a known genetic network of a related organism was used to derive associations for the target organism, by matching homologs and mapping the known genetic network onto the target organism. The results indicate that the Pearson correlation approach gave the best results, but the prior knowledge approach seems to be the one most worth pursuing.
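The correlation and prior-knowledge approaches can be sketched in a few lines. This is a minimal illustration, assuming an absolute-correlation threshold of 0.9 and a one-to-one homolog dictionary; both the threshold and the function names are hypothetical and not taken from the thesis, which does not state its exact settings here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant profiles only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_network(expr, threshold=0.9):
    """expr: dict gene -> expression profile. Returns undirected edges as frozensets."""
    return {frozenset((g, h)) for g, h in combinations(sorted(expr), 2)
            if abs(pearson(expr[g], expr[h])) >= threshold}


def map_known_network(edges, homolog):
    """Transfer edges of a known network via homologs (source gene -> target gene).

    Edges whose endpoints lack a homolog in the target organism are dropped.
    """
    return {frozenset((homolog[a], homolog[b])) for a, b in edges
            if a in homolog and b in homolog}
```

Genes with constant profiles should be filtered out beforehand, since their correlation is undefined.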
|
24 |
Analysis of Additive Risk Model with High Dimensional Covariates Using Correlation Principal Component Regression
Wang, Guoshen, 22 April 2008
One problem of interest is relating genes to patients' survival outcomes in order to build regression models that predict future patients' survival from their gene expression data. Applying the semiparametric additive risk model of survival analysis, this thesis proposes a new approach to the analysis of gene expression data, with the focus on the model's predictive ability. The method modifies correlation principal component regression to handle the censoring in survival data. We also employ the time-dependent AUC and RMSEP to assess how well the model predicts survival time. Furthermore, the proposed method is able to identify significant genes related to the disease. Finally, the proposed approach is illustrated on a simulated data set, the diffuse large B-cell lymphoma (DLBCL) data set, and a breast cancer data set. The results show that the model fits both real data sets very well.
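The additive risk model assumes a hazard of the form λ(t | Z) = λ0(t) + β'Z. As a point of reference, here is a sketch of the standard closed-form estimator of Lin and Ying (1994) for β; it is not the thesis's modified correlation principal component regression, which would first reduce thousands of genes to a few component scores that could then play the role of `Z`. The sketch assumes no tied follow-up times, and the simulated data (baseline hazard 0.5, true β = 1.0) are purely illustrative.

```python
import numpy as np


def lin_ying_additive_risk(time, event, Z):
    """Lin-Ying estimator of beta in lambda(t | Z) = lambda0(t) + beta' Z.

    Assumes no tied follow-up times. Z is an (n, p) covariate matrix,
    e.g. a few principal-component scores rather than thousands of genes.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    order = np.argsort(time)
    time, event, Z = time[order], event[order], Z[order]
    n, p = Z.shape
    A = np.zeros((p, p))  # sum_i int Y_i(t) (Z_i - Zbar(t))(Z_i - Zbar(t))' dt
    b = np.zeros(p)       # sum_i int (Z_i - Zbar(t)) dN_i(t)
    prev = 0.0
    for k in range(n):
        risk = Z[k:]      # after sorting: subjects still at risk on (prev, time[k]]
        zbar = risk.mean(axis=0)
        centered = risk - zbar
        A += (time[k] - prev) * (centered.T @ centered)
        if event[k]:
            b += Z[k] - zbar
        prev = time[k]
    return np.linalg.solve(A, b)


# Simulated check: constant baseline hazard 0.5, one covariate, true beta = 1.0
rng = np.random.default_rng(0)
n = 2000
z = rng.uniform(0.0, 1.0, size=n)
t_event = rng.exponential(1.0 / (0.5 + 1.0 * z))   # exponential with rate 0.5 + z
t_cens = rng.exponential(2.0, size=n)               # independent censoring
beta_hat = lin_ying_additive_risk(np.minimum(t_event, t_cens), t_event <= t_cens, z)
```

The estimator needs no iterative fitting, which is one reason the additive model pairs naturally with a preceding dimension-reduction step.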
|
25 |
Analyzing Gene Expression Data in Terms of Gene Sets: Gene Set Enrichment Analysis
Li, Wei, 01 December 2009
DNA microarray technology simultaneously monitors the expression of thousands of genes, with the aim of identifying genes that are differentially expressed under different conditions. From a statistical point of view, this can be restated as identifying genes strongly associated with the response or covariate of interest. Gene Set Enrichment Analysis (GSEA) is a method that focuses the analysis at the level of functionally related gene sets instead of single genes. It helps biologists interpret DNA microarray data using prior biological knowledge of the genes in a gene set. GSEA has been shown to efficiently identify gene sets containing known disease-related genes in real experiments. Here we evaluate the statistical power of this method through simulation studies. The results show that the power of GSEA is good enough to identify gene sets highly associated with the response or covariate of interest.
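The core GSEA statistic is a running-sum enrichment score over a ranked gene list: walking down the list, the sum increases at genes in the set (weighted by their association score) and decreases at genes outside it, and the score is the maximum deviation from zero. The sketch below assumes the weighted form with exponent `p = 1`; significance, and hence power in a simulation study, would come from recomputing the score under permuted phenotype labels, which is not shown. Names are illustrative.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score (sketch).

    ranked_genes: genes sorted by decreasing association with the phenotype
    scores:       the corresponding association scores (e.g. correlations)
    gene_set:     genes whose enrichment is tested; must be a non-empty,
                  proper subset of ranked_genes
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up by the normalized weight at a hit, down by 1/n_miss at a miss
        running += abs(s) ** p / norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es
```

A set concentrated at the top of the ranking gives a score near +1, one concentrated at the bottom a score near −1.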
|
26 |
Replacing qpcr non-detects with microarray expression data: An initialized approach towards microarray and qPCR data integration
Sehlstedt, Jonas, January 2018
Gene expression analysis can be performed by a number of methods. One of the most common is relative qPCR, which assesses the expression of a chosen set of genes relative to a reference gene. Analysis methods benefit from a sample set that is as homogeneous as possible, since large variation in the original samples' disease status, quality, type, or distribution may yield uneven base expression between replicates. Additionally, normalization of qPCR data does not work if there are missing values in the data. There are methods for handling non-detects (i.e. missing values), but most of them are only recommended when a single value, or very few values, are missing. Integrating microarray expression data with qPCR data could improve data quality, removing the need to redo an entire experiment when too much data is missing or the sample data are too heterogeneous. In this project, publicly available microarray data with sample status similar to that of a given qPCR dataset were downloaded and processed. The qPCR dataset included 51 genes, of which a set of four DLG genes was chosen for in-depth analysis. For handling missing values, mean imputation and insertion of a Cq value of 40 were used, as well as a novel approach in which microarray data were used to replace missing values. In summary, replacing missing values with microarray data did not show any significant difference from the other two methods for three of the four DLG genes. The project also suggests an initial approach towards testing the possibility of qPCR and microarray data integration.
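The two baseline strategies for non-detects, and the relative expression computed afterwards, can be sketched as follows. This assumes non-detects are stored as `None` and uses the simple 2^−ΔCq formula (target Cq minus reference Cq, per sample); the microarray-based replacement is not shown because the abstract does not specify how microarray values were mapped onto the Cq scale. Function names are hypothetical.

```python
from statistics import mean


def impute_non_detects(cq, method="mean", max_cq=40.0):
    """Replace non-detects (None) in one gene's Cq values.

    method="mean":   fill with the mean of the observed Cq values
    method="max_cq": fill with a fixed cycle count (40 by default)
    """
    observed = [c for c in cq if c is not None]
    if method == "mean":
        fill = mean(observed)
    elif method == "max_cq":
        fill = max_cq
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if c is None else c for c in cq]


def relative_expression(cq_target, cq_reference):
    """2^-dCq expression of a target gene relative to a reference gene, per sample."""
    return [2.0 ** -(t - r) for t, r in zip(cq_target, cq_reference)]
```

Inserting Cq 40 treats a non-detect as "at most barely expressed", whereas mean imputation treats it as typical; the two can therefore pull downstream fold changes in opposite directions.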
|
27 |
Variance of Difference as Distance Like Measure in Time Series Microarray Data Clustering
Mukhopadhyay, Sayan, January 2014
Our intention is to find similarity among the time series expression profiles of genes in microarray experiments. It is hypothesized that the mRNA concentration of one gene at a given time point is directly affected by the mRNA concentrations of other genes, which may have biological significance. We define the dissimilarity between two time series as the variance of the Euclidean distances between them at each time point. The large number of genes makes computing this variance at every time point expensive in execution time. For this reason, we use an autoregressive model that reduces each 19-point gene expression profile to a three-point vector, which allows us to find the variance of difference between two profiles without point-to-point matching. Previous analysis of the microarray experiment data found that 62 genes are regulated following EGF (Epidermal Growth Factor) and HRG (Heregulin) treatment of MCF-7 breast cancer cells. We chose these suspected cancer-related genes as our reference and investigated which additional genes have similar time point expression profiles. Using variance of difference as the distance measure, we applied several methods for clustering the gene expression data, such as our own maximum clique finding heuristics and hierarchical clustering. The results were validated through a text mining study. New predictions from our study could be a basis for further investigation of the genesis of breast cancer. Overall, 84 new genes were found, of which 57 are related to cancer; among these, 35 are associated with breast cancer.
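A minimal sketch of the point-wise measure, assuming two profiles sampled at the same time points and using the population variance of the point-wise differences; the autoregressive reduction and the clustering steps are not shown, and the function name is illustrative.

```python
from statistics import pvariance


def variance_of_difference(x, y):
    """Dissimilarity of two equally sampled expression profiles.

    Computes the variance of x_t - y_t over time points t. A constant offset
    between profiles contributes nothing, so shifted but parallel profiles
    are treated as identical.
    """
    if len(x) != len(y):
        raise ValueError("profiles must share the same time points")
    return pvariance([a - b for a, b in zip(x, y)])
```

The offset invariance is the key design property: unlike Euclidean distance, two genes whose profiles move in lockstep at different baseline levels are scored as maximally similar.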
|
28 |
Meta-aprendizagem aplicada à classificação de dados de expressão gênica / Meta-learning applied to gene expression data classification
Bruno Feres de Souza, 26 October 2010
Among the most common applications involving microarrays is the classification of tissue samples, which is essential for correctly identifying the occurrence of cancer and its type. This classification is performed with the aid of machine learning algorithms. Choosing the best algorithm for a given problem is not trivial. In this thesis, we studied the use of meta-learning as a viable solution. The experimental results confirmed the success of the application using a standard framework for characterizing the data and constructing the recommendation. Thereafter, improvements were made in these two aspects. First, a new set of meta-attributes based on cluster validation indices was proposed. Then the kNN method for ranking construction was extended to weight the influence of the nearest neighbors. In the context of meta-regression, SVMs were introduced to estimate the performance of classification algorithms. Decision trees were also employed for recommending algorithms. Because of their inferior performance, an ensemble of trees was employed, which greatly improved the quality of the results.
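A kNN ranking recommender finds the k datasets whose meta-attributes are closest to the new dataset and aggregates the algorithm ranks observed on them. The sketch below weights each neighbor by inverse Euclidean distance; that weighting scheme, the function name, and the algorithm labels are assumptions for illustration, since the abstract does not say how the thesis weights neighbors.

```python
import math


def knn_ranking(query, meta_features, rankings, k=3, eps=1e-9):
    """Recommend an algorithm ranking for a new dataset (sketch).

    meta_features: dict dataset -> list of meta-attribute values
    rankings:      dict dataset -> dict algorithm -> rank (1 = best)
    Neighbors are weighted by inverse Euclidean distance (an assumption).
    """
    neighbors = sorted(meta_features, key=lambda d: math.dist(query, meta_features[d]))[:k]
    weights = {d: 1.0 / (math.dist(query, meta_features[d]) + eps) for d in neighbors}
    total = sum(weights.values())
    algorithms = rankings[neighbors[0]].keys()
    # Weighted average rank of each algorithm over the neighboring datasets
    avg_rank = {a: sum(weights[d] * rankings[d][a] for d in neighbors) / total
                for a in algorithms}
    return sorted(avg_rank, key=avg_rank.get)
```

With uniform weights this reduces to the usual average-rank aggregation; the distance weighting lets a very close neighbor dominate a distant one.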
|
29 |
Application of Committee k-NN Classifiers for Gene Expression Profile Classification
Dhawan, Manik, January 2008
No description available.
|
30 |
Low dimensional structure in single cell data
Kunes, Russell Allen Zhang, January 2024
This thesis presents three methods, each concerning the estimation of interpretable low-dimensional representations of high-dimensional data. The first two chapters consider methods for fitting low-dimensional nonlinear representations: Chapter 1 discusses the deterministic input, noisy "and" gate (DINA) model, and Chapter 2 discusses binary variational autoencoders. We present an application to single-cell assay for transposase-accessible chromatin sequencing data (single-cell ATAC-seq), where the DINA model uncovers meaningful discrete representations of cell state. In scientific applications, practitioners have substantial prior knowledge of the latent components driving variation in the data. The third chapter develops a supervised matrix factorization method, Spectra, that leverages annotations from experts and previous biological experiments to uncover latent representations of single-cell RNA-seq data.
Variational inference for the DINA model:
The deterministic input, noisy "and" gate (DINA) model is a matrix decomposition in which binary latent factors interact via an "and" relationship. We develop a variational inference approach for estimating the parameters of the DINA model. Previous variational approaches enumerate the space of latent binary parameters (requiring a number of parameters exponential in the number of latent components) and cannot fit an unknown number of latent components. Here, we report that a practical mean-field variational inference approach, relying on a nonparametric cumulative shrinkage process prior and stochastic coordinate ascent updates, achieves results competitive with existing methods while simultaneously determining the number of latent components. This approach allows exploratory Q-matrix estimation to scale to datasets of practical size with minimal hyperparameter tuning.
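The "and" relationship is easiest to see in the DINA likelihood itself. In the standard formulation, η_ij = ∏_k α_ik^{q_jk} equals 1 only if row i possesses every latent attribute that column j requires (as encoded in the Q-matrix); then P(X_ij = 1) = 1 − s_j, otherwise it is the guessing probability g_j. The sketch below computes only these probabilities, not the variational inference procedure of the thesis; argument names are illustrative.

```python
import numpy as np


def dina_prob(alpha, Q, slip, guess):
    """P(X_ij = 1) under the DINA model.

    alpha: (n, K) binary latent attributes of each row (e.g. each cell)
    Q:     (J, K) binary Q-matrix; Q[j, k] = 1 if column j requires attribute k
    slip, guess: length-J arrays of slip and guessing parameters
    """
    alpha = np.asarray(alpha)
    Q = np.asarray(Q)
    # Count of required attributes that row i lacks; eta = 1 only when none are missing
    missing = (1 - alpha) @ Q.T
    eta = (missing == 0).astype(float)
    return eta * (1 - np.asarray(slip)) + (1 - eta) * np.asarray(guess)


P = dina_prob(alpha=[[1, 1], [1, 0], [0, 0]],
              Q=[[1, 0], [1, 1]],
              slip=[0.1, 0.2],
              guess=[0.05, 0.3])
```

Row 2 of `alpha` has attribute 0 but not attribute 1, so it reaches the "high" probability for column 0 but only the guessing level for column 1, which requires both.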
Gradient estimation for binary latent variable models:
To fit binary variational autoencoders, the gradient of the objective function must be estimated. More generally, gradient estimation is often necessary for fitting generative models with discrete latent variables, for example in reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state-of-the-art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators can have exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space. Because bitflip-1 has properties complementary to those of existing estimators, we introduce an aggregated estimator, unbiased gradient variance clipping (UGC), that uses either a bitflip-1 or a DisARM gradient update for each coordinate. We prove that UGC has uniformly lower variance than DisARM. Empirically, we observe that UGC achieves the optimal value of the optimization objectives in toy experiments, discrete VAE training, and a best subset selection problem.
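For reference, here is a sketch of the DisARM estimator for independent Bernoulli variables b_i ~ Bernoulli(σ(θ_i)), following the form published by Dong, Mnih, and Tucker (2020): draw u ~ U(0,1), set b = 1[u < σ(θ)] and the antithetic b̃ = 1[1 − u < σ(θ)], and use ½ (f(b) − f(b̃)) (−1)^{b̃_i} 1[b_i ≠ b̃_i] σ(|θ_i|) for coordinate i. The bitflip-1 and UGC estimators are not shown because the abstract does not define them. The test function `f` and the Monte Carlo check are illustrative only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def disarm_grad(f, theta, rng):
    """One DisARM sample of d/dtheta E[f(b)], b_i ~ Bernoulli(sigmoid(theta_i))."""
    theta = np.asarray(theta, dtype=float)
    u = rng.uniform(size=theta.shape)
    p = sigmoid(theta)
    b = (u < p).astype(float)                   # ARM sample
    b_anti = (1.0 - u < p).astype(float)        # antithetic sample from the same uniforms
    diff = f(b) - f(b_anti)
    sign = np.where(b_anti == 1.0, -1.0, 1.0)   # (-1)^{b_anti}
    return 0.5 * diff * sign * (b != b_anti) * sigmoid(np.abs(theta))


def f(b):
    return np.sum((b - 0.3) ** 2)


# Monte Carlo check against the exact gradient p(1 - p)(f(1) - f(0)) per coordinate
theta = np.array([0.5, -1.0])
rng = np.random.default_rng(0)
estimate = np.mean([disarm_grad(f, theta, rng) for _ in range(20000)], axis=0)
p = sigmoid(theta)
exact = p * (1 - p) * (0.7 ** 2 - 0.3 ** 2)
```

The factor σ(|θ_i|) and the indicator 1[b_i ≠ b̃_i] are what distinguish DisARM from plain ARM; the estimator only contributes on coordinates where the two antithetic samples disagree.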
The Spectra model for supervised matrix decomposition:
Factor analysis decomposes single-cell gene expression data into a minimal set of gene programs that correspond to processes executed by cells in a sample. However, matrix factorization methods are prone to technical artifacts and poor factor interpretability. We address these concerns with Spectra, an algorithm that combines user-provided gene programs with the detection of novel programs that together best explain expression covariation. Spectra incorporates existing gene sets and cell type labels as prior biological information. It explicitly models cell type and represents input gene sets as a gene-gene knowledge graph, using a penalty function to guide factorization towards the input graph. We show that Spectra outperforms existing approaches in challenging tumor immune contexts: it finds factors that change under immune checkpoint therapy, disentangles the highly correlated features of CD8+ T-cell tumor reactivity and exhaustion, finds a program that explains continuous macrophage state changes under therapy, and identifies cell-type-specific immune metabolic programs.
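The abstract does not give Spectra's objective, so the following is a swapped-in, generic technique that illustrates the idea of guiding a factorization toward a gene-gene knowledge graph: non-negative matrix factorization with a graph Laplacian penalty λ·tr(W L Wᵀ), L = D − G, optimized with the multiplicative updates of graph-regularized NMF (Cai et al.). It does not model cell types or discover novel programs as Spectra does. The graph, data, and function names are hypothetical.

```python
import numpy as np


def graph_regularized_nmf(X, G, k, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Factor X (cells x genes) as H @ W with H, W >= 0.

    The penalty lam * trace(W L W^T), with L = D - G, pulls genes that are
    linked in the knowledge graph G (genes x genes, symmetric, non-negative)
    towards similar loadings in each factor (row) of W.
    """
    rng = np.random.default_rng(seed)
    H = rng.uniform(0.1, 1.0, size=(X.shape[0], k))
    W = rng.uniform(0.1, 1.0, size=(k, X.shape[1]))
    D = np.diag(G.sum(axis=1))
    for _ in range(n_iter):
        # Standard multiplicative NMF update for H (the penalty does not involve H)
        H *= (X @ W.T) / (H @ W @ W.T + eps)
        # Update for W with the graph term split into its negative and positive parts
        W *= (H.T @ X + lam * W @ G) / (H.T @ H @ W + lam * W @ D + eps)
    return H, W


def objective(X, G, H, W, lam=1.0):
    L = np.diag(G.sum(axis=1)) - G
    return np.sum((X - H @ W) ** 2) + lam * np.trace(W @ L @ W.T)


# Toy data: 8 genes in two hypothetical "programs" of 4 linked genes each
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(30, 8))
G = np.zeros((8, 8))
G[:4, :4] = 1.0
G[4:, 4:] = 1.0
np.fill_diagonal(G, 0.0)
H0, W0 = graph_regularized_nmf(X, G, k=3, n_iter=0)    # initial factors only
H, W = graph_regularized_nmf(X, G, k=3, n_iter=200)
loss_before, loss_after = objective(X, G, H0, W0), objective(X, G, H, W)
```

Raising `lam` trades reconstruction accuracy for factors whose gene loadings respect the input graph, which is the same tension a knowledge-guided method has to balance.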
|