• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 28
  • 4
  • 3
  • 3
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 46
  • 46
  • 46
  • 10
  • 10
  • 10
  • 9
  • 9
  • 8
  • 6
  • 6
  • 6
  • 5
  • 5
  • 5
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Learning gene interactions from gene expression data dynamic Bayesian networks

Sigursteinsdottir, Gudrun January 2004 (has links)
<p>Microarray experiments generate vast amounts of data that evidently reflect many aspects of the underlying biological processes. A major challenge in computational biology is to extract, from such data, significant information and knowledge about the complex interplay between genes/proteins. An analytical approach that has recently gained much interest is reverse engineering of genetic networks. This is a very challenging approach, primarily due to the dimensionality of the gene expression data (many genes, few time points) and the potentially low information content of the data. Bayesian networks (BNs) and its extension, dynamic Bayesian networks (DBNs) are statistical machine learning approaches that have become popular for reverse engineering. In the present study, a DBN learning algorithm was applied to gene expression data produced from experiments that aimed to study the etiology of necrotizing enterocolitis (NEC), a gastrointestinal inflammatory (GI) disease that is the most common GI emergency in neonates. The data sets were particularly challenging for the DBN learning algorithm in that they contain gene expression measurements for relatively few time points, between which the sampling intervals are long. The aim of this study was, therefore, to evaluate the applicability of DBNs when learning genetic networks for the NEC disease, i.e. from the above-mentioned data sets, and use biological knowledge to assess the hypothesized gene interactions. From the results, it was concluded that the NEC gene expression data sets were not informative enough for effective derivation of genetic networks for the NEC disease with DBNs and Bayesian learning.</p>
22

Analysis of Additive Risk Model with High Dimensional Covariates Using Correlation Principal Component Regression

Wang, Guoshen 22 April 2008 (has links)
One problem of interest is to relate genes to survival outcomes of patients for the purpose of building regression models to predict future patients¡¯ survival based on their gene expression data. Applying semeparametric additive risk model of survival analysis, this thesis proposes a new approach to conduct the analysis of gene expression data with the focus on model¡¯s predictive ability. The method modifies the correlation principal component regression to handle the censoring problem of survival data. Also, we employ the time dependent AUC and RMSEP to assess how well the model predicts the survival time. Furthermore, the proposed method is able to identify significant genes which are related to the disease. Finally, this proposed approach is illustrated by simulation data set, the diffuse large B-cell lymphoma (DLBCL) data set, and breast cancer data set. The results show that the model fits both of the data sets very well.
23

Analyzing Gene Expression Data in Terms of Gene Sets: Gene Set Enrichment Analysis

Li, Wei 01 December 2009 (has links)
The DNA microarray biotechnology simultaneously monitors the expression of thousands of genes and aims to identify genes that are differently expressed under different conditions. From the statistical point of view, it can be restated as identify genes strongly associated with the response or covariant of interest. The Gene Set Enrichment Analysis (GSEA) method is one method which focuses the analysis at the functional related gene sets level instead of single genes. It helps biologists to interpret the DNA microarray data by their previous biological knowledge of the genes in a gene set. GSEA has been shown to efficiently identify gene sets containing known disease-related genes in the real experiments. Here we want to evaluate the statistical power of this method by simulation studies. The results show that the the power of GSEA is good enough to identify the gene sets highly associated with the response or covariant of interest.
24

Variance of Difference as Distance Like Measure in Time Series Microarray Data Clustering

Mukhopadhyay, Sayan January 2014 (has links) (PDF)
Our intention is to find similarity among the time series expressions of the genes in microarray experiments. It is hypothesized that at a given time point the concentration of one gene’s mRNA is directly affected by the concentration of other gene’s mRNA, and may have biological significance. We define dissimilarity between two time-series data set as the variance of Euclidean distances of each time points. The large numbers of gene expressions make the calculation of variance of distance in each point computationally expensive and therefore computationally challenging in terms of execution time. For this reason we use autoregressive model which estimates nineteen points gene expression to a three point vector. It allows us to find variance of difference between two data sets without point-to-point matching. Previous analysis from the microarray experiments data found that 62 genes are regulated following EGF (Epidermal Growth Factor) and HRG (Heregulin) treatment of the MCF-7 breast cancer cells. We have chosen these suspected cancer-related genes as our reference and investigated which additional set of genes has similar time point expression profiles. Keeping variance of difference as a measure of distance, we have used several methods for clustering the gene expression data, such as our own maximum clique finding heuristics and hierarchical clustering. The results obtained were validated through a text mining study. New predictions from our study could be a basis for further investigations in the genesis of breast cancer. Overall in 84 new genes are found in which 57 genes are related to cancer among them 35 genes are associated with breast cancer.
25

Meta-aprendizagem aplicada à classificação de dados de expressão gênica / Meta-learning applied to gene expression data classification

Bruno Feres de Souza 26 October 2010 (has links)
Dentre as aplicações mais comuns envolvendo microarrays, pode-se destacar a classificação de amostras de tecido, essencial para a identificação correta da ocorrência de câncer. Essa classificação é realizada com a ajuda de algoritmos de Aprendizagem de Máquina. A escolha do algoritmo mais adequado para um dado problema não é trivial. Nesta tese de doutorado, estudou-se a utilização de meta-aprendizagem como uma solução viável. Os resultados experimentais atestaram o sucesso da aplicação utilizando um arcabouço padrão para caracterização dos dados e para a construção da recomendação. A partir de então, buscou-se realizar melhorias nesses dois aspectos. Inicialmente, foi proposto um novo conjunto de meta-atributos baseado em índices de validação de agrupamentos. Em seguida, estendeu-se o método de construção de rankings kNN para ponderar a influência dos vizinhos mais próximos. No contexto de meta-regressão, introduziu-se o uso de SVMs para estimar o desempenho de algoritmos de classificação. Árvores de decisão também foram empregadas para a construção da recomendação de algoritmos. Ante seu desempenho inferior, empregou-se um esquema de comitês de árvores, que melhorou sobremaneira a qualidade dos resultados / Among the most common applications involving microarray, one can highlight the classification of tissue samples, which is essential for the correct identification of the occurrence of cancer and its type. This classification takes place with the aid of machine learning algorithms. Choosing the best algorithm for a given problem is not trivial. In this thesis, we studied the use of meta-learning as a viable solution. The experimental results confirmed the success of the application using a standard framework for characterizing data and constructing the recommendation. Thereafter, some improvements were made in these two aspects. Initially, a new set of meta-attributes was proposed, which are based on cluster validation indices. Then the kNN method for ranking construction was extended to weight the influence of nearest neighbors. In the context of meta-regression, the use of SVMs was introduced to estimate the performance of ranking algorithms. Decision trees were also employed for recommending algorithms. Due to their low performance, a ensemble of trees was employed, which greatly improved the quality of results
26

Application of Committee k-NN Classifiers for Gene Expression Profile Classification

Dhawan, Manik January 2008 (has links)
No description available.
27

Low dimensional structure in single cell data

Kunes, Russell Allen Zhang January 2024 (has links)
This thesis presents the development of three methods, each of which concerns the estimation of interpretable low dimensional representations of high dimensional data. The first two chapters consider methods for fitting low dimensional nonlinear representations. In Chapter 1, we discuss the deterministic input, noisy "and" gate (DINA) model and in Chapter 2, binary variational autoencoders. We present an example of application to single cell assay for transposase accessible chromatin sequencing data (single cell ATACseq), where the DINA model uncovers meaningful discrete representations of cell state. In scientific applications, practitioners have substantial prior knowledge of the latent components driving variation in the data. The third Chapter develops a supervised matrix factorization method, Spectra, that leverages annotations from experts and previous biological experiments to uncover latent representations of single cell RNAseq data. Variational inference for the DINA model: The deterministic input, noisy "and" gate (DINA) model allows for matrix decomposition where latent factors are allowed to interact via an "and" relationship. We develop a variational inference approach for estimating the parameters of the DINA model. Previous approaches based on variational inference enumerate the space of latent binary parameters (requiring exponential numbers of parameters) and cannot fit an unknown number of latent components. Here, we report that a practical mean field variational inference approach relying on a nonparametric cumulative shrinkage process prior and stochastic coordinate ascent updates achieves competitive results with existing methods while simultaneously determining the number of latent components. This approach allows scaling exploratory Q-matrix estimation to datasets of practical size with minimal hyperparameter tuning. Gradient estimation for binary latent variable models: In order to fit binary variational autoencoders, the gradient of the objective function must be estimated. Generally speaking, gradient estimation is often necessary for fitting generative models with discrete latent variables. Examples of this occur in contexts such as reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state of the art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators have potentially exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator bitflip-1 that has lower variance at the boundaries of the parameter space. As bitflip-1 has complementary properties to existing estimators, we introduce an aggregated estimator, unbiased gradient variance clipping (UGC) that uses either a bitflip-1 or a DisARM gradient update for each coordinate. We theoretically prove that UGC has uniformly lower variance than DisARM.Empirically, we observe that UGC achieves the optimal value of the optimization objectives in toy experiments, discrete VAE training, and in a best subset selection problem. The Spectra model for supervised matrix decomposition: Factor analysis decomposes single-cell gene expression data into a minimal set of gene programs that correspond to processes executed by cells in a sample. However, matrix factorization methods are prone to technical artifacts and poor factor interpretability. We address these concerns with Spectra, an algorithm that combines user-provided gene programs with the detection of novel programs that together best explain expression covariation. Spectra incorporates existing gene sets and cell type labels as prior biological information. It explicitly models cell type and represents input gene sets as a gene-gene knowledge graph, using a penalty function to guide factorization towards the input graph. We show that Spectra outperforms existing approaches in challenging tumor immune contexts: it finds factors that change under immune checkpoint therapy, disentangles the highly correlated features of CD8+ T-cell tumor reactivity and exhaustion, finds a program that explains continuous macrophage state changes under therapy, and identifies cell-type-specific immune metabolic programs.
28

In silico prediction of cis-regulatory elements of genes involved in hypoxic-ischaemic insult

Fu, Wai, 符慧 January 2006 (has links)
published_or_final_version / abstract / Paediatrics and Adolescent Medicine / Master / Master of Philosophy
29

Confounding effects in gene expression and their impact on downstream analysis

Lachmann, Alexander January 2016 (has links)
The reconstruction of gene regulatory networks is one of the milestones of computational system biology. We introduce a new implementation of ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) to reverse engineer transcriptional regulatory networks with improved mutual information estimators and significant improvement in performance. In the context of data driven network inference we identify two major confounding biases and introduce solutions to remove some of the discussed biases. First we identify prevalent spatial biases in gene expression studies derived from plate based designs. We investigate the gene expression profiles of a million samples from the LINCS dataset and find that the vast majority (96%) of the tested plates is affected by significant spatial bias. We can show that our proposed method to correct these biases results in a significant improvement of similarity between biological replicates assayed in different plates. Lastly we discuss the effect of CNV on gene expression and its confounding effect on the correlation landscape of genes in the context of cancer samples. We propose a method that removes the variance in gene expression explained by CNV and show that TF target predictions can be significantly improved.
30

Geometric algorithms for component analysis with a view to gene expression data analysis

Journée, Michel 04 June 2009 (has links)
The research reported in this thesis addresses the problem of component analysis, which aims at reducing large data to lower dimensions, to reveal the essential structure of the data. This problem is encountered in almost all areas of science - from physics and biology to finance, economics and psychometrics - where large data sets need to be analyzed. Several paradigms for component analysis are considered, e.g., principal component analysis, independent component analysis and sparse principal component analysis, which are naturally formulated as an optimization problem subject to constraints that endow the problem with a well-characterized matrix manifold structure. Component analysis is so cast in the realm of optimization on matrix manifolds. Algorithms for component analysis are subsequently derived that take advantage of the geometrical structure of the problem. When formalizing component analysis into an optimization framework, three main classes of problems are encountered, for which methods are proposed. We first consider the problem of optimizing a smooth function on the set of n-by-p real matrices with orthonormal columns. Then, a method is proposed to maximize a convex function on a compact manifold, which generalizes to this context the well-known power method that computes the dominant eigenvector of a matrix. Finally, we address the issue of solving problems defined in terms of large positive semidefinite matrices in a numerically efficient manner by using low-rank approximations of such matrices. The efficiency of the proposed algorithms for component analysis is evaluated on the analysis of gene expression data related to breast cancer, which encode the expression levels of thousands of genes gained from experiments on hundreds of cancerous cells. Such data provide a snapshot of the biological processes that occur in tumor cells and offer huge opportunities for an improved understanding of cancer. Thanks to an original framework to evaluate the biological significance of a set of components, well-known but also novel knowledge is inferred about the biological processes that underlie breast cancer. Hence, to summarize the thesis in one sentence: We adopt a geometric point of view to propose optimization algorithms performing component analysis, which, applied on large gene expression data, enable to reveal novel biological knowledge.

Page generated in 0.1175 seconds