  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Learning gene interactions from gene expression data dynamic Bayesian networks

Sigursteinsdottir, Gudrun January 2004
Microarray experiments generate vast amounts of data that evidently reflect many aspects of the underlying biological processes. A major challenge in computational biology is to extract, from such data, significant information and knowledge about the complex interplay between genes/proteins. An analytical approach that has recently gained much interest is reverse engineering of genetic networks. This is a very challenging approach, primarily due to the dimensionality of the gene expression data (many genes, few time points) and the potentially low information content of the data. Bayesian networks (BNs) and their extension, dynamic Bayesian networks (DBNs), are statistical machine learning approaches that have become popular for reverse engineering. In the present study, a DBN learning algorithm was applied to gene expression data produced from experiments that aimed to study the etiology of necrotizing enterocolitis (NEC), an inflammatory gastrointestinal (GI) disease that is the most common GI emergency in neonates. The data sets were particularly challenging for the DBN learning algorithm in that they contain gene expression measurements for relatively few time points, between which the sampling intervals are long. The aim of this study was, therefore, to evaluate the applicability of DBNs when learning genetic networks for the NEC disease, i.e. from the above-mentioned data sets, and to use biological knowledge to assess the hypothesized gene interactions. From the results, it was concluded that the NEC gene expression data sets were not informative enough for effective derivation of genetic networks for the NEC disease with DBNs and Bayesian learning.
22

Analysis of Additive Risk Model with High Dimensional Covariates Using Correlation Principal Component Regression

Wang, Guoshen 22 April 2008
One problem of interest is to relate genes to patients' survival outcomes in order to build regression models that predict future patients' survival from their gene expression data. Applying the semiparametric additive risk model of survival analysis, this thesis proposes a new approach to the analysis of gene expression data, with a focus on the model's predictive ability. The method modifies correlation principal component regression to handle the censoring problem of survival data. We also employ the time-dependent AUC and RMSEP to assess how well the model predicts survival time. Furthermore, the proposed method is able to identify significant genes that are related to the disease. Finally, the proposed approach is illustrated with a simulated data set, the diffuse large B-cell lymphoma (DLBCL) data set, and a breast cancer data set. The results show that the model fits both of the data sets very well.
23

Analyzing Gene Expression Data in Terms of Gene Sets: Gene Set Enrichment Analysis

Li, Wei 01 December 2009
DNA microarray technology simultaneously monitors the expression of thousands of genes and aims to identify genes that are differentially expressed under different conditions. From a statistical point of view, the task can be restated as identifying genes strongly associated with the response or covariate of interest. Gene Set Enrichment Analysis (GSEA) is a method that focuses the analysis on functionally related gene sets instead of single genes. It helps biologists interpret DNA microarray data using their prior biological knowledge of the genes in a gene set. GSEA has been shown to efficiently identify gene sets containing known disease-related genes in real experiments. Here we evaluate the statistical power of this method through simulation studies. The results show that the power of GSEA is good enough to identify the gene sets highly associated with the response or covariate of interest.
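The abstract above does not say which enrichment statistic the thesis uses. As a rough illustration only, the sketch below computes the classic unweighted running-sum enrichment score over a ranked gene list; the function name `enrichment_score` and the toy gene labels are hypothetical, and the weighting scheme is an assumption rather than the thesis's method.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov style).

    ranked_genes: genes ordered from most to least associated with the response.
    gene_set: the genes whose enrichment near the top or bottom is being scored.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set)
    n_hit = sum(1 for g in ranked_genes if g in members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for g in ranked_genes:
        # Step up on a gene-set member, step down otherwise.
        running += 1.0 / n_hit if g in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking (hypothetical labels): a set concentrated at the top scores
# positive, a set concentrated at the bottom scores negative.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranking, {"g1", "g2"}))
print(enrichment_score(ranking, {"g5", "g6"}))
```

In practice the score is compared against a null distribution obtained by permuting sample labels or gene labels, which is also how a simulation study of statistical power would typically be set up.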
24

Variance of Difference as Distance Like Measure in Time Series Microarray Data Clustering

Mukhopadhyay, Sayan January 2014
Our intention is to find similarity among the time-series expression profiles of genes in microarray experiments. It is hypothesized that, at a given time point, the concentration of one gene’s mRNA is directly affected by the concentration of other genes’ mRNA, and that this may have biological significance. We define the dissimilarity between two time-series data sets as the variance of the Euclidean distances at each time point. The large number of gene expression profiles makes calculating the variance of distance at each point computationally expensive in terms of execution time. For this reason we use an autoregressive model that reduces a nineteen-point gene expression series to a three-point vector. This allows us to find the variance of difference between two data sets without point-to-point matching. Previous analysis of microarray experiment data found that 62 genes are regulated following EGF (Epidermal Growth Factor) and HRG (Heregulin) treatment of MCF-7 breast cancer cells. We chose these suspected cancer-related genes as our reference and investigated which additional genes have similar time-point expression profiles. Using variance of difference as the distance measure, we applied several methods for clustering the gene expression data, such as our own maximum-clique-finding heuristics and hierarchical clustering. The results were validated through a text mining study. New predictions from our study could be a basis for further investigations into the genesis of breast cancer. Overall, 84 new genes were found, of which 57 are related to cancer; of those, 35 are associated with breast cancer.
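The dissimilarity described above can be sketched in a few lines. This is a minimal illustration that assumes scalar expression values and uses the signed pointwise difference, as the thesis title suggests; the name `variance_of_difference` and the toy profiles are hypothetical, and the autoregressive reduction step is not shown.

```python
from statistics import pvariance


def variance_of_difference(x, y):
    """Population variance of the pointwise differences of two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    return pvariance(diffs)


# Toy profiles (hypothetical values): gene_b is gene_a shifted by a constant,
# so the two profiles have the same shape and the dissimilarity is zero.
gene_a = [1.0, 2.0, 4.0, 3.0]
gene_b = [2.0, 3.0, 5.0, 4.0]
gene_c = [4.0, 1.0, 0.0, 3.0]

print(variance_of_difference(gene_a, gene_b))
print(variance_of_difference(gene_a, gene_c))
```

A measure of this kind ignores constant offsets between profiles and responds only to differences in shape, which is why it can group genes whose expression rises and falls together at different absolute levels.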
25

Meta-aprendizagem aplicada à classificação de dados de expressão gênica / Meta-learning applied to gene expression data classification

Bruno Feres de Souza 26 October 2010
Among the most common applications involving microarrays, one can highlight the classification of tissue samples, which is essential for the correct identification of the occurrence of cancer and its type. This classification is carried out with the aid of machine learning algorithms. Choosing the best algorithm for a given problem is not trivial. In this doctoral thesis, we studied the use of meta-learning as a viable solution. The experimental results confirmed the success of the application using a standard framework for characterizing the data and constructing the recommendation. Thereafter, improvements were sought in these two aspects. Initially, a new set of meta-attributes based on cluster validation indices was proposed. Then the kNN method for ranking construction was extended to weight the influence of the nearest neighbors. In the context of meta-regression, the use of SVMs was introduced to estimate the performance of classification algorithms. Decision trees were also employed for recommending algorithms. Given their inferior performance, an ensemble of trees was employed, which greatly improved the quality of the results.
26

Application of Committee k-NN Classifiers for Gene Expression Profile Classification

Dhawan, Manik January 2008
No description available.
27

Reporting and analyzing alternative clustering solutions by employing multi-objective genetic algorithm and conducting experiments on cancer data

Peng, P., Addam, O., Elzohbi, M., Ozyer, S., Elhajj, Ahmad, Gao, S., Liu, Y., Ozyer, T., Kaya, M., Ridley, Mick J., Rokne, J., Alhajj, R. 14 November 2013
Clustering is an essential research problem that has received considerable attention in the research community for decades. It is a challenge because there is no unique solution that fits all problems and satisfies all applications. We aim to find the most appropriate clustering solution for a given application domain. Clustering algorithms in general need prior specification of the number of clusters, and this is hard even for domain experts to estimate, especially in a dynamic environment where the data changes and/or becomes available incrementally. In this paper, we describe and analyze the effectiveness of a robust clustering algorithm that integrates a multi-objective genetic algorithm into a framework capable of producing alternative clustering solutions; it is called Multi-objective K-Means Genetic Algorithm (MOKGA). We investigate its application for clustering a variety of datasets, including microarray gene expression data. The reported results are promising. Though we concentrate on gene expression and mostly cancer data, the proposed approach is general enough and works equally well for clustering other datasets, as demonstrated by the two datasets Iris and Ruspini. After running MOKGA, a Pareto-optimal front is obtained, which gives the optimal number of clusters as a solution set. The achieved clustering results are then analyzed and validated under several cluster validity techniques proposed in the literature. As a result, the optimal clusters are ranked for each validity index. We apply majority voting to decide on the most appropriate set of validity indexes applicable to every tested dataset. The proposed clustering approach is tested by conducting experiments using seven well-cited benchmark data sets. The obtained results are compared with those reported in the literature to demonstrate the applicability and effectiveness of the proposed approach.
28

In silico prediction of cis-regulatory elements of genes involved in hypoxic-ischaemic insult

Fu, Wai, 符慧 January 2006
Paediatrics and Adolescent Medicine / Master of Philosophy
29

Confounding effects in gene expression and their impact on downstream analysis

Lachmann, Alexander January 2016
The reconstruction of gene regulatory networks is one of the milestones of computational systems biology. We introduce a new implementation of ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) to reverse engineer transcriptional regulatory networks, with improved mutual information estimators and a significant improvement in performance. In the context of data-driven network inference, we identify two major confounding biases and introduce solutions to remove some of them. First, we identify prevalent spatial biases in gene expression studies derived from plate-based designs. We investigate the gene expression profiles of a million samples from the LINCS dataset and find that the vast majority (96%) of the tested plates are affected by significant spatial bias. We show that our proposed method to correct these biases results in a significant improvement in similarity between biological replicates assayed in different plates. Lastly, we discuss the effect of CNV on gene expression and its confounding effect on the correlation landscape of genes in the context of cancer samples. We propose a method that removes the variance in gene expression explained by CNV and show that TF target predictions can be significantly improved.
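The abstract does not describe how the variance explained by CNV is removed. One common way to do this kind of correction, shown here purely as an assumed stand-in, is to fit an ordinary least-squares line of expression on copy number for each gene and keep the residuals. The function name `remove_cnv_effect` and the toy values are hypothetical.

```python
from statistics import mean


def remove_cnv_effect(expression, copy_number):
    """Return residuals of a per-gene least-squares fit of expression on copy number.

    Both arguments hold one value per sample for a single gene. The residuals
    are centered and linearly uncorrelated with copy number by construction.
    """
    if len(expression) != len(copy_number):
        raise ValueError("expression and copy_number must have the same length")
    mx, my = mean(copy_number), mean(expression)
    sxx = sum((x - mx) ** 2 for x in copy_number)
    if sxx == 0:
        # Copy number is constant across samples, so it explains no variance.
        return [y - my for y in expression]
    sxy = sum((x - mx) * (y - my) for x, y in zip(copy_number, expression))
    slope = sxy / sxx
    return [(y - my) - slope * (x - mx) for x, y in zip(copy_number, expression)]


# Toy data (hypothetical values): expression roughly tracks copy number.
copy_number = [1.0, 2.0, 3.0, 4.0, 2.0]
expression = [3.1, 4.9, 7.2, 8.8, 5.0]
residuals = remove_cnv_effect(expression, copy_number)
print(residuals)
```

Correlations computed on such residuals no longer reflect co-amplification or co-deletion of neighboring genes, which is the confounding effect the abstract refers to.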
30

Geometric algorithms for component analysis with a view to gene expression data analysis

Journée, Michel 04 June 2009
The research reported in this thesis addresses the problem of component analysis, which aims at reducing large data to lower dimensions in order to reveal the essential structure of the data. This problem is encountered in almost all areas of science, from physics and biology to finance, economics and psychometrics, where large data sets need to be analyzed. Several paradigms for component analysis are considered, e.g., principal component analysis, independent component analysis and sparse principal component analysis, which are naturally formulated as optimization problems subject to constraints that endow the problem with a well-characterized matrix manifold structure. Component analysis is thus cast in the realm of optimization on matrix manifolds. Algorithms for component analysis are subsequently derived that take advantage of the geometrical structure of the problem. When formalizing component analysis in an optimization framework, three main classes of problems are encountered, for which methods are proposed. We first consider the problem of optimizing a smooth function on the set of n-by-p real matrices with orthonormal columns. Then, a method is proposed to maximize a convex function on a compact manifold, which generalizes to this context the well-known power method that computes the dominant eigenvector of a matrix. Finally, we address the issue of solving problems defined in terms of large positive semidefinite matrices in a numerically efficient manner by using low-rank approximations of such matrices. The efficiency of the proposed algorithms for component analysis is evaluated on the analysis of gene expression data related to breast cancer, which encode the expression levels of thousands of genes obtained from experiments on hundreds of cancerous cells. Such data provide a snapshot of the biological processes that occur in tumor cells and offer huge opportunities for an improved understanding of cancer. 
Thanks to an original framework for evaluating the biological significance of a set of components, well-known as well as novel knowledge is inferred about the biological processes that underlie breast cancer. Hence, to summarize the thesis in one sentence: we adopt a geometric point of view to propose optimization algorithms that perform component analysis and that, applied to large gene expression data, reveal novel biological knowledge.
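For readers unfamiliar with the power method mentioned in the abstract above, the following is a minimal sketch of the textbook algorithm for a small dense matrix. It is not the generalized manifold method proposed in the thesis; the function name `power_method`, the fixed iteration count, and the example matrix are illustrative assumptions.

```python
import math


def power_method(matrix, iterations=200):
    """Approximate the dominant eigenpair of a square matrix by repeated multiplication.

    matrix: list of rows, each a list of floats.
    Returns (eigenvalue, eigenvector), with the eigenvector scaled to unit length.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("iterate collapsed to zero; choose another start vector")
        v = [c / norm for c in w]
    # Rayleigh quotient of the final unit vector estimates the eigenvalue.
    mv = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


# Example matrix (hypothetical): eigenvalues are 2 and 1.
value, vector = power_method([[2.0, 0.0], [0.0, 1.0]])
print(value, vector)
```

Each iteration multiplies by the matrix and renormalizes, so the component along the dominant eigenvector grows relative to the others. Convergence is slow when the two largest eigenvalues are close in magnitude.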
