Global ETD Search

31	Mixture Modeling and Outlier Detection in Microarray Data Analysis George, Nysia I. 16 January 2010 (has links) Microarray technology has become a dynamic tool in gene expression analysis because it allows for the simultaneous measurement of thousands of gene expressions. Uniqueness in experimental units and microarray data platforms, coupled with how gene expressions are obtained, makes the field open for interesting research questions. In this dissertation, we present our investigations of two independent studies related to microarray data analysis. First, we study a recent platform in biology and bioinformatics that compares the quality of genetic information from exfoliated colonocytes in fecal matter with genetic material from mucosa cells within the colon. Using the intraclass correlation coefficient (ICC) as a measure of reproducibility, we assess the reliability of density estimation obtained from preliminary analysis of fecal and mucosa data sets. Numerical findings clearly show that the distribution is composed of two components. For measurements between 0 and 1, it is natural to assume that the data points are from a beta-mixture distribution. We explore whether ICC values should be modeled with a beta mixture or transformed first and fit with a normal mixture. We find that the use of a mixture of normals in the inverse-probit transformed scale is less sensitive to model misspecification; otherwise a biased conclusion could be reached. By using the normal mixture approach to compare the ICC distributions of fecal and mucosa samples, we observe the quality of reproducible genes in fecal array data to be comparable with that in mucosa arrays. For microarray data, within-gene variance estimation is often challenging due to the high frequency of low-replication studies. Several methodologies have been developed to strengthen variance terms by borrowing information across genes.
However, even with such accommodations, variance may be inflated by the presence of outliers. For our second study, we propose a robust modification of optimal shrinkage variance estimation to improve outlier detection. In order to increase power, we suggest grouping standardized data so that information shared across genes is similar in distribution. Simulation studies and analysis of real colon cancer microarray data reveal that our methodology provides a technique which is insensitive to outliers, free of distributional assumptions, effective for small sample size, and data adaptive. gene expression data ICC colon cancer non-invasive screening mixture-modeling outlier detection shrinkage variance estimator variance microarray
32	Johnson's system of distributions and microarray data analysis George, Florence 01 June 2007 (has links) Microarray technology permits us to study the expression levels of thousands of genes simultaneously. The technique has a wide range of applications, including identification of genes that change their expression in cells due to disease or drug stimuli. This dissertation addresses statistical methods for the selection of differentially expressed genes in two experimental conditions. We propose two different methods for the selection of differentially expressed genes. The first method is a classical approach, where we consider a common distribution for the summary measure of equally expressed genes. To estimate this common distribution, the Johnson system of distributions is used. The advantage of using the Johnson system is that there is no need for a parametric assumption about the gene expression data. In contrast to other classical methods, the proposed method shares information across genes through the assumption of a common distribution for the summary measure of equally expressed genes. The second method performs gene selection using a mixture model approach and Bayes' theorem. This approach also uses the Johnson system of distributions to estimate the distribution of the summary measure. The Johnson system has the flexibility to cover a wide variety of distributional shapes. It provides a unique distribution corresponding to each pair of mathematically possible values of skewness and kurtosis. This flexibility is very useful in characterizing complicated data sets such as microarray data. In this dissertation we propose a novel algorithm for the estimation of the four parameters of the Johnson system. Gene expression data Differentially expressed genes Transformed distributions Bayes' formula Mixture model approach American Studies Arts and Humanities
33	Analysis And Prediction Of Gene Expression Patterns By Dynamical Systems, And By A Combinatorial Algorithm Tastan, Mesut 01 September 2005 (has links) (PDF) Modeling and prediction of gene-expression patterns has an important place in computational biology and bioinformatics. The measure of gene expression is determined from genomic analysis at the mRNA level by means of microarray technologies. Thus, mRNA analysis informs us not only about the genetic makeup of an organism but also about dynamic changes in the environment of that organism. Different mathematical methods have been developed for analyzing experimental data. In this study, we discuss the modeling approaches and the reasons why we concentrate on models derived from differential equations. We improve on the pioneering works in this field by including affine terms on the right-hand side of the nonlinear differential equations and by using Runge-Kutta instead of Euler discretization, in particular Heun's method. For stability analysis, we apply a modified Brayton and Tong algorithm to the time-discrete dynamics in an extended space. QA Differential Equations 370-387
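The difference between Euler and Heun discretization mentioned in this abstract can be illustrated with a minimal sketch. The function names and the two-gene linear-affine toy system below are assumptions made for illustration only; they are not the model or code of the thesis, which works with nonlinear equations.

```python
def euler_step(f, x, h):
    """One explicit Euler step for dx/dt = f(x)."""
    return [xi + h * ki for xi, ki in zip(x, f(x))]


def heun_step(f, x, h):
    """One Heun step (a second-order Runge-Kutta scheme) for dx/dt = f(x)."""
    k1 = f(x)                                           # slope at the current point
    x_pred = [xi + h * ki for xi, ki in zip(x, k1)]     # Euler predictor
    k2 = f(x_pred)                                      # slope at the predicted point
    # Corrector: advance with the average of the two slopes.
    return [xi + 0.5 * h * (a + b) for xi, a, b in zip(x, k1, k2)]


def integrate(step, f, x0, h, n_steps):
    """Apply a one-step method n_steps times starting from x0."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(f, x, h)
    return x


# Hypothetical toy system with an affine term c: dx/dt = M x + c.
M = [[-1.0, 0.5],
     [0.2, -0.8]]
c = [0.3, 0.1]


def rhs(x):
    return [sum(M[i][j] * x[j] for j in range(2)) + c[i] for i in range(2)]


x_euler = integrate(euler_step, rhs, [1.0, 0.0], 0.1, 50)
x_heun = integrate(heun_step, rhs, [1.0, 0.0], 0.1, 50)
```

Heun's method costs two right-hand-side evaluations per step instead of one, but its global error shrinks quadratically with the step size rather than linearly, which is the usual reason to prefer it over Euler discretization.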
34	A Novel Ensemble Method using Signed and Unsigned Graph Convolutional Networks for Predicting Mechanisms of Action of Small Molecules from Gene Expression Data Karim, Rashid Saadman 24 May 2022 (has links) No description available. Bioinformatics Graph Convolutional Neural Network Drug Mechanism of Action Prediction Ensemble Learning Unsigned and Signed Networks Bioinformatics Deep Learning on Gene Expression Data
35	Efficient Partially Observable Markov Decision Process Based Formulation Of Gene Regulatory Network Control Problem Erdogdu, Utku 01 April 2012 (has links) (PDF) The need to analyze and closely study gene-related mechanisms motivated research on the modeling and control of gene regulatory networks (GRN). Different approaches exist to model GRNs; they are mostly simulated as mathematical models that represent relationships between genes. Though it turns into a more challenging problem, we argue that partial observability would be a more natural and realistic method for handling the control of GRNs. Partial observability is a fundamental aspect of the problem; it is mostly ignored and substituted by the assumption that the states of the GRN are known precisely, referred to as full observability. On the other hand, current works addressing partial observability focus on formulating algorithms for the finite horizon GRN control problem. So, in this work we explore the feasibility of realizing the problem in a partially observable setting, mainly with Partially Observable Markov Decision Processes (POMDP). We proposed a POMDP formulation for the infinite horizon version of the problem. Knowing that POMDP problems suffer from the curse of dimensionality, we also proposed a POMDP solution method that automatically decomposes the problem by isolating different unrelated parts of the problem, and then solves the reduced subproblems. We also proposed a method to enrich gene expression data sets given as input to the POMDP control task, because in available data sets there are thousands of genes but only tens or rarely hundreds of samples. The method is based on the idea of generating more than one model using the available data sets, then sampling data from each of the models, and finally filtering the generated samples with the help of metrics that measure compatibility, diversity and coverage of the newly generated samples.
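Under partial observability the controller cannot see the network state directly and instead maintains a belief, a probability distribution over states. The standard POMDP belief update can be sketched as follows. The array layout and the two-state "gene OFF/ON" toy numbers are hypothetical and are not taken from the thesis.

```python
def belief_update(belief, action, observation, T, O):
    """Bayesian belief update for a discrete POMDP.

    T[a][s][s2] = P(s2 | s, a)   transition probabilities
    O[a][s2][o] = P(o | s2, a)   observation probabilities
    """
    n = len(belief)
    unnormalized = []
    for s2 in range(n):
        # Prediction: probability of reaching s2 under the current belief.
        predicted = sum(T[action][s][s2] * belief[s] for s in range(n))
        # Correction: weight by the likelihood of the received observation.
        unnormalized.append(O[action][s2][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical toy model: one gene (state 0 = OFF, 1 = ON), one action (0).
T = [[[0.9, 0.1],
      [0.2, 0.8]]]
O = [[[0.8, 0.2],    # in state OFF: P(observe 0), P(observe 1)
      [0.3, 0.7]]]   # in state ON

b = belief_update([0.5, 0.5], action=0, observation=1, T=T, O=O)
```

The belief vector has one entry per state, so its size grows exponentially with the number of genes; this is the dimensionality problem that motivates decomposing the network into unrelated subproblems.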
36	Identification of functional modules in the protein network of Saccharomyces cerevisiae by combining gene expression and protein interaction data Δημητρακοπούλου, Κωνσταντίνα 23 December 2008 (has links) In recent years, genomic research has been dominated by microarray technology, which has made it possible to measure the expression of thousands of genes quantitatively and simultaneously. Although gene expression data may contain noise and may not be fully objective, they nevertheless describe the expression of an organism's entire genome, something that was not feasible in previous decades. Another type of data that has contributed substantially to the understanding of the dynamic processes of the cell is protein-protein interaction data. Large-scale techniques such as the yeast two-hybrid system and mass spectrometry of purified protein complexes have produced a large amount of information about the relationships between gene products. This type of data is also characterized by many false interactions, and this work uses the most reliable of them. At the same time, an effort has begun to describe the dynamic processes of the cell through biological networks, e.g. gene, protein and metabolic networks. An even greater challenge is finding subnetworks with a biologically distinct role, which are called functional modules. Detecting such modules will contribute to understanding the relationships between genes or their products, and to annotating genes or proteins that have not yet been characterized. This work describes ways of clustering gene expression data, analyzes protein interaction networks in detail, and presents ways of clustering them. It also proposes integrating these data for the organism Saccharomyces cerevisiae in order to detect functional modules in its protein graph. In addition, the detection of these modules was implemented with a new algorithm, Detect Module from Seed Protein (DMSP), which does not partition the graph into groups as classical clustering methods do, but builds modules starting from a "seed" protein. 572.865 1 Bioinformatics Gene expression data Protein interaction network Functional module
37	Méthodes bayésiennes pour l'analyse génétique / Bayesian methods for gene expression factor analysis Bazot, Cécile 27 September 2013 (has links) In the past few years, genomics has received growing scientific interest, particularly since the map of the human genome was completed and published in the early 2000s. Currently, medical teams are facing a new challenge: processing the signals issued by DNA microarrays. These signals, often of voluminous size, allow one to discover the expression level of genes in a given tissue at a given time, under specific conditions (phenotype, treatment, ...), for an individual. The aim of this research is to identify characteristic temporal gene expression profiles of host response to a pathogen, in order to detect or even prevent a disease in a group of observed patients. The solutions developed in this thesis consist of the decomposition of these signals into elementary factors (genetic signatures) following a Bayesian linear mixing model, allowing for joint estimation of these factors and their relative contributions to each sample. The use of Markov chain Monte Carlo methods is particularly suitable for the proposed hierarchical Bayesian models, since they allow one to overcome the difficulties related to their computational complexity. Genetic analysis Factor analysis MCMC methods Bayesian inference Signal processing Gene expression data
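A linear mixing model of the kind this abstract describes can be written generically as below. The notation (G genes, N samples, R factors) and the Gaussian noise term are assumptions chosen for illustration; the priors and constraints actually used in the thesis are not given in the abstract and are not reproduced here.

```latex
% y_i : expression vector of sample i (length G)
% m_r : r-th elementary factor, or "genetic signature" (length G)
% a_{r,i} : contribution of factor r to sample i
% n_i : additive noise, assumed Gaussian here
\mathbf{y}_i = \sum_{r=1}^{R} a_{r,i}\,\mathbf{m}_r + \mathbf{n}_i,
\qquad i = 1, \dots, N,
\qquad\text{i.e.}\qquad
\mathbf{Y} = \mathbf{M}\mathbf{A} + \mathbf{N}.

% Joint estimation of factors and contributions targets the posterior
p(\mathbf{M}, \mathbf{A} \mid \mathbf{Y}) \;\propto\;
p(\mathbf{Y} \mid \mathbf{M}, \mathbf{A})\, p(\mathbf{M})\, p(\mathbf{A}).
```

Because this posterior rarely has a closed form once hierarchical priors are added, samples are drawn from it with Markov chain Monte Carlo and the estimates are computed from those samples.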
38	Computational Cancer Research: Network-based analysis of cancer data disentangles clinically relevant alterations from molecular measurements Seifert, Michael 12 September 2022 (has links) Cancer is a very complex genetic disease driven by combinations of mutated genes. This complexity strongly complicates the identification of driver genes and makes it enormously challenging to reveal how they influence cancerogenesis, prognosis or therapy response. Thousands of molecular profiles of the major human types of cancer have been measured in recent years. Apart from well-studied, frequently mutated genes, little is still known about the role of rarely mutated genes in cancer or the interplay of mutated genes in individual cancers. Gene expression and mutation profiles can be measured routinely, but computational methods for identifying driver candidates and predicting their potential impacts on downstream targets and clinically relevant characteristics are still rare. Instead of focusing only on frequently mutated genes, each cancer patient would be better analyzed by using the full information in their cancer-specific molecular profiles to improve the understanding of cancerogenesis and to predict prognosis and therapy response of individual patients more precisely. This requires novel computational methods for the integrative analysis of molecular cancer data. A promising way to realize this is to consider cancer as a disease of cellular networks. Therefore, I have developed a novel network-based approach for the integrative analysis of molecular cancer data over the last years. This approach directly learns gene regulatory networks from gene expression and copy number data and further makes it possible to quantify impacts of altered genes on clinically relevant downstream targets using network propagation. This habilitation thesis summarizes the results of seven of my publications.
All publications focus on the integrative analysis of molecular cancer data, with an overarching connection to the newly developed network-based approach. In the first three publications, networks were learned to identify major regulators that distinguish characteristic gene expression signatures, with applications to astrocytomas, oligodendrogliomas, and acute myeloid leukemia. Next, the central publication of this habilitation thesis, which combines network inference with network propagation, is introduced. The value of this approach is demonstrated by quantifying potential direct and indirect impacts of rare and frequent gene copy number alterations on patient survival. Further, the publication of the corresponding user-friendly R package regNet is introduced. Finally, two additional publications that also highlight the value of the developed network-based approach are presented, with the aims of predicting cancer gene candidates within the region of the 1p/19q co-deletion of oligodendrogliomas and determining driver candidates associated with radioresistance and relapse of prostate cancer. All seven publications are embedded in a brief introduction that motivates the scientific background and the major objectives of this thesis. The background moves briefly from the hallmarks of cancer through the complexity of cancer genomes to the importance of networks in cancer. This includes a short introduction to the mathematical concepts that underlie the developed network inference and network propagation algorithms. Further, I briefly motivate and summarize my studies before the original publications are presented. The habilitation thesis is completed with a general discussion of the major results, with a specific focus on the network-based data analysis strategies used. Major biologically and clinically relevant findings of each publication are also briefly summarized. info:eu-repo/classification/ddc/610 ddc:610
39	Statistical Methods for Genetic Pathway-Based Data Analysis Cheng, Lulu 13 November 2013 (has links) The wide application of genomic microarray technology has triggered a tremendous need for the development of high-dimensional genetic data analysis. Many statistical methods for microarray data analysis consider one gene at a time, but they may miss subtle changes at the single-gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge and are called "pathways". We have made contributions on two specific research topics related to high-dimensional genetic pathway data. One is to propose a semiparametric model for identifying pathways related to zero-inflated clinical outcomes; the other is to propose a multilevel Gaussian graphical model for exploring both pathway-level and gene-level network structures. For the first problem, we develop a semiparametric model via a Bayesian hierarchical framework. We model the pathway effect nonparametrically in a zero-inflated Poisson hierarchical regression model with an unknown link function. The nonparametric pathway effect is estimated via the kernel machine, and the unknown link function is estimated by transforming a mixture of beta cumulative density functions. Our approach provides flexible semiparametric settings to describe the complicated association between gene microarray expressions and the clinical outcomes. The Metropolis-within-Gibbs sampling algorithm and the Bayes factor are used to make the statistical inferences. Our simulation results support that the semiparametric approach is more accurate and flexible than zero-inflated Poisson regression with the canonical link function; this is especially true when the number of genes is large. The usefulness of our approach is demonstrated through its application to a canine gene expression data set (Enerson et al., 2006).
Our approaches can also be applied to other settings where a large number of highly correlated predictors are present. Unlike the first problem, the second one takes into account that pathways are not independent of each other because of shared genes and interactions among pathways. Multi-pathway analysis has been a challenging problem because of the complex dependence structure among pathways. By considering the dependency among pathways as well as among genes within each pathway, we propose a multi-level Gaussian graphical model (MGGM): one level is for the pathway network and the second one is for the gene network. We develop a multilevel L1 penalized likelihood approach to achieve sparseness on both levels. We also provide an iterative weighted graphical LASSO algorithm (Guo et al., 2011) for MGGM. Some asymptotic properties of the estimator are also illustrated. Our simulation results support the advantages of our approach: our method estimates the network more accurately on the pathway level and more sparsely on the gene level. We also demonstrate the usefulness of our approach using the canine genes-pathways data set. / Ph. D. Adaptive GLASSO Gaussian Random Process Gene Expression Data GLASSO Marginal Likelihood Multi-Level Gaussian Graphical Model Pathway-Based Analysis Unknown Link Estimation Zero Inflated Poisson.
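For orientation, the two standard building blocks named in this abstract can be written in their textbook forms. These are the generic single-level definitions, stated under assumed notation (mixing weight \pi, Poisson mean \mu, sample covariance S, precision matrix \Theta, penalty \lambda); the thesis's unknown link function and its multilevel penalty are not reproduced here.

```latex
% Zero-inflated Poisson: an extra point mass at zero with weight \pi
P(Y = 0) = \pi + (1 - \pi)\,e^{-\mu},
\qquad
P(Y = k) = (1 - \pi)\,\frac{e^{-\mu}\mu^{k}}{k!}, \quad k = 1, 2, \dots

% Graphical LASSO: L1-penalized Gaussian log-likelihood for the precision matrix
\hat{\Theta} = \arg\max_{\Theta \succ 0}
\Big\{ \log\det\Theta - \operatorname{tr}(S\,\Theta) - \lambda \lVert \Theta \rVert_{1} \Big\}
```

A zero entry in \hat{\Theta} corresponds to conditional independence between the two variables given all others, so a larger \lambda yields a sparser estimated network.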
40	Bioinformatic analyses for T helper cell subtypes discrimination and gene regulatory network reconstruction Kröger, Stefan 02 August 2017 (has links) Within the last two decades, high-throughput gene expression screening technologies have led to a rapid accumulation of experimental data. The amount of information available has enabled researchers to contrast and combine multiple experiments by synthesis; one such approach is called meta-analysis. In this thesis, we build a large gene expression data set based on publicly available studies for further research on T cell subtype discrimination and the reconstruction of T cell specific gene regulatory events. T cells are immune cells which have the ability to differentiate into subtypes with distinct functions, initiating and contributing to a variety of immune processes. To date, an unsolved problem in understanding the immune system is how T cells obtain a specific subtype differentiation program, which relates to subtype-specific gene regulatory mechanisms. We present an assembled expression data set which describes a specific T cell subset, regulatory T (Treg) cells, which can be further categorized into natural Treg (nTreg) and induced Treg (iTreg) cells. In our analysis we have addressed two specific challenges in regulatory T cell research: (i) discriminating between different Treg cell subtypes for characterization and functional analysis, and (ii) reconstructing T cell subtype specific gene regulatory mechanisms which determine the differences in subtype-specific roles for the immune system. Our meta-analysis strategy combines more than 150 microarray experiments from over 30 studies in one data set. This data set is applied to a machine learning based strategy of extracting surface protein markers to enable Treg cell subtype discrimination. We identified a set of 41 genes, six of which are surface markers, that distinguish between nTregs and iTregs based on gene expression profile only. Evaluation of these six genes in additional validation experiments confirmed their discriminative power, which indicates that our approach is suitable for extracting candidates for robust discrimination between experiment classes. Next, we identify gene regulatory interactions using existing reconstruction algorithms, aiming to extend the number of known gene-gene interactions for Treg cells. We applied eleven GRN reconstruction tools based on expression data only and compared their performance, evaluating the results against information from interaction databases. Taken together, our results suggest that the available methods are not yet sufficient to extend the current knowledge by inferring so far unreported Treg specific interactions. Finally, we present an approach for integrating multiple data sets based on different high-throughput technologies to reconstruct a subtype-specific GRN. We constructed a Th2 cell specific gene regulatory network of 100 genes. While 89 of these are known to be related to Th2 cell differentiation, we identified 11 new candidate genes with no previous association with Th2 cell differentiation. We show that our approach to data integration does, in principle, allow for the reconstruction of a complex network. Future availability of more, and more consistent, data may enable GRN reconstruction to improve the understanding of the causes and mechanisms of cellular differentiation in the immune system and beyond and, ultimately, of their dysfunctions and diseases. T cell gene expression data meta-analysis gene regulatory network reconstruction data integration microarray analysis feature selection 004 Data processing; Computer science WC 7700 ddc:004

Search results