Global ETD Search

41	Identification of functional modules in the protein network of Saccharomyces cerevisiae by combining gene expression and protein interaction data Δημητρακοπούλου, Κωνσταντίνα 23 December 2008 In recent years, genomic research has been dominated by microarray technology, which has made it possible to measure the expression of thousands of genes simultaneously and quantitatively. Although gene expression data may contain noise and may not be fully objective, they nevertheless describe the expression of an organism's entire genome, something that was not feasible in previous decades. Another type of data that has contributed decisively to the understanding of the cell's dynamic processes is protein-protein interaction data. Large-scale techniques such as the yeast two-hybrid system and mass spectrometry of purified protein complexes have produced a large amount of information about the relationships between gene products. This type of data also contains many false interactions, and this work uses the most reliable of them. At the same time, an effort began to describe the cell's dynamic processes through biological networks, e.g. gene, protein and metabolic networks. An even greater challenge is finding subnetworks with a biologically distinct role, which are called functional modules. Detecting such modules will contribute to understanding the relationships between genes or their products and to annotating genes or proteins that have not yet been characterized. This thesis describes methods for clustering gene expression data, analyses protein interaction networks in detail and presents methods for clustering them. 
It also proposes integrating the above data for the organism Saccharomyces cerevisiae in order to detect functional modules in its protein graph. The detection of these modules was implemented with a new algorithm, Detect Module from Seed Protein (DMSP), which does not partition the graph into groups as classical clustering methods do but builds modules starting from a "seed" protein. / 572.865 1 Bioinformatics Gene expression data Protein interaction network Functional module
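The seed-based idea in entry 41 can be illustrated with a small sketch. This is a generic greedy expansion, not the DMSP algorithm itself: the function name `grow_module`, the joining rule (a candidate must interact with at least `min_fraction` of the current module members) and the toy graph are all assumptions made for illustration.

```python
def grow_module(graph, seed, min_fraction=0.5):
    """Greedily grow a module around `seed` in an undirected graph.

    `graph` maps each protein to the set of its interaction partners.
    A candidate joins when it interacts with at least `min_fraction`
    of the current module members (an illustrative rule, not DMSP's).
    """
    module = {seed}
    changed = True
    while changed:
        changed = False
        # Candidates: proteins adjacent to the module but not yet in it.
        frontier = set().union(*(graph[p] for p in module)) - module
        for cand in sorted(frontier):
            links_into_module = len(graph[cand] & module)
            if links_into_module / len(module) >= min_fraction:
                module.add(cand)
                changed = True
    return module


# Hypothetical toy interaction graph: a triangle A-B-C with a tail C-D-E.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
```

Unlike a partitioning method, this kind of procedure returns one module per seed, so modules grown from different seeds may overlap or leave proteins unassigned.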
42	Méthodes bayésiennes pour l'analyse génétique / Bayesian methods for gene expression factor analysis Bazot, Cécile 27 September 2013 Ces dernières années, la génomique a connu un intérêt scientifique grandissant, notamment depuis la publication complète des cartes du génome humain au début des années 2000. À présent, les équipes médicales sont confrontées à un nouvel enjeu : l'exploitation des signaux délivrés par les puces ADN. Ces signaux, souvent de grande taille, permettent de connaître à un instant donné quel est le niveau d'expression des gènes dans un tissu considéré, sous des conditions particulières (phénotype, traitement, ...), pour un individu. Le but de cette recherche est d'identifier des séquences temporelles caractéristiques d'une pathologie, afin de détecter, voire de prévenir, une maladie chez un groupe de patients observés. Les solutions développées dans cette thèse consistent en la décomposition de ces signaux en facteurs élémentaires (ou signatures génétiques) selon un modèle bayésien de mélange linéaire, permettant une estimation conjointe de ces facteurs et de leur proportion dans chaque échantillon. L’utilisation de méthodes de Monte Carlo par chaînes de Markov sera tout particulièrement appropriée aux modèles bayésiens hiérarchiques proposés puisqu'elle permettra de surmonter les difficultés liées à leur complexité calculatoire. / In the past few years, genomics has received growing scientific interest, particularly since the map of the human genome was completed and published in the early 2000s. Currently, medical teams are facing a new challenge: processing the signals issued by DNA microarrays. These signals, often of voluminous size, allow one to discover the expression level of genes in a given tissue at any time, under specific conditions (phenotype, treatment, ...). 
The aim of this research is to identify characteristic temporal gene expression profiles of host response to a pathogen, in order to detect or even prevent a disease in a group of observed patients. The solutions developed in this thesis consist of the decomposition of these signals into elementary factors (genetic signatures) following a Bayesian linear mixing model, allowing for joint estimation of these factors and their relative contributions to each sample. The use of Markov chain Monte Carlo methods is particularly suitable for the proposed hierarchical Bayesian models. Indeed, they allow one to overcome the difficulties related to their computational complexity. Analyse génétique Méthodes MCMC Inférence bayésienne Traitement du signal Données d’expression des gènes Factor analysis MCMC methods Bayesian inference Signal processing Gene expression data
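As a reading aid, the linear mixing model described in entry 42 can be written in a generic form. The symbols are assumptions introduced here and need not match the thesis's notation: y_i is the expression vector of sample i, m_r is the r-th elementary factor (signature), a_{r,i} is the contribution of factor r to sample i, n_i is a noise term, and R is the number of factors.

```latex
\mathbf{y}_i = \sum_{r=1}^{R} a_{r,i}\,\mathbf{m}_r + \mathbf{n}_i,
\qquad i = 1,\dots,N,
\qquad\text{or in matrix form}\qquad
\mathbf{Y} = \mathbf{M}\mathbf{A} + \mathbf{N}.
```

In a Bayesian treatment, priors are placed on M, A and the noise parameters, and the joint posterior is explored by sampling, which is where Markov chain Monte Carlo methods come in. Any constraints on the contributions (for example non-negativity) would enter through those priors; the abstract does not specify them.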
43	Computational Cancer Research: Network-based analysis of cancer data disentangles clinically relevant alterations from molecular measurements Seifert, Michael 12 September 2022 Cancer is a very complex genetic disease driven by combinations of mutated genes. This complexity strongly complicates the identification of driver genes and makes it enormously challenging to reveal how they influence cancerogenesis, prognosis or therapy response. Thousands of molecular profiles of the major human types of cancer have been measured over the last years. Apart from well-studied frequently mutated genes, still only little is known about the role of rarely mutated genes in cancer or the interplay of mutated genes in individual cancers. Gene expression and mutation profiles can be measured routinely, but computational methods for the identification of driver candidates along with the prediction of their potential impacts on downstream targets and clinically relevant characteristics rarely exist. Instead of only focusing on frequently mutated genes, each cancer patient would be better analyzed using the full information in their cancer-specific molecular profiles to improve the understanding of cancerogenesis and to more precisely predict prognosis and therapy response of individual patients. This requires novel computational methods for the integrative analysis of molecular cancer data. A promising way to realize this is to consider cancer as a disease of cellular networks. Therefore, I have developed a novel network-based approach for the integrative analysis of molecular cancer data over the last years. This approach directly learns gene regulatory networks from gene expression and copy number data and further makes it possible to quantify impacts of altered genes on clinically relevant downstream targets using network propagation. This habilitation thesis summarizes the results of seven of my publications. 
All publications have a focus on the integrative analysis of molecular cancer data with an overarching connection to the newly developed network-based approach. In the first three publications, networks were learned to identify major regulators that distinguish characteristic gene expression signatures with applications to astrocytomas, oligodendrogliomas, and acute myeloid leukemia. Next, the central publication of this habilitation thesis, which combines network inference with network propagation, is introduced. The great value of this approach is demonstrated by quantifying potential direct and indirect impacts of rare and frequent gene copy number alterations on patient survival. Further, the publication of the corresponding user-friendly R package regNet is introduced. Finally, two additional publications that also strongly highlight the value of the developed network-based approach are presented with the aims to predict cancer gene candidates within the region of the 1p/19q co-deletion of oligodendrogliomas and to determine driver candidates associated with radioresistance and relapse of prostate cancer. All seven publications are embedded into a brief introduction that motivates the scientific background and the major objectives of this thesis. The background is briefly going from the hallmarks of cancer over the complexity of cancer genomes down to the importance of networks in cancer. This includes a short introduction of the mathematical concepts that underlie the developed network inference and network propagation algorithms. Further, I briefly motivate and summarize my studies before the original publications are presented. The habilitation thesis is completed with a general discussion of the major results with a specific focus on the utilized network-based data analysis strategies. Major biologically and clinically relevant findings of each publication are also briefly summarized. info:eu-repo/classification/ddc/610 ddc:610
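The notion of network propagation mentioned in entry 43 can be sketched in a few lines. This is a generic illustration and not the regNet implementation: the function name `propagate`, the fixed number of steps and the toy weights are assumptions. The sketch accumulates the signed effect of one gene on all others along directed paths of bounded length.

```python
def propagate(weights, source, steps=3):
    """Accumulate the impact of `source` on every gene along paths of
    up to `steps` edges. `weights[i][j]` is the signed effect of gene i
    on gene j (0.0 if there is no edge)."""
    n = len(weights)
    current = [0.0] * n
    current[source] = 1.0
    total = [0.0] * n
    for _ in range(steps):
        # One propagation step: push the current signal along all edges.
        current = [
            sum(current[i] * weights[i][j] for i in range(n))
            for j in range(n)
        ]
        total = [t + c for t, c in zip(total, current)]
    return total


# Hypothetical toy chain: gene 0 -> gene 1 (0.5), gene 1 -> gene 2 (-0.4).
toy_weights = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, -0.4],
    [0.0, 0.0, 0.0],
]
impacts = propagate(toy_weights, source=0)
```

In this toy chain gene 0 affects gene 2 only indirectly, through gene 1, which is the kind of indirect impact that a propagation step makes visible.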
44	Statistical Methods for Genetic Pathway-Based Data Analysis Cheng, Lulu 13 November 2013 The wide application of genomic microarray technology has triggered a tremendous need for methods to analyze high-dimensional genetic data. Many statistical methods for microarray data analysis consider one gene at a time, but they may miss subtle changes at the single gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge and are called "pathways". We have made contributions on two specific research topics related to high-dimensional genetic pathway data. One is to propose a semiparametric model for identifying pathways related to zero-inflated clinical outcomes; the other is to propose a multilevel Gaussian graphical model for exploring both pathway and gene level network structures. For the first problem, we develop a semiparametric model via a Bayesian hierarchical framework. We model the pathway effect nonparametrically in a zero-inflated Poisson hierarchical regression model with unknown link function. The nonparametric pathway effect is estimated via the kernel machine and the unknown link function is estimated by transforming a mixture of beta cumulative distribution functions. Our approach provides flexible semiparametric settings to describe the complicated association between gene microarray expressions and the clinical outcomes. The Metropolis-within-Gibbs sampling algorithm and Bayes factor are used to make the statistical inferences. Our simulation results support that the semiparametric approach is more accurate and flexible than zero-inflated Poisson regression with the canonical link function; this is especially true when the number of genes is large. The usefulness of our approaches is demonstrated through their application to a canine gene expression data set (Enerson et al., 2006). 
Our approaches can also be applied to other settings where a large number of highly correlated predictors are present. Unlike the first problem, the second takes into account that pathways are not independent of each other because of shared genes and interactions among pathways. Multi-pathway analysis has been a challenging problem because of the complex dependence structure among pathways. By considering the dependency among pathways as well as genes within each pathway, we propose a multi-level Gaussian graphical model (MGGM): one level is for the pathway network and the second one is for the gene network. We develop a multilevel L1 penalized likelihood approach to achieve sparseness on both levels. We also provide an iterative weighted graphical LASSO algorithm (Guo et al., 2011) for MGGM. Some asymptotic properties of the estimator are also illustrated. Our simulation results support the advantages of our approach; our method estimates the network more accurately on the pathway level, and more sparsely on the gene level. We also demonstrate the usefulness of our approach using the canine genes-pathways data set. / Ph. D. Adaptive GLASSO Gaussian Random Process Gene Expression Data GLASSO Marginal Likelihood Multi-Level Gaussian Graphical Model Pathway-Based Analysis Unknown Link Estimation Zero Inflated Poisson.
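For readers unfamiliar with the zero-inflated Poisson model named in entry 44, its standard probability mass function is given below. The symbols are assumptions introduced here: π is the probability of a structural zero and λ is the Poisson mean.

```latex
P(Y = 0) = \pi + (1 - \pi)\,e^{-\lambda},
\qquad
P(Y = y) = (1 - \pi)\,\frac{e^{-\lambda}\lambda^{y}}{y!},
\quad y = 1, 2, \dots
```

With the canonical link the mean is tied to the covariates through log λ = xᵀβ. The thesis instead treats the link as unknown and estimates it, so this log link is only the baseline it is compared against.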
45	DifFUZZY : a novel clustering algorithm for systems biology Cominetti Allende, Ornella Cecilia January 2012 Current studies of the highly complex pathobiology and molecular signatures of human disease require the analysis of large sets of high-throughput data, from clinical to genetic expression experiments, containing a wide range of information types. A number of computational techniques are used to analyse such high-dimensional bioinformatics data. In this thesis we focus on the development of a novel soft clustering technique, DifFUZZY, a fuzzy clustering algorithm applicable to a larger class of problems than other soft clustering approaches. This method is better at handling datasets that contain clusters that are curved, elongated or are of different dispersion. We show how DifFUZZY outperforms a number of frequently used clustering algorithms using a number of examples of synthetic and real datasets. Furthermore, a quality measure based on the diffusion distance developed for DifFUZZY is presented, which is employed to automate the choice of its main parameter. We later apply DifFUZZY and other techniques to data from a clinical study of children from The Gambia with different types of severe malaria. The first step was to identify the most informative features in the dataset which allowed us to separate the different groups of patients. This led to us reproducing the World Health Organisation classification for severe malaria syndromes and obtaining a reduced dataset for further analysis. In order to validate these features as relevant for malaria across the continent and not only in The Gambia, we used a larger dataset for children from different sites in Sub-Saharan Africa. With the use of a novel network visualisation algorithm, we identified pathobiological clusters from which we made and subsequently verified clinical hypotheses. 
We finish by presenting conclusions and future directions, including image segmentation and clustering time-series data. We also suggest how we could bridge data modelling with bioinformatics by embedding microarray data into cell models. Towards this end we take as a case study a multiscale model of the intestinal crypt using a cell-vertex model. 518.1
46	Variational Approximations and Other Topics in Mixture Models Dang, Sanjeena 24 August 2012 Mixture model-based clustering has become an increasingly popular data analysis technique since its introduction almost fifty years ago. Families of mixture models are said to arise when the component parameters, usually the component covariance matrices, are decomposed and a number of constraints are imposed. Within the family setting, it is necessary to choose the member of the family --- i.e., the appropriate covariance structure --- in addition to the number of mixture components. To date, the Bayesian information criterion (BIC) has proved most effective for this model selection process, and the expectation-maximization (EM) algorithm has been predominantly used for parameter estimation. We deviate from the EM-BIC rubric, using variational Bayes approximations for parameter estimation and the deviance information criterion (DIC) for model selection. The variational Bayes approach alleviates some of the computational complexities associated with the EM algorithm. We use this approach on the most famous family of Gaussian mixture models known as Gaussian parsimonious clustering models (GPCM). These models have an eigen-decomposed covariance structure. Cluster-weighted modelling (CWM) is another flexible statistical framework for modelling local relationships in heterogeneous populations on the basis of weighted combinations of local models. In particular, we extend cluster-weighted models to include an underlying latent factor structure of the independent variable, resulting in a novel family of models known as parsimonious cluster-weighted factor analyzers. The EM-BIC rubric is utilized for parameter estimation and model selection. Some work on a mixture of multivariate t-distributions is also presented, with a linear model for the mean and a modified Cholesky-decomposed covariance structure leading to a novel family of mixture models. 
In addition to model-based clustering, these models are also used for model-based classification, i.e., semi-supervised clustering. Parameters are estimated using the EM algorithm and another approach to model selection other than the BIC is also considered. / NSERC PGS-D High-dimensional data Variational Bayes Approximations Mixture Models EM Algorithm Factor Analyzers Longitudinal Data Gene Expression Data Cluster-Weighted Models Classification Clustering Model-based clustering Family of Mixture Models Model-based Classification Cluster-Weighted Factor Analyzers
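The model selection criterion that entry 46 takes as its point of departure can be stated briefly. The notation is an assumption introduced here: L(θ̂) is the maximized likelihood, ρ the number of free parameters and n the number of observations.

```latex
\mathrm{BIC} = 2\log L(\hat{\theta}) - \rho \log n
```

Under this sign convention, which is common in the model-based clustering literature, the model with the largest BIC is preferred; other texts use the negated form and minimize it. Within a family of mixture models, the criterion is evaluated for each combination of covariance structure and number of components.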
47	Seleção de características a partir da integração de dados por meio de análise de variação de número de cópias (CNV) para associação genótipo-fenótipo de doenças complexas Meneguin, Christian Reis January 2018 Advisor: Prof. Dr. David Corrêa Martins Júnior / Master's dissertation - Universidade Federal do ABC, Programa de Pós-Graduação em Ciência da Computação, Santo André, 2018. / As pesquisas em biologia sistêmica caracterizam-se pela interdisciplinaridade, a compreensão com visão ampla sobre as interações ocorridas internamente em organismos biológicos, hereditariedade e a influência de fatores ambientais. Neste cenário, é constituída uma rede complexa de interações na qual seus componentes são de diferentes tipos, como as variações do número de cópias (Copy Number Variation - CNVs), genes, entre outros. As doenças complexas que ocorrem neste contexto normalmente são consequências de perturbações intracelulares e intercelulares em tecidos e órgãos, sendo desenvolvidas de forma multifatorial, ou seja, a causa e o desenvolvimento dessas doenças são fruto de diversos fatores genéticos e ambientais. Nos últimos anos, tem sido produzido um volume bastante elevado de dados biológicos gerados por técnicas de sequenciamento de alto desempenho, requerendo pesquisas que envolvam uma análise integrada desses dados. As CNVs, ou seja, a variação no número de repetições de subsequências de DNA entre indivíduos, se mostram úteis visto que estão relacionadas com outros tipos de dados como genes e dados de expressão gênica (abundâncias de mRNAs transcritos pelos genes em diferentes contextos). Devido à natureza heterogênea e à imensa quantidade de dados, a análise integrativa é um desafio computacional para o qual abordagens vêm sendo propostas. 
Neste sentido, nesta dissertação foi proposto um método que realiza a integração de dados (CNVs, dados de expressão gênica, haploinsuficiência, imprint, entre outros) por meio de um processo que permite identificar trechos comuns de CNVs entre amostras de diferentes indivíduos, sejam estas amostras de caso ou de controle e que possuem informações obtidas a partir das integrações feitas. Com este processo, o método aqui proposto diferencia-se dos métodos que realizam integração de dados por meio da análise de sobreposição dos dados biológicos, mas não geram novos dados contendo intervalos de CNVs existentes entre as amostras. O método proposto foi analisado com base no estudo de caso do autismo (Transtornos do Espectro Autista - TEA). O autismo, além de ser considerado uma doença complexa, possui algumas particularidades que dificultam o seu estudo quando comparado a outros tipos de doenças complexas como o câncer, por exemplo. Foram realizados dois experimentos que envolveram dados dos CNVs de indivíduos com TEA (caso) e indivíduos sem este transtorno (controle). Também foi feito um experimento utilizando amostras de CNVs de TEA e amostras de CNVs relacionados a outras doenças do neurodesenvolvimento. Os experimentos envolveram a integração dos tipos de dados propostos. Foi possível identificar trechos de CNVs que estão presentes somente em amostras associadas aos casos e não em controles, ou cenários de trechos de CNVs presentes em amostras de TEA e ausentes nas amostras de outras doenças do neurodesenvolvimento, e vice-versa. Os resultados também refletiram a tendência de indivíduos do gênero masculino serem mais afetados por TEA em relação ao feminino. Foi possível também identificar genes associados e informações como o biotipo e se estão presentes em dados de haploinsuficiência, imprint ou ainda dados de expressão agrupados em regiões e períodos. 
Finalmente, análises de enriquecimento das listas de genes dos CNVs resultantes do método apontam para diversas vias relacionadas com o TEA, tais como as vias de sinalização do receptor toll-like dependente de TRIF, do ácido gama-aminobutírico (GABA), de transmissão sináptica e secreção neurotransmissora, de recepção da insulina, de percepção sensorial olfativa, e de adesão celular independente de cálcio. / Research in systems biology is characterized by interdisciplinarity and a wide-ranging understanding of interactions within biological organisms, heredity, and the influence of environmental factors. In this scenario, a complex network of interactions is constituted of different types of components, such as CNVs (Copy Number Variations), genes, and others. Complex diseases that occur in this context are usually consequences of intracellular and intercellular perturbations in tissues and organs and develop in a multifactorial way, i.e., the cause and development of these diseases are the result of various genetic and environmental factors. In recent years, a very large volume of biological data has been generated by high-throughput sequencing techniques, requiring research that involves an integrated analysis of these data. CNVs, i.e., variation between individuals in the number of repeats of DNA subsequences, are useful because they are related to other types of data such as genes and gene expression data (abundances of mRNAs transcribed by genes in different contexts). Due to the heterogeneous nature and the immense amount of data, integrative analysis is a computational challenge for which approaches have been proposed. In this dissertation, a method is proposed that integrates data (CNVs, gene expression data, haploinsufficiency, imprinting, among others) through a process that makes it possible to identify common segments of CNVs between samples of different individuals, whether case or control samples, annotated with the information obtained from the integration performed. 
In this respect, the method proposed here differs from methods that carry out data integration by analysing the overlap of the biological data but do not generate new data containing the CNV intervals shared between the samples. The proposed method was analyzed on the basis of a case study of Autism Spectrum Disorder (ASD). Besides being considered a complex disease, ASD has some peculiarities that hinder its study when compared to other types of complex diseases such as cancer, for example. Two experiments were carried out that involved CNV data from individuals with ASD (case) and individuals without this disorder (control). An experiment was also done using samples of ASD CNVs and CNV samples related to other neurodevelopmental diseases. The experiments involved the integration of the proposed data types. Among the results, the method identified CNV segments that are present only in samples associated with the cases and not in controls, as well as CNV segments present in ASD samples and absent from samples of other neurodevelopmental diseases, and vice versa. The results also reflected the tendency for males to be more affected by ASD than females. For the CNV segments in certain results, it was possible to identify associated genes and information such as the biotype and whether they are present in haploinsufficiency data, imprinting data or expression data grouped by regions and periods. Finally, enrichment analyses involving lists of genes from the resulting CNVs point to several pathways related to ASD, such as TRIF-dependent toll-like receptor signaling, gamma-aminobutyric acid (GABA) signaling, synaptic transmission and neurotransmitter secretion, insulin reception, olfactory sensory perception, and calcium-independent cell-cell adhesion. 
VARIAÇÃO NO NÚMERO DE CÓPIAS DADOS DE EXPRESSÃO GÊNICA DOENÇAS COMPLEXAS INTEGRAÇÃO DE DADOS MINERAÇÃO DE DADOS COPY NUMBER VARIATION GENE EXPRESSION DATA COMPLEX DISEASES DATA INTEGRATION DATA MINING
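The core operation described in entry 47, finding CNV stretches shared by two samples, can be sketched as an interval intersection. This is a minimal illustration under stated assumptions and not the dissertation's method: the function name `common_segments`, the half-open coordinate convention and the toy coordinates are all hypothetical, and both inputs are assumed to lie on the same chromosome.

```python
def common_segments(a, b):
    """Return the overlapping stretches of two sorted, non-overlapping
    interval lists. Intervals are (start, end) pairs with end exclusive."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical CNV coordinates for two samples on one chromosome.
sample_1 = [(100, 500), (800, 1200)]
sample_2 = [(300, 900), (1100, 1500)]
shared = common_segments(sample_1, sample_2)
```

The result is a new set of intervals rather than a yes/no overlap flag, which mirrors the distinction the abstract draws between generating shared segments and merely testing for overlap.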
48	Bioinformatic analyses for T helper cell subtypes discrimination and gene regulatory network reconstruction Kröger, Stefan 02 August 2017 Die Etablierung von Hochdurchsatz-Technologien zur Durchführung von Genexpressionsmessungen führte in den letzten 20 Jahren zu einer stetig wachsenden Menge an verfügbaren Daten. Sie ermöglichen durch Kombination einzelner Experimente neue Vergleichsstudien zu kombinieren oder Experimente aus verschiedenen Studien zu großen Datensätzen zu vereinen. Dieses Vorgehen wird als Meta-Analyse bezeichnet und in dieser Arbeit verwendet, um einen großen Genexpressionsdatensatz aus öffentlich zugänglichen T-Zell Experimenten zu erstellen. T-Zellen sind Immunzellen, die eine Vielzahl von unterschiedlichen Funktionen des Immunsystems initiieren und steuern. Sie können in verschiedene Subtypen mit unterschiedlichen Funktionen differenzieren. Der mittels Meta-Analyse erstellte Datensatz beinhaltet nur Experimente zu einem T-Zell-Subtyp, den regulatorischen T-Zellen (Treg) bzw. der beiden Untergruppen, natürliche Treg (nTreg) und induzierte Treg (iTreg) Zellen. Eine bisher unbeantwortete Frage lautet, welche subtyp-spezifischen gen-regulatorischen Mechanismen die T-Zell Differenzierung steuern. Dazu werden in dieser Arbeit zwei spezifische Herausforderungen der Treg Forschung behandelt: (i) die Identifikation von Zelloberflächenmarkern zur Unterscheidung und Charakterisierung der Subtypen, sowie (ii) die Rekonstruktion von Treg-Zell-spezifischen gen-regulatorischen Netzwerken (GRN), die die Differenzierungsmechanismen beschreiben. Die implementierte Meta-Analyse kombiniert mehr als 150 Microarray-Experimente aus über 30 Studien in einem Datensatz. Dieser wird benutzt, um mittels Machine Learning Zell-spezifische Oberflächenmarker an Hand ihres Expressionsprofils zu identifizieren. Mit der in dieser Arbeit entwickelten Methode wurden 41 Gene extrahiert, von denen sechs Oberflächenmarker sind. 
Zusätzliche Validierungsexperimente zeigten, dass diese sechs Gene die Experimente beider T-Zell Subtypen sicher unterscheiden können. Zur Rekonstruktion von GRNs vergleichen wir unter Verwendung des erstellten Datensatzes 11 verschiedene Algorithmen und evaluieren die Ergebnisse mit Informationen aus Interaktionsdatenbanken. Die Evaluierung zeigt, dass die derzeit verfügbaren Methoden nicht in der Lage sind den Wissensstand Treg-spezifischer, regulatorischer Mechanismen zu erweitern. Abschließend präsentieren wir eine Datenintegrationstrategie zur Rekonstruktion von GRN am Beispiel von Th2 Zellen. Aus Hochdurchsatzexperimenten wird ein Th2-spezifisches GRN bestehend aus 100 Genen rekonstruiert. Während 89 dieser Gene im Kontext der Th2-Zelldifferenzierung bekannt sind, wurden 11 neue Kandidatengene ohne bisherige Assoziation zur Th2-Differenzierung ermittelt. Die Ergebnisse zeigen, dass Datenintegration prinzipiell die GRN Rekonstruktion ermöglicht. Mit der Verfügbarkeit von mehr Daten mit besserer Qualität ist zu erwarten, dass Methoden zur Rekonstruktion maßgeblich zum besseren Verstehen der zellulären Differenzierung im Immunsystem und darüber hinaus beitragen können und so letztlich die Ursachenforschung von Dysfunktionen und Krankheiten des Immunsystems ermöglichen werden. / Within the last two decades high-throughput gene expression screening technologies have led to a rapid accumulation of experimental data. The amounts of information available have enabled researchers to contrast and combine multiple experiments by synthesis; one such approach is called meta-analysis. In this thesis, we build a large gene expression data set based on publicly available studies for further research on T cell subtype discrimination and the reconstruction of T cell specific gene regulatory events. T cells are immune cells which have the ability to differentiate into subtypes with distinct functions, initiating and contributing to a variety of immune processes. 
To date, an unsolved problem in understanding the immune system is how T cells obtain a specific subtype differentiation program, which relates to subtype-specific gene regulatory mechanisms. We present an assembled expression data set which describes a specific T cell subset, regulatory T (Treg) cells, which can be further categorized into natural Treg (nTreg) and induced Treg (iTreg) cells. In our analysis we have addressed specific challenges in regulatory T cell research: (i) discriminating between different Treg cell subtypes for characterization and functional analysis, and (ii) reconstructing T cell subtype specific gene regulatory mechanisms which determine the differences in subtype-specific roles for the immune system. Our meta-analysis strategy combines more than one hundred microarray experiments. This data set is applied to a machine learning based strategy of extracting surface protein markers to enable Treg cell subtype discrimination. We identified a set of 41 genes which distinguish between nTregs and iTregs based on gene expression profile only. Evaluation of six of these genes confirmed their discriminative power which indicates that our approach is suitable to extract candidates for robust discrimination between experiment classes. Next, we identify gene regulatory interactions using existing reconstruction algorithms aiming to extend the number of known gene-gene interactions for Treg cells. We applied eleven GRN reconstruction tools based on expression data only and compared their performance. Taken together, our results suggest that the available methods are not yet sufficient to extend the current knowledge by inferring so far unreported Treg specific interactions. Finally, we present an approach of integrating multiple data sets based on different high-throughput technologies to reconstruct a subtype-specific GRN. We constructed a Th2 cell specific gene regulatory network of 100 genes. 
While 89 of these are known to be related to Th2 cell differentiation, we were able to attribute 11 new candidate genes with a function in Th2 cell differentiation. We show that our approach to data integration does, in principle, allow for the reconstruction of a complex network. Future availability of more and more consistent data may enable the use of the concept of GRN reconstruction to improve understanding causes and mechanisms of cellular differentiation in the immune system and beyond and, ultimately, their dysfunctions and diseases. T-Zelle Microarray Genexpressionsdaten Feature Selection Datenintegration gen-regulatorische Interaktionen Netzwerkrekonstruktion Meta-Analyse T cell gene expression data meta-analysis gene regulatory network reconstruction data integration microarray analysis feature selection 004 Informatik WC 7700 ddc:004
49	應用存活分析在微陣列資料的基因表面定型之探討 / Gene Expression Profiling with Survival Analysis on Microarray Data 張仲凱, Chang, Chunf-Kai Unknown Date 如何藉由DNA微陣列資料跟存活資料的資訊來找出基因表現定型一直是個重要的議題。這些研究的主要目標是從大量的基因中找出那些真正跟存活時間或其它重要的臨床結果有顯著關係的小部分。Threshold Gradient Directed Regularization (TGDR)是一種已經被應用在高維度迴歸問題中能同時處理變數選取以及模型配適的演算法。然而，TGDR採用一種梯度投影型態的演算法使得收斂速率緩慢。在本篇論文中，我們建議新的包含Newton-Raphson求解演算法類型的改良版TGDR方法。我們建議的方法有類似TGDR的特性但卻有比較快的收斂速率。文中並利用一筆附有設限存活時間的真實微陣列癌症資料來做示範。本篇論文的第二部份是關於適用於區間設限存活資料的重複抽樣Peto-Peto檢定。這個重複抽樣Peto-Peto檢定能夠評估存活函數估計方法的檢定力，例如Turnbull的估計方法以及Kaplan-Meier的估計方法。這個檢定方法顯示出在區間設限資料時Kaplan-Meier的估計方法的檢定力要比Turnbull的估計方法的檢定力來得低。這個檢定方法將以模擬的區間設限資料以及一筆真實關於乳癌研究的區間設限資料來說明。 / Analyzing censored survival data with high-dimensional covariates arising from microarray data has been an important issue. The main goal is to find genes that have a pivotal influence on patients' survival time or other important clinical outcomes. The Threshold Gradient Directed Regularization (TGDR) method has been used for simultaneous variable selection and model building in high-dimensional regression problems. However, the TGDR method adopts a gradient-projection type of method and therefore converges slowly. In this thesis, we propose modified TGDR algorithms which incorporate a Newton-Raphson type of search algorithm. Our proposed approaches have characteristics similar to TGDR but faster convergence rates. A real cancer microarray data set with censored survival times is used for demonstration. The second part of this thesis is about a proposed resampling-based Peto-Peto test for survival functions on interval-censored data. The proposed resampling-based Peto-Peto test can evaluate the power of survival function estimation methods, such as Turnbull's procedure and the Kaplan-Meier estimate. The test shows that the power based on the Kaplan-Meier estimate is lower than that based on Turnbull's estimation on interval-censored data. 
The proposed test is demonstrated on simulated data and on real interval-censored data from a breast cancer study. Gene expression data Censored survival data Cox proportional hazards model Resampling-based Peto-Peto test
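For readers unfamiliar with the Kaplan-Meier estimate mentioned in this abstract, the following is a minimal sketch of the standard product-limit estimator for right-censored data. It is not the thesis's code, and it does not cover the interval-censored setting, Turnbull's procedure, or the resampling-based Peto-Peto test; the function name `kaplan_meier` and its arguments are illustrative assumptions.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate for right-censored data.

    times  : observed time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    (Hypothetical helper for illustration only.)
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # subjects leaving the risk set at each time
    at_risk = len(times)          # everyone is at risk before the first time
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Events and censorings at t both leave the risk set afterwards.
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is why the estimator assumes the exact event or censoring time is known. With interval-censored data that assumption fails, which is the situation the abstract's comparison with Turnbull's estimate addresses.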
50	A signal transduction score flow algorithm for cyclic cellular pathway analysis, which combines transcriptome and ChIP-seq data Isik, Zerrin, Ersahin, Tulin, Atalay, Volkan, Aykanat, Cevdet, Cetin-Atalay, Rengul 08 April 2014 Determination of cell signalling behaviour is crucial for understanding the physiological response to a specific stimulus or drug treatment. Current approaches for large-scale data analysis do not effectively incorporate critical topological information provided by the signalling network. We herein describe a novel model- and data-driven hybrid approach, or signal transduction score flow algorithm, which allows quantitative visualization of cyclic cell signalling pathways that lead to ultimate cell responses such as survival, migration or death. This score flow algorithm represents signalling pathways as directed graphs and maps experimental data, including negative and positive feedback, onto gene nodes as scores, which then computationally traverse the signalling pathway until a pre-defined biological target response is attained. Initially, experimental data-driven enrichment scores of the genes in a pathway were computed; then a heuristic approach was applied, using the gene score partition as a solution for protein node stoichiometry during dynamic scoring of the pathway of interest. Incorporating a score partition during the signal flow, together with cyclic feedback loops in the signalling pathway, significantly improves the usefulness of this model compared to other approaches. Evaluation of the score flow algorithm using signalling pathways generated from both transcriptome and ChIP-seq data showed good correlation with expected cellular behaviour on both KEGG and manually generated pathways. Implementation of the algorithm as a Cytoscape plug-in allows interactive visualization and analysis of KEGG pathways as well as user-generated and curated Cytoscape pathways. 
Moreover, the algorithm accurately predicts gene-level and global impacts of single or multiple in silico gene knockouts. / This publication is freely accessible with the permission of the rights holder under a (DFG-funded) Alliance or National Licence. Kyoto Encyclopedia of Genes and Genomes KEGG ChIP-Seq computer-aided data analysis gene expression data biological pathways breast cancer collaborative construction molecular networks binding sites tool deregulation environment ontology ddc:540 ddc:610 ddc:570 rvk:VA 1120 rvk:XA 10000 rvk:WA 15000
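The idea of scores traversing a directed, possibly cyclic pathway until they reach a target can be illustrated with a toy sketch. This is not the authors' algorithm: the equal split of a node's score among its successors, the `max_steps` bound used to cut off cycles, and the name `score_flow` are all assumptions made for illustration.

```python
def score_flow(edges, scores, target, max_steps=100):
    """Push node scores along directed edges and sum what reaches `target`.

    edges  : dict mapping a node to the list of its successor nodes
    scores : dict mapping a node to its initial score
    target : node standing in for the biological response of interest
    Assumption: each node splits its score equally among its successors.
    """
    in_transit = dict(scores)
    arrived = 0.0
    for _ in range(max_steps):            # bound the work done on cycles
        next_transit = {}
        for node, score in in_transit.items():
            if node == target:
                arrived += score          # the target absorbs the score
                continue
            successors = edges.get(node, [])
            if not successors:
                continue                  # dead end: the score is dropped
            share = score / len(successors)
            for nxt in successors:
                next_transit[nxt] = next_transit.get(nxt, 0.0) + share
        if not next_transit:
            break                         # nothing left to propagate
        in_transit = next_transit
    return arrived


if __name__ == "__main__":
    # A feeds B and C; C feeds back to A, forming a cycle; B reaches T.
    pathway = {"A": ["B", "C"], "B": ["T"], "C": ["A"]}
    print(score_flow(pathway, {"A": 1.0}, "T"))
```

In this toy pathway half of A's score reaches T on each pass around the A-C loop, so the total at T approaches the initial score as the step bound grows. An in silico knockout could be mimicked in the same sketch by deleting a node's entry from `edges` and `scores` and comparing the totals.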
