41 |
Conception d'heuristiques d'optimisation pour les problèmes de grande dimension : application à l'analyse de données de puces à ADN / Heuristics implementation for high-dimensional problem optimization : application in microarray data analysis
Gardeux, Vincent 30 November 2011
This PhD thesis addresses the recent problem of solving high-dimensional optimization problems. We present methods designed to solve them and their applications, notably to feature selection in the data mining field. In the first part of this thesis, we set out the challenges of solving high-dimensional problems. We mainly investigate line search methods, which we consider particularly suitable for such problems. We then present the methods we developed on this principle: CUS, EUS and EM323. We emphasize, in particular, the very high convergence speed of CUS and EUS and their simplicity of implementation. The EM323 method is a hybridization of EUS and a one-dimensional optimization algorithm developed by F. Glover: the 3-2-3 algorithm. We show that EM323 obtains more accurate results, especially for non-separable problems, which are the weak point of line-search-based methods. In the second part, we focus on data mining problems, especially microarray data analysis. The objectives are to classify the data and to predict the behavior of new samples. First, a collaboration with the Tenon Hospital in Paris allows us to analyze private breast cancer data. To this end, we develop an exact method, called delta-test, later enhanced by a method that automatically selects the number of variables. Second, we develop a heuristic feature selection method, named ABEUS, based on optimizing the performance of the DLDA classifier. The results obtained on publicly available data show that our methods select very small subsets of variables, which is an important criterion for avoiding overfitting.
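To make the idea of a line search applied one dimension at a time more concrete, here is a minimal sketch of a generic coordinate-wise search with step halving. It is not the thesis's CUS, EUS or EM323 algorithm; the function name `coordinate_line_search`, its parameters and the `sphere` test function are assumptions chosen for illustration only.

```python
from typing import Callable, List, Sequence, Tuple


def coordinate_line_search(
    f: Callable[[List[float]], float],
    x0: Sequence[float],
    step: float = 1.0,
    min_step: float = 1e-8,
    max_evals: int = 100_000,
) -> Tuple[List[float], float]:
    """Minimize f by probing one coordinate at a time (illustrative only)."""
    x = list(x0)
    best = f(x)
    evals = 1
    h = step
    while h > min_step and evals < max_evals:
        improved = False
        for i in range(len(x)):
            # Try a move of +h, then -h, along dimension i only.
            for delta in (h, -h):
                old = x[i]
                x[i] = old + delta
                value = f(x)
                evals += 1
                if value < best:
                    best = value
                    improved = True
                    break  # keep the improving move
                x[i] = old  # otherwise undo it
        if not improved:
            h *= 0.5  # no dimension improved: refine the step
    return x, best


def sphere(x: List[float]) -> float:
    """Separable test function with its minimum at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    solution, value = coordinate_line_search(sphere, [3.0, -2.0, 1.5])
    print(solution, value)
```

Because each move changes a single coordinate, a scheme like this is cheap per step and works well on separable functions such as `sphere`; on non-separable functions, where the best value of one variable depends on the others, it can stall, which is the weakness the abstract attributes to line-search-based methods.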
|
42 |
Expressão gênica diferencial durante a esporulação de Blastocladiella emersonii e estudo da sinalização por GMP cíclico / Differential gene expression during Blastocladiella emersonii sporulation and analysis of the cyclic GMP signaling pathway
Vieira, André Luiz Gomes 24 April 2009
In this work, we analyzed global gene expression changes during the sporulation phase of the aquatic fungus Blastocladiella emersonii using cDNA microarrays on slides containing 3,773 distinct genes. A total of 615 genes were upregulated and 645 were downregulated during sporulation. The overrepresented functional categories among the induced genes were: microtubule and cytoskeleton, signal transduction, Ca2+ binding activity, proteolysis (only at the beginning of sporulation), and chromosome biogenesis and organization (only at the end of sporulation). Among the downregulated genes, the overrepresented functional categories were: protein biosynthesis, carbohydrate transport, and energy metabolism. Comparison of the sporulation expression data with those recently obtained in our laboratory for germination showed that a large number of genes are inversely regulated across the two differentiation stages of the B. emersonii life cycle: many genes induced during sporulation are repressed during germination, and vice versa. We also analyzed the effects of glucose and tryptophan on gene expression during zoospore formation, since these nutrients are able to inhibit B. emersonii sporulation. Our results showed that in the presence of glucose (1%), genes related to cytoskeleton composition and activity were overexpressed, while in the presence of the amino acid tryptophan, genes involved in protein folding, proteolysis and the oxidative stress response were induced. In addition, genes involved in the sporulation process itself were downregulated by tryptophan treatment. We also investigated the cyclic GMP (cGMP) signaling pathway, as the levels of this cyclic nucleotide increase considerably during B. emersonii sporulation.
We began by searching the B. emersonii EST databank (http://blasto.iq.usp.br) for sequences encoding enzymes involved in cGMP synthesis and degradation. We found three ESTs encoding catalytic domains that appear to belong to three distinct guanylate cyclases, and one EST encoding a phosphodiesterase with high similarity to phosphodiesterases that have high affinity for cGMP. cDNA microarray experiments, validated by real-time quantitative RT-PCR, showed that the four transcripts are induced during sporulation, reaching maximum levels in the late stage of sporulation, when zoospore biogenesis occurs. In addition, data obtained from in vivo and in vitro experiments using inhibitors of the enzymes guanylate cyclase and nitric oxide synthase indicated the involvement of the Ca2+ ion and the free radical nitric oxide (•NO) in guanylate cyclase activity, suggesting the existence of a Ca2+-•NO-cGMP signaling pathway.
|
43 |
Identificação de perfis de expressão de RNAs codificadores e não codificadores de proteína como preditores de recorrência de câncer de próstata / Identification of protein-coding and non-coding RNA expression profiles as prognostic marker of prostate cancer biochemical recurrence
Moreira, Yuri José de Camargo Barros 27 August 2010
Prostate cancer is the fifth most common type of cancer in the world, and the most common in men. Clinical and anatomopathological factors currently used in the clinic cannot distinguish between indolent and aggressive disease. There is a major need for new prognostic markers to improve the clinical management of prostate cancer patients. Apart from abnormalities in protein-coding genes, changes in non-coding RNAs (ncRNAs) contribute to the pathogenesis of cancer and thus represent another potential source of prostate cancer biomarkers. However, few studies of ncRNA expression profiles have been published to date. This project aimed to identify expression profiles of protein-coding and non-coding genes correlated with prostate cancer biochemical recurrence, in order to generate a prognostic profile with potential use as a biomarker and to elucidate the possible role of ncRNAs in cancer development. To this end, we analyzed the expression profiles of 42 prostate cancer tumor tissue samples from patients who underwent radical prostatectomy, with long clinical follow-up (five years) and known disease outcome. We used a custom microarray designed by us and manufactured by Agilent, which probes approximately 18,709 long (>500 nt) non-coding transcripts, without evidence of splicing, that map to intronic regions within 5,660 genomic loci. The expression data were extracted from each array and normalized across all 42 patient samples.
Using a multiple random sampling validation strategy, we identified a poor-prognosis expression profile comprising 51 intronic ncRNAs. The ncRNA prognostic profile was applied to an independent test set of 22 patients, correctly classifying 82% of the samples. A Kaplan-Meier analysis of the test set indicated that the survival curves of the high-risk and low-risk groups were significantly different (log-rank test p = 0.0009; hazard ratio = 23.4, 95% CI = 3.62 to 151.2), confirming that this classifier is useful for identifying patients at high risk of recurrence. Furthermore, these findings indicate a potential role for these intronic non-coding RNAs in the progression of prostate tumors and point to intronic ncRNAs as potential new cancer markers.
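For readers unfamiliar with the Kaplan-Meier analysis mentioned in this abstract, the following is a minimal sketch of the product-limit estimator on which such survival curves are based. The function name `kaplan_meier` and the follow-up times and event flags are hypothetical; they are not the study's data or code.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event (e.g. recurrence) was observed at times[i]
    and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events observed exactly at time t
        leaving = 0   # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - observed / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up times (years) and event indicators.
example_times = [1, 2, 2, 3, 4]
example_events = [1, 1, 0, 1, 0]
example_curve = kaplan_meier(example_times, example_events)

if __name__ == "__main__":
    for t, s in example_curve:
        print(f"t={t}: S(t)={s:.2f}")
```

Censored patients leave the risk set without lowering the curve, which is why the estimator suits follow-up studies in which not every patient has reached an endpoint. Comparing two such curves, as in the abstract, additionally requires a test such as the log-rank test, which is not shown here.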
|
44 |
Bayesian models for DNA microarray data analysis
Lee, Kyeong Eun 29 August 2005
Selection of significant genes via expression patterns is important in a microarray problem. Owing to small sample size and large number of variables (genes), the selection process can be unstable. This research proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables in a regression setting and use a Bayesian mixture prior to perform the variable selection. Due to the binary nature of the data, the posterior distributions of the parameters are not in explicit form, and we need to use a combination of truncated sampling and Markov Chain Monte Carlo (MCMC) based computation techniques to simulate the posterior distributions. The Bayesian model is flexible enough to identify the significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays. In particular, the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify the set of significant genes to classify BRCA1 and others. Microarray data can also be applied to survival models. We address the issue of how to reduce the dimension in model building by selecting significant genes as well as assessing the estimated survival curves. Additionally, we consider the well-known Weibull regression and semiparametric proportional hazards (PH) models for survival analysis. With microarray data, we need to consider the case where the number of covariates p exceeds the number of samples n. Specifically, for a given vector of response values, which are times to event (death or censored times) and p gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the responsible genes, which are controlling the survival time. This approach enables us to estimate the survival curve when n << p. In our approach, rather than fixing the number of selected genes, we will assign a prior distribution to this number.
The approach creates additional flexibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov Chain Monte Carlo (MCMC) method. We demonstrate the use of the methodology with (a) diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and (b) Breast Carcinoma data. Lastly, we propose a mixture of Dirichlet process models using discrete wavelet transform for curve clustering. In order to characterize these time-course gene expressions, we consider them as trajectory functions of time and gene-specific parameters and obtain their wavelet coefficients by a discrete wavelet transform. We then build cluster curves using a mixture of Dirichlet process priors.
|
45 |
Gene Expression in the Brains of Two Lines of Chicken Divergently Selected for High and Low Body Weight
Ka, Sojeong January 2009
Artificial divergent selection of chickens for high and low body weight at 8 weeks of age has produced two lines: the high (HWS) and low (LWS) body weight chicken lines. In addition to the difference in body weight, the lines show extreme differences in feeding behaviour and body composition. The aim of this study was to uncover the genetic and molecular factors that contribute to and determine these differences, especially regarding body energy regulation and appetite. In papers I and II, genome-wide gene expression in a brain sample containing hypothalamus and in dissected hypothalamus was analysed using DNA microarray and qRT-PCR. We found that levels of differential expression were generally moderate, which was consistent with the idea that polygenic factors were involved in the establishment of the chicken lines. Genes associated with neural plasticity, lipid metabolism and body energy regulation were differentially expressed. This result indicated that the neural systems regulating feeding behaviour and body weight were altered in the chicken lines. However, genes that were involved in the central melanocortin system were not systematically differentially expressed. Interestingly, the biggest differences in expression between the lines were found in endogenous retrovirus sequences of the ALV subgroup E. Thus, in paper III, we characterized the number of integrations, the expression of ALVE retroviral elements and their effects on body weight. A significant correlation between low body weight and high ALVE expression was observed in female F9 birds from an HWS x LWS advanced intercross line. This implied that ev-loci contributing to increased ALVE expression levels were genetically linked to loci influencing the low body weight of the pullets. In paper IV, the carnitine palmitoyltransferase-1b gene (CPT1B), which was highly differentially expressed in the hypothalami, was investigated. We mapped chicken CPT1B to the distal tip of chromosome 1p.
The levels of CPT1B mRNA in the HWS line were higher in the hypothalamus and lower in muscle than in the LWS line. This pattern of differential expression indicates that this gene could contribute to the remarkable phenotypic differences between HWS and LWS chickens. However, comparison with quantitative trait loci data showed that the expression of CPT1B is a trans effect, rather than a direct causative locus. In conclusion, the data suggested that the long-term selection for body weight resulted in differential gene expression in the brains of the selected chicken lines. These results may have relevance for the poultry industry and will also contribute to increasing knowledge about human diseases such as obesity and anorexia.
|
46 |
Computational methods for analysis and modeling of time-course gene expression data
Wu, Fangxiang 31 August 2004
Genes encode proteins, some of which in turn regulate other genes. Such interactions make up gene regulatory relationships or (dynamic) gene regulatory networks. With advances in the measurement technology for gene expression and in genome sequencing, it has become possible to measure the expression level of thousands of genes simultaneously in a cell at a series of time points over a specific biological process. Such time-course gene expression data may provide a snapshot of most (if not all) of the interesting genes and may lead to a better understanding of gene regulatory relationships and networks. However, inferring either gene regulatory relationships or networks places high demands on computational methods, which must be capable of mining large quantities of time-course gene expression data while reducing the complexity of the data to make them comprehensible. This dissertation presents several computational methods for inferring gene regulatory relationships and gene regulatory networks from time-course gene expression data. These methods are the result of the author's doctoral study.
Cluster analysis plays an important role in inferring gene regulatory relationships, for example, in uncovering new regulons (sets of co-regulated genes) and their putative cis-regulatory elements. Two dynamic model-based clustering methods, namely Markov chain model (MCM)-based clustering and autoregressive model (ARM)-based clustering, are developed for time-course gene expression data. However, gene regulatory relationships based on cluster analysis are static and thus do not describe the dynamic evolution of gene expression over an observation period. The gene regulatory network is believed to be a time-varying system. Consequently, a state-space model for dynamic gene regulatory networks from time-course gene expression data is developed. To account for the complex time-delayed relationships in gene regulatory networks, the state-space model is extended to include time delays. Finally, a method based on genetic algorithms is developed to infer the time-delayed relationships in gene regulatory networks. All of these methods are validated on experimental data available from well-cited public databases.
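As a rough illustration of what fitting an autoregressive model to a single gene's time course involves, the sketch below estimates the coefficient of a first-order model x_t = a * x_{t-1} + noise by least squares. It is not the dissertation's ARM-based clustering method; the function name `fit_ar1` and the example series are assumptions made for illustration.

```python
from typing import Sequence


def fit_ar1(series: Sequence[float]) -> float:
    """Least-squares estimate of a in x_t = a * x_{t-1} + noise."""
    if len(series) < 2:
        raise ValueError("need at least two time points")
    # Closed-form solution: a = sum(x_t * x_{t-1}) / sum(x_{t-1}^2).
    numerator = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    denominator = sum(v * v for v in series[:-1])
    if denominator == 0:
        raise ValueError("series has no variation to fit")
    return numerator / denominator


# Hypothetical expression profile that halves at every time point.
example_profile = [1.0, 0.5, 0.25, 0.125]
example_coefficient = fit_ar1(example_profile)

if __name__ == "__main__":
    print(example_coefficient)
```

In a model-based clustering setting, genes whose profiles are well described by similar fitted dynamics would tend to be grouped together; how the dissertation defines and optimizes that grouping is not stated in the abstract.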
|
49 |
High Resolution Genotyping of Chlamydia trachomatis
Christerson, Linus January 2011
Chlamydia trachomatis is an obligate intracellular bacterium of major human health concern, causing urogenital chlamydia infections, lymphogranuloma venereum (LGV) and trachoma. Chlamydia is one of the most common sexually transmitted infections worldwide and can cause infertility. In the first four papers described herein we used a high resolution multilocus sequence typing (MLST) system to investigate the epidemiology of C. trachomatis, and showed that MLST is superior to conventional ompA genotyping with respect to resolution. In the fifth paper we simplified the methodology by developing and validating a multilocus typing (MLT) DNA microarray based on the MLST system. In more detail, MLST analysis of consecutive specimens from 2006 in Örebro County in Sweden, and comparison to specimens from 1999-2000, showed that the new variant C. trachomatis (nvCT) is monoclonal and likely has appeared in recent years. MLST analysis of LGV specimens from men who have sex with men (MSM) showed that the increase of LGV in Europe in the last decade was indeed a clonal outbreak, in contrast to the USA, where LGV might have been present all along. In the third paper, clinical symptoms could not be correlated with the MLST genotypes, suggesting, together with the combined results of all previous studies, that bacterial factors, if important, need to be understood in the context of host factors. MLST analysis of specimens from a high incidence C. trachomatis area in North Norway revealed interesting epidemiological details concerning unusual genetic variants, the nvCT and MSM, but found no significant difference in genetic diversity compared to two other geographic areas in Norway. Lastly, we developed an MLT array that provides high resolution while being rapid and cost-effective, which makes it an interesting alternative for C. trachomatis genotyping.
In conclusion, the MLST system and the MLT array have proven to be useful tools and should now be applied in further investigations to improve our understanding of C. trachomatis epidemiology.
|
50 |
Expressão gênica diferencial durante a esporulação de Blastocladiella emersonii e estudo da sinalização por GMP cíclico / Differential gene expression during Blastocladiella emersonii sporulation and analysis of the cyclic GMP signaling pathwayAndré Luiz Gomes Vieira 24 April 2009 (has links)
In the present work, we analyzed global changes in gene expression during the sporulation phase of the aquatic fungus Blastocladiella emersonii using cDNA microarrays on slides containing 3,773 distinct genes. In total, 615 genes were classified as induced and 645 as repressed over the course of sporulation. The over-represented functional categories among the induced genes were microtubule and cytoskeleton, signal transduction, Ca2+ binding activity, proteolysis (only at the beginning of sporulation), and chromosome biogenesis and organization (only at the end of sporulation). Among the repressed genes, the over-represented functional categories were protein biosynthesis, carbohydrate transport, and energy metabolism. Comparing the sporulation expression data with those recently obtained in our laboratory for germination showed that a large number of genes are inversely regulated across these two differentiation stages of the B. emersonii life cycle: many genes induced during sporulation are repressed during germination, and vice versa.
We also analyzed the effects of glucose and tryptophan on gene expression during zoospore formation, since both nutrients are able to inhibit B. emersonii sporulation. Our results showed that in the presence of glucose (1%), genes related to the composition and activity of the cytoskeleton were overexpressed, whereas in the presence of the amino acid tryptophan, expression increased for genes involved in protein folding, proteolysis, and the oxidative stress response. In addition, genes involved in the sporulation process itself were repressed by tryptophan treatment. We also investigated the cyclic GMP (cGMP) signaling pathway, because the levels of this cyclic nucleotide increase considerably during B. emersonii sporulation. First, we searched the B. emersonii EST database (http://blasto.iq.usp.br) for sequences encoding enzymes involved in cGMP synthesis and degradation. We found three ESTs encoding catalytic domains that appear to belong to three different guanylate cyclases, and one EST encoding a phosphodiesterase with high similarity to phosphodiesterases that have high affinity for cGMP. cDNA microarray experiments, validated by real-time quantitative RT-PCR, showed that the four transcripts are expressed during sporulation, with induction peaking in the late stage of sporulation, when zoospore biogenesis occurs. In addition, data from in vivo and in vitro experiments using inhibitors of the enzymes guanylate cyclase and nitric oxide synthase indicated that the Ca2+ ion and the free radical nitric oxide (•NO) participate in guanylate cyclase activity, suggesting the existence of a Ca2+-•NO-cGMP signaling pathway.