111 |
Mapping SH3 Domain Interactomes. Xin, Xiaofeng, 21 April 2010
Src homology 3 (SH3) domains are a family of peptide recognition modules (PRMs) that bind peptides rich in proline or positively charged residues in target proteins. They play important assembly and regulatory roles in dynamic eukaryotic cellular processes, especially signal transduction and endocytosis. SH3 domains are conserved from yeast to human, and improper SH3 domain-mediated protein-protein interactions (PPIs) lead to defects in cellular function and may even result in disease. Commonly used large-scale PPI mapping strategies employ full-length proteins or random protein fragments as screening probes, so they do not identify which PPIs are mediated specifically by SH3 domains. To address this problem, I employed a combined experimental and computational strategy.
I used yeast two-hybrid (Y2H) as my main experimental tool, with individual SH3 domains as baits, to map SH3 domain-mediated PPI networks, or “SH3 domain interactomes”. One of my important contributions has been to improve Y2H technology. First, I generated and validated a pair of Y2H host strains that improve the efficiency of high-throughput Y2H screening. These strains were used in my own research and have also been adopted by other researchers in their large-scale PPI network mapping projects. Second, in collaboration with Nicolas Thierry-Mieg, I developed a novel smart-pooling method, Shifted Transversal Design (STD) pooling, and validated its application in large-scale Y2H. STD pooling proved superior to other currently available methods, yielding large-scale PPI maps with higher coverage, sensitivity and specificity.
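Smart pooling places each prey in several overlapping pools so that positives can be decoded from pool results. The sketch below follows the STD layer construction as we understand it from Thierry-Mieg's published description. Treat the formula as an assumption and check it against the original paper. With a prime q, each of k ≤ q + 1 layers splits n preys into q pools. Prey i goes to pool s(i, j) = Σ_c j^c ⌊i / q^c⌋ mod q in layer j < q, and to pool ⌊i / q^Γ⌋ in layer q, where Γ is the smallest integer with q^(Γ+1) ≥ n. The function names are hypothetical. The decoder is the simplest possible one: it flags every prey whose pools all tested positive. It is not necessarily the decoding used in the thesis.

```python
def std_pools(n, q, k):
    """Assign n variables to k layers of q pools each (STD-style layout, assumed formula).

    q is expected to be prime and k must satisfy k <= q + 1.
    Returns pools[layer][pool_index] -> list of variable indices.
    """
    if k > q + 1:
        raise ValueError("k must be at most q + 1")
    # Gamma: smallest integer such that q ** (gamma + 1) >= n.
    gamma = 0
    while q ** (gamma + 1) < n:
        gamma += 1
    pools = [[[] for _ in range(q)] for _ in range(k)]
    for layer in range(k):
        for i in range(n):
            if layer < q:
                # Python evaluates 0 ** 0 as 1, so layer 0 reduces to i mod q.
                s = sum(layer ** c * (i // q ** c) for c in range(gamma + 1)) % q
            else:
                # The extra layer q groups variables by their leading base-q digit.
                s = i // q ** gamma
            pools[layer][s].append(i)
    return pools


def decode_positives(pools, positive_pools):
    """Flag every variable whose pools all tested positive.

    positive_pools is a set of (layer, pool_index) pairs.
    """
    membership = {}
    for layer, layer_pools in enumerate(pools):
        for index, members in enumerate(layer_pools):
            for variable in members:
                membership.setdefault(variable, []).append((layer, index))
    return sorted(
        v for v, memberships in membership.items()
        if all(p in positive_pools for p in memberships)
    )


# Usage: 10 preys, q = 5, k = 3 layers (15 pools); prey 7 is the only true interactor.
pools = std_pools(10, 5, 3)
positive = {(layer, index)
            for layer, layer_pools in enumerate(pools)
            for index, members in enumerate(layer_pools) if 7 in members}
print(decode_positives(pools, positive))  # [7]
```

In this construction, two distinct preys share at most Γ pools. That bound is what lets a small number of positives be resolved from far fewer tests than one per prey.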
I mapped the SH3 domain interactomes of the budding yeast Saccharomyces cerevisiae and the nematode Caenorhabditis elegans, which contain 27 and 84 SH3 domains, respectively. Comparing these two SH3 interactomes revealed that the role of the SH3 domain is conserved at a functional but not a structural level: from yeast to worm, SH3 domains play a major role in assembling the endocytosis network. Moreover, worm SH3 domains are additionally involved in metazoan-specific functions such as neurogenesis and vulval development. These results provide insight into two important evolutionary processes in the transition from single-celled eukaryotes to animals: the functional expansion of SH3 domains into new cellular modules, and the conservation and evolution of some cellular modules at the molecular level, particularly the endocytosis module.
|
112 |
Computational Prediction of Gene Function From High-throughput Data Sources. Mostafavi, Sara, 31 August 2011
A large number and variety of genome-wide genomics and proteomics datasets are now available for model organisms. Each dataset on its own presents a distinct but noisy view of cellular state; collectively, however, these datasets give a more comprehensive view of cell function. This motivates predicting the function of uncharacterized genes by combining multiple datasets, exploiting the associations between such genes and genes of known function, all in a query-specific fashion.
Heterogeneous datasets are commonly represented as networks to make them easier to combine. Here, I show that gene function can be predicted accurately in seconds by combining multiple large-scale networks. This enables on-demand function prediction. Users can take advantage of the continual improvement and proliferation of genomics and proteomics datasets and keep their predictions up to date, even for large genomes such as the human genome.
Our algorithm, GeneMANIA, uses constrained linear regression to combine multiple association networks and label propagation to make predictions from the combined network. I introduce extensions that improve predictions when few labeled training examples are available, or when an ontology describing a hierarchical gene function categorization scheme is available. Further, motivated by our empirical observations on predicting node labels in general networks, I propose a new label propagation algorithm that exploits common properties of real-world networks to increase both the speed and accuracy of our predictions.
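The two steps, combining networks and then propagating labels, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not GeneMANIA's code:

- The network weights are supplied by hand rather than fitted by constrained linear regression.
- Label propagation is written as the linear system (I + L)f = y with the unnormalized Laplacian L = D - W.
- Unlabeled genes receive the bias value (n_pos - n_neg)/n.

The system is solved by Jacobi iteration. It converges here because, for non-negative weights, every row of I + L is strictly diagonally dominant.

```python
def combine_networks(networks, weights):
    """Weighted sum of equally sized symmetric adjacency matrices (lists of lists)."""
    n = len(networks[0])
    combined = [[0.0] * n for _ in range(n)]
    for matrix, alpha in zip(networks, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += alpha * matrix[i][j]
    return combined


def propagate_labels(w, labels, iterations=500):
    """Approximately solve (I + D - W) f = y by Jacobi iteration (w must be non-negative).

    labels maps gene index -> +1 (annotated to the query function) or -1 (negative).
    Unlabeled genes get the bias (n_pos - n_neg) / n, an assumption of this sketch.
    """
    n = len(w)
    n_pos = sum(1 for v in labels.values() if v > 0)
    n_neg = sum(1 for v in labels.values() if v < 0)
    bias = (n_pos - n_neg) / n
    y = [labels.get(i, bias) for i in range(n)]
    degree = [sum(row) for row in w]
    f = list(y)
    for _ in range(iterations):
        # Row i of (I + D - W) f = y rearranged: f_i = (y_i + sum_j w_ij f_j) / (1 + d_i).
        f = [(y[i] + sum(w[i][j] * f[j] for j in range(n))) / (1.0 + degree[i])
             for i in range(n)]
    return f


# Two hypothetical networks over five genes: a co-expression chain and one physical interaction.
coexpression = [[0, 1, 0, 0, 0], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [0, 0, 0, 1, 0]]
interaction = [[0, 1, 0, 0, 0], [1, 0, 0, 0, 0], [0] * 5, [0] * 5, [0] * 5]
scores = propagate_labels(combine_networks([coexpression, interaction], [0.5, 0.5]), {0: 1.0, 4: -1.0})
print([round(s, 3) for s in scores])  # [0.627, 0.255, 0.02, -0.176, -0.725]
```

Scores fall off with network distance from the positive gene. Gene 1 scores highest among the unlabeled genes because both networks link it to gene 0.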
|
113 |
Design of microRNAs to attenuate gene expression. Caron, Maxime, 09 1900
MicroRNAs belong to the family of small non-coding RNAs and act as inhibitors of messenger RNAs and/or their protein products. MicroRNAs differ from small interfering RNAs (siRNAs) in that they attenuate expression rather than abolish it. In recent years, numerous microRNAs and their targets have been discovered in mammals and plants. Bioinformatics plays an important role in this field, and target-discovery software has been made available to the scientific community. Each microRNA can regulate hundreds of genes, and microRNA expression profiles have been shown to classify certain human cancers. Designing artificial microRNAs is therefore justified: they could target overexpressed oncogenes and promote the proliferation of healthy cells. MultiTar V1.0, a tool for creating artificial microRNAs, has been implemented and is available as a web application. The tool relies on structural and biochemical properties of microRNAs and uses tabu search, a metaheuristic. We show that microRNAs designed in silico can have effects when tested in vitro. The 3’UTR sequences of E2F1, E2F2 and E2F3 were given as input to MultiTar. The predicted microRNAs were then tested using luciferase assays, western blots and cell growth curves. In luciferase assays, at least one artificial microRNA regulated all three genes. In western blots, every designed microRNA regulated the expression of E2F1 and E2F2. Growth curves showed that each of the microRNAs reduced cell growth. These results open new doors to therapeutic possibilities.
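MultiTar's objective function is not described in this abstract. The sketch below therefore illustrates only the tabu search metaheuristic itself, on a hypothetical objective:

- A 7-nt seed is scored by how closely its reverse complement matches the best window in each target 3’UTR.
- Neighbors differ from the current seed by one nucleotide.
- A position cannot be changed back to its previous base for `tenure` iterations.
- A tabu move is still allowed if it beats the best solution found so far (aspiration criterion).

The objective, the seed length and all function names are assumptions, not MultiTar's design.

```python
from collections import deque

COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def reverse_complement(seq):
    """RNA reverse complement: the site a seed would pair with, read 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def site_score(seed, utr):
    """Best number of matching positions between the seed's target site and any UTR window."""
    site = reverse_complement(seed)
    width = len(site)
    return max((sum(a == b for a, b in zip(site, utr[p:p + width]))
                for p in range(len(utr) - width + 1)), default=0)


def total_score(seed, utrs):
    return sum(site_score(seed, u) for u in utrs)


def tabu_search(utrs, length=7, iterations=200, tenure=5, start=None):
    current = start if start is not None else "A" * length
    length = len(current)
    best, best_score = current, total_score(current, utrs)
    tabu = deque(maxlen=tenure)  # (position, base) pairs that may not be restored yet
    for _ in range(iterations):
        candidates = []
        for pos in range(length):
            for base in "ACGU":
                if base == current[pos]:
                    continue
                neighbor = current[:pos] + base + current[pos + 1:]
                score = total_score(neighbor, utrs)
                # Aspiration: a tabu move is allowed only if it improves on the best so far.
                if (pos, base) in tabu and score <= best_score:
                    continue
                candidates.append((score, neighbor, pos, current[pos]))
        if not candidates:
            break
        # Take the best admissible neighbor even if it is worse than the current seed.
        score, neighbor, pos, old_base = max(candidates)
        tabu.append((pos, old_base))  # forbid reverting this position for a while
        current = neighbor
        if score > best_score:
            best, best_score = neighbor, score
    return best, best_score


# Toy 3'UTR fragments sharing the site CAUUCCA (hypothetical data).
utrs = ["GGGCAUUCCAGG", "UUCAUUCCAUU", "ACAUUCCAAAA"]
print(tabu_search(utrs, start="UGGAAUC"))  # ('UGGAAUG', 21)
```

Accepting the best non-tabu neighbor even when it lowers the score is what distinguishes tabu search from plain hill climbing. The tabu list keeps that move from immediately cycling back.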
|
114 |
Analysis of conditional correlation derived from the coevolution of a three-gene system using a maximum likelihood model. Benoit Bouvrette, Louis Philip, 08 1900
Protein-coding genes can often be grouped into functional modules associated with an organelle. The components of these modules may follow a correlated evolution that can be conditional on a given phenotype. Motility-related genes have this property, since they act in a cascade in response to external stimuli. Hyperthermophily, on the other hand, is linked to reverse gyrase, but no other gene is known with certainty to be associated with it. This may be due to an as-yet unresolved case of non-orthologous gene displacement. Using a bioinformatic approach, a mathematical model of conditional correlated evolution for three genes was developed and applied to the phyletic profiles of archaea. This made it possible to propose hypotheses about the potential function of the flagellar gene FlaD/E and about the evolutionary history of the genes linked to it that may have contributed to its formation. In addition, a theoretical evolutionary history was established for a ligase associated with hyperthermophily.
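A full phylogenetic maximum likelihood model is beyond a short example. The core question can still be illustrated: do two genes' presence/absence patterns remain correlated once a third gene (or phenotype) is conditioned on? One way to ask it is a likelihood-ratio (G) test of conditional independence on binary phyletic profiles. This is a simplified stand-in, not the model developed in the thesis. It treats genomes as independent samples and ignores the shared ancestry that a phylogenetic model accounts for. The function name and the profiles below are invented for illustration.

```python
import math
from collections import Counter


def conditional_g_test(a, b, c):
    """G statistic for H0: A and B are independent given C (categorical profiles).

    Returns (G, p_value). The p-value uses the exact chi-square survival function
    for 2 degrees of freedom, exp(-G/2), so it is only returned when the degrees of
    freedom (|A|-1)(|B|-1)|C| equal 2 (binary profiles, both states of C observed);
    otherwise p_value is None.
    """
    n_abc = Counter(zip(a, b, c))
    n_ac = Counter(zip(a, c))
    n_bc = Counter(zip(b, c))
    n_c = Counter(c)
    g = 0.0
    for (x, y, z), observed in n_abc.items():
        # Expected count under conditional independence: n(x,z) * n(y,z) / n(z).
        expected = n_ac[(x, z)] * n_bc[(y, z)] / n_c[z]
        g += observed * math.log(observed / expected)
    g *= 2.0
    df = (len(set(a)) - 1) * (len(set(b)) - 1) * len(set(c))
    p_value = math.exp(-g / 2.0) if df == 2 else None
    return g, p_value


# Hypothetical presence (1) / absence (0) profiles across eight genomes.
gene_c = [0, 0, 0, 0, 1, 1, 1, 1]
gene_a = [0, 1, 0, 1, 0, 1, 0, 1]
gene_b = [0, 1, 0, 1, 0, 1, 0, 1]
print(conditional_g_test(gene_a, gene_b, gene_c))  # G ≈ 11.09, p ≈ 0.0039
```

Because the test conditions on the third profile, an association between A and B that is explained entirely by C yields a G near zero.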
|
115 |
Comparison of differential expression analysis methods based on the dependence of expression levels. Lefebvre, François, 03 1900
Microarrays remain an important tool for measuring gene expression. Beyond the technology itself, analyzing microarray data is a complex statistical problem. This explains the myriad of methods proposed for pre-processing and, in particular, for analyzing differential expression. However, the lack of calibration data and of an appropriate comparison methodology has prevented a consensus from emerging on the best analysis methods. As a result, analysts usually choose one method over another subjectively, for example based on ease of use, software availability or popularity. This thesis presents a novel approach to the problem of comparing methods for identifying differentially expressed genes.
More than 800 analysis pipelines were applied to over a hundred independent experiments on two different Affymetrix platforms. The performance of each pipeline was assessed by computing the average level of co-regulation it uncovered, measured by enrichment scores for several collections of molecular signatures. The proposed comparison therefore relies on a varied set of biologically relevant data, does not confuse reproducibility with accuracy, and can easily be extended to new methods. Among the methods tested, FARMS summarization and the TREAT differential expression statistic were clearly more accurate than the alternatives. Moreover, our results corroborate recent findings on the importance of taking the magnitude of change into account along with its statistical significance.
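The closing point about fold-change magnitude is what TREAT formalizes. TREAT, McCarthy and Smyth's test relative to a fold-change threshold, is implemented as `treat()` in the Bioconductor package limma. It tests H0: |log2 FC| ≤ τ instead of H0: log2 FC = 0. The sketch below reproduces the form of the TREAT p-value under a simplifying assumption. It uses a standard normal reference distribution instead of limma's moderated t-statistic, and the standard error is supplied directly. The function names and gene values are hypothetical.

```python
import math


def normal_sf(x):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def treat_p_value(log_fc, std_err, threshold):
    """TREAT-style p-value for H0: |log fold change| <= threshold (normal approximation).

    limma's treat() uses moderated t-statistics and the t distribution instead.
    """
    t_right = (abs(log_fc) - threshold) / std_err
    t_left = (abs(log_fc) + threshold) / std_err
    return normal_sf(t_right) + normal_sf(t_left)


# Gene A: tiny but precisely measured change; gene B: large change, noisier estimate.
for tau in (0.0, 1.0):
    p_a = treat_p_value(0.3, 0.05, tau)
    p_b = treat_p_value(2.0, 0.5, tau)
    print(tau, "A first" if p_a < p_b else "B first")
# tau = 0 ranks A first; tau = 1 (a two-fold change on the log2 scale) ranks B first.
```

With τ = 0 the formula reduces to the ordinary two-sided test. A positive τ demotes genes whose change is statistically significant but biologically small.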
|
116 |
Prediction of regulatory loops involving microRNAs and genes regulated by the retinoic acid receptor in breast cancer. Boufaden, Asma, 06 1900
The retinoic acid receptor (RAR) is a member of the nuclear receptor superfamily that binds the ligand retinoic acid (RA). In the presence of its ligand, RAR induces the transcription of its target genes, whereas in its absence transcription is repressed. This regulatory mechanism is altered in human breast carcinoma cell lines because of a reduced capacity to synthesize RA.
In addition, microRNA (miR) expression is perturbed in breast cancer. Many genes, including genes involved in breast cancer progression, have been identified by in-silico analysis as predicted miR targets. miRs can themselves be regulated by transcription factors, and by regulating their targets they can inhibit cell proliferation and induce apoptosis. miRs may therefore play a role in the RAR regulatory mechanism and be involved in regulatory loops with this receptor.
In this work, we describe an approach for predicting and characterizing mixed transcriptional and post-transcriptional regulatory circuits in breast cancer. We focused on feed-forward loops in which RAR regulates a miR and, together with it, a set of shared protein-coding target genes in the human breast cancer cell lines MCF7 and SKBR3. These loops were built by combining RAR ChIP-chip data with DNA microarray data and by using in-silico miR target prediction tools. To propose the appropriate regulatory model, we searched miR promoters in silico for retinoic acid response elements (RAREs). This step predicts whether regulation by RAR is direct or indirect. The predicted loops were then filtered to remove false positives, using miR expression data from existing databases and from different cell lines. Moreover, only biologically relevant circuits found to be enriched in Gene Ontology terms were retained. We also propose to infer miR activity to help determine how each miR is regulated by RAR. The approach identified experimentally validated loops. Several predicted regulatory circuits appear to be involved in various aspects of organism development, cell proliferation and cell differentiation. Furthermore, we validated that let-7a can be induced by RA in MCF7 cells.
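The loop-assembly step can be sketched as a set intersection. A regular-expression scan then decides whether RAR regulation of the miR is likely direct. This is a minimal sketch under these assumptions:

- RAREs are modeled as direct repeats of the half-site consensus (A/G)G(G/T)TCA spaced by 5 (DR5) or 2 (DR2) nucleotides, scanned on both strands.
- RAR targets, predicted miR targets and expressed miRs are given as plain sets.
- All function names and identifiers in the example are hypothetical.

It does not reproduce the thesis's expression and Gene Ontology filters.

```python
import re

# Assumed RARE model: two half-sites matching (A/G)G(G/T)TCA arranged as a
# direct repeat with a 5-nt (DR5) or 2-nt (DR2) spacer.
HALF_SITE = "[AG]G[GT]TCA"
RARE = re.compile(HALF_SITE + "(?:[ACGT]{5}|[ACGT]{2})" + HALF_SITE)
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def has_rare(promoter):
    """True if a DR5/DR2 RARE occurs on either strand of the promoter sequence."""
    forward = promoter.upper()
    reverse = forward.translate(DNA_COMPLEMENT)[::-1]
    return bool(RARE.search(forward) or RARE.search(reverse))


def feed_forward_loops(rar_targets, mir_targets, mir_promoters, expressed_mirs):
    """Return (miR, 'direct'|'indirect', shared targets) for each RAR -> miR -> gene loop.

    rar_targets: genes (including miR genes) regulated by RAR, e.g. ChIP-chip
    peaks supported by microarray changes; mir_targets: predicted targets per miR;
    expressed_mirs: miRs retained after the expression filter.
    """
    loops = []
    for mir in sorted(expressed_mirs):
        if mir not in rar_targets:
            continue  # no RAR -> miR edge, so no feed-forward loop
        shared = sorted(rar_targets & mir_targets.get(mir, set()))
        if shared:
            mode = "direct" if has_rare(mir_promoters.get(mir, "")) else "indirect"
            loops.append((mir, mode, shared))
    return loops


# Hypothetical identifiers and sequences, for illustration only.
dr5_promoter = "CC" + "AGGTCA" + "AAAAG" + "AGGTCA" + "CC"
print(feed_forward_loops(
    rar_targets={"MIR_X", "MIR_Y", "GENE1", "GENE2"},
    mir_targets={"MIR_X": {"GENE1", "GENE3"}, "MIR_Y": {"GENE2"}},
    mir_promoters={"MIR_X": dr5_promoter, "MIR_Y": "ACGT" * 5},
    expressed_mirs={"MIR_X", "MIR_Y"},
))
# [('MIR_X', 'direct', ['GENE1']), ('MIR_Y', 'indirect', ['GENE2'])]
```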
|
117 |
Development of tools for ChIP-seq data analysis and transcription factor identification. Mercier, Eloi, 10 1900
ChIP-seq is a technology that combines chromatin immunoprecipitation with high-throughput sequencing, enabling genome-wide, in vivo analysis of transcription factors. Processing the large amounts of data it generates requires substantial computing resources, and many tools have recently been developed. However, this proliferation of programs that each perform only one step of the analysis creates compatibility problems and complicates the analysis. There is therefore a real need for an integrated, efficient and flexible software suite for motif identification. Here we propose a complete pipeline for ChIP-seq data analysis, freely available in R and composed of three packages: PICS, rGADEM and MotIV. We analyzed four data sets for the human transcription factors CTCF, STAT1, FOXA1 and ER. These analyses demonstrated the efficiency of our pipeline and highlighted its new features, in particular the processing of results by MotIV, which led to the discovery of motifs not detected by other algorithms.
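Matching a discovered motif to known motifs relies on a column-by-column similarity measure. The sketch below uses the average Pearson correlation coefficient (PCC) of aligned columns, one of several metrics used by motif-comparison tools such as STAMP. Whether MotIV uses exactly this form is an assumption, and the sketch omits reverse-complement alignments and score normalization against a database. The function names are hypothetical, and each motif is a list of columns of probabilities over A, C, G, T.

```python
import math


def column_pcc(x, y):
    """Pearson correlation between two probability columns over A, C, G, T."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a uniform column carries no information to correlate
    return cov / math.sqrt(var_x * var_y)


def motif_similarity(m1, m2, min_overlap=4):
    """Best average column PCC over ungapped alignments of m2 against m1 (same strand only).

    Returns (score, offset), where m2 column j is aligned to m1 column j + offset.
    """
    best_score, best_offset = float("-inf"), None
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        pairs = [(m1[i], m2[i - offset]) for i in range(len(m1)) if 0 <= i - offset < len(m2)]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_pcc(a, b) for a, b in pairs) / len(pairs)
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


# Hypothetical motifs: a discovered "ACGTA" matrix and a known "CGTA" matrix.
A, C, G, T = [0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7]
print(motif_similarity([A, C, G, T, A], [C, G, T, A]))  # best alignment at offset 1, score ≈ 1.0
```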
|
118 |
Development and Implementation of Gene Ontology Cluster Analysis of Protein Array Data. Wolting, Cheryl, 05 September 2012
Decoding the genomes of organisms across all taxonomic groups provides the foundation for extensive, large-scale studies of biological molecules such as RNA, proteins and carbohydrates. The high-throughput studies enabled by these genome sequences require new analytic methods for interpreting large sets of results. This work focuses on developing a novel clustering method for analyzing protein array results and examines its use in analyzing integrated interaction data sets. Sets of proteins that interact with a molecule of interest were clustered according to their functional similarity. The simUI measure from the Bioconductor statistical analysis software was used to quantify the similarity of two proteins from the induced graphs of their Gene Ontology annotations. Clusters were identified by partitioning around medoids (PAM) and interpreted using the Gene Ontology annotation of each medoid as a summary label. The method was tested on two published yeast protein array data sets, where it supported interpretations that yielded novel biological hypotheses. We performed a protein array screen using LNX1, an E3 ubiquitin ligase containing PDZ domains. We combined these results with other published LNX1 interactors to produce a set of 220 proteins, which was clustered according to Gene Ontology annotation. From the clustering results, 14 proteins were selected for examination by co-immunoprecipitation, of which 8 were confirmed as LNX1 interactors. Recognition of 6 proteins by specific LNX1 PDZ domains was confirmed by fusion protein pull-downs. This work supports the role of LNX1 as a signalling scaffold. Interpreting protein array results with our novel clustering method facilitated the identification of candidate molecules for subsequent experimental analysis.
Thus our analytical method facilitates identification of biologically relevant molecules within a large data set, making this method an essential component of complex, high-throughput experimentation.
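The two ingredients of the method can be sketched on a toy ontology with made-up term names. simUI is computed as the number of terms shared by the two proteins' induced graphs (annotations plus all ancestors), divided by the size of their union. 1 - simUI then serves as the distance for a small BUILD/SWAP implementation of partitioning around medoids. All function names are hypothetical. In practice one would use the Bioconductor implementation of simUI and R's `cluster::pam` rather than this code.

```python
from itertools import product


def ancestors(term, parents):
    """The term itself plus every ancestor reachable through `parents` (child -> list of parents)."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def sim_ui(terms_a, terms_b, parents):
    """simUI: |induced(A) & induced(B)| / |induced(A) | induced(B)| (non-empty annotations)."""
    graph_a = set().union(*(ancestors(t, parents) for t in terms_a))
    graph_b = set().union(*(ancestors(t, parents) for t in terms_b))
    return len(graph_a & graph_b) / len(graph_a | graph_b)


def pam(items, dist, k):
    """Partitioning around medoids: greedy BUILD, then SWAP until no swap lowers the cost."""
    def cost(medoids):
        return sum(min(dist[i][m] for m in medoids) for i in items)

    medoids = []
    for _ in range(k):
        candidates = [i for i in items if i not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(medoids + [c])))

    improved = True
    while improved:
        improved = False
        current = cost(medoids)
        for m, o in product(list(medoids), items):
            if o in medoids:
                continue
            trial = [o if x == m else x for x in medoids]
            if cost(trial) < current:
                medoids, improved = trial, True
                break
    # Assign every item to its nearest medoid.
    return {m: [i for i in items if min(medoids, key=lambda mm: dist[i][mm]) == m]
            for m in medoids}


# Toy ontology and annotations (hypothetical terms, not real GO IDs).
parents = {"kinase_signaling": ["signaling"], "signaling": ["root"],
           "vesicle_transport": ["transport"], "transport": ["root"]}
annotations = [{"kinase_signaling"}, {"kinase_signaling"}, {"signaling"},
               {"vesicle_transport"}, {"vesicle_transport"}]
dist = [[1 - sim_ui(a, b, parents) for b in annotations] for a in annotations]
print(pam(list(range(5)), dist, 2))  # {0: [0, 1, 2], 3: [3, 4]}
```

Each cluster can then be labeled by its medoid's annotation, here "kinase_signaling" and "vesicle_transport".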
|
120 |
Development and Application of Serum Assay to Monitor Response to Therapy and Predict for Relapse in Acute Myeloid Leukemia. Ghahremanlou, Mohsen, 22 November 2013
The diagnosis and monitoring of acute myeloid leukemia (AML) rely predominantly on identifying blast cells in the bone marrow and peripheral blood. Leukemic cells are relatively easy to identify at diagnosis, but identifying small numbers of blasts during remission is problematic. This is most evident in the fact that patients who achieve complete remission frequently relapse, even though pathologic examination indicates a marked reduction in leukemic cell burden. In this thesis I explored the potential of using serum proteins secreted by leukemic cells to monitor disease in patients. To identify proteins that might be useful for monitoring, I drew on published gene expression array data and online bioinformatics databases. Using specific selection criteria, I identified approximately 107 candidate proteins secreted by AML cells. RT-PCR analysis and ELISAs were performed to evaluate expression variability and differences in serum levels for twelve of these proteins.
|