241
Representation and search of known cyclic and structural RNA motifs in secondary structures (Représentation et recherche de motifs cycliques et structuraux d’ARN connus dans les structures secondaires). Louis-Jeune, Caroline. 04 1900
Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) are polymers of nucleotides essential to the cell. Whereas DNA mainly stores genetic information, RNA is involved in many metabolic processes. For example, RNA transfers the genetic information encoded in DNA to protein, and it is essential for the processing and modification of other RNAs, the regulation of gene expression, the maintenance of chromosome ends, and the targeting of proteins within the cell. This functional versatility comes from RNA's greater structural diversity.
Our laboratory developed MC-Fold, an algorithm that predicts RNA structures and represents them as nucleotide interaction graphs, in which the vertices are nucleotides and the edges are the interactions between them. Our laboratory also observed that a small set of interaction cycles is enough to define the structure of any RNA motif. The formation of these cycles depends on the nucleotide sequence, and MC-Fold determines the most probable cycles given that sequence.
In this master's project, I first built bdMotifs, a database of structural and functional RNA motifs described in terms of these cycles. I then implemented MC-Motifs, an algorithm that searches for these motifs in interaction graphs, whether generated by MC-Fold or by any other method. Finally, I validated the algorithm on RNAs of known structure, such as the 5S, 16S and 23S ribosomal RNAs (rRNAs), and on predicted structures of riboswitches.
The thesis is divided into five chapters. The first chapter presents the chemical structure of RNA, its cellular functions and the structural folding of the polymer. The second chapter describes the bdMotifs database. The third chapter introduces the MC-Motifs search algorithm. The fourth chapter presents the validation and prediction results. Finally, the last chapter discusses these results and concludes the work.
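The idea that cycles of interactions describe RNA motifs can be illustrated with a small graph routine. The sketch below is not MC-Fold or MC-Motifs: it assumes a simplified interaction graph in which vertices are nucleotide positions, backbone edges join consecutive positions and base-pair edges are supplied by the caller, and it assumes the cycles of interest are chordless (no interaction cuts across the cycle). The names `build_graph`, `is_chordless` and `chordless_cycles` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(n_nucleotides: int, pairs: List[Tuple[int, int]]) -> Graph:
    """Vertices are nucleotide positions 0..n-1; backbone edges join
    consecutive positions, and `pairs` adds base-pair interactions."""
    g: Graph = {i: set() for i in range(n_nucleotides)}
    for i in range(n_nucleotides - 1):
        g[i].add(i + 1)
        g[i + 1].add(i)
    for a, b in pairs:
        g[a].add(b)
        g[b].add(a)
    return g


def is_chordless(g: Graph, cycle: Tuple[int, ...]) -> bool:
    """True if no edge joins two vertices that are not adjacent in the cycle."""
    n = len(cycle)
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last vertices are joined by the closing edge
            if cycle[j] in g[cycle[i]]:
                return False
    return True


def chordless_cycles(g: Graph, max_len: int) -> Set[Tuple[int, ...]]:
    """Enumerate chordless cycles with 3..max_len vertices, each reported once.
    A cycle is grown from its smallest vertex through larger vertices only; of
    its two traversal directions we keep the one whose second vertex is smaller
    than its last."""
    found: Set[Tuple[int, ...]] = set()

    def extend(path: List[int]) -> None:
        start, last = path[0], path[-1]
        for nxt in g[last]:
            if nxt == start:
                if len(path) >= 3 and path[1] < path[-1]:
                    cycle = tuple(path)
                    if is_chordless(g, cycle):
                        found.add(cycle)
            elif nxt > start and nxt not in path and len(path) < max_len:
                path.append(nxt)
                extend(path)
                path.pop()

    for v in g:
        extend([v])
    return found


# A toy hairpin: positions 0-5 with base pairs (0, 5) and (1, 4).
hairpin = build_graph(6, [(0, 5), (1, 4)])
print(sorted(chordless_cycles(hairpin, max_len=6)))  # → [(0, 1, 4, 5), (1, 2, 3, 4)]
```

The first 4-cycle is formed by the two stacked base pairs and the second by the loop closed by pair (1, 4); the outer 6-cycle is discarded because the (1, 4) pair is a chord across it.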

242
Integrative methods for gene data analysis and knowledge discovery on the case study of KEDRI's brain gene ontology: a thesis submitted to Auckland University of Technology in partial fulfilment of the requirements for the degree of Master of Computer and Information Sciences, 2008. Wang, Yuepeng. January 2008
Thesis (MCIS) -- AUT University, 2008. / Includes bibliographical references. Also held in print (131 leaves : ill. ; 30 cm.) in the Archive at the City Campus (T 616.99404200285 WAN).

243
BioEve: User Interface Framework Bridging IE and IR. January 2010
Abstract: Continuous advances in biomedical research have produced vast amounts of scientific data and of literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of complex biological processes and accurate life-science models. The ability to survey the literature rapidly and effectively is necessary both for building large-scale models of the relationships among biomedical entities and for generating hypotheses to guide biomedical research. Reducing the time and effort spent on these activities requires an intelligent search system. Although many systems help users navigate this large collection of documents, the sheer volume and depth of the information can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort but also facilitate the discovery of unknown information implicitly conveyed in the texts. This thesis presents the different approaches used for large-scale biomedical named entity recognition and the challenges faced by each. It also proposes BioEve, an integrative framework that fuses faceted search with information extraction to provide a search service that addresses the user's desire for "completeness" of the query results, not just the top-ranked ones. The information extraction system enables the discovery of important semantic relationships between entities (such as genes, diseases, drugs, and cell lines) and events in biomedical text from MEDLINE, the largest publicly available database of the world's biomedical journal literature. BioEve is thus an innovative search and discovery service that makes it easier to search, navigate, and discover knowledge hidden in the life sciences literature.
To demonstrate the utility of this system, this thesis also details a prototype enterprise-quality search and discovery service that guides researchers through step-by-step query refinement by suggesting concepts enriched in intermediate results, thereby facilitating a "discover more as you search" paradigm. / Dissertation/Thesis / M.S. Computer Science 2010
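Suggesting "concepts enriched in intermediate results" can be sketched by comparing how often each extracted concept occurs in the current result set with how often it occurs in the whole collection. The sketch assumes each document has already been reduced to a set of concept labels by the extraction step; the ratio-of-frequencies score, the `min_count` cut-off, the function name `suggest_concepts` and the toy labels are illustrative assumptions, not BioEve's actual ranking.

```python
from collections import Counter
from typing import List, Set, Tuple


def suggest_concepts(results: List[Set[str]], corpus: List[Set[str]],
                     min_count: int = 2, top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank concepts by the ratio of their document frequency in the current
    result set to their document frequency in the whole corpus.
    Assumes `results` is a subset of `corpus`, so every concept seen in the
    results also occurs in the corpus."""
    result_df = Counter(c for doc in results for c in doc)  # one count per document
    corpus_df = Counter(c for doc in corpus for c in doc)
    scores = {}
    for concept, count in result_df.items():
        if count < min_count:
            continue  # too rare in the results to be a useful refinement
        scores[concept] = (count / len(results)) / (corpus_df[concept] / len(corpus))
    # Most enriched first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy documents, each already reduced to extracted concept labels.
corpus = [{"geneA", "diseaseX"}, {"geneA", "drugY"}, {"geneB", "cellLineZ"},
          {"geneB", "diseaseX"}, {"geneC"}, {"geneC", "drugY"}]
results = [corpus[0], corpus[1], corpus[3]]  # documents matching the current query
print(suggest_concepts(results, corpus))
```

A ratio above 1 means the concept is over-represented among the current hits, which makes it a candidate facet for narrowing the query.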

244
Glucose and Altered Ceramide Biosynthesis Impact the Transcriptome and the Lipidome of Caenorhabditis elegans. Ladage, Mary Lee. 08 1900
The worldwide rise of diabetes and obesity has spurred research into the molecular mechanisms that mediate the deleterious effects associated with these diseases. Individuals with diabetes and/or obesity are at increased risk of a variety of health consequences, including heart attack, stroke and peripheral vascular disease; oxygen deprivation is the central component of the pathology of all of these complications. The nematode Caenorhabditis elegans is an established model system for understanding the genetic and molecular regulation of the oxygen deprivation response, and in recent years methods have been developed to study the effects of excess glucose and altered lipid homeostasis. Using C. elegans, I investigated the transcriptomic profiles of wild-type and hyl-2(tm2031) (a ceramide biosynthesis mutant) animals fed a standard or a glucose-supplemented diet. I then completed a pilot RNAi screen of differentially regulated genes and found that genes involved in the endobiotic detoxification pathway (ugt-63 and cyp-25A1) modulate the anoxia response. Next, I used a lipidomic approach to determine whether glucose feeding, or mutations in the ceramide biosynthesis pathway or the insulin-like signaling pathway, affect lipid profiles. I found that glucose alters the lipid profile of daf-2(e1370) (an insulin-like receptor mutant) animals. These studies indicate that a transcriptomic approach can be used to discover novel pathways involved in the oxygen deprivation response, and they further validate C. elegans as a model for understanding diabetes and obesity.

245
Comparison and Genetic Analysis of Host Specificity in Cluster BD1 Bacteriophages infecting Streptomyces. Klug, Hannah. 05 1900
Bacteriophages are viruses that specifically infect bacteria. When a phage infects a bacterium, it attaches to the surface of the cell and injects its DNA into it. The phage DNA hijacks the cellular machinery of the bacterium and forces it to produce phage proteins. Eventually, the bacterial cell bursts (lyses), releasing new phages. The bacterium thus acts as a host for phage reproduction. The range of bacterial species a phage can infect is known as its host range. In Siphoviridae bacteriophages, host range is thought to be determined primarily by proteins at the tip of their tail fibers. These proteins act as anti-receptors that recognize specific receptors on the bacterial surface. In Siphoviridae that infect Gram-positive bacteria, the genes encoding these proteins are typically located between the tape measure protein gene and the endolysin gene. It is hypothesized that phages with similar anti-receptor proteins will have similar host ranges. In this study, the host ranges of 12 cluster BD1 bacteriophages were tested on 9 different Streptomyces species, and the genes between the tape measure protein gene and the endolysin gene were compared across the 12 phages. These genes were highly variable among the phages. Five genes in this region had unknown functions and were designated positions A, B, C, D, and E. Positions A-E were searched with BLAST against NCBI and Phages-DB, and the results were recorded. The functions of positions A, C, and E remain unknown. Position D most likely encodes a minor tail protein. Position B had BLAST hits to a collagen-like protein and a putative tail fiber protein. On closer inspection, position B was found to contain Gly-X-Y repeats in its amino acid sequence. Position B also showed some conservation in its N-terminal region, specifically where the Gly-X-Y repeats were located, and strong conservation at its C-terminal end.
Glycine repeats and conservation at the N- and C-terminal ends of the amino acid sequence are both common features of known host-specificity genes. There appeared to be no correlation between the conservation of positions A-E and host range. It was concluded that no single gene can predict a phage's host range, but collagen-like repeats could serve as a landmark for finding genes related to host surface receptors.
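Collagen-like Gly-X-Y repeats, such as those found in position B, are easy to flag computationally: a triplet qualifies when its first residue is glycine. The sketch below reports maximal runs of consecutive Gly-X-Y triplets in a one-letter amino acid sequence. The function name `gly_x_y_runs`, the `min_repeats` cut-off of 5 and the toy sequence are hypothetical choices, not parameters or data from the study.

```python
from typing import List, Tuple


def gly_x_y_runs(protein: str, min_repeats: int = 5) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_triplets) for each maximal run of consecutive
    Gly-X-Y triplets with at least `min_repeats` triplets.
    Positions are 0-based and `end` is exclusive. All three triplet registers
    are scanned, so a poly-Gly stretch can be reported in several frames."""
    seq = protein.upper()
    runs = []
    for start in range(len(seq)):
        # Only start where the previous triplet in this register is not Gly-X-Y,
        # so that each reported run is maximal.
        if start >= 3 and seq[start - 3] == "G":
            continue
        n = 0
        # Count complete triplets whose first residue is glycine.
        while start + 3 * (n + 1) <= len(seq) and seq[start + 3 * n] == "G":
            n += 1
        if n >= min_repeats:
            runs.append((start, start + 3 * n, n))
    return runs


toy = "MKT" + "GPP" * 6 + "AAA" + "GAP" * 2  # toy sequence, not a real phage protein
print(gly_x_y_runs(toy))  # → [(3, 21, 6)]
```

The trailing two "GAP" triplets are not reported because they fall below the minimum run length.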

246
Design, implementation and experimental validation of a network-based model to predict mitotic microtubule regulating proteins. Khan, Faisal Farooq. January 2013
The purpose of this thesis was to study mitosis in Drosophila from a network biology perspective. The primary aim was to develop and test a network-based prediction model that could integrate the data available in public databases (such as FlyBase) and use them to predict potential mitotic proteins. The design of the protein interaction network drew on a priori knowledge of the microtubule composition of the mitotic spindle and on the higher likelihood that microtubule-associated proteins (MAPs) have a mitotic function. The design also integrated complementary datasets, ranging from gene expression and functional RNAi screens to cross-species conservation of MAPs, to fit a network-based model for predicting mitotic proteins. I began by building a MAP interactome from a Drosophila MAP dataset. This initial network was extended by transferring homologs and interologs of MAP datasets from four other species: human, mouse, rat and Arabidopsis. These proteins were then used as seeds for a virtual pull-down experiment, which completed the MAP interactome by adding indirect interactors, that is, proteins that bind directly to two or more MAPs within the network. Data from genome-wide studies in Drosophila were gathered for each node in the MAP interactome. These 'layers' of data were then used as features to fit a prediction model that scores each node in the network by the likelihood that it plays a role in mitosis. The final model achieved 96% accuracy in 10-fold cross-validation and was used to rank all the proteins in the MAP interactome. Analysis of the 100 highest-scoring predicted mitotic proteins identified a highly connected cluster of 33 proteins, which was then subjected to experimental validation in the lab.
The first approach was an in vitro analysis using an RNAi screen to test for spindle, chromosome or centrosome phenotypes upon gene knockdown. Across two independent RNAi screens, around 80% of the proteins produced mutant mitotic phenotypes, strongly supporting the results of the MAP prediction model. The second approach was an in vivo analysis in which GFP-fusion constructs of selected genes from the subcluster were expressed in early Drosophila embryos to study their subcellular localization during interphase and mitosis. The observed localizations ranged from chromatin and microtubules to more generic cytoplasmic patterns. These results suggest that not all predicted proteins co-localize with microtubules; some may therefore not be microtubule-associated proteins themselves but may instead function as microtubule regulators. Proteomic analysis of a subset of these genes revealed a large proportion of false-positive interactions, but also identified new interactions between member proteins that highlighted a module within the subcluster. The RNAi hits from the in vitro analysis and the members of the module within subcluster-16 from the in vivo analysis are interesting candidates for further characterization.
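The virtual pull-down step, which adds every protein that binds directly to two or more MAPs, amounts to counting each protein's seed partners in the interaction network. A minimal sketch, assuming the network is given as an undirected edge list; the function name `virtual_pulldown` and identifiers such as `MAP1` or `X` are hypothetical placeholders rather than Drosophila genes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def virtual_pulldown(seeds: Set[str], edges: Iterable[Tuple[str, str]],
                     min_seed_partners: int = 2) -> Set[str]:
    """Return the seeds plus every non-seed protein that interacts directly
    with at least `min_seed_partners` distinct seed proteins."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        if a in seeds and b not in seeds:
            partners[b].add(a)
        elif b in seeds and a not in seeds:
            partners[a].add(b)
    added = {p for p, s in partners.items() if len(s) >= min_seed_partners}
    return seeds | added


seeds = {"MAP1", "MAP2", "MAP3"}
edges = [("MAP1", "X"), ("MAP2", "X"), ("X", "MAP1"),  # duplicate edge counted once
         ("MAP1", "Y"), ("Z", "MAP3"), ("Z", "MAP2"),
         ("Y", "W"), ("MAP1", "MAP2")]
print(sorted(virtual_pulldown(seeds, edges)))  # → ['MAP1', 'MAP2', 'MAP3', 'X', 'Z']
```

Storing partners as sets means repeated or reversed edges do not inflate a protein's count, which matters when interactions are merged from several species and databases.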

247
The evolution of eukaryotic cilia. Hodges, Matthew Edmiston. January 2011
Eukaryotic cilia are complex, highly conserved microtubule-based organelles with a broad phylogenetic distribution. Cilia were present in the last eukaryotic common ancestor, and many proteins involved in cilia function have been conserved through eukaryotic diversification. The evolution of ciliary functions can be inferred from the distribution of the molecular components that make up these organelles. By linking protein distribution in 45 diverse eukaryotes with organismal biology, I define an ancestral ciliary inventory. Analysis of these core proteins supports the inference that the cenancestor of eukaryotes possessed a cilium with both motile and sensory functions. I show that the centriolar basal body function is ancestral, whereas the centrosome is specific to the Holozoa, and I use this information to predict roles for a number of proteins based on their phylogenetic profiles. I also show that, while remarkably conserved, ciliary protein composition has diverged significantly in many lineages, as seen in the unusual centriole of Caenorhabditis elegans and in the transitional changes throughout the land plants. I exemplify this divergence through ultrastructural studies of the fern Ceratopteris richardii and the liverwort Marchantia polymorpha, both of which have cilia with a number of distinctive morphological features, the most conspicuous being a general breakdown of canonical microtubule arrangements. Cilia have also been lost multiple times in different lineages, at least twice within the land plants. During these evolutionary transitions, proteins with ancestral ciliary functions may be lost or co-opted into other functions. I interrogated genomic data to identify proteins that I predict had an ancestral ciliary role but have been retained in non-ciliated land plants.
I demonstrate that several of these proteins localise to the flagellum in protozoan trypanosomes, and I use expression-data correlation to predict potential non-ciliary roles in plants.
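Predicting a protein's role from its phylogenetic profile can be illustrated by comparing the set of species that encode the protein with the set of species that have cilia. The sketch below scores that agreement with Jaccard similarity; the scoring choice, the function name `rank_by_cilia_profile` and the toy species labels are assumptions for illustration, not the method used in the thesis.

```python
from typing import Dict, List, Set, Tuple


def rank_by_cilia_profile(presence: Dict[str, Set[str]],
                          ciliated: Set[str]) -> List[Tuple[str, float]]:
    """Rank proteins by the Jaccard similarity between the set of species that
    encode them and the set of ciliated species (1.0 = identical sets)."""
    ranked = []
    for protein, species in presence.items():
        union = species | ciliated
        score = len(species & ciliated) / len(union) if union else 0.0
        ranked.append((protein, score))
    # Highest similarity first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


ciliated_species = {"sp1", "sp2", "sp3", "sp4"}  # toy labels; sp5 and sp6 lack cilia
presence = {
    "protA": {"sp1", "sp2", "sp3", "sp4"},                # tracks cilia exactly
    "protB": {"sp1", "sp2", "sp3", "sp4", "sp5", "sp6"},  # also kept without cilia
    "protC": {"sp5", "sp6"},                              # only in non-ciliated species
}
print(rank_by_cilia_profile(presence, ciliated_species))
```

A protein such as the toy `protB`, present in ciliated species but also retained where cilia have been lost, is the kind of candidate for which a co-opted, non-ciliary role would be worth investigating.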

248
Comparison of methods for differential expression analysis based on the dependence between expression levels (Comparaison des méthodes d'analyse de l'expression différentielle basée sur la dépendance des niveaux d'expression). Lefebvre, François. 03 1900
Microarrays remain an important tool for measuring gene expression. Beyond the technology itself, the analysis of microarray data is a complex statistical problem, which explains the myriad of methods proposed for pre-processing and, in particular, for testing differential expression. However, the lack of calibration data or of an appropriate comparison methodology, together with insufficient and sometimes contradictory evidence, has prevented the emergence of a consensus on the optimal analysis methods. As a result, practitioners usually choose one method over another somewhat arbitrarily, for example based on ease of use, software availability or popularity. This thesis presents a novel approach to the problem of comparing methods for the identification of differentially expressed genes.
More than 800 analytic pipelines were applied to more than a hundred independent experiments on two different Affymetrix platforms. The accuracy of each pipeline was assessed by measuring the average level of co-regulation it uncovered across all data sets, using enrichment scores for several collections of molecular signatures. This comparison therefore relies on a varied set of biologically relevant data, does not confuse reproducibility with accuracy, and can easily be extended to new methods. Among the methods tested, FARMS summarization and the TREAT differential expression statistic were unequivocally and significantly more accurate than the alternatives. Moreover, the results on differential expression statistics corroborate recent findings about the importance of taking the magnitude of change into account along with its statistical significance.
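The point about magnitude versus significance is what TREAT formalizes: instead of testing whether a log fold change is zero, it tests whether its absolute value exceeds a threshold tau. The reference implementation is limma's `treat()` in R, which uses moderated t-statistics; the sketch below is a simplified normal approximation under the assumption that the estimated log fold change is normally distributed with a known standard error, and the function names are hypothetical.

```python
import math


def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def treat_pvalue_normal(logfc: float, se: float, tau: float) -> float:
    """p-value for H0: |beta| <= tau versus H1: |beta| > tau, evaluated at the
    least favourable null |beta| = tau, assuming logfc ~ Normal(beta, se^2).
    With tau = 0 this reduces to the ordinary two-sided test."""
    b = abs(logfc)
    return normal_sf((b - tau) / se) + normal_sf((b + tau) / se)


# A small but precisely measured change: significant against zero,
# but not against a threshold of 1 (a two-fold change on the log2 scale).
print(treat_pvalue_normal(0.5, 0.1, tau=0.0))  # ordinary two-sided test
print(treat_pvalue_normal(0.5, 0.1, tau=1.0))  # threshold (TREAT-style) test
```

The first call yields a very small p-value while the second is close to 1, which is how a thresholded statistic keeps tiny but precisely estimated changes from dominating a ranked gene list.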
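
249
Prediction of regulatory loops linking microRNAs and genes regulated by the retinoic acid receptor in breast cancer (Prédiction de boucles de régulation associant microARN et gènes régulés par le récepteur de l'acide rétinoïque dans le cancer du sein). Boufaden, Asma. 06 1900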
The retinoic acid receptor (RAR) is a member of the nuclear receptor superfamily that binds the ligand retinoic acid (RA). In the presence of its ligand, RAR induces the transcription of its target genes, whereas in its absence transcription is repressed. RAR-mediated regulation is altered in human breast carcinoma cell lines because of a reduced capacity to synthesize RA. In addition, aberrant patterns of microRNA (miR) expression have been reported in breast cancer, and in-silico analysis has identified many genes, including genes involved in breast cancer progression, as predicted miR targets. miRs can be regulated by transcription factors and, through the regulation of their targets, can inhibit cell proliferation and induce apoptosis. miRs may therefore play a role in RAR-mediated regulation and take part in regulatory loops with this receptor.
In this work, we describe an approach for predicting and characterizing mixed transcriptional and post-transcriptional regulatory circuits in breast cancer. We focused on feed-forward loops in which RAR regulates a miR and, together with it, a set of shared protein-coding target genes in the human breast cancer cell lines MCF7 and SKBR3. These loops were constructed by combining RAR ChIP-chip data with DNA microarray data and by using in-silico miR target prediction tools. To propose the appropriate regulatory model, we performed an in-silico search for retinoic acid response elements (RAREs) in miR promoters; this step predicts whether regulation by RAR is direct or indirect. To reduce false positives, the predicted loops were then filtered using miR expression data from existing databases covering different tissues and cell lines. Moreover, only biologically relevant circuits enriched in Gene Ontology terms were retained. We also propose inferring miR activity to determine the direction of their regulation by RAR. The approach recovered loops that have been validated experimentally. Several predicted circuits appear to be involved in various aspects of organism development, cell proliferation and cell differentiation. Finally, we validated the induction of let-7a by RA in MCF7 cells.
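The core of the loop construction is a set intersection: for each miR regulated by RAR, keep the genes targeted both by RAR and by that miR. A minimal sketch, assuming the three inputs (RAR gene targets, predicted miR targets, RAR-regulated miRs) have already been derived from ChIP-chip, target prediction and promoter analysis; the function name `feed_forward_loops`, the `min_shared` parameter and all gene and miR identifiers are hypothetical placeholders, and the expression and Gene Ontology filters described above are not included.

```python
from typing import Dict, List, Set, Tuple


def feed_forward_loops(rar_gene_targets: Set[str],
                       mir_targets: Dict[str, Set[str]],
                       rar_regulated_mirs: Set[str],
                       min_shared: int = 1) -> List[Tuple[str, Set[str]]]:
    """For each miR regulated by RAR, return the protein-coding genes that are
    targeted both by RAR and by that miR (the shared arm of the loop)."""
    loops = []
    for mir in sorted(rar_regulated_mirs):
        shared = rar_gene_targets & mir_targets.get(mir, set())
        if len(shared) >= min_shared:
            loops.append((mir, shared))
    return loops


rar_genes = {"G1", "G2", "G3"}               # placeholder RAR-bound genes
predicted = {"miR-a": {"G2", "G3", "G9"},    # placeholder predicted miR targets
             "miR-b": {"G7"},
             "miR-c": {"G1"}}
rar_mirs = {"miR-a", "miR-b"}                # placeholder miRs regulated by RAR
print(feed_forward_loops(rar_genes, predicted, rar_mirs))
```

Here `miR-c` shares a target with RAR but is not itself regulated by RAR, so it does not close a feed-forward loop.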

250
Development of tools for ChIP-seq data analysis and transcription factor identification (Développement d’outils pour l’analyse de données de ChIP-seq et l’identification des facteurs de transcription). Mercier, Eloi. 10 1900
ChIP-seq combines chromatin immunoprecipitation with high-throughput sequencing and enables the in vivo analysis of transcription factors on a genome-wide scale. Processing the large amounts of data it generates requires substantial computing resources, and many tools have recently been developed. However, this proliferation of programs, each performing only one step of the analysis, creates compatibility problems and complicates the analysis. There is therefore a real need for an integrated, efficient and flexible software suite for motif identification. Here we propose a complete pipeline for ChIP-seq data analysis, freely available in R and composed of three R packages: PICS, rGADEM and MotIV. By analyzing four data sets for the human transcription factors CTCF, STAT1, FOXA1 and ER, we demonstrated the efficiency of our pipeline and highlighted its novel features, in particular the processing of results by MotIV, which led to the identification of motifs not detected by other methods.
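Motif identification tools of this kind commonly represent a binding motif as a position weight matrix (PWM) and score candidate sites with it. The sketch below shows generic log-odds PWM scanning on the forward strand only; it is not the internal method of PICS, rGADEM or MotIV, and the function names, the uniform background, the pseudocount of 1 and the toy "CCC" motif are all assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_pwm(counts: List[Dict[str, int]], pseudocount: float = 1.0,
                 background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2-odds scores against a
    uniform background, with a pseudocount to avoid log(0)."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan_forward(seq: str, pwm: List[Dict[str, float]],
                 threshold: float) -> List[Tuple[int, float]]:
    """Return (0-based start, score) for every window on the forward strand
    whose summed log-odds score reaches `threshold`; windows containing
    non-ACGT characters are skipped."""
    width = len(pwm)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Toy motif "CCC": every column observed as C in 10 aligned sites.
toy_pwm = log_odds_pwm([{"C": 10}] * 3)
print(scan_forward("AACCCAANCCC", toy_pwm, threshold=4.0))
```

A real analysis would also scan the reverse complement and derive the threshold from a background model rather than fixing it by hand.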