Cette thèse s'inscrit dans le cadre de la bioinformatique. Les outils mathématiques les plus utilisés dans ce travail relèvent de la théorie des graphes, des statistiques, de la théorie des ensembles et des mathématiques discrètes. Ces mathématiques ont permis de développer des modèles de systèmes biologiques ainsi que des algorithmes efficaces dans l'étude concrète de ces modèles. La nécessité d'analyses de jeux de données de très grande taille a rendu critique dans notre démarche cette notion d'efficacité des algorithmes. Il faut enfin remarquer que le champ biologique qui a servi de support à cette thèse nous a conduit à explorer un domaine particulier au sein de la théorie de la complexité, à savoir le développement et l'analyse des algorithmes d'énumération [etc...] / In the results presented in the present manuscripts, graph theory and combinatorial optimizationtecniques, have been used to model and solve biological problems. The manuscript is divided in twoparts, each one containing the mathematical and biological background of a given application and ouroriginal contributions to it.Part I groups a set of results designed for phylogenetics analysis, and in particular for reconstructingthe co-evolution of two groups of organisms (the so called co-phylogeny reconstruction problem).Although the addressed problem was treated in the available there was no method that solved suchproblem in a complete and efficient way. We thus developed and implemented a new one, calledEucalypt, with this purpose in mind. This not only provides a novel and usable software for cophylogenyreconstruction but also allows to investigate how the event-based model performs inpractice in terms of thenumber and quality of the solutions obtained. We compared our method to the available software. Bylooking at the results obtained, some interesting considerations about the advantages anddisadvantages of the commonly accepted mathematical model could be drawn. Finally, we introduceda new version of the problem where the host-switches are distance bounded: the k-bounded-All-MPRproblem. Eucalypt solves both problems in polynomial delay. These results have been accepted forpublication by the jounal Algorithms for Molecular Biology. The relative software is publicyavailable.Our studies show that the 'most parsimonious scenario' approach presents some limitationsthat cannot be ignored. To deal with these problems, we developed a second algorithm, called Coala,based on an approximate Bayesian computation approach for estimating the frequency of the events.The benefits of this method are twofold: it provides more confidence in the set of costs to be used in areconciliation, and it allows to estimate the frequency of the events in the cases where a reconciliationmethod cannot be applied. These results are currently under review by the jounal Systematic Biology.The relative software is publicy available.In Part 2 another set of studies is presented. Our original model for the contig scaffolding problem,and our algorithm MeDuSa, are presented and tested. Unlike traditional software, it does not rely eitheron paired-end information of sequencing reads or on a phylogenetic distance of the microorganismsused in the analysis. This drastically increases the usability of our software and, at the same time,reduces the computational time required for genome scaffolding. We show that the algorithmimplemented in MeDuSa, in most cases, is capable of producing less and longer scaffolds incomparison to commonly used scaffolders, while maintaining high accuracy and correctness of thepredicted joins. These results are currently under revision by the journal Bioinformatics.Finally, during the development of this method we encountered some pure theoretical open problemsand we decided to dedicate part of our job to their analysis. The last chapter is then dedicated to a setof problems, all related to the Implicit Hitting set enumeration problem. After some formal definitions,an original NP-completeness result is presented and the future directions of our work are described
Identifer | oai:union.ndltd.org:theses.fr/2014LYO10235 |
Date | 12 November 2014 |
Creators | Donati, Beatrice |
Contributors | Lyon 1, Università degli studi (Florence, Italie), Sagot, Marie-France, Crescenzi, Pierluigi |
Source Sets | Dépôt national des thèses électroniques françaises |
Language | English |
Detected Language | English |
Type | Electronic Thesis or Dissertation, Text |
Page generated in 0.0024 seconds