Global ETD Search

1	Innovations pour l'annotation protéogénomique à grande échelle du vivant / Innovations for proteogenomic annotation on a large scale for microorganisms. Bland, Céline. 23 September 2013.
Proteogenomics is a recent field at the junction of genomics and proteomics. It consists of refining the genome annotation of model organisms using high-throughput proteomic data. Structural and functional annotation errors are still frequent, so innovative methodologies for resolving these ambiguities are essential. N-terminomics, the specific study of protein N-termini, enables experimental verification of translation initiation codons and certification of the resulting annotation data. To this end, two innovative strategies were developed that combine: i) selective labeling of protein N-termini, ii) multienzymatic digestion in parallel, and iii) specific enrichment of the labeled N-terminal peptides, using either successive liquid chromatography steps or immunocapture directed against the added N-terminal group. The efficiency of these methodologies was demonstrated on the bacterial model Roseobacter denitrificans. After enrichment by chromatography, 480 proteins were validated and 46 were re-annotated. Several translation initiation sites were detected, and homology-driven annotation was called into question in some cases. After immunocapture, 269 proteins were characterized, 40% of which were identified specifically after enrichment. Three genes were also annotated for the first time. The complementary results obtained by tandem mass spectrometry make the data easier to interpret, revealing the real start sites of protein synthesis and identifying novel gene expression products. In this way, re-annotation may become automatic and systematic, improving protein databases.
Keywords: Proteogenomics; N-terminomics; Specific N-terminal enrichment; Tandem mass spectrometry; Genome re-annotation. 576
2	Annotation of the human genome through the unsupervised analysis of high-dimensional genomic data / Annotation du génome humain grâce à l'analyse non supervisée de données de séquençage haut débit. Morlot, Jean-Baptiste. 12 December 2017.
The human body has more than 200 different cell types, each containing an identical copy of the genome but expressing a different set of genes. Gene expression is controlled by a set of regulatory mechanisms acting at different scales of time and space. Several diseases, notably some cancers, are caused by a disturbance of this system, and many therapeutic applications, such as regenerative medicine, rely on understanding the mechanisms of gene regulation. The first part of this thesis proposes an annotation algorithm (GABI) that identifies recurrent patterns in high-throughput sequencing data. The distinctive feature of this algorithm is that it takes into account the variability observed across experimental replicates by optimizing the false-positive and false-negative rates, which significantly increases the reliability of the annotation compared with the state of the art. The annotation provides simplified and robust information from a large dataset. Applied to a database of regulator activity in hematopoiesis, it yields original results that agree with previous studies. The second part of this work focuses on the 3D organization of the genome, which is intimately linked to gene expression. This structure is accessible through algorithms that reconstruct it in 3D from contact data between chromosomes. We propose improvements to ShRec3D, currently the best-performing algorithm in the field, that allow the reconstruction to be adjusted to the user's needs.
Keywords: High-throughput (next generation) sequencing; Unsupervised learning; Hematopoiesis; 3D reconstruction; Genome annotation; Probabilistic graphical model. 572.8
