Global ETD Search

1	Dans les abysses du transcriptome : découverte de nouveaux biomarqueurs de cellules souches mésenchymateuses par analyse approfondie du RNAseq / In the abyss of the transcriptome: discovery of new biomarkers of mesenchymal stem cells by in-depth analysis of RNAseq
Riquier, Sébastien, 04 February 2019
The development of RNA sequencing, or RNAseq, has opened the way to intensive biomarker research in many areas of biology. The complete transcriptome information contained in the output data allows a diligent bioinformatician to go beyond current knowledge and, using advanced computational pipelines, to access countless novel signatures of interest. In this thesis, we show that these potential markers, mainly explored to address clinical questions in pathological conditions, can also be used to refine the characterization of cell types that lack strictly specific markers. We studied mesenchymal stem cells (MSCs), a type of multipotent adult stem cell that is widely used in the clinic but has no strictly specific positive markers. Our study focuses on the search for unannotated long non-coding RNAs. These RNAs, also called "lncRNAs", constitute an emerging class of transcripts that remains little explored. In addition, this category shows high condition and tissue specificity.
We developed an RNAseq analysis pipeline optimized for the reconstruction and quantification of unannotated lncRNAs. Using public RNAseq data from different sources of MSCs and from other cell types, we identified new unannotated lncRNAs specifically expressed in MSCs. For this project we developed Kmerator.jl, a tool that decomposes a transcript into specific sub-sequences (k-mers) in order to search for and quantify the signature of our candidates more quickly in a large number of RNAseq datasets. Kmerator has also been used in other applications to test the quality of publicly available RNA-seq data. After validating these new MSC biomarkers by qPCR, we used several computational tools to predict their potential functions. Finally, we analyzed single-cell RNAseq data to address the heterogeneity of expression within MSC populations.
Keywords: Mesenchymal Stem Cells; RNA-Seq; Long non-Coding RNAs; K-Mers
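The k-mer step described in this abstract can be illustrated with a minimal Python sketch. It is not Kmerator.jl's implementation: the function names, the toy sequences and the choice of k = 4 are assumptions made for illustration, and reverse complements are ignored for brevity.

```python
def kmers(seq, k):
    """Return every overlapping k-mer of seq, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def specific_kmers(transcript, background, k):
    """Keep the k-mers of `transcript` that occur in no background sequence."""
    seen = set()
    for other in background:
        seen.update(kmers(other, k))
    return [km for km in kmers(transcript, k) if km not in seen]


# Toy data (hypothetical): one candidate transcript and two background sequences.
candidate = "ACGTACGGA"
others = ["TTACGTAC", "GGGACGTT"]
signature = specific_kmers(candidate, others, k=4)
```

Only the k-mers absent from the background remain in `signature`; those are the sub-sequences one would then search for and count in raw reads.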
2	Développement de modèles spécifiques aux séquences génomique virales / Developing viral genomic data-specific classification models
Schmitt, Louise-Amelie, 19 July 2017
DNA sequencing of complex samples containing several species is a method of choice for studying the viral landscape of a given environment. Viral genomes, however, are hard to identify because of their extreme variability and the close relationship they maintain with their hosts. We propose new research directions toward a virus-specific solution for sequence identification, a need that existing generic solutions do not satisfactorily address.
Keywords: Metagenomics; Machine learning; Environment; Phylogeny; Taxonomic assignment; Supervised classification; K-mers; Signature; Virology
3	Vers une cartographie fine des polymorphismes liés à la résistance aux antimicrobiens / Fine mapping of antibiotic resistance determinants
Jaillard Dancette, Magali, 12 December 2018
The emergence and spread of multi-drug resistance has become a major worldwide public health concern, calling for a better understanding of the underlying resistance mechanisms. Genome-wide association studies are powerful tools for finely mapping the genetic polymorphisms linked to the phenotypic variability observed in a population. Although their methodological framework is well documented for eukaryotic genomes, these studies have only recently been applied to prokaryotes. In this PhD project, I sought to better adapt these tools to the highly plastic genomes of bacteria, mainly by working on the representation of genetic variation. Because bacteria can acquire genetic material by means other than direct inheritance from a parent cell, genomes within a species can differ too much to be aligned against a reference. A representation using sequence fragments of length k, the so-called k-mers, offers the required flexibility but generates redundancy and does not allow direct interpretation of the identified associations. The method we developed tests the association of these k-mers with the phenotype and relies on a De Bruijn graph (DBG) built over all genomes to remove the local redundancy of k-mers and to visualise the genomic context of the k-mers identified by the test. This synthetic view, in the form of DBG subgraphs, indicates the nature of the identified sequence, for example a local polymorphism in a gene or a gene acquired through a plasmid.
The type of variant can be predicted correctly in 96% of cases from descriptors of the subgraphs, providing a tractable framework for k-mer-based association studies of bacterial genomes.
Keywords: Genome-wide association study; Antibiotic resistance; De Bruijn graph; Genome variation; K-mers; Decorated graph; Prokaryotic genetics; Bacterial genome; 570
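To make the De Bruijn graph idea concrete, here is a small Python sketch. It is not the thesis's method: it builds a plain, uncompacted graph over (k-1)-mer nodes, and the function name, toy genomes and k = 4 are assumptions chosen for illustration.

```python
from collections import defaultdict


def de_bruijn(genomes, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for genome in genomes:
        for i in range(len(genome) - k + 1):
            kmer = genome[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy sequences (hypothetical) that differ by a single base.
genomes = ["ACGTTGCA", "ACGATGCA"]
graph = de_bruijn(genomes, k=4)

# Nodes with more than one successor mark where the genomes diverge.
branching = sorted(node for node, succ in graph.items() if len(succ) > 1)
```

In this toy case the single-base difference appears as one branching node whose two paths later rejoin, which is the kind of local structure a subgraph view makes visible.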
4	Numerické metody pro klasifikaci metagenomických dat / Numerical methods for classification of metagenomic data
Vaněčková, Tereza, January 2016
This thesis deals with metagenomics and numerical methods for the classification of metagenomic data. A review of alignment-free methods based on nucleotide word frequency is provided, as they appear to be effective for processing metagenomic sequence reads produced by next-generation sequencing technologies. To evaluate these methods, selected features based on k-mer analysis were tested on a simulated dataset of metagenomic sequence reads. The data in the original data space were then submitted to hierarchical clustering, and PCA-processed data were clustered by the K-means algorithm. The analysis was performed for different nucleotide word lengths and evaluated in terms of classification accuracy.
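As an illustration of a nucleotide word frequency feature, the following Python sketch turns a read into a normalised k-mer frequency vector. The function name, the example read and k = 2 are assumptions; the thesis's actual feature set and clustering code are not reproduced here.

```python
from itertools import product


def kmer_frequencies(read, k):
    """Return the relative frequency of every possible k-mer over A, C, G, T."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(read) - k + 1):
        word = read[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in words]


# Hypothetical read; with k = 2 the vector has 4**2 = 16 entries.
profile = kmer_frequencies("ACGTACGT", k=2)
```

Vectors of this kind, one per read, can then be passed to a clustering or dimensionality-reduction routine; the vector length grows as 4^k, which is one reason to compare different word lengths.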
