  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Évolution génotypique et phénotypique d'une souche épidémique de Pseudomonas aeruginosa au cours des 11 ans de sa diffusion hospitalière / Genotypic and phenotypic evolution of a Pseudomonas aeruginosa ST395 strain during 11-year in hospital spread.

Petitjean, Marie 31 October 2017
P. aeruginosa is an opportunistic pathogen responsible for hospital-acquired infections in immunocompromised patients. Although the in-host evolution of P. aeruginosa is well documented, little is known about how this pathogen evolves as it spreads on a hospital scale. The high-risk, multidrug-resistant clone ST395 spread in the University Hospital of Besançon between 1997 and 2008, infecting or colonizing more than 300 patients. We used a whole-genome sequencing (WGS) approach to identify the origin of the outbreak, the features that helped it become established in the hospital, and those associated with its end. The genomes of 54 representative isolates were sequenced. The phylogenetic tree showed two distinct clusters, indicating two parallel outbreaks. Dating of the common ancestor to 1979, when construction of the hospital began, suggests an early contamination of the hospital water network. This hypothesis is supported by the presence of an ST395-specific genomic island carrying genes that encode 6 copper transporters, associated with phenotypic resistance to copper, the metal from which the pipes of the drinking-water network are made. The late isolates displayed genomic signatures of adaptation to chronic infection that arose through independent mutations: altered lipopolysaccharide (LPS) and porin OprD, both confirmed phenotypically, and loss of MexAB-OprM efflux pump overproduction, checked by RT-qPCR. Some of these mutations were associated with decreased in vitro fitness. We hypothesize that the independent emergence of isolates adapted to chronic infection, and thus the accumulation of epidemiological dead-ends, contributed to the end of the hospital outbreak of P. aeruginosa ST395.
22

Análise de elementos cis-acting em regiões promotoras de genes relacionados com desenvolvimento radicular em arroz (Oryza sativa L.) / Analysis of cis-acting elements in the regions of promoting genes related to root development

Farias, Daniel da Rosa 28 June 2013
Previous issue date: 2013-06-28 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Roots have a wide range of functions in plants, including the uptake of water and nutrients as well as structural support. Combining classical methods of genetics and breeding with molecular technologies for genomic analysis opens a new perspective for expanding knowledge of the genetic basis of traits and for accelerating breeding programs. Most knowledge about the gene networks involved in root development has been obtained in Arabidopsis thaliana, the model dicotyledonous plant. Understanding the mechanisms that regulate gene expression is essential to understand the form and function of biological systems. Cis-acting elements are DNA regions that act as molecular switches in the transcriptional regulation of a dynamic gene network. Although they are often only five to 20 bp long, cis-acting elements are critical to the understanding of gene regulation. Knowledge of the cis-acting elements present in the promoter regions of gene families can contribute to understanding the systems that regulate expression of the gene network involved in forming the root system. The objective of this study is to identify the cis-acting elements present in the promoter region of rice (Oryza sativa subsp. japonica cv. Nipponbare) genes similar to genes of the Argonauta, Cullin and Ara families of Arabidopsis thaliana. The promoter regions of the rice genes of these families were investigated for the abundance of cis-acting elements. The sequences were analyzed with the “Signal Scan Search” program of the “Plant Cis-acting Regulatory DNA Elements” (PLACE) website to identify the different cis-acting elements. A total of 96 different cis-acting elements were detected, five of which, GAREAT (TAACAAR), TGACGTVMAMY (TGACGT), CCAATBOX1 (CCAAT), LECPLEACS2 (TAAAATAT) and SV40COREENHAN (GTGGWWHG), were common to the Argonauta, Cullin and Ara gene families.
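The motif consensus strings quoted in this abstract use IUPAC ambiguity codes (R, W, H and so on). As a rough illustration of what scanning a promoter for such elements involves, the sketch below expands each consensus into a regular expression and reports match positions. It is not the PLACE Signal Scan program: the function names and the test sequence are made up for illustration, and only the given strand is scanned.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "H": "[ACT]", "B": "[CGT]",
    "V": "[ACG]", "D": "[AGT]", "N": "[ACGT]",
}

# Element names and consensus strings as listed in the abstract.
MOTIFS = {
    "GAREAT": "TAACAAR",
    "TGACGTVMAMY": "TGACGT",
    "CCAATBOX1": "CCAAT",
    "LECPLEACS2": "TAAAATAT",
    "SV40COREENHAN": "GTGGWWHG",
}


def motif_to_regex(motif):
    """Compile an IUPAC consensus into a regex (hypothetical helper).

    The lookahead wrapper makes overlapping occurrences visible too.
    """
    return re.compile("(?=" + "".join(IUPAC[base] for base in motif) + ")")


def scan(sequence, motifs=MOTIFS):
    """Return {element name: [0-based start positions]} on the given strand."""
    seq = sequence.upper()
    return {
        name: [m.start() for m in motif_to_regex(consensus).finditer(seq)]
        for name, consensus in motifs.items()
    }


# Invented promoter fragment, for illustration only (not real rice sequence).
promoter = "GGTAACAAGCCAATCCAATTGACGTGTGGAATGTAAAATAT"
hits = scan(promoter)
for name, positions in hits.items():
    print(name, positions)
```

Counting the positions per element and per gene family would give the kind of abundance table the abstract refers to; a fuller analysis would presumably also scan the reverse complement.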
23

Diversidade e prospecção de metagenoma microbiano em fermentadores de biogás produzindo H2 / Diversity analysis and bioprospection of microbial metagenome in a H2-producing biogas fermenter

Tomazetto, Geizecler, 1979- 22 August 2018
Advisor: Valeria Maia Merzel / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Biologia / Previous issue date: 2013 / Abstract: Hydrogen appears to be the most promising candidate for replacing fossil fuel because of its potentially higher efficiency of conversion to usable power and its lack of toxic emissions. Hydrogen is produced from organic waste through anaerobic digestion, which makes it an environmentally friendly alternative for meeting future hydrogen demand. Nonetheless, the microorganisms and metabolic processes involved are far from being exhaustively characterized. In this work, samples from a domestic sewage treatment plant were analyzed in two complementary studies aimed at characterizing their phylogenetic diversity and describing new hydrogenases. The first study combined the analysis of 16S rRNA and [FeFe]-hydrogenase (hydA) genes with statistical tools to estimate the richness and diversity of the prokaryotic community at the phylogenetic and functional levels. Phylogenetic analysis showed that all archaeal sequences were affiliated with as yet uncultured Euryarchaeota and that Proteobacteria were the predominant and most diverse phylogenetic group within the bacterial library. The putative hydA sequences were identified as hitherto undescribed [FeFe]-hydrogenase gene sequences. Statistical diversity analysis confirmed a great richness and diversity of the bacterial and hydA sequences retrieved from the sewage sludge sample. In the second approach, a fosmid metagenomic library was constructed and analyzed using 454 pyrosequencing, resulting in approximately 218 Mb of data. The three classifiers applied gave only a broad overview of the most abundant taxonomic groups, because a very large number of metagenome reads remained unclassified. Nevertheless, taxonomic analysis revealed Gammaproteobacteria and Deltaproteobacteria, in that order, as the most abundant classes, whereas species of the genus Methanospirillum were dominant among methanogenic Archaea. Analysis of the metabolism of the microbial community by means of the COG and Carma databases revealed that biomass degradation depends on different phylogenetic groups; for instance, Bacteroidia and Gammaproteobacteria were indicated as involved in the degradation of carbohydrates and proteins, respectively. Furthermore, the analysis suggested Clostridia as the main hydrogen producers and Methanomicrobiales and Methanosarcinales as the main methane producers. Analysis of the six [FeFe]-hydrogenase coding sequences identified in the metagenomic dataset revealed that they represent novel sequences of the target gene. However, only four of these coding sequences could be detected in the fosmid library by PCR screening. The combined results of this study give insight into the composition and metabolic potential of the microbes residing in the analyzed domestic sewage treatment plant and suggest this environment as a potential reservoir of new hydrogenase genes for biotechnological exploration / Doctorate / Genetics of Microorganisms / Doctor in Genetics and Molecular Biology
25

G4-Hunter : un nouvel algorithme pour la prédiction des G-quadruplexes / G4-Hunter : a new algorithm for G-quadruplexes prediction’s

Bedrat, Amina 06 November 2015
Biologically relevant G4 DNA structures can form at key regions of the genome, including chromosome ends (telomeric repeats), immunoglobulin class-switch regions, the promoters of certain genes, among them oncogenes, and transcribed sequences. They can arise when single-stranded G-rich DNA or RNA sequences are exposed during replication, transcription or recombination. Computational analyses using predictive algorithms suggest that the human genome contains approximately 370 000 potential G4-forming sequences. These predictions are not exhaustive, because they are generally limited to the standard consensus G3+N(1−7)G3+N(1−7)G3+N(1−7)G3+, that is, four runs of at least three guanines separated by loops of 1 to 7 nucleotides. Many stable G4s escape this consensus, so broadening the description should allow more G4 loci to be predicted. We propose an objective score function, G4-Hunter, which predicts G4 folding propensity from a linear nucleic acid sequence. The new method focuses on guanine clusters and GC asymmetry, taking into account the whole genomic region rather than individual quadruplex sequences. In parallel with this computational work, a large-scale in vitro study on about one hundred different sequences was carried out to validate the algorithm and test its robustness. G4-Hunter exhibits unprecedented accuracy and sensitivity and leads us to significantly reevaluate the number of G4-prone sequences in the human genome. G4-Hunter also allowed us to predict potential G4 sequences in the human, HIV and Dictyostelium discoideum genomes that could not be identified by previous computational methods.
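The consensus quoted in this abstract maps directly onto a regular expression, which helps show why a fixed pattern misses sequences that a score function can still rank. The sketch below is only an illustration: `consensus_hits` implements the quoted consensus for DNA letters, while `run_score` is a simplified score based on guanine runs and GC asymmetry, written under the assumption that G runs count positively and C runs negatively, with run length capped at 4. The abstract does not give the scoring details, so this should not be read as the thesis implementation.

```python
import re
from itertools import groupby

# G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+, wrapped in a lookahead so that
# candidate sites starting at different positions are all reported.
CONSENSUS = re.compile(
    r"(?=(G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}))"
)


def consensus_hits(seq):
    """Return (start, matched_sequence) pairs for the standard consensus."""
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(seq.upper())]


def run_score(seq, cap=4):
    """Illustrative G-run / GC-asymmetry score (assumed scheme, see lead-in).

    Each G in a run of n guanines scores +min(n, cap), each C in a run of
    n cytosines scores -min(n, cap), other bases score 0; the result is
    the mean over the sequence.
    """
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            scores.extend([min(n, cap)] * n)
        elif base == "C":
            scores.extend([-min(n, cap)] * n)
        else:
            scores.extend([0] * n)
    return sum(scores) / len(scores) if scores else 0.0


telomeric = "GGGTTAGGGTTAGGGTTAGGG"      # human telomeric repeat motif
two_g_runs = "GGTTAGGTTAGGTTAGG"         # runs of two Gs: outside the consensus

print(consensus_hits(telomeric))
print(consensus_hits(two_g_runs))
print(run_score(telomeric), run_score(two_g_runs))
```

The second sequence produces no consensus hit at all, yet still receives a positive score; a score threshold rather than a fixed pattern is what lets such candidates be ranked instead of discarded.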
26

The Role of the Glycerophosphocholine Remodelling in Alzheimer’s Disease

P. Blanchard, Alexandre January 2016
Advances in high performance liquid chromatography-electrospray ionization-mass spectrometry, first made in proteomics and now applied to the emerging field of lipidomics, have enabled the identification of lipid composition at the molecular level. These improvements have given fresh impetus to lipid research. Modulating lipid composition has been suggested as a novel therapeutic target for intervention in Alzheimer’s disease. A better understanding of how metabolic alterations in the lipid landscape alter Alzheimer’s disease prognosis is required to realize this promise. To achieve this goal, further methodological improvements in lipidomic data acquisition and analysis are required, as are comprehensive comparative analyses of lipid metabolism at the systems level in clinical samples and mouse models of human neurodegenerative disease. In this thesis, I present two new lipidomic bioinformatic tools, Retention Time Standardization and Registration (RTStaR) and Visualization and Phospholipid Identification (VaLID), designed to facilitate analysis of high performance liquid chromatography-electrospray ionization-mass spectrometry lipidomic data. Using these tools and methodologies, I then comparatively profiled the glycerophosphocholine lipidome in the plasma of young adults, cognitively normal elderly with vascular impairment, mild cognitive impairment and late-onset Alzheimer’s disease patients, and in the entorhinal-hippocampal circuit of late-onset Alzheimer’s disease patients, TgCRND8 human amyloid beta precursor protein transgenic mice (an Alzheimer’s disease mouse model), and NonTg female littermates across the lifespan. 
Systems-level analyses identified aberrant glycerophosphocholine metabolic pathways systemically perturbed by age, disease, and amyloid beta biogenesis, resulting in the regionally specific accumulation in brain of critical platelet-activating factor and, to a lesser extent, lysoglycerophosphocholine metabolites that could be, in part, predicted by changes in plasma. Finally, using proteomic approaches, I identified additional changes in lipid metabolic pathways associated with phenoconversion in the TgCRND8 mouse model of Alzheimer’s disease.
27

Identification de motifs au sein des structures biologiques arborescentes / Pattern identification in biological tree structure

Gaillard, Anne-Laure 30 November 2011
The explosion of available biological data makes the development of efficient new processing methods a major challenge in bioinformatics. Many biological structures, such as RNA secondary structures and plant architecture, are modeled as trees. These structures contain repeated patterns, both within a single structure and from one structure to another. We propose to exploit this fundamental property to improve the storage and processing of such objects. Following the principle of sequence filtering, we define a filtering method on ordered trees that efficiently retrieves from a database the set of ordered trees close to a query tree. The method is based on decomposing the tree into seeds and detecting seeds shared between structures. We define and solve the maximum chaining problem on trees. For RNA secondary structures, we propose a definition of (l−d) centered seeds. Second, based on instantiation techniques used, for instance, in computer graphics, and on the known redundancy of biological structures, we present a compression method that reduces the memory required to store unordered trees. Once the internal redundancies have been identified, a more compact data structure is used to represent, in particular, plant architecture, which may carry both topological and geometrical information.
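The compression idea described in this abstract, storing each repeated subtree once and referring to it wherever it recurs, can be illustrated with a small sketch. The code below is an assumption-laden toy rather than the thesis method: trees are nested tuples of children, only topology is considered (no geometry), and two subtrees are treated as identical when their multisets of children match, which is one simple way to handle unordered trees.

```python
def compress(tree, table):
    """Return the id of `tree` in `table`, adding it if unseen.

    `tree` is a tuple of child trees (a leaf is the empty tuple).
    `table` maps a canonical signature, the sorted tuple of child ids,
    to a node id, so equal unordered subtrees share one entry.
    """
    signature = tuple(sorted(compress(child, table) for child in tree))
    if signature not in table:
        table[signature] = len(table)
    return table[signature]


def count_nodes(tree):
    """Number of nodes in the uncompressed tree."""
    return 1 + sum(count_nodes(child) for child in tree)


# Toy "plant": the same branch shape is reused several times.
leaf = ()
branch = (leaf, leaf)
plant = (branch, branch, (leaf, branch))

table = {}
root_id = compress(plant, table)
print(count_nodes(plant), "nodes stored as", len(table), "distinct subtrees")
```

Sorting the child ids is what makes the comparison order-independent; for ordered trees the `sorted` call would simply be dropped.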
28

Bioinformatique pour l’exploration de la diversité inter-espèces et inter-populations : hétérogénéité & données multi-omiques / Bioinformatics for exploring inter-species and inter-population diversity : heterogenity & multi-omics data

Cogne, Yannick 07 October 2019
The joint use of transcriptomic and proteomic data enables detailed study of the molecular mechanisms induced by environmental disturbances. Assembling RNA sequencing data from so-called non-model organisms produces the protein database needed to interpret the spectra generated in shotgun proteomics. In this context, the aim of the PhD work was to optimize the interpretation and analysis of proteomic data by developing innovative concepts for the construction of protein databases and the exploration of biodiversity. The first step focused on developing a pre-processing method for RNA-seq data based on proteomic attribution results. The second step was to reduce the size of the databases by optimizing the parameters of the automated search for coding regions. The optimized method enabled the analysis of 7 taxonomic groups of gammarids representative of the diversity found in natura. The proteomic databases thus produced enabled the inter-population analysis of 40 individual Gammarus pulex proteomes from two sampling sites (polluted vs reference). Statistical analysis based on an individual-centered approach showed heterogeneity in the biological response within a population of organisms following an environmental disturbance. Different subgroups of induced molecular mechanisms were identified. Finally, the study of the cross-species transferability of peptide biomarkers identified in Gammarus fossarum defined the peptides common across species, using both the proteomic and the transcriptomic data. For this purpose, software for exploring peptide sequences was developed; it proposes potential substitute biomarkers when the defined peptides are not applicable to certain gammarid species. All these concepts are part of an effort to improve and deepen the interpretation of data by proteogenomics. This work opens the door to the multi-omic analysis of individuals collected in natura, taking inter-species and intra-population biodiversity into account.
29

SYSTEMATICALLY LEARNING OF INTERNAL RIBOSOME ENTRY SITE AND PREDICTION BY MACHINE LEARNING

Junhui Wang (5930375) 15 May 2019
Internal ribosome entry sites (IRES) are segments of mRNA found in untranslated regions that can recruit the ribosome and initiate translation independently of the more widely used 5’ cap-dependent translation initiation mechanism. IRES play an important role in conditions where 5’ cap-dependent translation initiation has been blocked or repressed. They have been found to play important roles in viral infection, cellular apoptosis, and the response to other external stimuli. It has been suggested that about 10% of mRNAs, both viral and cellular, can utilize IRES. However, owing to the limitations of the bicistronic assay, the gold standard for identifying IRES, relatively few IRES have been definitively described and functionally validated compared to the potential overall population. Viral and cellular IRES may be mechanistically different, but this is difficult to analyze because the mechanistic differences are still not clearly defined. Identifying additional IRES is an important step towards better understanding IRES mechanisms. A new bioinformatics tool that can accurately predict IRES from sequence would be a significant step forward in identifying IRES-based regulation and in elucidating IRES mechanisms. This dissertation systematically studies the features that can distinguish IRES from non-IRES sequences. Sequence features such as k-mer words, and structural features such as the predicted minimum free energy (MFE) of folding, Q_MFE, and sequence/structure triplets, are evaluated as possible discriminative features. These features are incorporated into an IRES classifier based on XGBoost, a machine learning model, to classify novel sequences as belonging to the IRES or non-IRES group. The XGBoost model performs better than previous predictors, with higher accuracy and lower computational time. The number of features in the model has been greatly reduced, compared to previous predictors, by adding global k-mer and structural features. The trained XGBoost model has been implemented as the first high-throughput bioinformatics tool for IRES prediction, IRESpy. This website provides a public tool for all IRES researchers and can be used in other genomics applications such as gene annotation and analysis of differential gene expression.
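The k-mer "word" features mentioned in the abstract can be illustrated with a short sketch. This is not the IRESpy implementation: the function name `kmer_frequencies`, the default `k=2`, and the `ACGU` alphabet are assumptions chosen for illustration, and the abstract does not specify how the features were actually computed.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return the relative frequency of every k-mer over `alphabet` in `seq`.

    Hypothetical helper for illustration only.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    # Count overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[km] for km in kmers)
    # A fixed key order gives every sequence a feature vector of equal length.
    return {km: (counts[km] / total if total else 0.0) for km in kmers}
```

A vector built this way has 4^k entries regardless of sequence length, which is what lets sequences of different lengths be fed to a tabular classifier such as a gradient-boosted tree model.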
30

Reconstruction and Local Recovery of Data from Synchronization Errors

Minshen Zhu (15334783) 21 April 2023
In this thesis we study the complexity of data recovery from synchronization errors, namely insertion and deletion (insdel) errors.

Insdel Locally Decodable Codes (Insdel LDCs) are error-correcting codes that admit super-efficient decoding algorithms even in the presence of many insdel errors. The study of such codes for Hamming errors has spanned several decades, whereas work on the insdel analogue had amounted to only a few papers before our work. This work initiates a systematic study of insdel LDCs, seeking to bridge this gap by designing codes and proving limitations. Our upper bounds essentially match those for Hamming LDCs in important ranges of parameters, even though insdel LDCs are more general than Hamming LDCs. Our main results are lower bounds that are exponentially stronger than the ones inherited from Hamming LDCs. These results also have implications for the well-studied variant of relaxed LDCs. For this variant, besides showing the first results in the insdel setting, we also answer an open question for the Hamming variant by showing a strong lower bound.

In the trace reconstruction problem, the goal is to recover an unknown source string x ∈ {0,1}^n from random traces, which are obtained by hitting the source string with random deletions/insertions at a fixed rate. Mean-based algorithms are a class of reconstruction algorithms whose outputs depend only on the empirical estimates of individual bits. The number of traces needed for mean-based trace reconstruction has already been settled. We further study the performance of mean-based algorithms in a scenario where one wants to distinguish between two source strings parameterized by their edit distance, and we also provide an explicit construction of strings that are hard to distinguish. We further establish an equivalence to the Prouhet-Tarry-Escott problem from number theory, which ends up being an obstacle to constructing explicit hard instances against mean-based algorithms.
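The trace model and the statistic used by mean-based algorithms can be sketched as follows. This is an illustrative simplification, not code from the thesis: it models deletions only (no insertions), and the names `deletion_trace` and `mean_trace`, as well as the zero-padding convention, are assumptions.

```python
import random


def deletion_trace(x, q, rng):
    """Pass bit list `x` through a deletion channel.

    Each bit is dropped independently with probability q (assumed model).
    """
    return [b for b in x if rng.random() >= q]


def mean_trace(traces, n):
    """Average the traces position by position.

    Traces shorter than n are implicitly padded with zeros.
    """
    sums = [0] * n
    for t in traces:
        for j, b in enumerate(t):
            sums[j] += b
    return [s / len(traces) for s in sums]


# Example: 1000 traces of a short source string at deletion rate 0.2.
rng = random.Random(0)
source = [1, 0, 1, 1, 0, 0, 1, 0]
traces = [deletion_trace(source, 0.2, rng) for _ in range(1000)]
estimate = mean_trace(traces, len(source))
```

A mean-based algorithm sees only a vector like `estimate`, so two source strings whose expected mean traces are close require many traces to tell apart.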
