  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Bayesian Integration and Modeling for Next-generation Sequencing Data Analysis

Chen, Xi 01 July 2016 (has links)
Computational biology currently faces challenges in a big data world with thousands of data samples across multiple disease types including cancer. The challenging problem is how to extract biologically meaningful information from large-scale genomic data. Next-generation Sequencing (NGS) can now produce high quality data at DNA and RNA levels. However, in cells there are many non-specific (background) signals that affect the detection accuracy of true (foreground) signals. In this dissertation work, under a Bayesian framework, we aim to develop and apply approaches to learn the distribution of genomic signals in each type of NGS data for reliable identification of specific foreground signals. We propose a novel Bayesian approach (ChIP-BIT) to reliably detect transcription factor (TF) binding sites (TFBSs) within promoter or enhancer regions by jointly analyzing the sample and input ChIP-seq data for one specific TF. Specifically, a Gaussian mixture model is used to capture both binding and background signals in the sample data, and background signals are modeled by a local Gaussian distribution that is accurately estimated from the input data. An Expectation-Maximization algorithm is used to learn the model parameters according to the distributions on binding signal intensity and binding locations. Extensive simulation studies and experimental validation both demonstrate that ChIP-BIT significantly improves TFBS detection over conventional methods, particularly for weak binding signals. To infer cis-regulatory modules (CRMs) of multiple TFs, we propose a Bayesian integration approach, BICORN, that integrates ChIP-seq and RNA-seq data from the same tissue. Each TFBS identified from ChIP-seq data can be either a functional binding event mediating target gene transcription or a non-functional binding event.
The functional bindings of a set of TFs usually work together as a CRM to regulate the transcription processes of a group of genes. We develop a Gibbs sampling approach to learn the distribution of CRMs (a joint distribution of multiple TFs) based on their functional bindings and target gene expression. The robustness of BICORN has been validated on simulated regulatory network and gene expression data with respect to different noise settings. BICORN is further applied to breast cancer MCF-7 ChIP-seq and RNA-seq data to identify CRMs functional in promoter or enhancer regions. In tumor cells, the normal regulatory mechanism may be interrupted by genome mutations, especially those somatic mutations that uniquely occur in tumor cells. Focused on a specific type of genome mutation, structural variation (SV), we develop a novel pattern-based probabilistic approach, namely PSSV, to identify somatic SVs from whole genome sequencing (WGS) data. PSSV features a mixture model with hidden states representing different mutation patterns; PSSV can thus differentiate heterozygous and homozygous SVs in each sample, enabling the identification of those somatic SVs with a heterozygous status in the normal sample and a homozygous status in the tumor sample. Simulation studies demonstrate that PSSV outperforms existing tools. PSSV has been successfully applied to breast cancer patient WGS data for identifying somatic SVs of key factors associated with breast cancer development. In this dissertation research, we demonstrate the advantage of the proposed distributional learning-based approaches over conventional methods for NGS data analysis. Distributional learning is a very powerful approach to gain biological insights from high quality NGS data. 
Successful applications of the proposed Bayesian methods to breast cancer NGS data shed light on underlying molecular mechanisms of breast cancer, enabling biologists or clinicians to identify major cancer drivers and develop new therapeutics for cancer treatment. / Ph. D.
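As a rough illustration of the mixture-model idea in the abstract above, the following sketch fits a two-component 1-D Gaussian mixture with Expectation-Maximization while holding the background component fixed, as if it had been estimated beforehand from input data. It is not the ChIP-BIT implementation: the function names (`normal_pdf`, `fit_foreground`), the initial guesses, and the synthetic data are all assumptions made for illustration, and the binding-location part of the model is left out.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_foreground(sample, bg_mean, bg_std, n_iter=100):
    """EM for a two-component Gaussian mixture with a fixed background.

    Hypothetical helper: only the foreground mean, foreground std and the
    mixing weight are learned; the background parameters are given.
    """
    # Assumed starting point: foreground near the strongest signal.
    fg_mean, fg_std, weight = max(sample), bg_std, 0.5
    resp = [0.5] * len(sample)
    for _ in range(n_iter):
        # E-step: posterior probability that each value is foreground.
        resp = []
        for x in sample:
            fg = weight * normal_pdf(x, fg_mean, fg_std)
            bg = (1.0 - weight) * normal_pdf(x, bg_mean, bg_std)
            resp.append(fg / (fg + bg) if fg + bg > 0.0 else 0.5)
        # M-step: re-estimate the foreground parameters from the posteriors.
        total = sum(resp)
        weight = total / len(sample)
        fg_mean = sum(r * x for r, x in zip(resp, sample)) / total
        var = sum(r * (x - fg_mean) ** 2 for r, x in zip(resp, sample)) / total
        fg_std = max(math.sqrt(var), 1e-3)  # keep the std away from zero
    return weight, fg_mean, fg_std, resp


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: 300 background values and 100 foreground values.
    data = [rng.gauss(0.0, 1.0) for _ in range(300)]
    data += [rng.gauss(5.0, 1.0) for _ in range(100)]
    weight, fg_mean, fg_std, _ = fit_foreground(data, bg_mean=0.0, bg_std=1.0)
    print(weight, fg_mean, fg_std)
```

Fixing the background component mirrors the abstract's statement that background signals are estimated from the input data rather than learned jointly; a full model would also need the location term and a principled initialization.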
2

Functional analysis of genomic variations associated with emerging artemisinin resistant P. falciparum parasite populations and human infecting piroplasmida B. microti / Analyse fonctionnelle des variations du génome au sein de populations de P. falciparum résistantes à l’artémisinine et chez le piroplasme responsable de la babésiose humaine B. microti

Dwivedi, Ankit 28 September 2016 (has links)
The ongoing WHO malaria elimination program is threatened by the emergence and potential spread of artemisinin-resistant Plasmodium falciparum parasites. Recent reports have shown (a) that SNPs in a region of chromosome 13 are under strong recent positive selection in Cambodia, (b) that both artemisinin-resistant and artemisinin-sensitive P. falciparum subpopulations are present in Cambodia, (c) that mutations in the Kelch propeller domain of the k13 gene are major determinants of artemisinin resistance in the Cambodian parasite population, and (d) that parasite subpopulations in northern Cambodia, near Thailand and Laos, are resistant to mefloquine and carry the R539T allele of the k13 gene. Identifying the genetic basis of resistance is important to monitor and control the transmission of resistant parasites to the rest of the world, to understand parasite metabolism, and to develop new drugs. This thesis focuses on characterizing the P. falciparum population structure in Cambodia and describing the metabolic properties of the subpopulations and the gene flow among them. The aim is to identify the genetic basis associated with the transmission and acquisition of artemisinin resistance in the country. First, a barcode approach was developed to identify parasite subpopulations using a small number of loci. A mid-throughput multiplexed PCR-LDR-FMA approach based on LUMINEX technology was used to screen for SNPs in 537 blood samples (2010 - 2011) from 16 health centres in Cambodia. Based on successful typing of 282 samples, subpopulations were characterized along the borders of the country. Gene flow was described based on the gradient of alleles at the 11 loci in the barcode. The barcode successfully identifies recently emerged parasite subpopulations associated with artemisinin and mefloquine resistance. In the second approach, the parasite population structure was defined based on 167 parasite NGS genomes (2008 - 2011) originating from four locations in Cambodia and retrieved from the ENA database. Based on calling of 21257 SNPs, eight parasite subpopulations were described. The presence of admixed parasite subpopulations appears to be a major risk for the transmission of artemisinin resistance. Functional analysis shows a common genetic background among isolates in the resistant populations and confirmed the importance of the PI3K pathway in the acquisition of resistance by helping the parasite remain in the ring stage. Our findings question the origin and persistence of the P. falciparum subpopulations in Cambodia, provide evidence of gene flow among subpopulations, and describe a model of artemisinin resistance acquisition. The variant calling approach was then applied to the Babesia microti genome. This parasite is responsible for human babesiosis (a malaria-like syndrome) and is endemic in the north-eastern USA. The objective was to validate the taxonomic position of B. microti as an out-group among piroplasmida and to improve the functional genome annotation based on genetic variation, gene expression and protein antigenicity. We identified new proteins involved in host-parasite interactions.
3

Techniques for Storing and Processing Next-Generation DNA Sequencing Data

Camerlengo, Terry Luke 02 June 2014 (has links)
No description available.
4

Efektivní hledání překryvů u NGS dat / Effective Search for Overlaps in NGS Data

Matocha, Petr January 2017 (has links)
The main theme of this work is the detection of overlaps in NGS data. The work gives an overview of the NGS sequencing technologies that produce such data and defines the overlap detection problem in general terms. It then surveys the available algorithms and approaches for detecting overlaps in NGS data and describes their principles. In the second part, a tool for detecting approximate overlaps in NGS data is designed and its implementation is described. Finally, the experiments performed with this tool are summarized along with the conclusions drawn from them.
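To make the overlap detection problem concrete, here is a minimal sketch of a naive suffix-prefix overlap search between reads that tolerates a fixed number of mismatches. It is not the tool designed in the thesis: the function names (`overlap_length`, `all_overlaps`), the Hamming-distance notion of "approximate", and the example reads are assumptions made for illustration, and the quadratic all-pairs loop would not scale to real NGS data sets.

```python
def overlap_length(a, b, min_len=3, max_mismatches=0):
    """Longest suffix of `a` matching a prefix of `b`, or 0 if none.

    A candidate overlap is accepted when it is at least `min_len` long and
    differs in at most `max_mismatches` positions (Hamming distance).
    """
    # Try the longest possible overlap first and shrink until one fits.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix = a[len(a) - length:]
        prefix = b[:length]
        mismatches = sum(1 for x, y in zip(suffix, prefix) if x != y)
        if mismatches <= max_mismatches:
            return length
    return 0


def all_overlaps(reads, min_len=3, max_mismatches=0):
    """Map each ordered read pair (i, j) to its overlap length, if any."""
    result = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = overlap_length(a, b, min_len, max_mismatches)
                if n:
                    result[(i, j)] = n
    return result


if __name__ == "__main__":
    reads = ["ACGTTGCA", "TGCAGGTC"]  # hypothetical reads
    print(all_overlaps(reads))
    print(overlap_length("ACGTTGCA", "TGGAGGTC", max_mismatches=1))
```

Practical overlap detectors typically avoid the all-pairs comparison by indexing the reads first (for example with suffix arrays, FM-indexes or k-mer hashing); the sketch only shows what is being computed.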
