Global ETD Search

41	Caractérisation des erreurs de séquençage non aléatoires : application aux mosaïques et tumeurs hétérogènes / Characterization of non-random sequencing errors: application to mosaicism and heterogeneous tumors Saad, Chadi 26 September 2018 The advent of next-generation DNA sequencing technologies has revolutionized personalized genomics through their resolution and low cost. However, these technologies have a relatively high error rate, between 0.1% and 1% for second-generation sequencers. This rate is problematic when searching for variants with a low allelic ratio, as observed in heterogeneous tumors, because it can lead to thousands of false positives. Each region of the studied DNA must therefore be sequenced several times, and the variants are then filtered according to depth-based criteria. Despite these filters, the number of artifacts remains significant, which shows the limits of conventional approaches and indicates that some sequencing errors are not random. In this thesis, we developed an exact algorithm for discovering degenerate DNA motifs that are over-represented upstream of non-random sequencing errors and are thus potentially linked to their occurrence. The algorithm was implemented in a software tool called DiNAMO, which was tested on sequencing data from IonTorrent and Illumina technologies. The experimental results revealed several motifs specific to each of these two technologies. We then showed that taking these motifs into account in the analysis significantly reduced the false-positive rate. DiNAMO can therefore be used downstream of each analysis as an additional filter to improve variant identification, especially for variants with a low allelic ratio. Motif discovery Transcription factor Systematic sequencing error IUPAC Heterogeneous tumors Pattern ChIP-Seq
42	Dynamics of epigenome and 3D genome in hematopoietic stem cell development Chen, Changya 15 December 2017 Hematopoietic stem cell (HSC) development is accompanied by dynamic changes in the transcriptional program. How these transcriptional programs relate to epigenetic mechanisms is poorly understood. To fill this gap, we first profiled the transcriptomes and epigenomes using RNA-Seq and ChIP-Seq for five key developmental stages of HSC emergence in the mouse embryo. Using epigenetic markers, we identified 12,000 to 17,000 novel enhancers for each developmental stage. We applied a computational tool to link those enhancers to their target genes. Systematic analysis of enhancer-promoter (EP) pairs using a network-based strategy revealed multiple novel key transcription factors for early specification of HSCs in the mouse embryo. Second, we compared the 3D genome organization, epigenomes, and transcriptomes of fetal and adult HSCs in the mouse. We found that higher-order genome structures, including chromosomal compartments and topologically associating domains (TADs), are largely conserved between fetal and adult HSCs. However, chromatin interactions within TADs exhibit substantial differences. We found that promoters within 23% (242/1039) of TADs undergo interaction changes. Transcription factor motif analysis of HSC-specific enhancer-promoter loops suggests a role of KLF1 in mediating condition-specific enhancer looping and regulation of genes involved in the cell cycle. Our results provide a comprehensive view of the differences in 3D genome organization, epigenome, and transcriptome between fetal and adult HSCs. ChIP-Seq enhancer-promoter interaction epigenetics genome organization hematopoietic stem cells transcriptional program Genetics
43	Regulation of Pol II transcription and mRNA capping Nilson, Kyle Andrew 01 May 2016 In humans, RNA polymerase II is the sole source of messenger RNAs that are ultimately translated into proteins, and its transcriptional activity is highly regulated. Mechanisms have evolved to control which genes are transcribed, when, and to what degree. Because most cells have the same genome, control of transcription is essential in maintaining cellular identity. Misregulation of Pol II transcription is a hallmark of both cancer and retroviral infection. This research investigates the regulation of Pol II transcription and related co-transcriptional mRNA capping. Chromatin immunoprecipitation experiments were used to characterize the composition of nucleosomes and Pol II, DSIF and NELF occupancies at bidirectional promoters and enhancers. In collaboration with Alberto Bosque and Vicente Planelles, sequencing experiments were performed in a primary T cell model of HIV latency and a role for sequence-specific recruitment of STAT5 was established in HIV reactivation. In contrast, analysis of Myc binding in vitro and in cells demonstrated that transcription machinery played a major role in recruiting Myc to genomic sites. A precise method was also developed to detect polymerase-associated nascent transcripts in nuclei. The roles of Cdk7, a subunit of TFIIH that phosphorylates Pol II during initiation, were characterized by treatment of nuclear extracts and cells with THZ1, a recently developed covalent inhibitor with anti-cancer properties. Inhibition of Cdk7 was demonstrated to cause defects in Pol II phosphorylation, co-transcriptional capping, promoter proximal pausing, and productive elongation. Capping of nascent RNAs was found to be spatially and temporally regulated in part by a previously undescribed THZ1-sensitive factor present in nuclear extract. THZ1 impacted pausing through a capping-independent block of DSIF and NELF loading.
The P-TEFb-dependent transition into productive elongation was also inhibited by THZ1, likely due to misloading of DSIF. In vitro and sequencing methods were used to describe an extremely rapid and global transcriptional response to hydrogen peroxide. During periods of oxidative stress, termination was likely inhibited and Pol II accumulated at promoters and enhancers after as few as two minutes, and clearance of these polymerases required P-TEFb. In the presence of flavopiridol, a potent P-TEFb inhibitor, non-productive elongation was observed and a potential role for P-TEFb in termination was proposed. ChIP-Seq mRNA Capping Pol II P-TEFb THZ1 Cell Biology
44	Implication des facteurs de remodelage de chromatine de la famille CHD dans les réseaux de régulation transcriptionnelle des cellules souches embryonnaires / Involvement of CHD-family chromatin remodeling factors in the transcriptional regulatory networks of embryonic stem cells De Dieuleveult, Maud 17 September 2010 Embryonic stem cells (ES cells) have the unique capacity to divide indefinitely and to differentiate into multiple cell types. They therefore appear very promising as therapeutic agents for future medical treatments. A major challenge of current research is to understand how chromatin regulatory proteins contribute to the plasticity of these cells and to the control of their genome expression. The Chd family of remodelers, which belongs to the SNF2 superfamily, comprises nine members, that is, one third of the remodelers expressed in mouse ES cells. The main objective of this thesis project was to identify exhaustively the target genes of each factor in order to understand how they participate in genome regulation and share the work of chromatin remodeling. We undertook a large-scale project in which each gene encoding a Chd protein was fused, at its carboxy-terminal end, to a sequence encoding a tag, by homologous recombination in ES cells. The tagged ES cells were then used for chromatin immunoprecipitation experiments (ChIP-seq). The presence of the tag made it possible to standardize and optimize the protein immunoprecipitation methods. The isolated DNA fragments were then sequenced in the laboratory of Ivo Gut (CEA/CNG, Evry, and CNAG, Barcelona). We also analyzed, by microarray hybridization and RNA-seq, the transcriptomes of ES cells depleted of each Chd protein. These data demonstrated the role of NuRD (Chd4, Hdac2) within the transcriptional regulatory networks of ES cells.
The data obtained for the factors Chd1, Chd8 and Chd4 show different but interconnected roles for each protein. Finally, these data allowed us to propose hypotheses to explain how these proteins contribute to genome regulation. [SDV] Life Sciences
45	Chromatin Determinants of the Eukaryotic DNA Replication Program Eaton, Matthew Lucas January 2011 The accurate and timely replication of eukaryotic DNA during S-phase is of critical importance for the cell and for the inheritance of genetic information. Missteps in the replication program can activate cell cycle checkpoints or, worse, trigger the genomic instability and aneuploidy associated with diseases such as cancer. Eukaryotic DNA replication initiates asynchronously from hundreds to tens of thousands of replication origins spread across the genome. The origins are acted upon independently, but patterns emerge in the form of large-scale replication timing domains. Each of these origins must be localized, and the activation time determined by a system of signals that, though they have yet to be fully understood, are not dependent on the primary DNA sequence. This regulation of DNA replication has been shown to be extremely plastic, changing to fit the needs of cells in development or affected by replication stress. We have investigated the role of chromatin in specifying the eukaryotic DNA replication program. Chromatin elements, including histone variants, histone modifications and nucleosome positioning, are attractive candidates for DNA replication control, as they are not specified fully by sequence, and they can be modified to fit the unique needs of a cell without altering the DNA template. The origin recognition complex (ORC) specifies replication origin location by binding the DNA of origins. The S. cerevisiae ORC recognizes the ARS (autonomously replicating sequence) consensus sequence (ACS), but only a subset of potential genomic sites are bound, suggesting that other chromosomal features influence ORC binding. Using high-throughput sequencing to map ORC binding and nucleosome positioning, we show that yeast origins are characterized by an asymmetric pattern of positioned nucleosomes flanking the ACS.
The origin sequences are sufficient to maintain a nucleosome-free origin; however, ORC is required for the precise positioning of nucleosomes flanking the origin. These findings identify local nucleosomes as an important determinant for origin selection and function. Next, we describe the D. melanogaster replication program in the context of the chromatin and transcription landscape for multiple cell lines using data generated by the modENCODE consortium. We find that while the cell lines exhibit similar replication programs, there are numerous cell line-specific differences that correlate with changes in the chromatin architecture. We identify chromatin features that are associated with replication timing, early origin usage, and ORC binding. Primary sequence, activating chromatin marks, and DNA-binding proteins (including chromatin remodelers) contribute in an additive manner to specify ORC-binding sites. We also generate accurate and predictive models from the chromatin data to describe origin usage and strength between cell lines. Multiple activating chromatin modifications contribute to the function and relative strength of replication origins, suggesting that the chromatin environment does not regulate origins of replication as a simple binary switch, but rather acts as a tunable rheostat to regulate replication initiation events. Taken together, our data and analyses imply that the chromatin contains sufficient information to direct the DNA replication program. / Dissertation Bioinformatics Molecular Biology Computer Science ChIP-seq Chromatin Epigenetics High-throughput Histone code Replication
46	Genetic and Genomic Analysis of Transcriptional Regulation in Human Cells Motallebipour, Mehdi January 2008 There are around 20,000 genes in the human genome, all of which could potentially be expressed. However, not all of them can be active at the same time. Thus, there is a need for coordination, which is achieved through the regulation of transcription. Transcriptional regulation is a crucial multi-component process involving genetic and epigenetic factors, which determine when and how genes are expressed. The aim of this thesis was to study two of these components, the transcription factors and the DNA sequence elements with which they interact. In papers I and II, we sought to characterize the regulatory role of repeated elements in the regulatory sequences of the nitric oxide synthase 2 gene. We found that this type of repeat is able to adopt non-B-DNA conformations in vitro and that it binds nuclear factors, in addition to RNA polymerase II. Therefore it is probable that these types of repeats can participate in the regulation of genes. In papers III-V, we set out to analyze the genome-wide binding sites for six transcription factors involved in fatty acid and cholesterol metabolism and the sites of an epigenetic mark in a human liver cell line. For this, we applied the chromatin immunoprecipitation (ChIP) method together with detection on microarrays (ChIP-chip) or with the new generation of massively parallel sequencers (ChIP-seq). We found that all of these transcription factors are involved in liver-specific processes other than metabolism, for example cell proliferation. We were also able to define two sets of transcription factors depending on the position of their binding relative to gene promoters. Finally, we demonstrated that the patterns of the epigenetic mark reflect the structure and transcriptional activity of the promoters.
In conclusion, this thesis presents experiments that move our view from genetics to genomics, from in vitro to in vivo, and from low-resolution to high-resolution analysis of transcriptional regulation. Transcription ChIP-chip ChIP-seq genome-wide transcription factors microsatellite epigenetic Medical genetics
47	A Bioinformatics Study of Human Transcriptional Regulation Ameur, Adam January 2008 Regulation of transcription is a central mechanism in all living cells that can now be investigated with high-throughput technologies. Data produced from such experiments give new insights into how transcription factors (TFs) coordinate gene transcription and thereby regulate the amounts of proteins produced. These studies are also important from a medical perspective since TF proteins are often involved in disease. To learn more about transcriptional regulation, we have developed strategies for analysis of data from microarray and massively parallel sequencing (MPS) experiments. Our computational results consist of methods to handle the steadily increasing amount of data from high-throughput technologies. Microarray data analysis tools have been assembled in the LCB-Data Warehouse (LCB-DWH) (paper I), and other analysis strategies have been developed for MPS data (paper V). We have also developed a de novo motif search algorithm called BCRANK (paper IV). The analysis has led to interesting biological findings in human liver cells (papers II-V). The investigated TFs appeared to bind at several thousand sites in the genome, which we have identified at base pair resolution. The investigated histone modifications are mainly found downstream of transcription start sites, and correlate with transcriptional activity. These histone marks are frequently found for pairs of genes in a bidirectional conformation. Our results suggest that a TF can bind in the shared promoter of two genes and regulate both of them. From a medical perspective, the genes bound by the investigated TFs are candidates to be involved in metabolic disorders. Moreover, we have developed a new strategy to detect single nucleotide polymorphisms (SNPs) that disrupt the binding of a TF (paper IV). We further demonstrated that SNPs can affect transcription in the immediate vicinity.
Ultimately, our method may prove helpful in finding disease-causing regulatory SNPs. bioinformatics microarray ChIP-chip ChIP-seq transcription factor histone modification motif search Bioinformatics
48	Computational Methods For Functional Motif Identification and Approximate Dimension Reduction in Genomic Data Georgiev, Stoyan January 2011 Uncovering the DNA regulatory logic in complex organisms has been one of the important goals of modern biology in the post-genomic era. The sequencing of multiple genomes, in combination with the advent of DNA microarrays and, more recently, of massively parallel high-throughput sequencing technologies, has made it possible to adopt a global perspective on the inference of the regulatory rules governing the context-specific interpretation of the genetic code, a perspective that complements the more focused classical experimental approaches. Extracting useful information and managing the complexity resulting from the sheer volume and the high dimensionality of the data produced by these genomic assays has emerged as a major challenge, which we attempt to address in this work by developing computational methods and tools specifically designed for the study of gene regulatory processes in this new global genomic context. First, we focus on the genome-wide discovery of physical interactions between regulatory sequence regions and their cognate proteins at both the DNA and RNA level. We present a motif analysis framework that leverages the genome-wide evidence for sequence-specific interactions between trans-acting factors and their preferred cis-acting regulatory regions. The utility of the proposed framework is demonstrated on DNA and RNA cross-linking high-throughput data. A second goal of this thesis is the development of scalable approaches to dimension reduction based on spectral decomposition and their application to the study of population structure in massive high-dimensional genetic data sets.
We have developed computational tools and have performed theoretical and empirical analyses of their statistical properties, with particular emphasis on the analysis of the individual genetic variation measured by Single Nucleotide Polymorphism (SNP) microarrays. / Dissertation Bioinformatics binding site chip-seq dimension reduction population structure randomized algorithm transcription factor
49	Genome-wide approaches to explore transcriptional regulation in eukaryotes Park, Daechan 21 August 2015 Transcriptional regulation is a complicated process controlled by numerous factors such as transcription factors (TFs), chromatin remodeling enzymes, nucleosomes, post-transcriptional machineries, and cis-acting DNA sequences. I explored the complex transcriptional regulation in eukaryotes through three distinct studies to comprehensively understand the functional genomics at various steps. Although a variety of high-throughput approaches have been developed to understand this complex system on a genome-wide scale with high resolution, a lack of accurate and comprehensive annotation of transcription start sites (TSS) and polyadenylation sites (PAS) has hindered precise analyses even in Saccharomyces cerevisiae, one of the simplest eukaryotes. We developed Simultaneous Mapping Of RNA Ends by sequencing (SMORE-seq) and identified the strongest TSS and PAS of over 90% of yeast genes with single-nucleotide resolution. Owing to the high accuracy of the TSS identified by SMORE-seq, we detected 150 possibly mis-annotated genes that have a TSS downstream of the annotated start codon. Furthermore, SMORE-seq showed that 5’-capped non-coding RNAs were highly transcribed divergently from TATA-less promoters in wild-type cells under normal conditions. Mapping of DNA-protein interactions is essential to understanding the role of TFs in transcriptional regulation. ChIP-seq is the most widely used method for this purpose. However, careful attention has not been given to the technical bias reflected in final target calling, which arises from the many experimental steps of ChIP-seq, including fixation and shearing of chromatin, immunoprecipitation, sequencing library construction, and computational analysis. While analyzing large-scale ChIP-seq data, we observed that unrelated proteins appeared to bind to the gene bodies of highly transcribed genes across datasets.
Control experiments including input, IgG ChIP in untagged cells, and ChIP of the Golgi factor Mnn10 also showed strong binding at the same loci, indicating that the signals were derived from bias that is devoid of biological meaning. In addition, the appearance of nucleosomal periodicity in ChIP-seq data for proteins localizing to gene bodies is another bias that can be mistaken for interactions with nucleosomes. We alleviated these biases by correcting data with proper negative controls, but the biases could not be completely removed. Therefore, caution is warranted in interpreting the results from ChIP-seq. Nucleosome positioning is another critical mechanism of transcriptional regulation. Global mapping of nucleosome occupancy in S. cerevisiae strains deleted for chromatin remodeling complexes has elucidated the role of these complexes on a genome-wide scale. In this study, loss of chromodomain helicase DNA binding protein 1 (Chd1) resulted in severe disorganization of nucleosome positioning. Despite the difficulties of performing ChIP-seq for chromatin remodeling complexes due to their transient and dynamic localization on chromatin, we successfully mapped the genome-wide occupancy of Chd1 and quantitatively showed that Chd1 co-localizes with early transcription elongation factors, but not late transcription elongation factors. Interestingly, Chd1 occupancy was independent of the methylation levels at H3K36, indicating the necessity of a new working model describing Chd1 localization. Transcription Genomics Next generation sequencing ChIP-seq RNA-seq MNase-seq TSS Non-coding RNA Nucleosome
50	Cell Fate Decisions in Early Embryonic Development Zhang, Xiaoxiao 08 October 2013 The basis of developmental biology lies in the question of when and how cells decide to divide or to differentiate. Previous studies have established several signaling pathways that determine cell fate decisions, including Notch, Wingless, Hedgehog, Bone morphogenetic protein, and Fibroblast growth factor. Signaling converges on transcription factors that regulate gene expression. In mouse embryonic stem cells, I explored how pluripotency and differentiation are regulated through opposing actions of beta-catenin-mediated canonical Wnt signaling, and the mechanisms underlying Sonic hedgehog signaling in generating progenitor cells in the ventral neural tube. Biochemistry ChIP-seq embryonic development embryonic stem cell gene regulatory network neural tube signaling pathway
