Global ETD Search

261	USING SYSTEMS BIOLOGY APPROACHES TO UNDERSTAND THE TRANSCRIPTIONAL REGULATION UNDERLYING PLANT DEFENSE AND GROWTH Liang Tang (14226836) 06 December 2022 <p>Complex plant traits are controlled by multiple layers of dynamic, intricate gene networks regulated at different levels. To better inform crop breeding for desired traits, a comprehensive and fundamental understanding of their genetic basis is needed. With the rapid development of <em>omics</em> platforms and next-generation sequencing technology, we now have large-scale genome, epigenome, transcriptome, metabolome, and other data for crop plants. Integrating these multiple <em>omics</em> data with computational approaches led to the establishment of a new discipline known as systems biology. The research described in this thesis used systems biology approaches to dissect complex crop traits such as the disease response of tomato (Chapters 2 and 3) and the heterosis of nitrogen use efficiency in maize (Chapter 4).</p> <p>Plant disease response is an elaborate, multilayered complex trait involving several lines of defense signaling. In the past decades, progress in molecular analyses of the plant immune system has revealed key elements of a complex response network in Arabidopsis, a model species. Histone modifications, a type of epigenetic regulation, have emerged as key modulators of defense responses, but our understanding of the role of histone-modifying enzymes in this process is still in its infancy. Here, we describe the immune function of two histone methyltransferases, SDG33 and SDG34, in tomato. We found that the <em>sdg33</em> and <em>sdg34</em> single mutants showed increased susceptibility to the hemibiotrophic bacterial pathogen <em>Pseudomonas syringae</em>, whereas the double mutant <em>sdg33sdg34</em> was comparable to wild type. 
Using RNA-seq and histone ChIP-seq approaches, we investigated the possible underlying mechanisms and found that the expression of a set of immune-related genes is misregulated by <em>P. syringae</em> in the single mutants but not in the double mutant. Integrating these results with epigenomic data, we found that the misexpression of those SDG33/SDG34-dependent immune-response genes was associated with altered histone methylation status in the single mutants. Intriguingly, the double mutant also showed altered histone methylation but unaffected gene expression, suggesting that a compensatory regulatory mechanism is at play. The function of SDG33 and SDG34 in the immune response appears to be pathogen-specific: when treated with the necrotrophic fungal pathogen <em>Botrytis cinerea</em>, the double mutant exhibited enhanced resistance, whereas the single mutants showed no altered response. Network analysis found that the top regulatory genes responding to <em>B. cinerea</em> in an SDG33/SDG34-dependent manner, such as <em>ERF4, TOPLESS, PUB23</em> and <em>RCD1</em>, have been implicated in biotic stress response. Comparing the immune response of the double mutant to <em>P. syringae</em> and <em>B. cinerea</em>, we found that disease-related genes are misregulated only under <em>B. cinerea</em> treatment and not under <em>P. syringae</em> treatment, which could explain why the double mutant shows enhanced resistance to <em>B. cinerea</em> but not to <em>P. syringae</em>. In summary, we found that the histone methyltransferases SDG33 and SDG34 have different functions in the immune response against <em>P. syringae</em> and <em>B. cinerea</em>, which might be directly or indirectly related to histone methylation levels and the expression of downstream immune-related genes.</p> <p>In addition to biotic stress, another complex trait studied in this thesis is the heterosis of nitrogen use efficiency (NUE) in maize. 
NUE is a complex trait associated with multiple physiological processes, including N sensing, uptake, assimilation, transport, and storage. Heterosis refers to the phenomenon in which the progeny generated by crossing two different cultivars of the same species exhibit fitness superior to that of the inbred parents. Even though heterosis has been exploited to improve complex traits including NUE, the underlying molecular mechanisms are not completely understood. Here, we analyzed N-responsive transcriptomes and physiological traits of a panel of six maize hybrids and their corresponding inbreds grown in the field at two different N levels. We observed diverse levels of trait heterosis that depend on the N conditions and organ types. We discovered a dramatic shift in the pattern of beyond-parental-range gene expression in hybrids in response to varying N levels. Through integrative analyses, we identified a set of genes whose expression heterosis is quantitatively correlated with trait heterosis. These genes are involved in response to stimulus, photosynthesis, and N metabolism, and likely mediate the heterosis phenotype of N-use and growth traits in maize. In summary, our integrated analysis provided insights into the mechanistic basis of the heterosis of NUE. </p> <p>Together, applying systems and functional genomics approaches to investigate important agricultural traits could lead to a comprehensive understanding of complex plant traits to inform future engineering and breeding for better crops.</p> Plant pathology plant heterosis systems biology Next Generation Sequencing Transcriptional Regulation
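As a rough illustration of how heterosis and beyond-parental-range values might be quantified for a trait or an expression level, the following Python sketch computes mid-parent heterosis and classifies a hybrid value relative to its two inbred parents. The function names and the choice of mid-parent heterosis are assumptions made for illustration; the abstract does not state which measure the thesis uses.

```python
def mid_parent_heterosis(hybrid, parent_a, parent_b):
    """Relative deviation of the hybrid value from the mean of its two inbred parents."""
    mid_parent = (parent_a + parent_b) / 2.0
    if mid_parent == 0:
        raise ValueError("mid-parent value is zero; heterosis is undefined")
    return (hybrid - mid_parent) / mid_parent


def beyond_parental_range(hybrid, parent_a, parent_b):
    """Classify a hybrid value as 'above', 'below', or 'within' the parental range."""
    low, high = sorted((parent_a, parent_b))
    if hybrid > high:
        return "above"
    if hybrid < low:
        return "below"
    return "within"
```

Applied per gene and per N level, the second function would flag the genes whose hybrid expression falls outside the range spanned by the two parents, which is one plausible reading of "beyond-parental-range" expression.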
262	Molecular Evolution of Odonata Opsins, Odonata Phylogenomics and Detection of False Positive Sequence Homology Using Machine Learning Suvorov, Anton 01 March 2018 My dissertation comprises three related topics in evolutionary and computational biology, which correspond to its three chapters. Chapter 1 focuses on the tempo and mode of evolution in visual genes, namely opsins, via duplication events and subsequent molecular adaptation in Odonata (dragonflies and damselflies). Gene duplication plays a central role in adaptation to novel environments by providing new genetic material for functional divergence and the evolution of biological complexity. Odonata have the largest opsin repertoire of any insect currently known. In particular, our results suggest that both the blue-sensitive (BS) and long-wave-sensitive (LWS) opsin classes were subjected to strong positive selection that greatly weakens after multiple duplication events, a pattern that is consistent with the permanent heterozygote model. Because of the immense interspecific variation and duplicability of opsin genes among odonates, they represent a unique model system for testing hypotheses about opsin gene duplication and diversification at the molecular level. Chapter 2 focuses primarily on reconstructing the phylogenetic backbone of Odonata using RNA-seq data. To reconstruct the evolutionary history of Odonata, we performed comprehensive phylotranscriptomic analyses of 83 species covering 75% of all extant odonate families. Using maximum likelihood, Bayesian, coalescent-based and alignment-free tree inference frameworks, we were able to test, refine and resolve previously controversial relationships within the order. In particular, we confirmed the monophyly of Zygoptera, recovered Gomphidae and Petaluridae as sister groups with high confidence and identified Calopterygoidea as monophyletic. 
Fossil calibration coupled with diversification analyses provided insight into key events that influenced the evolution of Odonata. Specifically, we determined that there was a possible mass extinction of ancient odonate diversity during the P-Tr crisis and that a single odonate lineage persisted following this extinction event. Lastly, Chapter 3 focuses on identifying erroneously assigned sequence homology using machine learning techniques. Accurate detection of homologous relationships among biological sequences (DNA or amino acid) across organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. We developed biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further used in the context of guided experimentation to verify false positive outcomes. molecular evolution vision insects Bayesian modeling phylogenetic inference big data next-generation sequencing artificial intelligence homology Biology Life Sciences
263	Implication du récepteur nucléaire orphelin Nur77 (Nr4a1) dans les effets des antipsychotiques par une approche de transcriptomique chez des rats déficients en Nur77 Majeur, Simon 11 1900 Although antipsychotic drugs have been in use for several decades, their precise mechanism of action, beyond their interaction with dopaminergic and serotonergic receptors, remains poorly understood. Nur77 (Nr4a1 or NGFI-B) is a transcription factor of the nuclear receptor family associated with the effects of antipsychotics. That said, the mechanism of action of Nur77 is also poorly understood. To better understand the elements involved in antipsychotic action and Nur77 activity, we compared transcript levels in the striatum following haloperidol treatment in wild-type and Nur77-deficient rats using high-throughput sequencing (RNAseq) and bioinformatic analysis. Haloperidol and Nur77 modulated large groups of genes associated with dopamine receptor signaling and the glutamatergic synapse. The analysis revealed modulation of key genes related to G protein signaling. Among the transcripts significantly modulated in haloperidol-treated rats and in Nur77-deficient rats, dual specificity phosphatase 5 (Dusp5) represents a new candidate of interest. Indeed, we confirmed that Dusp5 mRNA and protein levels in the striatum are associated with abnormal involuntary movements (dyskinesia) in a model of non-human primates chronically treated with haloperidol. This transcriptomic analysis demonstrated rapid and substantial haloperidol-induced alterations of elements involved in G protein signaling and identified, for the first time, Nur77-dependent expression of Dusp5 as a new component linked to tardive dyskinesia. 
/ Despite antipsychotic drugs being used for several decades, their precise mechanism of action remains elusive. Nur77 (Nr4a1 or NGFI-B) is a transcription factor of the nuclear receptor family associated with antipsychotic drug effects. However, the mechanism of action of Nur77 is also not well understood. To better understand the signaling components implicated with antipsychotic drug use and Nur77 activity, we compared striatal gene transcripts following haloperidol in wild-type and Nur77-deficient rats using Next Generation RNA Sequencing (RNAseq) and a bioinformatics analysis. Haloperidol and Nur77 modulated important subsets of striatal genes associated with dopamine receptor signaling and glutamate synapses. The analysis revealed modulations of key components of G protein signaling that are consistent with a rapid adaptation of striatal cells that may partially explain long-term haloperidol-induced dopamine D2 receptor upregulation. Amongst significantly modulated transcripts in rats treated with haloperidol and rats deficient in Nur77, dual specificity phosphatase 5 (Dusp5) represents a new and very interesting candidate. Indeed, we confirmed that striatal Dusp5 mRNA and protein levels were associated with abnormal involuntary movements (dyskinesia) in non-human primates chronically exposed to haloperidol. This transcriptomic analysis showed important haloperidol-induced G protein-coupled receptor signaling alterations that may support a regulatory role of Nur77 in dopamine D2 receptor signaling pathways and identified, for the first time, a putative Nur77-dependent expression of Dusp5 as a new signaling component for antipsychotic drug-induced tardive dyskinesia. Séquençage à haut débit (RNAseq) Halopéridol Nur77 Dyskinésie Dusp5 Striatum Next-generation sequencing (RNAseq) Dyskinesia
264	Towards a Human Genomic Coevolution Network Savel, Daniel M. 04 June 2018 No description available. Computer Science Bioinformatics Next-generation sequencing suffix trees sequence assembly coevolution co-conservation multiple hypothesis testing eQTL phylogenetic profiles
265	Pipeline for Next Generation Sequencing data of phage displayed libraries to support affinity ligand discovery Schleimann-Jensen, Ella January 2022 Affinity ligands are important molecules used in affinity chromatography to purify significant substances from complex mixtures. Finding affinity ligands specific to important target molecules can be a challenging process. Cytiva uses the powerful phage display technique to find promising new affinity ligands. Phage display is run in several enrichment cycles. When developing new affinity ligands, a protein scaffold library with a diversity of up to 10^10 to 10^11 different protein scaffold variants is run through the enrichment cycles. The output of the phage display rounds is screened for target molecule binding and then sequenced, usually with one of the conventional screening methods, ELISA or Biacore, followed by Sanger sequencing. However, the throughput of these analyses is very low, often with only a few hundred clones screened. Next Generation Sequencing (NGS), which generates millions of sequences from each phage display round, has therefore become an increasingly popular screening method for phage display libraries. This creates a need for a robust data analysis pipeline that can interpret the large amounts of data. In this project, a pipeline for the analysis of NGS data from phage displayed libraries has been developed at Cytiva. Cytiva uses NGS as one of its screening methods for phage displayed protein libraries because of its high throughput compared to the conventional screening methods. The purpose is to find new affinity ligands for the purification of essential substances used in drugs. The pipeline has been created using the object-oriented programming language R and consists of several analyses covering the most important steps needed to find promising results in the NGS data. 
With the developed pipeline, the user can analyze the data at both the DNA and protein sequence levels, obtain a per-position residue breakdown, and filter the data based on specific amino acids and positions. This gives a robust and thorough analysis that can lead to promising results for use in the development of novel affinity ligands for future purification products. NGS next generation sequencing phage display affinity ligands data analysis bioinformatics Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi) Computer Sciences Datavetenskap (datalogi)
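The per-position residue breakdown and the position-based filtering mentioned in the abstract above can be sketched as follows. This is a minimal illustration in Python, not the thesis pipeline, which was written in R; the function names `residue_breakdown` and `filter_by_residue` are hypothetical, and the sketch assumes that all protein sequences have the same length.

```python
from collections import Counter


def residue_breakdown(sequences):
    """Return one dict per position mapping each residue to its fraction of sequences."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    total = len(sequences)
    breakdown = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        breakdown.append({res: n / total for res, n in counts.items()})
    return breakdown


def filter_by_residue(sequences, position, residue):
    """Keep only the sequences that carry `residue` at the 0-based `position`."""
    return [s for s in sequences if s[position] == residue]
```

Comparing the breakdown across enrichment rounds would show which residues become over-represented at each position, while the filter narrows the data to variants carrying a residue of interest.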
266	Droplet-Based Microfluidics for High-Throughput Single-Cell Omics Profiling Zhang, Qiang 06 September 2022 Droplet-based microfluidics is a powerful tool permitting massive-scale single-cell analysis in pico- to nanoliter water-in-oil droplets. It has been integrated into various library preparation techniques to accomplish high-throughput scRNA-seq, scDNA-seq, scATAC-seq, scChIP-seq, as well as scMulti-omics-seq. These advanced technologies have provided unique and novel insights into both normal differentiation and disease development at the single-cell level. In this thesis, we develop four new droplet-based tools for single-cell omics profiling. First, Drop-BS is the first droplet-based platform for constructing single-cell bisulfite sequencing (BS) libraries for DNA methylome profiling, and it allows production of BS libraries from 2,000-10,000 single cells within 2 days. We applied the technology to separately profile mixed cell lines, mouse brain tissues, and human brain tissues to reveal cell type heterogeneity. Second, the new Drop-ChIP platform requires only two steps of droplet generation to carry out multiple reaction steps in droplets, such as single-cell lysis, chromatin fragmentation, ChIP, and barcoding. Third, we aim to establish a droplet-based platform for high-throughput full-length RNA-seq (Drop-full-seq), which neither current tube-based nor droplet-based methods can achieve. Last, we constructed an in-house droplet-based tool to assist single-cell ATAC-seq library preparation (Drop-ATAC), which provides a low-cost and facile protocol for conducting scATAC-seq in laboratories without the expensive instrument. / Doctor of Philosophy / Microfluidics is a collection of techniques for manipulating fluids at the micrometer scale. One such technique is called "droplet-based microfluidics". It can manipulate (e.g., generate, merge, sort, and split) pico- to nanoliter water-in-oil droplets. 
First, since the water phase is separated by the continuous oil phase, these droplets are discrete, individual reactors. Second, droplet-based microfluidics can achieve highly parallel manipulation of thousands to millions of droplets. These two advantages make droplet-based microfluidics an ideal tool for performing single-cell assays. Over the past 10 years, various droplet-based platforms have been developed to study the single-cell transcriptome, genome, epigenome, and multi-ome. To expand the set of droplet-based tools for single-cell analysis, we aim to develop four novel platforms in this thesis. First, Drop-BS, by integrating droplet generation and droplet fusion techniques, can achieve high-throughput single-cell bisulfite sequencing library preparation. It can generate 10,000 single-cell BS libraries within 2 days, which is difficult to achieve with conventional library preparation in tubes or microwells. Second, we developed a novel and facile Drop-ChIP platform to prepare single-cell ChIP-seq libraries. It is easy to operate since it requires only two steps of droplet generation. It also generates higher-quality data than previous work. In addition, we are working on the development and characterization of two other droplet-based tools for full-length single-cell RNA-seq and single-cell ATAC-seq. droplet-based microfluidics single-cell analysis ChIP-seq BS-seq RNA-seq ATAC-seq next generation sequencing library preparation
267	Low-Input Multi-Omic Studies of Brain Neuroscience Involved in Mental Diseases Zhu, Bohan 13 September 2022 Psychiatric disorders are believed to result from a combination of genetic predisposition and many environmental triggers. While a large number of disease-associated genetic variations have been recognized by previous genome-wide association studies (GWAS), the role of the epigenetic mechanisms that mediate the effects of environmental factors on CNS gene activity in the etiology of most mental illnesses is still largely unclear. A growing body of evidence suggests that the abnormalities (changes in gene expression, formation of neural circuits, and behavior) involved in most psychiatric syndromes are preserved by epigenetic modifications identified in several specific brain regions. In this thesis, we developed the second generation of one of our microfluidic technologies (MOWChIP-seq) and used it to profile genome-wide histone modifications in three mental illness-related biological studies: the effect of psychedelics in mice, schizophrenia, and the effect of maternal immune activation in mouse offspring. The second generation of MOWChIP-seq was designed to generate histone modification profiles from as few as 100 cells per assay with a throughput as high as eight assays in each run. We then applied the new MOWChIP-seq and SMART-seq2 to profile the histone modification H3K27ac and the transcriptome, respectively, using NeuN+ neuronal nuclei from the mouse frontal cortex after a single dose of a psychedelic. The epigenomic and transcriptomic changes induced by 2,5-Dimethoxy-4-iodoamphetamine (DOI), a type of psychedelic, in mouse neuronal nuclei at various time points suggest that the long-lasting effects of the psychedelic are more closely related to epigenomic alterations than to changes in transcriptomic patterns. 
Next, we comprehensively characterized epigenomic and transcriptomic features from the frontal cortex of 29 individuals with schizophrenia and 29 individually matched controls (gender and age). We found that schizophrenia subjects exhibited thousands of neuronal vs. glial epigenetic differences at regions that included several susceptibility genetic loci, such as NRXN1, RGS4 and GRIN3A. Finally, we investigated the epigenetic and transcriptomic alterations induced by maternal immune activation (MIA) in the frontal cortex of mouse offspring. Pregnant mice were injected with influenza virus at GD 9.5, and the frontal cortex of the pups (10 weeks old) was examined later. The results offered some insights into the contribution of MIA to the etiology of some mental disorders, such as schizophrenia and autism. / Doctor of Philosophy / While this field is still in its early stages, epigenetic studies of mental disorders promise to expand our understanding of how environmental stimuli, interacting with genetic factors, contribute to the etiology of various psychiatric syndromes, such as major depression and schizophrenia. Previous clinical trials suggested that psychedelics may represent a promising long-lasting treatment for patients with depression and other psychiatric conditions. This research showed the therapeutic potential of psychedelic compounds for treating major depression and demonstrated the capability of psychedelics to increase dendritic density and stimulate synapse formation. However, the molecular mechanisms mediating the clinical effectiveness of psychedelics remain largely unexplored. Our study revealed that epigenomic-driven changes in synaptic plasticity sustain psychedelics' long-lasting antidepressant action. Another serious mental illness is schizophrenia, which can affect how an individual feels, thinks, and behaves. 
Like most other mental disorders, schizophrenia results from a combination of genetic and environmental causes. Epigenetic marks allow a dynamic impact of environmental factors, including antipsychotic medications, on the access to genes and regulatory elements. Despite this, no study so far has profiled cell-type-specific genome-wide histone modifications in postmortem brain samples from schizophrenia subjects or the effect of antipsychotic treatment on such epigenetic marks. Here we show the first comprehensive epigenomic characterization of the frontal cortex of 29 individuals with schizophrenia and 29 matched controls. The process of brain development is surprisingly sensitive to many environmental insults. Epidemiological studies have recognized maternal immune activation as a risk factor that may change the normal developmental trajectory of the fetal brain and increase the odds of developing a range of psychiatric disorders, including schizophrenia and autism, in its lifetime. Given the prevalence of the coronavirus, uncovering the molecular mechanisms underlying the phenotypic alterations has become more urgent than before, for both prevention and treatment. Microfluidics Epigenome Transcriptome Chromatin Immunoprecipitation Next generation sequencing RNA sequencing Psychiatric disorders Psychedelics Schizophrenia Antipsychotic treatment Maternal Immune Activation
268	Improved Error Correction of NGS Data Alic, Andrei Stefan 15 July 2016 Thesis by compendium / [EN] The work done for this doctoral thesis focuses on error correction of Next Generation Sequencing (NGS) data in the context of High Performance Computing (HPC). Due to the reduction in sequencing cost, the increasing output of the sequencers and the advancements in the biological and medical sciences, the amount of NGS data has increased tremendously. Humans alone are not able to keep pace with this explosion of information, so computers must help them handle the deluge of information generated by the sequencing machines. Since NGS is no longer just a research topic (it is used in clinical routine to detect cancer mutations, for instance), requirements in performance and accuracy are more stringent. For sequencing to be useful outside research, the analysis software must work accurately and fast. This is where HPC comes into play. NGS processing tools should leverage the full potential of multi-core and even distributed computing, as those platforms are widely available. Moreover, as the performance of the individual core has hit a barrier, current computing trends focus on adding more cores and explicitly splitting the computation to take advantage of them. This thesis starts with a deep analysis of all these problems in a general and comprehensive way (to reach a very wide audience), in the form of an exhaustive and objective review of the NGS error correction field. We dedicate a chapter to this topic to introduce the reader gradually and gently to the world of sequencing. It presents real problems and applications of NGS that demonstrate the impact this technology has on science. 
The review leads to the following conclusions: the need to understand the specifics of NGS data samples (given the wide variety of technologies and features) and the need for flexible, efficient and accurate error correction tools as a preliminary step of any NGS postprocessing. In response to the explosion of NGS data, we introduce MuffinInfo. It is a piece of software capable of extracting information from the raw data produced by the sequencer to help the user understand the data. MuffinInfo uses HTML5, so it runs in almost any software and hardware environment. It supports custom statistics to adapt to specific requirements. MuffinInfo can reload the results of a run, which are stored in JSON format for easier integration with third-party applications. Finally, our application uses threads to perform the calculations, to load the data from the disk and to handle the UI. Continuing our research, and as a result of the single-core performance limitation, we leverage the power of multi-core computers to develop a new error correction tool. Error correction of NGS data is normally the first step of any analysis targeting NGS. As we conclude from the review performed within this thesis, many projects in different real-life applications have opted for this step before further analysis. In this sense, we propose MuffinEC, a multi-technology (Illumina, Roche 454, Ion Torrent and, experimentally, PacBio) corrector that handles any type of error (mismatches, deletions, insertions and unknown values). It surpasses other similar software by providing higher accuracy (demonstrated by three types of tests) and using fewer computational resources. It follows a multi-step approach that starts by grouping all the reads using a k-mer-based metric. Next, it employs the powerful Smith-Waterman algorithm to refine the groups and generate Multiple Sequence Alignments (MSAs). 
These MSAs are corrected by taking each column and looking for the correct base, determined by a user-adjustable percentage. This manuscript is structured in chapters based on material that has been previously published in prestigious journals indexed by the Journal Citation Reports (in prominent positions) and at relevant conferences. / [ES] The work carried out for this doctoral thesis focuses on correcting errors in data from NGS techniques using high-performance computing techniques. Owing to falling costs and the increased performance of sequencers, the amount of available NGS data has grown considerably. Computers have become indispensable in analyzing these samples in order to cope with the flood of information these techniques generate. The use of NGS goes beyond research, with numerous examples of clinical and agronomic use, which creates new requirements for processing time and reliability of results. To maximize their clinical applicability, NGS data processing techniques must become faster and produce more accurate data. This is the context in which high-performance computing techniques play a relevant role. Today it is common to have computers with several processing cores and even to use multiple computers through distributed parallel computing techniques. Current trends toward architectures with more cores show that this is a relevant approach. This thesis begins with an analysis of the fundamental problems of NGS data processing, presented in a general way suited to a wide audience, through an exhaustive review of the state of the art in NGS data correction. 
This review gradually introduces the reader to massive sequencing techniques, presenting real problems and applications of NGS techniques and highlighting the impact of this technology on science. Two main ideas emerge from this study: the need to properly analyze the characteristics of NGS data, given the enormous intrinsic variety of the different NGS techniques; and the need for a versatile, efficient, and accurate error correction tool. In the context of data analysis, the thesis presents MuffinInfo. MuffinInfo is a software application implemented in HTML5. It extracts relevant information from raw NGS data to aid understanding of its characteristics and the application of error correction techniques, and it can be extended with functions that implement user-defined statistics. MuffinInfo stores the results of processing in JSON files. Because it uses HTML5, MuffinInfo can run in almost any hardware and software environment. The tool is implemented using multiple threads of execution for managing the interface. The second conclusion of the state-of-the-art analysis points to the opportunity to apply high-performance computing techniques extensively to error correction in order to develop a tool that supports multiple technologies (Illumina, Roche 454, Ion Torrent, and experimentally PacBio). The proposed tool (MuffinEC) supports different types of errors (substitutions, indels, and unknown values). MuffinEC outperforms the results obtained by existing tools in this field. It offers a better correction rate, in much less time and with fewer resources, which also makes it easier to apply to larger samples on conventional computers. 
MuffinEC uses a multi-stage approach. First, it groups all the sequences using a k-mer metric. Second, it refines the groups by Smith-Waterman alignment, generating contigs. These contigs result from column-wise correction based on the individual frequency of each base. The thesis is organized in chapters whose basis has previously been published in journals indexed in positions dest / [CA] The work carried out for this doctoral thesis focuses on correcting errors in data from NGS techniques using high-performance computing techniques. Owing to falling costs and the increased performance of sequencers, the amount of available NGS data has grown considerably. Computers have become indispensable in analyzing these samples in order to cope with the flood of information these techniques generate. The use of NGS goes beyond research, with numerous examples of clinical and agronomic use, which creates new requirements for processing time and reliability of results. To maximize their clinical applicability, NGS data processing techniques must become faster and produce more accurate data. This is the context in which high-performance computing techniques play a relevant role. Today it is common to have computers with several processing cores and even to use multiple computers through distributed parallel computing techniques. Current trends toward architectures with more cores show that this is a relevant approach. This thesis begins with an analysis of the fundamental problems of NGS data processing, presented in a general way suited to a wide audience, through an exhaustive review of the state of the art in NGS data correction. 
This review gradually introduces the reader to massive sequencing techniques, presenting real problems and applications of NGS techniques and highlighting the impact of this technology on science. Two main ideas follow from this study: the need to properly analyze the characteristics of NGS data, given the enormous intrinsic variety among the different NGS techniques; and the need for a versatile, efficient, and accurate tool for error correction. In the context of data analysis, the thesis presents MuffinInfo. MuffinInfo is a software application implemented in HTML5. MuffinInfo extracts relevant information from raw NGS data to aid understanding of its characteristics and the application of error correction techniques, and it also supports extension through functions that implement user-defined statistics. MuffinInfo stores the results of the process in JSON files. Because it uses HTML5, MuffinInfo can run in almost any hardware and software environment. The tool is implemented using multiple threads of execution for managing the interface. The second conclusion of the state-of-the-art analysis points to the opportunity to apply high-performance computing techniques extensively to error correction, in order to develop a tool that supports multiple technologies (Illumina, Roche 454, Ion Torrent and, experimentally, PacBio). The proposed tool (MuffinEC) handles different types of errors (substitutions, indels, and unknown values). MuffinEC outperforms the results obtained by existing tools in this field. It offers a better correction rate, in much less time and using fewer resources, which also makes it easier to apply to larger samples on conventional computers. MuffinEC uses a multi-stage approach. 
First, it groups all the sequences using the k-mer metric. Second, it refines the groups by alignment with Smith-Waterman, generating contigs. These contigs result from column-wise correction according to the individual frequency of each base. The thesis is structured in chapters whose basis has previously been published in indexed journals in prominent positions of the index of the Journal of Citation Repor / Alic, AS. (2016). Improved Error Correction of NGS Data [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/67630 / Compendium Error Correction NGS TGS Next Generation Sequencing Statistics HTML5 C++ OpenMP Parallel Review FastQ FastA
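The last stage described in the abstract above (column-wise correction according to the frequency of each base) can be illustrated with a minimal Python sketch. This is not MuffinEC's code: it assumes the reads of one group have already been aligned and padded to equal length, and the function names, the `-` gap symbol, and the treatment of `N` as an unknown value are all assumptions made for illustration.

```python
from collections import Counter


def column_consensus(aligned_reads, gap="-"):
    """Return the most frequent base in each column of equal-length aligned reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # Gaps and unknown values (N) do not vote.
        counts = Counter(base for base in column if base not in (gap, "N"))
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)


def correct_reads(aligned_reads, gap="-"):
    """Replace every non-gap position of each read with the column consensus."""
    consensus = column_consensus(aligned_reads, gap)
    return [
        "".join(gap if base == gap else cons for base, cons in zip(read, consensus))
        for read in aligned_reads
    ]


reads = ["ACGT", "ACGT", "AGGT", "ACNT"]
corrected = correct_reads(reads)
```

In this hypothetical group, the substitution in the third read and the unknown value in the fourth are both overwritten by the majority base of their column. A real corrector would also need a rule for ties and for low-coverage columns, which this sketch leaves out.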
269	Microfluidic Technology for Low-Input Epigenomic Analysis Zhu, Yan 25 May 2018 (has links) Epigenetic modifications, such as DNA methylation and histone modifications, play important roles in gene expression and regulation, and are highly involved in cellular processes such as stem cell pluripotency/differentiation and tumorigenesis. Chromatin immunoprecipitation (ChIP) is the technique of choice for examining in vivo DNA-protein interactions and has been a great tool for studying epigenetic mechanisms. However, conventional ChIP assays require millions of cells per test and are not practical for examining samples from lab animals and patients. Automated microfluidic chips can handle small sample sizes and facilitate rapid reactions. They also eliminate cumbersome manual handling. In this report, I describe three projects that used microfluidic immunoprecipitation followed by next generation sequencing to enable low-input, high-throughput epigenomic profiling. First, I examined RNA polymerase II transcriptional regulation with microfluidic chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) assays. Second, I probed the temporal dynamics in the DNA methylome during cancer development using a transgenic mouse model with microfluidic methylated DNA immunoprecipitation followed by next generation sequencing (MeDIP-seq) assays. Third, I explored negative enrichment of circulating tumor cells (CTCs) followed by microfluidic ChIP-seq for studying the temporal dynamics of a histone modification (H3K4me3) in patient-derived tumor xenografts in an immunodeficient mouse model during the course of cancer metastasis. In the first study, I adapted microfluidic ChIP-seq devices to achieve ultrahigh sensitivity for studying Pol2 transcriptional regulation in scarce cell samples. I dramatically increased the assay sensitivity to an unprecedented level (~50 K cells for Pol2 ChIP-seq). 
Importantly, this is three orders of magnitude more sensitive than the prevailing Pol2 ChIP-seq assays. I showed that MNase digestion provided a better ChIP-seq signal than sonication, and that two-step fixation with MNase digestion provided the best ChIP-seq quality, followed by one-step fixation with MNase digestion and, lastly, no fixation with MNase digestion. The second study addresses the fact that probing dynamic epigenomic changes during tumorigenesis in mice often requires profiling epigenomes from a tiny quantity of tissue. Conventional epigenomic tests do not support such analysis because of the large amount of material they require. In this study, I developed an ultrasensitive microfluidics-based methylated DNA immunoprecipitation followed by next-generation sequencing (MeDIP-seq) technology for profiling methylomes using as little as 0.5 ng DNA (or ~100 cells), with a 1.5 h on-chip immunoprecipitation process. This technology enabled me to examine genome-wide DNA methylation in a C3(1)/SV40 T-antigen transgenic mouse model during different stages of mammary cancer development. Using these data, I identified differentially methylated regions and their associated genes in different periods of cancer development. Interestingly, the results showed that methylomic features are dynamic and change with tumor developmental stage. In the last study, I developed negative enrichment of CTCs followed by ultrasensitive microfluidic ChIP-seq for profiling a histone modification (H3K4Me3) in CTCs, to resolve the technical challenges of CTC isolation and the lack of tools for genome-wide profiling of histone modifications in tiny cell samples. / Ph. D. / The sequencing of the human genome was completed over a decade ago. The information provided by the genomic map inspired numerous studies on genetic variations and their roles in diseases. 
However, genomic information alone is not always sufficient to explain important biological processes. Gene activation and expression are not only associated with alterations in the DNA sequence, but are also affected by other changes to DNA and histones. Epigenetics refers to the molecular mechanisms that affect gene expression and phenotypes without involving changes in the DNA sequence. For example, DNA can be methylated, the histone proteins around which DNA is wrapped can be methylated or acetylated, and transcription factors can bind to different parts of the DNA. All of these can affect gene expression without altering the DNA sequence. Epigenetic changes occur throughout all stages of cell development or in response to environmental cues. They change transcription patterns in a tissue/cell-specific fashion. For example, transcriptional silencing of tumor-suppressor genes by DNA methylation plays an important role in cancer development. Therefore, understanding epigenetic regulation will help to improve various aspects of biomedicine. For instance, personalized medicine can be tailored to the epigenetic profile of a given patient to specifically control gene expression during treatment. However, the technology for profiling epigenetic modifications, i.e. Chromatin Immunoprecipitation (ChIP), suffers from serious limitations. The key limitation is the sensitivity of the assay. The conventional assay requires a large number of cells (>10⁶ cells per ChIP). This is feasible when using cell lines. However, this requirement becomes a major challenge when primary cells are used, because only very limited amounts of sample can be obtained from lab animals or patients. Population heterogeneity information may also be lost when a large cell number is used. 
In this project, we developed an automated ultrasensitive microfluidic chromatin/DNA immunoprecipitation followed by next-generation sequencing (ChIP/MeDIP-Seq) technology for profiling epigenetic modifications (e.g., histone modifications, transcriptional regulation, and DNA methylation). We extensively optimized the design parameters for every step of ChIP/MeDIP (e.g. sonication/crosslinking time, antibody concentration, washing conditions) to reach a sensitivity of 0.1 ng DNA (or ~50-100 cells) as starting material for IP, which makes the assay roughly 4-5 orders of magnitude more sensitive than the prevailing protocol and 2-3 orders of magnitude more sensitive than the state of the art (~50 ng). With such sensitivity, we were able to study temporal dynamics in the DNA methylomes during the various stages of mammary cancer development in a transgenic mouse model. We were able to investigate transcriptional regulation of RNA polymerase II from scarce cell samples. We were also able to study a histone modification (H3K4Me3) of circulating tumor cells during cancer metastasis. Chromatin immunoprecipitation (ChIP) Next generation sequencing (NGS) Epigenetics Transcriptional regulations DNA methylation Histone modifications Microfluidics Circulating tumor cell (CTC)
270	Methods for Differential Analysis of Gene Expression and Metabolic Pathway Activity Temate Tiagueu, Yvette Charly B, Temate Tiagueu, Yvette C. B. 09 May 2016 (has links) RNA-Seq is an increasingly popular approach to transcriptome profiling that uses the capabilities of next generation sequencing technologies and provides better measurement of the levels of transcripts and their isoforms. In this thesis, we apply an RNA-Seq protocol and transcriptome quantification to estimate gene expression and pathway activity levels. We present a novel method, called IsoDE, for differential gene expression analysis based on bootstrapping. For the first version of IsoDE, we compared the tool against four existing methods (Fisher's exact test, GFOLD, edgeR, and Cuffdiff) on RNA-Seq datasets generated using three different sequencing technologies, both with and without replicates. We also introduce the second version of IsoDE, which runs 10 times faster than the first implementation thanks to in-memory processing applied to the underlying tool for estimating gene expression frequencies, together with further optimization of the analysis. The second part of this thesis presents a set of tools for differential analysis of metabolic pathways from RNA-Seq data. Metabolic pathways are series of chemical reactions occurring within a cell. We focus on two main problems in differential analysis of metabolic pathways, namely, differential analysis of their inferred activity level and of their estimated abundance. We validate our approaches through differential expression analysis at the transcript and gene levels and also through real-time quantitative PCR experiments. In Part Four, we present the different packages created or updated in the course of this study. We conclude with our future work plans for further improving IsoDE 2.0. 
Bootstrapping algorithm Next generation sequencing Gene expression RNA-Seq data Expectation maximization Graph analysis Metabolic pathway activity level Metabolic pathways Metabolic pathway abundance KEGG Differential gene expression analysis
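As a rough illustration of how bootstrapping can support a differential expression call, the sketch below resamples expression values for one gene in two conditions and reports the fraction of bootstrap pairs whose fold change meets a threshold. It is not IsoDE's procedure: resampling a short list of per-sample expression values (rather than reads), the function names, and the `min_fold`, `n_boot`, and `seed` parameters are all assumptions chosen for illustration.

```python
import random


def bootstrap_means(values, n_boot, rng):
    """Draw n_boot resamples (with replacement) of values and return their means."""
    n = len(values)
    return [sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot)]


def fold_change_support(cond_a, cond_b, min_fold=2.0, n_boot=200, seed=0):
    """Fraction of bootstrap pairs in which condition A is at least min_fold times B."""
    rng = random.Random(seed)
    boots_a = bootstrap_means(cond_a, n_boot, rng)
    boots_b = bootstrap_means(cond_b, n_boot, rng)
    eps = 1e-9  # avoids division by zero for unexpressed genes
    hits = sum(
        1 for a, b in zip(boots_a, boots_b) if (a + eps) / (b + eps) >= min_fold
    )
    return hits / n_boot


# Hypothetical expression values for one gene in two conditions.
support_up = fold_change_support([100, 110, 90, 105], [10, 12, 9, 11])
support_down = fold_change_support([10, 12, 9, 11], [100, 110, 90, 105])
```

A gene could then be called differentially expressed when the support exceeds a chosen cutoff; both the cutoff and the fold-change threshold would need to be tuned for real data.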
