Global ETD Search

141	Statistical Methods for Functional Metagenomic Analysis Based on Next-Generation Sequencing Data Pookhao, Naruekamol January 2014 Metagenomics is the study of the collective microbial genetic content recovered directly from natural (e.g., soil, ocean, and freshwater) or host-associated (e.g., human gut, skin, and oral) environmental communities that contain microorganisms, i.e., microbiomes. Rapid developments in next-generation sequencing (NGS) technologies, which can sequence tens or hundreds of millions of short DNA fragments (reads) in a single run, facilitate the study of the many microorganisms living in environmental communities. Metagenomics, a relatively new but fast-growing field, allows us to understand the diversity of microbes, their functions, cooperation, and evolution in a particular ecosystem. It also helps us identify significantly different metabolic potentials in different environments. In particular, metagenomic analysis based on functional features (e.g., pathways, subsystems, functional roles) makes it possible to relate the genomic content of microbes to human health, and helps us understand how microbes affect human health, by analyzing metagenomic data from two or more populations with different clinical phenotypes (e.g., diseased and healthy, or different treatments). Metagenomic analysis currently has a substantial impact not only on genetic and environmental research but also on clinical applications. In our study, we focus on the development of computational and statistical methods for functional metagenomic analysis of sequencing data obtained from various environmental microbial samples/communities. Feature comparative Functional metagenomics Negative binomial model Next generation sequencing Elastic-net
142	Systematic Analysis of Suppressor Mutations in S. cerevisiae Strains with Deleted Genome Integrity Genes Yamaguchi, Takafumi 11 December 2013 The effects of a mutation in one gene can occasionally be suppressed by a mutation in another gene. Genetic suppression indicates functional relationships and provides clues about the mechanism and order of action in genetic pathways. Here I explored the existing yeast deletion collection to identify suppressor relationships. The collection was released in 2000, and some strains in it are known to have acquired mutations. Whole genome sequencing of 48 yeast deletion strains corresponding to 26 genome integrity genes was performed. High-throughput sequencing revealed a broad mutational spectrum including point mutations, indels, and copy number variations. I identified and experimentally validated two new suppressor mutations (sgs1 mutations in both top3Δ and rmi1Δ strains) corresponding to gene pairs with previously known suppressor relationships. Thus, high-throughput sequencing and analysis of yeast deletion strains can identify suppressor mutations. The resulting genome sequences also provide a baseline for future laboratory evolution experiments. suppressor mutation next-generation sequencing deletion strain DNA repair mutation spectrum 0307 0715 0369
143	Identification and Characterization of Pathogenic Mutations in Neurodevelopmental Disorders Discovered by Next-Generation Sequencing Ruzzo, Elizabeth Kathryn January 2014 Neurodevelopmental disorders develop over time and are characterized by a wide variety of mental, behavioral, and physical phenotypes. The category of neurodevelopmental disorders encompasses a broad range of conditions including intellectual disability, autism spectrum disorder, attention deficit hyperactivity disorder, cerebral palsy, schizophrenia, bipolar disorder, and epilepsy, among others. Diagnostic classification of neurodevelopmental disorders is complicated by comorbidities among these disorders, unidentified causal genes, and growing evidence of shared genetic risk factors. We sought to identify the genetic underpinnings of a variety of neurodevelopmental disorders, with a particular emphasis on the epilepsies, by employing next-generation sequencing to thoroughly interrogate genetic variation in the human genome/exome. First, we investigated four families presenting with a seemingly identical and previously undescribed neurodevelopmental disorder characterized by congenital microcephaly, intellectual disability, progressive cerebral atrophy, and intractable seizures. These families all exhibited an apparent autosomal recessive pattern of inheritance. Second, we investigated a heterogeneous cohort of ~60 undiagnosed patients, the majority of whom suffered from severe neurodevelopmental disorders with a suspected genetic etiology. Third, we investigated 264 patients with epileptic encephalopathies (severe childhood epilepsy disorders), looking specifically at infantile spasms and Lennox–Gastaut syndrome. Finally, we investigated ~40 large multiplex epilepsy families with complex phenotypic constellations and unclear modes of inheritance.
The studied neurodevelopmental disorders exhibited a range of genetic complexity, from clear Mendelian disorders to common complex disorders, resulting in varying degrees of success in the identification of clearly causal genetic variants. In the first project, we successfully identified the disease-causing gene. We show that recessive mutations in ASNS (encoding asparagine synthetase) are responsible for this previously undescribed neurodevelopmental disorder. We also characterized the causal mutations in vitro and studied Asns-deficient mice that mimicked aspects of the patient phenotype. This work describes ASNS deficiency as a novel neurodevelopmental disorder, identifies three distinct causal mutations in the ASNS gene, and indicates that asparagine synthesis is essential for the proper development and function of the brain. In the second project, we exome sequenced 62 undiagnosed patients and their unaffected biological parents (trios). By analyzing all identified variants that were annotated as putatively functional and observed as a novel genotype in the probands (not observed in the unaffected parents or controls), we obtained a genetic diagnosis for 32% (20/62) of these patients. Additionally, we identify strong candidate variants in 31% (13/42) of the undiagnosed cases. We also present additional analysis methods for moving beyond traditional screens, e.g., considering only securely implicated genes, or subjecting qualifying variants from any gene to two unique analysis approaches. This work adds to the growing evidence for the utility of diagnostic exome sequencing, increases patient numbers for rare neurodevelopmental disorders (enabling more detailed analyses of the phenotypic spectrum), and proposes novel analysis approaches which will likely become beneficial as the number of sequenced undiagnosed patients grows.
In the third project, we again employ a trio-based exome sequencing design to investigate the role of de novo mutations in two classical forms of epileptic encephalopathy. We find a significant excess of de novo mutations in the ~4,000 genes that are the most intolerant to functional genetic variation in the human population (P = 2.9 × 10⁻³, likelihood analysis). We provide clear statistical evidence for two novel genes associated with epileptic encephalopathy: GABRB3 and ALG13. Together with the 15 well-established epileptic encephalopathy genes, we statistically confirm the association of an additional ten putative epileptic encephalopathy genes. We show that only ~12% of epileptic encephalopathy patients in our cohort are explained by de novo mutations in one of these 24 genes, highlighting the extreme locus heterogeneity of the epileptic encephalopathies. Finally, we investigated multiplex epilepsy families to uncover novel epilepsy susceptibility factors. Candidate variants emerging from sequencing within discovery families were further assessed by cosegregation testing, variant association testing in a case-control cohort, and gene-based resequencing in a cohort of additional multiplex epilepsy families. Despite employing multiple approaches, we did not identify any clear genetic associations with epilepsy. This work has, however, identified a set of candidates that may include real risk factors for epilepsy; the most promising of these is the MYCBP2 gene. This work emphasizes the extremely high locus and allelic heterogeneity of the epilepsies and demonstrates that very large sample sizes are needed to uncover novel genetic risk factors. Collectively, this body of work has securely implicated three novel neurodevelopmental disease genes that inform the underlying pathology of these disorders.
Furthermore, in the final three studies, this work has highlighted additional candidate variants and genes that may ultimately be validated as disease-causing as sample sizes increase. / Dissertation Genetics Medicine ASNS epilepsy epileptic encephalopathy neurodevelopmental disorders next-generation sequencing
144	Towards understanding mastrevirus dynamics and the use of viral metagenomic approaches to identify novel gemini-like circular DNA viruses Kraberger, Simona January 2015 Mastreviruses (family Geminiviridae) are plant-infecting viruses with circular single-stranded (ss) DNA genomes (~2.7 kb). The genus Mastrevirus comprises thirty-two species, which are transmitted by leafhoppers belonging to the genus Cicadulina. Mastreviruses are widely distributed and have been found in the Middle East, Europe, Asia, Australia, Africa and surrounding islands. Only one species, dragonfly-associated mastrevirus, has so far been identified in the Americas, isolated from a dragonfly in Puerto Rico. Species can be grouped based on the host(s) they infect: those which infect monocotyledonous (monocot) plants and those which infect dicotyledonous (dicot) plants. In recent years many new mastrevirus species have been discovered. Several of these new discoveries can largely be attributed to the development of new molecular tools. The current state of sequencing platforms has made it easier and more affordable to characterise mastreviruses at the genome level, allowing scientists to delve deeper into the dynamics of mastreviruses. A few mastrevirus species have been identified as important agricultural pathogens and as a result have been the focus of much of the mastrevirus research. Maize streak virus, strain A (MSV-A) has been the most extensively studied due to the devastating impact it has on maize production in Africa. Studies have shown that MSV-A likely emerged as a pathogen of maize less than 250 years after the introduction of maize to Africa by early European settlers. There is compelling evidence to suggest that MSV-A is likely the result of recombination events between wild grass adapted MSV strains.
It is therefore equally important to monitor viruses infecting non-cultivated plants in order to gain a greater understanding of the epidemiological dynamics of mastreviruses, which in turn is essential for implementing disease management strategies. The objective of the research undertaken as part of this PhD thesis was to investigate global mastrevirus dynamics, focusing on diversity, host and geographic ranges, mechanisms of evolution, phylogeography and possible origins of these viruses. In addition, a viral metagenomic approach was used to identify novel mastreviruses or mastrevirus-like viruses present in New Zealand. The dynamics of the monocot-infecting mastreviruses are investigated in Chapters Two and Three. The work described in these two chapters focuses mainly on mastreviruses which infect non-cultivated grasses in Africa and Australia; a total of 161 full mastrevirus genomes were recovered collectively in the two studies. Chapter Two reveals a high level of mastrevirus diversity present in Australia with the discovery of four new species and several new strains of previously characterised species. An extensive sampling effort in Africa undertaken in Chapter Three reveals a broader host range and geographic distribution of the African monocot-infecting mastreviruses than previously documented. Mosaic patterns of recombination are evident among both the Australian and African monocot-infecting mastreviruses. In Chapters Four, Five and Six a comprehensive investigation was undertaken focusing on the dicot-infecting mastreviruses. The study undertaken in Chapter Four entailed the recovery of 49 full mastrevirus genomes from Australia, the Middle East, Africa, Turkey and the Indian Subcontinent to investigate the diversity of dicot-infecting mastreviruses in a global context. Analyses revealed a high degree of CpCDV strain diversity and extended the known geographic range of CpCDV.
For the first time, phylogeographic analysis was used to investigate the origins of the dicot-infecting mastreviruses. Results revealed that the most recent common ancestor (MRCA) of these viruses likely originated closer to Australia than anywhere else that dicot-infecting mastreviruses have been sampled, and illuminated a supported series of historical movements following the emergence of the MRCA. In Chapter Five two novel Australian-like mastreviruses were isolated from chickpea material from Pakistan. A comprehensive analysis of CpCDV isolates in the major pulse growing regions of Sudan in Chapter Six reveals that this region harbours a high degree of strain diversity. Complex patterns of intra-species recombination indicate these strains are evidently circulating in these regions and infecting the same hosts, driving the emergence of new CpCDV strains. Collectively the results discussed in Chapters Two through Six extended the current knowledge of mastrevirus diversity. The natural host range of many mastreviruses has proven to be more extensive than previously documented, with many species having overlapping host ranges; these hosts could therefore be acting as ‘mixing vessels’ enabling inter-species recombination. Patterns of recombination and selection were observed in both the monocot-infecting and the dicot-infecting mastreviruses, further elucidating the mechanisms these viruses employ to evolve rapidly. Extensive sampling in a wide range of geographic regions provides insights into the true geographic range of species such as MSV and CpCDV. Given that mastreviruses have been able to move globally and Australia has been identified as a major mastrevirus diversity hotspot, it is conceivable that mastreviruses are also present in New Zealand. In Chapters Seven and Eight this is explored by using a viral metagenomic approach to investigate the ssDNA viral populations associated with wild grasses and sewage material in New Zealand.
Although no mastreviruses were recovered, this endeavour resulted in the discovery of more than 50 novel circular Rep-encoding ssDNA (CRESS DNA) viruses associated with non-cultivated grasses and treated sewage material, many of which are similar to mastreviruses and other geminiviruses. These discoveries expand current knowledge on the diversity of ssDNA viruses present in New Zealand and further highlight this viral metagenomic approach as an effective method for ssDNA virus discovery. Overall the results discussed in this thesis provide insights into mastrevirus diversity and dynamics as well as revealing a wealth of novel CRESS DNA viruses, some of which share similarities to geminiviruses. Geminivirus Mastrevirus single-stranded DNA virus Next-generation sequencing Viral metagenomics
145	Genomic variations in the EGFR pathway in relation to skin toxicity of EGFR inhibitors analyzed by deep sequencing Hasheminasab, Sayedmohammad 22 April 2015 No description available. 610 EGFR inhibitors Next generation sequencing skin toxicity Toxicology Pharmacology
146	Bioinformatics challenges of high-throughput SNP discovery and utilization in non-model organisms 2014 October 1900 A current trend in biological science is the increased use of computational tools for both the production and analysis of experimental data. This is especially true in genomics, where advances in DNA sequencing technology have dramatically decreased the time and cost of sequencing, shifting the pressure onto the time required to prepare and analyze the data these experiments generate. As a result, the role of computational science in such biological research is increasing. This thesis seeks to address several major questions with respect to the development and application of single nucleotide polymorphism (SNP) resources in non-model organisms. Traditional SNP discovery using polymerase chain reaction (PCR) amplification and low-throughput DNA sequencing is a time-consuming and laborious process, which is often limited by the time required to design intron-spanning PCR primers. While next-generation DNA sequencing (NGS) has largely supplanted low-throughput sequencing for SNP discovery applications, the PCR-based SNP discovery method remains in use for cost-effective, targeted SNP discovery. This thesis seeks to develop an automated method for intron-spanning PCR primer design, which would remove a significant bottleneck in this process. This work also develops algorithms for combining SNP data from multiple individuals, independent of the DNA sequencing platform, for the purpose of developing SNP genotyping arrays. Additionally, tools for the filtering and selection of SNPs are developed, providing start-to-finish support for the development of SNP genotyping arrays in complex polyploids using NGS.
The results of this work include two automated pipelines for the design of intron-spanning PCR primers, one which designs a single primer pair per target and another that designs multiple primer pairs per target. These automated pipelines are shown to reduce the time required to design primers from one hour per primer pair using the semi-automated method to 10 minutes per 100 primer pairs while maintaining a very high efficacy. Efficacy is tested by comparing the number of successful PCR amplifications of the semi-automated method with that of the automated pipelines. Using the Chi-squared test, the semi-automated and automated approaches are determined not to differ in efficacy. Three algorithms for combining SNP output from NGS data from multiple individuals are developed and evaluated for their time and space complexities. These algorithms were found to be computationally efficient, requiring time and space linear in the size of the input. These algorithms are then implemented in the Perl language and their time and memory performance profiled using experimental data. Profiling results are evaluated by fitting linear models, which allow resource requirements to be predicted for various input sizes. Additional tools for the filtering of SNPs and selection of SNPs for a SNP array are developed and applied to the creation of two SNP arrays in the polyploid crop Brassica napus. These arrays, when compared to arrays in similar species, show higher numbers of polymorphic markers and better 3-cluster genotype separation, a viable method for determining the efficacy of design in complex genomes. Bioinformatics Next-Generation Sequencing SNP Discovery SNP Genotyping Arrays Non-model organism informatics
147	Distributed H∞ Control of Segmented Telescope Mirrors Ulutas, Baris 12 August 2014 Segmented mirrors are to be used in the next generation of ground-based optical telescopes to increase the size of the primary mirrors. A larger primary mirror enables the collection of more light, which results in higher image resolution. The main reason behind the choice of segmented mirrors over monolithic mirrors is to reduce the manufacturing, transportation, and maintenance costs of the overall system. However, segmented mirrors bring new challenges to the telescope design and control problem. The large number of inputs and outputs makes the computations for centralized control schemes intractable. Centralized controllers also result in systems that are vulnerable to a complete system failure due to a malfunction of the controller. Distributed control is a viable alternative that uses a network of simple individual segment controllers that can address two levels of coupling among segments and achieve the same performance objectives. Since segments share a common support structure, there exists a coupling among segments at the dynamics level. Any control action in one segment may excite the natural modes of the support structure and disturb other segments through this common support. In addition, the objective of maintaining a smooth mirror surface requires minimization of the relative displacements among neighbouring segment edges. This creates another level of coupling, generally referred to as the objective coupling. This dissertation investigates the distributed H∞ control of segmented next-generation telescope primary mirrors in the presence of wind disturbances. Three distributed H∞ control techniques are proposed and tested on three segmented primary mirror models: the dynamically uncoupled model, the dynamically coupled model and the finite element model of the Thirty Meter Telescope (TMT) project.
It is shown that the distributed H∞ controllers are able to satisfy the stringent imaging performance requirements. / Graduate / 0548 robust control H∞ control distributed control segmented mirror next generation telescope active optics
148	Sviluppo ed applicazione di pipilines bioinformatiche per l'analisi di dati NGS / DEVELOPMENT AND APPLICATION OF BIOINFORMATICS PIPELINES FOR NEXT GENERATION SEQUENCING DATA ANALYSIS LAMONTANARA, ANTONELLA 28 January 2015 Advances in sequencing technologies have led to sequencing platforms able to produce gigabases of sequencing data in a single run. These technologies, commonly referred to as Next Generation Sequencing or NGS, produce millions of short sequences called “reads”, generating large and complex datasets that pose several challenges for bioinformatics.
The analysis of large omics datasets requires the development of bioinformatics pipelines, which organize bioinformatics tools into computational chains in which the output of one analysis is the input of the next. Scripting work is needed to chain together a group of existing software tools. This thesis deals with the methodological aspects of analyzing NGS data produced with Illumina technology. Three bioinformatics pipelines were developed and applied to the following case studies: 1) global transcriptome profiling by RNA-seq of Olea europaea during cold acclimation, aimed at unravelling the molecular mechanisms of cold acclimation in this species; 2) SNP profiling by RNA-seq in the transcriptome of two cattle breeds, aimed at producing an extensive catalogue of SNPs; 3) the sequencing, assembly, and annotation of the genome of a Lactobacillus plantarum strain showing potential probiotic properties. BIO/11: MOLECULAR BIOLOGY
149	MR-CUDASW - GPU accelerated Smith-Waterman algorithm for medium-length (meta)genomic data 2014 November 1900 The idea of using a graphics processing unit (GPU) for more than graphics output has been around for quite some time in scientific communities. However, it is only recently that its benefits for a range of compute-intensive bioinformatics and life sciences tasks have been recognized. This thesis investigates the possibility of improving the performance of the overlap determination stage of an Overlap Layout Consensus (OLC)-based assembler by using a GPU-based implementation of the Smith-Waterman algorithm. In this thesis an existing GPU-accelerated sequence alignment algorithm is adapted and expanded to reduce its completion time. A number of improvements and changes are made to the original software. The workload distribution, query profile construction, and thread scheduling techniques implemented by the original program are replaced by custom methods specifically designed to handle medium-length reads. Accordingly, this algorithm is the first highly parallel solution that has been specifically optimized to process medium-length nucleotide reads (DNA/RNA) from modern sequencing machines (i.e., Ion Torrent). Results show that the software reaches up to 82 GCUPS (Giga Cell Updates Per Second) on a single-GPU graphics card in commodity desktop hardware. As a result it is the fastest GPU-based implementation of the Smith-Waterman algorithm tailored for processing medium-length nucleotide reads. Despite being designed for performing the Smith-Waterman algorithm on medium-length nucleotide sequences, this program also presents great potential for improving heterogeneous computing with CUDA-enabled GPUs in general and is expected to make contributions to other research problems that require sensitive pairwise alignment to be applied to a large number of reads.
Our results show that it is possible to improve the performance of bioinformatics algorithms by taking full advantage of the compute resources of the underlying commodity hardware. These results are especially encouraging since GPU performance grows faster than that of multi-core CPUs. Bioinformatics Sequence Alignment Smith-Waterman Algorithm GPU Computing CUDA Sequence Assembly Metagenomics Next-Generation-Sequencing
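For readers unfamiliar with the quantities in the abstract above, the following is a minimal CPU sketch of the Smith-Waterman local alignment score and of the GCUPS throughput metric. It is not the thesis's GPU implementation: the function names, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen only for illustration.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, number of DP cells updated).

    Illustrative scoring scheme with a linear gap penalty; the values are
    assumptions, not those used by MR-CUDASW.
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # local alignment never goes below 0
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best, len(query) * len(target)


def gcups(cells, seconds):
    """Giga Cell Updates Per Second: DP cells updated per second, in billions."""
    return cells / seconds / 1e9
```

Each cell depends only on its upper, left, and upper-left neighbours, which is what allows cells on the same anti-diagonal, or alignments of independent read pairs, to be computed in parallel on a GPU.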
150	Proximity Ligation Assays for Disease Biomarkers Analysis Nong, Rachel Yuan January 2011 One of the pressing needs in the field of disease biomarker discovery is new technologies that allow high-performance protein analysis in different types of clinical material, such as blood and solid tissues. This thesis includes four approaches that address important limitations of current technologies, thus enabling highly sensitive, specific and parallel protein measurements. Paper I describes a method for sensitive singleplex protein detection in complex biological samples, namely solid phase proximity ligation assay (SP-PLA). SP-PLA exhibited improved sensitivity compared to conventional sandwich immunoassays. We applied SP-PLA to validate the potential of GDF-15 as a biomarker for cardiovascular disease. Paper II describes ProteinSeq, a multiplexed immunoassay based on the principle of SP-PLA, for parallel detection of 36 proteins using next-generation sequencing as readout. ProteinSeq exhibited improved sensitivity compared to multiplexed sandwich immunoassays, and the potential to achieve even higher levels of multiplexing while preserving high sensitivity and specificity. We applied ProteinSeq to analyze 36 proteins, including one internal control, in 5 μl plasma samples from a cohort of patients with cardiovascular disease and healthy controls. Paper III describes PLA-DTM, a strategy for recording all possible interactions between sets of proteins in clinical samples. Individual proteins and their interactions are first encoded as dual-barcoded DNA by PLA, and the barcodes are interrogated by a method named dual tag microarray (DTM). We applied the method to study interactions among protein members of the NFκB signaling pathway. Paper IV describes a novel probing strategy for analyzing individual biomolecules in solution or in situ. The technique employs a new class of probes for unfolding proximity ligation assays - uPLA probes.
The probes are designed so that each probe set is sufficient to form and replicate a circular DNA reporter, without the probes interacting among themselves when incubated with the sample. The uPLA probing strategy simplifies the design of multiple probe sets in parallelized assays while enhancing the specificity of detection. We used the uPLA probes to detect various targets, including synthetic DNA and cancer-related transcripts in situ. proximity ligation assay blood biomarkers protein interactions pathway analysis single molecule next-generation sequencing
