About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Helicase Purification for DNA Sequencing

Leah, Labib January 2014
BACKGROUND: A method that increases accuracy and ease of use while decreasing the time and cost of deoxyribonucleic acid (DNA) sequence identification is sought. Helicase, which unwinds DNA, and avidin, which binds biotin with high affinity and could therefore capture biotinylated DNA segments, were investigated for use in a novel DNA sequencing method. AIM: This study aimed to (1) purify bacteriophage T7 gene product 4 helicase and a helicase-avidin fusion protein in a bacterial host and (2) characterize their functionality. METHODS: Helicase and helicase-avidin were cloned for purification from bacteria. Helicase-avidin was solubilised via urea denaturation/renaturation. DNA and biotin binding were assessed using electrophoretic mobility shift assays and biotinylated resins, respectively. RESULTS: (1) Helicase and helicase-avidin proteins were successfully purified. (2) Helicase protein bound DNA, and avidin protein strongly bound biotin. CONCLUSION: Helicase and helicase-avidin can be purified in functional form from a bacterial host, supporting further investigation for DNA sequencing purposes.
22

In Silico Edgetic Profiling and Network Analysis of Human Genetic Variants, with an Application to Disease Module Detection

Cui, Hongzhu 18 May 2020
In the past several decades, next-generation sequencing (NGS) methods have produced large amounts of genomic data at an exponentially increasing rate, enabling tremendous advances in the quest to understand the molecular mechanisms underlying human complex traits. Alongside the development of NGS technology, many genetic variation and genotype–phenotype databases and functional annotation tools have been developed to help scientists make sense of the intricacy of these data. Together, these resources bring us one step closer to a mechanistic understanding of complex phenotypes. However, it has rarely been possible to translate such a massive amount of information on mutations and their associations with phenotypes into biological or therapeutic insights, and the mechanisms underlying genotype–phenotype relationships remain only partially explained. Meanwhile, increasing evidence shows that biological networks are essential, albeit not sufficient, for a better understanding of these mechanisms. Among them, protein-protein interaction (PPI) network studies have attracted perhaps the most attention. The overarching goal of this dissertation is to (i) perform a systematic study of the role of pathogenic human genetic variants in the interactome; (ii) examine how common population-specific SNVs affect the PPI network and how they contribute to population phenotypic variance and disease susceptibility; and (iii) develop a novel framework that incorporates the functional effects of mutations for disease module detection. We first present a systematic multi-level characterization of human mutations associated with genetic disorders by determining their individual and combined interaction-rewiring effects on the human interactome. Our in silico analysis highlights the intrinsic differences and important similarities between pathogenic single nucleotide variants (SNVs) and frameshift mutations.
Functional profiling of SNVs indicates widespread disruption of protein-protein interactions and synergistic effects of SNVs. The coverage of our approach is several times greater than that of a recently published experimental study, with minimal overlap, while the distributions of determined edgotypes between the two sets of profiled mutations are remarkably similar. Case studies reveal the central role of interaction-disrupting mutations in type 2 diabetes mellitus and suggest the importance of studying mutations that abnormally strengthen protein interactions in cancer. Second, aided by our SNP-IN tool, we performed a systematic edgetic profiling of population-specific non-synonymous SNVs and interrogated their role in the human interactome. Our results demonstrate that a considerable number of normal nsSNVs can disrupt the interactome. We also show that genes enriched with disruptive mutations are associated with diverse functions and are implicated in various diseases. Further analysis indicates that distinct gene edgetic profiles among major populations can help explain population phenotypic variance. Finally, network analysis reveals that phenotype-associated modules are enriched with disruptive mutations, and differences in the accumulated damage in such modules may suggest population-specific disease susceptibility. Lastly, we propose and develop a computational framework, Discovering most IMpacted SUbnetworks in interactoMe (DIMSUM), which integrates genome-wide association studies (GWAS) and the functional effects of mutations into the protein–protein interaction (PPI) network to improve disease module detection. Specifically, our approach incorporates and propagates the functional impact of non-synonymous single nucleotide polymorphisms (nsSNPs) on PPIs to implicate the genes most likely influenced by disruptive mutations, and to identify the module with the greatest functional impact.
Comparison against state-of-the-art seed-based module detection methods shows that our approach yields modules that are biologically more relevant and more strongly associated with the studied disease. With the advancement of next-generation sequencing technology driving precision medicine, there is an increasing demand for understanding the changes in molecular mechanisms caused by specific genetic variation. Current and future in silico edgotyping tools present a cheap and fast way to deal with the rapidly growing datasets of discovered mutations. Our work shows the feasibility of a large-scale in silico edgetic study and reveals insights into the orchestrated interplay of mutations within a complex PPI network. We also expect our module detection method to become part of the common toolbox for disease module analysis, facilitating the discovery of new disease markers.
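The network-propagation idea behind frameworks like DIMSUM can be sketched in a few lines: per-gene damage scores (e.g. from edgetic profiling of nsSNPs) are spread across PPI neighbours, so genes surrounded by many disrupted interactions accumulate impact. This is a minimal random-walk-with-restart style sketch, not the published algorithm; the network and scores are made up.

```python
def propagate(adjacency, seed_scores, alpha=0.5, iterations=50):
    """Spread per-gene mutation-impact scores across PPI neighbours.

    adjacency   : dict mapping gene -> list of interacting genes
    seed_scores : dict mapping gene -> initial damage score
    alpha       : restart weight that keeps mass near the seed genes
    """
    scores = dict(seed_scores)
    for _ in range(iterations):
        new_scores = {}
        for gene, neighbours in adjacency.items():
            # Each gene receives an equal share of every neighbour's score.
            incoming = sum(
                scores.get(n, 0.0) / max(len(adjacency[n]), 1)
                for n in neighbours
            )
            new_scores[gene] = alpha * seed_scores.get(gene, 0.0) + (1 - alpha) * incoming
        scores = new_scores
    return scores

# Toy network: geneA carries a disruptive mutation; impact spreads outwards.
ppi = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneA", "geneC"],
    "geneC": ["geneA", "geneB", "geneD"],
    "geneD": ["geneC"],
}
result = propagate(ppi, {"geneA": 1.0})
ranked = sorted(result, key=result.get, reverse=True)
```

The seed gene stays on top, while its direct neighbours outrank genes two hops away, which is the behaviour a module detector exploits when picking the most impacted subnetwork.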
23

RPOS-DEPENDENT STATIONARY PHASE INDUCTION OF NITRATE REDUCTASE Z IN E. COLI

Chang, Lily 12 1900
During entry into stationary phase, Escherichia coli expresses many genes which impart cellular resistance to numerous environmental stresses, such as oxidative or acid stress. Many of these genes are regulated by the alternative sigma factor, RpoS. To identify additional genes regulated by RpoS, a phenotype-independent genetic screen was previously employed (L. Wei, Master's thesis). The identities of the ten most highly RpoS-dependent fusions were determined by DNA sequencing and subsequent sequence analysis using the BLAST algorithm. Three fusions map to genes previously known to be RpoS-dependent, while the remaining seven represent new members of the regulon. The expression of many of the RpoS-dependent fusions remained growth phase dependent even in the rpoS background. This suggests that other growth phase regulatory factors, in addition to RpoS, may coordinate stationary phase gene expression. Upon sequencing the remaining rsd fusions, three mutants mapped to narY, which is part of the narZYWV operon encoding the secondary nitrate reductase Z (NRZ). This operon was selected for further investigation since NRZ had previously been reported to be constitutively expressed. Expression studies using promoter lacZ fusions and nitrate reductase assays reveal that NRZ is induced ten-fold at the onset of stationary phase and twenty-fold in the presence of nitrate. Like other rsd fusions, growth phase dependent expression was observed in an rpoS background, indicating that other regulatory factors may be involved in the regulation of NRZ. Northern analyses using probes specific to NRZ confirmed that transcription of NRZ is indeed dependent on RpoS. These results suggest that RpoS-mediated regulation of NRZ may be an important physiological adaptation to reduced oxygen levels during the transition to stationary phase. / Thesis / Master of Science (MSc)
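The fold-induction figures above come from promoter-lacZ fusion assays; beta-galactosidase activity from such fusions is conventionally expressed in Miller units. A minimal sketch with invented OD readings, chosen here to illustrate a ten-fold induction (some protocols add an OD550 light-scattering correction that is omitted for brevity):

```python
def miller_units(od420, minutes, ml_culture, od600):
    """Miller's classic formula: activity normalised to assay time,
    culture volume, and cell density."""
    return 1000.0 * od420 / (minutes * ml_culture * od600)

# Hypothetical readings for an exponential-phase and a stationary-phase sample.
exponential = miller_units(od420=0.10, minutes=20, ml_culture=0.5, od600=0.40)
stationary = miller_units(od420=0.90, minutes=20, ml_culture=0.5, od600=0.36)
fold_induction = stationary / exponential  # ten-fold with these illustrative numbers
```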
24

High-throughput DNA Sequencing in Microbial Ecology: Methods and Applications

Hugerth, Luisa January 2016
Microorganisms play central roles in planet Earth’s geochemical cycles, in food production, and in health and disease of humans and livestock. In spite of this, most microbial life forms remain unknown and unnamed, their ecological importance and potential technological applications beyond the realm of speculation. This is due both to the magnitude of microbial diversity and to technological limitations. Of the many advances that have enabled microbiology to reach new depth and breadth in the past decade, one of the most important is affordable high-throughput DNA sequencing. This technology plays a central role in each paper in this thesis. Papers I and II focus on developing methods to survey microbial diversity based on marker gene amplification and sequencing. In Paper I we proposed a computational strategy to design primers with the highest coverage among a given set of sequences and applied it to drastically improve one of the most commonly used primer pairs for ecological surveys of prokaryotes. In Paper II this strategy was applied to a eukaryotic marker gene. Despite their importance in the food chain, eukaryotic microbes are surveyed much more seldom than bacteria; Paper II aimed at making this domain of life more amenable to high-throughput surveys. In Paper III, the primers designed in Papers I and II were applied to water samples collected up to twice weekly from 2011 to 2013 at an offshore station in the Baltic proper, the Linnaeus Microbial Observatory. In addition to tracking microbial communities over these three years, we created predictive models for hundreds of microbial populations, based on their co-occurrence with other populations and environmental factors. In Paper IV we explored the entire metagenomic diversity in the Linnaeus Microbial Observatory. We used computational tools developed in our group to construct draft genomes of abundant bacteria and archaea and described their phylogeny, seasonal dynamics and potential physiology.
We were also able to establish that, rather than being a mixture of genomes from fresh and saline water, the Baltic Sea plankton community is composed of brackish specialists which diverged from other aquatic microorganisms thousands of years before the formation of the Baltic itself.
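The primer-coverage objective of Paper I can be illustrated with a toy scorer: for a candidate primer, count the fraction of target sequences that contain a window within a small mismatch allowance. The published strategy is more sophisticated; the primer and target sequences below are fabricated for illustration.

```python
def matches(primer, window, max_mismatches=1):
    """True if the window differs from the primer at no more positions
    than the mismatch allowance."""
    return sum(a != b for a, b in zip(primer, window)) <= max_mismatches

def coverage(primer, targets, max_mismatches=1):
    """Fraction of target sequences containing a near-match to the primer."""
    hit = 0
    k = len(primer)
    for seq in targets:
        if any(matches(primer, seq[i:i + k], max_mismatches)
               for i in range(len(seq) - k + 1)):
            hit += 1
    return hit / len(targets)

targets = [
    "AGGATTAGATACCCTGGTAG",  # contains the site exactly
    "AGGATTAGATGCCCTGGTAG",  # one mismatch inside the site
    "CCCCCCCCCCCCCCCCCCCC",  # no site at all
]
cov = coverage("GGATTAGATACCC", targets)  # 2 of 3 targets covered
```

Raising `max_mismatches` trades specificity for coverage, which is exactly the tension a coverage-maximising primer design has to balance.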
25

Multiple displacement amplification and whole genome sequencing for the diagnosis of infectious diseases

Anscombe, C. J. January 2016
Next-generation sequencing technologies are revolutionising our ability to characterise and investigate infectious diseases. Utilising the power of high-throughput sequencing, this study reports the development of a sensitive, non-PCR-based, unbiased amplification method that allows the rapid and accurate sequencing of multiple microbial pathogens directly from clinical samples. The method employs Φ29 DNA polymerase, a highly efficient enzyme able to produce strand displacement during the polymerisation process with high fidelity. Problems with DNA secondary structure were overcome and the method optimised to produce sufficient DNA to sequence from a single bacterial cell within two hours. Evidence was also found that the enzyme requires at least six bases of single-stranded DNA to initiate replication, and is not capable of amplification from nicks. Φ29 multiple displacement amplification was shown to be suitable for a range of GC contents and bacterial cell wall types, as well as for viral pathogens. The method was shown to provide relative quantification of mixed cells, and a method for quantification of viruses using a known standard was developed. To complement the novel molecular biology workflow, a data analysis pipeline was developed to allow pathogen identification and characterisation without prior knowledge of input. The use of de novo assemblies for annotation was shown to be equivalent to the use of polished reference genomes. Single-cell Φ29 MDA samples had better assembly and annotation than non-amplification controls, a novel finding which, when combined with the very long DNA fragments produced, has interesting implications for a variety of analytical procedures. A sampling process was developed to allow isolation and amplification of pathogens directly from clinical samples, with good concordance shown between this method and traditional testing.
The process was tested on a variety of modelled and real clinical samples, showing good applicability to sterile-site infections, particularly bacteraemia models. Within these samples, multiple bacterial, viral and parasitic pathogens were identified across multiple infection types. Emerging pathogens were identified, including Onchocerca volvulus in a CSF sample and Sneathia sanguinegens in an STI sample. Use of Φ29 MDA allows rapid and accurate amplification of whole pathogen genomes. Coupled with the sample processing developed here, it is possible to detect the presence of pathogens in sterile sites with a sensitivity of a single genome copy.
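The quantification of a virus against a known standard described above can be sketched as a simple proportion, assuming read counts scale linearly with copy number once normalised by genome length. The numbers are illustrative, not taken from the study.

```python
def copies_per_ml(reads_pathogen, genome_len_pathogen,
                  reads_standard, genome_len_standard,
                  standard_copies_per_ml):
    """Estimate pathogen concentration from a spiked-in standard of
    known copy number, assuming length-normalised read counts are
    proportional to genome copies."""
    depth_pathogen = reads_pathogen / genome_len_pathogen
    depth_standard = reads_standard / genome_len_standard
    return depth_pathogen / depth_standard * standard_copies_per_ml

# Hypothetical run: 10,000 reads on a 30 kb virus versus 2,000 reads
# on a 5 kb standard spiked at 1e6 copies/ml.
estimate = copies_per_ml(10_000, 30_000, 2_000, 5_000, 1e6)
```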
26

Methods to Prepare DNA for Efficient Massive Sequencing

Lundin, Sverker January 2012
Massive sequencing has transformed the field of genome biology due to the continuous introduction and evolution of new methods. In recent years, the technologies available to read through genomes have undergone an unprecedented rate of development in terms of cost reduction. Generating sequence data has essentially ceased to be a bottleneck for analyzing genomes, replaced instead by limitations in sample preparation and data analysis. In this work, new strategies are presented to increase both the throughput of library generation prior to sequencing and the informational content of libraries to aid post-sequencing data processing. The protocols developed aim to enable new possibilities for genome research in terms of project scale and sequence complexity. The first two papers that underpin this thesis deal with scaling library production by means of automation. Automated library preparation is first described for the 454 sequencing system, based on a generic solid-phase polyethylene-glycol precipitation protocol for automated DNA handling. This was one of the first descriptions of automated sample handling for producing next-generation sequencing libraries, and it substantially improved sample throughput. Building on these results, a double precipitation strategy is presented that replaces the manual agarose gel excision step for Illumina sequencing. This protocol considerably improved the scalability of library construction for Illumina sequencing. The third and fourth papers present advanced strategies for library tagging in order to multiplex the information available in each library. First, a dual tagging strategy for massive sequencing is described in which two sets of tags are added to a library to trace back the origins of up to 4992 amplicons using 122 tags. The tagging strategy takes advantage of the previously automated pipeline and was used for the simultaneous sequencing of 3700 amplicons.
Following that, an enzymatic protocol was developed to degrade long-range PCR amplicons and form triple-tagged libraries containing information on sample origin, clonal origin and local position for the short-read sequences. Through tagging, this protocol makes it possible to analyze a longer continuous sequence region than would be possible from the read length of the sequencing system alone. The fifth study investigates enzymes commonly used for constructing libraries for massive sequencing. We analyze restriction enzymes capable of digesting unknown sequences located some distance from their recognition sequence; some of these enzymes have been used extensively for massive nucleic acid analysis. In this first high-throughput study of such enzymes, we investigated their restriction specificity in terms of the distance from the recognition site and their sequence dependence. The phenomenon of slippage is characterized and shown to vary significantly between enzymes. These results should support future protocol development and enzymatic understanding. Through these papers, this work aspires to aid the development of methods for massive sequencing in terms of scale, quality and knowledge, thereby contributing to the general applicability of the new paradigm of sequencing instruments.
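The dual-tagging idea is combinatorial: two small tag sets jointly address their product in amplicons while only their sum in tag sequences needs to be synthesised. A toy demultiplexer, with hypothetical tag sequences and a simplified read layout (tags assumed error-free and flanking the insert):

```python
from itertools import product

# Hypothetical forward and reverse tag sets: 3 + 2 = 5 tag sequences
# address 3 * 2 = 6 distinct amplicon origins.
forward_tags = ["ACGT", "TGCA", "GATC"]
reverse_tags = ["AACC", "GGTT"]

# Lookup from (forward tag, reverse tag) pair -> amplicon ID.
pair_to_id = {pair: i for i, pair in
              enumerate(product(forward_tags, reverse_tags))}

def demultiplex(read, tag_len=4):
    """Return the amplicon ID encoded by the read's flanking tags,
    or None if either tag is unrecognised."""
    pair = (read[:tag_len], read[-tag_len:])
    return pair_to_id.get(pair)

read = "TGCA" + "GATTACA" * 3 + "GGTT"  # simplified read: tag, insert, tag
amplicon = demultiplex(read)
```

Scaling the same arithmetic up, two tag sets totalling 122 sequences can address thousands of amplicon origins, which is the economy the dual-tagging strategy exploits.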
27

Comparison of DNA sequence assembly algorithms using mixed data sources

Bamidele-Abegunde, Tejumoluwa 15 April 2010
DNA sequence assembly is one of the fundamental areas of bioinformatics. It involves the correct formation of a genome sequence from its DNA fragments ("reads") by aligning and merging the fragments. There are different sequencing technologies -- some support long DNA reads and others, shorter DNA reads. There are sequence assembly programs specifically designed for these different types of raw sequencing data.

This work explores and experiments with these different types of assembly software in order to compare their performance on the type of data for which they were designed, as well as their performance on data for which they were not designed, and on mixed data. Such results are useful for establishing good procedures and tools for sequence assembly in the current genomic environment, where read data of different lengths are available. This work also investigates the effect of the presence or absence of quality information on the results produced by sequence assemblers.

Five strategies were used in this research for assembling mixed data sets, and testing was done using a collection of real and artificial data sets for six bacterial organisms. The results show that there is a broad range in the ability of some DNA sequence assemblers to handle data from various sequencing technologies, especially data other than the kind they were designed for. For example, the long-read assemblers PHRAP and MIRA produced good results from assembling 454 data. The results also show the importance of having an effective methodology for assembling mixed data sets. Combining contiguous sequences obtained from short-read assemblers with long DNA reads, and then assembling this combination using long-read assemblers, was found to be the most appropriate approach for assembling mixed short and long reads.
The results from assembling the mixed data sets were better than those obtained from separately assembling individual data from the different sequencing technologies. DNA sequence assemblers which do not depend on the availability of quality information were used to test the effect of the presence of quality values when assembling data. The results show that, regardless of the availability of quality information, good results were produced in most of the assemblies.

In more general terms, this work shows that the approach or methodology used to assemble DNA sequences from mixed data sources makes a substantial difference in the results obtained, and that a good choice of methodology can help reduce the effort spent on a DNA sequence assembly project.
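The mixed-data strategy found most effective above (short-read contigs pooled with long reads, then assembled by a long-read assembler) reduces, on the data-handling side, to simple FASTA bookkeeping. A sketch with made-up sequences; in the actual workflow the resulting file would be fed to PHRAP or MIRA.

```python
def to_fasta(records):
    """Render (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)

def combine(short_read_contigs, long_reads):
    """Pool short-read contigs (treated as long pseudo-reads) with the
    true long reads into one record list."""
    pooled = [(f"contig_{i}", seq) for i, seq in enumerate(short_read_contigs)]
    pooled += [(f"longread_{i}", seq) for i, seq in enumerate(long_reads)]
    return pooled

contigs = ["ACGTACGTACGT", "TTGGCCAATTGG"]  # output of a short-read assembler
long_reads = ["ACGTACGTACGTTTGG"]           # e.g. 454 reads
records = combine(contigs, long_reads)
fasta = to_fasta(records)  # written to disk, this becomes the long-read assembler's input
```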
29

Genetics of Two Mendelian Traits and Validation of Induced Pluripotent Stem Cell (iPSC) Technology for Disease Modeling

Raykova, Doroteya January 2015
Novel technologies for genome analysis have provided almost unlimited opportunities to uncover structural gene variants behind human disorders. Whole exome sequencing (WES) is especially useful for understanding rare Mendelian conditions, because it reduces the requirements for a priori clinical data, and can be applied on a small number of patients. However, supporting functional data on the effect of specific gene variants are often required to power these findings. A variety of methods and biological model systems exists for this purpose. Among those, induced pluripotent stem cells (iPSCs), which are capable of self-renewal and differentiation, stand out as an alternative to animal models. In Papers I and II we took advantage of WES to identify gene variants underlying autosomal recessive pure hair and nail ectodermal dysplasia (AR PHNED) as well as autosomal dominant familial visceral myopathy (FVM). We identified a homozygous variant c.821T>C (p.Phe274Ser) in the KRT74 gene as the causative mutation in AR PHNED, supported by the fact that Keratin-74 was undetectable in hair follicles of an affected family member. In a family segregating FVM we found a heterozygous tandem base substitution c.806_807delinsAA (p.(Gly269Glu)) in the ACTG2 gene in the affected members. This novel variant is associated with a broad range of visceral symptoms and a variable age of onset. In Paper III we explored the similarity between clonally derived iPSC lines originating from a single parental fibroblast line, and we highlighted the necessity of using lines originating from various donors in disease modeling because of biological variation. Paper IV focused on how the genomic integrity of iPSCs is affected by the choice of reprogramming method. We described several novel cytogenetic rearrangements in iPSCs and identified a chromosome 5q duplication as a candidate aberration for growth advantage.
In summary, this doctoral thesis brings novel findings on unreported disease-causing variants, as supported by extensive genetic analysis and functional data. A novel molecular mechanism behind AR PHNED is presented and the phenotypic spectrum associated with FVM is expanded. In addition, the thesis brings novel understanding of benefits and limitations of the iPSC technology to be considered for disease modeling.
30

Draft Assembly and Baseline Annotation of the Ziziphus spina-christi Genome

Shuwaikan, Raghad H. 07 1900
Third-generation sequencing has revolutionized our understanding of genomics and enabled the in-depth discovery of complex plant genomes. In this project I aimed to assemble and annotate the genome of Z. spina-christi, a plant native to Saudi Arabia, as part of the Kingdom of Saudi Arabia Native Genome Project established at the Center for Desert Agriculture at KAUST. Initially, a voucher plant was selected from the Al Lith region of Western Saudi Arabia. Fresh leaf tissue was collected for high-molecular-weight (HMW) DNA extraction, as well as seed for greenhouse propagation. After HMW DNA extraction, library construction and PacBio HiFi sequencing, I generated a de novo assembly of the Z. spina-christi genome using the Hifiasm assembler, which yielded a 1.9 Gbp assembly with high levels of duplication. The assembled contigs were scaffolded using an in-house script based on the software RagTag, which yielded a 406 Mbp scaffold with 331 gaps (85.45% of the estimated genome size). A preliminary analysis of the assembly for transposable elements revealed a TE content of 32.36%, with Long Terminal Repeat retrotransposons (LTR-RTs) being the major contributor to the total TE content. Baseline annotation was completed using OmicsBox, revealing 18,330 functional genes. This work describes the first genomic resource for the desert plant Z. spina-christi. To improve the assembly, I suggest scaffolding with optical mapping, long Nanopore reads and Hi-C data to capture the spatial organization of the genome. Further experimental, genetic and TE analysis is needed to explore the plant’s resilience to abiotic stresses in extreme environments.
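Summary figures like those quoted above (total scaffold length, gap count) are computed directly from the assembly, and N50 is the standard contiguity metric reported alongside them. A minimal sketch over toy scaffolds; a real assembly would be parsed from FASTA.

```python
import re

def n50(lengths):
    """Smallest length L such that sequences of length >= L together
    cover at least half of the total assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

def gap_count(scaffolds):
    """Count gaps, where each run of one or more 'N' characters is one gap."""
    return sum(len(re.findall(r"N+", s)) for s in scaffolds)

# Two toy scaffolds: one with a single N-run gap, one gap-free.
scaffolds = ["ACGT" * 100 + "NNNN" + "ACGT" * 50, "ACGT" * 25]
lengths = [len(s) for s in scaffolds]
stats = (sum(lengths), n50(lengths), gap_count(scaffolds))
```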
