51 |
Optimizing analysis pipelines for improved variant discovery
Highnam, Gareth Wei An 17 April 2014
In modern genomics, every experiment begins with sequencing followed by downstream alignment or assembly. Reliable sequencing pipelines are therefore the foundation for any later analysis of the data. While much existing work has improved the throughput and computational performance of such pipelines, the question of accuracy remains. This gap between speed and accuracy can be attributed to the fact that accuracy is conceptually harder to measure. Unlike memory usage and CPU hours, which can simply be parsed from logs, accuracy requires experimental validation. Accuracy also divides into subsets when alignment or variation is assessed around particular genomic features such as indels, copy number variants (CNVs), or microsatellite repeats. This work develops accuracy measurements for read alignment and variant calls, allowing sequencing pipelines to be optimized at all stages. The underlying hypothesis is that different sequencing platforms and analysis software can be distinguished from each other in accuracy by both sample and genomic variation of interest. As the term accuracy suggests, measuring alignment and variant recall requires comparison against a truth set, for which we used read library simulations and high-quality data from the Genome in a Bottle Consortium or the Illumina Omni array. In exploring the hypothesis, the measurements are built into a community resource to crowdsource a benchmarking repository for pipeline comparison. Results from pipelines promoted by this computational model are then validated in the wet lab, supporting a hierarchy of pipeline performance. In particular, the construction of an accurate pipeline for genotyping microsatellite repeats is investigated and then used to create a database of human microsatellites.
Progress in this area is vital for the growth of sequencing in both clinical and research settings. For genomics research to fully translate to the bedside, the boom of new technology must be controlled by rational metrics and industry standardization. This project will address both of these issues, as well as contribute to the understanding of human microsatellite variation. / Ph. D.
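The comparison of variant calls against a truth set described in this abstract can be illustrated with a minimal sketch. It is not the author's benchmarking code: the variant key `(chromosome, position, ref, alt)`, the function name `score_calls` and the toy data are all assumptions made for illustration, and real comparisons also have to normalize variant representation, which is skipped here.

```python
from typing import NamedTuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


class Accuracy(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float


def score_calls(calls: set[Variant], truth: set[Variant]) -> Accuracy:
    """Compare a pipeline's variant calls with a truth set by exact match."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but missed by the pipeline
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return Accuracy(tp, fp, fn, precision, recall)


# Toy data, invented for this example only.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "GA")}
calls = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "GA"), ("chr3", 10, "T", "C")}

result = score_calls(calls, truth)
print(result)
```

Scoring each pipeline separately on subsets of the truth set (for example indels only, or microsatellite regions only) would give the per-feature accuracy figures the abstract refers to.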
|
52 |
Tagging systems for sequencing large cohorts
Neiman, Mårten January 2010
Advances in sequencing technologies constantly improve the throughput and accuracy of sequencing instruments. Together with this development come new demands and opportunities to take full advantage of the massive amounts of data produced within a sequencing run. One way of doing this is to analyze a large set of samples in parallel by pooling them prior to sequencing and associating the reads with the corresponding samples using DNA sequence tags. Amplicon sequencing is a common application of this technique, enabling ultra-deep sequencing and identification of rare allelic variants. However, a common problem in amplicon sequencing projects is the formation of unspecific PCR products and primer dimers that occupy large portions of the data sets.
This thesis is based on two papers exploring these new possibilities and issues. The first paper presents a method for including thousands of samples in the same sequencing run without dramatically increasing the cost or sample handling complexity. The second paper presents how the amount of high-quality data from an amplicon sequencing run can be maximized.
The findings from the first paper show that a two-tagging system, where the first tag is introduced by PCR and the second tag by ligation, can be used to effectively sequence a cohort of 3500 samples using the 454 GS FLX Titanium chemistry. The tagging procedure allows for simple and easily scalable sample handling during sequencing library preparation. The PCR-introduced first tags, which are present at both ends of the fragments, enable detection of chimera formation and hence avoid false typing in the data set.
In the second paper, a FACS machine is used to sort and enrich emPCR beads covered with target DNA. This is facilitated by tagging quality beads through hybridization of a fluorescently labeled, target-specific DNA probe prior to sorting. The system was evaluated by sequencing two amplicon libraries, one FACS-sorted and one standard-enriched, on the 454, showing a three-fold increase in quality data obtained. / QC20100907
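The two-tag assignment described above can be sketched as follows. This is an illustration under stated assumptions rather than the thesis's actual software: it assumes each read has already been parsed into its ligation tag and the PCR tags found at its two ends, and the tag sequences, sample names and the function `assign_sample` are hypothetical.

```python
from collections import Counter


def assign_sample(ligation_tag, pcr_tag_start, pcr_tag_end, sample_sheet):
    """Return a sample name, 'chimera', or 'unknown' for one parsed read.

    The PCR tag is expected at both ends of a fragment, so two different
    PCR tags on the same read indicate a chimeric molecule.
    """
    if pcr_tag_start != pcr_tag_end:
        return "chimera"
    # A sample is identified by the combination of PCR tag and ligation tag.
    return sample_sheet.get((pcr_tag_start, ligation_tag), "unknown")


# Hypothetical sample sheet: (PCR tag, ligation tag) -> sample name.
sample_sheet = {
    ("ACGT", "TTAG"): "sample_001",
    ("ACGT", "GGCA"): "sample_002",
    ("TGCA", "TTAG"): "sample_003",
}

# Toy reads as (ligation tag, PCR tag at start, PCR tag at end).
reads = [
    ("TTAG", "ACGT", "ACGT"),  # matches sample_001
    ("GGCA", "ACGT", "ACGT"),  # matches sample_002
    ("TTAG", "ACGT", "TGCA"),  # PCR tags disagree: chimera
    ("CCCC", "TGCA", "TGCA"),  # tag combination not in the sheet
]

counts = Counter(assign_sample(*read, sample_sheet) for read in reads)
print(counts)
```

Because a sample is keyed by a pair of tags, the number of distinguishable samples grows as the product of the two tag set sizes, which is what makes cohorts of thousands of samples tractable with a modest number of tags.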
|
53 |
Sequenciamento, montagem e anotação do genoma de um novo isolado de Leptospira borgpetersenii / Sequencing, assembly and genome annotation of a new isolate of Leptospira borgpetersenii
Eslabão, Marcus Redu 27 February 2012
Leptospirosis is a neglected zoonosis with global distribution. The disease is caused by pathogenic bacteria of the genus Leptospira, which affect humans and various domestic and wild animals, causing serious human health problems and losses in livestock. The objective of this study was to determine the genome sequence of Leptospira borgpetersenii serogroup Ballum strain 4E, isolated from the domestic mouse (Mus musculus), one of the main reservoirs of this genus. The complete genome sequence was determined using the SOLiD™ system, which generated over 85 million 50 bp reads. These reads were used to obtain scaffolds of the two chromosomes present in this organism through ab initio sequence assembly with the Velvet and Edena software and orientation of contigs with the G4All software. On completion of the assembly, the large chromosome was 3,071,053 bp long, with a GC content of 40.58%, 36 tRNAs, 4 rRNAs and 2,908 open reading frames (ORFs). The small chromosome was 305,940 bp long, with a GC content of 40.25% and 277 ORFs; no tRNAs or rRNAs were predicted. A reduction in the large chromosome of strain 4E was observed compared with the large chromosome of strain L550: 99 genes of strain L550 are absent from strain 4E, and about 394 kb of non-coding sequence was also lost. The main hypothesis for this reduction is the effect of a large number of mobile genetic elements. Genome reduction has been observed in other strains of L. borgpetersenii. The Applied Biosystems SOLiD 4 method allowed determination of the genome sequence of L. borgpetersenii strain 4E with wide coverage and accuracy. The ab initio assembly methods used allowed full use of the sequences generated.
|
55 |
Nanopore sequencing for Mycobacterium tuberculosis: a critical review of the literature, new developments and future opportunities
Dippenaar, A., Goossens, S.N., Grobbelaar, M., Oostvogels, S., Cuypers, B., Laukens, K., Meehan, Conor J., Warren, R.M., van Rie, A. 18 June 2021
The next-generation short-read sequencing technologies that generate comprehensive, whole-genome data with single-nucleotide resolution have already advanced tuberculosis diagnosis, treatment, surveillance and source investigation. Their high costs, tedious and lengthy processes, and large equipment remain major hurdles for research use in high tuberculosis burden countries and implementation into routine care. The portable next-generation sequencing devices developed by Oxford Nanopore Technologies (ONT) are attractive alternatives due to their long-read sequence capability, compact low-cost hardware, and continued improvements in accuracy and throughput. A systematic review of the published literature demonstrated limited uptake of ONT sequencing in tuberculosis research and clinical care. Of the 12 eligible articles presenting ONT sequencing data on at least one Mycobacterium tuberculosis sample, four addressed software development for long-read ONT sequencing data with potential applications for M. tuberculosis. Only eight studies presented results of ONT sequencing of M. tuberculosis, of which five performed whole-genome and three did targeted sequencing. Based on these findings, we summarize the standard processes and reflect on the current limitations of ONT sequencing technology and the research needed to overcome the main hurdles. Summary: The low capital cost, portable nature and continued improvement in the performance of ONT sequencing make it an attractive option for sequencing for research and clinical care, but limited data is available on its application in the tuberculosis field. Important research investment is needed to unleash the full potential of ONT sequencing for tuberculosis research and care.
|
56 |
Identifying functional variation in schizophrenia GWAS loci by pooled sequencing
Loken, Erik 01 January 2014
Schizophrenia demonstrates high heritability, accounted for in part by common simple nucleotide variants (SNVs), rare copy number variants (CNVs) and, most recently, rare SNVs. Although the heritability explained by rare SNVs and CNVs is small compared to that explained by common SNVs, rare SNVs in functional sequences may identify specific disease mechanisms. However, current exome methods do not capture a large proportion of potentially functional bases where rare variation may impact disease risk: as much as two-thirds of conserved sequences lie outside the exome in non-coding regions of cross-species evolutionary constraint. We reasoned that the candidate loci from the Psychiatric Genomics Consortium Phase 1 (PGC-1) schizophrenia study represent good target loci to test for the impact of rare SNVs in non-coding constrained regions. We developed custom reagents to capture mammalian constrained non-coding regions, exons, and 5’- and 3’-untranslated regions (UTRs) in the 12 PGC-1 loci for pooled sequencing in 912 cases and 936 controls. Compared to our coding targets, our non-coding targets contain substantially more highly conserved bases (46,412 vs. 31,609) and variants (390 vs. 193). Using C-alpha to detect excess variance due to aggregate risk-increasing or risk-decreasing rare SNV effects, we identified signals attributable to alleles with MAF < 0.1% in both coding sequences and functional non-coding sequences, including variants within ENCODE transcription factor binding sites, DNase hypersensitive regions, and histone modification sites in neuronal cell lines. We also observed significant excess risk-altering variation in the CUB domain of CSMD1, a gene expressed in the developing central nervous system. These results support the hypothesis that common and rare variants in the same loci contribute to schizophrenia risk, but highlight the need to expand capture strategies in order to detect trait-relevant sequence variation in a broader set of functional sequences.
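To make the C-alpha idea concrete, here is a minimal sketch of the test statistic in the form it is commonly described: for each rare variant, the observed number of copies in cases is compared with its binomial expectation, and the excess variance is summed over variants. The function name, argument names and toy counts are assumptions for illustration; the sketch omits the variance estimate or permutation step needed for a p-value, and it does not reproduce how allele counts were estimated from pooled sequencing in this study.

```python
def c_alpha_statistic(case_copies, total_copies, p0):
    """Sum over variants of (y - n*p0)^2 - n*p0*(1 - p0).

    case_copies:  copies of each variant observed in cases (y_i)
    total_copies: copies of each variant in cases plus controls (n_i)
    p0:           expected fraction of copies in cases under the null,
                  e.g. the fraction of the cohort that are cases
    """
    return sum(
        (y - n * p0) ** 2 - n * p0 * (1 - p0)
        for y, n in zip(case_copies, total_copies)
    )


# Toy counts, invented for this example: one variant seen only in cases,
# one only in controls, one split evenly.
statistic = c_alpha_statistic([4, 0, 2], [4, 4, 4], p0=0.5)
print(statistic)
```

A positive value indicates more variance than the binomial null predicts, which is why a mixture of risk-increasing and risk-decreasing variants can be detected even when their effects would cancel in a simple burden count.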
|
57 |
Detection and characterization of gene-fusions in breast and ovarian cancer using high-throughput sequencing
Mittal, Vinay K. 21 September 2015
Gene-fusions are a prevalent class of genetic variants that are often employed as cancer biomarkers and therapeutic targets. In recent years, high-throughput sequencing of the cellular genome and transcriptome has emerged as a promising approach for the investigation of gene-fusions at the DNA and RNA level. However, the large volumes of sequencing data and the complexity of gene-fusion structures present unique computational challenges. This dissertation describes research that first addresses the bioinformatics challenges associated with the analysis of massive volumes of sequencing data by developing a bioinformatics pipeline and more applied, integrated computational workflows. Application of high-throughput sequencing and the proposed bioinformatics approaches to the breast and ovarian cancer study reveals unexpectedly complex structures of gene-fusions and their functional significance in the onset and progression of cancer. Integrative analysis of gene-fusions at the DNA and RNA level shows the key importance of the regulation of gene-fusions at the transcription level in cancer.
|
58 |
Genetic Alterations in Pheochromocytoma and Paraganglioma
Welander, Jenny January 2015
Pheochromocytomas and paragangliomas are neuroendocrine tumors that arise from neural crest-derived cells of the adrenal medulla and the extra-adrenal paraganglia. They cause hypertension due to an abnormally high production of catecholamines (mainly adrenaline and noradrenaline), with symptoms including recurrent episodes of headache, palpitations and sweating, and an increased risk of cardiovascular disease. Malignancy in the form of distant metastases occurs in 10-15% of the patients. The malignant cases are difficult to predict and cure, and have a poor prognosis. About a third of pheochromocytomas and paragangliomas are caused by hereditary mutations in a growing list of known susceptibility genes. However, the cause of the remaining, sporadic, tumors is still largely unknown. The aim of this thesis project has been to further characterize the genetic background of pheochromocytomas and paragangliomas, with a focus on the sporadic tumors. First, we investigated the role of the genes known from the familial tumors in the sporadic form of the disease. By studying mutations, copy number variations, DNA methylation and gene expression, we found that many of the known susceptibility genes harbor somatic alterations in sporadic pheochromocytomas. In particular, we found that the NF1 gene, which plays an important role in suppressing cell growth and proliferation by regulating the RAS/MAPK pathway, was inactivated by mutations in more than 20% of the cases. The mutations occurred together with deletions of the normal allele and were associated with a reduced NF1 gene expression and a specific hormone profile. We also detected activating mutations in the gene EPAS1, which encodes HIF-2α, in a subset of sporadic cases. Microarray analysis of gene expression showed that several genes involved in angiogenesis and cell metabolism were upregulated in EPAS1-mutated tumors, which is in agreement with the role of HIF-2α in the cellular response to hypoxia.
In order to comprehensively investigate all the known susceptibility genes in a larger patient cohort, we designed a targeted next-generation sequencing approach and could conclude that it was fast and cost-efficient for genetic testing of pheochromocytomas and paragangliomas. The results showed that about 40% of the sporadic cases had mutations in the tested genes. The majority of the mutations were somatic, but some apparently sporadic cases in fact carried germline mutations. Such knowledge of the genetic background can be of importance to facilitate early detection and correct treatment of pheochromocytomas, paragangliomas and potential co-occurring cancers, and also to identify relatives that might be at risk. By sequencing all the coding regions of the genome, the exome, we then identified recurrent activating mutations in a novel gene, in which mutations have previously only been reported in subgroups of brain tumors. The identified mutations are proposed to cause constitutive activation of the encoded receptor tyrosine kinase, resulting in the activation of downstream kinase signaling pathways that promote cell growth and proliferation. In summary, the studies increase our biological understanding of pheochromocytoma and paraganglioma, and possibly also co-occurring cancers in which the same genes and pathways are involved. Together with the findings of other scientific studies, our results may contribute to the development of future treatment options.
|
59 |
Next-generation bioinformatics analysis of bacterial genomes, with a focus on serovar host specificity and pathogenicity in Salmonella
Richardson, Emily Jane January 2013
Salmonella is one of the most important pathogens of mankind and animals alike, causing several billion pounds worth of damage worldwide each year. We have sequenced, annotated and published 4 genomes of Salmonella of well-defined virulence in farm animals. This provides valuable measures of intraserovar diversity and opportunities to formally link genotypes to phenotypes in target animals. Specifically, we have examined pathway detrition and mutagenesis and linked this to host specificity of the serovars. With the advent of next generation sequencing there has been a boom in genomic sequence submission, and an onslaught of -omics data has ensued. Integrating these different data types is complex and there is little available to visualise this data in the context of its genome. We present GeneBook, a web-based tool that synchronously integrates disparate datasets, displaying a fully annotated genome, enriched with publicly available data and the user's private experiments. It is accessed through a user-friendly interface that allows scientists to interrogate genomic features across multiple, heterogeneous, experiments.
|
60 |
Investigation of genetic variation contributing to antipsychotic treatment response in a South African first episode schizophrenia cohort
Drogemoller, Britt Ingrid 12 1900
Thesis (PhD)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: Schizophrenia is a debilitating disorder that occurs the world over. Although antipsychotics are largely effective in treating the positive symptoms of schizophrenia, the outcomes are non-optimal in many patients. As antipsychotic treatment response has been shown to be heritable, it is expected that the implementation of antipsychotic pharmacogenomics should aid in the optimization of antipsychotic treatments; however, to date clinically applicable results are limited. Therefore this study utilized exome sequencing in a cohort of well characterized first episode schizophrenia patients to identify the genetic factors contributing to antipsychotic treatment response.
The utility of exome sequencing for antipsychotic pharmacogenomic applications in the African context was assessed through examination of the literature and publicly available data. Thereafter, a cohort of 104 well characterized South African first episode schizophrenia patients who were treated with flupenthixol decanoate for twelve months was collected. From this cohort, subsets of patients on extreme ends of the treatment response spectrum were identified for exome sequencing. Thereafter a bioinformatics pipeline was used to call and annotate variants. These variants and those that have previously been associated with antipsychotic response, along with a panel of ancestry informative markers, were prioritized for genotyping in the entire cohort of patients. After genotyping of the 393 variants, statistical analyses were performed to identify associations with treatment response outcomes.
Examination of the literature revealed a need for exome sequencing in Africa. However, critical analyses of next generation sequencing data demonstrated that complex regions of the genome may not be well suited to these technologies. Thus, it may be necessary to combine exome sequencing with knowledge obtained from past research, as was done in this study to identify the genetic factors contributing to antipsychotic treatment response. Using this strategy, the current study highlighted the potential role that rare variants play in antipsychotic treatment response and additionally detected 11 variants that were significantly associated with antipsychotic treatment response outcomes (P = 2.19 × 10⁻⁵). Nine of these variants were predicted to alter the function of the genes in which they occurred, of which eight were novel with regard to antipsychotic treatment response. The remaining two variants have been associated with antipsychotic treatment outcomes in previous GWAS. Examination of the function of the genes in which the variants occurred revealed that the variants associated with (i) positive symptom improvement were involved in the folate metabolism pathway and (ii) negative and general pathological symptom improvement had potential links to neuronal development and migration.
To our knowledge this study is the first to utilize exome sequencing for antipsychotic pharmacogenomic purposes. The ability of this study to identify significant associations, even after correction for multiple testing, has highlighted the importance of combining genomic technologies with well characterized cohorts. The results generated from this study have served both to replicate results from previous antipsychotic pharmacogenetic studies and to identify novel genes and pathways involved in antipsychotic response. These results should aid in improving our understanding of the biological underpinnings of antipsychotic treatment response and may ultimately aid in the optimization of these treatments.
|