
In Silico Edgetic Profiling and Network Analysis of Human Genetic Variants, with an Application to Disease Module Detection

Cui, Hongzhu 18 May 2020
In the past several decades, Next Generation Sequencing (NGS) methods have produced large amounts of genomic data at an exponentially increasing rate. They have also enabled tremendous advancements in the quest to understand the molecular mechanisms underlying complex human traits. Along with the development of NGS technology, many genetic variation and genotype-phenotype databases and functional annotation tools have been developed to help scientists better understand the intricacy of the data. Together, these developments bring us one step closer to a mechanistic understanding of complex phenotypes. However, it has rarely been possible to translate such a massive amount of information on mutations and their associations with phenotypes into biological or therapeutic insights, and the mechanisms underlying genotype-phenotype relationships remain only partially explained. Meanwhile, increasing evidence shows that biological networks are essential, albeit not sufficient, for a better understanding of these mechanisms. Among them, protein-protein interaction (PPI) network studies have attracted perhaps the most attention. The overarching goal of this dissertation is to (i) perform a systematic study of the role of pathogenic human genetic variants in the interactome; (ii) examine how common population-specific single nucleotide variants (SNVs) affect the PPI network and how they contribute to population phenotypic variance and disease susceptibility; and (iii) develop a novel framework that incorporates the functional effect of mutations for disease module detection. In this dissertation, we first present a systematic multi-level characterization of human mutations associated with genetic disorders by determining their individual and combined interaction-rewiring effects on the human interactome. Our in-silico analysis highlights the intrinsic differences and important similarities between pathogenic SNVs and frameshift mutations.
Functional profiling of SNVs indicates widespread disruption of protein-protein interactions and synergistic effects of SNVs. The coverage of our approach is several times greater than that of a recently published experimental study and has minimal overlap with it, while the distributions of determined edgotypes between the two sets of profiled mutations are remarkably similar. Case studies reveal the central role of interaction-disrupting mutations in type 2 diabetes mellitus and suggest the importance of studying mutations that abnormally strengthen protein interactions in cancer. Second, aided by our SNP-IN tool, we performed a systematic edgetic profiling of population-specific non-synonymous SNVs (nsSNVs) and interrogated their role in the human interactome. Our results demonstrate that a considerable number of normal nsSNVs can have a disruptive impact on the interactome. We also show that genes enriched with disruptive mutations are associated with diverse functions and have implications in various diseases. Further analysis indicates that distinct gene edgetic profiles among major populations can help explain population phenotypic variance. Finally, network analysis reveals that phenotype-associated modules are enriched with disruptive mutations, and that differences in the accumulated damage in such modules may suggest population-specific disease susceptibility. Lastly, we propose and develop a computational framework, Discovering most IMpacted SUbnetworks in interactoMe (DIMSUM), which enables the integration of genome-wide association studies (GWAS) and functional effects of mutations into the protein-protein interaction (PPI) network to improve disease module detection. Specifically, our approach incorporates and propagates the functional impact of non-synonymous single nucleotide polymorphisms (nsSNPs) on PPIs to implicate the genes that are most likely influenced by the disruptive mutations, and to identify the module with the greatest functional impact.
Comparison against state-of-the-art seed-based module detection methods shows that our approach can yield modules that are biologically more relevant and have a stronger association with the studied disease. With the advancement of next-generation sequencing technology that drives precision medicine, there is an increasing demand for understanding the changes in molecular mechanisms caused by specific genetic variation. Current and future in-silico edgotyping tools present a cheap and fast solution for dealing with the rapidly growing datasets of discovered mutations. Our work shows the feasibility of a large-scale in-silico edgetic study and reveals insights into the orchestrated play of mutations inside a complex PPI network. We also expect our module detection method to become part of the common toolbox for disease module analysis, facilitating the discovery of new disease markers.
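
The abstract says that functional impact is propagated over the PPI network but does not give the algorithm. As a rough illustration of what network propagation can look like, the sketch below uses a generic random walk with restart, which is a swapped-in, commonly used technique and not DIMSUM itself. The toy graph, the gene labels, the seed score and the `restart` value are all hypothetical.

```python
def random_walk_with_restart(adjacency, seed_scores, restart=0.5, iterations=100):
    """Generic random walk with restart on an undirected graph.

    This is NOT the DIMSUM algorithm; it only illustrates score propagation.
    adjacency maps each node to the set of its neighbours (no isolated nodes).
    """
    nodes = sorted(adjacency)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total == 0:
        raise ValueError("at least one seed must have a positive score")
    # Normalise the seed scores into a starting probability distribution.
    p0 = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(iterations):
        nxt = {}
        for n in nodes:
            # Each neighbour m spreads its current score evenly over its edges.
            inflow = sum(p[m] / len(adjacency[m]) for m in adjacency[n])
            nxt[n] = (1 - restart) * inflow + restart * p0[n]
        p = nxt
    return p


# Hypothetical toy interactome: node names and the damage score are placeholders.
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
damage = {"A": 1.0}  # assumed per-gene impact score used as the seed
scores = random_walk_with_restart(ppi, damage)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Genes close to the seed end up with higher scores than distant ones, which is the intuition behind implicating neighbours of disrupted proteins; a module-detection step would then select a connected high-scoring subnetwork.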

Exploring the diversity of unmapped reads from human deep sequencing

Zarif Saffari, Amin January 2012
Currently, DNA and RNA sequencing are performed as standard parts of many scientific experiments. While the majority of the reads produced in these experiments do map to the genome of the organism of interest, a significant fraction do not. These reads have often been viewed as uninteresting and thus discarded, sometimes explained as errors created in the sequencing process. However, there is a real possibility that these reads contain genomic sequences that belong to the organism investigated but are not currently in its reference genome, as well as information about other organisms that live and thrive in the sample material. Considering this, it is of great interest to investigate these reads to see if they contain any usable information. In this project the unmapped reads from SOLiD sequencing of blood and saliva from a twin pair were assembled. The assembled contigs were then compared to different BLAST databases to investigate whether similar genomic regions are reported in other species. We can conclude that a large fraction of the contigs found in this assembly have homology to bacterial genes, while other contigs share similarity with genomic regions found in apes and other species closely related to us. All in all, the results show that there is more to the unmapped reads than just sequencing errors.
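
The abstract does not say which tools were used to separate unmapped reads. As a minimal sketch, assuming the alignments are available as SAM text, unmapped reads can be pulled out by testing bit 0x4 of the FLAG column, which the SAM format defines as "segment unmapped". The example records are made up for illustration.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for every unmapped record in SAM text."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 is FLAG
        if flag & UNMAPPED:
            yield fields[0], fields[9]  # column 1 is QNAME, column 10 is SEQ


# Hypothetical records: read1 is mapped (flag 0), read2 is unmapped (flag 4).
example = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII",
]
result = list(unmapped_reads(example))
print(result)
```

The extracted reads would then be passed to an assembler and the resulting contigs searched against sequence databases, as the abstract describes.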

Integrated approaches to elucidate the genetic architecture of congenital heart defects

Al Turki, Saeed January 2014
Congenital heart defects (CHD) are structural anomalies of the heart that arise during the early stages of embryonic development and are found in 1% of the population. Without surgical and medical interventions, most patients with severe CHD would not survive beyond the first year of life. Improved health care for CHD patients has increased CHD prevalence significantly, and it has been estimated that the population of adults with CHD is growing by ~5% per year. Understanding the causes of CHD would greatly improve our knowledge of the pathophysiology, help family counseling and planning, and possibly support prevention and treatment in the future. The aim of my thesis was to identify novel or known CHD genes enriched for rare coding genetic variants in isolated CHD cases and to learn about the relative performance of different study designs. High-throughput next generation sequencing (NGS) was used to sequence all coding genes (whole exome), coupled with various analytical pipelines and tools to identify candidate genes in different family-based study designs. Since there is no general consensus on the underlying genetic model of isolated CHD, I developed a suite of software tools to enable different family-based exome analyses of de novo and inherited variants (chapter 2). I then piloted these tools in several gene discovery projects where the mode of inheritance was already known, to identify previously described and novel pathogenic genes, before applying them to an analysis of families with two or more siblings with CHD. Based on the tools developed in chapter 2, I designed a two-stage study to investigate isolated parent-offspring trios with Tetralogy of Fallot (chapter 3). In the first stage, I used whole exome sequence data from 30 trios to identify genes with de novo coding variants. This analysis identified six de novo loss-of-function and 13 de novo missense variants.
Only one gene, NOTCH1, showed recurrent de novo mutations; it is a well known CHD gene that has mostly been associated with left ventricle outflow tract malformations (LVOT). Besides NOTCH1, the de novo analysis identified several possibly pathogenic novel genes, such as ZMYM2 and ARHGAP35, that harbor de novo loss-of-function variants (frameshift and stop gain, respectively). In the second stage of the study, I designed custom baits to capture 122 candidate genes for additional NGS sequencing in a larger sample of 250 parent-offspring trios with isolated Tetralogy of Fallot and identified six de novo variants in four genes, half of which are loss-of-function variants. Both NOTCH1 and its ligand JAG1 harbor two additional de novo mutations (two stop gains in NOTCH1, and one missense and one splice donor variant in JAG1). The analysis showed a strongly significant over-representation of de novo loss-of-function variants in NOTCH1 (P = 3.8 × 10⁻⁹). To assess an alternative family-based study design in CHD, I combined the analysis of 13 isolated parent-offspring trios with 112 unrelated index cases of isolated atrioventricular septal defects (AVSD) in chapter 4. I started with a case/control analysis to test the burden of rare missense variants in cases compared with 5,194 ethnically matched controls and identified the gene NR2F2 (Fisher exact test P = 7.7 × 10⁻⁷, odds ratio = 54). The de novo analysis in the AVSD trios identified two de novo missense variants in the same gene. NR2F2 encodes a pleiotropic developmental transcription factor, and decreased dosage of NR2F2 in mice has been shown to result in abnormal development of atrioventricular septa. The results from luciferase assays show that all coding sequence variants observed in patients significantly alter the activity of NR2F2 target promoters. My work has identified both known and novel CHD genes enriched for rare coding variants using next-generation sequencing data.
I was able to show how using single or combined family-based study designs is an effective approach to study the genetic causes of isolated CHD subtypes. Despite the extreme heterogeneity of CHD, combining NGS data with the proper study design has proved to be an effective approach to identify novel and known CHD genes. Future studies with considerably larger sample sizes are required to yield deeper insights into the genetic causes of isolated CHD.
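
The case/control burden analysis above reports a Fisher exact test and an odds ratio. The carrier counts behind those figures are not given in the abstract, so the sketch below only shows how such a test can be computed on a 2x2 table, using a small textbook-style table whose numbers are purely illustrative and are not taken from the thesis. The function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be cases/controls and columns carriers/non-carriers.
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); undefined when b or c is zero."""
    return (a * d) / (b * c)


# Illustrative table only, not data from the study.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
ratio = odds_ratio(3, 1, 1, 3)
print(p_value, ratio)
```

With realistic cohort sizes the same function applies unchanged, since Python integers do not overflow in the binomial coefficients.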

Next-generation sequencing methylation profiling of subjects with obesity identifies novel gene changes

Day, Samantha E., Coletta, Richard L., Kim, Joon Young, Campbell, Latoya E., Benjamin, Tonya R., Roust, Lori R., De Filippis, Elena A., Dinu, Valentin, Shaibi, Gabriel Q., Mandarino, Lawrence J., Coletta, Dawn K. 18 July 2016
Background: Obesity is a metabolic disease caused by environmental and genetic factors. However, the epigenetic mechanisms of obesity are incompletely understood. The aim of our study was to investigate the role of skeletal muscle DNA methylation in combination with transcriptomic changes in obesity. Results: Muscle biopsies were obtained basally from lean (n = 12; BMI = 23.4 ± 0.7 kg/m²) and obese (n = 10; BMI = 32.9 ± 0.7 kg/m²) participants in combination with euglycemic-hyperinsulinemic clamps to assess insulin sensitivity. We performed reduced representation bisulfite sequencing (RRBS) next-generation methylation and microarray analyses on DNA and RNA isolated from vastus lateralis muscle biopsies. There were 13,130 differentially methylated cytosines (DMC; uncorrected P < 0.05) that were altered in the promoter and untranslated (5' and 3' UTR) regions in the obese versus lean analysis. Microarray analysis revealed 99 probes that were significantly (corrected P < 0.05) altered. Of these, 12 genes (encompassing 22 methylation sites) demonstrated a negative relationship between gene expression and DNA methylation. Specifically, sorbin and SH3 domain containing 3 (SORBS3), which codes for the adapter protein vinexin, was significantly decreased in gene expression (fold change -1.9) and had nine DMCs that were significantly increased in methylation in obesity (methylation differences ranged from 5.0 to 24.4%). Moreover, differentially methylated region (DMR) analysis identified a region in the 5' UTR (Chr. 8: 22,423,530-22,423,569) of SORBS3 that was increased in methylation by 11.2% in the obese group. The negative relationship observed between DNA methylation and gene expression for SORBS3 was validated by a site-specific sequencing approach, pyrosequencing, and qRT-PCR.
Additionally, we performed transcription factor binding analysis and identified a number of transcription factors whose binding to the differentially methylated sites or region may contribute to obesity. Conclusions: These results demonstrate that obesity alters the epigenome through DNA methylation and highlight novel transcriptomic changes in SORBS3 in skeletal muscle.

Using next generation sequencing to investigate the generation of diversity in the genus Begonia

Emelianova, Katie January 2017
Begonia is one of the most diverse genera on the planet, with a species count approaching 2000 and a distribution across tropics in South America, Africa and South East Asia. The genus has occupied a vast range of niches; many highly variable growth forms can be found across the distribution, and species exhibit very diverse morphologies, even in closely related species. A recent study has revealed a putative whole genome duplication (WGD) event in the evolutionary history of Begonia, which has prompted an interest in investigating the impact gene and genome duplication has had on the diversification of Begonia. To answer questions about phenotypic and ecological diversification in Begonia, two species from South America, B. conchifolia and B. plebeja were chosen as study species based on their close phylogenetic relationship and divergent ecology and phenotype. RNA-seq data for six tissues from B. conchifolia and B. plebeja was generated using the Illumina sequencing platform, and normalised relative expression data was obtained by mapping reads to transcripts predicted from the B. conchifolia draft genome. A bioinformatics pipeline was devised to compare expression profiles across 6 different tissues between duplicated gene pairs shared between B. conchifolia and B. plebeja. Gene duplicate pairs were selected as candidates if they showed divergent expression in one species but not in another. Such duplicate pairs are suggestive of neofunctionalization in one species, providing evidence of a potential basis for phenotypic divergence and diversification between B. conchifolia and B. plebeja. Two duplicate pairs were identified as showing such divergent expression patterns as well as being functionally ecologically relevant, Chalcone Synthase and 3-Ketoacyl-CoA synthase, involved in anthocyanin biosynthesis and wax biosynthesis respectively. 
Investigation of expression and duplication patterns in both gene families showed the candidate gene families to be strikingly different. While 3-Ketoacyl-CoA synthase showed deeper duplications shared with outgroup taxa, Chalcone Synthase appeared to be expanded very recently, with a burst of duplications specific to the genus. 3-Ketoacyl-CoA synthase showed examples of partitioned expression by tissue for different gene family members, with at least five members of the gene family being highly expressed in one or two tissues only. Chalcone Synthase, however, showed dominance of one basal gene family member. Other Chalcone Synthase members, though expressed at lower levels, showed some evidence of reciprocal silencing in B. plebeja, though this pattern was not observed in B. conchifolia. Further investigation of the Chalcone Synthase gene family revealed lineage specific duplication in B. plebeja, and more extensive differential duplication patterns were found across other South American Begonias. Additionally, signals of positive selection were found in two branches on the Chalcone Synthase phylogeny.

Next-generation bioinformatics analysis of bacterial genomes, with a focus on serovar host specificity and pathogenicity in Salmonella

Richardson, Emily Jane January 2013
Salmonella is one of the most important pathogens of mankind and animals alike, causing several billion pounds worth of damage worldwide each year. We have sequenced, annotated and published 4 genomes of Salmonella of well-defined virulence in farm animals. This provides valuable measures of intraserovar diversity and opportunities to formally link genotypes to phenotypes in target animals. Specifically, we have examined pathway detrition and mutagenesis and linked this to host specificity of the serovars. With the advent of next generation sequencing there has been a boom in genomic sequence submission, and an onslaught of -omics data has ensued. Integrating these different data types is complex and there is little available to visualise this data in the context of its genome. We present GeneBook, a web-based tool that synchronously integrates disparate datasets, displaying a fully annotated genome, enriched with publicly available data and the user's private experiments. It is accessed through a user-friendly interface that allows scientists to interrogate genomic features across multiple, heterogeneous, experiments.

Use of Bioinformatics in the Search for New Genes in Osteogenesis Imperfecta

COUTINHO, A. S. 26 February 2018
Osteogenesis imperfecta (OI) is a rare genetic disease of connective tissue, caused by mutations in genes that generally take part in bone formation. Most patients carry mutations in the genes encoding type 1 collagen, but mutations in more than 17 other genes have been described as causing OI, and the scientific community continues to search for new genes. Among molecular diagnostic strategies, next-generation sequencing (NGS) stands out: it can sequence several genes present on a custom platform, generating a large amount of genomic data. These data become valuable sources of information in the search for new disease-related genes. The aim of this research was to search for new genes potentially causing OI by means of bioinformatics resources. Filtering strategies in Microsoft Office Excel 2013 were used, along with mutation prediction analyses. The Ensembl and National Center for Biotechnology Information databases were used as genomic references. Four patients clinically diagnosed with OI were selected; they had undergone NGS and showed normal results for the known genes. In order to select a list of candidate genes on the custom NGS platform that were related to OI symptoms, the Ensembl database was searched for genes involved in the metabolic pathways of bone, cartilage or collagen formation, which identified 643 genes. The list of candidate genes was compared with the genes sequenced in the patients, and 70 genes in common were selected for analysis.
In silico filtering was carried out to select alterations that are rare in the population, predicted to be pathogenic, and that effectively encode a protein or a functional RNA molecule. The results showed that patient P.1 carries a potentially pathogenic heterozygous mutation in the ALX1 gene. Patient P.2 showed only one alteration, in the COL6A3 gene, which was predicted to be a polymorphism. Patient P.3 showed pathogenic heterozygous mutations in the ALPL and FKBP10 genes. In patient P.4, pathogenic heterozygous mutations were found in the P3H1 and RYR1 genes. Among the five genes identified, two of them, FKBP10 and P3H1, are known to be associated with autosomal recessive OI. Mutations in the ALPL gene have also been described as causing clinical symptoms similar to OI, which can confound the diagnosis. Thus, the present study identified two genes, ALX1 and RYR1, as potential causes of OI. The ALX1 gene plays an important role in the development of the skull and limbs, as it acts in cartilage formation. RYR1, in turn, encodes the ryanodine receptor, an important calcium receptor in osteoblasts. Functional studies of the identified genes are needed to validate this hypothesis in future research. The results of this work suggest that bioinformatics tools can guide the search for new genes related to genetic diseases. The characterization of new mutations in OI-related genes helps in planning more efficient strategies for the molecular diagnosis of the disease and for genetic counseling.

Unearthing the genome of the earthworm Lumbricus rubellus

Elsworth, Benjamin Lloyd January 2013
The earthworm has long been of interest to biologists, most notably Charles Darwin, who was the first to reveal their true role as eco-engineers of the soil. However, to fully understand an animal one needs to combine observational data with the fundamental building blocks of life, DNA. For many years, sequencing a genome was an incredibly costly and time-consuming process. Recent advances in sequencing technology have led to high quality, high throughput data being available at low cost. Although this provides large amounts of sequence data, the bioinformatics knowledge required to assemble and annotate these new data is still in its infancy. This bottleneck is slowly opening up, and with it come the first glimpses into the new and exciting biology of many new species. This thesis provides the first high quality draft genome assembly and annotation of an earthworm, Lumbricus rubellus. The assembly process and resulting data highlight the complexity of assembling a eukaryotic genome using short read data. To improve assembly, a novel approach was created utilising transcripts to scaffold the genome (https://github.com/elswob/SCUBAT). The annotation of the assembly provides the draft of the complete proteome, which is also supported by the first RNA-Seq generated transcriptome. These annotations have enabled detailed analysis of the protein coding genes, including comparative analysis with two other annelids (a leech and a polychaete worm) and a symbiont (Verminephrobacter). This analysis identified four key areas which appear to be either highly enhanced or unique to L. rubellus. Three of these may be related to the unique environment from which the sequenced worms originated and add to the mounting evidence for the use of earthworms as bioindicators of soil quality. All data are stored in relational databases and available to search and browse via a website at www.earthworms.org.
It is hoped that this genome will provide a springboard for many future investigations into the earthworm and continue research into this wonderful animal.

Discovering rare variants from populations to families

Indap, Amit R. January 2013
Thesis advisor: Gabor T. Marth / Partitioning an individual's phenotype into genetic and environmental components has been a major goal of genetics since the early 20th century. Formally, the proportion of phenotypic variance attributable to genetic variation in the population is known as heritability. Genome wide association studies have explained a modest percentage of variability of complex traits by genotyping common variants. Currently, there is great interest in what role rare variants play in explaining the missing heritability of complex traits. Advances of next generation sequencing and genomic enrichment technologies over the past several years have made it feasible to re-sequence large numbers of individuals, enabling the discovery of the full spectrum of genetic variation segregating in the human population, including rare variants. The four projects that comprise my dissertation all revolve around the discovery of rare variants from next generation sequencing datasets. In my first project, I analyzed data from the exon sequencing pilot of the 1000 Genomes Project, where I discovered variants from exome capture sequencing experiments in a worldwide sample of nearly 700 individuals. My results show that the allele frequency spectrum of the dataset has an excess of rare variants. My next project demonstrated the applicability of using whole-genome amplified DNA (WGA) in capture sequencing. WGA is a method that amplifies DNA from nanogram starting amounts of template. In two separate capture experiments I compared the concordance of call sets, both at the site and genotype level, of variant calls derived from WGA and genomic DNA. WGA derived calls have excellent concordance metrics, both at the site and genotypic level, suggesting that WGA DNA can be used in lieu of genomic DNA. 
The results of this study have ramifications for medical sequencing experiments, where DNA stocks are a finite quantity and re-collecting samples may be too expensive or not possible. My third project kept its focus on capture sequencing, but in a different context. Here, I analyzed sequencing data from a Mendelian exome study of non-sensorineural hearing loss (NSHL). A subset of 6 individuals (5 affected, 1 unaffected) from a family of European descent were whole exome sequenced in an attempt to uncover the causative mutation responsible for the hearing loss phenotype in the family. Previous linkage analysis uncovered a linkage region on chr12, but no mutations in previous candidate genes were found, suggesting that a novel mutation segregates in the family. Using a discrete filtering approach with a minor allele frequency cutoff, I uncovered a putative causative non-synonymous mutation in a gene that encodes a transmembrane protein. The variant perfectly segregates with the phenotype in the family and is enriched in frequency in an unrelated cohort of individuals. Finally, for my last project I implemented a variant calling method for family sequencing datasets, named Pgmsnp, which incorporates Mendelian relationships of family members using a Bayesian network inference algorithm. My method has detection sensitivity similar to that of other pedigree-aware callers, and increases power of detection for non-founder individuals. / Thesis (PhD), Boston College, 2013. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Biology.
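
The discrete filtering step in the hearing-loss project combines a minor allele frequency cutoff with segregation in the family. The abstract gives neither the cutoff nor the implementation, so the following is a minimal sketch under stated assumptions: a cutoff of 0.01, a dominant model in which every affected member and no unaffected member carries the variant, and made-up variant records and sample labels.

```python
def segregating_rare_variants(variants, affected, unaffected, maf_cutoff=0.01):
    """Return IDs of rare variants carried by all affected and no unaffected members.

    variants: list of dicts with keys "id", "maf" and "carriers" (a set of samples).
    The 0.01 cutoff and the dominant segregation model are assumptions.
    """
    hits = []
    for variant in variants:
        if variant["maf"] > maf_cutoff:
            continue  # too common in the population to explain a rare disorder
        carriers = variant["carriers"]
        carried_by_all_affected = affected <= carriers
        absent_in_unaffected = not (unaffected & carriers)
        if carried_by_all_affected and absent_in_unaffected:
            hits.append(variant["id"])
    return hits


# Hypothetical family of 5 affected and 1 unaffected sequenced members.
affected = {"A1", "A2", "A3", "A4", "A5"}
unaffected = {"U1"}
variants = [
    {"id": "var1", "maf": 0.20, "carriers": affected},            # common: dropped
    {"id": "var2", "maf": 0.001, "carriers": affected | {"U1"}},  # in unaffected: dropped
    {"id": "var3", "maf": 0.001, "carriers": {"A1", "A2"}},       # incomplete: dropped
    {"id": "var4", "maf": 0.0, "carriers": affected},             # passes both filters
]
candidates = segregating_rare_variants(variants, affected, unaffected)
print(candidates)
```

In practice the surviving variants would also be restricted to a linkage region and to functional classes such as non-synonymous changes, as described above.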

Sequencing, assembly and genome annotation of a new isolate of Leptospira borgpetersenii

Eslabão, Marcus Redu 27 February 2012
Leptospirosis is a neglected zoonosis with global distribution. The disease is caused by pathogenic bacteria of the genus Leptospira, which affect humans and various domestic and wild animals, causing serious problems for human health and losses in livestock production. The objective of this study was to determine the genome sequence of Leptospira borgpetersenii serogroup Ballum strain 4E, isolated from the house mouse (Mus musculus), one of the main reservoirs of this genus. The complete genome sequence was determined using the SOLiD™ system, which generated over 85 million 50 bp reads. These reads were used to obtain scaffolds of the two chromosomes present in this organism through ab initio sequence assembly with the Velvet and Edena software and orientation of contigs with the G4All software. On completion of the assembly process, the large chromosome was 3,071,053 bp, with a GC content of 40.58%, 36 tRNA, 4 rRNA and 2,908 open reading frames (ORFs). The small chromosome has 305,940 bp, a GC content of 40.25% and 277 ORFs, with no tRNA or rRNA predicted. A reduction in the large chromosome of strain 4E was observed compared to the large chromosome of strain L550: 99 genes of strain L550 are not present in strain 4E, and about 394 kb of non-coding region was also lost. The main hypothesis for this reduction is the effect of the presence of a large number of mobile genetic elements. Genome reduction has been observed in other strains of L. borgpetersenii. The Applied Biosystems SOLiD 4 method allowed determination of the genome sequence of L. borgpetersenii strain 4E, with wide coverage and accuracy. The ab initio assembly methods used allowed the generated sequences to be used to the fullest.
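
The chromosome statistics above include GC content. As a small sketch of how that figure is typically computed from an assembled sequence, the function below counts G and C as a percentage of unambiguous bases; ignoring ambiguous symbols such as N is an assumption, since the abstract does not say how they were handled, and the example sequences are made up.

```python
def gc_content(sequence):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored; that choice is an assumption.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


# Toy sequences, not taken from the Leptospira assembly.
print(round(gc_content("ATGCGC"), 2))
print(gc_content("atgcNN"))
```

Applied to each scaffold of an assembly, this gives per-chromosome values comparable to the percentages reported in the abstract.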
