31 |
Mutational signatures reveal the dynamic interplay of risk factors and cellular process during liver tumorigenesis / Identification des mécanismes mutagènes liés aux facteurs de risque et aux processus cellulaires dans les cancers du foie
Shinde, Jayendra 30 November 2017
Cancer is a disease of the genome. A normal cell goes rogue and is transformed into a cancerous cell through somatic mutations acquired in its genome. The catalogue of somatic mutations observed in a cancer genome is the outcome of multiple mutational processes that have been operative over the lifetime of the patient. These processes may include infidelity of the DNA replication machinery, impaired DNA repair, enzymatic modification of DNA, or exposure to exogenous or endogenous mutagens. Each mutational process leaves a characteristic pattern, a “mutational signature”, on the cancer genome. Various genomic features related to genome architecture, including DNA replication and transcription, modulate these mutational processes.
During my PhD, I analyzed whole-exome and whole-genome sequencing data from liver tumors to understand the mutational processes remodeling these tumor genomes, their interaction with risk factors, cellular processes, and driver genes, and their evolution along the tumor histories. To that end, I used existing statistical methods and developed innovative computational tools to:
- extract mutational and structural variant signatures from next-generation sequencing data
- identify risk factors or genetic alterations underlying each process
- predict the mutational process at the origin of each somatic mutation
- explore correlations between mutation rates and cellular processes like replication and transcription
- reconstruct the clonal history of a tumor and the timing of mutational processes and copy-number changes

These innovative analytical strategies allowed me to identify 10 mutational signatures: 5 ubiquitous signatures operative in every liver cancer but modulated by risk factors (gender, alcohol, tobacco), and 5 sporadic signatures operative in <5% of hepatocellular carcinomas (HCC) and associated with specific known (aflatoxin B1, aristolochic acid) or unknown mutational processes. I also identified 6 structural variant signatures, including striking duplicator or deletor phenotypes in rare tumors. Each mutational process showed a different relationship with replication and transcription. Signatures of bulky DNA adducts (polycyclic aromatic hydrocarbons, aflatoxin B1, aristolochic acid) strongly decreased in highly expressed genes due to transcription-coupled repair, whereas the alcohol-related signature 16 displayed a unique feature of transcription-coupled damage. A striking positive correlation between indel rate and gene expression was observed, leading to recurrent mutations in very highly expressed tissue-specific genes.
Finally, reconstructing the clonal history of HCC revealed the evolution of mutational processes along tumor development and identified synchronous chromosome duplications as late events probably leading to fast tumor growth and clinical detection of the tumor. Together, these findings shed new light on the mechanisms generating DNA alterations along the natural history of liver cancers.
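As a rough illustration of the starting point for extracting mutational signatures, single-base substitutions are commonly counted in 96 channels: six substitution classes, written from the pyrimidine strand, times the 16 possible pairs of flanking bases. The sketch below is not the tooling developed in the thesis; the function names and the input format (a reference trinucleotide plus the alternate base) are assumptions made for this example.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_channel(context: str, alt: str) -> str:
    """Map a substitution to one of the 96 trinucleotide channels.

    context: three reference bases centred on the mutated base (hypothetical input format).
    alt: the mutated base.
    """
    if context[1] in "AG":
        # Purine reference base: describe the change on the opposite strand instead.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def mutation_catalogue(mutations):
    """Count mutations per channel; `mutations` is an iterable of (context, alt) pairs."""
    return Counter(mutation_channel(context, alt) for context, alt in mutations)


# Toy input: the first two mutations are the same event seen from opposite strands.
catalogue = mutation_catalogue([("ACG", "T"), ("CGT", "A"), ("TTA", "C")])
```

Per-tumor catalogues of this kind are what signature-extraction methods typically decompose into a small number of recurring patterns.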
|
32 |
Influence of climate change on Organism Abundance in the Kiruna Region, Northern Sweden: Insights from long-term high-quality DNA sequencing / Effekter av klimatförändring på organismabundans i Kirunaregionen, norra Sverige: insikter från långa tidsserier av högkvalitets-DNA-sekvensering
Sandström, Anton January 2023
This study investigates whether the area around Kiruna, Northern Sweden, has experienced large shifts in weather conditions and whether these have affected organism abundances. With a significant increase in global temperature and an increase in average temperature of 1.9 °C in Sweden during the last 130 years, it is crucial to understand the effect of climate change on organisms. The Swedish Defense Research Agency deployed an air filter station to monitor radioactive fallout. The archived filters allowed for the creation of a high-resolution time series of organism composition ranging from 1974 to 2008, based on DNA sequencing. The organisms were grouped into 17 distinct clusters based on similarities in their time series patterns. This study found that Cluster 2 (plant pathogenic bacteria), Cluster 3 (wetland microorganisms) and Cluster 5 (planktonic bacteria) exhibited changepoint correlations with relevant climate variables. Plotting the 3 clusters against their relevant climate variables revealed that sea surface temperatures have a positive influence on the abundance of both Cluster 2 and Cluster 5. Frost change days negatively influenced Cluster 2. Dry spells positively influenced Cluster 3 and Cluster 5. Additionally, the results suggest that air pressure and water deficiency in soil are predictors for Cluster 5. Overall, these findings provide insights into how climate change affects different organisms and can help inform future management decisions for these ecosystems.
|
33 |
Exploration Of The Genomes Of Two Diverse Conifers
Thummasuwan, Supaphan 13 December 2008
My research is focused on advancing understanding of the genomes of two important, distantly related conifer species, loblolly pine (Pinus taeda L.) and bald cypress (Taxodium distichum (L.) Rich. var. distichum). Loblolly pine is the most commercially important tree crop in the United States, the major source of pulpwood for paper manufacturing, a source of quality lumber, a prime bioenergy feedstock, and an important part of the ecosystem of the southeastern U.S. Bald cypress is the dominant tree species in the aptly named “cypress swamps” of the South. Its ecological importance to the wetlands of the southern U.S. is immeasurable. Moreover, bald cypress is a popular ornamental due to its attractive appearance and extreme resistance to pests, pathogens, and weather. Maintaining the security and productiveness of these important crop/forest species in the face of new pest, pathogen and environmental threats will require a better understanding of their genes and the structures of their genomes. We have conducted a study of loblolly pine and bald cypress in which Cot analysis and DNA sequencing of Cot-filtered DNA were utilized to study genome structure. Cot analysis revealed that the loblolly pine and bald cypress genomes are each composed of three major kinetic components, which we have deemed highly repetitive (HR), moderately repetitive (MR), and single/low copy (SL). In loblolly pine, the HR, MR, and SL components account for 57%, 24%, and 10% of genomic DNA, respectively. Of note, 2.71% of random genomic sequences (i.e., 580 Mb, an amount roughly three times that of the Arabidopsis genome) show significant (bit score ≥ 60) homology to mRNA sequences. This result suggests that the loblolly pine genome contains many genes or pseudogenes, and/or gene duplications. In bald cypress, the HR, MR, and SL components account for 52%, 38%, and 4% of genomic DNA, respectively.
Sample sequencing was performed only on the HR component of bald cypress; sequence analysis shows only 0.81% of HR sequence reads with homology to mRNA sequences. My research provides insight into the evolution of these distant conifers and key sequence data that should greatly facilitate ongoing molecular breeding programs.
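The loblolly pine figures in this abstract can be cross-checked with simple arithmetic: if 2.71% of the genome corresponds to 580 Mb, the implied genome size is about 21.4 Gb, and each kinetic component can be converted from a percentage to an approximate amount of DNA. The sketch below only rearranges numbers quoted in the abstract; the function and variable names are made up for the example, and the implied genome size is a back-calculation, not a value stated by the author.

```python
def implied_genome_size_mb(fraction_percent: float, amount_mb: float) -> float:
    """Genome size implied by 'fraction_percent of the genome equals amount_mb'."""
    return amount_mb / (fraction_percent / 100.0)


# Figures quoted in the abstract for loblolly pine.
genome_mb = implied_genome_size_mb(2.71, 580.0)

# Kinetic components from Cot analysis, as percentages of genomic DNA.
component_percent = {"HR": 57, "MR": 24, "SL": 10}
component_mb = {
    name: genome_mb * percent / 100.0 for name, percent in component_percent.items()
}

# Share of the genome not assigned to any of the three components.
unassigned_percent = 100 - sum(component_percent.values())

for name, size in component_mb.items():
    print(f"{name}: about {size / 1000:.1f} Gb")
print(f"Implied genome size: about {genome_mb / 1000:.1f} Gb")
```

The three components sum to 91%, so roughly 9% of the genomic DNA is not assigned to a component in the figures as quoted.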
|
34 |
Gaussian Deconvolution and MapReduce Approach for Chipseq Analysis
Sugandharaju, Ravi Kumar Chatnahalli 26 September 2011
No description available.
|
35 |
The Influence of DNA Sequence and Post Translational Modifications on Nucleosome Positioning and Stability
Mooney, Alex M. 20 December 2012
No description available.
|
36 |
Strategies for de novo DNA sequencing
Blomstergren, Anna January 2003
The development of improved sequencing technologies has enabled the field of genomics to evolve. Handling and sequencing of large numbers of samples require an increased level of automation in order to obtain high throughput and consistent quality. Improved performance has led to the sequencing of numerous microbial genomes and a few genomes from higher eukaryotes, and the benefits of comparing sequences both within and between species are now becoming apparent. This thesis describes both the development of automated purification methods for DNA, mainly sequencing products, and a comparative sequencing project. The initially developed purification technique is dedicated to single-stranded DNA containing vector-specific sequences, exemplified by sequencing products. Specific capture probes coupled to paramagnetic beads, together with stabilizing modular probes, hybridize to the single-stranded target. After washing, the purified DNA can be released using water. When sequencing products are purified, they can be directly loaded onto a capillary sequencer after elution. Since this approach is specific, it can be applied to multiplex sequencing products. Different probe sets are used for each sequencing product and the purifications are performed iteratively. The second purification approach, which can be applied to a number of different targets, involves biotinylated PCR products or sequencing products that are captured using streptavidin beads. This has been described previously, but here the interaction between streptavidin and biotin can be disrupted without denaturing the streptavidin, enabling the re-use of the beads. The relatively mild elution conditions also enable the release of sensitive biotinylated molecules. Another project described in this thesis is the comparative sequencing of the 40 kb cag pathogenicity island (PAI) in four Helicobacter pylori strains. The results included the discovery of a novel gene, present in approximately half of the Swedish strains tested. In addition, one of the strains contained a major rearrangement dividing the cag PAI into two parts. Further, information about the variability of different genes could be obtained.
Keywords: DNA sequencing, DNA purification, automation, solid-phase, streptavidin, biotin, modular probes, Helicobacter pylori, cag PAI.
|
37 |
Biogeographic Relationships of Pocket Gophers (Geomys breviceps and Geomys bursarius) in the Southeastern Portion of Their Ranges
Elrod, Douglas Allen 08 1900
This research utilized population genetic analyses (protein starch-gel electrophoresis and DNA sequencing of the cytochrome b mtDNA gene), host-parasite specificity (lice coevolution), remote sensing of satellite data, and geographic information systems (GIS) to characterize newly discovered populations of pocket gophers (genus: Geomys) in Arkansas. These populations are isolated and occur in seemingly unsuitable habitat in the Ozark Mountains of Arkansas. Analyses of electrophoretic and ectoparasite data suggested that the populations in the Ozark Mountains represented isolates allied to Geomys bursarius, a species not known to occur in Arkansas. Comparison of mitochondrial DNA sequence data of the cytochrome b gene with that of other taxa, together with morphometric analyses, confirmed that these populations are most closely allied to G. bursarius occurring to the north in Missouri. Moreover, these mtDNA sequence analyses indicated a degree of differentiation typical of that between other subspecies of pocket gophers. Therefore, these populations represent a distinct genetic entity in an intermediate stage of speciation and should be designated as a new subspecies, Geomys bursarius ozarkensis. Molecular clock analysis placed the time of lineage divergence for this new subspecies at approximately 511,000 YBP. Due to the isolated nature and limited distribution of this subspecies, an evaluation of critical habitat needs was initiated. Remote sensing and GIS technologies were used to identify and describe suitable habitat. Computerized classification of satellite imagery of suitable vegetation, integrated with ancillary digital information on soil associations, roads, and water systems, revealed that human activity had played a positive role in the establishment and dispersal of pocket gophers in this area. This research represents an initial combination of classical systematic tools with remote sensing and GIS to investigate biogeographic patterns and evolution.
This project establishes a framework for using an interdisciplinary approach to studying organisms with limited distributions, determining evolutionary status, and providing recommendations for conservation.
|
38 |
Análise das alterações genéticas em exomas de camundongos / Analysis of genetic alterations in mice exomes
Souza, Tiago Antonio de 27 March 2018
Mice are valuable models for understanding molecular processes and the underlying physiological mechanisms in mammals. Most of our knowledge about those processes comes from experiments with isogenic mice. These strains, which arose in the early 1900s through successive inbreeding, are important because they reduce genetic variability across experiments, increasing reproducibility. Isogenic lineages are kept as isolated colonies in animal facilities and supplied to researchers as needed. Characterizing their genetic backgrounds makes it possible to trace relationships among strains all over the world, and to detect putative contaminations and spontaneous mutations that can arise in the populations. Mutant mice are also important tools as human disease models, allowing associations between genetic factors and phenotypes. Such mutants can be generated in forward genetics approaches by screening with mutagens such as ENU. The aims of this work were to characterize the genetic background of two mouse strains, C57BL/6ICBI and BALB/cICBI, which have been maintained for almost 20 years in Brazil and are distributed by ICB-USP to researchers throughout the country, and to find the causative mutations in seven mutants generated by a previous ENU mutagenesis screen.
We used whole-exome sequencing followed by resequencing data-analysis approaches to detect SNVs in both the isogenic strains and the mutants. Exome evaluation of the isogenic strains C57BL/6ICBI and BALB/cICBI did not reveal any evidence of cross-contamination and provided insightful details about their relationship to other strains and substrains. A specific filtering strategy was applied to select candidate phenotype-causative mutations in the seven ENU-induced mutants. We were able to select candidates for all mutants, with a high overall Sanger validation rate when considering only the main candidates for each mutant. Considering the affected genes and phenotypes, all mutants have the potential to become interesting mouse models for human diseases. Taken together, our results are a reliable source of genetic information for experimental analysis, both for researchers who use the isogenic strains provided by the animal facility at ICB-USP and for research groups interested in further characterizing the mutants to study neuromuscular, neuronal or developmental processes using mice as animal models.
|
39 |
Nouvelle technique de détection simultanée des variant ponctuels et des copy number variants dans l’obésité monogénique / New method for the simultaneous detection of punctual variations and copy number variants in monogenic obesity
Derhourhi, Mehdi 19 December 2018
Genetics, and by extension DNA sequencing, are tools that have transformed our understanding of the mechanisms involved in many diseases, including obesity. Today's technologies allow us to rapidly determine whether a patient carries a genetic event that may explain his or her pathology. One of the most widely used technologies in diagnostics is whole-exome sequencing (WES), which enables excellent detection of point mutations in the coding regions of the genome.
However, other events, such as copy number variants (CNVs), can also explain some pathologies, such as a severe form of obesity due to CNVs in the chr16p11.2 region. Currently, the gold-standard method for accurate detection of CNVs is array CGH (comparative genomic hybridization), but this technology cannot detect new point mutations. Exome sequencing can be used to detect CNVs, but the lack of coverage in non-coding regions limits its sensitivity. Of note, whole-genome sequencing can detect both CNVs and point mutations, but it is still very expensive and requires substantial computing resources, which is an obvious limitation for routine diagnostic use. For now, two different methods are needed to accurately detect both CNVs and point mutations. In other words, precious samples must be used twice, the cost of two methods must be borne (nearly 450 euros in the laboratory for exome sequencing, and a bit more for array CGH in a clinical laboratory), and the time needed to run both methods delays a complete diagnosis. In this context, we aimed to develop an innovative sequencing method, named CoDE-seq (Copy number variation Detection and Exome sequencing), which would allow simultaneous detection of both CNVs and point mutations, in order to reduce the time to diagnosis, the cost, and the quantity of sample needed. This work included the design of the method and the data analysis steps. The method design involved the creation of a new capture enabling the detection of point mutations in the exome and of CNVs across the whole genome. The data analysis step included the choice of the bioinformatics methods to be used in order to obtain specific and sensitive CNV detection across the whole genome. We were also interested in the functional significance of the identified CNVs, and tried to decipher it by studying the spatial conformation of chromatin and the influence of these CNVs on it.
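For readers unfamiliar with how CNVs can be read out of sequencing data, a common generic idea is to compare read depth per genomic bin between a sample and a reference and to look at the log2 ratio: a heterozygous deletion is expected near log2(1/2) = -1 and a single-copy duplication near log2(3/2), about 0.58. The sketch below illustrates that idea only; it is not the CoDE-seq analysis pipeline, and the function names, the median normalisation, and the two thresholds are assumptions chosen for the example.

```python
import math
from statistics import median

# Hypothetical cut-offs, chosen between 0 and the expected single-copy values.
GAIN_THRESHOLD = 0.4
LOSS_THRESHOLD = -0.6


def log2_ratios(sample_depths, reference_depths):
    """Median-normalised log2 depth ratio per bin (None where a depth is not positive)."""
    sample_median = median(sample_depths)
    reference_median = median(reference_depths)
    ratios = []
    for sample, reference in zip(sample_depths, reference_depths):
        if sample <= 0 or reference <= 0:
            ratios.append(None)  # the ratio is undefined for empty bins
            continue
        normalised = (sample / sample_median) / (reference / reference_median)
        ratios.append(math.log2(normalised))
    return ratios


def call_bins(ratios):
    """Label each bin as 'gain', 'loss', 'neutral' or 'no-call'."""
    calls = []
    for ratio in ratios:
        if ratio is None:
            calls.append("no-call")
        elif ratio >= GAIN_THRESHOLD:
            calls.append("gain")
        elif ratio <= LOSS_THRESHOLD:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy depths: bin 3 looks like a one-copy loss, bin 5 like a one-copy gain.
sample = [100, 100, 50, 100, 150, 100]
reference = [100, 100, 100, 100, 100, 100]
ratios = log2_ratios(sample, reference)
calls = call_bins(ratios)
```

A real caller would also merge adjacent bins into segments and correct for capture and GC biases, which this sketch leaves out.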
|
40 |
Square: uma plataforma gráfica e intuitiva para anotação de genomas bacterianos / Square: a graphical and intuitive platform for annotation of bacterial genomes
Eslabão, Marcus Redü 29 February 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / DNA sequencing is a technique that provides a vast source of information on various organisms. Currently, new sequencing methods known as Next-Generation Sequencing are making this technique many times faster, more accurate and more affordable, making it popular and widespread in the scientific community. With the popularization of genome sequencing, laboratories that do not have an emphasis on DNA sequencing are using this approach to complement their studies. However, the ease of obtaining a DNA sequence contrasts with the difficulty of processing, analyzing and annotating the genome in order to obtain relevant biological information. To assist researchers who use this technique, several programs are available; however, they are generally not free, do not perform all the necessary analyses, or are difficult to use, mainly because most of them are run from the command line, with no graphical environment to guide users. The objective of this study was to create genome annotation software that is easy to use, has a user-friendly graphical interface, is free, and provides all the information needed to submit the annotated genome to GenBank. The software, named Square, was implemented in the Python and Object Pascal programming languages. The Prodigal, NCBI BLAST and tRNAscan-SE algorithms were also integrated into the software. At the end of the development stage, Square was tested with three genomes and compared with two popular annotators: RAST and BASys. The results showed that Square is more accurate than the other two annotators, as its output is closer to what is deposited in NCBI, and faster, as it runs locally.
Square proved to be a good alternative for users unfamiliar with the Linux command line and is available at http://sourceforge.net/projects/sqgenome/.
|