Developing a reproducible bioinformatics workflow for canine inherited retinal disease

Inherited retinal degenerations (IRDs) are a heterogeneous group of diseases that lead to vision impairment and occur in both humans and dogs. About 1 in 1,380 humans is estimated to be affected by an autosomal recessive IRD, corresponding to about 5.5 million people worldwide, and many more are estimated to be unaffected carriers. This makes autosomal recessive IRDs likely the most common group of Mendelian diseases in humans. To date, about 300 genetic mutations have been linked to retinal diseases in humans. In dogs, only 32 genes have been identified, while numerous eye conditions have been described for which the genetic cause is not yet known. This suggests that many more genetic causes remain to be discovered in the dog genome. Additionally, the dog serves well as a model organism for investigating IRDs because it shares morphological and genetic similarities with humans. For these reasons, suitable software, a high-quality canine reference genome, and careful implementation of bioinformatic tools and methods increase the chances of finding new causative genetic variants, and subsequently enable faster identification of ways to prevent the disease or at least alleviate its symptoms through early diagnosis. In this project, a pre-existing pipeline consisting of Bash scripts was improved stepwise with the goal of increasing its efficiency. In a first step, it was verified that previous results could still be reproduced with the old pipeline. In a second step, the software was updated to more recent versions. The main changes were the replacement of the Burrows-Wheeler Aligner (BWA) mapping command bwa mem with bwa-mem2 mem, and the update of the deprecated Genome Analysis Toolkit (GATK) 3.7 to version 4.3 or 4.4. In a third step, the scripts were adapted from the older canine reference genome CanFam3.1 to CanFam4.
In a fourth step, to automate the pipeline and shorten its running time, the pipeline steps were implemented in the workflow management system Nextflow. This step was also partly aimed at bringing the pipeline into accordance with the FAIR principles. All steps were tested on the same test data set, a Labrador retriever family trio in which a previous study had detected one genetic cause of a canine form of the IRD Stargardt disease, namely an insertion in the ABCA4 gene. Lastly, the workflow was tested on a second data set from two sibling pairs of Chinese Crested Dogs (CCR) with a novel IRD of unknown genetic origin. Changing the mapping tool gave similar results. Introducing the new reference genome CanFam4 lowered the average coverage by one read, while other results were similar. Using the new reference genome increased the number of unknown variants compared with CanFam3.1. However, the known causative variant for the canine form of Stargardt disease, an insertion in the ABCA4 gene, was found in all cases. The Nextflow run produced results identical to those obtained when the respective steps were run with Bash scripts, but with a reduced running time. Running the workflow on the new data set (CCR), followed by annotation and filtering, indicated new candidates that could be investigated further as a potential cause of this IRD of currently unknown origin.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-506760
Date: January 2023
Creators: Martin, Melina Toni Marie
Publisher: Uppsala universitet, Institutionen för biologisk grundutbildning
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
