
Genome-wide Genotype Imputation-Aspects of Quality, Performance and Practical Implementation

Finding a relation between a particular phenotype and genotype is one of the central themes in medical genetics. Single-nucleotide polymorphisms (SNPs) are easily assessable markers that allow genome-wide association (GWA) studies and meta-analyses. Hundreds of such analyses have been performed in the last decades. Even though several tools for such analyses are available, an efficient SNP-data transformation tool was still needed. We developed a data management tool, fcGENE, which allows easy transformation of genetic data into the different formats required by different GWA tools.
Genotype imputation, a common technique in GWA studies, allows us to study the relationship between a phenotype and markers that are missing or even completely untyped. Moreover, this technique helps us to infer both common and rare variants that are not directly typed. We studied different aspects of the imputation process, focussing especially on its accuracy. More specifically, our focus lay on the impact of pre-imputation filtering on the accuracy of imputation results. To measure imputation accuracy, we defined two new statistical scores, which allowed us to compare imputed and true genotypes directly. This direct comparison showed that strict quality filtering of SNPs prior to imputation may be detrimental.
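The idea of comparing imputed and true genotypes directly can be illustrated with a minimal sketch. The two new scores defined in the thesis are not specified in this abstract, so the code below uses two generic measures instead: genotype concordance and the squared correlation between true genotypes and imputed dosages. The function names and the 0/1/2 allele-count coding are assumptions made for illustration only.

```python
def concordance(true_genotypes, imputed_genotypes):
    """Fraction of best-guess imputed genotypes equal to the true genotypes.

    Genotypes are assumed to be coded as 0, 1 or 2 copies of one allele.
    """
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == g for t, g in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns NaN when either input has zero variance (e.g. a monomorphic SNP).
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")
    return cov * cov / (var_t * var_d)
```

In a typical evaluation of this kind, genotypes known from the study data are masked, imputed, and then scored against the hidden true values; the sketch covers only the scoring step for a single SNP.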
We further studied how differently selected reference panels from publicly available projects such as HapMap and the 1000 Genomes Project affect imputation quality. More specifically, we analysed the relationship between the genetic distance of the reference panel from the study data and the resulting imputation quality. For this purpose, we considered different summary statistics of population differentiation (e.g. Reich's, Nei's and other modified scores) between the study data set and the reference panel used for imputation.
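As a rough illustration of a population-differentiation statistic, the sketch below computes Nei's G_ST between a study data set and a reference panel from per-SNP allele frequencies. It is a simplified textbook form, assuming biallelic SNPs, equal weighting of the two populations and no sample-size correction; it is not necessarily the variant used in the thesis, and the function and parameter names are hypothetical.

```python
def nei_gst(freqs_study, freqs_reference):
    """Nei's G_ST between two populations, combined across SNPs.

    Each argument is a list of allele frequencies (one per biallelic SNP,
    same SNP order in both lists). Per-SNP heterozygosities are summed
    first and the ratio is taken once (a "ratio of sums" estimate).
    """
    if len(freqs_study) != len(freqs_reference) or not freqs_study:
        raise ValueError("inputs must be non-empty and of equal length")
    sum_between = 0.0
    sum_total = 0.0
    for p1, p2 in zip(freqs_study, freqs_reference):
        p_bar = (p1 + p2) / 2
        # Expected heterozygosity of the pooled population.
        h_total = 2 * p_bar * (1 - p_bar)
        # Mean expected heterozygosity within the two populations.
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        sum_between += h_total - h_within
        sum_total += h_total
    if sum_total == 0:
        return float("nan")  # every SNP is fixed for the same allele
    return sum_between / sum_total
```

A value near 0 indicates that the two groups have nearly identical allele frequencies, while a value of 1 indicates that they are fixed for different alleles.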
In the third analysis, we compared two basic strategies for using reference panels in imputation: (1) using the genetically best-matched reference panel, and (2) using an admixed reference panel that combines reference individuals from all available populations and lets the software itself select the optimal references, either piece-wise or as complete sequences of SNPs, for each individual separately. We analysed in detail the performance of different imputation software and the accuracy of imputation in both cases. We found that the current trend of using software with an admixed reference panel in all cases is not always the best strategy. Phasing the study data set with an external reference panel prior to imputation is also common, especially when imputing large data sets. We studied the performance of different imputation frameworks with and without pre-phasing. It turned out that pre-phasing clearly reduces the imputation quality for medium-sized data sets.

Table of Contents
List of Tables
List of Figures
1 Overview of the Thesis
1.1 Abstract
1.2 Outlines
2 Introduction
2.1 Basics of genetics
2.1.1 Phenotype, genotype and haplotype
2.1.2 Hardy-Weinberg law
2.1.3 Linkage disequilibrium
2.1.4 Genome-wide association analysis
2.2 Phasing of Genotypes
2.3 Genotype imputation
2.3.1 Tools for Imputing genotype data
2.3.2 Reference panels
3 Results
3.1 Detailed Abstracts
3.1.1 First Research Paper
3.1.2 Second Research Paper
3.1.3 Third Research Paper
3.1.4 Fourth Research Paper
3.2 Discussion and Conclusion
4 Published Articles
4.1 First Research Paper
4.1.1 Supplementary Information
4.2 Second Research Paper
4.2.1 Supplementary Information
4.3 Third Research Paper
4.3.1 Supplementary Information
4.4 Fourth Research Paper
4.4.1 Supplementary Information
5 Summary of the Thesis
6 Bibliography
7 Own Publications
8 Statement of Own Contribution
8.1 First Research Paper
8.2 Second Research Paper
8.3 Third Research Paper
8.4 Fourth Research Paper
9 Declaration of Independent Authorship
10 Acknowledgements
11 Curriculum Vitae

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:71661
Date: 06 August 2020
Creators: Roshyara, Nab Raj
Contributors: Universität Leipzig
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/updatedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
