
Doplnění (imputace) chybějících genetických markerů SNP / Imputation of missing genetic markers SNP

Working with genomic information has become standard practice in cattle breeding. This study focuses on the imputation of missing genetic markers, SNPs (single nucleotide polymorphisms), on genotyping chips; more specifically, on filling in missing values in datasets that record SNP occurrence in the cattle genome. These polymorphisms are used to evaluate genomic relationships, to predict genomic breeding values, and to evaluate tested animals. The chips most commonly used for genotyping come from Illumina and Affymetrix, and each company has developed its own genotyping techniques. Affymetrix uses a unified SNP coding across chip generations, so even older data remain usable. Illumina uses several coding types across chip generations, so SNPs cannot be compared directly. Illumina offers chips of different densities and prices. Illumina chips have become the worldwide standard and are used by all breeding companies. The most widely used imputation programs are Beagle, AlphaImpute, IMPUTE2, FindHap, DAGPHASE, FImpute, PedImpute and MaCH. Each of them requires relationships between the genotyped individuals. In routine breeding practice, however, genotyping does not cover consecutive generations, which is why our own methodological procedure was used.

The aim of this study is to survey current research on the imputation of missing genetic markers on genotyping chips and to verify the calculation procedure. In total, 8 models were created that differ in the number of SNPs used, ranging from 10 to 100 neighbouring loci. The models were tested at selected loci in two datasets. Dataset A contained 260 genotypes of bulls of various breeds from the Czech Republic. Dataset B contained 3982 genotypes of purebred Holstein bulls from nine countries. For the first dataset, very good results were obtained: missing values were predicted almost exactly, with a model reliability of 100%. The only exception was almost entirely homozygous loci, where reliability reached only 55%. For the second dataset, the most extensive model reached a reliability of 80 to 90% even for homozygous loci, although the prediction error was higher than for the first dataset. The results show that missing values can be predicted from neighbouring SNPs. The outputs of this study are intended as a basis for further work with genomic data.
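
As an illustration only, predicting a missing genotype from neighbouring loci can be sketched as a nearest-neighbour vote. This is not the procedure of the thesis, whose models are not specified in the abstract. The 0/1/2 genotype coding, the -1 code for a missing value, and the function name `impute_from_neighbours` with its `window` and `k` parameters are all assumptions made for this sketch.

```python
from collections import Counter

MISSING = -1  # assumed code for a missing genotype; 0/1/2 = copies of one allele


def impute_from_neighbours(genotypes, animal, locus, window=2, k=3):
    """Predict genotypes[animal][locus] from `window` loci on each side.

    Animals with a known genotype at `locus` are ranked by how many
    neighbouring loci they share with the target animal; the k best vote.
    """
    target = genotypes[animal]
    lo = max(0, locus - window)
    hi = min(len(target), locus + window + 1)
    neighbours = [j for j in range(lo, hi) if j != locus]

    scored = []
    for i, row in enumerate(genotypes):
        if i == animal or row[locus] == MISSING:
            continue  # skip the target itself and uninformative animals
        # Count neighbouring loci where the genotype is known and identical.
        matches = sum(
            1 for j in neighbours
            if row[j] != MISSING and row[j] == target[j]
        )
        scored.append((matches, row[locus]))

    if not scored:
        return MISSING  # no reference animal has a genotype at this locus
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(genotype for _, genotype in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: rows are animals, columns are adjacent loci.
example = [
    [0, 0, MISSING, 0, 0],  # animal 0 lacks a genotype at locus 2
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [2, 2, 2, 2, 2],
    [2, 1, 2, 2, 2],
    [0, 1, 0, 0, 0],
]
predicted = impute_from_neighbours(example, animal=0, locus=2, window=2, k=3)
```

A larger `window` corresponds loosely to the models in the abstract that use more neighbouring loci. At almost entirely homozygous loci nearly all reference animals carry the same genotype, so a vote of this kind carries little information about the rare genotype there.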

Identifier: oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:259552
Date: January 2016
Creators: Kranjčevičová, Anita
Contributors: Přibyl, Josef, Jindřich, Jindřich
Publisher: Česká zemědělská univerzita v Praze
Source Sets: Czech ETDs
Language: Czech
Detected Language: English
Type: info:eu-repo/semantics/masterThesis
Rights: info:eu-repo/semantics/restrictedAccess
