
Analysis and standardization of marker genotype data for DNA fingerprinting applications

A genetic polymorphism is the occurrence of more than one form of a DNA or protein sequence at a single locus in a group of organisms, where the different forms occur more frequently than can be attributed to mutation alone. The combination of genetic polymorphisms present in the genome of an individual is referred to as its genotype. A wide range of genotyping techniques has been developed to detect and visualize genetic polymorphisms. One such technique examines highly polymorphic repetitive DNA regions called microsatellites, also known as “short tandem repeats” (STRs), “simple sequence repeats” (SSRs) or “simple-sequence length polymorphisms” (SSLPs). A microsatellite region consists of identical units, usually 2-6 base pairs long, strung together in tandem; the number of repeats varies widely among individuals of a population. Microsatellite genotyping is a popular choice for many types of studies, including individual identification, paternity testing, germplasm evaluation, genome mapping and diversity studies, and is used in commercial, academic, social, and agricultural applications. There are, however, many obstacles to managing and analysing the rapidly growing volumes of microsatellite genotype data effectively. Management problems range from the lack of a secure, easily accessible central data repository to more complex issues such as merging and standardizing data from multiple sources into combined datasets. Because of these issues, genetic fingerprinting applications such as identity matching and relatedness studies can be challenging when data from different experiments or laboratories have to be combined into a central database.
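To make the notion of a microsatellite allele concrete, the following minimal Python sketch locates the longest perfect tandem repeat of a 2-6 bp unit in a sequence and reports its copy number. The function name, parameters and example sequences are hypothetical illustrations and are not part of the study; in practice, genotyping measures fragment sizes rather than sequences, and real repeats may be imperfect.

```python
import re


def longest_tandem_repeat(sequence, min_unit=2, max_unit=6, min_copies=3):
    """Return (unit, copies, start) for the longest perfect tandem repeat.

    Hypothetical helper for illustration only. Returns None when no unit of
    min_unit..max_unit bases is repeated at least min_copies times in a row.
    """
    best = None
    best_length = 0
    for unit_len in range(min_unit, max_unit + 1):
        # One captured unit followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(sequence):
            length = len(match.group(0))
            # Strict comparison keeps the shortest unit when lengths tie.
            if length > best_length:
                best_length = length
                best = (match.group(1), length // unit_len, match.start())
    return best


# Two made-up alleles of the same locus that differ only in repeat number.
allele_short = "GGTT" + "CA" * 7 + "AGG"
allele_long = "GGTT" + "CA" * 10 + "AGG"
print(longest_tandem_repeat(allele_short))
print(longest_tandem_repeat(allele_long))
```

The two example alleles differ by three copies of a 2 bp unit, so the corresponding fragments differ in length by 6 bp; it is this length difference that microsatellite genotyping detects.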
The main aim of this MSc study in Bioinformatics was to develop a bioinformatics resource for the management and analysis of genetic fingerprinting data from microsatellite marker genotyping studies, and to apply the software to microsatellite marker data from ramets of Pinus patula clones in order to analyse clonal identity in pine breeding programmes. The software resource developed here is called GenoSonic. It is a web application that provides users with a secure, easily accessible space where genotyping project data can be managed and analysed as a team. Users can upload and download large amounts of marker genotype data. Once uploaded to the system, DNA fingerprint data must be standardized before they can be used in further analyses. To do this, a two-step approach was implemented in GenoSonic. The first step automatically assigns standardized allele sizes to all of the input allele sizes of the microsatellite fingerprints, using a novel automated binning algorithm called CSMerge-1, which was designed specifically to bin data from multiple experiments. The second step is to verify the results of the automated binning function manually and add the verified data to a standardized dataset. Once the genetic fingerprints have been standardized, allele and genotype frequencies can be viewed for any given marker. GenoSonic also provides functions for identity matching: one or more DNA fingerprints from unknown samples can be matched against a standardized dataset to establish identities or infer relatedness. Finally, GenoSonic implements a genetic distance tree construction function, which can be used to visualize relatedness among samples in a selected dataset.
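The workflow described above (bin raw allele sizes, compute allele frequencies, match fingerprints) can be illustrated with a minimal Python sketch. It is not CSMerge-1, whose details are not given here: the binning below is a simple gap-based grouping chosen for illustration, and all function names, the `max_gap` threshold, the marker names and the sample data are assumptions.

```python
from collections import Counter
from statistics import median


def bin_allele_sizes(sizes, max_gap=0.5):
    """Map each raw fragment size (bp) to an integer bin label.

    Illustrative gap-based binning (NOT CSMerge-1): sorted sizes that lie
    within max_gap of their neighbour share a bin, labelled by the rounded
    median of the bin.
    """
    mapping, cluster = {}, []
    for size in sorted(set(sizes)):
        if cluster and size - cluster[-1] > max_gap:
            mapping.update({s: round(median(cluster)) for s in cluster})
            cluster = []
        cluster.append(size)
    if cluster:
        mapping.update({s: round(median(cluster)) for s in cluster})
    return mapping


def standardize(fingerprints, max_gap=0.5):
    """Bin every marker across all samples: {sample: {marker: (a1, a2)}}."""
    markers = {m for fp in fingerprints.values() for m in fp}
    bins = {
        m: bin_allele_sizes(
            [s for fp in fingerprints.values() if m in fp for s in fp[m]], max_gap
        )
        for m in markers
    }
    return {
        sample: {m: tuple(sorted(bins[m][s] for s in alleles))
                 for m, alleles in fp.items()}
        for sample, fp in fingerprints.items()
    }


def allele_frequencies(standardized, marker):
    """Relative frequency of each binned allele at one marker."""
    counts = Counter(
        a for fp in standardized.values() if marker in fp for a in fp[marker]
    )
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def match_identity(query, reference, min_shared=1.0):
    """Reference samples whose genotypes agree with the query at a fraction
    of at least min_shared of the markers typed in both, best match first."""
    hits = []
    for name, fp in reference.items():
        common = [m for m in query if m in fp]
        if not common:
            continue
        score = sum(query[m] == fp[m] for m in common) / len(common)
        if score >= min_shared:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Made-up raw fragment sizes for three ramets at two hypothetical markers.
raw = {
    "ramet_A1": {"M1": (150.1, 154.2), "M2": (201.0, 201.0)},
    "ramet_A2": {"M1": (150.3, 153.9), "M2": (200.8, 201.1)},
    "ramet_B1": {"M1": (152.0, 154.1), "M2": (201.0, 205.2)},
}
std = standardize(raw)
reference = {name: fp for name, fp in std.items() if name != "ramet_A1"}
print(match_identity(std["ramet_A1"], reference))
```

In this toy dataset the two "A" ramets receive identical standardized genotypes despite small differences in measured fragment size, which is the situation binning is meant to handle. A fixed gap threshold is the simplest possible choice; it does not address systematic size shifts between experiments, and it does not replace the manual verification step described above.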
The bioinformatics resource developed in this study was applied to a microsatellite DNA fingerprinting project aimed at re-establishing or confirming the clonal identity of Pinus patula ramets from pine clonal seed orchards developed by a South African forestry company at one of its new agricultural estates in South Africa. The results from GenoSonic's automated binning function (CSMerge-1) and from the identity matching and tree construction exercise were compared with results obtained by human experts who had analysed the data manually. The results from GenoSonic equalled or surpassed the manual results in accuracy and consistency, and far surpassed the manual effort in the speed with which analyses could be completed. GenoSonic was developed with a specific focus on reusability and on the ability to be modified or extended to solve future genotyping-related problems. This study not only provides a solution to the current genotype data management and analysis needs of researchers, but is also intended to serve as a basic framework, or component library, for future software development projects that may be required to address the specific needs of researchers dealing with high-throughput genotyping data. / Dissertation (MSc)--University of Pretoria, 2011. / Biochemistry / unrestricted

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:up/oai:repository.up.ac.za:2263/28908
Date: 21 October 2011
Creators: Schriek, Cornelis Arnold
Contributors: Myburg, Alexander Andrew, corne.schriek@gmail.com, Joubert, Fourie
Source Sets: South African National ETD Portal
Detected Language: English
Type: Dissertation
Rights: © 2010, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
