Global ETD Search

1	Analysis and standardization of marker genotype data for DNA fingerprinting applications Schriek, Cornelis Arnold 21 October 2011 Genetic polymorphisms can be seen as the occurrence of more than one form of a DNA or protein sequence at a single locus in a group of organisms, where these different forms occur more frequently than can be attributed to mutation alone. The combination of genetic polymorphisms present in the genome of a particular individual is referred to as its genotype. A wide range of genotyping techniques has been developed to detect and visualize genetic polymorphisms. One such technique examines highly polymorphic repetitive DNA regions called microsatellites, also known as “short tandem repeats” (STRs), “simple sequence repeats” (SSRs) or “simple-sequence length polymorphisms” (SSLPs). A microsatellite region consists of identical units of usually 2-6 base pairs strung together, and the number of tandem repeats varies widely among individuals of a population. Microsatellite genotyping is a popular choice for many types of studies, including individual identification, paternity testing, germplasm evaluation, genome mapping and diversity studies, and can be used in many commercial, academic, social, and agricultural applications. There are, however, many obstacles to managing and analysing microsatellite genotype data effectively, and researchers struggle to keep pace with rapidly growing volumes of genotyping data. Management problems range from the lack of a secure, easily accessible central data repository to more complex issues such as the merging and standardization of data from multiple sources into combined datasets. Because of these issues, genetic fingerprinting applications such as identity matching and relatedness studies can be challenging when data from different experiments or laboratories have to be combined into a central database.
The main aim of this M.Sc study in Bioinformatics was to develop a bioinformatics resource for the management and analysis of genetic fingerprinting data from microsatellite marker genotyping studies, and to apply the software to the analysis of microsatellite marker data from ramets of Pinus patula clones with the purpose of analysing clonal identity in pine breeding programmes. The software resource developed here is called GenoSonic. It is a web application that provides users with a secure, easily accessible space where genotyping project data can be managed and analysed as a team. Users can upload and download large amounts of marker genotype data. Once uploaded to the system, DNA fingerprint data needs to be standardised before it can be used in further analyses. To do this, a two-step approach was implemented in GenoSonic. The first step is to assign standardized allele sizes to all of the input allele sizes of the microsatellite fingerprints automatically using a novel automated binning algorithm called CSMerge-1, which was designed specifically to bin data from multiple experiments. The second step is to manually verify the results from the automated binning function and add the verified data to a standardized dataset. Once the genetic fingerprints have been standardized, allele- and genotype frequencies can be viewed for any given marker. GenoSonic also provides functionalities for identity matching. One or more DNA fingerprints from unknown samples can be matched against a standardized dataset to establish identities or infer relatedness. Finally, GenoSonic implements a genetic distance tree construction function, which can be used to visualize relatedness among samples in a selected dataset. 
The bioinformatics resource developed in this study was applied to a microsatellite DNA fingerprinting project aimed at the re-establishment or confirmation of clonal identity of Pinus patula ramets from pine clonal seed orchards developed by a South African forestry company at one of their new agricultural estates in South Africa. The results from GenoSonic's automated binning function (CSMerge-1) and the results from the identity matching and tree construction exercise were compared to results obtained by human experts who had analysed the data manually. It was demonstrated that the results from GenoSonic equalled or surpassed the manual results in terms of accuracy and consistency, and far surpassed the manual effort in terms of the speed at which analyses could be completed. GenoSonic was developed with a specific focus on reusability and on the ability to be modified or extended to solve future genotyping-related problems. This study not only provides a solution to the current genotype data management and analysis needs of researchers, but also aims to serve as a basic framework, or component library, for future software development projects that may be required to address specific needs of researchers dealing with high-throughput genotyping data. / Dissertation (MSc)--University of Pretoria, 2011. / Biochemistry / unrestricted Genetic polymorphisms Deoxyribonucleic acid (DNA) Fingerprinting applications Genotype data UCTD
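As a rough illustration of the automated binning step described in entry 1, the sketch below groups raw microsatellite fragment sizes into bins and assigns each bin one standardized allele size. It is not CSMerge-1, whose details the abstract does not give; the function name bin_alleles, the 0.5 bp gap tolerance, and the rounded-median label are all assumptions made for this example.

```python
from statistics import median


def bin_alleles(sizes, max_gap=0.5):
    """Map each raw fragment size (bp) to a standardized allele size.

    Hypothetical gap-based binning: sorted sizes stay in the same bin
    while consecutive values differ by at most max_gap base pairs.
    """
    if not sizes:
        return {}
    ordered = sorted(set(sizes))
    bins = [[ordered[0]]]
    for size in ordered[1:]:
        if size - bins[-1][-1] <= max_gap:
            bins[-1].append(size)   # close enough: same allele bin
        else:
            bins.append([size])     # gap too large: start a new bin
    mapping = {}
    for members in bins:
        label = round(median(members))  # assumed labelling rule
        for size in members:
            mapping[size] = label
    return mapping


# Raw sizes as they might arrive from two different experiments.
raw = [150.1, 150.4, 152.2, 151.9, 154.0]
calls = bin_alleles(raw)
```

In the workflow the abstract describes, output of this kind would still be verified manually before being added to a standardized dataset.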
2	Integrating phenotype-genotype data for prioritization of candidate symptom genes Xing, L., Zhou, X., Peng, Yonghong, Zhang, R., Hu, J., Yu, J., Liu, B. January 2013 Symptoms and signs (symptoms in brief) are the essential clinical manifestations for traditional Chinese medicine (TCM) diagnosis and treatment. To gain insight into the molecular mechanisms of symptoms, this paper presents a network-based data mining method that integrates multiple phenotype-genotype data sources and predicts a prioritized ranking of candidate genes for symptoms. The results of this pilot study offer some insight into the molecular mechanisms of symptoms.
3	Iterative full-genome phasing and imputation using neural networks Rydin, Lotta January 2022 In this project, a model based on a convolutional neural network was developed with the aim of imputing missing genotype data. The model was based on an existing autoencoder that was modified into a U-Net structure. The network was trained and used iteratively, with the intention that the result would improve in each iteration: the output of the model was used as the input to the next iteration. The data used in this project were diploid genotype data, which were phased into haploids and then run separately through the network. In each iteration, new haploids were generated from the output haploids and used as input to the next iteration. The results showed that the accuracy of the imputation improved slightly in every iteration. However, it did not surpass that of the same model trained for a single iteration. Further work is needed to make the model more useful. Machine learning Genotype data U-Net Convolutional neural networks Bioinformatics (Computational Biology)
4	Associating genotype sequence properties to haplotype inference errors ROSA, Rogério dos Santos 12 March 2015 Haplotype information has a central role in the understanding and diagnosis of certain illnesses, and also in evolutionary studies. Since that type of information is hard to obtain directly, computational methods to infer haplotypes from genotype data have received great attention from the computational biology community. Unfortunately, haplotype inference is a very hard computational biology problem and the existing methods can only partially identify correct solutions. I present neural network models that use different properties of the data to predict when a method is more prone to make errors. I construct models for three different Haplotype Inference approaches and I show that the models are accurate and statistically relevant. The results of the experiments offer valuable insights into the performance of those methods, opening opportunities for combining strategies or improving individual approaches. I formally demonstrate that Linkage Disequilibrium (LD) and heterozygosity are very strong indicators of Switch Error tendency for four methods studied, and I delineate scenarios based on LD measures that reveal a greater or lesser propensity of the HI methods to make inference errors, so the correlation between LD and the occurrence of errors varies among regions along the genotypes.
I present evidence that considering windows of length 10 immediately to the left of a SNP (upstream region), and eliminating the non-informative SNPs through Fisher’s Test, leads to a more suitable correlation between LD and Inference Errors. I apply Multiple Linear Regression to explore the relevance of several biologically meaningful properties of the genotype sequences for the accuracy of the haplotype inference results, developing models for two databases (considering only humans) and using two error metrics. The accuracy of the results and the stability of the proposed models are supported by statistical evidence. / Haplotypes play a central role in the understanding and diagnosis of certain diseases and in evolutionary studies. Because this type of information is difficult to obtain directly, computational methods for inferring haplotypes from genotype data have received great attention from the computational biology community. Unfortunately, Haplotype Inference is a hard problem and the existing methods can only partially predict correct solutions. Neural network models were developed that use different properties of the data to predict when a method is more prone to make errors. Models were calibrated for three different Haplotype Inference approaches and the results were statistically validated. The results of the experiments offer valuable information about the performance and behaviour of these methods, creating conditions for developing strategies that combine different solutions or improve individual approaches. Linkage Disequilibrium (LD) and heterozygosity were shown to be strong indicators of error tendency, and scenarios based on LD measures were delineated that reveal when a method is more or less prone to make errors.
It was found that using windows of 10 SNPs (single-nucleotide polymorphisms) immediately upstream, and eliminating the non-informative SNPs through Fisher’s Test, leads to a more suitable correlation between LD and the occurrence of errors. Finally, Linear Regression analysis was applied to explore the relevance of several biologically meaningful properties of the genotype sequences for the accuracy of Haplotype Inference results; models were estimated for two databases (considering only humans) using two error metrics. The accuracy of the results and the stability of the proposed models were validated by statistical tests. Linear Regression Statistical Analysis SNPs Haplotypes Genotype Data Haplotype Inference
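For readers unfamiliar with the Switch Error named in entry 4, the sketch below shows one common way to compute a switch error rate for a single individual: walk along the heterozygous sites and count how often the inferred phase flips relative to the true phase. It is a minimal sketch and not the thesis's own code; the function name switch_error_rate, the string encoding of haplotypes, and the assumptions of biallelic sites and an inferred haplotype consistent with the genotype are all made for this example.

```python
def switch_error_rate(true_h1, true_h2, inferred_h1):
    """Fraction of adjacent heterozygous site pairs phased inconsistently.

    Assumes biallelic sites and that the inferred haplotype pair is
    consistent with the genotype, so one inferred haplotype suffices.
    """
    # Only heterozygous sites carry phase information.
    het = [i for i, (a, b) in enumerate(zip(true_h1, true_h2)) if a != b]
    if len(het) < 2:
        return 0.0
    # True where the inferred haplotype follows the first true haplotype.
    follows_h1 = [inferred_h1[i] == true_h1[i] for i in het]
    # A switch occurs whenever that relationship flips between neighbours.
    switches = sum(x != y for x, y in zip(follows_h1, follows_h1[1:]))
    return switches / (len(het) - 1)


true_h1 = "010110"
true_h2 = "001001"
inferred_h1 = "010001"   # phase flips once, between sites 2 and 3
rate = switch_error_rate(true_h1, true_h2, inferred_h1)
```

Here five sites are heterozygous, giving four adjacent pairs and one switch, so the rate is 0.25.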
