Single nucleotide polymorphisms (SNPs) have been used to describe a person's risk
of developing diseases. Simulating a collection of d correlated autosomal biallelic
SNPs is useful for obtaining empirical results for statistical tests in settings such
as small sample sizes. A collection of d correlated autosomal biallelic SNPs can
be modeled as a random vector X = (X_1, ..., X_d), where X_i ∼ binomial(2, p_i) and
p_i is the minor allele frequency for the ith SNP. The pairwise correlations between
components of X can be specified by a d × d symmetric positive definite correlation
matrix having all diagonal entries equal to one. Two versions of a novel method to
simulate X are developed in this thesis; one version is based on generating correlated
binomials directly and the other is based on generating correlated Bernoulli random
vectors and summing them componentwise. Two existing methods to simulate X are
also discussed and implemented. In particular, a method by Madsen and Birkes (2013)
based on the multivariate normal distribution is compared to our novel methods for d ≥ 3.
Our novel binomial method has a different variance for the Fisher-transformed sample
correlation than the other two methods. Overall, if the target pairwise correlations
are smaller than the lowest upper bound possible and the number of SNPs is low,
then our novel Bernoulli method works best, since it is faster than the Madsen
and Birkes method and has comparable variability and bias for the sample correlation.
Thesis / Master of Science (MSc)
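
As an illustration of the Bernoulli-summing idea described above, the following Python sketch handles only the d = 2 case. It is not the thesis's algorithm, which this abstract does not specify; the function names (`bernoulli_pair_joint`, `draw_bernoulli_pair`, `simulate_snp_pair`, `sample_corr`) and all parameter values are hypothetical. It assumes each SNP count is the sum of two independent copies of a correlated Bernoulli pair, in which case the correlation between the two counts equals the correlation of the Bernoulli pair, and the target correlation must stay within the bounds implied by the two allele frequencies.

```python
import math
import random


def bernoulli_pair_joint(p1, p2, rho):
    """Return P(B1 = 1, B2 = 1) for Bernoulli(p1), Bernoulli(p2) with correlation rho.

    Raises ValueError when rho is not attainable for these frequencies.
    """
    p11 = p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    lower = max(0.0, p1 + p2 - 1.0)  # Frechet lower bound on P(1, 1)
    upper = min(p1, p2)              # Frechet upper bound on P(1, 1)
    if not (lower <= p11 <= upper):
        raise ValueError("rho is outside the feasible range for these frequencies")
    return p11


def draw_bernoulli_pair(p1, p2, p11, rng):
    """Draw one (B1, B2) pair by inverting a single uniform over the 2x2 joint table."""
    u = rng.random()
    if u < p11:
        return 1, 1
    if u < p1:                 # mass p1 - p11
        return 1, 0
    if u < p1 + p2 - p11:      # mass p2 - p11
        return 0, 1
    return 0, 0


def simulate_snp_pair(n, p1, p2, rho, seed=None):
    """Simulate n individuals; each SNP count is the sum of two independent Bernoulli pairs."""
    rng = random.Random(seed)
    p11 = bernoulli_pair_joint(p1, p2, rho)
    xs, ys = [], []
    for _ in range(n):
        a1, a2 = draw_bernoulli_pair(p1, p2, p11, rng)  # first allele copy
        b1, b2 = draw_bernoulli_pair(p1, p2, p11, rng)  # second allele copy
        xs.append(a1 + b1)
        ys.append(a2 + b2)
    return xs, ys


def sample_corr(xs, ys):
    """Pearson sample correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs, ys = simulate_snp_pair(20000, p1=0.2, p2=0.3, rho=0.4, seed=1)
    r = sample_corr(xs, ys)
    print("sample correlation:", r)
    print("Fisher-transformed:", math.atanh(r))
```

For minor allele frequencies 0.2 and 0.3, the largest correlation this construction can reach is sqrt(0.2 · 0.7 / (0.3 · 0.8)) ≈ 0.76, so a request such as rho = 0.9 raises an error rather than silently returning a different correlation.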
Identifier | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/27069 |
Date | January 2021 |
Creators | Lai, Winfield |
Contributors | Canty, Angelo, Davies, Katherine, Mathematics and Statistics |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |