Global ETD Search

1	Fractionation Statistics. Wang, Baoyong. 01 May 2014.
Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous chromosome is more vulnerable to gene deletion than the other. As a null hypothesis, we first assume that deletion events, on one homeolog only, excise a geometrically distributed number of genes with unknown mean mu, and that these events combine to produce deleted runs of length l, distributed approximately as a negative binomial with unknown parameter r, itself a random variable with distribution pi(.). A biologically more realistic model requires deletion events on both homeologs, distributed as a truncated geometric. We simulate the distribution of run lengths l in both models, as well as the underlying pi(r), as a function of mu, and show how sampling l allows us to estimate mu. We apply this to data on a total of 15 genomes descended from 6 distinct WGD events and show how to correct the bias towards shorter runs caused by genome rearrangements. Because of the difficulty in deriving pi(.) analytically, we develop a deterministic recurrence to calculate each pi(r) as a function of mu and the proportion of unreduced paralog pairs. This is based on a computing formula containing nested sums. The parameter mu can be estimated from the run lengths of single-copy regions. We then reduce the computing formulae, at least in the one-sided case, to closed form. This virtually eliminates the computing time due to highly nested summations. We formulate a continuous version of the fractionation process, deleting line segments of exponentially distributed lengths in analogy to geometrically distributed numbers of genes. We derive nested integrals and discover that the number of previously deleted regions to be skipped by a new deletion event is exactly geometrically distributed. We undertook a large simulation experiment to show how to discriminate between the gene-by-gene duplicate deletion model and the deletion of a geometrically distributed number of genes. This revealed the importance of the effects of genome size N, the mean of the geometric distribution, the progress towards completion of the fractionation process, and whether the data are based on runs of deleted genes or undeleted genes.
Keywords: mathematical model, evolution, whole genome doubling, gene loss, theory of runs
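The one-sided null model described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the thesis's own code: the function names (`geometric`, `simulate_one_sided`, `deleted_run_lengths`) are hypothetical, the genome is treated as a linear array of N genes, each event starts at a uniformly chosen surviving gene, extends rightward, skips genes that are already deleted, and stops at the end of the array. Setting the mean to 1 gives the gene-by-gene model as a special case.

```python
import random
from itertools import groupby


def geometric(mean, rng):
    """Draw from a geometric distribution on 1, 2, 3, ... with the given mean (mean >= 1)."""
    p = 1.0 / mean
    k = 1
    while rng.random() >= p:
        k += 1
    return k


def simulate_one_sided(n_genes, mean, target_deleted, seed=0):
    """Delete genes on a single homeolog until target_deleted genes are gone.

    Assumptions (not taken from the source): linear genome, events start at a
    uniformly chosen surviving gene, extend rightward, skip already deleted
    genes, and the final event is cut short once the target is reached.
    """
    rng = random.Random(seed)
    present = [True] * n_genes
    deleted = 0
    while deleted < target_deleted:
        pos = rng.randrange(n_genes)
        if not present[pos]:
            continue  # redraw until the event lands on a surviving gene
        to_remove = geometric(mean, rng)
        while to_remove > 0 and pos < n_genes and deleted < target_deleted:
            if present[pos]:
                present[pos] = False
                deleted += 1
                to_remove -= 1
            pos += 1  # previously deleted genes are skipped, not counted
    return present


def deleted_run_lengths(present):
    """Lengths l of maximal runs of consecutive deleted genes."""
    return [sum(1 for _ in grp) for is_present, grp in groupby(present) if not is_present]


if __name__ == "__main__":
    genome = simulate_one_sided(n_genes=10_000, mean=3.0, target_deleted=4_000, seed=1)
    runs = deleted_run_lengths(genome)
    print("number of deleted runs:", len(runs))
    print("mean deleted run length:", sum(runs) / len(runs))
```

Comparing the run-length output for `mean=1.0` against larger means, at the same number of deleted genes, gives a rough feel for why observed run lengths carry information about mu; the thesis's estimation procedure itself is not reproduced here.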
2	Fractionation Statistics. Wang, Baoyong. January 2014.
