1 |
New methods for analysis of epidemiological data using capture-recapture methods / Huakau, John Tupou / January 2002
Capture-recapture methods originated in animal abundance estimation, where they were used to estimate the unknown size of an animal population under study. In the late 1940s, and again in the late 1960s and early 1970s, these methods were adapted to epidemiological list data, and through continued use, particularly in the 1990s, they have become popular for estimating the completeness of disease registries and the unknown total size of human disease populations. In this thesis we investigate new methods for the analysis of epidemiological list data using capture-recapture methods. In particular, we compare two standard methods for estimating the unknown total population size, and we examine new methods that incorporate list mismatch errors and model-selection uncertainty into the estimation of the total population size and its associated confidence interval. We study modified tag loss methods from animal abundance estimation as a way of allowing for list mismatch errors in epidemiological list data, and we explore a weighted average method, bootstrap methods, and a Bayesian model averaging method for incorporating model-selection uncertainty into the estimate of the unknown total population size and its confidence interval. Two previously unanalysed diabetes studies illustrate the methods examined, and a well-known spina bifida study is used for simulation. This thesis finds that ignoring list mismatch errors leads to biased estimates of the unknown total population size, and that the list mismatch methods considered here provide a useful adjustment, one that approximately agrees with the results obtained using a complex matching algorithm.
As for model-selection uncertainty, we find that confidence intervals which incorporate it are wider, and more appropriate, than those which do not. We therefore recommend tag loss methods to adjust for list mismatch errors, and methods that incorporate model-selection uncertainty into both point and interval estimates of the unknown total population size. / Subscription resource available via Digital Dissertations only.
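The two-list setting this abstract builds on can be illustrated with the classic Chapman-corrected Lincoln-Petersen estimator, the usual starting point before list mismatch and model-selection adjustments are layered on. This is a minimal sketch with made-up counts, not the thesis's own methods; the variance formula is Seber's standard approximation.

```python
import math

def chapman_estimate(n1, n2, m):
    """Chapman's nearly unbiased two-source capture-recapture estimate of
    total population size, from list sizes n1, n2 and m matched cases."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def chapman_ci(n1, n2, m, z=1.96):
    """Approximate normal confidence interval using Seber's variance formula."""
    var = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))
    est = chapman_estimate(n1, n2, m)
    half = z * math.sqrt(var)
    return est - half, est + half

# Hypothetical registry data: 500 cases on list 1, 400 on list 2, 320 on both.
n_hat = chapman_estimate(500, 400, 320)   # about 625 cases in total
lo, hi = chapman_ci(500, 400, 320)
```

Note how sensitive the estimate is to the match count m: failed matches (list mismatch errors) shrink m and inflate the estimate, which is why an explicit mismatch adjustment such as the tag loss approach matters.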
|
6 |
An exploratory method for identifying reactant-product lipid pairs from lipidomic profiles of wild-type and mutant leaves of Arabidopsis thaliana / Fan, Lixia / January 1900
Master of Science / Department of Statistics / Gary L. Gadbury
Discerning the metabolic or enzymatic role of a particular gene product, in the absence of information indicating sequence homology to known gene products, is a difficult task. One approach is to compare the levels of metabolites in a wild-type organism to those in an organism with a mutation that causes loss of function of the gene. The goal of this project was to develop an approach to analyzing metabolite data on wild-type and mutant organisms for the purpose of identifying the function of a mutated gene.
To develop and test statistical approaches to the analysis of metabolite data for identification of gene function, levels of 141 lipid metabolites were measured in leaves of wild-type Arabidopsis thaliana plants and in leaves of Arabidopsis thaliana plants with known mutations in genes involved in lipid metabolism. The mutations were primarily in fatty acid desaturases, enzymes that catalyze reactions in which double bonds are added to fatty acids. When these enzymes are mutated, leaf lipid composition is altered, and the altered levels of specific lipid metabolites can be detected by mass spectrometry.
A randomization p-value and other metrics were calculated for all potential reactant-product pairs, which included all lipid metabolite pairs. An algorithm was developed to combine these data and rank each pair by its likelihood of being an actual reactant-product pair. This method was designed and tested on data collected on mutants in genes with known functions: fad2 (Okuley et al., 1994), fad3 (Arondel et al., 1992), fad4, fad5 (Mekhedov et al., 2000), fad6 (Falcone et al., 1994), and fad7 (Iba et al., 1993; Gibson et al., 1994). Application of the method to three additional genes produced by random mutagenesis, sfd1, sfd2, and sfd3, indicated that the significant pairs for fad6 and sfd3 were similar. Consistent with this, genetic evidence has indicated that sfd3 is a mutation in the FAD6 gene.
The methods provide a list of putative reactions for an enzyme encoded by an unknown mutant gene. The output lists for unknown genes and known genes can be compared to provide evidence for similar biochemical activities. The strength of the current method, however, is that the list of candidate chemical reactions for an enzyme encoded by a mutant gene can be produced with no data other than the metabolite profiles of the wild-type and mutant organisms; that is, known-gene analysis is not required to obtain the candidate reaction list.
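The screening step described above can be sketched as a permutation test. The statistic below (the mutant-minus-wild-type shift for the candidate reactant, minus the same shift for the candidate product) and the sample data are illustrative assumptions, not the study's actual metric, which combined several such measures before ranking pairs.

```python
import random

def randomization_pvalue(wt_r, mut_r, wt_p, mut_p, n_perm=5000, seed=0):
    """One-sided permutation p-value for the screening principle: the candidate
    reactant accumulates (rises in the mutant) while the candidate product is
    depleted (falls in the mutant). Genotype labels are shuffled jointly."""
    rng = random.Random(seed)

    def stat(wr, mr, wp, mp):
        rise = sum(mr) / len(mr) - sum(wr) / len(wr)   # reactant shift
        fall = sum(mp) / len(mp) - sum(wp) / len(wp)   # product shift
        return rise - fall

    observed = stat(wt_r, mut_r, wt_p, mut_p)
    r_all, p_all = wt_r + mut_r, wt_p + mut_p
    n_wt, idx = len(wt_r), list(range(len(wt_r) + len(mut_r)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        wt_i, mut_i = idx[:n_wt], idx[n_wt:]
        perm = stat([r_all[i] for i in wt_i], [r_all[i] for i in mut_i],
                    [p_all[i] for i in wt_i], [p_all[i] for i in mut_i])
        if perm >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Hypothetical leaf-lipid levels (4 wild-type and 4 mutant leaves per lipid).
p = randomization_pvalue(wt_r=[1.0, 1.1, 0.9, 1.05], mut_r=[2.0, 2.1, 1.9, 2.05],
                         wt_p=[2.0, 2.1, 1.9, 2.0], mut_p=[0.5, 0.6, 0.4, 0.55])
```

Running this over every ordered lipid pair yields one p-value per candidate reaction, which the ranking algorithm can then combine with other metrics.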
|
7 |
Development of highly recombinant inbred populations for quantitative-trait locus mapping / Boddhireddy, Prashanth / January 1900
Doctor of Philosophy / Genetics Interdepartmental Program-Plant Pathology / James Nelson
The goal of quantitative-trait locus (QTL) mapping is to understand the genetic architecture of an organism by identifying the genes underlying quantitative traits: their number and locations, their interactions with other genes and with environments, and the sizes of their effects on the traits. QTL mapping in plants is often done on a population of progeny derived from one or more designed, or controlled, crosses, which are constructed to exploit correlation among marker genotypes for mapping QTL. Reducing correlation between markers can improve the precision of location and effect estimates by reducing multicollinearity. The purpose of this thesis is to propose an approach to developing experimental populations that reduce this correlation by increasing recombination between markers, especially in selfing species.
QTL mapping resolution of recombinant inbred lines (RILs) is limited by the amount of recombination RILs experience during development. Intercrossing during line development can be used to counter this disadvantage, but requires additional generations and is difficult in self-pollinated species. In this thesis I propose a way of improving mapping resolution through recombination enrichment. This method is based on genotyping at each generation and advancing lines selected for high recombination and/or low heterozygosity. These lines developed are called SA-RILs (selectively advanced recombinant inbred lines). In simulations, the method yields lines that represent up to twice as many recombination events as RILs developed conventionally by selfing without selection, or the same amount but in three generations, without reduction in homozygosity. Compared to methods that require maintaining a large population for several generations and selecting lines only from the finished population, the method proposed here achieves up to 25% more recombination.
Although SA-RILs accumulate more recombination than conventional RILs and can be used as fine-mapping populations for selfing species, the effectiveness of the SA-RIL approach decreases with genome size and is most valuable only when applied either to small genomes or to defined regions of large genomes. Here I propose the development of QTL-focused SA-RILs (QSA-RILs), which are SA-RILs enriched for recombination in regions of a large genome selected for evidence for the presence of a QTL. This evidence can be derived from QTL analysis in a subset of the population at the F2 generation and/or from previous studies. In simulations QSA-RILs afford up to threefold increase in recombination and twofold increase in accuracy of QTL position estimate in comparison with RILs. The regional-selection method also shows potential for resolving QTL linked in repulsion.
One of the recent Bayesian methods for QTL mapping, the shrinkage Bayesian method (BayesA (Xu)), has been successfully used for estimating marker effects in the QTL mapping populations. Although the implementation of the BayesA (Xu) method for estimating main effects was described by the author, the equations for the posterior mean and variance, used in estimation of the effects, were not elaborated. Here I derive the equations used for the estimation of main effects for doubled-haploid and F2 populations. I then extend these equations to estimate interaction effects in doubled-haploid populations. These derivations are helpful for an understanding of the intermediate steps leading to the equations described in the original paper introducing the shrinkage Bayesian method.
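The selective-advance idea can be sketched in simulation: self each line, genotype the offspring, and keep those with the most recombination breakpoints and the least residual heterozygosity. Everything here (marker count, recombination fraction, population sizes, the scoring rule) is an illustrative assumption, not the thesis's simulation design.

```python
import random

def meiosis(hap1, hap2, r, rng):
    """One gamete from a diploid: switch template with probability r
    between adjacent markers (a simple crossover model)."""
    cur = rng.choice([0, 1])
    gamete = []
    for i in range(len(hap1)):
        if i > 0 and rng.random() < r:
            cur = 1 - cur
        gamete.append((hap1, hap2)[cur][i])
    return gamete

def breakpoints(hap):
    """Observable recombination events: changes of founder origin along a haplotype."""
    return sum(a != b for a, b in zip(hap, hap[1:]))

def heterozygosity(h1, h2):
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)

def advance_selected(lines, r, n_keep, rng):
    """Self every line once, then advance the offspring that score highest on
    breakpoint count, with a small penalty for remaining heterozygosity."""
    kids = [(meiosis(h1, h2, r, rng), meiosis(h1, h2, r, rng)) for h1, h2 in lines]
    score = lambda k: breakpoints(k[0]) + breakpoints(k[1]) - heterozygosity(*k)
    return sorted(kids, key=score, reverse=True)[:n_keep]

rng = random.Random(1)
founders = ([0] * 20, [1] * 20)                      # 20 markers, 2 founders
f2 = [(meiosis(*founders, 0.1, rng),
       meiosis(*founders, 0.1, rng)) for _ in range(200)]
selected = advance_selected(f2, 0.1, 20, rng)
```

Iterating advance_selected over further generations is what accumulates the extra recombination events reported for SA-RILs, while the heterozygosity penalty keeps the lines moving toward homozygosity.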
|
8 |
Inference of nonparametric hypothesis testing on high dimensional longitudinal data and its application in DNA copy number variation and microarray data analysis / Zhang, Ke / January 1900
Doctor of Philosophy / Department of Statistics / Haiyan Wang
High-throughput screening technologies have generated a huge amount of biological data in the last ten years. With the easy availability of array technology, researchers have begun to investigate biological mechanisms using experiments with more sophisticated designs that pose novel challenges to statistical analysis. We provide theory for robust statistical tests in three flexible models. In the first model, we consider hypothesis testing problems when a large number of variables are observed repeatedly over time; a potential application is in tumor genomics, where an array comparative genomic hybridization (aCGH) study can be used to detect progressive DNA copy number changes in tumor development. In the second model, we consider hypothesis testing theory in a longitudinal microarray study with multiple treatments or experimental conditions; the tests developed can be used to detect treatment effects for a large group of genes and to discover genes that respond to treatment over time. In the third model, we address a hypothesis testing problem that can arise when array data from different sources are to be integrated, performing statistical tests under a nested design. In all models, robust test statistics are constructed by moment methods allowing unbalanced designs and arbitrary heteroscedasticity, and their limiting distributions are derived in the nonclassical setting where the number of probes is large. The test statistics are not targeted at a single probe; instead, we are interested in testing a selected set of probes simultaneously. Simulation studies compare the proposed methods with traditional tests based on linear mixed-effects models and generalized estimating equations. Interesting results obtained with the proposed theory in two cancer genomic studies suggest that the new methods are promising for a wide range of biological applications with longitudinal arrays.
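One way to picture the moment-method construction is the following toy version for a single time point: each gene contributes its between-group sum of squares minus the exact null expectation of that quantity under arbitrary heteroscedasticity and unbalancedness, and the contributions are averaged over genes and standardized, relying on asymptotics in the number of probes rather than in the number of replicates. This is a hypothetical simplification of the general idea, not the test statistics derived in the dissertation (which handle longitudinal and nested structure).

```python
import math
import statistics

def gene_contrast(groups):
    """Between-group SS minus its null expectation, sum_i (1 - n_i/N) * s_i^2,
    which is exact for unbalanced groups with arbitrary variances."""
    N = sum(len(g) for g in groups)
    means = [statistics.fmean(g) for g in groups]
    grand = sum(len(g) * m for g, m in zip(groups, means)) / N
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    null_exp = sum((1 - len(g) / N) * statistics.variance(g) for g in groups)
    return between - null_exp

def high_dim_test(genes):
    """Average the per-gene contrast and standardize; the normal reference
    distribution is justified when the number of genes is large."""
    d = [gene_contrast(g) for g in genes]
    z = statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
    return z, math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
```

Because the null expectation is subtracted gene by gene, no common-variance assumption is needed, which is the sense in which such moment statistics tolerate arbitrary heteroscedasticity.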
|
9 |
Individual treatment effect heterogeneity in multiple time points trials / Ndum, Edwin Andong / January 1900
Doctor of Philosophy / Department of Statistics / Gary Gadbury
In biomedical studies, the treatment main effect is often expressed in terms of an “average difference.” A treatment that appears superior based on the average effect may not be superior for all subjects in a population if there is substantial “subject-treatment interaction.” A parameter quantifying subject-treatment interaction is inestimable in two-sample completely randomized designs. Crossover designs have been suggested as a way to estimate the variability in individual treatment effects, since an “individual treatment effect” can be measured. However, variability in these observed individual effects may include variability due to the treatment plus inherent variability of a response over time. We use the “Neyman-Rubin Model of Causal Inference” (Neyman, 1923; Rubin, 1974) for our analyses.
This dissertation consists of two parts: a quantitative and a qualitative response analysis. The quantitative part focuses on disentangling the variability due to treatment effects from the variability due to time effects using suitable crossover designs. We propose a parameter that defines the variance of a true individual treatment effect in two crossover designs and show that, although the mean effect is estimable, this variance is not directly estimable. Furthermore, we show that the natural estimate of the variance of individual treatment effects is biased under both designs, with a bias that depends on time effects. Under certain design considerations, linear combinations of time effects can be estimated, making it possible to separate the variability due to time from that due to treatment.
The qualitative section involves a binary response and centers on estimating the average treatment effect and bounding the probability of a negative effect, a parameter related to the variability of individual treatment effects. Using a stated joint probability distribution of potential outcomes, we express the probability of the observed outcomes under a two-treatment, two-period crossover design. Maximum likelihood estimates of these probabilities are found using an iterative numerical method, and from these we propose bounds for the inestimable probability of a negative effect. Tighter bounds are obtained with information from subjects who receive the same treatment in both periods. Finally, we simulate an example of observed count data to illustrate estimation of the bounds.
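For the binary-response part, the simplest version of the bounding idea is the standard Fréchet argument: the two marginal success probabilities alone bracket the inestimable joint probability that a subject would do worse under treatment. A sketch of that baseline, before the crossover-specific tightening described above:

```python
def prob_negative_effect_bounds(p_treat, p_control):
    """Frechet bounds on P(Y(treat) = 0, Y(control) = 1), the probability of a
    negative individual effect, given only the two marginal success
    probabilities. Without joint information this parameter is inestimable;
    it can only be bracketed."""
    lower = max(0.0, p_control - p_treat)
    upper = min(1.0 - p_treat, p_control)
    return lower, upper

# Hypothetical marginals: 70% success under treatment, 50% under control.
lo, hi = prob_negative_effect_bounds(0.7, 0.5)   # lo = 0.0, hi = 0.3
```

Crossover data supply partial joint information (each subject is seen under both treatments), which is what allows bounds narrower than these.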
|
10 |
Statistical identification of metabolic reactions catalyzed by gene products of unknown function / Zheng, Lianqing / January 1900
Doctor of Philosophy / Department of Statistics / Gary L. Gadbury
High-throughput metabolite analysis is an approach used by biologists seeking to identify the functions of genes. A mutation in a gene encoding an enzyme is expected to alter the levels of the metabolites that serve as the enzyme’s reactant(s) (also known as substrate) and product(s). To find the function of a mutated gene, metabolite data from a wild-type organism and a mutant are compared, and candidate reactants and products are identified. The screening principle is that the concentration of the reactant will be higher, and the concentration of the product lower, in the mutant than in the wild type, because the mutation reduces the reaction converting the reactant to the product in the mutant organism.
Based upon this principle, we suggest a method to screen the possible lipid reactant-product pairs related to a mutation affecting an unknown reaction. Numerical relations between the treatment-group means of the paired lipids are established, and from these relations a set of test statistics is derived. Reactant and product lipid pairs known to be associated with specific mutations are used to assess the results.
We have explored four methods that use the test statistics to obtain a list of potential reactant-product pairs affected by the mutation. The first method uses the parametric bootstrap to obtain an empirical null distribution of the test statistic, together with a technique for identifying a family of distributions and corresponding parameter estimates to model that null distribution. The second method models the empirical bootstrap null with a mixture of normal distributions. The third method uses a normal mixture model with multiple components to model the entire distribution of test statistics from all pairs of lipids; the argument is made that, in some cases, one of the components models the lipid pairs affected by the mutation while the remaining components model the null distribution. The fourth method uses a two-way ANOVA model with an interaction term to relate the mean concentrations to the role of a lipid as a reactant or product in a specific pair. The goal of all four methods is to identify a list of findings using false discovery rate techniques. Finally, a simulation technique is proposed to evaluate the properties of statistical methods for identifying candidate reactant-product pairs.
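The first method's bootstrap step can be sketched as follows. The no-effect model here (a single normal fitted to the pooled wild-type and mutant values) and the mean-shift statistic are illustrative assumptions; the dissertation works with more structured statistics computed over all lipid pairs.

```python
import random
import statistics

def parametric_bootstrap_null(wt, mut, stat, n_boot=2000, seed=0):
    """Empirical null distribution of stat under a fitted no-mutation-effect
    model: a single normal for the pooled data, resampled for both groups."""
    rng = random.Random(seed)
    pooled = list(wt) + list(mut)
    mu, sd = statistics.fmean(pooled), statistics.stdev(pooled)
    return [stat([rng.gauss(mu, sd) for _ in wt],
                 [rng.gauss(mu, sd) for _ in mut]) for _ in range(n_boot)]

def bootstrap_pvalue(wt, mut, stat, **kwargs):
    """Upper-tail p-value of the observed statistic against the bootstrap null."""
    null = parametric_bootstrap_null(wt, mut, stat, **kwargs)
    obs = stat(wt, mut)
    return (1 + sum(t >= obs for t in null)) / (1 + len(null))

mean_shift = lambda a, b: statistics.fmean(b) - statistics.fmean(a)

# Hypothetical lipid concentrations in wild-type and mutant leaves.
p = bootstrap_pvalue([1.0, 1.2, 0.9, 1.1, 1.05],
                     [3.0, 3.2, 2.9, 3.1, 3.05], mean_shift)
```

With one such p-value per candidate pair, the final list would then be assembled with a false discovery rate cutoff, as in all four methods.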
|