41 |
New methods for analysis of epidemiological data using capture-recapture methods. Huakau, John Tupou. January 2002 (has links)
Capture-recapture methods take their origins from animal abundance estimation, where they were used to estimate the unknown size of the animal population under study. In the late 1940s, and again in the late 1960s and early 1970s, these same capture-recapture methods were modified and applied to epidemiological list data. Since then, through their continued use, particularly in the 1990s, these methods have become popular for estimating the completeness of disease registries and the unknown total size of human disease populations. In this thesis we investigate new methods for the analysis of epidemiological list data using capture-recapture methods. In particular, we compare two standard methods used to estimate the unknown total population size, and examine new methods which incorporate list mismatch errors and model-selection uncertainty into the estimation of the unknown total population size and its associated confidence interval. We study the use of modified tag loss methods from animal abundance estimation to allow for list mismatch errors in the epidemiological list data. We also explore the use of a weighted average method, bootstrap methods, and a Bayesian model averaging method for incorporating model-selection uncertainty into the estimate of the unknown total population size and its associated confidence interval. In addition, we use two previously unanalysed diabetes studies to illustrate the methods examined and a well-known Spina Bifida Study for simulation purposes. This thesis finds that ignoring list mismatch errors leads to biased estimates of the unknown total population size and that the list mismatch methods considered here provide a useful adjustment, one that agrees approximately with the results obtained using a complex matching algorithm. As for the incorporation of model-selection uncertainty, we find that confidence intervals which incorporate model-selection uncertainty are wider and more appropriate than those that do not. Hence we recommend the use of tag loss methods to adjust for list mismatch errors and of methods that incorporate model-selection uncertainty into both point and interval estimates of the unknown total population size. / Subscription resource available via Digital Dissertations only.
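For orientation, the classical two-list (Petersen) capture-recapture estimator is sketched below in standard notation; it is the simplest case of the estimators such work builds on, and the symbols are generic rather than taken from the thesis.

% Two lists (e.g., two disease registries) covering the same population.
% n_1 = cases on list 1, n_2 = cases on list 2, m = cases matched on both lists.
\[
  \hat{N} = \frac{n_1 n_2}{m},
  \qquad
  \hat{N}_C = \frac{(n_1 + 1)(n_2 + 1)}{m + 1} - 1 \quad \text{(Chapman's bias-corrected form)}.
\]
% List mismatch errors act on m: missed matches inflate \hat{N}, while false
% matches deflate it, which is what motivates tag-loss-style adjustments.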
|
43 |
Using statistical learning to predict survival of passengers on the RMS Titanic. Whitley, Michael Aaron. January 1900 (has links)
Master of Science / Statistics / Christopher Vahl / When exploring data, predictive analytics techniques have proven to be effective. In this report, the efficiency of several predictive analytics methods is explored. During the time of this study, Kaggle.com, a data science competition website, hosted the predictive modeling competition "Titanic: Machine Learning from Disaster". This competition posed a classification problem: build a predictive model of the survival of passengers on the RMS Titanic. The focus of our approach was on applying a traditional classification and regression tree algorithm. The algorithm is greedy and can overfit the training data, which consequently can yield suboptimal prediction accuracy. To address these issues with the classification and regression tree algorithm, we implemented cost-complexity pruning and ensemble methods such as bagging and random forests. However, no improvement was observed here; this may be an artifact of the Titanic data and may not be representative of those methods' performance in general. The decision trees and prediction accuracy of each method are presented and compared. Results indicate that the predictors sex/title, fare price, age, and passenger class are the most important variables in predicting survival of the passengers.
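A minimal sketch of the kind of workflow described above, written with scikit-learn rather than the report's own implementation; the column names follow the public Kaggle Titanic training file, and the preprocessing, pruning constant, and ensemble sizes are illustrative assumptions.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Kaggle train.csv columns include Survived, Pclass, Sex, Age, Fare.
df = pd.read_csv("train.csv")
df["Age"] = df["Age"].fillna(df["Age"].median())        # simple imputation (illustrative)
X = pd.get_dummies(df[["Pclass", "Sex", "Age", "Fare"]], drop_first=True)
y = df["Survived"]

# Unpruned CART: greedy splits, prone to overfitting the training data.
tree = DecisionTreeClassifier(random_state=0)
# Cost-complexity pruning: ccp_alpha > 0 trades tree size against fit.
pruned = DecisionTreeClassifier(ccp_alpha=0.005, random_state=0)
# Ensembles intended to stabilize the greedy tree (bagging's default base
# learner is a decision tree).
bagged = BaggingClassifier(n_estimators=200, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

for name, model in [("tree", tree), ("pruned", pruned), ("bagging", bagged), ("forest", forest)]:
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name:8s} 10-fold CV accuracy: {acc:.3f}")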
|
44 |
On goodness-of-fit of logistic regression model. Liu, Ying. January 1900 (has links)
Doctor of Philosophy / Department of Statistics / Shie-Shien Yang / The logistic regression model is a member of the family of generalized linear models and is widely used in many areas of scientific research. The logit link function and the binary dependent variable of interest make the logistic regression model distinct from the linear regression model. The conclusions drawn from a fitted logistic regression model can be incorrect or misleading when the covariates cannot explain and/or predict the response variable accurately under the fitted model; that is, when lack-of-fit is present in the fitted logistic regression model. The current goodness-of-fit tests can be roughly categorized into four types. (1) Tests based on covariate patterns, e.g., Pearson's chi-square test, the deviance D test, and Osius and Rojek's normal approximation test. (2) Hosmer-Lemeshow's C and Hosmer-Lemeshow's H tests, which are based on the estimated probabilities. (3) Score tests based on the comparison of two models, where the assumed logistic regression model is embedded into a more general parametric family of models, e.g., Stukel's score test and Tsiatis's test. (4) Smoothed residual tests, including le Cessie and van Houwelingen's test and Hosmer and Lemeshow's test. All of them have advantages and disadvantages. In this dissertation, we propose a partition logistic regression model which can be viewed as a generalized logistic regression model, since it includes the logistic regression model as a special case. This partition model is used to construct a goodness-of-fit test for a logistic regression model; the test can also identify whether the lack-of-fit is due to the tail or middle part of the range of the probabilities of success. Several simulation results showed that the proposed test performs as well as or better than many of the known tests.
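As one concrete example from the survey above, a sketch of the Hosmer-Lemeshow C statistic follows (observations grouped by deciles of fitted probability); the simulated data and the group count are illustrative assumptions, and this is the textbook form of the test, not the dissertation's proposed partition-model test.

import numpy as np
from scipy import stats
import statsmodels.api as sm

def hosmer_lemeshow_C(y, p_hat, g=10):
    """Hosmer-Lemeshow C: group observations by deciles of fitted probability."""
    order = np.argsort(p_hat)
    C = 0.0
    for idx in np.array_split(order, g):
        n_k = len(idx)
        obs = y[idx].sum()          # observed successes in group k
        exp = p_hat[idx].sum()      # expected successes in group k
        pbar = exp / n_k            # mean fitted probability in group k
        C += (obs - exp) ** 2 / (n_k * pbar * (1.0 - pbar))
    # Reference distribution: chi-square with g - 2 degrees of freedom.
    return C, stats.chi2.sf(C, df=g - 2)

# Illustrative data simulated so that the fitted model is correctly specified.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))
fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
C, pval = hosmer_lemeshow_C(y, fit.predict())
print(f"H-L C = {C:.2f}, p-value = {pval:.3f}")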
|
45 |
Off-line quality control by robust parameter design. Min, Jun Young. January 1900 (has links)
Master of Science / Department of Statistics / Shie-Shien Yang / There has been considerable debate over robust parameter design, and as a result many approaches suited to it have been presented. In my report, I illustrate and present Taguchi's robust parameter design, the response surface approach, and the semi-parametric approach.
Considerable attention is given to the semi-parametric approach, a relatively new technique introduced by Pickle, Robinson, Birch and Anderson-Cook (2006). The method combines parametric and nonparametric techniques to improve the estimates of both the mean and the variance of the response.
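For context, one common way these ideas are written down is the dual response surface formulation sketched below; the mixing-weight form in the last display is the general model-robust regression idea and is stated here as an assumption about the flavor of the semi-parametric approach, not as the report's exact estimator.

% Dual response surface view of robust parameter design: fit separate models
% for the process mean and process variance over the control factors x, then
% keep the mean on target T while making the variance small.
\[
  \hat{\mu}(x), \ \hat{\sigma}^2(x) \ \text{fitted separately}, \qquad
  \min_{x}\; \hat{\sigma}^2(x) \ \ \text{subject to} \ \ \hat{\mu}(x) = T .
\]
% A semi-parametric (model-robust) fit blends a parametric and a nonparametric
% estimate with a mixing weight \lambda \in [0, 1]:
\[
  \hat{y}_{\mathrm{SP}}(x) \;=\; \lambda\, \hat{y}_{\mathrm{nonpar}}(x)
      \;+\; (1 - \lambda)\, \hat{y}_{\mathrm{par}}(x).
\]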
|
46 |
Data analysis for quantitative determinations of polar lipid molecular species. Song, Tingting. January 1900 (has links)
Master of Science / Department of Statistics / Gary L. Gadbury / This report presents an analysis of data resulting from a lipidomics experiment. The experiment sought to determine the changes in the lipidome of big bluestem prairie grass when exposed to stressors. The two stressors were drought (versus a watered condition) and a rust infection (versus no infection), and were whole-plot treatments arranged in a 2 by 2 factorial. A split-plot treatment factor was the position on a sampled leaf (top half versus bottom half). In addition, samples were analyzed at different times, representing a blocking factor. A total of 110 samples were used and, for each sample, concentrations of 137 lipids were obtained. Many lipids were not detected for certain samples and, in some cases, a lipid was not detected in most samples. Thus, each lipid was analyzed separately using a modeling strategy that involved a combination of mixed effects linear models and a categorical analysis technique, with the latter used for certain lipids to determine whether a pattern of observed zeros was associated with the treatment condition(s). In addition, p-values from tests of fixed effects in a mixed effects model were computed three different ways and compared. Results in general show that the drought condition has the greatest effect on the concentrations of certain lipids, followed by the effect of position on the leaf; the rust condition had the least effect on lipid concentrations.
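A sketch of the kind of per-lipid mixed model described above, written with statsmodels; the file name, column names, lipid label, and random-effects structure are assumptions made for illustration and do not reproduce the report's exact split-plot specification.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format layout: one row per sample for a single lipid, with
# factor columns drought, rust, position and a block column (analysis time).
data = pd.read_csv("lipid_long.csv")            # assumed file name
one_lipid = data[data["lipid"] == "PC(34:2)"]   # assumed lipid label

# Mixed effects linear model for one lipid: fixed whole-plot and split-plot
# effects with their interactions, random intercept for the blocking factor.
model = smf.mixedlm(
    "concentration ~ drought * rust * position",
    data=one_lipid,
    groups=one_lipid["block"],
)
fit = model.fit()
print(fit.summary())   # Wald z-tests for fixed effects (one of several p-value routes)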
|
47 |
Classification of image pixels based on minimum distance and hypothesis testing. Ghimire, Santosh. January 1900 (has links)
Master of Science / Department of Statistics / Haiyan Wang / We introduce a new classification method that is applicable to the classification of image pixels. This work was motivated by the test-based classification (TBC) introduced by Liao and Akritas (2007). We found that direct application of TBC to image pixel classification can lead to a high misclassification rate. We propose a method that combines the minimum distance and evidence from hypothesis testing to classify image pixels. The method is implemented in the R programming language. Our method eliminates the drawback of Liao and Akritas (2007). Extensive experiments show that our modified method works better in the classification of image pixels in comparison with some standard methods of classification, namely Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Classification Tree (CT), Polyclass classification, and TBC. We demonstrate that our method works well in the case of both grayscale and color images.
|
48 |
Robustness of normal theory inference when random effects are not normally distributed. Devamitta Perera, Muditha Virangika. January 1900 (has links)
Master of Science / Department of Statistics / Paul I. Nelson / The variance of a response in a one-way random effects model can be expressed as the sum of the variability among and within treatment levels. Conventional methods of statistical analysis for these models are based on the assumption of normality of both sources of variation. Since this assumption is not always satisfied and can be difficult to check, it is important to explore the performance of normal-based inference when normality does not hold. This report uses simulation to explore and assess the robustness of the F-test for the presence of an among-treatment variance component and of the normal theory confidence interval for the intra-class correlation coefficient under several non-normal distributions. It was found that the power function of the F-test is robust for moderately heavy-tailed random error distributions. But for very heavy-tailed random error distributions, power is relatively low, even for a large number of treatments. Coverage rates of the confidence interval for the intra-class correlation coefficient are far from nominal for very heavy-tailed, non-normal random effect distributions.
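For reference, the standard one-way random effects model and the intra-class correlation it induces are written out below; this is the textbook normal-theory formulation whose random terms the report's simulations replace with non-normal distributions.

% One-way random effects model: a treatment levels, n_i responses at level i.
\[
  y_{ij} = \mu + a_i + e_{ij}, \qquad
  a_i \sim (0, \sigma_a^2), \quad e_{ij} \sim (0, \sigma_e^2) \ \text{independent},
\]
% so that Var(y_{ij}) = \sigma_a^2 + \sigma_e^2 and the intra-class correlation is
\[
  \rho = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}.
\]
% Under normality, H_0 : \sigma_a^2 = 0 is tested with F = MS_{among} / MS_{within},
% and a confidence interval for \rho is constructed from that F ratio.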
|
49 |
Ordinary least squares regression of ordered categorical data: inferential implications for practice. Larrabee, Beth R. January 1900 (has links)
Master of Science / Department of Statistics / Nora Bello / Ordered categorical responses are frequently encountered in many disciplines. Examples of interest in agriculture include quality assessments, such as for soil or food products, and evaluation of lesion severity, such as teat-end status in dairy cattle. Ordered categorical responses are characterized by multiple categories or levels recorded on a ranked scale that, while conveying relative order, is not informative of the magnitude of, or proportionality between, levels. A number of statistically sound models for ordered categorical responses have been proposed, such as logistic regression and probit models, but these are commonly underutilized in practice. Instead, the ordinary least squares linear regression model is often employed with ordered categorical responses despite violation of basic model assumptions. In this study, the inferential implications of this approach are investigated using a simulation study that evaluates robustness based on realized Type I error rate and statistical power. The design of the simulation study is motivated by applied research cases reported in the literature. A variety of plausible scenarios were considered for simulation, including various shapes of the frequency distribution and different numbers of categories of the ordered categorical response. Using a real dataset on frequency of antimicrobial use in feedlots, I demonstrate the inferential performance of ordinary least squares linear regression on ordered categorical responses relative to a probit model.
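A compact sketch of the simulation idea: ordered categories are generated by thresholding a latent normal variable, and the realized Type I error of the OLS slope test is tallied under a true null. The thresholds, sample size, and replication count are illustrative assumptions, not the study's settings.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, alpha = 60, 5000, 0.05
cuts = np.array([-0.8, 0.0, 1.0])   # assumed thresholds -> 4 ordered categories
rejections = 0

for _ in range(reps):
    x = rng.normal(size=n)                    # predictor
    latent = rng.normal(size=n)               # null: latent response unrelated to x
    y = np.searchsorted(cuts, latent) + 1.0   # ordered categories coded 1..4
    # OLS slope t-test treating the ordered categories as numeric
    slope, intercept, r, p_value, se = stats.linregress(x, y)
    rejections += (p_value < alpha)

print(f"realized Type I error of OLS slope test: {rejections / reps:.3f}")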
|
50 |
Parameter estimation of the Black-Scholes-Merton model. Teka, Kubrom Hisho. January 1900 (has links)
Master of Science / Department of Statistics / James Neill / In financial mathematics, asset prices for European options are often modeled according to the Black-Scholes-Merton (BSM) model, a stochastic differential equation (SDE) depending on unknown parameters. A derivation of the solution to this SDE is reviewed, resulting in a stochastic process called geometric Brownian motion (GBM) which depends on two unknown real parameters referred to as the drift and volatility. For additional insight, the BSM equation is expressed as a heat equation, which is a partial differential equation (PDE) with well-known properties. For American options, it is established that asset value can be characterized as the solution to an obstacle problem, which is an example of a free boundary PDE problem. One approach for estimating the parameters in the GBM solution to the BSM model can be based on the method of maximum likelihood. This approach is discussed and applied to a dataset involving the weekly closing prices for the Dow Jones Industrial Average between January 2012 and December 2012.
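To make the estimation step concrete, the GBM solution and the maximum likelihood estimators it implies for equally spaced log returns are sketched below in standard notation; the thesis's exact parameterization may differ.

% Black-Scholes-Merton asset dynamics and the geometric Brownian motion solution:
\[
  dS_t = \mu S_t\, dt + \sigma S_t\, dW_t
  \quad\Longrightarrow\quad
  S_t = S_0 \exp\!\Big( \big(\mu - \tfrac{1}{2}\sigma^2\big) t + \sigma W_t \Big).
\]
% Log returns over observations spaced \Delta apart (e.g., one week) are i.i.d.
% normal, which yields closed-form maximum likelihood estimators:
\[
  r_i = \ln \frac{S_{i\Delta}}{S_{(i-1)\Delta}}
      \sim N\!\big( (\mu - \tfrac{1}{2}\sigma^2)\Delta,\; \sigma^2 \Delta \big),
  \qquad
  \hat{\sigma}^2 = \frac{1}{\Delta}\cdot\frac{1}{n}\sum_{i=1}^{n} (r_i - \bar{r})^2,
  \qquad
  \hat{\mu} = \frac{\bar{r}}{\Delta} + \tfrac{1}{2}\hat{\sigma}^2 .
\]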
|