41

New methods for analysis of epidemiological data using capture-recapture methods

Huakau, John Tupou January 2002 (has links)
Capture-recapture methods have their origins in animal abundance estimation, where they were used to estimate the unknown size of the animal population under study. In the late 1940s, and again in the late 1960s and early 1970s, these same capture-recapture methods were modified and applied to epidemiological list data. Since then, through their continued use, in particular in the 1990s, these methods have become popular for estimating the completeness of disease registries and the unknown total size of human disease populations. In this thesis we investigate new methods for the analysis of epidemiological list data using capture-recapture methods. In particular, we compare two standard methods used to estimate the unknown total population size, and examine new methods which incorporate list mismatch errors and model-selection uncertainty into the estimation of the unknown total population size and its associated confidence interval. We study the use of modified tag loss methods from animal abundance estimation to allow for list mismatch errors in the epidemiological list data. We also explore the use of a weighted average method, bootstrap methods, and a Bayesian model averaging method for incorporating model-selection uncertainty into the estimate of the unknown total population size and its associated confidence interval. In addition, we use two previously unanalysed Diabetes studies to illustrate the methods examined and a well-known Spina Bifida Study for simulation purposes. This thesis finds that ignoring list mismatch errors leads to biased estimates of the unknown total population size and that the list mismatch methods considered here provide a useful adjustment, which also approximately agrees with the results obtained using a complex matching algorithm. As for the incorporation of model-selection uncertainty, we find that confidence intervals which incorporate model-selection uncertainty are wider and more appropriate than confidence intervals that do not. Hence we recommend the use of tag loss methods to adjust for list mismatch errors and the use of methods that incorporate model-selection uncertainty into both point and interval estimates of the unknown total population size. / Subscription resource available via Digital Dissertations only.
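In the two-list case that underlies these comparisons, the classical Lincoln-Petersen estimator (here with Chapman's small-sample correction) gives the point estimate of the unknown total population size. The sketch below uses hypothetical list counts, not the diabetes or spina bifida data; the Wald-type interval it prints is exactly the kind of interval that ignores list mismatch and model-selection uncertainty.

```python
import math

def chapman_estimate(n1, n2, m):
    """Two-list capture-recapture estimate with Chapman's correction.

    n1, n2 : number of cases appearing on list 1 and list 2
    m      : number of cases matched on both lists
    Returns the estimated total population size and its estimated variance.
    """
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var_hat = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))
    return n_hat, var_hat

# Hypothetical counts: 450 cases on a hospital list, 300 on a registry list, 210 matched.
n_hat, var_hat = chapman_estimate(450, 300, 210)
se = math.sqrt(var_hat)
print(f"estimated total: {n_hat:.0f}, 95% Wald CI: ({n_hat - 1.96 * se:.0f}, {n_hat + 1.96 * se:.0f})")
```

A record that fails to match lowers m and inflates the estimate, which is the kind of bias the tag loss adjustment is intended to correct.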
43

Using statistical learning to predict survival of passengers on the RMS Titanic

Whitley, Michael Aaron January 1900 (has links)
Master of Science / Statistics / Christopher Vahl / When exploring data, predictive analytics techniques have proven to be effective. In this report, the efficiency of several predictive analytics methods is explored. During the time of this study, Kaggle.com, a data science competition website, hosted the predictive modeling competition "Titanic: Machine Learning from Disaster". This competition posed a classification problem: build a predictive model for the survival of passengers on the RMS Titanic. The focus of our approach was on applying a traditional classification and regression tree algorithm. The algorithm is greedy and can overfit the training data, which consequently can yield non-optimal prediction accuracy. In an effort to correct such issues with the classification and regression tree algorithm, we implemented cost complexity pruning and ensemble methods such as bagging and random forests. However, no improvement was observed, which may be an artifact associated with the Titanic data and may not be representative of those methods' performances. The decision trees and prediction accuracy of each method are presented and compared. Results indicate that the predictors sex/title, fare price, age, and passenger class are the most important variables in predicting survival of the passengers.
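The general workflow described here (a single CART tree with cost-complexity pruning, compared against bagging and a random forest) can be sketched with scikit-learn as below. The file name, column names, and preprocessing are illustrative assumptions and not the report's actual pipeline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

# Hypothetical preprocessed Titanic data with the predictors named in the abstract.
df = pd.read_csv("titanic_train.csv")  # assumed columns: survived, sex, fare, age, pclass
X = pd.get_dummies(df[["sex", "fare", "age", "pclass"]], drop_first=True)
X = X.fillna(X.median())               # simple imputation for missing ages/fares
y = df["survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Single greedy CART tree; a positive ccp_alpha applies cost-complexity pruning.
tree = DecisionTreeClassifier(ccp_alpha=0.002, random_state=0).fit(X_train, y_train)

# Ensemble methods compared in the report.
bag = BaggingClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

for name, model in [("pruned tree", tree), ("bagging", bag), ("random forest", forest)]:
    print(name, round(model.score(X_test, y_test), 3))
```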
44

On goodness-of-fit of logistic regression model

Liu, Ying January 1900 (has links)
Doctor of Philosophy / Department of Statistics / Shie-Shien Yang / The logistic regression model belongs to the family of generalized linear models and is widely used in many areas of scientific research. The logit link function and the binary dependent variable of interest make the logistic regression model distinct from the linear regression model. The conclusions drawn from a fitted logistic regression model can be incorrect or misleading when the covariates cannot explain and/or predict the response variable accurately based on the fitted model; that is, when lack-of-fit is present in the fitted logistic regression model. The current goodness-of-fit tests can be roughly categorized into four types. (1) Tests based on covariate patterns, e.g., Pearson's chi-square test, the deviance D test, and Osius and Rojek's normal approximation test. (2) Hosmer-Lemeshow's C and Hosmer-Lemeshow's H tests, which are based on the estimated probabilities. (3) Score tests based on the comparison of two models, where the assumed logistic regression model is embedded into a more general parametric family of models, e.g., Stukel's score test and Tsiatis's test. (4) Smoothed residual tests, including le Cessie and van Houwelingen's test and Hosmer and Lemeshow's test. All of them have advantages and disadvantages. In this dissertation, we propose a partition logistic regression model which can be viewed as a generalized logistic regression model, since it includes the logistic regression model as a special case. This partition model is used to construct a goodness-of-fit test for a logistic regression model which can also identify whether the lack-of-fit is due to the tail or middle part of the probabilities of success. Several simulation results showed that the proposed test performs as well as or better than many of the known tests.
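As an example of the second type of test listed above, a minimal sketch of the Hosmer-Lemeshow C statistic (based on g groups of estimated probabilities, conventionally deciles of risk) might look like the following; the fitted probabilities are assumed to come from any standard logistic regression routine.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, g=10):
    """Hosmer-Lemeshow C statistic from binary outcomes y and fitted probabilities p_hat."""
    order = np.argsort(p_hat)
    y, p_hat = np.asarray(y)[order], np.asarray(p_hat)[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(y)), g):   # g groups by sorted fitted probability
        obs = y[idx].sum()                             # observed successes in the group
        exp = p_hat[idx].sum()                         # expected successes in the group
        n_k = len(idx)
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n_k))
    return stat, chi2.sf(stat, g - 2)                  # large statistic suggests lack-of-fit
```

A small p-value signals lack-of-fit, but such a test does not by itself indicate whether the misfit occurs in the tails or the middle of the probability scale, which is the gap the proposed partition model addresses.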
45

Off-line quality control by robust parameter design

Min, Jun Young January 1900 (has links)
Master of Science / Department of Statistics / Shie-Shien Yang / There has been considerable debate over robust parameter design, and as a result many approaches suited to robust parameter design have been presented. In this report, I illustrate and present Taguchi's robust parameter design, the response surface approach, and the semi-parametric design. Considerable attention is placed on the semi-parametric design. This approach is a new technique introduced by Picke, Robinson, Birch and Anderson-Cook (2006); it combines parametric and nonparametric methods to improve the estimates of both the mean and the variance of the response.
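One common parametric version of the response surface approach mentioned above fits separate models for the process mean and the log of the process variance over the control factors. A minimal sketch of that dual-model idea follows, with made-up factor names and simulated replicates; it is not the combined parametric/nonparametric estimator cited in the report.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical replicated experiment over two control factors at three levels each.
rng = np.random.default_rng(1)
rows = []
for x1 in (-1, 0, 1):
    for x2 in (-1, 0, 1):
        for _ in range(5):  # 5 replicates per control-factor setting
            # Assumed true process: the mean depends on x1, the noise level on x2.
            rows.append({"x1": x1, "x2": x2,
                         "y": 10 + 2 * x1 + rng.normal(0, np.exp(0.5 * x2))})
df = pd.DataFrame(rows)

# Per-setting sample mean and log sample variance (the two "responses").
summ = df.groupby(["x1", "x2"])["y"].agg(ybar="mean", s2="var").reset_index()
summ["log_s2"] = np.log(summ["s2"])

# Parametric dual response surface: one model for the mean, one for the log variance.
mean_fit = smf.ols("ybar ~ x1 + x2 + x1:x2", data=summ).fit()
var_fit = smf.ols("log_s2 ~ x1 + x2", data=summ).fit()
print(mean_fit.params, var_fit.params, sep="\n\n")
```

Roughly, the semi-parametric approach replaces one or both of these parametric fits with nonparametric smoothers when the assumed model forms are in doubt.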
46

Data analysis for quantitative determinations of polar lipid molecular species

Song, Tingting January 1900 (has links)
Master of Science / Department of Statistics / Gary L. Gadbury / This report presents an analysis of data resulting from a lipidomics experiment. The experiment sought to determine the changes in the lipidome of big bluestem prairie grass when exposed to stressors. The two stressors were drought (versus a watered condition) and a rust infection (versus no infection), and were whole-plot treatments arranged in a 2 by 2 factorial. The split-plot treatment factor was the position on a sampled leaf (top half versus bottom half). In addition, samples were analyzed at different times, representing a blocking factor. A total of 110 samples were used and, for each sample, concentrations of 137 lipids were obtained. Many lipids were not detected for certain samples and, in some cases, a lipid was not detected in most samples. Thus, each lipid was analyzed separately using a modeling strategy that involved a combination of mixed effects linear models and a categorical analysis technique, with the latter used for certain lipids to determine if a pattern of observed zeros was associated with the treatment condition(s). In addition, p-values from tests of fixed effects in a mixed effects model were computed three different ways and compared. Results in general show that the drought condition has the greatest effect on the concentrations of certain lipids, followed by the effect of position on the leaf. The rust condition had the least effect on lipid concentrations.
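For a single lipid, the mixed effects part of this modeling strategy can be sketched with statsmodels as below. The file name, column names, and the choice of plant as the grouping (whole-plot error) term are illustrative assumptions that simplify the full split-plot analysis used in the report.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data for one lipid: concentration plus the design factors.
df = pd.read_csv("lipid_long.csv")  # assumed columns: conc, drought, rust, position, block, plant
dat = df.dropna(subset=["conc"])    # samples in which this lipid was not detected are excluded here

# Whole-plot factors (drought, rust), split-plot factor (leaf position), run time as a
# fixed block, and plant as a random effect standing in for the whole-plot error term.
model = smf.mixedlm("conc ~ drought * rust * position + C(block)", data=dat, groups="plant")
fit = model.fit()
print(fit.summary())
```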
47

Classification of image pixels based on minimum distance and hypothesis testing

Ghimire, Santosh January 1900 (has links)
Master of Science / Department of Statistics / Haiyan Wang / We introduce a new classification method that is applicable to the classification of image pixels. This work was motivated by the test-based classification (TBC) introduced by Liao and Akritas (2007). We found that direct application of TBC to image pixel classification can lead to a high misclassification rate. We propose a method that combines minimum distance and evidence from hypothesis testing to classify image pixels. The method is implemented in the R programming language. Our method eliminates the drawback of Liao and Akritas (2007). Extensive experiments show that our modified method works better for the classification of image pixels in comparison with some standard methods of classification, namely Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Classification Trees (CT), Polyclass classification, and TBC. We demonstrate that our method works well for both grayscale and color images.
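Two of the standard classifiers used for comparison above (LDA and QDA) can be applied to pixel feature vectors with scikit-learn as in the sketch below. The synthetic RGB features are an assumption for illustration, and the proposed minimum-distance/hypothesis-testing method itself (implemented by the authors in R) is not reproduced here.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score

# Hypothetical training pixels: each row is an (R, G, B) value, each label a region class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([200, 60, 60], 15, size=(300, 3)),    # pixels from a reddish region
               rng.normal([60, 60, 200], 15, size=(300, 3))])   # pixels from a bluish region
y = np.repeat([0, 1], 300)

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis())]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy {acc:.3f}")
```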
48

Robustness of normal theory inference when random effects are not normally distributed

Devamitta Perera, Muditha Virangika January 1900 (has links)
Master of Science / Department of Statistics / Paul I. Nelson / The variance of a response in a one-way random effects model can be expressed as the sum of the variability among and within treatment levels. Conventional methods of statistical analysis for these models are based on the assumption of normality of both sources of variation. Since this assumption is not always satisfied and can be difficult to check, it is important to explore the performance of normal-based inference when normality does not hold. This report uses simulation to explore and assess the robustness of the F-test for the presence of an among-treatment variance component and of the normal theory confidence interval for the intra-class correlation coefficient under several non-normal distributions. It was found that the power function of the F-test is robust for moderately heavy-tailed random error distributions. But for very heavy-tailed random error distributions, power is relatively low, even for a large number of treatments. Coverage rates of the confidence interval for the intra-class correlation coefficient are far from nominal for very heavy-tailed, non-normal random effect distributions.
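The core of such a simulation can be outlined in a few lines: generate one-way random effects data with a non-normal within-treatment error, apply the usual F-test, and record the rejection rate. The sketch below uses a t distribution with 3 degrees of freedom as one heavy-tailed case; the numbers of treatments, replicates, and the variance component are illustrative assumptions, not the settings of the report.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a, n, sims = 10, 5, 2000      # treatments, replicates per treatment, simulated datasets
sigma_a = 0.5                 # among-treatment standard deviation (set to 0 for Type I error)

rejections = 0
for _ in range(sims):
    effects = rng.normal(0, sigma_a, size=a)        # normal random treatment effects
    errors = rng.standard_t(df=3, size=(a, n))      # heavy-tailed within-treatment errors
    y = effects[:, None] + errors
    _, p = stats.f_oneway(*y)                       # one-way ANOVA F-test across the a groups
    rejections += p < 0.05

print(f"estimated rejection rate of the F-test: {rejections / sims:.3f}")
```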
49

Ordinary least squares regression of ordered categorical data: inferential implications for practice

Larrabee, Beth R. January 1900 (has links)
Master of Science / Department of Statistics / Nora Bello / Ordered categorical responses are frequently encountered in many disciplines. Examples of interest in agriculture include quality assessments, such as for soil or food products, and evaluation of lesion severity, such as teat-end status in dairy cattle. Ordered categorical responses are characterized by multiple categories or levels recorded on a ranked scale that, while conveying relative order, is not informative of the magnitude of or proportionality between levels. A number of statistically sound models for ordered categorical responses have been proposed, such as logistic regression and probit models, but these are commonly underutilized in practice. Instead, the ordinary least squares linear regression model is often employed with ordered categorical responses despite violation of basic model assumptions. In this study, the inferential implications of this approach are investigated using a simulation study that evaluates robustness based on realized Type I error rate and statistical power. The design of the simulation study is motivated by applied research cases reported in the literature. A variety of plausible scenarios were considered for simulation, including various shapes of the frequency distribution and different numbers of categories of the ordered categorical response. Using a real dataset on frequency of antimicrobial use in feedlots, I demonstrate the inferential performance of ordinary least squares linear regression on ordered categorical responses relative to a probit model.
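A single run of the kind of comparison described above might look like the sketch below: ordinal responses are generated from a latent-variable model, and both ordinary least squares (treating the category codes as numeric) and an ordered probit model are fit. The cut points, sample size, and effect size are illustrative assumptions and do not reproduce the report's scenarios or the feedlot data.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
latent = 0.5 * x + rng.normal(size=n)                 # latent continuous response
y_ord = np.digitize(latent, bins=[-0.8, 0.0, 0.8])    # four ordered categories coded 0..3

# Ordinary least squares on the category codes, as commonly (mis)applied in practice.
ols_fit = sm.OLS(y_ord, sm.add_constant(x)).fit()

# Ordered probit model, which respects the ranked, non-metric nature of the response.
probit_fit = OrderedModel(y_ord, x[:, None], distr="probit").fit(method="bfgs", disp=False)

print("OLS slope p-value:   ", ols_fit.pvalues[1])
print("Probit slope p-value:", probit_fit.pvalues[0])
```

Repeating such runs many times, with the slope set to zero or to a nonzero value, gives the realized Type I error rate and power on which the comparison rests.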
50

Parameter estimation of the Black-Scholes-Merton model

Teka, Kubrom Hisho January 1900 (has links)
Master of Science / Department of Statistics / James Neill / In financial mathematics, asset prices for European options are often modeled according to the Black-Scholes-Merton (BSM) model, a stochastic differential equation (SDE) depending on unknown parameters. A derivation of the solution to this SDE is reviewed, resulting in a stochastic process called geometric Brownian motion (GBM) which depends on two unknown real parameters referred to as the drift and volatility. For additional insight, the BSM equation is expressed as a heat equation, which is a partial differential equation (PDE) with well-known properties. For American options, it is established that asset value can be characterized as the solution to an obstacle problem, which is an example of a free boundary PDE problem. One approach for estimating the parameters in the GBM solution to the BSM model can be based on the method of maximum likelihood. This approach is discussed and applied to a dataset involving the weekly closing prices for the Dow Jones Industrial Average between January 2012 and December 2012.
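Under the GBM solution described above, the log returns of equally spaced prices are independent normal variables, so the maximum likelihood estimates of the drift and volatility have a simple closed form. The sketch below applies them to a few hypothetical weekly closing prices, not to the Dow Jones series analysed in the report.

```python
import numpy as np

def gbm_mle(prices, dt):
    """MLEs of GBM drift (mu) and volatility (sigma) from equally spaced prices.

    Log returns under GBM are i.i.d. N((mu - sigma^2/2) * dt, sigma^2 * dt),
    so the estimates follow from the sample mean and (MLE) variance of the returns.
    """
    r = np.diff(np.log(prices))           # log returns
    sigma2 = r.var(ddof=0) / dt           # MLE of sigma^2 (divides by the number of returns)
    mu = r.mean() / dt + sigma2 / 2.0     # MLE of the drift
    return mu, np.sqrt(sigma2)

# Hypothetical weekly closing prices; dt = 1/52 year between observations.
prices = np.array([100.0, 101.2, 99.8, 102.5, 103.1, 104.0, 102.7, 105.3])
mu_hat, sigma_hat = gbm_mle(prices, dt=1 / 52)
print(f"drift ~ {mu_hat:.3f} per year, volatility ~ {sigma_hat:.3f} per sqrt(year)")
```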
