21

On Some Ridge Regression Estimators for Logistic Regression Models

Williams, Ulyana P 28 March 2018 (has links)
The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As performance criteria, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study has been executed to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of independent variables, and log odds ratio have been varied in the design of the experiment. Simulation results show that under certain conditions, the ridge regression estimators outperform the maximum likelihood estimator. Moreover, an empirical data analysis supports the main findings of this study. This thesis proposes and recommends some good ridge regression estimators of the logistic regression model for practitioners in the fields of health, physical and social sciences.
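As an illustration of the kind of comparison described above, the following sketch contrasts a ridge-penalized logistic fit with an (approximate) maximum likelihood fit on correlated predictors and reports the Monte Carlo MSE of the coefficients. The penalty value, sample size, and true coefficients are arbitrary illustrative choices, not the estimators or design settings studied in the thesis.

```python
# Minimal sketch: compare a ridge-penalized logistic fit with an approximate
# MLE under correlated predictors via the Monte Carlo MSE of the coefficients.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n, p, reps = 100, 4, 200
beta = np.array([1.0, -1.0, 0.5, 0.5])          # illustrative true coefficients
rho = 0.9                                       # moderate-to-high correlation
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)

mse_mle, mse_ridge = 0.0, 0.0
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
    # A very weak penalty (huge C) approximates the maximum likelihood fit.
    mle = LogisticRegression(C=1e10, max_iter=5000).fit(X, y)
    # An arbitrary illustrative ridge penalty, not one of the thesis estimators.
    ridge = LogisticRegression(C=1.0, max_iter=5000).fit(X, y)
    mse_mle += np.mean((mle.coef_.ravel() - beta) ** 2) / reps
    mse_ridge += np.mean((ridge.coef_.ravel() - beta) ** 2) / reps

print(f"Approximate MLE coefficient MSE: {mse_mle:.3f}")
print(f"Ridge coefficient MSE:           {mse_ridge:.3f}")
```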
22

An investigation of the methods for estimating usual dietary intake distributions : a thesis presented in partial fulfillment of the requirements for the degree of Master of Applied Statistics at Massey University, Albany, New Zealand

Stoyanov, Stefan Kremenov January 2008 (has links)
The estimation of the distribution of usual intake of nutrients is important for developing nutrition policies as well as for etiological research and educational purposes. In most nutrition surveys only a small number of repeated intake observations per individual are collected. Of main interest is the long-term usual intake, which is defined as the long-term daily average intake of a dietary component. However, dietary intake on a single day is a poor estimate of an individual’s long-term usual intake. Furthermore, the distribution of individual intake means is also a poor estimator of the distribution of usual intake, since within-individual variability in dietary intake data is usually large relative to between-individual variability. Hence, the variance of the mean intakes is larger than the variance of the usual intake distribution. Essentially, the estimation of the distribution of long-term intake is equivalent to the estimation of the distribution of a random variable observed with measurement error. Some of the methods for estimating distributions of usual dietary intake are reviewed in detail and applied to nutrient intake data in order to evaluate their properties. The results indicate that there are a number of robust methods which could be used to derive the distribution of long-term dietary intake. The methods share a common framework but differ in terms of complexity and assumptions about the properties of the dietary consumption data. Hence, the choice of the most appropriate method depends on the specific characteristics of the data and the research purposes, as well as on the availability of analytical tools and statistical expertise.
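A toy sketch of the common framework described above, assuming a simple two-day design: the within-person (day-to-day) variance is removed from the variance of the individual means by shrinking them toward the grand mean. Established methods add transformations to normality and more careful variance estimation; this only illustrates the variance-decomposition step.

```python
# Toy sketch: estimate a usual-intake distribution by shrinking individual
# means toward the grand mean so that roughly the between-person variance
# remains. All numbers are simulated and purely illustrative.
import numpy as np

rng = np.random.default_rng(1)
n_people, n_days = 500, 2
usual = rng.normal(70, 10, n_people)                             # true long-term intakes
daily = usual[:, None] + rng.normal(0, 25, (n_people, n_days))   # noisy daily reports

person_mean = daily.mean(axis=1)
within_var = daily.var(axis=1, ddof=1).mean()                    # day-to-day variance
between_var = max(person_mean.var(ddof=1) - within_var / n_days, 0.0)

# Shrink each person's mean toward the grand mean so the spread matches
# the estimated between-person variance.
shrink = between_var / (between_var + within_var / n_days)
usual_hat = daily.mean() + shrink ** 0.5 * (person_mean - daily.mean())

print("SD of individual means:   ", round(person_mean.std(), 1))
print("SD of estimated usual:    ", round(usual_hat.std(), 1))
print("SD of true usual intakes: ", round(usual.std(), 1))
```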
23

A New Screening Methodology for Mixture Experiments

Weese, Maria 01 May 2010 (has links)
Many materials we use in daily life are mixtures: plastics, gasoline, food, medicine, etc. Mixture experiments, where the factors are proportions of components and the response depends only on the relative proportions of the components, are an integral part of product development and improvement. However, when the number of components is large and there are complex constraints, experimentation can be a daunting task. We study screening methods in a mixture setting using the framework of the Cox mixture model [1]. We exploit the easy interpretation of the parameters in the Cox mixture model and develop methods for screening in a mixture setting. We present specific methods for adding a component, removing a component, and a general method for screening a subset of components in mixtures with complex constraints. The variances of our parameter estimates are comparable with those of the typically used Scheffé model, and our methods provide a reduced run size for screening experiments with mixtures containing a large number of components. We then further extend the new screening methods by using Evolutionary Operation (EVOP), developed by Box and Draper [2]. EVOP methods use small movements in a subset of process parameters, together with replication, to reveal effects out of the process noise. Mixture experiments inherently have small movements (since the proportions can only range from zero to one) and the effects have large variances. We update the EVOP methods by using sequential testing of effects, as opposed to the confidence interval method originally proposed by Box and Draper. We show that the sequential testing approach, compared with a fixed sample size, reduces the required sample size by as much as 50 percent with all other testing parameters held constant. We present two methods for adding a component and a general screening method using a graphical sequential t-test, and provide R code to reproduce the limits for the test.
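A hypothetical illustration of sequential testing of an effect, not the thesis's graphical sequential t-test or its limits: a one-sample t-test is rerun after each new replicate and stops as soon as a (loosely chosen) boundary is crossed, which is how a sequential scheme can need fewer runs than a fixed sample size.

```python
# Illustrative sketch of sequential testing of an effect estimate. The
# stopping boundary (a stringent fixed alpha) is an assumption for
# illustration only, not the graphical limits developed in the thesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_effect, noise_sd = 0.8, 1.0
alpha, max_n = 0.01, 30            # crude boundary and a cap on the run size

effects = []
for _ in range(max_n):
    effects.append(true_effect + rng.normal(0, noise_sd))   # one new replicate
    if len(effects) >= 3:
        t, p = stats.ttest_1samp(effects, popmean=0.0)
        if p < alpha:
            print(f"Declared active after {len(effects)} runs (p = {p:.4f})")
            break
else:
    print(f"No decision after {max_n} runs")
```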
24

Statistical Methodology for Sequence Analysis

Adhikari, Kaustubh 24 July 2012 (has links)
Rare disease variants have received increasing attention in the past few years as a potential cause of many complex diseases, after common disease variants failed to explain a large part of the missing heritability. With advances in sequencing techniques as well as computational capabilities, statistical methodology for analyzing rare variants is now a hot topic, especially in case-control association studies. In this thesis, we initially present two related statistical methodologies designed for case-control studies to predict the number of common and rare variants in a particular genomic region underlying the complex disease. Genome-wide association studies are nowadays routinely performed to identify a few putative marker loci or a candidate region for further analysis. These methods are designed to work with SNP data on such a genomic region highlighted by GWAS studies for potential disease variants. The fundamental idea is to use Bayesian methodology to obtain bivariate posterior distributions on the counts of common and rare variants. While the first method uses randomly generated (minimal) ancestral recombination graphs, the second method uses an ensemble clustering method to explore the space of genealogical trees that represent the inherent structure in the test subjects. In contrast to the aforesaid methods, which work with SNP data, the third chapter deals with next-generation sequencing data to detect the presence of rare variants in a genomic region. We present a non-parametric statistical methodology for rare variant association testing, using the well-known Kolmogorov-Smirnov framework adapted for genetic data. It is a fast, model-free, robust statistic, designed for situations where both deleterious and protective variants are present. It is also unique in utilizing the variant locations in the test statistic.
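A minimal sketch of the generic two-sample Kolmogorov-Smirnov comparison that the third chapter's statistic builds on, applied to hypothetical variant positions carried by cases versus controls; the thesis's adaptation for genetic data is not reproduced here.

```python
# Minimal sketch: compare the positions of rare variants carried by cases
# versus controls with a two-sample Kolmogorov-Smirnov test. All positions
# are simulated and illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
region_length = 10_000
# Hypothetical variant positions (in bp): cases cluster in a sub-region,
# mimicking a local enrichment of risk variants; controls are spread out.
case_positions = rng.uniform(3_000, 5_000, size=40)
control_positions = rng.uniform(0, region_length, size=60)

stat, p_value = ks_2samp(case_positions, control_positions)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.4g}")
```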
25

BAYESIAN SEMIPARAMETRIC GENERALIZATIONS OF LINEAR MODELS USING POLYA TREES

Schoergendorfer, Angela 01 January 2011 (has links)
In a Bayesian framework, prior distributions on a space of nonparametric continuous distributions may be defined using Polya trees. This dissertation addresses statistical problems for which the Polya tree idea can be utilized to provide efficient and practical methodological solutions. One problem considered is the estimation of risks, odds ratios, or other similar measures that are derived by specifying a threshold for an observed continuous variable. It has been previously shown that fitting a linear model to the continuous outcome under the assumption of a logistic error distribution leads to more efficient odds ratio estimates. We will show that deviations from the assumption of logistic error can result in substantial bias in odds ratio estimates. A one-step approximation to the Savage-Dickey ratio will be presented as a Bayesian test for distributional assumptions in the traditional logistic regression model. The approximation utilizes least-squares estimates in place of a full Bayesian Markov chain simulation, and the equivalence of inferences based on the two implementations will be shown. A framework for flexible, semiparametric estimation of risks in the case that the assumption of logistic error is rejected will be proposed. A second application deals with regression scenarios in which residuals are correlated and their distribution evolves over an ordinal covariate such as time. In the context of prediction, such complex error distributions need to be modeled carefully and flexibly. The proposed model introduces dependent, but separate, Polya tree priors for each time point, thus pooling information across time points to model gradual changes in distributional shapes. Theoretical properties of the proposed model will be outlined, and its potential predictive advantages in simulated scenarios and real data will be demonstrated.
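An illustrative sketch of the Savage-Dickey idea, assuming a normal prior and a normal approximation to the posterior built from least-squares estimates; this is not the dissertation's one-step approximation, only the density-ratio mechanics it rests on.

```python
# Illustrative sketch of a Savage-Dickey density ratio for a point null
# H0: slope = 0. The N(0, 10^2) prior and the normal (roughly flat-prior)
# posterior approximation from least-squares estimates are assumptions for
# illustration, not the dissertation's exact procedure.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)

# Least-squares slope and its standard error.
slope, intercept, r_value, p_value, se = stats.linregress(x, y)

prior_sd = 10.0
prior_at_0 = stats.norm.pdf(0.0, loc=0.0, scale=prior_sd)
post_at_0 = stats.norm.pdf(0.0, loc=slope, scale=se)   # approximate posterior

bf_01 = post_at_0 / prior_at_0    # evidence for the null relative to the prior
print(f"Approximate Savage-Dickey BF01 = {bf_01:.3g}")
```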
26

A Fault-Based Model of Fault Localization Techniques

Hays, Mark A 01 January 2014 (has links)
Every day, ordinary people depend on software working properly. We take it for granted; from banking software, to railroad switching software, to flight control software, to software that controls medical devices such as pacemakers or even gas pumps, our lives are touched by software that we expect to work. It is well known that the main technique used to ensure the quality of software is testing. Often it is the only quality assurance activity undertaken, making it that much more important. In a typical experiment studying these techniques, a researcher will intentionally seed a fault (deliberately breaking the functionality of some source code) in the hope that the automated techniques under study will be able to identify the fault's location in the source code. These faults are picked arbitrarily; there is potential for bias in the selection of the faults. Previous researchers have established an ontology for understanding and expressing this bias, called fault size. This research captures the fault size ontology in the form of a probabilistic model. The results of applying this model to measure fault size suggest that many faults generated through program mutation (the systematic replacement of source code operators to create faults) are very large and easily found. Secondary measures generated in the assessment of the model suggest a new static analysis method, called testability, for predicting the likelihood that code will contain a fault in the future. While software testing researchers are not statisticians, they nonetheless make extensive use of statistics in their experiments to assess fault localization techniques. Researchers often select their statistical techniques without justification. This is a worrisome situation because it can lead to incorrect conclusions about the significance of research. This research introduces an algorithm, MeansTest, which helps automate some aspects of the selection of appropriate statistical techniques. The results of an evaluation of MeansTest suggest that it performs well relative to its peers. This research then surveys recent work in software testing, using MeansTest to evaluate the significance of researchers' work. The results of the survey indicate that software testing researchers are underreporting the significance of their work.
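A hypothetical sketch of the kind of decision the abstract describes, namely choosing a statistical technique automatically from properties of the data; the rule below (a normality check followed by a t-test or a rank-sum test) is an assumption for illustration and is not the MeansTest algorithm.

```python
# Hypothetical sketch of automating the choice of a statistical technique:
# check normality of two samples, then pick a parametric or non-parametric
# comparison of their means. NOT the MeansTest algorithm.
import numpy as np
from scipy import stats

def compare_means(sample_a, sample_b, alpha=0.05):
    _, p_norm_a = stats.shapiro(sample_a)
    _, p_norm_b = stats.shapiro(sample_b)
    if p_norm_a > alpha and p_norm_b > alpha:
        name, (_, p) = "t-test", stats.ttest_ind(sample_a, sample_b)
    else:
        name, (_, p) = "Wilcoxon rank-sum", stats.ranksums(sample_a, sample_b)
    return name, p

rng = np.random.default_rng(11)
a = rng.exponential(1.0, 30)    # skewed scores, e.g. hypothetical suspiciousness values
b = rng.exponential(1.5, 30)
name, p = compare_means(a, b)
print(f"Chosen test: {name}, p-value = {p:.4f}")
```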
27

Statistical Learning with Artificial Neural Network Applied to Health and Environmental Data

Sharaf, Taysseer 01 January 2015 (has links)
The current study illustrates the utilization of artificial neural networks in statistical methodology, more specifically in survival analysis and time series analysis, both of which have important and wide use in many real-life applications. We start our discussion by utilizing artificial neural networks in survival analysis. In the literature there exist two important methodologies for utilizing artificial neural networks in survival analysis based on the discrete survival time method. We illustrate the idea of the discrete survival time method and show how one can estimate the discrete model using an artificial neural network. We present a comparison between the two methodologies and update one of them to estimate survival time under competing risks. To fit a model using an artificial neural network, one needs to take care of two parts: the first is the neural network architecture and the second is the learning algorithm. Usually neural networks are trained using a non-linear optimization algorithm such as a quasi-Newton algorithm. Other learning algorithms are based on Bayesian inference. In this study we present a new learning technique that mixes the two available methodologies for using Bayesian inference in the training of neural networks. We have performed our analysis using real-world data on patients diagnosed with skin cancer in the United States, drawn from the SEER database maintained under the supervision of the National Cancer Institute. The second part of this dissertation presents the utilization of artificial neural networks in time series analysis. We present a new method of training a recurrent artificial neural network with Hybrid Monte Carlo sampling and compare our findings with the popular autoregressive integrated moving average (ARIMA) model. We apply the comparison to monthly average carbon dioxide emission data collected from NOAA.
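A sketch of the discrete survival time idea mentioned above, assuming a person-period expansion: each subject contributes one row per interval survived with a 0/1 event label, and a classifier estimates the conditional hazard. A small scikit-learn MLP stands in for the architectures and Bayesian training scheme developed in the dissertation, and censoring is ignored for brevity.

```python
# Sketch of the discrete survival time method: expand each subject into one
# row per time interval survived and model the conditional hazard h(t | x)
# with a classifier. Data, network size, and the event mechanism are all
# illustrative assumptions; censoring at max_t is ignored for simplicity.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)
n, max_t = 300, 10
x = rng.normal(size=n)                                    # one covariate
event_time = np.minimum(rng.geometric(0.15 + 0.1 * (x > 0)), max_t)

# Person-period expansion: (covariate, interval index) -> event indicator.
rows, labels = [], []
for xi, ti in zip(x, event_time):
    for t in range(1, ti + 1):
        rows.append([xi, t])
        labels.append(1 if t == ti else 0)

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(np.array(rows), np.array(labels))

# Estimated hazard over time for a subject with x = 1.0.
hazards = net.predict_proba([[1.0, t] for t in range(1, max_t + 1)])[:, 1]
print(np.round(hazards, 3))
```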
28

A Comparison of Some Confidence Intervals for Estimating the Kurtosis Parameter

Jerome, Guensley 15 June 2017 (has links)
Several methods have been proposed to estimate the kurtosis of a distribution. The three common estimators are g2, G2, and b2. This thesis addresses the performance of these estimators by comparing them under the same simulation environments and conditions. The performance of these estimators is compared through confidence intervals, by determining the average width and the probability of capturing the kurtosis parameter of a distribution. We considered and compared classical and non-parametric methods in constructing these intervals. The classical method assumes normality to construct the confidence intervals, while the non-parametric methods rely on bootstrap techniques. The bootstrap techniques used are the Bias-Corrected Standard Bootstrap, Efron’s Percentile Bootstrap, Hall’s Percentile Bootstrap, and the Bias-Corrected Percentile Bootstrap. We found significant differences in the performance of the classical and bootstrap estimators. We observed that the parametric method works well in terms of coverage probability when data come from a normal distribution, while the bootstrap intervals struggled to consistently reach the nominal 95% confidence level. When sample data come from a distribution with negative kurtosis, both parametric and bootstrap confidence intervals performed well, although we noticed that bootstrap methods tend to have narrower intervals. When it comes to positive kurtosis, bootstrap methods perform slightly better than classical methods in coverage probability. Among the three kurtosis estimators, G2 performed best. Among the bootstrap techniques, Efron’s Percentile intervals had the best coverage.
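A sketch of one of the interval types compared above, Efron's percentile bootstrap, applied to scipy's default excess-kurtosis estimator (which plays the role of g2 here); the other estimators and bootstrap variants are not reproduced.

```python
# Sketch of Efron's percentile bootstrap interval for a kurtosis estimator.
# scipy.stats.kurtosis with its defaults (Fisher's definition, biased moments)
# stands in for g2; the sample and bootstrap size are illustrative.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(2024)
data = rng.normal(size=100)          # true excess kurtosis is 0
B = 2000

boot = np.empty(B)
for b in range(B):
    resample = rng.choice(data, size=data.size, replace=True)
    boot[b] = kurtosis(resample)

lower, upper = np.percentile(boot, [2.5, 97.5])
print(f"Sample excess kurtosis: {kurtosis(data):.3f}")
print(f"95% Efron percentile interval: ({lower:.3f}, {upper:.3f})")
```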
29

Power Comparison of Some Goodness-of-fit Tests

Liu, Tianyi 06 July 2016 (has links)
There are some existing, commonly used goodness-of-fit tests, such as the Kolmogorov-Smirnov test, the Cramer-von Mises test, and the Anderson-Darling test. In addition, a new goodness-of-fit test named the G test was proposed by Chen and Ye (2009). The purpose of this thesis is to compare the performance of some goodness-of-fit tests by comparing their power. A goodness-of-fit test is usually used when judging whether or not the underlying population distribution differs from a specific distribution. This research focuses on testing whether the underlying population distribution is an exponential distribution. To conduct the statistical simulation, SAS/IML is used in this research. Some alternative distributions, such as the triangular distribution and the V-shaped triangular distribution, are used. By applying Monte Carlo simulation, it can be concluded that the performance of the Kolmogorov-Smirnov test is better than that of the G test in many cases, while the G test performs well in some cases.
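A small Monte Carlo sketch of the kind of power comparison described above, assuming a fully specified Exponential(1) null and a triangular alternative; the G test of Chen and Ye (2009) is not reproduced, and the thesis used SAS/IML rather than Python.

```python
# Monte Carlo sketch: power of the Kolmogorov-Smirnov test of a fully
# specified Exponential(1) null when the data actually come from a
# triangular distribution. Sample size and alternative are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(99)
n, reps, alpha = 50, 2000, 0.05

rejections = 0
for _ in range(reps):
    # Triangular alternative on [0, 2] with mode 1 (mean 1, like Exp(1)).
    sample = rng.triangular(left=0.0, mode=1.0, right=2.0, size=n)
    stat, p = stats.kstest(sample, "expon", args=(0, 1))   # loc=0, scale=1
    if p < alpha:
        rejections += 1

print(f"Estimated power of the KS test: {rejections / reps:.3f}")
```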
30

Simulation of Mathematical Models in Genetic Analysis

Patel, Dinesh Govindal 01 May 1964 (has links)
In recent years a new field of statistics has become of importance in many branches of experimental science. This is the Monte Carlo method, so called because it is based on the simulation of stochastic processes. By stochastic process is meant some possible physical process in the real world that has some random or stochastic element in its structure. This is the subject which may appropriately be called the dynamic part of statistics, or the statistics of "change," in contrast with the static statistical problems which have so far been the more systematically studied. Many obvious examples of such processes are to be found in various branches of science and technology, for example, the phenomenon of Brownian motion, the growth of a bacterial colony, the fluctuating numbers of electrons and protons in a cosmic ray shower, or the random segregation and assortment of genes (chemical entities responsible for governing physical traits in plant and animal systems) under linkage conditions. Their occurrences are predominant in the fields of medicine, genetics, physics, oceanography, economics, engineering and industry, to name only a few scientific disciplines. The scientist making measurements in his laboratory, the meteorologist attempting to forecast weather, the control systems engineer designing a servomechanism (such as an aircraft or a thermostatic control), the electrical engineer designing a communication system (such as the radio link between entertainer and audience, or the apparatus and cables that transmit messages from one point to another), the economist studying price fluctuations in business cycles, and the neurosurgeon studying brain wave records, all are encountering problems to which the theory of stochastic processes may be relevant. Let us consider a few of these processes in a little more detail. In statistical physics many parts of the theory of stochastic processes were developed in connection with the study of fluctuations and noise in physical systems (Einstein, 1905; Smoluchowski, 1906; and Schottky, 1918). Consequently, the theory of stochastic processes can be regarded as the mathematical foundation of statistical physics. Stochastic models for population growth consider the size and composition of a population which is constantly fluctuating. These are mostly considered by Bailey (1957), Bartlett (1960), and Bharucha-Reid (1960). In communication theory a wide variety of problems involving communication and/or control, such as the problem of automatic tracking of moving objects, the reception of radio signals in the presence of natural and artificial disturbances, the reproduction of sound and images, the design of guidance systems, and the design of control systems for industrial processes, may be regarded as special cases of the following general problem: let T denote a set of points on a time axis such that at each point t in T an observation has been made of a random variable X(t). Given the observations [X(t), t ∈ T] and a quantity Z related to the observations, one desires to form, in an optimum manner, estimates of, and tests of hypotheses about, Z and various functions h(Z).
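A small sketch of the kind of Monte Carlo simulation the thesis describes, assuming two linked loci in a doubly heterozygous parent and an illustrative recombination fraction of 0.2: gametes are generated at random and the parental and recombinant types are counted.

```python
# Small sketch of Monte Carlo simulation of random segregation of two linked
# loci in gametes from a parent with genotype AB/ab. The recombination
# fraction r = 0.2 is an arbitrary illustrative value.
import numpy as np

rng = np.random.default_rng(1964)
r, n_gametes = 0.2, 10_000

# Choose which parental chromosome contributes the first locus, then flip it
# for the second locus whenever a crossover occurs between the loci.
first = rng.integers(0, 2, n_gametes)              # 0 -> "AB" strand, 1 -> "ab"
crossover = rng.random(n_gametes) < r
second = np.where(crossover, 1 - first, first)

alleles = [["A", "a"], ["B", "b"]]
gametes = [alleles[0][f] + alleles[1][s] for f, s in zip(first, second)]
counts = {g: gametes.count(g) for g in ("AB", "ab", "Ab", "aB")}
print(counts)   # parental types ~ (1-r)/2 each, recombinants ~ r/2 each
```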
