181

Multicollinearity and the Estimation of Regression Coefficients

Teed, John Charles 01 May 1978 (has links)
The precision of the estimates of the regression coefficients in a regression analysis is affected by multicollinearity. The effect of certain factors on multicollinearity and the estimates was studied. The response variables were the standard error of the regression coefficients and a standardized statistic that measures the deviation of the regression coefficient from the population parameter. The estimates are not influenced by any one factor in particular, but rather by some combination of factors. The larger the sample size, the better the precision of the estimates, no matter how "bad" the other factors may be. The standard error of the regression coefficients proved to be the best indication of estimation problems.
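As a minimal illustration of the effect the abstract describes, the sketch below (not the thesis code; the sample size, coefficients, and correlation values are assumed) simulates two predictors with increasing correlation and prints the ordinary least squares standard errors, which grow as the collinearity grows.

```python
# A minimal sketch of how collinearity between predictors inflates the
# standard errors of OLS coefficients. All numeric settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def ols_standard_errors(X, y):
    """Return OLS coefficient estimates and their standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    df = X.shape[0] - X.shape[1]
    sigma2_hat = resid @ resid / df
    return beta_hat, np.sqrt(np.diag(sigma2_hat * XtX_inv))

n = 50
for rho in (0.0, 0.9, 0.99):                      # increasing collinearity
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal([0, 0], cov, size=n)
    X = np.column_stack([np.ones(n), X])          # intercept column
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=1.0, size=n)
    _, se = ols_standard_errors(X, y)
    print(f"rho = {rho:4.2f}  SE(b1) = {se[1]:.3f}  SE(b2) = {se[2]:.3f}")
```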
182

Parameter Estimation for Generalized Pareto Distribution

Lin, Der-Chen 01 May 1988 (has links)
The generalized Pareto distribution was introduced by Pickands (1975). Three methods of estimating the parameters of the generalized Pareto distribution were compared by Hosking and Wallis (1987): maximum likelihood, the method of moments, and probability-weighted moments. An alternative method of estimation for the generalized Pareto distribution, based on least squares regression of expected order statistics (REOS), is developed and evaluated in this thesis. A Monte Carlo comparison is made between this method and the estimating methods considered by Hosking and Wallis (1987). The REOS method is shown to be generally superior to maximum likelihood, the method of moments, and probability-weighted moments.
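For context, the sketch below (illustrative only, not the thesis's REOS implementation) fits simulated generalized Pareto data with two of the classical estimators compared by Hosking and Wallis (1987): maximum likelihood via SciPy and the closed-form method of moments. The true shape and scale values are arbitrary.

```python
# Illustrative comparison of MLE and method-of-moments fits to GPD data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
xi_true, sigma_true = 0.2, 1.0                    # assumed shape and scale
x = stats.genpareto.rvs(c=xi_true, scale=sigma_true, size=500, random_state=rng)

# Maximum likelihood, location fixed at zero (threshold-excess setting)
xi_mle, _, sigma_mle = stats.genpareto.fit(x, floc=0)

# Method of moments: mean = sigma/(1-xi), var = sigma^2/((1-xi)^2 (1-2 xi))
xbar, s2 = x.mean(), x.var(ddof=1)
xi_mom = 0.5 * (1.0 - xbar**2 / s2)
sigma_mom = xbar * (1.0 - xi_mom)

print(f"MLE: xi={xi_mle:.3f} sigma={sigma_mle:.3f}")
print(f"MoM: xi={xi_mom:.3f} sigma={sigma_mom:.3f}")
```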
183

Modeling the Performance of a Baseball Player's Offensive Production

Smith, Michael Ross 09 March 2006 (has links) (PDF)
This project addresses the problem of comparing the offensive abilities of players from different eras in Major League Baseball (MLB). We will study players from the perspective of an overall offensive summary statistic that is highly linked with scoring runs, known as the Berry Value. We will build an additive model to estimate the innate ability of the player, the effect of the relative level of competition of each season, and the effect of age on performance using piecewise age curves. Using Hierarchical Bayes methodology with Gibbs sampling, we model each of these effects for each individual. The results of the Hierarchical Bayes model permit us to link players from different eras and to rank the players across the modern era of baseball (1900-2004) on the basis of their innate overall offensive ability. The top of the rankings, led by Babe Ruth, Lou Gehrig, and Stan Musial, includes many Hall of Famers and some of the most productive offensive players in the history of the game. We also determine that trends in overall offensive ability in Major League Baseball correspond to different rule and cultural changes. Based on the model, MLB is currently at a high level of run production compared to the different levels of run production over the last century.
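The toy Gibbs sampler below is only a sketch of the hierarchical Bayes machinery the abstract mentions: it fits a plain normal hierarchical model (player ability with a common prior) to made-up "season" data, omitting the era and age terms of the actual additive model and treating the observation variance as known.

```python
# Toy Gibbs sampler for y_ij ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2).
# Illustrative only; not the thesis model. sigma2 is treated as known.
import numpy as np

rng = np.random.default_rng(2)
y = [rng.normal(loc=m, scale=1.0, size=20) for m in (0.5, 1.0, 1.5, 2.0, 2.5)]
sigma2, m = 1.0, 5
n_i = np.array([len(v) for v in y])
ybar = np.array([v.mean() for v in y])

mu, tau2 = 0.0, 1.0          # initial values
a0, b0 = 2.0, 1.0            # weak inverse-gamma prior on tau2
draws = []

for _ in range(5000):
    # theta_i | rest: precision-weighted combination of data and prior
    prec = n_i / sigma2 + 1.0 / tau2
    mean = (n_i * ybar / sigma2 + mu / tau2) / prec
    theta = rng.normal(mean, np.sqrt(1.0 / prec))
    # mu | rest (flat prior)
    mu = rng.normal(theta.mean(), np.sqrt(tau2 / m))
    # tau2 | rest: inverse-gamma full conditional
    a = a0 + m / 2.0
    b = b0 + 0.5 * np.sum((theta - mu) ** 2)
    tau2 = 1.0 / rng.gamma(a, 1.0 / b)
    draws.append(theta)

post_mean = np.mean(draws[1000:], axis=0)   # discard burn-in
print("posterior mean ability per player:", np.round(post_mean, 2))
```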
184

Food Shelf Life: Estimation and Experimental Design

Larsen, Ross Allen Andrew 15 May 2006 (has links) (PDF)
Shelf life is a parameter of the lifetime distribution of a food product, usually the time until a specified proportion (1-50%) of the product has spoiled according to taste. The data used to estimate shelf life typically come from a planned experiment with sampled food items observed at specified times. The observation times are usually selected adaptively using 'staggered sampling.' Ad-hoc methods based on linear regression have been recommended to estimate shelf life. However, other methods based on maximizing a likelihood (MLE) have been proposed, studied, and used. Both methods assume the Weibull distribution. The observed lifetimes in shelf life studies are censored, a fact that the ad-hoc methods largely ignore. One purpose of this project is to compare the statistical properties of the ad-hoc estimators and the maximum likelihood estimator. The simulation study showed that the MLE methods have higher coverage than the regression methods, better asymptotic properties with regard to bias, and lower median squared error values, especially when shelf life is defined by smaller percentiles. Thus, they should be used in practice. A genetic algorithm (Hamada et al. 2001) was used to find near-optimal sampling designs. This was successfully programmed for general shelf life estimation. The genetic algorithm generally produced designs that had much smaller median squared errors than the staggered design that is used commonly in practice. These designs were radically different from the standard designs. Thus, the genetic algorithm may be used to plan studies in the future that have good estimation properties.
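A hedged sketch of the likelihood approach described above is given below: it maximizes a right-censored Weibull log-likelihood on simulated spoilage times and then converts the fitted parameters into a shelf-life estimate defined as the time by which 50% of items have spoiled. The censoring scheme, sample size, and parameter values are assumptions for illustration.

```python
# Censored Weibull MLE and a percentile-based shelf-life estimate (illustrative).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(3)
true_shape, true_scale = 2.0, 30.0                      # illustrative values (days)
t = weibull_min.rvs(true_shape, scale=true_scale, size=60, random_state=rng)
censor_time = 35.0
observed = np.minimum(t, censor_time)                   # right-censor at study end
event = t <= censor_time                                # True = spoilage observed

def neg_loglik(params):
    """Censored Weibull negative log-likelihood (log-parameterized)."""
    shape, scale = np.exp(params)
    z = observed / scale
    ll_event = np.log(shape / scale) + (shape - 1) * np.log(z) - z**shape
    ll_censored = -z**shape                              # log survival function
    return -(np.sum(ll_event[event]) + np.sum(ll_censored[~event]))

fit = minimize(neg_loglik, x0=np.log([1.0, np.median(observed)]), method="Nelder-Mead")
shape_hat, scale_hat = np.exp(fit.x)

p = 0.50                                                 # proportion spoiled
shelf_life = scale_hat * (-np.log(1 - p)) ** (1 / shape_hat)
print(f"shape={shape_hat:.2f} scale={scale_hat:.1f} "
      f"estimated 50% shelf life={shelf_life:.1f} days")
```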
185

Bayesian and Positive Matrix Factorization approaches to pollution source apportionment

Lingwall, Jeff William 02 May 2006 (has links) (PDF)
The use of Positive Matrix Factorization (PMF) in pollution source apportionment (PSA) is examined and illustrated. A study of its settings is conducted in order to optimize them in the context of PSA. The use of a priori information in PMF is examined, in the form of target factor profiles and pulling profile elements to zero. A Bayesian model using lognormal prior distributions for source profiles and source contributions is fit and examined.
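As a rough stand-in for the PMF step described above, the sketch below uses scikit-learn's NMF to factor a nonnegative concentration matrix (samples by chemical species) into source contributions and source profiles. Unlike PMF it does not weight by measurement uncertainty, and the simulated data and the choice of three sources are purely illustrative.

```python
# NMF as an unweighted stand-in for PMF-style source apportionment.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)
true_profiles = rng.dirichlet(np.ones(10), size=3)           # 3 sources, 10 species
true_contrib = rng.gamma(shape=2.0, scale=1.0, size=(200, 3))
X = true_contrib @ true_profiles
X = np.clip(X + rng.normal(scale=0.01, size=X.shape), 0.0, None)  # keep nonnegative

model = NMF(n_components=3, init="nndsvda", max_iter=1000, random_state=0)
G = model.fit_transform(X)        # estimated source contributions (samples x sources)
F = model.components_             # estimated source profiles (sources x species)
print("reconstruction error:", round(model.reconstruction_err_, 3))
```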
186

Probabilistic Methodology for Record Linkage Determining Robustness of Weights

Jensen, Krista Peine 20 July 2004 (has links) (PDF)
Record linkage is the process that joins separately recorded pieces of information for a particular individual from one or more sources. To facilitate record linkage, a reliable computer-based approach is ideal. In genealogical research, computerized record linkage is useful in combining information for an individual across multiple censuses. In creating a computerized method for linking census records, it must be determined whether weights calculated from one geographical area can be used to link records from another geographical area. Research performed by Marcie Francis calculated field weights using census records from 1910 and 1920 for Ascension Parish, Louisiana. These weights were re-calculated to take into account population changes of the time period and then used on five data sets from different geographical locations to determine their robustness. HeritageQuest provided indexed census records for four states in addition to Louisiana: California, Connecticut, Illinois, and Michigan. Because the California records were numerous and at least five data sets were desired for comparison, that state was split into two groups based on geographical location. Weights for Louisiana were re-calculated to take into consideration Visual Basic code modifications for the fields "Place of Origin," "Age," and "Location" (enumeration district). The validity of these weights was a concern because of the low number of known matches present in the Louisiana data set. Thus, to assess how weights calculated from a data source with a larger number of known matches would perform, weights were calculated for the Michigan census records. Error rates obtained using weights calculated from the Michigan data set were lower than those obtained using the Louisiana weights. To further examine weight robustness, weights for Southern California were also calculated to allow comparison between two samples. Error rates acquired using the Southern California weights were much lower than either of the previously calculated error rates. This led to the decision to calculate weights for each of the data sets, average them, and use the averaged weights to link each data set, thereby accounting for fluctuations of the population between geographical locations. Error rates obtained with the averaged weights proved robust enough to use in any of the geographical areas sampled. The weights obtained in this project can be used when linking any census records from 1910 and 1920. When linking census records from other decades, it is necessary to calculate new weights to account for time-period-specific fluctuations.
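The sketch below shows, in hedged form, the probabilistic weighting idea (Fellegi-Sunter style) that field weights of this kind typically follow: each field contributes an agreement weight log2(m/u) or a disagreement weight log2((1-m)/(1-u)), the contributions are summed for a candidate record pair, and the total is compared to a threshold. The m/u probabilities, field names, and threshold are invented for illustration; they are not the weights computed in this project.

```python
# Fellegi-Sunter style field weights for a candidate record pair (illustrative).
import math

# field: (m = P(agree | true match), u = P(agree | non-match))
fields = {
    "surname":         (0.95, 0.01),
    "given_name":      (0.90, 0.02),
    "age":             (0.85, 0.10),
    "place_of_origin": (0.80, 0.05),
}

def pair_weight(agreement):
    """Total weight for a candidate pair given per-field agreement booleans."""
    total = 0.0
    for field, agrees in agreement.items():
        m, u = fields[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total

candidate = {"surname": True, "given_name": True, "age": False, "place_of_origin": True}
w = pair_weight(candidate)
print(f"total weight = {w:.2f}  ->", "link" if w > 3.0 else "review/non-link")
```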
187

Performance of AIC-Selected Spatial Covariance Structures for fMRI Data

Stromberg, David A. 28 July 2005 (has links) (PDF)
fMRI datasets allow scientists to assess functionality of the brain by measuring the response of blood flow to a stimulus. Since the responses from neighboring locations within the brain are correlated, simple linear models that assume independence of measurements across locations are inadequate. Mixed models can be used to model the spatial correlation between observations; however, selecting the correct covariance structure is difficult. Information criteria, such as AIC, are often used to choose among covariance structures. Once the covariance structure is selected, significance tests can be used to determine whether a region of interest within the brain is significantly active. Through the use of simulations, this project explores the performance of AIC in selecting the covariance structure. Type I error rates are presented for the fixed effects using the AIC-chosen covariance structure. Power for the fixed effects is also discussed.
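The sketch below illustrates the AIC selection step in a much-simplified setting: three candidate covariance structures (independence, compound symmetry, AR(1)) are fit to simulated repeated measures by maximum likelihood, and AIC = 2k - 2 logL is compared. The data, structures, and dimensions are assumptions for illustration, not the fMRI models or software used in the project.

```python
# AIC comparison of candidate covariance structures on simulated repeated measures.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
T, n = 6, 100
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
true_cov = 2.0 * 0.6 ** lags                         # AR(1)-type truth
Y = rng.multivariate_normal(np.zeros(T), true_cov, size=n)

def build_cov(name, params):
    if name == "independence":                       # sigma2 * I
        return np.exp(params[0]) * np.eye(T)
    if name == "compound symmetry":                  # constant correlation rho
        s2, rho = np.exp(params[0]), np.tanh(params[1])
        return s2 * ((1 - rho) * np.eye(T) + rho * np.ones((T, T)))
    if name == "AR(1)":                              # correlation rho**|lag|
        s2, rho = np.exp(params[0]), np.tanh(params[1])
        return s2 * rho ** lags

def neg_loglik(params, name):
    cov = build_cov(name, params)
    try:
        L = np.linalg.cholesky(cov)
    except np.linalg.LinAlgError:                    # not positive definite
        return 1e10
    z = np.linalg.solve(L, Y.T)                      # whitened data, shape (T, n)
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return 0.5 * (n * (T * np.log(2 * np.pi) + logdet) + np.sum(z ** 2))

for name, k in [("independence", 1), ("compound symmetry", 2), ("AR(1)", 2)]:
    fit = minimize(neg_loglik, x0=np.zeros(k), args=(name,), method="Nelder-Mead")
    print(f"{name:18s} AIC = {2 * k + 2 * fit.fun:.1f}")
```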
188

Estimating the Discrepancy Between Computer Model Data and Field Data: Modeling Techniques for Deterministic and Stochastic Computer Simulators

Dastrup, Emily Joy 08 August 2005 (has links) (PDF)
Computer models have become useful research tools in many disciplines. In many cases a researcher has access to data from a computer simulator and from a physical system. This research discusses Bayesian models that allow for the estimation of the discrepancy between the two data sources. We fit two models to data in the field of electrical engineering. Using this data we illustrate ways of modeling both a deterministic and a stochastic simulator when specific parametric assumptions can be made about the discrepancy term.
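As a loose, non-Bayesian stand-in for the idea above, the sketch below estimates the discrepancy between field measurements and a cheap deterministic simulator by regressing the residual (field minus simulator) on a low-order polynomial in the input. The simulator, data, and polynomial form are hypothetical; the thesis instead places parametric Bayesian models on the discrepancy term.

```python
# Simplified discrepancy estimation: field = simulator + discrepancy + noise.
import numpy as np

rng = np.random.default_rng(6)

def simulator(x):
    """Hypothetical deterministic computer model."""
    return np.sin(x)

x_field = np.linspace(0, 3, 30)
y_field = np.sin(x_field) + 0.3 * x_field + rng.normal(scale=0.05, size=x_field.size)

residual = y_field - simulator(x_field)              # field minus simulator output
coeffs = np.polyfit(x_field, residual, deg=2)        # quadratic discrepancy model
discrepancy = np.poly1d(coeffs)

print("estimated discrepancy at x=2.0:", round(discrepancy(2.0), 3), "(truth ~ 0.6)")
```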
189

Modeling Distributions of Test Scores with Mixtures of Beta Distributions

Feng, Jingyu 08 November 2005 (has links) (PDF)
Test score distributions are used to make important instructional decisions about students. The test scores usually do not follow a normal distribution. In some cases, the scores appear to follow a bimodal distribution that can be modeled with a mixture of beta distributions. This bimodality may be due to different levels of students' ability. The purpose of this study was to develop and apply statistical techniques for fitting beta mixtures and detecting bimodality in test score distributions. Maximum likelihood and Bayesian methods were used to estimate the five parameters of the beta mixture distribution for scores in four quizzes in a cell biology class at Brigham Young University. The mixing proportion was examined to draw conclusions about bimodality. We were successful in fitting the beta mixture to the data, but the methods were only partially successful in detecting bimodality.
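The sketch below fits a two-component beta mixture (five parameters: the mixing proportion plus two shape parameters per component) to simulated scores by EM: responsibilities in the E-step, a weighted beta maximum-likelihood fit per component in the M-step. It is illustrative only; the simulated scores, starting values, and iteration count are assumptions, and the thesis also considers Bayesian estimation.

```python
# EM for a two-component beta mixture on scores in (0, 1).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

rng = np.random.default_rng(7)
scores = np.concatenate([beta.rvs(2, 8, size=120, random_state=rng),
                         beta.rvs(9, 3, size=80, random_state=rng)])

def weighted_beta_mle(x, w, start):
    """Maximize the w-weighted beta log-likelihood over (a, b) > 0."""
    nll = lambda p: -np.sum(w * beta.logpdf(x, np.exp(p[0]), np.exp(p[1])))
    fit = minimize(nll, x0=np.log(start), method="Nelder-Mead")
    return np.exp(fit.x)

pi, params = 0.5, [(1.0, 3.0), (3.0, 1.0)]            # initial guesses
for _ in range(50):                                   # EM iterations
    # E-step: responsibility of component 1 for each score
    d1 = pi * beta.pdf(scores, *params[0])
    d2 = (1 - pi) * beta.pdf(scores, *params[1])
    r = d1 / (d1 + d2)
    # M-step: update mixing proportion and component parameters
    pi = r.mean()
    params[0] = weighted_beta_mle(scores, r, params[0])
    params[1] = weighted_beta_mle(scores, 1 - r, params[1])

print(f"pi = {pi:.2f}, component 1 (a,b) = {np.round(params[0], 1)}, "
      f"component 2 (a,b) = {np.round(params[1], 1)}")
```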
190

A Comparative Simulation Study of Robust Estimators of Standard Errors

Johnson, Natalie 10 July 2007 (has links) (PDF)
The estimation of standard errors is essential to statistical inference. Statistical variability is inherent within data, but is usually of secondary interest; still, some options exist to deal with this variability. One approach is to carefully model the covariance structure. Another approach is robust estimation. In this approach, the covariance structure is estimated from the data. White (1980) introduced a biased, but consistent, robust estimator. Long et al. (2000) added an adjustment factor to White's estimator to remove the bias of the original estimator. Through the use of simulations, this project compares restricted maximum likelihood (REML) with four robust estimation techniques: the Standard Robust Estimator (White 1980), the Long estimator (Long 2000), the Long estimator with a quantile adjustment (Kauermann 2001), and the empirical option of the MIXED procedure in SAS. The results of the simulation show small-sample and asymptotic properties of the five estimators. The REML procedure is modelled under the true covariance structure, and is the most consistent of the five estimators. The REML procedure shows a slight small-sample bias as the number of repeated measures increases. The REML procedure may not be the best estimator in a situation in which the covariance structure is in question. The Standard Robust Estimator is consistent, but it has an extreme downward bias for small sample sizes. The Standard Robust Estimator changes little when complexity is added to the covariance structure. The Long estimator is an unstable estimator. As complexity is introduced into the covariance structure, the coverage probability with the Long estimator increases. The Long estimator with the quantile adjustment works as designed by mimicking the Long estimator at an inflated quantile level. The empirical option of the MIXED procedure in SAS works well for homogeneous covariance structures. The empirical option of the MIXED procedure in SAS reduces the downward bias of the Standard Robust Estimator when the covariance structure is homogeneous.
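The sketch below shows the sandwich idea behind White's (1980) heteroskedasticity-consistent estimator, plus the simple n/(n-k) degrees-of-freedom correction (commonly labeled HC1) that addresses its small-sample downward bias. The data are simulated for illustration, and this correction is only analogous to, not identical with, the Long and Kauermann adjustments studied in the project.

```python
# White's sandwich estimator (HC0) and a degrees-of-freedom correction (HC1).
import numpy as np

rng = np.random.default_rng(8)
n = 40
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 * x)          # heteroskedastic errors

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                                    # residuals
XtX_inv = np.linalg.inv(X.T @ X)

# Sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat = X.T @ (e[:, None] ** 2 * X)
hc0 = XtX_inv @ meat @ XtX_inv                          # White's original estimator
hc1 = hc0 * n / (n - X.shape[1])                        # small-sample correction

print("HC0 SE(slope):", round(np.sqrt(hc0[1, 1]), 4))
print("HC1 SE(slope):", round(np.sqrt(hc1[1, 1]), 4))
```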
