201. Hitters vs. Pitchers: A Comparison of Fantasy Baseball Player Performances Using Hierarchical Bayesian Models. Huddleston, Scott D., 17 April 2012.
In recent years, fantasy baseball has seen an explosion in popularity. Major League Baseball, with its long, storied history and the enormous quantity of data available, naturally lends itself to the modern-day recreational activity known as fantasy baseball. Fantasy baseball is a game in which participants manage an imaginary roster of real players and compete against one another using those players' real-life statistics to score points. Early forms of fantasy baseball began in the early 1960s, but beginning in the 1990s, the sport was revolutionized by the advent of powerful computers and the Internet. The data used in this project come from an actual fantasy baseball league which uses a head-to-head, points-based scoring system. The data consist of the weekly point totals that were accumulated over the first three-fourths of the 2011 regular season by the top 110 hitters and top 70 pitchers in Major League Baseball. The purpose of this project is to analyze the relative value of pitchers versus hitters in this league using hierarchical Bayesian models. Three models will be compared: one that differentiates between hitters and pitchers, another that also differentiates between starting pitchers and relief pitchers, and a third that makes no distinction whatsoever between hitters and pitchers. The models will be compared using the deviance information criterion (DIC). The best model will then be used to predict weekly point totals for the last fourth of the 2011 season. Posterior predictive densities will be compared to actual weekly scores.
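As a rough illustration of the model-comparison step described above, the sketch below computes DIC from posterior draws for a normal sampling model of weekly point totals. It is a minimal sketch under assumed array shapes, not code from the thesis, and it stands in for whatever likelihood the three hierarchical models actually use.

```python
import numpy as np
from scipy import stats

def dic_normal(y, mu_samples, sigma_samples):
    """Deviance information criterion for a normal sampling model.

    y             : observed weekly point totals, shape (n,)
    mu_samples    : posterior draws of each observation's mean, shape (S, n)
    sigma_samples : posterior draws of the residual standard deviation, shape (S,)
    """
    # Deviance D(theta) = -2 * log-likelihood, evaluated at each posterior draw
    loglik = stats.norm.logpdf(y[None, :], loc=mu_samples,
                               scale=sigma_samples[:, None]).sum(axis=1)
    dev = -2.0 * loglik
    dev_bar = dev.mean()                      # posterior mean deviance
    # Deviance at the posterior means of the parameters ("plug-in" deviance)
    dev_hat = -2.0 * stats.norm.logpdf(
        y, loc=mu_samples.mean(axis=0), scale=sigma_samples.mean()).sum()
    p_d = dev_bar - dev_hat                   # effective number of parameters
    return dev_hat + 2.0 * p_d                # DIC; the smallest value wins

# The model with the lowest DIC (hitters vs. pitchers, pitcher roles, or no
# distinction) would then be carried forward to posterior prediction.
```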
202. Species Identification and Strain Attribution with Unassembled Sequencing Data. Francis, Owen Eric, 18 April 2012.
Emerging sequencing approaches have revolutionized the way we can collect DNA sequence data for applications in bioforensics and biosurveillance. In this research, we present an approach to construct a database of known biological agents and use this database to develop a statistical framework to analyze raw reads from next-generation sequence data for species identification and strain attribution. Our method capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species can be present in the sample or that the target strain is not even contained within the reference database. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for genome assembly, a time-consuming and labor-intensive step. We demonstrate our approach using genomic data from a variety of known bacterial agents of bioterrorism and agents impacting human health.
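The abstract does not spell out the likelihood, so the following is only a hedged sketch of the core idea: given a read-by-genome matrix of log-likelihoods (built elsewhere from base and mapping qualities), combine them with a prior to obtain posterior probabilities over the reference database. It assumes a single source genome and independent reads, ignoring the mixture and novel-strain extensions the thesis handles.

```python
import numpy as np

def genome_posteriors(log_lik, log_prior=None):
    """Posterior probability of each candidate genome.

    log_lik   : array of shape (n_reads, n_genomes); entry [i, j] is the
                log-likelihood that read i arose from genome j.
    log_prior : optional log prior over genomes; uniform if omitted.
    """
    per_genome = log_lik.sum(axis=0)          # reads treated as independent
    if log_prior is not None:
        per_genome = per_genome + log_prior
    per_genome -= per_genome.max()            # stabilise before exponentiating
    post = np.exp(per_genome)
    return post / post.sum()
```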
203. An Applied Investigation of Gaussian Markov Random Fields. Olsen, Jessica Lyn, 26 June 2012.
Recently, Bayesian methods have become central to modern statistics, particularly through the ability to incorporate hierarchical models. In particular, correlated data, such as the data found in spatial and temporal applications, have benefited greatly from the development and application of Bayesian statistics. One particular application of Bayesian modeling is Gaussian Markov Random Fields (GMRFs). These methods have proven to be very useful in providing a framework for correlated data. I will demonstrate the power of GMRFs by applying this method to two sets of data: a set of temporal data involving car accidents in the UK and a set of spatial data involving Provo-area apartment complexes. For the first set of data, I will examine how including a seatbelt covariate affects our estimates of the number of car accidents. In the second set of data, we will scrutinize the effect of BYU approval on apartment complexes. In both applications we will investigate Laplacian approximations when normal distribution assumptions do not hold.
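As a small, generic illustration of the machinery involved (not the thesis's models), the sketch below builds the precision matrix of a first-order random-walk GMRF, the kind of structure often used for temporal data such as yearly accident counts, and draws one sample from it.

```python
import numpy as np

def rw1_precision(n, kappa=1.0, eps=1e-8):
    """Precision matrix of a first-order random-walk GMRF on n time points.

    The intrinsic RW1 precision is singular, so a tiny diagonal perturbation
    is added to make it proper enough to factorise.
    """
    Q = np.zeros((n, n))
    for i in range(n - 1):
        Q[i, i] += 1.0
        Q[i + 1, i + 1] += 1.0
        Q[i, i + 1] -= 1.0
        Q[i + 1, i] -= 1.0
    return kappa * Q + eps * np.eye(n)

def sample_gmrf(Q, seed=None):
    """Draw x ~ N(0, Q^{-1}) using the Cholesky factor of the precision matrix."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Q)
    z = rng.standard_normal(Q.shape[0])
    return np.linalg.solve(L.T, z)   # Cov(x) = (L L^T)^{-1} = Q^{-1}

x = sample_gmrf(rw1_precision(52))   # e.g. one year of weekly effects
```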
204. Predicting Missionary Service. Burraston, Bert, 01 January 1994.
The purpose of this thesis was to test the effects of antecedents of religiosity on religious commitment; specifically, which dimensions of religiosity predict whether a young-adult Mormon male will serve a mission. Both logistic regression and LISREL were used to examine data from the Young Men's Study in order to predict missionary service. Six variables, Religious Intention, Public Religiosity, Religious Negativism, Family Structure, Tithing, and Smoking, were found to have direct effects on missionary service. Four more variables, Parents' Church Attendance, Home Religious Observances, Agree With Parents' Values, and Private Religiosity, were found to have important indirect effects.
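Since the abstract names logistic regression with a binary outcome and six direct-effect predictors, here is a hedged sketch of that step on simulated stand-in data; the variable names, coefficients, and data are illustrative assumptions, not the Young Men's Study data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
# Simulated stand-ins for the six direct-effect predictors named in the abstract
df = pd.DataFrame({
    "religious_intention":  rng.normal(size=n),
    "public_religiosity":   rng.normal(size=n),
    "religious_negativism": rng.normal(size=n),
    "intact_family":        rng.integers(0, 2, size=n),
    "pays_tithing":         rng.integers(0, 2, size=n),
    "smokes":               rng.integers(0, 2, size=n),
})
# Arbitrary coefficients used only to generate a plausible binary outcome
logit = 0.8 * df["religious_intention"] + 0.5 * df["pays_tithing"] - 0.7 * df["smokes"]
df["served_mission"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = sm.add_constant(df.drop(columns="served_mission"))
fit = sm.Logit(df["served_mission"], X).fit(disp=0)
print(fit.summary())   # coefficient estimates and Wald tests for each predictor
```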
205. To Hydrate or Chlorinate: A Regression Analysis of the Levels of Chlorine in the Public Water Supply. Doyle, Drew A., 01 December 2015.
Public water supplies can contain disease-causing microorganisms in the water or distribution ducts. In order to kill off these pathogens, a disinfectant, such as chlorine, is added to the water. Chlorine is the most widely used disinfectant in U.S. water treatment facilities and is known to be one of the most powerful disinfectants for keeping harmful pathogens from reaching the consumer. In the interest of obtaining a better understanding of which variables affect the level of chlorine in the water, this thesis will analyze a particular set of water samples randomly collected from locations in Orange County, Florida. Thirty water samples will be collected and their chlorine level, temperature, and pH recorded. A linear regression analysis will be performed on the data collected, with several qualitative and quantitative variables. Water storage time, temperature, time of day, location, pH, and dissolved oxygen level will be the independent variables recorded for each water sample. All data collected will be analyzed through various Statistical Analysis System (SAS®) procedures. Partial residual plots will be used to identify possible relationships between the chlorine level and the independent variables, and stepwise selection will be used to eliminate insignificant predictors. From there, several candidate models will be selected, and F tests will be conducted to determine which of the models appears most useful. All tests will include hypotheses, test statistics, p-values, and conclusions. There will also be an analysis of the residual plot, jackknife residuals, leverage values, Cook's D, the PRESS statistic, and the normal probability plot of the residuals. Possible outliers will be investigated, and the critical values for flagged observations will be stated along with what problems the flagged values indicate.
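The thesis runs the analysis in SAS; as a language-neutral sketch of the same kind of workflow (fit, F test, and the named diagnostics), the following uses simulated stand-in measurements, since the real Orange County samples are not reproduced here.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 30   # thirty water samples, matching the study design
df = pd.DataFrame({
    "temperature":   rng.normal(25, 3, n),
    "ph":            rng.normal(7.2, 0.3, n),
    "dissolved_o2":  rng.normal(8.0, 1.0, n),
    "storage_hours": rng.uniform(0, 48, n),
})
# Arbitrary generating model, used only so the example runs end to end
df["chlorine"] = (1.5 - 0.02 * df["temperature"]
                  - 0.01 * df["storage_hours"] + rng.normal(0, 0.1, n))

X = sm.add_constant(df.drop(columns="chlorine"))
fit = sm.OLS(df["chlorine"], X).fit()
print(fit.summary())                                  # t tests, overall F test

infl = fit.get_influence()
leverage  = infl.hat_matrix_diag                      # leverage values
jackknife = infl.resid_studentized_external           # jackknife (studentized) residuals
cooks_d   = infl.cooks_distance[0]                    # Cook's D
press     = np.sum((fit.resid / (1.0 - leverage)) ** 2)   # PRESS statistic

# Flag observations beyond the usual rules of thumb
n_obs, p = X.shape
flags = (leverage > 2 * p / n_obs) | (np.abs(jackknife) > 2) | (cooks_d > 4 / n_obs)
print(df[flags])
```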
206. Geometric and Combinatorial Aspects of 1-Skeleta. McDaniel, Chris Ray, 01 May 2010.
In this thesis we investigate 1-skeleta and their associated cohomology rings. 1-skeleta arise from the 0- and 1-dimensional orbits of a certain class of manifolds admitting a compact torus action, and many questions that arise in the theory of 1-skeleta are rooted in the geometry and topology of these manifolds. The three main results of this work are: a lifting result for 1-skeleta (related to extending torus actions on manifolds), a classification result for certain 1-skeleta which have the Morse package (a property of 1-skeleta motivated by Morse theory for manifolds), and two constructions on 1-skeleta which we show preserve the Lefschetz package (a property of 1-skeleta motivated by the hard Lefschetz theorem in algebraic geometry). A corollary of this last result is a conceptual proof (applicable in certain cases) of the fact that the coinvariant ring of a finite reflection group has the strong Lefschetz property.
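For readers unfamiliar with the term, the strong Lefschetz property referred to above is usually stated as follows; this is a standard formulation and an assumption on my part, not wording quoted from the thesis.

```latex
% Standard statement (assumed, not quoted from the thesis): a graded Artinian
% algebra A = \bigoplus_{i=0}^{d} A_i has the strong Lefschetz property if
% there is a linear form \ell \in A_1 with
\[
  \cdot\,\ell^{\,d-2i} \colon A_i \longrightarrow A_{d-i}
  \quad \text{an isomorphism for all } 0 \le i \le \lfloor d/2 \rfloor .
\]
```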
207. Performance Comparison of Multiple Imputation Methods for Quantitative Variables for Small and Large Data with Differing Variability. Onyame, Vincent, 01 May 2021.
Missing data continues to be one of the main problems in data analysis, as it reduces sample representativeness and consequently causes biased estimates. Multiple imputation has been established as an effective method of handling missing data. In this study, we examined multiple imputation methods for quantitative variables on twelve data sets with varied sizes and variability that were pseudo-generated from an original data set. The multiple imputation methods examined are predictive mean matching, Bayesian linear regression, and non-Bayesian linear regression, as implemented in the MICE (Multivariate Imputation by Chained Equations) package in the statistical software R. The parameter estimates generated from the linear regression on the imputed data were compared to the parameter estimates from the complete data across all twelve data sets, to determine which imputation method yields estimates closest to those from the complete data.
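To make the named methods concrete, here is a bare-bones, single-imputation sketch of predictive mean matching written from scratch; it omits the Bayesian parameter draw and the chained-equations loop that the MICE package performs, so it should be read as an illustration of the matching idea only.

```python
import numpy as np

def pmm_impute(y, X, k=5, seed=None):
    """Single predictive-mean-matching imputation of the missing entries of y.

    y : 1-D array with np.nan where values are missing
    X : 2-D array of fully observed predictors
    k : size of the donor pool of closest observed cases
    """
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)   # regression on complete cases
    yhat = Xd @ beta                                          # predicted means for everyone
    y_imp = y.copy()
    for i in np.where(~obs)[0]:
        donors = np.argsort(np.abs(yhat[obs] - yhat[i]))[:k]  # closest observed predictions
        y_imp[i] = y[obs][donors[rng.integers(len(donors))]]  # draw an observed donor value
    return y_imp
```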
208. How Many Are Out There? A Novel Approach For Open and Closed Systems. Rehman, Zia, 01 January 2014.
We propose a ratio estimator to determine population estimates using capture-recapture sampling. It differs from traditional approaches in the following ways: (1) Ordering of recaptures: current data sets do not take into account the ordering of the recaptures, although this crucial information is available at no cost. (2) Dependence of trials and cluster sampling: our model explicitly treats trials as dependent, improving on existing literature, which assumes independence. (3) Rate of convergence: the percentage sampled has an inverse relationship with population size for a chosen degree of accuracy. (4) Asymptotic attainment of minimum variance in open systems (equal to the population variance). (5) Full use of the data and broad model applicability. (6) Non-parametric. (7) Heterogeneity: applicable when the units being sampled are hard to identify. (8) Open and closed systems: simpler results are presented separately for closed systems. (9) Robustness to assumptions in open systems.
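For contrast with the proposal above, the classical independence-based ratio (Lincoln-Petersen) estimator with Chapman's correction is shown below; this is the traditional baseline the dissertation aims to improve on, not the proposed estimator itself.

```python
def chapman_estimate(n1, n2, m2):
    """Chapman-corrected Lincoln-Petersen estimate of a closed population size.

    n1 : animals captured and marked in the first sample
    n2 : animals captured in the second sample
    m2 : marked animals recaptured in the second sample
    """
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

print(chapman_estimate(n1=200, n2=150, m2=30))   # about 978 for these made-up counts
```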
209. Using the Haar-Fisz wavelet transform to uncover regions of constant light intensity in Saturn's rings. Paulson, Courtney L., 01 January 2010.
Saturn's ring system is actually composed of a multitude of separate rings, yet each of these rings has areas with more or less constant structural properties which are hard to uncover by observation alone. By measuring stellar occultations, data are collected in the form of Poisson counts (over 6 million observations) which need to be denoised in order to find these areas of constant properties. At present, these areas are found by visual inspection or by examining moving averages, which is hard to do when the amount of data is huge. It is also impossible to do this using the changepoint-analysis-based method of Scargle (1998, 2005). For the purpose of finding areas of constant Poisson intensity, a state-of-the-art Haar-Fisz algorithm for Poisson intensity estimation is employed. This algorithm is based on a wavelet-like transformation of the original data and subsequent denoising, a methodology originally developed by Nason and Fryzlewicz (2005). We apply the Haar-Fisz transform to the original data, which normalizes the noise level, then apply the Haar wavelet transform and threshold the wavelet coefficients. Finally, we apply the inverse Haar-Fisz transform to recover the Poisson intensity function. We implement the algorithm in the R programming language. The program was first tested using synthetic data and then applied to the original Saturn ring observations, resulting in a quick, easy method for resolving the data into discrete blocks with equal mean intensities.
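The pipeline described above maps naturally onto a short implementation; the sketch below covers only the forward Haar-Fisz step (pairwise Haar averages and differences, with each difference divided by the square root of its local mean), after which standard Gaussian wavelet thresholding and the inverse transform would be applied as the abstract describes. It assumes the input length is a power of two and is not the thesis's R code.

```python
import numpy as np

def haar_fisz(x):
    """Forward Haar-Fisz transform of a Poisson count vector (length a power of two)."""
    x = np.asarray(x, dtype=float)
    J = int(np.log2(len(x)))
    s, details = x.copy(), []
    for _ in range(J):
        even, odd = s[0::2], s[1::2]
        d = (even - odd) / 2.0                 # Haar detail
        s = (even + odd) / 2.0                 # Haar smooth
        with np.errstate(divide="ignore", invalid="ignore"):
            f = np.where(s > 0, d / np.sqrt(s), 0.0)   # Fisz variance stabilisation
        details.append(f)
    rec = s                                    # rebuild with the stabilised details
    for f in reversed(details):
        out = np.empty(2 * len(rec))
        out[0::2], out[1::2] = rec + f, rec - f
        rec = out
    return rec                                 # approximately Gaussian, ready for thresholding

counts = np.random.default_rng(2).poisson(5.0, size=2 ** 12)
stabilised = haar_fisz(counts)
```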
210. A Parametric Test for Trend Based on Moving Order Statistics. Tan, Tao, 10 1900.
When researchers work with a time series or sequence, certain fundamental questions naturally arise. One of them is whether the series or sequence exhibits a gradual trend over time. In this thesis, we propose a test statistic based on moving order statistics and establish an exact procedure to test for the presence of monotone trends. We show that the test statistic under the null hypothesis of no trend follows the closed skew normal distribution. An efficient algorithm is then developed to generate realizations from this null distribution. A simulation study is conducted to evaluate the proposed test under alternative hypotheses with linear, logarithmic, and quadratic trend functions. Finally, a practical example is provided to illustrate the proposed test procedure.
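The exact closed-skew-normal null derived in the thesis is not reproduced here; as a loosely related illustration only, the sketch below calibrates an arbitrary moving-order-statistic trend measure (a hypothetical stand-in, not the thesis's statistic) with a Monte Carlo permutation null.

```python
import numpy as np

def moving_max_slope(x, w=5):
    """Illustrative trend measure: slope of a regression of moving maxima on time."""
    m = np.array([x[i:i + w].max() for i in range(len(x) - w + 1)])
    return np.polyfit(np.arange(len(m)), m, 1)[0]

def permutation_pvalue(x, stat=moving_max_slope, n_perm=5000, seed=None):
    """Monte Carlo p-value under the exchangeable 'no trend' null hypothesis."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    obs = stat(x)
    null = np.array([stat(rng.permutation(x)) for _ in range(n_perm)])
    return (1 + np.sum(null >= obs)) / (1 + n_perm)

series = np.cumsum(np.random.default_rng(3).normal(0.1, 1.0, 100))  # a drifting series
print(permutation_pvalue(series))
```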