31 |
An investigation of umpire performance using PITCHf/x data via longitudinal analysis / Juarez, Christopher
Master of Science / Department of Statistics / Abigail Jager / Baseball has long provided statisticians with a playground for analysis. In this report we discuss the history of Major League Baseball (MLB) umpires, MLB data collection, and the use of technology in sports officiating. We use PITCHf/x data to answer three questions: (1) Has the proportion of incorrect calls made by a major league umpire decreased over time? (2) Does the proportion of incorrect calls differ between umpires hired before the implementation of technology in evaluating umpire performance and those hired after? (3) Does the rate of change in the proportion of incorrect calls differ between these two groups?
PITCHf/x is a publicly available database that records characteristics of every pitch thrown in any of the 30 MLB parks. MLB began using camera technology in umpire evaluations in 2002, but the data were not publicly available until 2007. Data were collected at the pitch level, and the proportion of incorrect calls was calculated for each umpire for the first, second, and final third of each season from 2008 through 2011. We collected data from retrosheet.org, which provides game summary information, and determined the year of each umpire's MLB debut to distinguish pre- and post-technology hires in our analysis.
We addressed our questions of interest using longitudinal data analysis with a random coefficients model. We investigated the choice of covariance structure for the random coefficients model using Akaike's Information Criterion and the Bayesian Information Criterion, and we compared the random coefficients model to a fixed slopes model and a general linear model.
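A minimal sketch of how such a random coefficients model could be fit, assuming the data have been aggregated to one row per umpire per third-of-season; the column names, file name, and use of statsmodels are illustrative, not taken from the report:

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per umpire per third-of-season: proportion of incorrect calls,
# a numeric time index (period), and an indicator for post-technology hires.
df = pd.read_csv("umpire_calls.csv")  # hypothetical file

# Random intercept and slope in time for each umpire; ML (not REML) so that
# AIC/BIC can be compared across candidate covariance structures.
model = smf.mixedlm("prop_incorrect ~ period * post_tech",
                    data=df, groups="umpire", re_formula="~period")
fit = model.fit(reml=False)
print(fit.summary())
print("AIC:", fit.aic, "BIC:", fit.bic)
```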
|
32 |
Statistical methods for diagnostic testing: an illustration using a new method for cancer detection / Sun, Xin
Master of Science / Department of Statistics / Gary Gadbury / This report illustrates how to use two statistical methods to investigate the performance of a new technique for detecting breast cancer and lung cancer at early stages: logistic regression, and classification and regression trees (CART). The technique is found to be effective in detecting breast cancer and lung cancer, with both sensitivity and specificity close to 0.9, but its ability to predict the actual stage of cancer is low. Including the age variable improves the ability of logistic regression to predict the presence of breast cancer for the samples used in this report, but because the sample sizes are small, we cannot conclude that including age helps the prediction of breast cancer in general. Including the age variable does not improve prediction of the presence of lung cancer. When the age variable is excluded, CART and logistic regression give very similar results.
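As a hedged illustration of the two methods compared here, the sketch below fits a logistic regression and a CART-style classification tree and reports sensitivity and specificity; the predictors, labels, and tree depth are invented placeholders and do not reproduce the report's variables:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                          # placeholder predictors
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # placeholder labels

for name, clf in [("logistic", LogisticRegression()),
                  ("CART", DecisionTreeClassifier(max_depth=3))]:
    pred = clf.fit(X, y).predict(X)
    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    print(f"{name}: sensitivity={tp / (tp + fn):.2f}, "
          f"specificity={tn / (tn + fp):.2f}")
```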
|
33 |
Estimating Non-homogeneous Intensity Matrices in Continuous Time Multi-state Markov Models / Lebovic, Gerald (31 August 2011)
Multi-state Markov (MSM) models can be used to characterize the behaviour of categorical outcomes measured repeatedly over time. Kalbfleisch and Lawless (1985) and Gentleman et al. (1994) examine the MSM model under the assumption of time-homogeneous transition intensities. In the context of non-homogeneous intensities, current methods use piecewise constant approximations, which are less than ideal. We propose a local likelihood method, based on Tibshirani and Hastie (1987) and Loader (1996), to estimate the transition intensities as continuous functions of time. In particular, the local EM algorithm suggested by Betensky et al. (1999) is employed to estimate the non-homogeneous intensities in the presence of missing data.
A simulation comparing the piecewise constant method with the local EM method is conducted using two different sets of underlying intensities. In addition, model assessment tools such as bandwidth selection, grid size selection, and bootstrapped percentile intervals are examined. Lastly, the method is applied to an HIV data set to examine the intensities with regard to depression scores. Although computationally intensive, the method appears viable for estimating non-homogeneous intensities and outperforms existing methods.
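For context, a small sketch of the piecewise constant baseline that the local EM method improves on: within each piece the intensity matrix Q is held constant, so panel-data transition probabilities follow from the matrix exponential. The three-state Q below (third state absorbing) is illustrative only:

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.30,  0.25, 0.05],
              [ 0.10, -0.25, 0.15],
              [ 0.00,  0.00, 0.00]])   # absorbing third state

def transition_probs(Q, dt):
    """P(dt) = exp(Q * dt) for a time-homogeneous piece."""
    return expm(Q * dt)

def loglik(Q, transitions):
    """Panel-data log-likelihood, assuming Q is constant within each piece."""
    ll = 0.0
    for (i, j, dt) in transitions:     # from-state, to-state, elapsed time
        ll += np.log(transition_probs(Q, dt)[i, j])
    return ll

print(loglik(Q, [(0, 1, 0.5), (1, 2, 1.0)]))
```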
|
35 |
Training Recurrent Neural Networks / Sutskever, Ilya (13 August 2013)
Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging problems.
We first describe a new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs. The new model is more powerful than similar models while being less difficult to train.
Next, we present a new variant of the Hessian-free (HF) optimizer and show that it can train RNNs on tasks that have extreme long-range temporal dependencies, which were previously considered to be impossibly hard. We then apply HF to character-level language modelling and get excellent results.
We also apply HF to optimal control and obtain RNN control laws that can successfully operate under conditions of delayed feedback and unknown disturbances.
Finally, we describe a random parameter initialization scheme that allows gradient descent with momentum to train RNNs on problems with long-term dependencies. This directly contradicts widespread beliefs about the inability of first-order methods to do so, and suggests that previous attempts at training RNNs failed partly due to flaws in the random initialization.
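One plausible reading of such an initialization scheme, sketched in NumPy under assumed constants (sparse recurrent connectivity, spectral radius near 1.1, so early gradients neither vanish nor explode); the thesis's exact recipe and values are not reproduced here:

```python
import numpy as np

def init_recurrent(n_hidden, spectral_radius=1.1, sparsity=0.85, seed=0):
    """Sparse random recurrent weights rescaled to a target spectral radius."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_hidden, n_hidden))
    W[rng.random(W.shape) < sparsity] = 0.0      # keep ~15% of connections
    radius = np.abs(np.linalg.eigvals(W)).max()  # largest |eigenvalue|
    return W * (spectral_radius / radius)

W_hh = init_recurrent(100)
print(np.abs(np.linalg.eigvals(W_hh)).max())     # ~1.1 by construction
```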
|
36 |
Stochastic Mortality Modelling / Liu, Xiaoming (28 July 2008)
For life insurance and annuity products whose payoffs depend on future mortality rates, there is a risk that realized mortality rates will differ from the anticipated rates accounted for in pricing and reserving calculations. This is termed mortality risk. Since mortality risk is difficult to diversify and has significant financial impacts on insurance policies and pension plans, it is now well accepted that stochastic approaches should be adopted to model mortality risk and to evaluate mortality-linked securities.

The objective of this thesis is to propose the use of a time-changed Markov process to describe stochastic mortality dynamics for pricing and risk management purposes. Analytical and empirical properties of these dynamics are investigated using a matrix-analytic methodology, and applications of the proposed model to the evaluation of fair values for mortality-linked securities are explored.

More specifically, we consider a finite-state Markov process with one absorbing state. This Markov process is related to an underlying aging mechanism, and the survival time is viewed as the time until absorption. The resulting distribution for the survival time is a so-called phase-type distribution. This approach differs from traditional curve-fitting mortality models in that the survival probabilities are linked to an underlying Markov aging process. The theories of Markov processes and phase-type distributions therefore provide a flexible and tractable framework for modelling mortality dynamics, and the time change allows us to incorporate the uncertainties embedded in future mortality evolution.

The proposed model is applied to price the EIB/BNP Longevity Bonds and other mortality derivatives under the assumption that interest rates and mortality rates are independent. A calibration method is suggested so that the model can utilize both market price information on the relevant mortality risk and the latest mortality projections. The proposed model is also fitted to various types of population mortality data for empirical study; the fitting results show that the model captures stylized mortality patterns very well.
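To make the phase-type construction concrete, the sketch below evaluates the survival function S(t) = alpha exp(Tt) 1, where T is the sub-intensity matrix over the transient (aging) states and alpha the initial distribution; the four-state chain and its rates are invented for illustration, not the thesis's fitted model:

```python
import numpy as np
from scipy.linalg import expm

# Sub-intensity matrix over transient states (absorbing death state omitted);
# the negative row sums give each state's exit rate into absorption.
T = np.array([[-0.02,  0.02,  0.00,  0.00],
              [ 0.00, -0.05,  0.04,  0.00],
              [ 0.00,  0.00, -0.12,  0.10],
              [ 0.00,  0.00,  0.00, -0.30]])
alpha = np.array([1.0, 0.0, 0.0, 0.0])   # everyone starts in the first state

def survival(t):
    """P(lifetime > t) for a phase-type distribution (alpha, T)."""
    return alpha @ expm(T * t) @ np.ones(len(alpha))

for age in (10, 30, 50):
    print(age, round(survival(age), 4))
```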
|
38 |
New methods for analysis of epidemiological data using capture-recapture methods / Huakau, John Tupou (January 2002)
Capture-recapture methods have their origins in animal abundance estimation, where they were used to estimate the unknown size of an animal population under study. In the late 1940s, and again in the late 1960s and early 1970s, these same capture-recapture methods were modified and applied to epidemiological list data. Since then, through their continued use, particularly in the 1990s, these methods have become popular for estimating the completeness of disease registries and the unknown total size of human disease populations. In this thesis we investigate new methods for the analysis of epidemiological list data using capture-recapture methods. In particular, we compare two standard methods for estimating the unknown total population size, and examine new methods that incorporate list mismatch errors and model-selection uncertainty into the estimation of the unknown total population size and its associated confidence interval. We study the use of modified tag loss methods from animal abundance estimation to allow for list mismatch errors in epidemiological list data. We also explore the use of a weighted average method, bootstrap methods, and a Bayesian model averaging method for incorporating model-selection uncertainty into the estimate of the unknown total population size and its associated confidence interval. In addition, we use two previously unanalysed diabetes studies to illustrate the methods examined, and a well-known spina bifida study for simulation purposes. This thesis finds that ignoring list mismatch errors leads to biased estimates of the unknown total population size, and that the list mismatch methods considered here provide a useful adjustment, one that approximately agrees with the results obtained using a complex matching algorithm. As for model-selection uncertainty, we find that confidence intervals which incorporate it are wider and more appropriate than those which do not. Hence we recommend the use of tag loss methods to adjust for list mismatch errors, and of methods that incorporate model-selection uncertainty into both point and interval estimates of the unknown total population size.
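As a point of reference for the simplest two-list case of the population-size estimation discussed above, the sketch below implements Chapman's nearly unbiased variant of the Lincoln-Petersen estimator on invented counts; the thesis's diabetes and spina bifida data are not reproduced:

```python
def chapman(n1, n2, m):
    """n1, n2: cases on each list; m: cases matched on both lists."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# e.g. 320 cases on a registry list, 280 on a hospital list, 190 on both
print(round(chapman(320, 280, 190)))   # estimated total population size
```

List mismatch errors enter here through m: undermatching inflates the estimate, which is what the tag loss adjustment is meant to correct.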
|