71.
Qwixx Strategies Using Simulation and MCMC Methods. Blank, Joshua W. 01 June 2024.
This study explores optimal strategies for maximizing scores and winning in the popular dice game Qwixx, analyzing both single-player and multiplayer gameplay scenarios. Through extensive simulations, various strategies were tested and compared, including a score-based approach that uses a formula tuned by MCMC random walks, and race-to-lock approaches that use absorbing Markov chain properties of individual score-sheet rows to find ways to lock rows as quickly as possible. Results indicate that employing a score-based strategy that considers gap, count, position, skip, and likelihood scores significantly improves performance in single-player games, while move restrictions based on specific dice-roll sums in the race-to-lock strategy were found to improve both win rate and scoring in multiplayer games. While the results do not reach the optimal scores attained by prior informal work, the study provides valuable insights into decision-making processes and gameplay optimization for Qwixx enthusiasts, offering practical guidance for players seeking to enhance their performance and strategic prowess in the game. It also serves as a lesson in how to approach similar optimization problems.
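As a rough illustration of the weight-tuning idea described above, the following Python sketch runs a Metropolis-style random walk over a five-element weight vector. The average_score function is a hypothetical stand-in for a full Qwixx simulator (the true objective in the study); it is replaced here by a noisy toy function so the example is self-contained, and the target weights are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def average_score(weights):
    """Hypothetical stand-in for the average score of many simulated Qwixx games
    played with the given (gap, count, position, skip, likelihood) weights."""
    target = np.array([1.0, 0.5, 0.8, -0.6, 1.2])   # invented "good" weights, not from the study
    return 80.0 - np.sum((weights - target) ** 2) + rng.normal(0.0, 0.5)

# Metropolis-style random walk: propose a small perturbation of the weights and
# accept it with a probability that favors higher simulated average scores.
weights, current, temperature = np.zeros(5), average_score(np.zeros(5)), 1.0
best_weights, best = weights.copy(), current
for _ in range(5000):
    proposal = weights + rng.normal(0.0, 0.1, size=5)
    score = average_score(proposal)
    if rng.random() < np.exp(min(0.0, (score - current) / temperature)):
        weights, current = proposal, score
        if score > best:
            best_weights, best = proposal.copy(), score

print("tuned weights:", np.round(best_weights, 2), " best simulated score:", round(best, 1))
```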
72.
Finding a Representative Distribution for the Tail Index Alpha, α, for Stock Return Data from the New York Stock Exchange. Burns, Jett. 01 May 2022.
Statistical inference is a tool for creating models that accurately describe real-world events. Special importance is given to financial methods that model risk and large price movements. A parameter that describes tail heaviness, and risk overall, is the tail index α. This research finds a representative distribution that models α. The absolute values of standardized stock returns from the Center for Research in Security Prices (CRSP) are used in this research. The inference is performed in R. Approximations for α are found using the ptsuite package. The GAMLSS package employs maximum likelihood estimation to estimate distribution parameters from the CRSP data. Distributions are selected using AIC and worm plots. The Skew t family is found to be representative for the parameter α based on subsets of the CRSP data, and the Skew t type 2 distribution is robust across multiple subsets of values calculated from the CRSP stock return data.
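The thesis performs inference in R with the ptsuite and GAMLSS packages; as a language-neutral illustration of what a tail-index estimate involves, here is a minimal Python sketch of the classical Hill estimator applied to simulated heavy-tailed data (absolute Student-t draws with 3 degrees of freedom, so the true α is 3). The simulated sample, the choices of k, and the use of the Hill estimator rather than ptsuite are assumptions of the example, not the thesis methodology.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for absolute standardized returns: heavy-tailed draws from a Student t
# distribution with 3 degrees of freedom, so the true tail index is alpha = 3.
x = np.abs(rng.standard_t(df=3, size=5000))

def hill_alpha(x, k):
    """Hill estimator of the tail index alpha based on the k largest observations."""
    order = np.sort(x)
    tail = order[-k:]                  # k largest values
    threshold = order[-k - 1]          # (k+1)-th largest value as the threshold
    return k / np.sum(np.log(tail / threshold))

for k in (100, 250, 500):
    print(f"k = {k:4d}  alpha_hat = {hill_alpha(x, k):.2f}")
```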
73.
Software Profiling of Rogue Events in High-Volume Gauging. Bering, Thomas P.K.
Customers are placing ever-increasing demands on automotive part manufacturers for high-quality parts at low cost. Increasingly, the demand is for zero defects, or defect rates of less than one part per billion. This creates a significant challenge for manufacturers: how to achieve these low defect levels economically while producing large volumes of parts. Importantly, the presence of infrequent process and measurement (gauge) events can adversely affect product quality. This thesis uses a statistical mixture model that assumes a main production process that occurs most of the time and secondary rogue events that occur infrequently. Often the rogue events correspond to necessary operator activity, such as equipment repairs and tooling replacement. The mixture model predicts that some gauge observations will be influenced by combinations of these rogue events. Certain production applications, such as those involving feedback or high-reliability gauging, are heavily influenced by rogue events and their combinations. A special runtime software profiler was created to collect information about rogue events, and statistical techniques (rogue event analysis) were used to estimate the waste they generate. The value of these techniques was successfully demonstrated in three different industrial automotive part production applications. Two of these systems involve an automated feedback application with Computer Numerically Controlled (CNC) machining centers and Coordinate Measuring Machine (CMM) gauges. The third application involves a high-reliability inspection system that used optical, camera-based machine-vision technology. The original system accepted defective parts at a rate of 98.7 parts per million (ppm), despite multiple levels of redundancy. The final system showed no outgoing defects on a 1 million part factory data sample and a 100 million part simulated data sample. The final system reliability is expected to meet the 0.001 ppm specification, a substantial improvement. / Doctor of Philosophy (PhD)
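A minimal sketch of the core modeling idea, assuming simulated gauge readings rather than the thesis data: a two-component mixture separating the main production process from infrequent, shifted rogue events, fitted here with scikit-learn. The rogue rate, means, and spreads are invented for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)

# Simulated gauge readings: the main process most of the time, plus an infrequent
# "rogue" component (e.g., readings taken right after a tooling change) that is
# shifted and more variable.
n = 5000
rogue = rng.random(n) < 0.02
y = np.where(rogue,
             rng.normal(0.30, 0.10, n),      # rogue events: shifted mean, larger spread
             rng.normal(0.00, 0.02, n))      # main production process

# Fit a two-component mixture and estimate how often the rogue component occurs.
gm = GaussianMixture(n_components=2, random_state=0).fit(y.reshape(-1, 1))
k = int(np.argmax(gm.means_.ravel()))        # component with the larger mean = rogue
print(f"estimated rogue fraction: {gm.weights_[k]:.3f} (simulated truth 0.02)")
print("component means:", np.round(gm.means_.ravel(), 3))
```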
74.
A statistical framework to detect gene-environment interactions influencing complex traits. Deng, Wei Q. 27 August 2014.
Advancements in human genomic technology have helped to improve our understanding of how genetic variation plays a central role in the mechanism of disease susceptibility. However, the very high-dimensional nature of the data generated from large-scale genetic association studies has limited our ability to thoroughly examine genetic interactions. A prioritization scheme, Variance Prioritization (VP), has been developed to select genetic variants based on differences in quantitative trait variance between the possible genotypes using Levene's test (Pare et al., 2010). Genetic variants with Levene's test p-values lower than a pre-determined level of significance are selected to test for interactions using linear regression models. Under a variety of scenarios, VP has increased power to detect interactions over an exhaustive search as a result of the reduced search space. Nevertheless, Levene's test does not take into account that, when interactions are present, the variance will either monotonically increase or decrease with the number of minor alleles. To address this issue, I propose a maximum likelihood approach to test for trends in variance across the genotypes, and derive a closed-form representation of the likelihood ratio test (LRT) statistic. Using simulations, I examine the performance of the LRT in assessing the inequality of quantitative trait variance stratified by genotype, and subsequently in identifying potentially interacting genetic variants. The LRT is also used in an empirical dataset of 2,161 individuals to prioritize genetic variants for gene-environment interactions. The interaction p-values of the prioritized genetic variants are consistently lower than expected by chance compared to the non-prioritized variants, suggesting improved statistical power to detect interactions in the prioritized set. This new statistical test is expected to complement the existing VP framework and accelerate the discovery of genetic interactions in future genome-wide studies and meta-analyses. / Master of Health Sciences (MSc)
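A small Python sketch of the prioritization step, using simulated data in place of the study genotypes: a genotype-by-environment interaction induces a variance trend across genotype groups, and Levene's test flags the variant. The simulated effect sizes and allele frequencies are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(3)

# Simulated quantitative trait for one variant with genotypes 0/1/2 minor alleles.
# A genotype-by-environment interaction makes the trait variance increase with the
# number of minor alleles, even though the environmental exposure is unmeasured.
n = 3000
g = rng.choice([0, 1, 2], size=n, p=[0.49, 0.42, 0.09])
e = rng.normal(size=n)                        # unobserved environmental exposure
y = 0.2 * g + 0.5 * g * e + rng.normal(size=n)

# Variance-prioritization step: Levene's test for unequal trait variance across genotypes.
groups = [y[g == k] for k in (0, 1, 2)]
stat, p = levene(*groups)
print(f"Levene statistic = {stat:.2f}, p-value = {p:.2e}")
print("per-genotype variances:", [round(float(np.var(grp)), 2) for grp in groups])
```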
75.
Sparse Principal Component Analysis for High-Dimensional Data: A Comparative Study. Bonner, Ashley J.
Background: Through unprecedented advances in technology, high-dimensional datasets have exploded into many fields of observational research. For example, it is now common to expect thousands or millions of genetic variables (p) measured on only a limited number of study participants (n). Determining the important features proves statistically difficult, as standard multivariate analysis techniques become unstable and mathematically insufficient when n < p. Principal Component Analysis (PCA) is a commonly used multivariate method for dimension reduction and data visualization, but it suffers from these issues. A collection of Sparse PCA methods has been proposed to counter these flaws, but these methods have not been tested in comparative detail. Methods: The performance of three Sparse PCA methods was evaluated through simulations. Data were generated for 56 different data structures, varying p, the number of underlying groups, and the variance structure within them. Estimation and interpretability of the principal components (PCs) were rigorously tested. The Sparse PCA methods were also applied to a real gene expression dataset. Results: All Sparse PCA methods showed improvements over classical PCA. Some methods were best at obtaining an accurate leading PC only, whereas others were better for subsequent PCs. Different Sparse PCA methods were optimal depending on the within-group correlation and across-group variances; thankfully, one method repeatedly worked well under the most difficult scenarios. When the methods were applied to real data, the sparsest methods detected concise groups of gene expressions. Conclusions: Sparse PCA methods provide a new, insightful way to detect important features amidst complex high-dimensional data. / Master of Science (MSc)
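The following Python sketch, using simulated data with an assumed structure (only 20 of 500 variables load on a common factor), shows the kind of contrast examined in the thesis: classical PCA spreads weight over all variables, while a sparse PCA fit concentrates the leading component on the informative ones. The scikit-learn SparsePCA implementation stands in for the specific methods compared in the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(5)

# Simulated high-dimensional data (n < p): only the first 20 of 500 variables share
# a common underlying factor; the remaining variables are pure noise.
n, p, k = 100, 500, 20
factor = rng.normal(size=(n, 1))
X = rng.normal(size=(n, p))
X[:, :k] += 2.0 * factor

pca = PCA(n_components=1).fit(X)
spca = SparsePCA(n_components=1, alpha=2.0, random_state=0).fit(X)

# Sparse PCA should put (near-)zero weight on the 480 noise variables, making the
# leading component far easier to interpret than the dense classical-PCA loading.
print("nonzero loadings, classical PCA:", int(np.sum(np.abs(pca.components_[0]) > 1e-3)))
print("nonzero loadings, sparse PCA:   ", int(np.sum(np.abs(spca.components_[0]) > 1e-3)))
```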
76.
LIKELIHOOD-BASED INFERENTIAL METHODS FOR SOME FLEXIBLE CURE RATE MODELS. Pal, Suvra.
Recently, the Conway-Maxwell Poisson (COM-Poisson) cure rate model has been proposed, which includes as special cases some of the well-known cure rate models discussed in the literature. Data obtained from cancer clinical trials are often right censored, and the expectation-maximization (EM) algorithm can be used efficiently to determine the maximum likelihood estimates (MLEs) of the model parameters based on right censored data. By assuming the lifetime distribution to be exponential, lognormal, Weibull, or gamma, the necessary steps of the EM algorithm are developed for the COM-Poisson cure rate model and some of its special cases. The inferential method is examined by means of an extensive simulation study. Model discrimination within the COM-Poisson family is carried out by likelihood ratio tests as well as by information-based criteria. The proposed method is then illustrated with cutaneous melanoma data on cancer recurrence. As the lifetime distributions considered are not nested, it is not possible to carry out a formal statistical test to determine which among them provides an adequate fit to the data. For this reason, the wider class of generalized gamma distributions, which contains all of the above-mentioned lifetime distributions as special cases, is considered. The steps of the EM algorithm are then developed for this general class of distributions, and a simulation study is carried out to evaluate the performance of the proposed estimation method. Model discrimination within the generalized gamma family is carried out by likelihood ratio tests and information-based criteria. Finally, for the cutaneous melanoma data, the two-way flexibility of the COM-Poisson family and the generalized gamma family is utilized to carry out a two-way model discrimination, selecting a parsimonious competing-cause distribution along with a suitable lifetime distribution that provides the best fit to the data. / Doctor of Philosophy (PhD)
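To give the flavor of the EM steps in the simplest special case, here is a minimal Python sketch of EM for the standard Bernoulli mixture cure model with exponential lifetimes and administrative right censoring. This is a simplified stand-in for the COM-Poisson development in the thesis, and all parameter values are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate from a Bernoulli mixture cure model: a fraction (1 - pi_true) is cured and
# never experiences the event; susceptible subjects have Exp(lam_true) lifetimes.
# Everyone is administratively censored at time tau.
n, pi_true, lam_true, tau = 2000, 0.6, 0.5, 5.0
susceptible = rng.random(n) < pi_true
latent = rng.exponential(1.0 / lam_true, n)
latent[~susceptible] = np.inf                  # cured subjects never experience the event
t = np.minimum(latent, tau)
delta = (latent <= tau).astype(float)          # 1 = event observed, 0 = right censored

# EM iterations for (pi, lam)
pi, lam = 0.5, 1.0
for _ in range(200):
    # E-step: posterior probability of being susceptible; observed events are susceptible for sure
    surv = np.exp(-lam * t)
    w = np.where(delta == 1, 1.0, pi * surv / (1 - pi + pi * surv))
    # M-step: closed-form updates
    pi = w.mean()
    lam = delta.sum() / np.sum(w * t)

print(f"pi_hat = {pi:.3f} (true {pi_true}), lambda_hat = {lam:.3f} (true {lam_true})")
```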
77.
STATISTICAL AND METHODOLOGICAL ISSUES IN EVALUATION OF INTEGRATED CARE PROGRAMS. Ye, Chenglin. January 2014.
Background
Integrated care programs are collaborations intended to improve the delivery of health services for patients with multiple conditions.
Objectives
This thesis investigated three issues in the evaluation of integrated care programs: (1) quantifying integration for integrated care programs, (2) analyzing integrated care programs with substantial non-compliance, and (3) assessing bias when evaluating integrated care programs under different non-compliance scenarios.
Methods
Project 1: We developed a method to quantify integration through service providers' perceptions and expectations. For each provider, four integration scores were calculated, and the properties of the scores were assessed.
Project 2: A randomized controlled trial (RCT) compared the Children's Treatment Network (CTN) with usual care in managing children with complex conditions. To handle non-compliance, we employed intention-to-treat (ITT), as-treated (AT), per-protocol (PP), and instrumental variable (IV) analyses. We also investigated propensity score (PS) methods to control for potential confounding.
Project 3: Based on the CTN study, we simulated trials under different non-compliance scenarios and compared the ITT, AT, PP, IV, and complier average causal effect methods in analyzing the data. The results were compared by bias of the estimate, mean squared error, and 95% coverage.
Results and conclusions
Project 1: We demonstrated the proposed method for measuring integration and some of its properties. Bootstrap analyses showed that the global integration score was robust. Our method extends existing measures of integration and possesses a good degree of validity.
Project 2: The CTN intervention was not significantly different from usual care in improving patients' outcomes. The study highlighted some methodological challenges in evaluating integrated care programs in an RCT setting.
Project 3: When an intervention had a moderate or large effect, the ITT analysis was considerably biased under non-compliance, and alternative analyses could provide unbiased results. To minimize bias, we make recommendations for the choice of analysis under different scenarios.
/ Doctor of Philosophy (PhD)
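A small simulation in the spirit of Project 3, with invented parameters: under one-sided non-compliance that is independent of outcomes, the ITT estimate is diluted toward zero while the instrumental-variable (Wald) estimate recovers the effect among compliers; with confounded non-compliance (not shown), the as-treated and per-protocol estimates would also be biased.

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulated two-arm trial with one-sided non-compliance: 30% of participants randomized
# to the intervention never actually receive it; the true effect of treatment is 5 points.
n = 4000
z = rng.integers(0, 2, n)                     # randomized assignment (the instrument)
received = z * (rng.random(n) < 0.7)          # treatment actually received
y = 50 + 5 * received + rng.normal(0, 10, n)  # outcome

itt = y[z == 1].mean() - y[z == 0].mean()                       # intention-to-treat
at = y[received == 1].mean() - y[received == 0].mean()          # as-treated
pp = y[(z == 1) & (received == 1)].mean() - y[z == 0].mean()    # per-protocol (one-sided case)
iv = itt / (received[z == 1].mean() - received[z == 0].mean())  # instrumental-variable (Wald)

print(f"ITT = {itt:.2f}  as-treated = {at:.2f}  per-protocol = {pp:.2f}  IV = {iv:.2f}")
```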
78.
Statistical Methods for Handling Intentional Inaccurate Responders. McQuerry, Kristen J. 01 January 2016.
In self-report data, participants who provide incorrect responses are known as intentional inaccurate responders. This dissertation provides statistical analyses for addressing intentional inaccurate responses in the data.
Previous work with adolescent self-report data labeled survey participants who intentionally provide inaccurate answers as mischievous responders. This phenomenon also occurs in clinical research; for example, pregnant women who smoke may report that they are nonsmokers. Our advantage is that we do not rely solely on self-reported answers and can verify responses with laboratory values. Currently, there is no clear method for handling these intentional inaccurate respondents when making statistical inferences.
We propose using an EM algorithm to account for the intentional behavior while retaining all responses in the data. The performance of this model is evaluated using simulated data and real data, and the strengths and weaknesses of the EM algorithm approach are demonstrated.
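A simplified, hypothetical illustration of the idea (not the dissertation's exact model): among self-reported nonsmokers, a continuous lab value is modeled as a two-component mixture of honest responders and intentional inaccurate responders, and an EM fit (via scikit-learn's GaussianMixture) estimates how many responses are inaccurate. All distributions and rates below are invented for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(13)

# Invented setup: a lab value (e.g., a smoking biomarker) is high for true smokers and
# low for true nonsmokers; 15% of true smokers report "nonsmoker", so the self-reported
# nonsmoker group is a mixture of honest responders and misreporters.
n_smokers, n_nonsmokers, misreport_rate = 600, 1400, 0.15
lab_smokers = rng.normal(6.0, 1.0, n_smokers)
lab_nonsmokers = rng.normal(1.0, 0.8, n_nonsmokers)
hide = rng.random(n_smokers) < misreport_rate
reported_nonsmoker_lab = np.concatenate([lab_nonsmokers, lab_smokers[hide]])

# EM (via GaussianMixture) on the self-reported nonsmokers separates honest responders
# from intentional inaccurate responders and estimates the misreporting fraction.
gm = GaussianMixture(n_components=2, random_state=0).fit(reported_nonsmoker_lab.reshape(-1, 1))
k = int(np.argmax(gm.means_.ravel()))          # high-mean component = misreporting smokers
n_misreport_est = gm.weights_[k] * len(reported_nonsmoker_lab)
print(f"estimated inaccurate responders: {n_misreport_est:.0f} (simulated truth {hide.sum()})")
```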
79.
EMPIRICAL LIKELIHOOD AND DIFFERENTIABLE FUNCTIONALS. Shen, Zhiyuan. 01 January 2016.
Empirical likelihood (EL) is a recently developed nonparametric method of statistical inference. Owen (1988, 1990) and many others have shown that the empirical likelihood ratio (ELR) method can be used to produce confidence intervals and regions with good properties. Owen (1988) shows that -2 log ELR converges to a chi-square distribution with one degree of freedom when the constraint is a linear statistical functional of the distribution function. However, generalizing Owen's result to the right-censored data setting is difficult, since no explicit maximization can be obtained under a constraint expressed in terms of distribution functions. Pan and Zhou (2002) instead study EL with right-censored data using a linear statistical functional constraint expressed in terms of cumulative hazard functions. In this dissertation, we extend Owen's (1988) and Pan and Zhou's (2002) results to non-linear but Hadamard differentiable statistical functional constraints. To this end, a study of differentiable functionals with respect to hazard functions is carried out. We also generalize our results to two-sample problems. Stochastic process and martingale theories are applied to prove the theorems. The confidence intervals based on the EL method are compared with other available methods. Real data analyses and simulations are used to illustrate the proposed theorems, with an application to Gini's absolute mean difference.
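For the uncensored, linear-functional case that Owen (1988) covers, the empirical likelihood ratio for a mean can be computed directly; the Python sketch below solves the Lagrange-multiplier equation and compares -2 log ELR to a chi-square(1) distribution. The data are simulated, and the censored and Hadamard-differentiable extensions developed in the dissertation are not attempted here.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def neg2_log_elr_mean(x, mu):
    """-2 log empirical likelihood ratio for the hypothesis E[X] = mu (Owen, 1988)."""
    x = np.asarray(x, dtype=float)
    n, d = len(x), x - mu
    if not (x.min() < mu < x.max()):          # mu must lie inside the convex hull of the data
        return np.inf
    # Solve the Lagrange-multiplier equation  sum d_i / (1 + lam * d_i) = 0  for lam;
    # the EL weights stay positive only for lam in the open interval below.
    lo = -1.0 / d.max() + 1e-10
    hi = -1.0 / d.min() - 1e-10
    g = lambda lam: np.sum(d / (1.0 + lam * d))
    lam = brentq(g, lo, hi)
    return 2.0 * np.sum(np.log1p(lam * d))

rng = np.random.default_rng(2)
x = rng.exponential(2.0, size=80)
for mu0 in (1.5, 2.0, 2.5):
    stat = neg2_log_elr_mean(x, mu0)
    print(f"mu0 = {mu0}: -2 log ELR = {stat:.2f}, p-value = {chi2.sf(stat, df=1):.3f}")
```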
80.
EMPIRICAL PROCESSES AND ROC CURVES WITH AN APPLICATION TO LINEAR COMBINATIONS OF DIAGNOSTIC TESTS. Chirila, Costel. 01 January 2008.
The Receiver Operating Characteristic (ROC) curve is the plot of sensitivity versus 1 - specificity of a quantitative diagnostic test over a wide range of cut-off points c. The empirical ROC curve is probably the most widely used nonparametric estimator of the ROC curve. The asymptotic properties of this estimator were first developed by Hsieh and Turnbull (1996), based on strong approximations for quantile processes. Jensen et al. (2000) provided a general method for obtaining regional confidence bands for the empirical ROC curve, based on its asymptotic distribution.
Since most biomarkers do not have high enough sensitivity and specificity to qualify as good diagnostic tests on their own, a combination of biomarkers may yield a better diagnostic test than any single one. Su and Liu (1993) proved that, if the panel of biomarkers is multivariate normally distributed in both the diseased and non-diseased populations, then the linear combination using Fisher's linear discriminant coefficients maximizes the area under the ROC curve of the newly formed diagnostic test, called the generalized ROC curve. In this dissertation, we derive the asymptotic properties of the generalized empirical ROC curve, the nonparametric estimator of the generalized ROC curve, by using empirical process theory as in van der Vaart (1998). The pivotal result used in finding the asymptotic behavior of the proposed nonparametric estimator is van der Vaart's (1998) result on random functions that incorporate estimators. Using this powerful lemma, we decompose an equivalent process into a sum of two other processes, usually called the Brownian bridge and the drift term, via Donsker classes of functions. Using a uniform convergence rate result given by Pollard (1984), we derive the limiting process of the drift term. Due to the independence of the random samples, the asymptotic distribution of the generalized empirical ROC process is the sum of the asymptotic distributions of the decomposed processes. For completeness, we first re-derive the asymptotic properties of the empirical ROC curve in the univariate case using the same technique. The methodology is applied to combine biomarkers in order to discriminate lung cancer patients from normal subjects.
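A brief numerical illustration of the Su and Liu (1993) result with invented parameters: for two biomarkers that are multivariate normal with a common covariance matrix in both groups, the linear combination built from Fisher's discriminant coefficients has a larger empirical AUC than either biomarker alone.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)

# Two biomarkers, multivariate normal in both groups with a common covariance matrix
# (illustrative parameters, not the lung-cancer data used in the dissertation).
mu0, mu1 = np.array([0.0, 0.0]), np.array([0.8, 0.8])
cov = np.array([[1.0, -0.3], [-0.3, 1.0]])
x0 = rng.multivariate_normal(mu0, cov, 500)      # non-diseased
x1 = rng.multivariate_normal(mu1, cov, 500)      # diseased

# Fisher's linear discriminant coefficients from the pooled covariance matrix; per
# Su and Liu (1993), this linear combination maximizes the area under the ROC curve.
pooled = 0.5 * (np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False))
a = np.linalg.solve(pooled, x1.mean(axis=0) - x0.mean(axis=0))

y = np.r_[np.zeros(len(x0)), np.ones(len(x1))]
scores = np.vstack([x0, x1])
print("AUC, biomarker 1 alone:   ", round(roc_auc_score(y, scores[:, 0]), 3))
print("AUC, biomarker 2 alone:   ", round(roc_auc_score(y, scores[:, 1]), 3))
print("AUC, Fisher's combination:", round(roc_auc_score(y, scores @ a), 3))
```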