Global ETD Search

31	Mixed Membership Distributions with Applications to Modeling Multiple Strategy Usage Galyardt, April 01 July 2012 This dissertation examines two related questions: How do mixed membership models work, and can mixed membership be used to model how students use multiple strategies to solve problems? Mixed membership models have been used in thousands of applications, from text and image processing to genetic microarray analysis. Yet these models are crafted on a case-by-case basis because we do not yet understand the larger class of mixed membership models. The work presented here addresses this gap and examines two different aspects of the general class of models. First, I establish that categorical data is a special case, and that it allows a different interpretation of mixed membership than the general case does. Second, I present a new identifiability result that characterizes equivalence classes of mixed membership models which produce the same distribution of data. These results provide a strong foundation for building a model that captures how students use multiple strategies. How to assess which strategies students use is an open question. Most psychometric models either do not model strategies at all, or they assume that each student uses a single strategy on all problems, even if they allow different students to use different strategies. The problem is that this is not what students do. Students switch strategies. Even on the simplest arithmetic problems, students use different strategies on different problems, and experts use a different mixture of strategies than novices do. Assessing which strategies students use is an important part of assessing student knowledge, yet the concept of ‘strategy’ can be ill-defined. I use the Knowledge-Learning-Instruction framework to define a strategy as a particular type of integrative knowledge component. I then look at two different ways to model how students use multiple strategies.
I combine cognitive diagnosis models with mixed membership models to create a multiple strategies model. This new model allows students to switch strategies from problem to problem, and allows us to estimate both the strategies that students are using and how often each student uses each strategy. I demonstrate this model on a modestly sized assessment of least common multiples. Lastly, I present an analysis of the different strategies that students use to estimate numerical magnitude. Three smaller results come out of this analysis. First, it illustrates the limits of the general mixed membership model: the properties of mixed membership models developed in this dissertation show that, without serious changes to the model, it cannot describe the variation between students that is present in this data set. Second, I develop an exploratory data analysis method for summarizing functional data. Finally, this analysis demonstrates that existing psychological theory of how children estimate numerical magnitude is incomplete: there is more variation between students than current theoretical models capture. Statistics and Probability
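To make "students switch strategies from problem to problem" concrete, here is a minimal simulation sketch of a generic mixed membership generative process. It is not the dissertation's multiple strategies model: the Dirichlet prior on each student's membership vector, the per-strategy success probabilities, and the function name `simulate_mixed_membership` are illustrative assumptions.

```python
import numpy as np

def simulate_mixed_membership(n_students, n_problems, p_correct, alpha, seed=0):
    """Simulate binary responses under a generic mixed membership model.

    p_correct: array of shape (K, n_problems); p_correct[k, j] is the
        (hypothetical) probability of a correct answer on problem j
        when strategy k is used.
    alpha: Dirichlet concentration vector of length K.
    Returns (theta, strategies, responses).
    """
    rng = np.random.default_rng(seed)
    p_correct = np.asarray(p_correct, dtype=float)
    K = p_correct.shape[0]
    # Each student's mixture over the K strategies (rows sum to 1).
    theta = rng.dirichlet(alpha, size=n_students)            # (n_students, K)
    # Strategy used by student i on problem j, drawn anew for every problem.
    strategies = np.array([rng.choice(K, size=n_problems, p=theta[i])
                           for i in range(n_students)])      # (n_students, n_problems)
    # Success probability depends on the strategy chosen for that problem.
    probs = p_correct[strategies, np.arange(n_problems)]
    responses = (rng.random((n_students, n_problems)) < probs).astype(int)
    return theta, strategies, responses
```

Because a strategy is drawn independently for each problem, a single student's responses mix the success profiles of several strategies, and θ_i records how often each one is used.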
32	Learning Spatio-Temporal Dynamics: Nonparametric Methods for Optimal Forecasting and Automated Pattern Discovery Goerg, Georg Matthias 01 December 2012 Many important scientific and data-driven problems involve quantities that vary over space and time. Examples include functional magnetic resonance imaging (fMRI), climate data, and experimental studies in physics, chemistry, and biology. Principal goals of many methods in statistics, machine learning, and signal processing are to use these data to i) extract informative structures and remove noisy, uninformative parts; ii) understand and reconstruct the underlying spatio-temporal dynamics that govern these systems; and iii) forecast the data, i.e., describe the system in the future. Because these are data-driven problems, it is important to have methods and algorithms that work well in practice for a wide range of spatio-temporal processes as well as various data types. In this thesis I present such generally applicable statistical methods that address all three problems in a unifying manner. I introduce two new techniques for optimal nonparametric forecasting of spatio-temporal data: hard and mixed LICORS (Light Cone Reconstruction of States). Hard LICORS is a consistent predictive state estimator and extends previous work from Shalizi (2003); Shalizi, Haslinger, Rouquier, Klinkner, and Moore (2006); and Shalizi, Klinkner, and Haslinger (2004) to continuous-valued spatio-temporal fields. Mixed LICORS builds on a new, fully probabilistic model of light cones and predictive state mappings, and is an EM-like version of hard LICORS. Simulations show that it has much better finite-sample properties than hard LICORS. I also propose a sparse variant of mixed LICORS, which improves out-of-sample forecasts even further. Both methods can then be used to estimate local statistical complexity (LSC) (Shalizi, 2003), a fully automatic technique for pattern discovery in dynamical systems.
Simulations and applications to fMRI data demonstrate that the proposed methods work well and give useful results in very general scientific settings. Lastly, I have made most methods publicly available as R (R Development Core Team, 2010) or Python (Van Rossum, 2003) packages, so that researchers can use these methods to better understand, forecast, and discover patterns in the data they study. Statistics and Probability
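LICORS operates on past light cones: the set of earlier values that could have influenced a point, given a finite propagation speed. The sketch below only extracts past light cones from a one-dimensional field over time, one building block rather than LICORS itself. The cone definition used here (including the current point, horizon `horizon`, speed `speed`, no boundary padding) and the function name are assumptions for illustration.

```python
import numpy as np

def past_light_cones(field, horizon, speed=1):
    """Extract past light cones from a (time, space) array.

    The past light cone of point (t, s) is taken here to be all values
    field[t - k, s - speed*k : s + speed*k + 1] for k = 0..horizon,
    i.e. including the current point. Only points whose full cone lies
    inside the array are returned (no boundary padding).
    Returns (coords, cones): a list of (t, s) pairs and an array with
    one flattened cone per row.
    """
    field = np.asarray(field)
    T, S = field.shape
    reach = speed * horizon
    coords, cones = [], []
    for t in range(horizon, T):
        for s in range(reach, S - reach):
            # Row k steps back in time, widened by `speed` cells per step.
            pieces = [field[t - k, s - speed * k: s + speed * k + 1]
                      for k in range(horizon + 1)]
            coords.append((t, s))
            cones.append(np.concatenate(pieces))
    return coords, np.array(cones)
```

Each cone has (horizon + 1)(1 + speed * horizon) entries; a predictive-state method would then group cones whose future distributions look alike.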
33	Statistical Modeling and Analysis of Breast Cancer and Pancreatic Cancer Kottabi, Zahra 01 January 2012 The objective of the present study is to apply statistical modeling to estimate the mean optimism of breast cancer patients as a function of the attribute variables delay, education, and age, for each race of breast cancer patients. A second aim is to investigate the nonlinear association of optimism with education, age, and delay, for each race separately and for both races combined. A further aim is to develop differential equations that characterize the behavior of pancreatic cancer tumor size as a function of time; once plotted, the mean solution of these equations identifies the rate of change of tumor size as a function of age. The structure of the differential equations characterizes the growth of pancreatic cancer tumors. Once the differential equations and their solutions have been developed, the study probabilistically evaluates commonly used methods of survival analysis for medical patients, in order to validate the quality of the differential system and discuss its usefulness. The last study compares parametric, semi-parametric, and nonparametric analyses of survival time probability models. In the first part of this evaluation, applying statistical tests to the survival times guides how to proceed with the actual cancer data; the second part identifies the parametric survival time function for each race and for both races combined. We also evaluate the kernel density, the popular Kaplan-Meier (KM), and the Cox proportional hazards (Cox PH) models using actual pancreatic cancer data. As expected, parametric survival analysis, when applicable, gives the best results, followed by the less commonly used nonparametric kernel density approach for evaluating actual cancer data. Statistics and Probability
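The Kaplan-Meier estimator mentioned above is the standard product-limit estimate of survival from right-censored data. A minimal NumPy sketch follows; it is a generic textbook version, not the thesis's code, and the function name is our own.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times:  observed times (event or censoring).
    events: 1 if the event (e.g. death) was observed, 0 if right-censored.
    Returns the distinct event times and S(t) just after each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                    # still under observation at t
        deaths = np.sum((times == t) & (events == 1))   # observed events at t
        s *= 1.0 - deaths / at_risk                     # product-limit step
        surv.append(s)
    return event_times, np.array(surv)
```

For times 1, 2, 2, 3, 4 with the second observation at time 2 and the one at time 4 censored, the estimate steps down to 0.8, 0.6, and 0.3 at times 1, 2, and 3.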
34	Association between mean residual life (MRL) and failure rate functions for continuous and discrete lifetime distributions Bekker, Leonid 15 November 2002 The purpose of this study was to correct some mistakes in the literature and to derive a necessary and sufficient condition for the MRL to follow the roller-coaster pattern of the corresponding failure rate function. It was also desired to find the conditions under which the discrete failure rate function has an upside-down bathtub shape if the corresponding MRL function has a bathtub shape. The study showed that if the discrete MRL has a bathtub shape, then under some conditions the corresponding failure rate function has an upside-down bathtub shape. The study also corrected some mistakes in the proofs of Tang, Lu and Chew (1999) and established a necessary and sufficient condition for the MRL to follow the roller-coaster pattern of the corresponding failure rate function. Similarly, some mistakes in Gupta and Gupta (2000) are corrected, and the ensuing results are expanded and proved thoroughly to establish the relationship between the crossing points of the failure rate and associated MRL functions. The new results derived in this study will be useful for modeling various lifetime data that occur in environmental studies, medical research, electronics engineering, and many other areas of science and technology. Statistics and Probability
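For a discrete lifetime, both functions studied above can be computed directly from the probability mass function, which makes it easy to inspect their shapes numerically. The sketch assumes one common convention, r(k) = P(X = k) / P(X ≥ k) and m(k) = E[X − k | X ≥ k] on support {0, 1, …, n−1}; other conventions (for example conditioning on X > k) exist in the literature, and the function name is hypothetical.

```python
import numpy as np

def discrete_failure_rate_and_mrl(pmf):
    """Failure rate and mean residual life of a lifetime on {0, 1, ..., n-1}.

    Conventions assumed here (others exist in the literature):
      r(k) = P(X = k) / P(X >= k)
      m(k) = E[X - k | X >= k]
    The pmf is assumed to have no trailing zeros, so P(X >= k) > 0 for all k.
    """
    p = np.asarray(pmf, dtype=float)
    k = np.arange(len(p))
    surv = p[::-1].cumsum()[::-1]          # surv[k] = P(X >= k)
    rate = p / surv
    # Expected remaining life for a unit that has survived to age i.
    mrl = np.array([np.sum((k[i:] - i) * p[i:]) for i in range(len(p))]) / surv
    return rate, mrl
```

For the discrete uniform distribution on {0, …, 4}, the failure rate increases (0.2, 0.25, 1/3, 0.5, 1) while the MRL decreases (2, 1.5, 1, 0.5, 0), the familiar opposite monotonicity of the two functions.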
35	Interval estimation and point estimation for the location parameter of the three-parameter Weibull distribution Chen, Dongming 26 July 2005 Employing the approach proposed by Z. Chen for constructing an exact confidence interval for the location parameter, this study investigates exact confidence intervals, confidence limits, and point estimators for the location parameter μ of the three-parameter Weibull distribution. Statistical simulation was carried out for different selections of i, j and k with a specified confidence level and sample size. The critical values ω_{α/2} and ω_{1-α/2} were found using Monte Carlo simulation. The optimal combination of i, j and k is discussed, and a point estimator for the location parameter of the three-parameter Weibull distribution is explored. It is observed that the critical values do not depend on the parameters. Simulation results show that the optimal choice is i=1, k=n and j=[n+2/3]. Compared with the commonly used MLE method, the described method provides a simpler, more accurate, and more efficient way to estimate the location parameter of the three-parameter Weibull distribution, and it yields very good statistical inferences for that parameter. Statistics and Probability
36	Nonparametric assessment of safety levels in ecological risk assessment (ERA) Chen, Limei 26 March 2003 In ecological risk assessment (ERA), it is important to know whether the exposure that animal species receive from a chemical concentration exceeds the desired safety level. This study examined several statistical methods currently used in ecological risk assessment and reviewed several related statistical procedures in the literature. Two large-sample nonparametric tests were developed for this study. A Monte Carlo study showed that these tests performed well even when the sample size was only moderately large. A real data set was used to show that the new methodology provides a good way to assess the potential risks of pesticide residues at an investigated site. Statistics and Probability
37	Small sample confidence intervals for the mean of a positively skewed distribution Almonte, Cherylyn 09 July 2008 This thesis proposes several confidence intervals for the mean of a positively skewed distribution. The following confidence intervals are considered: Student-t, Johnson-t, median-t, mad-t, bootstrap-t, BCA, T1, T3, and six new confidence intervals: the median bootstrap-t, mad bootstrap-t, median T1, mad T1, median T3, and mad T3. A simulation study was conducted, and average widths, coefficients of variation of widths, and coverage probabilities were recorded and compared across confidence intervals. When coverage probabilities were the same, a smaller width indicated a better confidence interval. Results showed that the median T1 and median T3 intervals outperformed the others in terms of coverage probability, while the mad bootstrap-t, mad-t, and mad T3 intervals outperformed the others in terms of width. Some real-life data are used to illustrate the findings of the thesis. Statistics and Probability
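Of the intervals listed above, the bootstrap-t is the easiest to show compactly. The sketch below is the standard studentized bootstrap interval for a mean, not the thesis's median or mad variants; the function name, the default of 2000 resamples, and the rule of discarding zero-spread resamples are our assumptions.

```python
import numpy as np

def bootstrap_t_ci(x, alpha=0.05, n_boot=2000, seed=0):
    """Studentized (bootstrap-t) confidence interval for the mean of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=n, replace=True)
        se_b = xb.std(ddof=1) / np.sqrt(n)
        # A resample of identical values has zero spread; drop it.
        t_star[b] = (xb.mean() - mean) / se_b if se_b > 0 else np.nan
    q_lo, q_hi = np.nanquantile(t_star, [alpha / 2, 1 - alpha / 2])
    # The upper quantile of t* sets the lower endpoint and vice versa,
    # which lets the interval be asymmetric for skewed data.
    return mean - q_hi * se, mean - q_lo * se
```

To mimic the kind of comparison described above, one could repeat this over many simulated samples from a skewed distribution with a known mean (an exponential, say) and record the fraction of intervals that cover the mean together with their average width.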
38	Asymptotic tail probabilities of risk processes in insurance and finance Hao, Xuemiao 01 July 2009 In this thesis we are interested in the impact of economic and financial factors, such as interest rates, tax payments, reinsurance, and investment returns, on insurance business. The underlying risk models of insurance business that we consider range from the classical compound Poisson risk model to the newly emerging and more general Lévy risk model. In these risk models, we assume that the claim-size distribution belongs to certain distribution classes defined by its asymptotic tail behavior. We consider both light-tailed and heavy-tailed cases. Our analysis is carried out through asymptotic tail probabilities. First, we study the asymptotic tail probability of discounted aggregate claims in the renewal risk model by introducing a constant force of interest. In this situation we focus on claims with subexponential tails. We derive an asymptotic formula for the tail probability of discounted aggregate claims that holds uniformly for finite time intervals. For various special cases, we extend this uniformity to all time horizons. Then, we investigate the asymptotic tail probability of the maximum exceedance of a sequence of random variables over a renewal threshold. We derive a unified asymptotic formula for this tail probability for both the light-tailed and heavy-tailed cases. Using this result, we study how to capture the impact of tax payments on the ruin probability in the Lévy risk model. We introduce periodic taxation, under which the company pays tax at a fixed rate on its net income during each period. Assuming that the Lévy measure, which represents the claim-size distribution in the Lévy risk model, has a subexponential tail, a convolution-equivalent tail, or an exponential-like tail, we derive several explicit asymptotic relations for the ruin probability in which the prefactor varies with the tax rate, reflecting the impact of tax payments.
Finally, we consider the renewal risk model in which the surplus is invested into a portfolio consisting of both a riskless bond and a risky stock. The price process of the stock is modeled by an exponential Lévy process. We derive an asymptotic formula for the tail probability of the stochastically discounted net loss process. Statistics and Probability
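The discounted aggregate claims studied above are D(t) = Σ_{i=1}^{N(t)} X_i e^{−δ τ_i}, where τ_i are claim arrival times and δ is the constant force of interest. A crude Monte Carlo estimate of P(D(t) > x) is a useful sanity check against any asymptotic formula. This sketch assumes, for illustration only, Poisson arrivals (a special case of a renewal process) and Pareto claims with tail y^{−α} for y ≥ 1, a subexponential distribution; the function and parameter names are hypothetical.

```python
import numpy as np

def discounted_claims_tail(x, t, delta, lam, pareto_alpha, n_sims=100_000, seed=0):
    """Monte Carlo estimate of P(D(t) > x), where
    D(t) = sum_{i=1}^{N(t)} X_i * exp(-delta * tau_i).

    Illustrative assumptions: N is a Poisson process with rate lam, and
    claims are Pareto with P(X > y) = y**(-pareto_alpha) for y >= 1.
    """
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_sims):
        n = rng.poisson(lam * t)
        # Given N(t) = n, Poisson arrival times are i.i.d. uniform on [0, t].
        tau = rng.uniform(0.0, t, size=n)
        # Inverse-CDF Pareto draw; 1 - U lies in (0, 1], so no division by zero.
        claims = (1.0 - rng.random(n)) ** (-1.0 / pareto_alpha)
        exceed += np.sum(claims * np.exp(-delta * tau)) > x
    return exceed / n_sims
```

Plain Monte Carlo becomes inefficient for the large thresholds x where tail asymptotics are most relevant, since exceedances are rare; that is one motivation for closed-form asymptotic approximations.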
39	Computational Graphics and Statistical Analysis: Mixed Type Random Variables, Confidence Regions, and Golden Quantile Rank Sets Weld, Christopher 01 January 2019 This dissertation has three principal areas of research: mixed type random variables, confidence regions, and golden quantile rank sets. While each offers a specific focus, three common themes persist. First, computational graphics play a critical role. Second, software development facilitates implementation and accessibility. Third, statistical analysis, often made possible by the aforementioned automation, provides valuable insights and applications. Each of the principal research areas is briefly summarized next. Mixed type random variables are a hybrid of continuous and discrete random variables, having components of both continuous probability density and discrete probability mass. This dissertation illustrates the challenges inherent in plotting mixed type distributions and introduces an algorithm that addresses those issues. It considers sums and products of mixed type random variables and supports its conclusions using Monte Carlo simulation experiments. Lastly, it introduces MixedAPPL, a computer algebra system software package designed for manipulating mixed type random variables. Confidence regions are multi-dimensional versions of confidence intervals. They help to visualize and quantify the uncertainty surrounding a point estimate. We begin by developing efficient plotting algorithms for two-dimensional confidence regions. This research focuses specifically on likelihood-ratio-based confidence regions for two-parameter univariate probability models, although the plotting techniques transfer to any two-dimensional setting. The R package 'conf' is introduced, which automates these confidence region plotting algorithms for complete and right-censored data sets.
Among its benefits, 'conf' makes Monte Carlo simulation experiments on confidence region coverage possible to an extent not previously feasible. The corresponding coverage analysis results include reference tables for the Weibull, normal, and log-logistic distributions. These reference tables yield confidence region plots with exact coverage. The final topic is the introduction and analysis of a golden quantile rank set (GQRS). The term quantile rank set is used here to denote the population cumulative distribution function values corresponding to a sample. A GQRS can be thought of as "perfectly" representative of its population distribution because samples corresponding to a GQRS result in an estimator(s) matching the associated true population parameter(s). This unique characteristic does not hold for all estimators and/or distributions, but when present, it provides valuable insights and applications. Specifically, applications include an alternative (and at times computationally superior) method for parameter estimation and an exact actual coverage methodology for confidence regions (in some cases where currently only estimates exist). Distributions with a GQRS associated with maximum likelihood estimation include the normal, exponential, Weibull, log-logistic, and one-parameter exponential power distributions. Statistics and Probability
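The exponential distribution gives a simple worked case of the GQRS idea. If quantile ranks u_1, …, u_n map to a sample x_i = −ln(1 − u_i)/λ, the rate MLE is λ̂ = n/Σx_i = λ·n/Σ(−ln(1 − u_i)), so λ̂ = λ for every λ exactly when Σ(−ln(1 − u_i)) = n. The sketch checks that condition; the rescaling construction in `rescale_to_golden` is our own hypothetical illustration, not the dissertation's method, and all function names are assumptions.

```python
import numpy as np

def exponential_mle_rate(x):
    """MLE of the rate of an exponential distribution."""
    return len(x) / np.sum(x)

def is_golden_for_exponential(u, tol=1e-12):
    """True if the quantile ranks u yield an exponential sample whose rate MLE
    equals the true rate for every rate, i.e. sum(-log(1 - u_i)) == n."""
    u = np.asarray(u, dtype=float)
    return abs(np.sum(-np.log1p(-u)) - len(u)) < tol

def rescale_to_golden(u):
    """Hypothetical construction: rescale the standard exponential quantiles
    of u so they sum to n, which makes the rank set golden for this MLE."""
    u = np.asarray(u, dtype=float)
    e = -np.log1p(-u)                 # standard exponential quantiles
    e *= len(u) / e.sum()             # force sum(e) == n
    return -np.expm1(-e)              # back to ranks: 1 - exp(-e)
```

For example, the plotting positions i/6 for i = 1, …, 5 are not golden (their exponential quantiles sum to about 4.17 rather than 5), but after rescaling, every exponential sample generated from them returns the true rate exactly.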
40	Statistical Analysis of Depression and Social Support Change in Arab Immigrant Women in USA Blbas, Hazhar 01 January 2014 Arab Muslim immigrant women encounter many stressors and are at risk for depression. Social support from husbands, family, and friends is generally considered a mitigating resource for depression. However, changes in social support over time and the effects of such support on depression at a future time have not been fully addressed in the literature. This thesis investigated the relationship between demographic characteristics, changes in social support, and depression in Arab Muslim immigrant women in the USA. A sample of 454 married Arab Muslim immigrant women provided demographic data, social support scores, and depression scores at three time points approximately six months apart. Statistical techniques such as boxplots, response curves, descriptive statistics, ANOVA and ANCOVA, and simple and multiple linear regression were used to examine how various factors and variables are associated with changes in social support from husbands, extended family, and friends over time. Simple and multiple regression analyses were carried out to determine whether any variable observed at the time of the first survey can predict depression at a later time. Social support from husband and friends, the husband's employment status and education, and depression at time one were found to be significantly associated with depression at time three. Finally, logistic regression analysis of a binary depression outcome indicated that lower total social support and a higher depression score at the time of the first survey increase participants' probability of being depressed at the time of the third survey. Statistics and Probability
