11

Robust Uncertainty Quantification and Scalable Computation for Computer Models with Massive Output

Gu, Mengyang January 2016 (has links)
Uncertainty quantification (UQ) is both an old and a new concept. The current novelty lies in the interactions and synthesis of mathematical models, computer experiments, statistics, field/real experiments, and probability theory, with a particular emphasis on large-scale simulation by computer models. The challenges come not only from the complexity of the scientific questions but also from the sheer size of the information. The focus of this thesis is to provide statistical models that are scalable to the massive data produced in computer experiments and real experiments, through fast and robust statistical inference. / Chapter 2 provides a practical approach for simultaneously emulating/approximating a massive number of functions, with an application to hazard quantification for the Soufrière Hills volcano on the island of Montserrat. Chapter 3 discusses another problem with massive data, in which the number of observations of a function is large; an exact algorithm that is linear in time is developed for the problem of interpolating methylation levels. Chapters 4 and 5 both concern robust inference for these models. Chapter 4 provides a new robustness criterion for parameter estimation, and several approaches to inference are shown to satisfy it. Chapter 5 develops a new prior that satisfies additional criteria and is thus proposed for use in practice. / Dissertation
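To make the emulation idea concrete, here is a minimal sketch of a Gaussian-process emulator for a single computer-model output. It illustrates the generic technique only, not the thesis's specific method for massive output; the squared-exponential kernel, the toy "simulator" f, and all names are illustrative assumptions.

```python
import numpy as np

def sq_exp_kernel(x1, x2, length=1.0, var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_emulate(x_train, y_train, x_new, noise=1e-8):
    """Posterior mean and variance of a zero-mean GP at new inputs."""
    K = sq_exp_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = sq_exp_kernel(x_new, x_train)
    Kss = sq_exp_kernel(x_new, x_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0)
    return mean, var

# Emulate an "expensive" simulator f at untried inputs from a few runs.
f = lambda x: np.sin(3 * x) + 0.5 * x          # stand-in computer model
x_train = np.linspace(0, 2, 8)
mean, var = gp_emulate(x_train, f(x_train), np.linspace(0, 2, 50))
```

The point of the technique is that the emulator's posterior mean predicts the simulator at untried inputs, while the posterior variance quantifies the uncertainty of that approximation.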
12

Methods for Imputing Missing Values and Synthesizing Confidential Values for Continuous and Magnitude Data

Wei, Lan January 2016 (has links)
Continuous variables are among the major data types collected by survey organizations. They can be incomplete, so that the data collectors need to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to aggregate them into cells defined by different features. In this thesis, I present novel methods of multiple imputation (MI) that can be applied to impute missing values and to synthesize confidential values for continuous and magnitude data. / The first method is for limiting the disclosure risk of continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing the disclosure risks in releasing such synthetic magnitude microdata. An illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable. / The second method is for releasing synthetic continuous microdata by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on protective intervals. Its basic idea is to estimate the model parameters from these intervals rather than from the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach for limiting the posterior disclosure risk. / The third method is for imputing missing values in continuous and categorical variables. It extends a hierarchically coupled mixture model with local dependence, but separates the variables into non-focused (e.g., almost fully observed) and focused (e.g., frequently missing) ones. The sub-model structure of the focused variables is more complex than that of the non-focused ones; their cluster indicators are linked together by tensor factorization, and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving strongly associated non-focused variables to the focused side can help to improve estimation accuracy, which is examined in several simulation studies. The method is applied to data from the American Community Survey. / Dissertation
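One way to see why Poisson-based synthesis can preserve fixed marginal totals: independent Poisson counts conditioned on their sum are multinomially distributed, so synthetic cell values can be drawn with the total held exactly at its original value. The sketch below illustrates this standard identity; it is not the thesis's mixture-of-Poissons model, and the rates and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_with_fixed_total(rates, total):
    """Draw counts distributed as independent Poissons with the given
    rates, conditioned on summing exactly to `total`.  By a standard
    identity this conditional law is multinomial with probabilities
    proportional to the rates."""
    p = np.asarray(rates, dtype=float)
    return rng.multinomial(total, p / p.sum())

original = np.array([12, 7, 30, 1])            # confidential cell counts
synthetic = synthesize_with_fixed_total(rates=[10, 8, 28, 2],
                                        total=original.sum())
assert synthetic.sum() == original.sum()       # marginal total preserved
```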
13

PERCENTILE RESIDUAL LIFE FUNCTIONS -- PROPERTIES, TESTING AND ESTIMATION

Unknown Date (has links)
Let F be a life distribution with survival function $\bar F \equiv 1 - F$. Conditional on survival to time t, the remaining life has survival function $\bar F_t(x) = \bar F(t + x)/\bar F(t)$, $x \ge 0$, $0 \le t < F^{-1}(1)$. The mean residual life function of F is $m_F(t) = \int_t^\infty \bar F(x)\,dx \,/\, \bar F(t)$, if F has a finite mean. The $\alpha$-percentile or quantile ($0 < \alpha < 1$) residual life function of F is $q_{\alpha,F}(t) = F_t^{-1}(\alpha) = F^{-1}(1 - \bar\alpha \bar F(t)) - t$, $0 \le t < F^{-1}(1)$, where $\bar\alpha = 1 - \alpha$. Statisticians find it useful to categorize life distributions according to different aging properties. Categories which involve $m_F(t)$ are the decreasing mean residual life (DMRL) class and the new better than used in expectation (NBUE) class. The DMRL class consists of distributions F such that $m_F(t)$ is monotone decreasing on $(0, F^{-1}(1))$, and the NBUE class consists of distributions F such that $m_F(0) \ge m_F(t)$ for all $0 < t < F^{-1}(1)$. Analogous categories which involve $q_{\alpha,F}(t)$ are the decreasing $\alpha$-percentile residual life (DPRL-$\alpha$) class and the new better than used with respect to the $\alpha$-percentile (NBUP-$\alpha$) class. / The mean residual life function is of interest in biometry, actuarial studies and reliability, and the DMRL and NBUE classes of life distributions are useful for modelling situations where items deteriorate with age. In the statistical literature, there are several papers which consider properties or estimation of the mean residual life function or consider testing situations involving the DMRL and NBUE classes. Only one previous paper discusses the $\alpha$-percentile residual life function. This dissertation is concerned with properties and estimation of the $\alpha$-percentile residual life function, and with testing problems involving it. / Properties of $q_{\alpha,F}(t)$ and of the DPRL-$\alpha$, NBUP-$\alpha$ and their dual classes are studied in Chapter II. In Chapter III, tests are developed for testing exponentiality against DPRL-$\alpha$ and NBUP-$\alpha$ alternatives. In Chapter IV, these tests are extended to accommodate randomly censored data. In Chapter V, a distribution-free two-sample test is developed for testing the hypothesis that two life distributions F and G are equal against the alternative that $q_{\alpha,F}(t) \ge q_{\alpha,G}(t)$ for all t. In Chapter VI, strong consistency, asymptotic normality, bias and mean squared error of the estimator $F_n^{-1}(1 - \bar\alpha \bar F_n(t)) - t$ of $q_{\alpha,F}(t)$ are studied, where $F_n$ is the empirical distribution function and $\bar F_n \equiv 1 - F_n$. / Source: Dissertation Abstracts International, Volume: 43-02, Section: B, page: 0467. / Thesis (Ph.D.)--The Florida State University, 1982.
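The Chapter VI estimator is straightforward to compute from a sample: plug the empirical distribution into the defining formula. A minimal sketch (the function and variable names are illustrative):

```python
import numpy as np

def percentile_residual_life(sample, t, alpha):
    """Empirical alpha-percentile residual life at time t:
    F_n^{-1}(1 - (1 - alpha) * Fbar_n(t)) - t, with F_n the empirical cdf."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    surv_t = np.mean(x > t)                     # Fbar_n(t)
    level = 1.0 - (1.0 - alpha) * surv_t        # quantile level in (0, 1]
    k = int(np.ceil(level * n))                 # left-continuous inverse
    return x[min(k, n) - 1] - t

# Median (alpha = 0.5) residual life at t = 1 for exponential data; by
# memorylessness the true value is -log(0.5) ~ 0.693 at every t.
rng = np.random.default_rng(1)
data = rng.exponential(scale=1.0, size=10_000)
print(percentile_residual_life(data, t=1.0, alpha=0.5))
```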
14

TESTING FOR CLASSES OF LIFE DISTRIBUTIONS USING RANDOMLY CENSORED DATA

Unknown Date (has links)
Source: Dissertation Abstracts International, Volume: 41-07, Section: B, page: 2667. / Thesis (Ph.D.)--The Florida State University, 1980.
15

A NEW METHOD FOR ESTIMATING LIFE DISTRIBUTIONS FROM INCOMPLETE DATA

Unknown Date (has links)
We construct a new estimator for a continuous life distribution from incomplete data, the Piecewise Exponential Estimator (PEXE). / To date, the principal method of nonparametric estimation from incomplete data is the Product-Limit Estimator (PLE) introduced by Kaplan and Meier {J. Amer. Statist. Assoc. (1958) 53}. Our formulation of the estimation problem posed by incomplete data is essentially that of Kaplan and Meier, but we approach its solution from the viewpoint of reliability and life testing. / In this work we establish rigorously the asymptotic (large sample) properties of the PEXE. Our results include the strong consistency of the PEXE under various sets of assumptions, plus the weak convergence of the PEXE, suitably normalized, to a Gaussian process. From an intermediate result in our weak convergence proof we derive asymptotic confidence bands and a goodness-of-fit test based on the PEXE. / Though our main objective is the introduction of a new estimator for incomplete data and the study of its asymptotic properties, our second contribution to this area of research is the extension of the asymptotic results of the extensively used PLE. In particular, our results extend the work of Peterson {J. Amer. Statist. Assoc. (1977) 72} and Langberg, Proschan, and Quinzi {Ann. Statist. (1980) 8} in strong consistency and that of Breslow and Crowley {Ann. Statist. (1974) 2} in weak convergence. / Finally, we show that the new PEXE, as an alternative to the traditional PLE, has several advantages for estimating a continuous life distribution from incomplete data, along with some drawbacks. Since the two estimators are so alike asymptotically, we concentrate on differences between the PEXE and the PLE for estimation from small samples. / Source: Dissertation Abstracts International, Volume: 41-08, Section: B, page: 3093. / Thesis (Ph.D.)--The Florida State University, 1980.
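The following is a minimal sketch of a piecewise-exponential survival estimator in this spirit: the hazard is taken constant between successive distinct uncensored failure times and estimated as deaths over total time at risk in each interval. It is an illustration under those assumptions, not necessarily the exact construction of the dissertation's PEXE.

```python
import numpy as np

def pexe_survival(times, events, grid):
    """Piecewise-exponential survival estimate evaluated on `grid`.

    Hazard is constant between successive distinct uncensored failure
    times, estimated as (deaths in interval) / (total time at risk
    accrued in interval).  A sketch in the spirit of the PEXE; the
    dissertation's exact construction may differ in details.
    """
    times = np.asarray(times, float)
    events = np.asarray(events, bool)           # True = observed failure
    cuts = np.concatenate(([0.0], np.unique(times[events])))
    haz = np.zeros(len(cuts) - 1)
    for i in range(len(cuts) - 1):
        lo, hi = cuts[i], cuts[i + 1]
        exposure = np.clip(times, lo, hi) - lo  # time at risk in (lo, hi]
        deaths = np.sum(events & (times == hi))
        haz[i] = deaths / exposure.sum()
    # Cumulative hazard at each grid point, then S(t) = exp(-H(t)).
    H = np.array([np.sum(haz * (np.clip(t, cuts[:-1], cuts[1:]) - cuts[:-1]))
                  for t in grid])
    return np.exp(-H)

# Example: exponential lifetimes subject to random right-censoring.
rng = np.random.default_rng(2)
life, cens = rng.exponential(1, 200), rng.exponential(2, 200)
t_obs, d_obs = np.minimum(life, cens), life <= cens
print(pexe_survival(t_obs, d_obs, grid=[0.5, 1.0, 2.0]))
```

Unlike the step-function PLE, this estimate is continuous in t, which is one of the small-sample differences the abstract alludes to.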
16

A MATHEMATICAL STUDY OF THE DIRICHLET PROCESS

Unknown Date (has links)
This dissertation is a contribution to the theory of Bayesian nonparametrics. A construction of the Dirichlet process (Ferguson {1973}) on a finite set $\chi$ is introduced in such a way that it leads to Blackwell's (1973) constructive definition of a Dirichlet process on a Borel space $(\chi, A)$. If $(\chi, A)$ is a Borel space and P is a random probability measure on $(\chi, A)$ with a Dirichlet process prior $D^{\alpha}$, then under the condition that the $\alpha$-measure of every open subset of $\chi$ is positive, for almost every realization of P the set of its discrete mass points is dense in $\chi$. / A more general constructive definition introduced by Sethuraman (1978) is used to derive several new properties of the Dirichlet process and to present in a unified way some of the known properties of the process. An alternative construction of Dalal's (1975) G-invariant Dirichlet process (G being a finite group of transformations) is presented. / The Bayes estimates of an estimable parameter of degree k ($k \ge 1$), namely $\psi_k(P) = \int \cdots \int h(x_1, \ldots, x_k)\, dP(x_1) \cdots dP(x_k)$, where h is a symmetric kernel, are derived for the no-sample-size case and for a sample of size n from P, under the squared error loss function and a Dirichlet process prior. Using the Bayes estimate of $\psi_k(P)$ for the no-sample-size case, the (marginal) distribution of a sample from P (when the prior for P is the Dirichlet process) is obtained. The extension to the case when the prior for P is the G-invariant Dirichlet process is also obtained. / Let $(\chi, A)$ be the one-dimensional Euclidean space $(R_1, B_1)$. Consider a sequence $\{D^{\alpha_N + \gamma}\}$ of Dirichlet processes such that $\alpha_N(\chi)$ converges to zero as N tends to infinity, where $\gamma$ and the $\alpha_N$'s are finite measures on A. It is shown that $D^{\alpha_N + \gamma}$ converges weakly to $D^{\gamma}$ in the topology of weak convergence on P, the class of all probability measures on $(\chi, A)$. As a corollary, it follows that $D^{\alpha_N + n F_n}$ converges weakly to $D^{n F_n}$, where $F_n$ is the empirical distribution of the sample. Suppose $\alpha_N(\chi)$ converges to zero and $\alpha_N / \alpha_N(\chi)$ converges uniformly to $\alpha / \alpha(\chi)$ as N tends to infinity. If $\{D^{\alpha_N}\}$ is a sequence of Dirichlet process priors for a random probability measure P on $(\chi, A)$, then P, in the limit, is a random probability measure concentrated on the set of degenerate probability measures on $(\chi, A)$, and the point of degeneracy is distributed as $\alpha / \alpha(\chi)$ on $(\chi, A)$. To the sequence of priors $\{D^{\alpha_N}\}$ for P there corresponds a sequence of Bayes estimates of $\psi_k(P)$. The limit of this sequence as $\alpha_N(\chi)$ converges to zero, called the limiting Bayes estimate of $\psi_k(P)$, is obtained. / When P is a random probability measure on {0, 1}, Sethuraman (1978) proposed a more general class of conjugate priors for P which contains both the family of Dirichlet processes and the family of priors introduced by Dubins and Freedman (1966). As an illustration, a numerical example is considered and the Bayes estimates of the mean and the variance of P are computed under three distinct priors chosen from Sethuraman's class. The computer algorithm for this calculation is presented. / Source: Dissertation Abstracts International, Volume: 41-10, Section: B, page: 3829. / Thesis (Ph.D.)--The Florida State University, 1981.
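Sethuraman's constructive definition is closely related to what is now commonly called the stick-breaking representation of the Dirichlet process: weights $w_j = v_j \prod_{i<j}(1 - v_i)$ with $v_i \sim \mathrm{Beta}(1, \alpha(\chi))$ attached to atoms drawn i.i.d. from $\alpha/\alpha(\chi)$. A minimal truncated sketch of that representation (the truncation level and the normal base measure are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def stick_breaking_dirichlet(total_mass, base_sampler, truncation=1000):
    """Truncated stick-breaking draw from a Dirichlet process D^alpha,
    where total_mass = alpha(chi) and base_sampler draws from the
    normalized base measure alpha / alpha(chi)."""
    v = rng.beta(1.0, total_mass, size=truncation)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    atoms = base_sampler(truncation)
    return atoms, w                  # random measure sum_j w_j * delta_{atoms_j}

# One realization of P ~ D^alpha with alpha = 5 * N(0, 1), and its mean.
atoms, w = stick_breaking_dirichlet(5.0, lambda k: rng.normal(size=k))
print("mean functional of the realized P:", np.sum(w * atoms))
```

The discreteness of every realization, noted in the abstract, is visible here: P is a countable mixture of point masses.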
17

AN INVESTIGATION OF THE EFFECT OF THE SWAMPING PHENOMENON ON SEVERAL BLOCK PROCEDURES FOR MULTIPLE OUTLIERS IN UNIVARIATE SAMPLES

Unknown Date (has links)
Statistical outliers have been an issue of concern to researchers for over two centuries, and are the focus of this study. Sources of outliers, and various means for dealing with them, are discussed. Also presented are general descriptions of univariate outlier tests as well as the two approaches to handling multiple outlier situations, consecutive and block testing. The major problems inherent in these latter methods, masking and swamping, respectively, are recounted. / Specifically, the primary aim of this study is to assess the susceptibility to swamping of four block procedures for multiple outliers in univariate samples. / Pseudo-random samples are generated from a unit normal distribution, and varying numbers of upper outliers are placed in them according to specified criteria. A swamping index is created which reflects each test's relative vulnerability to declaring discordant, as a unit, a block consisting of the outliers together with the most extreme upper non-outlier. / The results of this investigation reveal that the four block tests differ in their respective susceptibilities to swamping depending upon sample size and the prespecified number of outliers assumed to be present. Rank orderings of these four tests based upon their vulnerability to swamping under varying circumstances are presented. In addition, alternate approaches to calculating the swamping index when four or more outliers exist are described. / Recommendations concerning the appropriate application of the four block procedures under differing situations, and proposals for further research, are advanced. / Source: Dissertation Abstracts International, Volume: 42-01, Section: B, page: 0275. / Thesis (Ph.D.)--The Florida State University, 1981.
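The four block procedures are not named in the abstract, so as a stand-in the sketch below estimates a swamping rate for the Tietjen-Moore block test of the k largest observations: plant k genuine upper outliers, test a block of size k + 1, and count how often that larger block (the outliers plus the most extreme clean point) is declared discordant as a unit. The choice of test, the sample size, the shift, and all names are illustrative assumptions, not the study's actual design.

```python
import numpy as np

rng = np.random.default_rng(4)

def tietjen_moore_stat(x, k):
    """Tietjen-Moore statistic for the k largest observations: ratio of
    the sum of squares after removing the k largest to the total sum of
    squares.  Small values suggest the k largest form an outlying block."""
    x = np.sort(x)
    kept = x[:-k]
    return np.sum((kept - kept.mean()) ** 2) / np.sum((x - x.mean()) ** 2)

def critical_value(n, k, alpha=0.05, reps=5000):
    """Monte Carlo lower-tail critical value under the null (all N(0,1))."""
    stats = [tietjen_moore_stat(rng.normal(size=n), k) for _ in range(reps)]
    return np.quantile(stats, alpha)

def swamping_rate(n=20, k_true=2, shift=5.0, reps=2000):
    """Plant k_true outliers but test a block of size k_true + 1; the rate
    at which the larger block is declared discordant as a unit is one
    simple notion of a swamping index."""
    k_test = k_true + 1
    crit = critical_value(n, k_test)
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        x[:k_true] += shift                  # genuine upper outliers
        hits += tietjen_moore_stat(x, k_test) < crit
    return hits / reps

print("estimated swamping rate:", swamping_rate())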
18

ESTIMATION AND PREDICTION FOR EXPONENTIAL TIME SERIES MODELS

Unknown Date (has links)
This work is concerned with the study of stationary time series models in which the marginal distribution of the observations follows an exponential distribution. This is in contrast to the standard models in the literature, where the error sequence, and hence the marginal distributions of the observations, are Gaussian. / Source: Dissertation Abstracts International, Volume: 42-10, Section: B, page: 4115. / Thesis (Ph.D.)--The Florida State University, 1981.
19

Time-Varying Coefficient Models with ARMA-GARCH Structures for Longitudinal Data Analysis

Unknown Date (has links)
The motivation for my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948, and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants, and their status with respect to the occurrence of disease was recorded. In this dissertation, the event of interest is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP) and body mass index (BMI, weight in kilograms/height in meters squared). A review of the statistical literature indicates that the effects of these covariates on cardiovascular disease, or on death from any cause, in the Framingham study change over time; for example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I examine the time-varying effects of the risk factors on CHD incidence. Time-varying coefficient models with ARMA-GARCH structures are developed in this research. The maximum likelihood and the marginal likelihood methods are used to estimate the parameters in the proposed models. Since high-dimensional integrals are involved in the calculation of the marginal likelihood, the Laplace approximation is employed. Simulation studies are conducted to evaluate the performance of these two estimation methods under the proposed models, using the Kullback-Leibler (KL) divergence and the root mean square error to compare the results. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates but is more computationally intensive. Following the simulation study, the proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of covariates on CHD incidence. To specify the time-series structures of the effects of the risk factors, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and the risk factors changes over time. For males, there is a clearly decreasing linear trend in the age effect, which implies that the age effect on CHD is weaker for older patients than for younger ones. The effect of CSM stays almost the same for the first 30 years and decreases thereafter. There are slightly decreasing linear trends in the effects of both SBP and BMI. Furthermore, the coefficients of SBP are mostly positive over time; that is, patients with higher SBP are, as expected, more likely to develop CHD. For females, there is also a clearly decreasing linear trend in the age effect, while the effects of SBP and BMI on CHD are mostly positive and do not change much over time. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Degree Awarded: Fall Semester, 2010. / Date of Defense: September 28, 2010. / Time-varying coefficient models, Longitudinal data analysis, Time series analysis / Includes bibliographical references. / Xufeng Niu, Professor Co-Directing Dissertation; Fred Huffer, Professor Co-Directing Dissertation; Craig Nolder, University Representative; Dan McGee, Committee Member.
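As one way to picture a time-varying coefficient model for incidence data, the sketch below simulates binary outcomes whose log-odds depend on a risk factor through a coefficient that follows an AR(1) path across examination times. This is an illustrative stand-in: the dissertation's ARMA-GARCH coefficient structures and likelihood machinery are richer, and every parameter value and name here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_tvc_incidence(n_subjects=500, n_times=30,
                           phi=0.9, sigma=0.1, beta0=-2.0):
    """Simulate incidence under a time-varying coefficient model:
    logit P(Y_it = 1) = beta0 + beta_t * x_i, with beta_t an AR(1) path
    (a simple stand-in for ARMA-GARCH coefficient dynamics)."""
    beta = np.empty(n_times)
    beta[0] = 1.0
    for t in range(1, n_times):
        beta[t] = phi * beta[t - 1] + sigma * rng.normal()
    x = rng.normal(size=n_subjects)              # standardized risk factor
    logits = beta0 + np.outer(beta, x)           # n_times x n_subjects
    y = rng.random(logits.shape) < 1 / (1 + np.exp(-logits))
    return beta, x, y

beta, x, y = simulate_tvc_incidence()
print("coefficient path start/end:", beta[0], beta[-1])
print("overall incidence rate:", y.mean())
```

A decaying beta path mimics the abstract's finding of covariate effects that weaken over follow-up time.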
20

TESTS OF DISPLACEMENT AND ORDERED MEAN HYPOTHESES

Unknown Date (has links)
Character displacement is an ecological process by which, theoretically, co-existing species diverge in size to reduce competition. A closely allied concept is deletion, in which species are excluded from a habitat because they do not differ sufficiently from other species living there. Character displacement has been a controversial topic in recent years, largely due to a lack of statistical procedures for testing its existence. We propose herein a variety of approaches for testing displacement and deletion hypotheses. The applicability of the methods extends beyond the motivating ecological problem to other fields. / Consider the model $X_{ij} = \mu_i + \epsilon_{ij}$, $i = 1, \ldots, k$; $j = 1, \ldots, n_i$, where $X_{ij}$ is the $j$th observation on species $i$ with population mean $\mu_i$. The $\epsilon_{ij}$'s are independent normally distributed error terms with mean zero and common variance. / Traditionally, ecologists have regarded species sizes as randomly distributed. We develop tests for displacement and deletion by considering uniform, lognormal and loguniform distributions for species sizes. (A random variable Y has a loguniform distribution if log Y has a uniform distribution.) / Most claimed manifestations of character displacement concern the ratios of each species size to the next smallest one (contiguous ratios). All but one of the test statistics are functions of spacings (logarithms of contiguous ratios). We prove a useful characterization of distributions in terms of spacings, and show that the loguniform distribution produces constant expected contiguous ratios, an important property in character displacement studies. The random effects approaches generally lack power in detecting the suspected patterns. / We develop further tests for the model in which the $\mu_i$'s are regarded as fixed. This fixed effects approach, which may be more realistic ecologically, produces considerably more powerful tests. Displacement hypotheses in the fixed effects framework are expressed naturally in terms of the ordered means $\mu_{(1)} < \mu_{(2)} < \cdots < \mu_{(k)}$. We develop a general theory by which a particular class of linear hypotheses about any number of sets of ordered means may be tested. / Finally, a functional relation is used to model the movement of species means from one environment to another. Existing asymptotic tests are shown to perform remarkably well for small samples. / Source: Dissertation Abstracts International, Volume: 43-05, Section: B, page: 1543. / Thesis (Ph.D.)--The Florida State University, 1982.
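The constant-expected-contiguous-ratio property of the loguniform distribution is easy to check by simulation: if sizes are loguniform, the log-sizes are uniform, the interior spacings of uniform order statistics are identically distributed, and hence each contiguous ratio has the same expectation. A quick sketch (the sample sizes and range are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)

def mean_contiguous_ratios(k=6, reps=100_000, lo=1.0, hi=100.0):
    """Average ratio of each ordered size to the next smallest one when
    k species sizes are drawn from a loguniform distribution on [lo, hi]."""
    logs = rng.uniform(np.log(lo), np.log(hi), size=(reps, k))
    sizes = np.sort(np.exp(logs), axis=1)
    ratios = sizes[:, 1:] / sizes[:, :-1]        # contiguous ratios
    return ratios.mean(axis=0)

# All k-1 expected contiguous ratios come out approximately equal.
print(mean_contiguous_ratios())
```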
