11

Aspects of Composite Likelihood Estimation And Prediction

Xu, Ximing 08 January 2013 (has links)
A composite likelihood is usually constructed by multiplying a collection of lower-dimensional marginal or conditional densities. In recent years, composite likelihood methods have received increasing interest for modeling complex data arising from various application areas, where the full likelihood function is analytically unknown or computationally prohibitive due to the dependence structure, the dimension of the data, or the presence of nuisance parameters. In this thesis we investigate some theoretical properties of the maximum composite likelihood estimator (MCLE). In particular, we obtain the limit of the MCLE in a general setting, and set out a framework for understanding the notion of robustness in the context of composite likelihood inference. We also study the improvement of the efficiency of a composite likelihood by incorporating additional component likelihoods, or by using component likelihoods with higher dimension. We show through some illustrative examples that such strategies do not always work and may impair efficiency. We also show that the MCLE of the parameter of interest can be less efficient when the nuisance parameters are known than when they are unknown. In addition to the theoretical study of composite likelihood estimation, we also explore the possibility of using composite likelihood to make predictive inference in computer experiments. The Gaussian process model is widely used to build statistical emulators for computer experiments. However, when the number of trials is large, both estimation and prediction based on a Gaussian process can be computationally intractable due to the dimension of the covariance matrix. To address this problem, we propose prediction methods based on different composite likelihood functions, which do not require the evaluation of the large covariance matrix and hence alleviate the computational burden. Simulation studies show that the blockwise composite likelihood-based predictors perform well and are competitive with the optimal predictor based on the full likelihood.
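For illustration, the following is a minimal Python sketch of a blockwise composite log-likelihood for a zero-mean Gaussian process: the data are split into blocks and the block-marginal Gaussian log-densities are summed, so no full n x n covariance matrix is ever formed. The squared-exponential kernel, block size, and simulated data are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def sq_exp_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def blockwise_composite_loglik(x, y, block_size=50, variance=1.0,
                               lengthscale=1.0, nugget=1e-6):
    """Sum of block-marginal Gaussian log-likelihoods.

    Only block_size x block_size covariance matrices are formed, so the
    cost grows roughly linearly in the number of blocks rather than
    cubically in the total sample size."""
    n = len(y)
    ll = 0.0
    for start in range(0, n, block_size):
        idx = slice(start, min(start + block_size, n))
        K = sq_exp_kernel(x[idx], x[idx], variance, lengthscale)
        K += nugget * np.eye(K.shape[0])
        ll += multivariate_normal(mean=np.zeros(K.shape[0]), cov=K).logpdf(y[idx])
    return ll

# Toy usage: 1-D inputs with a smooth response plus noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 500))
y = np.sin(x) + 0.1 * rng.standard_normal(500)
print(blockwise_composite_loglik(x, y, block_size=50))
```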
12

Ergodicity of Adaptive MCMC and its Applications

Yang, Chao 28 September 2009 (has links)
Markov chain Monte Carlo (MCMC) and adaptive Markov chain Monte Carlo (AMCMC) algorithms are among the most important methods for approximately sampling from complicated probability distributions and are widely used in statistics, computer science, chemistry, physics, and other fields. A core problem in using these algorithms is establishing asymptotic theory for them. In this thesis, we prove the Central Limit Theorem (CLT) for uniformly ergodic Markov chains using the regeneration method. We exploit the weakest uniform drift conditions that ensure the ergodicity and the weak law of large numbers (WLLN) for AMCMC. Further, we answer open problem 21 in Roberts and Rosenthal [48] by constructing a counterexample and identifying a stronger condition that implies the ergodicity of AMCMC. We find that conditions (a) and (b) in [46] are not sufficient for the WLLN to hold when the functional is unbounded, and we prove the WLLN for unbounded functions under some stronger conditions. Finally, we consider the practical aspects of adaptive MCMC (AMCMC). We use some toy examples to show that the general adaptive random walk Metropolis algorithm is not efficient for sampling from multi-modal targets. We therefore discuss the mixed regional adaptation (MRAPT) algorithm on a compact state space, and a modified mixed regional adaptation on a general state space, in which the regional proposal distributions are optimal and the switches between different modes are efficient. The theoretical work shows that the algorithms proposed here fall within the scope of the general theorems used to validate AMCMC. As an application of our theoretical results, we analyze real data on loss of heterozygosity (LOH) using MRAPT.
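For illustration, the following is a minimal Python sketch of a one-dimensional adaptive random walk Metropolis sampler whose proposal scale adapts toward a target acceptance rate with diminishing adaptation. The bimodal target, adaptation rule, and tuning constants are illustrative assumptions and do not reproduce the MRAPT algorithm described above.

```python
import numpy as np

def adaptive_rwm(log_target, x0, n_iter=10000, target_accept=0.44, seed=0):
    """Random walk Metropolis whose proposal scale adapts toward a desired
    acceptance rate, with Robbins-Monro style diminishing adaptation."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    log_scale = 0.0
    samples = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + np.exp(log_scale) * rng.standard_normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
            accepted = 1.0
        else:
            accepted = 0.0
        # Diminishing adaptation: the adjustment shrinks as i grows.
        log_scale += (accepted - target_accept) / np.sqrt(i + 1)
        samples[i] = x
    return samples

# Toy usage: a bimodal target (mixture of two normals), the kind of target
# where a plain adaptive random walk Metropolis can struggle to switch modes.
def log_target(x):
    return np.logaddexp(-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2)

draws = adaptive_rwm(log_target, x0=0.0)
print(draws.mean(), draws.std())
```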
13

An introduction to meta analysis

Boykova, Alla January 1900 (has links)
Master of Science / Department of Statistics / Dallas W. Johnson / Meta analysis is a statistical technique for synthesizing results obtained from multiple studies. It is the process of combining, summarizing, and reanalyzing previous quantitative research, and it yields a quantitative summary of the pooled results. Decisions about the validity of a hypothesis cannot be based on the results of a single study, because results typically vary from one study to the next, and traditional methods do not allow combining more than a few studies. Meta analysis provides procedures to synthesize data across many studies. When the treatment effect (or effect size) is consistent from one study to the next, meta analysis can be used to identify this common effect. When the effect varies from one study to the next, meta analysis may be used to identify the reason for the variation. The amount of accumulated information in fast-developing fields of science such as biology, medicine, education, pharmacology, and physics increased very quickly after the Second World War, leading to large amounts of literature that were not systematized. A single problem in education might be addressed by ten independent studies, all performed by different researchers using different techniques and different measurements. The idea of integrating the research literature was proposed by Glass (1976, 1977), who referred to it as the meta analysis of research. There are three major meta analysis approaches: combining significance levels, combining estimates of effect size (under fixed effect size and random effect size models), and the vote-counting method.
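For illustration, the following is a minimal Python sketch of the fixed-effect approach to combining effect sizes: each study's effect is weighted by the inverse of its variance to form a pooled estimate. The effect sizes and standard errors below are hypothetical, supplied only to show the calculation.

```python
import numpy as np
from scipy import stats

def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted pooled effect under a fixed-effect model."""
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(std_errors, dtype=float) ** 2
    pooled = np.sum(weights * effects) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    z = pooled / pooled_se
    p_value = 2 * stats.norm.sf(abs(z))
    return pooled, pooled_se, p_value

# Hypothetical effect sizes (e.g., standardized mean differences) from 5 studies
effects = [0.30, 0.45, 0.10, 0.25, 0.40]
std_errors = [0.12, 0.15, 0.20, 0.10, 0.18]
print(fixed_effect_meta(effects, std_errors))
```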
14

A simple univariate outlier identification procedure on ratio data collected by the Department of Revenue for the state of Kansas

Jun, Hyoungjin January 1900 (has links)
Master of Science / Department of Statistics / John E. Boyer Jr / In order to impose fair taxes on properties, appraisers are required to annually estimate the prices of all the properties in each of the counties in Kansas. The Department of Revenue of Kansas oversees the quality of the appraisers' work in each county. As a basis for evaluating that work, the Department of Revenue uses ratio data: the appraised price divided by the sale price for those parcels sold during the year. They know that there are outliers in these ratio data sets and that these outliers can impact their evaluations of the county appraisers. The Department of Revenue has been using a simple box plot procedure to identify outliers for the previous 10 years. Staff members have questioned whether the procedure might be improved, for example by tuning it to depend on the underlying distribution and sample size. A methodology offering a possible solution was suggested by Iglewicz et al. (2007). In this report, we examine the new methodology and attempt to apply it to ratio data sets provided by the Department of Revenue.
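For illustration, the following is a minimal Python sketch of the simple box-plot rule referred to above, which flags ratios falling outside the fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is the conventional choice). The sample ratios are made up, and the Iglewicz et al. (2007) refinement, which would adjust the fences for distribution shape and sample size, is not shown.

```python
import numpy as np

def boxplot_outliers(ratios, k=1.5):
    """Flag ratios outside the standard box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    ratios = np.asarray(ratios, dtype=float)
    q1, q3 = np.percentile(ratios, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return ratios[(ratios < lower) | (ratios > upper)], (lower, upper)

# Hypothetical appraisal/sale ratios for one county
ratios = [0.92, 0.98, 1.01, 0.95, 1.05, 0.99, 1.62, 0.40, 1.00, 0.97]
outliers, fences = boxplot_outliers(ratios)
print(outliers, fences)
```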
15

Modeling a frost index in Kansas, USA

Wang, Yang January 1900 (has links)
Master of Science / Department of Statistics / Perla Reyes Cuellar / A frost index is a calculated value that can be used to describe the state of, and changes in, weather conditions. Frost indices affect not only natural and managed ecosystems, but also a variety of human activities, and they can indicate changes in extreme weather and climate events. Growing season length (GSL) is one of the most important frost indices. In this report, growing season lengths were collected from 23 long-term stations across Kansas. The records extend back to the late 1800s for a few stations, but many stations began observations in the early 1900s. Although the start dates of the records differ, the end date is the same (2009). First, time series models of growing season length were fitted for all stations, and the fitted models were used for prediction and validation checking. Then an ordinary linear regression model was fitted to the GSL data; regressing on year removed the temporal trend and revealed the relationship between GSL and elevation. Finally, based on a penalized likelihood method with the least angle regression (LARS) algorithm, spatial-temporal model selection and parameter estimation were performed simultaneously, with different neighborhood structures used for model fitting. The resulting spatial-temporal linear regression model was used to interpret growing season length at the stations across Kansas. These models could be used for agricultural management decision-making and for updating planting-date recommendations in Kansas.
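For illustration, the following is a minimal Python sketch of the non-spatial step described above: regressing GSL on year (to capture the temporal trend) and elevation, then using a LARS fit for variable selection over a larger candidate set. The simulated data, coefficient values, and candidate predictors are illustrative assumptions, not the report's station data or chosen spatial-temporal terms.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lars

rng = np.random.default_rng(1)
n = 200
year = rng.integers(1900, 2010, n).astype(float)
elevation = rng.uniform(200, 1200, n)            # meters, illustrative
gsl = 180 + 0.05 * (year - 1900) - 0.01 * elevation + rng.normal(0, 5, n)

# Ordinary regression on year and elevation: year captures the temporal
# trend, elevation the gradient across stations.
X = np.column_stack([year, elevation])
ols = LinearRegression().fit(X, gsl)
print("OLS coefficients (year, elevation):", ols.coef_)

# LARS can then select among many candidate spatial-temporal terms
# (interactions, lags, neighbor averages); here the extra columns are noise.
extra = np.column_stack([X, year * elevation, rng.normal(size=(n, 3))])
lars = Lars(n_nonzero_coefs=3).fit(extra, gsl)
print("LARS selected coefficients:", lars.coef_)
```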
16

Confidence intervals on several functions of the components of variance in a one-way random effects experiment

Banasik, Aleksandra Anna January 1900 (has links)
Master of Science / Department of Statistics / Dallas E. Johnson / Variability is inherent in most data, and it is often useful to study this variability so that scientists are able to make more accurate statements about their data. One of the most popular ways of analyzing variance in data is a one-way ANOVA, which partitions the variability among observations into components corresponding to between-group and within-group variation. One then has σ²_Y = σ²_A + σ²_e, so there are two variance components. In certain situations, in addition to estimating these components of variance, it is important to estimate functions of the variance components. This report is devoted to methods for constructing confidence intervals for three particular functions of the variance components in the unbalanced one-way random effects model. In order to compare the performance of the methods, simulations were conducted using SAS® and the results were compared across several scenarios based on the number of groups, the number of observations within each group, and the value of σ²_A.
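For illustration, the following is a minimal Python sketch of the ANOVA (method-of-moments) estimates of the two variance components in a balanced one-way random effects layout, together with an exact chi-square interval for σ²_e. The unbalanced case and intervals for functions involving σ²_A, which this report studies, require more involved approximations and are not shown; the simulated data are illustrative.

```python
import numpy as np
from scipy import stats

def one_way_variance_components(data, alpha=0.05):
    """ANOVA estimates of sigma2_A (between groups) and sigma2_e (within groups)
    for a balanced one-way random effects layout: rows = groups."""
    data = np.asarray(data, dtype=float)
    a, n = data.shape                      # a groups, n observations per group
    group_means = data.mean(axis=1)
    grand_mean = data.mean()
    ms_between = n * np.sum((group_means - grand_mean) ** 2) / (a - 1)
    ms_within = np.sum((data - group_means[:, None]) ** 2) / (a * (n - 1))
    sigma2_e = ms_within
    sigma2_a = max((ms_between - ms_within) / n, 0.0)
    # Exact chi-square confidence interval for sigma2_e
    df = a * (n - 1)
    ci_e = (df * ms_within / stats.chi2.ppf(1 - alpha / 2, df),
            df * ms_within / stats.chi2.ppf(alpha / 2, df))
    return sigma2_a, sigma2_e, ci_e

rng = np.random.default_rng(2)
groups = rng.normal(0, 2, size=(6, 1))            # group effects, sigma_A = 2
data = groups + rng.normal(0, 1, size=(6, 10))    # errors, sigma_e = 1
print(one_way_variance_components(data))
```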
17

A simulation evaluation of backward elimination and stepwise variable selection in regression analysis

Li, Xin January 1900 (has links)
Master of Science / Department of Statistics / Paul Nelson / A first step in model building in regression analysis often consists of selecting a parsimonious set of independent variables from a pool of candidate independent variables. This report uses simulation to study and compare the performance of two widely used sequential variable selection algorithms, stepwise selection and backward elimination. A score is developed to assess the ability of any variable selection method to terminate with the correct model. It is found that backward elimination performs slightly better than stepwise selection, that increasing the sample size leads to a relatively small improvement in both methods, and that the magnitude of the variance of the error term is the major factor determining the performance of both.
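For illustration, the following is a minimal Python sketch of backward elimination: the full model is fitted, the least significant predictor is dropped, and the process repeats until every remaining p-value falls below a threshold. The simulated data, significance threshold, and use of statsmodels are illustrative assumptions rather than the report's simulation design.

```python
import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, alpha=0.05):
    """Drop the predictor with the largest p-value until all p-values < alpha."""
    cols = list(range(X.shape[1]))
    while cols:
        model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = model.pvalues[1:]          # skip the intercept
        worst = np.argmax(pvals)
        if pvals[worst] < alpha:
            break
        cols.pop(worst)
    return cols

# Simulated example: only the first two of six candidates matter
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 6))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=100)
print("Selected columns:", backward_elimination(X, y))
```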
18

Determining the effectiveness of including spatial information into a nematode/nutsedge pest complex model

Vetter, Joel January 1900 (has links)
Master of Science / Department of Statistics / Leigh Murray / An experiment was performed in 2005-2006 to determine whether the variety of an alfalfa (Medicago sativa) crop rotation can effectively reduce the pest complex consisting of the yellow and purple nutsedge weeds (YNS and PNS) and the southern root-knot nematode (SRKN). During the 2005-2006 growing season, six months were selected for sampling the alfalfa field (three months in 2005 and three in 2006). The field was divided into 1 m x 2 m quadrats, and each month eighty quadrats were randomly selected. Counts of PNS and YNS and a soil sample (analyzed for the count of juvenile SRKN) were taken from each quadrat. In this study, two different ways of using the spatial information provided by the experiment to alter the original model were examined. First, spatial information was treated as fixed effects. Second, spatial information was treated as random effects by modifying the residual variance matrix using various "spatial" variance-covariance structures. The spatial models were compared to the original Poisson model and to each other, but there was no effective way of comparing the random effects models with the fixed effects models. For these data, the use of spatial statistics did not consistently improve the original model. This may be partly because of the nature of the experiment: the alfalfa effectively reduced the YNS, PNS, and SRKN counts, and the spatial information was generally more useful earlier in the experiment, when the YNS, PNS, and SRKN populations were denser.
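For illustration, the following is a minimal Python sketch of the first approach described above, treating spatial information (quadrat coordinates) as fixed effects in a Poisson model. The count-generating mechanism, coordinate ranges, and use of statsmodels are illustrative assumptions; the second approach, which modifies the residual variance-covariance structure, is not shown.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 80                                        # quadrats sampled in one month
xc = rng.uniform(0, 50, n)                    # quadrat coordinates (illustrative)
yc = rng.uniform(0, 100, n)
# Hypothetical spatial trend in the log mean count of nutsedge plants
mu = np.exp(1.0 + 0.02 * xc - 0.01 * yc)
counts = rng.poisson(mu)

# Spatial information as fixed effects: coordinates enter the linear predictor
X = sm.add_constant(np.column_stack([xc, yc]))
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(fit.summary())
```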
19

Exploring network models under sampling

Zhou, Shu January 1900 (has links)
Master of Science / Department of Statistics / Perla Reyes / Networks are defined as sets of items and their connections. Interconnected items are represented by mathematical abstractions called vertices (or nodes), and the links connecting pairs of vertices are known as edges. Networks are easily seen in everyday life: a network of friends, the Internet, metabolic or citation networks. The increase of available data and the need to analyze networks have resulted in a proliferation of models for networks. However, for networks with billions of nodes and edges, computation and inference might not be achievable within a reasonable amount of time or budget. A sampling approach seems a natural choice, but traditional models assume that we have access to the entire network. Moreover, when data are only available for a sampled sub-network, conclusions tend to be extrapolated to the whole network/population without regard to sampling error. The statistical problem this report addresses is how to sample a sub-network and then draw conclusions about the whole network. Are some sampling techniques better than others? Are there more efficient ways to estimate parameters of interest? How can we measure how effectively a method reproduces the original network? We explore these questions with a simulation study on the Mesa High School students' friendship network. First, to assess the characteristics of the whole network, we applied the traditional exponential random graph model (ERGM) and a stochastic blockmodel to the complete population of 205 students. Then, we drew simple random and stratified samples of 41 students, applied the traditional ERGM and the stochastic blockmodel again, and defined a way to generalize the sample findings to the population friendship network of 205 students. Finally, we used the degree distribution and other network statistics to compare the true friendship network with the projected one. We obtained the following important results: 1) as expected, stratified sampling outperforms simple random sampling when selecting nodes; 2) the ERGM without restrictions offers a poor estimate for most of the tested parameters; and 3) Bayesian stochastic blockmodel estimation using a stratified sample of nodes achieves the best results.
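For illustration, the following is a minimal Python sketch of the sampling comparison: draw a simple random sample and a stratified sample of nodes, take the induced subgraphs, and compare a network statistic (here mean degree) with the full network. The karate-club graph, the stratification attribute, and the sample size stand in for the friendship network and are illustrative assumptions; the ERGM and stochastic blockmodel fits are not reproduced here.

```python
import random
import networkx as nx

random.seed(5)
G = nx.karate_club_graph()                    # stand-in for the friendship network
n_sample = 10

# Simple random sample of nodes and its induced subgraph
srs_nodes = random.sample(list(G.nodes()), n_sample)
G_srs = G.subgraph(srs_nodes)

# Stratified sample: sample proportionally within each 'club' attribute group
strata = {}
for node, club in nx.get_node_attributes(G, "club").items():
    strata.setdefault(club, []).append(node)
strat_nodes = []
for club, members in strata.items():
    k = max(1, round(n_sample * len(members) / G.number_of_nodes()))
    strat_nodes += random.sample(members, k)
G_strat = G.subgraph(strat_nodes)

# Compare mean degree of the induced subgraphs with the full network
def mean_degree(graph):
    return sum(d for _, d in graph.degree()) / graph.number_of_nodes()

print("full:", mean_degree(G), "SRS:", mean_degree(G_srs),
      "stratified:", mean_degree(G_strat))
```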
20

Robust mixtures of regressions models

Bai, Xiuqin January 1900 (has links)
Master of Science / Department of Statistics / Weixin Yao / In fitting mixtures of linear regression models, a normal assumption has traditionally been used for the error term, and the regression parameters are then estimated by maximum likelihood (MLE) using the EM algorithm. Under the normal assumption, the M step of the EM algorithm uses a weighted least squares estimate (LSE) for the regression parameters. It is well known that the LSE is sensitive to outliers and heavy-tailed error distributions. In this report, we propose a robust mixture of linear regression models, which replaces the least squares criterion with robust criteria in the M step of the EM algorithm. In addition, we use a simulation study to demonstrate how sensitive the traditional mixture regression estimation method is to outliers or heavy-tailed error distributions, and we compare it with our proposed robust mixture regression estimation method. Based on our empirical studies, the proposed robust estimation method works comparably to the traditional estimation method when there are no outliers and the error is normally distributed, but performs much better when there are outliers or the error has heavy tails (such as a t-distribution). A real data application is also provided to illustrate the effectiveness of our proposed methodology.
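For illustration, the following is a minimal Python sketch of the central idea: inside an EM-style loop for a two-component mixture of regressions, the M step uses a robust fit (Huber's criterion via statsmodels RLM) instead of weighted least squares. The hard-assignment (classification-EM) update, the initialization, and the convergence handling are simplified assumptions and not the report's exact algorithm.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

def robust_mixture_regression(x, y, n_iter=20, seed=0):
    """Two-component mixture of regressions fitted with a classification-EM
    style loop; the M step uses Huber's robust criterion instead of least squares."""
    rng = np.random.default_rng(seed)
    X = sm.add_constant(x)
    labels = rng.integers(0, 2, len(y))           # random initial assignment
    params, scales = [None, None], [1.0, 1.0]
    for _ in range(n_iter):
        # M step: robust (Huber) regression within each component
        for k in (0, 1):
            mask = labels == k
            if mask.sum() < 5:                     # guard against empty components
                return params, labels
            fit = sm.RLM(y[mask], X[mask], M=sm.robust.norms.HuberT()).fit()
            params[k] = fit.params
            scales[k] = max(fit.scale, 1e-6)
        # E step (hard assignment): reassign each point to the closer component
        dens = np.column_stack([
            norm.logpdf(y, loc=X @ params[k], scale=scales[k]) for k in (0, 1)])
        labels = dens.argmax(axis=1)
    return params, labels

# Simulated two-line data with a few gross outliers
rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 200)
y = np.where(rng.uniform(size=200) < 0.5, 1 + 2 * x, 5 - 1 * x)
y += rng.normal(0, 0.5, 200)
y[:5] += 30                                        # outliers
params, labels = robust_mixture_regression(x, y)
print(params[0], params[1])
```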
