1. Contributions to the theory of unequal probability sampling. Lundquist, Anders (January 2009)
This thesis consists of five papers related to the theory of unequal probability sampling from a finite population. Generally, it is assumed that we wish to make model-assisted inference, i.e. the inclusion probability for each unit in the population is prescribed before the sample is selected. The sample is then selected using some random mechanism, the sampling design. Mostly, the thesis focuses on three particular unequal probability sampling designs: the conditional Poisson (CP) design, the Sampford design, and the Pareto design. They have different advantages and drawbacks: the CP design is a maximum entropy design, but it is difficult to determine sampling parameters that yield prescribed inclusion probabilities; the Sampford design yields prescribed inclusion probabilities but may be hard to sample from; and the Pareto design makes sample selection very easy, but it is very difficult to determine sampling parameters that yield prescribed inclusion probabilities. These three designs are compared probabilistically and found to be close to each other under certain conditions. In particular, the Sampford and Pareto designs are probabilistically close to each other. Some effort is devoted to analytically adjusting the CP and Pareto designs so that they yield inclusion probabilities close to the prescribed ones. The results of the adjustments are in general very good. Some iterative procedures are suggested to improve the results even further. Further, balanced unequal probability sampling is considered. In this kind of sampling, samples are given a positive probability of selection only if they satisfy balancing conditions given by information from auxiliary variables. Most of the attention is devoted to a slightly less general but practically important case. Also in this case the inclusion probabilities are prescribed in advance, making the choice of sampling parameters important.
A complication that arises when choosing sampling parameters is that certain probability distributions need to be calculated, and exact calculation turns out to be practically impossible except in very small cases. It is proposed that Markov chain Monte Carlo (MCMC) methods be used to approximate the relevant probability distributions, and also to select samples. MCMC methods for sample selection appear only rarely in the sampling literature today, making this a fairly novel idea.
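The CP design assigns each fixed-size sample a probability proportional to the product of the odds w_i = p_i/(1 - p_i) of its units, which makes it a natural target for MCMC. The following is a minimal sketch of the sample-selection idea using a plain Metropolis swap chain; the function name is hypothetical, and this is an illustration of the general approach rather than the samplers developed in the thesis:

```python
import random

def cp_sample_mcmc(p, n, steps=5000, seed=1):
    """Approximate a draw from the conditional Poisson (fixed-size n) design.

    p     : working parameters (0 < p_i < 1) of the underlying Poisson design;
            conditioning on sample size n gives P(S) proportional to the
            product of w_i = p_i / (1 - p_i) over i in S.
    steps : number of Metropolis swap steps (burn-in of the chain).
    """
    rng = random.Random(seed)
    N = len(p)
    w = [pi / (1.0 - pi) for pi in p]         # odds of each unit
    sample = set(rng.sample(range(N), n))     # arbitrary starting n-subset
    outside = set(range(N)) - sample
    for _ in range(steps):
        i = rng.choice(tuple(sample))         # propose swapping i (in sample)
        j = rng.choice(tuple(outside))        # ... with j (outside)
        # Symmetric proposal, so the Metropolis acceptance ratio is w_j / w_i.
        if rng.random() < min(1.0, w[j] / w[i]):
            sample.remove(i); outside.add(i)
            outside.remove(j); sample.add(j)
    return sorted(sample)
```

Each accepted swap keeps the sample size fixed, so the chain moves over n-subsets only; after enough steps the current subset is an approximate draw from the CP design.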
2. Estimation de la variance en présence de données imputées pour des plans de sondage à grande entropie. Vallée, Audrey-Anne (07 1900)
Variance estimation in the case of item nonresponse treated by imputation is the main topic of this work. Treating the imputed values as if they were observed may lead to substantial underestimation of the variance of point estimators. Classical variance estimators rely on the availability of the second-order inclusion probabilities, which may be difficult (or even impossible) to calculate. We propose to study the properties of variance estimators obtained by approximating the second-order inclusion probabilities. These approximations are expressed in terms of first-order inclusion probabilities and are usually valid for high-entropy sampling designs. The results of a simulation study evaluating the properties of the proposed variance estimators in terms of bias and mean squared error will be presented.
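To make the idea concrete, one widely used estimator of this type is the Hájek-style variance estimator for the Horvitz-Thompson total, which replaces the second-order inclusion probabilities with a function of the first-order ones and is approximately valid for high-entropy fixed-size designs. This sketch illustrates the general approach, not necessarily the exact estimators studied in this work; the function name is hypothetical:

```python
def hajek_variance(y, pi):
    """Hajek-type variance estimate of the Horvitz-Thompson total estimator,
    computed from a sample of values y with first-order inclusion
    probabilities pi (no second-order probabilities needed)."""
    n = len(y)
    c = [(1.0 - p) * n / (n - 1) for p in pi]            # Hajek weights
    e = [yi / p for yi, p in zip(y, pi)]                 # expanded values y_i / pi_i
    b = sum(ci * ei for ci, ei in zip(c, e)) / sum(c)    # weighted mean of e_i
    return sum(ci * (ei - b) ** 2 for ci, ei in zip(c, e))
```

For example, with equal inclusion probabilities the estimator reduces to a scaled sample variance of the expanded values, which is the intuition behind the approximation.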
3. Bayesian Predictive Inference Under Informative Sampling and Transformation. Shen, Gang (29 April 2004)
We consider the problem in which a biased sample is selected from a finite population, and this finite population is itself a random sample from an infinitely large population, called the superpopulation. The parameters of both the superpopulation and the finite population are of interest. There is some information about the selection mechanism in that the selection probabilities are linearly related to the measurements. This is typical of establishment surveys, where the selection probabilities are taken to be proportional to the previous year's characteristics. When all the selection probabilities are known, as in our problem, inference about the finite population can be made, but inference about the distribution is not so clear. For continuous measurements one might assume that the values are normally distributed, but in practice normality can be tenuous. In such a situation a transformation to normality may be useful, but this transformation will destroy the linearity between the selection probabilities and the values. The purpose of this work is to address this issue. In this light we have constructed two models, an ignorable selection model and a nonignorable selection model. We use the Gibbs sampler and the sampling importance resampling (SIR) algorithm to fit the nonignorable selection model. We have emphasized estimation of the finite population parameters, although within this framework other quantities can be estimated easily. We have found that our nonignorable selection model can correct the bias due to unequal selection probabilities, and that it provides improved precision over the estimates from the ignorable selection model. In addition, we have described the case in which all the selection probabilities are unknown. This is useful because many agencies (e.g., governments) tend to withhold these selection probabilities when public-use data files are constructed. We have also given an extensive theoretical discussion of Poisson sampling, the sampling scheme underlying our models, which is especially useful in the case in which the selection probabilities are unknown.
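Poisson sampling itself is simple to state: each unit enters the sample independently with its own selection probability, and the Horvitz-Thompson estimator weights each sampled value by the inverse of that probability. A minimal sketch (hypothetical function names, not code from the thesis):

```python
import random

def poisson_sample(p, seed=0):
    """Poisson sampling: each unit i is included in the sample
    independently with its own selection probability p_i."""
    rng = random.Random(seed)
    return [i for i, pi in enumerate(p) if rng.random() < pi]

def ht_total(y, p, sample):
    """Horvitz-Thompson estimator of the population total: each sampled
    value y_i is weighted by 1 / p_i, which makes the estimator unbiased
    under the selection mechanism."""
    return sum(y[i] / p[i] for i in sample)
```

A well-known drawback of Poisson sampling is that the realized sample size is random, which is one reason fixed-size designs are often preferred in practice.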
4. Spatial sampling and prediction. Schelin, Lina (January 2012)
This thesis discusses two aspects of spatial statistics: sampling and prediction. In spatial statistics we observe some phenomenon in space. Space is typically of two or three dimensions, but can be of higher dimension. Questions one might ask are: What is the total amount of gold in a gold mine? How much precipitation can we expect at a specific unobserved location? What is the total tree volume in a forest area? In spatial sampling the aim is to estimate global quantities, such as population totals, based on samples of locations (papers III and IV). In spatial prediction the aim is to estimate local quantities, such as the value at a single unobserved location, with a measure of uncertainty (papers I, II and V). In papers III and IV, we propose sampling designs for selecting representative probability samples in the presence of auxiliary variables. If the phenomenon under study has clear trends in the auxiliary space, estimation of population quantities can be improved by using representative samples. Such samples also enable estimation of population quantities in subspaces and are especially needed for multi-purpose surveys, when several target variables are of interest. In papers I and II, the objective is to construct valid prediction intervals for the value at a new location, given observed data. Prediction intervals typically rely on the kriging predictor having a Gaussian distribution. In paper I, we show that the distribution of the kriging predictor can be far from Gaussian, even asymptotically. This motivated us to propose a semiparametric method that does not require distributional assumptions. Prediction intervals are constructed from the plug-in ordinary kriging predictor. In paper V, we consider prediction in the presence of left-censoring, where observations falling below a minimum detection limit are not fully recorded. We review existing methods and propose a semi-naive method.
The semi-naive method is compared to one model-based method and two naive methods, all based on variants of the kriging predictor.
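The ordinary kriging predictor referred to above is a weighted average of the observed values, with weights obtained from a covariance model under the constraint that they sum to one (the "plug-in" version uses an estimated covariance). A sketch assuming a known exponential covariance function; the function name is hypothetical and this is not code from the papers:

```python
import numpy as np

def ordinary_kriging(coords, y, x0, cov=lambda h: np.exp(-h)):
    """Plug-in ordinary kriging prediction of the value at a new location x0.

    coords : (n, d) array of observed locations
    y      : n observed values
    cov    : covariance as a function of distance (assumed known/estimated)
    """
    coords = np.asarray(coords, float)
    x0 = np.asarray(x0, float)
    n = len(y)
    # Pairwise distances among observed sites, then the kriging system:
    # [[C, 1], [1', 0]] [lambda; mu] = [c0; 1], where the last row/column
    # enforces that the weights sum to one (unbiasedness).
    H = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = cov(H)
    K[:n, n] = 1.0
    K[n, :n] = 1.0
    rhs = np.append(cov(np.linalg.norm(coords - x0, axis=1)), 1.0)
    lam = np.linalg.solve(K, rhs)[:n]       # kriging weights
    return float(lam @ np.asarray(y, float))
```

Because the weights sum to one, predicting at a location equidistant from two observations returns their average, and predicting from a single observation returns that observation.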
5. Estimation simplifiée de la variance pour des plans complexes. Lefebvre, Isabelle (12 1900)
In a complex design framework, standard variance estimation methods entail substantial challenges. Conventional variance estimators involve second-order inclusion probabilities, which can be difficult to compute for some sampling designs. Moreover, confidentiality standards generally prevent second-order inclusion probabilities from being included in external microdata files, which often provide bootstrap weights instead. Based on Ohlsson's (1998) sequential Poisson sampling method, we suggest a simplified estimator that requires only first-order inclusion probabilities. The idea is to approximate a survey strategy (which consists of a sampling design and an estimator) by an equivalent strategy for which a Poisson sampling design is used. We will discuss proportional-to-size sampling and proportional-to-size cluster sampling. Results of a simulation study will be presented.
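Sequential Poisson sampling itself is easy to sketch: each unit receives a uniform random number u_i, the units are ranked by the ratio u_i / pi_i, and the n units with the smallest ratios are selected; the realized inclusion probabilities are then approximately the prescribed pi_i. A minimal sketch (hypothetical function name, an illustration of Ohlsson's scheme rather than code from this work):

```python
import random

def sequential_poisson_sample(pi, n, seed=0):
    """Sequential Poisson sampling: rank units by u_i / pi_i with
    u_i ~ Uniform(0, 1) and keep the n smallest ranking variables.
    Inclusion probabilities are approximately the prescribed pi_i."""
    rng = random.Random(seed)
    xi = [(rng.random() / p, i) for i, p in enumerate(pi)]  # ranking variables
    return sorted(i for _, i in sorted(xi)[:n])
```

Unlike ordinary Poisson sampling, this order-sampling variant always yields a sample of exactly size n, which is what makes it attractive in practice.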
6. On unequal probability sampling designs. Grafström, Anton (January 2010)
The main objective in sampling is to select a sample from a population in order to estimate some unknown population parameter, usually a total or a mean of some variable of interest. When the units in the population do not have the same probability of being included in a sample, the sampling is called unequal probability sampling. The inclusion probabilities are usually chosen to be proportional to some auxiliary variable that is known for all units in the population. When unequal probability sampling is applicable, it generally gives much better estimates than sampling with equal probabilities. This thesis consists of six papers that treat unequal probability sampling from a finite population of units. A random sample is selected according to some specified random mechanism called the sampling design. For unequal probability sampling there exist many different sampling designs. The choice of sampling design is important since it determines the properties of the estimator that is used. The main focus of this thesis is on evaluating and comparing different designs. It is often preferable to select samples of a fixed size, and hence the focus is on such designs. It is also important that a design has a simple and efficient implementation in order to be used in practice by statisticians. Some effort has been made to improve the implementation of some designs. In Paper II, two new implementations are presented for the Sampford design. In general, a sampling design should also have a high level of randomization; entropy is a measure of this level of randomization. In Paper IV, eight designs are compared with respect to their entropy. A design called adjusted conditional Poisson has maximum entropy, but it is shown that several other designs are very close in terms of entropy. A specific situation called real-time sampling is treated in Paper III, where a new design called correlated Poisson sampling is evaluated. In real-time sampling the units pass the sampler one by one.
Since each unit passes only once, the sampler must decide immediately, for each unit, whether or not it should be sampled. The correlated Poisson design is shown to have much better properties than traditional methods such as Poisson sampling and systematic sampling.
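For the Sampford design mentioned above, the classical implementation is a rejective scheme: draw one unit with probability pi_i / n, draw the remaining n - 1 units with probabilities proportional to pi_i / (1 - pi_i), and restart unless all n drawn units are distinct. The sketch below shows this baseline scheme (a hypothetical function, not the new implementations of Paper II); it assumes the pi_i sum to the desired sample size n and are strictly below one:

```python
import random

def sampford_sample(pi, seed=0):
    """Sampford's classical rejective scheme for a fixed-size sample with
    prescribed inclusion probabilities pi (sum(pi) must equal n, pi_i < 1)."""
    rng = random.Random(seed)
    n = round(sum(pi))
    q = [p / (1.0 - p) for p in pi]         # odds weights for the later draws
    while True:
        # One draw proportional to pi_i, then n - 1 draws proportional to q_i.
        s = rng.choices(range(len(pi)), weights=pi, k=1)
        s += rng.choices(range(len(pi)), weights=q, k=n - 1)
        if len(set(s)) == n:                # accept only if all units distinct
            return sorted(s)
```

The rejection step is what can make the design hard to sample from when many pi_i are large, which is the drawback noted for the Sampford design and the motivation for faster implementations.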