1. Developing sampling weights for complex surveys: an approach to the School Physical Activity and Nutrition (SPAN) project. Zeng, Qiong. 05 August 2011.
Sampling weights are recommended in surveys to compensate for the disproportionality of the sample with respect to the target population of interest. This report presents how to develop sampling weights for a population-based study in which a sample was randomly selected, and demonstrates the process using a real research project, the School Physical Activity and Nutrition (SPAN) project, as the example. We first introduce probability-based surveys and related key concepts, such as sampling design, sampling frames and sampling weights. We then discuss the sampling design and the construction of the sampling frame for the SPAN project, demonstrate the method and process of developing the sampling weights, and finally present the results with an example.
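To make the weighting idea concrete, the sketch below shows how base design weights and a simple post-stratification adjustment might be computed for a two-stage school survey; the schools, selection probabilities and enrollment totals are invented for illustration and are not the SPAN project's actual frame.

```python
# Hypothetical illustration: base design weights for a two-stage school survey,
# followed by a simple post-stratification adjustment to known enrollment totals.
# All numbers are made up; the SPAN project's actual frame and strata differ.

students = [
    # (school_id, p_school_selected, p_student_selected_within_school, grade)
    ("A", 0.10, 0.25, 4),
    ("A", 0.10, 0.25, 4),
    ("B", 0.05, 0.50, 8),
    ("B", 0.05, 0.50, 8),
]

# Base weight: inverse of the overall selection probability.
base_weights = [1.0 / (p_sch * p_stu) for _, p_sch, p_stu, _ in students]

# Post-stratify so that weighted grade totals match assumed enrollment counts.
known_totals = {4: 5000, 8: 9000}
weighted_totals = {}
for (_, _, _, grade), w in zip(students, base_weights):
    weighted_totals[grade] = weighted_totals.get(grade, 0.0) + w

final_weights = [
    w * known_totals[grade] / weighted_totals[grade]
    for (_, _, _, grade), w in zip(students, base_weights)
]
print(final_weights)
```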
2. Components of Response Variance for Cluster Samples. Akdemir, Deniz. 01 January 2003.
Measures of data quality are important for the evaluation and improvement of survey design and procedures. A detailed investigation of the sources, magnitude and impact of errors is necessary to identify how survey design and procedures may be improved and how resources may be allocated more efficiently among the various aspects of the survey operation. A major part of this thesis is devoted to an overview of statistical theory and methods for measuring the contribution of response variability to the overall error of a survey.
A very common practice in surveys is to select groups (clusters) of elements together instead of selecting elements independently. In practice, cluster samples tend to produce higher sampling variance for statistics than element samples of the same size; their frequent use stems from their desirable cost features.
Most data collection and sample designs involve some overlap between interviewer workloads and the sampling units (clusters). In those cases, a proportion of the measurement variance that is due to interviewers is reflected to some degree in the sampling variance calculations.
The prime purpose of this thesis is to derive a variance formula that decomposes the total variance into sampling and measurement variance components for two commonly used data collection and sample designs. Once such a decomposition is obtained, an optimum allocation in the presence of measurement errors can be determined.
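As a worked illustration of the kind of decomposition described above (not taken from the thesis), the following sketch uses a one-way analysis of variance to split variability into a between-interviewer (measurement) component and a within-interviewer component, assuming each interviewer handles exactly one cluster; all data are synthetic.

```python
# Hypothetical illustration: method-of-moments decomposition of variability into
# a between-interviewer (measurement) component and a within-interviewer
# component, assuming each interviewer works one cluster of m respondents.
from statistics import mean

responses = {  # interviewer/cluster id -> observed values (synthetic)
    "I1": [12.0, 14.0, 11.0, 13.0],
    "I2": [18.0, 17.0, 19.0, 16.0],
    "I3": [14.0, 15.0, 13.0, 14.0],
}

m = 4                       # respondents per interviewer
k = len(responses)          # number of interviewers
grand_mean = mean(v for vals in responses.values() for v in vals)

# Between- and within-interviewer sums of squares.
ss_between = m * sum((mean(vals) - grand_mean) ** 2 for vals in responses.values())
ss_within = sum((v - mean(vals)) ** 2 for vals in responses.values() for v in vals)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (k * (m - 1))

within_var = ms_within                            # element-level variance
interviewer_var = (ms_between - ms_within) / m    # variance due to interviewers
print(within_var, interviewer_var)
```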
3. Advanced Sampling Methods for Solving Large-Scale Inverse Problems. Attia, Ahmed Mohamed Mohamed. 19 September 2016.
Ensemble and variational techniques have gained wide popularity as the two main approaches for solving data assimilation and inverse problems. The majority of the methods in these two approaches are derived (at least implicitly) under the assumption that the underlying probability distributions are Gaussian. It is well accepted, however, that the Gaussianity assumption is too restrictive when applied to large nonlinear models, nonlinear observation operators, and large levels of uncertainty. This work develops a family of fully non-Gaussian data assimilation algorithms that work by directly sampling the posterior distribution. The sampling strategy is based on a Hybrid/Hamiltonian Monte Carlo (HMC) approach that can handle non-normal probability distributions.
The first algorithm proposed in this work is the "HMC sampling filter", an ensemble-based data assimilation algorithm for solving the sequential filtering problem. Unlike traditional ensemble-based filters, such as the ensemble Kalman filter and the maximum likelihood ensemble filter, the proposed sampling filter naturally accommodates non-Gaussian errors and nonlinear model dynamics, as well as nonlinear observations. To test the capabilities of the HMC sampling filter, numerical experiments are carried out using the Lorenz-96 model and observation operators with different levels of nonlinearity and differentiability. The filter is also tested with a shallow water model on the sphere with a linear observation operator. Numerical results show that the sampling filter performs well even in highly nonlinear situations where the traditional filters diverge.
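For readers unfamiliar with HMC, the sketch below shows the generic leapfrog-plus-Metropolis step that such a sampling filter builds on; the target density, step size and dimension are placeholders and do not reflect the filter's actual configuration.

```python
# Minimal sketch of one Hamiltonian Monte Carlo proposal/accept step for a
# generic log-density; the quadratic target below is only a stand-in for the
# posterior that the sampling filter actually draws from.
import numpy as np

def log_density(x):
    return -0.5 * np.sum(x ** 2)          # placeholder target: standard normal

def grad_log_density(x):
    return -x

def hmc_step(x, step_size=0.1, n_leapfrog=20, rng=np.random.default_rng()):
    p = rng.standard_normal(x.shape)       # sample auxiliary momentum
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics.
    p_new += 0.5 * step_size * grad_log_density(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += step_size * p_new
        p_new += step_size * grad_log_density(x_new)
    x_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_density(x_new)
    # Metropolis accept/reject using the total Hamiltonian.
    h_old = -log_density(x) + 0.5 * np.sum(p ** 2)
    h_new = -log_density(x_new) + 0.5 * np.sum(p_new ** 2)
    return x_new if np.log(rng.uniform()) < h_old - h_new else x

x = np.zeros(3)
samples = []
for _ in range(1000):
    x = hmc_step(x)
    samples.append(x)
```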
Next, the HMC sampling approach is extended to the four-dimensional case, where several observations are assimilated simultaneously, resulting in the second member of the proposed family of algorithms. The new algorithm, named "HMC sampling smoother", is an ensemble-based smoother for four-dimensional data assimilation that works by sampling from the posterior probability density of the solution at the initial time. The sampling smoother naturally accommodates non-Gaussian errors and nonlinear model dynamics and observation operators, and provides a full description of the posterior distribution. Numerical experiments for this algorithm are carried out using a shallow water model on the sphere with observation operators of different levels of nonlinearity. The numerical results demonstrate the advantages of the proposed method compared to the traditional variational and ensemble-based smoothing methods.
The HMC sampling smoother, in its original formulation, is computationally expensive because it requires running the forward and adjoint models repeatedly. The family of algorithms is therefore extended with computationally efficient versions of the HMC sampling smoother based on reduced-order approximations of the underlying model dynamics. The reduced-order HMC sampling smoothers, developed as extensions of the original HMC smoother, are tested numerically using the shallow-water equations model in Cartesian coordinates. The results reveal that the reduced-order versions of the smoother accurately capture the posterior probability density while being significantly faster than the original full-order formulation.
In the presence of nonlinear model dynamics, nonlinear observation operators, or non-Gaussian errors, the prior distribution in the sequential data assimilation framework is not analytically tractable. In the original formulation of the HMC sampling filter, the prior distribution is approximated by a Gaussian distribution whose parameters are inferred from the ensemble of forecasts. The Gaussian prior assumption in the original HMC filter is relaxed. Specifically, a clustering step is introduced after the forecast phase of the filter, and the prior density function is estimated by fitting a Gaussian Mixture Model (GMM) to the prior ensemble. The base filter developed following this strategy is named the cluster HMC sampling filter (ClHMC). A multi-chain version of the ClHMC filter, MC-ClHMC, is also proposed to guarantee that samples are taken from the vicinities of all probability modes of the formulated posterior. These methodologies are tested using a quasi-geostrophic (QG) model with double-gyre wind forcing and bi-harmonic friction. Numerical results demonstrate the usefulness of using GMMs to relax the Gaussian prior assumption in the HMC filtering paradigm.
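A minimal sketch of the GMM-fitting step described above, assuming scikit-learn's GaussianMixture as the fitting tool; the ensemble, state dimension and number of components are illustrative and not taken from the QG experiments.

```python
# Hypothetical sketch: fit a Gaussian Mixture Model to a forecast ensemble to
# approximate a non-Gaussian prior density. Ensemble size, state dimension,
# and component count are illustrative placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic bimodal "forecast ensemble": 100 members of a 2-dimensional state.
ensemble = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=+2.0, scale=0.5, size=(50, 2)),
])

gmm = GaussianMixture(n_components=2, covariance_type="full").fit(ensemble)

# Log prior density evaluated at an arbitrary state, usable inside a sampling step.
state = np.array([[0.5, -0.3]])
log_prior = gmm.score_samples(state)[0]
print(gmm.weights_, log_prior)
```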
To provide a unified platform for data assimilation research, a flexible and highly extensible testing suite named DATeS is developed and described in this work. The core of DATeS is implemented in Python to take advantage of object-oriented capabilities. The main components, such as the models, the data assimilation algorithms, the linear algebra solvers, and the time discretization routines, are independent of each other, so as to offer maximum flexibility in configuring data assimilation studies.
4. Computation of estimates in a complex survey sample design. Maremba, Thanyani Alpheus. January 2019.
Thesis (M.Sc. (Statistics)), University of Limpopo, 2019. This research study has demonstrated the complexity involved in complex survey sample design (CSSD). Furthermore, the study has proposed methods to account for each step taken in sampling and at the estimation stage, using the theory of survey sampling, CSSD-based case studies and practical implementation based on census attributes. CSSD methods are designed to improve statistical efficiency, reduce costs and improve precision for sub-group analyses relative to simple random sampling (SRS). They are commonly used by statistical agencies as well as development and aid organisations. CSSDs provide one of the most challenging fields for applying a statistical methodology. Researchers encounter a vast diversity of unique practical problems in the course of studying populations. These include, inter alia: non-sampling errors, specific population structures, contaminated distributions of study variables, unsatisfactory sample sizes, incorporation of auxiliary information available on many levels, simultaneous estimation of characteristics in various sub-populations, integration of data from many waves or phases of the survey, and incompletely specified sampling procedures accompanying published data. While the study has not exhausted all the available real-life scenarios, it has outlined potential problems, illustrated them with examples and suggested appropriate approaches at each stage. Dealing with the attributes of CSSDs mentioned above brings about the need to formulate sophisticated statistical procedures dedicated to the specific conditions of a sample survey. CSSD methodologies give rise to a wide variety of approaches, methodologies and procedures that borrow strength from virtually all branches of statistics. The application of various statistical methods, from sample design to weighting and estimation, ensures that optimal estimates of a population and its various domains are obtained from the sample data. CSSDs are probability sampling methodologies from which inferences are drawn about the population. The methods used in the process of producing estimates include adjustment for unequal probabilities of selection (resulting from stratification, clustering and probability proportional to size (PPS) sampling), non-response adjustments, and benchmarking to auxiliary totals. When estimates of survey totals, means and proportions are computed using various methods, the results do not differ; this holds when estimates are calculated for planned domains that are taken into account in the sample design and benchmarking. In contrast, when measures of precision such as standard errors and coefficients of variation are produced, they yield different results depending on the extent to which the design information is incorporated during estimation.
The literature has revealed that most statistical computer packages assume an SRS design when estimating variances. The replication method was used to calculate measures of precision that take into account all the sampling parameters and weighting adjustments computed in the CSSD process. The creation of replicate weights and the estimation of variances were done using WesVar, a statistical computer package capable of producing statistical inference from data collected through CSSD methods.
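The replication idea can be sketched as follows with a delete-one-cluster (JK1) jackknife: replicate weights are formed by dropping one primary sampling unit at a time and rescaling the rest, and the spread of the replicate estimates yields a design-based variance. The data and weights are synthetic, and the sketch does not reproduce WesVar's implementation.

```python
# Hypothetical sketch of delete-one-cluster jackknife (JK1) variance estimation
# with replicate weights; data, weights, and cluster structure are synthetic.
import numpy as np

clusters = np.array([1, 1, 2, 2, 3, 3, 4, 4])      # PSU membership
weights  = np.array([10., 12., 8., 9., 11., 10., 9., 13.])
y        = np.array([3., 5., 2., 4., 6., 5., 3., 4.])

def weighted_mean(w, y):
    return np.sum(w * y) / np.sum(w)

full_estimate = weighted_mean(weights, y)

unique_clusters = np.unique(clusters)
n_psu = len(unique_clusters)
replicate_estimates = []
for c in unique_clusters:
    rep_w = weights.copy()
    rep_w[clusters == c] = 0.0                       # drop one PSU...
    rep_w[clusters != c] *= n_psu / (n_psu - 1)      # ...and rescale the rest
    replicate_estimates.append(weighted_mean(rep_w, y))

replicate_estimates = np.array(replicate_estimates)
jk_variance = (n_psu - 1) / n_psu * np.sum((replicate_estimates - full_estimate) ** 2)
print(full_estimate, jk_variance)
```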
Keywords: Complex sampling, Survey design, Probability sampling, Probability proportional to size, Stratification, Area sampling, Cluster sampling.
5. Threatened tree species across conservation zones in a nature reserve of North-Western Vietnam. Dao, Thi Hoa Hong. 03 March 2017.
No description available.
6. Parameter Estimation under Two-phase Stratified and Cluster Sampling (Odhad parametru při dvoufázovém stratifikovaném a skupinovém výběru). Šedová, Michaela. January 2011.
Title: Parameter Estimation under Two-phase Stratified and Cluster Sampling. Author: Mgr. Michaela Šedová. Department: Department of Probability and Mathematical Statistics. Supervisor: Doc. Mgr. Michal Kulich, Ph.D.
Abstract: In this thesis we present methods of parameter estimation under two-phase stratified and cluster sampling. In contrast to classical sampling theory, we do not deal with finite population parameters, but focus on model parameter inference, where the observations in a population are considered to be realisations of a random variable. However, we consider the sampling schemes used, and thus we incorporate much of survey sampling theory. Therefore, the presented methods of parameter estimation can be understood as a combination of the two approaches. For both sampling schemes, we deal with the concept where the population is considered to be the first-phase sample, from which a subsample is drawn in the second phase. The target variable is then observed only for the subsampled subjects. We present the mean value estimation, including the statistical properties of the estimator, and show how this estimation can be improved if some auxiliary information, correlated with the target variable, is observed for the whole population. We extend the method to the regression problem....
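A schematic example (not taken from the thesis) of the improvement described above: when an auxiliary variable correlated with the target is observed for the whole first-phase sample, a regression estimator adjusts the second-phase mean. The data and the equal-probability phases below are assumptions made for illustration.

```python
# Hypothetical sketch of a two-phase regression estimator of a mean: the
# auxiliary variable x is known for the whole first-phase sample, the target y
# only for the second-phase subsample. Synthetic data, equal-probability phases.
import numpy as np

rng = np.random.default_rng(1)
n1 = 1000                                   # first-phase sample size
x = rng.normal(50.0, 10.0, size=n1)         # auxiliary variable, known for all
y_full = 2.0 * x + rng.normal(0.0, 5.0, size=n1)    # target, mostly unobserved

phase2 = rng.choice(n1, size=100, replace=False)    # second-phase subsample
x2, y2 = x[phase2], y_full[phase2]

# Fit the regression of y on x within the subsample.
slope, _intercept = np.polyfit(x2, y2, 1)

# Regression estimator: subsample mean of y, shifted by the difference between
# the first-phase and second-phase means of x.
y_reg = y2.mean() + slope * (x.mean() - x2.mean())
print(y2.mean(), y_reg)
```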
7. Regression Analysis with Missing Value of Responses under Complex Survey (複雜抽樣下反應變數遺漏時之迴歸分析). Hsu, Cheng-Hung (許正宏). Date unknown.
Gelman, King, and Liu (1998) proposed a multiple imputation procedure for a series of mutually independent cross-sectional surveys, linking the parameters of the different surveys through a hierarchical model. This thesis describes how, under complex sampling (stratified or cluster sampling), missing values in Q continuous variables can be imputed and the model parameters estimated by combining the individual characteristics of the subjects, the parameters of each stratum or cluster, and the hierarchical model that links those parameters.
Missing values are handled with a monotone data augmentation algorithm, so that only the values that break the monotone missing-data pattern need to be imputed. Because different clusters or strata often exhibit different characteristics, a hierarchical model is used to link the parameters across clusters or strata, and the derivations of Gelman, King, and Liu (1998) are extended to take the characteristics of individual subjects into account. For each cluster, the covariance matrices Ψ and Σ govern the convergence of the other within-cluster parameters; the simulation results give no evidence to suspect convergence problems.
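Multiple imputation of the kind referenced above is conventionally completed by combining the per-imputation estimates with Rubin's rules; the sketch below shows those rules for a scalar estimate and is standard methodology rather than anything specific to this thesis. The numbers are made up.

```python
# Hedged sketch of Rubin's rules for combining estimates from multiply imputed
# data sets; the per-imputation estimates and variances below are made up.
import numpy as np

estimates = np.array([2.10, 2.25, 1.95, 2.18, 2.05])       # point estimate per imputation
variances = np.array([0.040, 0.045, 0.038, 0.042, 0.041])  # within-imputation variances
m = len(estimates)

q_bar = estimates.mean()                  # combined point estimate
u_bar = variances.mean()                  # average within-imputation variance
b = estimates.var(ddof=1)                 # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b       # Rubin's total variance
print(q_bar, total_var)
```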
8. Accelerating microarchitectural simulation via statistical sampling principles. Bryan, Paul David. 05 December 2012.
The design and evaluation of computer systems rely heavily upon simulation. Simulation is also a major bottleneck in the iterative design process. Applications that may be executed natively on physical systems in a matter of minutes may take weeks or months to simulate. As designs incorporate increasingly higher numbers of processor cores, it is expected that the time required to simulate future systems will become an even greater issue. Simulation exhibits a tradeoff between speed and accuracy. By basing experimental procedures upon known statistical methods, the simulation of systems may be dramatically accelerated while retaining reliable methods to estimate error.
This thesis focuses on the acceleration of simulation through statistical processes. The first two techniques discussed in this thesis focus on accelerating single-threaded simulation via cluster sampling. Cluster sampling extracts multiple groups of contiguous population elements to form a sample. This thesis introduces techniques to reduce the sampling and non-sampling bias components, which must be reduced for sample measurements to be reliable. Non-sampling bias is reduced through the Reverse State Reconstruction algorithm, which removes ineffectual instructions from the skipped instruction stream between simulated clusters. Sampling bias is reduced via the Single Pass Sampling Regimen Design Process, which guides the user towards selected representative sampling regimens. Unfortunately, the extension of cluster sampling to multi-threaded architectures is non-trivial and raises many interesting challenges; how these challenges are overcome is discussed. This thesis also introduces thread skew, a useful metric that quantitatively measures the non-sampling bias associated with divergent thread progressions at the beginning of a sampling unit. Finally, the Barrier Interval Simulation method is discussed as a technique to dramatically decrease the simulation times of certain classes of multi-threaded programs. It segments a program into discrete intervals, separated by barriers, which are leveraged to avoid many of the challenges that prevent multi-threaded sampling.
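As a toy illustration of the cluster-sampling principle behind these techniques (not the thesis's actual tooling), the sketch below draws contiguous groups of elements from a long synthetic trace and uses the between-cluster variability to attach an error estimate to the sampled mean.

```python
# Toy illustration of cluster sampling over a long "trace": contiguous groups
# of elements are sampled and the between-cluster variability gives an error
# estimate. The trace and metric are synthetic, not microarchitectural data.
import numpy as np

rng = np.random.default_rng(2)
trace = rng.gamma(shape=2.0, scale=0.5, size=1_000_000)  # stand-in per-element metric

cluster_len = 1000
n_clusters = 30
starts = rng.choice(len(trace) - cluster_len, size=n_clusters, replace=False)

cluster_means = np.array([trace[s:s + cluster_len].mean() for s in starts])

estimate = cluster_means.mean()
std_error = cluster_means.std(ddof=1) / np.sqrt(n_clusters)
print(f"estimated mean = {estimate:.4f} +/- {1.96 * std_error:.4f}")
print(f"true mean      = {trace.mean():.4f}")
```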
9. Approaches for the optimisation of double sampling for stratification in repeated forest inventories. von Lüpke, Nikolas. 26 March 2013.
Double sampling for stratification is an efficient inventory method that has proven its practicality in various forest inventories. Nevertheless, further efficiency gains are desirable. In this work, several approaches to increasing the effectiveness of this method are presented separately, tested in case studies with data from the forest management inventory of Lower Saxony (Niedersächsische Betriebsinventur), and discussed.
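For orientation, the sketch below shows the basic double-sampling-for-stratification estimator of a mean: a large first-phase sample is classified into strata, and the target variable is measured only on a second-phase subsample within each stratum. The strata, plot counts and volumes are invented and are not the Lower Saxony inventory data.

```python
# Hedged sketch of the basic double-sampling-for-stratification estimator: the
# first phase classifies a large sample of plots into strata, the second phase
# measures the target (e.g. volume) on a subsample per stratum. Synthetic data.
first_phase = {"broadleaf": 620, "conifer": 300, "mixed": 80}   # plots per stratum
second_phase = {                                                # measured volumes (m^3/ha)
    "broadleaf": [210.0, 250.0, 230.0, 190.0],
    "conifer": [340.0, 310.0, 360.0],
    "mixed": [270.0, 255.0],
}

n1 = sum(first_phase.values())                   # first-phase sample size
estimate = sum(
    (first_phase[h] / n1) * (sum(ys) / len(ys))  # stratum weight * stratum mean
    for h, ys in second_phase.items()
)
print(f"double sampling for stratification estimate: {estimate:.1f} m^3/ha")
```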
The first approach (Chapter 2) deals with the application of double sampling for stratification in repeated inventories. In a composite estimator, data from a current inventory are combined with simulation results from the preceding inventory cycle. The sample size of the current inventory can thereby be reduced, while the data from all inventory points of the previous cycle are used for simulations. Although such an estimator can be constructed, the case study suggests that no efficiency gain, or at least no sufficient one, can be achieved. This can be explained by the large differences between the current inventory results from the reduced inventories and the volumes predicted by the simulations. An increase in the efficiency of this approach could only become possible through further development of the forest growth models.
In repeated inventories, however, a larger efficiency gain can be achieved with a three-phase procedure that combines double sampling for stratification with two-phase regression sampling (Chapter 3). Mean and variance estimators based on the so-called infinite population approach in the first phase are presented. They exploit the correlations between the current inventory results and growth simulations based on the previous inventory cycle. Instead of the simulation results, the results of the previous inventory cycle can also simply be used to compute the correlations; however, using the simulation results as the regressor leads to better results in most cases. With a reduced sample size for the follow-up inventory and the accompanying loss of precision, the efficiency of the three-phase procedure is higher than that of the classical two-phase procedure. The use of the previous inventory in the form of a stratum-wise regression estimator has thus proven successful and clearly superior to the composite estimator.
As a further approach, the extension of double sampling for stratification by a clustered sub-sample to a three-phase design is presented (Chapter 4). Corresponding mean and variance estimators are presented for both the ratio-to-size and the unbiased approach. Compared with the two-phase procedure, this three-phase design yields no efficiency gain in the case study. Reasons for this can be seen in the comparatively small size of the forest districts and the high sampling density of the Lower Saxony inventory. Meaningful applications of this procedure may, however, be conceivable under other accessibility conditions in large areas.
A further case study attempts to group existing sample points into clusters of homogeneous size (Chapter 5). Such grouping is intended to optimize travel times when visiting inventory points. Seven different methods are tested and their results compared, and the quality of the solutions is additionally evaluated by comparison with optimized benchmark solutions. Three vehicle routing problem algorithms prove well suited for creating clusters of homogeneous size. In contrast, the use of three other clustering algorithms, as well as the use of management units as clusters, cannot be recommended, since these methods lead to clusters of very heterogeneous size.
10. Parallel computing applied to electromagnetic problems using the FDTD method (Computação paralela aplicada a problemas eletromagneticos utilizando o metodo FDTD). Santos, Carlos Henrique da Silva. 08 May 2005.
Advisor: Hugo Enrique Hernandez Figueroa. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
Abstract: This work aims to develop high-performance computing solutions at low cost, following the Federal Government's incentives for the adoption of free software. These solutions make it possible to simulate efficiently the medium- and large-scale computational domains used in computational electromagnetics. The good results obtained in this work show the importance and efficiency of massively parallel computing on a Beowulf cluster for running the FDTD method applied to complex structures at low financial cost. The performance of the system was demonstrated in experiments analysing the SAR in the human head and studying the effects of metamaterial structures. (Master's in Electrical Engineering, Telecommunications and Telematics.)
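As a minimal illustration of the field-update kernel that such a parallel implementation distributes across the cluster, the sketch below shows a serial one-dimensional FDTD loop in normalized units; the grid size, time stepping and source are illustrative, and the domain decomposition used on the Beowulf cluster is not shown.

```python
# Minimal 1-D FDTD sketch in normalized units (free space): the leapfrog update
# of E and H fields that a parallel implementation would distribute over the
# cluster. Grid size, time steps, and source are illustrative only.
import numpy as np

nz, nt = 200, 500
ez = np.zeros(nz)          # electric field
hy = np.zeros(nz)          # magnetic field

for t in range(nt):
    # Update H from the spatial derivative of E.
    hy[:-1] += 0.5 * (ez[1:] - ez[:-1])
    # Update E from the spatial derivative of H.
    ez[1:] += 0.5 * (hy[1:] - hy[:-1])
    # Soft Gaussian source injected near the left boundary.
    ez[5] += np.exp(-0.5 * ((t - 40) / 12.0) ** 2)

print(ez.max(), ez.min())
```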