11 |
Small area estimation of county-level forest attributes using forest inventory data and remotely sensed auxiliary information
Alegbeleye, Okikiola Michael 08 August 2023
The Forest Inventory and Analysis (FIA) program of the United States Department of Agriculture Forest Service collects forest inventory data that provide estimates with reasonable accuracy at the national scale. However, for smaller domains, these estimates are often not as accurate due to the small sample size. Small area estimation improves the accuracy of the estimates at smaller domains by relying on auxiliary information. This study compared direct (FIA estimates), indirect (multiple linear regression), and composite (Fay-Herriot) estimators using auxiliary information derived from Landsat and the Global Ecosystem Dynamics Investigation (GEDI) to obtain county-level estimates of forest attributes, namely total and merchantable volume (m³ ha⁻¹), aboveground biomass (Mg ha⁻¹), basal area (m² ha⁻¹), and Lorey’s mean height (m). Compared with FIA estimates, the composite estimator reduced error by 75-78% for all the variables of interest. This shows that a reasonable amount of precision can be achieved with auxiliary information from Landsat and GEDI, improving FIA estimates at the county level.
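For orientation, the composite (Fay-Herriot) idea mentioned in this abstract can be sketched as a weighted average of each area's direct estimate and a regression-synthetic estimate. The sketch below is a minimal illustration, not the thesis's implementation: the function name, the argument names, and the choice to treat the model variance `sigma2_v` as known are all assumptions made here for brevity (in practice it is estimated from the data).

```python
import numpy as np

def fay_herriot_composite(y, X, D, sigma2_v):
    """Composite estimates under an area-level Fay-Herriot model.

    y        : direct estimates, one per area (length m)
    X        : area-level auxiliary variables (m x p), e.g. remote-sensing means
    D        : sampling variances of the direct estimates (length m)
    sigma2_v : between-area model variance (assumed known in this sketch)
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)

    # Weighted least squares for the regression coefficients,
    # with weights 1 / (sigma2_v + D_i).
    w = 1.0 / (sigma2_v + D)
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)

    synthetic = X @ beta                    # regression-synthetic estimate
    gamma = sigma2_v / (sigma2_v + D)       # shrinkage weight on the direct estimate
    composite = gamma * y + (1.0 - gamma) * synthetic
    return composite, gamma, beta
```

Areas with a large sampling variance receive a small `gamma`, so their composite estimate leans on the regression prediction; areas with precise direct estimates stay close to them.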
|
12 |
A Comparison of Methods for Estimating State Subgroup Performance on the National Assessment of Educational Progress
Bamat, David January 2021
Thesis advisor: Henry Braun / The State NAEP program only reports the mean achievement estimate of a subgroup within a given state if it samples at least 62 students who identify with the subgroup. Since some subgroups of students constitute small proportions of certain states’ general student populations, these low-incidence groups of students are seldom sufficiently sampled to meet this rule-of-62 requirement. As a result, education researchers and policymakers are frequently left without a full understanding of how states are supporting the learning and achievement of different subgroups of students. Using grade 8 mathematics results in 2015, this dissertation addresses the problem by comparing the performance of three different techniques in predicting mean subgroup achievement on NAEP. The methodology involves simulating scenarios in which subgroup samples greater than or equal to 62 are treated as not available for calculating mean achievement estimates. These techniques comprise an adaptation of Multivariate Imputation by Chained Equations (MICE), a common form of Small Area Estimation known as the Fay-Herriot model (FH), and a Cross-Survey analysis approach that emphasizes flexibility in model specification, referred to as Flexible Cross-Survey Analysis (FLEX CS) in this study. Data used for the prediction study include public-use state-level estimates of mean subgroup achievement on NAEP, restricted-use student-level achievement data on NAEP, public-use state-level administrative data from Education Week, the Common Core of Data, the U.S. Census Bureau, and public-use district-level achievement data in NAEP-referenced units from the Stanford Education Data Archive.
To evaluate the accuracy of the techniques, a weighted measure of Mean Absolute Error and a coverage indicator quantify differences between predicted and target values. To evaluate whether a technique could be recommended for use in practice, accuracy measures for each technique are compared to benchmark values established as markers of successful prediction based on results from a simulation analysis with example NAEP data.
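The two accuracy measures named above can be illustrated with a short sketch. The abstract does not give the exact definitions, so the ones used here are assumptions: the weights are arbitrary non-negative numbers (for example subgroup sample sizes), and a prediction is counted as "covered" when it falls within `z` standard errors of the target value. Function and argument names are hypothetical.

```python
import numpy as np

def weighted_mae(pred, target, weights):
    """Weighted mean absolute error between predicted and target values."""
    pred, target, weights = (np.asarray(a, dtype=float) for a in (pred, target, weights))
    return float(np.sum(weights * np.abs(pred - target)) / np.sum(weights))

def coverage_rate(pred, target, target_se, z=1.96):
    """Share of predictions inside target +/- z * standard error (assumed definition)."""
    pred, target, target_se = (np.asarray(a, dtype=float) for a in (pred, target, target_se))
    return float(np.mean(np.abs(pred - target) <= z * target_se))
```

A technique would then be judged by comparing these two numbers with benchmark values, as the abstract describes.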
Results indicate that both the FH and FLEX CS techniques may be suitable for use in practice and that the FH technique is particularly appealing. However, before definitive recommendations are made, the analyses from this dissertation should be conducted employing math achievement data from other years, as well as data from NAEP Reading. / Thesis (PhD) — Boston College, 2021. / Submitted to: Boston College. Lynch School of Education. / Discipline: Educational Research, Measurement and Evaluation.
|
13 |
Comparação de métodos de estimação em pequenas áreas para proporções: o caso da TIC Educação / A comparison of methods in small areas estimation for proportions: the ICT Education case
Coelho, Isabela Bertolini 28 April 2016
Today's society is also known as the Information Society, because information and knowledge are quickly accessible through Information and Communication Technologies (ICT) such as computers, the Internet and mobile phones. New ways of thinking about and living with these technologies have therefore become necessary. For the socio-economic development of nations it is important to form a critical society, which reflects the educational process adopted; ICT must therefore be appropriated to obtain more creative and flexible teaching practices. For this integration to produce satisfactory results, several factors must come together, such as the infrastructure available in schools, teachers' command of ICT in teaching-learning activities, the integration of ICT into the political-pedagogical project, and the government's implementation of public policies in the educational sector. Collecting statistical data on ICT adoption in educational processes is therefore necessary. Sample surveys are widely used to learn about a given characteristic of a population. Sample sizes are usually planned to yield results for large areas; however, there is growing demand for information at more disaggregated levels, where the sample is too small to produce estimates with acceptable precision, without increasing the sample size. Small area estimation methodology has been developed to produce estimates with adequate precision for the characteristics of interest, either by considering the probability distribution induced by the sample design or by using models that borrow information across similar areas.
The purpose of this dissertation is to obtain such estimates for the proportion of schools in which teachers use the Internet in teaching-learning activities with their students, for each Federative Unit of Brazil, using real data from the ICT in Education survey (TIC Educação), conducted by CGI.br, and the School Census (Censo Escolar), conducted by INEP. We obtain the estimates by different approaches, both directly from the sample and through logistic regression models, and compare them using the estimated mean squared error and the proportion of correct classifications, obtained from a confusion matrix under leave-one-out cross-validation. To consolidate the results obtained with the real data, we carry out a simulation study. The logistic random effects model is considered the best approach.
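The leave-one-out comparison described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's code: it fits a plain fixed-effects logistic regression (no random effects) by Newton-Raphson with a small ridge term added here only for numerical stability, and the function names, the `ridge` value and the 0.5 classification threshold are all choices made for this sketch.

```python
import numpy as np

def fit_logistic(X, y, ridge=1e-2, n_iter=25):
    """Newton-Raphson fit of a lightly ridge-penalized logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p) - ridge * beta
        hess = (X.T * (p * (1.0 - p))) @ X + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def loo_confusion(X, y, threshold=0.5):
    """Leave-one-out confusion matrix (rows: observed, columns: predicted)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    cm = np.zeros((2, 2), dtype=int)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i            # drop unit i from the training set
        beta = fit_logistic(X[keep], y[keep].astype(float))
        p_i = 1.0 / (1.0 + np.exp(-(X[i] @ beta)))
        cm[y[i], int(p_i >= threshold)] += 1     # score the held-out unit
    accuracy = np.trace(cm) / cm.sum()           # proportion of correct classifications
    return cm, accuracy
```

Each unit is predicted by a model that never saw it, so the diagonal share of the matrix is an honest estimate of the proportion of correct classifications.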
|
15 |
Essays on Inference in Linear Mixed Models
Kramlinger, Peter 28 April 2020
No description available.
|
16 |
The Prevalence and Context of Adult Female Overweight and Obesity in Sub-Saharan Africa
Ozodiegwu, Ifeoma 01 May 2019
Adult women bear a disproportionate burden of overweight and obesity in Sub-Saharan Africa (SSA). Precise information to understand disease distribution and assess determinants is lacking. Therefore, this dissertation aimed to: (i) analyze the prevalence of adult female overweight and obesity combined in lower-level administrative units; (ii) analyze the effect modification of educational attainment and age on the association between household wealth and adult female overweight and obesity; (iii) synthesize qualitative research evidence to describe contextual factors contributing to female overweight and obesity at different life stages. Bayesian and logistic regression models were constructed with Demographic and Health Survey (DHS) data to respectively estimate the prevalence of overweight and obesity and assess the interaction of education on the association between household wealth and overweight. The synthesis of qualitative research studies was conducted in accordance with PRISMA guidelines and findings were grouped by themes. Prevalence estimates revealed heterogeneity at second-level administrative units in the seven SSA countries examined, which was not visible in first-level administrative units. The combined prevalence of overweight and obesity ranged from 7.5–42.0% in Benin, 1.4–35.9% in Ethiopia, 1.6–44.7% in Mozambique, 1.0–67.9% in Nigeria, 2.2–72.4% in Tanzania, 3.9–39.9% in Zambia, and 4.5–50.6% in Zimbabwe. Additionally, education did not have a statistically significant modifying effect on the positive association between household wealth and overweight in the 22 SSA countries eligible for the study. Body shape and size ideals, barriers to healthy food choices and physical activity were key themes in the research synthesis encompassing four SSA countries. Positive symbolism, including beauty, was linked to overweight and obesity in adult women.
Among adolescents, although being overweight or obese was not accepted, girls were expected to be voluptuous. Body image dissatisfaction and victimization characterized the experiences of non-conforming women and girls. Barriers to healthy nutrition included migration and the food environment, whereas barriers to physical activity included ageism. While additional work is encouraged to validate the prevalence estimates, overweight and obesity interventions must consider whether the determinants identified in this study are relevant to their context to inform improved outcomes.
|
17 |
Improving Estimates for Electronic Health Record Take up in Ohio: A Small Area Estimation Technique
Weston, Daniel Joseph, II 06 January 2012
No description available.
|
18 |
Methods for size estimation of hidden population using large-scale health data
Wang, Jianing 04 February 2025
2023 / Accurately estimating hidden population sizes is essential for effective policy-making, but a traditional census is typically not feasible. Data-driven approaches that use existing sources are needed for reliable prevalence estimates. Capture-recapture methods, used in ecology to estimate population size, have been advanced in epidemiology over the past two decades to improve disease prevalence estimates and can relax unrealistic assumptions of naive models. Yet even with these developments within epidemiology, difficulties still exist in using conventional capture-recapture methods to estimate prevalence in subpopulations across spatial units. Additionally, there is a need to estimate prevalence in socioeconomically stratified groups to understand the groups with higher risks and discover potential health disparities. Unfortunately, conventional approaches heavily rely on stand-alone stratified analysis, which may be less effective when certain subgroups display similar or more intricate patterns of correlated healthcare engagement. To address these challenges, we first articulated the fundamental concepts behind the capture-recapture method by comparing it to a similar approach, the multiplier benchmark method. Then we focused on the capture-recapture structure and proposed a Bayesian hierarchical spatial capture-recapture model that estimates individual detection probabilities and spatial variation of opioid use disorder (OUD) prevalence. The proposed model enables population structure estimation from coarse summaries to finer-scale components using a spatially explicit areal adjacency-based smoothing process model. Finally, an extension of the proposed model is presented to incorporate the correlation structure between socioeconomically stratified subpopulations. We applied the extended model to the Massachusetts Public Health Data Warehouse to evaluate the efficiency of the proposed method compared to traditional methods.
We used simulation studies for each work to investigate the performance of the proposed estimators in varying circumstances and to determine which method may be more effective in different scenarios. Our comprehensive evaluation found that the proposed methods could accurately estimate area-specific and group-specific prevalence with lower bias and variance. These methods effectively address the issue of data sparsity in subpopulations and account for the underlying structure that more accurately reflects what occurs during ecological and data collection processes. The methods developed in this dissertation provide a powerful tool for accurately estimating the disease burden in hidden subpopulations, making them essential for targeted interventions and effective public health policies. / 2027-02-04T00:00:00Z
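As background to the methods compared in this abstract, the simplest two-source capture-recapture estimators and the multiplier benchmark method can be sketched in a few lines. These are the textbook naive estimators, not the Bayesian hierarchical spatial model proposed in the dissertation; the function and argument names are hypothetical, and the two-source estimators assume independent sources and a closed population.

```python
def lincoln_petersen(n1, n2, m):
    """Naive two-source estimate: n1 and n2 are list sizes, m the overlap."""
    if m <= 0:
        raise ValueError("the two sources must share at least one individual")
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's bias-corrected variant, defined even when m is zero."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def multiplier_benchmark(benchmark_count, proportion):
    """Multiplier method: a known count divided by the estimated share of the
    hidden population that appears in that count."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must lie in (0, 1]")
    return benchmark_count / proportion
```

With 200 people on one list, 150 on another and 30 on both, the first estimator gives 1000; the model-based approaches in the abstract relax the independence assumption that this simple ratio relies on.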
|
19 |
Inférence basée sur le plan pour l'estimation de petits domaines / Design-based inference for small area estimation
Randrianasolo, Toky 18 November 2013
The strong demand for results at a detailed geographic level, particularly from national surveys, has exposed the fragility of estimates for small areas. This thesis addresses the issue with specific methods based on the sampling design. These methods build new weights for each statistical unit. The first method optimizes the reweighting of the survey subsample that falls within an area. The second is based on constructing weights that depend on both the statistical units and the areas. It splits the sampling weights of the overall estimator while satisfying two constraints: (1) the sum of the estimates over any partition into areas equals the overall estimate; (2) the system of weights for a given area satisfies calibration properties on the auxiliary variables known for that area. The resulting split estimator behaves almost like the well-known BLUP (best linear unbiased predictor) estimator. The third method rewrites the BLUP estimator, although it is model-based, as a homogeneous linear estimator within a design-based approach. New modified BLUP estimators are obtained. Their precision, estimated by simulation with an application to real data, is quite close to that of the standard BLUP estimator. The methods developed in this thesis are then applied to the estimation of local mobility indicators from the 2007-2008 French National Travel Survey (Enquête Nationale sur les Transports et les Déplacements). When an area has few sampled units, the estimates obtained with the first method lose precision, whereas the precision remains satisfactory for the other two methods.
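The calibration property mentioned in this abstract (weights that reproduce known auxiliary totals) can be illustrated with standard linear calibration. This sketch is not the weight-splitting method of the thesis; it only shows what "calibrated on auxiliary variables" means. The function name and arguments are hypothetical, and the matrix of weighted cross-products is assumed to be invertible.

```python
import numpy as np

def linear_calibration(d, X, totals):
    """Adjust design weights d so that the weighted sums of the auxiliary
    variables X match the known totals (linear / chi-square distance)."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)

    M = (X.T * d) @ X                              # sum_k d_k x_k x_k'
    lam = np.linalg.solve(M, totals - X.T @ d)     # Lagrange multipliers
    return d * (1.0 + X @ lam)                     # calibrated weights
```

For example, four units with design weight 10, auxiliary vectors (1, 1), (1, 2), (1, 3), (1, 4) and known totals (50, 130) receive calibrated weights 11, 12, 13 and 14, which reproduce both totals exactly.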
|
20 |
Bayesian Models for the Analyzes of Noisy Responses From Small Areas: An Application to Poverty Estimation
Manandhar, Binod 26 April 2017
We implement techniques of small area estimation (SAE) to study consumption, a welfare indicator, which is used to assess poverty in the 2003-2004 Nepal Living Standards Survey (NLSS-II) and the 2001 census. NLSS-II has detailed information on consumption, but it can give estimates only at stratum level or higher. While population variables are available for all households in the census, they do not include the information on consumption; the survey, however, also records these 'population' variables. We combine these two sets of data to provide estimates of poverty indicators (incidence, gap and severity) for small areas (wards, village development committees and districts). Consumption is the aggregate of all food and all non-food items consumed. In the welfare survey the respondents are asked to recall all information about consumption throughout the reference year. Therefore, such data are likely to be noisy, possibly due to response errors or recall errors. The consumption variable is continuous and positively skewed, so a statistician might use a logarithmic transformation, which can reduce skewness and help meet the normality assumption required for model building. However, this could be problematic since back transformation may produce inaccurate estimates and the results are difficult to interpret. Without using the logarithmic transformation, we develop hierarchical Bayesian models to link the survey to the census. In our models for consumption, we incorporate the 'population' variables as covariates. First, we assume that consumption is noiseless, and it is modeled using three scenarios: the exponential distribution, the gamma distribution and the generalized gamma distribution. Second, we assume that consumption is noisy, and we fit the generalized beta distribution of the second kind (GB2) to consumption.
We consider three more scenarios of GB2: a mixture of exponential and gamma distributions, a mixture of two gamma distributions, and a mixture of two generalized gamma distributions. We note that there are difficulties in fitting the models for noisy responses because these models have non-identifiable parameters. For each scenario, after fitting two hierarchical Bayesian models (with and without area effects), we show how to select the most plausible model and we perform a Bayesian data analysis on Nepal's poverty data. We show how to predict the poverty indicators for all wards, village development committees and districts of Nepal (a big data problem) by combining the survey data with the census. This is a computationally intensive problem because Nepal has about four million households with about four thousand households in the survey and there is no record linkage between households in the survey and the census. Finally, we perform empirical studies to assess the quality of our survey-census procedure.
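The three poverty indicators named in this abstract (incidence, gap and severity) are commonly computed as the Foster-Greer-Thorbecke measures with exponent 0, 1 and 2. The sketch below assumes that definition, which the abstract does not spell out; the function name, the poverty line `z` and the optional household weights are illustrative.

```python
import numpy as np

def fgt_index(consumption, z, alpha, weights=None):
    """Foster-Greer-Thorbecke poverty measure.

    alpha = 0 gives incidence (headcount ratio), 1 the poverty gap,
    2 the severity (squared gap). z is the poverty line.
    """
    c = np.asarray(consumption, dtype=float)
    w = np.ones_like(c) if weights is None else np.asarray(weights, dtype=float)
    poor = c < z
    # Relative shortfall below the line; zero for the non-poor.
    shortfall = np.where(poor, (z - c) / z, 0.0)
    return float(np.sum(w * poor * shortfall ** alpha) / np.sum(w))
```

In a survey-to-census setting, the same function would be applied to predicted consumption for every census household in a ward or district, once per posterior draw, to obtain a posterior distribution for each indicator.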
|