About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

The Associations Between Bisphenol A and Phthalates, and Measures of Adiposity Among Canadians

McCormack, Daniel, January 2016
Bisphenol A (BPA) and phthalates are chemicals found in many consumer products including water bottles, food packaging and cosmetics. Previous research has shown that there is potential for these compounds to contribute to obesity. In this analysis, the Canadian Health Measures Survey was used to investigate possible associations between urinary concentrations of these compounds and measures of adiposity. BPA urine concentrations were found to decrease with age, and significant associations with BMI and waist circumference were found in linear regression in adults. No associations with measures of adiposity were found in logistic regression for adults and significant negative associations were found in children. A similar discrepancy was found for mono-(2-ethyl-5-hydroxyhexyl) phthalate and mono-(2-ethyl-5-oxohexyl) phthalate, which were significantly associated with obesity in adults, but showed several significant negative associations in children. Overall, this analysis showed that it is unlikely that BPA and phthalates are contributing to adiposity in the Canadian population.
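The analysis above rests on survey-weighted regression of adiposity measures on urinary concentrations. As a minimal illustration of the idea only (not the authors' actual CHMS model; the variables, coefficients, and weights below are invented), a weighted least-squares fit can be computed directly from the normal equations:

```python
import numpy as np

def weighted_ols(X, y, w):
    """Survey-weighted least squares: solves (X'WX) beta = X'Wy."""
    Xw = X * w[:, None]                      # multiply each row by its weight
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

# Hypothetical example: BMI regressed on log urinary BPA and age,
# with an intercept column; w stands in for survey weights.
rng = np.random.default_rng(0)
n = 500
log_bpa = rng.normal(0.5, 1.0, n)
age = rng.uniform(20, 70, n)
X = np.column_stack([np.ones(n), log_bpa, age])
y = 22.0 + 0.4 * log_bpa + 0.05 * age        # noiseless toy relationship
w = rng.uniform(0.5, 2.0, n)                 # made-up survey weights
beta = weighted_ols(X, y, w)
```

Because the toy outcome is noiseless, the fit recovers the invented coefficients exactly; with real CHMS data the weights would come from the survey design, not a uniform draw.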
2

Statistical modeling of longitudinal survey data with binary outcomes

Ghosh, Sunita 20 December 2007
Data obtained from longitudinal surveys using complex multi-stage sampling designs contain cross-sectional dependencies among units, caused by inherent hierarchies in the data, and within-subject correlation arising from repeated measurements. Statistical methods for analyzing such data should account for stratification, clustering, and unequal probability of selection, as well as for the within-subject correlation due to repeated measurements.

The complex multi-stage design approach has been used in the longitudinal National Population Health Survey (NPHS). This ongoing survey collects information on health determinants and outcomes in a sample of the general Canadian population.

This dissertation compares the model-based and design-based approaches used to determine the risk factors of asthma prevalence in the Canadian female population of the NPHS (marginal model). Weighted, unweighted and robust statistical methods were used to examine the risk factors of the incidence of asthma (event history analysis) and of recurrent asthma episodes (recurrent survival analysis). Missing data analysis was used to study the bias associated with incomplete data. To determine the risk factors of asthma prevalence, the Generalized Estimating Equations (GEE) approach was used for marginal modeling (model-based approach), followed by Taylor linearization and bootstrap estimation of standard errors (design-based approach). The incidence of asthma (event history analysis) was estimated using weighted, unweighted and robust methods. Recurrent event history analysis was conducted using the Andersen and Gill, Wei, Lin and Weissfeld (WLW), and Prentice, Williams and Peterson (PWP) approaches. To assess the presence of bias associated with missing data, the weighted GEE and pattern-mixture models were used.

The prevalence of asthma in the Canadian female population was 6.9% (6.1-7.7) at the end of Cycle 5. When comparing model-based and design-based approaches for asthma prevalence, the design-based method provided unbiased estimates of the standard errors. The overall incidence of asthma in this population, excluding those with asthma at baseline, was 10.5/1000/year (9.2-12.1). For the event history analysis, the robust method provided the most stable estimates and standard errors. For recurrent event history, the WLW method provided stable standard error estimates. Finally, for the missing data approach, the pattern-mixture model produced the most stable standard errors.

To conclude, design-based approaches should be preferred over model-based approaches for analyzing complex survey data, as the former provide the most unbiased parameter estimates and standard errors.
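The abstract names bootstrap estimation of standard errors as the design-based approach. A minimal sketch of the idea is to resample whole primary sampling units (clusters) with replacement rather than individual subjects, so the resampling respects the design. All data below are simulated, not NPHS data:

```python
import numpy as np

def weighted_prevalence(y, w):
    """Weighted proportion: sum(w*y) / sum(w)."""
    return np.sum(w * y) / np.sum(w)

def cluster_bootstrap_se(y, w, cluster, n_boot=500, seed=1):
    """Design-based bootstrap: resample whole clusters (PSUs) with
    replacement, recompute the weighted prevalence each replicate,
    and take the standard deviation of the replicates."""
    rng = np.random.default_rng(seed)
    ids = np.unique(cluster)
    stats = []
    for _ in range(n_boot):
        pick = rng.choice(ids, size=len(ids), replace=True)
        idx = np.concatenate([np.where(cluster == c)[0] for c in pick])
        stats.append(weighted_prevalence(y[idx], w[idx]))
    return float(np.std(stats, ddof=1))

# Hypothetical clustered binary outcome (e.g. asthma yes/no) with weights.
rng = np.random.default_rng(2)
cluster = np.repeat(np.arange(30), 20)       # 30 PSUs of 20 subjects each
y = rng.binomial(1, 0.07, size=600)          # roughly 7% prevalence
w = rng.uniform(0.5, 2.0, size=600)
p_hat = weighted_prevalence(y, w)
se = cluster_bootstrap_se(y, w, cluster)
```

Resampling clusters rather than subjects is what makes the variance estimate reflect the multi-stage design; a subject-level bootstrap would understate the standard error when outcomes are correlated within PSUs.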
4

Propensity Score Methods for Estimating Causal Effects from Complex Survey Data

Ashmead, Robert D., January 2014
No description available.
5

Epidemiological Study of Coccidioidomycosis in Greater Tucson, Arizona

Tabor, Joseph Anthony, January 2009
The goal of this dissertation is to characterize the distribution and determinants of coccidioidomycosis in greater Tucson, Arizona, using landscape ecology and complex survey methods to control for environmental factors that affect Coccidioides exposure. Notifiable coccidioidomycosis cases reported to the health department in Arizona have increased dramatically since 1997 and indicate a potential epidemic of unknown causes. Epidemic determination is confounded by concurrent changes in notifiable-disease reporting compliance, misdiagnosis, and changing demographics of susceptible populations. A stratified, two-stage, address-based telephone survey of greater Tucson, Arizona, was conducted in 2002 and 2003. Subjects were recruited from direct-marketing data by census block groups and landscape strata, as determined using a geographic information system (GIS), and were interviewed about potential risk factors. Address-level state health department notifiable-disease surveillance data were compared with self-reported survey data to estimate the true disease frequency.

Comparing state surveillance data with the survey data, no coccidioidomycosis epidemic was detectable from 1992 to 2006 after adjusting surveillance data for reporting compliance. State health department surveillance reported only 20% of the probable reportable cases in 2001. Utilizing survey data and geographic coding, spatial and temporal disease frequency was found to be highly variable at the census block-group scale, indicating that localized soil-disturbance events are a major group-level risk factor. Poststratification by 2000 census demographic data adjusted for selection bias into the survey and for response rate. Hispanics showed an odds ratio of self-reporting a coccidioidomycosis diagnosis similar to that of non-Hispanic Whites when other risk factors were controlled for. Cigarette smoking in the home, and having a home located in the low-Hispanic foothills or low-Hispanic riparian strata, were associated with elevated odds ratios for coccidioidomycosis. Sample stratification by landscape and demographics controlled for differential classification of susceptibility and exposures between strata.

Clustered, address-based telephone surveys provide a feasible and valid method to recruit populations from address-based lists, using a GIS to design the survey and population-survey statistical methods for the analysis. Notifiable coccidioidomycosis case surveillance can be improved by including reporting compliance in the analysis. Pathogen exposure and host susceptibility are important, predictable group-level determinants of coccidioidomycosis that were controlled for by stratified sampling using a landscape ecology approach.
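Poststratification, as used in the study above, rescales design weights so that the weighted count in each stratum matches a known census total. A toy sketch of the adjustment (the stratum labels and totals below are invented, not the Tucson survey's):

```python
import numpy as np

def poststratify(base_w, stratum, census_totals):
    """Rescale base weights so that each stratum's weighted count
    matches its known census total."""
    w = base_w.astype(float).copy()
    for s, total in census_totals.items():
        mask = (stratum == s)
        w[mask] *= total / base_w[mask].sum()   # one multiplicative factor per stratum
    return w

# Hypothetical: two demographic strata with census totals 8,000 and 12,000.
stratum = np.array(["A"] * 60 + ["B"] * 40)
base_w = np.full(100, 100.0)                    # naive design weights
census = {"A": 8000.0, "B": 12000.0}
w_ps = poststratify(base_w, stratum, census)
```

After adjustment the weighted totals reproduce the census margins exactly, which is what corrects for differential selection and response across strata.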
6

On estimating variances for Gini coefficients with complex surveys: theory and application

Hoque, Ahmed, 29 September 2016
Obtaining variances for the plug-in estimator of the Gini coefficient for inequality has preoccupied researchers for decades with the proposed analytic formulae often being regarded as being too cumbersome to apply, as well as usually based on the assumption of an iid structure. We examine several variance estimation techniques for a Gini coefficient estimator obtained from a complex survey, a sampling design often used to obtain sample data in inequality studies. In the first part of the dissertation, we prove that Bhattacharya’s (2007) asymptotic variance estimator when data arise from a complex survey is equivalent to an asymptotic variance estimator derived by Binder and Kovačević (1995) nearly twenty years earlier. In addition, to aid applied researchers, we also show how auxiliary regressions can be used to generate the plug-in Gini estimator and its asymptotic variance, irrespective of the sampling design. In the second part of the dissertation, using Monte Carlo (MC) simulations with 36 data generating processes under the beta, lognormal, chi-square, and the Pareto distributional assumptions with sample data obtained under various complex survey designs, we explore two finite sample properties of the Gini coefficient estimator: bias of the estimator and empirical coverage probabilities of interval estimators for the Gini coefficient. We find high sensitivity to the number of strata and the underlying distribution of the population data. We compare the performance of two standard normal (SN) approximation interval estimators using the asymptotic variance estimators of Binder and Kovačević (1995) and Bhattacharya (2007), another SN approximation interval estimator using a traditional bootstrap variance estimator, and a standard MC bootstrap percentile interval estimator under a complex survey design. 
With few exceptions, namely with small samples and/or highly skewed distributions of the underlying population data, where the bootstrap methods work relatively better, the SN approximation interval estimators using asymptotic variances perform quite well. Finally, health data on the body mass index and hemoglobin levels of Bangladeshi women and children, respectively, are used as illustrations. Inequality analysis of these two important indicators provides a better understanding of the health status of women and children. Our empirical results show that statistical inferences regarding inequality in these well-being variables, measured by the Gini coefficients, based on Binder and Kovačević's and Bhattacharya's asymptotic variance estimators, give equivalent outcomes. Although the bootstrap approach often generates slightly smaller variance estimates in small samples, the hypothesis test results and widths of interval estimates using this method are practically similar to those using the asymptotic variance estimators. Our results are useful, both theoretically and practically, as the asymptotic variance estimators are simpler and require less time to calculate than the bootstrap methods often previously advocated by researchers. These findings suggest that applied researchers can often be comfortable undertaking inferences about the inequality of a well-being variable using the Gini coefficient with asymptotic variance estimators that are not difficult to calculate, irrespective of whether the sample data are obtained under a complex survey or a simple random sample design.
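The plug-in Gini estimator discussed above has a simple closed form (mean absolute difference divided by twice the mean), and a bootstrap standard error can be attached to it. This sketch uses an iid bootstrap for brevity, deliberately ignoring the complex-survey resampling that the dissertation is actually concerned with; the income data are simulated:

```python
import numpy as np

def gini_plugin(x):
    """Plug-in Gini coefficient: mean absolute pairwise difference
    divided by twice the mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mad = np.abs(x[:, None] - x[None, :]).sum() / (n * n)
    return mad / (2.0 * x.mean())

def bootstrap_se(x, stat, n_boot=300, seed=3):
    """Simple iid bootstrap standard error of a statistic."""
    rng = np.random.default_rng(seed)
    reps = [stat(rng.choice(x, size=len(x), replace=True))
            for _ in range(n_boot)]
    return float(np.std(reps, ddof=1))

# Simulated lognormal incomes, a common shape in inequality studies.
incomes = np.random.default_rng(4).lognormal(mean=0.0, sigma=0.8, size=400)
g = gini_plugin(incomes)
se = bootstrap_se(incomes, gini_plugin)
```

Two sanity checks follow directly from the formula: a perfectly equal sample has Gini 0, and the two-point sample {0, 1} has Gini exactly 0.5.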
7

Evaluation of Cross-Survey Research Methods for the Estimation of Low-Incidence Populations

Magidin de Kramer, Raquel, January 2016
Thesis advisor: Henry Braun

This study evaluates the accuracy, precision, and stability of three different methods of cross-survey analysis in order to determine their suitability for estimating the proportions of low-incidence populations. Population parameters of size and demographic distribution are necessary for planning and policy development, and their estimation for low-incidence populations poses a number of methodological challenges. Cross-survey analysis methodologies offer an alternative way to generate useful low-incidence population estimates, not readily available in today's census, without conducting targeted, costly surveys to estimate group size directly. The cross-survey methods evaluated in the study are meta-analysis of complex surveys (MACS), pooled design-based cross-survey analysis (PDCS), and Bayesian multilevel regression with post-stratification (BMRP). The accuracy and precision of these methods were assessed by comparing the estimates of the proportion of the adult Jewish population in Canada generated by each method with benchmark estimates. The stability of the estimates, in turn, was determined by cross-validating estimates obtained from two random stratified subsamples drawn from a large pool of US surveys. The findings indicate that, under the right conditions, cross-survey methods have the potential to produce very accurate and precise estimates of low-incidence populations. The level of accuracy and precision varied depending on the cross-survey method used and on the conditions under which the estimates were produced. The estimates obtained with the PDCS and BMRP methodologies were more accurate than those generated by the MACS approach, with BMRP generating the most accurate estimates overall; the PDCS method generated relatively accurate estimates across all the scenarios included in the study.
The precision of the estimates was found to be related to the number of surveys considered in the analyses. Overall, the findings clearly show that cross-survey analysis methods provide a useful alternative for the estimation of low-incidence populations. More research is needed to fully understand the factors that affect the accuracy and precision of estimates generated by these cross-survey methods.

Thesis (PhD), Boston College, 2016. Lynch School of Education. Discipline: Educational Research, Measurement and Evaluation.
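The simplest member of the cross-survey family can be illustrated by fixed-effect (inverse-variance) pooling of per-survey estimates; this is only a schematic stand-in for the MACS/PDCS/BMRP machinery evaluated in the study, and the subgroup proportions and standard errors below are invented:

```python
import numpy as np

def pool_estimates(ests, ses):
    """Fixed-effect (inverse-variance) pooling of per-survey estimates:
    weight each survey's estimate by the inverse of its variance."""
    ests = np.asarray(ests, dtype=float)
    var = np.asarray(ses, dtype=float) ** 2
    w = 1.0 / var
    pooled = np.sum(w * ests) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))    # precision adds across surveys
    return pooled, pooled_se

# Hypothetical proportions of a small subgroup from three surveys.
est, se = pool_estimates([0.011, 0.014, 0.009], [0.002, 0.003, 0.004])
```

The pooled standard error is always smaller than the smallest per-survey standard error, which is exactly why pooling helps for low-incidence groups no single survey can pin down.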
8

Statistical models for estimating the intake of nutrients and foods from complex survey data

Pell, David Andrew, January 2019
Background: The consequences of poor nutrition are well known and of wide concern. Governments and public health agencies utilise food and diet surveillance data to make decisions that lead to improvements in nutrition. These surveys often utilise complex sample designs for efficient data collection. There are several challenges in the statistical analysis of dietary intake data collected using complex survey designs, which have not been fully addressed by current methods. Firstly, the shape of the distribution of intake can be highly skewed due to the presence of outlier observations and a large proportion of zero observations arising from the inability of the food diary to capture consumption within the period of observation. Secondly, dietary data is subject to variability arising from day-to-day individual variation in food consumption and measurement error, to be accounted for in the estimation procedure for correct inferences. Thirdly, the complex sample design needs to be incorporated into the estimation procedure to allow extrapolation of results into the target population. This thesis aims to develop novel statistical methods to address these challenges, applied to the analysis of iron intake data from the UK National Diet and Nutrition Survey Rolling Programme (NDNS RP) and UK national prescription data of iron deficiency medication. Methods: 1) To assess the nutritional status of particular population groups a two-part model with a generalised gamma (GG) distribution was developed for intakes that show high frequencies of zero observations. The two-part model accommodated the sources of data variation of dietary intake with a random intercept in each component, which could be correlated to allow a correlation between the probability of consuming and the amount consumed. 
2) To identify population groups at risk of low nutrient intake, a linear quantile mixed-effects model was developed to model quantiles of the distribution of intake as a function of explanatory variables. The proposed approach was illustrated by comparing the quantiles of iron intake with Lower Reference Nutrient Intake (LRNI) recommendations using the NDNS RP. This thesis extended the estimation procedures of both the two-part model with GG distribution and the linear quantile mixed-effects model to incorporate the complex sample design in three steps: the likelihood function was multiplied by the sample weights; bootstrap methods were used for variance estimation; and, finally, the variance estimation of the model parameters was stratified by the survey strata. 3) To evaluate the allocation of resources to alleviate nutritional deficiencies, a linear quantile mixed-effects model was used to analyse the distribution of expenditure on iron-deficiency medication across health boards in the UK. Expenditure is likely to depend on the iron status of the region; therefore, for a fair comparison among health boards, iron status was estimated using the method developed in objective 2) and used in the specification of the median amount spent. Each health board is formed by a set of general practices (GPs); therefore, a random intercept was used to induce correlation between expenditures from two GPs in the same health board. Finally, the approaches in objectives 1) and 2) were compared with the traditional approach based on weighted linear regression modelling used in the NDNS RP reports. All analyses were implemented using SAS and R. Results: The two-part model with GG distribution, fitted to the amount of iron consumed from selected episodically consumed foods, showed that females tended to have greater odds of consuming iron from foods but consumed smaller amounts.
As age groups increased, consumption tended to increase relative to the reference group, though the odds of consumption varied. Iron consumption also appeared to depend on National Statistics Socio-economic Classification (NS-SEC) group, with lower social groups consuming less, in general. The quantiles of iron intake estimated using the linear quantile mixed-effects model showed that more than 25% of females aged 11-50y are below the LRNI, and that girls aged 11-18y are the group at highest risk of deficiency in the UK. Predictions of spending on iron medication in the UK based on the linear quantile mixed-effects model showed that areas of higher iron intake had lower spending on treating iron deficiency. In a geographical display of expenditure, Northern Ireland featured the lowest amount spent. Comparing the results from the methods proposed here showed that using the traditional approach based on weighted regression analysis could result in spurious associations. Discussion: This thesis developed novel approaches to the analysis of dietary complex survey data to address three important objectives of diet surveillance, namely the estimation of mean food intake by population groups, the identification of groups at high risk of nutrient deficiency, and the allocation of resources to alleviate nutrient deficiencies. The methods provided models of good fit to dietary data, accounted for the sources of data variability, and extended the estimation procedures to incorporate the complex sample survey design. The use of a GG distribution for modelling intake is an important improvement over existing methods, as it includes many distributions with different shapes and its domain takes non-negative values. The two-part model accommodated the sources of variation of dietary intake with a random intercept in each component, which could be correlated to allow a correlation between the probability of consuming and the amount consumed.
This also improves existing approaches that assume a zero correlation. The linear quantile mixed-effects model utilises the asymmetric Laplace distribution, which can also accommodate many different distributional shapes, and its likelihood-based estimation is robust to model misspecification. This method is an important improvement over existing methods used in nutritional research, as it explicitly models the quantiles in terms of explanatory variables using a novel quantile regression model with random effects. The application of these models to UK national data confirmed the association of poorer diets with lower social class, identified females aged 11-50y as a group at high risk of iron deficiency, and highlighted Northern Ireland as the region with the lowest expenditure on iron prescriptions.
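The two-part structure described above can be sketched compactly: one part models the probability of any consumption, the other the distribution of positive amounts, and the overall mean is their product. In this sketch an ordinary gamma distribution stands in for the generalised gamma, and the random effects and survey weights are omitted, so it is a deliberate simplification of the thesis's model; the intake data are simulated:

```python
import numpy as np
from scipy import stats

def two_part_fit(y):
    """Two-part model for intakes with many exact zeros:
    part 1 estimates P(intake > 0); part 2 fits a gamma to the
    positive amounts (a stand-in for the generalised gamma).
    Returns (p_positive, mean_given_positive, overall_mean)."""
    y = np.asarray(y, dtype=float)
    p_pos = float(np.mean(y > 0))
    pos = y[y > 0]
    shape, loc, scale = stats.gamma.fit(pos, floc=0)  # fix location at zero
    mean_pos = shape * scale                          # gamma mean = shape*scale
    return p_pos, mean_pos, p_pos * mean_pos

# Simulated intakes: a 70% chance of consuming, gamma(2, 3) amounts.
rng = np.random.default_rng(5)
consume = rng.binomial(1, 0.7, 1000)
amount = rng.gamma(shape=2.0, scale=3.0, size=1000)
y = consume * amount
p_pos, mean_pos, overall = two_part_fit(y)
```

The full model in the thesis additionally puts a random intercept in each part and lets the two intercepts correlate, so that people more likely to consume can also consume systematically more (or less).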
9

Comparing Model-based and Design-based Structural Equation Modeling Approaches in Analyzing Complex Survey Data

Wu, Jiun-Yu, August 2010
Conventional statistical methods that assume data sampled under simple random sampling are inadequate for complex survey data with a multilevel structure and non-independent observations. In the structural equation modeling (SEM) framework, a researcher can either use ad-hoc robust sandwich standard-error estimators to correct the standard-error estimates (the design-based approach) or perform multilevel analysis to model the multilevel data structure (the model-based approach) when analyzing dependent data. In a cross-sectional setting, the first study examines the differences between design-based single-level confirmatory factor analysis (CFA) and model-based multilevel CFA with respect to model-fit test statistics/fit indices and estimates of the fixed and random effects, with the corresponding statistical inference, when analyzing multilevel data. Several design factors were considered, including cluster number, cluster size, intra-class correlation, and the structural equality of the between- and within-level models. The performance of a maximum modeling strategy, with a saturated higher-level model and the true lower-level model, was also examined. The simulation study showed that the design-based approach provided adequate results only under equal between/within structures; in the unequal between/within structure scenarios, it produced biased fixed- and random-effect estimates. Maximum modeling generated consistent and unbiased within-level model parameter estimates across three different scenarios. Multilevel latent growth curve modeling (MLGCM) is a versatile tool for analyzing repeated measures collected under multi-stage sampling, yet researchers often adopt latent growth curve models (LGCM) without considering the multilevel structure. The second study examined the influence of different model specifications on model-fit test statistics/fit indices, between/within-level regression coefficient and random-effect estimates, and mean structures. Simulation suggested that a design-based MLGCM incorporating the higher-level covariates produces consistent parameter estimates and statistical inferences comparable to those from the model-based MLGCM, and maintains adequate statistical power even with a small number of clusters.
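The design-based correction discussed above amounts to a cluster-robust sandwich covariance estimator. A self-contained sketch for ordinary least squares makes the "bread and meat" structure explicit (the SEM case is analogous but heavier; the clustered data below are simulated):

```python
import numpy as np

def ols_cluster_se(X, y, cluster):
    """OLS point estimates with a cluster-robust (sandwich) covariance:
    V = (X'X)^-1 [ sum_g X_g' u_g u_g' X_g ] (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)           # the "bread"
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                           # residuals
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(cluster):               # accumulate per-cluster scores
        Xg, ug = X[cluster == g], u[cluster == g]
        s = Xg.T @ ug
        meat += np.outer(s, s)
    V = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Simulated data: 40 clusters of 10, with a shared cluster effect that
# makes observations within a cluster dependent.
rng = np.random.default_rng(6)
cluster = np.repeat(np.arange(40), 10)
x = rng.normal(size=400)
re = rng.normal(size=40)[cluster]              # cluster random effect
y = 1.0 + 2.0 * x + re + rng.normal(size=400)
X = np.column_stack([np.ones(400), x])
beta, se = ols_cluster_se(X, y, cluster)
```

The point estimates are the usual OLS ones; only the standard errors change, which is exactly the design-based philosophy: correct the inference, not the model.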
10

Impact of Ignoring Nested Data Structures on Ability Estimation

Shropshire, Kevin O'Neil, 3 June 2014
The literature is clear that intentional or unintentional clustering of data elements typically inflates the estimated standard errors of fixed parameter estimates. This study is unique in that it examines the impact of multilevel data structures on subject-ability estimates, which are random-effect predictions known as empirical Bayes estimates in the one-parameter IRT / Rasch model. The literature on the impact of complex survey design on latent trait models is mixed, and there is no established best practice for handling this situation. A simulation study was conducted to address two questions related to ability estimation. First, what impact does design-based clustering have on the desirable statistical properties of subject-ability estimates in the one-parameter IRT / Rasch model? Second, since empirical Bayes estimators have shrinkage properties, what impact does clustering of first-stage sampling units have on measurement validity: does the first-stage sampling unit affect the ability estimate, and if so, is this desirable and equitable? Two models were fit in a factorial experimental design in which data were simulated over various conditions. The first model, a Rasch model formulated as an HGLM, ignores the sample design (the incorrect model), while the second incorporates a first-stage sampling unit (the correct model). Study findings generally showed that the two models were comparable with respect to desirable statistical properties under a majority of the replicated conditions; more measurement error in ability estimation is found when the intra-class correlation is high and the item pool is small, which in practice is the exception rather than the norm. However, the empirical Bayes estimates were found to depend on the first-stage sampling unit, raising issues of equity and fairness in educational decision-making. A real-world complex survey design with binary outcome data was also fit with both models. Analysis of these data supported the simulation results, leading to the conclusion that modeling binary Rasch data may entail a policy tradeoff between desirable statistical properties and measurement validity. Ph.D.
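For known item difficulties, the one-parameter Rasch ability estimate referred to above can be computed by solving the score equation with Newton-Raphson; the empirical Bayes estimates studied in the dissertation additionally shrink these ML values toward a group mean, which is where the dependence on the first-stage sampling unit enters. A sketch of the ML step only (the item difficulties below are invented):

```python
import numpy as np

def rasch_theta_mle(raw_score, difficulties, iters=50):
    """ML ability estimate under the Rasch model with known item
    difficulties b_j: solve raw_score = sum_j sigmoid(theta - b_j)
    by Newton-Raphson. Defined only for non-extreme raw scores
    (0 and the maximum score have no finite MLE)."""
    b = np.asarray(difficulties, dtype=float)
    theta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))   # item response probabilities
        grad = raw_score - p.sum()               # score equation residual
        info = np.sum(p * (1.0 - p))             # Fisher information
        theta += grad / info
    return theta

# Five hypothetical items with symmetric difficulties.
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
thetas = [rasch_theta_mle(s, b) for s in range(1, 5)]  # raw scores 1..4
```

In the Rasch model the raw score is sufficient for ability, so the estimates are strictly increasing in the score; by the symmetry of these difficulties, a half-maximum score of 2.5 maps to theta = 0.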
