Global ETD Search

1	Nonparametric Kernel Estimation Methods Using Complex Survey Data. Clair, Luc. 06 1900. This dissertation provides a thorough overview of nonparametric estimation methods for analyzing data collected under complex sampling plans. Applied econometric analysis is often performed on data from large-scale surveys. These surveys use complex sampling plans to reduce administrative costs and to increase estimation efficiency for subgroups of the population, which results in unequal inclusion probabilities across units in the population. When estimating descriptive statistics, one should use an estimator that weights each observation by the inverse of the unit's probability of being included in the sample. When estimating causal effects, a weighted estimator should be used if the sampling criterion, that is, the variable used to design the sampling scheme, is correlated with the error term. In that case sampling is said to be endogenous, and ignoring it leads to inconsistent estimation. I consider three distinct probability-weighted estimators: (i) a nonparametric kernel regression estimator; (ii) a conditional probability distribution function estimator; and (iii) a nonparametric instrumental variable regression estimator. Thesis, Doctor of Philosophy (PhD). Keywords: Nonparametric econometrics; Complex surveys
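The inverse-probability weighting described above can be sketched in a few lines of Python. This is an illustration, not the dissertation's estimator: it assumes known inclusion probabilities `pi`, a single continuous regressor, a Gaussian kernel and an already chosen bandwidth `h`; the function names are hypothetical.

```python
import math


def hajek_mean(y, pi):
    """Inverse-probability-weighted (Hajek) estimate of a population mean.

    y  -- observed values for the sampled units
    pi -- the same units' inclusion probabilities (0 < pi <= 1)
    """
    d = [1.0 / p for p in pi]  # design weights d_i = 1 / pi_i
    return sum(di * yi for di, yi in zip(d, y)) / sum(d)


def weighted_nadaraya_watson(x0, x, y, pi, h):
    """Probability-weighted Nadaraya-Watson estimate of E[Y | X = x0].

    Each observation's Gaussian kernel weight is multiplied by its design
    weight 1 / pi_i. The kernel's normalizing constant cancels in the
    ratio, so it is omitted. h is the bandwidth (assumed already chosen).
    """
    num = 0.0
    den = 0.0
    for xi, yi, p in zip(x, y, pi):
        k = math.exp(-0.5 * ((xi - x0) / h) ** 2) / p
        num += k * yi
        den += k
    return num / den
```

For example, `hajek_mean([10, 20], [0.5, 0.1])` gives the two units weights 2 and 10 and returns about 18.33, whereas the unweighted mean is 15: the rarely sampled unit counts for more because it represents more of the population.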
2	Interval Censoring and Longitudinal Survey Data. Pantoja Galicia, Norberto. January 2007. Exploring the relationship between two life events is of great interest to scientists in many disciplines. Examples include the connection between smoking cessation and pregnancy (Thompson and Pantoja-Galicia 2003), the interrelation between entry into marriage for individuals in a consensual union and first pregnancy (Blossfeld and Mills 2003), and the association between job loss and divorce (Charles and Stephens 2004; Huang 2003; Yeung and Hofferth 1998). Establishing causation in observational studies is seldom possible. Nevertheless, if one of two events tends to precede the other closely in time, a causal interpretation of an association between them becomes more plausible. Longitudinal surveys are therefore crucial, since they allow the sequence of events for each individual to be observed. In this context, Thompson and Pantoja-Galicia (2003) discuss several notions of temporal association and ordering and propose an approach for investigating a possible relationship between two lifetime events. In longitudinal surveys, individuals may be asked specific questions about two lifetime events, so estimating the joint distribution of the two event times can help answer questions of particular importance. In follow-up studies, however, interval-censored data may arise for several reasons. For example, the actual dates of events may not have been recorded, or may be missing, for some or all of the sampled population, so that they can be established only to within specified intervals. Along with the notions of temporal association and ordering, Thompson and Pantoja-Galicia (2003) also discuss the concept of one type of event "triggering" another, and they outline the construction of tests for these temporal relationships.
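When event dates are known only up to an interval, the order of two events may be undetermined. A minimal sketch, assuming each event time is recorded as a half-open interval (left, right]; the function name `order_of_events` is hypothetical and the code is not taken from the thesis.

```python
from typing import Optional, Tuple

# (left, right]: the event is only known to have occurred in this window.
Interval = Tuple[float, float]


def order_of_events(a: Interval, b: Interval) -> Optional[str]:
    """Return which event certainly occurred first, or None if undetermined."""
    a_left, a_right = a
    b_left, b_right = b
    if a_right <= b_left:  # event a <= a_right <= b_left < event b
        return "a first"
    if b_right <= a_left:  # event b <= b_right <= a_left < event a
        return "b first"
    return None  # overlapping windows: the order cannot be read off the data
```

Overlapping windows are exactly the cases where a test of temporal ordering has to rely on a model or a density estimate rather than on the observed sequence.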
The aim of this thesis is to implement some of these notions using interval-censored data from longitudinal complex surveys, and we propose tools that may be used for this purpose. The dissertation is divided into five chapters. The first chapter presents a notion of temporal relationship together with a formal nonparametric test, reviews the mechanisms of right censoring, interval censoring and left truncation, and closes with a discussion of issues in complex survey designs. Because the formal nonparametric test requires estimating a joint density, the second chapter provides a nonparametric approach to bivariate density estimation with interval-censored survey data. The third chapter is devoted to modelling shorter-term triggering using bivariate complex survey data; its semiparametric models cover both uncensored and interval-censored situations. The fourth chapter presents applications using data from the National Population Health Survey and the Survey of Labour and Income Dynamics from Statistics Canada. The fifth chapter contains an overall discussion and addresses topics for future research. Keywords: Interval censoring; Complex sampling designs; Bivariate density estimation; Local likelihood; Complex surveys; Statistics
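The abstract does not spell out the bivariate density estimator beyond its keywords (local likelihood). The sketch below shows only the uncensored starting point under stated assumptions: a survey-weighted bivariate kernel density estimate with a product Gaussian kernel, each observation weighted by its survey weight. It does not handle interval censoring, which is the thesis's actual contribution; the bandwidths `h1`, `h2` and the function names are hypothetical.

```python
import math


def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth h, normalized to integrate to one."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def weighted_bivariate_kde(point, data, weights, h1, h2):
    """Survey-weighted product-kernel density estimate at point = (x0, y0).

    data    -- list of fully observed (x_i, y_i) pairs (no censoring)
    weights -- survey weights w_i; the estimate is sum_i w_i K K / sum_i w_i
    """
    x0, y0 = point
    total_w = sum(weights)
    dens = 0.0
    for (xi, yi), wi in zip(data, weights):
        dens += wi * gaussian_kernel(x0 - xi, h1) * gaussian_kernel(y0 - yi, h2)
    return dens / total_w
```

Dividing by the total weight keeps the estimate a proper density, so it integrates to one whatever the scale of the weights.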
4	A Comparative Study of Methodologies for Modelling Complex Survey Data: An Application to SAEB 99. Vieira, Marcel de Toledo. 23 July 2001. It is essential to take the sample design into account when analysing and modelling data selected through complex sampling designs; only then can the results be genuinely useful and reliable for public policy makers. The main objective of this dissertation is to draw attention to the importance of using techniques suited to complex survey data, and to discuss the consequences of not using them. The methodologies for analysing complex survey data fall into two approaches. The first, called the aggregated approach, incorporates weights and design effects into the fitting of the usual statistical models, such as contingency tables and regression. The second, called the disaggregated approach, modifies the model itself to incorporate the complex population structure and/or design effects, for example by using hierarchical (multilevel) linear models. The data analysed in this dissertation were collected by the Brazilian National System of Basic Education Assessment (Sistema Nacional de Avaliação da Educação Básica, SAEB) in 1999. This survey administers an achievement test and collects socio-economic and demographic information on more than 200,000 students, their schools, teachers and principals. The SAEB 99 sample was selected by a complex design: stratified random sampling of clustered units in multiple stages. Point estimation of descriptive statistics, such as means, correlation and regression coefficients, poses no major difficulty provided the sample weights are used correctly to expand the sample. An example illustrates the importance of the sample weights: in the situation considered, ignoring them when computing the mean would produce overestimated results. The dissertation presents the theoretical aspects of techniques, suited to complex survey data, for point estimation of regression model parameters and their variances. The design effect, confidence intervals, hypothesis tests and the SUDAAN package are also discussed, and the results of applying these techniques are presented. In parallel, the determinants of student proficiency are studied, and the consequences of ignoring the sample design when estimating model parameters and their variances are presented and analysed for the SAEB 99 data. Finally, the results are given an educational interpretation. Keywords: Complex surveys; Aggregated approach; Maximum pseudo-likelihood; Linearization
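The aggregated approach can be illustrated with a small sketch of its three ingredients: a weighted point estimate, a Taylor-linearization variance that reflects strata and clusters, and the design effect comparing that variance with simple random sampling (SRS). This is neither SUDAAN's code nor the dissertation's. It assumes the mean is estimated as a weighted ratio, that primary sampling units (PSUs) are treated as drawn with replacement within strata, and that the SRS variance uses a weighted estimate of the population variance; all function names are hypothetical.

```python
from collections import defaultdict


def weighted_mean(y, w):
    """Design-weighted (ratio) estimate of the population mean."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def linearized_variance(y, w, stratum, psu):
    """Taylor-linearization variance of weighted_mean(y, w).

    Assumes PSUs are selected with replacement within strata (a common
    approximation) and that every stratum has at least two PSUs.
    """
    ybar = weighted_mean(y, w)
    total_w = sum(w)
    # Linearized variable z_i = w_i * (y_i - ybar) / sum(w), summed per PSU.
    psu_totals = defaultdict(float)
    for yi, wi, h, c in zip(y, w, stratum, psu):
        psu_totals[(h, c)] += wi * (yi - ybar) / total_w
    # Collect the PSU totals belonging to each stratum.
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    var = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        z_bar = sum(totals) / n_h
        var += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals)
    return var


def srs_variance(y, w):
    """Variance of the mean under SRS of the same size, using a weighted
    estimate of the population variance."""
    n = len(y)
    ybar = weighted_mean(y, w)
    s2 = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y)) / sum(w) * n / (n - 1)
    return s2 / n


def design_effect(y, w, stratum, psu):
    """Ratio of the design-based variance to the SRS variance."""
    return linearized_variance(y, w, stratum, psu) / srs_variance(y, w)
```

With hypothetical data in which the high-valued unit is over-represented in the sample (small weight), `weighted_mean([10.0, 50.0], [9.0, 1.0])` returns 14.0 while the unweighted mean is 30.0, the kind of overestimate the abstract warns about. When all weights are equal and each unit is its own PSU in a single stratum, the design effect is 1.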
5	Analysis of Longitudinal Surveys with Missing Responses. Carrillo Garcia, Ivan Adolfo. January 2008. Longitudinal surveys have emerged in recent years as an important data collection tool for population studies in which the primary interest is to examine changes over time at the individual level. The National Longitudinal Survey of Children and Youth (NLSCY), a large-scale survey with a complex sampling design conducted by Statistics Canada, follows a large group of children and youth over time and collects measurements on various indicators of their educational, behavioral and psychological development. One of the major objectives of the study is to explore how such development is related to or affected by familial, environmental and economic factors. The generalized estimating equation (GEE) approach is the most popular statistical inference tool for longitudinal studies. Most of the existing literature on GEE, however, addresses non-survey settings and ignores issues related to complex sampling designs. This thesis develops methods for the analysis of longitudinal surveys when the response variable contains missing values. Our methods are built within the GEE framework, with a major focus on using GEE when missing responses are handled through hot-deck imputation. We first argue why, and then show how, the survey weights can be incorporated into the so-called pseudo-GEE method under a joint randomization framework, and we establish the consistency of the resulting pseudo-GEE estimators with complete responses under this framework. The main focus of this research is to extend the pseudo-GEE method to cases where the missing responses are imputed by the hot-deck method. Both weighted and unweighted hot-deck imputation procedures are considered.
The consistency of the pseudo-GEE estimators under imputation of missing responses is established for both procedures. Linearization variance estimators are developed for the pseudo-GEE estimators under the assumption that the finite population sampling fraction is small or negligible, a condition that often holds for large-scale population surveys. The finite-sample performance of the proposed estimators is investigated through an extensive simulation study. The results show that the pseudo-GEE estimators and the linearization variance estimators perform well under several sampling designs, for both continuous and binary responses. Keywords: Longitudinal surveys; Complex surveys; GEE; Pseudo-GEE; Missing values; Weighted GEE; Hot-deck imputation; Variance estimation; Joint randomization; Statistics
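A minimal sketch of the two ingredients, under assumptions that go beyond the abstract. Imputation classes are given, and a donor is drawn with probability proportional to its survey weight (one common form of weighted hot-deck). The pseudo-GEE uses an identity link, a single covariate and an independence working correlation, in which case the weighted estimating equations reduce to weighted least squares. Function names are hypothetical, and the linearization variance estimators developed in the thesis are not covered.

```python
import random
from collections import defaultdict


def weighted_hot_deck(values, weights, classes, rng):
    """Fill missing entries (None) by weighted random hot-deck imputation.

    Within each imputation class, a donor is drawn among the respondents
    with probability proportional to its survey weight. Every class that
    contains a missing value must also contain at least one respondent.
    """
    donors = defaultdict(list)
    for v, w, c in zip(values, weights, classes):
        if v is not None:
            donors[c].append((v, w))
    imputed = []
    for v, c in zip(values, classes):
        if v is None:
            donor_values = [d[0] for d in donors[c]]
            donor_weights = [d[1] for d in donors[c]]
            v = rng.choices(donor_values, weights=donor_weights, k=1)[0]
        imputed.append(v)
    return imputed


def pseudo_gee_linear(x, y, w):
    """Solve sum_i w_i sum_t (1, x_it)^T (y_it - b0 - b1 * x_it) = 0.

    x[i], y[i] hold subject i's repeated measurements and w[i] its survey
    weight. With an identity link and an independence working correlation,
    the weighted estimating equations reduce to weighted least squares.
    """
    s_w = s_x = s_xx = s_y = s_xy = 0.0
    for xi, yi, wi in zip(x, y, w):
        for xt, yt in zip(xi, yi):
            s_w += wi
            s_x += wi * xt
            s_xx += wi * xt * xt
            s_y += wi * yt
            s_xy += wi * xt * yt
    # Solve the 2x2 normal equations [[s_w, s_x], [s_x, s_xx]] (b0, b1) = (s_y, s_xy).
    det = s_w * s_xx - s_x * s_x
    b1 = (s_w * s_xy - s_x * s_y) / det
    b0 = (s_y - b1 * s_x) / s_w
    return b0, b1
```

Passing an explicit `random.Random(seed)` as `rng` makes the imputation reproducible, which matters when imputed data sets are compared across simulation runs.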