Global ETD Search

1	Methods for handling missing data in cohort studies where outcomes are truncated by death. Wen, Lan. January 2018. This dissertation addresses problems found in observational cohort studies where the repeated outcomes of interest are truncated by both death and by dropout. In particular, we consider methods that make inference for the population of survivors at each time point, otherwise known as 'partly conditional inference'. Partly conditional inference distinguishes between the reasons for missingness; failure to make this distinction will cause inference to be based not only on pre-death outcomes which exist but also on post-death outcomes which fundamentally do not exist. Such inference is called 'immortal cohort inference'. Investigations of health and cognitive outcomes in two studies - the 'Origins of Variance in the Old Old' and the 'Health and Retirement Study' - are conducted. Analysis of these studies is complicated by outcomes of interest being missing because of death and dropout. We show, first, that linear mixed models and joint models (that model both the outcome and survival processes) produce immortal cohort inference. This makes the parameters in the longitudinal (sub-)model difficult to interpret. Second, a thorough comparison of well-known methods used to handle missing outcomes - inverse probability weighting, multiple imputation and linear increments - is made, focusing particularly on the setting where outcomes are missing due to both dropout and death. We show that when the dropout models are correctly specified for inverse probability weighting, and the imputation models are correctly specified for multiple imputation or linear increments, then the assumptions of multiple imputation and linear increments are the same as those of inverse probability weighting only if the time of death is included in the dropout and imputation models. Otherwise they may not be.
Simulation studies show that each of these methods gives negligibly biased estimates of the partly conditional mean when its assumptions are met, but potentially biased estimates if its assumptions are not met. In addition, we develop new augmented inverse probability weighted estimating equations for making partly conditional inference, which offer double protection against model misspecification. That is, as long as one of the dropout and imputation models is correctly specified, the partly conditional inference is valid. Third, we describe methods that can be used to make partly conditional inference for non-ignorable missing data. Both monotone and non-monotone missing data are considered. We propose three methods that use a tilt function to relate the distribution of an outcome at visit j among those who were last observed at some time before j to those who were observed at visit j. Sensitivity analyses to departures from ignorable missingness assumptions are conducted on simulations and on real datasets. The three methods are: i) an inverse probability weighted method that up-weights observed subjects to represent subjects who are still alive but are not observed; ii) an imputation method that replaces missing outcomes of subjects who are alive with their conditional mean outcomes given past observed data; and iii) a new augmented inverse probability method that combines the previous two methods and is doubly-robust against model misspecification.
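As an illustration of method (i) only, the following is a minimal Python sketch of an inverse probability weighted mean among survivors at a single visit. It is not the dissertation's estimator: the `Subject` record, the `ipw_survivor_mean` function and the supplied `p_observed` values are hypothetical, and in practice the observation probabilities would come from a fitted dropout model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Subject:
    alive: bool                # alive at the visit of interest
    outcome: Optional[float]   # None when the outcome was not observed
    p_observed: float          # assumed P(observed | alive, history); hypothetical input


def ipw_survivor_mean(subjects: Sequence[Subject]) -> float:
    """Weighted mean of the outcome among survivors.

    Each observed survivor is up-weighted by 1 / p_observed so that they
    also stand in for survivors who dropped out. Subjects who have died
    contribute nothing, which keeps the target population to survivors.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for s in subjects:
        if not s.alive or s.outcome is None:
            continue  # dead, or alive but unobserved: no direct contribution
        w = 1.0 / s.p_observed
        weighted_sum += w * s.outcome
        weight_total += w
    if weight_total == 0.0:
        raise ValueError("no observed survivors to weight")
    return weighted_sum / weight_total


example = [
    Subject(alive=True, outcome=10.0, p_observed=0.5),   # weight 2
    Subject(alive=True, outcome=20.0, p_observed=1.0),   # weight 1
    Subject(alive=True, outcome=None, p_observed=0.5),   # dropped out
    Subject(alive=False, outcome=None, p_observed=0.0),  # died before the visit
]
estimate = ipw_survivor_mean(example)
```

The normalised (ratio) form is used here so that the weights sum to one; whether the dissertation uses this form is not stated in the abstract.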
2	Working correlation selection in generalized estimating equations. Jang, Mi Jin. 01 December 2011. Longitudinal data analysis is common in biomedical research. The generalized estimating equations (GEE) approach is widely used for longitudinal marginal models. The GEE method is known to provide consistent regression parameter estimates regardless of the choice of working correlation structure, provided that root-n-consistent estimates of the nuisance parameters are used. However, it is important to use an appropriate working correlation structure in small samples, since it improves the statistical efficiency of the estimate of β. Several working correlation selection criteria have been proposed (Rotnitzky and Jewell, 1990; Pan, 2001; Hin and Wang, 2009; Shults et al., 2009). However, these selection criteria share the limitation that they perform poorly when over-parameterized structures are considered as candidates. In this dissertation, new working correlation selection criteria are developed based on generalized eigenvalues. A set of generalized eigenvalues is used to measure the disparity between the bias-corrected sandwich variance estimator under the hypothesized working correlation matrix and the model-based variance estimator under a working independence assumption. A summary measure based on the set of generalized eigenvalues provides an indication of the disparity between the true correlation structure and the misspecified working correlation structure. Motivated by the test statistics in MANOVA, three working correlation selection criteria are proposed: PT (Pillai's trace type criterion), WR (Wilks' ratio type criterion) and RMR (Roy's maximum root type criterion). The relationship between these generalized eigenvalues and the CIC measure is revealed. In addition, this dissertation proposes a method to penalize over-parameterized working correlation structures.
The over-parameterized structure converges to the true correlation structure, using extra parameters. Thus, the true correlation structure and the over-parameterized structure tend to provide similar variance estimates for the estimated β and similar working correlation selection criterion values. However, the over-parameterized structure is more likely to be chosen as the best working correlation structure by "the smaller the better" rule for criterion values. This is because over-parameterization leads to a negatively biased sandwich variance estimator and hence a smaller selection criterion value. In this dissertation, the over-parameterized structure is penalized through cluster detection and an optimization function. In order to find the group ("cluster") of working correlation structures that are similar to each other, a cluster detection method is developed, based on spacings of the order statistics of the selection criterion measures. Once a cluster is found, the optimization function, which considers the trade-off between bias and variability, provides the choice of the "best" approximating working correlation structure. The performance of our proposed criterion measures relative to other relevant criteria (QIC, RJ and CIC) is examined in a series of simulation studies. Keywords: Generalized Eigenvalue; Generalized Estimating Equation; Longitudinal data; Model Selection; Penalization; Working Correlation Structure; Biostatistics
3	En jämförelse mellan individers självuppskattade livskvalitet och samhällets hälsopreferenser: En paneldatastudie av hjärtpatienter / A comparison between individuals' self-assessed quality of life and society's health preferences: a panel data study of heart patients. Lyth, Johan. January 2006. Objective: In recent years there has been increasing interest within clinical (medical) science in measuring people's health. When estimating quality of life, present practice is to use the EQ-5D questionnaire and an index that weights the different questions. The question is what happens if individuals value their own health: would the result differ from the public preferences? The aim is to build a new prediction model based on the opinions of patients and compare it to the present model based on public preferences. Method: A sample of 362 patients with unstable coronary artery disease from the FRISC II trial valued their quality of life in the acute phase and after 3, 6 and 12 months. The EQ-5D questionnaire and the Time Trade-off method (TTO), a direct method of valuing health, were used. A regression technique that handles panel data had to be used to estimate TTO from the EQ-5D and other variables such as gender and age. Result: Different regression techniques vary in their estimates of parameters and standard errors. A Generalized Estimating Equation approach with an empirical correlation structure is the most suitable regression technique for the data. A model based on the EQ-5D questionnaire and a continuous age variable proves to be the best model for an index derived from individuals. Heart patients' own valuations of health differ greatly from the public preferences for severe health states, but the differences are rather small for healthy patients. Of the 243 health states in total, only eight were valued higher by the public index. Conclusions: Because the differences between the approaches are large, the choice of index could affect decision making in a health economic study.
Keywords: Individual preferences; quality of life; health economics; panel data; Statistics
4	Selecting the Working Correlation Structure by a New Generalized AIC Index for Longitudinal Data. Lin, Wei-Lun. 28 November 2007. The analysis of longitudinal data has been a popular subject in recent years. The generalized estimating equation (GEE; Liang & Zeger, 1986) is one of the most influential recent developments in statistical practice for such data. GEE methods are attractive from both a theoretical and a practical standpoint. In this paper, we are interested in the influence of different "working" correlation structures on the modeling of longitudinal data. Furthermore, we propose a new AIC-like method for model assessment that generalizes AIC from the point of view of data generation. By comparing the differences in the log-likelihood functions between different correlation models, we define the exact value used to create an interval for our model selection. In this thesis, we combine the GEE method with the new generalized AIC index for longitudinal data with different correlation structures. Keywords: Longitudinal data; Generalized AIC Index; Working Correlation; Generalized Estimating Equation; Mathematics
5	Spline-based sieve semiparametric generalized estimating equation for panel count data. Hua, Lei. 01 May 2010. In this thesis, we propose to analyze panel count data using a spline-based sieve generalized estimating equation method with a semiparametric proportional mean model $E(N(t) \mid Z) = \Lambda_0(t)\, e^{\beta_0^T Z}$. The natural log of the baseline mean function, $\log \Lambda_0(t)$, is approximated by a monotone cubic B-spline function. The estimates of the regression parameters and spline coefficients are the roots of the spline-based sieve generalized estimating equations (sieve GEE). The proposed method avoids assuming any parametric structure for the baseline mean function or the underlying counting process. Selecting an appropriate covariance matrix that represents the true correlation between the cumulative counts improves estimation efficiency. In addition to the parameters in the proportional mean function, estimation that accounts for over-dispersion and autocorrelation involves an extra nuisance parameter $\sigma^2$, which can be estimated using the method of moments proposed by Zeger (1988). The parameters in the mean function are then estimated by solving the pseudo generalized estimating equation with $\sigma^2$ replaced by its estimate, $\sigma_n^2$. We show that the estimate of $(\beta_0, \Lambda_0)$ based on this two-stage approach is still consistent and can converge at the optimal convergence rate in the nonparametric/semiparametric regression setting. The asymptotic normality of the estimate of $\beta_0$ is also established. We further propose a spline-based projection variance estimation method and show its consistency. Simulation studies are conducted to investigate the finite-sample performance of the sieve semiparametric GEE estimates, as well as of different variance estimation methods, at different sample sizes. The covariance matrix that accounts for over-dispersion generally increases estimation efficiency when over-dispersion is present in the data.
Finally, the proposed method with different covariance matrices is applied to real data from a bladder tumor clinical trial. Keywords: Counting process; Generalized Estimating Equation; Monotone polynomial splines; Over-dispersion; Semiparametric model; Biostatistics
6	Evidências da sofisticação do padrão de consumo dos domicílios brasileiros: uma análise de cestas de produtos de consumo doméstico / Evidence of the sophistication of consumption patterns of Brazilian households: an analysis of household consumption product baskets. Luppe, Marcos Roberto. 21 December 2010. The Brazilian economy is going through a positive period in its history, mainly as a result of the economic stability brought about by the Plano Real. The data presented in this work show an improvement in the socioeconomic conditions of a large part of the population, which has led to an increase in individual income and a strengthening of the purchasing power of Brazilians. In this context, this thesis looks for evidence of a change in, and possible sophistication of, consumption patterns in Brazilian households. It also seeks to determine at which socioeconomic levels and in which regions the changes in consumption patterns were most significant. The data used in this work are derived from a consumer panel (Homescan); information on ten categories of household consumer goods was analyzed for the years 2007, 2008 and 2009, considering the geographic areas audited by Nielsen and the socioeconomic levels of the households. In the data analyses, generalized estimating equation (GEE) models are used, together with descriptive statistical analyses to evaluate the evolution of variables not included in these models. Data from another survey (Retail Index) are also used to complement the results obtained with the consumer panel. The results of the analyses indicate a change in consumption patterns in the period analyzed, primarily in households of the middle (class C) and low (classes D and E) socioeconomic levels. Among the geographic areas studied, the Northeast, Greater Rio de Janeiro and the South region stood out. Given that the categories analyzed are more elaborate products with higher added value, the increased consumption of the great majority of categories at these socioeconomic levels shows that consumption in these households has become more sophisticated. This increasing sophistication of consumption patterns, particularly among the middle and low income classes, will require companies in the goods and services market to adopt new strategies to meet the demands of more aware and demanding consumers. The great challenge for these companies will therefore be to decipher the path of expansion and diversification of these consumers' shopping baskets. Keywords: Consumer panel; Consumer (socioeconomic aspects); Consumption (economics); Consumption (patterns); Consumption pattern; Estimating equations; Generalized estimating equation; Household
7	Model Selection via Minimum Description Length. Li, Li. 10 January 2012. The minimum description length (MDL) principle originated in the data compression literature and has been considered for deriving statistical model selection procedures. Most existing methods that use the MDL principle focus on models for independent data, particularly in the context of linear regression. The data considered in this thesis are in the form of repeated measurements, and the exploration of the MDL principle begins with classical linear mixed-effects models. We distinguish two kinds of research focus: one concerns the population parameters and the other concerns the cluster/subject parameters. When the research interest is at the population level, we propose a class of MDL procedures that incorporate the dependence structure within an individual or cluster through data-adaptive penalties and enjoy the advantages of Bayesian information criteria. When the number of covariates is large, the penalty term is adjusted by a data-adaptive structure to diminish the under-selection issue in BIC and to mimic the behaviour of AIC. Theoretical justifications are provided from both data compression and statistical perspectives. Extensions to categorical responses modelled by generalized estimating equations and to functional data modelled by functional principal components are illustrated. When the interest is at the cluster level, we use the group LASSO to set up a class of candidate models. We then derive an MDL criterion for this LASSO technique in a group manner to select the final model via the tuning parameters. Extensive numerical experiments are conducted to demonstrate the usefulness of the proposed MDL procedures at both the population level and the cluster level. Keywords: Minimum description length; Model selection; AIC; BIC; Data compression; Linear mixed effects; Generalized estimating equation; Functional data; 0463
10	Equação de estimação generalizada e influência local para modelos de regressão beta com medidas repetidas / Generalized estimating equation and local influence for beta regression models with repeated measures. Venezuela, Maria Kelly. 04 March 2008. Based on the theory of optimal linear estimating functions (Crowder, 1987), we develop generalized estimating equations (GEEs) to analyze longitudinal data under marginal beta regression models (Ferrari and Cribari-Neto, 2004). GEEs are also presented for marginal simplex models for longitudinal continuous proportional data, based on the proposals of Song and Tan (2000) and Song et al. (2004), and for generalized linear models for longitudinal data, based on the proposals of Artes and Jørgensen (2000) and Liang and Zeger (1986). All of these estimating equations are developed under two approaches: modelling the mean with homogeneous dispersion, and jointly modelling the mean and the dispersion in order to accommodate possibly heterogeneous dispersion. As diagnostic techniques, we generalize some diagnostic measures to arbitrary estimating equations, both for modelling the position parameter with a homogeneous dispersion parameter and for jointly modelling the position and dispersion parameters. Among these measures, we highlight local influence (Cook, 1986), developed here for estimating equations. In simulations, this measure performed well in correctly identifying influential observations. Finally, the theory is applied to real data sets. Keywords: beta distribution; generalized estimating equation; local influence; longitudinal data; repeated measures
