1 |
The extended empirical likelihood
Wu, Fan, 04 May 2015
The empirical likelihood method introduced by Owen (1988, 1990) is a powerful
nonparametric method for statistical inference. It has been one of the most researched
methods in statistics in the last twenty-five years and remains a very active
area of research today. There is now a large body of literature on the empirical
likelihood method, covering its applications in many areas of statistics (Owen, 2001).
One important problem affecting the empirical likelihood method is its poor accuracy,
especially in small-sample and/or high-dimensional applications. The poor
accuracy can be alleviated by using high-order empirical likelihood methods such as
the Bartlett-corrected empirical likelihood, but it cannot be completely resolved by
high-order asymptotic methods alone. Since the work of Tsao (2004), the impact of
the convex hull constraint in the formulation of the empirical likelihood on the finite-sample
accuracy has been better understood, and methods have been developed to
break this constraint in order to improve the accuracy. Three important methods
along this direction are [1] the penalized empirical likelihood of Bartolucci (2007)
and Lahiri and Mukhopadhyay (2012), [2] the adjusted empirical likelihood by Chen,
Variyath and Abraham (2008), Emerson and Owen (2009), Liu and Chen (2010) and
Chen and Huang (2012), and [3] the extended empirical likelihood of Tsao (2013) and
Tsao and Wu (2013). The last of these is particularly attractive in that it retains not only
the asymptotic properties of the original empirical likelihood, but also its important
geometric characteristics. In this thesis, we generalize the extended empirical likelihood
of Tsao and Wu (2013) to handle inference in two large classes of one-sample
and two-sample problems.
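To illustrate the convex hull constraint mentioned above (an editorial sketch, not part of the thesis), the following Python snippet computes the original empirical likelihood ratio for a p-dimensional mean via its dual Lagrange-multiplier problem; when the hypothesized mean lies outside the convex hull of the data, the ratio is effectively infinite, which is the finite-sample problem that the extended empirical likelihood is designed to remove. The function name and numerical settings are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def el_log_ratio(x, mu):
    """-2 log empirical likelihood ratio for a p-dimensional mean mu.

    Solves the dual problem max_lambda sum_i log(1 + lambda'(x_i - mu)),
    which is finite only when mu lies inside the convex hull of the data.
    Illustrative sketch, not code from the thesis.
    """
    z = x - mu                                  # centered observations
    n = z.shape[0]

    def neg_dual(lam):
        arg = 1.0 + z @ lam
        if np.any(arg < 1.0 / n):               # keeps the implied weights valid
            return np.inf                       # infeasible direction
        return -np.sum(np.log(arg))

    res = minimize(neg_dual, np.zeros(z.shape[1]), method="Nelder-Mead")
    return -2.0 * res.fun                       # = 2 * maximized dual objective

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))                    # small sample, p = 2
print(el_log_ratio(x, np.zeros(2)))             # mu inside the hull: moderate value
print(el_log_ratio(x, np.full(2, 5.0)))         # mu far outside the hull: explodes
```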
In Chapter 2, we generalize the extended empirical likelihood to handle inference
for the large class of parameters defined by one-sample estimating equations, which
includes the mean as a special case. In Chapters 3 and 4, we generalize the extended
empirical likelihood to handle two-sample problems; in Chapter 3, we study the extended
empirical likelihood for the difference between two p-dimensional means; in
Chapter 4, we consider the extended empirical likelihood for the difference between
two p-dimensional parameters defined by estimating equations. In all cases, we give
both the first- and second-order extended empirical likelihood methods and compare
these methods with existing methods. Technically, the two-sample mean problem
in Chapter 3 is a special case of the general two-sample problem in Chapter 4. We
single out the mean case to form Chapter 3 not only because it is a standalone published
work, but also because it naturally leads up to the more difficult two-sample
estimating equations problem in Chapter 4. We note that Chapter 2 is the published paper Tsao and Wu (2014); Chapter 3 is
the published paper Wu and Tsao (2014). To comply with the University of Victoria
policy regarding the use of published work in a thesis, and in accordance with copyright
agreements between authors and journal publishers, details of these published works
are acknowledged at the beginning of these chapters. Chapter 4 is another joint paper
Tsao and Wu (2015), which has been submitted for publication.
|
2 |
Methods for handling missing data in cohort studies where outcomes are truncated by death
Wen, Lan, January 2018
This dissertation addresses problems found in observational cohort studies where the repeated outcomes of interest are truncated by both death and by dropout. In particular, we consider methods that make inference for the population of survivors at each time point, otherwise known as 'partly conditional inference'. Partly conditional inference distinguishes between the reasons for missingness; failure to make this distinction will cause inference to be based not only on pre-death outcomes which exist but also on post-death outcomes which fundamentally do not exist. Such inference is called 'immortal cohort inference'.

Investigations of health and cognitive outcomes in two studies - the 'Origins of Variance in the Old Old' and the 'Health and Retirement Study' - are conducted. Analysis of these studies is complicated by outcomes of interest being missing because of death and dropout. We show, first, that linear mixed models and joint models (that model both the outcome and survival processes) produce immortal cohort inference. This makes the parameters in the longitudinal (sub-)model difficult to interpret.

Second, a thorough comparison of well-known methods used to handle missing outcomes - inverse probability weighting, multiple imputation and linear increments - is made, focusing particularly on the setting where outcomes are missing due to both dropout and death. We show that when the dropout models are correctly specified for inverse probability weighting, and the imputation models are correctly specified for multiple imputation or linear increments, then the assumptions of multiple imputation and linear increments are the same as those of inverse probability weighting only if the time of death is included in the dropout and imputation models. Otherwise they may not be. Simulation studies show that each of these methods gives negligibly biased estimates of the partly conditional mean when its assumptions are met, but potentially biased estimates if its assumptions are not met. In addition, we develop new augmented inverse probability weighted estimating equations for making partly conditional inference, which offer double protection against model misspecification. That is, as long as one of the dropout and imputation models is correctly specified, the partly conditional inference is valid.

Third, we describe methods that can be used to make partly conditional inference for non-ignorable missing data. Both monotone and non-monotone missing data are considered. We propose three methods that use a tilt function to relate the distribution of an outcome at visit j among those who were last observed at some time before j to those who were observed at visit j. Sensitivity analyses to departures from ignorable missingness assumptions are conducted on simulations and on real datasets. The three methods are: i) an inverse probability weighted method that up-weights observed subjects to represent subjects who are still alive but are not observed; ii) an imputation method that replaces missing outcomes of subjects who are alive with their conditional mean outcomes given past observed data; and iii) a new augmented inverse probability method that combines the previous two methods and is doubly-robust against model misspecification.
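As a rough illustration of the first of these ideas (an editorial sketch, not code from the dissertation; the variable names and the dropout-probability input are assumed), an inverse-probability-weighted estimate of the partly conditional mean at a single visit can be computed as follows:

```python
import numpy as np

def ipw_partly_conditional_mean(y, observed, alive, p_obs):
    """IPW estimate of E[Y_j | alive at visit j] (partly conditional mean).

    y        : outcomes at visit j (np.nan where unobserved)
    observed : 1 if the outcome was recorded at visit j, else 0
    alive    : 1 if the subject is alive at visit j, else 0
    p_obs    : modelled P(observed at j | alive at j, past data)

    Observed survivors are up-weighted to represent survivors who dropped out;
    subjects who have died contribute nothing, so the estimand conditions on
    survival rather than imputing non-existent post-death outcomes.
    Illustrative sketch only.
    """
    w = (alive * observed) / p_obs              # zero weight for the dead and the unobserved
    return np.nansum(w * y) / np.sum(w)

# toy example with four subjects at one visit
y = np.array([2.0, np.nan, 3.0, np.nan])
print(ipw_partly_conditional_mean(y,
                                  observed=np.array([1, 0, 1, 0]),
                                  alive=np.array([1, 1, 1, 0]),
                                  p_obs=np.array([0.8, 0.5, 0.6, 0.9])))
```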
|
3 |
Working correlation selection in generalized estimating equations
Jang, Mi Jin, 01 December 2011
Longitudinal data analysis is common in biomedical research. The generalized estimating equations (GEE) approach is widely used for longitudinal marginal models. The GEE method is known to provide consistent regression parameter estimates regardless of the choice of working correlation structure, provided √n-consistent estimates of the nuisance parameters are used. However, it is important to use the appropriate working correlation structure in small samples, since it improves the statistical efficiency of the β estimate. Several working correlation selection criteria have been proposed (Rotnitzky and Jewell, 1990; Pan, 2001; Hin and Wang, 2009; Shults et al., 2009). However, these selection criteria share the same limitation in that they perform poorly when over-parameterized structures are considered as candidates. In this dissertation, new working correlation selection criteria are developed based on generalized eigenvalues. A set of generalized eigenvalues is used to measure the disparity between the bias-corrected sandwich variance estimator under the hypothesized working correlation matrix and the model-based variance estimator under a working independence assumption. A summary measure based on the set of generalized eigenvalues provides an indication of the disparity between the true correlation structure and the misspecified working correlation structure. Motivated by the test statistics in MANOVA, three working correlation selection criteria are proposed: PT (Pillai's trace type criterion), WR (Wilks' ratio type criterion) and RMR (Roy's maximum root type criterion). The relationship between these generalized eigenvalues and the CIC measure is revealed.
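The following Python sketch (an editorial illustration; the dissertation's exact definitions may differ) shows how generalized eigenvalues of two such variance estimates can be summarized in the spirit of the PT, WR and RMR criteria; the input matrices are placeholders.

```python
import numpy as np
from scipy.linalg import eigh

def working_correlation_criteria(v_sandwich, v_indep):
    """Summaries of the generalized eigenvalues of (v_sandwich, v_indep).

    v_sandwich : bias-corrected sandwich variance estimate of beta-hat under the
                 hypothesized working correlation structure
    v_indep    : model-based variance estimate under working independence
    Returns Pillai's-trace, Wilks'-ratio and Roy's-maximum-root style summaries.
    Illustrative sketch; the dissertation's exact formulas may differ.
    """
    lam = eigh(v_sandwich, v_indep, eigvals_only=True)   # generalized eigenvalues
    pt = float(np.sum(lam / (1.0 + lam)))                # Pillai's trace type (PT)
    wr = float(np.prod(1.0 / (1.0 + lam)))               # Wilks' ratio type (WR)
    rmr = float(np.max(lam))                             # Roy's maximum root type (RMR)
    return pt, wr, rmr

# placeholder 2x2 variance estimates
a = np.array([[0.9, 0.1], [0.1, 1.2]])
b = np.eye(2)
print(working_correlation_criteria(a, b))
```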
In addition, this dissertation proposes a method to penalize over-parameterized working correlation structures. An over-parameterized structure converges to the true correlation structure but uses extra parameters. Thus, the true correlation structure and the over-parameterized structure tend to provide similar variance estimates of the estimated β and similar working correlation selection criterion values. However, the over-parameterized structure is more likely to be chosen as the best working correlation structure by "the smaller the better" rule for criterion values. This is because the over-parameterization leads to a negatively biased sandwich variance estimator and hence a smaller selection criterion value. In this dissertation, the over-parameterized structure is penalized through cluster detection and an optimization function. In order to find the group ("cluster") of working correlation structures that are similar to each other, a cluster detection method is developed, based on spacings of the order statistics of the selection criterion measures. Once a cluster is found, an optimization function considering the trade-off between bias and variability provides the choice of the "best" approximating working correlation structure.
The performance of our proposed criterion measures relative to other relevant criteria (QIC, RJ and CIC) is examined in a series of simulation studies.
|
4 |
A comparison between individuals' self-assessed quality of life and society's health preferences: a panel data study of heart patients
Lyth, Johan, January 2006
Objective: In recent years there has been increasing interest within the clinical (medical) sciences in measuring people's health. When estimating quality of life, present practice is to use the EQ-5D questionnaire and an index which weighs the different questions. The question is what happens if the individuals estimate their own health: would it differ from the public preferences? The aim is to make a new prediction model based on the opinion of patients and compare it to the present model based on public preferences. Method: A sample of 362 patients with unstable coronary artery disease from the Frisc II trial valued their quality of life in the acute phase and after 3, 6 and 12 months. The EQ-5D questionnaire and also the time trade-off (TTO) method, a direct method of valuing health, were used. A regression technique capable of handling panel data had to be used to estimate TTO from the EQ-5D and other variables such as gender and age. Result: Different regression techniques vary in their parameter estimates and standard errors. A generalized estimating equation approach with an empirical correlation structure is the most suitable regression technique for the data material. A model based on the EQ-5D questionnaire and a continuous age variable proves to be the best model for an index derived from individuals. The differences between heart patients' own valuations of health and the public preferences are large for severe health conditions but rather small for healthy patients. Of the 243 health conditions in total, only eight were valued higher by the public index. Conclusions: As the differences between the approaches are large, the choice of index could affect decision making in a health economic study.
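As a hedged sketch of the kind of panel-data regression described under Method (the variables, simulated data and exchangeable working correlation below are placeholders, not the study's actual specification), a GEE fit of TTO values on an EQ-5D index and age with repeated visits per patient might look as follows in Python:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical panel: one row per patient-visit (acute phase, 3, 6, 12 months).
rng = np.random.default_rng(1)
n, visits = 50, 4
df = pd.DataFrame({
    "patient": np.repeat(np.arange(n), visits),
    "age":     np.repeat(rng.integers(45, 85, n), visits),
    "eq5d":    rng.uniform(0.2, 1.0, n * visits),   # stand-in for the EQ-5D index
})
df["tto"] = 0.1 + 0.8 * df["eq5d"] - 0.001 * df["age"] + rng.normal(0, 0.05, len(df))

# GEE with an exchangeable working correlation for the repeated visits per patient.
model = sm.GEE.from_formula("tto ~ eq5d + age", groups="patient", data=df,
                            cov_struct=sm.cov_struct.Exchangeable(),
                            family=sm.families.Gaussian())
print(model.fit().summary())
```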
|
5 |
Generalized score tests for missing covariate data
Jin, Lei, 15 May 2009
In this dissertation, the generalized score tests based on weighted estimating equations
are proposed for missing covariate data. Their properties, including the effects
of nuisance functions on the forms of the test statistics and efficiency of the tests,
are investigated. Different versions of the test statistic are properly defined for various
parametric and semiparametric settings. Their asymptotic distributions are also
derived. It is shown that when the models for the nuisance functions are correct, valid
test statistics can be obtained by plugging the estimates of the nuisance
functions into the test statistic for the case where the nuisance functions
are known. Furthermore, the optimal test is obtained using the relative efficiency
measure. As an application of the proposed tests, a formal model validation procedure
is developed for generalized linear models in the presence of missing covariates.
The asymptotic distribution of the data-driven test statistics is provided. A simulation
study in both linear and logistic regressions illustrates the applicability and the finite
sample performance of the methodology. Our methods are also employed to analyze
a coronary artery disease diagnostic dataset.
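As an editorial sketch of the generic form of such a test (the weighting scheme and nuisance-function estimation studied in the dissertation are omitted, and the function name is assumed), a generalized score statistic can be assembled from per-subject estimating-function contributions as follows:

```python
import numpy as np

def generalized_score_statistic(u_i):
    """Generalized score statistic from per-subject estimating-function contributions.

    u_i : (n, q) array of (weighted) estimating-equation contributions evaluated
          at the parameter estimate obtained under the null hypothesis.
    Returns T = U' V^{-1} U, where U is the summed contribution and V an empirical
    estimate of its variance; under standard conditions T is asymptotically
    chi-square with q degrees of freedom. Illustrative sketch only.
    """
    u = u_i.sum(axis=0)
    centered = u_i - u_i.mean(axis=0)
    v = centered.T @ centered                  # empirical variance of the sum
    return float(u @ np.linalg.solve(v, u))

rng = np.random.default_rng(3)
print(generalized_score_statistic(rng.normal(size=(100, 2))))  # roughly chi-square(2)
```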
|
6 |
Selecting the Working Correlation Structure by a New Generalized AIC Index for Longitudinal Data
Lin, Wei-Lun, 28 November 2007
The analysis of longitudinal data has been a popular subject in recent years. The growth of the generalized estimating equation (GEE) approach (Liang & Zeger, 1986) is one of the most influential recent developments in statistical practice for this area. GEE methods are attractive from both a theoretical and a practical standpoint. In this thesis, we are interested in the influence of different "working" correlation structures for modeling longitudinal data. Furthermore, we propose a new AIC-like method for model assessment which generalizes AIC from the point of view of data generation. By comparing the differences between the log-likelihood functions of different correlation models, we define a critical value to create an interval for our model selection. In this thesis, we combine the GEE method and a new generalized AIC index for longitudinal data with different correlation structures.
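As a hedged toy version of this log-likelihood comparison (an editorial sketch under simplifying Gaussian assumptions, not the criterion developed in the thesis), candidate working correlation structures can be scored on standardized cluster residuals as follows:

```python
import numpy as np

def working_loglik(resid_by_cluster, corr):
    """Gaussian log-likelihood of standardized cluster residuals under a
    candidate working correlation matrix. Illustrative sketch only."""
    sign, logdet = np.linalg.slogdet(corr)
    inv = np.linalg.inv(corr)
    return sum(-0.5 * (logdet + r @ inv @ r) for r in resid_by_cluster)

def exchangeable(m, rho):                      # candidate structure 1
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))

def ar1(m, rho):                               # candidate structure 2
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# placeholder residuals: 40 clusters of size 4, generated under an AR(1) correlation
rng = np.random.default_rng(4)
resid = [rng.multivariate_normal(np.zeros(4), ar1(4, 0.6)) for _ in range(40)]
for name, r_mat in [("independence", np.eye(4)),
                    ("exchangeable", exchangeable(4, 0.6)),
                    ("AR(1)", ar1(4, 0.6))]:
    print(name, round(working_loglik(resid, r_mat), 2))
```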
|
7 |
Spline-based sieve semiparametric generalized estimating equation for panel count data
Hua, Lei, 01 May 2010
In this thesis, we propose to analyze panel count data using a spline-based
sieve generalized estimating equation method with a semiparametric proportional mean model E{N(t) | Z} = Λ0(t)exp(β0ᵀZ). The natural log of the baseline mean function, log Λ0(t), is approximated by a monotone cubic B-spline function. The estimates of the regression parameters and spline coefficients are the roots of the spline-based sieve generalized estimating equations (sieve GEE). The proposed method avoids assuming any parametric structure of the baseline mean function and the underlying counting process. Selection of an appropriate covariance matrix that represents the true correlation between the cumulative counts improves estimating efficiency.
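To make the model concrete, the following editorial sketch (with placeholder knots and coefficients; monotonicity constraints and the sieve GEE estimation itself are not shown) evaluates the proportional mean function E{N(t) | Z} = Λ0(t)exp(β0ᵀZ) with log Λ0(t) represented by a cubic B-spline:

```python
import numpy as np
from scipy.interpolate import BSpline

def proportional_mean(t, z, beta, knots, coefs, degree=3):
    """E{N(t) | Z} = Lambda0(t) * exp(beta'Z), with log Lambda0 a cubic B-spline.

    knots, coefs : knot sequence and coefficients of the B-spline for log Lambda0;
                   a monotone Lambda0 would require constraining the coefficients,
                   which this sketch does not enforce. Illustrative only.
    """
    log_lambda0 = BSpline(knots, coefs, degree)(t)
    return np.exp(log_lambda0 + z @ beta)

knots = np.array([0., 0., 0., 0., 1., 2., 3., 3., 3., 3.])   # clamped cubic knots on [0, 3]
coefs = np.array([-1.0, -0.5, 0.0, 0.4, 0.8, 1.0])           # placeholder spline coefficients
t = np.array([0.5, 1.5, 2.5])
z = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(proportional_mean(t, z, beta=np.array([0.3, -0.2]), knots=knots, coefs=coefs))
```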
In addition to the parameters in the proportional mean function, the estimation that accounts for the over-dispersion and autocorrelation involves an extra nuisance parameter σ2, which can be estimated using a method of moments proposed by Zeger (1988). The parameters in the mean function are then estimated by solving the pseudo generalized estimating equation with σ2 replaced by its estimate, σ2n. We show that the estimate of (β0, Λ0) based on this two-stage approach is still consistent and converges at the optimal rate in the nonparametric/semiparametric regression setting. The asymptotic normality of the estimate of β0 is also established. We further propose a spline-based projection variance estimating method and show its consistency.
Simulation studies are conducted to investigate the finite-sample performance of the sieve semiparametric GEE estimates, as well as different variance estimating methods with different sample sizes. The covariance matrix that accounts for the overdispersion generally increases estimating efficiency when overdispersion is present in the data. Finally, the proposed method with different covariance matrices is applied to real data from a bladder tumor clinical trial.
|
8 |
Evidence of the sophistication of consumption patterns of Brazilian households: an analysis of household consumption product baskets
Luppe, Marcos Roberto, 21 December 2010
The Brazilian economy is currently going through a positive time in its history, mainly as a result of factors generated by the economic stability conferred by the Plano Real financial plan. The data presented in this work show an improvement in the socioeconomic conditions of the vast majority of the population, which has led to an increase in income for individuals, and a strengthening of the consumer power of Brazilians. In this context, this thesis looks for evidence that indicates a change and possible sophistication of consumer patterns in Brazilian households. It also seeks to determine the socioeconomic levels, and the regions, in which the changes in consumer patterns are most significant. The data used in this work are derived from a panel of consumers (Homescan), and information from ten categories of domestic consumer goods was analyzed for the years 2007, 2008 and 2009, considering the geographic areas audited by Nielsen and the socioeconomic levels of the households. In the data analyses, generalized estimating equation (GEE) models are used, as well as descriptive statistical analyses, to evaluate the evolution of variables not included in these models.
Data are also used from another survey (Retail Index), to complement the results obtained with the panel of consumers. The results of the analyses indicate a change in consumer patterns, particularly in households belonging to the middle (class C) and low (classes D and E) socioeconomic classes, for the period analyzed. In terms of geographical areas researched, the areas highlighted were the Northeast, the greater Rio de Janeiro and the South region. Taking into consideration that the categories analyzed consist of more elaborate products, with higher added value, the increased consumption for the majority of categories at these socioeconomic levels shows that consumption in these households has become more sophisticated. This environment of increasing sophistication of consumer patterns, particularly among the middle and low income classes, will require companies in the goods and services market to implement strategies to meet the requirements of these more aware and demanding consumers. Therefore, the greatest challenge for these companies is to seize the expansion and diversification path of the shopping basket for these consumers.
|
9 |
Jackknife Empirical Likelihood for the Accelerated Failure Time Model with Censored Data
Bouadoumou, Maxime K, 15 July 2011
Kendall and Gehan estimating functions are used to estimate the regression parameter in the accelerated failure time (AFT) model with censored observations. The accelerated failure time model is the preferred survival analysis method because it maintains a consistent association between the covariate and the survival time. The jackknife empirical likelihood method is used because it overcomes computational difficulty by circumventing the construction of the nonlinear constraint. Jackknife empirical likelihood turns the statistic of interest into a sample mean based on jackknife pseudo-values. A U-statistic approach is used to construct the confidence intervals for the regression parameter. We conduct a simulation study to compare the Wald-type procedure, the empirical likelihood, and the jackknife empirical likelihood in terms of coverage probability and average length of confidence intervals. The jackknife empirical likelihood method performs better and overcomes the under-coverage problem of the Wald-type method. A real dataset is also used to illustrate the proposed methods.
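The core device, turning a nonlinear statistic into a sample mean of jackknife pseudo-values, can be sketched as follows (an editorial illustration with a generic toy U-statistic; the censored-data Kendall and Gehan estimating functions of the thesis are not reproduced):

```python
import numpy as np

def jackknife_pseudo_values(data, statistic):
    """Pseudo-values V_i = n*T(full sample) - (n-1)*T(sample without observation i).

    Jackknife empirical likelihood applies ordinary empirical likelihood to the
    sample mean of these pseudo-values, which avoids the nonlinear constraints
    that a U-statistic would otherwise impose. Illustrative sketch only.
    """
    n = len(data)
    t_full = statistic(data)
    t_loo = np.array([statistic(np.delete(data, i, axis=0)) for i in range(n)])
    return n * t_full - (n - 1) * t_loo

rng = np.random.default_rng(5)
x = rng.exponential(size=40)
# Toy U-statistic: the mean of pairwise minimums (a kernel of degree two).
u_stat = lambda d: np.mean(np.minimum.outer(d, d)[np.triu_indices(len(d), k=1)])
pv = jackknife_pseudo_values(x, u_stat)
print(u_stat(x), pv.mean())   # the pseudo-value mean tracks the original statistic
```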
|
10 |
Model Selection via Minimum Description Length
Li, Li, 10 January 2012
The minimum description length (MDL) principle originated in the data compression literature and has been considered for deriving statistical model selection procedures. Most existing methods utilizing the MDL principle focus on models for independent data, particularly in the context of linear regression. The data considered in this thesis are in the form of repeated measurements, and the exploration of the MDL principle begins with classical linear mixed-effects models. We distinguish two kinds of research focus: one concerns the population parameters and the other concerns the cluster/subject parameters. When the research interest is at the population level, we propose a class of MDL procedures which incorporate the dependence structure within individuals or clusters with data-adaptive penalties and enjoy the advantages of Bayesian information criteria. When the number of covariates is large, the penalty term is adjusted by a data-adaptive structure to diminish the under-selection issue of BIC and to mimic the behaviour of AIC. Theoretical justifications are provided from both data compression and statistical perspectives. Extensions to categorical responses modelled by generalized estimating equations and to functional data modelled by functional principal components are illustrated. When the interest is at the cluster level, we use the group LASSO to set up a class of candidate models. We then derive an MDL criterion for this LASSO technique in a group manner to select the final model via the tuning parameters. Extensive numerical experiments are conducted to demonstrate the usefulness of the proposed MDL procedures at both the population and cluster levels.
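As a much simplified, hedged illustration of description-length based selection (ordinary linear regression with independent data, rather than the mixed-effects and GEE settings treated in the thesis), a two-part MDL criterion can be computed as follows:

```python
import numpy as np

def two_part_mdl(y, X):
    """Two-part description length for a Gaussian linear model:
    (n/2) * log(RSS/n) + (k/2) * log(n), i.e. the code length of the data given
    the fitted model plus the code length of its k parameters. Smaller is better.
    Toy sketch: the thesis' criteria use data-adaptive penalties and handle
    within-cluster dependence, which this version ignores.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)

rng = np.random.default_rng(6)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)                    # x2 is irrelevant
X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])
print(two_part_mdl(y, X_small), two_part_mdl(y, X_big))    # the smaller model should win
```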
|