21 |
Uso de modelos com fração de cura na análise de dados de sobrevivência com omissão nas covariáveis / Use of cure rate models in survival data analysis with missing covariates Angela Tavares Paes 01 June 2007 (has links)
In studies that assess the effect of prognostic factors on survival or some other event of interest, regression models relating survival times to covariates are commonly used. When covariates with missing values are included in such models, standard statistical packages automatically exclude every individual with a missing value in at least one covariate. As a result, many researchers use only the complete cases and discard a substantial part of the available information. Complete-case analysis is known to be capable of producing highly biased and inefficient estimators, and several methods have been proposed in the literature to deal with this problem. The aim of this work is to extend methods for survival data with missing covariate values to situations where a proportion of patients in the population is not susceptible to the event of interest. The main idea is to use cure rate models with individual weights that compensate for possible imbalances in the complete-case subsample, taking into account a possible relation between missingness and worse prognosis. A mixture model is considered in which failure times are modeled by the Weibull family or the semiparametric Cox model and cure probabilities are specified by a logistic model. The performance of the procedure was evaluated via a simulation study. The proposed methods were applied to real data in which missingness was simulated in 10%, 30% and 50% of the observations.
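For orientation, the mixture structure described in this abstract can be written in its standard textbook form. This is a sketch, not the thesis's own notation: the symbols $\pi$, $\gamma$, $\beta$, $\lambda$, $\alpha$, the covariate vectors $x$ and $z$, and the particular Weibull parameterization are all assumptions introduced here.

```latex
% Population survival: a cured fraction \pi(z) never experiences the event,
% and the remaining 1 - \pi(z) follow the survival function S_u of the susceptibles.
S_{\mathrm{pop}}(t \mid x, z) = \pi(z) + \bigl[1 - \pi(z)\bigr]\, S_u(t \mid x)

% Logistic model for the cure probability (assumed parameterization).
\pi(z) = \frac{\exp(z^{\top}\gamma)}{1 + \exp(z^{\top}\gamma)}

% One common Weibull proportional-hazards form for the susceptibles.
S_u(t \mid x) = \exp\bigl\{-\lambda\, t^{\alpha} \exp(x^{\top}\beta)\bigr\}
```

Because $S_u(t \mid x) \to 0$ as $t \to \infty$, the population survival curve levels off at $\pi(z)$, which is what distinguishes a cure rate model from an ordinary survival model.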
|
22 |
Estimation in partly parametric additive Cox models Läuter, Henning January 2003 (has links)
The dependence between survival times and covariates is described, for example, by proportional hazard models. We consider partly parametric Cox models and discuss the estimation of the parameters of interest. We present the maximum likelihood approach and extend the results of Huang (1999) from linear to nonlinear parameters. We then investigate least squares estimation and formulate conditions for the a.s. boundedness and consistency of these estimators.
|
23 |
A Comparsion of Multiple Imputation Methods for Missing Covariate Values in Recurrent Event Data Huo, Zhao January 2015 (has links)
Multiple imputation (MI) is a commonly used approach to handling missing data. This thesis studies missing covariates in recurrent event data and discusses ways to include the survival outcomes in the imputation model. The MI methods under consideration combine the event indicator D with, respectively, the right-censored event times T, the logarithm of T, and the cumulative baseline hazard H0(T). After imputation, the complete-data analysis can proceed. The Cox proportional hazards (PH) model and the PWP model are chosen as the analysis models, and the coefficient estimates are of substantive interest. A Monte Carlo simulation study is conducted to compare the MI methods, with relative bias and mean square error as the evaluation criteria. Furthermore, an empirical study is conducted on cardiovascular disease event data that contain missing values. Overall, the results show that MI based on the Nelson-Aalen estimate of H0(T) is preferred in most circumstances.
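The Nelson-Aalen estimate mentioned above can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it treats every subject as a single right-censored observation (not the recurrent-event setting of the thesis), and the function names `nelson_aalen` and `hazard_at` are hypothetical, not taken from the thesis or from any library.

```python
from bisect import bisect_right
from collections import Counter


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard for right-censored data.

    times  -- observed time for each subject (event or censoring time)
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (event_time, cumulative_hazard) steps.
    """
    at_risk = len(times)
    deaths = Counter(t for t, d in zip(times, events) if d)
    leaving = Counter(times)  # everyone leaves the risk set at their own time
    cum_hazard = 0.0
    steps = []
    for t in sorted(leaving):
        if deaths[t] > 0:
            # Increment: events at t divided by the number still at risk at t.
            cum_hazard += deaths[t] / at_risk
            steps.append((t, cum_hazard))
        at_risk -= leaving[t]
    return steps


def hazard_at(steps, t):
    """Evaluate the step function at time t (0.0 before the first event)."""
    idx = bisect_right([s for s, _ in steps], t)
    return steps[idx - 1][1] if idx > 0 else 0.0


if __name__ == "__main__":
    steps = nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
    # The value at each subject's own time is what would enter the
    # imputation model alongside the event indicator D.
    print([hazard_at(steps, t) for t in [1, 2, 2, 3, 4]])
```

In an imputation model of the kind the abstract describes, the value `hazard_at(steps, T)` would be used as a predictor together with D, in place of T or log T.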
|
24 |
Log-linear Rasch-type models for repeated categorical data with a psychobiological application Hatzinger, Reinhold, Katzenbeisser, Walter January 2008 (has links) (PDF)
The purpose of this paper is to generalize regression models for repeated categorical data based on maximizing a conditional likelihood. Some existing methods, such as those proposed by Duncan (1985), Fischer (1989), and Agresti (1993, and 1997) are special cases of this latent variable approach, used to account for dependencies in clustered observations. The generalization concerns the incorporation of rather general data structures such as subject-specific time-dependent covariates, a variable number of observations per subject and time periods of arbitrary length in order to evaluate treatment effects on a categorical response variable via a linear parameterization. The response may be polytomous, ordinal or dichotomous. The main tool is the log-linear representation of appropriately parameterized Rasch-type models, which can be fitted using standard software, e.g., R. The proposed method is applied to data from a psychiatric study on the evaluation of psychobiological variables in the therapy of depression. The effects of plasma levels of the antidepressant drug Clomipramine and neuroendocrinological variables on the presence or absence of anxiety symptoms in 45 female patients are analyzed. The individual measurements of the time dependent variables were recorded on 2 to 11 occasions. The findings show that certain combinations of the variables investigated are favorable for the treatment outcome. (author´s abstract) / Series: Research Report Series / Department of Statistics and Mathematics
|
25 |
Enhancing Gene Expression Signatures in Cancer Prediction Models: Understanding and Managing Classification Complexity Kamath, Vidya P. 29 July 2010 (has links)
Cancer can develop through a series of genetic events in combination with
external influential factors that alter the progression of the disease. Gene expression
studies are designed to provide an enhanced understanding of the progression of cancer
and to develop clinically relevant biomarkers of disease, prognosis and response to
treatment. One of the main aims of microarray gene expression analyses is to develop
signatures that are highly predictive of specific biological states, such as the molecular
stage of cancer. This dissertation analyzes the classification complexity inherent in gene
expression studies, proposing both techniques for measuring complexity and algorithms
for reducing this complexity.
Classifier algorithms that generate predictive signatures of cancer models must
generalize to independent datasets for successful translation to clinical practice. The
predictive performance of classifier models is shown to be dependent on the inherent
complexity of the gene expression data. Three specific quantitative measures of
classification complexity are proposed and one measure (f) is shown to correlate highly
(R² = 0.82) with classifier accuracy in experimental data.
Three quantization methods are proposed to enhance contrast in gene expression
data and reduce classification complexity. The accuracy for cancer prognosis prediction
is shown to improve using quantization in two datasets studied: from 67% to 90% in lung
cancer and from 56% to 68% in colorectal cancer. A corresponding reduction in
classification complexity is also observed.
A random subspace based multivariable feature selection approach using cost-sensitive
analysis is proposed to model the underlying heterogeneous cancer biology and
address complexity due to multiple molecular pathways and unbalanced distribution of
samples into classes. The technique is shown to be more accurate than the univariate t-test
method. The classifier accuracy improves from 56% to 68% for colorectal cancer
prognosis prediction.
A published gene expression signature to predict radiosensitivity of tumor cells is
augmented with clinical indicators to enhance modeling of the data and represent the
underlying biology more closely. Statistical tests and experiments indicate that the
improvement in the model fit is a result of modeling the underlying biology rather than
statistical over-fitting of the data, thereby accommodating classification complexity
through the use of additional variables.
|
26 |
Seleção de covariáveis para ajuste de regressão logística na análise de abundância de invertebrados edáficos em diferentes agroecossistemas / Covariates selection for logistic regression adjustment in analysis of edaphic invertebrates abundance in different agroecosystems Oliveira, Luciane da Silva 25 February 2011 (has links)
Logistic regression is the usual statistical method for examining the relationship between a dichotomous response variable and explanatory variables of interest. This work studies the factors that influence the abundance of invertebrates in soil under different forms of management, using logistic regression. The motivation is that these invertebrates are considered excellent indicators of land-use type and soil quality, since they take part in several processes that are fundamental to maintaining soil fertility and quality in agroecosystems and natural ecosystems, according to Brown et al. (1998) and Hendrix et al. (2006), as cited in Souza (2010). Covariates were selected following the proposal of Collett (1994), and the estimators of the parameters in each model are presented together with their interpretations, their statistical properties, and criteria for judging the adequacy of the selected models. The methodology was applied to two real datasets (dry season and rainy season). In the final model fitted to the dry-season data, the covariates System Type, Calcium in litter, Soil organic matter, Potassium in litter, and the interaction between Calcium and Potassium in litter were important in explaining the presence of more than nine individuals, on average, in the soil. In the final model fitted to the rainy-season data, the significant covariates for explaining the presence of 101 individuals, on average, in the soil were Magnesium in litter, Total organic carbon in litter, Litter organic matter, and Ambient temperature. Both models showed good discriminatory performance and excellent areas under the ROC (Receiver Operating Characteristic) curve, which supports the use of logistic regression to build models describing the analyzed data.
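The area under the ROC curve used above to judge discriminatory performance can be computed directly from predicted probabilities. The sketch below is a generic illustration, not the procedure used in the thesis: the function name `roc_auc` and the toy data are assumptions, and it relies on the standard equivalence between the AUC and the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels -- 1 for a positive case (e.g. abundance above the threshold), else 0
    scores -- predicted probabilities from a fitted logistic model
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the sample size, which is fine for datasets of the scale typical of field studies; a rank-based formula would be preferable for large samples.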
|
27 |
Testing the Limits of Latent Class Analysis January 2012 (has links)
abstract: The purpose of this study was to examine under which conditions "good" data characteristics can compensate for "poor" characteristics in Latent Class Analysis (LCA), as well as to set forth guidelines regarding the minimum sample size and the ideal number and quality of indicators. In particular, we studied to what extent including a larger number of high-quality indicators can compensate for a small sample size in LCA. The results suggest that, in general, a larger sample size, more indicators, higher-quality indicators, and a larger covariate effect correspond to more converged and proper replications, as well as fewer boundary estimates and less parameter bias. Based on the results, we recommend not using LCA with sample sizes below N = 100, and using many high-quality indicators and at least one strong covariate when the sample size is below N = 500. / Dissertation/Thesis / M.A. Psychology 2012
|
28 |
Modeling Recurrent Gap Times Through Conditional GEE Liu, Hai Yan 16 August 2018 (has links)
We present a theoretical approach to the statistical analysis of the dependence of the gap time length between consecutive recurrent events, on a set of explanatory random variables and in the presence of right censoring. The dependence is expressed through regression-like and overdispersion parameters, estimated via estimating functions and equations. The mean and variance of the length of each gap time, conditioned on the observed history of prior events and other covariates, are known functions of parameters and covariates, and are part of the estimating functions. Under certain conditions on censoring, we construct normalized estimating functions that are asymptotically unbiased and contain only observed data. We then use modern mathematical techniques to prove the existence, consistency and asymptotic normality of a sequence of estimators of the parameters. Simulations support our theoretical results.
|
29 |
Reward abnormalities among women with bulimia nervosa: A functional magnetic resonance imaging study Bohon, Cara, 1981- 06 1900 (has links)
x, 73 p. : ill. A print copy of this thesis is available through the UO Libraries. Search the library catalog for the location and call number. / The current study measured BOLD brain response using functional magnetic resonance imaging (fMRI) to explore the hypothesis that women with bulimia nervosa have a hyper-responsivity of the mesolimbic reward system. Women with bulimia nervosa and healthy controls (N=24) completed an fMRI paradigm involving anticipated and actual receipt of chocolate milkshake and a tasteless control solution. Women with bulimia nervosa showed less activation than healthy controls in the right anterior insula in response to anticipatory food reward and in the left medial orbitofrontal cortex, right posterior insula, right precentral gyrus, and right mid dorsal insula in response to consummatory food reward. Covariates related to bulimia diagnosis accounted for some of these effects, but not all. Results suggest that bulimia nervosa may be related to hypo-functioning of the brain reward system rather than hyper-functioning. Implications for intervention and future research are discussed. / Committee in charge: Jeffrey Measelle, Chairperson, Psychology;
Jennifer Ablow, Member, Psychology;
Don Tucker, Member, Psychology;
Eric Stice, Member, Not from U of O;
William Harbaugh, Outside Member, Economics
|
30 |
Um estudo de métodos bayesianos para dados de sobrevivência com omissão nas covariáveis / A study of Bayesian methods for survival data with missing covariates. Demerson Andre Polli 14 March 2007 (has links)
The development of methods for handling missing data is recent in statistics and has been the subject of much research. Missing covariate values are a common problem in statistical analysis, particularly in survival models, and occur frequently in clinical, epidemiological and environmental studies. This work presents Bayesian approaches to the analysis of survival data with missing covariates, considering parametric models in the Weibull family and the semiparametric Cox model. The methods are evaluated under both the parametric and the semiparametric approach using a dataset of patients with heart failure. In addition, a study is carried out to assess the impact of different proportions of missingness.
|