11 |
Hidden Markov Chain Analysis: Impact of Misclassification on Effect of Covariates in Disease Progression and Regression / Polisetti, Haritha / 01 November 2016 (has links)
Most chronic diseases have a well-known natural staging system through which disease progression is interpreted. It is well established that transition rates from one disease stage to another can be modeled by multi-state Markov models. It is also well known, however, that the screening systems used to diagnose disease states are sometimes subject to error. In this study, a simulation study illustrates the importance of accounting for misclassification in multi-state Markov models by evaluating and comparing estimates from a disease progression Markov model with misclassification against those from a model without it. The simulation results show that models that do not account for possible misclassification yield biased estimates. The method of accounting for misclassification is then illustrated using dementia data staged as no cognitive impairment, mild cognitive impairment, and dementia, where the diagnosis of the dementia stage is prone to error. Subjects entered the study regardless of their disease state and were followed for one year, with their disease state recorded at the follow-up visit. These data illustrate the application of a multi-state Markov model with misclassification, an example of a hidden Markov model, which assumes that the observed (possibly misclassified) states depend conditionally on the underlying true disease states, which follow a Markov process. The misclassification probabilities for all allowed disease transitions were also estimated. The impact of misclassification on covariate effects was assessed by comparing hazard ratios from the progression multi-state model with those from the multi-state model with misclassification; the comparison confirmed that results are biased if misclassification is not addressed.
Results suggest that the apoe ε4 gene is significantly associated with disease progression from mild cognitive impairment to dementia, but this effect was masked when the general multi-state Markov model was used. No significant relation was found for the other transitions.
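The model structure described above lends itself to a small simulation sketch: a true three-state chain (no impairment, MCI, dementia) evolves under a transition matrix, and the recorded state is drawn through a misclassification (emission) matrix. All probabilities below are illustrative assumptions, not the study's fitted values.

```python
import random

# Illustrative (assumed) one-visit transition probabilities for the true chain:
# states 0 = no impairment, 1 = mild cognitive impairment, 2 = dementia.
TRANS = [
    [0.85, 0.12, 0.03],
    [0.00, 0.80, 0.20],
    [0.00, 0.00, 1.00],   # dementia treated as absorbing
]

# Illustrative (assumed) misclassification matrix: P(observed state | true state).
EMIT = [
    [0.95, 0.05, 0.00],
    [0.10, 0.85, 0.05],
    [0.00, 0.10, 0.90],
]

def sample(probs, rng):
    """Draw an index from a discrete distribution."""
    u, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if u < cum:
            return i
    return len(probs) - 1

def simulate_subject(n_visits, rng):
    """Return (true states, observed states) over n_visits."""
    true = [sample([1/3, 1/3, 1/3], rng)]  # entry state arbitrary, as in the study
    for _ in range(n_visits - 1):
        true.append(sample(TRANS[true[-1]], rng))
    observed = [sample(EMIT[s], rng) for s in true]
    return true, observed

rng = random.Random(1)
true, obs = simulate_subject(5, rng)
```

Fitting the hidden Markov model then amounts to estimating TRANS and EMIT jointly from the observed sequences, which is what packages such as R's `msm` do for panel data of this kind.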
|
12 |
Adjusting retrospective noise exposure assessment for use of hearing protection devices / Sbihi, Hind / 11 1900 (has links)
Earlier retrospective noise exposure assessments for use in epidemiological research were not adequately characterized because they did not properly account for the use of hearing protection devices (HPD), which results in potential misclassification. Exposure misclassification has been shown to attenuate exposure-outcome relations. In the case of already subtle relationships, such as that between noise and cardiovascular disease, this could potentially annihilate any association.
We investigated two approaches using Workers’ Compensation Board (WorkSafe BC) audiometric surveillance data to (i) re-assess the noise exposure in a cohort of lumber mill workers in British Columbia using data on the use of HPD and the determinants of their use available through WorkSafe BC, and (ii) test the validity of the new exposure measures by testing their predictions of noise-induced hearing loss, a well-established association.
Work history, noise exposure measurements, and audiometric surveillance data were merged together, forming job-exposure-audiometric information for each of 13,147 lumber mill workers. Correction factors specific to each type and class of HPD were determined based on research and standards. HPD-relevant correction factors were created using 1) deterministic methods and self-reported HPD use after filling gaps in the exposure history, or 2) a model of the determinants of use of HPD, then adjusting noise estimates according to the methods’ predictions and attenuation factors. For both methods, the HPD-adjusted and unadjusted noise exposure estimates were cumulated across all jobs each worker held in a cohort-participating lumber mill.
Finally, these noise metrics were compared by examining how well each predicted hearing loss. Analyses controlled for gender, age, race as well as medical and non-occupational risk factors.
Both methods led to a strengthening of the noise-hearing loss relationships compared to methods using HPD-unadjusted noise estimates. The method based on the modeling of HPD use had the best performance with a four-fold increase in the slope compared to the unadjusted noise-hearing loss slope.
Accounting for HPD use in noise exposure assessment is necessary, since we have shown that misclassification attenuated the exposure-response relationships. Exposure-response analyses subsequent to exposure reassessment provide predictive validity and give confidence in the exposure adjustment methods. / Medicine, Faculty of / Population and Public Health (SPPH), School of / Graduate
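The adjustment idea can be sketched as follows: a job's measured noise level is reduced by a class-specific attenuation factor for the fraction of time HPD use is predicted, combining protected and unprotected time on the energy (not dB) scale before cumulating over the job history. The attenuation values and the work history below are made-up assumptions, not the study's standards-derived factors.

```python
import math

# Illustrative (assumed) attenuation corrections in dBA by HPD class.
ATTENUATION = {"none": 0.0, "earplug": 10.0, "earmuff": 15.0}

def adjusted_level(measured_dba, hpd_class, p_use):
    """Adjust a job's noise level for the predicted probability of HPD use.

    Combines protected and unprotected fractions of time on the energy
    scale, mirroring the idea of weighting by predicted use.
    """
    protected = measured_dba - ATTENUATION[hpd_class]
    energy = p_use * 10 ** (protected / 10) + (1 - p_use) * 10 ** (measured_dba / 10)
    return 10 * math.log10(energy)

def cumulative_exposure(jobs):
    """Sum adjusted level-years across a worker's job history (a dBA-year proxy)."""
    return sum(adjusted_level(level, hpd, p) * years for level, hpd, p, years in jobs)

# One hypothetical work history: (level dBA, HPD class, P(use), years).
history = [(95.0, "earplug", 0.8, 10), (88.0, "none", 0.0, 5)]
cum = cumulative_exposure(history)
```

With full predicted use (`p_use=1.0`) the adjusted level is simply the measured level minus the attenuation; with no use it equals the measured level, so the adjustment interpolates between the two on the energy scale.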
|
13 |
Statistical Methods for Dealing with Outcome Misclassification in Studies with Competing Risks Survival Outcomes / Mpofu, Philani Brian / 02 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / In studies with competing risks outcomes, misidentifying the event-type responsible
for the observed failure is, by definition, an act of misclassification. Several authors have
established that such misclassification can bias competing risks statistical analyses, and have
proposed statistical remedies to aid correct modeling. Generally, these rely on adjusting
the estimation process using information about outcome misclassification, but invariably
assume that outcome misclassification is non-differential among study subjects regardless
of their individual characteristics. In addition, current methods tend to adjust for the
misclassification within a semi-parametric framework of modeling competing risks data.
Building on the existing literature, in this dissertation, we explore the parametric modeling
of competing risks data in the presence of outcome misclassification, be it differential or
non-differential. Specifically, we develop parametric pseudo-likelihood-based approaches
for modeling cause-specific hazards while adjusting for misclassification information that is
obtained either through data internal or external to the current study (respectively, internal
or external-validation sampling). Data from either type of validation sampling are used
to model predictive values or misclassification probabilities, which, in turn, are used to
adjust the cause-specific hazard models. We show that the resulting pseudo-likelihood
estimates are consistent and asymptotically normal, and verify these theoretical properties
using simulation studies. Lastly, we illustrate the proposed methods using data from a
study involving people living with HIV/AIDS (PLWH) in the East African consortium of the International Epidemiologic Databases for the Evaluation of HIV/AIDS (IeDEA EA). In
this example, death is frequently misclassified as disengagement from care as many deaths
go unreported to health facilities caring for these patients. In this application, we model
the cause-specific hazards of death and disengagement from care among PLWH after they
initiate anti-retroviral treatment, while adjusting for death misclassification. / 2021-03-10
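The core of the validation-based adjustment can be sketched very simply: predictive values estimated from a validation sample turn each observed event label into expected contributions to the two true causes. The predictive values below are illustrative assumptions, not estimates from IeDEA EA.

```python
# Assumed predictive values from a hypothetical validation sample:
PPV_DEATH = 0.90             # P(truly death | observed "death")
P_DEATH_GIVEN_DISENG = 0.25  # P(truly death | observed "disengaged"),
                             # capturing unreported deaths

def adjusted_cause_counts(observed_labels):
    """Expected numbers of true deaths and true disengagements."""
    deaths = diseng = 0.0
    for label in observed_labels:
        if label == "death":
            deaths += PPV_DEATH
            diseng += 1 - PPV_DEATH
        else:  # "disengaged"
            deaths += P_DEATH_GIVEN_DISENG
            diseng += 1 - P_DEATH_GIVEN_DISENG
    return deaths, diseng

labels = ["death"] * 10 + ["disengaged"] * 40
d, g = adjusted_cause_counts(labels)   # expected true deaths and disengagements
```

In the pseudo-likelihood approaches of the dissertation these expected contributions enter the cause-specific hazard likelihood rather than raw counts, and the predictive values may themselves be modeled as functions of covariates (differential misclassification).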
|
14 |
Feature Selection with Missing Data / Sarkar, Saurabh / 25 October 2013 (has links)
No description available.
|
15 |
Latent Class Analysis of Diagnostic Tests: The Effect of Dependent Misclassification Errors / Latent Class Analysis: Dependent Misclassification Errors / Torrance, Virginia L. / January 1994 (has links)
Latent class modelling is one method used in the evaluation of diagnostic tests when there is no gold standard test that is perfectly accurate. The technique provides maximum likelihood estimates of the prevalence of a disease or condition and of the error rates of diagnostic tests or observers. This study reports the effect of departures from the latent class model assumption that misclassifications between observers or tests are independent conditional on the true state of the individual being tested. Estimates are found to become biased in the presence of dependence. Most commonly, the prevalence of the disease is overestimated when the true prevalence is less than 50%, and the error rates of dependent observers are underestimated. If there are also independent observers in the group, their error rates are overestimated. The most dangerous scenario in which to use latent class methods in the evaluation of tests is when the true prevalence is low and the false positive rate is high, which is common to many screening situations. / Thesis / Master of Science (MS)
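The mechanics can be sketched with a small simulation: data are generated from two conditionally dependent tests plus one independent test, and then fitted with the standard two-class latent class EM that (wrongly) assumes conditional independence. All accuracy and dependence values are made-up assumptions for illustration.

```python
import random

def simulate(n, prev, rng):
    """Three binary tests; tests 1 and 2 are made dependent by sometimes
    copying one shared error-prone reading (an assumed dependence mechanism)."""
    data = []
    for _ in range(n):
        d = 1 if rng.random() < prev else 0
        def noisy():
            return d if rng.random() < 0.85 else 1 - d   # 85% accuracy
        shared = noisy()
        t1 = shared if rng.random() < 0.5 else noisy()
        t2 = shared if rng.random() < 0.5 else noisy()
        t3 = noisy()   # conditionally independent test
        data.append((t1, t2, t3))
    return data

def em_latent_class(data, iters=100):
    """Fit the standard 2-class conditional-independence model by EM;
    returns estimated prevalence and per-test P(positive | class)."""
    prev = 0.3
    p = [[0.2, 0.8] for _ in range(3)]   # p[j][c] = P(test j positive | class c)
    for _ in range(iters):
        # E-step: posterior probability of class 1 for each subject
        post = []
        for x in data:
            l1, l0 = prev, 1 - prev
            for j in range(3):
                l1 *= p[j][1] if x[j] else 1 - p[j][1]
                l0 *= p[j][0] if x[j] else 1 - p[j][0]
            post.append(l1 / (l1 + l0))
        # M-step: update prevalence and conditional positivity rates
        s = sum(post)
        prev = s / len(data)
        for j in range(3):
            p[j][1] = sum(w for w, x in zip(post, data) if x[j]) / s
            p[j][0] = sum(1 - w for w, x in zip(post, data) if x[j]) / (len(data) - s)
    return prev, p

rng = random.Random(7)
data = simulate(1000, 0.2, rng)
prev_hat, p_hat = em_latent_class(data)
```

Comparing `prev_hat` to the true prevalence of 0.2 across repeated simulations is exactly the kind of exercise that reveals the biases the thesis documents.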
|
16 |
Would two-stage scoring models alleviate bank exposure to bad debt? / Abdou, H.A., Mitra, S., Fry, John, Elamer, Ahmed A. / 2019 March 1915 (has links)
Yes / The main aim of this paper is to investigate how far applying suitably conceived and designed credit scoring models can properly account for the incidence of default and help improve the decision-making process. Four statistical modelling techniques, namely discriminant analysis, logistic regression, multi-layer feed-forward neural network and probabilistic neural network, are used in building credit scoring models for the Indian banking sector. Notably, actual misclassification costs are analysed in preference to estimated misclassification costs. Our first-stage scoring models show that sophisticated credit scoring models, in particular probabilistic neural networks, can help to strengthen the decision-making process by reducing default rates by over 14%. The second stage of our analysis focuses upon the default cases and substantiates the significance of the timing of default. Moreover, our results reveal that state of residence, equated monthly instalment, net annual income, marital status and loan amount are the most important predictive variables. The practical implications of this study are that our scoring models could help banks avoid high default rates, rising bad debts, shrinking cash flows and punitive cost-cutting measures.
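The two-stage structure can be sketched with the simplest of the four techniques, logistic regression: stage one scores the probability of default from applicant characteristics, and stage two then examines only the default cases (in the paper, the timing of default). The data generator, coefficients and cutoff below are hypothetical assumptions, not the paper's fitted model.

```python
import math, random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))   # clamp to avoid overflow
    return 1 / (1 + math.exp(-z))

def fit_logistic(X, y, lr=0.05, epochs=200):
    """Plain stochastic-gradient-descent logistic regression (no regularisation)."""
    w = [0.0] * (len(X[0]) + 1)    # intercept + coefficients
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            w[0] -= lr * err
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * err * xj
    return w

def score(w, xi):
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

# Hypothetical applicants: (scaled net annual income, EMI-to-income ratio).
rng = random.Random(3)
X, y = [], []
for _ in range(300):
    income = rng.uniform(0.2, 2.0)
    emi = rng.uniform(0.1, 0.7)
    p_default = sigmoid(-2 + 4 * emi - income)   # assumed true model
    X.append([income, emi])
    y.append(1 if rng.random() < p_default else 0)

w = fit_logistic(X, y)
# Stage 1: flag applicants whose scored default probability exceeds a cutoff.
rejected = [i for i, xi in enumerate(X) if score(w, xi) > 0.5]
# Stage 2 would analyse the timing of default among the actual default cases.
defaults = [i for i, yi in enumerate(y) if yi == 1]
```

The paper's point about actual versus estimated misclassification costs enters at the cutoff: rather than 0.5, the threshold would be chosen to minimise the bank's real cost of each error type.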
|
17 |
Evaluating and Reducing the Effects of Misclassification in a Sequential Multiple Assignment Randomized Trial (SMART) / He, Jun / 01 January 2018 (has links)
SMART designs tailor individual treatment by re-randomizing patients to subsequent therapies based on their response to initial treatment. However, the classification of patients as responders/non-responders can be inaccurate and thus lead to inappropriate treatment assignment. In a two-step SMART design, assuming equal randomization and equal variances for misclassified and correctly classified patients, we evaluated the effects of misclassification on the mean, variance, and type I error/power of the single sequential treatment outcome (SST), the dynamic treatment regime outcomes (DTRs), and the overall outcome. The results showed that misclassification can introduce bias into estimates of treatment effect for all types of outcome. Though the magnitude of the bias varied across templates, several conclusions held throughout: 1) for any fixed sensitivity, the bias of the mean of the SST for responders approached 0 as specificity increased to 1, and for any fixed specificity, the bias of the mean of the SST for non-responders approached 0 as sensitivity increased to 1; 2) for any fixed specificity, the bias of the mean of the SST for responders was a monotonic nonlinear function of sensitivity, and for any fixed sensitivity, the bias of the mean of the SST for non-responders was a monotonic nonlinear function of specificity; 3) the bias of the variance of the SSTs was always a non-monotone nonlinear function; 4) the variance of the SSTs under misclassification was always over-estimated; 5) the maximum absolute relative bias of the variance of the SSTs was always ¼ of the squared mean difference between misclassified and correctly classified patients divided by the true variance, though this maximum might not be attained for sensitivity and specificity in (0, 1); 6) with respect to sensitivity and specificity, the bias of the mean of the DTRs or of the overall outcomes was always linear, while the bias of their variance was always a non-monotone nonlinear function; 7) the relative bias of the mean or variance of the DTRs or overall outcomes could approach 0 without sensitivity or specificity necessarily being 1. Furthermore, the results showed that misclassification can affect statistical inference: power could be smaller or larger than the planned 80% under misclassification, and showed either a monotonic or non-monotonic pattern as sensitivity or specificity decreased.
To mitigate these adverse effects, patient observations can be weighted by the likelihood that their response was correctly classified. We investigated both normal-mixture-model (NM) and k-nearest-neighbor (KNN) strategies to reduce the bias of the mean and variance and to improve inference on the final-stage outcome. The NM approach estimates each patient's early-stage probability of being a responder by maximizing the likelihood function with the EM algorithm, while KNN estimates these probabilities from the classifications of the k nearest observations. Simulations were used to compare the performance of the two approaches. The results showed that: 1) KNN and NM produced modest reductions in the bias of the point estimates of the SSTs; 2) both strategies reduced the bias of the point estimates of the DTRs when misclassified and correctly classified patients from the same initial treatment had unequal means; 3) NM reduced the bias of the point estimate of the overall outcome more than KNN did; 4) in general, there was little effect on power; 5) type I error should be preserved at 0.05 regardless of misclassification when the same response rate and the same treatment effects among responders (or among non-responders) are assumed, but the observed type I error tended to be less than 0.05; 6) KNN preserved the type I error at 0.05, while NM could inflate it. Thus, although both the KNN and NM strategies usually improved point estimates in SMART designs when misclassification was suspected, the tradeoffs were an increased type I error rate and little effect on power.
Our work showed that misclassification should be considered in SMART designs because it introduces bias, and that the KNN and NM strategies applied at the final stage could not fully remove the bias of the point estimates or improve power. In future work, by adjusting for covariates, these two strategies might be used to improve classification accuracy in the early-stage outcomes.
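The KNN weighting idea can be sketched as follows (an assumed, simplified version, not the dissertation's exact algorithm): each patient's probability of truly being a responder is estimated as the fraction of responders among the k patients with the closest early-stage outcome, and that probability then weights the patient's contribution to the final-stage estimate.

```python
def knn_responder_prob(early, labels, i, k=5):
    """P(responder) for patient i from the k nearest early-stage outcomes."""
    neighbours = sorted(
        (j for j in range(len(early)) if j != i),
        key=lambda j: abs(early[j] - early[i]),
    )[:k]
    return sum(labels[j] for j in neighbours) / k

def weighted_mean(final, weights):
    """Final-stage mean with likely-misclassified patients down-weighted."""
    return sum(f * w for f, w in zip(final, weights)) / sum(weights)

# Hypothetical data: early-stage outcome, observed responder label (0/1),
# and final-stage outcome. Patient 2 is labeled a responder although the
# early-stage outcome clusters with the non-responders.
early  = [1.2, 1.4, 1.3, 3.1, 3.0, 2.9, 1.1, 3.2]
labels = [0,   0,   1,   1,   1,   1,   0,   1  ]
final  = [5.0, 5.5, 9.0, 9.2, 8.8, 9.1, 4.9, 9.3]

w = [knn_responder_prob(early, labels, i, k=3) for i in range(len(early))]
adj = weighted_mean(final, w)
```

In this toy example the inconsistently labeled patient receives weight 0, while patients whose labels agree with their neighbours keep full weight, which is the behaviour the weighting strategy is after.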
|
18 |
Measurement Error and Misclassification in Interval-Censored Life History Data / White, Bethany Joy Giddings / January 2007 (has links)
In practice, data are frequently incomplete in one way or another. It can be a significant challenge to make valid inferences about the parameters of interest in this situation. In this thesis, three
problems involving such data are addressed. The first two problems involve interval-censored life history data with mismeasured
covariates. Data of this type are incomplete in two ways. First, the exact event times are unknown due to censoring. Second, the true covariate is missing for most, if not all, individuals. This work
focuses primarily on the impact of covariate measurement error in progressive multi-state models with data arising from panel (i.e., interval-censored) observation. These types of problems arise frequently in clinical settings (e.g., when disease progression is of interest and patient information is collected during irregularly spaced clinic visits). Two- and three-state models are considered in this thesis. This work is motivated by a research program on psoriatic arthritis (PsA) where the effects of error-prone covariates on rates of disease progression are of interest and patient information is collected at clinic visits (Gladman et al. 1995; Bond et al. 2006). Information regarding the error distributions was available based on results from a separate study conducted to evaluate the reliability of clinical measurements that are used in PsA treatment and follow-up (Gladman et al. 2004). The asymptotic bias of covariate effects obtained ignoring error in covariates is investigated and shown to be substantial in some settings. In a series of simulation studies, the performance of corrected likelihood methods and of methods based on a simulation-extrapolation (SIMEX) algorithm (Cook & Stefanski 1994) was investigated to address covariate measurement error. The methods implemented were shown to result in much smaller empirical biases and in empirical coverage probabilities closer to the nominal levels.
The third problem considered involves an extreme case of interval censoring known as current status data. Current status data arise when individuals are observed only at a single point in time and it is then determined whether they have experienced the event of interest. To complicate matters, in the problem considered here, an unknown proportion of the population will never experience the event of interest. Again, this type of data is incomplete in two ways. One assessment is made on each individual to determine whether or not an event has occurred. Therefore, the exact event times are unknown for those who will eventually experience the event. In addition, whether or not the individuals will ever experience the event is unknown for those who have not experienced the event by the assessment time. This problem was motivated by a series of orthopedic trials looking at the effect of blood thinners in hip and knee replacement surgeries. These blood thinners can cause a negative serological response in some patients. This response was the outcome of interest and the only available information regarding it was the seroconversion time under current status observation. In this thesis, latent class models with parametric, nonparametric and piecewise constant forms of the seroconversion time distribution are described. They account for the fact that only a proportion of the population will experience the event of interest. Estimators based on an EM algorithm were evaluated via simulation and the orthopedic surgery data were analyzed based on this methodology.
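The SIMEX algorithm cited above is simple to illustrate on a linear regression with an error-prone covariate: extra measurement error with variance λσ² is deliberately added for a grid of λ values, the naive estimate is recomputed each time, and a quadratic in λ is extrapolated back to λ = -1, where the error variance would be zero. The data-generating values below are illustrative assumptions.

```python
import math, random

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def quad_extrapolate(lams, betas, target=-1.0):
    """Least-squares quadratic fit beta(lam) = a + b*lam + c*lam^2 at `target`."""
    X = [[1.0, l, l * l] for l in lams]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * b for r, b in zip(X, betas)) for i in range(3)]
    for i in range(3):                      # Gaussian elimination (3x3, well-posed)
        for j in range(i + 1, 3):
            f = xtx[j][i] / xtx[i][i]
            for c in range(3):
                xtx[j][c] -= f * xtx[i][c]
            xty[j] -= f * xty[i]
    coef = [0.0] * 3
    for i in (2, 1, 0):                     # back substitution
        coef[i] = (xty[i] - sum(xtx[i][c] * coef[c] for c in range(i + 1, 3))) / xtx[i][i]
    return coef[0] + coef[1] * target + coef[2] * target * target

# Simulated example (assumed): true slope 2, covariate measured with error.
rng = random.Random(11)
n, sigma_u = 2000, 0.8
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * x + rng.gauss(0, 0.5) for x in x_true]
w = [x + rng.gauss(0, sigma_u) for x in x_true]   # error-prone covariate

naive = slope(w, y)                               # attenuated toward 0
lams = [0.0, 0.5, 1.0, 1.5, 2.0]
betas = []
for lam in lams:
    reps = [slope([wi + rng.gauss(0, math.sqrt(lam) * sigma_u) for wi in w], y)
            for _ in range(20)]                   # B pseudo-data sets per lambda
    betas.append(sum(reps) / len(reps))
simex = slope_hat = quad_extrapolate(lams, betas)  # extrapolate to lambda = -1
```

The SIMEX estimate moves most of the way from the attenuated naive slope back toward the true value of 2; the thesis applies the same added-error-and-extrapolate device within multi-state likelihoods rather than least squares.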
|
19 |
Prediction Performance of Survival Models / Yuan, Yan / January 2008 (has links)
Statistical models are often used for the prediction of future random variables. There are two types of prediction: point prediction and probabilistic prediction. Prediction accuracy is quantified by performance measures, which are typically based on loss functions. We study the estimators of these performance measures, namely the prediction error and performance scores, for point and probabilistic predictors, respectively. The focus of this thesis is to assess the prediction performance of survival models that analyze censored survival times. To accommodate censoring, we extend the inverse probability of censoring weighting (IPCW) method so that arbitrary loss functions can be handled. We also develop confidence interval procedures for these performance measures.
We compare model-based, apparent-loss-based and cross-validation estimators of prediction error under model misspecification and variable selection, for absolute relative error loss (in chapter 3) and misclassification error loss (in chapter 4). Simulation results indicate that cross-validation procedures typically produce reliable point estimates and confidence intervals, whereas model-based estimates are often sensitive to model misspecification. The methods are illustrated for two medical contexts in chapter 5. The apparent-loss-based and cross-validation estimators of performance scores for probabilistic predictors are discussed and illustrated with an example in chapter 6. We also make connections for performance.
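A minimal sketch of the IPCW idea, under assumptions of my own (random censoring, a simple step-function Kaplan-Meier estimator): estimate the censoring survivor function G by Kaplan-Meier, treating censoring as the "event", then weight each uncensored subject's loss by 1/G(t-) so the weighted average compensates for the subjects censoring removed.

```python
def km_censoring(times, events):
    """Kaplan-Meier estimate of the censoring survivor function G.

    events[i] == 1 means the true event occurred; events[i] == 0 means the
    observation was censored (a censoring "event" for this estimator).
    Returns a step function t -> G(t-), the left limit at t."""
    pts = sorted(set(times))
    surv, g = [], 1.0
    for t in pts:
        at_risk = sum(1 for ti in times if ti >= t)
        cens = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 0)
        g *= 1 - cens / at_risk
        surv.append((t, g))
    def G(t):
        g_val = 1.0
        for tt, gg in surv:
            if tt < t:          # strictly before t gives the left limit
                g_val = gg
            else:
                break
        return g_val
    return G

def ipcw_error(times, events, losses):
    """IPCW-weighted mean loss over the uncensored subjects."""
    G = km_censoring(times, events)
    num = sum(l / G(t) for t, e, l in zip(times, events, losses) if e == 1)
    den = sum(1 / G(t) for t, e, l in zip(times, events, losses) if e == 1)
    return num / den

# Toy data: observed time, event indicator, loss of some hypothetical predictor.
times  = [2.0, 3.0, 4.0, 5.0, 7.0, 8.0]
events = [1,   0,   1,   1,   0,   1  ]
losses = [0.2, 0.0, 0.5, 0.1, 0.0, 0.4]
err = ipcw_error(times, events, losses)
```

Subjects observed later receive larger weights because more of their peers have been censored by then; this is what makes the weighted loss estimate consistent under random censoring, and the thesis's extension plugs arbitrary loss functions into the same weighting.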
|