41 |
SYNDROMIC SURVEILLANCE FOR THE EARLY DETECTION OF INFLUENZA OUTBREAKS
Rizzo, Sara L 02 February 2006 (has links)
Syndromic surveillance is a new mechanism used to detect both naturally occurring and bioterrorism-related outbreaks. Its public health significance lies in its potential to alert public health officials to outbreaks earlier and allow a timelier response. It involves monitoring data that can be collected in near real time to find anomalous patterns. Syndromic surveillance data sources include school and work absenteeism, over-the-counter drug sales, and hospital admissions, among others. This study assesses an extension of syndromic surveillance, as an improvement over traditional methods, to the detection of more routine public health problems, specifically influenza outbreaks. The assessment involves the prediction of outbreaks in four areas during the period October 15, 2003 to March 31, 2004: Allegheny County, Pennsylvania; Jefferson County, Kentucky; Los Angeles County, California; and Salt Lake County, Utah. Two aspects of community activity were used for syndromic surveillance: over-the-counter pharmaceutical sales and hospital chief complaints. The over-the-counter sales encompassed a panel of six items: anti-diarrheal medication, anti-fever adult medication, anti-fever pediatric medication, cough and cold products, electrolytes, and thermometers. Additionally, two of the seven hospital chief complaint categories used in the RODS open-source paradigm were monitored: constitutional and respiratory chief complaints.
Application of standard statistical algorithms showed that the system was able to identify unusual activity several weeks before the local health departments identified an outbreak using standard methods. The largest improvement in detection using syndromic surveillance occurred in Los Angeles, where the outbreak was detected 52 days before the Centers for Disease Control declared widespread activity for the state. In each county, over-the-counter sales detected the outbreak sooner than hospital chief complaints, but the hospital chief complaints detected the outbreaks more consistently across the various algorithms.
More conclusive evidence regarding the possible improvement in outbreak detection with syndromic surveillance can be obtained once a longer time frame has passed and more historical data have accumulated. Additional studies of influenza outbreaks in other jurisdictions would also be useful.
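The abstract does not name the specific detection algorithms used; as a hedged illustration of the kind of statistical detector commonly applied to daily syndromic counts such as over-the-counter sales or chief-complaint tallies, a one-sided CUSUM chart might look like the sketch below (the reference value, threshold, and simulated counts are illustrative assumptions, not values from the study).

```python
import numpy as np

def cusum_alarms(daily_counts, k=0.5, h=4.0):
    """One-sided CUSUM on standardized daily counts.

    k is the reference value (allowance) and h the decision threshold, both in
    standard-deviation units; an alarm fires when the cumulative statistic
    exceeds h. These defaults are illustrative only.
    """
    x = np.asarray(daily_counts, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)  # in practice a baseline period would set the scale
    s, alarms = 0.0, np.zeros(len(z), dtype=bool)
    for t, zt in enumerate(z):
        s = max(0.0, s + zt - k)        # accumulate only upward departures
        alarms[t] = s > h
    return alarms

# Simulated counts with an upward shift partway through the season.
rng = np.random.default_rng(0)
counts = np.concatenate([rng.poisson(20, 100), rng.poisson(30, 60)])
alarms = cusum_alarms(counts)
print("first alarm on day", int(np.argmax(alarms)) if alarms.any() else "none")
```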
42 |
Sequential therapy in metastatic breast cancer: Survival analysis with time dependent covariates
Vuga, Marike 01 February 2006 (has links)
Metastatic breast cancer, a disease with a high mortality rate among women, is a major public health problem in the United States and other developed countries. This study evaluated the effect of certain treatments within the clinical setting during patients' individual courses of sequential treatments. A database built from clinical data from one practice of the University of Pittsburgh Cancer Institute Breast Cancer Program was used to analyze data from metastatic breast cancer patients receiving sequential therapies. Data from the clinic cohort were available from January 1999 to July 2005.
Taxanes, a specific class of chemotherapeutic agents including Taxol® and Taxotere®, have been demonstrated to be very effective for tumor control and symptom relief in metastatic breast cancer patients. However, it is unclear whether they offer a survival benefit compared to non-taxane compounds. Therefore, survival among patients who received taxane-containing regimens needs to be compared with survival among those who never received them.
The purpose of this study is to investigate the survival benefit of taxanes after the initiation of chemotherapy or hormonal therapy. Hence, survival analyses with time-dependent covariates were employed. The results showed that taxane therapy was beneficial for survival in women with metastatic breast cancer. However, the effect depended strongly on estrogen receptor status. Patients whose metastatic breast cancer was estrogen receptor negative benefited from taxane therapy. In contrast, taxanes showed an adverse effect in patients with estrogen receptor positive cancer. The combination of toxic side effects from the drug, patient characteristics, and the timing of the taxane intervention may have contributed to this finding.
These results will facilitate the development of guidelines for the management of metastatic breast cancer. In the meantime, they will be useful in guiding clinicians' decision-making regarding therapeutic regimens for metastatic breast cancer, providing physicians and health care professionals with an important tool for improving public health.
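As a minimal sketch of a survival analysis with a time-dependent covariate of the kind described above, assuming a hypothetical long-format data set in which a taxane indicator switches on when a taxane-containing regimen begins (the column names, simulated values, and use of the lifelines package are assumptions, not the dissertation's actual data or software):

```python
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# Purely synthetic long-format data: one row per follow-up interval per patient,
# with 'taxane' switching from 0 to 1 once a taxane-containing regimen begins.
rng = np.random.default_rng(0)
rows = []
for pid in range(200):
    er = int(rng.integers(0, 2))         # hypothetical estrogen receptor status
    switch = rng.uniform(30, 300)        # hypothetical start of a taxane regimen
    end = rng.uniform(60, 700)           # end of follow-up
    event = int(rng.random() < 0.6)      # death indicator (no true effect built in)
    if end <= switch:                    # taxane regimen never started
        rows.append((pid, 0.0, end, event, 0, er))
    else:
        rows.append((pid, 0.0, switch, 0, 0, er))
        rows.append((pid, switch, end, event, 1, er))

df = pd.DataFrame(rows, columns=["id", "start", "stop", "event", "taxane", "er_positive"])

ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
ctv.print_summary()
```

An interaction between the taxane indicator and estrogen receptor status, suggested by the findings above, could be added as an extra column before fitting.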
43 |
Assessing Agreement Among Raters And Identifying Atypical Raters Using A Log-Linear Modeling Approach
Kastango, Kari B. 06 June 2006 (has links)
When an outcome is rated by several raters, ensuring consistency across raters increases the reliability of the measurement. Tanner and Young (1985) proposed a general class of log-linear models to assess agreement among K raters using a rating scale with C nominal categories. Their methodology can be used to assess pair-wise agreement among three or more raters. Rogel et al. (1996, 1998) extended this work by assessing various patterns of agreement among rater sub-groups of size K-1. These models can be used to test the assumption of rater exchangeability. Although parameters from these models can be used to identify atypical raters, no formal inferential procedures are available. I propose a formal inferential approach that can be used to test the assumption of rater exchangeability and to identify an atypical rater. The global and heterogeneous partial agreement model is fit to the data and pair-wise comparisons of the K partial agreement parameters are made, adjusting the p-values for the multiple comparisons. The heterogeneous partial agreement parameter that is consistently involved in the statistically significant pair-wise comparisons is singled out. The premise is that, if there is an atypical rater, at least one heterogeneous partial agreement parameter will differ from at least one of the remaining K-1 partial agreement parameters. The approach is illustrated using published data from an intestinal biopsy rating study with six raters (Rogel et al., 1998). The overall Type I error and the power of the inferential approach to correctly identify atypical raters are assessed via simulation with rater sub-groups of size 5. The Bonferroni, Sidak, and Holm step-down procedures (using the Bonferroni and Sidak adjustments) are used to control the overall Type I error. Being able to correctly identify an atypical rater, if present, and to improve the consistency of ratings directly influences the reliability of the measurement and the power of the study for a given sample size. Consequently, more informative studies can be conducted of interventions (e.g., behavioral, medicinal) that may have a significant positive impact on the public's health.
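A hedged sketch of only the multiple-comparison step described above, assuming hypothetical partial agreement parameter estimates and standard errors and using simple Wald z-tests that ignore the covariance between parameter estimates (a simplification of the full log-linear modeling approach):

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests

# Hypothetical heterogeneous partial agreement estimates and standard errors for K = 6 raters;
# rater 4 is constructed to look atypical.
beta = np.array([1.10, 1.05, 1.12, 0.45, 1.08, 1.02])
se = np.full(6, 0.12)

pairs, pvals = [], []
K = len(beta)
for i in range(K):
    for j in range(i + 1, K):
        z = (beta[i] - beta[j]) / np.sqrt(se[i] ** 2 + se[j] ** 2)  # Wald z, covariance ignored
        pairs.append((i + 1, j + 1))
        pvals.append(2 * norm.sf(abs(z)))

# Adjust the pair-wise p-values; 'holm' could be swapped for 'bonferroni' or 'sidak'.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for (i, j), p, r in zip(pairs, p_adj, reject):
    print(f"raters {i} vs {j}: adjusted p = {p:.3f}{'  <- significant' if r else ''}")
```

The rater whose parameter appears in every rejected comparison (here, rater 4) would be flagged as atypical.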
44 |
Latent variable models for longitudinal study with informative missingness
Qin, Li 07 June 2006 (has links)
Missing data are very common in today's public health studies because responses are measured longitudinally. In this dissertation we propose two latent variable models for longitudinal data with informative missingness. In the first approach, a latent variable model is developed for categorical data, dividing the observed data into two latent classes: a 'regular' class and a 'special' class. Outcomes belonging to the regular class can be modeled using logistic regression, while outcomes in the special class take predetermined values. Under the key assumption of conditional independence in latent variable models, the longitudinal responses and the missingness process are independent given the latent classes. The parameters of interest are estimated by maximum likelihood based on this assumption and the correlation between responses. In the second approach, the latent variable in the proposed model is continuous and assumed to be normally distributed with unit variance. In this latent variable model, the values of the latent variable are affected by the missingness patterns, and the latent variable is also a covariate in the model for the longitudinal responses. We use the EM algorithm to obtain the parameter estimates, and Gauss-Hermite quadrature is used to approximate the integral over the latent variable. The covariance matrix of the estimates can be calculated using the bootstrap or obtained from the inverse of the Fisher information matrix of the final marginal likelihood.
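As an illustration of the Gauss-Hermite quadrature step mentioned above, the sketch below approximates an expectation over a standard normal latent variable; the logistic integrand and its coefficients are illustrative assumptions, not the dissertation's model.

```python
import numpy as np

def gh_expectation(f, n_points=20):
    """Approximate E[f(b)] for b ~ N(0, 1) using Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    return np.sum(weights * f(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)

# Illustrative use: the marginal probability of a binary longitudinal response,
# integrating a logistic link over the standard-normal latent variable.
expit = lambda x: 1.0 / (1.0 + np.exp(-x))
p_marginal = gh_expectation(lambda b: expit(-0.3 + 1.2 * b))
print(round(p_marginal, 4))
```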
45 |
A COMPARISON OF LOGISTIC REGRESSION TO RANDOM FORESTS FOR EXPLORING DIFFERENCES IN RISK FACTORS ASSOCIATED WITH STAGE AT DIAGNOSIS BETWEEN BLACK AND WHITE COLON CANCER PATIENTS
Geng, Ming 01 June 2006 (has links)
Introduction: Colon cancer is one of the most common malignancies in America. According to the American Cancer Society, blacks have a lower survival rate than whites. Many previous studies have suggested that this is because blacks are more likely to be diagnosed at a late stage. Hence, it is crucial to determine the factors associated with colon cancer stage at diagnosis.
Objectives: The objectives of this study are twofold: 1) to compare logistic regression modeling to Random Forests classification with respect to the variables selected and classification accuracy; and 2) to evaluate the factors related to colon cancer stage at diagnosis in a population-based study. Many studies have compared Classification and Regression Trees (CART) to logistic regression and found that they have very similar power with respect to the proportion correctly classified and the variables selected. This study extends previous methodological research by comparing Random Forests classification to logistic regression modeling using a relatively small and incomplete dataset.
Methods and Materials: The data used in this research were from the National Cancer Institute Black/White Cancer Survival Study, which included 960 cases of invasive colon cancer. Stage at diagnosis was used as the dependent variable for fitting logistic regression models and Random Forests classification to multiple potential explanatory variables, some of which contained missing data.
Results: The odds ratio (blacks vs. whites) decreased from 1.628 (95% CI: 1.068-2.481) to 1.515 (95% CI: 0.920-2.493) after adjustment for patient delay in diagnosis, occupation, histology, and grade of tumor. Race was no longer important after these variables were entered into the Random Forests model. These four variables were identified as the most important variables associated with the racial disparity in colon cancer stage at diagnosis by both logistic regression and Random Forests. The correct classification rate was 47.9% using logistic regression and 33.9% using Random Forests.
Conclusion: 1) Logistic regression and Random Forests had very similar power in variable selection. 2) Logistic regression had higher classification accuracy than Random Forests with respect to the overall correct classification rate.
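A minimal sketch of the kind of comparison described above, using scikit-learn's LogisticRegression and RandomForestClassifier on synthetic stand-in data (the covariates, outcome coding, and cross-validated accuracy comparison are assumptions; the study's handling of missing values is not reproduced here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(960, 8))       # stand-ins for the study covariates
y = rng.integers(0, 4, size=960)    # stand-in for a four-level stage at diagnosis

logit = LogisticRegression(max_iter=1000)
forest = RandomForestClassifier(n_estimators=500, random_state=0)

# Compare cross-validated overall correct classification rates.
print("logistic regression accuracy:", cross_val_score(logit, X, y, cv=5).mean())
print("random forest accuracy:      ", cross_val_score(forest, X, y, cv=5).mean())

# Variable importance from the forest, analogous to inspecting which covariates matter.
forest.fit(X, y)
print("variable importances:", np.round(forest.feature_importances_, 3))
```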
46 |
Simulation of meta-analysis for assessing the impact of study variability on parameter estimates for survival data
Karpova, Irina 01 June 2006 (has links)
Meta-analysis is a statistical method of public health relevance that is used to combine the results of individual studies evaluating the same treatment effect. A test commonly used to decide whether the results are homogeneous, and thereby to determine the model choice for a meta-analysis, is Cochran's Q-test. A major drawback of the Q-test, when the outcomes are normally distributed, is its low power when the number of studies is small and its excessive power when the number of studies is large.
In this thesis, we propose Cochran's Q-test for survival analysis data. Using simulations, we examine how the power of Cochran's test changes with different numbers of studies, different weight allocations per study, and different amounts of censored observations. We show that the power increases with an increasing number of studies, but decreases with an increasing number of censored observations and whenever one study comprises a large proportion of the total weight. We conclude that the test of heterogeneity should not be considered the only determinant of model choice for meta-analysis. Other methods, such as graphical exploration, stratified analysis, or regression modeling, should be used in conjunction with the formal statistical test.
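As a worked sketch of the Q statistic itself, computed from hypothetical study-level effect estimates with inverse-variance weights and referred to a chi-square distribution with k-1 degrees of freedom (the survival-data version proposed in the thesis is not reproduced here):

```python
import numpy as np
from scipy.stats import chi2

def cochran_q(theta, var):
    """Cochran's Q for study-level effect estimates with inverse-variance weights."""
    theta, var = np.asarray(theta, float), np.asarray(var, float)
    w = 1.0 / var
    theta_bar = np.sum(w * theta) / np.sum(w)     # weighted pooled estimate
    q = np.sum(w * (theta - theta_bar) ** 2)      # heterogeneity statistic
    df = len(theta) - 1
    return q, chi2.sf(q, df)                      # compare Q to chi-square with k-1 df

# Hypothetical log hazard ratios and their variances from five studies.
q, p = cochran_q([0.10, 0.25, -0.05, 0.30, 0.12], [0.04, 0.05, 0.03, 0.06, 0.04])
print(f"Q = {q:.2f}, p = {p:.3f}")
```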
47 |
A Review and Comparison of Methods for Detecting Outliers in Univariate Data Sets
Seo, Songwon 09 August 2006 (has links)
Most real-world data sets contain outliers, observations with unusually large or small values compared with the rest of the data. Outliers may distort data analyses that rest on distributional assumptions, such as ANOVA and regression, or may provide useful information about the data when an unusual response to a given study is of interest. Thus, outlier detection is an important part of data analysis in both cases. Several outlier labeling methods have been developed. Some methods, such as the SD method, are sensitive to extreme values, while others, such as Tukey's method, are resistant to them. Although these methods are quite powerful with large, normally distributed data, it may be problematic to apply them to non-normal data or small sample sizes without knowledge of their characteristics in these circumstances. This is because each labeling method uses a different measure to detect outliers, and the expected outlier percentage changes differently with the sample size or distribution type of the data.
Many kinds of public health data are skewed, usually to the right, and lognormal distributions can often be applied to such skewed data, for instance surgical procedure times, blood pressure, and assessments of toxic compounds in environmental analysis. This paper reviews and compares several common and less common outlier labeling methods and, through simulations and application to real data sets, shows how the percentage of detected outliers changes in each method according to the skewness and sample size of lognormal distributions. These results may help establish guidelines for the choice of outlier detection methods for skewed data, which are often seen in the public health field.
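A hedged sketch contrasting the two labeling rules named above, the SD method and Tukey's method, applied to a simulated lognormal sample (the multipliers and sample size are illustrative defaults, not the paper's simulation settings):

```python
import numpy as np

def sd_outliers(x, k=3.0):
    """SD method: flag points more than k standard deviations from the mean."""
    x = np.asarray(x, float)
    return np.abs(x - x.mean()) > k * x.std(ddof=1)

def tukey_outliers(x, k=1.5):
    """Tukey's method: flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

# A right-skewed (lognormal) sample, as in the skewed public health data discussed above.
rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)
print("SD method flags:   ", int(sd_outliers(x).sum()))
print("Tukey method flags:", int(tukey_outliers(x).sum()))
```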
48 |
REANALYSIS OF THE NATIONAL CANCER INSTITUTE'S ACRYLONITRILE COHORT STUDY BY IMPUTATION OF MISSING SMOKING INFORMATION
Cunningham, Michael 27 July 2006 (has links)
A cohort study of workers exposed to the chemical acrylonitrile (AN) was carried out in the late 1980s by the National Cancer Institute (NCI) to determine whether there were any excess cancer risks associated with workplace exposure to AN. The results of the study did not show overwhelming evidence that AN exposure was related to increased cancer risk, but did yield several results worth noting. First, the authors reported an overall lung cancer relative risk of 3.6 for ever-smokers versus never-smokers, which appeared to be much too low. Second, there was a slight increase in the lung cancer relative risk in the upper quintile of cumulative AN exposure. Lastly, a large proportion of smoking information was missing for the employees selected in the sample.
Because results of occupational cohort studies such as the NCI's are used as the basis for determining health risks associated with workplace exposures, and because acrylonitrile is widely used in the manufacture of plastics, it is very important from a public health perspective to eliminate any possible sources of confounding or bias. The goal of this reanalysis is to address the issues of missing smoking information and the low overall lung cancer relative risk in ever-smokers, in order to determine whether the slight excess in the highest AN exposure category appears to be valid. This was accomplished using imputation, a procedure that predicts a smoking status for subjects with missing values based on the complete observations. The NCI analyses were then repeated with the imputed data to see whether there were any differences in the overall smoking-related lung cancer RR or the lung cancer RR in the upper quintile of AN exposure.
The overall lung cancer RR due to smoking could not be increased dramatically using the weighting schemes in this paper. Also, the lung cancer RRs in the upper quintile of AN exposure were not much lower than those in the original NCI study, so the original analysis with the missing smoking information does not appear to have been biased. However, the smoking-adjusted lung cancer RRs for cumulative AN exposure using the imputed data show a much flatter exposure-response trend than the NCI analysis, which, combined with the only slightly elevated RR in the upper exposure group, could be taken as evidence against an increased lung cancer risk due to high AN exposure.
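A minimal sketch of the generic idea of imputing a missing smoking status from complete observations, assuming hypothetical covariates and a logistic-regression imputation model with stochastic draws (the NCI reanalysis's actual weighting schemes are not reproduced here):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical cohort: age and job group observed for everyone, smoking missing for ~30%.
n = 500
df = pd.DataFrame({"age": rng.normal(50, 10, n),
                   "blue_collar": rng.integers(0, 2, n)})
true_smoker = (rng.random(n) < 0.6).astype(int)
observed = rng.random(n) > 0.3
df["ever_smoker"] = np.where(observed, true_smoker, np.nan)

# Fit an imputation model on the complete observations, then draw imputed statuses.
obs = df["ever_smoker"].notna()
model = LogisticRegression(max_iter=1000).fit(df.loc[obs, ["age", "blue_collar"]],
                                              df.loc[obs, "ever_smoker"])
p = model.predict_proba(df.loc[~obs, ["age", "blue_collar"]])[:, 1]
df.loc[~obs, "ever_smoker"] = (rng.random(p.size) < p).astype(int)  # stochastic draw

print("imputed smoking prevalence:", round(df["ever_smoker"].mean(), 3))
```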
49 |
IDENTIFICATION AND ASSESSMENT OF LONGITUDINAL BIOMARKERS USING FRAILTY MODELS IN SURVIVAL ANALYSIS
Ko, Feng-shou 25 September 2006 (has links)
A biomarker is a measurement which can be used as a predictor or sometimes even a surrogate for a biological endpoint that directly measures a patient's disease or survival status. Biomarkers are often measured over time and so are referred to as longitudinal biomarkers. Biomarkers are of public health interest because they can provide early detection of life threatening or fatal diseases.
It is important in public health to be able to identify biomarkers that predict survival, because doing so can reduce the time and cost necessary to resolve the study question or can identify subsets of patients who would be appropriate candidates for a targeted therapy. In this dissertation, we introduce a method employing a frailty model to identify longitudinal biomarkers or surrogates for a time-to-event outcome. Our method extends earlier work by Wulfsohn, Tsiatis, and Song in which the event times were assumed to share the same baseline hazard. In our method, we allow random effects in both the longitudinal biomarker and the underlying survival function. The random effect in the biomarker is introduced via an explicit term, while the random effect in the underlying survival function is introduced through frailty parameters in the model. We use simulations to explore how the number of individuals, the number of time points per individual, and the functional form of the random effects in the longitudinal biomarker influence the power to detect an association between the longitudinal biomarker and survival time. We also explore the effect of missingness on how well a biomarker predicts a time-to-event outcome. We conclude that, for a given sample size, biomarker effectiveness is better with relatively few subjects and many observed time points than with relatively many subjects and few observed time points. We also conclude that when the missing data mechanism is missing at random (MAR), our method works reasonably well. However, when the missing data mechanism is non-ignorable, our method does not perform well in determining whether potential biomarkers are good predictors of a time-to-event outcome. Finally, we apply our method to liver cirrhosis data and conclude that prothrombin is a good predictor of time to liver cirrhosis and thus can be used as a potential surrogate for liver failure.
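A stripped-down data-generating sketch of the shared-random-effect and frailty structure described above, in which a subject-level random effect enters both the longitudinal biomarker and the event-time hazard; the distributions, coefficients, and the use of a time-constant hazard are simplifying assumptions, not the dissertation's model or estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(3)
n, times = 200, np.array([0.0, 0.5, 1.0, 1.5, 2.0])

b = rng.normal(0.0, 1.0, n)                  # subject-level random effect shared by both parts
u = rng.gamma(shape=2.0, scale=0.5, size=n)  # frailty with mean 1 and variance 0.5

# Longitudinal biomarker: linear time trend plus the shared random effect plus noise.
biomarker = 1.0 + 0.4 * times[None, :] + b[:, None] + rng.normal(0.0, 0.3, (n, len(times)))

# Event times: exponential hazard scaled by the frailty and the shared random effect.
rate = 0.2 * u * np.exp(0.8 * b)
event_time = rng.exponential(1.0 / rate)
censor_time = rng.uniform(0.5, 3.0, n)
observed_time = np.minimum(event_time, censor_time)
event = (event_time <= censor_time).astype(int)

print("events observed:", int(event.sum()), "of", n)
```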
50 |
A Strategy for Stepwise Regression Procedures in Survival Analysis with Missing Covariates
Li, Jia 25 September 2006 (has links)
The selection of variables used to predict a time-to-event outcome is a common and important issue in the analysis of survival data and an essential step in accurately assessing risk factors in medical and public health studies. Omitting an important variable from a regression model may result in biased and inefficient estimates. Such bias can have major implications in public health studies because it may cause potential risk factors to be falsely declared associated with an outcome, such as mortality, or, conversely, falsely declared not associated with it. Stepwise regression procedures are widely used for model selection; however, they have inherent limitations and can lead to unreasonable results when there are missing values in the potential covariates.
In the first part of this dissertation, multiple imputation is used to deal with missing covariate information. We review two powerful imputation procedures, Multiple Imputation by Chained Equations (MICE) and estimation/multiple imputation for Mixed categorical and continuous data (MIX), which implement different multiple imputation methods. We compare the performance of these two procedures by assessing bias, efficiency, and robustness in several simulation studies using time-to-event outcomes. Practical limitations and valuable features of the two procedures are also assessed. In the second part of the dissertation, we use imputation together with a criterion called the Brier Score to formulate an overall stepwise model selection strategy. The strategy has the advantage of enabling one to perform model selection and evaluate the predictive accuracy of the selected model at the same time, while taking into account the missing values in the covariates. This comprehensive strategy is implemented by defining a Weighted Brier Score (WBS) using weighted survival functions. We use simulations to assess the strategy and further demonstrate its use by analyzing survival data from the National Surgical Adjuvant Breast and Bowel Project (NSABP) Protocol B-06.
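A hedged sketch of two ingredients of the strategy, chained-equations imputation and an unweighted Brier score at a fixed time point, using scikit-learn's IterativeImputer as a MICE-style stand-in (the dissertation's Weighted Brier Score adds censoring weights via weighted survival functions, which are not reproduced here):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Chained-equations imputation of missing covariates (a MICE-style single imputation;
# repeating with different random states yields multiple completed data sets).
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))
X[rng.random(X.shape) < 0.15] = np.nan          # roughly 15% missing values
X_completed = IterativeImputer(random_state=0).fit_transform(X)

def brier_at_time(t, time, event, pred_surv):
    """Unweighted Brier score at time t; pred_surv is the predicted probability of
    surviving beyond t. The dissertation's WBS would add censoring weights here."""
    time, event, pred_surv = map(np.asarray, (time, event, pred_surv))
    usable = (time > t) | ((time <= t) & (event == 1))   # status at t is known
    status = (time[usable] > t).astype(float)            # 1 if still event-free at t
    return float(np.mean((status - pred_surv[usable]) ** 2))

print(brier_at_time(2.0, time=[1.0, 3.0, 2.5, 0.8], event=[1, 0, 0, 1],
                    pred_surv=[0.4, 0.8, 0.7, 0.3]))
```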