Return to search

Optimal Data-driven Methods for Subject Classification in Public Health Screening

Biomarker testing, wherein the concentration of a biochemical marker is measured to predict the presence or absence of a certain binary characteristic (e.g., a disease) in a subject, is an essential component of public health screening. For many diseases, the concentration of disease-related biomarkers may exhibit a wide range, particularly among the disease positive subjects, in part due to variations caused by external and/or subject-specific factors. Further, a subject's actual biomarker concentration is not directly observable by the decision maker (e.g., the tester), who has access only to the test's measurement of the biomarker concentration, which can be noisy. In this setting, the decision maker needs to determine a classification scheme in order to classify each subject as test negative or test positive. However, the inherent variability in biomarker concentrations and the noisy test measurements can increase the likelihood of subject misclassification.

We develop an optimal data-driven framework, which integrates optimization and data analytics methodologies, for subject classification in disease screening, with the aim of minimizing classification errors. In particular, our framework utilizes data analytics methodologies to estimate the posterior disease risk of each subject, based on both subject-specific and external factors, coupled with robust optimization methodologies to derive an optimal robust subject classification scheme, under uncertainty on actual biomarker concentrations. We establish various key structural properties of optimal classification schemes, show that they are easily implementable, and develop key insights and principles for classification schemes in disease screening.

As one application of our framework, we study newborn screening for cystic fibrosis in the United States. Cystic fibrosis is one of the most common genetic diseases in the United States. Early diagnosis of cystic fibrosis can substantially improve health outcomes, while a delayed diagnosis can result in severe symptoms of the disease, including fatality. We demonstrate our framework on a five-year newborn screening data set from the North Carolina State Laboratory of Public Health. Our study underscores the value of optimization-based approaches to subject classification, and show that substantial reductions in classification error can be achieved through the use of the proposed framework over current practices. / Doctor of Philosophy / A biomarker is a measurable characteristic that is used as an indicator of a biological state or condition, such as a disease or disorder. Biomarker testing, where a biochemical marker is used to predict the presence or absence of a disease in a subject, is an essential tool in public health screening. For many diseases, related biomarkers may have a wide range of concentration among subjects, particularly among the disease positive subjects. Furthermore, biomarker levels may fluctuate based on external factors (e.g., temperature, humidity) or subject-specific characteristics (e.g., weight, race, gender). These sources of variability can increase the likelihood of subject misclassification based on a biomarker test.

We develop an optimal data-driven framework, which integrates optimization and data analytics methodologies, for subject classification in disease screening, with the aim of minimizing classification errors. We establish various key structural properties of optimal classification schemes, show that they are easily implementable, and develop key insights and principles for classification schemes in disease screening.

As one application of our framework, we study newborn screening for cystic fibrosis in the United States. Cystic fibrosis is one of the most common genetic diseases in the United States. Early diagnosis of cystic fibrosis can substantially improve health outcomes, while a delayed diagnosis can result in severe symptoms of the disease, including fatality. As a result, newborn screening for cystic fibrosis is conducted throughout the United States. We demonstrate our framework on a five-year newborn screening data set from the North Carolina State Laboratory of Public Health. Our study underscores the value of optimization-based approaches to subject classification, and show that substantial reductions in classification error can be achieved through the use of the proposed framework over current practices.

Identiferoai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/101611
Date01 July 2019
CreatorsSadeghzadeh, Seyedehsaloumeh
ContributorsIndustrial and Systems Engineering, Bish, Ebru K., Zimmerman, Scott J., Chen, Xi, Bish, Douglas R.
PublisherVirginia Tech
Source SetsVirginia Tech Theses and Dissertation
Detected LanguageEnglish
TypeDissertation
FormatETD, application/pdf
RightsIn Copyright, http://rightsstatements.org/vocab/InC/1.0/

Page generated in 0.0013 seconds