
Inferring Race and Ethnicity from Clinical Notes: Annotation, Model Auditing, and Ethical Implications

Many areas of clinical informatics research rely on accurate and complete race and ethnicity (RE) patient information, such as estimating disease risk, assessing quality and performance metrics, and identifying health disparities. Structured data in the electronic health record (EHR) is an easily accessible source of patient-level information; however, RE information is often missing or inaccurate in structured EHR data. Furthermore, current federal standards for RE categories have been acknowledged as inadequate and in need of more granular realizations.

Although more difficult to extract information from, clinical notes provide a rich, nuanced, and subjective source of information that can be leveraged to increase granularity and/or recover RE information missing from structured data. State-of-the-art clinical natural language processing (NLP) approaches can enable researchers to extract RE information from clinical notes; however, NLP has also been shown to inherit, exacerbate, and create new biased and harmful associations, especially in modern deep learning approaches. This thesis explores the relationships between direct and indirect explicit mentions of RE and human-annotated RE inferences in clinical text, and leverages an approach to audit deep NLP models for their learned associations.

We develop gold-standard annotations for information related to RE (RE indicators) and for RE labels. We use four RE indicators: country of origin, spoken language, direct race mention, and direct ethnicity mention. We find high agreement between annotators on RE label assignments, and that sentences assigned to different RE categories have drastically different distributions of RE indicators. Furthermore, we find high agreement between structured and unstructured sources of RE information, and that unstructured data can be used to recover RE information missing from structured data.
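
As an illustrative sketch only (not taken from the thesis), sentence-level inter-annotator agreement on RE labels could be summarized with a chance-corrected statistic such as Cohen's kappa; the annotators and labels below are hypothetical.

# Hypothetical example: chance-corrected agreement between two annotators
# on sentence-level RE labels, using scikit-learn's Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["Black", "White", "None", "Hispanic/Latino", "None"]
annotator_b = ["Black", "White", "None", "None", "None"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")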

Leveraging the gold-standard RE annotations, we train a model to label sentences with RE information and audit the model to examine the alignment between its salient features and the RE indicators. While our models attain good classification performance, this does not translate into high overlap with the RE indicators. We find evidence of learned associations that range from benign mistakes, to associations that are helpful but not strictly correct, to mistakes that could be harmful if not addressed in future work.
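
One way to quantify the alignment described above, sketched here under the assumption that the audit compares a model's highest-attribution tokens against annotated indicator spans (the function and example tokens are illustrative, not the thesis's actual pipeline), is a simple top-k overlap measure.

# Illustrative sketch: fraction of a model's top-k most salient tokens that
# fall within human-annotated RE indicator spans for a given sentence.
def indicator_overlap(salient_tokens, indicator_tokens, k=5):
    top_k = {t.lower() for t in salient_tokens[:k]}
    indicators = {t.lower() for t in indicator_tokens}
    return len(top_k & indicators) / len(top_k) if top_k else 0.0

# Hypothetical tokens ranked by attribution score vs. annotated indicator tokens.
salient = ["spanish", "speaks", "patient", "denies", "pain"]
annotated = ["spanish", "speaks"]
print(indicator_overlap(salient, annotated, k=3))  # ~0.67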

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/mst0-g412
Date: January 2022
Creators: Bear Don't Walk, Oliver J.
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
