Structured electronic health record (EHR) data are commonly incomplete and can lack diagnostic detail. Clinical reports, on the other hand, are typically comprehensive and contain a wealth of detailed medical information. Pathologists invest considerable time and specialized training to create information-rich pathology reports, but the necessary manual review of these reports for clinical or research use is a high barrier to their routine utilization. The automated extraction of clinical targets directly from pathology reports would allow for the structured aggregation of relevant patient data that are not currently routinely captured in the EHR. In this dissertation, I apply recently developed transformer models to predict clinical targets from cancer pathology report text.
In the first chapter, I present a pathology report corpus that I fully processed and made publicly available, and perform a proof-of-concept cancer type classification. In the second chapter, I discuss a set of cancer stage classification models that I fine-tune on the pathology report corpus and then externally validate on reports from Columbia University Irving Medical Center (CUIMC).
In the last chapter, I explore additional applications of this methodology: developing a generalizable model to classify prostate cancer reports into primary Gleason score categories; applying a transformer model to classify reports into diagnosis categories for a Barrett’s esophagus patient cohort in a low-data environment; and performing a proof-of-concept prediction of adverse drug events from 1D drug representations.
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/52zw-1w57 |
Date | January 2024 |
Creators | Kefeli, Jenna |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |