
Generating Faithful and Complete Hospital-Course Summaries from the Electronic Health Record

The rapid adoption of Electronic Health Records (EHRs), electronic versions of a patient's medical history, has been instrumental in streamlining administrative tasks, increasing transparency, and enabling continuity of care across providers. An unintended consequence of the increased documentation burden, however, has been reduced face time with patients and, concomitantly, a dramatic rise in clinician burnout. Time spent maintaining and making sense of a patient's electronic record is a leading cause of burnout. In this thesis, we pinpoint a particularly time-intensive, yet critical, documentation task: generating a summary of a patient's hospital admission, and we propose and evaluate automated solutions. In particular, we focus on faithfulness, i.e., accurately representing the patient record, and completeness, i.e., representing the full context, as the sine qua non for safe deployment of a hospital-course summarization tool in a clinical setting.

The bulk of this thesis is organized into four chapters: §2 Creating and Analyzing the Data, §3 Improving the Faithfulness of Summaries, §4 Measuring the Faithfulness of Summaries, and, finally, §5 Generating Grounded, Complete Summaries with LLMs. Each chapter links back to the core themes of faithfulness and completeness, and the chapters build on one another: the findings of each shape the direction of the next.

Given the documentation authored throughout a patient's hospitalization, hospital-course summarization requires generating a lengthy paragraph that tells the story of the patient's admission. In §2, we construct a dataset based on 109,000 hospitalizations (2M source notes) and perform exploratory analyses to motivate future work on modeling and evaluation [NAACL 2021]. The presence of highly abstractive, entity-dense references, coupled with the high-stakes nature of text generation in a clinical setting, motivates us to focus on faithfulness and adequate coverage of salient medical entities.
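To make "highly abstractive" concrete, a standard diagnostic for this kind of analysis is the extractive fragment coverage and density of Grusky et al. (2018), which measure how much of a reference summary is copied verbatim from its source. The sketch below illustrates that diagnostic; it is not the thesis's actual analysis code, and the whitespace tokenization is a deliberate simplification.

```python
def extractive_fragments(source_tokens, summary_tokens):
    """Greedily match the longest shared fragments between a summary
    and its source (Grusky et al., 2018). Returns the copied fragments."""
    fragments, i = [], 0
    while i < len(summary_tokens):
        best, j = [], 0
        while j < len(source_tokens):
            if summary_tokens[i] == source_tokens[j]:
                i2, j2 = i, j
                # extend the match as far as the tokens keep agreeing
                while (i2 < len(summary_tokens) and j2 < len(source_tokens)
                       and summary_tokens[i2] == source_tokens[j2]):
                    i2, j2 = i2 + 1, j2 + 1
                if i2 - i > len(best):
                    best = summary_tokens[i:i2]
                j = j2
            else:
                j += 1
        if best:
            fragments.append(best)
            i += len(best)
        else:
            i += 1
    return fragments

def coverage_and_density(source, summary):
    """Coverage: fraction of summary tokens lying inside a copied fragment.
    Density: mean squared fragment length (rewards long verbatim copies).
    Assumes a non-empty summary."""
    src, summ = source.lower().split(), summary.lower().split()
    frags = extractive_fragments(src, summ)
    coverage = sum(len(f) for f in frags) / len(summ)
    density = sum(len(f) ** 2 for f in frags) / len(summ)
    return coverage, density
```

Low coverage and density on reference hospital-course summaries would indicate exactly the abstractive regime described above, where faithfulness cannot be guaranteed by copying alone.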

In §3, we address faithfulness from a modeling perspective by revising noisy references [EMNLP 2022] and, to reduce the reliance on references, directly calibrating model outputs to metrics [ACL 2023].
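One common recipe for calibrating outputs to a metric, sketched below, is a BRIO-style contrastive objective: sample several candidate summaries per source, score them with the target metric (e.g., a faithfulness score), and train the model so its length-normalized likelihoods respect the metric's ranking. This is a minimal illustration of that general recipe, not the exact objective or hyperparameters of the ACL 2023 work; the margin scheme and length normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def rank_calibration_loss(cand_logprobs: torch.Tensor,
                          metric_scores: torch.Tensor,
                          margin: float = 0.001) -> torch.Tensor:
    """Margin ranking loss over k candidates for one source document.
    cand_logprobs: (k,) length-normalized sequence log-probs from the model.
    metric_scores: (k,) scores from the calibration metric.
    Candidates the metric prefers should receive higher model likelihood."""
    order = torch.argsort(metric_scores, descending=True)
    lp = cand_logprobs[order]  # log-probs sorted by metric rank
    k = lp.size(0)
    assert k >= 2, "need at least two candidates to form ranking pairs"
    losses = []
    for i in range(k):
        for j in range(i + 1, k):
            # enforce lp[i] >= lp[j] + margin, scaled by the rank gap (j - i)
            losses.append(F.relu(lp[j] - lp[i] + margin * (j - i)))
    return torch.stack(losses).mean()
```

Because the supervision comes from the metric rather than the reference text, this loss can steer generation even when the references themselves are noisy.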

These works relied heavily on automatic metrics because expert human annotations were limited. To fill this gap, in §4, we conduct a fine-grained expert annotation of system errors in order to meta-evaluate existing metrics and better understand task-specific issues of domain adaptation and source-summary alignments. We find that automatically generated summaries can exhibit many errors, including incorrect claims and critical omissions, despite being highly extractive, and that these errors are missed by existing metrics. To learn a metric that is less correlated with extractiveness (copy-and-paste), we derive noisy faithfulness labels from an ensemble of existing metrics and train a faithfulness classifier on these pseudo-labels [MLHC 2023].
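The pseudo-labeling step can be sketched as follows: normalize each metric's scores, keep only the (source, summary) pairs on which the ensemble confidently agrees, and binarize. The thresholds below, and the downstream logistic regression (standing in for a fine-tuned text encoder over the source and summary), are illustrative assumptions rather than the thesis's configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ensemble_pseudo_labels(scores: np.ndarray, hi: float = 0.75, lo: float = 0.25):
    """scores: (n, m) matrix of m metric scores per example, each min-max
    normalized to [0, 1]. Returns a boolean keep-mask and binary labels
    for the confident examples only (1 = faithful, 0 = unfaithful)."""
    mean = scores.mean(axis=1)
    keep = (mean >= hi) | (mean <= lo)   # discard the ambiguous middle band
    labels = (mean[keep] >= hi).astype(int)
    return keep, labels

# Illustrative usage on random scores; real inputs would be outputs of
# existing faithfulness metrics (e.g., entailment- or QA-based scores).
rng = np.random.default_rng(0)
scores = rng.random((1000, 3))
keep, y = ensemble_pseudo_labels(scores)
clf = LogisticRegression().fit(scores[keep], y)  # stand-in student model
```

In practice the student classifier reads the text of the source and summary rather than the metric scores themselves, which is what allows it to decouple from, and improve on, the metrics that produced its labels.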

Finally, in §5, we demonstrate that fine-tuned LLMs (Mistral and Zephyr) are highly prone to entity hallucinations and cover fewer salient entities. We improve both coverage and faithfulness by performing sentence-level entity planning based on a set of pre-computed salient entities from the source text, which extends our work on entity-guided news summarization [ACL 2023; EMNLP 2023].
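Sentence-level entity planning can be sketched as interleaving a per-sentence entity plan with the summary text, so the model first commits to the salient entities a sentence must mention and then realizes the sentence grounded in them. The tags, formatting, and example below are hypothetical (the example is synthetic, not drawn from any patient record); the actual target format in the thesis may differ.

```python
PLAN_TAG, SENT_TAG = "<plan>", "<sent>"  # hypothetical special tokens

def linearize_with_entity_plan(plan_to_sent):
    """plan_to_sent: list of (entities, sentence) pairs, where `entities`
    are pre-computed salient entities from the source notes assigned to
    that sentence. Produces an interleaved fine-tuning target in which
    each summary sentence is preceded by its entity plan."""
    chunks = []
    for entities, sentence in plan_to_sent:
        chunks.append(f"{PLAN_TAG} {' ; '.join(entities)} {SENT_TAG} {sentence}")
    return " ".join(chunks)

# Synthetic illustration of a two-sentence plan-then-generate target:
target = linearize_with_entity_plan([
    (["atrial fibrillation", "metoprolol"],
     "Patient admitted with atrial fibrillation and started on metoprolol."),
    (["echocardiogram"],
     "Echocardiogram showed preserved ejection fraction."),
])
```

Constraining each sentence to a pre-computed entity set attacks both failure modes at once: hallucinated entities fall outside the plan, and omitted salient entities show up as unused plan items.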

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/a95a-1878
Date: January 2024
Creators: Adams, Griffin
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
