Fine-tuning and evaluating a Swedish language model for automatic discharge summary generation from Swedish clinical notes

Background: Healthcare professionals spend a large amount of time on documentation tasks in contemporary healthcare. One such task is writing the discharge summary, which summarizes a care episode. However, research shows that many discharge summaries written today are of poor quality. One approach with the potential to alleviate this situation is natural language processing, specifically text summarization, since it could automatically summarize patient notes into a discharge summary.

Aim: This thesis aims to provide initial knowledge on summarizing Swedish clinical text into discharge summaries. More specifically, it aims to provide knowledge on performing such summarization with the Stockholm EPR Gastro ICD-10 Pseudo Corpus II, a dataset of Swedish electronic health record data.

Method: Following the design science framework, an artefact was produced in the form of a model, based on a pre-trained Swedish BART model, that can summarize patient notes into a discharge summary. The model was developed with the Hugging Face library and evaluated both with ROUGE scores and through a manual evaluation performed by a now-retired healthcare professional.

Results: The discharge summaries that the artefact model produced for a test set achieved ROUGE-1/2/L/S scores of 0.280/0.057/0.122/0.068. The manual evaluation implies that the artefact is prone to failing to include clinically important information accurately, that it produces text with low readability, and that it is very prone to producing severe hallucinations.

Conclusion: In terms of ROUGE scores, the artefact performs worse than previous studies on summarizing patient notes into discharge summaries. The manual evaluation suggests several shortcomings in the artefact's ability to accurately summarize a care episode.
Since this was the first major work conducted on the topic of text summarization using the Stockholm EPR Gastro ICD-10 Pseudo Corpus II dataset, there are many possible directions for future works.
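The results above are reported as ROUGE-1, ROUGE-2, ROUGE-L and ROUGE-S scores. To illustrate what these metrics measure, here is a minimal, self-contained sketch that computes each one as an F1 score over token overlap between a generated summary and a reference summary. This is not the evaluation code used in the thesis. The function names are hypothetical, and several choices are assumptions: whitespace tokenization, no stemming, F1 rather than another F-measure weighting, and an unlimited skip distance for ROUGE-S. Established ROUGE implementations can differ on each of these points, so numbers from this sketch are not directly comparable to the reported scores.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision (vs. candidate) and recall (vs. reference)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap (Counter '&' takes the minimum count)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1: LCS length relative to candidate and reference lengths."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


def skip_bigrams(tokens):
    """All in-order token pairs, with no limit on the gap (an assumption)."""
    return Counter(
        (tokens[i], tokens[j])
        for i in range(len(tokens))
        for j in range(i + 1, len(tokens))
    )


def rouge_s(candidate, reference):
    """ROUGE-S F1: clipped skip-bigram overlap."""
    cand, ref = skip_bigrams(candidate), skip_bigrams(reference)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


# Hypothetical toy example (not from the thesis data).
reference = "the patient was discharged in good condition".split()
candidate = "the patient was sent home in good condition".split()
print(round(rouge_n(candidate, reference, 1), 3))  # 0.8
print(round(rouge_n(candidate, reference, 2), 3))  # 0.615
print(round(rouge_l(candidate, reference), 3))     # 0.8
print(round(rouge_s(candidate, reference), 3))     # 0.612
```

In the toy example, swapping one word pair changes little at the unigram and LCS level but noticeably lowers ROUGE-2 and ROUGE-S, which is one reason the bigram-based scores tend to be much lower than ROUGE-1, as in the reported 0.280 versus 0.057.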

http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-228684

Patient Discharge Summaries

text summarization

Natural Language Processing

Transformer

BART

Other Computer and Information Science

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:su-228684
Date	January 2023
Creators	Berg, Nils
Publisher	Stockholms universitet, Institutionen för data- och systemvetenskap
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess