
Exploring Patient Classification Based on Medical Records: The case of implant bearing patients

In this thesis, the application of transformer-based models to the real-world task of identifying patients as implant bearing is investigated. The task is approached as a classification task, and five transformer-based models relying on the BERT architecture are implemented, along with a Support Vector Machine (SVM) as a baseline for comparison. The models are fine-tuned on Swedish medical texts, i.e. patients' medical histories. The five transformer-based models make use of two pre-trained BERT models: one released by the National Library of Sweden, and a second that starts from the same model but has been further pre-trained on domain-specific language. These are in turn fine-tuned using five different architectures: (1) a typical BERT model, (2) GAN-BERT, (3) RoBERT, (4) chunkBERT, and (5) a frequency-based optimized BERT. The final classifier, the SVM baseline, is trained using TF-IDF as the feature space. The data used in the thesis comes from a subset of an unreleased corpus from four Swedish clinics covering a span of five years. The subset contains electronic medical records of patients belonging to the radiology and cardiology clinics. Four training sets were created, containing 100, 200, 300, and 903 labelled records respectively. The test set, containing 300 labelled samples, was also created from this subset. The labels on which the models are trained are created by labelling patients as implant bearing based on the number of implant terms each patient history contains. The results are promising and show favourable performance when classifying the patient histories. Models trained on 903 and 300 samples are able to outperform the baseline, and at their peak, BERT, chunkBERT, and the frequency-based optimized BERT achieve an F1-measure of 0.97.
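The term-count labelling described above could be sketched roughly as follows. Note that the thesis does not specify its term list or threshold; the values below are illustrative assumptions only.

```python
# Hypothetical sketch of the labelling heuristic: a record is marked
# implant bearing when it mentions enough implant terms.
# IMPLANT_TERMS and THRESHOLD are assumptions, not the thesis's values.
IMPLANT_TERMS = {"pacemaker", "stent", "protes", "implantat"}
THRESHOLD = 2  # assumed minimum number of term occurrences

def label_record(text: str, terms=IMPLANT_TERMS, threshold=THRESHOLD) -> int:
    """Return 1 (implant bearing) or 0, based on implant-term counts."""
    tokens = text.lower().split()
    hits = sum(1 for tok in tokens if any(term in tok for term in terms))
    return 1 if hits >= threshold else 0

print(label_record("Patient har pacemaker; kontroll av pacemaker utförd."))  # 1
```

Substring matching is used here only because Swedish compounds and inflections (e.g. "pacemakerkontroll") would otherwise be missed by exact token comparison.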
When trained using 100 and 200 labelled records, all of the transformer-based models are outperformed by the baseline, except for the semi-supervised GAN-BERT, which achieves competitive scores with 200 records. There is no clear delineation between using the pre-trained BERT and the BERT model with additional pre-training on domain-specific language; since the results are inconclusive, further research could shed additional light on this question.

Patient-Safe Magnetic Resonance Imaging Examination by AI-based Medical Screening
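As a rough illustration of the TF-IDF feature space used by the SVM baseline, here is a minimal pure-Python sketch; a real system would use an optimised library implementation, and the thesis's exact preprocessing is not specified here.

```python
import math
from collections import Counter

def tfidf(corpus):
    """Return one {term: weight} dict per document, using
    tf = count / doc_length and idf = log(n_docs / doc_frequency)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

vecs = tfidf(["pacemaker inopererad", "ingen anmärkning", "pacemaker kontroll"])
# "pacemaker" appears in 2 of 3 documents, so its idf is log(3/2),
# while terms unique to a single document get the larger idf log(3).
```

Terms that occur in many patient histories are thus down-weighted, while terms characteristic of a few records (such as specific implant mentions) receive larger weights, which is what makes TF-IDF a reasonable feature space for the SVM baseline.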

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-187214
Date January 2022
Creators Danielsson, Benjamin
Publisher Linköpings universitet, Artificiell intelligens och integrerade datorsystem
Source Sets DiVA Archive at Upsalla University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
