Extracting Clinical Findings from Swedish Health Record Text

Information contained in the free text of health records is useful for the immediate care of patients as well as for medical knowledge creation. Advances in clinical language processing have made it possible to extract this information automatically, but until recently most research has been conducted on clinical text written in English. This thesis instead explores information extraction from Swedish clinical corpora, focusing in particular on the extraction of clinical findings. Unlike in most previous studies, the Clinical Finding category was divided into two more granular sub-categories: Finding (a symptom or the result of a medical examination) and Disorder (a condition with an underlying pathological process). For detecting clinical findings mentioned in Swedish health record text, a machine learning model trained on a corpus of manually annotated text achieved results in line with the obtained inter-annotator agreement figures. The machine learning approach clearly outperformed an approach based on vocabulary mapping, showing that Swedish medical vocabularies are not extensive enough for high-quality information extraction from clinical text. An approach based on rules and cue vocabularies was, however, successful for negation and uncertainty classification of detected clinical findings. Methods for facilitating the expansion of medical vocabulary resources are particularly important for Swedish and other languages with less extensive vocabulary resources. The possibility of using distributional semantics, in the form of random indexing, for semi-automatic expansion of medical vocabularies was therefore evaluated. Distributional semantics does not require that terms or abbreviations be explicitly defined in the text, and it is therefore a method well suited to clinical corpora. Random indexing was shown to be useful for extending vocabularies with medical terms, as well as for extracting medical synonyms and abbreviation dictionaries.
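The idea behind random indexing can be illustrated with a minimal sketch. This is not the thesis implementation: the dimensionality, the number of non-zero entries, the window size, the function names, and the toy English sentences are all assumptions made for illustration. Each word gets a sparse random index vector, and a word's context vector is the sum of the index vectors of its neighbours, so words that occur in similar contexts end up with similar context vectors without ever being defined in the text.

```python
import math
import random
from collections import defaultdict

DIM = 200      # dimensionality of the reduced space (assumed value)
NONZERO = 8    # number of +1/-1 entries per index vector (assumed value)
WINDOW = 2     # context window on each side of a word (assumed value)


def index_vector(rng):
    """Sparse ternary index vector: a few +1/-1 entries, the rest 0."""
    vec = [0] * DIM
    for i, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def train(sentences, seed=0):
    """Sum the index vectors of neighbouring words into context vectors."""
    rng = random.Random(seed)
    index = {}
    context = defaultdict(lambda: [0] * DIM)
    for tokens in sentences:
        for tok in tokens:
            if tok not in index:
                index[tok] = index_vector(rng)
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx = context[tok]
                    for k, value in enumerate(index[tokens[j]]):
                        ctx[k] += value
    return dict(context)


def cosine(a, b):
    """Cosine similarity between two context vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy, invented sentences; real input would be tokenized clinical text.
sentences = [
    ["patient", "has", "severe", "headache", "today"],
    ["patient", "has", "severe", "cephalalgia", "today"],
    ["x-ray", "shows", "no", "fracture", "in", "arm"],
]
vectors = train(sentences)
sim = cosine(vectors["headache"], vectors["cephalalgia"])
other = cosine(vectors["headache"], vectors["fracture"])
print(sim, other)
```

In this toy corpus "headache" and "cephalalgia" share exactly the same neighbours, so they rank as near-synonym candidates, while "fracture" does not. In a semi-automatic setting, such ranked candidates would be reviewed by a person before being added to a vocabulary.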

Named entity recognition

Corpora development

Clinical text processing

Distributional semantics

Random indexing

Vocabulary expansion

Assertion classification

Clinical text mining

Electronic health records

Swedish

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:su-109254
Date	January 2014
Creators	Skeppstedt, Maria
Publisher	Stockholms universitet, Institutionen för data- och systemvetenskap, Stockholm University : Department of Computer and Systems Sciences, Stockholm University
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Doctoral thesis, comprehensive summary, info:eu-repo/semantics/doctoralThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	Report Series / Department of Computer & Systems Sciences, 1101-8526 ; 15-001
