TagLine: Information Extraction for Semi-Structured Text Elements In Medical Progress Notes

Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in semi-structured text elements. A prototype system (TagLine) was developed as a method for extracting information from the semi-structured portions of text using machine learning. Features for the learning machine were suggested by prior work, as well as by examining the text and selecting those attributes that help distinguish the various classes of text lines. The classes were derived empirically from the text and guided by an ontology developed by the Consortium for Health Informatics Research (CHIR), a nationwide research initiative focused on medical informatics. Decision trees and Levenshtein approximate string matching techniques were tested and compared on 5,055 unseen lines of text. The performance of the decision tree method was found to be superior to the fuzzy string match method on this task. Decision trees achieved an overall accuracy of 98.5 percent, while the string match method achieved an accuracy of only 87 percent. Overall, the results for line classification were very encouraging. The labels applied to the lines were used to evaluate TagLine's performance in identifying the semi-structured text elements, including tables, slots and fillers. Results for slots and fillers were impressive, while the results for tables were also acceptable.
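As a rough illustration of the fuzzy string match baseline mentioned in the abstract, the sketch below labels a line of text by finding its closest labeled example under Levenshtein edit distance. The abstract does not describe the thesis's actual implementation, so everything here is an assumption made for illustration: the class labels (`slot_filler`, `table_header`, `free_text`), the example lines in `TEMPLATES`, the length-normalized `similarity` score, and the `threshold` value are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical labeled example lines; not taken from the thesis.
TEMPLATES = [
    ("Blood Pressure: 120/80", "slot_filler"),
    ("Allergies: none known", "slot_filler"),
    ("Date       WBC    HGB    PLT", "table_header"),
    ("Patient reports feeling well today.", "free_text"),
]


def classify_line(line, templates=TEMPLATES, threshold=0.5):
    """Return (label, score) of the most similar template, or 'unknown' below threshold."""
    best_label, best_score = "unknown", 0.0
    for text, label in templates:
        score = similarity(line.lower(), text.lower())
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "unknown", best_score
    return best_label, best_score


if __name__ == "__main__":
    print(classify_line("Blood Pressure: 130/85"))
```

A nearest-template matcher like this depends only on surface similarity to known lines. That may be one reason a classifier trained on line-level features could do better on unseen text, although the abstract reports only the accuracy figures and not the cause of the gap.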

Information Extraction

Machine Learning

Natural Language Processing

Semi-structured data

Computer Sciences

Library and Information Science

Identifier	oai:union.ndltd.org:USF/oai:scholarcommons.usf.edu:etd-5517
Date	01 January 2012
Creators	Finch, Dezon K.
Publisher	Scholar Commons
Source Sets	University of South Florida
Detected Language	English
Type	text
Format	application/pdf
Source	Graduate School Theses and Dissertations
