APPLICATION OF RANDOM INDEXING TO MULTI LABEL CLASSIFICATION PROBLEMS: A CASE STUDY WITH MESH TERM ASSIGNMENT AND DIAGNOSIS CODE EXTRACTION

Many manual biomedical annotation tasks are instances of the multi-label classification problem, in which several categories or labels from a fixed set need to be assigned to an input instance. MeSH term assignment to biomedical articles and diagnosis code extraction from medical records are two such tasks. To address this problem automatically, in this thesis we present a way to utilize latent associations between labels based on output label sets. We use random indexing to determine these latent associations and use them as a novel feature in a learning-to-rank algorithm that reranks candidate labels selected by either a k-NN or a binary relevance approach. Using this new feature alongside other features, for MeSH term assignment we train our ranking model on a set of 200 documents, test it on two public datasets, and obtain new state-of-the-art results in precision, recall, and mean average precision. In diagnosis code extraction, we reach an average micro F-score of 0.478 on a large EMR dataset from the University of Kentucky Medical Center, the first study of its kind to our knowledge. Our study shows the advantages and potential of the random indexing method in determining and utilizing implicit relationships between labels in multi-label classification problems.
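To make the idea of latent label associations concrete, the following is a minimal sketch of generic random indexing applied to label co-occurrence in output label sets. It is not the thesis's implementation: the function names, the dimensionality (512), the number of nonzero entries (8), and the toy label sets are all assumptions chosen for illustration. Each label gets a sparse random index vector; a label's context vector is the sum of the index vectors of the labels it co-occurs with, so two labels that never appear together but share co-occurring labels still end up with similar context vectors.

```python
import math
import random


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def build_context_vectors(label_sets, dim=512, nonzeros=8, seed=0):
    """Accumulate, for each label, the index vectors of its co-occurring labels.

    dim, nonzeros, and seed are illustrative defaults (assumptions),
    not values taken from the thesis.
    """
    rng = random.Random(seed)
    labels = sorted({label for s in label_sets for label in s})
    index = {label: make_index_vector(dim, nonzeros, rng) for label in labels}
    context = {label: [0.0] * dim for label in labels}
    for label_set in label_sets:
        for a in label_set:
            for b in label_set:
                if a == b:
                    continue
                ctx = context[a]
                for i, value in enumerate(index[b]):
                    if value:
                        ctx[i] += value
    return context


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Hypothetical label sets: A and B never co-occur, but both appear with C and D.
    toy_label_sets = [{"A", "C"}, {"A", "D"}, {"B", "C"}, {"B", "D"}, {"E", "F"}]
    ctx = build_context_vectors(toy_label_sets)
    print("sim(A, B):", cosine(ctx["A"], ctx["B"]))
    print("sim(A, E):", cosine(ctx["A"], ctx["E"]))
```

In a reranking setting like the one the abstract describes, a similarity of this kind between a candidate label and the other candidates could serve as one feature among several; how the thesis actually derives its feature is not specified in this record.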

Information Retrieval

Other Computer Engineering

Identifier	oai:union.ndltd.org:uky.edu/oai:uknowledge.uky.edu:cs_etds-1033
Date	01 January 2015
Creators	Lu, Yuan
Publisher	UKnowledge
Source Sets	University of Kentucky
Detected Language	English
Type	text
Format	application/pdf
Source	Theses and Dissertations--Computer Science
