Return to search

APPLICATION OF RANDOM INDEXING TO MULTI LABEL CLASSIFICATION PROBLEMS: A CASE STUDY WITH MESH TERM ASSIGNMENT AND DIAGNOSIS CODE EXTRACTION

Many manual biomedical annotation tasks can be categorized as instances of the typical multi-label classification problem where several categories or labels from a fixed set need to assigned to an input instance. MeSH term assignment to biomedical articles and diagnosis code extraction from medical records are two such tasks. To address this problem automatically, in this thesis, we present a way to utilize latent associations between labels based on output label sets. We used random indexing as a method to determine latent associations and use the associations as a novel feature in a learning-to-rank algorithm that reranks candidate labels selected based on either k-NN or binary relevance approach. Using this new feature as part of other features, for MeSH term assignment, we train our ranking model on a set of 200 documents, test it on two public datasets, and obtain new state-of-the-art results in precision, recall, and mean average precision. In diagnosis code extraction, we reach an average micro F-score of 0.478 based on a large EMR dataset from the University of Kentucky Medical Center, the first study of its kind to our knowledge. Our study shows the advantages and potential of random indexing method in determining and utilizing implicit relationships between labels in multi-label classification problems.

Identiferoai:union.ndltd.org:uky.edu/oai:uknowledge.uky.edu:cs_etds-1033
Date01 January 2015
CreatorsLu, Yuan
PublisherUKnowledge
Source SetsUniversity of Kentucky
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceTheses and Dissertations--Computer Science

Page generated in 0.0025 seconds