The majority of current multi-label classification research focuses on learning dependency structures among output labels. This paper provides a novel theoretical view on the purported assumption that effective multi-label classification models must exploit output dependencies. We submit that the flurry of recent dependency-exploiting, multi-label algorithms may stem from the deficiencies in existing datasets, rather than an inherent need to better model dependencies. We introduce a novel categorization of multi-label metrics, namely, evenly and unevenly weighted label metrics. We explore specific features that predispose datasets to improved classification by methods that model label dependence. Additionally, we provide an empirical analysis of 15 benchmark datasets, 1 real-life dataset, and a variety of synthetic datasets. We assert that binary relevance (BR) yields similar, if not better, results than dependency-exploiting models for metrics with evenly weighted label contributions. We qualify this claim with discussions on specific characteristics of datasets and models that render negligible the differences between BR and dependency-learning models.
Identifier | oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-7114 |
Date | 01 November 2016 |
Creators | Brodie, Michael Benjamin |
Publisher | BYU ScholarsArchive |
Source Sets | Brigham Young University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | All Theses and Dissertations |
Rights | http://lib.byu.edu/about/copyright/ |