We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substantially outperform Naive Bayes. We present full multiclass results on two well-known text data sets, including the lowest error to date on both data sets. We develop a new indicator of binary performance to show that the SVM's lower multiclass error is a result of its improved binary performance. Furthermore, we demonstrate and explore the surprising result that one-vs-all classification performs favorably compared to other approaches even though it has no error-correcting properties.
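The one-vs-all scheme discussed in the abstract trains one binary classifier per class and predicts the class whose classifier is most confident. A minimal sketch of that idea follows, using a simple perceptron as the binary base learner purely for illustration (the paper uses SVMs; the class and function names here are hypothetical):

```python
# One-vs-all multiclass classification, sketched with a perceptron
# base learner. Illustrative only -- not the paper's SVM setup.

class Perceptron:
    """A plain binary perceptron with labels in {+1, -1}."""

    def __init__(self, dim, epochs=20):
        self.w = [0.0] * dim
        self.b = 0.0
        self.epochs = epochs

    def score(self, x):
        # Signed confidence: positive means the positive class.
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def fit(self, X, y):
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                if label * self.score(x) <= 0:  # mistake-driven update
                    self.w = [wi + label * xi for wi, xi in zip(self.w, x)]
                    self.b += label

def one_vs_all_fit(X, labels, classes):
    """Train one binary scorer per class (that class vs. the rest)."""
    models = {}
    for c in classes:
        y = [1 if lbl == c else -1 for lbl in labels]
        model = Perceptron(len(X[0]))
        model.fit(X, y)
        models[c] = model
    return models

def one_vs_all_predict(models, x):
    # Predict the class whose binary scorer is most confident.
    return max(models, key=lambda c: models[c].score(x))
```

Note that, as the abstract observes, this scheme has no error-correcting properties: each binary scorer is trained independently, and the prediction simply takes the highest score.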
Identifier | oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/7241 |
Date | 16 October 2001 |
Creators | Rennie, Jason D. M., Rifkin, Ryan |
Source Sets | M.I.T. Theses and Dissertations |
Language | en_US |
Detected Language | English |
Format | 14 p., 1240992 bytes, 1091543 bytes, application/postscript, application/pdf |
Relation | AIM-2001-026, CBCL-210 |