Return to search

Solving a mixed-integer programming formulation of a classification model with misclassification limits

Classification, the development of rules for the allocation of observations to one or more groups, is a fundamental problem in machine learning and has been applied to many problems in medicine and business. We consider aspects of a classification model developed by Gallagher, Lee, and Patterson that is based on a result by Anderson. The model seeks to maximize the probability of correct G-group classification, subject to limits on misclassification probabilities. The mixed-integer programming formulation of the model is an empirical method for estimating the parameters of an optimal classification rule, which are identified as coefficients of linear functions by Anderson.

The model is shown to be a consistent method for estimating the parameters of the optimal solution to the problem of maximizing the probability of correct classification subject to limits on inter-group misclassification probabilities. A polynomial time algorithm is described for two-group instances. The method is NP-complete for a general number of groups, and an approximation is formulated as a mixed-integer program (MIP). The MIP is difficult to solve due to the formulation of constraints wherein certain variables are equal to the maximum of a set of linear functions. These constraints are conducive to an ill-conditioned coefficient matrix. Methods for generating edges of the conflict graph and conflict hypergraphs are discussed. The conflict graph is employed for finding cuts in a branch-and-bound framework. This technique and others lead to improvement in solution time over industry-standard software on instances generated by real-world data. The classification accuracy of the model in relation to standard classification methods on real-world and simulated data is also noted.

Identiferoai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/7473
Date25 August 2005
CreatorsBrooks, J. Paul
PublisherGeorgia Institute of Technology
Source SetsGeorgia Tech Electronic Thesis and Dissertation Archive
Languageen_US
Detected LanguageEnglish
TypeDissertation
Format1129431 bytes, application/pdf

Page generated in 0.0017 seconds