Global ETD Search

Return to search

Learning to rank by maximizing the AUC with linear programming for problems with binary output

Ranking is a popular machine learning problem that has been studied extensively for more then a decade. Typical machine learning algorithms are generally built to optimize predictive performance (usually measured in accuracy) by minimizing classification error. However, there are many real world problems where correct ordering of instances is of equal or greater importance than correct classification. Learning algorithms that are built to minimize classification error are often not effective when ordering within or among classes. This gap in research created a necessity to alter the objective of such algorithms to focus on correct ranking rather then classification.
Area Under the ROC Curve (AUC), which is equivalent to the Wicoxon-Mann-Whitney (WMW) statistic, is a widely accepted performance measure for evaluating ranking performance in binary classification problems. In this work we present a linear programming approach (LPR), similar to 1-norm Support Vector Machines (SVM), for ranking instances with binary outputs by maximizing an approximation to the WMW statistic. Our formulation handles non-linear problems by making use of kernel functions. The results on several well-known benchmark datasets show that our approach ranks better than 2-norm SVM and faster than the support vector ranker (SVR).
The number of constraints in the linear programming formulation increases quadratically with the number of data points considered for the training of the algorithm. We tackle this problem by implementing a number of exact and approximate speed-up approaches inspired by well-known methods such as chunking, clustering and subgradient methods. The subgradient method is the most promising because of its solution quality and its fast convergence to the optimal solution.
We adopted LPR formulation to survival analysis. With this approach it is possible to order subjects by risk for experiencing an event. Such an ordering enables determination of high-risk and low-risk groups among the subjects that can be helpful not only in medical studies but also in engineering, business and social sciences. Our results show that our algorithm is superior in time-to-event prediction to the most popular survival analysis tool, Cox's proportional hazard regression.

Ranking

Identifer	oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-1336
Date	01 January 2007
Creators	Ataman, Kaan
Contributors	Street, W. Nick
Publisher	University of Iowa
Source Sets	University of Iowa
Language	English
Detected Language	English
Type	dissertation
Format	application/pdf
Source	Theses and Dissertations
Rights	Copyright 2007 Kaan Ataman

Page generated in 0.0022 seconds

Learning to rank by maximizing the AUC with linear programming for problems with binary output

Description

Links & Downloads

Tags

Additional Fields