Return to search

Methods for convex optimization and statistical learning

Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2016. / This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. / Cataloged from student-submitted PDF version of thesis. / Includes bibliographical references (pages 219-225). / We present several contributions at the interface of first-order methods for convex optimization and problems in statistical machine learning. In the first part of this thesis, we present new results for the Frank-Wolfe method, with a particular focus on: (i) novel computational guarantees that apply for any step-size sequence, (ii) a novel adjustment to the basic algorithm to better account for warm-start information, and (iii) extensions of the computational guarantees that hold in the presence of approximate subproblem and/or gradient computations. In the second part of the thesis, we present a unifying framework for interpreting "greedy" first-order methods -- namely Frank-Wolfe and greedy coordinate descent -- as instantiations of the dual averaging method of Nesterov, and we discuss the implications thereof. In the third part of the thesis, we present an extension of the Frank-Wolfe method that is designed to induce near-optimal low-rank solutions for nuclear norm regularized matrix completion and, for more general problems, induces near-optimal "well-structured" solutions. We establish computational guarantees that trade off efficiency in computing near-optimal solutions with upper bounds on the rank of iterates. We then present extensive computational results that show significant computational advantages over existing related approaches, in terms of delivering low rank and low run-time to compute a target optimality gap. In the fourth part of the thesis, we analyze boosting algorithms in linear regression from the perspective modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression can be viewed as subgradient descent to minimize the maximum absolute correlation between features and residuals. We also propose a slightly modified boosting algorithm that yields an algorithm for the Lasso, and that computes the Lasso path. Our perspective leads to first-ever comprehensive computational guarantees for all of these boosting algorithms, which provide a precise theoretical description of the amount of data-fidelity and regularization imparted by running a boosting algorithm, for any dataset. In the fifth and final part of the thesis, we present several related results in the contexts of boosting algorithms for logistic regression and the AdaBoost algorithm. / by Paul Grigas. / Ph. D.

Identiferoai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/106683
Date January 2016
CreatorsGrigas, Paul (Paul Edward)
ContributorsRobert M. Freund., Massachusetts Institute of Technology. Operations Research Center., Massachusetts Institute of Technology. Operations Research Center.
PublisherMassachusetts Institute of Technology
Source SetsM.I.T. Theses and Dissertation
LanguageEnglish
Detected LanguageEnglish
TypeThesis
Format225 pages, application/pdf
RightsMIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission., http://dspace.mit.edu/handle/1721.1/7582

Page generated in 0.002 seconds