Thesis: S.M., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2018. / Cataloged from PDF version of thesis. / Includes bibliographical references (pages 101-109). / In this thesis, we study the computational and statistical aspects of several sparse models when the number of samples and/or features is large. We propose new statistical estimators and build new computational algorithms, borrowing tools and techniques from convex and discrete optimization. First, we explore an Lq-regularized version of the best subset selection procedure, which mitigates the poor statistical performance of the best subset estimator in low signal-to-noise ratio (SNR) regimes. We study the statistical and empirical properties of this estimator, in particular in comparison with best subset selection, the Lasso, and Ridge regression. Second, we propose new computational algorithms for a family of penalized linear Support Vector Machine (SVM) problems with a hinge loss function and sparsity-inducing regularizations. Our methods bring together techniques from Column (and Constraint) Generation and modern first-order methods for non-smooth convex optimization. These two components complement each other's strengths, leading to improvements of two orders of magnitude over commercial LP solvers. Third, we present a novel framework inspired by hierarchical Bayesian modeling to predict user session length on online streaming services. The time a user spends on a platform depends on user-specific latent variables, which are learned via hierarchical shrinkage. Our framework incorporates flexible parametric and nonparametric models of the covariates and outperforms state-of-the-art estimators in terms of efficiency and predictive performance on real-world datasets from the internet radio company Pandora Media Inc. / by Antoine Dedieu. / S.M.
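The record does not spell out the Lq-regularized best subset estimator. As a minimal sketch, assuming the problem takes the form minimize (1/2)||y - X b||^2 + lam * ||b||_2^2 subject to ||b||_0 <= k (that is, q = 2, ridge-type shrinkage), the Python below solves it exactly for very small p by enumerating every support of size at most k and fitting a ridge regression on each. Exhaustive enumeration is a swapped-in illustration whose cost grows exponentially in p; it is not the discrete-optimization algorithm developed in the thesis, and the function names (`solve_linear`, `ridge_on_support`, `objective`, `best_subset_ridge`) are hypothetical.

```python
from itertools import combinations

# Hypothetical sketch: L2-regularized (q = 2) best subset selection, solved by
# brute-force enumeration of supports. Assumed objective (not taken from the thesis):
#   minimize 0.5 * ||y - X b||^2 + lam * ||b||_2^2   subject to   ||b||_0 <= k


def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_on_support(X, y, support, lam):
    """Ridge fit restricted to the columns in `support`.

    Setting the gradient of 0.5*||y - X_S b||^2 + lam*||b||^2 to zero gives
    (X_S^T X_S + 2*lam*I) b = X_S^T y.
    """
    n = len(y)
    A = [[sum(X[i][j] * X[i][l] for i in range(n)) + (2.0 * lam if j == l else 0.0)
          for l in support]
         for j in support]
    rhs = [sum(X[i][j] * y[i] for i in range(n)) for j in support]
    return solve_linear(A, rhs)


def objective(X, y, beta, lam):
    """Penalized least-squares objective 0.5*||y - X beta||^2 + lam*||beta||^2."""
    resid = [yi - sum(xij * bj for xij, bj in zip(row, beta)) for row, yi in zip(X, y)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(b * b for b in beta)


def best_subset_ridge(X, y, k, lam):
    """Return (beta, objective) minimizing the assumed objective over ||beta||_0 <= k."""
    if lam <= 0:
        raise ValueError("lam must be positive so each restricted system is invertible")
    p = len(X[0])
    best_beta = [0.0] * p
    best_obj = objective(X, y, best_beta, lam)  # the empty support is always feasible
    for size in range(1, k + 1):
        for support in combinations(range(p), size):
            beta = [0.0] * p
            for j, c in zip(support, ridge_on_support(X, y, support, lam)):
                beta[j] = c
            obj = objective(X, y, beta, lam)
            if obj < best_obj:
                best_beta, best_obj = beta, obj
    return best_beta, best_obj


if __name__ == "__main__":
    # Toy data: identity design, so a coefficient on its own support shrinks to y_j / (1 + 2*lam).
    X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    y = [3.0, 0.2, -1.0]
    print(best_subset_ridge(X, y, k=1, lam=0.5))
```

With lam > 0, each restricted system X_S^T X_S + 2*lam*I is positive definite, so every support has a unique fit; and because a larger support contains every smaller one as a feasible point, only supports of size exactly k can improve the optimum. The loop over all sizes is kept only to make the search easy to follow.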
Identifier | oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/119354 |
Date | January 2018 |
Creators | Dedieu, Antoine |
Contributors | Rahul Mazumder., Massachusetts Institute of Technology. Operations Research Center. |
Publisher | Massachusetts Institute of Technology |
Source Sets | M.I.T. Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Thesis |
Format | 121 pages, application/pdf |
Rights | MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission., http://dspace.mit.edu/handle/1721.1/7582 |