
New optimization approaches to matrix factorization problems with connections to natural language processing

Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, May, 2020 / Cataloged from the official PDF of thesis. / Includes bibliographical references (pages 245-260). / In this thesis, we propose novel formulations and optimization methods for four matrix factorization problems: sparse principal component analysis, compressed sensing, discrete component analysis, and latent Dirichlet allocation. For each new formulation, we develop efficient solution algorithms using discrete and robust optimization, and demonstrate their tractability and effectiveness in computational experiments. In Chapter 1, we develop a framework for matrix factorization problems and provide a technical introduction to topic modeling with examples. Chapter 2, Certifiably optimal sparse principal component analysis, addresses the sparse principal component analysis (SPCA) problem. We propose a tailored branch-and-bound algorithm, Optimal-SPCA, that enables us to solve SPCA to certifiable optimality. / We apply our methods to real data sets to demonstrate that our approach scales well and provides superior solutions compared to existing methods, explaining a higher proportion of variance and permitting more control over the desired sparsity. Chapter 3, Optimal compressed sensing in submodular settings, presents a novel algorithm for compressed sensing that guarantees optimality under submodularity conditions rather than restricted isometry property (RIP) conditions. The algorithm establishes submodularity properties of the loss function, derives lower bounds from them, and adds these lower bounds as constraints in a cutting-plane algorithm. The chapter also develops a local search heuristic based on this exact algorithm. Chapter 4, Robust topic modeling, develops a new form of topic modeling inspired by robust optimization and by discrete component analysis.
/ The new approach builds uncertainty sets using one-sided constraints and two hypothesis tests, and uses alternating optimization and projected gradient methods, including Adam and mirror descent, to find good local optima. In computational experiments, we demonstrate that these models avoid over-fitting better than LDA and PLSA, and reconstruct the underlying topic matrices more accurately. In Chapter 5, we develop modifications to latent Dirichlet allocation to account for differences in the distribution of topics across authors. The chapter adds author-specific topic priors to the generative process and allows for co-authorship, providing the model with increased degrees of freedom and enabling it to model an enhanced set of problems. The code for the algorithms developed in each chapter, written in the Julia language, is freely available on GitHub at https://github.com/lauren897 / by Lauren Elizabeth Berk. / Ph. D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center
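To illustrate the combinatorial view of compressed sensing described for Chapter 3, the sketch below implements a simple greedy forward-selection heuristic over column supports. This is a hypothetical Python illustration of the general idea of searching over supports directly (the thesis's exact algorithm instead uses cutting planes built from submodular lower bounds, and its code is in Julia); the matrix, signal, and function names here are invented for the example.

```python
import numpy as np

def greedy_support_selection(A, y, k):
    """Greedily pick k columns of A to minimize the least-squares
    residual ||y - A_S x||_2 over supports S of size k.
    A simple combinatorial heuristic for sparse recovery."""
    n = A.shape[1]
    support = []
    for _ in range(k):
        best_j, best_res = None, np.inf
        for j in range(n):
            if j in support:
                continue
            S = support + [j]
            # Least-squares fit restricted to the candidate support.
            x, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
            res = np.linalg.norm(y - A[:, S] @ x)
            if res < best_res:
                best_j, best_res = j, res
        support.append(best_j)
    return sorted(support)

# Noiseless synthetic example with a 2-sparse ground truth.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[[2, 7]] = [1.5, -2.0]
y = A @ x_true
print(greedy_support_selection(A, y, 2))  # recovers the true support [2, 7] here
```

Greedy selection like this can fail on adversarial instances, which is precisely why exact methods with optimality guarantees, such as the cutting-plane approach the abstract describes, are of interest.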
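Similarly, the mirror-descent updates mentioned for Chapter 4 can be sketched in their most common form for topic-model parameters: the entropic (exponentiated-gradient) update on the probability simplex, which keeps iterates valid probability vectors without an explicit projection. This is a generic Python sketch under assumed settings (the quadratic objective, step size, and iteration count are invented for illustration), not the thesis's Julia implementation.

```python
import numpy as np

def mirror_descent_step(x, grad, eta):
    """One mirror-descent step on the probability simplex with the
    entropic mirror map: multiply coordinates by exp(-eta * grad),
    then renormalize. Iterates stay strictly inside the simplex."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

# Toy problem: minimize f(x) = ||x - t||^2 over the simplex,
# where the target t is itself a probability vector.
t = np.array([0.5, 0.3, 0.2])
x = np.ones(3) / 3.0            # uniform starting point
for _ in range(200):
    x = mirror_descent_step(x, 2.0 * (x - t), eta=0.5)
# x converges to t, the unconstrained minimizer, which lies in the simplex.
```

In a topic model, the same update would be applied to each topic's word distribution (and each document's topic distribution) within an alternating-optimization loop, with the gradient coming from the model's loss rather than this toy quadratic.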

Identifier oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/127291
Date January 2020
Creators Berk, Lauren Elizabeth.
Contributors Robert Freund., Massachusetts Institute of Technology. Operations Research Center.
Publisher Massachusetts Institute of Technology
Source Sets M.I.T. Theses and Dissertations
Language English
Detected Language English
Type Thesis
Format 260 pages, application/pdf
Rights MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided., http://dspace.mit.edu/handle/1721.1/7582
