Global ETD Search

Machine learning approaches to challenging problems : interpretable imbalanced classification, interpretable density estimation, and causal inference

Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2018.

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Cataloged from student-submitted PDF version of thesis.

Includes bibliographical references (pages 111-118).

In this thesis, I address three challenging machine-learning problems. The first problem that we address is the imbalanced data problem. We propose two algorithms to handle highly imbalanced classification problems. The first algorithm uses mixed integer programming to optimize a weighted balance between positive and negative class accuracies. The second method uses an approximation in order to assist with scalability. Specifically, it follows a characterize-then-discriminate approach. The positive class is first characterized by boxes, and then each box boundary becomes a separate discriminative classifier. This method is computationally advantageous because it can be easily parallelized, and considers only the relevant regions of the feature space.

The second problem is a density estimation problem for categorical data sets. We present tree- and list-structured density estimation methods for binary/categorical data. We present three generative models, where the first one allows the user to specify the number of desired leaves in the tree within a Bayesian prior. The second model allows the user to specify the desired number of branches within the prior. The third model returns lists (rather than trees) and allows the user to specify the desired number of rules and the length of rules within the prior.

Finally, we present a new machine learning approach to estimate personalized treatment effects in the classical potential outcomes framework with binary outcomes. Strictly, both treatment and control outcomes must be measured for each unit in order to perform supervised learning. However, in practice, only one outcome can be observed per unit. To overcome the problem that both treatment and control outcomes for the same unit are required for supervised learning, we propose surrogate loss functions that incorporate both treatment and control data. The new surrogates yield tighter bounds than the sum of the losses for the treatment and control groups. A specific choice of loss function, namely a type of hinge loss, yields a minimax support vector machine formulation. The resulting optimization problem requires the solution to only a single convex optimization problem, incorporating both treatment and control units, and it enables the kernel trick to be used to handle nonlinear (also non-parametric) estimation.

by Siong Thye Goh.

Ph. D.

Operations Research Center.

Identifier	oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/119281
Date	January 2018
Creators	Goh, Siong Thye
Contributors	Cynthia Rudin and Roy Welsch, Massachusetts Institute of Technology. Operations Research Center.
Publisher	Massachusetts Institute of Technology
Source Sets	M.I.T. Theses and Dissertation
Language	English
Detected Language	English
Type	Thesis
Format	118 pages, application/pdf
Rights	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission., http://dspace.mit.edu/handle/1721.1/7582
