Global ETD Search

Return to search

HIGHER ORDER OPTIMIZATION TECHNIQUES FOR MACHINE LEARNING

<div>
<div>
<div>
<p>First-order methods such as Stochastic Gradient Descent are methods of choice
for solving non-convex optimization problems in machine learning. These methods
primarily rely on the gradient of the loss function to estimate descent direction.
However, they have a number of drawbacks, including converging to saddle points
(as opposed to minima), slow convergence, and sensitivity to parameter tuning. In
contrast, second order methods that use curvature information in addition to the
gradient, have been shown to achieve faster convergence rates, theoretically. When
used in the context of machine learning applications, they offer faster (quadratic)
convergence, stability to parameter tuning, and robustness to problem conditioning.
In spite of these advantages, first order methods are commonly used because of their
simplicity of implementation and low per-iteration cost. The need to generate and
use curvature information in the form of a dense Hessian matrix makes each iteration
of second order methods more expensive.
</p><p><br></p>
<p>In this work, we address three key problems associated with second order methods
– (i) what is the best way to incorporate curvature information into the optimization
procedure; (ii) how do we reduce the operation count of each iteration in a second order method, while maintaining its superior convergence property; and (iii) how do we
leverage high-performance computing platforms to significant accelerate second order
methods. To answer the first question, we propose and validate the use of Fisher
information matrices in second order methods to significantly accelerate convergence.
The second question is answered through the use of statistical sampling techniques
that suitably sample matrices to reduce per-iteration cost without impacting convergence. The third question is addressed through the use of graphics processing units
(GPUs) in distributed platforms to deliver state of the art solvers.</p></div></div></div><div><div><div>
<p>Through our work, we show that our solvers are capable of significant improvement
over state of the art optimization techniques for training machine learning models.
We demonstrate improvements in terms of training time (over an order of magnitude
in wall-clock time), generalization properties of learned models, and robustness to
problem conditioning.
</p>
</div>
</div>
</div>

10.25394/pgs.11328545.v1

Computer Engineering

Applied Computer Science

Computer Software

Distributed Computing

Optimisation

machine learning methods

optimization process

Distributed Computing System

GPU implementations

Convex optimization solvers

Non-Convex Optimization

Identifer	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/11328545
Date	09 December 2019
Creators	Sudhir B. Kylasa (5929916)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Rights	CC BY 4.0
Relation	https://figshare.com/articles/HIGHER_ORDER_OPTIMIZATION_TECHNIQUES_FOR_MACHINE_LEARNING/11328545

Page generated in 0.002 seconds

HIGHER ORDER OPTIMIZATION TECHNIQUES FOR MACHINE LEARNING

Description

Links & Downloads

Tags

Additional Fields