Reliable validation : new perspectives on adaptive data analysis and cross-validation / New perspectives on adaptive data analysis and cross-validation

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Mathematics, 2018.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 107-109).

Validation refers to the challenge of assessing how well a learning algorithm performs after it has been trained on a given data set. It forms an important step in machine learning, as such assessments are then used to compare and choose between algorithms and provide reasonable approximations of their accuracy. In this thesis, we provide new approaches for addressing two common problems with validation.

In the first half, we assume a simple validation framework, the holdout set, and address an important question of how many algorithms can be accurately assessed using the same holdout set, in the particular case where these algorithms are chosen adaptively. We do so by first critiquing the initial approaches to building a theory of adaptivity, then offering an alternative approach and preliminary results within this approach, all geared towards characterizing the inherent challenge of adaptivity.

In the second half, we address the validation framework itself. Most common practice does not just use a single holdout set, but averages results from several, a family of techniques known as cross-validation. In this work, we offer several new cross-validation techniques with the common theme of utilizing training sets of varying sizes. This culminates in hierarchical cross-validation, a meta-technique for using cross-validation to choose the best cross-validation method.

by Samuel Scott Elder.
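
The abstract contrasts a single holdout set with cross-validation, which averages the results from several holdout sets. The following is a minimal sketch of that contrast using standard k-fold cross-validation; it is not taken from the thesis and does not implement its new techniques. The toy "learning algorithm" (`mean_predictor`), the squared-error loss, and all function names are assumptions chosen for illustration.

```python
import random


def mean_predictor(train):
    """Toy 'learning algorithm' (an assumption): predict the mean of the training targets."""
    return sum(train) / len(train)


def squared_error(prediction, test):
    """Average squared error of a constant prediction on the test targets."""
    return sum((y - prediction) ** 2 for y in test) / len(test)


def holdout_estimate(data, holdout_fraction=0.2):
    """Train on the leading part of the data and assess on one held-out tail."""
    n_test = max(1, round(len(data) * holdout_fraction))
    train, test = data[:-n_test], data[-n_test:]
    return squared_error(mean_predictor(train), test)


def k_fold_estimate(data, k=5):
    """Average the holdout error over k disjoint held-out folds."""
    folds = [data[i::k] for i in range(k)]  # every k-th point goes to the same fold
    errors = []
    for i in range(k):
        test = folds[i]
        train = [y for j, fold in enumerate(folds) if j != i for y in fold]
        errors.append(squared_error(mean_predictor(train), test))
    return sum(errors) / k


# Synthetic data, seeded so that repeated runs give the same numbers.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
print("single holdout:", holdout_estimate(data))
print("5-fold average:", k_fold_estimate(data))
```

Here each fold serves once as the holdout set, so every data point contributes to both training and assessment. The thesis's own methods, which vary the training set size, would need a different splitting scheme than the fixed folds assumed here.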

Identifier oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/120660
Date January 2018
Creators Elder, Samuel Scott
Contributors Jonathan Kelner and Tamara Broderick, Massachusetts Institute of Technology. Department of Mathematics.
Publisher Massachusetts Institute of Technology
Source Sets M.I.T. Theses and Dissertation
Language English
Detected Language English
Type Thesis
Format 109 pages, application/pdf
Rights MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
