Forward engineering object recognition : a scalable approach / Forward engineering object recognition : a scalable approach, simple baselines, efficient benchmarks, high-throughput solution discovery and large-scale applications

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2011. / This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. / Cataloged from student-submitted PDF version of thesis. / Includes bibliographical references (p. 254-302). / The ease with which we recognize visual objects belies the computational difficulty of this feat. Despite the concerted efforts of both biological and computer vision research communities over the last forty years, human-level visual recognition remains an unsolved problem. The impact of a robust yet inexpensive solution would dramatically change computer science and neuroscience, unleashing a host of innovative applications in our modern society. In this thesis, we identify two operational barriers that have obstructed progress towards finding a solution, namely the lack of clear indicators and operational definitions of success, and the currently limited exploration of the staggeringly large hypothesis space of biologically-inspired solutions. To break down these barriers, we first establish new neuroscience-motivated baselines and new suites of fully-controlled benchmarks for object and face recognition. We also compare and contrast a variety of high-level visual systems, both artificial (state-of-the-art computer vision) and biological (humans). Then, we propose a simple high-throughput approach to undertake a systematic exploration of the biologically-inspired model class. By leveraging recent advances in massively parallel computing, we show that it is possible to generate a multitude of candidate models, screen them for desirable properties and discover robust solutions. Finally, we validate the scalability of our approach by showing its potential on large-scale "real-world" applications.
Taken together, this thesis represents a humble attempt towards an integrated approach to the problem of brain-inspired object recognition, spanning the engineering, specification, evaluation, and application of an interesting set of biologically-inspired ideas, driven and enabled by massively parallel technology. Even relatively early instantiations of this approach yield algorithms that achieve state-of-the-art performance in object recognition tasks and can generalize to other image domains. In addition, it offers insight into which computational ideas may be important for achieving this performance. Such insights can then be "fed back" into the design of new candidate models, constraining the search space and suggesting improvements, further guiding "evolutionary" progress. We hope that our results will point a new way forward, both in the creation of powerful yet simple computer vision systems and in providing insights into the computational underpinnings of biological vision. / by Nicolas Pinto. / Ph.D.

Identifier oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/62622
Date January 2011
Creators Pinto, Nicolas
Contributors James J. DiCarlo, Massachusetts Institute of Technology. Dept. of Brain and Cognitive Sciences.
Publisher Massachusetts Institute of Technology
Source Sets M.I.T. Theses and Dissertation
Language English
Detected Language English
Type Thesis
Format [6], ii, 302 p., application/pdf
Rights M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission., http://dspace.mit.edu/handle/1721.1/7582