Building reliable machine learning systems for neuroscience

Neuroscience as a field is collecting more data than at any other time in history. The scale of this data allows us to ask fundamental questions about the mechanisms of brain function, the basis of behavior, and the development of disorders. Both our ambitious goals and the sheer volume of data being recorded call for reproducible, reliable, and accessible systems to push the field forward. While the field has made great strides in building reproducible and accessible machine learning (ML) systems for neuroscience, reliability remains a major issue.

In this dissertation, we show how existing data and domain expert knowledge can be leveraged to build more reliable ML systems for studying animal behavior. First, we consider animal pose estimation, a crucial component of many scientific investigations. Typical transfer learning methods for behavioral tracking treat each video frame, and each object to be tracked, independently. We improve on this by leveraging the rich spatial and temporal structure pervasive in behavioral videos. The resulting weakly supervised models achieve significantly more robust tracking. Our tools improve results when labeled data are limited or imperfect, while requiring users to label fewer training frames and speeding up training. They also process raw video data more accurately and learn interpretable units of behavior. In turn, these improvements enhance performance on downstream applications.
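
As a rough illustration of how temporal structure can supervise a pose estimator on unlabeled frames, the sketch below penalizes keypoint predictions that jump more than a tolerance between consecutive frames. The function name `temporal_penalty`, the `(T, K, 2)` array layout, and the hinge form with threshold `epsilon` are assumptions made for illustration; this is not necessarily the exact loss used in this dissertation.

```python
import numpy as np

def temporal_penalty(keypoints: np.ndarray, epsilon: float) -> float:
    """Hypothetical hinge penalty on frame-to-frame keypoint jumps.

    keypoints: array of shape (T, K, 2) holding predicted (x, y) positions
    of K keypoints over T consecutive frames; no labels are required.
    epsilon: tolerated per-frame displacement (e.g. in pixels).
    """
    # Euclidean displacement of each keypoint between consecutive frames: (T-1, K)
    jumps = np.linalg.norm(np.diff(keypoints, axis=0), axis=-1)
    # Only movement beyond the tolerance is penalized
    excess = np.maximum(jumps - epsilon, 0.0)
    return float(excess.mean())

# One keypoint moves 1 px per frame, then jumps 20 px in the last frame
traj = np.array([[[0.0, 0.0]], [[1.0, 0.0]], [[2.0, 0.0]], [[22.0, 0.0]]])
penalty = temporal_penalty(traj, epsilon=5.0)  # excesses 0, 0, 15 -> mean 5.0
```

Because the penalty needs only predictions, it can be evaluated on every frame of a video rather than only on the small labeled subset, which is one way weak supervision can reduce the number of frames users must label.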

Next, we consider a ubiquitous approach to (attempt to) improve the reliability of ML methods: combining the predictions of multiple neural networks, known as deep ensembling. Ensembles of classical ML predictors, such as random forests, improve metrics like accuracy through well-understood mechanisms, notably by increasing the diversity of their members. For deep ensembles, however, there is an open methodological question: given the choice between a deep ensemble and a single neural network of similar accuracy, is one model truly preferable over the other? Through careful experiments across a range of benchmark datasets and deep learning models, we demonstrate limitations to the purported benefits of deep ensembles. Our results challenge common assumptions about the effectiveness of deep ensembles and the “diversity” principles thought to underpin their success, especially with regard to metrics important for reliability, such as out-of-distribution (OOD) performance and effective robustness. We also study the effects of deep ensembles when certain groups in the dataset are underrepresented (so-called “long tail” data), a setting whose importance for neuroscience applications is revealed by our pose estimation work.
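
The “diversity” mechanism can be made precise for squared error. If an ensemble averages its members' predicted probabilities, the Brier score of the ensemble equals the members' average Brier score minus a diversity term, the average squared spread of member predictions around the ensemble mean. The sketch below checks this identity numerically; the helper name `brier_decomposition` is hypothetical, random logits stand in for real model outputs, and probability averaging with the Brier score is an assumption that may differ from the exact metrics studied in this dissertation.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row-wise max for numerical stability
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def brier_decomposition(member_logits: np.ndarray, labels: np.ndarray):
    """Hypothetical helper: decompose an ensemble's Brier score.

    member_logits: (M, N, C) logits from M networks on N examples with C classes.
    labels: (N,) integer class labels.
    Returns (ensemble_brier, mean_member_brier, diversity).
    """
    probs = softmax(member_logits)            # (M, N, C) member probabilities
    ensemble = probs.mean(axis=0)             # (N, C) deep-ensemble prediction
    onehot = np.eye(probs.shape[-1])[labels]  # (N, C) one-hot targets
    ensemble_brier = np.mean(np.sum((ensemble - onehot) ** 2, axis=-1))
    mean_member_brier = np.mean(np.sum((probs - onehot) ** 2, axis=-1))
    # Diversity: average squared distance of members from the ensemble mean
    diversity = np.mean(np.sum((probs - ensemble) ** 2, axis=-1))
    return float(ensemble_brier), float(mean_member_brier), float(diversity)

# Random logits stand in for real model outputs (illustrative only)
rng = np.random.default_rng(0)
member_logits = rng.normal(size=(5, 200, 10))
labels = rng.integers(0, 10, size=200)
ens, avg, div = brier_decomposition(member_logits, labels)
# ens equals avg - div up to floating-point error
```

The identity explains why diversity always helps relative to the average member, but it says nothing about how the ensemble compares with a single, possibly larger, network of similar accuracy, which is the comparison at issue here.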

Altogether, our results demonstrate the importance of both holistic systems work and fundamental methodological work in determining how best to bring the benefits of modern machine learning to the unique challenges of neuroscience data analysis pipelines. To conclude the dissertation, we outline challenges and opportunities in building next-generation ML systems.

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/2806-aa41
Date: January 2024
Creators: Buchanan, Estefany Kelly
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
