
Representational Capabilities of Feed-forward and Sequential Neural Architectures

Despite the widespread empirical success of deep neural networks over the past decade, a comprehensive understanding of their mathematical properties remains elusive, which limits the ability of practitioners to train neural networks in a principled manner. This dissertation provides a representational characterization of a variety of neural network architectures, including fully-connected feed-forward networks and sequential models like transformers.

The representational capabilities of neural networks are most famously characterized by the universal approximation theorem, which states that sufficiently large neural networks can closely approximate any well-behaved target function. However, the universal approximation theorem applies exclusively to two-layer neural networks of unbounded width and fails to capture the comparative strengths and weaknesses of different architectures.
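
For reference, one standard form of the one-hidden-layer result (in the spirit of the classical theorems of Cybenko and Hornik; the dissertation's precise formulation may differ, and the symbols N, a_i, w_i, b_i, sigma, and epsilon below are generic notation rather than the thesis's own) reads: for any continuous target f on [0,1]^d, any continuous non-polynomial activation sigma, and any epsilon > 0, there exist a width N and parameters a_i, w_i, b_i such that

\[
\sup_{x \in [0,1]^d} \Bigl| f(x) - \sum_{i=1}^{N} a_i \, \sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon .
\]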

The thesis addresses these limitations by quantifying the representational consequences of random features, weight regularization, and model depth on feed-forward architectures. It further investigates and contrasts the expressive power of transformers and other sequential neural architectures. Taken together, the results draw on a wide range of theoretical tools, including approximation theory, discrete dynamical systems, and communication complexity, to establish rigorous separations between different neural architectures and scaling regimes.

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/k33d-sm22
Date: January 2024
Creators: Sanford, Clayton Hendrick
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
