
Representational Capabilities of Feed-forward and Sequential Neural Architectures

Despite the widespread empirical success of deep neural networks over the past decade, a comprehensive understanding of their mathematical properties remains elusive, which limits the ability of practitioners to train neural networks in a principled manner. This dissertation provides a representational characterization of a variety of neural network architectures, including fully-connected feed-forward networks and sequential models like transformers.

The representational capabilities of neural networks are most famously characterized by the universal approximation theorem, which states that sufficiently large neural networks can closely approximate any well-behaved target function. However, the universal approximation theorem applies exclusively to two-layer neural networks of unbounded width and fails to capture the comparative strengths and weaknesses of different architectures.
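
For reference, one standard form of the one-hidden-layer result (in the spirit of the classical theorems of Cybenko and Hornik; the dissertation's precise formulation may differ, and the symbols N, a_i, w_i, b_i, sigma, and epsilon below are generic notation rather than the thesis's own) reads: for any continuous target f on [0,1]^d, any continuous non-polynomial activation sigma, and any epsilon > 0, there exist a width N and parameters a_i, w_i, b_i such that

\[
\sup_{x \in [0,1]^d} \Bigl| f(x) - \sum_{i=1}^{N} a_i \, \sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon .
\]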

The thesis addresses these limitations by quantifying the representational consequences of random features, weight regularization, and model depth on feed-forward architectures. It further investigates and contrasts the expressive power of transformers and other sequential neural architectures. Taken together, the results draw on a wide range of theoretical tools, including approximation theory, discrete dynamical systems, and communication complexity, to establish rigorous separations between different neural architectures and scaling regimes.

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/k33d-sm22
Date: January 2024
Creators: Sanford, Clayton Hendrick
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
