
Shortcut Transformers and the Learnability of Automata

Transformers have emerged as a powerful architecture for tasks in natural language processing, computer vision, and multi-modal domains. Despite their success, understanding the computational capabilities and limitations of transformers remains a challenge. This work relates transformers to deterministic finite automata (DFAs) and empirically investigates the architecture's ability to simulate DFAs of varying complexity, focusing on the solvable A4-DFA and the non-solvable A5-DFA. We conduct experiments to evaluate the in-distribution and out-of-distribution accuracy of sub-linear-depth transformers, with positive results on both counts. Additionally, we examine whether widening the transformer yields even shallower transformers for the A4-DFA. No significant improvements are observed compared to the sub-linear-depth transformers, although further exploration of hyperparameters is needed for more reliable results.
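
To make the simulation task concrete, below is a minimal sketch of the group-word DFA underlying the A5 experiments, assuming the A5-DFA is the automaton whose states are the 60 elements of the alternating group A5 and whose transitions compose the current state with a generator. The helper names and the choice of generators are illustrative, not the thesis's actual setup.

```python
# Sketch of a group-word DFA over A5: states are even permutations of
# {0,...,4}, the input alphabet is a pair of generators, and a transition
# composes the current state with the generator read from the input.
from itertools import permutations

def compose(p, q):
    """Compose permutations given as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def parity(p):
    """Return 0 for even permutations, 1 for odd (inversion count mod 2)."""
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return inv % 2

# States: the 60 even permutations of {0,...,4}; start state: identity.
states = [p for p in permutations(range(5)) if parity(p) == 0]
assert len(states) == 60
start = tuple(range(5))

# Alphabet: two standard generators of A5, a 3-cycle and a 5-cycle.
gens = {
    "a": (1, 2, 0, 3, 4),  # 3-cycle (0 1 2)
    "b": (1, 2, 3, 4, 0),  # 5-cycle (0 1 2 3 4)
}

def run_dfa(word):
    """Run the DFA: fold generator transitions over the input word."""
    state = start
    for symbol in word:
        state = compose(gens[symbol], state)
    return state

# The sequence-to-state task a transformer would be trained on:
# map an input word over {a, b} to the DFA's final state in A5.
print(run_dfa("abba"))
```

The same construction with even permutations of {0, ..., 3} yields the 12-state A4-DFA; because A4 is solvable and A5 is not, the two automata probe different complexity regimes for shallow transformers.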

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-210546
Date: January 2023
Creators: Martens, Willeke
Publisher: Umeå universitet, Institutionen för datavetenskap
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: UMNAD ; 1404