
Shortcut Transformers and the Learnability of Automata

Transformers have emerged as a powerful architecture for tasks in natural language processing, computer vision, and multi-modal domains. Despite their success, understanding the computational capabilities and limitations of transformers remains a challenge. This work relates transformers to deterministic finite automata (DFAs) and empirically investigates the architecture's ability to simulate DFAs of varying complexity, focusing on the solvable A4-DFA and the non-solvable A5-DFA. We conduct experiments to evaluate the in-distribution and out-of-distribution accuracy of sub-linear-depth transformers, with positive results on both counts. Additionally, we examine whether widening the transformer yields even shallower transformers for the A4-DFA. No significant improvements are observed compared to the sub-linear-depth transformers, although further exploration of hyperparameters is needed for more reliable results.
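
To make the simulation task concrete, below is a minimal sketch of the group-word DFA underlying the A5 experiments, assuming the A5-DFA is the automaton whose states are the 60 elements of the alternating group A5 and whose transitions compose the current state with a generator. The helper names and the choice of generators are illustrative, not the thesis's actual setup.

```python
# Sketch of a group-word DFA over A5: states are even permutations of
# {0,...,4}, the input alphabet is a pair of generators, and a transition
# composes the current state with the generator read from the input.
from itertools import permutations

def compose(p, q):
    """Compose permutations given as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def parity(p):
    """Return 0 for even permutations, 1 for odd (inversion count mod 2)."""
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return inv % 2

# States: the 60 even permutations of {0,...,4}; start state: identity.
states = [p for p in permutations(range(5)) if parity(p) == 0]
assert len(states) == 60
start = tuple(range(5))

# Alphabet: two standard generators of A5, a 3-cycle and a 5-cycle.
gens = {
    "a": (1, 2, 0, 3, 4),  # 3-cycle (0 1 2)
    "b": (1, 2, 3, 4, 0),  # 5-cycle (0 1 2 3 4)
}

def run_dfa(word):
    """Run the DFA: fold generator transitions over the input word."""
    state = start
    for symbol in word:
        state = compose(gens[symbol], state)
    return state

# The sequence-to-state task a transformer would be trained on:
# map an input word over {a, b} to the DFA's final state in A5.
print(run_dfa("abba"))
```

The same construction with even permutations of {0, ..., 3} yields the 12-state A4-DFA; because A4 is solvable and A5 is not, the two automata probe different complexity regimes for shallow transformers.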

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-210546
Date: January 2023
Creators: Martens, Willeke
Publisher: Umeå universitet, Institutionen för datavetenskap
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: UMNAD ; 1404