Formal Algorithms for Transformers

Author(s): Mary Phuong, Marcus Hutter
Venue: N/A
Year: 2022

Paper: http://www.hutter1.net/publ/transalg.pdf

Abstract

This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (not results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.

Additional information

arXiv
A Mathematical Framework for Transformer Circuits