Augmenting Self-attention with Persistent Memory

Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin

Transformer networks have led to important progress in language modeling and machine translation. These models include two consecutive modules, a feed-forward layer and a self-attention layer.
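The idea the title and abstract point to is replacing the feed-forward module by enriching self-attention itself with learned, input-independent ("persistent") memory vectors that are attended to alongside the context. Below is a minimal sketch of that idea, assuming PyTorch; it is not the authors' implementation, and the class name, memory size, and shapes are illustrative assumptions (multi-head attention, causal masking, and other details are omitted).

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class PersistentMemorySelfAttention(nn.Module):
    """Self-attention whose keys/values are augmented with learned memory slots (sketch)."""

    def __init__(self, dim: int, n_mem: int = 16):
        super().__init__()
        self.dim = dim
        # Standard query/key/value projections for the input tokens.
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Persistent memory: learned key/value vectors shared across all inputs,
        # intended to play a role similar to the feed-forward sublayer.
        self.mem_k = nn.Parameter(torch.randn(n_mem, dim) / math.sqrt(dim))
        self.mem_v = nn.Parameter(torch.randn(n_mem, dim) / math.sqrt(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b = x.size(0)
        q = self.q_proj(x)                                    # (b, t, d)
        k = self.k_proj(x)                                    # (b, t, d)
        v = self.v_proj(x)                                    # (b, t, d)
        # Concatenate the persistent memory slots to the context keys/values.
        mem_k = self.mem_k.unsqueeze(0).expand(b, -1, -1)     # (b, m, d)
        mem_v = self.mem_v.unsqueeze(0).expand(b, -1, -1)     # (b, m, d)
        k = torch.cat([k, mem_k], dim=1)                      # (b, t + m, d)
        v = torch.cat([v, mem_v], dim=1)                      # (b, t + m, d)
        # Scaled dot-product attention over tokens and memory slots jointly.
        scores = q @ k.transpose(1, 2) / math.sqrt(self.dim)  # (b, t, t + m)
        attn = F.softmax(scores, dim=-1)
        return attn @ v                                       # (b, t, d)


# Example: a batch of 2 sequences, 10 tokens each, model width 64.
layer = PersistentMemorySelfAttention(dim=64, n_mem=16)
out = layer(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

Because the memory slots do not depend on the input, they act as a fixed bank of key/value pairs that every token can attend to, which is why such a layer can take over the role normally played by the feed-forward module.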
