Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov


Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture, Transformer-XL, that enables learning dependency beyond a fixed length without disrupting temporal coherence.
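The key mechanism is segment-level recurrence: hidden states computed for the previous segment are cached and reused as extended context when processing the next segment, with no gradient flowing back into the cache (the paper pairs this with a relative positional encoding scheme, which is omitted here). Below is a minimal, single-head NumPy sketch of that recurrence; the function `attend_with_memory`, the weight names, and the segment/memory lengths are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attend_with_memory(h_segment, memory, W_q, W_k, W_v):
        """Single-head causal attention over the current segment plus cached memory.

        h_segment: (seg_len, d) hidden states of the current segment
        memory:    (mem_len, d) hidden states cached from the previous segment
        """
        seg_len, mem_len = h_segment.shape[0], memory.shape[0]
        # Queries come from the current segment only; keys/values see [memory; segment].
        context = np.concatenate([memory, h_segment], axis=0)
        q = h_segment @ W_q
        k = context @ W_k
        v = context @ W_v
        scores = q @ k.T / np.sqrt(q.shape[-1])
        # Causal mask: segment position i may attend to every memory position
        # plus segment positions <= i.
        future = np.triu(np.ones((seg_len, seg_len), dtype=bool), k=1)
        scores[:, mem_len:][future] = -1e9
        return softmax(scores) @ v

    # Walk a long sequence segment by segment, carrying the cached states forward.
    rng = np.random.default_rng(0)
    d, seg_len, mem_len = 16, 8, 8
    W_q, W_k, W_v = (0.1 * rng.standard_normal((d, d)) for _ in range(3))

    memory = np.zeros((mem_len, d))          # empty memory before the first segment
    long_sequence = rng.standard_normal((4 * seg_len, d))
    for start in range(0, long_sequence.shape[0], seg_len):
        segment = long_sequence[start:start + seg_len]
        out = attend_with_memory(segment, memory, W_q, W_k, W_v)
        # In Transformer-XL the cached memory is treated as constant (no gradient
        # flows into it); here we simply keep the most recent mem_len states.
        memory = np.concatenate([memory, out], axis=0)[-mem_len:]

Because each segment attends to states cached from the one before it, the effective context a prediction can use grows with depth rather than being cut off at the segment boundary.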

Benchmarked Models

RANK  MODEL                 CODE RESULT  PAPER RESULT
1     Transformer-XL Large  18.19        18.30