Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov


Transformers have the potential to learn longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture, Transformer-XL, that enables learning dependency beyond a fixed length without disrupting temporal coherence...
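To make the core idea concrete, below is a minimal sketch of the segment-level recurrence mechanism that Transformer-XL introduces: hidden states computed for the previous segment are cached and reused as additional context (keys and values) when attending within the current segment, with gradients stopped so the cache acts as read-only memory. This is an illustrative sketch, not the authors' implementation; the function and weight names are hypothetical, and relative positional encodings, multiple heads, and causal masking are omitted for brevity.

```python
# Minimal sketch (not the authors' implementation) of segment-level recurrence:
# cached hidden states from the previous segment extend the attention context
# of the current segment beyond the fixed segment length.
import torch
import torch.nn.functional as F

def attend_with_memory(h_curr, memory, w_q, w_k, w_v):
    """Single-head attention over [memory; current segment].

    h_curr : (seg_len, d_model)  hidden states of the current segment
    memory : (mem_len, d_model)  cached hidden states of the previous segment
    w_q, w_k, w_v : (d_model, d_head) projection matrices (hypothetical names)
    """
    # Extended context: cached states are detached, so no gradient flows into them.
    context = torch.cat([memory.detach(), h_curr], dim=0)   # (mem_len + seg_len, d_model)

    q = h_curr @ w_q          # queries come from the current segment only
    k = context @ w_k         # keys span memory + current segment
    v = context @ w_v

    scores = q @ k.t() / k.shape[-1] ** 0.5                 # (seg_len, mem_len + seg_len)
    attn = F.softmax(scores, dim=-1)
    return attn @ v                                          # (seg_len, d_head)

# Usage: after each segment is processed, its hidden states become the memory
# for the next segment, so the effective context grows across segments.
d_model, d_head, seg_len, mem_len = 16, 8, 4, 4
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
memory = torch.zeros(mem_len, d_model)
for _ in range(3):                                           # three consecutive segments
    h = torch.randn(seg_len, d_model)
    out = attend_with_memory(h, memory, w_q, w_k, w_v)
    memory = h                                               # cache current states for the next segment
```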
