Dynamic Evaluation of Transformer Language Models

Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals

This research note combines two methods that have recently improved the state of the art in language modeling: Transformers and dynamic evaluation. Transformers use stacked layers of self-attention, which allow them to capture long-range dependencies in sequential data.

