Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro


Recent work in unsupervised language modeling demonstrates that training large neural language models advances the state of the art in Natural Language Processing applications. However, very large models can be difficult to train: memory constraints limit the size of models that fit on a single GPU. This work presents a simple, efficient intra-layer model parallel approach that enables training transformer models with billions of parameters. The approach requires no new compiler or library changes, is orthogonal and complementary to pipeline model parallelism, and can be implemented by inserting a few communication operations in native PyTorch. Using this technique, transformer models with up to 8.3 billion parameters are converged on 512 GPUs, sustaining 15.1 PetaFLOPs across the entire application with 76% scaling efficiency relative to a strong single-GPU baseline. An 8.3 billion parameter GPT-2-style model and a 3.9 billion parameter BERT-style model trained this way achieve state-of-the-art results on the WikiText103, LAMBADA, and RACE datasets.
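To make the "few communication operations" concrete, below is a minimal sketch of the intra-layer (tensor) parallel MLP block described in the paper: the first weight matrix is split by columns and the second by rows, so one all-reduce in the forward pass and one in the backward pass suffice. It assumes one process per GPU with an initialized `torch.distributed` process group; the class names (`ParallelMLP`, `_CopyToModelParallelRegion`, `_ReduceFromModelParallelRegion`) are illustrative, not Megatron-LM's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist


class _CopyToModelParallelRegion(torch.autograd.Function):
    """The paper's `f` operator: identity in the forward pass,
    all-reduce of the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        grad = grad_output.clone()
        dist.all_reduce(grad)  # sum gradients across tensor-parallel ranks
        return grad


class _ReduceFromModelParallelRegion(torch.autograd.Function):
    """The paper's `g` operator: all-reduce of the activations in the
    forward pass, identity in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        dist.all_reduce(x)  # sum partial results from all ranks
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


class ParallelMLP(nn.Module):
    """Transformer MLP block with its first weight matrix split by columns
    and its second split by rows across `world_size` ranks, so a full
    forward + backward pass needs only two all-reduces in total."""

    def __init__(self, hidden_size: int, world_size: int):
        super().__init__()
        assert (4 * hidden_size) % world_size == 0
        partition = 4 * hidden_size // world_size
        # Column-parallel GEMM: each rank owns a slice of the 4h outputs,
        # so GeLU can be applied locally with no communication.
        self.dense_h_to_4h = nn.Linear(hidden_size, partition)
        # Row-parallel GEMM: each rank owns a slice of the 4h inputs and
        # produces a partial sum; the bias is omitted here because it must
        # be added only once, after the all-reduce.
        self.dense_4h_to_h = nn.Linear(partition, hidden_size, bias=False)

    def forward(self, x):
        x = _CopyToModelParallelRegion.apply(x)
        x = F.gelu(self.dense_h_to_4h(x))
        x = self.dense_4h_to_h(x)
        return _ReduceFromModelParallelRegion.apply(x)
```

In a sketch like this, each rank would be launched as its own process (e.g. via `torchrun`), call `dist.init_process_group("nccl")`, construct `ParallelMLP(hidden_size, dist.get_world_size())`, and feed it the same input. In the paper, the same column-then-row split is also applied to the attention block, partitioning it across GPUs by attention head.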
