Scaling Neural Machine Translation

Myle Ott, Sergey Edunov, David Grangier, Michael Auli

Sequence-to-sequence learning models still require several days to reach state-of-the-art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation...

