Scaling Neural Machine Translation

Myle Ott, Sergey Edunov, David Grangier, Michael Auli


Sequence to sequence learning models still require several days to reach state of the art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation...
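The two techniques named in the abstract — reduced-precision arithmetic and large-batch training via gradient accumulation — can be illustrated with a minimal sketch. This is not the paper's fairseq implementation; it is a toy linear-regression trainer in numpy where micro-batch gradients are computed in float16 and accumulated into float32 master weights, with one parameter update per effective large batch. All function and parameter names here are illustrative assumptions.

```python
import numpy as np

def train_large_batch(X, y, micro_batch=16, accum_steps=4, lr=0.05, steps=300):
    """Toy sketch: large-batch training via gradient accumulation with
    reduced-precision (float16) compute and float32 master weights.
    Illustrative only -- not the paper's actual implementation."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1], dtype=np.float32)  # float32 master copy
    n = X.shape[0]
    for _ in range(steps):
        grad = np.zeros_like(w)
        for _ in range(accum_steps):
            idx = rng.integers(0, n, micro_batch)
            xb = X[idx].astype(np.float16)        # reduced-precision forward
            wb = w.astype(np.float16)             # half-precision weight copy
            err = xb @ wb - y[idx].astype(np.float16)
            # backward in float16, accumulate in float32
            grad += (xb.T @ err).astype(np.float32) / micro_batch
        # one optimizer step per effective batch of micro_batch * accum_steps
        w -= lr * grad / accum_steps
    return w
```

The effective batch size is `micro_batch * accum_steps`, so memory stays bounded by the micro-batch while the gradient estimate matches a larger batch; keeping the master weights in float32 avoids the drift that pure float16 updates would cause.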

Benchmarked Models

| Rank | Model           | Code Result | Paper Result |
|------|-----------------|-------------|--------------|
| 1    | Transformer Big | 25.39       | 29.30        |
| 2    | Transformer Big | 25.39       | 29.30        |

| Rank | Model           | Code Result | Paper Result |
|------|-----------------|-------------|--------------|
| 1    | Transformer Big | 32.76       | --           |
| 2    | Transformer Big | 32.76       | --           |

| Rank | Model           | Code Result | Paper Result |
|------|-----------------|-------------|--------------|
| 1    | Transformer Big | 42.43       | 43.20        |