RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov


Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.
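
For readers who want to try the released checkpoints, the sketch below runs a pretrained "roberta-base" model as a masked language model through the Hugging Face Transformers library. This is a third-party reimplementation, not the authors' fairseq release, and it assumes the transformers and torch packages are installed; the prompt text is illustrative only.

    # Minimal sketch: query a pretrained RoBERTa checkpoint as a masked LM
    # via the Hugging Face Transformers reimplementation (assumption: this
    # mirrors, but is not identical to, the authors' released fairseq code).
    import torch
    from transformers import RobertaTokenizer, RobertaForMaskedLM

    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
    model = RobertaForMaskedLM.from_pretrained("roberta-base")
    model.eval()

    # RoBERTa uses "<mask>" as its mask token.
    text = "RoBERTa is a robustly optimized <mask> pretraining approach."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Find the masked position and decode the highest-scoring prediction.
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    predicted_id = logits[0, mask_pos].argmax(dim=-1)
    print(tokenizer.decode(predicted_id))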
