Benchmarking Every Open Source Model

Explore Benchmarks

ImageNet Image Classification: 361 Models
COCO minival Object Detection: 102 Models
WMT2014 English-German Machine Translation: 10 Models
WMT2019 English-German Machine Translation: 10 Models
WikiText-103 Language Modelling: 8 Models
SQuAD1.1 dev Question Answering: 5 Models
WMT2014 English-French Machine Translation: 5 Models
SQuAD2.0 dev Question Answering: 1 Model

Latest Results

MODEL                  BENCHMARK                          CODE RESULT   PAPER RESULT
ResNeXt-101 32x48d     ImageNet Image Classification      85.4%         85.4%
Xception               ImageNet Image Classification      79.0%         79.0%
Mask R-CNN             COCO minival Object Detection      0.477         0.474
ResNeXt-101-32x8d      ImageNet Image Classification      79.1%         79.3%
Transformer-XL Large   WikiText-103 Language Modelling    18.19         18.30
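
Comparing a code result with a paper result can be pictured as a simple tolerance check. The sketch below is an illustration only: the function name `is_reproduced`, the tolerance value, and the `higher_is_better` flag are assumptions made for this example, not Sotabench's actual rule. It assumes accuracy-style metrics are higher-is-better and that the WikiText-103 figure is a lower-is-better metric such as perplexity.

```python
def is_reproduced(code_result, paper_result, tolerance, higher_is_better=True):
    """Return True if the code result matches or beats the paper result
    within the given tolerance (hypothetical rule, for illustration)."""
    if higher_is_better:
        return code_result >= paper_result - tolerance
    return code_result <= paper_result + tolerance


# Values taken from the Latest Results listing; the tolerance of 0.5 is an assumption.
resnext_ok = is_reproduced(79.1, 79.3, tolerance=0.5)
transformer_xl_ok = is_reproduced(18.19, 18.30, tolerance=0.5, higher_is_better=False)

# A made-up code result that falls well short of the paper figure.
shortfall_ok = is_reproduced(70.0, 79.3, tolerance=0.5)
```

Making the direction of the metric explicit matters here, because a lower number is a better result for some benchmarks and a worse one for others.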

How it Works

INTEGRATED WITH GITHUB: Connect your repo and benchmark models for free
MODELS RUN ON BENCHMARKS: Free GPUs to run your code on public benchmarks
COMPARED TO PAPERS: Results compared to papers to test reproducibility

How to Add Your Repository

Step One: Evaluate Locally

Each public benchmark has its own usage instructions. For example, to evaluate your model on the ImageNet Image Classification benchmark in a framework-independent way, create a sotabench.py file like this:

Example sotabench.py structure
from sotabencheval.image_classification import ImageNetEvaluator

evaluator = ImageNetEvaluator(
    # automatically compare to this paper
    model_name='ResNeXt-101-32x8d',
    paper_arxiv_id='1611.05431'
)

predictions = ... # use your model to make predictions

evaluator.add(predictions)
evaluator.save()

Alternatively, you can use the PyTorch convenience wrapper.
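
The `predictions` placeholder in the framework-independent example is left open. As a rough sketch, assuming the evaluator accepts a mapping from image ID to a list of per-class scores (check the benchmark's own instructions for the exact format), the predictions could be assembled as follows. The `fake_model` function and the image IDs are hypothetical stand-ins for a real model and real validation images.

```python
import random

NUM_CLASSES = 1000  # assumed: the ImageNet benchmark uses the 1000-class label set


def fake_model(image_id):
    """Hypothetical stand-in for a real model: one score per class."""
    rng = random.Random(image_id)  # seeded so the sketch is deterministic
    return [rng.random() for _ in range(NUM_CLASSES)]


# Illustrative image IDs; a real run would iterate over the validation set.
image_ids = ['ILSVRC2012_val_00000001', 'ILSVRC2012_val_00000002']

# Assumed format: {image_id: [score for each class]}
predictions = {image_id: fake_model(image_id) for image_id in image_ids}
```

In a real sotabench.py, a dictionary like this would be passed to `evaluator.add(predictions)`, typically one batch at a time.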
Step Two: Connect Git and Showcase Results

Sotabench is like Continuous Integration, but instead of running unit tests, it benchmarks models in sotabench.py on every commit.

For developers, this is an easier way to continuously test their ML models, allowing for direct comparison with other repositories and papers.

For the community, this is a free and always up-to-date reference of ML model implementations.