allenai / allennlp

MODEL            EM (CODE)  EM (PAPER)  F1 (CODE)  F1 (PAPER)  SPEED  GLOBAL RANK
BiDAF (single)   68.4%      67.7%       77.8%      77.3%       205.6  #5
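For readers unfamiliar with the EM and F1 columns, the following is a minimal sketch of how these two metrics are commonly computed for SQuAD 1.1. It assumes the answer normalization used by the official SQuAD v1.1 evaluation script (lowercasing, then removing punctuation and English articles, then collapsing whitespace). The function names are illustrative and are not taken from sotabencheval, whose implementation may differ in detail.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction, truths):
    """Best EM and best F1 over all reference answers for one question."""
    return (max(exact_match(prediction, t) for t in truths),
            max(f1_score(prediction, t) for t in truths))


em, f1 = score("the Eiffel Tower", ["Eiffel Tower", "Tower in Paris"])
partial_f1 = f1_score("Eiffel Tower in Paris", "Eiffel Tower")
```

The reported percentages would then be the averages of these per-question scores over the development set.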
[![SotaBench](https://img.shields.io/endpoint.svg?url=https://sotabench.com/api/v0/badge/gh/mkardas/allennlp)](https://sotabench.com/user/marcin/repos/mkardas/allennlp)

How the Repository is Evaluated

The full sotabench.py file:
from sotabencheval.utils import set_env_on_server, SOTABENCH_CACHE
from sotabencheval.question_answering import SQuADEvaluator, SQuADVersion
from tqdm import tqdm
import torch

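# Set the cache root before importing allennlp, which reads
# ALLENNLP_CACHE_ROOT when it is first imported.
set_env_on_server("ALLENNLP_CACHE_ROOT", SOTABENCH_CACHE / "allennlp")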

from allennlp.data import DatasetReader
from allennlp.data.iterators import DataIterator
from allennlp.models.archival import load_archive
from allennlp.nn.util import move_to_device

BATCH_SIZE = 64
CUDA_ID = 0


def load_model(url, batch_size=BATCH_SIZE):
    archive = load_archive(url, cuda_device=CUDA_ID)
    model = archive.model
    reader = DatasetReader.from_params(archive.config["dataset_reader"])
    iterator_params = archive.config["iterator"]
    iterator_params["batch_size"] = batch_size
    data_iterator = DataIterator.from_params(iterator_params)
    data_iterator.index_with(model.vocab)
    return model, reader, data_iterator


def evaluate(model, dataset, data_iterator, evaluator):
    model.eval()
    # Restart the evaluator's timer just before inference begins.
    evaluator.reset_time()
    with torch.no_grad():
        for batch in tqdm(data_iterator(dataset, num_epochs=1, shuffle=False),
                          total=data_iterator.get_num_batches(dataset)):
            batch = move_to_device(batch, CUDA_ID)
            predictions = model(**batch)
            # Map each question id to the predicted answer text.
            answers = {metadata['id']: prediction
                       for metadata, prediction in zip(batch['metadata'], predictions['best_span_str'])}
            evaluator.add(answers)
            # Stop early once the evaluator reports that cached results exist.
            if evaluator.cache_exists:
                break


evaluator = SQuADEvaluator(
    local_root="data/nlp/squad",
    model_name="BiDAF (single)",
    paper_arxiv_id="1611.01603",
    version=SQuADVersion.V11
)

model, reader, data_iter = load_model("https://allennlp.s3.amazonaws.com/models/bidaf-model-2017.09.15-charpad.tar.gz")
dataset = reader.read(evaluator.dataset_path)

evaluate(model, dataset, data_iter, evaluator)

evaluator.save()
print(evaluator.results)
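The core of `evaluate` is the pairing of each batch's metadata with its predicted spans, plus the early exit on a cache hit. The sketch below isolates that pattern without AllenNLP or a GPU. `StubEvaluator` and `collect_answers` are hypothetical stand-ins written for illustration only. They are not part of sotabencheval, and the real `SQuADEvaluator` decides for itself when `cache_exists` becomes true.

```python
class StubEvaluator:
    """Hypothetical stand-in for an evaluator; not the sotabencheval class."""

    def __init__(self, cached=False):
        self.answers = {}
        # Assumption: a fixed flag replaces the real cache lookup.
        self.cache_exists = cached

    def add(self, answers):
        self.answers.update(answers)


def collect_answers(batches, evaluator):
    """Feed per-batch predictions to the evaluator; stop on a cache hit.

    Returns the number of batches that were processed.
    """
    processed = 0
    for metadata, spans in batches:
        # Pair every question id with the span predicted for it.
        evaluator.add({m["id"]: s for m, s in zip(metadata, spans)})
        processed += 1
        if evaluator.cache_exists:
            break
    return processed


# Two toy batches of (metadata, predicted answer strings).
batches = [
    ([{"id": "q1"}, {"id": "q2"}], ["Denver Broncos", "1066"]),
    ([{"id": "q3"}], ["Normandy"]),
]

fresh = StubEvaluator(cached=False)
cached = StubEvaluator(cached=True)
batches_fresh = collect_answers(batches, fresh)
batches_cached = collect_answers(batches, cached)
```

With the flag unset, both batches are consumed. With it set, the loop ends after the first batch, which is the behaviour the `break` in `evaluate` relies on.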
COMMIT MESSAGE                         AUTHOR   COMMIT   DATE         RUN TIME
Add time measurement                   mkardas  7644621  Oct 07 2019  0h:08m:27s
Add SQuAD 1.1 sotabench script         mkardas  7095227  Oct 03 2019  0h:09m:07s
                                                                      0h:08m:03s
                                                                      0h:07m:47s
                                                                      0h:02m:29s
                                                                      0h:08m:15s
                                                                      0h:07m:49s
                                                                      0h:07m:03s
Avoid overwritting of sotabencheval    mkardas  bc76eba  Oct 02 2019  0h:07m:05s
Use development version of sotabench   mkardas  7456434  Oct 02 2019  0h:06m:08s
SQuAD sotabench script                 mkardas  ddc0e96  Oct 02 2019  0h:06m:59s
                                                                      0h:03m:05s
SQuAD sotabench script                 mkardas  ddc0e96  Oct 02 2019  unknown
                                                                      0h:06m:49s