Visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where... (read more)
Benchmarked Models
RANK
MODEL
REPO
CODE RESULT
PAPER RESULT
ε-REPRODUCED
BUILD