Deep High-Resolution Representation Learning for Visual Recognition

Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation...

Benchmarked Models

RANK  MODEL                 CODE RESULT  PAPER RESULT
2     HRNetV2-W64           79.5%        --
4     HRNetV2-W48           79.3%        --
6     HRNetV2-W40           78.9%        --
8     HRNetV2-W44           78.9%        --
10    HRNetV2-W32           78.4%        --
12    HRNetV2-W30           78.2%        --
14    HRNetV2-W18           76.8%        --
15    HRNet-W18-C-Small-V2  75.1%        --
16    HRNet-W18 Small V2    75.1%        --
17    HRNet-W18-C-Small-V1  72.3%        --
18    HRNet-W18 Small V1    72.3%        --