Pay Less Attention with Lightweight and Dynamic Convolutions

Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step...
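
To make the mechanism concrete, below is a minimal PyTorch sketch of the two operators the paper introduces: lightweight convolution (LightConv), a depthwise convolution with softmax-normalized kernels shared across channel groups, and dynamic convolution (DynamicConv), which predicts a new kernel from the current time step alone. This is an illustration only, not the authors' fairseq implementation; the module names, head count, kernel size, and causal left-padding are assumptions made for clarity.

```python
# Minimal sketch, assuming PyTorch; not the authors' fairseq code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LightConv(nn.Module):
    """Lightweight convolution: a depthwise 1D convolution whose kernel is
    shared across channel groups (heads) and softmax-normalized over its width."""

    def __init__(self, channels, kernel_size=3, heads=4):
        super().__init__()
        assert channels % heads == 0
        self.heads, self.kernel_size = heads, kernel_size
        # One kernel per head, reused for every channel in that head.
        self.weight = nn.Parameter(torch.randn(heads, kernel_size))

    def forward(self, x):                        # x: (batch, time, channels)
        B, T, C = x.shape
        w = F.softmax(self.weight, dim=-1)       # normalize over kernel width
        w = w.repeat_interleave(C // self.heads, dim=0)        # (C, K)
        x = x.transpose(1, 2)                    # (B, C, T)
        x = F.pad(x, (self.kernel_size - 1, 0))  # causal left-pad
        out = F.conv1d(x, w.unsqueeze(1), groups=C)            # depthwise conv
        return out.transpose(1, 2)               # back to (B, T, C)


class DynamicConv(nn.Module):
    """Dynamic convolution: the per-head kernel is predicted from the current
    time step alone, so it changes at every position."""

    def __init__(self, channels, kernel_size=3, heads=4):
        super().__init__()
        assert channels % heads == 0
        self.heads, self.kernel_size = heads, kernel_size
        self.kernel_proj = nn.Linear(channels, heads * kernel_size)

    def forward(self, x):                        # x: (batch, time, channels)
        B, T, C = x.shape
        H, K = self.heads, self.kernel_size
        # Predict a softmax-normalized kernel for every (position, head) pair.
        w = F.softmax(self.kernel_proj(x).view(B, T, H, K), dim=-1)
        # Gather the K most recent (causal) positions for every time step.
        windows = F.pad(x, (0, 0, K - 1, 0)).unfold(1, K, 1)   # (B, T, C, K)
        windows = windows.reshape(B, T, H, C // H, K)
        out = torch.einsum('bthck,bthk->bthc', windows, w)
        return out.reshape(B, T, C)


# Example: both operators map (batch, time, channels) to the same shape.
x = torch.randn(2, 10, 8)
print(LightConv(8)(x).shape, DynamicConv(8)(x).shape)
```

In the paper, operators of this kind replace the self-attention modules inside otherwise standard encoder-decoder blocks, and the number of operations scales linearly rather than quadratically in the input length. The "(without GLUs)" variants in the tables below presumably omit the gated linear unit applied to the input projection of each block.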

Benchmarked Models

Rank  Model                        Code result (BLEU)  Paper result (BLEU)
1     DynamicConv                  29.76               29.70
2     DynamicConv (without GLUs)   29.17               --
3     DynamicConv (without GLUs)   29.17               --
4     LightConv                    29.17               28.90
5     LightConv                    29.17               28.90
6     DynamicConv                  29.10               29.70
7     LightConv (without GLUs)     28.44               --
8     LightConv (without GLUs)     28.44               --

Rank  Model                        Code result (BLEU)  Paper result (BLEU)
1     DynamicConv (without GLUs)   38.28               --
2     DynamicConv (without GLUs)   38.28               --
3     LightConv (without GLUs)     37.78               --
4     LightConv (without GLUs)     37.78               --
5     DynamicConv                  37.66               --
6     DynamicConv                  37.64               --

Rank  Model                        Code result (BLEU)  Paper result (BLEU)
1     DynamicConv                  42.41               43.20
2     LightConv                    42.25               43.10