Attention Is All You Need

Ashish VaswaniNoam ShazeerNiki ParmarJakob UszkoreitLlion JonesAidan N. GomezLukasz KaiserIllia Polosukhin

   Papers with code   Abstract  PDF

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism... (read more)

Benchmarked Models

No benchmarked models yet. Click here to submit a model.