Learning to Compute Word Embeddings On the Fly

Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio

Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data...

