FRAGE: Frequency-Agnostic Word Representation

Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu


Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks. Although it is widely accepted that words with similar semantics should be close to each other in the embedding space, we find that word embeddings learned in several tasks are biased towards word frequency: the embeddings of high-frequency and low-frequency words lie in different subregions of the embedding space, and the embedding of a rare word and a popular word can be far from each other even if they are semantically similar…
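The frequency bias described in the abstract can be probed with a simple diagnostic: if embeddings encode frequency, the centroids of high- and low-frequency word vectors sit far apart relative to the within-group spread. Below is a minimal sketch of such a check; the function name is hypothetical, and synthetic Gaussian vectors stand in for trained embeddings and corpus frequency counts purely to keep the script self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Synthetic stand-ins for trained embeddings (hypothetical data):
# simulate the bias the paper describes by shifting rare-word vectors
# into a different subregion of the embedding space.
popular = rng.normal(loc=0.0, scale=1.0, size=(500, dim))
rare = rng.normal(loc=2.0, scale=1.0, size=(500, dim))

def centroid_gap(a, b):
    """Distance between the two group centroids, normalized by the
    mean within-group standard deviation."""
    gap = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    spread = 0.5 * (a.std(axis=0).mean() + b.std(axis=0).mean())
    return gap / spread

# A large normalized gap indicates the embeddings separate by frequency.
print(f"normalized centroid gap: {centroid_gap(popular, rare):.2f}")
```

In a real setting, `popular` and `rare` would be the embedding rows of the most and least frequent vocabulary words; a gap well above the within-group spread is the signature of the frequency bias the paper targets.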
