Learned in Translation: Contextualized Word Vectors

Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher

Computer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors...

