Shaping Visual Representations with Language for Few-shot Classification. (arXiv:1911.02683v1 [cs.CV])

Jesse Mu, Percy Liang, Noah Goodman

Language is designed to convey useful information about the world, and so serves as a scaffold for efficient human learning. How can we let language guide representation learning in machine learning models? We explore this question in the setting of few-shot visual classification, proposing models that learn to perform visual classification while jointly predicting natural language task descriptions at train time. At test time, with no language available, we find that these language-influenced visual representations generalize better than both meta-learning baselines and approaches that explicitly use language as a bottleneck for classification.
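
As a rough illustration of what "jointly predicting" can mean at train time, the sketch below adds an auxiliary language-prediction loss to a classification loss. The abstract does not say how the two objectives are combined, so the weighted sum, the `lang_weight` hyperparameter, and all function and argument names here are hypothetical and not taken from the paper.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(class_logits, class_label, token_logits, token_ids, lang_weight=1.0):
    """Classification loss plus a weighted language-prediction loss.

    Assumption (not from the abstract): the two terms are combined as a
    weighted sum, with the language term averaged over description tokens.
    """
    cls_loss = cross_entropy(class_logits, class_label)
    lang_loss = sum(
        cross_entropy(step_logits, token)
        for step_logits, token in zip(token_logits, token_ids)
    ) / len(token_ids)
    return cls_loss + lang_weight * lang_loss


# Toy example: 2 classes, a 3-token description over a 4-word vocabulary.
loss = joint_loss(
    class_logits=[2.0, 0.5],
    class_label=0,
    token_logits=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    token_ids=[0, 1, 2],
    lang_weight=0.5,
)
```

In this sketch the language term only shapes training. At test time, when no descriptions are available, only the classification output would be used, which corresponds to setting `lang_weight=0`.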

Publisher URL: http://arxiv.org/abs/1911.02683

arXiv ID: 1911.02683v1