comp-syn: Perceptually Grounded Word Embeddings with Color

Desikan, Bhargav Srinivasa; Hull, Tasker; Nadler, Ethan O.; Guilbeault, Douglas; Kar, Aabir Abubaker; Chu, Mark; Lo Sardo, Donald Ruggiero

[Submitted on 8 Oct 2020 (v1), last revised 19 Oct 2020 (this version, v2)]

Abstract: Popular approaches to natural language processing create word embeddings based on textual co-occurrence patterns, but often ignore embodied, sensory aspects of language. Here, we introduce the Python package comp-syn, which provides grounded word embeddings based on the perceptually uniform color distributions of Google Image search results. We demonstrate that comp-syn significantly enriches models of distributional semantics. In particular, we show that (1) comp-syn predicts human judgments of word concreteness with greater accuracy and in a more interpretable fashion than word2vec using low-dimensional word-color embeddings, and (2) comp-syn performs comparably to word2vec on a metaphorical vs. literal word-pair classification task. comp-syn is open-source on PyPI and is compatible with mainstream machine-learning Python packages. Our package release includes word-color embeddings for over 40,000 English words, each associated with crowd-sourced word concreteness judgments.
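
The idea of a word-color embedding described in the abstract can be illustrated with a small sketch. This is not the comp-syn API: the function names `srgb_to_lab`, `color_embedding`, and `js_divergence` are hypothetical, CIELAB is used only as a stand-in for a perceptually uniform color space, the histogram binning is an arbitrary choice, and the "images" are synthetic arrays rather than search results. The sketch pools the pixels of a set of images associated with a word, histograms each color channel, and compares two words by the Jensen-Shannon divergence of their color distributions.

```python
import numpy as np


def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (shape (..., 3)) to CIELAB, D65 white point."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB transfer curve to obtain linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> CIE XYZ using the standard sRGB (D65) matrix.
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = linear @ m.T
    t = xyz / np.array([0.95047, 1.0, 1.08883])  # normalize by the D65 white
    delta = 6.0 / 29.0
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def color_embedding(images, bins=8):
    """Concatenate normalized per-channel Lab histograms pooled over all images.

    The bin count and channel ranges are assumptions made for this sketch.
    """
    pixels = np.concatenate([srgb_to_lab(img).reshape(-1, 3) for img in images])
    ranges = [(0.0, 100.0), (-128.0, 128.0), (-128.0, 128.0)]
    parts = []
    for channel, (lo, hi) in enumerate(ranges):
        values = np.clip(pixels[:, channel], lo, hi)  # keep edge values in range
        hist, _ = np.histogram(values, bins=bins, range=(lo, hi))
        parts.append(hist / hist.sum())
    return np.concatenate(parts)


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between two non-negative vectors."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    mid = 0.5 * (p + q)

    def kl(x, y):
        return np.sum(x * np.log2(x / y))

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


# Synthetic stand-ins for the images returned for two words.
rng = np.random.default_rng(0)
sky_images = [rng.uniform([0.0, 0.3, 0.7], [0.3, 0.6, 1.0], size=(16, 16, 3))
              for _ in range(5)]
fire_images = [rng.uniform([0.7, 0.2, 0.0], [1.0, 0.5, 0.2], size=(16, 16, 3))
               for _ in range(5)]

sky_vec = color_embedding(sky_images)
fire_vec = color_embedding(fire_images)
print("JS divergence (sky vs. fire):", js_divergence(sky_vec, fire_vec))
```

With 8 bins per channel the resulting vector has 24 dimensions, which is the sense in which such an embedding is low-dimensional and directly interpretable: each coordinate is the share of pixels falling in one lightness or chroma bin.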

Comments:	9 pages, 3 figures, all code and data available at this https URL. Forthcoming in the Proceedings of the 28th International Conference on Computational Linguistics
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Cite as:	arXiv:2010.04292 [cs.CL]
	(or arXiv:2010.04292v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.04292

Submission history

From: Douglas Guilbeault R
[v1] Thu, 8 Oct 2020 22:50:06 UTC (1,865 KB)
[v2] Mon, 19 Oct 2020 05:22:54 UTC (1,865 KB)
