Experiences with children: on failed referring expressions and crossed concepts

Mauricio Mazuecos
Jan 17, 2021
Photo by AHMED HINDAWI on Unsplash

This is a short anecdote involving children and referring expressions that left me thinking. Being the nerd I am, I quickly thought about how it relates to visually grounded NLP.

Some time ago, my partner came to visit with their child. We spent the morning playing together, and then they had to leave. The child started gathering their toys and said, "I’m missing two white balls". We started looking for them, and the child said they were under a kitchen cabinet. We tried to find them but could not see them, so we tried to reach for them with a broom. At that moment we found out that they were not "white balls". They were toy wheels. The child responded:

If it’s round, then it’s a ball.

The wheels had been in front of our eyes the whole time; we just didn’t pay them much attention. We were looking for the “white balls”. We had two of their most important features: they were round and had white on them. Despite that, we completely overlooked them while searching. We had a bad referring expression. But the reasoning behind the expression "two white balls" was correct given the context and the child’s knowledge of the world.

Referring expressions and crossed concepts

A referring expression is a noun phrase whose purpose is to identify a particular object. Visually grounded dialogs tend to rely on these expressions to ground the context of a conversation in an object. When these phrases fail to identify a single object, dialogs can end up strange and full of misunderstandings.

People can have non-bijective functions internally relating words to real-world entities or concepts. When a person thinks of the same object on hearing two different words that denote different real-world objects, I say they have a crossed concept. My “research” on this (a.k.a. talking with people) showed me that people have different crossed concepts (thinking of the same object on hearing the words pot and vase, couch and bed, ants and spiders, etc.). So the word used might not be the most important feature of the referent of a dialog; the attributes of the object may be what matters when determining which object the dialog is about.
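To make the idea of a crossed concept a bit more concrete, here is a minimal sketch in Python. It assumes a speaker's vocabulary can be modelled as a plain dictionary from words to internal concepts, which is a big simplification; the function name `crossed_concepts`, the concept labels, and the toy lexicon are all made up for illustration.

```python
from collections import defaultdict


def crossed_concepts(lexicon):
    """Return the concepts that more than one word maps to.

    `lexicon` is a hypothetical word -> concept dictionary for one speaker.
    A concept reached by several words means the mapping is not injective,
    which is what I call a crossed concept.
    """
    words_by_concept = defaultdict(set)
    for word, concept in lexicon.items():
        words_by_concept[concept].add(word)
    return {c: ws for c, ws in words_by_concept.items() if len(ws) > 1}


# Toy lexicon (an assumption): for the child, anything round is a ball.
child_lexicon = {
    "ball": "ROUND_THING",
    "wheel": "ROUND_THING",
    "broom": "BROOM",
}

crossed = crossed_concepts(child_lexicon)
print(crossed)
```

In this toy lexicon, "ball" and "wheel" land on the same concept while "broom" does not, so only the first pair is reported as crossed.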

Take the example of the “white balls”. Being white and round were clear attributes of the wheels we were looking for. However, retrieving such attributes to infer which object the other speaker is actually referring to is not a trivial mental process.
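As a rough illustration of falling back on attributes when the noun fails, here is a small Python sketch. It is not how any existing system works; it assumes the scene is already described as a list of objects with symbolic attributes, and that "round" has been extracted as an attribute implied by "ball". The `SceneObject` class, the `resolve` function, and the toy scene are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    ident: int
    name: str
    attributes: frozenset


def resolve(noun, attributes, scene):
    """Find the referents of a referring expression in a toy scene.

    First try objects whose name matches the noun and that have all the
    requested attributes. If none match, ignore the noun and keep every
    object that has all the attributes.
    """
    wanted = frozenset(attributes)
    exact = [o for o in scene if o.name == noun and wanted <= o.attributes]
    if exact:
        return exact
    # Fallback: the noun may be a crossed concept, so trust the attributes.
    return [o for o in scene if wanted <= o.attributes]


# Toy version of the kitchen (all objects and attributes are assumptions).
scene = [
    SceneObject(1, "wheel", frozenset({"white", "round"})),
    SceneObject(2, "wheel", frozenset({"white", "round"})),
    SceneObject(3, "broom", frozenset({"brown", "long"})),
    SceneObject(4, "cabinet", frozenset({"white", "rectangular"})),
]

# "two white balls": there is no ball in the scene, so the noun match fails.
found = resolve("ball", {"white", "round"}, scene)
print([o.ident for o in found])  # prints [1, 2]
```

The hard part in practice is everything this sketch takes for granted: knowing which attributes the speaker has in mind, and noticing that the noun should be dropped in the first place.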

An actual example from the MS COCO dataset (COCO_train2014_000000348616.jpg). Both pots and vases are annotated as “vase” by human annotators. Even though this might be due to the small number of categories in the dataset, a crossed concept between pot and vase can be injected into models trained on this data.

At the time of writing, I could not find work on recovering from failed referring expressions in dialog. Most of the work in visually grounded dialog is based on VisDial (Das et al., 2017) and GuessWhat?! (De Vries et al., 2017). As far as I could see, these datasets do not capture this phenomenon. It matters mainly in the case of children and non-native speakers (although it is not limited to them). Limited language capabilities make room for failed referring expressions that may end up causing small (or big) problems as virtual voice assistants become more popular and accessible to everyone.

I thought about this issue after experiencing a failed referring expression first-hand. Since I like to share, I am sharing this short text with you, hoping it gets you thinking about something that may or may not have happened to you. Thank you for reading!

Mauricio Mazuecos

Computer Scientist. PhD student in Computer Science. Responsible use of AI is a must.