The Winograd schema is a test designed to
stretch computer language processing to its limit. If a machine can determine an intended referent
in a sentence based only on clues from context, then the machine is using reasoning to parse
the language, and, perhaps, it should be called intelligent. Humans are pretty good at working out referents
based on context. For example: "The trophy would not fit in the brown suitcase
because it was too big." What was too big? Linguists will sometimes mark an ambiguous referent
in text with a subscript, like an “x”. So if you want to solve that “x”, and work
out what it refers to in that sentence, then you have to know that a suitcase usually
contains things, and that a trophy usually
doesn’t contain things. And you have to know how containers work: that larger things can’t fit inside
smaller things. It seems simple to us,
because we understand not only how syntax works and whether
a sentence is grammatically valid, but also because we’ve got experience of
suitcases and trophies and reality itself. Solving that “x” requires comprehension. Terry Winograd started this line of thinking
in 1972 by proposing sentences that required that sort of context. "The city councilmen refused the demonstrators
a permit because they feared violence," or because they "advocated revolution." ...it was the '70s; there weren't many councilwomen. And there was a lot of revolution. Winograd argued that solving this sentence
requires knowing not only who can issue permits or what a demonstrator is, but also the political interests of a council, the ways in which they’d view
political change. That referent changes based purely on the
context and the meaning of those last two words. Now there are plenty of sentences where the
“x” is ambiguous, where even humans don't have enough context
to work out which pronoun refers to which person. If you’re writing a gay romance story, and both your main characters
use the same pronouns and appear all the time in the same sentences, then it’s going to get confusing fast. In more amateur work, like slash fiction, inexperienced writers often try to clear that
up with synecdoche in place of pronouns. But that’s awkward. More experienced writers tend to use context
and careful sentence construction, and readers never notice it. But Winograd schemas are designed
to be unambiguous for humans, to the point where we wouldn’t even have
to think about it, but difficult for computers. Natural Language Processing, the ability to
"understand" language, is in huge demand right now;
it’s what powers Siri, and Alexa, and the Google Assistant. So many companies are trying to create code
that can understand human requests, and let’s be honest,
most of that code sucks right now. They often just rely on collections of sentences, each word tagged with its part of speech;
they pick up on keywords like “call” or “how much is” and extract what they
hope are the right bits around them. That doesn’t work for a Winograd schema. A team of researchers at the computer science
department of New York University have assembled a list of 150 of them, as a test. So: “I spread the cloth on the table in order
to protect it.” To protect what? Obvious if you understand what a cloth and
a table are, and why you might spread a cloth on a table, but if those are just tagged as nouns with
no context, then that’s completely ambiguous. Hi. Future Tom here, just interrupting
to say that after this video was finished and done and
all the graphics were completed, I found out that someone had published a new
version of GPT-2, which is the leading machine-learning
text-generation system, and I figured that if I didn't
address that here, a load of people in the comments would be all, "Well, what about machine learning?
What about AI?" So I got a version of GPT-2, set up kind of like a Dungeons & Dragons
adventure system. I told it to create a suitcase and a trophy, and to put the suitcase in the trophy, to see
what would happen. ARTIFICIAL VOICE:
"The moment you think about it, you know what to do, so you do it with your
eyes closed: you place your hand on top of it, then with
your other hand you take out your gun
from inside your jacket pocket, take aim at it, pull back one inch…" Yeah, if you think machine learning's going
to save us, uh, not yet! And of course, sometimes words can switch
parts of speech. "Main" is usually an adjective,
but in video games, it can be a verb or a noun. Humans who’ve not heard that use before
won’t understand it. The solution for now? Accept that Siri’s
not going to be great at complicated questions, and hire a team of underpaid contract workers
to manually tag data, in order to help programs "learn" patterns. But those methods will continue to fall short, because computers are missing the breadth
of knowledge that humans have access to. At least, for now. Artificial language processing
remains 10 years away, just as it has been for the last few decades. One of my co-authors, Gretchen McCulloch,
has a wonderful podcast called "Lingthusiasm". You can listen to it at the link in the description.