A weblog by Will Fitzgerald

Category Archives: Artificial Intelligence

What is missing? Watson, AI and search technology

All geekdom, and much that is outside our realm, is abuzz with news and discussion of the contest pitting IBM’s Jeopardy-playing computer, Watson, against Ken Jennings and Brad Rutter. In last night’s game (the second half of a game started on Monday), Watson won decisively, answering many, many more questions than Jennings and Rutter combined. At the end of the first half, Rutter and Watson were tied. Lots of famous (and formerly famous) artificial intelligence researchers have weighed in on what this means for AI. I want to discuss one small point which highlights the difference between human intelligence in question answering and computerized question answering.

In Monday’s show, one of the Jeopardy “answers” was, “It was the anatomical oddity of U.S. gymnast George Eyser, who won a gold medal on the parallel bars in 1904.” The correct answer was something like, “What is a missing leg?” Apparently, the play sequence went like this: Ken Jennings buzzed in with “What is a missing hand?” Watson then buzzed in with “What is a leg?” This was first deemed correct by Alex Trebek, the host, but this judgment was reversed by the judges on review. I didn’t see the show (a Valentine’s Day dinner took precedence), but apparently the TV show edited out Trebek’s original decision. Because of the reversal, Watson was unable to give a “more specific answer,” and Rutter was unable to buzz in on Watson’s error.

It seems that Trebek was treating Watson’s answer as if a human had given it: if a human had said “What is a leg?” as a follow-up to the wrong question, “What is a missing hand?”, it would make sense to treat this as having the same intention as “What is a missing leg?” But Watson doesn’t know what the other contestants are saying, so it actually had no such context in which to give its answer. I think it is plausible that Trebek would have awarded Watson a correct answer if Watson had given its answer without the context of Jennings’s question (or, perhaps, would have asked Watson to give a more specific answer), given the “anatomical oddity” context of the question.

People laughed when Watson gave a similar wrong answer after another of Jennings’s errors. Jennings answered “What are the ’20s?”, and then Watson said, “What is 1920s?” Interestingly, the press reports have pretty much all said that Watson gave the same answer as Jennings. But Watson’s answer, though it would have the same intent if given by a person, is different, both in its raw surface form and in its grammatical incorrectness. I don’t think the Jeopardy rules require grammatical correctness in the questions, since human players don’t have this kind of problem. The audience and the press didn’t know, or didn’t remember during game play, that Watson didn’t receive the other contestants’ answers.

Watson was penalized for getting the 1920s question wrong, and penalized for getting the leg question right, but in the wrong way. I find it fascinating that people, sometimes in real time and sometimes only on deliberation, can navigate what it means to have correct and incorrect intentions with respect to human-made artifacts such as Watson, especially given the strong expectations set up by IBM and the Jeopardy game to treat the Watson system as an intentional agent. Most of Watson’s human-seeming qualities come from the expectations that get set up. Some come from small touches: for example, Watson uses templates such as “Let’s finish up <CATEGORY>, Alex” when there is only one answer left in the category. But the stronger expectations are set by giving it a human name and a synthesized voice, putting it between two humans, using the pronoun “he” when referring to the system, and so on. Yet even given these expectations, people can notice, and react to, the breakdowns (and, in the case of the missing “missing,” a seeming non-breakdown).

Search engines, such as Google and Bing, have a feature, often internally called “Answers,” that gives an answer to an implied question right on the search results page. For example, enter “molecular weight of hydrogen” or “capital of Michigan” in Bing or Google, and you’ll get the answer without having to go to another page. No one confuses this with human intelligence, yet, at some level of analysis, this is what Watson is doing. Granted, search engines are not optimized for searches phrased as questions (if you ask Google today, “What was George Eyser’s problem?”, the caption will say “he ran out of the paper dishes on which to serve the ice…”; in this case, Bing does a better job, but it could have gone either way). But extracting facts and references from a very large number of documents and data sources, based on vague and ambiguous search queries, is the essential job of search engines. For the most part, Watson is a search engine, specialized for Jeopardy, that has been given a human moniker and a few human mannerisms.
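To make the “Answers” idea a little more concrete, here is a minimal Python sketch, assuming a single hypothetical query template (“capital of <place>”) and a tiny hand-built fact table. It is a toy for illustration only; nothing here reflects how Google, Bing, or Watson actually implement this.

```python
import re

# Toy fact table (assumption: real engines draw on large, curated data sources).
CAPITALS = {"michigan": "Lansing", "ohio": "Columbus", "france": "Paris"}

# One hypothetical answer template: a query ending in "capital of <place>",
# with or without a trailing question mark.
CAPITAL_OF = re.compile(r"capital of (?P<place>[a-z ]+?)\??$")


def instant_answer(query: str):
    """Return a direct answer if the query fits a known template, else None."""
    match = CAPITAL_OF.search(query.strip().lower())
    if match is None:
        return None  # no template applies: fall back to ordinary results
    return CAPITALS.get(match.group("place"))


print(instant_answer("capital of Michigan"))               # Lansing
print(instant_answer("What is the capital of Ohio?"))      # Columbus
print(instant_answer("What was George Eyser's problem?"))  # None
```

The hard part, of course, is everything this sketch leaves out: recognizing which of thousands of templates a vague query fits, and filling the fact table from messy documents.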


Language Identification: A Computational Linguistics Primer

Slides and results from a talk I gave at Kalamazoo College on language identification.

My co-worker at Powerset, Chris Biemann, has a nice paper on Unsupervised Language Identification.
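The slides aren’t reproduced here, so as a rough illustration, here is a minimal Python sketch of one common supervised technique: build character-trigram frequency profiles for each language and pick the language whose profile is most similar (by cosine similarity) to the input. This is not necessarily the method from the talk, and it is different from Biemann’s unsupervised approach; the three training sentences are toy data, and a real identifier would need far more text per language.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams, padding with spaces at the edges."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Toy training text (assumption: far too little for real use).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "german": "der schnelle braune fuchs springt über den faulen hund und die katze",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso y el gato",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def identify(text: str) -> str:
    """Return the language whose training profile best matches the text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))


print(identify("the dog and the cat"))     # english
print(identify("der hund und die katze"))  # german
```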

Ockham’s Razor is Dull

It’s all (well, mostly) about representation. Peter Turney:

[F]iguring out how to represent the problem is 95% of the work. By the time you have the representation right, the tool that you use to finish the remaining 5% is not terribly important.

OpenDMAP paper

For the few who might be interested: OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression (PDF).

OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction. The open source OpenDMAP code library is freely available at http://bionlp.sourceforge.net/

I worked on an early version of this system.

What's the opposite of 'hype'?

There must be an antonym for ‘hype.’ To hype something is to engage in hyperbole about it: Apple products have generated their share of hype. (For example: Steve Jobs said, “We made the buttons on the screen look so good you’ll want to lick them.”) I don’t mean ‘anti-hype’ in the sense of telling the approximate truth about something, desengaño or dis-illusionment. Rather, consider this: after the iPhone announcement, Steve Ballmer said, “There’s no chance that the iPhone is going to get any significant market share. No chance.” Ballmer was deliberately understating the case. In rhetoric, meiosis comes pretty close. And diss comes pretty close, too, though both of these lack that ‘the person doing this should really know better’ connotation of ‘hype.’

Anyway, I got to thinking about this after listening to Marketplace’s piece on the company I work for, Powerset, in which Leo Laporte “explains its efforts” to Kai Ryssdal.

Example 1: Laporte says, “I just want to point out that artificial intelligence has been a horrendous failure since the day the term was coined.” Well, the term was coined approximately 50 years ago, and, although there have been very significant failures in AI, there have been plenty of successes as well (see, for example, the list on the AAAI website, and its article on “The AI Effect”).

Example 2: Laporte (having been asked about AskJeeves) says, “AskJeeves is a very good example. They’re still around, they’ve been around as long as Google, they’ve spent a lot of money on advertising. But they’re still a distant second.” In other words, a company is successful only if it has more than half the market share; there is also an implication that Powerset claims it can beat Google in market share. As far as I know, Powerset has never claimed we can take away Google’s lead in the search marketplace. We are working on things which we think are better than some of Google’s approaches in core search, but this isn’t a claim that we can beat Google in market share.

I’d have a little more respect for the Marketplace commentary if they’d managed to spell our company name correctly. You know, you could have googled it to get it right, Kai.

Why the New Yorker cartoon caption contest winners are not especially funny

Here’s a Q&A with the primary gatekeeper to the New Yorker cartoon caption contest:

Q. Did your predecessor or Bob give you any advice when looking through the responses?
A. My predecessor stared me in the eyes and warned me that reading too many captions in one sitting could make a man crazy. Oh, and also to “pick the funny ones.”

Q. After a while isn’t it difficult to decide what’s funny? Do you say to yourself—“#4,347, sort of funny. #4,348—sort of but not quite funny enough?”
A. I’ve developed a system of sorting algorithms that allows a laptop to pick the finalists without any human input.

Q. Really?
A. Yes and no. What actually happens is that when each entry is received it’s sorted by keywords. The keywords are grouped into 5 or 6 categories. Then I sort through all the one-liners, zingers, gags, goofs and gaffes, looking for the very best—which I pass on to Bob.

Q. Uh…you had me, and then you lost me.
A. Take, for example, a recent contest cartoon depicting crash test dummies. All entered captions were broken into keyword groups like “insurance,” “driving,” “crashing.” So at that point it’s easier to read them and make the best choice.

Q. What if I decide to send in a caption in Esperanto?
A. All the unique captions are grouped together in a category we call “Huh?” “Huh?” captions have indeed made the finals. No Esperanto yet, though.

In other words:

  • There are too many entries to sort through.
  • They are sorted by an intern (who also has to do a lot of photocopying, as he says elsewhere in the interview).
  • A keyword-based sorting algorithm does the first sorting.

Yep, that seems like a recipe for caption mediocrity.
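As I read the interview, the first pass works roughly like this. The Python sketch below is only a guess at that procedure, not the magazine’s actual system: the keyword groups are made up for the crash-test-dummy cartoon, and it assumes each caption goes into the first group it matches, with everything else landing in “Huh?”.

```python
from collections import defaultdict

# Hypothetical keyword groups for the crash-test-dummy cartoon; the real
# keywords and the magazine's 5 or 6 categories are not given in the interview.
KEYWORD_GROUPS = {
    "insurance": {"insurance", "premium", "claim"},
    "driving": {"driving", "drive", "driver"},
    "crashing": {"crash", "crashing", "collision"},
}


def sort_captions(captions):
    """Put each caption in the first group whose keywords it mentions.

    Captions that match no group land in the "Huh?" bucket.
    """
    buckets = defaultdict(list)
    for caption in captions:
        # Strip surrounding punctuation from each word, then lowercase it.
        words = {w.strip(".,!?\"'").lower() for w in caption.split()}
        group = next(
            (name for name, keys in KEYWORD_GROUPS.items() if words & keys),
            "Huh?",
        )
        buckets[group].append(caption)
    return dict(buckets)
```

Note what the bucketing rewards: a caption that uses the obvious word (“insurance”) gets filed next to hundreds of others like it, while anything oblique ends up in “Huh?”.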

I wonder if it wouldn’t be better just to randomly select, say, fifty captions and choose the funniest of these.
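The alternative is almost trivial to implement. A minimal Python sketch, assuming the entries arrive as a list of strings; the default of fifty just follows the “say, fifty” above, and the function name is made up.

```python
import random


def shortlist(captions, k=50, seed=None):
    """Draw up to k distinct captions uniformly at random (no keyword sorting).

    Passing a seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(captions, min(k, len(captions)))
```

Sampling uniformly gives every entry the same chance of reaching a human reader, instead of favoring captions that happen to contain the obvious keywords.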

Powerlabs coming soon…

Powerlabs coming soon! See our Powerset home page.

Actually, a little more robot overlordship than I'm currently experiencing would be nice

Sitting down to dinner, I said something like, “Today I wrote a program that wrote a program, and it’s running right now.” Daughter and wife looked slightly alarmed, and dear wife asked if there were any chance that I was working myself out of a job. I said I didn’t think there was much chance that the coming robot wars were about to begin. Getting back to my computer later, I saw this on the console:

~/Work/ps_wordnet/src $ ./convert_wn_file.py -V3.0 -l/tmp ../data/wnprolog/3-0/*.pl > convert.py ; python convert.py
Traceback (most recent call last):
File "convert.py", line 368926, in ?
NameError: name 'n' is not defined

Yep, not much chance at all.
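The original convert_wn_file.py isn’t shown, so this is only a guess at one way a program-writing program can produce exactly this error. WordNet’s Prolog files store facts shaped like s(synset_id,w_num,'word',ss_type,sense_number,tag_count)., where the part-of-speech tag (n for noun) is a bare, unquoted atom; copied verbatim into Python, it becomes a reference to an undefined variable. The sample fact, the target function s(), and both converter functions are hypothetical, and the fixed converter assumes simple facts with no commas inside quoted words.

```python
import re

# Illustrative fact in the shape of WordNet's Prolog s/6 entries:
# s(synset_id, w_num, 'word', ss_type, sense_number, tag_count).
# The ss_type (here n, for noun) is a bare, unquoted Prolog atom.
FACT = "s(100001740,1,'entity',n,1,11)."

# A bare Prolog atom: starts with a lowercase letter, no quotes.
ATOM = re.compile(r"[a-z]\w*")


def naive_convert(fact: str) -> str:
    """Buggy generator: copy the Prolog fact into Python almost verbatim."""
    return fact.rstrip(".")


def quoting_convert(fact: str) -> str:
    """Generator that turns bare atoms into Python string literals.

    Assumes simple facts: no commas or parentheses inside quoted words.
    """
    name, args = fact.rstrip(".").split("(", 1)
    parts = args[:-1].split(",")  # drop the closing parenthesis, then split
    fixed = [repr(p) if ATOM.fullmatch(p) else p for p in parts]
    return f"{name}({','.join(fixed)})"


def s(synset_id, w_num, word, ss_type, sense_number, tag_count):
    """Hypothetical function the generated code calls for each fact."""
    return (synset_id, word, ss_type)


def run_generated(code: str):
    """Evaluate one generated line; return its value or the NameError message."""
    try:
        return eval(code, {"s": s})
    except NameError as err:
        return f"NameError: {err}"


print(run_generated(naive_convert(FACT)))    # NameError: name 'n' is not defined
print(run_generated(quoting_convert(FACT)))  # (100001740, 'entity', 'n')
```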

Quotes on names (2)

The uniqueness and immense pragmatic convenience of proper names in our language lie precisely in the fact that they enable us to refer publicly to objects without being forced to raise issues and come to agreement on what descriptive characteristics exactly constitute the identity of the object. They function not as descriptions, but as pegs on which to hang descriptions (Searle).

Quotes on names (1)

When Mr N.N. dies, one says that the bearer of the name dies, not that the meaning dies.

— Wittgenstein, Philosophical Investigations