Will.Whim

A weblog by Will Fitzgerald

Category Archives: Artificial Intelligence

WordNet, saints and Robin Hood

Why are you reading this, when you should be reading Natalia’s ball-peen hammer to the head?

One of the many things that it takes for a computer to “understand” text (as we are trying to do at Powerset) is for it to recognize names and what they refer to. So for example, in the sentence “Maid Marian is the female companion to the legendary figure Robin Hood.”, a computer needs to see that the names Maid Marian and Robin Hood are present, and have some kind of internal representation of Ms. Marian and Mr. Hood.

WordNet is a kind of super-dictionary that knows things like John F. Kennedy was a president, and Robin Hood is a fictional character. Alas, it knows naught of Marian (although it does know Little John).

But, all abstractions are leaky, so I’m not particularly interested in bashing WordNet. Of course there will be gaps—some small (like not having Marian); some large (WordNet thinks saints of the Catholic flavor are a kind of god). One of the things I’m working on at Powerset is addressing some of these gaps.

What’s more disturbing is when there seem to be structural problems in the representational system. I am currently working on “named entity recognition,” meaning (at least) building systems that find names of things in running text and know what types of things are being named. So, last week, I was trying to get a list of all the names of people in WordNet. Unfortunately, fictional characters are not “people” to WordNet. Oddly, there is one exception: Ali Baba is both a fictional character and a woodcutter (which, in turn, is a kind of person). But Ali Baba doesn’t cut down trees and chop wood as a job (as the WordNet gloss has woodcutters do); he cuts down imaginary trees and chops imaginary wood for his imaginary job; everything about him is fictional, right down to his forty thieves (well, maybe not the fortiness, but that’s another essay).
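To make the structural problem concrete, here is a minimal sketch of the "list all the people" query over a toy type graph. This is not real WordNet data or the WordNet API: the dictionaries, the "imaginary being" parent type, and the function names are all assumptions made up for illustration, and they encode only the relationships described above.

```python
# Toy hypernym ("is a kind of") graph. Hypothetical data, not real WordNet.
HYPERNYMS = {
    "woodcutter": ["person"],
    "president": ["person"],
    "fictional character": ["imaginary being"],  # assumed parent; never reaches "person"
}

# Named instances and the types they are attached to (again, toy data).
INSTANCES = {
    "John F. Kennedy": ["president"],
    "Robin Hood": ["fictional character"],
    "Ali Baba": ["fictional character", "woodcutter"],
}


def ancestors(type_name):
    """Return every type reachable from type_name by following hypernym links."""
    seen = set()
    stack = [type_name]
    while stack:
        current = stack.pop()
        for parent in HYPERNYMS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def names_under(root):
    """Return the sorted names whose types include root or descend from it."""
    result = []
    for name, types in INSTANCES.items():
        reachable = set(types)
        for t in types:
            reachable |= ancestors(t)
        if root in reachable:
            result.append(name)
    return sorted(result)


people = names_under("person")
print(people)
```

In this toy graph Ali Baba comes back as a person only because of the woodcutter link, while Robin Hood is missed entirely, which is the inconsistency described above.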

I think the right solution for this, within the WordNet framework, is to take advantage of WordNet’s adjectives. That is, noun types (like person or woodcutter) could be modified by adjectival descriptions (like fictional or imaginary). After all, this is what adjectives are for, more or less. There’s a bunch of tricky stuff involved in doing this; but it’s the kind of thing lexicographers love, I think. So let ’em at it!
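One way the adjective idea might look in practice is sketched below. Everything here is hypothetical: the TypedEntry class, the IS_A table, and the names_of function are invented for illustration and are not part of WordNet. The sketch only assumes that each name carries a noun type plus a set of adjectival modifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedEntry:
    """A name, its noun type, and adjectival modifiers on that type (hypothetical)."""
    name: str
    noun_type: str
    modifiers: frozenset = frozenset()


# Toy entries: fictional characters become ordinary noun types plus "fictional".
ENTRIES = [
    TypedEntry("John F. Kennedy", "person"),
    TypedEntry("Robin Hood", "person", frozenset({"fictional"})),
    TypedEntry("Ali Baba", "woodcutter", frozenset({"fictional"})),
]

# Single-parent "is a kind of" links, enough for this sketch.
IS_A = {"woodcutter": "person"}


def is_kind_of(noun_type, root):
    """Walk up the IS_A chain and report whether root is reached."""
    while noun_type is not None:
        if noun_type == root:
            return True
        noun_type = IS_A.get(noun_type)
    return False


def names_of(root, exclude=frozenset()):
    """Names typed under root, skipping entries that carry any excluded modifier."""
    return sorted(
        e.name
        for e in ENTRIES
        if is_kind_of(e.noun_type, root) and not (e.modifiers & exclude)
    )


all_people = names_of("person")
real_people = names_of("person", exclude=frozenset({"fictional"}))
print(all_people)
print(real_people)
```

With the modifier kept separate from the noun type, a name finder can ask for every person-like name, and a stricter consumer can filter out the fictional ones.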

(Rewritten April 29; translated into English)


Semantic Parser, best in show

I got an email today from Larry Hunter at the University of Colorado School of Medicine, who writes to say that a DMAP-based parser was the “international world-champion” in the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) data text mining challenge. The chart above shows the results from the Protein-Protein Interaction Task.

My colleague at Powerset, Jim Firby, and I worked on some of the semantic parsing technology used by the Colorado group. Members of the Center for Computational Pharmacology Biomedical Text Mining Group were the direct researchers on this project, though. Congratulations, Larry and team!

Word game!

A fun word game oddly related to my current work: Varun’s word games.

Powerset in NY Times, Venturebeat

The company I work for, Powerset, is being written up in the New York Times and Venturebeat, both reporting on a deal between Powerset and PARC. The New York Times article (behind a registration wall, alack) mostly follows the storied arc of the Palo Alto Research Center, where the graphical user interface and Ethernet were born but were capitalized on more by rivals than by its original owner, Xerox. Our CEO, Barney Pell, is quoted:

For a lot of things, keyword search works well. But I think we are going to look back in 10 years and say, remember when we used to search using keywords.

The Venturebeat article is more interesting, I think. It tells a bit more of the story behind the deal between Powerset and PARC. It also includes an interview with Peter Norvig, the Director of Research at Google. Norvig is a GOFAI (“good old-fashioned artificial intelligence”) guy who switched to what are now the standard modern AI approaches to language (crunching the statistics of words and other things) “about 15 years ago,” as he says in the Venturebeat article. He also says (when asked about natural language and search),

We feel there is a lot to do in the field of search, with many ways to approach it. Search remains at the core of everything Google does and we are always working to improve it.

This feels similar to what Pell said in his weblog a while ago, “Search is in its early days, and natural language is the future of search.”

This makes us, of course, a rebel with a clause.

Machine evolution and consciousness (circa 1872)

From Samuel Butler’s The Book of the Machines in Erewhon:

Complex now, but how much simpler and more intelligibly organised may [a machine] not become in another hundred thousand years? or in twenty thousand? For man at present believes that his interest lies in that direction; he spends an incalculable amount of labour and time and thought in making machines breed always better and better; he has already succeeded in effecting much that at one time appeared impossible, and there seem no limits to the results of accumulated improvements if they are allowed to descend with modification from generation to generation. It must always be remembered that man’s body is what it is through having been moulded into its present shape by the chances and changes of many millions of years, but that his organisation never advanced with anything like the rapidity with which that of the machines is advancing. This is the most alarming feature in the case, and I must be pardoned for insisting on it so frequently.

Butler’s protagonist, Higgs, is reporting the theory behind the Erewhonians’ decision to limit technological development.

Lemonodor: Stanford Wins Grand Challenge

Lemonodor: Stanford Wins Grand Challenge. Great pix from John and Lori (follow the Flickr links).

Borges: a whale of an error

I received an email from Justin Bur that says (quoting with permission):

Your English version of Borges’ celebrated essay http://www.entish.org/essays/Wilkins.html contains a rather spectacular mistranslation based presumably on a typo in the original Spanish. Section 16 of Wilkins’ taxonomy classifies the *whale*, not beauty, as a viviparous oblong fish. There is a complete text online at http://reliant.teknowledge.com/Wilkins, and the relevant passage is on page 132.

The confusion seems to arise from the graphic similarity between “ballena” (whale) and “belleza” (beauty) in Spanish. I cannot imagine that Borges himself did not know the difference, but I have no idea where the error first arose.

Hmm. I based my translation on the Spanish original at http://www.ldc.upenn.edu/myl/wilkins.html. It has belleza/beauty. Douglas Crockford’s Spanish version and translation have ballena/whale. I looked for other online copies of the Spanish original: a comp lit site has “belleza.” Another Language and Literacy post (one has to scroll down quite a bit) has “belleza/beauty.” This Finnish site has “belleza.” I don’t have a Spanish print edition of Otras Inquisiciones to check.

Perhaps Borges was using a Babelfish translation. Seriously, I’m curious whether Borges was using a bad translation, or just had an eye for ‘beauty.’ This might mean I have to actually go to a physical library to find out …

those which, from a distance, look like flies (*)

Daughter Jane straightened out our DVDs today, at her mother’s request. She decided to organize by genre:

  1. Fighting movies (e.g., Crouching Tiger, Hidden Dragon),
  2. Animation (e.g., The Incredibles),
  3. Classicish (e.g., It’s a Wonderful Life),
  4. Magic (e.g., Harry Potter),
  5. Modern Action (e.g., Charlie’s Angels),
  6. Princess Movies (e.g., The Princess Diaries),
  7. Movies That I Have No Intention of Watching in the Next Millennium (e.g., 2001: A Space Odyssey).

(*) See The Analytical Language of John Wilkins.

The Onion | Never In My Wildest Dreams Did I Think I'd Get Bored Watching Robots Fight

TeamOsaka's Robocup entry…