Will.Whim

A weblog by Will Fitzgerald

WordNet, saints and Robin Hood

Why are you reading this, when you should be reading Natalia’s ball-peen hammer to the head?

One of the many things it takes for a computer to “understand” text (as we are trying to do at Powerset) is for it to recognize names and what they refer to. So, for example, in the sentence “Maid Marian is the female companion to the legendary figure Robin Hood.”, a computer needs to see that the names Maid Marian and Robin Hood are present, and to have some kind of internal representation of Ms. Marian and Mr. Hood.

WordNet is a kind of super-dictionary that knows things like John F. Kennedy was a president, and Robin Hood is a fictional character. Alas, it knows naught of Marian (although it does know Little John).

But, all abstractions are leaky, so I’m not particularly interested in bashing WordNet. Of course there will be gaps—some small (like not having Marian); some large (WordNet thinks saints of the Catholic flavor are a kind of god). One of the things I’m working on at Powerset is addressing some of these gaps.

What’s more disturbing is when there seem to be structural problems in the representational system. I am currently working on “named entity recognition,” meaning (at least) building systems that find names of things in running text and know what types of things are being named. So, last week, I was trying to get a list of all the names of people in WordNet. Unfortunately, fictional characters are not “people” to WordNet. Oddly, there is one exception: Ali Baba is both a fictional character and a woodcutter (which, in turn, is a kind of person). But Ali Baba doesn’t cut down trees and chop wood as a job (as the WordNet gloss has woodcutters do); he cuts down imaginary trees and chops imaginary wood for his imaginary job; everything about him is fictional, right down to his forty thieves (well, maybe not the fortiness, but that’s another essay).
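To make the problem concrete, here is a toy sketch in Python. It is not WordNet and does not use any WordNet API; the little graph below is hand-built from the examples in this post, and a few links in it are my assumptions (that president is a kind of person, that Little John is filed under fictional character, and the parent name "imaginary being"). It only shows why a query for everything under person misses the fictional folks, except Ali Baba.

```python
# Toy hypernym graph: each entry maps a node to its direct parents.
# Illustrative only; the links are assumptions, not copied from WordNet.
HYPERNYMS = {
    "woodcutter": ["person"],
    "president": ["person"],                      # assumed link
    "fictional character": ["imaginary being"],   # assumed parent name
    "John F. Kennedy": ["president"],
    "Robin Hood": ["fictional character"],
    "Little John": ["fictional character"],       # assumed type
    "Ali Baba": ["fictional character", "woodcutter"],
}

# The nodes that are names (instances) rather than types.
NAMES = {"John F. Kennedy", "Robin Hood", "Little John", "Ali Baba"}


def is_a(node, ancestor):
    """Return True if `ancestor` is reachable from `node` via parent links."""
    stack = [node]
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(HYPERNYMS.get(current, []))
    return False


def names_under(ancestor):
    """Return the sorted names whose type chain reaches `ancestor`."""
    return sorted(name for name in NAMES if is_a(name, ancestor))


people = names_under("person")
fictional = names_under("fictional character")
print(people)      # Robin Hood and Little John are missing
print(fictional)
```

In this toy version, Ali Baba shows up as a person only because of his woodcutter link; Robin Hood and Little John never do.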

I think the right solution for this, within the WordNet framework, is to take advantage of WordNet’s adjectives. That is, noun types (like person or woodcutter) could be modified by adjectival descriptions (like fictional or imaginary). After all, this is what adjectives are for, more or less. There’s a bunch of tricky stuff involved in doing this, but it’s the kind of thing lexicographers love, I think. So let ’em at it!
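Here is one rough way the adjective idea might look, again as a toy Python sketch rather than anything WordNet actually provides. The ModifiedType class, the TYPE_PARENTS table, and the entries in ENTITIES are all hypothetical names I made up for illustration; Maid Marian is included to show what an added entry could look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModifiedType:
    """A noun type plus adjectival modifiers (hypothetical structure)."""
    noun: str
    adjectives: frozenset = frozenset()


# Toy type hierarchy: each noun type has at most one parent here.
TYPE_PARENTS = {"woodcutter": "person", "president": "person"}

# Every character gets an ordinary noun type; fictionality is a modifier.
ENTITIES = {
    "John F. Kennedy": ModifiedType("president"),
    "Ali Baba": ModifiedType("woodcutter", frozenset({"fictional"})),
    "Robin Hood": ModifiedType("person", frozenset({"fictional"})),
    "Maid Marian": ModifiedType("person", frozenset({"fictional"})),
}


def noun_is_a(noun, ancestor):
    """Walk up TYPE_PARENTS from `noun` looking for `ancestor`."""
    while noun is not None:
        if noun == ancestor:
            return True
        noun = TYPE_PARENTS.get(noun)
    return False


def names_of(ancestor, include_fictional=True):
    """Return sorted names of the given type, optionally skipping fictional ones."""
    return sorted(
        name
        for name, mtype in ENTITIES.items()
        if noun_is_a(mtype.noun, ancestor)
        and (include_fictional or "fictional" not in mtype.adjectives)
    )


all_people = names_of("person")
real_people = names_of("person", include_fictional=False)
print(all_people)
print(real_people)
```

With this arrangement, a name finder can ask for all people and get Robin Hood too, while anything that cares about reality can filter on the modifier.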

(Rewritten April 29; translated into English)

One response to “WordNet, saints and Robin Hood”

  1. Lukas Biewald April 30, 2007 at 6:59 pm

    I think a huge number of Wordnet’s problems come from the fact that it tries to enforce a tree structure when the hypernym/hyponym should be a directed acyclic graph. Then all people could be a hyponym of person as they should be as well as other things.
