Why are you reading this, when you should be reading Natalia’s ball-peen hammer to the head?
One of the many things that it takes for a computer to “understand” text (as we are trying to do at Powerset) is for it to recognize names and what they refer to. So, for example, in the sentence “Maid Marian is the female companion to the legendary figure Robin Hood.”, a computer needs to see that the names Maid Marian and Robin Hood are present, and have some kind of internal representation of Ms. Marian and Mr. Hood.
WordNet is a kind of super-dictionary that knows things like John F. Kennedy was a president, and Robin Hood is a fictional character. Alas, it knows naught of Marian (although it does know Little John).
But, all abstractions are leaky, so I’m not particularly interested in bashing WordNet. Of course there will be gaps—some small (like not having Marian); some large (WordNet thinks saints of the Catholic flavor are a kind of god). One of the things I’m working on at Powerset is addressing some of these gaps.
What’s more disturbing is when there seem to be structural problems in the representational system. I am currently working on “named entity recognition,” meaning (at least) building systems that find names of things in running text and know what types of things are being named. So, last week, I was trying to get a list of all the names of people in WordNet. Unfortunately, fictional characters are not “people” to WordNet. Oddly, there is one exception: Ali Baba is both a fictional character and a woodcutter (which, in turn, is a kind of person). But Ali Baba doesn’t cut down trees and chop wood as a job (as the WordNet gloss has woodcutters do); he cuts down imaginary trees and chops imaginary wood for his imaginary job; everything about him is fictional, right down to his forty thieves (well, maybe not the fortiness, but that’s another essay).
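To make the structural problem concrete, here is a minimal sketch in Python. It does not query WordNet itself; the `HYPERNYMS` table is a hand-built toy that only mimics the situation described above, and names such as `is_a` and `"imaginary being"` are illustrative assumptions. Collecting "people" by walking is-a links upward picks up Ali Baba (via woodcutter) but misses every other fictional character.

```python
# Toy is-a graph, loosely modeled on the situation described above.
# These entries are illustrative assumptions, not actual WordNet data.
HYPERNYMS = {
    "woodcutter": ["person"],
    "president": ["person"],
    "fictional character": ["imaginary being"],
    "John F. Kennedy": ["president"],
    "Robin Hood": ["fictional character"],
    "Little John": ["fictional character"],
    "Ali Baba": ["fictional character", "woodcutter"],
}


def is_a(name, target):
    """Return True if `target` is reachable from `name` by following is-a links."""
    stack, seen = [name], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        # Nodes with no entry (e.g. "person") have no parents in this toy graph.
        stack.extend(HYPERNYMS.get(node, []))
    return False


NAMES = ["John F. Kennedy", "Robin Hood", "Little John", "Ali Baba"]

# Keep only the names whose is-a chain reaches "person".
people = [n for n in NAMES if is_a(n, "person")]
print(people)  # ['John F. Kennedy', 'Ali Baba']
```

Robin Hood and Little John drop out because nothing above "fictional character" in this toy graph leads to "person"; Ali Baba survives only through his second parent.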
I think the right solution for this, within the WordNet framework, is to take advantage of WordNet’s adjectives. That is, noun types (like person or woodcutter) could be modified by adjectival descriptions (like fictional or imaginary). After all, this is what adjectives are for, more or less. There’s a bunch of tricky stuff involved in doing this; but it’s the kind of thing lexicographers love, I think. So let ’em at it!
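One way the adjective idea might look in code, as a rough sketch only: each name gets a noun type plus a set of adjectival modifiers, and a query decides whether the modifier matters. Everything here (`ModifiedType`, `NOUN_PARENT`, `names_of`, and typing Robin Hood as an "outlaw") is a hypothetical illustration, not something WordNet provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModifiedType:
    """A noun type plus adjectival modifiers, e.g. a *fictional* woodcutter."""
    noun: str
    modifiers: frozenset = frozenset()


# Hypothetical single-parent noun hierarchy (illustrative, not WordNet data).
NOUN_PARENT = {"woodcutter": "person", "outlaw": "person", "president": "person"}

# Hypothetical typing of a few names; "outlaw" for Robin Hood is an assumption.
ENTITIES = {
    "John F. Kennedy": ModifiedType("president"),
    "Robin Hood": ModifiedType("outlaw", frozenset({"fictional"})),
    "Ali Baba": ModifiedType("woodcutter", frozenset({"fictional"})),
}


def noun_is_a(noun, target):
    """Walk up the noun hierarchy until `target` is found or the chain ends."""
    while noun is not None:
        if noun == target:
            return True
        noun = NOUN_PARENT.get(noun)
    return False


def names_of(target, include_fictional=True):
    """Names whose noun type is a kind of `target`, optionally skipping fictional ones."""
    return sorted(
        name
        for name, t in ENTITIES.items()
        if noun_is_a(t.noun, target)
        and (include_fictional or "fictional" not in t.modifiers)
    )


all_people = names_of("person")
real_people = names_of("person", include_fictional=False)
print(all_people)   # ['Ali Baba', 'John F. Kennedy', 'Robin Hood']
print(real_people)  # ['John F. Kennedy']
```

The point of the design is that fictionality stops being a branch of the noun hierarchy and becomes a property layered on top of it, so a name finder can ask for every person and a fact checker can ask only for the real ones.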
(Rewritten April 29; translated into English)