First, take this fun quiz.
And then …
I counted up the 100 most common ‘words’ in the Google unigram data. The following ‘words’ from the Quiz 100 are not in the Google top 100:
call, come, could, day, did, down, each, go, had, her, him, hot, know, long, look, make, many, most, number, over, people, said, she, side, sound, them, then, thing, two, water, way, word, write
And the following words are in the Google top 100, but not in the Quiz 100:
also, am, any, business, c, click, contact, e, free, get, help, here, home, information, its, me, new, news, not, online, only, our, page, pm, s, search, service, services, site, us, view, web, x
Also, ’s is in the Google top list.
You’d only get 67/100 if you were the WorldWideWeb, since 33 of the quiz’s 100 words are missing from the Google top 100. The point is, I guess, that the ‘domain’ matters a lot, as does what you count as a ‘word.’
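For anyone who wants to repeat the comparison, here is a minimal sketch of how it could be done with set differences. The function name `compare_top_lists` and the two five-word lists are made up for illustration; they are not the real Quiz or Google lists, and this is not necessarily how I did the original count.

```python
def compare_top_lists(quiz_words, corpus_words):
    """Compare two 'most common words' lists.

    Returns the words found only in the quiz list, the words found only
    in the corpus list, and the number of words they share.
    """
    # Lowercase everything so 'The' and 'the' count as the same word.
    quiz = {w.lower() for w in quiz_words}
    corpus = {w.lower() for w in corpus_words}

    only_quiz = sorted(quiz - corpus)      # in the quiz list, missing from the corpus list
    only_corpus = sorted(corpus - quiz)    # in the corpus list, missing from the quiz list
    score = len(quiz & corpus)             # how many quiz words the corpus would 'get right'
    return only_quiz, only_corpus, score


# Toy lists (hypothetical, for illustration only).
quiz_top = ["the", "of", "and", "water", "she"]
web_top = ["the", "of", "and", "click", "home"]

only_quiz, only_web, score = compare_top_lists(quiz_top, web_top)
print(only_quiz)                    # ['she', 'water']
print(only_web)                     # ['click', 'home']
print(f"{score}/{len(quiz_top)}")   # 3/5
```

With the real lists, the score line is where the 67/100 comes from. How you tokenize (whether ’s, single letters, or ‘pm’ count as words) changes both lists before the comparison even starts.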
(copy of a comment made at Metafilter)