A weblog by Will Fitzgerald


I’ve been playing with Microsoft Research’s ngram server in my spare time. I’ll write some more about this in time, I hope.

The API returns its probabilities as base 10 logs. Because the probabilities are so very low, it makes more sense to return a log probability. I could be wrong, but I think it’s normal practice to use base 2 or base e logs for log probabilities. They are easy to convert from one base to another, so it doesn’t really matter. But I was wondering why base 10 logs are returned.
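For what it's worth, here is a minimal sketch of that conversion in Python. The helper name `convert_log10` is made up for illustration and is not part of the ngram API; it relies only on the change-of-base identity log_b(x) = log10(x) / log10(b).

```python
import math

def convert_log10(log10_p, base):
    """Convert a base 10 log to a log in another base (hypothetical helper)."""
    # Change of base: log_b(x) = log10(x) / log10(b)
    return log10_p / math.log10(base)

log10_p = -5.2721823                      # a base 10 log probability
log2_p = convert_log10(log10_p, 2)        # the same probability as a base 2 log
ln_p = convert_log10(log10_p, math.e)     # the same probability as a natural log

print(log2_p, ln_p)
```

All three numbers describe the same probability, so raising each base to its own log gives back the same value.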

The penny dropped when I was just playing around, printing out the base 10 logs of some probabilities:

log10(1.0) = 0.0
log10(0.1) = -1.0
log10(0.01) = -2.0
log10(0.001) = -3.0

I felt silly (since log10(x) = y by definition means x = 10^y). But it’s really helpful to look at a number like -5.2721823 and know there are 5 zeros to the right of the decimal point. It’s even easier to parse than 0.0000053434.
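A small sketch of that reading trick, assuming Python and a made-up helper name `leading_zeros`: for a probability below 1, the number of zeros between the decimal point and the first nonzero digit can be read straight off the base 10 log. Exact powers of ten are a slight edge case (0.01 has a log of -2.0 but only one such zero), which is why the sketch uses ceil(-log) - 1 rather than just the integer part.

```python
import math

def leading_zeros(log10_p):
    """Zeros between the decimal point and the first nonzero digit of 10**log10_p.

    Hypothetical helper; only meaningful for probabilities below 1 (log10_p < 0).
    """
    if log10_p >= 0:
        raise ValueError("expected the log of a probability below 1")
    return math.ceil(-log10_p) - 1

log10_p = -5.2721823
print(leading_zeros(log10_p))        # 5
print(f"{10 ** log10_p:.10f}")       # the probability written out in full
```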
