A weblog by Will Fitzgerald

Monthly Archives: May 2009

Zipf's law and city size

Olivia Judson writes a column for the New York Times, and she hosted a guest column, “Math and the City,” by Steven Strogatz, a mathematics professor at Cornell.

Strogatz writes:

One of the pleasures of looking at the world through mathematical eyes is that you can see certain patterns that would otherwise be hidden. This week’s column is about one such pattern. It’s a beautiful law of collective organization that links urban studies to zoology. It reveals Manhattan and a mouse to be variations on a single structural theme.

The mathematics of cities was launched in 1949 when George Zipf, a linguist working at Harvard, reported a striking regularity in the size distribution of cities. He noticed that if you tabulate the biggest cities in a given country and rank them according to their populations, the largest city is always about twice as big as the second largest, and three times as big as the third largest, and so on. In other words, the population of a city is, to a good approximation, inversely proportional to its rank. Why this should be true, no one knows.

What Strogatz means by “about twice as big” here isn’t “almost exactly twice as big,” but what he says in the restatement: “the population of a city is, to a good approximation, inversely proportional to its rank.”
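Taken literally, the strict version of the law (the one behind the “twice as big, three times as big” reading) divides the largest city's population by the rank. A minimal Python sketch of that idealized rule; the function name `zipf_population` and the 20 million figure for the largest city are hypothetical, for illustration only:

```python
def zipf_population(largest: float, rank: int) -> float:
    """Idealized Zipf's law: population is inversely proportional to rank.

    `largest` is the population of the rank-1 city (a hypothetical input).
    """
    return largest / rank


largest = 20.0  # hypothetical rank-1 population, in millions
for rank in range(1, 6):
    # Under the strict law, city r is exactly 1/r the size of the largest city.
    print(rank, round(zipf_population(largest, rank), 2))
```

Real cities only follow this “to a good approximation,” which is why fitting an exponent other than exactly 1 is useful.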

He comes in for a lot of criticism in the comments from readers in different countries pointing out that (for example) New York isn’t almost exactly twice as big as Los Angeles. But that is not what he is saying; he is saying that if you know that San Francisco is the 41st biggest city, you can, ‘to a good approximation,’ estimate its population. Specifically, you can find a good regression equation of the form population = c × r^-n, where c is a constant, r is the rank, and n is an estimated exponent. He suggests that n will be “between 0.7 and 0.9”.
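Taking logarithms turns this power law into a straight line, log(population) = log(c) − n × log(r), so c and n can be estimated with ordinary least squares on the log-transformed data. This is a standard way to fit such a curve and is probably close to what a spreadsheet power trendline does, though that is an assumption. The sketch below is only an illustration: the function name `fit_power_law` is hypothetical, and the input is synthetic data generated from a known c and n, not real city populations.

```python
import math


def fit_power_law(ranks, populations):
    """Fit population = c * rank**(-n) by least squares on log-log data.

    Returns (c, n). Assumes all ranks and populations are positive.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(p) for p in populations]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    # Ordinary least-squares slope and intercept of ys against xs.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(p) = log(c) - n * log(r): the slope is -n, the intercept is log(c).
    return math.exp(intercept), -slope


# Synthetic (not real) data: a power law with c = 100 and n = 0.8,
# times a small deterministic wiggle so the fit is not trivially exact.
ranks = list(range(1, 201))
populations = [100.0 * r ** -0.8 * (1 + 0.05 * math.sin(r)) for r in ranks]

c, n = fit_power_law(ranks, populations)
print(f"population = {c:.2f} * r^-{n:.3f}")
```

With real data, the ranks would run from 1 to the number of cities and the populations would be the corresponding estimates; the fitted n can then be compared with the 0.7 to 0.9 range.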

I took population estimates for the 480 largest metropolitan areas in the world (from PopulationData.net) and plugged them into that advanced scientific calculation engine, Excel.

Here is the equation I got (population in millions):

population = 110.98 × r^-0.744

The estimate is not very accurate for the very biggest cities (the top ten, including NYC and LA), but after that, the equation estimates the population from the rank to within a million or so. For example, it estimates the population of San Francisco (the 41st largest city) to be 7 million instead of the actual 7.35 million, off by only 350,000 based on rank alone!
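The San Francisco arithmetic is easy to check. A minimal Python sketch that plugs rank 41 into the fitted equation; the function name `predicted_population` is just for illustration, while the constants 110.98 and 0.744 and the 7.35 million figure come from the post:

```python
def predicted_population(rank: int, c: float = 110.98, n: float = 0.744) -> float:
    """Population in millions predicted by the fitted rank equation c * r^-n."""
    return c * rank ** -n


estimate = predicted_population(41)  # San Francisco's rank in the 480-city list
actual = 7.35                        # millions, as quoted in the post
error = actual - estimate
print(f"estimate {estimate:.2f}M, actual {actual}M, off by {error * 1e6:,.0f}")
```

Unrounded, the estimate comes out a little above 7.0 million, so the gap is slightly under the 350,000 quoted.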