# Zipf's Law for City Size Distributions

A striking empirical regularity in urban economics is the linear relationship between the log rank and the log population of metro areas. Take, for instance, the list of U.S. metro areas, ordered by decreasing population. Assign rank 1 to the most populous metro area, rank 2 to the second most populous, and so on. Zipf’s “law” then states that log rank is a linear function of log population, with a slope of -1 (a coefficient of 1 in absolute value). With a simple scatter plot using 2015 data for the 135 largest metro areas, we get this:
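The rank-size regression described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the plot: the function name `rank_size_regression` is hypothetical, and the populations are synthetic values built to obey Zipf's law exactly (population at rank r equal to 10 million divided by r), not the 2015 metro data.

```python
import numpy as np

def rank_size_regression(populations):
    """OLS of log(rank) on log(population).

    Returns (slope, intercept, r_squared). Zipf's law predicts a slope of -1.
    """
    pop = np.sort(np.asarray(populations, dtype=float))[::-1]  # largest first
    rank = np.arange(1, pop.size + 1)                          # rank 1 = largest
    x, y = np.log(pop), np.log(rank)
    slope, intercept = np.polyfit(x, y, 1)                     # degree-1 fit
    resid = y - (slope * x + intercept)
    r_squared = 1.0 - resid.var() / y.var()
    return slope, intercept, r_squared

# Synthetic data (an assumption for illustration): exact Zipf, 135 "cities".
synthetic_pop = 1e7 / np.arange(1, 136)
slope, intercept, r_squared = rank_size_regression(synthetic_pop)
print(f"slope={slope:.3f}, R^2={r_squared:.3f}")
```

Regressing log rank on log population, rather than the reverse, matches the convention used in the text, where the reported coefficient is the absolute value of the slope.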

This is a fairly striking statistical regularity: the estimated coefficient is 1.01 in absolute value and, perhaps even more surprising, the R squared of the regression is 98%! I can’t recall a better-fitting relationship in my career as an economist. Now this coefficient of 1, found in Gabaix (1999) and oft-repeated, turns out not to be robust to the choice of sample. This is what we get with the full sample of 380+ metro areas:

The coefficient is quite different from 1: it is about 0.8 in absolute value, with a very small standard error. So the impressive finding here is the linearity of the relationship, not so much the slope of -1. Replications of Zipf’s law for multiple countries reach the same conclusion: for more than half of the countries, the Zipf coefficient is statistically different from 1. The R squared remains strikingly close to 100%, though. Zipf’s law should simply be about that log-linearity, not the strong constraint on the slope.
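To make "statistically different from 1" concrete, here is a rough sketch of a t-test on the slope. Everything in it is an assumption for illustration: the helper `slope_t_test` is hypothetical, and the data are simulated with a true coefficient of 0.8 plus small noise, not taken from the metro sample. One caveat: naive OLS standard errors treat observations as independent, which ranked data are not, so they tend to overstate precision in this kind of regression.

```python
import numpy as np

def slope_t_test(populations, null_slope=-1.0):
    """OLS of log(rank) on log(population) with a naive t-test on the slope.

    Returns (slope, standard_error, t_statistic) against H0: slope == null_slope.
    """
    pop = np.sort(np.asarray(populations, dtype=float))[::-1]  # largest first
    x = np.log(pop)
    y = np.log(np.arange(1, pop.size + 1))
    n = x.size
    x_c = x - x.mean()
    slope = (x_c @ (y - y.mean())) / (x_c @ x_c)
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    sigma2 = (resid @ resid) / (n - 2)        # residual variance
    se = np.sqrt(sigma2 / (x_c @ x_c))        # naive OLS standard error
    return slope, se, (slope - null_slope) / se

# Simulated sample (assumption): 380 cities, true rank-size slope of -0.8.
rng = np.random.default_rng(0)
ranks = np.arange(1, 381)
fake_pop = 1e7 * ranks ** (-1 / 0.8) * np.exp(rng.normal(0.0, 0.05, ranks.size))
slope, se, t_stat = slope_t_test(fake_pop)
print(f"slope={slope:.3f}, se={se:.4f}, t={t_stat:.1f}")
```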

So what does a deviation from Zipf’s law mean? Zipf’s law is a consequence of Gibrat’s law: all cities draw their proportional growth rates from the same distribution, with the same mean and the same variance, regardless of their size. If Gibrat’s law is not satisfied, then Zipf’s law typically won’t be either. We typically find that smaller cities have a higher variance of proportional growth, which suggests that they are subject to more shocks: a lack of sectoral diversification makes smaller metro areas more susceptible to both booms and busts.
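One simple way to look for a violation of Gibrat’s law is to compare the mean and variance of growth rates across size groups. The sketch below assumes two population snapshots per city; the function name `growth_by_size` is hypothetical, and the data are simulated so that smaller cities receive larger shocks by construction.

```python
import numpy as np

def growth_by_size(pop_start, pop_end, n_bins=2):
    """Mean and sample variance of log growth within size bins.

    Bins are ordered from smallest to largest initial population.
    Under Gibrat's law both statistics should be similar across bins.
    """
    pop_start = np.asarray(pop_start, dtype=float)
    pop_end = np.asarray(pop_end, dtype=float)
    growth = np.log(pop_end / pop_start)
    order = np.argsort(pop_start)              # indices, smallest city first
    bins = np.array_split(order, n_bins)
    return [(growth[idx].mean(), growth[idx].var(ddof=1)) for idx in bins]

# Simulated data (assumption): same mean growth, noisier shocks for small cities.
rng = np.random.default_rng(0)
n = 400
pop_start = 1e7 / np.arange(1, n + 1)
sigma = np.where(pop_start < np.median(pop_start), 0.10, 0.03)
pop_end = pop_start * np.exp(rng.normal(0.02, sigma))

stats = growth_by_size(pop_start, pop_end)
for label, (mean, var) in zip(["small", "large"], stats):
    print(f"{label}: mean growth={mean:.3f}, variance={var:.4f}")
```

A clearly higher variance in the small-city bin, with similar means, is the pattern described above: growth is proportional on average, but not identically distributed across sizes.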