Early Retirement Now

The 4% Rule is not as good as we hoped – Part 3: The small-sample problem in historical simulations

The rule to withdraw 4% of assets during retirement is considered “safe” because the Trinity Study has declared it so. The term “Trinity Study” has become something of a dogma, almost scripture, for the early retirement community. The 25 times annual consumption rule and the equivalent 4% withdrawal rate rule of thumb are referenced pretty much everywhere in the community. One almost gets the impression that what the Holy Trinity is to Christianity (you know: the Father, the Son and the Holy Spirit), the Trinity Study is to the Early Retirement community.

To be sure, many already retired folks probably have withdrawal rates lower than 4%, especially the now very successful bloggers with extra income on the side. But a lot of aspiring retirees who are still saving might cut it a little tight if they retire the minute their net worth passes 25 times consumption. We passed that 25 times consumption threshold a while ago, but we still keep saving. We would like to get somewhere closer to 35 to 40 times annual consumption.

Part 3 of our series deals with another fallacy of the safe withdrawal rate: the illusion of 100% or close to 100% certainty in backward-looking simulations. Parts 1 and 2 already dealt with potentially lower expected returns going forward (equities and bonds). Here we want to raise another major issue with the Trinity Study or any other backward-looking simulation, like cFIREsim:

The small sample size

How can there be a small sample size problem when we have so many different retirement cohorts? Say we use the cFIREsim site with 50-year retirement cohorts starting in each year from 1871 through 1966, where 1966 is the last starting year that still has 50 calendar years of returns (1966 to 2015). That’s 96 cohorts. How can that be a small sample?

96 cohorts would indeed be a large enough number to draw statistically significant and relevant conclusions if those 96 cohorts were independent of each other. Too bad: those 96 cohorts are not independent observations. The cohorts starting in 1964 and 1965 have very similar payoff profiles because 49 of the 50 simulation years overlap. Strictly speaking, there are just under 3 truly independent samples of 50-year return streams: 1871-1920, 1921-1970, and 1966-2015, and we are being charitable with the last one by ignoring the overlap in its first 5 years. So, if someone tells us their strategy has worked 96 out of 96 times, we would be impressed. But if someone tells us their strategy works 3 out of 3 times, we would be suspicious about that 100% certainty claim.

The same goes for the 95-100% success probabilities they cooked up in the Trinity Study. Because we look at shorter retirement horizons (only 30 years) but at the same time work with fewer years of data (1926 onward), we again end up with about three independent windows: 1926-1955, 1956-1985 and then whatever the final window of simulations might be, probably 1985-2014 or 1986-2015, potentially with a little bit of overlap.

A 100% success rate out of 3 observations does not inspire a lot of confidence!
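The cohort and window counts above are simple arithmetic, and a few lines of Python make them easy to check. This is only a sketch: the function names are ours, and the 2015 end year for the Trinity-style case is an assumption based on the "1986-2015" window mentioned above, not a detail of the original study.

```python
def count_cohorts(first_start, last_end, horizon):
    """Number of starting years that still have `horizon` full years of data."""
    last_start = last_end - horizon + 1
    return last_start - first_start + 1


def overlap_years(start_a, start_b, horizon):
    """Calendar years shared by two cohorts with the same horizon."""
    return max(0, horizon - abs(start_a - start_b))


def independent_windows(first_start, last_end, horizon):
    """How many non-overlapping windows fit in the data (may be fractional)."""
    span = last_end - first_start + 1
    return span / horizon


# cFIREsim-style example from the text: 50-year retirements, data 1871-2015
print(count_cohorts(1871, 2015, 50))        # 96
print(overlap_years(1964, 1965, 50))        # 49
print(independent_windows(1871, 2015, 50))  # 2.9, i.e. "just under 3"

# Trinity-style example: 30-year retirements, data 1926-2015 (assumed end year)
print(count_cohorts(1926, 2015, 30))        # 61
print(independent_windows(1926, 2015, 30))  # 3.0
```

The gap between the first number (dozens of cohorts) and the last (about three windows) is the whole small-sample problem in one line.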

Well, maybe using completely non-overlapping windows is a bit too strict. After all, we know that the first 10-15 years determine the success or failure of a withdrawal strategy, so we could treat cohorts that start about ten years apart as roughly independent. That cuts the number of effective observations by a factor of ten, to about 6 observations in the Trinity Study and about 10 observations in cFIREsim, and even then a 100% success rate is not very confidence-inspiring. For example, if the true success rate had been 75%, with six independent tries you would achieve a 100% success rate with a probability of 0.75^6=0.178. So you have a more than one in six chance of dumb luck generating a perfect record despite a less than perfect probability. In statistics (and oftentimes in life) I like to know what the 95% confidence interval is, that is, the range of plausible parameter values for this elusive success probability. That confidence interval for the true underlying success probability after observing a 100% success rate is:
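One way to put numbers on this is the exact (Clopper-Pearson) binomial interval. Treat this as a rough sketch and not necessarily the method behind any figures quoted elsewhere in this series. When all n independent trials succeed, the upper bound of that interval is 1 and the lower bound solves p^n = α/2, so it equals (α/2)^(1/n). The function names below are ours, purely for illustration.

```python
def prob_perfect_record(true_rate, n):
    """Chance of n successes in n independent tries at a given true success rate."""
    return true_rate ** n


def lower_bound_all_successes(n, confidence=0.95, two_sided=True):
    """Exact (Clopper-Pearson) lower bound on the success probability
    after observing n successes in n independent trials.

    The upper bound is 1. The lower bound p solves p**n = tail, where
    tail is alpha/2 for a two-sided interval and alpha for a one-sided one.
    """
    alpha = 1.0 - confidence
    tail = alpha / 2 if two_sided else alpha
    return tail ** (1.0 / n)


# The dumb-luck example from the text: true success rate 75%, six tries
print(round(prob_perfect_record(0.75, 6), 3))  # 0.178

# Lower end of the 95% interval for different numbers of independent observations
for n in (3, 6, 10, 96):
    print(n, round(lower_bound_all_successes(n), 3))
```

Under these assumptions, the lower end of the two-sided 95% interval is roughly 29% with 3 independent observations, 54% with 6, 69% with 10, and 96% only if all 96 cohorts really were independent.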

Suddenly the high success probabilities in the Trinity Study don’t look so certain anymore. Of course, the researchers behind the study know this (at least we hope!). But the journalists who lack even rudimentary finance and statistics training (or any training at all, for that matter) spread the “good news,” and eventually results that are significantly less than 100% certain are parroted unfiltered in the blogosphere to unsuspecting readers. If you don’t do your homework and look under the hood of some of the personal finance wisdom, you will be disappointed! So, always do your homework and never take anything you find on the web at face value. People might have an inspiring life story and money-saving and tax-saving tips posted on their blog, but that doesn’t make them financial experts. Even worse, some of the calculations from actual financial experts are wrong and/or misleading; see our exposé of some of the false and exaggerated claims from Michael Kitces.

And the sample size is even smaller

Another reason why we should be at least suspicious about simulations with past returns: the future will look very different from the past. Before 1913 there wasn’t even a central bank in this country. During the Great Depression there was a central bank, but it didn’t really know how to help the economy (or didn’t care). The post-WWII recovery, a central bank that let inflation escalate in the 1970s, a central bank under Paul Volcker that sank the economy to cure inflation, the tech revolution, and the housing boom and bust might all be not that representative of what’s ahead.

What is ahead? Some would argue that we have now entered a never-before-seen era of low growth; just google “secular stagnation.” We agree that this is a bit exaggerated, again, to make headlines and catch the readers’ attention, but it’s not that far-fetched. Lower growth than in the past is a real possibility given the debt overhang, Social Security and Medicare causing more debt in the future, and the slowdown abroad. If GDP doesn’t grow at the pace that prevailed during the periods covered by cFIREsim and the Trinity Study, why should we believe the stock market will?

We still use the historical data. It’s the best we can do!

Of course, we’re not saying that historical simulations are useless. Quite the contrary. We use them extensively, both via cFIREsim and in our own calculations, and we have learned a lot from them. We are only saying that there is uncertainty surrounding every estimate from those simulations!

Stay tuned for future parts of this series!

Intro: Pros and cons of different withdrawal rate rules

Part 1: Equity expected returns

Part 2: Bond expected returns

Part 3: The small-sample problem in historical simulations

Part 4: More bad news on equity expected returns

 
