The 4% Rule is not as good as we hoped – Part 3: The small-sample problem in historical simulations

The rule to withdraw 4% of assets per year during retirement is considered “safe” because the Trinity Study has declared it so. The term “Trinity Study” has become something of a dogma, almost scripture, for the early retirement community. The 25 times annual consumption rule and the equivalent 4% withdrawal rate rule of thumb are referenced pretty much everywhere in the community. One almost gets the impression that what the Holy Trinity is to Christianity (you know: the Father, the Son and the Holy Spirit), the Trinity Study is to the early retirement community.

To be sure, many already retired folks probably have withdrawal rates lower than 4%, especially the now very successful bloggers with extra income on the side. But a lot of aspiring retirees who are still saving might cut it a little tight if they retire the minute their net worth passes 25 times annual consumption. We passed that 25 times consumption threshold a while ago, but we still keep saving. We would like to get somewhere closer to 35 to 40 times annual consumption.

Part 3 of our series deals with another fallacy of the safe withdrawal rate: the illusion of 100% or close to 100% certainty in backward-looking simulations. Parts 1 and 2 already dealt with potentially lower expected returns going forward (equities and bonds). Here we want to raise another major issue with the Trinity Study and any other backward-looking simulation, like cFIREsim:

The small sample size

How can there be a small sample size problem when we have so many different retirement cohorts? Say we use the cFIREsim site with 50-year retirement cohorts starting in 1871 through 1966, i.e., the last starting year that still generates 50 calendar years of returns, from 1966 to 2015. That’s 96 cohorts. How can that be a small sample?

96 cohorts would indeed be a large enough number to draw statistically significant and relevant conclusions if those 96 cohorts were independent of each other. Too bad: those 96 cohorts are not independent observations. The cohorts starting in 1964 and 1965 have very similar payoff profiles because 49 of the 50 simulation years overlap. Strictly speaking, there are just under 3 truly independent samples of 50-year return streams: 1871-1920, 1921-1970, and, being charitable with the last one and ignoring the overlap in its first 5 years, 1966-2015. If someone tells us their strategy has worked 96 out of 96 times, we would be impressed. But if someone tells us their strategy has worked 3 out of 3 times, we would be suspicious about that 100% certainty claim.
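The cohort arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: annual data from 1871 through 2015 as in the cFIREsim example, and function names (`count_cohorts`, `overlap_years`, `independent_windows`) that are our own hypothetical helpers, not part of cFIREsim.

```python
def count_cohorts(first_year, last_year, horizon):
    """Number of starting years whose full horizon fits inside the data."""
    last_start = last_year - horizon + 1
    return last_start - first_year + 1


def overlap_years(start_a, start_b, horizon):
    """Calendar years shared by two cohorts with the same horizon."""
    return max(0, horizon - abs(start_a - start_b))


def independent_windows(first_year, last_year, horizon):
    """How many back-to-back, non-overlapping windows the data can hold."""
    return (last_year - first_year + 1) / horizon


# Assumed data range: 1871 through 2015, 50-year retirement horizon.
print(count_cohorts(1871, 2015, 50))        # 96
print(overlap_years(1964, 1965, 50))        # 49
print(overlap_years(1921, 1966, 50))        # 5
print(independent_windows(1871, 2015, 50))  # 2.9
```

The last number is the “just under 3” from the text: 145 calendar years of data hold only 2.9 windows of 50 years each.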

The same goes for the 95-100% success probabilities they cooked up in the Trinity Study. Because we look at shorter retirement horizons (only 30 years) but at the same time work with fewer years of data (1926 onward), we again get three independent windows: 1926-1955, 1956-1985, and then whatever the final window of simulations might be, probably 1985-2014 or 1986-2015, potentially with a little bit of overlap.

A 100% success rate out of 3 observations does not inspire a lot of confidence!

Well, maybe using completely non-overlapping windows is a bit too strict. After all, we know that the first 10-15 years determine the success or failure of the withdrawal strategy. So suppose we cut the number of cohorts by a factor of ten, which leaves about 6 effective observations in the Trinity Study and 10 in cFIREsim. Even then, a 100% success rate is not very confidence-inspiring. For example, if the true success rate had been 75%, with six independent tries you would achieve a 100% success rate with a probability of 0.75^6=0.178. So there is a more than one in six chance of dumb luck generating a perfect record despite a less than perfect success probability. In statistics (and oftentimes in life) I like to know the 95% confidence interval, that is, the range of plausible parameter values for this elusive success probability. That confidence interval for the true underlying success probability after observing a 100% success rate is:

  • cFIREsim (10 independent observations): 74.1% to 100%
  • Trinity Study (6 independent observations): 60.7% to 100%
  • Strict non-overlapping windows (3 independent observations of 50y in cFIREsim, 30y in Trinity Study): 36.8% to 100%
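For readers who want to check these numbers, here is a minimal Python sketch. It rests on an assumption: the lower bounds in the list are consistent with a one-sided 95% bound, i.e., the smallest true success probability p for which a perfect record in n independent tries still has at least a 5% chance (p^n = 0.05). The function names are our own, made up for illustration.

```python
def prob_perfect_record(true_success_rate, n_independent):
    """Chance that all n independent tries succeed."""
    return true_success_rate ** n_independent


def lower_bound_after_perfect_record(n_independent, confidence=0.95):
    """Smallest p with p**n >= 1 - confidence (assumed one-sided bound)."""
    return (1.0 - confidence) ** (1.0 / n_independent)


# Dumb-luck example from the text: true rate 75%, six independent tries.
print(f"{prob_perfect_record(0.75, 6):.3f}")  # 0.178

# Lower bounds after a perfect record with 10, 6 and 3 observations.
for n in (10, 6, 3):
    bound = lower_bound_after_perfect_record(n)
    print(f"n={n}: {100 * bound:.1f}% to 100%")
# n=10: 74.1% to 100%
# n=6: 60.7% to 100%
# n=3: 36.8% to 100%
```

The fewer independent observations there are, the less a perfect record tells you: with only three tries, a true success rate below 40% would still not be ruled out.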

Suddenly the high success probabilities in the Trinity Study don’t look so certain any more. Of course, the researchers behind the study know this (at least we hope!). But journalists who lack even rudimentary finance and statistics training (or any training at all, for that matter) spread the “good news,” and eventually results that are significantly less than 100% certain are parroted unfiltered in the blogosphere to unsuspecting readers. If you don’t do your homework and look under the hood of some of the personal finance wisdom, you will be disappointed! So always do your homework, and never take anything you find on the web at face value. People might have an inspiring life story and money-saving and tax-saving tips posted on their blog, but that doesn’t make them financial experts. Even worse, some of the calculations from the actual financial experts are wrong and/or misleading; see our exposé of some of the false and exaggerated claims from Michael Kitces.

And the sample size is even smaller

Another reason why we should be at least suspicious about simulations with past returns: the future may look very different from the past. Before 1913 there wasn’t even a central bank in this country. During the Great Depression there was a central bank, but it didn’t really know how to help the economy (or didn’t care). The post-WWII recovery, a central bank that let inflation escalate in the 1970s, a central bank under Paul Volcker that sank the economy to cure inflation, the tech revolution, and the housing boom and bust might all be not that representative of what’s ahead.

What is ahead? Some would argue that we have now entered a never-before-seen era of low growth; just google “secular stagnation.” While we agree that this is a bit exaggerated (again, to make headlines and catch the readers’ attention), it’s not that far-fetched. Lower growth than in the past is a real possibility, given a debt overhang, Social Security and Medicare causing more debt in the future, and the slowdown abroad as well. If GDP doesn’t grow at the pace that prevailed in your cFIREsim and Trinity Study samples, why should we believe the stock market will?

We still use the historical data. It’s the best we can do!

Of course, we’re not saying that historical simulations are useless. Quite the contrary. We use them extensively, both via cFIREsim and also in our own calculations. We learned a lot from those simulations. We only say that there is uncertainty surrounding every estimate from those simulations!

Stay tuned for future parts of this series!

Intro: Pros and cons of different withdrawal rate rules

Part 1: Equity expected returns

Part 2: Bond expected returns

Part 3: The small-sample problem in historical simulations

Part 4: More bad news on equity expected returns

 

16 thoughts on “The 4% Rule is not as good as we hoped – Part 3: The small-sample problem in historical simulations”

  1. In Europe things do not look great either. We keep lurching from financial crisis to financial crisis. Greece is in trouble again and Brexit looms. I do not envisage much growth on the horizon.

  2. I like your disagreement with the commonly accepted process. And I agree with you that there really have only been 3 periods; that does make the science less sound. However, my other proposal is that if the markets go completely sour in the next 50 years, it won’t matter where you have your money (as it will be worthless). In other words, I believe there certainly could be big crashes, but if there are, everyone’s retirement could be in trouble, regardless of where it is or how much it is.

    I look forward to seeing the next post in this series!

    1. Hi, and thanks for stopping by. Good point! Nobody can tell what’s the nature of low returns and more uncertainty going forward. We certainly don’t want to get into the business of pinpointing when the next crash occurs.
      The good news is that we don’t think the market will go sour for the entire 50 years. But a decade of sideways-moving markets is all it takes to get the 4% rule in trouble. One thing we learned is that it doesn’t necessarily take a crash right after retirement to eventually wipe out your savings. 1966 was one of the worst times to start retirement, even though the stock market mostly moved sideways. That planted the seed of failure. When the big recessions in 1973-75, 1980 and 1981-82 happened, the portfolio was too depleted to handle them.
      Cheers!

  3. Being new to the blogosphere, I am utterly amazed how much even seasoned bloggers buy into the 4% myth. Good stuff highlighting the warning signs. I only hope the FIRE community starts to sit up and take notice, because big, big trouble lies ahead if they don’t. Otherwise it’s back to the grindstone for many. For some, that will be just fine. But others not so much.
    Enjoy reading your work.

    1. Hi Mr. PIE. Thanks for stopping by again and thanks for the feedback! Of course, we hope that we’re wrong and everything turns out well with the 4% rule. But it’s best to prepare for a 3% rule as well. 🙂

  4. I think it’s great that you questioned this. Just blindly following the 4% rule and spending that much thinking all will be well could leave you in a bad financial situation down the road. I use it as a guideline and am leaning towards a 3% SWR plus maybe doing some part-time work that I enjoy as a buffer. And if the market were to take a dive, it’s good to have an understanding of your expenses and spending so you know where you can cut back if you really need to. One thing I’ve been working on is to challenge all my expenses/bills and cut out any wastefulness before the actual need arises. One saying I have is: “Things are good, and we need to keep it that way.”

    1. Thanks for stopping by and thank you so much for that compliment. We have some calculations for the post already, but found so many other topics to write about that part 4 got pushed back further and further. One of these days, I promise, I will wrap it up and publish part 4. Stay tuned! 🙂
