The 4% Rule is not as good as we hoped – Part 3: The small-sample problem in historical simulations

The rule to withdraw 4% of assets during retirement is considered “safe” because the Trinity Study has declared it so. The term “Trinity Study” has become something of a dogma, almost scripture, for the early retirement community. The 25-times-annual-consumption rule and the equivalent 4% withdrawal rate rule of thumb are referenced pretty much everywhere in the community. One almost gets the impression that what the Holy Trinity is to Christianity (you know: the Father, the Son and the Holy Spirit), the Trinity Study is to the early retirement community.

To be sure, many already-retired folks probably have withdrawal rates below 4%, especially the now very successful bloggers with extra income on the side. But a lot of aspiring early retirees who are still saving might cut it a little close if they retire the minute their net worth passes 25 times annual consumption. We passed that 25-times threshold a while ago, but we still keep saving. We'd like to get somewhere closer to 35 to 40 times annual consumption.

Part 3 of our series deals with another fallacy of the safe withdrawal rate: the illusion of 100% (or close to 100%) certainty from backward-looking simulations. Parts 1 and 2 already dealt with potentially lower expected returns going forward (for equities and bonds, respectively). Here we want to raise another major issue with the Trinity Study and any other backward-looking simulation, such as cFIREsim:

The small sample size

How can there be a small sample size problem when we have so many different retirement cohorts? Say we use cFIREsim with 50-year retirement cohorts starting in every year from 1871 through 1966. 1966 is the last start year that generates 50 calendar years of returns (1966 to 2015). That’s 96 cohorts. How can that be a small sample?

96 cohorts would indeed be enough to draw statistically significant and relevant conclusions if those 96 cohorts were independent of each other. Too bad those 96 cohorts are not independent observations. The cohorts starting in 1964 and 1965 have very similar payoff profiles because 49 of their 50 years of returns overlap. Strictly speaking, there are just under 3 truly independent samples of 50-year return streams: 1871-1920, 1921-1970 and 1966-2015. We are being charitable with the last one by ignoring its 5 years of overlap (1966-1970) with the second. So, if someone tells us their strategy has worked 96 out of 96 times, we would be impressed. But if someone tells us their strategy has worked 3 out of 3 times, we would be suspicious about that 100% certainty claim.

The same goes for the 95-100% success probabilities they cooked up in the Trinity Study. The Trinity Study looks at shorter retirement horizons (only 30 years), but it also works with fewer years of data (1926 onward). As a result, we get the same number of independent windows, three:

- 1926-1955
- 1956-1985
- whatever the final window of simulations might be, probably 1985-2014 or 1986-2015, potentially with a little bit of overlap
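The cohort arithmetic in the two paragraphs above can be checked with a few lines of Python. This is a minimal sketch with two assumptions:

- There is one cohort per calendar start year.
- For the Trinity Study, return data run from 1926 through 2015. That end year is implied by the windows named above. It is not necessarily the data set the original study used.

The function name `window_stats` is our own and purely illustrative.

```python
def window_stats(first_year: int, last_year: int, horizon: int) -> dict:
    """Cohort bookkeeping for a historical withdrawal simulation.

    Assumes one retirement cohort per calendar start year and that
    every cohort needs `horizon` full years of return data.
    """
    n_years = last_year - first_year + 1     # years of return data available
    n_cohorts = n_years - horizon + 1        # start years with a complete window
    return {
        "years": n_years,
        "cohorts": n_cohorts,
        # Adjacent cohorts share all but one year of returns.
        "adjacent_overlap": (horizon - 1) / horizon,
        # Number of fully non-overlapping windows that fit in the data.
        "independent_windows": n_years / horizon,
        # Rule of thumb from the next paragraph: treat cohorts roughly
        # ten years apart as independent, i.e. divide the count by ten.
        "ten_year_spacing": n_cohorts / 10,
    }


cfiresim = window_stats(1871, 2015, horizon=50)
trinity = window_stats(1926, 2015, horizon=30)
print(cfiresim)
print(trinity)
```

For cFIREsim this gives:

- 145 years of data
- 96 cohorts
- a 98% overlap between neighboring cohorts
- 2.9 non-overlapping windows ("just under 3")

For the Trinity Study it gives 61 cohorts and exactly 3 non-overlapping 30-year windows. Dividing the cohort counts by ten gives 9.6 and 6.1. That is roughly the 10 and 6 effective observations used below.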

A 100% success rate out of 3 observations does not induce a lot of confidence!

Well, maybe using completely non-overlapping windows is a bit too strict. After all, we know that the first 10-15 years determine whether the withdrawal strategy succeeds or fails. So cohorts that start roughly a decade apart already behave quite differently. If we therefore cut the raw number of cohorts by a factor of ten, we get about 6 effective observations in the Trinity Study and 10 in cFIREsim. Even then, a 100% success rate is not very confidence-inspiring.

For example, suppose the true success rate had been 75%. With six independent tries, you would achieve a 100% success rate with a probability of 0.75^6 = 0.178. So there is a more than one-in-six chance that dumb luck generates a perfect record despite a less-than-perfect success probability.

In statistics (and oftentimes in life) I like to know the 95% confidence interval, that is, the range of plausible values for this elusive success probability. That confidence interval for the true underlying success probability after observing a 100% success rate is:

cFIREsim (10 independent observations): 74.1% to 100%
Trinity Study (6 independent observations): 60.7% to 100%
Strict non-overlapping windows (3 independent observations of 50y in cFIREsim, 30y in Trinity Study): 36.8% to 100%
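Both calculations above follow from one formula. If each of n independent cohorts succeeds with probability p, a perfect record occurs with probability p^n. Setting p^n equal to 5% and solving for p gives a lower bound on the success probability after n successes in n tries. This is the one-sided 95% exact binomial (Clopper-Pearson) bound, and it reproduces the three intervals listed above.

The Python sketch below assumes the cohorts are independent, which is exactly what the effective-observation counts are meant to approximate. The function names are our own and hypothetical.

```python
def prob_perfect_record(p: float, n: int) -> float:
    """Probability that all n independent cohorts succeed,
    when each one succeeds with probability p."""
    return p ** n


def lower_bound_after_all_successes(n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) lower confidence bound on the success
    probability after n successes in n independent trials.

    With zero failures the exact (Clopper-Pearson) bound solves
    p ** n == alpha, so p = alpha ** (1 / n).
    """
    return alpha ** (1.0 / n)


# Chance of a perfect 6-for-6 record when the true success rate is 75%.
print(f"{prob_perfect_record(0.75, 6):.3f}")

for label, n in [("cFIREsim", 10), ("Trinity Study", 6), ("Strict windows", 3)]:
    low = lower_bound_after_all_successes(n)
    print(f"{label} ({n} obs): {low:.1%} to 100%")
```

This prints 0.178, then 74.1%, 60.7% and 36.8% as the lower ends. A two-sided 95% interval would use alpha = 0.025 instead and give somewhat lower bounds, so the figures above are, if anything, the more optimistic reading.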

Suddenly the high success probabilities in the Trinity Study don’t look so certain anymore. Of course, the researchers behind the study know this (at least we hope so!). But journalists who lack even rudimentary finance and statistics training (or any training at all, for that matter) spread the “good news.” Eventually, results that are significantly less than 100% certain get parroted unfiltered in the blogosphere to unsuspecting readers.

If you don’t do your homework and look under the hood of some of the personal finance wisdom, you will be disappointed! So, always do your homework, and never take anything you find on the web at face value. People might post an inspiring life story and money-saving and tax-saving tips on their blog, but that doesn’t make them financial experts. Even worse, some of the calculations from so-called actual financial experts are wrong and/or misleading; see our exposé of some of the false and exaggerated claims from Michael Kitces.

And the sample size is even smaller

Another reason why we should be at least suspicious about simulations with past returns: the future will look very different from the past. Before 1913 there wasn’t even a central bank in this country. During the Great Depression there was a central bank, but it didn’t really know how to help the economy (or didn’t care). Then came the post-WWII recovery, a central bank that let inflation escalate in the 1970s, a central bank under Paul Volcker that sank the economy to cure inflation, the tech revolution, and the housing boom and bust. None of these may be all that representative of what’s ahead. If only part of the historical record resembles the future, the effective sample shrinks even further.

What is ahead? Some would argue that we have now entered a never-before-seen era of low growth; just google “secular stagnation.” We think this is a bit exaggerated, again to make headlines and catch readers’ attention, but it’s not that far-fetched. Growth lower than in the past is a real possibility given a debt overhang, Social Security and Medicare adding more debt in the future, and the slowdown abroad. If GDP doesn’t grow at the pace that prevailed during the historical periods behind cFIREsim and the Trinity Study, why should we believe the stock market will?

We still use the historical data. It’s the best we can do!

Of course, we’re not saying that historical simulations are useless. Quite the contrary. We use them extensively, both via cFIREsim and also in our own calculations. We learned a lot from those simulations. We only say that there is uncertainty surrounding every estimate from those simulations!

Stay tuned for future parts of this series!

Intro: Pros and cons of different withdrawal rate rules

Part 1: Equity expected returns

Part 2: Bond expected returns

Part 3: The small-sample problem in historical simulations

Part 4: More bad news on equity expected returns

21 thoughts on “The 4% Rule is not as good as we hoped – Part 3: The small-sample problem in historical simulations”

Mrs Smelling Freedom says:

May 14, 2016 at 12:36 pm

In Europe things do not look great either. We keep lurching from financial crisis to financial crisis. Greece is in trouble again and Brexit looms. I do not envisage much growth on the horizon.

1. earlyretirementnow.com says:
  
  May 14, 2016 at 10:17 pm
  
  Yes, Europe is in trouble in many ways. Great quality of life, though. But negative interest rates and going from one crisis to another.
  
Rob @ MoneyNomad says:

May 14, 2016 at 10:49 pm

I like your disagreement with the commonly accepted process. And I agree with you that there really have only been 3 periods, which does make the science less sound. However, my other proposal is that if the markets go completely sour in the next 50 years, it won’t matter where you have your money (as it will be worthless). In other words, I believe there certainly could be big crashes, but if there are, everyone’s retirement could be in trouble, regardless of where it is or how much it is.

I look forward to seeing the next post in this series!

1. earlyretirementnow.com says:
  
  May 14, 2016 at 11:42 pm
  
  Hi, and thanks for stopping by. Good point! Nobody can tell what the nature of low returns and more uncertainty going forward will be. We certainly don’t want to get into the business of pinpointing when the next crash occurs.
  The good news is that we don’t think the market will go sour for the entire 50 years. But a decade of sideways-moving markets is all it takes to get the 4% rule in trouble. One thing we learned is that it doesn’t necessarily take a crash right after retirement to eventually wipe out your savings. 1966 was one of the worst times to start retirement, and the stock market mostly moved sideways. That planted the seed of failure. When the big recessions of 1973-75, 1980 and 1981-82 happened, the portfolio was too depleted to handle them.
  Cheers!
  
Mr. PIE says:

May 17, 2016 at 4:52 am

Being new to the blogosphere, I am utterly amazed how much even seasoned bloggers buy into the 4% myth. Good stuff highlighting the warning signs. I only hope the FIRE community starts to sit up and take notice, because big, big trouble lies ahead if they don’t. Otherwise it’s back to the grindstone for many. For some, that will be just fine. But for others, not so much.
Enjoy reading your work.

1. earlyretirementnow.com says:
  
  May 17, 2016 at 9:47 am
  
  Hi Mr. PIE. Thanks for stopping by again and thanks for the feedback! Of course, we hope that we’re wrong and everything turns out well with the 4% rule. But it’s best to prepare for a 3% rule as well. 🙂
  
Arrgo says:

May 18, 2016 at 7:31 am

I think it’s great that you questioned this. Just blindly following the 4% rule and spending that much, thinking all will be well, could leave you in a bad financial situation down the road. I use it as a guideline and am leaning towards a 3% SWR, plus maybe some part-time work that I enjoy as a buffer. And if the market were to take a dive, it’s good to understand your expenses and spending so you know where you can cut back if you really need to. One thing I’ve been working on is challenging all my expenses/bills and cutting out any wastefulness before the actual need arises. One saying I have is: “Things are good, and we need to keep it that way.”

1. earlyretirementnow.com says:
  
  May 18, 2016 at 9:46 pm
  
  Lower withdrawal rate, flexibility in terms of both expenses and future income, so you should be in great shape for early retirement.
  Thanks for stopping by again!
  
Link says:

July 13, 2016 at 1:12 pm

Really enjoying this series. Looking forward to part 4.

1. earlyretirementnow.com says:
  
  July 13, 2016 at 1:34 pm
  
  Thanks for stopping by, and thank you so much for that compliment. We have some calculations for the post already, but found so many other topics to write about that part 4 got pushed back further and further. One of these days I promise I will wrap it up and publish part 4. Stay tuned! 🙂
  
  1. JSpring says:
    
    January 17, 2021 at 4:52 am
    
    Did you ever get around to wrapping up “Part 4: More bad news on equity expected returns” ?
    
    1. earlyretirementnow.com says:
      
      January 17, 2021 at 9:30 am
      
      Well, I started writing the new and improved series: https://earlyretirementnow.com/safe-withdrawal-rate-series/
      I guess that’s enough bad news for now. 🙂
      
Pingback: The 4% Rule is not as good as we hoped – Part 1: Equity expected returns – Early Retirement Now
Pingback: Pros and cons of different withdrawal rate rules – Early Retirement Now
Pingback: Safe Withdrawal Rate SWR
Pingback: Why the 4% rule doesn’t work ⋆ THE Passive Income Blogger
Zoroaster says:

August 9, 2024 at 8:59 pm

Thanks for this post, and for all of your work. Could I ask, can we use a Monte Carlo simulation to avoid this problem (of the small sample size arising from few non-overlapping 30yr periods)?

I understand there are disadvantages to Monte Carlo, and some of them do insert single years or multiple years of historical sequences, but between the Monte Carlo options that use sequences of historical data and the options that don’t, we should have enough of a way to check the (purely deterministic) historical simulations (with the small sample size problem), right? (Hopefully?)

Thank you again!

1. earlyretirementnow.com says:
  
  August 11, 2024 at 12:50 am
  
  The overlapping issue is vastly overhyped. For SWR analysis, cohorts even 5Y apart are sufficiently different from a statistical perspective, because the large swings in the SWRs come from a bear market right out of the gates (=Sequence Risk). So, with 150 years of data we get 30 mostly independent observations. I prefer 30 episodes that are historically relevant over 100,000 MC draws that are unable to replicate the important features of return data, like asset valuation mean reversion, varying S/B correlations, etc. Monte Carlo is often garbage in, garbage out. 100,000x that garbage is still garbage.
  
Zoroaster says:

August 16, 2024 at 4:36 pm

Thank you so much for your reply.

There’s an article at kitces.com that says the following. As conservative as Monte Carlo is compared to purely deterministic calculators like cFIREsim, even the “Traditional CMA” (Capital Market Assumptions) Monte Carlo, which I understand is the most common type of MC, is *way* too liberal (the opposite of conservative) on SWR and overstates the success rate *very* significantly.

https://www.kitces.com/blog/monte-carlo-models-simulation-forecast-error-brier-score-retirement-planning/

I ran a simulation in MoneyGuidePro’s Monte Carlo (Projected Mode I believe) and got 97% success for SWR 2.892% for 40yrs. The Kitces.com article says that when a “Traditional CMA” MC (like MoneyGuidePro?) reports a 97% success rate, it’s actually only 80% success rate, and if the “Traditional CMA” MC reports a 98.5% success rate, it’s actually only 90% success rate, and if the “Traditional CMA” MC reports a 99% success rate, it’s actually only 95% success rate. Getting 97% in a MC and having it be only 80% is quite bad. You can see this in their graph, which I uploaded here (I added grid lines so you can see it better). To get to 99% in MoneyGuidePro (and thus 95% in the Kitces article), for 40yrs, SWR needs to be 2.738%.

https://postimg.cc/1n1HhfZq

What do you make of this article? What do you think is going on here: how is it getting the data, why is it saying Traditional CMA MC is overstating the SWR success rate so significantly?

Thank you so much again for everything.

1. earlyretirementnow.com says:
  
  August 20, 2024 at 11:02 am
  
  I have no idea how the numbers in that chart were created. Specifically, I know how the y-values are computed, but have no idea how the x-values (predicted probabilities of success) came about.
  
  1. Zoroaster says:
    
    August 20, 2024 at 4:39 pm
    
    Yes, you’re right – they don’t seem to have shared the underlying data. Thanks for looking, though – I appreciate it.
    