I have written before about survivorship bias. Frank over at Engineering Returns has developed a survivor-free S&P500 database and further demonstrates the impact of survivorship on a simple RSI2 trading system.
Plainly speaking, anyone who backtests without de-listed data is going to get inaccurate results.
I have not tested this idea enough for it to be more than a theory, but I suspect that a short-term system, one that holds stocks for a few days to a week or so, may actually show improvement when tested on a survivor-free database. (However, Frank’s recent testing shows my theory may be incorrect.) Conversely, trend-following systems, which usually hold stocks for longer periods, seem to suffer more when de-listed data is used. For a good example of this, see my started-but-not-yet-finished series on building a momentum rotational system. I was shocked at how much performance degraded when I added de-listed data, to the extent that it stopped the series in its tracks.
While it should be obvious how survivorship bias will affect trading systems that trade a portfolio of stocks, what may not be as obvious is that it will also affect indicators. Specifically, an indicator like a breadth indicator, which uses the data from hundreds or thousands of stocks, is going to be affected by survivorship bias. If such an indicator is applied to a trading system which was developed with de-listed data, the impact of survivorship bias is compounded.
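To make the breadth point concrete, here is a minimal Python sketch of one common breadth measure, the percentage of stocks trading above their moving average, computed once on a full universe and once on survivors only. Everything in it is made up for illustration: the tickers, the prices, the 3-day window, and the function name `pct_above_ma` are my assumptions, not real data or anyone's actual indicator.

```python
def pct_above_ma(prices_by_symbol, day, window):
    """Percent of symbols whose price on `day` is above their simple
    moving average over the last `window` days (inclusive of `day`).
    A price of None means the symbol did not trade that day; symbols
    without a full window of prices are skipped."""
    start = day - window + 1
    if start < 0:
        return float("nan")
    above = total = 0
    for series in prices_by_symbol.values():
        window_prices = series[start:day + 1]
        if len(window_prices) < window or any(p is None for p in window_prices):
            continue
        total += 1
        if window_prices[-1] > sum(window_prices) / window:
            above += 1
    return 100.0 * above / total if total else float("nan")


# Hypothetical universe: two stocks that survive and two that fall
# steadily and are de-listed after day 4 (None = no longer trading).
prices = {
    "AAA": [10, 11, 12, 13, 14, 15],
    "BBB": [20, 21, 22, 23, 24, 25],
    "CCC": [30, 28, 26, 24, 22, None],
    "DDD": [15, 14, 13, 12, 11, None],
}
delisted = {"CCC", "DDD"}

# What a database without de-listed data would contain.
survivors_only = {s: p for s, p in prices.items() if s not in delisted}

day, window = 4, 3
full_breadth = pct_above_ma(prices, day, window)
biased_breadth = pct_above_ma(survivors_only, day, window)

print(f"Breadth with de-listed stocks:    {full_breadth:.1f}%")
print(f"Breadth without de-listed stocks: {biased_breadth:.1f}%")
```

In this toy universe the survivor-only reading for day 4 is twice the true reading, because the two falling stocks that were still trading that day are missing from the history. A real breadth indicator built on hundreds of stocks would be distorted less dramatically, but in the same direction.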
Inevitably, this type of post leads people to ask where I get my data. I use Norgate’s Premium Data, which offers a de-listed database for a very low one-time fee.