Backtesting

A common practice in evaluating backtests of trading strategies is to discount the reported Sharpe ratios by 50%.¹ There are good economic and statistical reasons for reducing the Sharpe ratios. The discount is a result of data mining. This mining may manifest itself in academic researchers searching for asset-pricing factors that explain the behavior of equity returns, or by researchers at firms that specialize in quantitative equity strategies trying to develop profitable systematic strategies.

The 50% haircut is only a rule of thumb. Our article's goal is to develop an analytical way to determine the haircut's magnitude.

Our framework relies on the statistical concept of multiple testing. Suppose you have some new data, Y, and you propose that variable X explains Y. Your statistical analysis finds a significant relation between Y and X with a t-ratio of 2.0, which has a probability value of 0.05. We refer to this as a single test. Now consider the same researcher trying to explain Y with variables X₁, X₂, ..., X₁₀₀. In this case, you cannot use the same criteria for significance. You expect that, by chance, some of these variables will produce t-ratios of 2.0 or higher. What is an appropriate cutoff for statistical significance?
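To make the 100-variable example concrete, here is a minimal sketch of how often a t-ratio of 2.0 or higher would turn up by chance alone. It assumes the 100 tests are independent and uses a normal approximation to the t-distribution; both are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(t_ratio):
    """Two-sided p-value of a t-ratio under a standard normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_ratio)))


def prob_at_least_one(t_ratio, n_tests):
    """Chance that at least one of n_tests independent null tests reaches t_ratio."""
    p = two_sided_p(t_ratio)
    return 1.0 - (1.0 - p) ** n_tests


# Single test: a t-ratio of 2.0 has a p-value of about 0.046 (rounded to 0.05 in the text).
p_single = two_sided_p(2.0)

# With 100 unrelated X variables, several are expected to clear 2.0 by luck,
# and it is almost certain that at least one does.
expected_false_hits = 100 * p_single
p_any = prob_at_least_one(2.0, 100)

print(f"single-test p-value:      {p_single:.4f}")
print(f"expected chance hits:     {expected_false_hits:.2f}")
print(f"P(at least one t >= 2.0): {p_any:.4f}")
```

Under these assumptions, roughly four or five of the 100 variables would look significant at the single-test threshold even if none truly explains Y, which is why the single-test criterion cannot be reused.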

In this article, we present three approaches to multiple testing and answer the question in the previous example. The t-ratio cutoff is generally higher as the number of tests (or X variables) increases.

In summary, any given strategy produces a Sharpe ratio. We transform the Sharpe ratio into a t-ratio. Suppose that t-ratio is 3.0. Although a t-ratio of 3.0 is highly significant in a single test, it may not be if we take multiple tests into account. We proceed to calculate a p-value that appropriately reflects multiple testing.
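The transformation from a Sharpe ratio to a t-ratio can be sketched as follows. This assumes the common textbook relation for an annualized Sharpe ratio estimated over T years of independent returns, namely t-ratio = Sharpe ratio × √T, together with a normal approximation for the p-value. The input values (a Sharpe ratio of 1.0 over 9 years) are hypothetical and were chosen only so that the t-ratio comes out at 3.0.

```python
from math import sqrt
from statistics import NormalDist


def sharpe_to_t_ratio(annual_sharpe, years):
    """t-ratio implied by an annualized Sharpe ratio over a sample of `years`.

    Assumes independent, identically distributed returns (an illustrative
    simplification).
    """
    return annual_sharpe * sqrt(years)


def single_test_p_value(t_ratio):
    """Two-sided p-value under a standard normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_ratio)))


# Hypothetical backtest: annualized Sharpe ratio of 1.0 over 9 years.
t_ratio = sharpe_to_t_ratio(annual_sharpe=1.0, years=9.0)
p_value = single_test_p_value(t_ratio)

print(f"t-ratio: {t_ratio:.2f}")
print(f"single-test p-value: {p_value:.4f}")
```

Taken as a single test, the resulting p-value is well below 0.05. It does not yet account for how many other strategies were tried.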

To do this, we must make an assumption on the number of previous tests. For example, Harvey, Liu, and Zhu [2015] (HLZ) document that at least 316 factors have been tested in the quest to explain the cross-sectional patterns in equity returns. Suppose the adjusted p-value is 0.05. We then calculate an adjusted t-ratio; in this case, it is 2.0. With this new t-ratio, we determine...
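As a rough illustration of the adjustment step, the sketch below applies a Bonferroni correction, one widely used multiple-testing adjustment, to a single-test t-ratio of 3.0 with 316 assumed tests. The text above does not say which adjustment is applied, so the choice of Bonferroni here is an assumption, as are the function names and the normal approximation.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def p_from_t(t_ratio):
    """Two-sided p-value for a t-ratio (normal approximation)."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(t_ratio)))


def t_from_p(p_value):
    """t-ratio whose two-sided p-value equals p_value (requires 0 < p_value <= 1)."""
    return _STD_NORMAL.inv_cdf(1.0 - p_value / 2.0)


def bonferroni_p(p_single, n_tests):
    """Bonferroni-adjusted p-value: scale by the number of tests, capped at 1."""
    return min(1.0, p_single * n_tests)


N_TESTS = 316  # number of previously tested factors cited from HLZ

p_single = p_from_t(3.0)                    # about 0.0027 as a single test
p_adjusted = bonferroni_p(p_single, N_TESTS)
t_adjusted = t_from_p(p_adjusted)

# The text's own example: an adjusted p-value of 0.05 maps to a t-ratio near 2.0.
t_at_five_percent = t_from_p(0.05)

print(f"single-test p-value: {p_single:.4f}")
print(f"Bonferroni p-value:  {p_adjusted:.4f}")
print(f"adjusted t-ratio:    {t_adjusted:.2f}")
print(f"t-ratio at p = 0.05: {t_at_five_percent:.2f}")
```

Bonferroni is the most conservative of the common corrections, so under this assumption a t-ratio of 3.0 loses nearly all of its significance once 316 tests are taken into account. Less strict adjustments would shrink it by less.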
