Backtesting a Trading Strategy Derived from VIX Backwardation, Market Breadth and Market Spreads

Introduction

This project was conducted in collaboration with Tradewell and was guided by co-founder Robson Chow. Tradewell is building a backtesting platform that lets traders backtest the performance of many metrics and signals across different asset classes. The ultimate goal was to combine several signals into a "Risk On/Risk Off" composite signal.

This metric would be of use in a sizing and exposure context, where traders could, for example, overlay it on existing strategies to reduce risk and volatility over long periods of time. Effectively, a "Risk On" period can be thought of as a period of relatively steady growth and low volatility (2012 to 2015 in U.S. equity markets, for example), and a "Risk Off" period as one where volatility and the probability of shocks (to the upside or downside) are larger than normal. The approach we took to composing such a signal relied heavily on data made available to us by Tradewell.

For this particular exercise, we agreed on three main verticals. The first is shocks in the VIX Index, which are known to be indicators of future volatility. The second is market breadth. The last vertical we chose to explore is extremes in market spreads in a multi-faceted setting (bonds, commodities, sectors). The approach was bottom-up: we started by examining each signal individually and then created a composite signal that targets greater robustness through the orthogonality and additivity of the sub-signals. Once we had built our individual and composite signals, we backtested them using event-driven backtest systems, the Tradewell platform and several empirical statistical methods to confirm that we had calibrated a robust model for signaling periods of high relative risk.

We further calibrated parameters for both our individual signals and composite metric by tuning rolling periods, thresholds and levels in order to get effective triggers which we were comfortable reflected Risk On and Risk Off regimes. It is also good to note that throughout our research and implementation, there was emphasis on transparency and explainability of methods. This is why we prioritized signals created by dynamic thresholds and compositions of signals as opposed to more advanced types of Machine Learning or Statistical methods.

Finally, we concluded that the combination of signals, and the benefit stemming from their orthogonality, offers an interesting, robust and explainable approach to risk regime classification. The method can be improved by optimizing the weightings and periodicity of the indicators, but this needs to be done parsimoniously so as to avoid over-fitting. We also think that adding metrics such as market liquidity or intra-sector correlations could help the robustness of the indicator. The next step is to test this signal on several suitable time series, such as other indexes and market-reflecting ETFs.

Individual Signals

Market Breadth

The use of market breadth as an indicator of relatively volatile or risky market regimes is not new. Formally, market breadth can be defined as a measure of directional divergence within a subgroup of traded assets. There is an argument for "healthy breadth", a confirmation of new market highs which indicates that a high ratio of index members are following the index in its high. At the time of writing, a counterexample of this effect is occurring in NASDAQ members relative to the index, where very few members have carried the growth into the end of 2021.

It can also be noted that there is a counter-intuitive aspect to breadth, in that the empirical assumptions [3] usually imply that low breadth is a signal for higher future returns, simply because it is synonymous with a "market bottom", after which assets have only one way to go: up.

Other structural factors, such as the dispersion of market capitalization and weighting within an index, constitute arguments for using breadth. If a handful of stocks which have, through momentum, grown their market capitalization and significance in the index experience downturns, the index will be heavily affected. A commonly used measure of breadth is the proportion of stocks in an index which are currently at their 52 Week High relative to those currently at their 52 Week Low:

$$\text{Breadth}_{52W} = \frac{N_{\text{52W High}}}{N} - \frac{N_{\text{52W Low}}}{N}$$

Where the first term represents the ratio of index or group members with positive momentum, at the high point of their 52 week realizations, and the second one the ratio of stocks at their 52 week low relative to the total number of stocks N. It is also common to use breadth relative to shorter term moving averages such as:

Where we compare the ratio of index members trading above their 100 day moving average versus those trading below. These limits are classically computed as:
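As a minimal sketch of the 52-week highs-minus-lows measure described above, assuming each member's history is a plain list of daily closes and that a year is 252 trading days. The function name, the data layout and the toy prices are illustrative assumptions, not the Tradewell implementation.

```python
def highs_lows_breadth(closes_by_ticker, lookback=252):
    """Share of members at a lookback-day high minus share at a lookback-day low.

    closes_by_ticker: dict mapping ticker -> list of daily closes (oldest first).
    Only the last `lookback` closes of each member are inspected.
    """
    n_high = n_low = n_total = 0
    for closes in closes_by_ticker.values():
        window = closes[-lookback:]
        if not window:
            continue  # skip members with no history
        n_total += 1
        last = window[-1]
        if last >= max(window):
            n_high += 1
        elif last <= min(window):
            n_low += 1
    if n_total == 0:
        raise ValueError("no members with price history")
    return n_high / n_total - n_low / n_total


# Toy universe with a 5-day lookback (hypothetical prices).
universe = {
    "AAA": [10, 11, 12, 13, 14],  # closes at its high
    "BBB": [20, 19, 18, 17, 16],  # closes at its low
    "CCC": [30, 31, 29, 30, 30],  # neither
    "DDD": [5, 6, 7, 8, 9],       # closes at its high
}
breadth = highs_lows_breadth(universe, lookback=5)
```

With two of four members at their high and one at its low, the toy universe yields a breadth of 2/4 - 1/4 = 0.25.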

Empirical Evidence

The main approach to testing the strength of this breadth signal was through relative windows, in a way answering the following question: "How do short term relative levels in breadth affect subsequent returns and volatility?"

In order to test the assumptions made in the literature review, we can define the relative level of breadth as a rolling z-score as such:

$$Z_i = \frac{B_i - \mu_{i,n}}{\sigma_{i,n}}$$

Where i is the observation iterator, n is the window size, $B_i$ is the breadth reading at observation i, and $\mu_{i,n}$ and $\sigma_{i,n}$ are the mean and standard deviation of breadth over the n observations ending at i. We can then use this indicator and experiment with several windows to determine which works best. Once we calibrate the signal, we can analyze the underlying asset or index in order to examine the effects of shocks in breadth. The approach revolves around slicing the target time series relative to the level of the Z-Score:

Z > 2: Indicates abnormally high breadth relative to window size
Z < -2: Indicates abnormally low breadth relative to window size
-2 < Z < 2: Indicates relatively standard range for short term breadth
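The rolling z-score and the three buckets above can be sketched as follows. This is an illustration under assumptions: the trailing window includes the current observation, the sample standard deviation is used, and the function names are hypothetical.

```python
from statistics import mean, stdev


def rolling_zscore(values, window=20):
    """Z-score of each observation against the trailing `window` observations
    (the window includes the current observation). Returns None until the
    window is full or when the window has zero dispersion."""
    scores = []
    for i in range(len(values)):
        if i + 1 < window:
            scores.append(None)
            continue
        chunk = values[i + 1 - window : i + 1]
        sd = stdev(chunk)  # sample standard deviation (assumption)
        scores.append((values[i] - mean(chunk)) / sd if sd > 0 else None)
    return scores


def breadth_regime(z, threshold=2.0):
    """Map a z-score to the three buckets used in the text."""
    if z is None:
        return "n/a"
    if z > threshold:
        return "high"
    if z < -threshold:
        return "low"
    return "normal"


# Hypothetical breadth series: flat for 19 sessions, then a jump.
series = [0.0] * 19 + [1.0]
scores = rolling_zscore(series, window=20)
regimes = [breadth_regime(z) for z in scores]
```

Returns and volatility of the target index can then be grouped by the regime label of each date before comparing the groups.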

The first statistical test we implement is a Welch test, defined as:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$$

Where X̄ is the sample mean, s is the standard deviation and N is the sample size of each group. We use a Welch test instead of a Student's t-test in this context since we are dealing with uneven sample sizes and heteroskedasticity. We further examine the means and standard deviations of each population to establish the significance of a clustering by breadth level.
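For illustration, the Welch statistic can be computed with the standard library alone. The sketch below also returns the Welch-Satterthwaite degrees of freedom; the sample values are made up, and no p-value is computed because that requires the Student t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal sizes and variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    se_a = variance(sample_a) / n_a  # s_a^2 / N_a
    se_b = variance(sample_b) / n_b  # s_b^2 / N_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, dof


# Hypothetical groups of unequal size and variance.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10, 12, 14]
t_stat, dof = welch_t(group_a, group_b)
```

If SciPy is available, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` performs the same test and also reports a p-value.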

Running a 20 day window, using 52-week breadth on the S&P 500, gives the following results:

SPX vs Shocks in Breadth (Z-Score of 52W Highs-Lows)

SPX Log Returns vs Shocks in Breadth (Z-Score of 52W Highs-Lows)

SPX 30D Realized Volatility vs Shocks in Breadth (Z-Score of 52W Highs-Lows)

Welch Test Results:

Given the t-statistics and p-values of our Welch test, we can conclude that the level of relative market breadth has an impact on the distribution of log returns and volatility in the index. We can further see from our visual outputs that shocks in breadth succeed at characterizing periods of high market stress (Dot-Com Bubble, 2008 Crisis, Covid Downturn).

Parameter Estimation

We can modify two main parameters in order to see the impact on our Welch test outputs: the period of the breadth measure (52 weeks vs 26 weeks) and the size of the rolling window. We run all the data and obtain the following results:

We gather from the above results that the initial approach of using 52 weeks as a breadth parameter with a 20 day rolling window gives the best statistical result. As rolling windows increase in size, the means of returns under different breadth regimes differ less significantly. In terms of volatility, however, we are able to observe divergences in means for all indicators.

We can also observe that 26 weeks breadth with a 20 day window size gives significant test results. However, this signal turned out to be too noisy relative to its 52 weeks variant, as can be seen from the plot below. We therefore choose to use 52 weeks breadth along with a 20 day window in our aggregate signal.

SPX Log Returns vs Shocks in Breadth (Z-Score of 26W Highs-Lows)

VIX

Literature Review

The 2007-2009 crisis has intensified the need for indicators of the risk aversion of market participants. It has also become increasingly commonplace to assume that changes in risk appetites are an important determinant of asset prices. Not surprisingly, the behavioral finance literature [5] has developed “sentiment indices,” and financial institutions have created a wide variety of “risk aversion indicators”. One simple candidate indicator is the VIX, and there are many other indicators based on the term structure of VIX. We will discuss each of them later.

VIX is the ticker symbol and the popular name for the Chicago Board Options Exchange’s CBOE Volatility Index, a popular measure of the stock market’s expectation of volatility based on S&P 500 index options. It is a 30-day expectation of volatility given by a weighted portfolio of out-of-the-money European options on the S&P 500. The formula is given below.

$$\text{VIX} = \sqrt{\frac{2e^{r\tau}}{\tau}\left(\int_0^F \frac{P(K)}{K^2}\,dK + \int_F^\infty \frac{C(K)}{K^2}\,dK\right)}$$

where τ is the number of average days in a month (30 days), r is the risk-free rate, F is the 30-day forward price on the S&P 500, and P(K) and C(K) are prices for puts and calls with strike K and 30 days to maturity.
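As a numerical sanity check of this kind of model-free formula, the sketch below feeds it synthetic Black-Scholes prices with a constant volatility of 20% and checks that the integral recovers roughly that volatility. Everything here is an assumption made for illustration: τ is expressed in years (30/365), the option prices are generated by a model rather than taken from the market, and the integrals are truncated and approximated with a midpoint rule. The published index instead uses a discrete sum over listed strikes and is quoted multiplied by 100.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_otm_price(forward, strike, sigma, r, tau):
    """Black-Scholes price of the out-of-the-money option at `strike`:
    a put below the forward, a call at or above it."""
    vol = sigma * sqrt(tau)
    d1 = (log(forward / strike) + 0.5 * vol * vol) / vol
    d2 = d1 - vol
    disc = exp(-r * tau)
    if strike < forward:
        return disc * (strike * norm_cdf(-d2) - forward * norm_cdf(-d1))  # put
    return disc * (forward * norm_cdf(d1) - strike * norm_cdf(d2))  # call


def model_free_vol(forward, sigma, r, tau, k_min, k_max, steps=20000):
    """Midpoint-rule approximation of the two integrals in the formula,
    truncated to the strike range [k_min, k_max]."""
    dk = (k_max - k_min) / steps
    total = 0.0
    for i in range(steps):
        k = k_min + (i + 0.5) * dk
        total += bs_otm_price(forward, k, sigma, r, tau) / (k * k) * dk
    return sqrt(2.0 * exp(r * tau) / tau * total)


tau = 30 / 365  # 30 days expressed in years (assumption)
spot, r, sigma = 100.0, 0.02, 0.20  # hypothetical inputs
forward = spot * exp(r * tau)
vol = model_free_vol(forward, sigma, r, tau, k_min=0.3 * forward, k_max=3.0 * forward)
```

Under these assumptions `vol` comes out close to the 0.20 used to generate the prices, which is the sense in which the option portfolio measures expected volatility.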

The VIX is actually a measure of traders’ expectations about volatility in the S&P 500. The VIX is charted like an index, and the higher it goes, the higher traders’ expectations are for short term market volatility. The VIX rises with higher market volatility because it is computed from the prices of out-of-the-money S&P 500 index options. If option sellers think volatility is going to increase (in the near term), they will require larger premiums from option buyers. This increase in option prices is used in the calculation of the VIX index. Conversely, if traders think volatility is going to drop, option sellers will have to reduce premiums to attract buyers. Falling option prices will be reflected in a falling VIX index.

If the VIX is falling, investors are trading bullish strategies and taking on more risk. If the VIX is rising, traders may be shorting the market and trying to limit risk within their portfolio. Because the VIX is typically range bound, traders are particularly interested in periods when the index is hitting support or resistance levels.

How bearish or bullish traders feel based on the VIX index is important because it indicates what is going on with attitudes towards risk. Recently, increases in investor fear have been associated with falling stocks, rising bonds and a stronger dollar. The same is true in reverse as the VIX has been retreating from multi-year highs in early 2009.

The objective of this study is an empirical examination of risk-on and risk-off scenarios. Specifically, we analyze the ability of the VIX, market breadth and spread indicators to estimate future risk.

Empirical Evidence

There is evidence of an interrelation between implied market volatility and economic uncertainty. Many VIX-related indicators have historically been used by traders to gauge market risk and take positions accordingly, and there is ample empirical evidence that these indicators help predict risk-on and risk-off situations. In this report we discuss one of these indicators in detail, namely VIX-VXV. The other indicators used in the model are briefly discussed below.

VIX-VXV: This indicator calculates the spread between the 1m VIX index and the 3m VIX index (VXV), expressed here as the ratio VIX/VXV. This metric is a measurement of the contango or backwardation of the term structure of short dated volatility indices. The spread between the two tells us if expected near term volatility exceeds longer term expected volatility and vice versa. Hence a high ratio (>1.2) means high current volatility but lower expected volatility 3 months out, whereas a low ratio (<0.80) means traders and investors expect very little to worry about now but a lot more 3 months out. These extreme high and low readings often coincide with bottoms and tops, respectively (see Figures 1 and 2 below).

If the short-term fear gets high enough, we will see the value of the VIX become larger than the value of the VXV. Technically, you could see a similar relationship when the VIX futures curve goes into backwardation, where the near term VIX future is at a higher price than the long term one. But using the VIX/VXV ratio lets you chart this a little more quickly.
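A minimal sketch of how the ratio could be turned into a label, using the 1.2 and 0.80 cut-offs quoted above. The function name, the label strings and the index levels in the example are hypothetical.

```python
def term_structure_signal(vix, vxv, high=1.2, low=0.8):
    """Classify the 1-month / 3-month implied volatility ratio.

    ratio > 1 means backwardation (near-term implied volatility above the
    3-month level); ratio < 1 means contango. `high` and `low` mark the
    extreme readings.
    """
    if vxv <= 0:
        raise ValueError("VXV must be positive")
    ratio = vix / vxv
    if ratio > high:
        label = "extreme backwardation"
    elif ratio > 1.0:
        label = "backwardation"
    elif ratio >= low:
        label = "contango"  # a flat curve (ratio == 1.0) also lands here
    else:
        label = "extreme contango"
    return ratio, label


# Hypothetical index levels.
stress = term_structure_signal(45.0, 36.0)
calm = term_structure_signal(15.0, 20.0)
mild = term_structure_signal(22.0, 20.0)
```

Here `stress` is labelled as extreme backwardation (ratio 1.25), `calm` as extreme contango (ratio 0.75) and `mild` as plain backwardation (ratio 1.1).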

This indicator doesn’t flash very often. This is not meant to be a super short term timing indicator with a ton of entries and exits. It is a high odds signal, but when you’re wrong you are wrong big. That means this signal is not meant for you to go 100% long stocks, but it fits better in the context of option traders that like to sell premium and can aggressively manage risk. When you see this signal show up, it should perk your ears up and you should start looking for dip buys that you have been avoiding.

High VIX:VXV Ratio

We also studied other indicators, which were ultimately used in the model construction. The ones we found useful are listed below:

VIX-HRV: The spread between the VIX Index and the historical realized volatility for the S&P 500. The spread indicates the "volatility risk premium or discount", i.e. the cost of options in implied volatility compared to historical realized volatility of the underlying asset.

VIX-MAD: The spread between the VIX Index and the Mean Average Deviation for the S&P 500. Similar to the VIX-HRV spread, this spread is used to indicate the "volatility risk premium or discount", i.e. the cost of options in implied volatility compared to the Mean Average Deviation (MAD) of the underlying asset.

VIX-SRV: The spread between the VIX Index and the subsequent realized volatility for the S&P 500. This spread indicates whether selling options was profitable relative to the subsequent realized volatility that occurred over the following 21 trading days.

VIX-VXM: Spread between the 3m VIX index and the 6m VIX index. This metric is a measurement of the contango or backwardation of the term structure of short dated compared to medium dated volatility indices.

VIX-VXST: Spread between the 1m VIX index and the 9d VIX index. This metric is a measurement of the contango or backwardation of the term structure of short dated volatility indices.

VIX-YangZhang: The spread between the VIX Index and the Yang-Zhang volatility estimator for the S&P 500. The spread indicates the "volatility risk premium or discount", i.e. the cost of options in implied volatility compared to historical realized volatility of the underlying asset. Yang-Zhang is used in order to capture more price data (i.e. intraday data) than a typical close-to-close volatility estimator.

Crude Oil Volatility Index: The Cboe Crude Oil ETF Volatility Index℠ (OVX) is a VIX®-style estimate of the expected 30-day volatility of crude oil as priced by the United States Oil Fund (USO). Like VIX, OVX is calculated by interpolating between two weighted sums of option mid-quote values, in this case options on USO.

Emerging Market ETF Volatility Index: The Cboe Emerging Markets ETF Volatility Index℠ (VXEEM) is a VIX-style estimate of the expected 30-day volatility of returns on the iShares MSCI Emerging Markets ETF (EEM). Like VIX, VXEEM is calculated by interpolating between two weighted sums of option mid-quote values, in this case options on EEM.

Energy Sector ETF Volatility Index: The Cboe Energy Sector ETF Volatility Index℠ (VXXLE) estimates the expected 30-day volatility of the price of the Energy Sector ETF (XLE). Similar to VIX®, VXXLE is derived by applying the VIX algorithm to options on the XLE Energy Sector ETF.

Gold ETF Volatility Index: The Cboe Gold ETF Volatility Index℠ (GVZ) is a VIX®-style estimate of the expected 30-day volatility of returns on the SPDR Gold Shares ETF (GLD). Like VIX, GVZ is calculated by interpolating between two weighted sums of option mid-quote values, in this case options on GLD.

Nasdaq 100 Volatility Index: The Cboe Nasdaq-100 Volatility Index (VXN) is a measure of market expectations of 30-day volatility for the Nasdaq-100 index, as implied by the prices of options listed on this index. The VXN index is a widely watched gauge of market sentiment and volatility for the Nasdaq-100, which includes the top 100 U.S. and international non-financial securities by market capitalization listed on the Nasdaq.

Russell 2000 Volatility Index: The Cboe Russell 2000 Volatility Index℠ (RVX) is a VIX®-style estimate of the expected 30-day volatility of Russell 2000® Index returns. RVX is calculated by interpolating between two weighted sums of option mid-quote values, in this case options on the Russell 2000 Index (RUT).
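To make the implied-versus-realized spreads in the list above concrete, here is a sketch of a VIX-HRV style calculation. It assumes a close-to-close estimator over 21 returns annualized with 252 trading days, and it quotes realized volatility in percentage points so that it is comparable to a VIX level. The price path and the VIX level are invented for the example.

```python
from math import exp, log, sqrt
from statistics import stdev


def realized_vol(closes, trading_days=252):
    """Annualized close-to-close realized volatility, in percentage points."""
    log_returns = [log(b / a) for a, b in zip(closes, closes[1:])]
    return stdev(log_returns) * sqrt(trading_days) * 100.0


def vix_hrv_spread(vix_level, closes):
    """Implied minus realized volatility: positive values mean options are
    priced at a premium to recent realized volatility."""
    return vix_level - realized_vol(closes)


# Hypothetical path: log returns alternate between +1% and -1% (21 returns).
closes = [100.0 * exp(0.01 * (i % 2)) for i in range(22)]
spread = vix_hrv_spread(20.0, closes)
```

The other spreads in the list follow the same pattern with a different second leg (a different estimator, a later window, or another volatility index).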

Extremes in market spreads

Literature Review

There are four major traded markets in which return-seeking investors invest: commodities, bonds, stocks and currencies. In most economic cycles and stress scenarios, these markets move in a specific order, and correlations increase, especially in uncertain or risk-off times. Asset classes carry different risks. Investors are usually willing to invest in high-risk instruments in expectation of higher returns, but they sell these instruments when the perceived risk in the market is very high, as was observed during the 2008-09 Global Financial Crisis (GFC).

In an expanding economy with a positive outlook on corporate earnings, expansionary fiscal policies and a low interest rate environment, perceived riskiness in the market is low and investors pile into risky assets. However, in the face of risk aversion shocks, investors rebalance their portfolios away from risky assets and towards safe assets. Previous studies (Hartmann, Straetmans, and de Vries, 2004; Baur and Lucey, 2010) provide empirical evidence of the flight-to-safety or flight-to-quality phenomenon, which describes the flow of funds from risky investments like stocks to less risky or "safe-haven" assets like gold, bonds and the US dollar.

Empirical Evidence

We analyze the relationship between these markets through spreads in 20 day rolling returns and their movement during different market conditions. Due to data limitations, we restrict the analysis of spreads to the period from 2005 through 2022.
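A spread in rolling returns of this kind can be sketched as below. The sketch reads "20 day rolling return" as the simple return over the trailing 20 sessions, which is an assumption, and it expects the two price series to be aligned on the same trading days. The two price paths are synthetic.

```python
def rolling_return(closes, window=20):
    """Simple return over the trailing `window` sessions; None until enough history."""
    return [
        None if i < window else closes[i] / closes[i - window] - 1.0
        for i in range(len(closes))
    ]


def return_spread(closes_a, closes_b, window=20):
    """Rolling return of asset A minus rolling return of asset B, date by date."""
    if len(closes_a) != len(closes_b):
        raise ValueError("series must be aligned and of equal length")
    ra = rolling_return(closes_a, window)
    rb = rolling_return(closes_b, window)
    return [None if x is None else x - y for x, y in zip(ra, rb)]


# Synthetic stress episode: the safe asset drifts up 0.1% a day while the
# risky asset loses 0.5% a day.
safe = [100.0 * 1.001 ** i for i in range(30)]
risky = [100.0 * 0.995 ** i for i in range(30)]
spread_series = return_spread(safe, risky, window=20)
```

In this synthetic episode the last value of `spread_series` is about 0.116, meaning the safe asset outperformed by roughly 11.6 percentage points over 20 sessions.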

We observe the relationship of different market spreads during historical risk-off scenarios such as the GFC, the 2011-12 euro area crisis and the Covid crisis.

Bonds

Bonds are generally considered less risky investments than stocks. Hence, in economic conditions with high uncertainty, funds start flowing from stocks to bonds, and bonds deliver higher returns than stocks. From the chart below, it is evident that this relationship was observed during the Global Financial Crisis (GFC) of 2007-08, during the euro area crisis and also during the 2020 Covid pandemic.

Difference in Returns between iShares 20Y+ Treasury Bond ETF and SPDR S&P 500 ETF

Increased risk aversion causes funds to flow to the bond market, which squeezes out growth capital for companies or makes it more expensive, and hence weighs on future economic growth. Such market conditions are usually followed by cuts in Fed rates and expansionary monetary and fiscal policy to stimulate the movement of funds back to stocks. Hence, in the above chart, we can see significant spikes in the Bond-Stock return spread during the GFC, the euro area crisis and Covid, but those elevated bond returns soon come down due to expansionary policy actions.

Commodities

We can make a similar argument for the relationship between commodities, especially gold, and the stock market. Historically, gold has been considered a safe haven, and we would expect funds to flow from risky markets to safe havens like gold. The chart below shows the spread between the 20 day rolling returns of the Gold ETF and the S&P 500 ETF. We can clearly see abnormally high returns in gold compared to the S&P 500 during periods of uncertainty.

Difference in Returns between SPDR Gold ETF and SPDR S&P 500 ETF

We can see that gold returns are higher than S&P 500 returns during the euro area crisis and the Covid crisis, but the picture is not very clear for the GFC. However, when we look at the relationship using spreads in 60 day rolling returns, it is clear that gold returns are indeed higher than S&P 500 returns in all previous crisis or risk-off scenarios.

60D Rolling Average: Difference in Returns between SPDR Gold ETF and SPDR S&P 500 ETF

Stocks

From the above analysis of Bond-Stock and Gold-Stock spreads, it is clear that stocks under-perform during risk-off market scenarios. However, certain industries and companies within the stock market tend to perform better than others during a crisis. Economic theory suggests that consumers tend to hold off on discretionary purchases like white goods, automobiles and entertainment, while continuing to buy consumer staples like food, cigarettes and laundry products. In fact, consumers may spend more on staples as discretionary consumption falls. We analyze the relationship between the returns of the Consumer Staples ETF and the Consumer Discretionary ETF to investigate whether this theory holds.

SPDR Consumer Staples ETF Returns - SPDR Discretionary ETF Returns

We can see in the above chart that the consumer staples ETF had a higher 20 day rolling return than the consumer discretionary ETF during the GFC, the euro area crisis and the Covid crisis. We also see high returns for the consumer staples ETF during 2018, when the Fed raised interest rates to prevent tight labor market conditions from causing excessive inflation.

Currencies

Currency markets also tend to show a particular behavior in uncertain times like the GFC and the Covid crisis. The US dollar is considered a safe haven currency for multiple reasons, including the strength of the US economy relative to other economies and the fact that international forex and commodity transactions are largely dollar denominated. Hence, flight to safety suggests that funds should flow to safe haven currencies like the US dollar during heightened economic uncertainty. We analyzed the relationship between the returns of the S&P 500 ETF and the USDX index, which tracks the US dollar against a basket of 6 currencies.

US Dollar against a Basket of currencies USDX - S&P 500 ETF

It is evident from the above graph that the US dollar appreciates in uncertain times, indicating a flow of funds to the US. This is also evident from the spread between the 20 day rolling returns of the S&P 500 ETF and the Emerging Markets Stock Index. We don’t see a significant spread in S&P 500 returns versus emerging markets returns during the Covid crisis, as Covid impacted both developed and emerging markets equally and funds moved from stocks to bonds.

Difference in Returns between SPDR S&P 500 ETF and Emerging Markets Stock Index ETF

Signal Composition

Aggregation

In order to combine and backtest all our signals, we merge them into a common data frame and compute the measures of dynamic thresholding established in the individual research sections. In order to create a homogeneous signal, we establish a three-step discretization process. A first layer assigns each of the above signals to one of 20 quantiles over a 252-day rolling window. We then compute the cross-sectional average across the signals every day, and finally discretize this composite into cross-sectional signal quantiles (1, 0, -1), which form our "Risk-on, Risk-off" indicator.
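The three steps can be sketched as follows. Several details are assumptions made for the illustration and are not stated in the text: every input signal is oriented so that higher values mean more stress, buckets are numbered 1 to 20 by trailing rank, and the cut-offs `risk_off_at` and `risk_on_at` that turn the average bucket into -1, 0 or 1 are hypothetical.

```python
def rolling_quantile_bucket(values, window=252, n_buckets=20):
    """Bucket (1..n_buckets) of each value within its trailing window, based on
    how many window observations are less than or equal to it."""
    buckets = []
    for i in range(len(values)):
        if i + 1 < window:
            buckets.append(None)  # not enough history yet
            continue
        chunk = values[i + 1 - window : i + 1]
        count = sum(1 for v in chunk if v <= values[i])  # always >= 1
        buckets.append((count * n_buckets + window - 1) // window)  # ceiling division
    return buckets


def composite_signal(signals, window=252, n_buckets=20, risk_off_at=15, risk_on_at=6):
    """signals: dict name -> list of aligned daily values, oriented so that
    higher values mean more stress (assumption). Returns 1 (risk-on),
    0 (neutral), -1 (risk-off) or None per day."""
    bucketed = [rolling_quantile_bucket(v, window, n_buckets) for v in signals.values()]
    out = []
    for day in zip(*bucketed):
        if any(b is None for b in day):
            out.append(None)
            continue
        avg = sum(day) / len(day)  # equal-weighted cross-sectional average
        out.append(-1 if avg >= risk_off_at else 1 if avg <= risk_on_at else 0)
    return out


# Tiny example with a 5-day window and 5 buckets (hypothetical values).
toy = {
    "signal_a": [1, 2, 3, 4, 5, 0, 3],
    "signal_b": [2, 4, 6, 8, 10, 1, 6],
}
regime = composite_signal(toy, window=5, n_buckets=5, risk_off_at=4, risk_on_at=2)
```

In the toy example the first four days have no full window, day five sits at the top of its window (risk-off), day six at the bottom (risk-on) and day seven in the middle (neutral).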

Weightings

We tried different bootstraps for the weightings but ended up overfitting the signal, causing our approach to gravitate towards equal weighting for all our signals.

Results

Backtesting

A Risk-on/Risk-off identification model was built based on the methodology described above. To measure the performance of our model, we built a dynamic trading strategy that takes a long position in SPY in risk-on scenarios and a short position in SPY in risk-off scenarios, as indicated by the model output. We backtest this strategy on historical market data going back to 2006, due to data availability for certain indicators.
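A deliberately simplified sketch of such a long/short backtest is shown below. It is a plain daily loop rather than an event-driven engine, it ignores transaction costs, borrow fees and slippage, and it assumes the signal observed at the close of day t is applied to the return from t to t+1. The prices and signal values are invented.

```python
def backtest_long_short(closes, signal):
    """Cumulative growth of one unit of capital.

    signal[t] is the position decided at the close of day t (+1 long,
    -1 short, 0 or None flat) and is applied to the return from t to t+1,
    which avoids look-ahead.
    """
    equity = [1.0]
    for t in range(len(closes) - 1):
        position = signal[t] or 0
        daily_return = closes[t + 1] / closes[t] - 1.0
        equity.append(equity[-1] * (1.0 + position * daily_return))
    return equity


# Hypothetical prices: +10%, -10%, +10%.
prices = [100.0, 110.0, 99.0, 108.9]
positions = [1, -1, 1, 0]  # long, short, long; the last entry is never used
equity = backtest_long_short(prices, positions)
buy_and_hold = prices[-1] / prices[0]
```

With a perfectly timed signal the toy strategy ends at about 1.331 against 1.089 for buy and hold; real results depend entirely on how well the regime signal anticipates the next day's return.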

The graph below shows the backtesting results of the strategy against SPY. It is evident that the strategy based on the risk-on/risk-off model, built from breadth, spread and volatility indicators, generates higher returns than a long-only position in SPY. The strategy underperforms during the post-Covid rally, but the gap has narrowed in 2022.

The above table shows the final composite performance of the strategy from 2006 to March 2022. We can also visualize the strategy relative to the S&P 500 index:

SPX vs Composite Signal
