
Pattern Recognition in Sectorial ETFs: Correlation to the Future

Continuing our investigation into the possibility of capturing profits from economic sector rotation by investing in sectorial ETFs, we now deepen the analysis. We check whether these ETFs can predict the prices or returns of one another and of the market in general, and whether, within a relatively short cycle (7 to 10 years), investment interest in one sector sparks a later reaction in the others.


We extend our previous analysis to include the future prices and returns of each of the selected ETFs and market indexes over several time frames:

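# Forecast horizons in trading days (about 3 days to 6 months)
futures = [3, 5, 7, 15, 22, 66, 132]
# list(...) snapshots the original columns before new ones are added
for column in list(price_history.columns):
    for future in futures:
        future_price = price_history[column].shift(-future)
        # Forward return: price `future` days ahead over today's price, minus 1
        future_return = future_price / price_history[column] - 1
        price_history[f'Price_{future}_{column}'] = future_price
        price_history[f'Returns_{future}_{column}'] = future_return
        price_history[f'Direction_{future}_{column}'] = future_return > 0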
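A note on the return definition: in pandas, `pct_change(periods=-n)` computes x_t / x_{t+n} - 1, that is, today's price measured against the future price, rather than the forward return x_{t+n} / x_t - 1, so its sign is flipped with respect to a future return. A small check on a made-up three-day series (the prices 100, 110, 121 are illustrative only):

```python
import pandas as pd

# Hypothetical prices that rise 10% per day
prices = pd.Series([100.0, 110.0, 121.0])
horizon = 1

# pandas with a negative period: x_t / x_{t+n} - 1 (today relative to the future)
backward_change = prices.pct_change(periods=-horizon)
# Forward return: x_{t+n} / x_t - 1 (the future relative to today)
forward_return = prices.shift(-horizon) / prices - 1

print(backward_change.round(4).tolist())
print(forward_return.round(4).tolist())
```

For a rising series the first quantity is negative and the second positive, so a direction flag built on the first would mark rising prices as False.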

The resulting dataframe contains a set of future states for each historical price. With these values we can run a correlation analysis between current and future prices. These values vary widely over time and across instruments, so we are going to standardize and winsorize the data with scipy tools (this step is optional; results should be similar without it). We have to be careful to skip the direction columns: they hold boolean values, which a z-score transformation would turn into two floats whose values depend on how many samples are True:

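# Statistical normalizations:
import numpy as np
import pandas as pd
from scipy.stats import zscore
from scipy.stats.mstats import winsorize

# Avoid Boolean data columns (np.bool was removed in NumPy 1.24)
include = [col for col in price_history.columns
           if not pd.api.types.is_bool_dtype(price_history[col])]
price_history[include] = price_history[include].apply(zscore, nan_policy='omit')
# winsorize() leaves data unchanged unless limits are given; 1% per tail is an assumed choice
price_history[include] = price_history[include].apply(
    lambda s: winsorize(s.to_numpy(), limits=(0.01, 0.01), nan_policy='omit').filled(np.nan))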
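The `limits` argument matters here: scipy's `winsorize` returns the data untouched when `limits` is left at its default of None. A quick check on ten made-up values, where 10% per tail replaces exactly one value at each end with its nearest neighbour:

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Hypothetical data with one outlier at the top
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])

unchanged = winsorize(data)                     # limits=None: nothing is clipped
clipped = winsorize(data, limits=(0.1, 0.1))    # 10% per tail: one value each side

print(np.asarray(unchanged).tolist())
print(np.asarray(clipped).tolist())
```

The result is a NumPy masked array, which is why the normalization step above converts it back with `.filled(np.nan)` before handing it to pandas.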

For convenience, we record the column names for each category:

cols = price_history.columns
price_cols = [col for col in cols if 'Price_' in col]
returns_cols = [col for col in cols if 'Returns_' in col]
direction_cols = [col for col in cols if 'Direction_' in col]

Using our custom heatmap plotting function, we can now obtain very large color maps that show where the interesting relations are. The plots are large, so zooming in is required to inspect the interesting areas:

plot_corr_hm(price_history[price_cols], size=(64,64))
plot_corr_hm(price_history[returns_cols], size=(64,64), title = 'Future Returns Correlations')
plot_corr_hm(price_history[direction_cols], size=(64,64), title = 'Price Direction Correlations')
plot_corr_hm(price_history, size=(64,64), title = 'EVERYTHING WITH EVERYTHING')

Correlation in future prices:

Correlation in future returns:


Correlation in future returns directionality:

Correlation of everything against everything:


We now have correlation plots for prices and future prices, returns and future returns, direction of returns (True: positive returns, False: negative returns), and the legendary everything-against-everything correlation matrix: the final sign of desperation when playing Monkey Island, when you just try every inventory item on every other inventory item until the fish eye combines with the extensible ball picker. With this many combinations, some patterns are almost certain to emerge. We are using 10 years of data this time, and these correlations reflect just that: historical correlation.
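How easily do patterns appear by chance? A small simulation, unrelated to the ETF data, generates independent Gaussian random walks as stand-ins for price series (40 series of 2,520 steps, roughly 10 years of trading days; both numbers are arbitrary) and reports the largest absolute correlation between any two of them, for price levels and for daily changes. The walks are independent by construction, so every correlation found is spurious; non-stationary levels typically produce much larger spurious values than their differences.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n_days, n_series = 2520, 40  # about 10 years of trading days, 40 hypothetical series

# Independent random walks: no true relation exists between any pair
walks = pd.DataFrame(rng.standard_normal((n_days, n_series)).cumsum(axis=0))
returns = walks.diff().dropna()


def max_offdiag_abs_corr(df: pd.DataFrame) -> float:
    """Largest absolute correlation between two different columns."""
    corr = df.corr().to_numpy(copy=True)
    np.fill_diagonal(corr, 0.0)  # ignore self-correlations
    return float(np.abs(corr).max())


print(f"max |corr| between levels:        {max_offdiag_abs_corr(walks):.2f}")
print(f"max |corr| between daily changes: {max_offdiag_abs_corr(returns):.2f}")
```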


The interesting patterns in the future correlations are the areas where the monotony is broken: where the diagonals fade out or the colors stand out. As discussed in our previous post, the strongest effect in prices is the anticorrelation of North American natural resources (IGE) with dividend-paying technology stocks (TDIV), together with the relatively low correlation of the energy sector (TPYP) with the other sectors. In returns, the relatively low correlation of healthcare (IXJ) with the rest of the sectors stands out, and in directionality a number of very specific ETF and timeframe combinations show slightly negative correlation. These structures may reflect the very faint effect of sector rotation, the wheels of the economy churning, although the picture is far from Hubble-telescope clear and they may well be illusions.


As it is difficult to analyze each of these future pairs by eye, we need a way to quickly inspect the top and bottom correlation values:

# Inspect top and bottom of correlation matrices:
import pandas as pd
from IPython.display import display

corr = price_history.corr()
corr_series = corr.unstack()
ordered_corrs = pd.DataFrame(corr_series.sort_values(kind="quicksort", ascending=False), columns=['Correlation'])
# Drop the diagonal self-correlations, which are exactly 1
ordered_corrs = ordered_corrs[ordered_corrs != 1].dropna()
top_corrs = ordered_corrs.iloc[0:20]
bottom_corrs = ordered_corrs.iloc[-20:]

# Each pair appears twice, (A, B) and (B, A), with equal values adjacent after
# sorting, so keeping every second row (::2 slicing) leaves one row per pair.
display(bottom_corrs[::2])
display(top_corrs[::2])

The IPython display function can be imported into the notebook (the notebook will be shared in the final installment) to print the dataframes properly. Our correlation dataframe also contains every pair twice, back and forth, so we can omit the repeats by slicing every second item with [::2]. For the indexes containing "Returns", this yields:
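The ::2 trick relies on the two copies of each pair sitting next to each other after sorting, which can break if two different pairs happen to share exactly the same value. A sketch of a more explicit alternative, not the approach used in this series, keeps only the strictly upper triangle of the matrix with NumPy's `triu_indices`, so every pair appears once and self-correlations are excluded without comparing values against 1 (the helper name `unique_pairs` is hypothetical):

```python
import numpy as np
import pandas as pd


def unique_pairs(corr: pd.DataFrame) -> pd.Series:
    """Each off-diagonal pair of a symmetric correlation matrix, once, sorted."""
    # Row/column positions strictly above the diagonal (k=1 skips self-pairs)
    rows, cols = np.triu_indices(len(corr), k=1)
    pairs = pd.Series(
        corr.to_numpy()[rows, cols],
        index=pd.MultiIndex.from_arrays([corr.index[rows], corr.columns[cols]]),
        name='Correlation',
    )
    # NaN appears for constant or fully missing columns; drop it before sorting
    return pairs.dropna().sort_values(ascending=False)


# Toy example with made-up data
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 4, 6, 8.5], 'c': [4, 3, 2, 1]})
print(unique_pairs(df.corr()))
```

Applied to `price_history.corr()`, the head and tail of the returned Series would play the role of top_corrs and bottom_corrs with no further slicing.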


The negative correlations are the interesting ones. The positive correlations are concentrated within the same future horizons; these provide no predictive power at the daily scale, and we will have to find another use for them. As we got ourselves into a string-index problem, we will have to suffer through regular expressions to remove all correlations that correspond to the same future point. We will do our best to prevent this monstrosity in the next publication:

# We will use regex.
# Regex is like learning Latin: you have to memorize the declensions,
# lest you spend eternity checking the manual. And each language
# uses different descriptors. It is a difficult task.
import re
c_corrs = ordered_corrs.reset_index()
# First number in each factor name is its horizon, e.g. 22 in 'Price_22_IGE'
# (this assumes the ticker symbols themselves contain no digits)
c_corrs['D1'] = c_corrs['level_0'].apply(lambda x: re.findall(r'\d+', x))
c_corrs['D2'] = c_corrs['level_1'].apply(lambda x: re.findall(r'\d+', x))
# Unshifted columns carry no number: treat them as horizon 0
c_corrs['D1'] = c_corrs['D1'].apply(lambda x: int(x[0]) if len(x)>0 else 0)
c_corrs['D2'] = c_corrs['D2'].apply(lambda x: int(x[0]) if len(x)>0 else 0)
# Keep only pairs that look at different points in time
c_corrs['is_valid'] = c_corrs['D2'] != c_corrs['D1']
c_corrs.columns = ['Factor_1', 'Factor_2', 'Correlation', 'D1', 'D2', 'is_valid']
# Mirror pairs stay adjacent after filtering, so ::2 again keeps one row per pair
clean_corrs = c_corrs[c_corrs['is_valid']].set_index(['Factor_1', 'Factor_2'])[['Correlation']][::2]
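The regex above takes the first number found anywhere in a column name, so a ticker containing a digit would be misread as a horizon. A hedged, vectorized alternative anchors the pattern on the Price_, Returns_ and Direction_ prefixes used when the future columns were built. The function names, the ticker X2 and the toy correlation values below are hypothetical:

```python
import pandas as pd

# The horizon sits between the category prefix and the ticker, e.g. 'Returns_22_IGE'
HORIZON_PATTERN = r'^(?:Price|Returns|Direction)_(\d+)_'


def horizon(labels: pd.Series) -> pd.Series:
    """Future horizon in days encoded in each label; 0 for unshifted columns."""
    digits = labels.str.extract(HORIZON_PATTERN, expand=False)  # NaN when no match
    return pd.to_numeric(digits.fillna('0')).astype(int)


def drop_same_horizon(pairs: pd.DataFrame) -> pd.DataFrame:
    """Keep only factor pairs that compare different points in time."""
    return pairs[horizon(pairs['Factor_1']) != horizon(pairs['Factor_2'])]


# Toy input with the same column layout as c_corrs
pairs = pd.DataFrame({
    'Factor_1': ['Price_22_IGE', 'Returns_5_XLK', 'IGE', 'Price_3_X2'],
    'Factor_2': ['FSTA', 'Returns_5_IGE', 'TDIV', 'Price_7_X2'],
    'Correlation': [-0.3, 0.9, -0.2, 0.1],
})
print(drop_same_horizon(pairs))
```

Only the first and last rows survive: the second compares two 5-day horizons and the third compares two unshifted columns.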

We now have a full list of future relations in prices, returns and direction of returns, with more than 40,000 historical correlation values, including autocorrelations:


This list gives us only correlation (we cannot stop pointing this out to ourselves). It offers some historical perspective on price and returns correlations at different time frames, but it will still be difficult to extract predictions for the future from correlations alone. For illustrative purposes we will look at the shape of one of the highly negatively correlated pairs that could be of interest: the price of FSTA against the price of IGE 22 days into the future. The unshifted correlation is -0.69; the leading one is -0.72:
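Computing a leading correlation like this one only needs a shift of the second series: shifting it by -22 rows aligns each day's value of the first series with the second series 22 rows later. The sketch below uses synthetic, hypothetical data (not FSTA or IGE) in which y is built as the negative of x delayed by 22 days plus noise, so the correlation profile across leads should bottom out at 22:

```python
import numpy as np
import pandas as pd


def lead_correlation(x: pd.Series, y: pd.Series, lead: int) -> float:
    """Pearson correlation of x today with y `lead` rows in the future."""
    return x.corr(y.shift(-lead))


rng = np.random.default_rng(seed=1)
n = 2520
# Hypothetical stationary daily series
x = pd.Series(rng.standard_normal(n))
# y echoes -x with a 22-day delay, plus independent noise
y = -x.shift(22) + 0.5 * pd.Series(rng.standard_normal(n))

profile = pd.Series({lead: lead_correlation(x, y, lead) for lead in range(0, 67)})
print(f"unshifted: {profile[0]:.2f}, 22-day lead: {profile[22]:.2f}")
print("most negative lead:", profile.idxmin())
```

On the article's data, and assuming the unshifted columns are named after their tickers, the same call would be `lead_correlation(price_history['FSTA'], price_history['IGE'], 22)`.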

Note that the prices in these charts are winsorized and standardized; even so, the effects of the COVID-19 drop are notably large.


At this point we could take historical future correlations such as this one and develop an options-based trading model that, by looking at implied volatilities and historical future correlations, could potentially discover mispriced assets. Before forking the analysis in that direction, it is worth finalizing it in the next publication: measuring the true predictive power of these pairs and checking how that predictive or inference power relates to the associated correlation levels.


Remember that information in ostirion.net does not constitute financial advice; we do not hold positions in any of the companies or assets that we mention in our posts at the time of posting. If you need algorithmic model development, deployment, verification or validation, do not hesitate to contact us. We will also be glad to help you with your predictive machine learning or artificial intelligence challenges.

