Fractionally Differentiated Price as Feature for Machine Learning Models

We discussed the memory content and the statistical properties of fractionally differentiated (fracdiff for shorts) time series in this previous post. In this publication, we will compare the performance of the fracdiff time series as a future predictor compared to baseline predictions with undifferentiated price series or d=1 differentiated series.

We will use the premise from this publication, that is, past behavior of the SPY index can predict the value of the VIX indicator. We will also assume that we can obtain predictions from regression. We will employ commonly used methods to generate a simple "up" or "down" prediction for several days into the future.

Our first step is to obtain the required price and VIX value data using Quantconnect add methods:

qb = QuantBook()
spy = qb.AddEquity('SPY').Symbol
vix = qb.AddData(CBOE, 'VIX').Symbol
symbol_list = [spy, vix]
#Dates for training the model:
start_date, end_date = datetime(2005, 1, 1), datetime(2015, 1, 1)
history = qb.History(qb.Securities.Keys, start_date, end_date, Resolution.Daily)

The resulting dataframe has to be reindexed and pivoted; we are going to use only the "close" price information this time, we ignore volume and bid-ask data that we could include in a more complex model:

reindex_history = history.reset_index()
reindex_history['time'] = reindex_history['time'].dt.date
close_prices = reindex_history.pivot(index = 'time', columns ='symbol', values = 'close').dropna()

From our previous publication, we know that the price of SPY is non-stationary during this period. In contrast, we know that VIX closing value for this period is stationary. In any case, we will check both for their statistical properties in a fracdiff range of 0 (original time series) to 1 (day to day difference). We are using MLFinlab library again. MLFinlab creators have changed their approach to the development of the library; we still think the library is excellent and useful, and we maintain patronage within our (modest) possibilities:

symbol_list = [spy, vix]
fracdiff_series_dictionary = {}
fracdiff_series_adf_tests = {}
fracdiff_values = np.linspace(0,1,11)
for frac in fracdiff_values:
    for symbol in symbol_list:
        frac = round(frac,1)
        dic_index = (symbol, frac)
        fracdiff_series_dictionary[dic_index] = frac_diff_ffd(close_prices[[symbol]], frac, thresh=0.0001)
        fracdiff_series_dictionary[dic_index].dropna(inplace=True)
        fracdiff_series_adf_tests[dic_index] = adfuller(fracdiff_series_dictionary[dic_index])

We keep only those fracdiff series that are stationary. These are committed to dictionaries in case we want, in the future, to use additional symbols or data or inspect them.

stationary_fracdiffs = [key for key in fracdiff_series_adf_tests.keys() if fracdiff_series_adf_tests[key][1] < 0.05 ]
first_stationary_fracdiff = {}
diff_series = {}
for symbol, d in stationary_fracdiffs:
    ds = [kvp for kvp in stationary_fracdiffs if kvp[0]==symbol]
    first_stationary_fracdiff[symbol] = sorted(ds , key = lambda x: x[1], reverse=False)[0][1]
    diff_series[symbol] = fracdiff_series_dictionary[(symbol,first_stationary_fracdiff[symbol])]
    diff_series[symbol].columns =['diff_close_'+str(symbol)]
    
for symbol in symbol_list:    
    print("The lowest stationary fractionally differentiated series for ", symbol," is for d=", first_stationary_fracdiff[symbol],".")

The lowest stationary fractionally differentiated series for  SPY  is for d= 0.5 .
The lowest stationary fractionally differentiated series for  VIX.CBOE  is for d= 0.0 .

We check that for this period, SPY differentiated by 0.5 yields a statistically stationary series. VIX index value is stationary for d = 0, no differentiation needed. Now VIX time series and SPY differentiated by 0.5 should exhibit well-behaving statistical properties that will permit the use of machine learning tools to generate predictions. We will create the prediction target as "days into the future" movement as up (True) or down (False). In this manner, our machine learning model just (as if it were easy) has to learn to classify the prediction input data into one of two classes:

prediction_target_days = [1,3,5,7,13,23]
predicted_symbol = vix
predicting_symbol = spy
data = close_prices
data['SPY_1D_Return'] = fracdiff_series_dictionary[(spy,1)]
data['VIX_1D_Return'] = fracdiff_series_dictionary[(vix,1)]
for symbol in symbol_list:
    data = data.merge(diff_series[symbol],left_index=True, right_index=True)
    for future in prediction_target_days:
        data[('fut_dir_'+str(future)+'_'+str(symbol))] = data[[symbol]].shift(-future) > data[[symbol]]
data.dropna(inplace=True)

This data is split into a training set and a testing set. We are using a small testing set in this case as we will use cross-validation to train our prediction model and eventually (next publication) run a trading backtest that will serve as the effective validation of the model:

# Get sizes of each of the datasets
test_size = 0.1 
num_test = int(test_size*len(data))
num_train = len(data)- num_test
print("Training data points: %d " %num_train)
print("Test data points: %d" % num_test)
# Split into train, cv, and test
train = data[:num_train]
test = data[num_train:]

print("Training DataFrame Shape: ",train.shape)
print("Testing DataFrame Shape: ",test.shape)

Training data points: 2086 
Test data points: 231
Training DataFrame Shape:  (2086, 17)
Testing DataFrame Shape:  (231, 17)

We can now scale testing and training data. Remember to fit scalers to training data first and use that fit to transform testing data; otherwise, data leaks can occur. We will use a minimum-maximum scaler and a standard scaler in sequence. The function becomes a bit monstrous as we are trying to keep the scaling transform data for the future in the form of a dictionary that we can eventually return when we needed.

from sklearn.preprocessing import StandardScaler, MinMaxScaler

def scale_data(selected_features, selected_target, predict_window, train, test):
    min_max_scaler_parameters = {}
    standard_scaler_parameters = {}
    scaled_train_features = pd.DataFrame()
    scaled_test_features = pd.DataFrame()
    for feature in selected_features:
        min_max_scaler = MinMaxScaler(feature_range=(-1,1))
        min_max_scaler_parameters[feature] = min_max_scaler.fit(np.array(train[feature]).reshape(-1,1))
        scaled_train_features[str(feature)+"_scaled"] = min_max_scaler.transform(np.array(train[feature]).reshape(-1,1)).flatten()
        
        standard_scaler = StandardScaler()
        standard_scaler_parameters[feature] = standard_scaler.fit(np.array(scaled_train_features[str(feature)+"_scaled"]).reshape(-1,1))
        scaled_train_features[str(feature)+"_scaled"] = standard_scaler.transform(np.array(scaled_train_features[str(feature)+"_scaled"]).reshape(-1,1)).flatten()
        
        if test_size == 0.0: continue
        scaled_test_features[str(feature)+"_scaled"] = min_max_scaler.transform(np.array(test[feature]).reshape(-1,1)).flatten()
        scaled_test_features[str(feature)+"_scaled"] = standard_scaler.transform(np.array(scaled_test_features[str(feature)+"_scaled"]).reshape(-1,1)).flatten()
    
    X_train = np.array(scaled_train_features)
    X_test = np.array(scaled_test_features)
    y_train = train['fut_dir_'+str(predict_window)+'_'+str(selected_target)]
    y_test = test['fut_dir_'+str(predict_window)+'_'+str(selected_target)]
        
    return X_train, X_test, y_train, y_test

We now have entries for each day with the 0.5 fradiff of the closing SPY price, the daily change in SPY price, and the current value of VIX. We can use this data to predict the future direction, and to do so; we will feed the data points sequentially into our machine learning model in windows of several days. Our training data will include a look-back time series of our prediction values, containing as many previous' days data as we wish. We can flexibly generate prediction inputs, targets, and prediction futures for deeper analysis:

def input_features(data, lookback_window, predict_window):
    nb_samples = len(data) - lookback_window - predict_window
    input_list = [np.expand_dims(data[i:lookback_window+i,:], axis=0) for i in range(nb_samples)] #here nb_samples comes in handy
    input_mat = np.concatenate(input_list, axis=0) 
    return input_mat

def output_targets(data, lookback_window, predict_window):
    nb_samples = len(data) - lookback_window - predict_window
    input_list = [np.expand_dims(data[i:lookback_window+i][-1], axis=0) for i in range(nb_samples)] #here nb_samples comes in handy
    input_mat = np.concatenate(input_list, axis=0) 
    return input_mat

def generate_windows(X_train, X_test, y_train, y_test, lookback_window, predict_window, predict_symbol):
    #format the features into the batch size
    X_train_windows = input_features(X_train, lookback_window, predict_window).reshape(-1,lookback_window*len(selected_features))
    X_test_windows = input_features(X_test, lookback_window, predict_window).reshape(-1,lookback_window*len(selected_features))
    y_train = np.array(train['fut_dir_'+str(predict_window)+"_"+str(predict_symbol)])
    y_test = np.array(test['fut_dir_'+str(predict_window)+"_"+str(predict_symbol)])
    y_train_windows = output_targets(y_train, lookback_window, predict_window).astype(int)
    y_test_windows = output_targets(y_test, lookback_window, predict_window).astype(int)
    return X_train_windows, X_test_windows, y_train_windows, y_test_windows

Notice that there is a double shape change in the definition of the look-back matrices. This look-back window generator is also used to feed another LSTM model that requires a three-dimensional array; it is a valid LSTM input with minor changes. With the data in place, let's put the differentiated features to the test with an array of spot models to gain a quick indication for which classification model (out of a subsample of all sklearn classifiers) may have better initial performance:

from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
from sklearn.model_selection import cross_val_score

def spot_models(X_train_matrix, y_train_matrix):
    models = []
    models.append(('Logistic Regression', LogisticRegression(solver = 'liblinear')))
    models.append(('K-Nearest Neighbours', KNeighborsClassifier())) 
    models.append(('Random Forest', RandomForestClassifier(n_estimators = 10))) 
    models.append(('Support Vector Classifier', svm.SVC(gamma='auto')))
    models.append(('Multi-layer Perceptron', MLPClassifier(max_iter = 1000, solver = 'sgd', learning_rate='adaptive', hidden_layer_sizes=(3))))    
    results = []
    names = []
    tscv = TimeSeriesSplit(n_splits=10)
    for name, model in models:
            cv_results = cross_val_score(model, X_train_matrix, y_train_matrix, cv=tscv, scoring='roc_auc')
        results.append(cv_results)
        names.append(name)
        print('%s: Mean Acc.: %.2f std: %.2f - min: %.2f - max: %.2f' % (name, cv_results.mean(), cv_results.std(),cv_results.min(), cv_results.max()))
    # Compare Algorithms
    plt.boxplot(results, labels=names)
    plt.title('Algorithm Accuracy Comparison')
    plt.show()

A time-series cross-validation model is used, as it can better describe the chronological phenomenon of price change.

We will fit our model in sequence to a logistic regression, a multi-layered perceptron, a K-nearest neighbors classifier, a random forest classifier, and a support vector machine classifier. We will not worry ourselves with the detailed definition of these models; these are to gain a high-level view into which approach could be better for our test. These models are then fit and scored will all the available input data we have for prediction; stationary fracdiff of SPY prices, current SPY price, current VIX value, and the 1-day returns of both SPY and VIX:

selected_features = ['diff_close_SPY', vix, 'SPY_1D_Return', spy, 'VIX_1D_Return']
lookback_window = 22
predict_window = 5
selected_target = vix
X_train, X_test, y_train, y_test = scale_data(selected_features, selected_target, predict_window, train, test)
X_train_windows, X_test_windows, y_train_windows, y_test_windows = generate_windows(X_train, X_test, y_train, y_test, lookback_window, predict_window, predict_symbol)

spot_models(X_train_windows, y_train_windows)

The accuracy statistics for each of the models are:

Logistic Regression: Mean Acc.: 0.56, std: 0.07 - min: 0.47 - max: 0.68 K-Nearest Neighbours: Mean Acc.: 0.52, std: 0.04 - min: 0.46 - max: 0.62 Random Forest: Mean Acc.: 0.55, std: 0.07 - min: 0.42 - max: 0.67 Support Vector Classifier: Mean Acc.: 0.56, std: 0.06 - min: 0.46 - max: 0.70 Multi-layer Perceptron: Mean Acc.: 0.56, std: 0.07 - min: 0.45 - max: 0.71

In the form of a boxplot, that could result much more useful:

Box plot for machine learning models spot check.

For very similar mean accuracies of 55%, the support vector classifier model offers the greatest potential among the considered classifiers. We will select the SVC model as our baseline and try to improve the performance using a parameter grid search, basically brute forcing our way into the parameters without deeper investigation:

from sklearn.model_selection import GridSearchCV
model = svm.SVC()
param_search = { 
                'kernel': ['linear', 'rbf', 'sigmoid'],
                'gamma': [1e-2, 1e-3],
                'class_weight' : ['balanced'],
                'C': [1, 10, 100],
                'decision_function_shape': ['ovo', 'ovr']
                }    
tscv = TimeSeriesSplit(n_splits=10)
gsearch = GridSearchCV(estimator=model, cv=tscv, param_grid=param_search, scoring = 'roc_auc', verbose=2)
gsearch.fit(X_train_windows, y_train_windows)
best_score = gsearch.best_score_
best_model = gsearch.best_estimator_

Verbose option 2 in the sklearn parameter grid search will produce a good amount of data and indicate the model training times for each parameter calculation if we want to go deeper into parameter selection in a time-efficient manner. This is the partial output of the grid search log:

Fitting 10 folds for each of 36 candidates, totalling 360 fits
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear 
[CV]  C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear, total=   0.0s
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear 
[CV]  C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear, total=   0.0s
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear 
[CV]  C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear, total=   0.1s
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear

After finding the best possible model from the parameter search grid it is time to test the model with the validation data we have kept separate so far:

print('Best Model Score:', best_score.round(2))
print('Best Model Parameters:', best_model)
validation_score = best_model.score(X_test_windows, y_test_windows).round(2)
print('Best Model Validation Score:', validation_score)

Best Model Score: 0.58 Best Model Parameters: SVC(C=1, cache_size=200, class_weight='balanced', coef0=0.0, decision_function_shape='ovo', degree=3, gamma=0.001, kernel='rbf', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False) Best Model Validation Score: 0.55

So we have been able to classify correctly 55% of the test data points. The test data points are 104 True and 100 False, so this is slightly above a naive classifier. Let's finally investigate if our fracdiff feature was important in obtaining this results by removing it from the training data and repeating the whole process:

Best Model Score: 0.58 Best Model Parameters: SVC(C=1, cache_size=200, class_weight='balanced', coef0=0.0, decision_function_shape='ovo', degree=3, gamma=0.001, kernel='sigmoid', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False) Best Model Validation Score: 0.49

The fracdiff feature is definitively contributing positively to the score of the model. The favored kernel without the fracdiff feature is the sigmoid kernel instead of the RBF kernel, indicating that the fracdiff feature could be carrying most of the information in the previous model following a gaussian distribution that is lost without it.

It seems that using a fractional differentiation approach to obtain a feature we previously suspected to have predictive power improves the results of our prediction. In our next post, we will implement the best model from this research into a validation backtest and evaluate what is the profitability of the predictive power of the model.

Remember that publications from Ostirion.net are not financial advice. Ostirion.net does not hold any positions on any of the mentioned instruments at the time of publication. If you need further information, asset management support, automated trading strategy development, or tactical strategy deployment, you can contact us.

OSTIRION

Fractionally Differentiated Price as Feature for Machine Learning Models

Recent Posts