One of the challenges of predicting stock market prices with machine learning is the underlying assumption that the data used to train the model is stationary. Machine learning algorithms have to predict the behaviour of the market from past examples, and the past may exhibit very different characteristics from the present and the future. As an example, this is the daily closing price plot for SPY (the ETF that tracks the S&P 500 index) since 2000:

At the beginning of the series the price is around 100 US dollars, and it trends up over the whole period into the 200 and 300 ranges. There are also several sharp drops during the period that go below the 100 mark. The statistical properties of this time series clearly change over time. If we apply the Augmented Dickey-Fuller test (__ADF Test__) to this series, the p-value is 0.999 and the ADF statistic is 2.5, so the test cannot reject a unit root and the series is clearly non-stationary. The mean and variance of the price change greatly during this period, so many statistical inference tools will fail when applied to more recent periods if we use the past as a predictor. What is generally accepted as a stationary series is the day-to-day change in the price of stocks. For example, taking the first difference of the price series we obtain the plot for the daily change:
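
To make the comparison between price levels and daily changes concrete, here is a minimal sketch of a unit-root statistic in plain Python. It is a simplification: it computes the plain Dickey-Fuller statistic (constant only, no lagged difference terms), not the augmented test quoted in this post, and it runs on a synthetic random walk, not on SPY data. The function name `dickey_fuller_stat` is made up for this illustration. In practice a library routine such as `adfuller` from statsmodels would normally be used.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of rho in the regression  dy_t = alpha + rho * y_{t-1} + e_t.

    Plain (non-augmented) Dickey-Fuller statistic: no lagged difference
    terms are included, unlike a full ADF test.
    """
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(dy)
    mean_x = sum(y_lag) / n
    mean_y = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in y_lag)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(y_lag, dy))
    rho = sxy / sxx                      # OLS slope
    alpha = mean_y - rho * mean_x        # OLS intercept
    rss = sum((y - alpha - rho * x) ** 2 for x, y in zip(y_lag, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return rho / std_err


# Synthetic stand-in for a price series (assumption: NOT real SPY data).
rng = random.Random(0)
prices = [100.0]
for _ in range(2000):
    prices.append(prices[-1] + 0.05 + rng.gauss(0.0, 1.0))

changes = [b - a for a, b in zip(prices[:-1], prices[1:])]

stat_prices = dickey_fuller_stat(prices)
stat_changes = dickey_fuller_stat(changes)

# The approximate 5% critical value for the constant-only case is -2.86;
# a statistic below it rejects the unit-root (non-stationary) hypothesis.
print(f"levels:  {stat_prices:.2f}")
print(f"changes: {stat_changes:.2f}")
```

The statistic for the changes should come out far more negative than the one for the levels, which is the same qualitative pattern as the ADF figures reported in this post.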

The ADF statistic is -17.18 and the p-value is very low. This indicates that the series of daily price changes is stationary, with a roughly constant mean and variance over the 2000 to 2020 period: the mean is $0.04 and the variance is 1.88 (in dollars squared). By looking at this price difference we clearly obtain a stationary series, but it is very difficult to see what the general trend in the market was for any given time period. The past is deleted: we can only reconstruct the **total change** over the most recent past, and we cannot reconstruct the **price** levels from these changes alone. If we tried to reconstruct the prices out of just the changes we would obtain an incorrect price profile:
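
One way to read the reconstruction problem is shown in the toy example below. The prices are hypothetical values chosen for the illustration. Summing the daily changes recovers how far the price has moved, but not the level it started from, so the levels can only be rebuilt if the first price is kept as an anchor.

```python
from itertools import accumulate

prices = [100.0, 101.5, 99.0, 102.0, 104.5]   # hypothetical closing prices
changes = [b - a for a, b in zip(prices[:-1], prices[1:])]

# Cumulative sum of the changes: total change so far, but the level is lost.
from_changes_only = list(accumulate(changes))

# Adding back the first price restores the true levels.
with_anchor = [prices[0] + c for c in from_changes_only]

print(from_changes_only)  # [1.5, -1.0, 2.0, 4.5]
print(with_anchor)        # [101.5, 99.0, 102.0, 104.5]
```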

According to Marcos Lopez de Prado in __Advances in Financial Machine Learning__, and as also explained in this __conference video__, the balance between stationarity and series memory can be maintained through fractional differentiation. We can investigate the characteristics of fractionally differentiated stock price time series using the __MLFinlab package__. This package implements many useful financial machine learning functions and is currently completely open source; please check out this __patreon from the creators__ to help keep the package open source. Plotting the fractional differences gives the following set of curves:
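
For readers who want to see the mechanics, the sketch below computes fractional differences by hand. It is not the MLFinlab API: the function names `frac_diff_weights` and `frac_diff` are made up for this illustration, and the fixed window length is an assumption. The weights follow the binomial expansion of (1 - B)^d, where B is the backshift operator, using the recursion w_0 = 1, w_k = -w_{k-1} (d - k + 1) / k.

```python
def frac_diff_weights(d, size):
    """First `size` weights of (1 - B)^d, where B is the backshift operator."""
    weights = [1.0]
    for k in range(1, size):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series, d, window):
    """Fixed-window fractional difference; output starts at index window - 1."""
    w = frac_diff_weights(d, window)
    out = []
    for t in range(window - 1, len(series)):
        # Weighted sum of the current value and the window - 1 values before it.
        out.append(sum(w[k] * series[t - k] for k in range(window)))
    return out


print(frac_diff_weights(0.5, 4))   # [1.0, -0.5, -0.125, -0.0625]
print(frac_diff_weights(1.0, 4))   # [1.0, -1.0, 0.0, 0.0]
```

With d = 1 the weights reduce to an ordinary first difference, and with d = 0 the series is returned unchanged. For orders in between, the weights decay slowly, so each output value still depends on many past prices.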

We can see how the profiles flatten as the differentiation order goes from the non-differentiated series at the top to the first difference at the bottom. It is of interest to see what happens at differentiation orders just below 1:

As the order drops below 1, the fractionally differentiated series start to exhibit the general trend of SPY; the trend becomes apparent to the eye at a differentiation order of 0.8. At the other end of the range of fractional orders we can see this:

The fractionally differentiated prices in the 0.1 to 0.5 range may exhibit stationarity and at the same time allow us to reconstruct the original time series more accurately. That is, we can retain the memory of the series while maintaining a constant mean and variance, which allows for more accurate statistical inference in machine learning methods. The ADF test results for the fractionally differentiated series are these:
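
A rough way to quantify how much memory each differentiation order retains is to measure the correlation between the fractionally differentiated series and the original prices. The sketch below does this under several assumptions: the data is a synthetic random walk with drift, not SPY; `frac_diff` and `pearson` are illustrative helpers, not library functions; and the window of 100 weights is an arbitrary choice.

```python
import math
import random


def frac_diff(series, d, window):
    """Fixed-window fractional difference of order d (illustrative helper)."""
    w = [1.0]
    for k in range(1, window):
        w.append(-w[-1] * (d - k + 1) / k)
    return [sum(w[k] * series[t - k] for k in range(window))
            for t in range(window - 1, len(series))]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic random walk with drift (assumption: a stand-in, NOT SPY data).
rng = random.Random(1)
prices = [100.0]
for _ in range(2000):
    prices.append(prices[-1] + 0.05 + rng.gauss(0.0, 1.0))

WINDOW = 100  # hypothetical truncation length for the weights
memory = {}
for d in (0.0, 0.3, 0.5, 0.8, 1.0):
    diffed = frac_diff(prices, d, WINDOW)
    # Align with the original prices: the first output is at index WINDOW - 1.
    memory[d] = pearson(diffed, prices[WINDOW - 1:])
    print(f"d={d:.1f}  correlation with original prices: {memory[d]:.3f}")
```

Low orders should stay strongly correlated with the price levels, while d = 1 should retain very little of them. Pairing this correlation with a stationarity statistic for each order is one way to choose the smallest order that is still acceptably stationary.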

The series in question is relatively long, and in this case we need a fractional differentiation order above 0.8 to obtain a statistically acceptable stationary series. For shorter price series, lower differentiation orders may yield stationary series. In most cases we want to use more recent price samples in our machine learning models, the premise being that trading and price discovery methods have changed over 20 years and there is no benefit in using older market data to make predictions.

In the future we will use fractional differentiation to obtain variables that may predict market prices, exhibit well-behaved statistical properties and at the same time retain a certain memory. For the time being, remember that no information in __ostirion.net__ constitutes financial advice; we do not hold positions in any of the companies or assets that we mention in our posts at the time of posting. If you are in need of asset management model development, deployment, verification or validation, do not hesitate to __contact us__. We can also automate any successful trading strategy you may have with total confidence, as we produce individualized confidentiality and non-compete agreements.
