CUSUM Event Identification: Not Just for Sampling

H-Barrio
Apr 6, 2021

Cumulative sum (CUSUM) filtering is a signal sampling method presented in Advances in Financial Machine Learning (Marcos Lopez de Prado, 2018, 1st ed., Wiley). CUSUM filtering is used to monitor regime changes in sequences and has been extensively applied in the biomedical and manufacturing fields. Our implementation of CUSUM will not necessarily act as a filter; we will use the method to obtain labels for observations, which can then be used as class features, as classification labels, or for filtering and sampling as presented in the text.

The CUSUM filter algorithm accumulates the change of a variable over the observations and assigns a value of 1 or -1 to those observations at which the accumulated change exceeds a given limit, upward or downward; the corresponding accumulator is then reset to zero, and all other observations are labeled 0. The limit can be chosen arbitrarily or, in the case of price time series, linked to a measure of recent volatility. In this implementation, we enable the generation of CUSUM labels from limits calculated with the exponentially weighted moving standard deviation of returns. In this way, we may better capture structural changes in the market than with fixed expected return or loss limits.
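The accumulate-and-reset logic described above can be sketched in plain Python. This is only a minimal illustration with a fixed limit, not the function used in this post; the helper name `cusum_labels` and the sample returns are made up for the example.

```python
from typing import List


def cusum_labels(returns: List[float], h: float) -> List[int]:
    """Label each return 1, -1 or 0 with a symmetric CUSUM of limit h.

    Hypothetical helper, for illustration only.
    """
    s_pos, s_neg = 0.0, 0.0
    labels = []
    for r in returns:
        # Accumulate upward and downward moves separately,
        # never letting either accumulator cross zero.
        s_pos = max(0.0, s_pos + r)
        s_neg = min(0.0, s_neg + r)
        if s_neg < -h:
            s_neg = 0.0  # reset the downward accumulator after an event
            labels.append(-1)
        elif s_pos > h:
            s_pos = 0.0  # reset the upward accumulator after an event
            labels.append(1)
        else:
            labels.append(0)
    return labels


# Made-up daily returns and a 5% limit.
sample_returns = [0.02, 0.02, 0.02, -0.03, -0.04, 0.01]
labels = cusum_labels(sample_returns, h=0.05)
print(labels)
```

Three consecutive 2% gains push the upward accumulator past 5% and trigger a 1; the following 3% and 4% losses together push the downward accumulator past -5% and trigger a -1.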

Our CUSUM events filtering function will be this:

import pandas as pd


def cusum_events(df: pd.DataFrame,
                 h: float = None,
                 span: int = 100,
                 devs: float = 2.5) -> pd.DataFrame:
    '''
    Compute CUSUM events for a given price series.

    Args:
        df (pd.DataFrame): Dataframe with price time series
            in a single column.
        h (float): Arbitrary cumulative returns value limit to trigger
            the CUSUM filter. The filter is symmetric. If h is None,
            the exponentially weighted standard deviation will be used.
        span (int): Span for exponential weighting of standard deviation.
        devs (float): Standard deviations to compute variable
            trigger limits if h is not defined.

    Returns:
        pd.DataFrame: Dataframe containing the event labels (1, -1 or 0)
            in a single 'CUSUM_Event' column.
    '''
    # Events e, initialized to 0 (no event):
    e = pd.DataFrame(0, index=df.index,
                     columns=['CUSUM_Event'])
    s_pos = 0
    s_neg = 0
    r = df.pct_change()

    for idx in r.index:
        # Limit: fixed h, or devs times the EWM standard deviation up to idx.
        if h is None:
            h_ = r[:idx].ewm(span=span).std().values[-1][0]*devs
        else:
            h_ = h
        # Accumulate upward and downward returns separately.
        s_pos = max(0, s_pos+r.loc[idx].values)
        s_neg = min(0, s_neg+r.loc[idx].values)
        if s_neg < -h_:
            s_neg = 0
            e.loc[idx] = -1
        elif s_pos > h_:
            s_pos = 0
            e.loc[idx] = 1
    return e

This function takes a single-column price dataframe and, optionally, a fixed cumulative returns limit h that triggers the events. If no h limit is defined, span and devs control the exponentially weighted window and the number of standard deviations used to set the limit.
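To make the role of span and devs more concrete, here is a small sketch of how a volatility-based limit could be computed for a whole series in one pass. The prices are made up, and the short span is chosen only so that the toy series produces values quickly. Because the exponentially weighted statistics at each date use only data up to that date, this should give the same limits as recomputing them inside the loop, but treat that as an assumption to check on your own data.

```python
import pandas as pd

# Made-up prices; in the post these would be the SPY or TLT closes.
prices = pd.Series([100.0, 101.0, 99.5, 102.0, 101.0, 103.5, 102.0])
returns = prices.pct_change()

span = 3    # short span, for illustration only
devs = 2.5  # number of standard deviations

# Exponentially weighted moving standard deviation of returns,
# scaled by devs to obtain one trigger limit per observation.
limits = returns.ewm(span=span).std() * devs
print(limits)
```

The first entries are NaN because a standard deviation cannot be estimated from fewer than two returns; comparisons against NaN are False, so no event can be triggered on those dates.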

Let us get close prices for two widely traded ETFs, SPY and TLT, to test the function:

self = QuantBook()
spy = self.AddEquity('SPY').Symbol
bonds = self.AddEquity('TLT').Symbol
start = datetime(2010,1,1)
end = datetime(2020,1,1)
spy_close = self.History(spy, start, end, Resolution.Daily).unstack(level=0)['close']
bonds_close = self.History(bonds, start, end, Resolution.Daily).unstack(level=0)['close']

With these close prices for SPY and TLT, we can obtain the event labels at various multiples of the 100-day exponentially weighted moving standard deviation:

c_filter = cusum_events(spy_close, devs=5)
plot_texts = {"y_label": 'CUSUM Event', "title" :'CUSUM Events'}
rh.plot_df(c_filter, **plot_texts)

We can now count how many times we would have sampled the SPY price series or, equivalently, what distribution of labels we are generating at 5 standard deviations, just as an example:

c_filter.squeeze().value_counts()

Yielding the following distribution of events:

We have a full dataframe with the labels, not only the filtered points in time, so these values can be used for any operation we need.
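As an illustration of that flexibility, the sketch below uses a label dataframe both to sample prices and as an extra column. The prices and labels are made up and only mimic the shape of the output of cusum_events.

```python
import pandas as pd

# Made-up prices and labels shaped like the output of cusum_events.
idx = pd.date_range("2020-01-01", periods=6, freq="D")
prices = pd.DataFrame({"close": [100.0, 102.0, 104.0, 106.5, 103.0, 99.0]},
                      index=idx)
events = pd.DataFrame({"CUSUM_Event": [0, 0, 1, 0, 0, -1]}, index=idx)

# Sampling: keep only the dates on which an event was triggered.
event_dates = events.index[events["CUSUM_Event"] != 0]
sampled = prices.loc[event_dates]

# Feature or label: attach the event column to the price data.
labeled = prices.join(events)
print(sampled)
print(labeled)
```

Here sampled holds only the two event dates, while labeled keeps every date with its label, ready to be used as a feature or as a classification target.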

Now the same approach for TLT, using a fixed event limit of a 5% change:

c_filter = cusum_events(bonds_close, h=0.05)
rh.plot_df(c_filter, **plot_texts)

We now have another flexible tool that can be used for subsampling, labeling, or feature generation. We will find a use for it in future spot machine learning models that will, hopefully, learn from or be able to predict these events. The function is added to our fractio repository here.

Information in ostirion.net does not constitute financial advice; we do not hold positions in any of the companies or assets that we mention in our posts at the time of posting. If you require quantitative model development, deployment, verification, or validation, do not hesitate to contact us. We will also be glad to help you with your machine learning or artificial intelligence challenges when applied to asset management, trading, or risk evaluations.
