Cats or Dogs in the Stock Market (II)
Now that we have a method to extract the images, we are going to fetch the data using the QuantConnect research environment.
# QuantBook Analysis Tool
# For more information see [https://www.quantconnect.com/docs/research/overview]
qb = QuantBook()
#Use our preferred color map throughout the analysis
c_map = "Spectral"
#For the time being we will use a single company, 3M.
symbols = ["MMM"]
#Generate the security objects needed by the QuantConnect framework:
for symbol in symbols:
    qb.AddEquity(symbol)
#We will train the model on n_years of data:
n_years = 5
#We use roughly 220 trading days per year (the actual NYSE figure is closer to 252).
#The number only needs to be approximate, as we can adjust the dates to full days later.
#A full day, not preceding a holiday, has 390 trading minutes, from 9:30 to 16:00 on the NYSE.
trading_days_per_year = 220
trading_minutes_per_day = 390
#We will request these days in minutes:
minutes_to_request = int(n_years*trading_days_per_year*trading_minutes_per_day)
days_to_request = int(minutes_to_request / trading_minutes_per_day)
history = qb.History(qb.Securities.Keys, minutes_to_request, Resolution.Minute)
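As a quick sanity check, the size of the request can be reproduced in plain Python, outside the QuantConnect environment. The constants are the ones defined above, and nothing here calls the QuantConnect API:

```python
# Same constants as in the snippet above
n_years = 5
trading_days_per_year = 220
trading_minutes_per_day = 390

# Total number of minute bars and the matching number of daily bars
minutes_to_request = int(n_years * trading_days_per_year * trading_minutes_per_day)
days_to_request = int(minutes_to_request / trading_minutes_per_day)

print(minutes_to_request)  # 429000
print(days_to_request)     # 1100
```

So for a single symbol we are asking for about 429,000 minute bars, which is worth keeping in mind if more symbols are added later.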
We are also going to request the same data at a different resolution, daily, so we can obtain the closing price for each day.
#Obtain daily data for the symbols:
daily_history = qb.History(qb.Securities.Keys, int(days_to_request), Resolution.Daily)
The QuantConnect framework will generate the following dataframe from this call:
One important thing to note is that QuantConnect stamps each day's closing information with 00:00:00 of the next day. To harmonize the dates of the minute data and the daily data, we can either re-index the time or shift the values. In this case it is probably easier to re-index the time index:
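#Daily bars are reported at 00:00:00 of the next day, so the day has to be moved back one day to align with the minute data.
#This assumes "time" is a column; if it is an index level, call daily_history.reset_index() first.
from datetime import timedelta
daily_history["Day"] = daily_history['time'].map(lambda x: x.date()) - timedelta(days=1)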
daily_history.set_index(["Day","symbol"],inplace = True)
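To see what this re-indexing does, here is a minimal sketch on a small hand-made dataframe. The prices and dates are invented for illustration and the frame only mimics the column layout assumed above (`symbol`, `time`, `close`); it is not output from QuantConnect. The one-day shift is applied per element here, which has the same effect as the line above:

```python
from datetime import timedelta

import pandas as pd

# Hypothetical daily bars: the bar for the 2020-01-06 session is stamped
# 2020-01-07 00:00:00, mirroring the next-day stamping described above.
daily = pd.DataFrame({
    "symbol": ["MMM", "MMM"],
    "time": pd.to_datetime(["2020-01-07 00:00:00", "2020-01-08 00:00:00"]),
    "close": [100.0, 101.5],
})

# Move each stamp back one day so it names the session the bar describes
daily["Day"] = daily["time"].map(lambda x: x.date() - timedelta(days=1))
daily = daily.set_index(["Day", "symbol"])

# Collect the re-indexed session dates
days = daily.index.get_level_values("Day").tolist()
print(days)
```

After the shift, the first bar is keyed by 2020-01-06, the session it actually belongs to, so it lines up with the minute bars of that same date.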
Now we have two dataframes, one with daily information and one with minute-level information. At this point we can also drop all unneeded information from the daily dataframe, as we are only going to use the close values.
daily_close = daily_history.drop(["high","low","open","volume","time"], axis = 1)
At this point the dataframes are ready for extracting the features, which will be the images from the previous post in this series, and the targets, which will be formed by a categorical indicator that tells us how the price closed for the day, relative to the current price or relative to the opening price. We leave for the next installment of this series the check of what the model can predict with the highest accuracy.
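As a rough illustration of what such a target could look like, the sketch below labels each minute with 1 if the day closed above the price at that minute and 0 otherwise. This is only one possible encoding, and it is an assumption rather than the definition used later in the series. The column name `closed_higher` and all prices are hypothetical:

```python
from datetime import date

import pandas as pd

# Hypothetical minute-level prices for a single session
minute = pd.DataFrame(
    {"close": [100.0, 100.4, 99.8]},
    index=pd.to_datetime(
        ["2020-01-06 09:31", "2020-01-06 12:00", "2020-01-06 15:59"]
    ),
)

# Hypothetical daily close, keyed by session date
daily_close = {date(2020, 1, 6): 100.2}

# Look up the session close for every minute row
minute["session_close"] = [daily_close[ts.date()] for ts in minute.index]

# 1 if the day closed above the price at that minute, else 0
minute["closed_higher"] = (minute["session_close"] > minute["close"]).astype(int)

labels = minute["closed_higher"].tolist()
print(labels)  # [1, 0, 1]
```

A target relative to the opening price would follow the same pattern, comparing the session close with the day's open instead of the minute price.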