It is time to put our randomly generated Deep Neural Networks (DNNs) to the test. These networks, left to chance because we cannot decide, a priori, which model architecture would be best, were defined in our previous post here. The function we will use to produce our random models, again in condensed form, is this:
import tensorflow as tf
import keras
import random
from keras.models import Sequential
from keras.layers import Dense

def make_random_model(n_features, max_depth, max_width):
    N = n_features
    D = max_depth
    W = max_width
    # Collect the initializer and activation names exposed by the keras modules:
    I = keras.initializers.__dict__.items()
    A = keras.activations.__dict__.items()
    initializers = [i for i, cls in I if isinstance(cls, type)]
    activations = [a for a, cls in A if callable(cls) and a[0].islower()]
    # Drop module helpers that are not actual activations or initializers:
    remove = ['get', 'keras_export', 'get_globals', 'initializer']
    activations = [a for a in activations if 'serial' not in a]
    activations = [a for a in activations if a not in remove]
    initializers = [init for init in initializers if init[0].islower()]
    initializers = [init for init in initializers if init not in remove]
    model = Sequential()
    # The Input Layer:
    init = random.choice(initializers)
    act = random.choice(activations)
    w = random.randint(1, W)
    model.add(Dense(w, activation=act,
                    input_dim=N, kernel_initializer=init))
    # The Hidden Layers:
    d = random.randint(1, D)
    for i in range(d):
        init = random.choice(initializers)
        act = random.choice(activations)
        w = random.randint(1, W)
        model.add(Dense(w, activation=act, kernel_initializer=init))
    # The Output Layer:
    init = random.choice(initializers)
    model.add(Dense(1, activation='linear', kernel_initializer=init))
    return model
This function generates a completely random neural network of dense layers. We could also add other randomly selected layer types, but for a simple simulation this is more than enough randomness. The network is bounded by a maximum depth and width so that we do not end up with very lengthy training periods. If you have time, or an idle CPU or GPU, feel free to increase these limits as much as you want.
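As a quick sanity check, we can build one such model and inspect it; the eight input features used here are just an illustrative value matching the dataset we will load below:
model = make_random_model(8, 10, 10)   # 8 input features, depth and width capped at 10
model.summary()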
The data we will use to feed our randomly built neural networks is the California Housing Prices dataset, available as a sample dataset in Google Colab. This dataset is selected for its simplicity and serves to illustrate the performance of a neural network model built without any architectural design.
To load the dataset:
import pandas as pd
train_df = pd.read_csv('/content/sample_data/california_housing_train.csv')
test_df = pd.read_csv('/content/sample_data/california_housing_test.csv')
train_df
The shape of the training dataset is 17,000 rows by 9 columns: eight features plus the target.
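A quick way to verify this directly (the Colab sample test split should show 3,000 rows):
print(train_df.shape)   # (17000, 9)
print(test_df.shape)    # (3000, 9)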
We have the location of a given house, the size of the house, and some demographic data for the location. These values are associated with a house value that we will try to predict. We are not touching this data, and we are not going to analyze it. Instead, we will feed it to a cohort of random neural networks and see what prediction baseline we can achieve.
We can separate the features from the target, making the feature columns our input:
t_name = 'median_house_value'
headers = list(train_df.columns)
targets = train_df[t_name]
features = [c for c in headers if c != t_name]
inputs = train_df[features]
The construction and training of the model require a set of external libraries that we will import:
import tensorflow as tf
import keras
import random
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import KFold
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.preprocessing import MinMaxScaler as minimax
from sklearn.preprocessing import StandardScaler as standard
from keras import backend as K
Since we are predicting the house value via regression, we can conveniently define our loss as the root mean squared error, using the Keras backend calculations:
def rmse(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))
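A quick sanity check of the metric on made-up numbers: predictions that are off by 0.5 everywhere should return an RMSE of 0.5:
print(rmse(K.constant([1.0, 2.0]), K.constant([1.5, 2.5])))   # should evaluate to 0.5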
The training of the random models needs to be fast and data-comprehensive. We will train the random models for very short periods, with very little patience, and check which random models perform better. The best one we will keep for a more in-depth training session. We initialize our logging storage, set the number of initial trial folds we want to perform, 25 for a fast first pass, and the maximum size for both the width and the depth of the random neural networks. We have no "best model" yet, and we will keep our scores in a list:
logdir = '/content/log'
N_FOLDS = 25
M_SIZE = 10
kfold = KFold(n_splits=N_FOLDS, shuffle=True)
best_model = False
model_scores = []
In each fold of the loop, we will use common callback functions for the training and common functions for standardizing and scaling the data:
# Callbacks:
es = EarlyStopping(monitor='val_loss', patience=3,
                   restore_best_weights=True)
rop = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=1,
                        mode='auto', min_delta=0.0001,
                        cooldown=0, min_lr=0)
tb_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
mm_scaler = minimax()
std_scaler = standard()
We have an early stop with short patience and a learning rate reducer with even shorter patience. We also have a TensorBoard logger and both a min-max scaler and a standard scaler from sklearn. In each fold loop, we will create a random model, compile it, fit the scalers on the training fold, and use them to transform the validation fold, avoiding leakage. The targets are left unchanged:
fold_no = 1
EPOCHS = 5
for train, val in kfold.split(inputs, targets):
    # Create a random model:
    model = make_random_model(len(features), M_SIZE, M_SIZE)
    # Compile the model
    model.compile(loss=rmse, optimizer='adam', metrics=[rmse])
    # Normalize and Scale fold data:
    X_train = mm_scaler.fit_transform(inputs.iloc[train])
    X_train = std_scaler.fit_transform(X_train)
    y_train = targets.iloc[train]
    X_val = mm_scaler.transform(inputs.iloc[val])
    X_val = std_scaler.transform(X_val)
    y_val = targets.iloc[val]
Inside the loop, we print a message at the start of each fold, fit the model with our short-tempered callbacks, and evaluate the score. If the score is better than all previous scores (the error is smaller), we keep this model as our baseline.
    # Generate a print
    print('---------------------')
    print(f'Training for fold {fold_no} ...')
    # Fit data to model
    history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                        epochs=EPOCHS, verbose=0,
                        callbacks=[es, rop, tb_callback])
    # Generate generalization metrics
    scores = model.evaluate(X_val, y_val, verbose=0)
    print(f'Score for fold {fold_no}: {model.metrics_names[0]} of '
          f'{scores[0]:.2f}; {model.metrics_names[1]} of {scores[1]:.2f}$')
    score = scores[1]
    model_scores.append(score)
    if score <= min(model_scores):
        print('A better model, saving it!')
        best_model = model
    # Increase fold number
    fold_no = fold_no + 1
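Once the loop finishes, a short summary of the fold scores tells us which fold produced the baseline; a minimal sketch, assuming numpy is available:
import numpy as np
best_fold = int(np.argmin(model_scores)) + 1
print(f'Best fold: {best_fold}, validation RMSE: {min(model_scores):.2f}')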
Now, training will be hit and miss. We are not setting a random seed, so each run is different. One sample preliminary random training run looks like this:
In this case, we are lucky, or unlucky, as we hit the best model of the run in the first fold. The best validation RMSE we get is 60,876 USD. This model is now saved as the "best model," and we can proceed to train it further, this time in earnest, with more patience and more epochs. We go through the same procedure for the best model, continuing its training from its current weights for up to 100 epochs with a patience of 10 epochs:
# Modify the Early Stopping
es = EarlyStopping(monitor='loss', patience=10,
                   restore_best_weights=True)
rop = ReduceLROnPlateau(monitor='loss', factor=0.1, patience=1,
                        mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)
# Normalize and Scale data:
mm_scaler = minimax()
std_scaler = standard()
X_train = mm_scaler.fit_transform(train_df[features])
X_train = std_scaler.fit_transform(X_train)
y_train = train_df[t_name]
# Fit data to model
history = best_model.fit(X_train, y_train,
                         epochs=100, verbose=1,
                         callbacks=[es, rop, tb_callback])
This model took 23 epochs to stop; with a patience of 10, the best fit was reached at epoch 13, well before the 100-epoch maximum, for an in-sample RMSE of 61,615 USD:
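The epoch at which early stopping found its best point can be read back from the training history; a minimal sketch, assuming numpy is available:
import numpy as np
losses = history.history['loss']
best_epoch = int(np.argmin(losses)) + 1   # Keras logs epochs starting at 1
print(f'Best epoch: {best_epoch}, in-sample RMSE: {min(losses):.0f}')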
The final model is this:
best_model.summary()
There are 8 layers in total with 190 parameters, all randomly created. How well does the model predict the unseen test data? We can evaluate the model with the built-in TensorFlow methods, using our set-aside test data:
X_val = mm_scaler.transform(test_df[features])
X_val = std_scaler.transform(X_val)
y_val = test_df[t_name]
test_scores = best_model.evaluate(X_val, y_val, verbose=1)
The score is finally an RMSE of 61,761.30 USD. Is this score good? Not really; there are models on the internet that report RMSE values below 50,000 USD. Then why build these random neural networks? The answer: to quickly check the achievable performance values and promising architectures. We have not yet taken any time to analyze the data, purge it, or engineer any new features, and we already know that, more or less, we can achieve a 60K USD error, and we know the architecture that produces it. Very quickly. We can also let the loop generate more random models overnight, without supervision, if we can afford the CPU or GPU, and check in the morning for randomly made architectures that solve the problem. As an additional benefit, because we have not explored the data, we get a clear view of the achievable performance with the minimum of human-introduced bias through statistical discrimination. The model is random, the model is chaos, and the model is pure, even if it is not very good.
We could introduce more random variables into the model for more complex problems: dropout layers, convolution layers... the possibilities for randomly assembled neural networks are endless. They are computationally expensive and hard to explain, but they are also fast to generate and practically unsupervised, giving an excellent first view of deep learning capabilities for a given problem. The pretty diagram for the sequential model is this:
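The diagram itself can be reproduced with the Keras plotting utility; this assumes pydot and graphviz are available, as they are in Colab:
# Render the layer diagram of the best random model to a PNG file
keras.utils.plot_model(best_model, show_shapes=True, to_file='best_model.png')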
If you require quantitative model development, deployment, verification, or validation, do not hesitate to contact us. We will also be glad to help you with your machine learning or artificial intelligence challenges when applied to asset management, automation, or intelligence gathering from satellite, drone, or fixed-point imagery.
The notebook, in Google Colab, for this post is located here.