This post builds a plain vanilla LSTM and trains it to learn a simple pattern in a sequence.
The plain vanilla LSTM takes in a sequence [x1, x2, x3, ..., x10]
and learns that the output of the sequence is its second element, x2.
Data Preparation
import numpy as np
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import LSTM, Dense
import matplotlib.pyplot as plt

n_max = 5
n_steps = 10
n_samples = 1000
np.random.seed(1234)

# 1000 sequences of 10 steps, each step a digit in [0, n_max)
X = np.random.randint(0, n_max, n_samples * n_steps).reshape(n_samples, n_steps, 1)
X_cat = to_categorical(X, num_classes=n_max)  # one-hot encode: shape (1000, 10, 5)
Y = X_cat[:, 1, :]                            # target: second element of each sequence
The preceding code generates 1000 samples, each a 10-step sequence of digits between 0 and 4. The input is converted to one-hot vectors, and the target Y is the one-hot encoding of the second element of each sequence.
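To make the encoding concrete, here is a small numpy-only sketch of what `to_categorical` and the target slice do on a toy batch (shapes and variable names are illustrative, chosen to mirror the script above):

```python
import numpy as np

n_max = 5
# A toy batch: 2 samples, 4 time steps, 1 feature, digits in [0, n_max)
X_small = np.array([[[1], [3], [0], [2]],
                    [[4], [2], [2], [1]]])

# Equivalent of to_categorical(X_small, num_classes=n_max):
# the trailing length-1 axis is dropped and replaced by a one-hot axis
X_small_cat = np.eye(n_max)[X_small.squeeze(-1)]   # shape (2, 4, 5)

# The target is the one-hot vector at time step 1 (the second element)
Y_small = X_small_cat[:, 1, :]

print(X_small_cat.shape)           # (2, 4, 5)
print(np.argmax(Y_small, axis=1))  # [3 2] -- the second digit of each sequence
```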
Model Building
model = Sequential()
model.add(LSTM(25, input_shape=(n_steps, n_max)))
model.add(Dense(n_max, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 25) 3100
_________________________________________________________________
dense_1 (Dense) (None, 5) 130
=================================================================
Total params: 3,230
Trainable params: 3,230
Non-trainable params: 0
The number of parameters in the LSTM layer is 4*((5+1)*25 + 25*25) = 3,100: four gates, each with 5 input weights, 1 bias, and 25 recurrent weights per unit.
The number of parameters in the Dense layer is (25+1)*5 = 130.
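As a quick check of this arithmetic (a standalone sketch; the LSTM count is four gates, each with input weights, recurrent weights, and a bias per unit):

```python
units, input_dim = 25, 5  # LSTM units and one-hot input size from the model above

# LSTM: 4 gates x ((input weights + bias) per unit + recurrent weights)
lstm_params = 4 * ((input_dim + 1) * units + units * units)
# Dense: one weight per (input unit, output class) pair plus a bias per class
dense_params = (units + 1) * input_dim  # output size equals n_max = 5 here

print(lstm_params)   # 3100
print(dense_params)  # 130
```

These match the Param # column of `model.summary()` above.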
Model Training
n_epochs = 250
history = model.fit(X_cat, Y, validation_split=0.2, epochs=n_epochs)

# Generate fresh test data, separate from the training set
X = np.random.randint(0, n_max, n_samples * n_steps).reshape(n_samples, n_steps, 1)
X_cat = to_categorical(X, num_classes=n_max)
Y = X_cat[:, 1, :]
print("Accuracy :", model.evaluate(X_cat, Y))
print("The # of examples correctly classified is", np.sum(np.argmax(model.predict(X_cat), axis=1) == np.argmax(Y, axis=1)))
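The final line counts exact matches by comparing argmaxes of the predicted and true one-hot vectors. The same counting logic can be seen on dummy arrays (numpy only, no trained model needed; the values are made up for illustration):

```python
import numpy as np

# Pretend softmax outputs for 4 examples over 5 classes
preds = np.array([[0.1, 0.7, 0.1, 0.05, 0.05],
                  [0.2, 0.2, 0.5, 0.05, 0.05],
                  [0.6, 0.1, 0.1, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.1, 0.6]])
# True one-hot targets for classes 1, 2, 0, 3
targets = np.eye(5)[[1, 2, 0, 3]]

# Count examples where the predicted class equals the true class
correct = np.sum(np.argmax(preds, axis=1) == np.argmax(targets, axis=1))
print(correct)  # 3 -- the last prediction picks class 4, not 3
```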
Takeaways
- The input is a sequence of time steps
- The model is a classification model
- Five attractive properties of the vanilla LSTM:
- Sequence classification conditional on multiple distributed input time steps.
- Memory of precise input observations over thousands of time steps.
- Sequence prediction as a function of prior time steps.
- Robust to the insertion of random time steps on the input sequence.
- Robust to the placement of signal data on the input sequence.