This post creates a sequence-to-sequence LSTM that learns a simple pattern in the sequence.

The input to the LSTM is a sequence of characters that encodes an addition problem, for example " 2+10", "  9+7", "10+10", and the corresponding outputs are "12", "16", "20", etc.

There are several new concepts related to sequence-to-sequence modeling:

  • The input is a sequence of 5 characters, and each character is one-hot encoded. The allowable characters are 0,1,2,3,4,5,6,7,8,9,'+' and the space character, so each character becomes a one-hot vector of length 12 and each input sample has shape (5,12).
  • The input is passed into an encoder LSTM with 75 cells, which emits a single vector of length 75. Since the output sequence has length 2, a repeater needs to be put in place: a RepeatVector layer repeats this vector twice, producing a (2,75) tensor that is passed on to the decoder layer. RepeatVector acts as the bridge between the encoder and decoder LSTM (see the sketch after this list).
  • The decoder LSTM has 50 cells and returns sequences, hence it outputs a tensor with shape (2,50).
  • A TimeDistributed wrapper sends each time step of that output, one at a time, into the same Dense layer, giving the final (2,12) output.
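To make the role of RepeatVector concrete, here is a minimal sketch (my own addition, using plain NumPy rather than Keras) of what the bridge does: the encoder emits a single vector of length 75, and RepeatVector copies it once per output time step so the decoder receives a (2, 75) sequence.

import numpy as np

encoder_output = np.random.rand(75)                             # stand-in for the encoder's final hidden state
decoder_input = np.repeat(encoder_output[None, :], 2, axis=0)   # what RepeatVector(2) produces
print(decoder_input.shape)                                      # (2, 75)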

It is extremely important to understand the model structure so that you can correctly set up the input and output tensors of the model.

Data Preparation

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
from keras.layers import Dense

n_examples = 1000
n_numbers = 2
largest = 10

# generate random pairs of numbers and their sums
X = np.random.randint(1, largest+1, n_examples*n_numbers).reshape(n_examples, n_numbers)
Y = X.sum(axis=1)

# format inputs as right-justified 5-character strings, e.g. ' 2+10'
X = pd.DataFrame(X)
X.columns = ['a', 'b']
X = X.apply(lambda x: (str(x['a']) + "+" + str(x['b'])).rjust(5), axis=1)
X = X.values

# format targets as right-justified 2-character strings, e.g. '12'
Y = pd.DataFrame(Y)
Y = Y.iloc[:, 0].astype('str').apply(lambda x: x.rjust(2)).values
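A quick sanity check (not in the original post) confirms that the padding and labels line up as described:

print(X[:3])   # padded 5-character strings such as ' 2+10' or '  9+7'
print(Y[:3])   # padded 2-character sums such as '12' or ' 9'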

One Hot Encoding

alphabet = [ '0' , '1' , '2' , '3' , '4' , '5' , '6' , '7' , '8' , '9' , '+' ,' ']
alphabet_lookup = dict(zip(alphabet,range(len(alphabet))))
X_train = np.zeros((len(X), 5, len(alphabet)))
Y_train = np.zeros((len(Y),2,len(alphabet)))

for i, pat in enumerate(X):
    for j, p in enumerate(pat):
        X_train[i, j, alphabet_lookup[p]] = 1

for i, pat in enumerate(Y):
    for j, p in enumerate(pat):
        Y_train[i, j, alphabet_lookup[p]] = 1
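A small helper (my own addition) inverts the one-hot encoding back into a character string; it is handy for eyeballing the training data and, later, the model's predictions:

index_to_char = {i: c for c, i in alphabet_lookup.items()}

def decode(one_hot_seq):
    # take the argmax of each row and map it back to a character
    return ''.join(index_to_char[np.argmax(row)] for row in one_hot_seq)

print(decode(X_train[0]), '->', decode(Y_train[0]))   # e.g. ' 2+10 -> 12'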

Model Set up


model  = Sequential()
model.add(LSTM(75, input_shape=(5,12)))
model.add(RepeatVector(2))
model.add(LSTM(50,return_sequences=True))
model.add(TimeDistributed(Dense(12, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_68 (LSTM)               (None, 75)                26400     
_________________________________________________________________
repeat_vector_10 (RepeatVect (None, 2, 75)             0         
_________________________________________________________________
lstm_69 (LSTM)               (None, 2, 50)             25200     
_________________________________________________________________
time_distributed_8 (TimeDist (None, 2, 12)             612       
=================================================================
Total params: 52,212
Trainable params: 52,212
Non-trainable params: 0

Model Training

n_epochs = 50
history  = model.fit(X_train,Y_train, epochs=n_epochs, validation_split=0.2)
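After training, one can sanity-check the model by decoding a few predictions back into characters. This sketch (my own addition) reuses the decode() helper defined in the one-hot encoding section and is only meant as an illustration:

preds = model.predict(X_train[:5])          # shape (5, 2, 12), one softmax per output character
for question, pred in zip(X[:5], preds):
    print(question, '->', decode(pred))     # e.g. ' 2+10 -> 12'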

There are many new things one can learn from this example:

  • Basic ideas of sequence-to-sequence modeling
  • How to set up Keras for seq2seq models?
  • Why should one use RepeatVector in model building?
  • Why should one use TimeDistributed in model building?
  • Try to read this article on Medium