I have been struggling for many weeks to implement a network that learns the parity of a sequence. Finally, after going through Geron’s book, I have an implementation that works: an RNN that learns the running parity of a binary sequence. It does not use the classification tweak that many apply to solve the parity problem; the network is trained with a squared-error loss and its output is simply rounded.

Create Training and Validation Data

import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf

input_seed = 1234
time_steps = 50      # length of every input sequence
N = 20               # only the lowest N bits vary; the rest are zero padding

# All 2**N integers written as zero-padded binary strings of length time_steps
X = ['{0:050b}'.format(i) for i in range(2**N)]
X = [[int(j) for j in i] for i in X]
X = np.asarray(X)
X = X.reshape(X.shape[0], X.shape[1], 1)

# Target: running parity (cumulative sum mod 2) at every time step
Y = [np.cumsum(i) % 2 for i in X]
Y = np.asarray(Y)
Y = Y.reshape(Y.shape[0], Y.shape[1], 1)   # add a trailing feature axis, matching X

# Shuffle, then keep a random subset of the sequences
np.random.seed(input_seed)
idx = np.arange(len(X))
np.random.shuffle(idx)
X, Y = X[idx, :, :], Y[idx]
sample_size = 100000
X, Y = X[:sample_size, :, :], Y[:sample_size]

X_train, X_valid, Y_train, Y_valid = train_test_split(
    X, Y, test_size=0.25, random_state=input_seed)
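
To make the target concrete: the label at each time step is the parity of all bits seen up to and including that step, which is what `np.cumsum(i) % 2` is meant to compute. Here is a minimal pure-Python sketch of the same idea; the helper name `running_parity` is my own and is not used anywhere else in this post.

```python
from itertools import accumulate
from operator import xor

def running_parity(bits):
    """Return the parity (0 = even, 1 = odd) of the prefix ending at each position."""
    # XOR-ing the bits seen so far is the same as (cumulative sum) mod 2
    return list(accumulate(bits, xor))

bits = [1, 0, 1, 1, 0]
print(running_parity(bits))  # [1, 1, 0, 1, 1]
```

The cumulative sums of that example are 1, 1, 2, 3, 3, so taking them mod 2 gives the same list.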

Set up the RNN Model in TensorFlow

# TensorFlow 1.x API (tf.placeholder, tf.contrib)
tf.reset_default_graph()

hidden_units = 32

# Inputs and targets: one bit per time step
tf_X = tf.placeholder(tf.float32, shape=[None, time_steps, 1])
tf_Y = tf.placeholder(tf.float32, shape=[None, time_steps, 1])

# Single-layer RNN with ReLU activation
rnn_cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_units, activation=tf.nn.relu)
outputs, states = tf.nn.dynamic_rnn(rnn_cell, inputs=tf_X, dtype=tf.float32)

# Project the hidden state at every time step to one linear output
stacked_outputs = tf.reshape(outputs, [-1, hidden_units])
stacked_outputs = tf.contrib.layers.fully_connected(stacked_outputs, 1, activation_fn=None)
outputs = tf.reshape(stacked_outputs, [-1, time_steps, 1])

# Mean squared error between the raw output and the parity bit
loss = tf.square(outputs - tf_Y)
total_loss = tf.reduce_mean(loss)
learning_rate = 0.001
# The int32 cast makes reduce_mean use integer division, so this reports
# 1 only when every rounded output matches the target and 0 otherwise
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(outputs), tf_Y), tf.int32))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss=total_loss)

batch_size = 1000
n_batches = int(X_train.shape[0] / batch_size)
epochs = 20
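
It may help to see why a ReLU RNN with a linear readout can represent running parity at all. The sketch below is not the trained model: it is a two-unit recurrent cell with hand-picked weights (my own illustrative choice, not the weights the 32-unit network above learns). It relies on the identity XOR(p, x) = relu(p + x) - 2 * relu(p + x - 1) for p and x in {0, 1}.

```python
def relu(v):
    return max(0.0, v)

def handmade_parity_rnn(bits):
    """Run a 2-unit ReLU RNN with fixed, hand-chosen weights over a bit sequence."""
    h1, h2 = 0.0, 0.0              # hidden state, starts at zero
    outputs = []
    for x in bits:
        p_prev = h1 - 2.0 * h2     # previous parity, a linear function of the state
        h1 = relu(p_prev + x)          # unit 1: bias 0
        h2 = relu(p_prev + x - 1.0)    # unit 2: bias -1
        outputs.append(h1 - 2.0 * h2)  # linear readout = parity so far
    return outputs

print(handmade_parity_rnn([1, 0, 1, 1, 0]))  # [1.0, 1.0, 0.0, 1.0, 1.0]
```

Since such an exact solution exists with only two units, it is at least plausible that gradient descent on 32 units can find an equivalent one, although nothing guarantees it finds this particular form.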

Train the Model

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(epochs):
        idx     = np.arange(len(X_train))
        np.random.shuffle(idx)
        X_train, Y_train    = X_train[idx,:,:], Y_train[idx]
        for i in range(n_batches):
            x  = X_train[(i*batch_size):((i+1)*batch_size),:,:]
            y  = Y_train[(i*batch_size):((i+1)*batch_size),:,:]
            _, curr_loss = sess.run([optimizer, total_loss],
                                   feed_dict={tf_X:x, tf_Y:y})
        loss_val,output_val,accuracy_val = sess.run([total_loss,outputs,accuracy], feed_dict={tf_X:X_valid, tf_Y:Y_valid})
        print("Epoch:",str(e), " Loss:", loss_val," Accuracy",accuracy_val)
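
The inner loop walks over `n_batches` contiguous slices of the freshly shuffled training set. As a sketch of that slicing logic in isolation (the helper `batch_bounds` is hypothetical and not part of the training code), note that `int(n / batch_size)` silently drops any final partial batch. With 75,000 training rows and a batch size of 1000, nothing is lost here.

```python
def batch_bounds(n_rows, batch_size):
    """Return (start, stop) index pairs for full batches only."""
    n_batches = n_rows // batch_size  # equals int(n_rows / batch_size) for these sizes
    return [(i * batch_size, (i + 1) * batch_size) for i in range(n_batches)]

print(len(batch_bounds(75000, 1000)))  # 75
print(batch_bounds(10, 4))             # [(0, 4), (4, 8)]  (rows 8 and 9 are skipped)
```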

The output on the validation data is:

Epoch: 0  Loss: 0.06488915  Accuracy 0
Epoch: 1  Loss: 0.01823107  Accuracy 1
Epoch: 2  Loss: 0.0028200247  Accuracy 1
Epoch: 3  Loss: 0.0001802158  Accuracy 1
Epoch: 4  Loss: 1.3688104e-05  Accuracy 1
Epoch: 5  Loss: 5.641931e-06  Accuracy 1
Epoch: 6  Loss: 3.428546e-06  Accuracy 1
Epoch: 7  Loss: 2.266378e-06  Accuracy 1
Epoch: 8  Loss: 1.6181465e-06  Accuracy 1
Epoch: 9  Loss: 1.2760864e-06  Accuracy 1
Epoch: 10  Loss: 1.2253626e-06  Accuracy 1
Epoch: 11  Loss: 4.0318373e-06  Accuracy 1
Epoch: 12  Loss: 8.517197e-07  Accuracy 1
Epoch: 13  Loss: 1.5624798e-06  Accuracy 1
Epoch: 14  Loss: 7.2328066e-07  Accuracy 1
Epoch: 15  Loss: 7.47879e-07  Accuracy 1
Epoch: 16  Loss: 7.501688e-07  Accuracy 1
Epoch: 17  Loss: 6.304133e-07  Accuracy 1
Epoch: 18  Loss: 8.2147415e-07  Accuracy 1
Epoch: 19  Loss: 5.487213e-07  Accuracy 1

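Hence one can see that within 20 epochs the network has learned the pattern completely: the validation loss drops to roughly 5e-7 and the reported accuracy is 1 from the second epoch onward. One can also do a manual check of the classification accuracy: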

output_val = np.round(output_val).astype(int)
n_correct = np.sum(Y_valid == output_val)
n_wrong = np.sum(Y_valid != output_val)
print(n_correct / (n_correct + n_wrong))   # fraction of time steps predicted correctly

The above gives 100% accuracy.
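
Note that this check counts individual time steps. A stricter, complementary measure is the fraction of sequences in which every step is right. Here is a small sketch of both, run on made-up toy data rather than on the actual validation output; the function names are my own.

```python
def per_step_accuracy(y_true, y_pred):
    """Fraction of individual time steps predicted correctly."""
    pairs = [(t, p) for seq_t, seq_p in zip(y_true, y_pred)
             for t, p in zip(seq_t, seq_p)]
    return sum(t == p for t, p in pairs) / len(pairs)

def per_sequence_accuracy(y_true, y_pred):
    """Fraction of sequences in which every time step is correct."""
    return sum(list(a) == list(b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Toy data (made up for illustration): 2 sequences of 4 steps, one wrong step
y_true = [[1, 1, 0, 1], [0, 1, 1, 0]]
y_pred = [[1, 1, 0, 1], [0, 1, 0, 0]]
print(per_step_accuracy(y_true, y_pred))      # 0.875
print(per_sequence_accuracy(y_true, y_pred))  # 0.5
```

When every step is correct, as reported above, the two measures agree at 1.0.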

You can also plot the residuals to see their pattern:

import matplotlib.pyplot as plt
residuals = output_val.flatten() - Y_valid.flatten()
plt.scatter(np.arange(residuals.size), residuals)   # x-axis length follows the data
plt.show()