I had been struggling for many weeks to implement sequence parity. After going through Geron's book, I finally have a working implementation: an RNN that learns the running parity of a binary sequence. It does not use the classification tweak that many apply to the parity problem; the network is trained as a regression on the parity at every time step.
Create Training and Validation Data
    import numpy as np
    from sklearn.model_selection import train_test_split
    import tensorflow as tf  # TensorFlow 1.x API (placeholders, sessions, tf.contrib)

    input_seed = 1234
    time_steps = 50
    n_samples = 100000
    N = 20

    # All 2**N integers as binary strings, zero-padded to 50 characters (= time_steps)
    X = ['{0:050b}'.format(i) for i in range(2**N)]
    X = [[int(j) for j in list(i)] for i in X]
    X = np.asarray(X)
    X = X.reshape(X.shape[0], X.shape[1], 1)

    # Target: running parity (cumulative sum modulo 2) at every time step
    Y = [np.cumsum(i) % 2 for i in X]
    Y = np.asarray(Y)
    Y = Y.reshape(Y.shape[0], Y.shape[1], 1)  # reshape Y itself, not X

    # Shuffle, then keep a random subset of the sequences
    np.random.seed(input_seed)
    idx = np.arange(len(X))
    np.random.shuffle(idx)
    X, Y = X[idx, :, :], Y[idx]
    sample_size = n_samples
    X, Y = X[:sample_size, :, :], Y[:sample_size]

    X_train, X_valid, Y_train, Y_valid = train_test_split(X, Y, test_size=0.25, random_state=input_seed)
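To make the target concrete, here is a minimal sketch on a toy 8-bit sequence (`bits` and its values are illustrative, not taken from the data above). It computes the running parity as a cumulative sum modulo 2, the way the targets are meant to be built, and compares it with a running XOR, which is the usual definition of parity.

```python
import numpy as np

# Toy input, chosen only for illustration
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# Running parity: cumulative sum modulo 2
running_parity = np.cumsum(bits) % 2

# Equivalent formulation: XOR of all bits seen so far
running_xor = np.bitwise_xor.accumulate(bits)

print(running_parity)  # [1 1 0 1 1 1 0 0]
print(np.array_equal(running_parity, running_xor))  # True
```

Each target value depends on every earlier input bit, which is why the problem is a natural test for a recurrent network.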
Set up the RNN Model in TensorFlow
    tf.reset_default_graph()
    hidden_units = 32

    # Inputs and targets, both of shape [batch, time_steps, 1]
    tf_X = tf.placeholder(tf.float32, shape=[None, time_steps, 1])
    tf_Y = tf.placeholder(tf.float32, shape=[None, time_steps, 1])

    rnn_cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_units, activation=tf.nn.relu)
    outputs, states = tf.nn.dynamic_rnn(rnn_cell, inputs=tf_X, dtype=tf.float32)

    # Apply one shared linear layer to the hidden state at every time step
    stacked_outputs = tf.reshape(outputs, [-1, hidden_units])
    stacked_outputs = tf.contrib.layers.fully_connected(stacked_outputs, 1, activation_fn=None)
    outputs = tf.reshape(stacked_outputs, [-1, time_steps, 1])

    # Mean squared error between predicted and true parity
    loss = tf.square(outputs - tf_Y)
    total_loss = tf.reduce_mean(loss)
    learning_rate = 0.001

    # The int32 cast makes reduce_mean an integer (truncated) mean,
    # so this reports 1 only when every rounded output matches its target
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(outputs), tf_Y), tf.int32))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss=total_loss)

    batch_size = 1000
    n_batches = int(X_train.shape[0] / batch_size)
    epochs = 20
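One way to see why a ReLU RNN with a linear output layer can fit parity as a regression target is that two hidden units are already enough to represent it exactly. The sketch below is a hand-built illustration in plain Python, not the weights the 32-unit network above learns; the function name `parity_rnn` and the weights are assumptions chosen for the example. It relies on the identity `XOR(p, x) = relu(p + x) - 2 * relu(p + x - 1)` for `p` and `x` in {0, 1}.

```python
def relu(v):
    return max(0.0, v)

def parity_rnn(bits):
    """Hand-built 2-unit ReLU RNN with a linear readout (illustrative weights)."""
    a, b = 0.0, 0.0                      # hidden state, starts at zero
    outputs = []
    for x in bits:
        p = a - 2.0 * b                  # previous parity, linear in the hidden state
        a, b = relu(p + x), relu(p + x - 1.0)
        outputs.append(a - 2.0 * b)      # linear readout: parity of the bits so far
    return outputs

print(parity_rnn([1, 0, 1, 1, 0]))  # [1.0, 1.0, 0.0, 1.0, 1.0]
```

The readout is exactly 0 or 1 at every step, so a squared-error loss can in principle be driven to zero without a sigmoid or softmax output.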
Train the Model
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for e in range(epochs):
            # Reshuffle the training set at the start of every epoch
            idx = np.arange(len(X_train))
            np.random.shuffle(idx)
            X_train, Y_train = X_train[idx, :, :], Y_train[idx]
            for i in range(n_batches):
                x = X_train[(i * batch_size):((i + 1) * batch_size), :, :]
                y = Y_train[(i * batch_size):((i + 1) * batch_size), :, :]
                _, curr_loss = sess.run([optimizer, total_loss],
                                        feed_dict={tf_X: x, tf_Y: y})
            # Evaluate on the full validation set after each epoch
            loss_val, output_val, accuracy_val = sess.run([total_loss, outputs, accuracy],
                                                          feed_dict={tf_X: X_valid, tf_Y: Y_valid})
            print("Epoch:", str(e), " Loss:", loss_val, " Accuracy", accuracy_val)
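The batching arithmetic in the loop can be checked in isolation. A minimal sketch, assuming the 75,000 training sequences that remain after the 25% validation split of 100,000 samples; `batch_slices` is a hypothetical helper, not part of the code above.

```python
def batch_slices(n_rows, batch_size):
    """Return (start, stop) index pairs for full batches, as in the training loop."""
    n_batches = int(n_rows / batch_size)   # a trailing partial batch is dropped
    return [(i * batch_size, (i + 1) * batch_size) for i in range(n_batches)]

slices = batch_slices(75000, 1000)
print(len(slices), slices[0], slices[-1])  # 75 (0, 1000) (74000, 75000)
```

When the number of rows is not a multiple of the batch size, the leftover rows are skipped for that epoch; because the data is reshuffled every epoch, different rows are skipped each time.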
The output on the validation data is:
Epoch: 0 Loss: 0.06488915 Accuracy 0
Epoch: 1 Loss: 0.01823107 Accuracy 1
Epoch: 2 Loss: 0.0028200247 Accuracy 1
Epoch: 3 Loss: 0.0001802158 Accuracy 1
Epoch: 4 Loss: 1.3688104e-05 Accuracy 1
Epoch: 5 Loss: 5.641931e-06 Accuracy 1
Epoch: 6 Loss: 3.428546e-06 Accuracy 1
Epoch: 7 Loss: 2.266378e-06 Accuracy 1
Epoch: 8 Loss: 1.6181465e-06 Accuracy 1
Epoch: 9 Loss: 1.2760864e-06 Accuracy 1
Epoch: 10 Loss: 1.2253626e-06 Accuracy 1
Epoch: 11 Loss: 4.0318373e-06 Accuracy 1
Epoch: 12 Loss: 8.517197e-07 Accuracy 1
Epoch: 13 Loss: 1.5624798e-06 Accuracy 1
Epoch: 14 Loss: 7.2328066e-07 Accuracy 1
Epoch: 15 Loss: 7.47879e-07 Accuracy 1
Epoch: 16 Loss: 7.501688e-07 Accuracy 1
Epoch: 17 Loss: 6.304133e-07 Accuracy 1
Epoch: 18 Loss: 8.2147415e-07 Accuracy 1
Epoch: 19 Loss: 5.487213e-07 Accuracy 1
As the log shows, within 20 epochs the network has learned the pattern completely. One can also check the classification accuracy manually:
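    output_val = np.round(output_val)
    output_val = output_val.astype(int)
    n_correct = np.sum(Y_valid == output_val)   # time steps predicted correctly
    n_wrong = np.sum(Y_valid != output_val)     # time steps predicted incorrectly
    print(n_correct / (n_correct + n_wrong))    # fraction correct, i.e. the accuracy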
The above gives 100% accuracy on the validation set.
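The manual check counts individual time steps. A stricter measure is the share of sequences in which every time step is correct. The sketch below shows both on toy arrays (`y_true`, `y_pred`, and their values are illustrative, not results from the model).

```python
import numpy as np

# Toy targets and rounded predictions with shape [batch, time_steps, 1]
y_true = np.array([[[1], [1], [0]],
                   [[0], [1], [1]]])
y_pred = np.array([[[1], [1], [0]],
                   [[0], [0], [1]]])   # one wrong step in the second sequence

correct = (y_true == y_pred)
step_accuracy = correct.mean()                        # 5 of 6 time steps correct
sequence_accuracy = correct.all(axis=(1, 2)).mean()   # 1 of 2 sequences fully correct

print(step_accuracy, sequence_accuracy)
```

For parity the two can differ a lot, since one early mistake in the running parity does not have to show up as a mistake at every later step.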
You can also plot the residuals to look for any pattern:
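    import matplotlib.pyplot as plt

    # One point per validation time step; the x-axis length must match the number of residuals
    residuals = output_val.flatten() - Y_valid.flatten()
    plt.scatter(np.arange(residuals.size), residuals)
    plt.show()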
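With 25,000 validation sequences of 50 steps each, the scatter plot has well over a million points. As a compact alternative, the residuals can simply be counted per value. A sketch on toy arrays (`pred` and `target` are illustrative stand-ins for `output_val.flatten()` and `Y_valid.flatten()`):

```python
import numpy as np

pred = np.array([1, 0, 1, 1, 0, 0])     # rounded predictions (toy values)
target = np.array([1, 0, 0, 1, 1, 0])   # true parities (toy values)

residuals = pred - target
values, counts = np.unique(residuals, return_counts=True)
summary = {int(v): int(c) for v, c in zip(values, counts)}

print(summary)  # {-1: 1, 0: 4, 1: 1}
```

A perfectly trained model would leave a single entry, `0`, with a count equal to the number of validation time steps.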