
Recurrent Neural Network Tensorflow | Introduction


Workings of LSTMs in RNN

Step 1: Decide How Much Past Data It Should Remember

The first step in the LSTM is to decide which information should be omitted from the cell state at that particular time step. The sigmoid function (the forget gate) determines this: it looks at the previous state h(t-1) together with the current input x(t) and computes the function.

Consider the following two sentences:

Let the output of h(t-1) be "Alice is good in Physics. John, on the other hand, is good at Chemistry." Let the current input at x(t) be "John plays football well. He told me yesterday over the phone that he had served as the captain of his college team." The forget gate realizes there might be a change in context after encountering the first punctuation mark. It compares this with the current input sentence at x(t). Since the next sentence talks about John, the information on Alice is deleted. The position of the subject is vacated and assigned to John.

Step 2: Decide How Much This Unit Adds to the Current State

In the second layer, there are two parts. One is the sigmoid function, and the other is the tanh function. The sigmoid function decides which values to let through (0 or 1). The tanh function gives weightage to the values that are passed, deciding their level of importance (-1 to 1).

With the current input at x(t), the input gate analyzes the important information: John plays football, and the fact that he was the captain of his college team is important. "He told me yesterday over the phone" is less important, so it is forgotten. This process of adding new information is done via the input gate.

Step 3: Decide What Part of the Current Cell State Makes It to the Output

The third step is to determine what the output will be. First, we run a sigmoid layer, which decides what parts of the cell state make it to the output. Then, we put the cell state through tanh to push the values to be between -1 and 1 and multiply it by the output of the sigmoid gate.

Let's consider this example to predict the next word in the sentence: "John played tremendously well against the opponent and won for his team. For his contributions, brave ____ was awarded player of the match." There could be many choices for the empty space. The current input "brave" is an adjective, and adjectives describe a noun. So "John" could be the best output after "brave".
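The three steps above correspond to the standard LSTM gate equations. Below is a minimal NumPy sketch of a single LSTM step; the weight dictionaries W, U, b and their sizes are illustrative assumptions, not values from the text.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b hold the parameters of the four gates: f (forget), i (input),
    # o (output) and c (candidate cell state). Shapes are assumed to match.
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])    # Step 1: decide what to forget
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])    # Step 2: decide what to add
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate values in (-1, 1)
    c_t = f_t * c_prev + i_t * c_hat                          # new cell state
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])    # Step 3: decide what to output
    h_t = o_t * np.tanh(c_t)                                  # new hidden state
    return h_t, c_t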

Applications of RNN

RNN has multiple uses, especially when it comes to predicting the future. In the financial industry, RNN can be helpful in predicting stock prices or the sign of the stock market direction (i.e., positive or negative).

RNN is useful for an autonomous car as it can avoid a car accident by anticipating the trajectory of the vehicle.

RNN is widely used in text analysis, image captioning, sentiment analysis and machine translation. For example, one can use a movie review to understand the feeling the spectator perceived after watching the movie. Automating this task is very useful when the movie company does not have enough time to review, label, consolidate and analyze the reviews. The machine can do the job with a higher level of accuracy.


Python3


train_seq = tokenizer.texts_to_sequences(X_train)
test_seq = tokenizer.texts_to_sequences(X_test)

train_pad = pad_sequences(train_seq, maxlen=40,
                          truncating="post", padding="post")
test_pad = pad_sequences(test_seq, maxlen=40,
                         truncating="post", padding="post")

Train a Recurrent Neural Network (RNN) in TensorFlow

Now that the data is ready, the next step is building a simple recurrent neural network. Before training with SimpleRNN, the data is passed through an Embedding layer so that every word is represented by a word vector of equal size.

Note: We use return_sequences = True only when we need to stack another recurrent layer on top.
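For reference, here is a minimal sketch of such a model. The vocabulary size (10000), embedding dimension (128), and RNN units (64) are assumptions for illustration, not values taken from the original code.

import tensorflow as tf
from tensorflow import keras

# A minimal SimpleRNN sentiment model (sizes are illustrative assumptions).
model = keras.models.Sequential([
    keras.layers.Embedding(10000, 128),                 # word vectors of equal size
    keras.layers.SimpleRNN(64, return_sequences=True),  # stacked, so return_sequences=True
    keras.layers.SimpleRNN(64),                         # last recurrent layer
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile("rmsprop", "binary_crossentropy", metrics=["accuracy"])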

Try the model

Now run the model to see that it behaves as expected.

First check the shape of the output:


for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 66) # (batch_size, sequence_length, vocab_size)

In the above example the sequence length of the input is 100, but the model can be run on inputs of any length:


model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       multiple                  16896
 gru (GRU)                   multiple                  3938304
 dense (Dense)               multiple                  67650
=================================================================
Total params: 4022850 (15.35 MB)
Trainable params: 4022850 (15.35 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

To get actual predictions from the model, you need to sample from the output distribution to obtain character indices. This distribution is defined by the logits over the character vocabulary.

Try it for the first example in the batch:


sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()

This gives us, at each timestep, a prediction of the next character index:


sampled_indices

array([15, 52, 19, 34, 6, 39, 41, 62, 50, 61, 42, 26, 29, 57, 34, 46, 12, 61, 53, 14, 26, 50, 5, 8, 29, 44, 2, 65, 62, 52, 53, 26, 25, 39, 64, 36, 53, 21, 34, 30, 12, 58, 61, 43, 38, 29, 1, 26, 47, 35, 52, 30, 10, 20, 59, 9, 11, 34, 59, 45, 56, 20, 39, 29, 46, 10, 54, 56, 57, 17, 19, 19, 14, 40, 12, 12, 4, 54, 22, 17, 31, 7, 61, 44, 56, 36, 5, 38, 30, 32, 23, 21, 52, 39, 42, 30, 42, 8, 17, 53])

Decode these to see the text predicted by this untrained model:


print("Input:\n", text_from_ids(input_example_batch[0]).numpy()) print() print("Next Char Predictions:\n", text_from_ids(sampled_indices).numpy())

Input:
 b" of woman in the world,\nAy, every dram of woman's flesh is false, If she be.\n\nLEONTES:\nHold your pea"

Next Char Predictions:
 b"BmFU'ZbwkvcMPrUg;vnAMk&-Pe zwmnMLZyWnHUQ;svdYP\nMhVmQ3Gt.:UtfqGZPg3oqrDFFAa;;$oIDR,veqW&YQSJHmZcQc-Dn"


Frequently Asked Questions (FAQs)

Q1. What’s the Difference Between a Feedforward Neural Network and Recurrent Neural Network?

In this deep learning interview question, the interviewer expects you to give a detailed answer.

  • In a feedforward neural network, signals travel in one direction, from input to output. There are no feedback loops; the network considers only the current input. It cannot memorize previous inputs (e.g., a CNN).
  • In a recurrent neural network, signals travel through loops, creating a looped network. It considers the current input along with previously received inputs when generating the output of a layer, and it can memorize past data thanks to its internal memory.

Q2. What Are the Applications of a Recurrent Neural Network (RNN)?

RNNs can be used for sentiment analysis, text mining, and image captioning. Recurrent neural networks can also address time-series problems such as predicting stock prices over a month or quarter.

Q3. What Are the Softmax and ReLU Functions?

Softmax is an activation function that produces outputs between zero and one. It normalizes each output so that the total sum of the outputs equals one. Softmax is often used for output layers.

ReLU (Rectified Linear Unit) is the most widely used activation function. It outputs x if x is positive and zero otherwise. ReLU is commonly used for hidden layers.
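As a quick illustration, here is a minimal NumPy sketch of both functions; the sample logits are made up for demonstration.

import numpy as np

def softmax(z):
    # subtract the max for numerical stability; outputs sum to one
    e = np.exp(z - np.max(z))
    return e / e.sum()

def relu(x):
    # x if x is positive, zero otherwise
    return np.maximum(0, x)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))               # roughly [0.66 0.24 0.10], sums to 1
print(relu(np.array([-3.0, 4.0])))   # [0. 4.]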

Q4. What Are Hyperparameters?

This is another commonly asked deep learning interview question. With neural networks, you're usually working with hyperparameters once the data is formatted correctly. A hyperparameter is a parameter whose value is set before the learning process begins. It determines how a network is trained and the structure of the network (such as the number of hidden units, the learning rate, the number of epochs, etc.).

Q5. What Will Happen If the Learning Rate Is Set Too Low or Too High?

When the learning rate is too low, training progresses very slowly because we make only minimal updates to the weights; it will take many updates to reach the minimum. If the learning rate is set too high, the loss function shows undesirable divergent behavior due to drastic weight updates; the model may fail to converge or even diverge.

Q6. What Are Dropout and Batch Normalization?

Dropout is a technique of randomly dropping out hidden and visible units of a network to prevent overfitting (typically dropping about 20 percent of the nodes). It roughly doubles the number of iterations needed for the network to converge.

Batch normalization is a technique for improving the performance and stability of neural networks by normalizing the inputs of each layer so that they have a mean activation of zero and a variance of one.
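In Keras, both techniques are available as layers. The sketch below (layer sizes, input shape, and the 20 percent dropout rate are illustrative assumptions) shows where they are typically inserted:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    keras.layers.BatchNormalization(),   # normalize the layer inputs
    keras.layers.Dropout(0.2),           # randomly drop ~20% of the units during training
    keras.layers.Dense(1, activation="sigmoid"),
])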

Q7. What Are Overfitting and Underfitting, and How Can You Combat Them?

Overfitting occurs when the model learns the details and noise in the training data to the degree that it adversely impacts the model's performance on new data. It is more likely to occur with nonlinear models that have more flexibility when learning a target function. For example, if a model is looking at cars and trucks but only recognizes trucks with a specific box shape, it might not recognize a flatbed truck because it only saw one specific kind of truck in training. The model performs well on training data, but not in the real world.

Underfitting refers to a model that is neither well trained on the data nor able to generalize to new data. This usually happens when there is too little or incorrect data to train the model. An underfit model has both poor performance and poor accuracy.

To combat overfitting and underfitting, you can resample the data to estimate model accuracy (k-fold cross-validation) and use a validation dataset to evaluate the model.

Q8. How Are Weights Initialized in a Network?

There are two methods here: we can either initialize the weights to zero or assign them randomly.

  • Initializing all weights to 0: This makes your model similar to a linear model. All the neurons in each layer perform the same operation, giving the same output and making the deep network useless.
  • Initializing all weights randomly: Here, the weights are assigned random values very close to 0. This gives the model better accuracy since every neuron performs a different computation. This is the most commonly used method.

Q9. What Are the Different Layers in a CNN?

There are four layers in a CNN:

  • Convolutional Layer – the layer that performs the convolution operation, creating several smaller picture windows that slide over the data.
  • ReLU Layer – brings non-linearity to the network and converts all negative pixel values to zero. The output is a rectified feature map.
  • Pooling Layer – pooling is a down-sampling operation that reduces the dimensionality of the feature map.
  • Fully Connected Layer – recognizes and classifies the objects in the image.

Q10. What Is Pooling in a CNN, and How Does It Work?

Pooling is used to reduce the spatial dimensions in a CNN. It performs down-sampling operations to reduce dimensionality and creates a pooled feature map by sliding a filter matrix over the input matrix.

Q11. How Does an LSTM Network Work?

Long Short-Term Memory (LSTM) is a special kind of recurrent neural network capable of learning long-term dependencies; remembering information for long periods is its default behavior. There are three steps in an LSTM network:

  • Step 1: The network decides what to forget and what to remember.
  • Step 2: It selectively updates cell state values.
  • Step 3: The network decides what part of the current state makes it to the output.

Q12. What Are Vanishing and Exploding Gradients?

While training an RNN, your gradient can become either too small or too large, which makes training difficult. When the gradient is too small, the problem is known as a "vanishing gradient." When the gradient grows exponentially instead of decaying, it is referred to as an "exploding gradient." Gradient problems lead to long training times, poor performance, and low accuracy.

Q13. What Is the Difference Between Epoch, Batch, and Iteration in Deep Learning?

Epoch – represents one pass over the entire dataset (everything fed through the training model). Batch – refers to splitting the dataset into several batches when we cannot pass the whole dataset into the neural network at once. Iteration – if we have 10,000 images as data and a batch size of 200, then an epoch runs 50 iterations (10,000 divided by 200).
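The relationship is simple integer arithmetic, as the short sketch below shows (the numbers are just the example above):

num_images = 10_000
batch_size = 200

iterations_per_epoch = num_images // batch_size
print(iterations_per_epoch)  # 50 iterations make up one epoch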

Python3


sns.countplot(data=data, x='Class Name')
plt.xticks(rotation=90)
plt.show()

Output:

Countplot for Class Name Category

Countplots help us to understand the distribution of the whole data along the different categories of a particular column.


Setup


import os
import datetime

import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf

mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False


Setup


import numpy as np
import tensorflow as tf
import keras
from keras import layers



Backpropagation Through Time

Backpropagation through time is when we apply the backpropagation algorithm to a recurrent neural network that has time-series data as its input.

In a typical RNN, one input is fed into the network at a time, and one output is obtained. But in backpropagation through time, the current input is used along with the previous inputs. This is called a timestep, and one timestep can consist of many time-series data points entering the RNN simultaneously.

Once the neural network has processed a timestep and produced an output, that output is used to calculate and accumulate the errors. After this, the network is rolled back up, and the weights are recalculated and updated with those errors in mind.

What are Recurrent Neural Networks (RNN)

A recurrent neural network (RNN) is a type of artificial neural network (ANN) that is used in Apple's Siri and Google's voice search. An RNN remembers past inputs thanks to an internal memory, which is useful for predicting stock prices, generating text, transcriptions, and machine translation.

In a traditional neural network, the inputs and the outputs are independent of each other, whereas the output of an RNN depends on the prior elements in the sequence. Recurrent networks also share parameters across each layer of the network. Feedforward networks have different weights across each node, whereas an RNN shares the same weights within each layer of the network; during gradient descent, the weights and biases are adjusted to reduce the loss.

The image above is a simple representation of recurrent neural networks. If we are forecasting stock prices using simple data [45,56,45,49,50,…], each input from X0 to Xt will contain a past value. For example, X0 will have 45, X1 will have 56, and these values are used to predict the next number in a sequence.
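A minimal sketch of this windowing idea, using the illustrative prices from the paragraph above (the window length of 3 is an assumption for demonstration):

import numpy as np

prices = np.array([45, 56, 45, 49, 50, 52, 51, 53])
window = 3  # use the last 3 prices to predict the next one

X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]
print(X[0], "->", y[0])  # [45 56 45] -> 49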




TensorFlow – Recurrent Neural Networks

Recurrent neural networks are a type of deep-learning algorithm that follows a sequential approach. In standard neural networks, each input and output is assumed to be independent of the others. These neural networks are called recurrent because they perform mathematical computations in a sequential manner.

Consider the following steps to train a recurrent neural network −

Step 1 − Input a specific example from the dataset.

Step 2 − The network takes the example and computes some calculations using randomly initialized variables.

Step 3 − A predicted result is then computed.

Step 4 − Comparing the generated result with the expected value produces an error.

Step 5 − To trace the error, it is propagated back along the same path, and the variables are adjusted.

Step 6 − Steps 1 to 5 are repeated until we are confident that the variables used to produce the output are properly tuned.

Step 7 − A systematic prediction is made by applying these variables to new, unseen input.
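These steps map directly onto a gradient-based training loop. The sketch below is a minimal illustration using tf.GradientTape with a SimpleRNN and randomly generated data; every size and the choice of loss are assumptions for demonstration, not part of the original tutorial.

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Steps 1-2: a toy dataset and a network with randomly initialized variables
x = np.random.random((32, 10, 8)).astype("float32")   # (batch, timesteps, features)
y = np.random.random((32, 1)).astype("float32")
model = keras.Sequential([keras.layers.SimpleRNN(16), keras.layers.Dense(1)])
optimizer = keras.optimizers.Adam()
loss_fn = keras.losses.MeanSquaredError()

for step in range(5):                       # Step 6: repeat until satisfied
    with tf.GradientTape() as tape:
        preds = model(x, training=True)     # Step 3: predicted result
        loss = loss_fn(y, preds)            # Step 4: compare with expected value
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # Step 5: adjust variables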

The schematic approach of representing recurrent neural networks is described below −

Python3


# the figure size and subplot indices were lost in extraction; typical values are assumed
plt.subplots(figsize=(12, 5))

plt.subplot(1, 2, 1)
sns.countplot(data=data, x='Rating')

plt.subplot(1, 2, 2)
sns.countplot(data=data, x="Recommended IND")
plt.show()

Output:

Countplot for the Rating and Recommended IND category

Now let’s plot the histogram plot of the Age group along with the Recommended IND category and the presence of outliers category-wise.


Outputs and states

By default, the output of a RNN layer contains a single vector per sample. This vector is the RNN cell output corresponding to the last timestep, containing information about the entire input sequence. The shape of this output is (batch_size, units), where units corresponds to the units argument passed to the layer's constructor.

A RNN layer can also return the entire sequence of outputs for each sample (one vector per timestep per sample) if you set return_sequences=True. The shape of this output is (batch_size, timesteps, units).


model = keras.Sequential()
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# The output of GRU will be a 3D tensor of shape (batch_size, timesteps, 256)
model.add(layers.GRU(256, return_sequences=True))

# The output of SimpleRNN will be a 2D tensor of shape (batch_size, 128)
model.add(layers.SimpleRNN(128))
model.add(layers.Dense(10))

model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding_1 (Embedding)     (None, None, 64)          64000
 gru (GRU)                   (None, None, 256)         247296
 simple_rnn (SimpleRNN)      (None, 128)               49280
 dense_1 (Dense)             (None, 10)                1290
=================================================================
Total params: 361866 (1.38 MB)
Trainable params: 361866 (1.38 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

In addition, a RNN layer can return its final internal state(s). The returned states can be used to resume the RNN execution later, or to initialize another RNN. This setting is commonly used in the encoder-decoder sequence-to-sequence model, where the encoder final state is used as the initial state of the decoder.

To configure a RNN layer to return its internal state, set the return_state parameter to True when creating the layer. Note that LSTM has 2 state tensors, but GRU only has one.

To configure the initial state of the layer, just call the layer with the additional keyword argument initial_state. Note that the shape of the state needs to match the unit size of the layer, like in the example below.


encoder_vocab = 1000
decoder_vocab = 2000

encoder_input = layers.Input(shape=(None,))
encoder_embedded = layers.Embedding(input_dim=encoder_vocab, output_dim=64)(
    encoder_input
)

# Return states in addition to output
output, state_h, state_c = layers.LSTM(64, return_state=True, name="encoder")(
    encoder_embedded
)
encoder_state = [state_h, state_c]

decoder_input = layers.Input(shape=(None,))
decoder_embedded = layers.Embedding(input_dim=decoder_vocab, output_dim=64)(
    decoder_input
)

# Pass the 2 states to a new LSTM layer, as initial state
decoder_output = layers.LSTM(64, name="decoder")(
    decoder_embedded, initial_state=encoder_state
)
output = layers.Dense(10)(decoder_output)

model = keras.Model([encoder_input, decoder_input], output)
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)              Output Shape                Param #   Connected to
==================================================================================================
 input_1 (InputLayer)      [(None, None)]              0         []
 input_2 (InputLayer)      [(None, None)]              0         []
 embedding_2 (Embedding)   (None, None, 64)            64000     ['input_1[0][0]']
 embedding_3 (Embedding)   (None, None, 64)            128000    ['input_2[0][0]']
 encoder (LSTM)            [(None, 64),                33024     ['embedding_2[0][0]']
                            (None, 64),
                            (None, 64)]
 decoder (LSTM)            (None, 64)                  33024     ['embedding_3[0][0]',
                                                                  'encoder[0][1]',
                                                                  'encoder[0][2]']
 dense_2 (Dense)           (None, 10)                  650       ['decoder[0][0]']
==================================================================================================
Total params: 258698 (1010.54 KB)
Trainable params: 258698 (1010.54 KB)
Non-trainable params: 0 (0.00 Byte)
__________________________________________________________________________________________________

LSTM Use Case

Now that you understand how LSTMs work, let's do a practical implementation to predict stock prices using the "Google stock price" data. Based on the stock price data between 2012 and 2016, we will predict the stock prices for 2017.

1. Import the required libraries

2. Import the training dataset

3. Perform feature scaling to transform the data

4. Create a data structure with 60 timesteps and 1 output

5. Import the Keras library and its packages

6. Initialize the RNN

7. Add the LSTM layers and some dropout regularization

8. Add the output layer

9. Compile the RNN

10. Fit the RNN to the training set

11. Load the stock price test data for 2017

12. Get the predicted stock price for 2017

13. Visualize the predicted and real stock prices
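As a compact illustration of steps 5-10, here is a hedged Keras sketch. The layer sizes, dropout rate, and training settings are typical choices rather than the exact values from the original notebook, and X_train / y_train are assumed to be the scaled 60-timestep windows built in step 4.

# Steps 5-10: build, compile, and fit the LSTM regressor (illustrative settings).
from tensorflow import keras
from tensorflow.keras import layers

regressor = keras.Sequential([
    layers.LSTM(50, return_sequences=True, input_shape=(60, 1)),
    layers.Dropout(0.2),
    layers.LSTM(50, return_sequences=True),
    layers.Dropout(0.2),
    layers.LSTM(50),
    layers.Dropout(0.2),
    layers.Dense(1),           # the predicted stock price
])
regressor.compile(optimizer="adam", loss="mean_squared_error")
# regressor.fit(X_train, y_train, epochs=100, batch_size=32)  # X_train, y_train assumed from step 4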


Create the model

Above is a diagram of the model.

  1. This model can be built as a tf.keras.Sequential.

  2. The first layer is the encoder, which converts the text to a sequence of token indices.

  3. After the encoder is an embedding layer. An embedding layer stores one vector per word. When called, it converts the sequences of word indices to sequences of vectors. These vectors are trainable. After training (on enough data), words with similar meanings often have similar vectors.

    This index-lookup is much more efficient than the equivalent operation of passing a one-hot encoded vector through a tf.keras.layers.Dense layer.

  4. A recurrent neural network (RNN) processes sequence input by iterating through the elements. RNNs pass the outputs from one timestep to their input on the next timestep.

    The tf.keras.layers.Bidirectional wrapper can also be used with an RNN layer. This propagates the input forward and backwards through the RNN layer and then concatenates the final output.

    • The main advantage of a bidirectional RNN is that the signal from the beginning of the input doesn't need to be processed all the way through every timestep to affect the output.

    • The main disadvantage of a bidirectional RNN is that you can't efficiently stream predictions as words are being added to the end.

  5. After the RNN has converted the sequence to a single vector, the two layers.Dense do some final processing and convert this vector representation to a single logit as the classification output.

The code to implement this is below:


model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

Please note that a Keras sequential model is used here since all the layers in the model only have a single input and produce a single output. In case you want to use a stateful RNN layer, you might want to build your model with the Keras functional API or model subclassing so that you can retrieve and reuse the RNN layer states. Please check the Keras RNN guide for more details.

The embedding layer uses masking to handle the varying sequence lengths. All the layers after the Embedding layer support masking:


print([layer.supports_masking for layer in model.layers])

[False, True, True, True, True]

To confirm that this works as expected, evaluate a sentence twice. First, alone so there’s no padding to mask:


# predict on a sample text without padding.
sample_text = ('The movie was cool. The animation and the graphics '
               'were out of this world. I would recommend this movie.')
predictions = model.predict(np.array([sample_text]))
print(predictions[0])

1/1 [==============================] – 3s 3s/step [0.00856274]

Now, evaluate it again in a batch with a longer sentence. The result should be identical:


# predict on a sample text with padding
padding = "the " * 2000
predictions = model.predict(np.array([sample_text, padding]))
print(predictions[0])

1/1 [==============================] – 0s 86ms/step [0.00856275]

Compile the Keras model to configure the training process:


model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

RNNs with list/dict inputs, or nested inputs

Nested structures allow implementers to include more information within a single timestep. For example, a video frame could have audio and video input at the same time. The data shape in this case could be:


[batch, timestep, {"video": [height, width, channel], "audio": [frequency]}]

In another example, handwriting data could have both coordinates x and y for the current position of the pen, as well as pressure information. So the data representation could be:


[batch, timestep, {"location": [x, y], "pressure": [force]}]

The following code provides an example of how to build a custom RNN cell that accepts such structured inputs.

Define a custom cell that supports nested input/output

See Making new Layers & Models via subclassing for details on writing your own layers.


@keras.saving.register_keras_serializable()
class NestedCell(keras.layers.Layer):
    def __init__(self, unit_1, unit_2, unit_3, **kwargs):
        self.unit_1 = unit_1
        self.unit_2 = unit_2
        self.unit_3 = unit_3
        self.state_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        self.output_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        super().__init__(**kwargs)

    def build(self, input_shapes):
        # expect input_shape to contain 2 items, [(batch, i1), (batch, i2, i3)]
        i1 = input_shapes[0][1]
        i2 = input_shapes[1][1]
        i3 = input_shapes[1][2]

        self.kernel_1 = self.add_weight(
            shape=(i1, self.unit_1), initializer="uniform", name="kernel_1"
        )
        self.kernel_2_3 = self.add_weight(
            shape=(i2, i3, self.unit_2, self.unit_3),
            initializer="uniform",
            name="kernel_2_3",
        )

    def call(self, inputs, states):
        # inputs should be in [(batch, input_1), (batch, input_2, input_3)]
        # state should be in shape [(batch, unit_1), (batch, unit_2, unit_3)]
        input_1, input_2 = tf.nest.flatten(inputs)
        s1, s2 = states

        output_1 = tf.matmul(input_1, self.kernel_1)
        output_2_3 = tf.einsum("bij,ijkl->bkl", input_2, self.kernel_2_3)
        state_1 = s1 + output_1
        state_2_3 = s2 + output_2_3

        output = (output_1, output_2_3)
        new_states = (state_1, state_2_3)

        return output, new_states

    def get_config(self):
        return {"unit_1": self.unit_1, "unit_2": self.unit_2, "unit_3": self.unit_3}

Build a RNN model with nested input/output

Let's build a Keras model that uses a keras.layers.RNN layer and the custom cell we just defined.


unit_1 = 10
unit_2 = 20
unit_3 = 30

i1 = 32
i2 = 64
i3 = 32
batch_size = 64
num_batches = 10
timestep = 50

cell = NestedCell(unit_1, unit_2, unit_3)
rnn = keras.layers.RNN(cell)

input_1 = keras.Input((None, i1))
input_2 = keras.Input((None, i2, i3))

outputs = rnn((input_1, input_2))

model = keras.models.Model([input_1, input_2], outputs)

model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])

Train the model with randomly generated data

Since there isn’t a good candidate dataset for this model, we use random Numpy data for demonstration.


input_1_data = np.random.random((batch_size * num_batches, timestep, i1))
input_2_data = np.random.random((batch_size * num_batches, timestep, i2, i3))
target_1_data = np.random.random((batch_size * num_batches, unit_1))
target_2_data = np.random.random((batch_size * num_batches, unit_2, unit_3))

input_data = [input_1_data, input_2_data]
target_data = [target_1_data, target_2_data]

model.fit(input_data, target_data, batch_size=batch_size)

10/10 [==============================] – 1s 27ms/step – loss: 0.7623 – rnn_1_loss: 0.2873 – rnn_1_1_loss: 0.4750 – rnn_1_accuracy: 0.1016 – rnn_1_1_accuracy: 0.0350

With the Keras keras.layers.RNN layer, you are only expected to define the math logic for an individual step within the sequence, and the keras.layers.RNN layer will handle the sequence iteration for you. It's an incredibly powerful way to quickly prototype new kinds of RNNs (e.g. an LSTM variant).

For more details, please visit the API docs.

Humans do not reboot their understanding of language each time we hear a sentence. Given an article, we grasp the context based on our previous understanding of those words. One of the defining characteristics we possess is our memory (or retention power).

Can an algorithm replicate this? The first technique that comes to mind is a neural network (NN). But the traditional NNs unfortunately cannot do this. Take an example of wanting to predict what comes next in a video. A traditional neural network will struggle to generate accurate results.

That’s where the concept of recurrent neural networks (RNNs) comes into play. RNNs have become extremely popular in the deep learning space which makes learning them even more imperative. A few real-world applications of RNN include:

In this article, we’ll first quickly go through the core components of a typical RNN model. Then we’ll set up the problem statement which we will finally solve by implementing an RNN model from scratch in Python.

We can always leverage high-level Python libraries to code a RNN. So why code it from scratch? I firmly believe the best way to learn and truly ingrain a concept is to learn it from the ground up. And that’s what I’ll showcase in this tutorial.

This article assumes a basic understanding of recurrent neural networks. In case you need a quick refresher or are looking to learn the basics of RNN, I recommend going through the below articles first:

Let’s quickly recap the core concepts behind recurrent neural networks.

We'll do this using an example of sequence data, say the stocks of a particular firm. A simple machine learning model, or an Artificial Neural Network, may learn to predict the stock price based on a number of features, such as the volume of the stock, the opening value, etc. Apart from these, the price also depends on how the stock fared in the previous days and weeks. For a trader, this historical data is actually a major deciding factor for making predictions.

In conventional feed-forward neural networks, all test cases are considered to be independent. Can you see how that’s a bad fit when predicting stock prices? The NN model would not consider the previous stock price values – not a great idea!

There is another concept we can lean on when faced with time sensitive data – Recurrent Neural Networks (RNN)!

A typical RNN looks like this:

This may seem intimidating at first. But once we unfold it, things start looking a lot simpler:

It is now easier for us to visualize how these networks are considering the trend of stock prices. This helps us in predicting the prices for the day. Here, every prediction at time t (h_t) is dependent on all previous predictions and the information learned from them. Fairly straightforward, right?

RNNs can solve our purpose of sequence handling to a great extent but not entirely.

Text is another good example of sequence data. Being able to predict what word or phrase comes after a given text could be a very useful asset. We want our models to write Shakespearean sonnets!

Now, RNNs are great when it comes to context that is short or small in nature. But in order to be able to build a story and remember it, our models should be able to understand the context behind the sequences, just like a human brain.

In this article, we will work on a sequence prediction problem using RNN. One of the simplest tasks for this is sine wave prediction. The sequence contains a visible trend and is easy to solve using heuristics. This is what a sine wave looks like:

We will first devise a recurrent neural network from scratch to solve this problem. Our RNN model should also be able to generalize well so we can apply it on other sequence problems.

We will formulate our problem like this – given a sequence of 50 numbers belonging to a sine wave, predict the 51st number in the series. Time to fire up your Jupyter notebook (or your IDE of choice)!

Ah, the inevitable first step in any data science project – preparing the data before we do anything else.

What does our network model expect the data to be like? It would accept a single sequence of length 50 as input. So the shape of the input data will be:

(number_of_records x length_of_sequence x types_of_sequences)

Here, types_of_sequences is 1, because we have only one type of sequence – the sine wave.

On the other hand, the output would have only one value for each record. This will of course be the 51st value in the input sequence. So its shape would be:

(number_of_records x types_of_sequences) #where types_of_sequences is 1

Let’s dive into the code. First, import the necessary libraries:

%pylab inline
import math

To create a sine wave like data, we will use the sine function from Python’s math library:

sin_wave = np.array([math.sin(x) for x in np.arange(200)])

Visualizing the sine wave we’ve just generated:

plt.plot(sin_wave[:50])

Python Code:

X_val = []
Y_val = []

for i in range(num_records - 50, num_records):
    X_val.append(sin_wave[i:i+seq_len])
    Y_val.append(sin_wave[i+seq_len])

X_val = np.array(X_val)
X_val = np.expand_dims(X_val, axis=2)

Y_val = np.array(Y_val)
Y_val = np.expand_dims(Y_val, axis=1)
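The snippet above only builds the validation split; the training arrays X and Y referenced later in the training loop are constructed the same way. A minimal sketch, assuming seq_len = 50 and that the last 50 windows are reserved for validation:

seq_len = 50
num_records = len(sin_wave) - seq_len

X = []
Y = []
for i in range(num_records - 50):          # keep the last 50 windows for validation
    X.append(sin_wave[i:i + seq_len])
    Y.append(sin_wave[i + seq_len])

X = np.array(X)
X = np.expand_dims(X, axis=2)               # shape: (records, seq_len, 1)
Y = np.array(Y)
Y = np.expand_dims(Y, axis=1)               # shape: (records, 1)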

Our next task is defining all the necessary variables and functions we’ll use in the RNN model. Our model will take in the input sequence, process it through a hidden layer of 100 units, and produce a single valued output:

learning_rate = 0.0001
nepoch = 25
T = 50                   # length of sequence
hidden_dim = 100
output_dim = 1

bptt_truncate = 5
min_clip_value = -10
max_clip_value = 10

We will then define the weights of the network:

U = np.random.uniform(0, 1, (hidden_dim, T))
W = np.random.uniform(0, 1, (hidden_dim, hidden_dim))
V = np.random.uniform(0, 1, (output_dim, hidden_dim))

Here, U is the weight matrix between the input and hidden layers, W is the weight matrix for the recurrent connections of the hidden layer, and V is the weight matrix between the hidden and output layers.

Finally, we will define the activation function, sigmoid, to be used in the hidden layer:

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

Now that we have defined our model, we can finally move on with training it on our sequence data. We can subdivide the training process into smaller steps, namely:

Step 2.1: Check the loss on training data
  Step 2.1.1: Forward pass
  Step 2.1.2: Calculate error
Step 2.2: Check the loss on validation data
  Step 2.2.1: Forward pass
  Step 2.2.2: Calculate error
Step 2.3: Start the actual training
  Step 2.3.1: Forward pass
  Step 2.3.2: Backpropagate error
  Step 2.3.3: Update weights

We need to repeat these steps until convergence. If the model starts to overfit, stop! Or simply pre-define the number of epochs.

We will do a forward pass through our RNN model and calculate the squared error for the predictions for all records in order to get the loss value.

for epoch in range(nepoch):
    # check loss on train
    loss = 0.0

    # do a forward pass to get prediction
    for i in range(Y.shape[0]):
        x, y = X[i], Y[i]                    # get input, output values of each record
        prev_s = np.zeros((hidden_dim, 1))   # prev_s is the previous hidden-layer activation, initialized to zeros
        for t in range(T):
            new_input = np.zeros(x.shape)    # we then do a forward pass for every timestep in the sequence
            new_input[t] = x[t]              # for this, we define a single input for that timestep
            mulu = np.dot(U, new_input)
            mulw = np.dot(W, prev_s)
            add = mulw + mulu
            s = sigmoid(add)
            mulv = np.dot(V, s)
            prev_s = s

        # calculate error
        loss_per_record = (y - mulv)**2 / 2
        loss += loss_per_record
    loss = loss / float(y.shape[0])

We will do the same thing for calculating the loss on validation data (in the same loop):

    # check loss on val
    val_loss = 0.0
    for i in range(Y_val.shape[0]):
        x, y = X_val[i], Y_val[i]
        prev_s = np.zeros((hidden_dim, 1))
        for t in range(T):
            new_input = np.zeros(x.shape)
            new_input[t] = x[t]
            mulu = np.dot(U, new_input)
            mulw = np.dot(W, prev_s)
            add = mulw + mulu
            s = sigmoid(add)
            mulv = np.dot(V, s)
            prev_s = s
        loss_per_record = (y - mulv)**2 / 2
        val_loss += loss_per_record
    val_loss = val_loss / float(y.shape[0])

    print('Epoch: ', epoch + 1, ', Loss: ', loss, ', Val Loss: ', val_loss)

You should get the below output:

Epoch: 1 , Loss: [[101185.61756671]] , Val Loss: [[50591.0340148]] … …

We will now start with the actual training of the network. In this, we will first do a forward pass to calculate the errors and a backward pass to calculate the gradients and update them. Let me show you these step-by-step so you can visualize how it works in your mind.

In the forward pass:

Here is the code for doing a forward pass (note that it is in continuation of the above loop):

    # train model
    for i in range(Y.shape[0]):
        x, y = X[i], Y[i]

        layers = []
        prev_s = np.zeros((hidden_dim, 1))
        dU = np.zeros(U.shape)
        dV = np.zeros(V.shape)
        dW = np.zeros(W.shape)

        dU_t = np.zeros(U.shape)
        dV_t = np.zeros(V.shape)
        dW_t = np.zeros(W.shape)

        dU_i = np.zeros(U.shape)
        dW_i = np.zeros(W.shape)

        # forward pass
        for t in range(T):
            new_input = np.zeros(x.shape)
            new_input[t] = x[t]
            mulu = np.dot(U, new_input)
            mulw = np.dot(W, prev_s)
            add = mulw + mulu
            s = sigmoid(add)
            mulv = np.dot(V, s)
            layers.append({'s': s, 'prev_s': prev_s})
            prev_s = s

After the forward propagation step, we calculate the gradients at each layer and backpropagate the errors. We will use truncated backpropagation through time (TBPTT) instead of vanilla backprop. It may sound complex, but it's actually pretty straightforward.

The core difference in BPTT versus backprop is that the backpropagation step is done for all the time steps in the RNN layer. So if our sequence length is 50, we will backpropagate for all the timesteps previous to the current timestep.

As you may have guessed, BPTT is very computationally expensive. So instead of backpropagating through all previous timesteps, we backpropagate only up to a fixed number of timesteps back to save computational power. Consider this ideologically similar to stochastic gradient descent, where we include a batch of data points instead of all the data points.

Here is the code for backpropagating the errors:

        # derivative of pred
        dmulv = (mulv - y)

        # backward pass
        for t in range(T):
            dV_t = np.dot(dmulv, np.transpose(layers[t]['s']))
            dsv = np.dot(np.transpose(V), dmulv)

            ds = dsv
            dadd = add * (1 - add) * ds
            dmulw = dadd * np.ones_like(mulw)
            dprev_s = np.dot(np.transpose(W), dmulw)

            for i in range(t-1, max(-1, t-bptt_truncate-1), -1):
                ds = dsv + dprev_s
                dadd = add * (1 - add) * ds
                dmulw = dadd * np.ones_like(mulw)
                dmulu = dadd * np.ones_like(mulu)

                dW_i = np.dot(W, layers[t]['prev_s'])
                dprev_s = np.dot(np.transpose(W), dmulw)

                new_input = np.zeros(x.shape)
                new_input[t] = x[t]
                dU_i = np.dot(U, new_input)
                dx = np.dot(np.transpose(U), dmulu)

                dU_t += dU_i
                dW_t += dW_i

            dV += dV_t
            dU += dU_t
            dW += dW_t

Lastly, we update the weights with the calculated gradients. One thing we have to keep in mind is that the gradients tend to explode if we don't keep them in check. This is a fundamental issue in training neural networks, called the exploding gradient problem. So we have to clamp them to a range so that they don't explode. We can do it like this:

            if dU.max() > max_clip_value:
                dU[dU > max_clip_value] = max_clip_value
            if dV.max() > max_clip_value:
                dV[dV > max_clip_value] = max_clip_value
            if dW.max() > max_clip_value:
                dW[dW > max_clip_value] = max_clip_value

            if dU.min() < min_clip_value:
                dU[dU < min_clip_value] = min_clip_value
            if dV.min() < min_clip_value:
                dV[dV < min_clip_value] = min_clip_value
            if dW.min() < min_clip_value:
                dW[dW < min_clip_value] = min_clip_value

        # update
        U -= learning_rate * dU
        V -= learning_rate * dV
        W -= learning_rate * dW

On training the above model, we get this output:

Epoch: 1 , Loss: [[101185.61756671]] , Val Loss: [[50591.0340148]]
Epoch: 2 , Loss: [[61205.46869629]] , Val Loss: [[30601.34535365]]
Epoch: 3 , Loss: [[31225.3198258]] , Val Loss: [[15611.65669247]]
Epoch: 4 , Loss: [[11245.17049551]] , Val Loss: [[5621.96780111]]
Epoch: 5 , Loss: [[1264.5157739]] , Val Loss: [[632.02563908]]
Epoch: 6 , Loss: [[20.15654115]] , Val Loss: [[10.05477285]]
Epoch: 7 , Loss: [[17.13622839]] , Val Loss: [[8.55190426]]
Epoch: 8 , Loss: [[17.38870495]] , Val Loss: [[8.68196484]]
Epoch: 9 , Loss: [[17.181681]] , Val Loss: [[8.57837827]]
Epoch: 10 , Loss: [[17.31275313]] , Val Loss: [[8.64199652]]
Epoch: 11 , Loss: [[17.12960034]] , Val Loss: [[8.54768294]]
Epoch: 12 , Loss: [[17.09020065]] , Val Loss: [[8.52993502]]
Epoch: 13 , Loss: [[17.17370113]] , Val Loss: [[8.57517454]]
Epoch: 14 , Loss: [[17.04906914]] , Val Loss: [[8.50658127]]
Epoch: 15 , Loss: [[16.96420184]] , Val Loss: [[8.46794248]]
Epoch: 16 , Loss: [[17.017519]] , Val Loss: [[8.49241316]]
Epoch: 17 , Loss: [[16.94199493]] , Val Loss: [[8.45748739]]
Epoch: 18 , Loss: [[16.99796892]] , Val Loss: [[8.48242177]]
Epoch: 19 , Loss: [[17.24817035]] , Val Loss: [[8.6126231]]
Epoch: 20 , Loss: [[17.00844599]] , Val Loss: [[8.48682234]]
Epoch: 21 , Loss: [[17.03943262]] , Val Loss: [[8.50437328]]
Epoch: 22 , Loss: [[17.01417255]] , Val Loss: [[8.49409597]]
Epoch: 23 , Loss: [[17.20918888]] , Val Loss: [[8.5854792]]
Epoch: 24 , Loss: [[16.92068017]] , Val Loss: [[8.44794633]]
Epoch: 25 , Loss: [[16.76856238]] , Val Loss: [[8.37295808]]

Looking good! Time to get the predictions and plot them to get a visual sense of what we’ve designed.

We will do a forward pass through the trained weights to get our predictions:

preds = []
for i in range(Y.shape[0]):
    x, y = X[i], Y[i]
    prev_s = np.zeros((hidden_dim, 1))
    # Forward pass
    for t in range(T):
        mulu = np.dot(U, x)
        mulw = np.dot(W, prev_s)
        add = mulw + mulu
        s = sigmoid(add)
        mulv = np.dot(V, s)
        prev_s = s
    preds.append(mulv)

preds = np.array(preds)

Plotting these predictions alongside the actual values:

plt.plot(preds[:, 0, 0], 'g')
plt.plot(Y[:, 0], 'r')
plt.show()

preds = []
for i in range(Y_val.shape[0]):
    x, y = X_val[i], Y_val[i]
    prev_s = np.zeros((hidden_dim, 1))
    # For each time step...
    for t in range(T):
        mulu = np.dot(U, x)
        mulw = np.dot(W, prev_s)
        add = mulw + mulu
        s = sigmoid(add)
        mulv = np.dot(V, s)
        prev_s = s
    preds.append(mulv)

preds = np.array(preds)

plt.plot(preds[:, 0, 0], 'g')
plt.plot(Y_val[:, 0], 'r')
plt.show()

from sklearn.metrics import mean_squared_error

math.sqrt(mean_squared_error(Y_val[:, 0] * max_val, preds[:, 0, 0] * max_val))

0.127191931509431

I cannot stress enough how useful RNNs are when working with sequence data. I implore you all to take this learning and apply it to a dataset. Take an NLP problem and see if you can find a solution for it. You can always reach out to me in the comments section below if you have any questions.

In this article, we learned how to create a recurrent neural network model from scratch by using just the numpy library. You can of course use a high-level library like Keras or Caffe but it is essential to know the concept you’re implementing.

Do share your thoughts, questions and feedback regarding this article below. Happy learning!

Great article! How would the code need to be modified if more than one time series are used to make a prediction? For example: – predict next day temperature using the last 50-day temperature and last 50-day humidity level; or, – predict next day temperature and next day humidity level using the last 50-day temperature and last 50-day humidity level Thank you, Guy Aubin

This article is useful! But there is a little question I want to ask. In the last cell, ‘math.sqrt(mean_squared_error(Y_val[:, 0] * max_val, preds[:, 0, 0] * max_val))’ The object ‘max_val’ seems not be defined in the above code. What is the value(or meaning) of ‘max_val’? Thank you!

Hi, Thanks for a great note. I am getting an error while running the code. The error is coming from the last line. –>math.sqrt(mean_squared_error(Y_val[:, 0] * max_val, preds[:, 0, 0] * max_val)) How did you define “max_val” here?

Hi How can I work it With audio Data

Yes, I also greatly enjoy the explanations. I’ve tried the code and it works well except that prediction and actual signals are phase quadrature signals which doesn’t appear in your graphics

Thank you for this excellent article. Just wondering, shouldn’t this “new_input = np.zeros(x.shape)” come outside the for loop ? It would preserve the sequence and help the context vector ?

How to predict future values, you have used Train and Test data to predict, but how will you predict future say 20 values ?

How to predict future values , say for next 30 values ?

“` # derivative of pred dmulv = (mulv – y) # backward pass for t in range(T): dV_t = np.dot(dmulv, np.transpose(layers[t][‘s’])) dsv = np.dot(np.transpose(V), dmulv) “` quick question here: can we back propagate the derivative of prediction loss back to every t in the sequence in the `many-to-one` setting?

at the end: what is max_val?

Hi, Thanks for the article. Does RNN use one-hot encoding in each time step for time series data forecasting? for instance, input=[10,20, 30] In 1st time step input is [10, 0, 0], In 2nd time step input is [0, 20, 0], and In 3rd time step input is [0, 0, 30] Isn’t it? If yes, could you please share reference if you have it. Thanks in advance.

Can we use the same code for DNA Sequence?

Above End Notes, in the equations what is the max_val?

Thanks for the tutorial. How would we include static data as well as sequential data?

Hello, I’m very interested in the neural network code. However, the code on replit does not load. How else can you see this code?

Is there some reason why loss_per_record = (y – mulv)**2 / 2, not that loss_per_record = (y – mulv)**2 ?

Recurrent Neural Network Tutorial (RNN)


Python3


model = keras.models.Sequential()
model.add(keras.layers.Embedding(10000, 128))
model.add(keras.layers.Bidirectional(keras.layers.GRU(64, return_sequences=True)))
model.add(keras.layers.Bidirectional(keras.layers.GRU(64)))
model.add(keras.layers.Dense(128, activation="relu"))
model.add(keras.layers.Dropout(0.4))
# the unit count of the final Dense layer was lost in extraction; 1 is assumed for binary sentiment output
model.add(keras.layers.Dense(1, activation="sigmoid"))

model.compile("rmsprop", "binary_crossentropy", metrics=["accuracy"])

# the epoch count was lost in extraction; 5 is an assumed placeholder
history = model.fit(train_pad, y_train, epochs=5)

Output:

Training Process of the GRU model

If you compare the accuracy, the model performed better with LSTM and GRU layers than with the simple RNN. We shall conclude this article by discussing the applications where RNNs are widely used.



This tutorial demonstrates how to generate text using a character-based RNN. You will work with a dataset of Shakespeare’s writing from Andrej Karpathy’s The Unreasonable Effectiveness of Recurrent Neural Networks. Given a sequence of characters from this data (“Shakespear”), train a model to predict the next character in the sequence (“e”). Longer sequences of text can be generated by calling the model repeatedly.

This tutorial includes runnable code implemented using tf.keras and eager execution. The following is the sample output when the model in this tutorial trained for 30 epochs, and started with the prompt “Q”:

QUEENE: I had thought thou hadst a Roman; for the oracle, Thus by All bids the man against the word, Which are so weak of care, by old care done; Your children were in your holy love, And the precipitation through the bleeding throne. BISHOP OF ELY: Marry, and will, my lord, to weep in such a one were prettiest; Yet now I was adopted heir Of the world’s lamentable day, To watch the next way with his father with his face? ESCALUS: The cause why then we are all resolved more sons. VOLUMNIA: O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead, And love and pale as any will to that word. QUEEN ELIZABETH: But how long have I heard the soul for this world, And show his hands of life be proved to stand. PETRUCHIO: I say he look’d on, if I must be content To stay him from the fatal of our country’s bliss. His lordship pluck’d from this sentence then for prey, And then let us twain, being the moon, were she such a case as fills m

While some of the sentences are grammatical, most do not make sense. The model has not learned the meaning of words, but consider:

  • The model is character-based. When training started, the model did not know how to spell an English word, or that words were even a unit of text.

  • The structure of the output resembles a play—blocks of text generally begin with a speaker name, in all capital letters similar to the dataset.

  • As demonstrated below, the model is trained on small batches of text (100 characters each), and is still able to generate a longer sequence of text with coherent structure.

Why Recurrent Neural Networks?

Recurrent Neural Networks have unique capabilities compared to other kinds of neural networks, which open up a wide range of possibilities for their users while also bringing some challenges. Here's a rundown of the main benefits:

  • It is the only type of neural network with an internal memory, which lets it process sequential data.
  • It supports flexible input and output configurations. Unlike other algorithms that deliver one output for one input, an RNN can map many-to-many, one-to-many, and many-to-one relationships between inputs and outputs.

Export the generator

This single-step model can easily be saved and restored, allowing you to use it anywhere a tf.saved_model is accepted.


tf.saved_model.save(one_step_model, 'one_step')
one_step_reloaded = tf.saved_model.load('one_step')

WARNING:tensorflow:Skipping full serialization of Keras layer <__main__.OneStep object at 0x7f1a9c2e6880>, because it is not built. WARNING:tensorflow:Model’s `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config. WARNING:tensorflow:Model’s `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config. INFO:tensorflow:Assets written to: one_step/assets INFO:tensorflow:Assets written to: one_step/assets


states = None
next_char = tf.constant(['ROMEO:'])
result = [next_char]

for n in range(100):
    next_char, states = one_step_reloaded.generate_one_step(next_char, states=states)
    result.append(next_char)

print(tf.strings.join(result)[0].numpy().decode("utf-8"))

ROMEO: While shall we toward them, gaunt and married. Urped me say I doubt not, for this world is gentle,

A Caveat: Masking and Padding

Padding and masking are very useful techniques that will optimize our training. But what exactly is its function, and why do we need it?

Imagine you have a corpus of text. The sentences within this corpus are of different lengths. This means that every sentence has to be standardized to a common length.

Practice makes perfect
Perfection is a dream worth chasing

Becomes …

Practice makes perfect [pad] [pad] [pad] [pad] Perfection is a dream worth chasing [pad]

Now these [pad] tokens are of no use in the model except for standardizing the length, so we mask them while training. The process of masking and padding is demonstrated in Figure 5.
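A minimal sketch of padding and masking in Keras; the token ids, the common length of 7, and the embedding sizes are made-up values for illustration.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import pad_sequences

# token ids for two sentences of different lengths (ids are made up)
seqs = [[3, 7, 12], [5, 2, 9, 14, 6, 8]]

# pad every sentence to a common length of 7 with trailing [pad] (zero) tokens
padded = pad_sequences(seqs, maxlen=7, padding="post")
print(padded)

# mask_zero=True tells downstream layers to ignore the padded positions
embedding = keras.layers.Embedding(input_dim=20, output_dim=8, mask_zero=True)
embedded = embedding(padded)
print(embedded._keras_mask)   # boolean mask marking the real (non-pad) tokens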

Modeling Sequential Data with MLPs

But before we dive into modeling sequential data with Recurrent Neural Networks, let us first understand the shortcomings of the models we would have generally used.

What if we tried to model sequential data with Multilayer Perceptrons (MLPs)?

The answer lies in the question. We create a sequence of our data and pass it through the MLP. The MLP starts learning the data, but something is missing. The process is demonstrated in Figure 6.

Can you tell what is missing in Figure 6?

The MLP learns individual data points. Each time a data point is passed through the MLP, the color of the entire network changes. This signifies that the MLP models that particular data point. So far, so good.

But there is something else here: x_1 is succeeded by x_2, which is succeeded by x_3, and so on. This makes the data sequential. While the MLP can learn the individual data points perfectly, it will not be able to learn the order of the data.

And therein lies the problem.

The Recurrence Formula

To solve this problem, we need a function that takes care of the current and previous states.

x_t is the current input, y_t is the current output, and y_{t-1} is the past output. The function that models this relation is represented by f, which gives the recurrence y_t = f(y_{t-1}, x_t).

Let’s unfold this equation and understand its importance.

  • At t = 1: y_1 = f(y_0, x_1)
  • At t = 2: y_2 = f(y_1, x_2) = f(f(y_0, x_1), x_2)

It is quite evident from the above examples that the equation can model the previous states along with the current state. This is an extremely important equation we will use throughout our tutorial.

In the case of Recurrent Neural Networks, the function is a simple tanh that introduces nonlinearity, squashing its input into the range (-1, 1).

The recurrence formula for a Recurrent Neural Network can be represented as:

h_t = tanh(W_hh · h_{t-1} + W_xh · x_t)

h_t is called the hidden state of the network. Notice how the current hidden state is a function of the current input x_t and the previous hidden state h_{t-1}. This is demonstrated in Figure 7.
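As a sanity check on the formula, here is a minimal NumPy sketch (with arbitrary, made-up dimensions) that rolls the recurrence forward by hand, so you can see the hidden state being carried from one time step to the next:

# A toy, hand-rolled version of h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t).
import numpy as np

rng = np.random.default_rng(42)
input_dim, hidden_dim, timesteps = 3, 5, 4

W_xh = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
x = rng.normal(size=(timesteps, input_dim))       # a toy input sequence

h = np.zeros(hidden_dim)                          # initial hidden state h_0
for t in range(timesteps):
    h = np.tanh(W_hh @ h + W_xh @ x[t])           # current state depends on x_t and h_{t-1}
    print(f"h_{t + 1}: {np.round(h, 3)}")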

Recurrent Neural Network (an overview)

Now that we know the formula behind Recurrent Neural Networks, let us see how to build one. The diagram of a Recurrent Neural Network is shown in Figure 8.

An RNN cell consists of three parts:

  1. The input, x_t
  2. The output, y_t
  3. And the hidden state, h_t

The hidden state is fed back into the cell along with the input for the next time step. Fortunately, this is already implemented for us as the

SimpleRNN

layer inside the

tf.keras.layers

API.

The relation between the input, output, and the hidden state is demonstrated in Figure 9, which shows an unfolded RNN.
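As a quick, standalone shape check (toy tensors only, not part of the project code), the SimpleRNN layer can be called directly to see the difference between returning only the last hidden state and returning the full sequence:

# Toy shape check for tf.keras.layers.SimpleRNN.
import tensorflow as tf

x = tf.random.normal((2, 10, 8))                     # (batch, timesteps, features)
layer = tf.keras.layers.SimpleRNN(64)
print(layer(x).shape)                                # (2, 64): last hidden state only

layer_seq = tf.keras.layers.SimpleRNN(64, return_sequences=True)
print(layer_seq(x).shape)                            # (2, 10, 64): one output per timestep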

And now, we begin to build our RNN model. Let’s go through the

model.py

file.

# import the necessary packages
from tensorflow.keras import layers
from tensorflow import keras

We begin with the necessary imports on Lines 2-3.

def get_rnn_model(vocabSize):
    # input for variable-length sequences of integers
    inputs = keras.Input(shape=(None,), dtype="int32")
    # embed the tokens in a 128-dimensional vector with masking
    # applied and apply dropout
    x = layers.Embedding(vocabSize, 128, mask_zero=True)(inputs)
    x = layers.Dropout(0.2)(x)
    # add 3 simple RNNs
    x = layers.SimpleRNN(64, return_sequences=True)(x)
    x = layers.SimpleRNN(64, return_sequences=True)(x)
    x = layers.SimpleRNN(64)(x)
    # add a classifier head
    x = layers.Dense(units=64, activation="relu")(x)
    x = layers.Dense(units=32, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    # build the RNN model
    model = keras.Model(inputs, outputs, name="RNN")
    # return the RNN model
    return model

On Line 5, we define the

get_rnn_model

function, which we use to create the model. On Line 7, we create the input layer for variable-length sequences of integers. The padding and masking mechanism is taken care of by our

Embedding

layer, which we initialize in Line 11. The

Embedding

layer also embeds the variable-length sequence into a

128

-dimensional vector. For those unclear, consider this 128-dimensional embedding as the way the computer represents and assigns meaning to the text.

On Line 12, we apply dropout to the inputs. We add 3 simple RNN cells on Lines 15-17 and a classifier head with a sigmoid activation on Lines 20-23.

Finally, the model is built using keras.Model API on Line 26 and returned on Line 29.

Training and Visualizations

With the model created and ready, we can finally begin our training procedure. But before that, we must first look at some helper functions that will assist with visualizations and saving.

We begin with a function inside

plot.py

that will help us plot the loss and accuracy.

# import the necessary packages
import matplotlib.pyplot as plt

def plot_loss_accuracy(history, filepath):
    # plot the training and validation loss
    plt.style.use("ggplot")
    (fig, axs) = plt.subplots(2, 1)
    axs[0].plot(history["loss"], label="train_loss")
    axs[0].plot(history["val_loss"], label="val_loss")
    axs[0].set_xlabel("Epoch #")
    axs[0].set_ylabel("Loss")
    axs[0].legend()
    axs[1].plot(history["accuracy"], label="train_accuracy")
    axs[1].plot(history["val_accuracy"], label="val_accuracy")
    axs[1].set_xlabel("Epoch #")
    axs[1].set_ylabel("Accuracy")
    axs[1].legend()
    fig.savefig(filepath)

On Line 2, we begin by importing matplotlib. On Line 4, we define the

plot_loss_accuracy

function that takes the model

history

and

filepath

as input.

Next, on Lines 6-17, we plot the loss and accuracy for training and validation and save the figure in the specified

filepath

(Line 18).

Next, we define another helper function inside

save_load.py

to save the adapted vectorization layer for later use.

from tensorflow.keras.layers import TextVectorization
import tensorflow as tf
import pickle

def save_vectorizer(vectorizer, name):
    # pickle the weights of the vectorization layer
    pickle.dump({"weights": vectorizer.get_weights()},
        open(f"{name}.pkl", "wb"))

We begin with the necessary imports on Lines 1-3. Next, we define a function called

save_vectorizer

on Lines 5-8 that pickles and saves the weights of the vectorization layer.

And finally, with all the necessary functions defined, we can start with

train.py

, where we actually train our RNN model.

# USAGE
# python train.py

# set the seed for reproducibility
import tensorflow as tf
tf.keras.utils.set_random_seed(42)

# import the necessary packages
from pyimagesearch.standardization import custom_standardization
from pyimagesearch.plot import plot_loss_accuracy
from pyimagesearch.save_load import save_vectorizer
from pyimagesearch.dataset import get_imdb_dataset
from pyimagesearch.model import get_rnn_model
from pyimagesearch.model import get_lstm_model
from pyimagesearch import config
from tensorflow.keras import layers
from tensorflow import keras
import os

We begin with all the necessary imports on Lines 5-18.

# get the IMDB dataset
print("[INFO] getting the IMDB dataset...")
(trainDs, valDs) = get_imdb_dataset(folderName=config.DATASET_PATH,
    batchSize=config.BATCH_SIZE, bufferSize=config.BUFFER_SIZE,
    autotune=tf.data.AUTOTUNE, test=False)

# initialize the text vectorization layer
vectorizeLayer = layers.TextVectorization(
    max_tokens=config.VOCAB_SIZE,
    output_mode="int",
    output_sequence_length=config.MAX_SEQUENCE_LENGTH,
    standardize=custom_standardization,
)

# grab the text from the training dataset and adapt the text
# vectorization layer on it
trainText = trainDs.map(lambda text, label: text)
vectorizeLayer.adapt(trainText)

# vectorize the training and the validation dataset
trainDs = trainDs.map(lambda text, label: (vectorizeLayer(text), label))
valDs = valDs.map(lambda text, label: (vectorizeLayer(text), label))

# get the RNN model and compile it
print("[INFO] building the RNN model...")
modelRNN = get_rnn_model(vocabSize=config.VOCAB_SIZE)
modelRNN.compile(metrics=["accuracy"],
    optimizer=keras.optimizers.Adam(learning_rate=config.LR),
    loss=keras.losses.BinaryCrossentropy(from_logits=False),
)

# train the RNN model
print("[INFO] training the RNN model...")
historyRNN = modelRNN.fit(trainDs,
    epochs=config.EPOCHS,
    validation_data=valDs,
)

Next, on Lines 22-24, we get the IMDb dataset for movie reviews.

On Lines 27-32, we initialize the text vectorization layer with:


  • max_tokens

    : The maximum number of tokens inside the vocabulary.

  • output_mode

    : The data type for the output.

  • output_sequence_length

    : The maximum sequence length that we will need for padding and masking.

  • standardize

    : The custom standardization function that we defined previously.

On Lines 36 and 37, we adapt the vectorization layer on the training dataset.

When a text vectorization layer is initialized, it does not hold any information about the vocabulary of the training corpus. To build a vocabulary, we need to pass the entire training dataset through the vectorization layer. This is known as adapting to the text.
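As a small, self-contained illustration (toy corpus and made-up parameters, not our IMDB pipeline), this is what adapting a TextVectorization layer looks like in isolation:

# Illustrative only: adapt builds the vocabulary from the text it sees.
import tensorflow as tf

vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=20, output_mode="int", output_sequence_length=6)

corpus = tf.data.Dataset.from_tensor_slices(
    ["the movie was great", "the movie was terrible"])
vectorize_layer.adapt(corpus)

print(vectorize_layer.get_vocabulary())          # learned vocabulary; indices 0/1 are reserved
print(vectorize_layer(["the movie was great"]))  # integer ids, padded to length 6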

Finally, on Lines 40 and 41, we vectorize the text of the training and validation dataset using the adapted vectorization layer.

On Lines 44-49, we initialize the predefined RNN model and compile it with Adam optimizer and Binary Cross-Entropy loss.

On Lines 52-55, we fit the model on the training data and save its history onto the

historyRNN

variable.

# check whether the output folder exists, if not build the output folder
if not os.path.exists(config.OUTPUT_PATH):
    os.makedirs(config.OUTPUT_PATH)

# save the loss and accuracy plots of RNN and LSTM models
plot_loss_accuracy(history=historyRNN.history, filepath=config.RNN_PLOT)
plot_loss_accuracy(history=historyLSTM.history, filepath=config.LSTM_PLOT)

On Lines 72 and 73, we check whether the output path exists or create it otherwise. The

plot_loss_accuracy

function is called on Line 76 to visualize the loss and accuracy of the RNN model.

# save the trained RNN and LSTM models to disk
print(f"[INFO] saving the RNN model to {config.RNN_MODEL_PATH}...")
keras.models.save_model(model=modelRNN,
    filepath=config.RNN_MODEL_PATH, include_optimizer=False)
print(f"[INFO] saving the LSTM model to {config.LSTM_MODEL_PATH}...")
keras.models.save_model(model=modelLSTM,
    filepath=config.LSTM_MODEL_PATH, include_optimizer=False)

On Lines 80-82, we then save the trained RNN model on our disk.

# save the text vectorization layer to disk
save_vectorizer(vectorizer=vectorizeLayer, name=config.TEXT_VEC_PATH)

Finally, on Line 88, we save the text vectorization layer we used to vectorize the training and validation data.

We can verify the training and validation loss and accuracy from the model output. We reach 67.88% validation accuracy in just 10 epochs!

$ python train.py
[INFO] getting the IMDB dataset...
[INFO] building the RNN model...
[INFO] training the RNN model...
Epoch 1/10
22/22 [==============================] - 11s 329ms/step - loss: 0.7018 - accuracy: 0.4971 - val_loss: 0.6981 - val_accuracy: 0.4820
Epoch 2/10
22/22 [==============================] - 7s 320ms/step - loss: 0.6935 - accuracy: 0.5152 - val_loss: 0.6967 - val_accuracy: 0.4916
Epoch 3/10
22/22 [==============================] - 7s 293ms/step - loss: 0.6883 - accuracy: 0.5405 - val_loss: 0.6959 - val_accuracy: 0.5000
Epoch 4/10
22/22 [==============================] - 7s 307ms/step - loss: 0.6850 - accuracy: 0.5509 - val_loss: 0.6952 - val_accuracy: 0.5064
Epoch 5/10
22/22 [==============================] - 7s 303ms/step - loss: 0.6802 - accuracy: 0.5673 - val_loss: 0.6950 - val_accuracy: 0.5100
Epoch 6/10
22/22 [==============================] - 7s 302ms/step - loss: 0.6729 - accuracy: 0.5915 - val_loss: 0.6953 - val_accuracy: 0.5136
Epoch 7/10
22/22 [==============================] - 7s 294ms/step - loss: 0.6650 - accuracy: 0.6094 - val_loss: 0.6943 - val_accuracy: 0.5232
Epoch 8/10
22/22 [==============================] - 7s 303ms/step - loss: 0.6493 - accuracy: 0.6402 - val_loss: 0.6812 - val_accuracy: 0.5668
Epoch 9/10
22/22 [==============================] - 7s 294ms/step - loss: 0.6141 - accuracy: 0.6774 - val_loss: 0.6379 - val_accuracy: 0.6380
Epoch 10/10
22/22 [==============================] - 7s 296ms/step - loss: 0.5501 - accuracy: 0.7335 - val_loss: 0.5945 - val_accuracy: 0.6788

Loading and Inference

Now that the model has been trained and saved to disk, we need to perform inference to understand the model performance. Before starting with inference, let us first open

save_load.py

again and look at the

load_vectorizer

function.

def load_vectorizer(name, maxTokens, outputLength, standardize=None):
    # load the pickled data
    fromDisk = pickle.load(open(f"{name}.pkl", "rb"))
    # build a new vectorization layer
    newVectorizer = TextVectorization(max_tokens=maxTokens,
        output_mode="int", output_sequence_length=outputLength,
        standardize=standardize)
    # call the adapt method with some dummy data for the vectorization
    # layer to initialize properly
    newVectorizer.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
    newVectorizer.set_weights(fromDisk["weights"])
    # return the vectorization layer
    return newVectorizer

We define the

load_vectorizer

function on Line 10, which takes the following input:


  • name

    : the file path of the saved

    TextVectorization

    layer weights

  • maxTokens

    : the maximum number of tokens in the vocabulary

  • outputLength

    : the length of the output sequence

  • standardize

    : we do not need any standardization function here

In the training pipeline, we build a

TextVectorization

layer to tokenize and vectorize our training data. In the inference pipeline, we need the same TextVectorization layer as the training pipeline.

To do this, we save the weights of the adapted

TextVectorization

layer and load the weights on top of a newly initialized layer. On Line 12, we load the weights of the saved

TextVectorization

layer. On Line 25, we return the loaded vectorizer.

# USAGE
# python inference.py

# import the necessary packages
from pyimagesearch.standardization import custom_standardization
from pyimagesearch.save_load import load_vectorizer
from pyimagesearch.dataset import get_imdb_dataset
from pyimagesearch import config
from tensorflow import keras
import tensorflow as tf

# load the pre-trained RNN and LSTM model
print("[INFO] loading the pre-trained RNN model...")
modelRnn = keras.models.load_model(filepath=config.RNN_MODEL_PATH)
modelRnn.compile(optimizer="adam", metrics=["accuracy"],
    loss=keras.losses.BinaryCrossentropy(from_logits=False),
)
print("[INFO] loading the pre-trained LSTM model...")
modelLstm = keras.models.load_model(filepath=config.LSTM_MODEL_PATH)
modelLstm.compile(optimizer="adam", metrics=["accuracy"],
    loss=keras.losses.BinaryCrossentropy(from_logits=False),
)

On Lines 5-10, we import the necessary packages. On Line 14, the saved RNN model is loaded back from disk. The loaded model is then compiled with the suitable metrics, loss, and optimizer on Lines 15-17.

# get the IMDB dataset
print("[INFO] getting the IMDB test dataset...")
testDs = get_imdb_dataset(folderName=config.DATASET_PATH,
    batchSize=config.BATCH_SIZE, bufferSize=config.BUFFER_SIZE,
    autotune=tf.data.AUTOTUNE, test=True)

# load the pre-trained text vectorization layer
vectorizeLayer = load_vectorizer(name=config.TEXT_VEC_PATH,
    maxTokens=config.VOCAB_SIZE,
    outputLength=config.MAX_SEQUENCE_LENGTH,
    standardize=custom_standardization)

# vectorize the test dataset
testDs = testDs.map(lambda text, label: (vectorizeLayer(text), label))

# evaluate the trained RNN and LSTM model on the test dataset
for model in [modelRnn, modelLstm]:
    print(f"[INFO] test evaluation for {model.name}:")
    (testLoss, testAccuracy) = model.evaluate(testDs)
    print(f"\t[INFO] test loss: {testLoss:0.2f}")
    print(f"\t[INFO] test accuracy: {testAccuracy * 100:0.2f}%")

On Lines 26-28, we get the testing dataset and pre-process it. Line 37 will map the dataset to get the vectorized tokens and labels.

On Lines 40-44, we evaluate the testing accuracy and the testing loss of the RNN model. Our model achieves 68.42% accuracy at inference!

$ python inference.py [INFO] loading the pre-trained RNN model… [INFO] loading the pre-trained LSTM model… [INFO] getting the IMDB test dataset… [INFO] test evaluation for RNN: 25/25 [==============================] – 4s 96ms/step – loss: 0.6035 – accuracy: 0.6842 [INFO] test loss: 0.60 [INFO] test accuracy: 68.42%


Cross-batch statefulness

When processing very long sequences (possibly infinite), you may want to use the pattern of cross-batch statefulness.

Normally, the internal state of a RNN layer is reset every time it sees a new batch (i.e. every sample seen by the layer is assumed to be independent of the past). The layer will only maintain a state while processing a given sample.

If you have very long sequences though, it is useful to break them into shorter sequences, and to feed these shorter sequences sequentially into a RNN layer without resetting the layer’s state. That way, the layer can retain information about the entirety of the sequence, even though it’s only seeing one sub-sequence at a time.

You can do this by setting

stateful=True

in the constructor.

If you have a sequence

s = [t0, t1, ... t1546, t1547]

, you would split it into e.g.


s1 = [t0, t1, ... t100] s2 = [t101, ... t201] ... s16 = [t1501, ... t1547]

Then you would process it via:


lstm_layer = layers.LSTM(64, stateful=True) for s in sub_sequences: output = lstm_layer(s)

When you want to clear the state, you can use

layer.reset_states()

.

Here is a complete example:


paragraph1 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph2 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph3 = np.random.random((20, 10, 50)).astype(np.float32)

lstm_layer = layers.LSTM(64, stateful=True)
output = lstm_layer(paragraph1)
output = lstm_layer(paragraph2)
output = lstm_layer(paragraph3)

# reset_states() will reset the cached state to the original initial_state.
# If no initial_state was provided, zero-states will be used by default.
lstm_layer.reset_states()

RNN State Reuse

The recorded states of the RNN layer are not included in the

layer.weights()

. If you
would like to reuse the state from a RNN layer, you can retrieve the states value by

layer.states

and use it as the
initial state for a new layer via the Keras functional API like

new_layer(inputs,
initial_state=layer.states)

, or model subclassing.

Please also note that a Sequential model cannot be used in this case, since it only supports layers with a single input and output; the extra initial-state input makes it impossible to use here.


paragraph1 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph2 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph3 = np.random.random((20, 10, 50)).astype(np.float32)

lstm_layer = layers.LSTM(64, stateful=True)
output = lstm_layer(paragraph1)
output = lstm_layer(paragraph2)

existing_state = lstm_layer.states

new_lstm_layer = layers.LSTM(64)
new_output = new_lstm_layer(paragraph3, initial_state=existing_state)

Built-in RNN layers: a simple example

There are three built-in RNN layers in Keras:


  1. keras.layers.SimpleRNN

    , a fully-connected RNN where the output from the previous timestep is fed to the next timestep.

  2. keras.layers.GRU

    , first proposed in Cho et al., 2014.

  3. keras.layers.LSTM

    , first proposed in Hochreiter & Schmidhuber, 1997.

In early 2015, Keras had the first reusable open-source Python implementations of LSTM and GRU.

Here is a simple example of a

Sequential

model that processes sequences of integers,
embeds each integer into a 64-dimensional vector, then processes the sequence of
vectors using a

LSTM

layer.


model = keras.Sequential()
# Add an Embedding layer expecting input vocab of size 1000, and
# output embedding dimension of size 64.
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# Add a LSTM layer with 128 internal units.
model.add(layers.LSTM(128))

# Add a Dense layer with 10 units.
model.add(layers.Dense(10))

model.summary()

Model: “sequential” _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= embedding (Embedding) (None, None, 64) 64000 lstm (LSTM) (None, 128) 98816 dense (Dense) (None, 10) 1290 ================================================================= Total params: 164106 (641.04 KB) Trainable params: 164106 (641.04 KB) Non-trainable params: 0 (0.00 Byte) _________________________________________________________________

Built-in RNNs support a number of useful features:

  • Recurrent dropout, via the

    dropout

    and

    recurrent_dropout

    arguments
  • Ability to process an input sequence in reverse, via the

    go_backwards

    argument
  • Loop unrolling (which can lead to a large speedup when processing short sequences on
    CPU), via the

    unroll

    argument
  • …and more.

For more information, see the RNN API documentation.
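As an illustrative example (the values here are chosen arbitrarily, not taken from the guide), several of these options can be combined on a single built-in layer:

# Toy example combining some of the built-in RNN options listed above.
import tensorflow as tf
from tensorflow.keras import layers

lstm = layers.LSTM(
    32,
    dropout=0.2,            # dropout on the inputs
    recurrent_dropout=0.2,  # dropout on the recurrent state
    go_backwards=True,      # process the sequence in reverse
    unroll=False,           # set True to unroll the loop for short sequences on CPU
)

x = tf.random.normal((4, 10, 8))   # (batch, timesteps, features)
print(lstm(x).shape)               # (4, 32)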


Performance optimization and CuDNN kernels

In TensorFlow 2.0, the built-in LSTM and GRU layers have been updated to leverage CuDNN
kernels by default when a GPU is available. With this change, the prior

keras.layers.CuDNNLSTM/CuDNNGRU

layers have been deprecated, and you can build your
model without worrying about the hardware it will run on.

Since the CuDNN kernel is built with certain assumptions, this means the layer will not be able to use the CuDNN kernel if you change the defaults of the built-in LSTM or GRU layers. E.g.:

  • Changing the

    activation

    function from

    tanh

    to something else.
  • Changing the

    recurrent_activation

    function from

    sigmoid

    to something else.
  • Using

    recurrent_dropout

    > 0.
  • Setting

    unroll

    to True, which forces LSTM/GRU to decompose the inner

    tf.while_loop

    into an unrolled

    for

    loop.
  • Setting

    use_bias

    to False.
  • Using masking when the input data is not strictly right padded (if the mask corresponds to strictly right padded data, CuDNN can still be used. This is the most common case).

For the detailed list of constraints, please see the documentation for the LSTM and GRU layers.
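For instance, the following comparison (illustrative values, not from the guide itself) shows a default LSTM, which is eligible for the CuDNN kernel on GPU, next to one whose non-default recurrent_dropout forces the generic kernel:

# Same math, potentially very different speed on GPU.
import tensorflow as tf
from tensorflow.keras import layers

cudnn_ok = layers.LSTM(64)                               # default activations: CuDNN-eligible
cudnn_fallback = layers.LSTM(64, recurrent_dropout=0.2)  # recurrent_dropout > 0 disables CuDNN

x = tf.random.normal((8, 20, 16))
print(cudnn_ok(x).shape, cudnn_fallback(x).shape)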

Using CuDNN kernels when available

Let’s build a simple LSTM model to demonstrate the performance difference.

We’ll use as input sequences the sequence of rows of MNIST digits (treating each row of pixels as a timestep), and we’ll predict the digit’s label.


batch_size = 64
# Each MNIST image batch is a tensor of shape (batch_size, 28, 28).
# Each input sequence will be of size (28, 28) (height is treated like time).
input_dim = 28

units = 64
output_size = 10  # labels are from 0 to 9


# Build the RNN model
def build_model(allow_cudnn_kernel=True):
    # CuDNN is only available at the layer level, and not at the cell level.
    # This means `LSTM(units)` will use the CuDNN kernel,
    # while RNN(LSTMCell(units)) will run on non-CuDNN kernel.
    if allow_cudnn_kernel:
        # The LSTM layer with default options uses CuDNN.
        lstm_layer = keras.layers.LSTM(units, input_shape=(None, input_dim))
    else:
        # Wrapping a LSTMCell in a RNN layer will not use CuDNN.
        lstm_layer = keras.layers.RNN(
            keras.layers.LSTMCell(units), input_shape=(None, input_dim)
        )
    model = keras.models.Sequential(
        [
            lstm_layer,
            keras.layers.BatchNormalization(),
            keras.layers.Dense(output_size),
        ]
    )
    return model

Let’s load the MNIST dataset:


mnist = keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
sample, sample_label = x_train[0], y_train[0]

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz 11490434/11490434 [==============================] – 0s 0us/step

Let’s create a model instance and train it.

We choose

sparse_categorical_crossentropy

as the loss function for the model. The
output of the model has shape of

[batch_size, 10]

. The target for the model is an
integer vector, each of the integer is in the range of 0 to 9.


model = build_model(allow_cudnn_kernel=True)

model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)

model.fit(
    x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1
)

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1700136618.250305 9824 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process. 938/938 [==============================] – 7s 5ms/step – loss: 0.9965 – accuracy: 0.6845 – val_loss: 0.5699 – val_accuracy: 0.8181

Now, let’s compare to a model that does not use the CuDNN kernel:


noncudnn_model = build_model(allow_cudnn_kernel=False)
noncudnn_model.set_weights(model.get_weights())
noncudnn_model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)
noncudnn_model.fit(
    x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1
)

938/938 [==============================] – 20s 20ms/step – loss: 0.4268 – accuracy: 0.8698 – val_loss: 0.3017 – val_accuracy: 0.9145

When running on a machine with a NVIDIA GPU and CuDNN installed, the model built with CuDNN is much faster to train compared to the model that uses the regular TensorFlow kernel.

The same CuDNN-enabled model can also be used to run inference in a CPU-only
environment. The

tf.device

annotation below is just forcing the device placement.
The model will run on CPU by default if no GPU is available.

You simply don’t have to worry about the hardware you’re running on anymore. Isn’t that pretty cool?


import matplotlib.pyplot as plt

with tf.device("CPU:0"):
    cpu_model = build_model(allow_cudnn_kernel=True)
    cpu_model.set_weights(model.get_weights())
    result = tf.argmax(cpu_model.predict_on_batch(tf.expand_dims(sample, 0)), axis=1)
    print(
        "Predicted result is: %s, target result is: %s" % (result.numpy(), sample_label)
    )
    plt.imshow(sample, cmap=plt.get_cmap("gray"))

Predicted result is: [3], target result is: 5

Setup

Import TensorFlow and other libraries


import tensorflow as tf
import numpy as np
import os
import time

2023-11-16 12:28:52.207051: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2023-11-16 12:28:52.207090: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2023-11-16 12:28:52.208630: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

Download the Shakespeare dataset

Change the following line to run this code on your own data.


path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt 1115394/1115394 [==============================] – 0s 0us/step

Read the data

First, look in the text:


# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print(f'Length of text: {len(text)} characters')

Length of text: 1115394 characters


# Take a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.


# The unique characters in the file
vocab = sorted(set(text))
print(f'{len(vocab)} unique characters')

65 unique characters


Types of Recurrent Neural Networks

Feedforward networks have a single input and output, while recurrent neural networks are flexible, as the lengths of their inputs and outputs can vary. This flexibility allows RNNs to be used for tasks such as music generation, sentiment classification, and machine translation.

There are four types of RNN, based on the different lengths of inputs and outputs; the short Keras sketch after this list shows how each pattern is commonly expressed.

  • One-to-one is a simple neural network. It is commonly used for machine learning problems that have a single input and output.
  • One-to-many has a single input and multiple outputs. This is used for generating image captions.
  • Many-to-one takes a sequence of multiple inputs and predicts a single output. It is popular in sentiment classification, where the input is text and the output is a category.
  • Many-to-many takes multiple inputs and outputs. The most common application is machine translation.
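Below is a hypothetical sketch of how these patterns are often expressed in Keras; the layer choices and sizes are purely illustrative and are not taken from this article:

# Toy models illustrating many-to-one, many-to-many, and one-to-many patterns.
import tensorflow as tf
from tensorflow.keras import layers

# many-to-one: read a sequence, emit a single vector (e.g., sentiment classification)
many_to_one = tf.keras.Sequential([
    layers.Input(shape=(20, 8)),
    layers.SimpleRNN(32),                          # last hidden state only
    layers.Dense(1, activation="sigmoid"),
])

# many-to-many: emit one prediction per timestep (e.g., sequence tagging)
many_to_many = tf.keras.Sequential([
    layers.Input(shape=(20, 8)),
    layers.SimpleRNN(32, return_sequences=True),
    layers.TimeDistributed(layers.Dense(5, activation="softmax")),
])

# one-to-many: repeat a single input vector across timesteps (e.g., caption-style decoding)
one_to_many = tf.keras.Sequential([
    layers.Input(shape=(8,)),
    layers.RepeatVector(20),
    layers.SimpleRNN(32, return_sequences=True),
    layers.TimeDistributed(layers.Dense(5, activation="softmax")),
])

print(many_to_one.output_shape, many_to_many.output_shape, one_to_many.output_shape)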


Outputs and states

By default, the output of a RNN layer contains a single vector per sample. This vector
is the RNN cell output corresponding to the last timestep, containing information
about the entire input sequence. The shape of this output is

(batch_size, units)

where

units

corresponds to the

units

argument passed to the layer’s constructor.

A RNN layer can also return the entire sequence of outputs for each sample (one vector
per timestep per sample), if you set

return_sequences=True

. The shape of this output
is

(batch_size, timesteps, units)

.


model = keras.Sequential()
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# The output of GRU will be a 3D tensor of shape (batch_size, timesteps, 256)
model.add(layers.GRU(256, return_sequences=True))

# The output of SimpleRNN will be a 2D tensor of shape (batch_size, 128)
model.add(layers.SimpleRNN(128))

model.add(layers.Dense(10))

model.summary()

Model: “sequential_1” _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= embedding_1 (Embedding) (None, None, 64) 64000 gru (GRU) (None, None, 256) 247296 simple_rnn (SimpleRNN) (None, 128) 49280 dense_1 (Dense) (None, 10) 1290 ================================================================= Total params: 361866 (1.38 MB) Trainable params: 361866 (1.38 MB) Non-trainable params: 0 (0.00 Byte) _________________________________________________________________

In addition, a RNN layer can return its final internal state(s). The returned states can be used to resume the RNN execution later, or to initialize another RNN. This setting is commonly used in the encoder-decoder sequence-to-sequence model, where the encoder final state is used as the initial state of the decoder.

To configure a RNN layer to return its internal state, set the

return_state

parameter
to

True

when creating the layer. Note that

LSTM

has 2 state tensors, but

GRU

only has one.

To configure the initial state of the layer, just call the layer with additional
keyword argument

initial_state

.
Note that the shape of the state needs to match the unit size of the layer, like in the
example below.


encoder_vocab = 1000
decoder_vocab = 2000

encoder_input = layers.Input(shape=(None,))
encoder_embedded = layers.Embedding(input_dim=encoder_vocab, output_dim=64)(
    encoder_input
)

# Return states in addition to output
output, state_h, state_c = layers.LSTM(64, return_state=True, name="encoder")(
    encoder_embedded
)
encoder_state = [state_h, state_c]

decoder_input = layers.Input(shape=(None,))
decoder_embedded = layers.Embedding(input_dim=decoder_vocab, output_dim=64)(
    decoder_input
)

# Pass the 2 states to a new LSTM layer, as initial state
decoder_output = layers.LSTM(64, name="decoder")(
    decoder_embedded, initial_state=encoder_state
)
output = layers.Dense(10)(decoder_output)

model = keras.Model([encoder_input, decoder_input], output)
model.summary()

Model: “model” __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_1 (InputLayer) [(None, None)] 0 [] input_2 (InputLayer) [(None, None)] 0 [] embedding_2 (Embedding) (None, None, 64) 64000 [‘input_1[0][0]’] embedding_3 (Embedding) (None, None, 64) 128000 [‘input_2[0][0]’] encoder (LSTM) [(None, 64), 33024 [’embedding_2[0][0]’] (None, 64), (None, 64)] decoder (LSTM) (None, 64) 33024 [’embedding_3[0][0]’, ‘encoder[0][1]’, ‘encoder[0][2]’] dense_2 (Dense) (None, 10) 650 [‘decoder[0][0]’] ================================================================================================== Total params: 258698 (1010.54 KB) Trainable params: 258698 (1010.54 KB) Non-trainable params: 0 (0.00 Byte) __________________________________________________________________________________________________

RNN layers and RNN cells

In addition to the built-in RNN layers, the RNN API also provides cell-level APIs. Unlike RNN layers, which process whole batches of input sequences, an RNN cell only processes a single timestep.

The cell is the inside of the

for

loop of a RNN layer. Wrapping a cell inside a

keras.layers.RNN

layer gives you a layer capable of processing batches of
sequences, e.g.

RNN(LSTMCell(10))

.

Mathematically,

RNN(LSTMCell(10))

produces the same result as

LSTM(10)

. In fact,
the implementation of this layer in TF v1.x was just creating the corresponding RNN
cell and wrapping it in a RNN layer. However using the built-in

GRU

and

LSTM

layers enable the use of CuDNN and you may see better performance.

There are three built-in RNN cells, each of them corresponding to the matching RNN layer.


  • keras.layers.SimpleRNNCell

    corresponds to the

    SimpleRNN

    layer.

  • keras.layers.GRUCell

    corresponds to the

    GRU

    layer.

  • keras.layers.LSTMCell

    corresponds to the

    LSTM

    layer.

The cell abstraction, together with the generic

keras.layers.RNN

class, make it
very easy to implement custom RNN architectures for your research.
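As a quick check (toy shapes only), wrapping a cell in keras.layers.RNN behaves like the corresponding built-in layer in terms of output shape:

# Compare a built-in LSTM layer with a generic RNN wrapper around an LSTMCell.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((4, 12, 8))                  # (batch, timesteps, features)

built_in = layers.LSTM(10)                        # may use the fused/CuDNN implementation
from_cell = layers.RNN(layers.LSTMCell(10))       # generic loop around a single-step cell

print(built_in(x).shape, from_cell(x).shape)      # both (4, 10)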


Limitations of RNN

In theory, an RNN is supposed to carry information across many time steps. However, it is quite challenging to propagate all this information when the time span is too long. When a network has too many deep layers, it becomes untrainable. This problem is called the vanishing gradient problem. If you remember, the neural network updates the weights using the gradient descent algorithm, and the gradients grow smaller and smaller as they are propagated down to the lower layers.

When the gradients become that small, the weights stay essentially constant, meaning there is no room for improvement. The model learns from a change in the gradient; this change affects the network's output. However, if the difference in the gradient is too small (i.e., the weights change only a little), the network cannot learn anything, and so the output cannot improve. Therefore, a network facing a vanishing gradient problem cannot converge toward a good solution.


MasterCard Stock Price Prediction Using LSTM & GRU

In this project, we are going to use Kaggle’s MasterCard stock dataset from May-25-2006 to Oct-11-2021 and train the LSTM and GRU models to forecast the stock price. This is a simple project-based tutorial where we will analyze data, preprocess the data to train it on advanced RNN models, and finally evaluate the results.

The project requires Pandas and Numpy for data manipulation, Matplotlib.pyplot for data visualization, scikit-learn for scaling and evaluation, and TensorFlow for modeling. We will also set seeds for reproducibility.


# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout, GRU, Bidirectional
from tensorflow.keras.optimizers import SGD
from tensorflow.random import set_seed

set_seed(455)
np.random.seed(455)

Data Analysis

In this part, we will import the MasterCard dataset by adding the Date column to the index and converting it to DateTime format. We will also drop irrelevant columns from the dataset as we are only interested in stock prices, volume, and date.

The dataset has Date as index and Open, High, Low, Close, and Volume as columns. It looks like we have successfully imported a cleaned dataset.


dataset = pd.read_csv(
    "data/Mastercard_stock_history.csv", index_col="Date", parse_dates=["Date"]
).drop(["Dividends", "Stock Splits"], axis=1)
print(dataset.head())

                Open      High       Low     Close     Volume
Date
2006-05-25  3.748967  4.283869  3.739664  4.279217  395343000
2006-05-26  4.307126  4.348058  4.103398  4.179680  103044000
2006-05-30  4.183400  4.184330  3.986184  4.093164   49898000
2006-05-31  4.125723  4.219679  4.125723  4.180608   30002000
2006-06-01  4.179678  4.474572  4.176887  4.419686   62344000

The .describe() function helps us analyze the data in depth. Let’s focus on the High column as we are going to use it to train the model. We can also choose Close or Open columns for a model feature, but High makes more sense as it provides us information of how high the values of the share went on the given day.

The minimum stock price is $4.10, and the highest is $400.5. The mean is at $105.9 and the standard deviation $107.3, which means that stocks have high variance.


print(dataset.describe())

              Open         High          Low        Close        Volume
count  3872.000000  3872.000000  3872.000000  3872.000000  3.872000e+03
mean    104.896814   105.956054   103.769349   104.882714  1.232250e+07
std     106.245511   107.303589   105.050064   106.168693  1.759665e+07
min       3.748967     4.102467     3.739664     4.083861  6.411000e+05
25%      22.347203    22.637997    22.034458    22.300391  3.529475e+06
50%      70.810079    71.375896    70.224002    70.856083  5.891750e+06
75%     147.688448   148.645373   146.822013   147.688438  1.319775e+07
max     392.653890   400.521479   389.747812   394.685730  3.953430e+08

By using .isna().sum() we can determine the missing values in the dataset. It seems that the dataset has no missing values.


dataset.isna().sum()

Open      0
High      0
Low       0
Close     0
Volume    0
dtype: int64

The train_test_plot function takes three arguments: dataset, tstart, and tend and plots a simple line graph. The tstart and tend are time limits in years. We can change these arguments to analyze specific periods. The line plot is divided into two parts: train and test. This will allow us to decide the distribution of the test dataset.

MasterCard stock prices have been on the rise since 2016. The price dipped in the first quarter of 2020 but recovered to a stable position in the latter half of the year. Our test dataset covers the data from 2021 onward, and the rest of the dataset is used for training.


tstart = 2016
tend = 2020

def train_test_plot(dataset, tstart, tend):
    dataset.loc[f"{tstart}":f"{tend}", "High"].plot(figsize=(16, 4), legend=True)
    dataset.loc[f"{tend+1}":, "High"].plot(figsize=(16, 4), legend=True)
    plt.legend([f"Train (Before {tend+1})", f"Test ({tend+1} and beyond)"])
    plt.title("MasterCard stock price")
    plt.show()

train_test_plot(dataset,tstart,tend)

Data Preprocessing

The train_test_split function divides the dataset into two subsets: training_set and test_set.


def train_test_split(dataset, tstart, tend):
    train = dataset.loc[f"{tstart}":f"{tend}", "High"].values
    test = dataset.loc[f"{tend+1}":, "High"].values
    return train, test

training_set, test_set = train_test_split(dataset, tstart, tend)

We will use MinMaxScaler to scale our training set to the range (0, 1), which keeps the features on a comparable scale and makes training more stable. You can also try StandardScaler or another scaler to normalize your data and potentially improve model performance.


sc = MinMaxScaler(feature_range=(0, 1))
training_set = training_set.reshape(-1, 1)
training_set_scaled = sc.fit_transform(training_set)

The split_sequence function uses a training dataset and converts it into inputs (X_train) and outputs (y_train).

For example, if the sequence is [1,2,3,4,5,6,7,8,9,10,11,12] and n_steps is three, then it will convert the sequence into windows of three input timesteps and one output, as shown below:

1,2,3 → 4
2,3,4 → 5
3,4,5 → 6
4,5,6 → 7

In this project, we are using 60 n_steps. We can also reduce or increase the number of steps to optimize model performance.


def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps
        if end_ix > len(sequence) - 1:
            break
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

n_steps = 60
features = 1
# split into samples
X_train, y_train = split_sequence(training_set_scaled, n_steps)

We are working with univariate series, so the number of features is one, and we need to reshape the X_train to fit on the LSTM model. The X_train has [samples, timesteps], and we will reshape it to [samples, timesteps, features].


# Reshaping X_train for model
X_train = X_train.reshape(X_train.shape[0],X_train.shape[1],features)

LSTM Model

The model consists of a single hidden layer of LSTM and an output layer. You can experiment with the number of units, as more units will give you better results. For this experiment, we will set LSTM units to 125, tanh as activation, and set input size.

Author’s Note: Tensorflow library is user-friendly, so we don’t have to create LSTM or GRU models from scratch. We will simply use the LSTM or GRU modules to construct the model.

Finally, we will compile the model with an RMSprop optimizer and mean square error as a loss function.


# The LSTM architecture
model_lstm = Sequential()
model_lstm.add(LSTM(units=125, activation="tanh", input_shape=(n_steps, features)))
model_lstm.add(Dense(units=1))
# Compiling the model
model_lstm.compile(optimizer="RMSprop", loss="mse")
model_lstm.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 125)               63500
_________________________________________________________________
dense (Dense)                (None, 1)                 126
=================================================================
Total params: 63,626
Trainable params: 63,626
Non-trainable params: 0
_________________________________________________________________

The model will train on 50 epochs with 32 batch sizes. You can change the hyperparameters to reduce training time or improve the results. The model training was successfully completed with the best possible loss.


model_lstm.fit(X_train, y_train, epochs=50, batch_size=32)

Epoch 50/50
38/38 [==============================] - 1s 30ms/step - loss: 3.1642e-04

Results

We are going to repeat the preprocessing steps and normalize the test set: first we scale the inputs, then split them into samples, reshape them, run the prediction, and finally inverse-transform the predictions back to the original scale.


dataset_total = dataset.loc[:,"High"]
inputs = dataset_total[len(dataset_total) - len(test_set) - n_steps :].values
inputs = inputs.reshape(-1, 1)
#scaling
inputs = sc.transform(inputs)

# Split into samples
X_test, y_test = split_sequence(inputs, n_steps)
# reshape
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], features)
#prediction
predicted_stock_price = model_lstm.predict(X_test)
#inverse transform the values
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

The plot_predictions function will plot a real versus predicted line chart. This will help us visualize the difference between actual and predicted values.

The return_rmse function takes in test and predicted arguments and prints out the root mean square error (rmse) metric.


def plot_predictions(test, predicted):
    plt.plot(test, color="gray", label="Real")
    plt.plot(predicted, color="red", label="Predicted")
    plt.title("MasterCard Stock Price Prediction")
    plt.xlabel("Time")
    plt.ylabel("MasterCard Stock Price")
    plt.legend()
    plt.show()

def return_rmse(test, predicted):
    rmse = np.sqrt(mean_squared_error(test, predicted))
    print("The root mean squared error is {:.2f}.".format(rmse))

According to the line plot below, the single-layered LSTM model has performed well.


plot_predictions(test_set,predicted_stock_price)

The results look promising as the model got 6.70 rmse on the test dataset.


return_rmse(test_set,predicted_stock_price) >>> The root mean squared error is 6.70.

GRU Model

We are going to keep everything the same and just replace the LSTM layer with the GRU layer to properly compare the results. The model structure contains a single GRU layer with 125 units and an output layer.


model_gru = Sequential()
model_gru.add(GRU(units=125, activation="tanh", input_shape=(n_steps, features)))
model_gru.add(Dense(units=1))
# Compiling the RNN
model_gru.compile(optimizer="RMSprop", loss="mse")
model_gru.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
gru_4 (GRU)                  (None, 125)               48000
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 126
=================================================================
Total params: 48,126
Trainable params: 48,126
Non-trainable params: 0
_________________________________________________________________

The model has successfully trained with 50 epochs and a batch size of 32.


model_gru.fit(X_train, y_train, epochs=50, batch_size=32)

Epoch 50/50
38/38 [==============================] - 1s 29ms/step - loss: 2.6691e-04

Results

As we can see, the real and predicted values are relatively close. The predicted line chart almost fits the actual values.


GRU_predicted_stock_price = model_gru.predict(X_test)
GRU_predicted_stock_price = sc.inverse_transform(GRU_predicted_stock_price)
plot_predictions(test_set, GRU_predicted_stock_price)

GRU model got 5.50 rmse on the test dataset, which is an improvement from the LSTM model.


return_rmse(test_set,GRU_predicted_stock_price) >>> The root mean squared error is 5.50.

Training Recurrent Neural Networks (RNN)

  • To train an RNN, the trick is to unroll it through time and then use regular backpropagation. This strategy is known as backpropagation through time (BPTT).
  • First, there is a forward pass through the unrolled network. Then the output sequence is evaluated using a cost function C.
  • The gradients of that cost function are then propagated backward through the unrolled network.
  • Finally, the model parameters are updated using the gradients computed during BPTT (see the sketch below).
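Here is a minimal sketch (toy data and arbitrary sizes, not any specific tutorial's code) of a single BPTT step written out with tf.GradientTape, following the four bullets above:

# One illustrative BPTT training step on random data.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(10, 3)),
    layers.SimpleRNN(16),
    layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()

x = tf.random.normal((8, 10, 3))        # a batch of toy sequences
y = tf.random.normal((8, 1))            # toy targets

with tf.GradientTape() as tape:
    preds = model(x, training=True)     # forward pass through the unrolled network
    cost = loss_fn(y, preds)            # evaluate the output with the cost function C
grads = tape.gradient(cost, model.trainable_variables)            # backpropagation through time
optimizer.apply_gradients(zip(grads, model.trainable_variables))  # update the parameters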

Limitations of RNN

Simple RNN models usually run into two major issues. Both are related to the gradient, which is the slope of the loss function with respect to the model weights.

  1. Vanishing Gradient problem occurs when the gradient becomes so small that updating parameters becomes insignificant; eventually the algorithm stops learning.
  2. Exploding Gradient problem occurs when the gradient becomes too large, which makes the model unstable. In this case, larger error gradients accumulate, and the model weights become too large. This issue can cause longer training times and poor model performance.

The simple solution to these issues is to reduce the number of hidden layers within the neural network, which will reduce some complexity in RNNs. These issues can also be solved by using advanced RNN architectures such as LSTM and GRU.
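As an illustrative sketch (not from the article), two of these mitigations look like this in Keras: swapping the SimpleRNN for a gated layer such as LSTM, and clipping gradients so they cannot explode:

# Hypothetical example: a gated layer plus gradient clipping via the optimizer.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(50, 8)),
    layers.LSTM(64),                 # gated architecture instead of a plain SimpleRNN
    layers.Dense(1, activation="sigmoid"),
])

# clipnorm rescales any gradient whose norm exceeds 1.0, taming exploding gradients
model.compile(optimizer=tf.keras.optimizers.Adam(clipnorm=1.0),
              loss="binary_crossentropy", metrics=["accuracy"])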

Why do we need a Recurrent Neural Network (RNN)?

Recurrent Neural Network (RNN) allows you to model memory units to persist data and model short term dependencies. It is also used in time-series forecasting for the identification of data correlations and patterns. It also helps to produce predictive results for sequential data by delivering similar behavior as a human brain.

The structure of an Artificial Neural Network is relatively simple and is mainly about matrix multiplication. During the first step, inputs are multiplied by initially random weights and a bias, transformed with an activation function, and the output values are used to make a prediction. This step gives an idea of how far the network is from reality.

The metric applied is the loss. The higher the loss function, the dumber the model is. To improve the knowledge of the network, some optimization is required by adjusting the weights of the net. Stochastic gradient descent is the method employed to change the values of the weights in the right direction. Once the adjustment is made, the network can use another batch of data to test its new knowledge.

The error, fortunately, is lower than before, yet not small enough. The optimization step is done iteratively until the error is minimized, i.e., no more information can be extracted.

The problem with this type of model is that it does not have any memory: the inputs and outputs are treated as independent. In other words, the model does not care about what came before. That becomes an issue when you need to predict time series or sentences, because the network needs information about the historical data or the past words.

To overcome this issue, a new type of architecture has been developed: the Recurrent Neural Network (RNN hereafter).


Setup input pipeline

The IMDB large movie review dataset is a binary classification dataset—all the reviews have either a positive or negative sentiment.

Download the dataset using TFDS. See the loading text tutorial for details on how to load this sort of data manually.


dataset, info = tfds.load('imdb_reviews', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

train_dataset.element_spec

(TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))

Initially this returns a dataset of (text, label) pairs:


for example, label in train_dataset.take(1):
    print('text: ', example.numpy())
    print('label: ', label.numpy())

text: b”This was an absolutely terrible movie. Don’t be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie’s ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor’s like Christopher Walken’s good name. I could barely sit through it.” label: 0

Next shuffle the data for training and create batches of these (text, label) pairs:


BUFFER_SIZE = 10000
BATCH_SIZE = 64


train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)


for example, label in train_dataset.take(1):
    print('texts: ', example.numpy()[:3])
    print()
    print('labels: ', label.numpy()[:3])

texts: [b’Watching beautiful women sneaking around, playing cops and robbers is one of the most delightful guilty pleasures the medium film lets me enjoy. So The House on Carroll Street was not entirely a waste of time, although the story is contrived and the screenplay uninspired and somewhat irritating.There are many allusions to different Hitchcock pictures, not least the choice of Kelly McGillis in the starring role. She is dressed up as Grace Kelly, and she is not far off the mark. Not at all. But her character is not convincing. The way she is introduced to the audience, she should be someone with political convictions and a purpose in life. After all the movie deals with a clearly defined time period, true events and a specific issue. But the story degenerates within the first minutes into a sorry run-off-the-mill crime story with unbelievable coincidences, high predictability and a set of two dimensional characters. This is all the more regrettable, as the performances of the actors are good, as are the photography and the set design.The finale in Central Station, New York is breath taking. It starts in the subterranean section and then moves up to the roof. The movie can be praised for its good use of architecture.’ b’A group of people are invited to there high school reunion, but after they arrive they discover it to be a scam by an old classmate they played an almost fatal prank on. Now, he seeks to get revenge on all those that hurt him by sealing all the exits and cutting off all telephone lines.Dark slasher film with an unexceptional premise. Bringing it up a notch are a few good performances, some rather creative death scenes, plenty of excitement & scares, some humor and an original ending.Unrated for Extreme Violence, Graphic Nudity, Sexual Situations, Profanity and Drug Use.’ b’The short that starts this film is the true footage of a guy named Gary, apparently it was taken randomly in the parking lot of a television station where Gary works in the town of Beaver. Gary is a little “different”; he is an impersonator and drives an old Chevy named Farrah (after Fawcett). Lo and behold the filmmaker gets a letter from Gary some time later inviting him to return to Beaver to get some footage of the local talent contest he has put together, including Gary\’s staggering performace as Olivia Newton Dawn. Oh, my. The two shorts that follow are Gary\’s story, the same one you just witnessed only the first is portrayed by Sean Penn and the second by Crispin Glover titled “The Orkly Kid.” If you are in the mood for making fun of someone this is definitely the film to watch. I was doubled over with laughter through most of it, especially Crispins performance which could definitely stand on it\’s own. When it was over, I had to rewind the film to once again watch the real Gary and all his shining idiocy. Although Olivia was the focus, I would have liked to have seen one of the “fictitious” shorts take a jab at Gary\’s Barry Manilow impersonation, whic h was equally ridiculous.’] labels: [0 0 1]

Build an RNN to predict Time Series in TensorFlow

Now in this RNN training, it is time to build your first RNN to predict the series above. You need to specify some hyperparameters (the parameters of the model, i.e., number of neurons, etc.) for the model:

  • Number of input: 1
  • Time step (windows in time series): 10
  • Number of neurons: 120
  • Number of output: 1

Your network will learn from a sequence of 10 days and contains 120 recurrent neurons. You feed the model one input at a time, i.e., one day. Feel free to change the values to see whether the model improves.

Before constructing the model, you need to split the dataset into a train set and a test set. The full dataset has 222 data points; you will use the first 201 points to train the model and the last 21 points to test it.

After you define a train and test set, you need to create an object containing the batches. In these batches, you have X values and y values. Remember that the X values are lagged by one period. You use the first 200 observations, reshaped into windows; as the printed shape below shows, X_batches contains 10 batches of 20 observations with 1 input each, i.e., (10, 20, 1). The y_batches object has the same shape as X_batches but is shifted one period ahead.

Step 1) Create the train and test

First of all, you convert the series to a numpy array; then you define the window (i.e., the number of time steps the network will learn from), the number of inputs and outputs, and the size of the train set, as shown in the TensorFlow RNN example below.

series = np.array(ts)
n_windows = 20
n_input = 1
n_output = 1
size_train = 201

After that, you simply split the array into two datasets.

## Split data
train = series[:size_train]
test = series[size_train:]
print(train.shape, test.shape)

(201,) (21,)

Step 2) Create the function to return X_batches and y_batches

To make it easier, you can create a function that returns two different arrays, one for X_batches and one for y_batches.

Let's write an RNN TensorFlow function to construct the batches.

Note that the X batches are lagged by one period (we take the value at t-1). The output of the function should have three dimensions: the first dimension is the number of batches, the second the size of the window, and the last the number of inputs.

The tricky part is to select the data points correctly. For the X data points, you choose the observations from t = 1 to t =200, while for the Y data point, you return the observations from t = 2 to 201. Once you have the correct data points, it is straightforward to reshape the series.

To construct the object with the batches, you need to split the dataset into ten batches of equal length (i.e., 20). You can use the reshape method and pass -1 so that the series splits evenly into batches. The value 20 is the number of observations per batch and 1 is the number of inputs.

You need to do the same step but for the label.

Note that you need to shift the data by the number of periods you want to forecast. For instance, if you want to predict one time step ahead, you shift the series by 1. If you want to forecast two days ahead, shift the data by 2.

x_data = train[:size_train-1]: select all the training instances minus one day
X_batches = x_data.reshape(-1, windows, input): create the right shape for the batches, e.g., (10, 20, 1)

def create_batches(df, windows, input, output):
    ## Create X
    x_data = train[:size_train-1]                   # Select the data
    X_batches = x_data.reshape(-1, windows, input)  # Reshape the data
    ## Create y
    y_data = train[n_output:size_train]
    y_batches = y_data.reshape(-1, windows, output)
    return X_batches, y_batches

Now that the function is defined, you can call it to create the batches as shown in the below RNN example.

X_batches, y_batches = create_batches(df = train, windows = n_windows, input = n_input, output = n_output)

You can print the shape to make sure the dimensions are correct.

print(X_batches.shape, y_batches.shape)

(10, 20, 1) (10, 20, 1)

You need to create the test set with only one batch of data and 20 observations.

Note that you forecast day after day: the second predicted value is based on the true value of the first day (t+1) of the test dataset, since the true value is known at that point.

If you want to forecast t+2 (i.e., two days ahead), you need to use the predicted value at t+1; if you want to predict t+3 (three days ahead), you need the predicted values at t+1 and t+2. Understandably, it becomes harder to predict accurately t+n days ahead.
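The recursive idea can be sketched as follows; this is not part of the tutorial's code, and model_predict is a hypothetical one-step forecaster that maps the last n_windows values to the next one:

import numpy as np

def forecast(history, model_predict, horizon, n_windows=20):
    window = list(history[-n_windows:])
    preds = []
    for _ in range(horizon):
        next_value = model_predict(np.array(window))   # predict t+1 from the current window
        preds.append(next_value)
        window = window[1:] + [next_value]             # slide the window onto the prediction
    return preds

Because every step beyond t+1 feeds earlier predictions back in as inputs, the error compounds with the horizon.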

X_test, y_test = create_batches(df = test, windows = 20, input = 1, output = 1)
print(X_test.shape, y_test.shape)

(10, 20, 1) (10, 20, 1)

Alright, your batches are ready; you can now build the RNN architecture. Remember, you have 120 recurrent neurons.

Step 3) Build the model

To create the model, you need to define three parts:

  1. The variable with the tensors
  2. The RNN
  3. The loss and optimization

Step 3.1) Variables

You need to specify the X and y variables with the appropriate shape. This step is trivial. The tensor has the same dimension as the objects X_batches and y_batches.

For instance, the tensor X is a placeholder (check the tutorial on Introduction to TensorFlow to refresh your mind about variable declaration) with three dimensions:

  • None: the size of the batch
  • n_windows: the length of the window, i.e., the number of time steps the model looks backward
  • n_input: the number of inputs

The result is:

tf.placeholder(tf.float32, [None, n_windows, n_input])

## 1. Construct the tensors
X = tf.placeholder(tf.float32, [None, n_windows, n_input])
y = tf.placeholder(tf.float32, [None, n_windows, n_output])

Step 3.2) Create the RNN

In the second part of this RNN TensorFlow example, you need to define the architecture of the network. As before, you use the BasicRNNCell and dynamic_rnn objects from TensorFlow.

## 2. Create the model
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=r_neuron, activation=tf.nn.relu)
rnn_output, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)

The next part is a bit trickier but allows faster computation. You need to pass the RNN output through a dense layer and then reshape it again so that it has the same dimensions as the input.

stacked_rnn_output = tf.reshape(rnn_output, [-1, r_neuron])
stacked_outputs = tf.layers.dense(stacked_rnn_output, n_output)
outputs = tf.reshape(stacked_outputs, [-1, n_windows, n_output])

Step 3.3) Create the loss and optimization

The model optimization depends on the task you are performing. In the previous tutorial on CNN, your objective was to classify images; in this RNN tutorial, the objective is slightly different: you make a prediction on a continuous variable rather than a class.

This difference is important because it will change the optimization problem. The optimization problem for a continuous variable is to minimize the mean square error. To construct these metrics in TF, you can use:

  • tf.reduce_sum(tf.square(outputs - y))

The remainder of the RNN code is the same as before; you use an Adam optimizer to reduce the loss (i.e., the MSE):

  • tf.train.AdamOptimizer(learning_rate=learning_rate)
  • optimizer.minimize(loss)

That’s it, you can pack everything together, and your model is ready to train.

tf.reset_default_graph()
r_neuron = 120

## 1. Construct the tensors
X = tf.placeholder(tf.float32, [None, n_windows, n_input])
y = tf.placeholder(tf.float32, [None, n_windows, n_output])

## 2. Create the model
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=r_neuron, activation=tf.nn.relu)
rnn_output, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)

stacked_rnn_output = tf.reshape(rnn_output, [-1, r_neuron])
stacked_outputs = tf.layers.dense(stacked_rnn_output, n_output)
outputs = tf.reshape(stacked_outputs, [-1, n_windows, n_output])

## 3. Loss + optimization
learning_rate = 0.001
loss = tf.reduce_sum(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

You will train the model using 1500 epochs and print the loss every 150 iterations. Once the model is trained, you evaluate the model on the test set and create an object containing the predictions as shown in the below Recurrent Neural Network example.

iteration = 1500

with tf.Session() as sess:
    init.run()
    for iters in range(iteration):
        sess.run(training_op, feed_dict={X: X_batches, y: y_batches})
        if iters % 150 == 0:
            mse = loss.eval(feed_dict={X: X_batches, y: y_batches})
            print(iters, "\tMSE:", mse)
    y_pred = sess.run(outputs, feed_dict={X: X_test})

0    MSE: 502893.34
150  MSE: 13839.129
300  MSE: 3964.835
450  MSE: 2619.885
600  MSE: 2418.772
750  MSE: 2110.5923
900  MSE: 1887.9644
1050 MSE: 1747.1377
1200 MSE: 1556.3398
1350 MSE: 1384.6113

Finally in this RNN deep learning tutorial, you can plot the actual values of the series together with the predicted values. If your model works well, the predicted values should lie on top of the actual values.

As you can see, the model has room for improvement. It is up to you to change the hyperparameters, such as the window size, the batch size, or the number of recurrent neurons.

plt.title("Forecast vs Actual", fontsize=14)
plt.plot(pd.Series(np.ravel(y_test)), "bo", markersize=8, label="Actual", color='green')
plt.plot(pd.Series(np.ravel(y_pred)), "r.", markersize=8, label="Forecast", color='red')
plt.legend(loc="lower left")
plt.xlabel("Time")
plt.show()

Recurrent Neural Networks | RNN LSTM Tutorial | Why use RNN | On Whiteboard | Compare ANN, CNN, RNN

CNN vs RNN

The convolutional neural network (CNN) is a feed-forward neural network capable of processing spatial data. It is commonly used for computer vision applications such as image classification. The simple neural networks are good at simple binary classifications, but they can’t handle images with pixel dependencies. The CNN model architecture consists of convolutional layers, ReLU layers, pooling layers, and fully connected output layers. You can learn CNN by working on a project such as Convolutional Neural Networks in Python.

Key Differences Between CNN and RNN

  • CNN is applicable for sparse data like images. RNN is applicable for time series and sequential data.
  • While training the model, CNN uses a simple backpropagation and RNN uses backpropagation through time to calculate the loss.
  • RNN places no restriction on the length of its inputs and outputs, whereas CNN works with fixed-size inputs and produces fixed-size outputs.
  • CNN is a feedforward network, while RNN uses feedback loops to handle sequential data.
  • CNN can also be used for video and image processing. RNN is primarily used for speech and text analysis.

Two Issues of Standard RNNs

Vanishing Gradient Problem

Recurrent Neural Networks enable you to model time-dependent and sequential data problems, such as stock market prediction, machine translation, and text generation. You will find, however, that RNNs are hard to train because of the gradient problem.

RNNs suffer from the problem of vanishing gradients. The gradients carry the information used to update the RNN, and when the gradient becomes too small, the parameter updates become insignificant. This makes learning from long data sequences difficult.

Exploding Gradient Problem

While training a neural network, if the slope tends to grow exponentially instead of decaying, it is called an exploding gradient. This problem arises when large error gradients accumulate, leading to very large updates to the model weights during training.

Long training time, poor performance, and bad accuracy are the major consequences of gradient problems.
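A common remedy for exploding gradients is gradient clipping. The low-level graph code used elsewhere in this tutorial does not do this, but as a hedged sketch with the Keras API, an optimizer can be asked to rescale or clip gradients before applying them:

import tensorflow as tf

# Rescale gradients whose global norm exceeds 1.0 ...
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)
# ... or clip each gradient component to the range [-0.5, 0.5].
# optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, clipvalue=0.5)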

Gradient Problem Solutions

Now, let's discuss the most popular and efficient way to deal with gradient problems: the Long Short-Term Memory network (LSTM). First, let's understand long-term dependencies.

Suppose you want to predict the last word in the text: "The clouds are in the ______." The most obvious answer is "sky"; we don't need any more context to predict the last word in that sentence.

Now consider: "I have been staying in Spain for the last 10 years... I can speak fluent ______." The word you predict depends on the previous few words of context. Here, you need the context of Spain to predict the last word, and the most fitting answer is "Spanish." The gap between the relevant information and the point where it is needed can become very large. LSTMs help you solve this problem.

Self-study TensorFlow | Lesson 9.2 | Text classification with Recurrent Neural Network (RNN)

Recurrent Neural Network Implementation with TensorFlow

In this section, we will learn how to implement a recurrent neural network with TensorFlow.

Step 1 − TensorFlow includes various libraries for specific implementation of the recurrent neural network module.

# Import necessary modules
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import rnn
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

As mentioned above, the libraries help in defining the input data, which forms the primary part of recurrent neural network implementation.

Step 2 − Our primary motive is to classify the images using a recurrent neural network, where we treat every image row as a sequence of pixels. The MNIST image shape is 28*28 px, so we handle 28 sequences of 28 steps for each sample. We now define the input parameters to set up this sequential pattern.

n_input = 28   # MNIST data input (img shape: 28*28)
n_steps = 28
n_hidden = 128
n_classes = 10

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

weights = {'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))}
biases = {'out': tf.Variable(tf.random_normal([n_classes]))}

Step 3 − Compute the results with a function that defines the RNN: each input sample is unstacked into a sequence of row vectors, passed through an LSTM cell, and the last output is fed to a linear layer to produce the class logits. The loss, optimizer, and accuracy metric are then defined on top of these predictions.

def RNN(x, weights, biases):
    x = tf.unstack(x, n_steps, 1)
    # Define a lstm cell with tensorflow
    lstm_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Get lstm cell output
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

pred = RNN(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

Step 4 − In this step, we will launch the graph to get the computational results. This also helps in calculating the accuracy for test results.

with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")

    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
          sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

Running the session prints the minibatch loss and training accuracy at regular intervals, followed by the final testing accuracy.

Python3


import warnings

from tensorflow.keras.utils import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow import keras

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import numpy as np
import re

import nltk
nltk.download('all')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

lemm = WordNetLemmatizer()

warnings.filterwarnings("ignore")

Now let's load the dataset using the pandas library. You can download the dataset used in this article from here.

Types of RNN | Recurrent Neural Network Types | Deep Learning Tutorial 34 (Tensorflow & Python)

Types of Recurrent Neural Networks

There are four types of Recurrent Neural Networks:

  1. One to One
  2. One to Many
  3. Many to One
  4. Many to Many

One to One RNN

This type of neural network is known as the vanilla neural network. It is used for general machine learning problems that have a single input and a single output.

One to Many RNN

This type of neural network has a single input and multiple outputs. Image captioning is a common example.

Many to One RNN

This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is an example of this kind of network, where a given sentence is classified as expressing a positive or negative sentiment.

Many to Many RNN

This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one example.
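As a rough sketch of how these patterns map onto Keras (the shapes and layer sizes below are hypothetical, not from this tutorial): a many-to-one model returns only the final state of the recurrent layer, while a many-to-many model keeps the output of every time step via return_sequences=True.

import tensorflow as tf

timesteps, features = 30, 8

many_to_one = tf.keras.Sequential([          # e.g., sentiment analysis
    tf.keras.layers.Input(shape=(timesteps, features)),
    tf.keras.layers.SimpleRNN(16),           # return_sequences=False by default
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

many_to_many = tf.keras.Sequential([         # e.g., a label for every time step
    tf.keras.layers.Input(shape=(timesteps, features)),
    tf.keras.layers.SimpleRNN(16, return_sequences=True),
    tf.keras.layers.Dense(10, activation="softmax"),
])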

Stack two or more LSTM layers

Keras recurrent layers have two available modes that are controlled by the return_sequences constructor argument: the layer either returns only the last output for each input sequence, or the full sequence of outputs for every timestep.

Here is what the flow of information looks like with return_sequences=True:

The interesting thing about using an RNN with return_sequences=True is that the output still has 3 axes, like the input, so it can be passed to another RNN layer, like this:


model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(len(encoder.get_vocabulary()), 64, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1)
])


model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer=tf.keras.optimizers.Adam(1e-4), metrics=['accuracy'])


history = model.fit(train_dataset, epochs=10, validation_data=test_dataset, validation_steps=30)

Epoch 1/10 391/391 [==============================] – 66s 131ms/step – loss: 0.6284 – accuracy: 0.5935 – val_loss: 0.4341 – val_accuracy: 0.8031 Epoch 2/10 391/391 [==============================] – 40s 101ms/step – loss: 0.3818 – accuracy: 0.8336 – val_loss: 0.3429 – val_accuracy: 0.8474 Epoch 3/10 391/391 [==============================] – 40s 100ms/step – loss: 0.3369 – accuracy: 0.8557 – val_loss: 0.3489 – val_accuracy: 0.8458 Epoch 4/10 391/391 [==============================] – 40s 101ms/step – loss: 0.3265 – accuracy: 0.8590 – val_loss: 0.3239 – val_accuracy: 0.8589 Epoch 5/10 391/391 [==============================] – 39s 99ms/step – loss: 0.3123 – accuracy: 0.8678 – val_loss: 0.3265 – val_accuracy: 0.8500 Epoch 6/10 391/391 [==============================] – 40s 100ms/step – loss: 0.3072 – accuracy: 0.8690 – val_loss: 0.3242 – val_accuracy: 0.8604 Epoch 7/10 391/391 [==============================] – 40s 100ms/step – loss: 0.3060 – accuracy: 0.8673 – val_loss: 0.3211 – val_accuracy: 0.8464 Epoch 8/10 391/391 [==============================] – 40s 100ms/step – loss: 0.3011 – accuracy: 0.8724 – val_loss: 0.3169 – val_accuracy: 0.8531 Epoch 9/10 391/391 [==============================] – 39s 99ms/step – loss: 0.2973 – accuracy: 0.8717 – val_loss: 0.3248 – val_accuracy: 0.8635 Epoch 10/10 391/391 [==============================] – 40s 100ms/step – loss: 0.2953 – accuracy: 0.8734 – val_loss: 0.3242 – val_accuracy: 0.8672


test_loss, test_acc = model.evaluate(test_dataset) print('Test Loss:', test_loss) print('Test Accuracy:', test_acc)

391/391 [==============================] – 17s 42ms/step – loss: 0.3255 – accuracy: 0.8652 Test Loss: 0.325457364320755 Test Accuracy: 0.8652399778366089


# predict on a sample text without padding.
sample_text = ('The movie was not good. The animation and the graphics '
               'were terrible. I would not recommend this movie.')
predictions = model.predict(np.array([sample_text]))
print(predictions)

1/1 [==============================] – 4s 4s/step [[-1.6299357]]


plt.figure(figsize=(16, 6))
plt.subplot(1, 2, 1)
plot_graphs(history, 'accuracy')
plt.subplot(1, 2, 2)
plot_graphs(history, 'loss')

Check out other existing recurrent layers such as GRU layers.

If you’re interested in building custom RNNs, see the Keras RNN Guide.

In this article, we train a Recurrent Neural Network (RNN) in TensorFlow. TensorFlow makes it effortless to build a recurrent neural network without doing the underlying math by hand, and compared to other deep learning frameworks it makes building and training an RNN straightforward.

Introduction to Recurrent Neural Networks in Python, Keras, and TensorFlow

Python3


data = pd.read_csv("Clothing Reviews.csv")
data.head()

print(data.shape)

data = data[data['Class Name'].isnull() == False]

Output:

First five rows of the dataset

(23486, 11)

Exploratory Data Analysis

EDA is a crucial step that you should not skip while analyzing the data. It helps you understand how the data is distributed. To perform EDA, you apply various visualization techniques so that you understand the data before building a model.
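For example, a minimal EDA sketch for the DataFrame loaded above (assuming only the 'Class Name' column that appears in the code) could count how many reviews each clothing class receives:

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12, 4))
sns.countplot(x='Class Name', data=data)   # reviews per clothing class
plt.xticks(rotation=45)
plt.show()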

Table of Contents

  • Introduction to Recurrent Neural Networks with Keras and TensorFlow
  • Introduction
  • Configuring Your Development Environment
  • Having Problems Configuring Your Development Environment?
  • Project Structure
  • What Are Sequential Data
  • A Caveat: Masking and Padding
  • Modeling Sequential Data with MLPs
  • The Recurrence Formula
  • Recurrent Neural Network (an overview)
  • Training and Visualizations
  • Loading and Inference
  • Summary
Recurrent Neural Network (RNN) Tutorial | RNN LSTM Tutorial | Deep Learning Tutorial | Simplilearn

Python3


def toLower(data):
    if isinstance(data, float):
        return ''
    else:
        return data.lower()

stop_words = stopwords.words("english")

def remove_stopwords(text):
    no_stop = []
    for word in text.split(' '):
        if word not in stop_words:
            no_stop.append(word)
    return " ".join(no_stop)

def remove_punctuation_func(text):
    return re.sub(r'[^a-zA-Z0-9]', ' ', text)

X['Title'] = X['Title'].apply(toLower)
X['Review Text'] = X['Review Text'].apply(toLower)

X['Title'] = X['Title'].apply(remove_stopwords)
X['Review Text'] = X['Review Text'].apply(remove_stopwords)

X['Title'] = X['Title'].apply(lambda x: lemm.lemmatize(x))
X['Review Text'] = X['Review Text'].apply(lambda x: lemm.lemmatize(x))

X['Title'] = X['Title'].apply(remove_punctuation_func)
X['Review Text'] = X['Review Text'].apply(remove_punctuation_func)

X['Text'] = list(X['Title'] + ' ' + X['Review Text'] + ' ' + X['Class Name'])

X_train, X_test, y_train, y_test = train_test_split(X['Text'], y, test_size=0.25, random_state=42)
Notice that at the end of the code we created a new column, "Text", built as a list. We did this because we need to perform tokenization on the entire combined feature used to train the model.

Tokenization

In tokenization, we convert the text into vectors. The Keras API supports this text pre-processing: its Tokenizer takes the total num_words to create the word index. OOV stands for out-of-vocabulary and is used when new, unseen text is encountered. Also, remember to call fit_on_texts only on the training data, not the test data.
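A hedged sketch of that step is shown below; the vocabulary size, sequence length, and OOV token are hypothetical choices, and X_train/X_test come from the split created earlier:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import pad_sequences

num_words, max_len = 10000, 100

tokenizer = Tokenizer(num_words=num_words, oov_token='<OOV>')  # unseen words map to the OOV index
tokenizer.fit_on_texts(X_train)                                # fit on the training data only

train_seq = pad_sequences(tokenizer.texts_to_sequences(X_train), maxlen=max_len)
test_seq = pad_sequences(tokenizer.texts_to_sequences(X_test), maxlen=max_len)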

Introduction to Recurrent Neural Networks with Keras and TensorFlow

Introduction

Imagine you have been employed by a movie critique firm. Movies receive a lot of reviews from all over the globe. Your mission, should you choose to accept it, is to predict each review’s sentiment to catch the audience’s drift.

The task is simple, given a movie review, classify it as either a positive review or a negative review. Now, as it happens, this task is known as Sentiment Classification in the Deep Learning World.

Don't confuse this with a computer vision problem. We are not reading facial expressions with our old friend OpenCV. Here we deal with text data, and specifically with volumes of text data. To mimic the task, we chose imdb_reviews, a dataset of 25,000 highly polar movie reviews.

Configuring Your Development Environment

To follow this guide, you need to have the TensorFlow and the TensorFlow Datasets library installed on your system.

Luckily, both are pip-installable:

$ pip install tensorflow
$ pip install tensorflow_datasets
$ pip install matplotlib

Having Problems Configuring Your Development Environment?

All that said, are you:

  • Short on time?
  • Learning on your employer’s administratively locked system?
  • Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
  • Ready to run the code right now on your Windows, macOS, or Linux system?

Then join PyImageSearch University today!

Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides that are pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.

And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!

Project Structure

We first need to review our project directory structure.

Start by accessing the “Downloads” section of this tutorial to retrieve the source code and example images.

From there, take a look at the directory structure:

$ tree . --dirsfirst
.
|____ output
| |____ lstm_plot.png
| |____ rnn_plot.png
|____ pyimagesearch
| |____ plot.py
| |____ save_load.py
| |____ config.py
| |____ standardization.py
| |____ __init__.py
| |____ model.py
| |____ dataset.py
|____ train.py
|____ inference.py
|____ terminal_output.txt

In the pyimagesearch directory, we have:

  • plot.py: Script to help us visualize outputs.
  • save_load.py: Script to load and save model weights.
  • config.py: Script containing the entire configuration pipeline.
  • standardization.py: Script containing utilities to help us prepare the data.
  • __init__.py: Script which turns the directory into a Python package.
  • model.py: Script housing the model.
  • dataset.py: Script to help us load the data into our project.

In the core directory, we have two scripts:

  • train.py: Script to train the RNN model.
  • inference.py: Script to draw inference from our trained model.

Note: The code download for this blog post contains code snippets for Long Short-Term Memory (LSTM) as well. These will be covered in the following blog post on LSTM.

What Are Sequential Data

Before we go into movie reviews and understand their sentiment, we first need to understand the data.

We are all Computer Vision engineers here and know how images are an array of numbers. But how do we interpret a corpus of text?

If we think about this, all texts can be easily represented as a sequence of characters, as shown in Figure 2. Notice how text is not just a collection but a sequence of characters. This signifies that the characters are equally as important as the order in which they reside.

Any data where the order or sequence is as essential as the data itself is called Sequential Data. Some examples of Sequential Data are sentences, stock market data, audio data, etc.
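A tiny illustration (not from the lesson's code) of why order matters: the same characters arranged differently carry a different meaning, so a faithful representation has to preserve their positions.

text = "dog bites man"
chars = list(text)                # ['d', 'o', 'g', ' ', 'b', ...] in order
codes = [ord(c) for c in chars]   # one integer per character, order preserved
print(chars[:5], codes[:5])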

Let us now try to understand how movie reviews relate to Sequential Data. We first open the dataset.py Python file, which helps load the dataset on disk.

# import the necessary packages
import tensorflow_datasets as tfds

We begin with the necessary imports on Line 2. The dataset we will use (imdb_reviews) is already in the tensorflow_datasets package.

Next, we define the get_imdb_dataset function to load the dataset on the disk.

def get_imdb_dataset(folderName, batchSize, bufferSize, autotune, test=False):
    # check whether the test flag is true
    if test:
        # load the test dataset, batch it, and prefetch it
        testDs = tfds.load(name="imdb_reviews", data_dir=folderName,
            as_supervised=True, shuffle_files=True, split="test")
        testDs = testDs.batch(batchSize).prefetch(autotune)
        # return the test dataset
        return testDs
    # otherwise we will be loading the training and validation dataset
    else:
        # load the training and validation dataset
        (trainDs, valDs) = tfds.load(name="imdb_reviews", data_dir=folderName,
            as_supervised=True, shuffle_files=True,
            split=["train[:90%]", "train[90%:]"])
        # shuffle, batch, and prefetch the train and the validation dataset
        trainDs = (trainDs
            .shuffle(bufferSize)
            .batch(batchSize)
            .prefetch(autotune)
        )
        valDs = (valDs
            .shuffle(bufferSize)
            .batch(batchSize)
            .prefetch(autotune)
        )
        # return the train and the validation dataset
        return (trainDs, valDs)

This function takes in the following inputs:

  • folderName: the path on the local system to which the dataset will be downloaded
  • batchSize: the size in which we want to batch our data
  • bufferSize: the size of the buffer from which elements are randomly selected
  • autotune: a constant provided by the tf.data API for space optimization while prefetching
  • test: a Boolean flag used to determine if the dataset to be loaded is for testing or training purposes

Lines 7-16 execute when the test flag is set to True. This code snippet downloads (or uses the cached) test split of the dataset, batches, and prefetches it.

Lines 21-45 execute when the test flag is set to False. This means that this dataset will be used for training. The code snippet downloads (or uses the cached) train and validation split of the dataset, shuffles, batches, and prefetches it.

The only difference between the two clauses (training and testing) is that we shuffle the training dataset while leaving the testing dataset in its original order.

But having the data at hand, loading, batching, and prefetching it is not enough. Primarily because the data looks like this, as shown in Figure 3.

To make this data usable:

  • We need to remove the unnecessary characters (standardization)
  • Tokenize the dataset
  • Vectorize the tokens

We will follow each of the steps gradually. First, let us see how to standardize the dataset for our purposes.

But what is standardization? It removes unnecessary punctuations and HTML tags from the text corpus. It is a pre-processing step that is very important in any text data pipeline.

We create a custom standardization function in the standardization.py file.

# import the necessary packages
import tensorflow as tf
import string
import re

def custom_standardization(inputData):
    # transform everything to lowercase
    lowercase = tf.strings.lower(inputData)
    # strip off the html break point and punctuations and return it
    strippedHtml = tf.strings.regex_replace(lowercase, "<br />", " ")
    strippedPunctuation = tf.strings.regex_replace(strippedHtml,
        f"[{re.escape(string.punctuation)}]", "")
    return strippedPunctuation

On Lines 2-4, we import the necessary packages needed.

Next, we define the custom_standardization function to standardize our dataset. The dataset is first converted to lowercase on Line 8. The HTML tags, spaces, and punctuations are removed on Lines 11-13, and finally, the standardized dataset is returned on Line 14.

After standardizing our text dataset, the next step is to tokenize and vectorize it. Tokenization refers to the process of splitting tokens from the dataset into units. A token can be a character, a word, or a sentence. They are created according to the task at hand.

For our sentiment classifier, we will consider a token to be a word.

Can we feed tokens to the deep learning model after tokenization? Not just yet.

We still need to convert these tokens into numbers. The process of representing tokens into numbers is called text vectorization. With vectorization, each token (word) in our text corpus will be represented by a number.

The following are the basic steps of text vectorization:

  • We will create a dictionary of all the unique words from the text corpus. Assign unique numbers for every word. This dictionary is called the vocabulary.
  • Now we will substitute the tokens (words) in the dataset with their respective unique numbers as in the vocabulary dictionary.

The entire process is demonstrated in Figure 4.
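As a hedged sketch (not the project's own standardization/vectorization code, and with hypothetical vocabulary size, sequence length, and sample sentences), Keras' TextVectorization layer performs exactly these two steps:

import tensorflow as tf

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=10000,             # size of the vocabulary dictionary
    output_sequence_length=16,    # pad or truncate every review to 16 tokens
)
corpus = tf.constant(["the movie was great", "the plot was terrible"])
vectorizer.adapt(corpus)          # step 1: build the vocabulary of unique words
print(vectorizer(corpus))         # step 2: substitute each word with its index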

Deep Learning with Tensorflow - The Recurrent Neural Network Model

Recurrent Neural Networks in TensorFlow

A Recurrent Neural Network is different from a Convolutional Neural Network and an Artificial Neural Network. A standard neural network is trained to learn deep features in order to make accurate predictions, whereas a Recurrent Neural Network has feedback between nodes and stores information in its cell state. In simple words, a Recurrent Neural Network is trained by backpropagating through time.

In a later part of the article, we also discuss why to use a bidirectional gated RNN architecture. To implement the training of Recurrent Neural Networks (RNN) in TensorFlow, let's work on a real-world NLP project.

Importing Libraries and Dataset

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code.

  • Pandas – This library helps to load the data frame in a 2D array format and has multiple functions to perform analysis tasks in one go.
  • Numpy – Numpy arrays are very fast and can perform large computations in a very short time.
  • Matplotlib/Seaborn – This library is used to draw visualizations.
  • TensorFlow – Import TensorFlow and Keras API that comes installed with TensorFlow. Keras API helps in building a Neural Network in just a few lines of code.
  • NLTK – Natural Language Processing Toolkit comes in very handy while handling raw textual data.
  • Sklearn – This module contains multiple libraries having pre-implemented functions to perform tasks from data preprocessing to model development and evaluation.

Generate text

The simplest way to generate text with this model is to run it in a loop, and keep track of the model’s internal state as you execute it.

Each time you call the model you pass in some text and an internal state. The model returns a prediction for the next character and its new state. Pass the prediction and state back in to continue generating text.

The following makes a single step prediction:


class OneStep(tf.keras.Model):
    def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
        super().__init__()
        self.temperature = temperature
        self.model = model
        self.chars_from_ids = chars_from_ids
        self.ids_from_chars = ids_from_chars

        # Create a mask to prevent "[UNK]" from being generated.
        skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
        sparse_mask = tf.SparseTensor(
            # Put a -inf at each bad index.
            values=[-float('inf')]*len(skip_ids),
            indices=skip_ids,
            # Match the shape to the vocabulary
            dense_shape=[len(ids_from_chars.get_vocabulary())])
        self.prediction_mask = tf.sparse.to_dense(sparse_mask)

    @tf.function
    def generate_one_step(self, inputs, states=None):
        # Convert strings to token IDs.
        input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
        input_ids = self.ids_from_chars(input_chars).to_tensor()

        # Run the model.
        # predicted_logits.shape is [batch, char, next_char_logits]
        predicted_logits, states = self.model(inputs=input_ids, states=states,
                                              return_state=True)
        # Only use the last prediction.
        predicted_logits = predicted_logits[:, -1, :]
        predicted_logits = predicted_logits/self.temperature
        # Apply the prediction mask: prevent "[UNK]" from being generated.
        predicted_logits = predicted_logits + self.prediction_mask

        # Sample the output logits to generate token IDs.
        predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
        predicted_ids = tf.squeeze(predicted_ids, axis=-1)

        # Convert from token ids to characters
        predicted_chars = self.chars_from_ids(predicted_ids)

        # Return the characters and model state.
        return predicted_chars, states


one_step_model = OneStep(model, chars_from_ids, ids_from_chars)

Run it in a loop to generate some text. Looking at the generated text, you’ll see the model knows when to capitalize, make paragraphs and imitates a Shakespeare-like writing vocabulary. With the small number of training epochs, it has not yet learned to form coherent sentences.


start = time.time()
states = None
next_char = tf.constant(['ROMEO:'])
result = [next_char]

for n in range(1000):
    next_char, states = one_step_model.generate_one_step(next_char, states=states)
    result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)

ROMEO: The dayning use your brodous parchemn a memaver But to my shrow against it. CURTIS: Do you think of’t, leaving too? QUEEN ELIZABETH: It is, sir, let’s see: to put my freedy-sorrow In the resolution of the fell as of; And she shall to the fish, whose parts-have drey’d the process or owe we eptits the abtent, That they have often been men asiel, as now I lay stoly to none of your adversary title. Nay, stay, what, nurse, shall I respect by him. Assisted with, Hadst thou depart; we should have seen some name of meat: Might from this coast was though his purpose, and they can ffor me to And lack upon your royal king. DUKE VINCENTIO: It is now pale; but not a man with winds, And breathed sunshine way: in give of less, the nurse In thy extreme budden and his wives. One more, most noble friend. Third Servingman: I have no more of it. FRIAR LAURENCE: O, she is found withal. Hark ye and given me thence at we? or die among thes? Evermother, Thomas, Duke old York and him, So fast aspected: s ________________________________________________________________________________ Run time: 2.878746271133423

The easiest thing you can do to improve the results is to train it for longer (try EPOCHS = 30).

You can also experiment with a different start string, try adding another RNN layer to improve the model’s accuracy, or adjust the temperature parameter to generate more or less random predictions.
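For instance, reusing the OneStep class defined above, the temperature argument controls how conservative the sampling is (these particular values are only illustrative):

cautious_model = OneStep(model, chars_from_ids, ids_from_chars, temperature=0.5)   # less random
creative_model = OneStep(model, chars_from_ids, ids_from_chars, temperature=1.5)   # more random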

If you want the model to generate text faster the easiest thing you can do is batch the text generation. In the example below the model generates 5 outputs in about the same time it took to generate 1 above.


start = time.time()
states = None
next_char = tf.constant(['ROMEO:', 'ROMEO:', 'ROMEO:', 'ROMEO:', 'ROMEO:'])
result = [next_char]

for n in range(1000):
    next_char, states = one_step_model.generate_one_step(next_char, states=states)
    result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result, '\n\n' + '_'*80)
print('\nRun time:', end - start)

tf.Tensor( [b”ROMEO:\nFirst, heaven leave us: O, rest thy wild,\nI will my father dead?\n\nTRANIO:\nBut did you send us run; lay, fool!\nI would the world say no; what brought to? hear?\n\nKATHARINA:\nFear you the heavess to take away the time\nof a kiss of sold murder, to keep this luck\nWhose uncle branches order, I cry you\ndo, and the moting father of the like days\nThey shall seem from reason, whilst I live or else\nTo raise his writing fallent to speak.\nBut, Clifford, he is gone unto these wents were faults.\n\nLUCENTIO:\nAh, Warwick, art thou hear, the worst of in my false\nMoth thy meaning brave through the seas.\nIn what occasion not thyself?\n\nMONTAGUE:\nGood queen; Antigons, and brave amazed at the\ngreaty will out-plane. Calm those that valiant quench,\nthough us’t! it is your baseness. Come, sit down;\nFor, madam: son, away! Now, by the worst.\n\nAll:\nTeems as it hath mest.\n\nDORSET:\nAnd love as you; I could heed home.\nAlas! I needs must out, as thou art a\nfraitors hate than for the year, and the conquer’d bowes\nIn gro” b”ROMEO:\n\nJULIET:\nAs well return.\n\nShepherd:\nBut, good my lord,\nThe Duke of Clarence hath so\n’t born: but all, within mine own absence\nI married my face, moved nor boy;\nAnd I from Deepany, advangal keeps and cheer\nOf heaven nor power, at some thread.\nThat English jedwats shall not stuck a dream;\nFairer that proud thoughts about the abonty of the crown.\n\nJULIET:\nThen, I may have now, she you seal the story\nWhere nowing an oath with lips.\n\nKING RICHARD II:\nDiscourse of any good conceit? Abas!\nMy gracious sovereitn, ladies that vaunts in\nhis wonder age; but yet we should hear\nMy words shed by, and thought on sue the word,\nFor, lords, to-marry my greatness: if my trotubous lies?\n\nLADY GREY:\nTo his submission. Lady and Darb.\n\nDUCHESS OF YORK:\nWouldst thou go? then I’ll judge my tongue,\nAnd graced trembling of the widdows, or\nshe’s the cause? a coucle’s hand thither? O!\nBelance oph?\n\nESCALUS:\nAy; for thou wert kill’d\nwhen the wall’s power I could’s high a peal: be corn\nAt o’cropph the instrument of ” b”ROMEO:\nOr else be swellest Edward: if this long-impellow:\nAll unacounted said Lecious honour have many sortons\nThat first we all go men.\n\nDUKE OF YORK:\nWelcome, my lord, widow!\n\nALONS:\nIf it be not; she was not made a queen,\n’This people will still have mine eyes to heaven from his wife.\n\nTHOMAS MOWBRAY:\nAy, ay.\n\nCitizens:\nDown, ladies,–which is my name is Edward.\nWhen yet she finds alone.\n\nBRAKENBURIO:\nBoth, young and older,, marry; nor my countenance, I did see\nForbidst unreasonably of our queen,–\n\nSTAN:\nI know, I thank thee, knew whose heads the rest,\nBut that the rodut of this speech,\nIs this of noble things you have.\n\nDUKE OF YORK:\nAy, so: you are hereafter, it may make you an\na-bow-forth. But O, pity!\nMake your affections are up, Signior Lucentio.\n\nLUCENTIO:\nIt may not pass:\nThen we shall be most well. but drawly pay\nMe all the world’s shore: Alack the dire dream of your sword,\nOne that our kingdom to the head of monsminy.\n\nPRINCE EDWARD:\nAn is her fawling! 
here’s your done,\nHe Canst” b”ROMEO:\nLet me see: there were heard.\n\nGLOUCESTER:\n\nKING EDWARD IV:\nAn beast my wife’s will obey: how meaning in him, sin\nJest, and thou shalt smile at the fashion;\nFor inholence my most courtiesy, Warwick, steble,\nThou mayst not, sawn, Kate, and then Oxford’s vast\nIs sworn to lay at honours on my head\nOf her travel the curses; but we shall bear it.\n\nROMEO:\nNot one would please, as I brother!\n\nCLAUDIO:\nPett up, I,\nWe cannot white of this.\n\nRIVERS:\nAy, for once whither that’s no remembrance?\nThou ammonest not most partly keeps, new,\nShould we were cross-wounded to sadver a lord,\nThe horn and holy thank your soldiers arm,\nYou advice for his head to Margar to resign his cloace:\nSure thy wife indeed are womanish.\n\nLUCENTIO:\nThen, good my lord, what talk I possess;\nAnd for the county will he did bear\nOf free speech; and will assect thee,\nThat stands the thrawhilderful: whom I do, an either painted stain\nThe spirits of their power. If you requires a mettain,\nYour mess arrormmeth in either. But we m” b”ROMEO:\nWhy, so; farewell: one in, another death or dread\nWhen he were leons.\nSo first less thou the time ’twere my badied known,\nChequering to you.\n\nShepherd:\nGo heaven and false, she’s highbell’d.\n\nGLOUCESTER:\nCome, go with you, be with him, if the condaction tears\nHe thither was post to revenge,\nAnd thither shall we could grace to the white:\nThou art to-more oraple! Rise of it!\n\nCOMINIUS:\nKeere yes; if we all shall fork\nAs every cousin, drawn.\n\nPRINCE EDWARD:\nLet me unkiss! my master is safe, Furils, take your power\nTo embraceous have at the goor humour and as yours,\nAnd pluck my babes against your tears, and there before the\ncut; or if she were hered betwixt his ease.’\n\nGLOUCESTER:\nHe reverend guilt return.\n\nPARIS:\nIn vain for your name\nIs not denied and drooping too.\n\nQUEEN MARGARET:\nWhat, thou? whence could not, how?\nSail note or two!\nWho stands the matter:–Now, payied and recensit, anst\nthen to fear every end on the matom, I\nam proud to meet your gate upon her mastray’s.\n\nFLORIZEL:\nDo”], shape=(5,), dtype=string) ________________________________________________________________________________ Run time: 2.8853743076324463

TensorFlow 2.0 Complete Course - Python Neural Networks for Beginners Tutorial

Improvement LSTM

To overcome the potential vanishing gradient issue faced by RNNs, three researchers, Hochreiter, Schmidhuber and Bengio, improved the RNN with an architecture called Long Short-Term Memory (LSTM). In brief, an LSTM carries relevant past information forward to more recent time steps; its architecture is better at selecting which information to keep and carry to later times.

An LSTM architecture is available in TensorFlow as tf.contrib.rnn.LSTMCell. LSTM is out of the scope of this tutorial; you can refer to the official documentation for further information.

RNNs with list/dict inputs, or nested inputs

Nested structures allow implementers to include more information within a single timestep. For example, a video frame could have audio and video input at the same time. The data shape in this case could be:


[batch, timestep, {"video": [height, width, channel], "audio": [frequency]}]

In another example, handwriting data could have both coordinates x and y for the current position of the pen, as well as pressure information. So the data representation could be:


[batch, timestep, {"location": [x, y], "pressure": [force]}]

The following code provides an example of how to build a custom RNN cell that accepts such structured inputs.

Define a custom cell that supports nested input/output

See Making new Layers & Models via subclassing for details on writing your own layers.


@keras.saving.register_keras_serializable()
class NestedCell(keras.layers.Layer):
    def __init__(self, unit_1, unit_2, unit_3, **kwargs):
        self.unit_1 = unit_1
        self.unit_2 = unit_2
        self.unit_3 = unit_3
        self.state_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        self.output_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        super().__init__(**kwargs)

    def build(self, input_shapes):
        # expect input_shape to contain 2 items, [(batch, i1), (batch, i2, i3)]
        i1 = input_shapes[0][1]
        i2 = input_shapes[1][1]
        i3 = input_shapes[1][2]

        self.kernel_1 = self.add_weight(
            shape=(i1, self.unit_1), initializer="uniform", name="kernel_1"
        )
        self.kernel_2_3 = self.add_weight(
            shape=(i2, i3, self.unit_2, self.unit_3),
            initializer="uniform",
            name="kernel_2_3",
        )

    def call(self, inputs, states):
        # inputs should be in [(batch, input_1), (batch, input_2, input_3)]
        # state should be in shape [(batch, unit_1), (batch, unit_2, unit_3)]
        input_1, input_2 = tf.nest.flatten(inputs)
        s1, s2 = states

        output_1 = tf.matmul(input_1, self.kernel_1)
        output_2_3 = tf.einsum("bij,ijkl->bkl", input_2, self.kernel_2_3)
        state_1 = s1 + output_1
        state_2_3 = s2 + output_2_3

        output = (output_1, output_2_3)
        new_states = (state_1, state_2_3)

        return output, new_states

    def get_config(self):
        return {"unit_1": self.unit_1, "unit_2": self.unit_2, "unit_3": self.unit_3}

Build a RNN model with nested input/output

Let's build a Keras model that uses a keras.layers.RNN layer and the custom cell we just defined.


unit_1 = 10
unit_2 = 20
unit_3 = 30

i1 = 32
i2 = 64
i3 = 32
batch_size = 64
num_batches = 10
timestep = 50

cell = NestedCell(unit_1, unit_2, unit_3)
rnn = keras.layers.RNN(cell)

input_1 = keras.Input((None, i1))
input_2 = keras.Input((None, i2, i3))

outputs = rnn((input_1, input_2))

model = keras.models.Model([input_1, input_2], outputs)
model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])

Train the model with randomly generated data

Since there isn’t a good candidate dataset for this model, we use random Numpy data for demonstration.


input_1_data = np.random.random((batch_size * num_batches, timestep, i1))
input_2_data = np.random.random((batch_size * num_batches, timestep, i2, i3))
target_1_data = np.random.random((batch_size * num_batches, unit_1))
target_2_data = np.random.random((batch_size * num_batches, unit_2, unit_3))
input_data = [input_1_data, input_2_data]
target_data = [target_1_data, target_2_data]

model.fit(input_data, target_data, batch_size=batch_size)

10/10 [==============================] – 1s 27ms/step – loss: 0.7623 – rnn_1_loss: 0.2873 – rnn_1_1_loss: 0.4750 – rnn_1_accuracy: 0.1016 – rnn_1_1_accuracy: 0.0350

With the Keras keras.layers.RNN layer, you are only expected to define the math logic for an individual step within the sequence, and the keras.layers.RNN layer will handle the sequence iteration for you. It's an incredibly powerful way to quickly prototype new kinds of RNNs (e.g., an LSTM variant).

For more details, please visit the API docs.

This text classification tutorial trains a recurrent neural network on the IMDB large movie review dataset for sentiment analysis.

Recurrent Neural Networks (RNN) - Deep Learning with Neural Networks and TensorFlow 10

Data windowing

The models in this tutorial will make a set of predictions based on a window of consecutive samples from the data.

The main features of the input windows are:

  • The width (number of time steps) of the input and label windows.
  • The time offset between them.
  • Which features are used as inputs, labels, or both.

This tutorial builds a variety of models (including Linear, DNN, CNN and RNN models), and uses them for both:

  • Single-output, and multi-output predictions.
  • Single-time-step and multi-time-step predictions.

This section focuses on implementing the data windowing so that it can be reused for all of those models.

Depending on the task and type of model you may want to generate a variety of data windows. Here are some examples:

  1. For example, to make a single prediction 24 hours into the future, given 24 hours of history, you might define a window like this:

  2. A model that makes a prediction one hour into the future, given six hours of history, would need a window like this:

The rest of this section defines a WindowGenerator class. This class can:

  1. Handle the indexes and offsets as shown in the diagrams above.
  2. Split windows of features into (features, labels) pairs.
  3. Plot the content of the resulting windows.
  4. Efficiently generate batches of these windows from the training, evaluation, and test data, using tf.data.Datasets.

Indexes and offsets

Start by creating the WindowGenerator class. The __init__ method includes all the necessary logic for the input and label indices.

It also takes the training, evaluation, and test DataFrames as input. These will be converted to tf.data.Datasets of windows later.


class WindowGenerator():
    def __init__(self, input_width, label_width, shift,
                 train_df=train_df, val_df=val_df, test_df=test_df,
                 label_columns=None):
        # Store the raw data.
        self.train_df = train_df
        self.val_df = val_df
        self.test_df = test_df

        # Work out the label column indices.
        self.label_columns = label_columns
        if label_columns is not None:
            self.label_columns_indices = {name: i for i, name in
                                          enumerate(label_columns)}
        self.column_indices = {name: i for i, name in
                               enumerate(train_df.columns)}

        # Work out the window parameters.
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift

        self.total_window_size = input_width + shift

        self.input_slice = slice(0, input_width)
        self.input_indices = np.arange(self.total_window_size)[self.input_slice]

        self.label_start = self.total_window_size - self.label_width
        self.labels_slice = slice(self.label_start, None)
        self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

    def __repr__(self):
        return '\n'.join([
            f'Total window size: {self.total_window_size}',
            f'Input indices: {self.input_indices}',
            f'Label indices: {self.label_indices}',
            f'Label column name(s): {self.label_columns}'])

Here is code to create the 2 windows shown in the diagrams at the start of this section:


w1 = WindowGenerator(input_width=24, label_width=1, shift=24, label_columns=['T (degC)']) w1

Total window size: 48 Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23] Label indices: [47] Label column name(s): [‘T (degC)’]


w2 = WindowGenerator(input_width=6, label_width=1, shift=1, label_columns=['T (degC)']) w2

Total window size: 7 Input indices: [0 1 2 3 4 5] Label indices: [6] Label column name(s): [‘T (degC)’]

Split

Given a list of consecutive inputs, the

split_window

method will convert them to a window of inputs and a window of labels.

The example

w2

you define earlier will be split like this:

This diagram doesn’t show the

features

axis of the data, but this

split_window

function also handles the

label_columns

so it can be used for both the single output and multi-output examples.


def split_window(self, features):
  inputs = features[:, self.input_slice, :]
  labels = features[:, self.labels_slice, :]
  if self.label_columns is not None:
    labels = tf.stack(
        [labels[:, :, self.column_indices[name]] for name in self.label_columns],
        axis=-1)

  # Slicing doesn't preserve static shape information, so set the shapes
  # manually. This way the `tf.data.Datasets` are easier to inspect.
  inputs.set_shape([None, self.input_width, None])
  labels.set_shape([None, self.label_width, None])

  return inputs, labels

WindowGenerator.split_window = split_window

Try it out:


# Stack three slices, the length of the total window.
example_window = tf.stack([np.array(train_df[:w2.total_window_size]),
                           np.array(train_df[100:100+w2.total_window_size]),
                           np.array(train_df[200:200+w2.total_window_size])])

example_inputs, example_labels = w2.split_window(example_window)

print('All shapes are: (batch, time, features)')
print(f'Window shape: {example_window.shape}')
print(f'Inputs shape: {example_inputs.shape}')
print(f'Labels shape: {example_labels.shape}')

All shapes are: (batch, time, features) Window shape: (3, 7, 19) Inputs shape: (3, 6, 19) Labels shape: (3, 1, 1)

Typically, data in TensorFlow is packed into arrays where the outermost index is across examples (the “batch” dimension). The middle indices are the “time” or “space” (width, height) dimension(s). The innermost indices are the features.

The code above took a batch of three 7-time step windows with 19 features at each time step. It splits them into a batch of 6-time step 19-feature inputs, and a 1-time step 1-feature label. The label only has one feature because the

WindowGenerator

was initialized with

label_columns=['T (degC)']

. Initially, this tutorial will build models that predict single output labels.

Plot

Here is a plot method that allows a simple visualization of the split window:


w2.example = example_inputs, example_labels


def plot(self, model=None, plot_col='T (degC)', max_subplots=3):
  inputs, labels = self.example
  plt.figure(figsize=(12, 8))
  plot_col_index = self.column_indices[plot_col]
  max_n = min(max_subplots, len(inputs))
  for n in range(max_n):
    plt.subplot(max_n, 1, n+1)
    plt.ylabel(f'{plot_col} [normed]')
    plt.plot(self.input_indices, inputs[n, :, plot_col_index],
             label='Inputs', marker='.', zorder=-10)

    if self.label_columns:
      label_col_index = self.label_columns_indices.get(plot_col, None)
    else:
      label_col_index = plot_col_index

    if label_col_index is None:
      continue

    plt.scatter(self.label_indices, labels[n, :, label_col_index],
                edgecolors='k', label='Labels', c='#2ca02c', s=64)
    if model is not None:
      predictions = model(inputs)
      plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                  marker='X', edgecolors='k', label='Predictions',
                  c='#ff7f0e', s=64)

    if n == 0:
      plt.legend()

  plt.xlabel('Time [h]')

WindowGenerator.plot = plot

This plot aligns inputs, labels, and (later) predictions based on the time that the item refers to:


w2.plot()

You can plot the other columns, but the example window

w2

configuration only has labels for the

T (degC)

column.


w2.plot(plot_col='p (mbar)')

Create tf.data.Datasets

Finally, this

make_dataset

method will take a time series DataFrame and convert it to a

tf.data.Dataset

of

(input_window, label_window)

pairs using the

tf.keras.utils.timeseries_dataset_from_array

function:


def make_dataset(self, data):
  data = np.array(data, dtype=np.float32)
  ds = tf.keras.utils.timeseries_dataset_from_array(
      data=data,
      targets=None,
      sequence_length=self.total_window_size,
      sequence_stride=1,
      shuffle=True,
      batch_size=32,)

  ds = ds.map(self.split_window)

  return ds

WindowGenerator.make_dataset = make_dataset

The

WindowGenerator

object holds training, validation, and test data.

Add properties for accessing them as

tf.data.Dataset

s using the

make_dataset

method you defined earlier. Also, add a standard example batch for easy access and plotting:


@property
def train(self):
  return self.make_dataset(self.train_df)

@property
def val(self):
  return self.make_dataset(self.val_df)

@property
def test(self):
  return self.make_dataset(self.test_df)

@property
def example(self):
  """Get and cache an example batch of `inputs, labels` for plotting."""
  result = getattr(self, '_example', None)
  if result is None:
    # No example batch was found, so get one from the `.train` dataset
    result = next(iter(self.train))
    # And cache it for next time
    self._example = result
  return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example

Now, the

WindowGenerator

object gives you access to the

tf.data.Dataset

objects, so you can easily iterate over the data.

The

Dataset.element_spec

property tells you the structure, data types, and shapes of the dataset elements.


# Each element is an (inputs, label) pair.
w2.train.element_spec

(TensorSpec(shape=(None, 6, 19), dtype=tf.float32, name=None), TensorSpec(shape=(None, 1, 1), dtype=tf.float32, name=None))

Iterating over a

Dataset

yields concrete batches:


for example_inputs, example_labels in w2.train.take(1): print(f'Inputs shape (batch, time, features): {example_inputs.shape}') print(f'Labels shape (batch, time, features): {example_labels.shape}')

Inputs shape (batch, time, features): (32, 6, 19) Labels shape (batch, time, features): (32, 1, 1)

RNN layers and RNN cells

In addition to the built-in RNN layers, the RNN API also provides cell-level APIs. Unlike RNN layers, which process whole batches of input sequences, an RNN cell only processes a single timestep.

The cell is the inside of the

for

loop of an RNN layer. Wrapping a cell inside a

keras.layers.RNN

layer gives you a layer capable of processing batches of
sequences, e.g.

RNN(LSTMCell(10))

.

Mathematically,

RNN(LSTMCell(10))

produces the same result as

LSTM(10)

. In fact,
the implementation of this layer in TF v1.x was just creating the corresponding RNN
cell and wrapping it in an RNN layer. However, using the built-in

GRU

and

LSTM

layers enables the use of cuDNN, and you may see better performance.
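
As a quick illustration of that equivalence, here is a minimal sketch; the random input and the layer width of 10 are chosen arbitrarily. It shows that a cell wrapped in the generic RNN layer and the built-in layer accept the same input and produce the same output shape (their weights are initialized independently, so the values differ):


import numpy as np
from tensorflow.keras import layers

inputs = np.random.random((4, 7, 5)).astype("float32")  # (batch, timesteps, features)

cell_based = layers.RNN(layers.LSTMCell(10))  # a cell wrapped in the generic RNN layer
built_in = layers.LSTM(10)                    # the fused built-in layer (can use cuDNN)

print(cell_based(inputs).shape)  # (4, 10)
print(built_in(inputs).shape)    # (4, 10)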

There are three built-in RNN cells, each of them corresponding to the matching RNN layer.


  • keras.layers.SimpleRNNCell

    corresponds to the

    SimpleRNN

    layer.

  • keras.layers.GRUCell

    corresponds to the

    GRU

    layer.

  • keras.layers.LSTMCell

    corresponds to the

    LSTM

    layer.

The cell abstraction, together with the generic

keras.layers.RNN

class, makes it
very easy to implement custom RNN architectures for your research, as in the sketch below.
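
For instance, a minimal custom cell might look like the following sketch; the class name MinimalRNNCell and the plain tanh recurrence are illustrative assumptions, not a layer that ships with Keras:


import tensorflow as tf
from tensorflow.keras import layers

class MinimalRNNCell(layers.Layer):
    """A plain tanh recurrence: h_t = tanh(x_t . W + h_{t-1} . U)."""
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = units   # required so keras.layers.RNN knows the state shape

    def build(self, input_shape):
        self.kernel = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer="glorot_uniform", name="kernel")
        self.recurrent_kernel = self.add_weight(shape=(self.units, self.units),
                                                initializer="orthogonal",
                                                name="recurrent_kernel")

    def call(self, inputs, states):
        prev_h = states[0]
        h = tf.tanh(tf.matmul(inputs, self.kernel) +
                    tf.matmul(prev_h, self.recurrent_kernel))
        return h, [h]   # (output for this step, new state)

# The generic RNN layer handles the sequence iteration for the custom cell.
layer = layers.RNN(MinimalRNNCell(32))
y = layer(tf.random.normal((4, 10, 8)))   # (batch=4, timesteps=10, features=8)
print(y.shape)                            # (4, 32)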

MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention

The weather dataset

This tutorial uses a weather time series dataset recorded by the Max Planck Institute for Biogeochemistry.

This dataset contains 14 different features such as air temperature, atmospheric pressure, and humidity. These were collected every 10 minutes, beginning in 2003. For efficiency, you will use only the data collected between 2009 and 2016. This section of the dataset was prepared by François Chollet for his book Deep Learning with Python.


zip_path = tf.keras.utils.get_file( origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip', fname='jena_climate_2009_2016.csv.zip', extract=True) csv_path, _ = os.path.splitext(zip_path)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip 13568290/13568290 [==============================] – 0s 0us/step

This tutorial will just deal with hourly predictions, so start by sub-sampling the data from 10-minute intervals to one-hour intervals:


df = pd.read_csv(csv_path)
# Slice [start:stop:step], starting from index 5 take every 6th record.
df = df[5::6]

date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')

Let’s take a glance at the data. Here are the first few rows:


df.head()

Here is the evolution of a few features over time:


plot_cols = ['T (degC)', 'p (mbar)', 'rho (g/m**3)'] plot_features = df[plot_cols] plot_features.index = date_time _ = plot_features.plot(subplots=True) plot_features = df[plot_cols][:480] plot_features.index = date_time[:480] _ = plot_features.plot(subplots=True)

Inspect and cleanup

Next, look at the statistics of the dataset:


df.describe().transpose()

Wind velocity

One thing that should stand out is the

min

value of the wind velocity (

wv (m/s)

) and the maximum value (

max. wv (m/s)

) columns. This

-9999

is likely erroneous.

There’s a separate wind direction column, so the velocity should be greater than or equal to zero (

>=0

). Replace it with zeros:


wv = df['wv (m/s)']
bad_wv = wv == -9999.0
wv[bad_wv] = 0.0

max_wv = df['max. wv (m/s)']
bad_max_wv = max_wv == -9999.0
max_wv[bad_max_wv] = 0.0

# The above inplace edits are reflected in the DataFrame.
df['wv (m/s)'].min()

0.0

Feature engineering

Before diving in to build a model, it’s important to understand your data and be sure that you’re passing the model appropriately formatted data.

Wind

The last column of the data,

wd (deg)

, gives the wind direction in units of degrees. Angles do not make good model inputs: 360° and 0° should be close to each other and wrap around smoothly. Direction shouldn’t matter if the wind is not blowing.

Right now the distribution of wind data looks like this:


plt.hist2d(df['wd (deg)'], df['wv (m/s)'], bins=(50, 50), vmax=400) plt.colorbar() plt.xlabel('Wind Direction [deg]') plt.ylabel('Wind Velocity [m/s]')

Text(0, 0.5, ‘Wind Velocity [m/s]’)

But this will be easier for the model to interpret if you convert the wind direction and velocity columns to a wind vector:


wv = df.pop('wv (m/s)')
max_wv = df.pop('max. wv (m/s)')

# Convert to radians.
wd_rad = df.pop('wd (deg)')*np.pi / 180

# Calculate the wind x and y components.
df['Wx'] = wv*np.cos(wd_rad)
df['Wy'] = wv*np.sin(wd_rad)

# Calculate the max wind x and y components.
df['max Wx'] = max_wv*np.cos(wd_rad)
df['max Wy'] = max_wv*np.sin(wd_rad)

The distribution of wind vectors is much simpler for the model to correctly interpret:


plt.hist2d(df['Wx'], df['Wy'], bins=(50, 50), vmax=400) plt.colorbar() plt.xlabel('Wind X [m/s]') plt.ylabel('Wind Y [m/s]') ax = plt.gca() ax.axis('tight')

(-11.305513973134667, 8.24469928549079, -8.27438540335515, 7.7338312955467785)

Time

Similarly, the

Date Time

column is very useful, but not in this string form. Start by converting it to seconds:


timestamp_s = date_time.map(pd.Timestamp.timestamp)

Similar to the wind direction, the time in seconds is not a useful model input. Being weather data, it has clear daily and yearly periodicity. There are many ways you could deal with periodicity.

You can get usable signals by using sine and cosine transforms to clear “Time of day” and “Time of year” signals:


day = 24*60*60 year = (365.2425)*day df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day)) df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day)) df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year)) df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))


plt.plot(np.array(df['Day sin'])[:25]) plt.plot(np.array(df['Day cos'])[:25]) plt.xlabel('Time [h]') plt.title('Time of day signal')

Text(0.5, 1.0, ‘Time of day signal’)

This gives the model access to the most important frequency features. In this case you knew ahead of time which frequencies were important.

If you don’t have that information, you can determine which frequencies are important by extracting features with Fast Fourier Transform. To check the assumptions, here is the

tf.signal.rfft

of the temperature over time. Note the obvious peaks at frequencies near

1/year

and

1/day

:


fft = tf.signal.rfft(df['T (degC)']) f_per_dataset = np.arange(0, len(fft)) n_samples_h = len(df['T (degC)']) hours_per_year = 24*365.2524 years_per_dataset = n_samples_h/(hours_per_year) f_per_year = f_per_dataset/years_per_dataset plt.step(f_per_year, np.abs(fft)) plt.xscale('log') plt.ylim(0, 400000) plt.xlim([0.1, max(plt.xlim())]) plt.xticks([1, 365.2524], labels=['1/Year', '1/day']) _ = plt.xlabel('Frequency (log scale)')

Split the data

You’ll use a

(70%, 20%, 10%)

split for the training, validation, and test sets. Note the data is not being randomly shuffled before splitting. This is for two reasons:

  1. It ensures that chopping the data into windows of consecutive samples is still possible.
  2. It ensures that the validation/test results are more realistic, being evaluated on the data collected after the model was trained.


column_indices = {name: i for i, name in enumerate(df.columns)} n = len(df) train_df = df[0:int(n*0.7)] val_df = df[int(n*0.7):int(n*0.9)] test_df = df[int(n*0.9):] num_features = df.shape[1]

Normalize the data

It is important to scale features before training a neural network. Normalization is a common way of doing this scaling: subtract the mean and divide by the standard deviation of each feature.

The mean and standard deviation should only be computed using the training data so that the models have no access to the values in the validation and test sets.

It’s also arguable that the model shouldn’t have access to future values in the training set when training, and that this normalization should be done using moving averages. That’s not the focus of this tutorial, and the validation and test sets ensure that you get (somewhat) honest metrics. So, in the interest of simplicity this tutorial uses a simple average.


train_mean = train_df.mean() train_std = train_df.std() train_df = (train_df - train_mean) / train_std val_df = (val_df - train_mean) / train_std test_df = (test_df - train_mean) / train_std

Now, peek at the distribution of the features. Some features do have long tails, but there are no obvious errors like the

-9999

wind velocity value.


df_std = (df - train_mean) / train_std df_std = df_std.melt(var_name='Column', value_name='Normalized') plt.figure(figsize=(12, 6)) ax = sns.violinplot(x='Column', y='Normalized', data=df_std) _ = ax.set_xticklabels(df.keys(), rotation=90)

/tmpfs/tmp/ipykernel_449604/3214313372.py:5: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator. _ = ax.set_xticklabels(df.keys(), rotation=90)

Bidirectional RNNs

For sequences other than time series (e.g. text), it is often the case that an RNN model can perform better if it not only processes the sequence from start to end, but also backwards. For example, to predict the next word in a sentence, it is often useful to have the context around the word, not just the words that come before it.

Keras provides an easy API for you to build such bidirectional RNNs: the

keras.layers.Bidirectional

wrapper.


model = keras.Sequential() model.add( layers.Bidirectional(layers.LSTM(64, return_sequences=True), input_shape=(5, 10)) ) model.add(layers.Bidirectional(layers.LSTM(32))) model.add(layers.Dense(10)) model.summary()

Model: “sequential_2” _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= bidirectional (Bidirection (None, 5, 128) 38400 al) bidirectional_1 (Bidirecti (None, 64) 41216 onal) dense_3 (Dense) (None, 10) 650 ================================================================= Total params: 80266 (313.54 KB) Trainable params: 80266 (313.54 KB) Non-trainable params: 0 (0.00 Byte) _________________________________________________________________

Under the hood,

Bidirectional

will copy the RNN layer passed in, and flip the

go_backwards

field of the newly copied layer, so that it will process the inputs in
reverse order.

The output of the

Bidirectional

RNN will be, by default, the concatenation of the forward layer
output and the backward layer output. If you need a different merging behavior, e.g.
summation, change the

merge_mode

parameter in the

Bidirectional

wrapper
constructor. For more details about

Bidirectional

, please check
the API docs.

Recurrent Neural Networks Tutorial | RNN LSTM | Tensorflow Tutorial | Edureka Rewind

Python3


model = keras.models.Sequential()

model.add(keras.layers.Embedding(10000, 128))
model.add(keras.layers.Bidirectional(keras.layers.LSTM(64, return_sequences=True)))
model.add(keras.layers.Bidirectional(keras.layers.LSTM(64)))
model.add(keras.layers.Dense(128, activation="relu"))
model.add(keras.layers.Dropout(0.4))
# The output-layer size was lost in extraction; 1 unit is assumed for binary sentiment.
model.add(keras.layers.Dense(1, activation="sigmoid"))

model.compile("rmsprop", "binary_crossentropy", metrics=["accuracy"])

# The epoch count was truncated in the source; 5 is a placeholder value.
history = model.fit(train_pad, y_train, epochs=5)

Output:

Training Process of the Bidirectional LSTM model

Now let’s check the model’s accuracy using GRUs as well.

Create the text encoder

The raw text loaded by

tfds

needs to be processed before it can be used in a model. The simplest way to process text for training is using the

TextVectorization

layer. This layer has many capabilities, but this tutorial sticks to the default behavior.

Create the layer, and pass the dataset’s text to the layer’s

.adapt

method:


VOCAB_SIZE = 1000 encoder = tf.keras.layers.TextVectorization( max_tokens=VOCAB_SIZE) encoder.adapt(train_dataset.map(lambda text, label: text))

The

.adapt

method sets the layer’s vocabulary. Here are the first 20 tokens. After the padding and unknown tokens they’re sorted by frequency:


vocab = np.array(encoder.get_vocabulary()) vocab[:20]

array([”, ‘[UNK]’, ‘the’, ‘and’, ‘a’, ‘of’, ‘to’, ‘is’, ‘in’, ‘it’, ‘i’, ‘this’, ‘that’, ‘br’, ‘was’, ‘as’, ‘for’, ‘with’, ‘movie’, ‘but’], dtype=’

Once the vocabulary is set, the layer can encode text into indices. The tensors of indices are 0-padded to the longest sequence in the batch (unless you set a fixed

output_sequence_length

):


encoded_example = encoder(example)[:3].numpy() encoded_example

array([[147, 300, 362, …, 0, 0, 0], [ 4, 579, 5, …, 0, 0, 0], [ 2, 348, 12, …, 0, 0, 0]])
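
To illustrate the fixed-length alternative mentioned above, here is a small self-contained sketch; the layer name fixed_len_encoder and the toy strings are illustrative, not part of the tutorial:


import tensorflow as tf

texts = tf.constant(["the movie was great",
                     "terrible acting but a great score"])

# Every encoded example is padded or truncated to exactly 10 tokens.
fixed_len_encoder = tf.keras.layers.TextVectorization(
    max_tokens=1000, output_sequence_length=10)
fixed_len_encoder.adapt(texts)

print(fixed_len_encoder(texts).shape)  # (2, 10)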

With the default settings, the process is not completely reversible. The main reasons for that are:

  1. The default value for

    preprocessing.TextVectorization

    ‘s

    standardize

    argument is

    "lower_and_strip_punctuation"

    .
  2. The limited vocabulary size and lack of character-based fallback result in some unknown tokens.


for n in range(3): print("Original: ", example[n].numpy()) print("Round-trip: ", " ".join(vocab[encoded_example[n]])) print()

Original: b’Watching beautiful women sneaking around, playing cops and robbers is one of the most delightful guilty pleasures the medium film lets me enjoy. So The House on Carroll Street was not entirely a waste of time, although the story is contrived and the screenplay uninspired and somewhat irritating.There are many allusions to different Hitchcock pictures, not least the choice of Kelly McGillis in the starring role. She is dressed up as Grace Kelly, and she is not far off the mark. Not at all. But her character is not convincing. The way she is introduced to the audience, she should be someone with political convictions and a purpose in life. After all the movie deals with a clearly defined time period, true events and a specific issue. But the story degenerates within the first minutes into a sorry run-off-the-mill crime story with unbelievable coincidences, high predictability and a set of two dimensional characters. This is all the more regrettable, as the performances of the actors are good, as are the photography and the set design.The finale in Central Station, New York is breath taking. It starts in the subterranean section and then moves up to the roof. The movie can be praised for its good use of architecture.’ Round-trip: watching beautiful women [UNK] around playing [UNK] and [UNK] is one of the most [UNK] [UNK] [UNK] the [UNK] film lets me enjoy so the house on [UNK] street was not [UNK] a waste of time although the story is [UNK] and the screenplay [UNK] and somewhat [UNK] br there are many [UNK] to different [UNK] [UNK] not least the [UNK] of [UNK] [UNK] in the [UNK] role she is [UNK] up as [UNK] [UNK] and she is not far off the mark not at all but her character is not [UNK] the way she is [UNK] to the audience she should be someone with political [UNK] and a [UNK] in life after all the movie [UNK] with a clearly [UNK] time period true events and a [UNK] [UNK] but the story [UNK] within the first minutes into a sorry [UNK] crime story with [UNK] [UNK] high [UNK] and a set of two [UNK] characters this is all the more [UNK] as the performances of the actors are good as are the [UNK] and the set [UNK] br the [UNK] in [UNK] [UNK] new york is [UNK] taking it starts in the [UNK] [UNK] and then [UNK] up to the [UNK] the movie can be [UNK] for its good use of [UNK] Original: b’A group of people are invited to there high school reunion, but after they arrive they discover it to be a scam by an old classmate they played an almost fatal prank on. Now, he seeks to get revenge on all those that hurt him by sealing all the exits and cutting off all telephone lines.Dark slasher film with an unexceptional premise. 
Bringing it up a notch are a few good performances, some rather creative death scenes, plenty of excitement & scares, some humor and an original ending.Unrated for Extreme Violence, Graphic Nudity, Sexual Situations, Profanity and Drug Use.’ Round-trip: a group of people are [UNK] to there high school [UNK] but after they [UNK] they [UNK] it to be a [UNK] by an old [UNK] they played an almost [UNK] [UNK] on now he [UNK] to get [UNK] on all those that [UNK] him by [UNK] all the [UNK] and [UNK] off all [UNK] [UNK] br dark [UNK] film with an [UNK] premise [UNK] it up a [UNK] are a few good performances some rather [UNK] death scenes plenty of [UNK] [UNK] some humor and an original [UNK] br [UNK] for [UNK] violence [UNK] [UNK] sexual [UNK] [UNK] and [UNK] use Original: b’The short that starts this film is the true footage of a guy named Gary, apparently it was taken randomly in the parking lot of a television station where Gary works in the town of Beaver. Gary is a little “different”; he is an impersonator and drives an old Chevy named Farrah (after Fawcett). Lo and behold the filmmaker gets a letter from Gary some time later inviting him to return to Beaver to get some footage of the local talent contest he has put together, including Gary\’s staggering performace as Olivia Newton Dawn. Oh, my. The two shorts that follow are Gary\’s story, the same one you just witnessed only the first is portrayed by Sean Penn and the second by Crispin Glover titled “The Orkly Kid.” If you are in the mood for making fun of someone this is definitely the film to watch. I was doubled over with laughter through most of it, especially Crispins performance which could definitely stand on it\’s own. When it was over, I had to rewind the film to once again watch the real Gary and all his shining idiocy. Although Olivia was the focus, I would have liked to have seen one of the “fictitious” shorts take a jab at Gary\’s Barry Manilow impersonation, whic h was equally ridiculous.’ Round-trip: the short that starts this film is the true footage of a guy named [UNK] apparently it was taken [UNK] in the [UNK] lot of a television [UNK] where [UNK] works in the town of [UNK] [UNK] is a little different he is an [UNK] and [UNK] an old [UNK] named [UNK] after [UNK] [UNK] and [UNK] the [UNK] gets a [UNK] from [UNK] some time later [UNK] him to return to [UNK] to get some footage of the local talent [UNK] he has put together including [UNK] [UNK] [UNK] as [UNK] [UNK] [UNK] oh my the two [UNK] that follow are [UNK] story the same one you just [UNK] only the first is portrayed by [UNK] [UNK] and the second by [UNK] [UNK] [UNK] the [UNK] kid if you are in the [UNK] for making fun of someone this is definitely the film to watch i was [UNK] over with [UNK] through most of it especially [UNK] performance which could definitely stand on its own when it was over i had to [UNK] the film to once again watch the real [UNK] and all his [UNK] [UNK] although [UNK] was the [UNK] i would have liked to have seen one of the [UNK] [UNK] take a [UNK] at [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] was [UNK] ridiculous

Conclusion

The world is moving towards hybrid solutions where data scientists are using CNN-RNN hybrid networks in the field of image captioning, emotion detection, video subtitling, and DNA sequencing. Hybrid networks provide both visual and temporal characteristics for the model. Learn more about RNN by taking the course: Recurrent Neural Networks for Language Modeling in Python.

The first half of the tutorial covers the basics of recurrent neural networks, its limitations, and solutions in the form of more advanced architecture. The second half of the tutorial is about developing MasterCard stock price predictions using LSTM and GRU models. The results clearly show that the GRU model performed better than LSTM, with a similar structure and hyperparameters.

This project is available on the DataCamp workspace.

In this blog, we are going to cover:

  • What are Recurrent Neural Networks (RNN)
  • Input and Output Sequences of RNN
  • Training Recurrent Neural Networks (RNN)
  • Long Short-Term Memory (LSTM)
  • Advantages of RNNs
  • Disadvantages of RNNs
  • Applications of RNNs
  • Conclusion

Recurrent Neural Networks (RNN) are part of a larger family of algorithms referred to as sequence models. Sequence models have made giant leaps forward in the fields of speech recognition, music generation, DNA sequence analysis, machine translation, and many more.

Python3


import plotly.express as px  # assumed import; not shown in this excerpt

# Keyword names below are reconstructed from the garbled source; x="Age" is assumed.
fig = px.histogram(data, marginal='box', x="Age", title="Age Group",
                   color="Recommended IND", nbins=65-18,
                   color_discrete_sequence=['green', 'red'])
fig.update_layout(bargap=0.2)
Output:

We can visualize the distribution of the Age column along with the Rating.

Long Short-Term Memory (LSTM)

  • A special kind of Recurrent Neural Network, capable of learning long-term dependencies.
  • LSTMs remember information over long intervals of time as their default behaviour.
  • Each LSTM module has three gates: the forget gate, the input gate, and the output gate (see the sketch after this list).

    • Forget Gate: This gate decides which information should be discarded from the cell at that particular timestamp; it is determined by the sigmoid function.
    • Input Gate: Decides how much of this unit is added to the current state. The sigmoid function decides which values to let through (0 or 1), and the tanh function gives weightage to the values which are passed, determining their level of importance (ranging from -1 to 1).
    • Output Gate: Decides which part of the current cell state makes it to the output. The sigmoid function decides which values to let through (0 or 1), and the tanh function gives weightage to the values which are passed, determining their level of importance (ranging from -1 to 1), multiplied with the output of the sigmoid.
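
For concreteness, here is a minimal NumPy sketch of one LSTM time step. The function name, the weight layout (all four transforms stacked along the last axis), and the random test values are illustrative assumptions, not the internals of any particular library:


import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b stack the four transforms along the last axis."""
    n = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b
    f = sigmoid(z[..., :n])        # forget gate: what to discard from the cell state
    i = sigmoid(z[..., n:2*n])     # input gate: how much of the candidate to add
    o = sigmoid(z[..., 2*n:3*n])   # output gate: what part of the cell reaches the output
    g = np.tanh(z[..., 3*n:])      # candidate values, weighted between -1 and 1
    c_t = f * c_prev + i * g       # new cell state
    h_t = o * np.tanh(c_t)         # new hidden state / output
    return h_t, c_t

# Tiny smoke test with random parameters.
rng = np.random.default_rng(0)
d, n = 4, 3
h, c = lstm_step(rng.normal(size=(1, d)), np.zeros((1, n)), np.zeros((1, n)),
                 rng.normal(size=(d, 4*n)), rng.normal(size=(n, 4*n)), np.zeros(4*n))
print(h.shape, c.shape)  # (1, 3) (1, 3)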

Conclusion

  • Recurrent Neural Networks stand at the foundation of many modern marvels of artificial intelligence. They provide a solid basis for artificial intelligence applications to be more efficient, more flexible in their accessibility, and, most importantly, more convenient to use.
  • Moreover, the results of recurrent neural networks show the real value of data in this day and age. They show how much can be extracted from data and what this information can create in return. And that is highly inspiring.


Multi-step models

Both the single-output and multiple-output models in the previous sections made single time step predictions, one hour into the future.

This section looks at how to expand these models to make multiple time step predictions.

In a multi-step prediction, the model needs to learn to predict a range of future values. Thus, unlike a single step model, where only a single future point is predicted, a multi-step model predicts a sequence of the future values.

There are two rough approaches to this:

  1. Single shot predictions where the entire time series is predicted at once.
  2. Autoregressive predictions where the model only makes single step predictions and its output is fed back as its input.

In this section all the models will predict all the features across all output time steps.

For the multi-step model, the training data again consists of hourly samples. However, here, the models will learn to predict 24 hours into the future, given 24 hours of the past.

Here is a

Window

object that generates these slices from the dataset:


OUT_STEPS = 24 multi_window = WindowGenerator(input_width=24, label_width=OUT_STEPS, shift=OUT_STEPS) multi_window.plot() multi_window

Total window size: 48 Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23] Label indices: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47] Label column name(s): None

Baselines

A simple baseline for this task is to repeat the last input time step for the required number of output time steps:


class MultiStepLastBaseline(tf.keras.Model):
  def call(self, inputs):
    return tf.tile(inputs[:, -1:, :], [1, OUT_STEPS, 1])

last_baseline = MultiStepLastBaseline()
last_baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
                      metrics=[tf.keras.metrics.MeanAbsoluteError()])

multi_val_performance = {}
multi_performance = {}

multi_val_performance['Last'] = last_baseline.evaluate(multi_window.val)
multi_performance['Last'] = last_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(last_baseline)

437/437 [==============================] – 1s 2ms/step – loss: 0.6285 – mean_absolute_error: 0.5007

Since this task is to predict 24 hours into the future, given 24 hours of the past, another simple approach is to repeat the previous day, assuming tomorrow will be similar:


class RepeatBaseline(tf.keras.Model):
  def call(self, inputs):
    return inputs

repeat_baseline = RepeatBaseline()
repeat_baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
                        metrics=[tf.keras.metrics.MeanAbsoluteError()])

multi_val_performance['Repeat'] = repeat_baseline.evaluate(multi_window.val)
multi_performance['Repeat'] = repeat_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(repeat_baseline)

437/437 [==============================] – 1s 2ms/step – loss: 0.4270 – mean_absolute_error: 0.3959

Single-shot models

One high-level approach to this problem is to use a “single-shot” model, where the model makes the entire sequence prediction in a single step.

This can be implemented efficiently as a

tf.keras.layers.Dense

with

OUT_STEPS*features

output units. The model just needs to reshape that output to the required

(OUTPUT_STEPS, features)

.

Linear

A simple linear model based on the last input time step does better than either baseline, but is underpowered. The model needs to predict

OUTPUT_STEPS

time steps, from a single input time step with a linear projection. It can only capture a low-dimensional slice of the behavior, likely based mainly on the time of day and time of year.


multi_linear_model = tf.keras.Sequential([
    # Take the last time-step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_linear_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Linear'] = multi_linear_model.evaluate(multi_window.val)
multi_performance['Linear'] = multi_linear_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_linear_model)

437/437 [==============================] – 1s 2ms/step – loss: 0.2559 – mean_absolute_error: 0.3049

Dense

Adding a

tf.keras.layers.Dense

between the input and output gives the linear model more power, but is still only based on a single input time step.


multi_dense_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, dense_units]
    tf.keras.layers.Dense(512, activation='relu'),
    # Shape => [batch, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_dense_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_dense_model)

437/437 [==============================] – 1s 2ms/step – loss: 0.2203 – mean_absolute_error: 0.2834

CNN

A convolutional model makes predictions based on a fixed-width history, which may lead to better performance than the dense model since it can see how things are changing over time:


CONV_WIDTH = 3
multi_conv_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, CONV_WIDTH, features]
    tf.keras.layers.Lambda(lambda x: x[:, -CONV_WIDTH:, :]),
    # Shape => [batch, 1, conv_units]
    tf.keras.layers.Conv1D(256, activation='relu', kernel_size=(CONV_WIDTH)),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_conv_model, multi_window)

IPython.display.clear_output()

multi_val_performance['Conv'] = multi_conv_model.evaluate(multi_window.val)
multi_performance['Conv'] = multi_conv_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_conv_model)

437/437 [==============================] – 1s 2ms/step – loss: 0.2145 – mean_absolute_error: 0.2801

RNN

A recurrent model can learn to use a long history of inputs, if it’s relevant to the predictions the model is making. Here the model will accumulate internal state for 24 hours, before making a single prediction for the next 24 hours.

In this single-shot format, the LSTM only needs to produce an output at the last time step, so set

return_sequences=False

in

tf.keras.layers.LSTM

.


multi_lstm_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, lstm_units].
    # Adding more `lstm_units` just overfits more quickly.
    tf.keras.layers.LSTM(32, return_sequences=False),
    # Shape => [batch, out_steps*features].
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features].
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_lstm_model, multi_window)

IPython.display.clear_output()

multi_val_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.val)
multi_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_lstm_model)

437/437 [==============================] – 1s 3ms/step – loss: 0.2130 – mean_absolute_error: 0.2837

Advanced: Autoregressive model

The above models all predict the entire output sequence in a single step.

In some cases it may be helpful for the model to decompose this prediction into individual time steps. Then, each model’s output can be fed back into itself at each step and predictions can be made conditioned on the previous one, like in the classic Generating Sequences With Recurrent Neural Networks.

One clear advantage to this style of model is that it can be set up to produce output with a varying length.

You could take any of the single-step multi-output models trained in the first half of this tutorial and run it in an autoregressive feedback loop, but here you’ll focus on building a model that’s been explicitly trained to do that.

RNN

This tutorial only builds an autoregressive RNN model, but this pattern could be applied to any model that was designed to output a single time step.

The model will have the same basic form as the single-step LSTM models from earlier: a

tf.keras.layers.LSTM

layer followed by a

tf.keras.layers.Dense

layer that converts the

LSTM

layer’s outputs to model predictions.

A

tf.keras.layers.LSTM

is a

tf.keras.layers.LSTMCell

wrapped in the higher level

tf.keras.layers.RNN

that manages the state and sequence results for you (Check out the Recurrent Neural Networks (RNN) with Keras guide for details).

In this case, the model has to manually manage the inputs for each step, so it uses

tf.keras.layers.LSTMCell

directly for the lower level, single time step interface.


class FeedBack(tf.keras.Model):
  def __init__(self, units, out_steps):
    super().__init__()
    self.out_steps = out_steps
    self.units = units
    self.lstm_cell = tf.keras.layers.LSTMCell(units)
    # Also wrap the LSTMCell in an RNN to simplify the `warmup` method.
    self.lstm_rnn = tf.keras.layers.RNN(self.lstm_cell, return_state=True)
    self.dense = tf.keras.layers.Dense(num_features)


feedback_model = FeedBack(units=32, out_steps=OUT_STEPS)

The first method this model needs is a

warmup

method to initialize its internal state based on the inputs. Once trained, this state will capture the relevant parts of the input history. This is equivalent to the single-step

LSTM

model from earlier:


def warmup(self, inputs):
  # inputs.shape => (batch, time, features)
  # x.shape => (batch, lstm_units)
  x, *state = self.lstm_rnn(inputs)

  # predictions.shape => (batch, features)
  prediction = self.dense(x)
  return prediction, state

FeedBack.warmup = warmup

This method returns a single time-step prediction and the internal state of the

LSTM

:


prediction, state = feedback_model.warmup(multi_window.example[0]) prediction.shape

TensorShape([32, 19])

With the

RNN

‘s state, and an initial prediction you can now continue iterating the model feeding the predictions at each step back as the input.

The simplest approach for collecting the output predictions is to use a Python list and a

tf.stack

after the loop.


def call(self, inputs, training=None):
  # Use a Python list to capture dynamically unrolled outputs.
  predictions = []
  # Initialize the LSTM state.
  prediction, state = self.warmup(inputs)

  # Insert the first prediction.
  predictions.append(prediction)

  # Run the rest of the prediction steps.
  for n in range(1, self.out_steps):
    # Use the last prediction as input.
    x = prediction
    # Execute one lstm step.
    x, state = self.lstm_cell(x, states=state, training=training)
    # Convert the lstm output to a prediction.
    prediction = self.dense(x)
    # Add the prediction to the output.
    predictions.append(prediction)

  # predictions.shape => (time, batch, features)
  predictions = tf.stack(predictions)
  # predictions.shape => (batch, time, features)
  predictions = tf.transpose(predictions, [1, 0, 2])
  return predictions

FeedBack.call = call

Test run this model on the example inputs:


print('Output shape (batch, time, features): ', feedback_model(multi_window.example[0]).shape)

Output shape (batch, time, features): (32, 24, 19)

Now, train the model:


history = compile_and_fit(feedback_model, multi_window) IPython.display.clear_output() multi_val_performance['AR LSTM'] = feedback_model.evaluate(multi_window.val) multi_performance['AR LSTM'] = feedback_model.evaluate(multi_window.test, verbose=0) multi_window.plot(feedback_model)

437/437 [==============================] – 3s 8ms/step – loss: 0.2280 – mean_absolute_error: 0.3019

Performance

There are clearly diminishing returns as a function of model complexity on this problem:


x = np.arange(len(multi_performance)) width = 0.3 metric_name = 'mean_absolute_error' metric_index = lstm_model.metrics_names.index('mean_absolute_error') val_mae = [v[metric_index] for v in multi_val_performance.values()] test_mae = [v[metric_index] for v in multi_performance.values()] plt.bar(x - 0.17, val_mae, width, label='Validation') plt.bar(x + 0.17, test_mae, width, label='Test') plt.xticks(ticks=x, labels=multi_performance.keys(), rotation=45) plt.ylabel(f'MAE (average over all times and outputs)') _ = plt.legend()

The metrics for the multi-output models in the first half of this tutorial show the performance averaged across all output features. These performances are similar but also averaged across output time steps.


for name, value in multi_performance.items(): print(f'{name:8s}: {value[1]:0.4f}')

Last : 0.5157 Repeat : 0.3774 Linear : 0.2990 Dense : 0.2776 Conv : 0.2739 LSTM : 0.2763 AR LSTM : 0.2944

The gains achieved going from a dense model to convolutional and recurrent models are only a few percent (if any), and the autoregressive model performed clearly worse. So these more complex approaches may not be worthwhile on this problem, but there was no way to know without trying, and these models could be helpful for your problem.


Train the model

At this point the problem can be treated as a standard classification problem. Given the previous RNN state, and the input this time step, predict the class of the next character.

Attach an optimizer, and a loss function

The standard

tf.keras.losses.sparse_categorical_crossentropy

loss function works in this case because it is applied across the last dimension of the predictions.

Because your model returns logits, you need to set the

from_logits

flag.


loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)


example_batch_mean_loss = loss(target_example_batch, example_batch_predictions) print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)") print("Mean loss: ", example_batch_mean_loss)

Prediction shape: (64, 100, 66) # (batch_size, sequence_length, vocab_size) Mean loss: tf.Tensor(4.1884556, shape=(), dtype=float32)

A newly initialized model shouldn’t be too sure of itself; the output logits should all have similar magnitudes. To confirm this, you can check that the exponential of the mean loss is approximately equal to the vocabulary size: for a roughly uniform prediction, the loss is about -log(1/vocab_size) = log(vocab_size), so exp(loss) should be close to vocab_size. A much higher loss means the model is sure of its wrong answers, and is badly initialized:


tf.exp(example_batch_mean_loss).numpy()

65.920906

Configure the training procedure using the

tf.keras.Model.compile

method. Use

tf.keras.optimizers.Adam

with default arguments and the loss function.


model.compile(optimizer='adam', loss=loss)

Configure checkpoints

Use a

tf.keras.callbacks.ModelCheckpoint

to ensure that checkpoints are saved during training:


# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

Execute the training

To keep training time reasonable, use 20 epochs to train the model, as set below. In Colab, set the runtime to GPU for faster training.


EPOCHS = 20


history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/20 WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1700137742.036116 34050 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process. 172/172 [==============================] – 12s 53ms/step – loss: 2.7219 Epoch 2/20 172/172 [==============================] – 10s 54ms/step – loss: 1.9972 Epoch 3/20 172/172 [==============================] – 11s 55ms/step – loss: 1.7187 Epoch 4/20 172/172 [==============================] – 11s 57ms/step – loss: 1.5568 Epoch 5/20 172/172 [==============================] – 11s 59ms/step – loss: 1.4579 Epoch 6/20 172/172 [==============================] – 11s 61ms/step – loss: 1.3891 Epoch 7/20 172/172 [==============================] – 12s 61ms/step – loss: 1.3356 Epoch 8/20 172/172 [==============================] – 12s 62ms/step – loss: 1.2913 Epoch 9/20 172/172 [==============================] – 11s 60ms/step – loss: 1.2501 Epoch 10/20 172/172 [==============================] – 11s 59ms/step – loss: 1.2090 Epoch 11/20 172/172 [==============================] – 11s 59ms/step – loss: 1.1693 Epoch 12/20 172/172 [==============================] – 11s 59ms/step – loss: 1.1283 Epoch 13/20 172/172 [==============================] – 11s 60ms/step – loss: 1.0859 Epoch 14/20 172/172 [==============================] – 11s 61ms/step – loss: 1.0391 Epoch 15/20 172/172 [==============================] – 11s 61ms/step – loss: 0.9928 Epoch 16/20 172/172 [==============================] – 11s 61ms/step – loss: 0.9408 Epoch 17/20 172/172 [==============================] – 11s 60ms/step – loss: 0.8888 Epoch 18/20 172/172 [==============================] – 11s 60ms/step – loss: 0.8361 Epoch 19/20 172/172 [==============================] – 11s 60ms/step – loss: 0.7840 Epoch 20/20 172/172 [==============================] – 11s 60ms/step – loss: 0.7337

Build The Model

This section defines the model as a

keras.Model

subclass (For details see Making new Layers and Models via subclassing).

This model has three layers:


  • tf.keras.layers.Embedding

    : The input layer. A trainable lookup table that will map each character-ID to a vector with

    embedding_dim

    dimensions;

  • tf.keras.layers.GRU

    : A type of RNN with size

    units=rnn_units

    (You can also use an LSTM layer here.)

  • tf.keras.layers.Dense

    : The output layer, with

    vocab_size

    outputs. It outputs one logit for each character in the vocabulary. These are the log-likelihood of each character according to the model.


# Length of the vocabulary in StringLookup Layer vocab_size = len(ids_from_chars.get_vocabulary()) # The embedding dimension embedding_dim = 256 # Number of RNN units rnn_units = 1024


class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__(self)
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(rnn_units,
                                   return_sequences=True,
                                   return_state=True)
    self.dense = tf.keras.layers.Dense(vocab_size)

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else:
      return x


model = MyModel( vocab_size=vocab_size, embedding_dim=embedding_dim, rnn_units=rnn_units)

For each character the model looks up the embedding, runs the GRU one timestep with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next character:
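
As a quick sanity check, the sketch below (which assumes the dataset of (input, target) character-ID batches built earlier in this tutorial is available) calls the model on one batch and confirms the shape of the logits:


# Assumes `dataset` yields (input, target) batches of character IDs,
# as built earlier in the text-generation tutorial.
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    # One logit per vocabulary entry, for every position in every sequence.
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")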

Input And Output Sequences of RNN

  • An RNN can simultaneously take a sequence of inputs and produce a sequence of outputs (see the sketch after this list).
  • This form of sequence-to-sequence network is useful for predicting time series such as stock prices: you feed it the prices over the last N days, and it should output the prices shifted by one day into the future.
  • You may instead feed the network a sequence of inputs and ignore all outputs except for the final one; in other words, this is a sequence-to-vector network.
  • You could feed the network the same input vector over and over again at each time step and let it output a sequence; this is a vector-to-sequence network.
  • You can have a sequence-to-vector network, referred to as an encoder, followed by a vector-to-sequence network, called a decoder.
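
As a small illustration of the first two configurations, the sketch below (toy shapes, SimpleRNN chosen for brevity) shows how the return_sequences flag switches a Keras RNN layer between sequence-to-vector and sequence-to-sequence outputs:


import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((8, 20, 1))  # (batch, time steps, features)

seq_to_vec = layers.SimpleRNN(16)                         # only the last output is kept
seq_to_seq = layers.SimpleRNN(16, return_sequences=True)  # one output per time step

print(seq_to_vec(x).shape)  # (8, 16)
print(seq_to_seq(x).shape)  # (8, 20, 16)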


Python3


def filter_score(rating):
    # The threshold was lost in the source; ratings above 3 are assumed to be positive.
    return int(rating > 3)

features = ['Class Name', 'Title', 'Review Text']
X = data[features]        # assignment targets reconstructed; the original names were lost
y = data['Rating']
y = y.apply(filter_score)

Text Preprocessing

The text data we have comes with a lot of noise. This noise can be in the form of repeated words or commonly used sentences. In text preprocessing we need the text in a consistent format, so we first convert the entire text to lowercase. We then perform lemmatization to reduce words to their base forms. Since we need clean text, we also remove common words (stopwords) and punctuation. A minimal preprocessing sketch follows.
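
The sketch below assumes NLTK is available; the helper name clean_text and the regular expression are illustrative choices rather than the exact pipeline used in the original tutorial:


import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time downloads (quiet no-ops if already present).
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_text(text):
    text = text.lower()                      # consistent format: lowercase everything
    text = re.sub(r"[^a-z\s]", " ", text)    # drop punctuation and digits
    tokens = [lemmatizer.lemmatize(tok)      # reduce words to their base form
              for tok in text.split()
              if tok not in stop_words]      # drop stopwords
    return " ".join(tokens)

print(clean_text("The movies were absolutely amazing, weren't they?"))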

Python3


from tensorflow.keras.preprocessing.text import Tokenizer  # assumed import; not shown in this excerpt

# The out-of-vocabulary token string was truncated in the source; '<OOV>' is a common choice.
tokenizer = Tokenizer(num_words=10000, oov_token='<OOV>')
tokenizer.fit_on_texts(X_train)

Padding the Text Data

Keras preprocessing helps in organizing the text. Padding ensures all sequences have the same length, which makes it easier to batch them and train neural network models. Padding adds extra zeros to reach the specified maximum length before feeding a neural network. If a text exceeds the maximum length, it can be truncated from either the beginning or the end. By default both padding and truncation are ‘pre’; we can set them to ‘post’ or leave them as they are. A short example follows.
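
Here is a small sketch of that behavior using Keras’ pad_sequences; the toy sequences and the maxlen value are illustrative:


from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [[5, 12, 7], [3, 9], [8, 1, 4, 6, 2, 11]]

# 'pre' padding adds zeros at the front; 'pre' truncating drops tokens from the front.
padded = pad_sequences(sequences, maxlen=5, padding='pre', truncating='pre')
print(padded)
# [[ 0  0  5 12  7]
#  [ 0  0  0  3  9]
#  [ 1  4  6  2 11]]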

Advanced: Customized Training

The above training procedure is simple, but does not give you much control. It uses teacher-forcing which prevents bad predictions from being fed back to the model, so the model never learns to recover from mistakes.

So now that you’ve seen how to run the model manually next you’ll implement the training loop. This gives a starting point if, for example, you want to implement curriculum learning to help stabilize the model’s open-loop output.

The most important part of a custom training loop is the train step function.

Use

tf.GradientTape

to track the gradients. You can learn more about this approach by reading the eager execution guide.

The basic procedure is:

  1. Execute the model and calculate the loss under a

    tf.GradientTape

    .
  2. Calculate the updates and apply them to the model using the optimizer.


class CustomTraining(MyModel):
  @tf.function
  def train_step(self, inputs):
    inputs, labels = inputs
    with tf.GradientTape() as tape:
      predictions = self(inputs, training=True)
      loss = self.loss(labels, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    self.optimizer.apply_gradients(zip(grads, model.trainable_variables))

    return {'loss': loss}

The above implementation of the

train_step

method follows Keras’

train_step

conventions. This is optional, but it allows you to change the behavior of the train step and still use keras’

Model.compile

and

Model.fit

methods.


model = CustomTraining( vocab_size=len(ids_from_chars.get_vocabulary()), embedding_dim=embedding_dim, rnn_units=rnn_units)


model.compile(optimizer = tf.keras.optimizers.Adam(), loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))


model.fit(dataset, epochs=1)

172/172 [==============================] – 13s 58ms/step – loss: 2.7075

Or if you need more control, you can write your own complete custom training loop:


EPOCHS = 10

mean = tf.metrics.Mean()

for epoch in range(EPOCHS):
    start = time.time()

    mean.reset_states()
    for (batch_n, (inp, target)) in enumerate(dataset):
        logs = model.train_step([inp, target])
        mean.update_state(logs['loss'])

        if batch_n % 50 == 0:
            template = f"Epoch {epoch+1} Batch {batch_n} Loss {logs['loss']:.4f}"
            print(template)

    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
        model.save_weights(checkpoint_prefix.format(epoch=epoch))

    print()
    print(f'Epoch {epoch+1} Loss: {mean.result().numpy():.4f}')
    print(f'Time taken for 1 epoch {time.time() - start:.2f} sec')
    print("_"*80)

model.save_weights(checkpoint_prefix.format(epoch=epoch))

Epoch 1 Batch 0 Loss 2.1913 Epoch 1 Batch 50 Loss 2.0591 Epoch 1 Batch 100 Loss 1.9363 Epoch 1 Batch 150 Loss 1.8937 Epoch 1 Loss: 1.9814 Time taken for 1 epoch 12.44 sec ________________________________________________________________________________ Epoch 2 Batch 0 Loss 1.8354 Epoch 2 Batch 50 Loss 1.7515 Epoch 2 Batch 100 Loss 1.6990 Epoch 2 Batch 150 Loss 1.6749 Epoch 2 Loss: 1.7090 Time taken for 1 epoch 11.45 sec ________________________________________________________________________________ Epoch 3 Batch 0 Loss 1.5859 Epoch 3 Batch 50 Loss 1.5785 Epoch 3 Batch 100 Loss 1.5548 Epoch 3 Batch 150 Loss 1.5089 Epoch 3 Loss: 1.5524 Time taken for 1 epoch 11.49 sec ________________________________________________________________________________ Epoch 4 Batch 0 Loss 1.4963 Epoch 4 Batch 50 Loss 1.4674 Epoch 4 Batch 100 Loss 1.4629 Epoch 4 Batch 150 Loss 1.4254 Epoch 4 Loss: 1.4550 Time taken for 1 epoch 11.26 sec ________________________________________________________________________________ Epoch 5 Batch 0 Loss 1.3884 Epoch 5 Batch 50 Loss 1.4480 Epoch 5 Batch 100 Loss 1.3669 Epoch 5 Batch 150 Loss 1.3619 Epoch 5 Loss: 1.3870 Time taken for 1 epoch 11.19 sec ________________________________________________________________________________ Epoch 6 Batch 0 Loss 1.3157 Epoch 6 Batch 50 Loss 1.3346 Epoch 6 Batch 100 Loss 1.3065 Epoch 6 Batch 150 Loss 1.2660 Epoch 6 Loss: 1.3341 Time taken for 1 epoch 11.25 sec ________________________________________________________________________________ Epoch 7 Batch 0 Loss 1.3223 Epoch 7 Batch 50 Loss 1.2794 Epoch 7 Batch 100 Loss 1.2886 Epoch 7 Batch 150 Loss 1.3036 Epoch 7 Loss: 1.2888 Time taken for 1 epoch 11.10 sec ________________________________________________________________________________ Epoch 8 Batch 0 Loss 1.2318 Epoch 8 Batch 50 Loss 1.2245 Epoch 8 Batch 100 Loss 1.2677 Epoch 8 Batch 150 Loss 1.2397 Epoch 8 Loss: 1.2480 Time taken for 1 epoch 11.13 sec ________________________________________________________________________________ Epoch 9 Batch 0 Loss 1.2021 Epoch 9 Batch 50 Loss 1.2654 Epoch 9 Batch 100 Loss 1.2190 Epoch 9 Batch 150 Loss 1.1929 Epoch 9 Loss: 1.2083 Time taken for 1 epoch 11.31 sec ________________________________________________________________________________ Epoch 10 Batch 0 Loss 1.1429 Epoch 10 Batch 50 Loss 1.1642 Epoch 10 Batch 100 Loss 1.1455 Epoch 10 Batch 150 Loss 1.1687 Epoch 10 Loss: 1.1684 Time taken for 1 epoch 11.55 sec ________________________________________________________________________________


Introduction

Recurrent neural networks (RNN) are a class of neural networks that is powerful for modeling sequence data such as time series or natural language.

Schematically, a RNN layer uses a for loop to iterate over the timesteps of a sequence, while maintaining an internal state that encodes information about the timesteps it has seen so far.

The Keras RNN API is designed with a focus on:

  • Ease of use: the built-in keras.layers.RNN, keras.layers.LSTM, and keras.layers.GRU layers enable you to quickly build recurrent models without having to make difficult configuration choices. A minimal model built from these layers is sketched below.

  • Ease of customization: You can also define your own RNN cell layer (the inner part of the for loop) with custom behavior, and use it with the generic keras.layers.RNN layer (the for loop itself). This allows you to quickly prototype different research ideas in a flexible way with minimal code.
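As a quick illustration of this API (not part of the original text), the sketch below stacks a GRU on an Embedding layer for a toy sequence classifier. The vocabulary size, embedding width, unit count, and class count are arbitrary placeholder values.

import tensorflow as tf
from tensorflow import keras

# A minimal sequence classifier built from the built-in Keras RNN layers.
# All sizes below are illustrative only, not values used elsewhere in this article.
model = keras.Sequential([
    keras.layers.Embedding(input_dim=1000, output_dim=64),  # token IDs -> dense vectors
    keras.layers.GRU(128),                                  # processes the sequence step by step
    keras.layers.Dense(10),                                 # class logits
])

model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.summary()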

RNN in time series

In this TensorFlow RNN tutorial, you will use an RNN with time series data. A time series depends on previous time steps, which means past values include relevant information that the network can learn from. The idea behind time series prediction is to estimate the future value of a series, say, a stock price, temperature, GDP and so on.

The data preparation for a Keras RNN and time series can be a little bit tricky. First of all, the objective is to predict the next value of the series, meaning you will use the past information to estimate the value at t + 1. The label is simply the input sequence shifted one period ahead. Secondly, the number of inputs is set to 1, i.e., one observation per time step. Lastly, the time step is equal to the length of the input sequence. For instance, if you set the time step to 10, the input sequence will contain ten consecutive values.

Look at the graph below: the time series data is shown on the left and a fictive input sequence on the right. You create a function that returns a dataset with a random value for each day from January 2001 to December 2016.

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def create_ts(start='2001', n=201, freq='M'):
    rng = pd.date_range(start=start, periods=n, freq=freq)
    ts = pd.Series(np.random.uniform(-18, 18, size=len(rng)), rng).cumsum()
    return ts

ts = create_ts(start='2001', n=192, freq='M')
ts.tail(5)

Output

2016-08-31 -93.459631 2016-09-30 -95.264791 2016-10-31 -95.551935 2016-11-30 -105.879611 2016-12-31 -123.729319 Freq: M, dtype: float64

ts = create_ts(start='2001', n=222)

# Left
plt.figure(figsize=(11, 4))
plt.subplot(121)
plt.plot(ts.index, ts)
plt.plot(ts.index[90:100], ts[90:100], "b-", linewidth=3, label="A training instance")
plt.title("A time series (generated)", fontsize=14)

# Right
plt.subplot(122)
plt.title("A training instance", fontsize=14)
plt.plot(ts.index[90:100], ts[90:100], "b-", markersize=8, label="instance")
plt.plot(ts.index[91:101], ts[91:101], "bo", markersize=10, label="target", markerfacecolor='red')
plt.legend(loc="upper left")
plt.xlabel("Time")
plt.show()

The left part of the graph shows the whole series. It starts in 2001 and finishes in 2019. It makes no sense to feed all the data into the network at once; instead, you need to create a batch of data with a length equal to the time step. This batch will be the X variable. The Y variable is the same as X but shifted by one period (i.e., you want to forecast t+1).

Both vectors have the same length. You can see this in the right part of the above graph. The line represents the ten values of the X input, while the red dots are the ten values of the label, Y. Note that the label starts one period ahead of X and finishes one period after.
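To make the shift-by-one labeling concrete, here is a small sketch that is not part of the original tutorial: it slices a NumPy series into input windows X and label windows y shifted one step ahead. The toy series and the window length are arbitrary.

import numpy as np

def make_windows(series, time_step):
    """Slice a 1-D series into (X, y) pairs where y is X shifted one step ahead."""
    X, y = [], []
    for start in range(len(series) - time_step):
        X.append(series[start:start + time_step])          # inputs: t .. t+time_step-1
        y.append(series[start + 1:start + time_step + 1])  # labels: t+1 .. t+time_step
    return np.array(X), np.array(y)

series = np.arange(20, dtype=np.float32)   # toy series standing in for prices, temperature, etc.
X, y = make_windows(series, time_step=10)
print(X.shape, y.shape)   # (10, 10) (10, 10): each label window is its input window shifted by one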

Python3


model.compile("rmsprop", "binary_crossentropy", metrics=["accuracy"])

history = model.fit(train_pad, y_train, epochs=...)  # epoch count truncated in the source
Output:

Training Process of the SimpleRNN model

Extra: RNN Gated Cells Architecture

Recurrent Neural Networks are the hardest to train, since this kind of network is prone to vanishing gradients. To overcome this issue, many researchers have worked to develop better versions of the RNN. One solution to vanishing gradients is to use an activation function that prevents the gradients from shrinking. Another is careful initialization of the weights and biases. Yet another method is to use complex gated cells, such as the LSTM and GRU. These architectures include a separate cell state and a forget gate that allow information to pass through unchanged.

We shall now train LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) Recurrent Neural Network architectures in TensorFlow.

Bidirectional LSTM and GRU

The bidirectional layer is mainly used when you have a feedback network. A Bidirectional LSTM lets a neural network see the sequence of data in both directions, i.e., forwards (past to future) and backwards (future to past). Since an RNN backpropagates through time, a bidirectional LSTM or GRU can use context from both sides of each time step. A sketch of such a model is given below.
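The source cuts off at this point, so the following is a hedged sketch of the idea rather than the article's own code: an LSTM wrapped in keras.layers.Bidirectional, followed by a GRU, for a binary text classifier. The vocabulary size and layer widths are placeholders.

import tensorflow as tf
from tensorflow import keras

# Illustrative only: bidirectional LSTM + GRU stack for binary classification.
model = keras.Sequential([
    keras.layers.Embedding(input_dim=10000, output_dim=128),
    # Reads the sequence forwards and backwards and concatenates both directions.
    keras.layers.Bidirectional(keras.layers.LSTM(64, return_sequences=True)),
    keras.layers.GRU(64),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()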

Process the text

Vectorize the text

Before training, you need to convert the strings to a numerical representation.

The tf.keras.layers.StringLookup layer can convert each character into a numeric ID. It just needs the text to be split into tokens first.


example_texts = ['abcdefg', 'xyz']

chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8')
chars

Now create the tf.keras.layers.StringLookup layer:


ids_from_chars = tf.keras.layers.StringLookup( vocabulary=list(vocab), mask_token=None)

It converts from tokens to character IDs:


ids = ids_from_chars(chars)
ids

Since the goal of this tutorial is to generate text, it will also be important to invert this representation and recover human-readable strings from it. For this you can use tf.keras.layers.StringLookup(..., invert=True).


chars_from_ids = tf.keras.layers.StringLookup( vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

This layer recovers the characters from the vectors of IDs, and returns them as a tf.RaggedTensor of characters:


chars = chars_from_ids(ids)
chars

You can use tf.strings.reduce_join to join the characters back into strings.


tf.strings.reduce_join(chars, axis=-1).numpy()

array([b’abcdefg’, b’xyz’], dtype=object)


def text_from_ids(ids):
    return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)

The prediction task

Given a character, or a sequence of characters, what is the most probable next character? This is the task you’re training the model to perform. The input to the model will be a sequence of characters, and you train the model to predict the output—the following character at each time step.

Since RNNs maintain an internal state that depends on the previously seen elements, given all the characters computed until this moment, what is the next character?
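As a sketch of what this prediction looks like at inference time (not from the original tutorial, and assuming the character model trained earlier returns one logit vector per input character), you can sample the next character ID from the final-step logits:

# Minimal next-character sampling sketch; assumes `model` maps a batch of
# character-ID sequences to logits of shape (batch, time, vocab_size).
seed = tf.constant(["ROMEO:"])
input_ids = ids_from_chars(tf.strings.unicode_split(seed, "UTF-8")).to_tensor()

logits = model(input_ids)                                     # (1, time, vocab_size)
last_logits = logits[:, -1, :]                                # keep only the final time step
next_id = tf.random.categorical(last_logits, num_samples=1)   # sample instead of taking argmax
print(chars_from_ids(tf.squeeze(next_id, axis=-1)).numpy())   # predicted next character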

Create training examples and targets

Next divide the text into example sequences. Each input sequence will contain seq_length characters from the text.

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.

So break the text into chunks of seq_length+1. For example, say seq_length is 4 and our text is "Hello". The input sequence would be "Hell", and the target sequence "ello".

To do this first use the tf.data.Dataset.from_tensor_slices function to convert the text vector into a stream of character indices.


all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
all_ids


ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)


for ids in ids_dataset.take(10):
    print(chars_from_ids(ids).numpy().decode('utf-8'))

F i r s t C i t i


seq_length = 100

The batch method lets you easily convert these individual characters to sequences of the desired size.


sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)

for seq in sequences.take(1):
    print(chars_from_ids(seq))

tf.Tensor( [b’F’ b’i’ b’r’ b’s’ b’t’ b’ ‘ b’C’ b’i’ b’t’ b’i’ b’z’ b’e’ b’n’ b’:’ b’\n’ b’B’ b’e’ b’f’ b’o’ b’r’ b’e’ b’ ‘ b’w’ b’e’ b’ ‘ b’p’ b’r’ b’o’ b’c’ b’e’ b’e’ b’d’ b’ ‘ b’a’ b’n’ b’y’ b’ ‘ b’f’ b’u’ b’r’ b’t’ b’h’ b’e’ b’r’ b’,’ b’ ‘ b’h’ b’e’ b’a’ b’r’ b’ ‘ b’m’ b’e’ b’ ‘ b’s’ b’p’ b’e’ b’a’ b’k’ b’.’ b’\n’ b’\n’ b’A’ b’l’ b’l’ b’:’ b’\n’ b’S’ b’p’ b’e’ b’a’ b’k’ b’,’ b’ ‘ b’s’ b’p’ b’e’ b’a’ b’k’ b’.’ b’\n’ b’\n’ b’F’ b’i’ b’r’ b’s’ b’t’ b’ ‘ b’C’ b’i’ b’t’ b’i’ b’z’ b’e’ b’n’ b’:’ b’\n’ b’Y’ b’o’ b’u’ b’ ‘], shape=(101,), dtype=string)

It’s easier to see what this is doing if you join the tokens back into strings:


for seq in sequences.take(5): print(text_from_ids(seq).numpy())

b’First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou ‘ b’are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k’ b”now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know’t, we know’t.\n\nFirst Citizen:\nLet us ki” b”ll him, and we’ll have corn at our own price.\nIs’t a verdict?\n\nAll:\nNo more talking on’t; let it be d” b’one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi’

For training you’ll need a dataset of (input, label) pairs, where input and label are sequences. At each time step the input is the current character and the label is the next character.

Here’s a function that takes a sequence as input, duplicates, and shifts it to align the input and label for each timestep:


def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text


split_input_target(list("Tensorflow"))

([‘T’, ‘e’, ‘n’, ‘s’, ‘o’, ‘r’, ‘f’, ‘l’, ‘o’], [‘e’, ‘n’, ‘s’, ‘o’, ‘r’, ‘f’, ‘l’, ‘o’, ‘w’])


dataset = sequences.map(split_input_target)


for input_example, target_example in dataset.take(1):
    print("Input :", text_from_ids(input_example).numpy())
    print("Target:", text_from_ids(target_example).numpy())

Input : b’First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou’ Target: b’irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou ‘

Create training batches

You used tf.data to split the text into manageable sequences. But before feeding this data into the model, you need to shuffle the data and pack it into batches.


# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE))

dataset

<_PrefetchDataset element_spec=(TensorSpec(shape=(64, 100), dtype=tf.int64, name=None), TensorSpec(shape=(64, 100), dtype=tf.int64, name=None))>

Train the model


history = model.fit(train_dataset, epochs=10, validation_data=test_dataset, validation_steps=30)

Epoch 1/10 2023-11-16 13:53:32.243442: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1: type_id: TFT_OPTIONAL args { type_id: TFT_PRODUCT args { type_id: TFT_TENSOR args { type_id: TFT_INT32 } } } is neither a subtype nor a supertype of the combined inputs preceding it: type_id: TFT_OPTIONAL args { type_id: TFT_PRODUCT args { type_id: TFT_TENSOR args { type_id: TFT_FLOAT } } } for Tuple type infernce function 0 while inferring type of node ‘cond_36/output/_23’ WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1700142813.152065 83765 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process. 391/391 [==============================] – 43s 88ms/step – loss: 0.6566 – accuracy: 0.5580 – val_loss: 0.5489 – val_accuracy: 0.7505 Epoch 2/10 391/391 [==============================] – 21s 54ms/step – loss: 0.4354 – accuracy: 0.7937 – val_loss: 0.3724 – val_accuracy: 0.8234 Epoch 3/10 391/391 [==============================] – 22s 54ms/step – loss: 0.3451 – accuracy: 0.8468 – val_loss: 0.3403 – val_accuracy: 0.8521 Epoch 4/10 391/391 [==============================] – 21s 52ms/step – loss: 0.3224 – accuracy: 0.8601 – val_loss: 0.3332 – val_accuracy: 0.8573 Epoch 5/10 391/391 [==============================] – 21s 52ms/step – loss: 0.3168 – accuracy: 0.8623 – val_loss: 0.3291 – val_accuracy: 0.8620 Epoch 6/10 391/391 [==============================] – 21s 52ms/step – loss: 0.3088 – accuracy: 0.8658 – val_loss: 0.3370 – val_accuracy: 0.8615 Epoch 7/10 391/391 [==============================] – 22s 52ms/step – loss: 0.3060 – accuracy: 0.8692 – val_loss: 0.3271 – val_accuracy: 0.8448 Epoch 8/10 391/391 [==============================] – 21s 52ms/step – loss: 0.3033 – accuracy: 0.8714 – val_loss: 0.3249 – val_accuracy: 0.8583 Epoch 9/10 391/391 [==============================] – 21s 51ms/step – loss: 0.3017 – accuracy: 0.8695 – val_loss: 0.3293 – val_accuracy: 0.8385 Epoch 10/10 391/391 [==============================] – 21s 52ms/step – loss: 0.2995 – accuracy: 0.8717 – val_loss: 0.3217 – val_accuracy: 0.8630


test_loss, test_acc = model.evaluate(test_dataset)

print('Test Loss:', test_loss)
print('Test Accuracy:', test_acc)

391/391 [==============================] – 9s 23ms/step – loss: 0.3167 – accuracy: 0.8624 Test Loss: 0.3167201280593872 Test Accuracy: 0.8623600006103516


plt.figure(figsize=(16, 8))
plt.subplot(1, 2, 1)
plot_graphs(history, 'accuracy')
plt.ylim(None, 1)
plt.subplot(1, 2, 2)
plot_graphs(history, 'loss')
plt.ylim(0, None)

(0.0, 0.6744414046406746)

Run a prediction on a new sentence:

If the prediction is >= 0.0, the sentiment is positive; otherwise it is negative.


sample_text = ('The movie was cool. The animation and the graphics '
               'were out of this world. I would recommend this movie.')
predictions = model.predict(np.array([sample_text]))

1/1 [==============================] – 2s 2s/step
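To turn that raw score into a label, here is a small hedged follow-up (assuming, as above, that the model returns a single unnormalized score per example in the predictions array):

# A score >= 0.0 is treated as positive sentiment, anything below as negative.
score = float(predictions[0][0])
label = "positive" if score >= 0.0 else "negative"
print(f"score={score:.3f} -> {label}")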

Introduction to Recurrent Neural Networks with Keras and TensorFlow

It’s a standard Monday morning for you. You are sitting at your workstation, waiting for another computer vision problem statement. By now, you have become an absolute maestro at computer vision (CV) problems.

But to your horror, your company gives you a sequential text classification problem instead of CV. They say they are expanding and for that reason, in front of you lies an absolutely alien domain of which you know nothing.

Thankfully, we would never want that to happen to you. So, finally, due to popular demand, we bring you our first tutorial on dealing with sequential text data: Recurrent Neural Networks (RNNs).

The world of deep learning has progressed immensely, with Transformer models ruling both NLP and CV domains. But to understand Transformers, it is important to grasp the intuition behind RNNs, your gateway to working with sequential data.

Oh, we are really excited about this one! This tutorial marks our first venture into deep learning with sequential data. We have been a vision-first firm for a long time now, and it is about time we learn and process the information provided by the language world.

In this tutorial, we talk about sequential data and how to model it. We build a Recurrent Neural Network and train it on a well-defined application of the real world.

This lesson is the first in a 3-part series on NLP 102:

  1. Introduction to Recurrent Neural Networks with Keras and TensorFlow (today’s tutorial)
  2. Long Short-Term Memory Networks
  3. Neural Machine Translation

To learn how to build a Recurrent Neural Network with TensorFlow and Keras, just keep reading.

What is a Recurrent Neural Network (RNN)?

A Recurrent Neural Network (RNN) is a class of artificial neural networks in which the connections between nodes form a directed graph along a temporal sequence, giving the network temporally dynamic behavior. Derived from feedforward networks, it helps to model sequential data and works somewhat like the human brain to deliver predictive results.

A recurrent neural network looks quite similar to a traditional neural network except that a memory state is added to the neurons. The computation needed to include this memory is simple.

Imagine a simple model with only one neuron fed by a batch of data. In a traditional neural net, the model produces the output by multiplying the input with the weight and applying the activation function. With an RNN, this output is sent back to itself a number of times. The timestep is the number of times the output becomes the input of the next matrix multiplication.

For instance, in the picture below, you can see the network is composed of one neuron. The network computes the matrix multiplication between the input and the weight and adds non-linearity with the activation function. This becomes the output at t-1, which is then the input of the second matrix multiplication.

Below, we code a simple RNN in TensorFlow to understand the steps and the shape of the output.

The network is composed of:

  • Four inputs
  • Six neurons
  • Two time steps

The network will proceed as depicted in the picture below.

The network is called ‘recurrent’ because it performs the same operation in each activation square. The network computes a weighted combination of the inputs and the previous output before applying an activation function.

import numpy as np
import tensorflow as tf

n_inputs = 4
n_neurons = 6
n_timesteps = 2

The data is a sequence of numbers from 0 to 9, divided into three batches of data.

## Data
X_batch = np.array([
    [[0, 1, 2, 5], [9, 8, 7, 4]],  # Batch 1
    [[3, 4, 5, 2], [0, 0, 0, 0]],  # Batch 2
    [[6, 7, 8, 5], [6, 5, 4, 2]],  # Batch 3
])

We can build the network with a placeholder for the data, the recurrent stage and the output.

  1. Define the placeholder for the data

X = tf.placeholder(tf.float32, [None, n_timesteps, n_inputs])

Here:

  • None: Unknown; it will take the size of the batch
  • n_timesteps: Number of times the network will send the output back to the neuron
  • n_inputs: Number of inputs per batch

  2. Define the recurrent network

As mentioned in the picture above, the network is composed of 6 neurons. The network will compute two dot products:

  • The input data with the first set of weights (i.e., 6, equal to the number of neurons)
  • The previous output with a second set of weights (i.e., 6, corresponding to the number of outputs)

Note that, during the first feedforward pass, the values of the previous output are equal to zero because no previous value is available yet.

The object used to build an RNN is tf.contrib.rnn.BasicRNNCell, with the argument num_units defining the number of units (neurons).

basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)

Now that the network is defined, you can compute the outputs and states

outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)

This object uses an internal loop to multiply the matrices the appropriate number of times.

Note that the recurrent neuron is a function of all the inputs of the previous time steps. This is how the network builds its own memory. Information from previous time steps can propagate into future time steps. This is the magic of recurrent neural networks.

## Define the shape of the tensor
X = tf.placeholder(tf.float32, [None, n_timesteps, n_inputs])

## Define the network
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    outputs_val = outputs.eval(feed_dict={X: X_batch})
    print(states.eval(feed_dict={X: X_batch}))

[[ 0.38941205 -0.9980438   0.99750966  0.7892596   0.9978241   0.9999997 ]
 [ 0.61096436  0.7255889   0.82977575 -0.88226104  0.29261455 -0.15597084]
 [ 0.62091285 -0.87023467  0.99729395 -0.58261937  0.9811445   0.99969864]]

For explanatory purposes, you print the values of the previous state. The output printed above shows the output from the last state. Now print all the outputs; you can notice that the states are the previous output of each batch. That is, the previous output contains the information about the entire sequence.

print(outputs_val)
print(outputs_val.shape)

[[[-0.75934666 -0.99537754  0.9735819  -0.9722234  -0.14234993 -0.9984044 ]
  [ 0.99975264 -0.9983206   0.9999993  -1.         -0.9997506  -1.        ]]
 [[ 0.97486496 -0.98773265  0.9969686  -0.99950117 -0.7092863  -0.99998885]
  [ 0.9326837   0.2673438   0.2808514  -0.7535883  -0.43337247  0.5700631 ]]
 [[ 0.99628735 -0.9998728   0.99999213 -0.99999976 -0.9884324  -1.        ]
  [ 0.99962527 -0.9467421   0.9997403  -0.99999714 -0.99929446 -0.9999795 ]]]
(3, 2, 6)

The output has the shape of (3, 2, 6):

  • 3: Number of batches
  • 2: Number of timesteps
  • 6: Number of neurons

The optimization of a recurrent neural network is identical to a traditional neural network. You will see in more detail how to code optimization in the next part of this Recurrent Neural Network tutorial.


Summary

In this tutorial, we have covered the basics of sequential data and how to model them. We at PyImageSearch always want our readers to have a strong foundation. To cover some exciting blog posts on Attention and Transformers (coming soon, really soon), this was the first big step to take.

You are now familiar with a modest amount of text processing utilities and the recurrence formula and can build a Recurrent Neural Network for modeling sequential data of any sort.

In the next tutorial in this series, we will study the shortcomings of RNNs and understand how to bypass them using Long Short-Term Memory.


Citation Information

A. R. Gosthipaty, D. Chakraborty, and R. Raha. “Introduction to Recurrent Neural Networks with Keras and TensorFlow,” PyImageSearch, P. Chugh, S. Huot, K. Kidriavsteva, and A. Thanki, eds., 2022, https://pyimg.co/a3dwm

@incollection{GCR_2022_RNN,
  author = {Aritra Roy Gosthipaty and Devjyoti Chakraborty and Ritwik Raha},
  title = {Introduction to Recurrent Neural Networks with Keras and TensorFlow},
  booktitle = {PyImageSearch},
  editor = {Puneet Chugh and Susan Huot and Kseniia Kidriavsteva and Abhishek Thanki},
  year = {2022},
  note = {https://pyimg.co/a3dwm},
}


This tutorial is an introduction to time series forecasting using TensorFlow. It builds a few different styles of models including Convolutional and Recurrent Neural Networks (CNNs and RNNs).

This is covered in two main parts, with subsections:

  • Forecast for a single time step:

    • A single feature.
    • All features.
  • Forecast multiple steps:

    • Single-shot: Make the predictions all at once.
    • Autoregressive: Make one prediction at a time and feed the output back to the model.

Feed-Forward Neural Networks vs Recurrent Neural Networks

A feed-forward neural network allows information to flow only within the forward direction, from the input nodes, through the hidden layers, and to the output nodes. There aren’t any cycles or loops within the network.

Below is a simplified representation of what a feed-forward neural network looks like:

In a feed-forward neural network, the decisions are based only on the current input. It doesn’t memorize past data, and there is no notion of future context. Feed-forward neural networks are used in general regression and classification problems.
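For contrast with the recurrent models in this tutorial, here is a minimal feed-forward sketch (not from the original article); the input width and layer sizes are arbitrary.

import tensorflow as tf
from tensorflow import keras

# A plain feed-forward network: information flows input -> hidden -> output,
# with no loops and no memory of previous inputs.
feed_forward = keras.Sequential([
    keras.layers.Input(shape=(10,)),           # 10 independent input features
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),                     # regression output
])
feed_forward.compile(optimizer="adam", loss="mse")
feed_forward.summary()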

Next steps

This tutorial was a quick introduction to time series forecasting using TensorFlow.

To learn more, refer to:

  • Chapter 15 of Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition.
  • Chapter 6 of Deep Learning with Python.
  • Lesson 8 of Udacity’s intro to TensorFlow for deep learning, including the exercise notebooks.

Also, remember that you can implement any classical time series model in TensorFlow—this tutorial just focuses on TensorFlow’s built-in functionality.

RNN (Recurrent Neural Network) Tutorial: TensorFlow Example

How Recurrent Neural Networks Work

In RNN, the information cycles through the loop, so the output is determined by the current input and previously received inputs.

The input layer X processes the initial input and passes it to the middle layer A. The middle layer consists of multiple hidden layers, each with its activation functions, weights, and biases. These parameters are standardized across the hidden layer so that instead of creating multiple hidden layers, it will create one and loop it over.

Instead of using traditional backpropagation, recurrent neural networks use the backpropagation through time (BPTT) algorithm to determine the gradients. In backpropagation, the model adjusts its parameters by propagating errors from the output layer back to the input layer. BPTT sums the error contributions at each time step, because the RNN shares its parameters across time steps. Learn more on RNNs and how they work at What are Recurrent Neural Networks?. A small gradient-tape sketch of this idea follows.
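The sketch below is a hedged illustration (not from the original article) of backpropagation through time: a SimpleRNN is run over a short random sequence inside a tf.GradientTape, and the single gradient computed for each shared weight aggregates the error contributions from every time step. Shapes and data are arbitrary.

import tensorflow as tf
from tensorflow import keras

# Toy data: a batch of 8 sequences, 5 time steps, 3 features each (values are random).
x = tf.random.normal((8, 5, 3))
y = tf.random.normal((8, 1))

rnn = keras.Sequential([
    keras.layers.SimpleRNN(4),   # the same kernel and recurrent kernel are reused at every time step
    keras.layers.Dense(1),
])

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(rnn(x) - y))

# One gradient per shared parameter: per-time-step contributions are summed (BPTT).
grads = tape.gradient(loss, rnn.trainable_variables)
for var, grad in zip(rnn.trainable_variables, grads):
    print(var.name, grad.shape)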

Python3


model = keras.models.Sequential()
model.add(keras.layers.Embedding(10000, 128))
model.add(keras.layers.SimpleRNN(64, return_sequences=True))
model.add(keras.layers.SimpleRNN(64))
model.add(keras.layers.Dense(128, activation="relu"))
model.add(keras.layers.Dropout(0.4))
# Unit count missing in the source; 1 assumed for a binary sigmoid output.
model.add(keras.layers.Dense(1, activation="sigmoid"))
model.summary()

Output:

Summary of the architecture of the model

Model Training

In TensorFlow, after developing a model, it needs to be compiled using three important parameters: the optimizer, the loss function, and the evaluation metrics. An illustrative call is shown below.
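As a hedged sketch (assuming the binary sentiment model built above), those three parameters might look like this:

# Optimizer, loss function, and evaluation metric passed to compile().
model.compile(
    optimizer="rmsprop",          # how the weights are updated
    loss="binary_crossentropy",   # objective for a binary (sigmoid) output
    metrics=["accuracy"],         # reported during training and evaluation
)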

Single step models

The simplest model you can build on this sort of data is one that predicts a single feature’s value—1 time step (one hour) into the future based only on the current conditions.

So, start by building models to predict the T (degC) value one hour into the future.

Configure a WindowGenerator object to produce these single-step (input, label) pairs:


single_step_window = WindowGenerator( input_width=1, label_width=1, shift=1, label_columns=['T (degC)']) single_step_window

Total window size: 2 Input indices: [0] Label indices: [1] Label column name(s): [‘T (degC)’]

The window object creates tf.data.Datasets from the training, validation, and test sets, allowing you to easily iterate over batches of data.


for example_inputs, example_labels in single_step_window.train.take(1):
    print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
    print(f'Labels shape (batch, time, features): {example_labels.shape}')

Inputs shape (batch, time, features): (32, 1, 19) Labels shape (batch, time, features): (32, 1, 1)

Baseline

Before building a trainable model it would be good to have a performance baseline as a point for comparison with the later more complicated models.

This first task is to predict temperature one hour into the future, given the current value of all features. The current values include the current temperature.

So, start with a model that just returns the current temperature as the prediction, predicting “No change”. This is a reasonable baseline since temperature changes slowly. Of course, this baseline will work less well if you make a prediction further in the future.


class Baseline(tf.keras.Model):
    def __init__(self, label_index=None):
        super().__init__()
        self.label_index = label_index

    def call(self, inputs):
        if self.label_index is None:
            return inputs
        result = inputs[:, :, self.label_index]
        return result[:, :, tf.newaxis]

Instantiate and evaluate this model:


baseline = Baseline(label_index=column_indices['T (degC)'])

baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
                 metrics=[tf.keras.metrics.MeanAbsoluteError()])

val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(single_step_window.val)
performance['Baseline'] = baseline.evaluate(single_step_window.test, verbose=0)

439/439 [==============================] – 1s 2ms/step – loss: 0.0128 – mean_absolute_error: 0.0785

That printed some performance metrics, but those don’t give you a feeling for how well the model is doing.

The WindowGenerator has a plot method, but the plots won't be very interesting with only a single sample.

So, create a wider WindowGenerator that generates windows of 24 hours of consecutive inputs and labels at a time. The new wide_window variable doesn't change the way the model operates. The model still makes predictions one hour into the future based on a single input time step. Here, the time axis acts like the batch axis: each prediction is made independently with no interaction between time steps:


wide_window = WindowGenerator( input_width=24, label_width=24, shift=1, label_columns=['T (degC)']) wide_window

Total window size: 25 Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23] Label indices: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] Label column name(s): [‘T (degC)’]

This expanded window can be passed directly to the same baseline model without any code changes. This is possible because the inputs and labels have the same number of time steps, and the baseline just forwards the input to the output:


print('Input shape:', wide_window.example[0].shape) print('Output shape:', baseline(wide_window.example[0]).shape)

Input shape: (32, 24, 19) Output shape: (32, 24, 1)

By plotting the baseline model’s predictions, notice that it is simply the labels shifted right by one hour:


wide_window.plot(baseline)

In the above plots of three examples the single step model is run over the course of 24 hours. This deserves some explanation:

  • The blue Inputs line shows the input temperature at each time step. The model receives all features; this plot only shows the temperature.
  • The green Labels dots show the target prediction value. These dots are shown at the prediction time, not the input time. That is why the range of labels is shifted 1 step relative to the inputs.
  • The orange Predictions crosses are the model's predictions for each output time step. If the model were predicting perfectly, the predictions would land directly on the Labels.

Linear model

The simplest trainable model you can apply to this task is to insert a linear transformation between the input and output. In this case the output from a time step only depends on that step:

A tf.keras.layers.Dense layer with no activation set is a linear model. The layer only transforms the last axis of the data from (batch, time, inputs) to (batch, time, units); it is applied independently to every item across the batch and time axes.


linear = tf.keras.Sequential([ tf.keras.layers.Dense(units=1) ])


print('Input shape:', single_step_window.example[0].shape) print('Output shape:', linear(single_step_window.example[0]).shape)

Input shape: (32, 1, 19) Output shape: (32, 1, 1)

This tutorial trains many models, so package the training procedure into a function:


MAX_EPOCHS = 20

def compile_and_fit(model, window, patience=2):
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                      patience=patience,
                                                      mode='min')

    model.compile(loss=tf.keras.losses.MeanSquaredError(),
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=[tf.keras.metrics.MeanAbsoluteError()])

    history = model.fit(window.train, epochs=MAX_EPOCHS,
                        validation_data=window.val,
                        callbacks=[early_stopping])
    return history

Train the model and evaluate its performance:


history = compile_and_fit(linear, single_step_window)

val_performance['Linear'] = linear.evaluate(single_step_window.val)
performance['Linear'] = linear.evaluate(single_step_window.test, verbose=0)

Epoch 1/20 21/1534 […………………………] – ETA: 3s – loss: 2.6312 – mean_absolute_error: 1.3650 WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1698384492.579925 449804 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process. 1534/1534 [==============================] – 5s 3ms/step – loss: 0.2011 – mean_absolute_error: 0.2786 – val_loss: 0.0245 – val_mean_absolute_error: 0.1169 Epoch 2/20 1534/1534 [==============================] – 4s 3ms/step – loss: 0.0140 – mean_absolute_error: 0.0876 – val_loss: 0.0099 – val_mean_absolute_error: 0.0739 Epoch 3/20 1534/1534 [==============================] – 4s 3ms/step – loss: 0.0097 – mean_absolute_error: 0.0726 – val_loss: 0.0091 – val_mean_absolute_error: 0.0713 Epoch 4/20 1534/1534 [==============================] – 4s 3ms/step – loss: 0.0093 – mean_absolute_error: 0.0707 – val_loss: 0.0087 – val_mean_absolute_error: 0.0688 Epoch 5/20 1534/1534 [==============================] – 4s 3ms/step – loss: 0.0092 – mean_absolute_error: 0.0702 – val_loss: 0.0090 – val_mean_absolute_error: 0.0706 Epoch 6/20 1534/1534 [==============================] – 4s 3ms/step – loss: 0.0091 – mean_absolute_error: 0.0700 – val_loss: 0.0088 – val_mean_absolute_error: 0.0697 439/439 [==============================] – 1s 2ms/step – loss: 0.0088 – mean_absolute_error: 0.0697

Like the baseline model, the linear model can be called on batches of wide windows. Used this way the model makes a set of independent predictions on consecutive time steps. The time axis acts like another batch axis. There are no interactions between the predictions at each time step.


print('Input shape:', wide_window.example[0].shape) print('Output shape:', linear(wide_window.example[0]).shape)

Input shape: (32, 24, 19) Output shape: (32, 24, 1)

Here is the plot of its example predictions on the wide_window; note how in many cases the prediction is clearly better than just returning the input temperature, but in a few cases it's worse:


wide_window.plot(linear)

One advantage to linear models is that they’re relatively simple to interpret. You can pull out the layer’s weights and visualize the weight assigned to each input:


plt.bar(x=range(len(train_df.columns)),
        height=linear.layers[0].kernel[:, 0].numpy())
axis = plt.gca()
axis.set_xticks(range(len(train_df.columns)))
_ = axis.set_xticklabels(train_df.columns, rotation=90)

Sometimes the model doesn't even place the most weight on the input T (degC). This is one of the risks of random initialization.

Dense

Before applying models that actually operate on multiple time steps, it's worth checking the performance of deeper, more powerful, single-input-step models.

Here's a model similar to the linear model, except it stacks several Dense layers between the input and the output:


dense = tf.keras.Sequential([
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=1)
])

history = compile_and_fit(dense, single_step_window)

val_performance['Dense'] = dense.evaluate(single_step_window.val)
performance['Dense'] = dense.evaluate(single_step_window.test, verbose=0)

Epoch 1/20 1534/1534 [==============================] – 8s 4ms/step – loss: 0.0188 – mean_absolute_error: 0.0830 – val_loss: 0.0081 – val_mean_absolute_error: 0.0662 Epoch 2/20 1534/1534 [==============================] – 6s 4ms/step – loss: 0.0079 – mean_absolute_error: 0.0647 – val_loss: 0.0073 – val_mean_absolute_error: 0.0623 Epoch 3/20 1534/1534 [==============================] – 6s 4ms/step – loss: 0.0075 – mean_absolute_error: 0.0627 – val_loss: 0.0073 – val_mean_absolute_error: 0.0616 Epoch 4/20 1534/1534 [==============================] – 6s 4ms/step – loss: 0.0073 – mean_absolute_error: 0.0611 – val_loss: 0.0071 – val_mean_absolute_error: 0.0618 Epoch 5/20 1534/1534 [==============================] – 6s 4ms/step – loss: 0.0071 – mean_absolute_error: 0.0603 – val_loss: 0.0072 – val_mean_absolute_error: 0.0613 Epoch 6/20 1534/1534 [==============================] – 6s 4ms/step – loss: 0.0069 – mean_absolute_error: 0.0593 – val_loss: 0.0066 – val_mean_absolute_error: 0.0571 Epoch 7/20 1534/1534 [==============================] – 6s 4ms/step – loss: 0.0068 – mean_absolute_error: 0.0589 – val_loss: 0.0066 – val_mean_absolute_error: 0.0585 Epoch 8/20 1534/1534 [==============================] – 6s 4ms/step – loss: 0.0068 – mean_absolute_error: 0.0584 – val_loss: 0.0068 – val_mean_absolute_error: 0.0575 439/439 [==============================] – 1s 2ms/step – loss: 0.0068 – mean_absolute_error: 0.0575

Multi-step dense

A single-time-step model has no context for the current values of its inputs. It can’t see how the input features are changing over time. To address this issue the model needs access to multiple time steps when making predictions:

The baseline, linear and dense models handled each time step independently. Here the model will take multiple time steps as input to produce a single output.

Create a WindowGenerator that will produce batches of three-hour inputs and one-hour labels:

Note that the Window's shift parameter is relative to the end of the two windows.


CONV_WIDTH = 3 conv_window = WindowGenerator( input_width=CONV_WIDTH, label_width=1, shift=1, label_columns=['T (degC)']) conv_window

Total window size: 4 Input indices: [0 1 2] Label indices: [3] Label column name(s): [‘T (degC)’]


conv_window.plot() plt.title("Given 3 hours of inputs, predict 1 hour into the future.")

Text(0.5, 1.0, ‘Given 3 hours of inputs, predict 1 hour into the future.’)

You could train a dense model on a multiple-input-step window by adding a tf.keras.layers.Flatten as the first layer of the model:


multi_step_dense = tf.keras.Sequential([
    # Shape: (time, features) => (time*features)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
    # Add back the time dimension.
    # Shape: (outputs) => (1, outputs)
    tf.keras.layers.Reshape([1, -1]),
])


print('Input shape:', conv_window.example[0].shape) print('Output shape:', multi_step_dense(conv_window.example[0]).shape)

Input shape: (32, 3, 19) Output shape: (32, 1, 1)


history = compile_and_fit(multi_step_dense, conv_window)

IPython.display.clear_output()
val_performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.val)
performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.test, verbose=0)

438/438 [==============================] – 1s 2ms/step – loss: 0.0066 – mean_absolute_error: 0.0568


conv_window.plot(multi_step_dense)

The main down-side of this approach is that the resulting model can only be executed on input windows of exactly this shape.


print('Input shape:', wide_window.example[0].shape)
try:
    print('Output shape:', multi_step_dense(wide_window.example[0]).shape)
except Exception as e:
    print(f'\n{type(e).__name__}:{e}')

Input shape: (32, 24, 19) ValueError:Exception encountered when calling layer ‘sequential_2’ (type Sequential). Input 0 of layer “dense_4” is incompatible with the layer: expected axis -1 of input shape to have value 57, but received input with shape (32, 456) Call arguments received by layer ‘sequential_2’ (type Sequential): • inputs=tf.Tensor(shape=(32, 24, 19), dtype=float32) • training=None • mask=None

The convolutional models in the next section fix this problem.

Convolution neural network

A convolution layer (tf.keras.layers.Conv1D) also takes multiple time steps as input to each prediction.

Below is the same model as multi_step_dense, re-written with a convolution.

Note the changes:

  • The tf.keras.layers.Flatten and the first tf.keras.layers.Dense are replaced by a tf.keras.layers.Conv1D.
  • The tf.keras.layers.Reshape is no longer necessary since the convolution keeps the time axis in its output.


conv_model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=32,
                           kernel_size=(CONV_WIDTH,),
                           activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
])

Run it on an example batch to check that the model produces outputs with the expected shape:


print("Conv model on `conv_window`") print('Input shape:', conv_window.example[0].shape) print('Output shape:', conv_model(conv_window.example[0]).shape)

Conv model on `conv_window` Input shape: (32, 3, 19) Output shape: (32, 1, 1)

Train and evaluate it on the conv_window and it should give performance similar to the multi_step_dense model.


history = compile_and_fit(conv_model, conv_window)

IPython.display.clear_output()
val_performance['Conv'] = conv_model.evaluate(conv_window.val)
performance['Conv'] = conv_model.evaluate(conv_window.test, verbose=0)

438/438 [==============================] – 1s 2ms/step – loss: 0.0077 – mean_absolute_error: 0.0638

The difference between this conv_model and the multi_step_dense model is that the conv_model can be run on inputs of any length. The convolutional layer is applied to a sliding window of inputs:

If you run it on wider input, it produces wider output:


print("Wide window") print('Input shape:', wide_window.example[0].shape) print('Labels shape:', wide_window.example[1].shape) print('Output shape:', conv_model(wide_window.example[0]).shape)

Wide window Input shape: (32, 24, 19) Labels shape: (32, 24, 1) Output shape: (32, 22, 1)

Note that the output is shorter than the input. To make training or plotting work, you need the labels and the predictions to have the same length. So build a WindowGenerator to produce wide windows with a few extra input time steps so the label and prediction lengths match:


LABEL_WIDTH = 24
INPUT_WIDTH = LABEL_WIDTH + (CONV_WIDTH - 1)

wide_conv_window = WindowGenerator(
    input_width=INPUT_WIDTH,
    label_width=LABEL_WIDTH,
    shift=1,
    label_columns=['T (degC)'])

wide_conv_window

Total window size: 27 Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25] Label indices: [ 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26] Label column name(s): [‘T (degC)’]


print("Wide conv window") print('Input shape:', wide_conv_window.example[0].shape) print('Labels shape:', wide_conv_window.example[1].shape) print('Output shape:', conv_model(wide_conv_window.example[0]).shape)

Wide conv window Input shape: (32, 26, 19) Labels shape: (32, 24, 1) Output shape: (32, 24, 1)

Now, you can plot the model’s predictions on a wider window. Note the 3 input time steps before the first prediction. Every prediction here is based on the 3 preceding time steps:


wide_conv_window.plot(conv_model)

Recurrent neural network

A Recurrent Neural Network (RNN) is a type of neural network well-suited to time series data. RNNs process a time series step-by-step, maintaining an internal state from time-step to time-step.

You can learn more in the Text generation with an RNN tutorial and the Recurrent Neural Networks (RNN) with Keras guide.

In this tutorial, you will use an RNN layer called Long Short-Term Memory (tf.keras.layers.LSTM).

An important constructor argument for all Keras RNN layers, such as tf.keras.layers.LSTM, is the return_sequences argument. This setting can configure the layer in one of two ways:

  1. If False, the default, the layer only returns the output of the final time step, giving the model time to warm up its internal state before making a single prediction.
  2. If True, the layer returns an output for each input. This is useful for:

    • Stacking RNN layers.
    • Training a model on multiple time steps simultaneously.


lstm_model = tf.keras.models.Sequential([
    # Shape [batch, time, features] => [batch, time, lstm_units]
    tf.keras.layers.LSTM(32, return_sequences=True),
    # Shape => [batch, time, features]
    tf.keras.layers.Dense(units=1)
])

With return_sequences=True, the model can be trained on 24 hours of data at a time.


print('Input shape:', wide_window.example[0].shape) print('Output shape:', lstm_model(wide_window.example[0]).shape)

Input shape: (32, 24, 19) Output shape: (32, 24, 1)


history = compile_and_fit(lstm_model, wide_window)

IPython.display.clear_output()
val_performance['LSTM'] = lstm_model.evaluate(wide_window.val)
performance['LSTM'] = lstm_model.evaluate(wide_window.test, verbose=0)

438/438 [==============================] – 1s 3ms/step – loss: 0.0056 – mean_absolute_error: 0.0517


wide_window.plot(lstm_model)

Performance

With this dataset typically each of the models does slightly better than the one before it:


x = np.arange(len(performance))
width = 0.3

metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in val_performance.values()]
test_mae = [v[metric_index] for v in performance.values()]

plt.ylabel('mean_absolute_error [T (degC), normalized]')
plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=performance.keys(), rotation=45)
_ = plt.legend()


for name, value in performance.items():
    print(f'{name:12s}: {value[1]:0.4f}')

Baseline : 0.0852 Linear : 0.0686 Dense : 0.0595 Multi step dense: 0.0589 Conv : 0.0661 LSTM : 0.0521

Multi-output models

The models so far all predicted a single output feature, T (degC), for a single time step.

All of these models can be converted to predict multiple features just by changing the number of units in the output layer and adjusting the training windows to include all features in the labels (example_labels):


single_step_window = WindowGenerator(
    # `WindowGenerator` returns all features as labels if you
    # don't set the `label_columns` argument.
    input_width=1, label_width=1, shift=1)

wide_window = WindowGenerator(
    input_width=24, label_width=24, shift=1)

for example_inputs, example_labels in wide_window.train.take(1):
    print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
    print(f'Labels shape (batch, time, features): {example_labels.shape}')

Inputs shape (batch, time, features): (32, 24, 19) Labels shape (batch, time, features): (32, 24, 19)

Note above that the features axis of the labels now has the same depth as the inputs, instead of 1.

Baseline

The same baseline model (Baseline) can be used here, but this time repeating all features instead of selecting a specific label_index:


baseline = Baseline()

baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
                 metrics=[tf.keras.metrics.MeanAbsoluteError()])


val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(wide_window.val)
performance['Baseline'] = baseline.evaluate(wide_window.test, verbose=0)

438/438 [==============================] – 1s 2ms/step – loss: 0.0886 – mean_absolute_error: 0.1589

Dense


dense = tf.keras.Sequential([
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=num_features)
])


history = compile_and_fit(dense, single_step_window)

IPython.display.clear_output()
val_performance['Dense'] = dense.evaluate(single_step_window.val)
performance['Dense'] = dense.evaluate(single_step_window.test, verbose=0)

439/439 [==============================] – 1s 3ms/step – loss: 0.0693 – mean_absolute_error: 0.1321

RNN


%%time
wide_window = WindowGenerator(
    input_width=24, label_width=24, shift=1)

lstm_model = tf.keras.models.Sequential([
    # Shape [batch, time, features] => [batch, time, lstm_units]
    tf.keras.layers.LSTM(32, return_sequences=True),
    # Shape => [batch, time, features]
    tf.keras.layers.Dense(units=num_features)
])

history = compile_and_fit(lstm_model, wide_window)

IPython.display.clear_output()
val_performance['LSTM'] = lstm_model.evaluate(wide_window.val)
performance['LSTM'] = lstm_model.evaluate(wide_window.test, verbose=0)

print()

438/438 [==============================] – 1s 3ms/step – loss: 0.0614 – mean_absolute_error: 0.1193 CPU times: user 5min 54s, sys: 1min 13s, total: 7min 7s Wall time: 2min 38s

Advanced: Residual connections

The Baseline model from earlier took advantage of the fact that the sequence doesn't change drastically from time step to time step. Every model trained in this tutorial so far was randomly initialized, and then had to learn that the output is a small change from the previous time step.

While you can get around this issue with careful initialization, it’s simpler to build this into the model structure.

It’s common in time series analysis to build models that instead of predicting the next value, predict how the value will change in the next time step. Similarly, residual networks—or ResNets—in deep learning refer to architectures where each layer adds to the model’s accumulating result.

That is how you take advantage of the knowledge that the change should be small.

Essentially, this initializes the model to match the Baseline. For this task it helps models converge faster, with slightly better performance.

This approach can be used in conjunction with any model discussed in this tutorial.

Here, it is being applied to the LSTM model; note the use of tf.initializers.zeros to ensure that the initial predicted changes are small, and don't overpower the residual connection. There are no symmetry-breaking concerns for the gradients here, since the zeros are only used on the last layer.


class ResidualWrapper(tf.keras.Model):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def call(self, inputs, *args, **kwargs):
        delta = self.model(inputs, *args, **kwargs)

        # The prediction for each time step is the input
        # from the previous time step plus the delta
        # calculated by the model.
        return inputs + delta


%%time
residual_lstm = ResidualWrapper(
    tf.keras.Sequential([
        tf.keras.layers.LSTM(32, return_sequences=True),
        tf.keras.layers.Dense(
            num_features,
            # The predicted deltas should start small.
            # Therefore, initialize the output layer with zeros.
            kernel_initializer=tf.initializers.zeros())
    ]))

history = compile_and_fit(residual_lstm, wide_window)

IPython.display.clear_output()
val_performance['Residual LSTM'] = residual_lstm.evaluate(wide_window.val)
performance['Residual LSTM'] = residual_lstm.evaluate(wide_window.test, verbose=0)

print()

438/438 [==============================] – 1s 3ms/step – loss: 0.0617 – mean_absolute_error: 0.1178 CPU times: user 2min 8s, sys: 26.9 s, total: 2min 35s Wall time: 58.8 s

Performance

Here is the overall performance for these multi-output models.


x = np.arange(len(performance))
width = 0.3

metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in val_performance.values()]
test_mae = [v[metric_index] for v in performance.values()]

plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=performance.keys(), rotation=45)
plt.ylabel('MAE (average over all outputs)')
_ = plt.legend()


for name, value in performance.items():
    print(f'{name:15s}: {value[1]:0.4f}')

Baseline : 0.1638 Dense : 0.1333 LSTM : 0.1206 Residual LSTM : 0.1193

The above performances are averaged across all model outputs.

Performance optimization and CuDNN kernels

In TensorFlow 2.0, the built-in LSTM and GRU layers have been updated to leverage CuDNN kernels by default when a GPU is available. With this change, the prior keras.layers.CuDNNLSTM/CuDNNGRU layers have been deprecated, and you can build your model without worrying about the hardware it will run on.

Since the CuDNN kernel is built with certain assumptions, this means the layer will not be able to use the CuDNN kernel if you change the defaults of the built-in LSTM or GRU layers. E.g.:

  • Changing the activation function from tanh to something else.
  • Changing the recurrent_activation function from sigmoid to something else.
  • Using recurrent_dropout > 0.
  • Setting unroll to True, which forces LSTM/GRU to decompose the inner tf.while_loop into an unrolled for loop.
  • Setting use_bias to False.
  • Using masking when the input data is not strictly right padded (if the mask corresponds to strictly right padded data, CuDNN can still be used. This is the most common case).

For the detailed list of constraints, please see the documentation for the LSTM and GRU layers.
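As a hedged illustration of this fallback behavior (not from the original guide), the first layer below keeps the defaults and is eligible for the CuDNN kernel on a GPU, while the second changes recurrent_dropout and therefore runs on the generic TensorFlow kernel; the shapes are arbitrary.

import tensorflow as tf
from tensorflow import keras

# Default arguments: eligible for the fused CuDNN kernel when a GPU is present.
cudnn_ok = keras.layers.LSTM(64)

# A non-default choice from the list above (recurrent_dropout > 0)
# forces the layer onto the generic, slower kernel.
generic_only = keras.layers.LSTM(64, recurrent_dropout=0.2)

inputs = tf.random.normal((2, 10, 8))   # (batch, time, features), values arbitrary
print(cudnn_ok(inputs).shape, generic_only(inputs).shape)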

Using CuDNN kernels when available

Let’s build a simple LSTM model to demonstrate the performance difference.

We’ll use as input sequences the sequence of rows of MNIST digits (treating each row of pixels as a timestep), and we’ll predict the digit’s label.


batch_size = 64
# Each MNIST image batch is a tensor of shape (batch_size, 28, 28).
# Each input sequence will be of size (28, 28) (height is treated like time).
input_dim = 28

units = 64
output_size = 10  # labels are from 0 to 9

# Build the RNN model
def build_model(allow_cudnn_kernel=True):
    # CuDNN is only available at the layer level, and not at the cell level.
    # This means `LSTM(units)` will use the CuDNN kernel,
    # while RNN(LSTMCell(units)) will run on non-CuDNN kernel.
    if allow_cudnn_kernel:
        # The LSTM layer with default options uses CuDNN.
        lstm_layer = keras.layers.LSTM(units, input_shape=(None, input_dim))
    else:
        # Wrapping a LSTMCell in a RNN layer will not use CuDNN.
        lstm_layer = keras.layers.RNN(
            keras.layers.LSTMCell(units), input_shape=(None, input_dim)
        )
    model = keras.models.Sequential(
        [
            lstm_layer,
            keras.layers.BatchNormalization(),
            keras.layers.Dense(output_size),
        ]
    )
    return model

Let’s load the MNIST dataset:


mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
sample, sample_label = x_train[0], y_train[0]

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz 11490434/11490434 [==============================] – 0s 0us/step

Let’s create a model instance and train it.

We choose sparse_categorical_crossentropy as the loss function for the model. The output of the model has shape [batch_size, 10]. The target for the model is an integer vector, where each integer is in the range 0 to 9.
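As a quick sanity check, the loss function can be evaluated on dummy logits and integer labels of these shapes (the random values here are purely illustrative):

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
dummy_logits = tf.random.normal((64, 10))                             # [batch_size, 10] raw logits
dummy_labels = tf.random.uniform((64,), maxval=10, dtype=tf.int32)    # integers in 0..9
print(loss_fn(dummy_labels, dummy_logits).numpy())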


model = build_model(allow_cudnn_kernel=True)

model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)

model.fit(
    x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1
)

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1700136618.250305 9824 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process. 938/938 [==============================] – 7s 5ms/step – loss: 0.9965 – accuracy: 0.6845 – val_loss: 0.5699 – val_accuracy: 0.8181

Now, let’s compare to a model that does not use the CuDNN kernel:


noncudnn_model = build_model(allow_cudnn_kernel=False)
noncudnn_model.set_weights(model.get_weights())

noncudnn_model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)

noncudnn_model.fit(
    x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1
)

938/938 [==============================] – 20s 20ms/step – loss: 0.4268 – accuracy: 0.8698 – val_loss: 0.3017 – val_accuracy: 0.9145

When running on a machine with an NVIDIA GPU and CuDNN installed, the model built with CuDNN is much faster to train compared to the model that uses the regular TensorFlow kernel.

The same CuDNN-enabled model can also be used to run inference in a CPU-only environment. The tf.device annotation below is just forcing the device placement. The model will run on CPU by default if no GPU is available.

You simply don’t have to worry about the hardware you’re running on anymore. Isn’t that pretty cool?


import matplotlib.pyplot as plt

with tf.device("CPU:0"):
    cpu_model = build_model(allow_cudnn_kernel=True)
    cpu_model.set_weights(model.get_weights())
    result = tf.argmax(cpu_model.predict_on_batch(tf.expand_dims(sample, 0)), axis=1)
    print(
        "Predicted result is: %s, target result is: %s" % (result.numpy(), sample_label)
    )
    plt.imshow(sample, cmap=plt.get_cmap("gray"))

Predicted result is: [3], target result is: 5

Setup


import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf

tfds.disable_progress_bar()


Import matplotlib and create a helper function to plot graphs:


import matplotlib.pyplot as plt

def plot_graphs(history, metric):
    plt.plot(history.history[metric])
    plt.plot(history.history['val_' + metric], '')
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend([metric, 'val_' + metric])
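As a quick, hypothetical usage example (model, train_dataset, and test_dataset are placeholder names, not objects defined above), the helper is typically called on the History object returned by model.fit:

# Assumes a compiled Keras model and datasets exist; names are illustrative only.
history = model.fit(train_dataset, validation_data=test_dataset, epochs=10)
plot_graphs(history, 'accuracy')
plot_graphs(history, 'loss')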

RNN Advanced Architectures

The simple RNN repeating modules have a basic structure with a single tanh layer. This simple structure suffers from short-term memory: it struggles to retain information from earlier time steps in long sequences. These problems can be addressed by the long short-term memory (LSTM) and gated recurrent unit (GRU) architectures, which are capable of remembering information over long periods.

Long Short Term Memory (LSTM)

The Long Short Term Memory (LSTM) is an advanced type of RNN designed to counter both the vanishing and exploding gradient problems. Just like the simple RNN, LSTM has repeating modules, but their structure is different: instead of a single tanh layer, each LSTM module has four interacting layers that communicate with each other. This four-layered structure helps LSTM retain long-term memory, and it is used in several sequential problems including machine translation, speech synthesis, speech recognition, and handwriting recognition. You can gain hands-on experience in LSTM by following the guide: Python LSTM for Stock Predictions.
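For illustration, here is a minimal sketch of a Keras LSTM sequence classifier; the sequence length (100 timesteps), feature count (8), unit count (64), and number of classes (3) are arbitrary placeholder values, not taken from any example above.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(100, 8)),     # one LSTM layer over the sequence
    tf.keras.layers.Dense(3, activation='softmax'),     # class probabilities
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()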

Gated Recurrent Unit (GRU)

The gated recurrent unit (GRU) is a variation of the LSTM; the two share design similarities and, in some cases, produce similar results. GRU uses an update gate and a reset gate to address the vanishing gradient problem. These gates decide what information is important and pass it to the output, and they can be trained to retain information from long ago without it vanishing over time, while discarding irrelevant information.

Unlike LSTM, GRU does not have a separate cell state Ct; it only has a hidden state ht, and thanks to this simpler architecture GRU models typically train faster than LSTM models. The GRU architecture is easy to understand: it takes the input xt and the hidden state from the previous timestep ht-1 and outputs the new hidden state ht. You can get in-depth knowledge about GRU at Understanding GRU Networks.
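Because LSTM and GRU share the same layer interface in Keras, swapping one for the other is a one-line change. A minimal sketch with the same placeholder shapes as the LSTM example above:

import tensorflow as tf

# Only the recurrent layer changes compared to the LSTM sketch; shapes are illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.GRU(64, input_shape=(100, 8)),
    tf.keras.layers.Dense(3, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])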

Summary

A recurrent neural network is a robust architecture for dealing with time series or text analysis. The output of the previous state is fed back into the network to preserve its memory over time or over a sequence of words.

In TensorFlow, you can use the following code (written with the TensorFlow 1.x API) to train a recurrent neural network for time series:

Parameters of the model

n_windows = 20
n_input = 1
n_output = 1
size_train = 201
r_neuron = 120   # number of recurrent units; needed by the model below (value chosen for illustration)

Define the model

X = tf.placeholder(tf.float32, [None, n_windows, n_input])
y = tf.placeholder(tf.float32, [None, n_windows, n_output])

basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=r_neuron, activation=tf.nn.relu)
rnn_output, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)

stacked_rnn_output = tf.reshape(rnn_output, [-1, r_neuron])
stacked_outputs = tf.layers.dense(stacked_rnn_output, n_output)
outputs = tf.reshape(stacked_outputs, [-1, n_windows, n_output])

Construct the optimization

learning_rate = 0.001

loss = tf.reduce_sum(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

Train the model

init = tf.global_variables_initializer()
iteration = 1500

with tf.Session() as sess:
    init.run()
    for iters in range(iteration):
        sess.run(training_op, feed_dict={X: X_batches, y: y_batches})
        if iters % 150 == 0:
            mse = loss.eval(feed_dict={X: X_batches, y: y_batches})
            print(iters, "\tMSE:", mse)

    y_pred = sess.run(outputs, feed_dict={X: X_test})
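The code above uses the TensorFlow 1.x API (tf.placeholder, tf.contrib, tf.Session), which is no longer available in TensorFlow 2.x. As a rough sketch only, a roughly equivalent model in TF 2.x / Keras, assuming the same windowed arrays X_batches, y_batches, and X_test from the steps above, could look like this (it uses mean squared error rather than the exact sum-of-squares loss):

import tensorflow as tf

# Sketch of a TF 2.x equivalent; shapes follow the parameters defined above.
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(r_neuron, activation='relu',
                              return_sequences=True,
                              input_shape=(n_windows, n_input)),
    tf.keras.layers.Dense(n_output),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='mse')
model.fit(X_batches, y_batches, epochs=50, verbose=0)
y_pred = model.predict(X_test)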

Authors: Scott Zhu, Francois Chollet

How Do Recurrent Neural Networks Work

In recurrent neural networks, the data cycles through a loop to the middle hidden layer.

The input layer 'x' takes in the input to the neural network, processes it, and passes it on to the middle layer.

The middle layer 'h' can consist of multiple hidden layers, each with its own activation functions, weights, and biases. If the parameters of these hidden layers are not affected by the previous layer, i.e. the neural network has no memory, then you can use a recurrent neural network.

The recurrent neural network standardizes the activation functions, weights, and biases so that each hidden layer has the same parameters. Then, instead of creating multiple hidden layers, it creates one and loops over it as many times as required.
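A minimal numpy sketch of this idea, assuming a single hidden layer whose weights Wx, Wh and bias b are reused at every time step (all shapes and values below are illustrative):

import numpy as np

timesteps, input_dim, hidden_dim = 5, 3, 4
rng = np.random.default_rng(0)

x_seq = rng.normal(size=(timesteps, input_dim))   # one input vector per time step
Wx = rng.normal(size=(input_dim, hidden_dim))     # input-to-hidden weights (shared)
Wh = rng.normal(size=(hidden_dim, hidden_dim))    # hidden-to-hidden weights (shared)
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                          # initial hidden state
for x_t in x_seq:
    # The same Wx, Wh, b are applied at every step; only the state h changes.
    h = np.tanh(x_t @ Wx + h @ Wh + b)
print(h)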

Python3


fig = px.histogram(data, x="Age", marginal='box',
                   title="Age Group", color="Rating",
                   nbins=65-18,
                   color_discrete_sequence=['black', 'green', 'blue',
                                            'red', 'yellow'])
fig.update_layout(bargap=0.2)

Output:

Prepare the Data to Build the Model

Since we are working with an NLP-based dataset, it makes sense to use the text columns as features, while the Rating column is used for sentiment analysis. From the Rating countplot above, we can observe that the ratings are heavily imbalanced, so all ratings above 3 are mapped to 1 and ratings below 3 to 0.

int(True)   # will return 1
int(False)  # will return 0
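A minimal pandas sketch of this binarization, assuming the DataFrame is called data and the column is Rating (names taken from the plotting snippet above; the exact column and threshold handling in the original pipeline may differ, in particular for ratings equal to 3):

# Map ratings above 3 to positive sentiment (1) and the rest to negative (0).
data['sentiment'] = (data['Rating'] > 3).astype(int)
print(data['sentiment'].value_counts())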

Setup


import numpy as np
import tensorflow as tf
import keras
from keras import layers


What Are Recurrent Neural Networks (RNN)?

  • An RNN recalls the past, and its decisions are influenced by what it has learned from the past.
  • Simple feedforward networks "remember" things too, but they only remember what they learned during training.
  • A recurrent neural network looks very similar to a feedforward neural network, except it also has connections pointing backwards.
  • At each time step t (also called a frame), the RNN receives the inputs x(t) as well as its own output from the previous time step, y(t–1). Since there is no previous output at the first time step, it is usually set to 0.
  • You can easily create a whole layer of recurrent neurons: at every time step t, each neuron receives the input vector x(t) and the output vector from the previous time step, y(t–1). A minimal sketch of such a layer follows this list.
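Here is that sketch, using the built-in SimpleRNN layer; the batch size, sequence length, feature count, and unit count are placeholder values for illustration.

import numpy as np
import tensorflow as tf

# 2 sequences, 5 time steps, 3 features per step; a layer of 4 recurrent neurons.
inputs = np.random.random((2, 5, 3)).astype('float32')
layer = tf.keras.layers.SimpleRNN(4, activation='tanh', return_sequences=True)

# At every step, each neuron receives x(t) and the layer's previous output y(t-1).
outputs = layer(inputs)
print(outputs.shape)  # (2, 5, 4)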

