Generating a tf.data.Dataset
The next step is to create a TensorFlow dataset from the images. That can be done using the `image_dataset_from_directory` function. Since it infers the classes from the folder names, your data should be structured as shown below.
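For the cats-and-dogs example used below, a layout like the following works (the folder names here are illustrative; the directory names mirror the `train_dir` and `test_dir` variables used in the code, and each subfolder becomes one class):

train_dir/
    cats/
        cat001.jpg
        ...
    dogs/
        dog001.jpg
        ...
test_dir/
    cats/
        ...
    dogs/
        ...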
When using the function to generate the dataset, you will need to define the following parameters:
- the path to the data
- an optional seed for shuffling and transformations
- the `image_size` is the size the images will be resized to after being loaded from the disk
- since this is a binary classification problem, the `label_mode` is `binary`
- `batch_size=32` means that the images will be loaded in batches of 32
If you don't have a separate validation set, you can also define a `validation_split`. If it is set, `subset` also needs to be passed to indicate whether the split is the validation or training subset. In this case, let's use the testing set for validation.
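For illustration only, here is roughly how a split would be defined if you did not have a separate test directory (a sketch; the 20% split is an assumption):

# Hypothetical: carve a validation subset out of the training directory
training_subset = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    validation_split=0.2,   # hold out 20% of the images
    subset="training",      # this call returns the training portion
    seed=101,
    image_size=(200, 200),
    batch_size=32)

validation_subset = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    validation_split=0.2,
    subset="validation",    # this call returns the held-out portion
    seed=101,
    image_size=(200, 200),
    batch_size=32)

Since we do have a separate test directory here, the training set is created without a split: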
training_set = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    seed=101,
    image_size=(200, 200),
    batch_size=32)
By default, the classes will be represented using integers. You can see the representation by using `class_names` of the generated training set.
class_names = training_set.class_names
In this case, cats will be represented by 0 and dogs by 1. This is based on the directory structure of the dataset: since `class_names` isn't specified explicitly, alphanumerical order is used.
Generate the validation split as well. The arguments are similar to the training set:
- the directory containing the images
- an optional seed
- how to resize the images
- the size of the batches
validation_set = tf.keras.preprocessing.image_dataset_from_directory(
    test_dir,
    seed=101,
    image_size=(200, 200),
    batch_size=32)
Architectures of CNNs
You don't always have to design your convolutional neural networks from scratch. You can instead use architectures developed by experts, such as VGG16, ResNet50, InceptionV3, and Xception, which have proven to perform well on many image tasks.
These architectures can be accessed via Keras Applications. They have also been pre-trained on the ImageNet dataset, which contains over a million images, making them robust enough for use in the real world. When instantiating a model, you can choose whether or not to include the pre-trained weights. When the weights are included, you can start using the model for classification right away. Other ways of using the pre-trained models are:
- extracting features and passing them to a new model
- fine-tuning a new model
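For example, feature extraction with a pre-trained base might look roughly like this (a sketch; the input size and the small classification head are assumptions, not part of the original text):

# Load Xception as a frozen feature extractor (no classification head)
base = tf.keras.applications.Xception(
    weights="imagenet",
    include_top=False,
    input_shape=(299, 299, 3),
    pooling="avg")
base.trainable = False  # keep the pre-trained weights fixed

# Hypothetical new head trained on top of the extracted features
new_model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid")
])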
Let’s take a look at how you can load the Xception architecture without weights. Since weights are not included, you can use your dataset to train the model.
model = tf.keras.applications.Xception(
    include_top=True,
    weights=None,  # no pre-trained weights; the model will be trained from scratch
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
When you load the model with weights, you can start using it for prediction right away. The downloaded weights are cached in `~/.keras/models/`.
model = tf.keras.applications.Xception(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
After that, you can preprocess the image and run the predictions. The Keras applications provide a `preprocess_input` function for doing that. Each architecture dictates the size of the image that should be passed to it, so you should always confirm that in its documentation. Next, convert the image into an array and expand its dimensions in order to include the batch size.
from tensorflow.keras.preprocessing import image
import numpy as np

!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/b/b5/Lion_d%27Afrique.jpg \
    -O /tmp/lion.jpg

img_path = '/tmp/lion.jpg'
img = image.load_img(img_path, target_size=(299, 299))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = tf.keras.applications.xception.preprocess_input(x)

preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', tf.keras.applications.xception.decode_predictions(preds, top=3)[0])
The final step is to decode the predictions and print the results.
Quick Tutorial: Building a Basic Convolutional Neural Network (CNN) in TensorFlow
This quick tutorial can help you get started implementing a CNN in TensorFlow. It is based on the Fashion-MNIST dataset, containing 28 x 28 grayscale images of fashion products in 10 categories. There are 55,000 images in the training set and 10,000 images in the test set. Our code is based on the full tutorial by Aditya Sharma.
Loading Data
First import all the necessary modules: NumPy, Matplotlib and TensorFlow, then import the Fashion-MNIST data as follows:

# TF1 helper for reading MNIST-format data
from tensorflow.examples.tutorials.mnist import input_data

# Read the data from the data/fashion directory;
# retrieve the Fashion-MNIST dataset from the Amazon S3 bucket if it is not already there
data = input_data.read_data_sets('data/fashion', one_hot=True,
    source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')
CNN Architecture
We will use three convolutional layers, progressively adding more filters. All the filters are 3×3:
- Layer with 32 filters
- Layer with 64 filters
- Layer with 128 filters
In addition, we’ll have three max-pooling layers in between the convolutions, which are 2×2.
We’ll set basic hyperparameters of the CNN model:
training_iters = 10
learning_rate = 0.001
batch_size = 128

The batch size tells TensorFlow to train on 128 images at a time, and to repeat this for every batch in each training iteration.
Neural Network Parameters
The number of inputs to the CNN is 784, because the images have 784 pixels and are read as a 784 dimensional vector. We will rebuild this vector into a matrix of 28 x 28 x 1.
# Use this to specify the 28x28 input size, and 10 classes for the predicted label at the end
n_input = 28
n_classes = 10

Here we define an input placeholder x and a label placeholder y. The 784-pixel images are reshaped into a None x 28 x 28 x 1 tensor, and y, which holds the labels of the training images, is a None x 10 matrix.
We are setting the "row" to None because we previously defined batch_size, meaning placeholders receive the row size when the training set is loaded. Row size will be set to 128, like the batch_size.

# x is the input placeholder, rebuilding the image into a 28x28x1 matrix
x = tf.placeholder("float", [None, 28, 28, 1])
# y is the label set, using the number of classes
y = tf.placeholder("float", [None, n_classes])
Wrapper Functions
Because we have several layers of the same type in the model, it’s useful to create a wrapper function for each type of layer, to avoid duplicating code. You can get functions like this out of the box with Keras, which is included with Tensorflow. However, in this tutorial we show you how to do things from scratch in TensorFlow without Keras helper functions.
Here is a function creating a 2-dimensional convolutional layer, with bias and ReLU activation. The arguments are the input x, weights W, bias b, and the number of strides, meaning how quickly the filter moves over the image during the convolution.

def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

Here is another function creating a 2D max-pool layer. The parameters are the input x and k, which specifies the kernel/filter size.

def maxpool2d(x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
Now let’s define weights and biases.
weights = {
    'wc1': tf.get_variable('W0', shape=(3, 3, 1, 32), initializer=tf.contrib.layers.xavier_initializer()),
    'wc2': tf.get_variable('W1', shape=(3, 3, 32, 64), initializer=tf.contrib.layers.xavier_initializer()),
    'wc3': tf.get_variable('W2', shape=(3, 3, 64, 128), initializer=tf.contrib.layers.xavier_initializer()),
    'wd1': tf.get_variable('W3', shape=(4*4*128, 128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('W6', shape=(128, n_classes), initializer=tf.contrib.layers.xavier_initializer()),
}
biases = {
    'bc1': tf.get_variable('B0', shape=(32), initializer=tf.contrib.layers.xavier_initializer()),
    'bc2': tf.get_variable('B1', shape=(64), initializer=tf.contrib.layers.xavier_initializer()),
    'bc3': tf.get_variable('B2', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'bd1': tf.get_variable('B3', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('B4', shape=(10), initializer=tf.contrib.layers.xavier_initializer()),
}
Building the CNN
Now we build the CNN by feeding the weights and biases into the wrapper functions.
def conv_net(x, weights, biases):

    # This constructs the first convolutional layer with 32 3x3 filters and 32 biases.
    # The next line specifies the max-pool layer with the kernel size set to 2.
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Use this to construct the second convolutional layer with 64 3x3 filters and 64 biases,
    # and to add another max-pool layer.
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # This constructs the third convolutional layer with 128 3x3 filters and 128 biases,
    # and adds the last max-pool layer.
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    conv3 = maxpool2d(conv3, k=2)

    # Now build the fully connected layer that will generate prediction labels. To do this,
    # use reshape() to adapt the output of pooling to the input expected by the fully connected layer.
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])

    # In this last part, apply the ReLU function, then multiply by the output weights
    # and add the output biases to produce the class scores.
    fc1 = tf.nn.relu(fc1)
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
Loss and Optimizer Nodes
First build the model using the conv_net() function we showed above, passing in x, weights, and biases:

pred = conv_net(x, weights, biases)
This is a multi-class classification problem, so we will use the softmax activation function, which gives a probability between 0 and 1 for each class label (the label with the highest probability will be the prediction of the model). We’ll use cross-entropy as the loss function.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
Finally, we’ll define the Adam optimizer with a learning rate of 0.001 as defined in the model hyperparameters above:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
Evaluate the Model
To test the model, we first initialize weights and biases, and then define a correct_prediction and accuracy node that will evaluate model performance every time it is run.
init = tf.global_variables_initializer()
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Now you can start the computation graph, and run a training session as follows:
- Create a for loop over the number of training iterations specified above
- Create an inner for loop over the batches of the size we specified above
- Pass training images and labels using the variables batch_x and batch_y
- Feed batch_x and batch_y into the x and y placeholders
- After each training iteration, run the loss function and check training accuracy
- After running through all the images, test accuracy by processing the 10,000 test images
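A minimal sketch of such a session might look like the following (this is not the original tutorial's exact code; `data.train` and `data.test` follow the `read_data_sets` helper shown earlier):

with tf.Session() as sess:
    sess.run(init)
    for i in range(training_iters):
        for batch in range(len(data.train.images) // batch_size):
            batch_x = data.train.images[batch*batch_size:(batch+1)*batch_size].reshape(-1, 28, 28, 1)
            batch_y = data.train.labels[batch*batch_size:(batch+1)*batch_size]
            # Run the optimizer, then check loss and training accuracy on this batch
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})
        print("Iter", i, "Loss", loss, "Training accuracy", acc)
    # Evaluate on the 10,000 test images
    test_acc = sess.run(accuracy, feed_dict={x: data.test.images.reshape(-1, 28, 28, 1),
                                             y: data.test.labels})
    print("Testing accuracy:", test_acc)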
See the original tutorial for the complete code that you can use to run the CNN model.
TensorFlow: Constants, Variables, and Placeholders
Constants are not the only types of tensors. There are also variables and placeholders, which are all building blocks of a computational graph.
A computational graph is basically a representation of a sequence of operations and the flow of data between them.
Now, let’s understand the difference between these types of tensors.
Constants
Constants are tensors whose values do not change during the execution of the computational graph. They are created using the tf.constant() function and are mainly used to store fixed parameters that do not require any change during the model training.
Variables
Variables are tensors whose values can be changed during the execution of the computational graph, and they are created using the tf.Variable() function. For instance, in the case of neural networks, weights and biases can be defined as variables since they need to be updated during the training process.
Placeholders
These were used in the first version of TensorFlow as empty containers that do not hold specific values. They simply reserve a spot for data to be fed in later. This gives users the freedom to use different datasets and batch sizes during model training and validation.
In TensorFlow 2, placeholders have been replaced: data is passed directly to functions thanks to eager execution, and the tf.function() decorator provides a more Pythonic and dynamic way of building the computational graph.
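As a quick illustration of the difference between these tensor types (a minimal sketch; the variable and function names are made up for this example):

# A constant: a fixed value that never changes
learning_rate = tf.constant(0.001)

# A variable: updated during training, e.g. a weight
w = tf.Variable(tf.random.normal(shape=(3, 1)))
w.assign_add(tf.ones((3, 1)))  # variables can be modified in place

# In TensorFlow 2 there are no placeholders: data is passed as arguments,
# and tf.function traces the computation into a graph
@tf.function
def forward(x_batch):
    return tf.matmul(x_batch, w)

print(forward(tf.ones((2, 3))))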
Monitoring the model’s performance
Using the history object, the training losses and accuracies can be obtained.
import pandas as pd

metrics_df = pd.DataFrame(history.history)

You can plot them in order to see the learning curves. Let's start by comparing the training and validation loss.

metrics_df[["loss", "val_loss"]].plot();

Next, on to the training and validation accuracy.

metrics_df[["binary_accuracy", "val_binary_accuracy"]].plot();
What is a CNN?
A Convolutional Neural Network (CNN or ConvNet) is a deep learning algorithm specifically designed for any task where object recognition is crucial such as image classification, detection, and segmentation. Many real-life applications, such as self-driving cars, surveillance cameras, and more, use CNNs.
The importance of CNNs
There are several reasons why CNNs are important, as highlighted below:
- Unlike traditional machine learning models like SVMs and decision trees that require manual feature extraction, CNNs perform automatic feature extraction at scale, making them efficient.
- The convolution layers make CNNs translation invariant, meaning they can recognize patterns and extract features regardless of their position in the image, and they tolerate a degree of rotation and scaling as well.
- Multiple pre-trained CNN models such as VGG-16, ResNet50, InceptionV3, and EfficientNet have achieved state-of-the-art results and can be fine-tuned on new tasks using a relatively small amount of data.
- CNNs are not limited to images: they can also be applied to non-image problems such as natural language processing, time series analysis, and speech recognition.
Architecture of a CNN
A CNN's architecture tries to mimic the structure of neurons in the human visual system, which is composed of multiple layers where each one is responsible for detecting a specific feature in the data. A typical CNN is made of a combination of four main layers:
- Convolutional layers
- Rectified Linear Unit (ReLU for short)
- Pooling layers
- Fully connected layers
Let's understand how each of these layers works using the example of classifying a handwritten digit.
Convolution layers
This is the first building block of a CNN. As the name suggests, the main mathematical task performed is called convolution, which is the application of a sliding window function to a matrix of pixels representing an image. The sliding function applied to the matrix is called a kernel or a filter; the two terms are used interchangeably.
In the convolution layer, several filters of equal size are applied, and each filter is used to recognize a specific pattern from the image, such as the curving of the digits, the edges, the whole shape of the digits, and more.
Let’s consider this 32×32 grayscale image of a handwritten digit. The values in the matrix are given for illustration purposes.
Also, let's consider the kernel used for the convolution. It is a matrix with a dimension of 3×3. The weights of the kernel's elements are represented in the grid: zero weights are shown in the black cells and ones in the white cells.
Do we have to manually find these weights?
In real life, the weights of the kernels are determined during the training process of the neural network.
Using these two matrices, we can perform the convolution operation by applying the dot product, which works as follows:
- Apply the kernel matrix from the top-left corner to the right.
- Perform element-wise multiplication.
- Sum the values of the products.
- The resulting value corresponds to the first value (top-left corner) in the convoluted matrix.
- Move the kernel by one stride of the sliding window (to the right, then down a row once the edge is reached).
- Repeat steps 1 to 5 until the image matrix is fully covered.
The dimension of the convolved matrix depends on the size of the sliding window: the larger the window, the smaller the output dimension.
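Here is a minimal sketch of that sliding-window computation in plain NumPy (the 5×5 input and 3×3 kernel values are made up for illustration):

import numpy as np

image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0]])

kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1]])

k = kernel.shape[0]
out = np.zeros((image.shape[0] - k + 1, image.shape[1] - k + 1))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        # element-wise multiply the window by the kernel, then sum
        out[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)

print(out)  # the 3x3 convolved (feature) map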
Another name associated with the kernel in the literature is feature detector because the weights can be fine-tuned to detect specific features in the input image.
For instance:
- A kernel that averages neighboring pixels can be used to blur the input image.
- A kernel that subtracts neighboring pixels is used to perform edge detection.
The more convolution layers the network has, the better it becomes at detecting abstract features.
Activation function
A ReLU activation function is applied after each convolution operation. This function helps the network learn non-linear relationships between the features in the image, hence making the network more robust for identifying different patterns. It also helps to mitigate the vanishing gradient problems.
Pooling layer
The goal of the pooling layer is to pull the most significant features from the convoluted matrix. This is done by applying some aggregation operations, which reduces the dimension of the feature map (convoluted matrix), hence reducing the memory used while training the network. Pooling is also relevant for mitigating overfitting.
The most common aggregation functions that can be applied are:
- Max pooling, which takes the maximum value of each region of the feature map
- Sum pooling, which corresponds to the sum of all the values in each region
- Average pooling, which takes the average of the values in each region
Below is an illustration of each of the previous examples. Note that the dimension of the feature map becomes smaller as the pooling function is applied.
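As a quick sketch of max pooling in code (the 4×4 input values are illustrative only):

import tensorflow as tf

feature_map = tf.constant([[1., 3., 2., 1.],
                           [4., 6., 5., 0.],
                           [3., 2., 1., 0.],
                           [1., 2., 3., 4.]])
# Keras pooling layers expect a batch and a channel dimension
x = tf.reshape(feature_map, (1, 4, 4, 1))
pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
print(tf.reshape(pooled, (2, 2)))  # [[6., 5.], [3., 4.]]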
After the last pooling layer, the feature map is flattened so that it can be processed by the fully connected layer.
Fully connected layers
These layers form the last part of the convolutional neural network. Their inputs correspond to the flattened one-dimensional vector produced from the last pooling layer, and ReLU activation functions are applied to them for non-linearity.
Finally, a softmax prediction layer is used to generate probability values for each of the possible output labels, and the final label predicted is the one with the highest probability score.
Dropout
Dropout is a regularization technique applied to improve the generalization capability of neural networks with a large number of parameters. It consists of randomly dropping some neurons during the training process, which forces the remaining neurons to learn new features from the input data.
Since the technical implementation will be performed using TensorFlow 2, the next section aims to provide a complete overview of different components of this framework to efficiently build deep learning models.
Tensors vs Matrices: Differences
Many people confuse tensors with matrices. Even though these two objects look similar, they have completely different properties. This section provides a better understanding of the difference between matrices and tensors.
- We can think of a matrix as a tensor with only two dimensions.
- Tensors, on the other hand, are a more general format that can have any number of dimensions.
As opposed to matrices, tensors are more suitable for deep learning problems for the following reasons:
- They can deal with any number of dimensions, which makes them a better fit for multi-dimensional data.
- Tensors’ ability to be compatible with a wide range of data types, shapes, and dimensions makes them more versatile than matrices.
- Tensorflow provides GPU and TPU support to speed up computations. Using tensors, machine learning engineers can automatically take advantage of these benefits.
- Tensors natively support broadcasting, which consists of making arithmetic operations between tensors of different shapes, which is not always possible when dealing with matrices.
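For instance, broadcasting between tensors of different shapes looks like this (a small illustrative example):

import tensorflow as tf

a = tf.constant([[1, 2, 3],
                 [4, 5, 6]])      # shape (2, 3)
b = tf.constant([10, 20, 30])     # shape (3,)

# b is broadcast across each row of a
print(a + b)  # [[11, 22, 33], [14, 25, 36]]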
What is the TensorFlow Framework?
Google released TensorFlow in November 2015. They define it as an open-source machine learning framework for everyone, for several reasons.
- Open-source: released under the Apache 2.0 open-source license. This allows researchers, organizations, and developers to make their contribution to the library by building upon it without any restrictions.
- Machine learning framework: meaning that it has a set of libraries and tools that support the building process of machine learning models.
- For everyone: Using TensorFlow makes the implementation of machine learning models easier through common programming languages like Python. Furthermore, built-in libraries such as Keras make it even easier to create robust deep learning models.
All these functionalities make Tensorflow a good candidate for building neural networks.
Furthermore, installing TensorFlow 2 is straightforward and can be performed using the Python package manager pip, as explained in the official documentation:
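pip install tensorflow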
After the installation, we can check that the version being used is 2.9.1:

import tensorflow as tf

print("TensorFlow version:", tf.__version__)
Now, let’s further explore the main components for creating those networks.
Adding Dropout to the Model
To help mitigate this problem, we can employ one or more regularization strategies to help the model generalize better. Regularization techniques help to restrict the model’s flexibility so that it doesn’t overfit the training data. One approach is called Dropout, which is built into Keras. Dropout is implemented in Keras as a special layer type that randomly drops a percentage of neurons during the training process. When dropout is used in convolutional layers, it is usually used after the max pooling layer and has the effect of eliminating a percentage of neurons in the feature maps. When used after a fully connected layer, a percentage of neurons in the fully connected layer are dropped.
In the model below, we add a Dropout layer at the end of each convolutional block and also after the dense layer in the classifier. The input argument to the Dropout function is the fraction of neurons to (randomly) drop from the previous layer during the training process.
Define the Model (with Dropout)
def cnn_model_dropout(input_shape=(32, 32, 3)):
    model = Sequential()

    # ------------------------------------
    # Conv Block 1: 32 Filters, MaxPool.
    # ------------------------------------
    model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu', input_shape=input_shape))
    model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # ------------------------------------
    # Conv Block 2: 64 Filters, MaxPool.
    # ------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # ------------------------------------
    # Conv Block 3: 64 Filters, MaxPool.
    # ------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # ------------------------------------
    # Flatten the convolutional features.
    # ------------------------------------
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))

    return model
Create the Model (with Dropout)
# Create the model.
model_dropout = cnn_model_dropout()
model_dropout.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                    Output Shape          Param #
=================================================================
 conv2d_6 (Conv2D)               (None, 32, 32, 32)    896
 conv2d_7 (Conv2D)               (None, 32, 32, 32)    9248
 max_pooling2d_3 (MaxPooling2D)  (None, 16, 16, 32)    0
 dropout (Dropout)               (None, 16, 16, 32)    0
 conv2d_8 (Conv2D)               (None, 16, 16, 64)    18496
 conv2d_9 (Conv2D)               (None, 16, 16, 64)    36928
 max_pooling2d_4 (MaxPooling2D)  (None, 8, 8, 64)      0
 dropout_1 (Dropout)             (None, 8, 8, 64)      0
 conv2d_10 (Conv2D)              (None, 8, 8, 64)      36928
 conv2d_11 (Conv2D)              (None, 8, 8, 64)      36928
 max_pooling2d_5 (MaxPooling2D)  (None, 4, 4, 64)      0
 dropout_2 (Dropout)             (None, 4, 4, 64)      0
 flatten_1 (Flatten)             (None, 1024)          0
 dense_2 (Dense)                 (None, 512)           524800
 dropout_3 (Dropout)             (None, 512)           0
 dense_3 (Dense)                 (None, 10)            5130
=================================================================
Total params: 669,354
Trainable params: 669,354
Non-trainable params: 0
_________________________________________________________________
Compile the Model (with Dropout)
model_dropout.compile(optimizer='rmsprop',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'],
                      )
Train the Model (with Dropout)
history = model_dropout.fit(X_train,
                            y_train,
                            batch_size=TrainingConfig.BATCH_SIZE,
                            epochs=TrainingConfig.EPOCHS,
                            verbose=1,
                            validation_split=.3,
                            )
Epoch 1/31
2023-01-16 07:38:29.760435: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
137/137 [==============================] - 5s 31ms/step - loss: 2.1302 - accuracy: 0.2181 - val_loss: 1.9788 - val_accuracy: 0.2755
Epoch 2/31
137/137 [==============================] - 4s 27ms/step - loss: 1.7749 - accuracy: 0.3647 - val_loss: 1.8633 - val_accuracy: 0.3332
Epoch 3/31
137/137 [==============================] - 4s 27ms/step - loss: 1.5430 - accuracy: 0.4442 - val_loss: 1.5015 - val_accuracy: 0.4795
:
:
Epoch 29/31
137/137 [==============================] - 4s 27ms/step - loss: 0.4626 - accuracy: 0.8359 - val_loss: 0.6721 - val_accuracy: 0.7832
Epoch 30/31
137/137 [==============================] - 4s 27ms/step - loss: 0.4584 - accuracy: 0.8380 - val_loss: 0.6638 - val_accuracy: 0.7847
Epoch 31/31
137/137 [==============================] - 4s 27ms/step - loss: 0.4427 - accuracy: 0.8449 - val_loss: 0.6598 - val_accuracy: 0.7863
Plot the Training Results
# Retrieve training results.
train_loss = history.history["loss"]
train_acc = history.history["accuracy"]
valid_loss = history.history["val_loss"]
valid_acc = history.history["val_accuracy"]

plot_results([train_loss, valid_loss],
             ylabel="Loss",
             ylim=[0.0, 5.0],
             metric_name=["Training Loss", "Validation Loss"],
             color=["g", "b"]);

plot_results([train_acc, valid_acc],
             ylabel="Accuracy",
             ylim=[0.0, 1.0],
             metric_name=["Training Accuracy", "Validation Accuracy"],
             color=["g", "b"])
In the plots above, the training curves align very closely with the validation curves. Also, notice that we achieve a higher validation accuracy than the baseline model that did not contain dropout. Both sets of training plots are shown below for comparison.
Data augmentation
Data augmentation is usually applied in order to prevent overfitting. Augmenting the images effectively increases the size of the dataset and exposes the model to more varied aspects of the data. Augmentation can be achieved by applying random transformations such as flipping and rotating the images. Fortunately, Keras provides layers that can do just that.
data_augmentation = keras.Sequential(
    [
        tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal", input_shape=(200, 200, 3)),
        tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),
        tf.keras.layers.experimental.preprocessing.RandomZoom(0.2),
    ]
)
TensorFlow CNN in Production with Run:AI
Run:AI automates resource management and workload orchestration for deep learning infrastructure. With Run:AI, you can automatically run as many CNN experiments as needed in TensorFlow and other deep learning frameworks.
Here are some of the capabilities you gain when using Run:AI:
- Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
- No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
- A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.
Learn more about the Run:AI GPU virtualization platform.
Implementing a CNN in TensorFlow & Keras
In this post, we'll learn how to implement a Convolutional Neural Network (CNN) from scratch using Keras. Here, we show a CNN architecture similar to the structure of VGG-16 but with fewer layers. We will learn how to model this architecture and train it on a small dataset called CIFAR-10. We'll also use this as an opportunity to introduce a new layer type called Dropout, which is often used in models to mitigate the effects of overfitting.
- Load the CIFAR-10 Dataset
- Dataset Preprocessing
- Dataset and Training Configuration Parameters
- CNN Model Implementation in Keras
- Adding Dropout to the Model
- Saving and Loading Models
- Model Evaluation
- Conclusion
import os
import random
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from matplotlib.ticker import (MultipleLocator, FormatStrFormatter)
from dataclasses import dataclass
SEED_VALUE = 42

# Fix seed to make training deterministic.
random.seed(SEED_VALUE)
np.random.seed(SEED_VALUE)
tf.random.set_seed(SEED_VALUE)
Convolutional Neural Networks (CNN) in TensorFlow
Now that you understand how convolutional neural networks work, you can start building them using TensorFlow. However, you will first have to install TensorFlow. If you are working on a Google Colab environment, TensorFlow will already be installed.
How to install TensorFlow
TensorFlow can be installed via pip. Run the following command to install it.
pip install tensorflow
Alternatively, you can run TensorFlow in a container.
docker pull tensorflow/tensorflow:latest # Download latest stable image docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter # Start Jupyter server
How to confirm TensorFlow is installed
After installation is complete via pip, you might want to check TensorFlow’s version or confirm its installation. If you manage to import TensorFlow without any errors, then it was installed successfully.
import tensorflow

print(tensorflow.__version__)
What are Keras and tf.keras?
As of TensorFlow 2.0, Keras has become the official high-level API for TensorFlow. It is an open-source package that has been integrated into TensorFlow in order to quicken the process of building deep learning models. It is accessible via `tf.keras`. That is what you will be using in this article.
Develop multilayer CNN models
Let’s now take a look at how you can build a convolutional neural network with Keras and TensorFlow. The CIFAR-10 dataset will be used. The dataset contains 60000 32×32 color images in 10 classes, with 6000 images per class.
Loading the dataset can be done directly by using Keras utilities. Other datasets that ship with TensorFlow can be loaded in a similar manner.
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
The dataset contains the following classes:
'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'
You can use Matplotlib to visualize one of the images. Let’s visualize the image at index 785.
import matplotlib.pyplot as plt

image = X_train[785]
plt.imshow(image)
plt.show()
That looks like a cat. You can confirm that from `y_train`: 3 is the label for a cat.
Data preprocessing
The weights of a neural network are initialized to very small numbers. Therefore, scaling the images to be within the same range is important. In this case, let’s scale the values to be numbers between 0 and 1.
X_train = X_train / 255 X_test = X_test / 255
Build the convolutional neural network
The next step is to define the convolutional neural network. This is where the convolution, pooling, and flattening layers will be applied. The first layer is the `Conv2D` layer. It's defined with the following parameters:
- 32 output filters
- a 3 by 3 feature detector
- `same` padding to result in even padding for the input
- an input shape of `(32, 32, 3)`, because the images are 32 by 32 pixels and the 3 indicates that they are in color (three channels)
- the `relu` activation function so as to achieve non-linearity
The next layer is a max-pooling layer defined with the following parameters:
- a `pool_size` of (2, 2) that defines the size of the pooling window
- a stride of 2, which defines the number of steps the pooling window takes
Remember that you can design your network as you like. You just have to monitor the metrics and tweak the design and settle on the one that results in the best performance. In this case, another convolution and pooling layer is created. That is followed by the flatten layer whose results are passed to the dense layer. The final layer has 10 units because the dataset has 10 classes. Since it’s a multiclass problem, the Softmax activation function is applied.
model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax")
    ]
)
How to visualize a deep learning model
The quickest way to visualize your model is to use the model summary function.
model.summary()
You can also use the Keras `plot_model` utility to plot the model.
tf.keras.utils.plot_model(
    model,
    to_file="model.png",
    show_shapes=True,
    show_layer_names=True,
    rankdir="TB",
    expand_nested=True,
    dpi=96,
)
How to reduce overfitting with Dropout
One of the common ways to improve the performance of deep learning models is to introduce dropout regularization. In this process, a specified percentage of connections are dropped during the training process. This forces the network to learn patterns from the data instead of memorizing the data, which is what reduces overfitting. In Keras, this can be achieved by introducing a Dropout layer into the network. Here is how the network would look after applying the Dropout layer.
model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation="softmax")
    ]
)
Compiling the model
The next step is to compile the model. The Sparse Categorical Cross-Entropy loss is used because the labels are not one-hot encoded. If you one-hot encode the labels, you will have to use the Categorical Cross-Entropy loss instead. Since the final layer already applies a softmax, `from_logits` is set to False.

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])
How to halt training at the right time with Early Stopping
Left to train for more epochs than needed, your model will most likely overfit on the training set. One of the ways to avoid that is to stop the training process when the model stops improving. This is done by monitoring the loss or the accuracy. In order to achieve that, the Keras EarlyStopping callback is used. By default, the callback monitors the validation loss. Patience is the number of epochs to wait before stopping the training process if there is no improvement in the model loss. This callback will be used at the training stage. The callbacks should be passed as a list, even if it’s just one callback.
from tensorflow.keras.callbacks import EarlyStopping

callbacks = [
    EarlyStopping(patience=2)
]
How to save the best model automatically
You might also be interested in automatically saving the best model or model weights during training. That can be applied using a Keras ModelCheckpoint callback. The callback will save the best model after each epoch. You can instruct it to save the entire model or just the model weights. By default, it will save the models where the validation loss is minimum.
checkpoint_filepath = '/tmp/checkpoint'

model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    monitor='loss',
    mode='min',
    save_best_only=True)
Your callbacks will now look like this.
callbacks = [
    EarlyStopping(patience=2),
    model_checkpoint_callback,
]
After training, you can load the model again using the Keras `load_model` utility.
another_saved_model = tf.keras.models.load_model(checkpoint_filepath)
Training the model
Let’s now fit the data to the training set. The validation set is passed as well because the callback monitors the validation set. In this case, you can define many epochs but the training process will be stopped by the callback when the loss doesn’t improve after 2 epochs as declared in the EarlyStopping callback.
history = model.fit(X_train, y_train, epochs=600, validation_data=(X_test, y_test), callbacks=callbacks)
How to plot model learning curves
Learning curves are important because they can inform you whether the model is learning or overfitting. If the validation loss increases significantly or the validation accuracy reduces sharply then your model is most likely overfitting. Since the model was saved into a history variable, you can use that to access the losses and accuracy and plot them. You can also store them in a Pandas DataFrame.
import pandas as pd

metrics_df = pd.DataFrame(history.history)
Let’s now look at how you would plot the training and validation loss.
metrics_df[["loss", "val_loss"]].plot();
The same can be done for the training and validation accuracy.
metrics_df[["accuracy", "val_accuracy"]].plot();
How to save and load your model
You might be interested in saving the model for later use. Saving the model is important so that you don’t have to train the model again. This is especially critical for image models that take a long period to train. The H5 format is a common format for saving Keras models.
model.save("model.h5")
The Keras `load_model` is used for loading the model again.
load_saved_model = tf.keras.models.load_model("model.h5")
load_saved_model.summary()
How to accelerate training with Batch Normalization
The network you trained here was relatively small. However, in other cases, you might have to train a very deep neural network, and training such a network can be very slow. The training process can be hastened using Batch Normalization. It transforms the data so that the mean output is close to zero and the output standard deviation is close to 1, where the mean and variance are computed using the current batch of inputs. Since Batch Normalization offers some form of regularization, it is usually not used together with Dropout. Here's how the model would look after adding the batch normalization layer.

model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(10, activation="softmax")
    ]
)

The batch normalization layer behaves differently during training than during prediction and evaluation, controlled by the `training` argument. During training (`training=True`), normalization is done using the mean and standard deviation of the current batch of inputs. At inference, i.e. prediction and evaluation, normalization is done using a moving average of the mean and standard deviation of the batches seen during training. When fine-tuning a pre-trained model that contains this layer, the batch normalization layers should be kept frozen; otherwise the accumulated mean and standard deviation statistics will be disrupted and all the prior learning lost.
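As a small sketch of that last point (the choice of Xception as the pre-trained base is an assumption for illustration):

# Hypothetical fine-tuning setup: keep BatchNormalization layers frozen
base_model = tf.keras.applications.Xception(weights="imagenet", include_top=False)
base_model.trainable = True
for layer in base_model.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = False  # keep the moving statistics intact while fine-tuning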
How to easily run CNN with Tensorflow in cnvrg.io
Now, with cnvrg.io you can run this pipeline without configuring the different platforms, which makes it much faster and easier to run. Using cnvrg.io, you can easily track training progress and serve the model as a REST endpoint. First, you can spin up a VS Code workspace inside cnvrg.io to build your training script from the notebook code. You can use the exact same code and ensure that the model is saved at the end of the training.
Run your code as an experiment
Next, you can launch this training script as an experiment. cnvrg.io will provision resources to execute the script and monitor the performance automatically. Resource and training metrics are automatically visualized along with the logs, and all files that were written to disk during the experiment are saved as artifacts in cnvrg.io’s object store.
Make predictions in a few clicks
Now that you have your model, you'll need to create a "predict" function. cnvrg.io makes this easy by automatically wrapping the function into a production-grade Flask application equipped with load balancing, autoscaling, and monitoring. This file loads the model into memory and uses it in the predict function, which formats the incoming data and returns a prediction.
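A minimal sketch of such a predict function might look like this (the file name, model path, and input format are assumptions for illustration, not cnvrg.io requirements):

# predict.py (hypothetical)
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("model.h5")  # load the trained model once, at startup

def predict(data):
    # `data` is assumed to be a nested list representing one 200x200x3 image
    x = np.array(data, dtype="float32")
    x = np.expand_dims(x, axis=0)      # add the batch dimension
    probability = float(model.predict(x)[0][0])
    return {"prediction": "dog" if probability > 0.5 else "cat",
            "probability": probability}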
Deploy your predictions to an endpoint
Next, you’ll want to create an endpoint that routes to that function. You could also specify compute resources and autoscaling configurations here too.
Track and monitor your endpoints
cnvrg.io automatically displays metrics such as the number of requests and latency for the endpoint. It also comes with Grafana and Kibana integrated for increased visibility into the model's behavior in production.
Finally, if you want to trigger retraining and deploying the model as part of a CI/CD pipeline, cnvrg.io provides Flows. The pipeline can programmatically trigger a flow via cnvrg.io's CLI or SDK.
You can test it out now by installing cnvrg.io CORE, the free community MLOps platform, on your Kubernetes cluster.
How Does CNN Recognize Images?
Consider the following images:
The boxes that are colored represent a pixel value of 1, and 0 if not colored.
When you press backslash (\), the below image gets processed.
When you press forward-slash (/), the below image is processed:
Here is another example to depict how CNN recognizes an image:
As you can see from the above diagram, only those values are lit that have a value of 1.
Conclusion
This article has covered a complete overview of CNNs in TensorFlow, providing details about each layer of the CNNs architecture. Also, it made a brief introduction to TensorFlow and how it helps machine learning engineers and researchers build sophisticated neural networks.
We applied all these skill sets to a real-world scenario related to a multiclass classification task.
Our beginner's guide to object detection could be a great next step to further your learning about computer vision. It explores the key components in object detection and explains how to implement SSD and Faster R-CNN, which are available in TensorFlow.
Learn More about CNN and Deep Learning
This is how you build a CNN with multiple hidden layers and how to identify a bird using its pixel values. You’ve also completed a demo to classify images across 10 categories using the CIFAR dataset.
You can also enroll in the Artificial Intelligence Course with Caltech University and in collaboration with IBM, and transform yourself into an expert in deep learning techniques using TensorFlow, the open-source software library designed to conduct machine learning and deep neural network research. This PG program in AI and Machine Learning covers Python, Machine Learning, Natural Language Processing, Speech Recognition, Advanced Deep Learning, Computer Vision, and Reinforcement Learning. It will prepare you for one of the world’s most exciting technology frontiers.
What are Tensors?
We mainly deal with high-dimensional data when building machine learning and deep learning models. Tensors are multi-dimensional arrays with a uniform type used to represent different features of the data.
Below is the graphical representation of the different types of dimensions of tensors.
- A 0-dimensional tensor contains a single value.
- A 1-dimensional tensor, also known as a "rank-1" tensor, is a list of values.
- A 2-dimensional tensor is a "rank-2" tensor, i.e. a matrix.
- Finally, we can have an N-dimensional tensor, where N represents the number of dimensions within the tensor. In the previous cases, N is respectively 0, 1, and 2.
Below is an illustration of zero-, one-, and two-dimensional tensors. Each tensor is created using the constant() function from TensorFlow.
# Zero-dimensional tensor
zero_dim_tensor = tf.constant(20)
print(zero_dim_tensor)

# One-dimensional tensor
one_dim_tensor = tf.constant([12, 20, 53, 26, 11, 56])
print(one_dim_tensor)

# Two-dimensional tensor
two_dim_array = [[3, 6, 7, 5],
                 [9, 2, 3, 4],
                 [7, 1, 10, 6],
                 [0, 8, 11, 2]]
two_dim_tensor = tf.constant(two_dim_array)
print(two_dim_tensor)
A successful execution of the previous code should generate the outputs below. Notice the keyword tf.Tensor, which indicates that each result is a tensor. Each printed tensor shows three attributes:
- The actual value of the tensor.
- The shape of the tensor, which is () (a scalar), (6,), and (4, 4), respectively for the first, second, and third tensors.
- The data type, represented by the dtype attribute; all the tensors here are int32.
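For reference, the printed output should look roughly like this:

tf.Tensor(20, shape=(), dtype=int32)
tf.Tensor([12 20 53 26 11 56], shape=(6,), dtype=int32)
tf.Tensor(
[[ 3  6  7  5]
 [ 9  2  3  4]
 [ 7  1 10  6]
 [ 0  8 11  2]], shape=(4, 4), dtype=int32)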
Our Tensorflow Tutorial for Beginners provides a complete overview of TensorFlow and teaches how to build and train models.
Dataset and Training Configuration Parameters
Before we describe the model implementation and training, we're going to apply a little more structure to our training process by using the dataclasses module in Python to create simple DatasetConfig and TrainingConfig classes to organize several data and training configuration parameters. This allows us to create data structures for configuration parameters, as shown below. The benefit of doing this is that we have a single place to go to make any desired changes.
@dataclass(frozen=True)
class DatasetConfig:
    NUM_CLASSES: int = 10
    IMG_HEIGHT: int = 32
    IMG_WIDTH: int = 32
    NUM_CHANNELS: int = 3

@dataclass(frozen=True)
class TrainingConfig:
    EPOCHS: int = 31
    BATCH_SIZE: int = 256
    LEARNING_RATE: float = 0.001
CNN on TensorFlow Concepts
Tensor
Tensors represent deep learning data. They are multidimensional arrays, used to store multiple dimensions of a dataset. Each dimension is called a feature. For example, a cube storing data across X, Y, and Z axes is represented as a 3-dimensional tensor. Tensors can have very high dimensionality, with hundreds of dimensions of features typically used in deep learning applications.
Computational graph
TensorFlow computational graphs represent the workflows that occur during deep learning model training. For a CNN model, the computational graph can be very complex. The image below demonstrates what a simple graph looks like. You can use TensorBoard, built into TensorFlow, to display the computational graph of your model.
Constant
In TensorFlow, a constant is used to store values that don’t change during the computation of the model. It is used for nodes that must remain the same during model training. A constant does not have parameters.
Placeholder
Placeholders are used to input training examples to your deep learning model. A placeholder can take parameters, and these parameters are changed at runtime as the model processes the training set.
Variable
Variables are used to add trainable nodes to the computation graph, such as weights and biases.
Related content: read our guide to deep convolutional neural networks.
Artificial Intelligence has come a long way and has been seamlessly bridging the gap between the potential of humans and machines. And data enthusiasts all around the globe work on numerous aspects of AI and turn visions into reality – and one such amazing area is the domain of Computer Vision. This field aims to enable and configure machines to view the world as humans do, and use the knowledge for several tasks and processes (such as Image Recognition, Image Analysis and Classification, and so on). And the advancements in Computer Vision with Deep Learning have been a considerable success, particularly with the Convolutional Neural Network algorithm.
What is TensorFlow CNN?
Convolutional Neural Networks (CNN), a key technique in deep learning for computer vision, are little-known to the wider public but are the driving force behind major innovations, from unlocking your phone with face recognition to safe driverless vehicles.
CNNs are used for a variety of tasks in computer vision, primarily image classification and object detection. The open source TensorFlow framework allows you to create highly flexible CNN architectures for computer vision tasks. In this article we explain the basics of CNN on TensorFlow and present a quick hands-on tutorial to get you started.
If you are interested in learning how to work with CNNs in PyTorch, which is another popular deep learning framework, see our guide to Pytorch CNN.
In this article, you will learn:
Layers in a Convolutional Neural Network
A convolution neural network has multiple hidden layers that help in extracting information from an image. The four important layers in CNN are:
- Convolution layer
- ReLU layer
- Pooling layer
- Fully connected layer
Convolution Layer
This is the first step in the process of extracting valuable features from an image. A convolution layer has several filters that perform the convolution operation. Every image is considered as a matrix of pixel values.
Consider the following 5×5 image whose pixel values are either 0 or 1. There’s also a filter matrix with a dimension of 3×3. Slide the filter matrix over the image and compute the dot product to get the convolved feature matrix.
ReLU layer
ReLU stands for the rectified linear unit. Once the feature maps are extracted, the next step is to move them to a ReLU layer.
ReLU performs an element-wise operation and sets all the negative pixels to 0. It introduces non-linearity to the network, and the generated output is a rectified feature map. Below is the graph of a ReLU function:
The original image is scanned with multiple convolutions and ReLU layers for locating the features.
Pooling Layer
Pooling is a down-sampling operation that reduces the dimensionality of the feature map. The rectified feature map now goes through a pooling layer to generate a pooled feature map.
The pooling layer uses various filters to identify different parts of the image like edges, corners, body, feathers, eyes, and beak.
Here’s how the structure of the convolution neural network looks so far:
The next step in the process is called flattening. Flattening is used to convert all the resultant 2-Dimensional arrays from pooled feature maps into a single long continuous linear vector.
The flattened matrix is fed as input to the fully connected layer to classify the image.
Here’s how exactly CNN recognizes a bird:
- The pixels from the image are fed to the convolutional layer that performs the convolution operation
- It results in a convolved map
- The convolved map is applied to a ReLU function to generate a rectified feature map
- The image is processed with multiple convolutions and ReLU layers for locating the features
- Different pooling layers with various filters are used to identify specific parts of the image
- The pooled feature map is flattened and fed to a fully connected layer to get the final output
Model definition
Let’s now create the convolutional neural network that will be used to classify the images. It will be similar to the previous one with a few cosmetic changes.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model = Sequential([
    data_augmentation,
    tf.keras.layers.experimental.preprocessing.Rescaling(1./255),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(1, activation='sigmoid')
])
The notable changes are:
- the application of the augmentation layer
- using the `Rescaling` layer to scale the images in the model definition
Use case implementation using CNN
We’ll be using the CIFAR-10 dataset from the Canadian Institute For Advanced Research for classifying images across 10 categories using CNN.
1. Download the data set:
2. Import the CIFAR data set:
3. Read the label names:
4. Display the images using matplotlib:
5. Use the helper function to handle data:
6. Create the model:
7. Apply the helper functions:
8. Create the layers for convolution and pooling:
9. Create the flattened layer by reshaping the pooling layer:
10. Create a fully connected layer:
11. Set the output to y_pred variable:
12. Apply the loss function:
13. Create the optimizer:
14. Create a variable to initialize all the global variables:
15. Run the model by creating a graph session:
Training the model
The next step is to train the model. In this case, `y` is not passed in. That’s taken care of by the function used to generate the training set. Passing the validation data is critical so that the loss and accuracy can be accessed later and plotted. Let’s also reuse the callbacks that were defined in the last section.
history = model.fit(training_set,
                    validation_data=validation_set,
                    epochs=600,
                    callbacks=callbacks)
CNN Step-by-Step Implementation
Let’s put everything we have learned previously into practice. This section will illustrate the end-to-end implementation of a convolutional neural network in TensorFlow applied to the CIFAR-10 dataset, which is a built-in dataset with the following properties:
- It contains 60,000 32 by 32 color images
- The dataset has 10 different classes
- Each class has 6,000 images
- There are 50,000 training images in total
- And 10,000 testing images in total
The source code of the article is available on DataCamp’s workspace
Architecture of the network
Before getting into the technical implementation, let’s first understand the overall architecture of the network being implemented.
- The input of the model is a 32x32x3 tensor, respectively, for the width, height, and channels.
- We will have two convolutional layers. The first layer applies 32 filters of size 3×3 each, followed by a ReLU activation function, and the second applies 64 filters of size 3×3.
- The first pooling layer will apply a 2×2 max pooling
- The second pooling layer will apply a 2×2 max pooling as well
- The fully connected layer will have 128 units and a ReLU activation function
- Finally, the output will be 10 units corresponding to the 10 classes, and the activation function is a softmax to generate the probability distributions.
Load dataset
The built-in dataset is loaded from keras.datasets as follows:

from tensorflow.keras.datasets import cifar10 as cf10

(train_images, train_labels), (test_images, test_labels) = cf10.load_data()
Exploratory Data Analysis
In this section, we will focus solely on showing some sample images since we already know the proportion of each class in both the training and testing data.
The helper function show_images() shows a total of 12 images by default and takes three main parameters:
- The training images
- The class names
- And the training labels.
import matplotlib.pyplot as plt

def show_images(train_images, class_names, train_labels, nb_samples=12, nb_row=4):
    plt.figure(figsize=(12, 12))
    for i in range(nb_samples):
        plt.subplot(nb_row, nb_row, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_images[i], cmap=plt.cm.binary)
        plt.xlabel(class_names[train_labels[i][0]])
    plt.show()
Now, we can call the function with the required parameters.
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

show_images(train_images, class_names, train_labels)
A successful execution of the previous code generates the images below.
Data preprocessing
Prior to training the model, we need to normalize the pixel values of the data to the same range (e.g., 0 to 1). This is a common preprocessing step when dealing with images, and it helps the model converge faster during training.
max_pixel_value = 255

train_images = train_images / max_pixel_value
test_images = test_images / max_pixel_value
Also, the labels correspond to categories such as cat, horse, bird, and so on, and are stored as integer indices. We convert them into a one-hot encoded format so that they can be easily processed by the neural network.
from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_labels, len(class_names))
test_labels = to_categorical(test_labels, len(class_names))
Model architecture implementation
The next step is to implement the architecture of the network based on the previous description.
First, we define the model using the Sequential() class, and each layer is added to the model with the add() function.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Variables
INPUT_SHAPE = (32, 32, 3)
FILTER1_SIZE = 32
FILTER2_SIZE = 64
FILTER_SHAPE = (3, 3)
POOL_SHAPE = (2, 2)
FULLY_CONNECT_NUM = 128
NUM_CLASSES = len(class_names)

# Model architecture implementation
model = Sequential()
model.add(Conv2D(FILTER1_SIZE, FILTER_SHAPE, activation='relu', input_shape=INPUT_SHAPE))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Conv2D(FILTER2_SIZE, FILTER_SHAPE, activation='relu'))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Flatten())
model.add(Dense(FULLY_CONNECT_NUM, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))
After applying the summary() function to the model, we get a comprehensive summary of the model’s architecture, with information about each layer, its type, its output shape, and the total number of trainable parameters.
Model training
All the resources are finally available to configure and trigger the training of the model. This is done respectively with the compile() and fit() functions, which take the following parameters:
- The optimizer is responsible for updating the model’s weights and biases. In our case, we are using the Adam optimizer.
- The loss function is used to measure the misclassification errors, and we are using categorical cross-entropy.
- Finally, the metrics are used to measure the performance of the model; accuracy, precision, and recall will be displayed in our use case.
from tensorflow.keras.metrics import Precision, Recall

BATCH_SIZE = 32
EPOCHS = 30
METRICS = ['accuracy', Precision(name='precision'), Recall(name='recall')]

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=METRICS)

# Train the model
training_history = model.fit(train_images, train_labels,
                             epochs=EPOCHS, batch_size=BATCH_SIZE,
                             validation_data=(test_images, test_labels))
Model evaluation
After the model training, we can compare its performance on both the training and testing datasets by plotting the above metrics using the show_performance_curve() helper function in two dimensions.
- The horizontal axis (x) is the number of epochs
- The vertical one (y) is the underlying performance of the model.
- The curve represents the value of the metrics at a specific epoch.
For better visualization, a vertical red line is drawn through the intersection of the training and validation performance values along with the optimal value.
import numpy as np

def show_performance_curve(training_result, metric, metric_label):
    train_perf = training_result.history[str(metric)]
    validation_perf = training_result.history['val_' + str(metric)]

    # First epoch where the training and validation curves (almost) intersect.
    intersection_idx = np.argwhere(np.isclose(train_perf, validation_perf, atol=1e-2)).flatten()[0]
    intersection_value = train_perf[intersection_idx]

    plt.plot(train_perf, label=metric_label)
    plt.plot(validation_perf, label='val_' + str(metric))
    plt.axvline(x=intersection_idx, color='r', linestyle='--', label='Intersection')

    plt.annotate(f'Optimal Value: {intersection_value:.4f}',
                 xy=(intersection_idx, intersection_value),
                 xycoords='data',
                 fontsize=10,
                 color='green')

    plt.xlabel('Epoch')
    plt.ylabel(metric_label)
    plt.legend(loc='lower right')
Then, the function is applied for both the accuracy and the precision of the model.
show_performance_curve(training_history, 'accuracy', 'accuracy')
show_performance_curve(training_history, 'precision', 'precision')
After training the model without any fine-tuning and pre-processing, we end up with:
- An accuracy score of 67.09%, meaning that the model correctly classifies about 67 out of every 100 samples.
- And, a precision of 76.55%, meaning that out of every 100 positive predictions, almost 77 are true positives and the remaining 23 are false positives.
- These scores are achieved respectively at the third and second epochs for accuracy and precision.
These two metrics give a global understanding of the model behavior.
What if we want to know, for each class, which ones the model is good at predicting and which ones it struggles with?
This can be achieved from the confusion matrix, which shows for each class the number of correct and wrong predictions. The implementation is given below. We start by making predictions on the test data, then compute the confusion matrix and show the final result.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

test_predictions = model.predict(test_images)
test_predicted_labels = np.argmax(test_predictions, axis=1)
test_true_labels = np.argmax(test_labels, axis=1)

cm = confusion_matrix(test_true_labels, test_predicted_labels)
cmd = ConfusionMatrixDisplay(confusion_matrix=cm)
cmd.plot(include_values=True, cmap='viridis', ax=None, xticks_rotation='horizontal')
plt.show()
- Classes 0, 1, 6, 7, 8, 9, respectively, for airplane, automobile, frog, horse, ship, and truck have the highest values at the diagonal. This means that the model is better at predicting those classes.
- On the other hand, it seems to struggle with the remaining classes:
- The classes with the highest off-diagonal values are those the model confuses with the correct classes. For instance, it confuses birds (class 2) with airplanes (class 0), and automobiles (class 1) with trucks (class 9).
Learn more about the confusion matrix from our tutorial on understanding the confusion matrix in R, which draws on course material from DataCamp’s Machine Learning Toolbox course.
This model can be improved with additional tasks such as:
- Image augmentation
- Transfer learning using pre-trained models such as ResNet, MobileNet, or VGG. Our Transfer learning tutorial explains what transfer learning is and some of its applications in real life.
- Applying different regularization techniques such as L1, L2, or dropout.
- Fine-tuning different hyperparameters such as the learning rate, the batch size, and the number of layers in the network.
What is a Convolutional Neural Network?
A convolutional neural network is a feed-forward neural network that is generally used to analyze visual images by processing data with a grid-like topology. It’s also known as a ConvNet. A convolutional neural network is used to detect and classify objects in an image.
Below is a neural network that identifies two types of flowers: Orchid and Rose.
In CNN, every image is represented in the form of an array of pixel values.
The convolution operation forms the basis of any convolutional neural network. Let’s understand the convolution operation using two matrices, a and b, of 1 dimension.
a = [5,3,7,5,9,7]
b = [1,2,3]
In the convolution operation, the arrays are multiplied element-wise, and the products are summed to create a new array, which represents a*b.
The first three elements of the matrix a are multiplied with the elements of matrix b. The product is summed to get the result.
The next three elements from the matrix a are multiplied by the elements in matrix b, and the product is summed up.
This process continues until the convolution operation is complete.
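Here is a minimal NumPy sketch of this sliding dot product (note that, as described above, the kernel is applied directly without flipping):

import numpy as np

a = np.array([5, 3, 7, 5, 9, 7])
b = np.array([1, 2, 3])

# Slide b over a, multiply element-wise and sum each window.
result = np.array([np.sum(a[i:i + len(b)] * b) for i in range(len(a) - len(b) + 1)])
print(result)  # [32 32 44 44]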
Dataset Preprocessing
We normalize the image data to the range [0, 1]. This is very common when working with image data and helps the model train more efficiently. We also convert the integer labels to one-hot encoded labels, as discussed previously.
from tensorflow.keras.utils import to_categorical

# Normalize images to the range [0, 1].
X_train = X_train.astype("float32") / 255
X_test = X_test.astype("float32") / 255

# Change the labels from integer to categorical data.
print('Original (integer) label for the first training sample: ', y_train[0])

# Convert labels to one-hot encoding.
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

print('After conversion to categorical one-hot encoded labels: ', y_train[0])
Model Evaluation
There are several things we can do to evaluate the trained model further. We can compute the model’s accuracy on the test dataset. We can visually inspect the results on a subset of the images in a dataset and plot the confusion matrix for a dataset. Let’s take a look at all three examples.
Evaluate the Model on the Test Dataset
We can now predict the results for all the test images, as shown in the code below. Here, we call the predict() method to retrieve all the predictions, and then we select a specific index from the test set and print out the predicted scores for each class. You can experiment with the code below by setting the test index to various values and see how the highest score is usually associated with the correct value indicated by the ground truth.
test_loss, test_acc = reloaded_model_dropout.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc*100:.3f}")
313/313 [==============================] - 3s 9ms/step - loss: 0.6736 - accuracy: 0.7833
Test accuracy: 78.330
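The index-based inspection mentioned above is not shown in the snippet, so here is a minimal sketch of it; the index value 0 is purely illustrative, and it assumes the reloaded_model_dropout, X_test, y_test, and numpy objects from the surrounding code.

import numpy as np

# Retrieve predictions for all test images, then inspect one sample.
predictions = reloaded_model_dropout.predict(X_test)

test_index = 0  # illustrative index; try different values
print("Predicted scores per class:", predictions[test_index])
print("Predicted class:", np.argmax(predictions[test_index]))
print("Ground truth class:", np.argmax(y_test[test_index]))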
Make Predictions on Sample Test Images
Here we create a convenience function that will allow us to evaluate the model on a subset of images from a dataset and display the results visually.
def evaluate_model(dataset, model):
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']
    num_rows = 3
    num_cols = 6

    # Retrieve a number of images from the dataset.
    data_batch = dataset[0:num_rows*num_cols]

    # Get predictions from model.
    predictions = model.predict(data_batch)

    plt.figure(figsize=(20, 8))
    num_matches = 0

    for idx in range(num_rows*num_cols):
        ax = plt.subplot(num_rows, num_cols, idx + 1)
        plt.axis("off")
        plt.imshow(data_batch[idx])

        pred_idx = tf.argmax(predictions[idx]).numpy()
        truth_idx = np.nonzero(y_test[idx])

        title = str(class_names[truth_idx[0][0]]) + " : " + str(class_names[pred_idx])
        title_obj = plt.title(title, fontdict={'fontsize': 13})

        if pred_idx == truth_idx:
            num_matches += 1
            plt.setp(title_obj, color='g')
        else:
            plt.setp(title_obj, color='r')

        acc = num_matches/(idx+1)
    print("Prediction accuracy: ", int(100*acc)/100)
    return
evaluate_model(X_test, reloaded_model_dropout)
1/1 [==============================] - 0s 18ms/step
Prediction accuracy: 0.77
Confusion Matrix
A confusion matrix is a very common metric that is used to summarize the results of a classification problem. The information is presented in the form of a table or matrix where one axis represents the ground truth labels for each class, and the other axis represents the predicted labels from the network. The entries in the table represent the number of instances from an experiment (which are sometimes represented as percentages rather than counts). Generating a confusion matrix in TensorFlow is accomplished by calling the function tf.math.confusion_matrix(), which takes two required arguments: the list of ground truth labels and the associated predicted labels.
# Generate predictions for the test dataset.
predictions = reloaded_model_dropout.predict(X_test)

# For each sample image in the test dataset, select the class label with the highest probability.
predicted_labels = [np.argmax(i) for i in predictions]
313/313 [==============================] - 2s 6ms/step
# Convert one-hot encoded labels to integers.
y_test_integer_labels = tf.argmax(y_test, axis=1)

# Generate a confusion matrix for the test dataset.
cm = tf.math.confusion_matrix(labels=y_test_integer_labels, predictions=predicted_labels)

# Plot the confusion matrix as a heatmap.
plt.figure(figsize=[14, 7])
import seaborn as sn
sn.heatmap(cm, annot=True, fmt='d', annot_kws={"size": 12})
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Truth')
plt.show()
A confusion matrix is a content-rich representation of a model’s performance at the class level. It can be very informative to better understand where the model performs well and where it may have more difficulty. For example, a few things stand out right away. Two of the ten classes tend to be misclassified more than others: Dogs and Cat. More specifically, a large percentage of the time, the model confuses these two classes with each other. Let’s take a closer look. The ground truth label for a cat is 3, and the ground truth label for a dog is 5. Notice that when the input image is a cat (index 3), it is often most misclassified as a dog, with 176 misclassified samples. When the input image is a dog (index 5), the most misclassified examples are cats, with 117 samples.
Also, notice that the last row, which represents trucks, is most often confused with automobiles. So all of these observations make intuitive sense, given the similarity of the classes involved.
Convolutional Neural Networks (CNN) have been used in state-of-the-art computer vision tasks such as face detection and self-driving cars. In this article, let’s take a look at the concepts required to understand CNNs in TensorFlow. Later you will also dive into some TensorFlow CNN examples.
Model evaluation
You can also check the performance of the model on the validation set.
loss, accuracy = model.evaluate(validation_set)
print('Accuracy on test dataset:', accuracy)
Let’s now try the model on new images. The `image` module from Keras will be used to load the image.
import numpy as np
from keras.preprocessing import image
Download some images from the internet and store them in a temporary folder. The images used here are provided under a permissive Creative Commons license.
!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/c/c7/Tabby_cat_with_blue_eyes-3336579.jpg \
    -O /tmp/cat.jpg
Next, load the image while specifying the size used in training.
test_image = image.load_img('/tmp/cat.jpg', target_size=(200, 200))
After this, convert it into an array since the model expects array inputs.
test_image = image.img_to_array(test_image)
The next step is to expand the dimensions of the image in order to include the batch size. Let’s take a look at the shape of the image at the moment.
That needs to be amended to include a batch size of 1, because only one image is being used here. Expanding the dimensions is done using the `expand_dims` function from NumPy.
test_image = np.expand_dims(test_image, axis=0)
If you check the shape again, you will see that it’s in the form required by the model.
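Here is a minimal sketch of that check and of running the prediction; the printed shape and the interpretation of the output assume the 200 by 200 input size and the single sigmoid output unit used earlier, so treat it as illustrative.

print(test_image.shape)  # (1, 200, 200, 3): a batch of one 200x200 RGB image

# Values close to 1 indicate class 1 (dogs), values close to 0 indicate class 0 (cats).
prediction = model.predict(test_image)
print(prediction)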
How do CNNs work?
Although they can be used for other tasks, CNNs are mostly used in tasks involving image data. Each image contains pixel data that can be represented in a numerical form. This numerical representation is what is passed to a CNN. While ordinary artificial neural networks can be used to process image data, CNNs have proven to perform better, resulting in higher accuracy. Let’s now take a look at how CNNs work.
Convolution
Usually, you will not feed the entire image to a CNN. You will feed the features that are most important in classifying the image. The features are obtained through a process known as convolution. The convolution operation results in what is known as a feature map. It is also referred to as the convolved feature or an activation map. The feature map is obtained by applying a feature detector to the input image. The feature detector is also referred to as a kernel or a filter. The filter is usually a 3 by 3 matrix. However, other types of matrices can be used. The feature map is obtained through an element-wise multiplication of the filter with the matrix representation of the input image. The objective here is to reduce the size of the image being passed to the CNN while maintaining the important features. The filter slides step by step through each of the elements in the input image. These steps are known as strides and can be defined when creating the CNN. When building the CNN you will be able to define the number of filters you want for your network.
Once you obtain the feature map, the Rectified Linear Unit (ReLU) is applied in order to introduce non-linearity. This is needed because the relationships in image data are not linear.
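As a minimal sketch of these ideas in Keras, the snippet below builds a single convolution layer; the filter count, kernel size, stride, and input size are illustrative choices rather than values taken from this article’s model.

import tensorflow as tf

# A single convolution layer: 32 filters of size 3x3 sliding with a stride of 1,
# followed by a ReLU activation to introduce non-linearity.
conv_layer = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3),
                                    strides=(1, 1), activation='relu')

feature_maps = conv_layer(tf.random.normal((1, 200, 200, 3)))  # dummy batch of one image
print(feature_maps.shape)  # (1, 198, 198, 32)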
Pooling
Pooling results in what is known as a pooled feature map. Pooling ensures that the neural network is able to detect features in an image irrespective of their location in the image. This is what is known as spatial invariance. There are several types of pooling, for example, max pooling, average pooling, and min pooling. For instance, in max pooling a 2 by 2 matrix is slid over the feature map while picking the largest value in a given box.
Pooling ensures that the main features of the image are maintained while reducing the size of the image further. This reduces the amount of information passed to the neural network and hence helps to reduce overfitting.
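Here is a minimal NumPy sketch of 2 by 2 max pooling with a stride of 2 on a small, made-up feature map:

import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 5, 6]])

# 2x2 max pooling with a stride of 2: keep the largest value in each 2x2 box.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 4]
#  [7 9]]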
Flattening
The next step is to flatten the pooled feature map. This involves transforming the entire pooled feature map into a single column that can be passed to the fully connected layer.
Full connection
The flattened feature map is then passed to the input layer of the neural network. The result of that is passed to a fully connected layer. After that, the result of the entire process is emitted by the output layer. An activation function is usually applied depending on the type of classification problem. For binary classifications, the sigmoid activation function will be used whereas the softmax activation function is used for multiclass problems.
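As a minimal sketch of this choice in Keras (the number of classes is illustrative), the output layer is the only part that changes between the two cases:

from tensorflow.keras.layers import Dense

# Output layer for a binary problem: one unit with a sigmoid activation.
binary_output = Dense(1, activation='sigmoid')

# Output layer for a multiclass problem with 10 classes: softmax over the classes.
multiclass_output = Dense(10, activation='softmax')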
Load the CIFAR-10 Dataset
The CIFAR-10 dataset consists of 60,000 color images from 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. Several sample images are shown below, along with the class names.
Since the CIFAR-10 dataset is included in TensorFlow, we can load it using the load_data() function, as shown in the code cell below, and confirm the number of samples and the shape of the data.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

print(X_train.shape)
print(X_test.shape)
Display Sample Images from the Dataset
It’s always a good idea to inspect some images in a dataset, as shown below. Remember, the images in CIFAR-10 are quite small, only 32×32 pixels, so while they don’t have a lot of detail, there’s still enough information in these images to support an image classification task.
plt.figure(figsize=(18, 9))

num_rows = 4
num_cols = 8

# plot each of the images in the batch and the associated ground truth labels.
for i in range(num_rows*num_cols):
    ax = plt.subplot(num_rows, num_cols, i + 1)
    plt.imshow(X_train[i,:,:])
    plt.axis("off")
Conclusion
In this post, we learned how to use TensorFlow and Keras to define and train a simple convolutional neural network. We showed that the model overfit the training data, and we learned how to use dropout layers to reduce the overfitting and improve the model’s performance on the validation dataset. We also covered how to save and load models to and from the file system. Finally, we reviewed three techniques used to evaluate the model on the test dataset.
This tutorial implements a simplified Quantum Convolutional Neural Network (QCNN), a proposed quantum analogue to a classical convolutional neural network that is also translationally invariant.
This example demonstrates how to detect certain properties of a quantum data source, such as a quantum sensor or a complex simulation from a device. The quantum data source is a cluster state that may or may not have an excitation, which is what the QCNN will learn to detect. (The dataset used in the paper was SPT phase classification.)
Running CNNs with TensorFlow in the real world
Loading datasets from TensorFlow is quite straightforward. However, consider a situation where you have to load data from the real world. The process for doing so is a little different. In this section, let’s look at how you can use this dataset from Kaggle to build a convolutional neural network. The goal here will be to build a model that can classify images of cats and dogs. Once you have built this model, you can tweak it and repurpose it for other classification problems.
Loading the images
Let’s start by downloading the images into a temporary folder on the virtual machine provided by Google Colab. Using Colab, in this case, is advantageous because you can use GPU compute to speed up the model training.
!wget --no-check-certificate \
    https://namespace.co.ke/ml/dataset.zip \
    -O /tmp/catsdogs.zip
The next step will be to unzip this dataset.
import os
import zipfile

with zipfile.ZipFile('/tmp/catsdogs.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp/cats_dogs')
After that, set the paths to the training and testing sets.
base_dir = '/tmp/cats_dogs/dataset'
train_dir = os.path.join(base_dir, 'training_set')
test_dir = os.path.join(base_dir, 'test_set')
You can list the folders in order to see their arrangement.
import os
os.listdir(base_dir)
What is a CNN?
A Convolutional Neural Network (CNN or ConvNet) is a deep learning algorithm specifically designed for tasks where object recognition is crucial, such as image classification, detection, and segmentation. Many real-life applications, such as self-driving cars, surveillance cameras, and more, use CNNs.
The importance of CNNs
There are several reasons why CNNs are important, as highlighted below:
- Unlike traditional machine learning models like SVMs and decision trees that require manual feature extraction, CNNs can perform automatic feature extraction at scale, making them efficient.
- The convolution layers make CNNs translation invariant, meaning they can recognize patterns and extract features regardless of where they appear in the image.
- Multiple pre-trained CNN models, such as VGG-16, ResNet50, Inceptionv3, and EfficientNet, have reached state-of-the-art results and can be fine-tuned on new tasks using a relatively small amount of data.
- CNNs are not limited to image data; they can also be applied to problems such as natural language processing, time series analysis, and speech recognition.
Architecture of a CNN
CNNs’ architecture tries to mimic the structure of neurons in the human visual system composed of multiple layers, where each one is responsible for detecting a specific feature in the data. As illustrated in the image below, the typical CNN is made of a combination of four main layers:
- Convolutional layers
- Rectified Linear Unit (ReLU for short)
- Pooling layers
- Fully connected layers
Let’s understand how each of these layers works using the following example of classification of the handwritten digit.
Convolution layers
This is the first building block of a CNN. As the name suggests, the main mathematical task performed is called convolution, which is the application of a sliding window function to a matrix of pixels representing an image. The sliding function applied to the matrix is called a kernel or a filter, and the two terms can be used interchangeably.
In the convolution layer, several filters of equal size are applied, and each filter is used to recognize a specific pattern from the image, such as the curving of the digits, the edges, the whole shape of the digits, and more.
Let’s consider this 32×32 grayscale image of a handwritten digit. The values in the matrix are given for illustration purposes.
Also, let’s consider the kernel used for the convolution. It is a matrix with a dimension of 3×3. The weights of each element of the kernel are represented in the grid. Zero weights are shown in the black cells and ones in the white cells.
Do we have to manually find these weights?
In real life, the weights of the kernels are determined during the training process of the neural network.
Using these two matrices, we can perform the convolution operation by applying the dot product, which works as follows (a short code sketch follows this list):
- Apply the kernel matrix from the top-left corner to the right.
- Perform element-wise multiplication.
- Sum the values of the products.
- The resulting value corresponds to the first value (top-left corner) in the convoluted matrix.
- Move the kernel down with respect to the size of the sliding window.
- Repeat steps 1 to 5 until the image matrix is fully covered.
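Here is a minimal sketch of the steps above on an illustrative image and kernel (stride of 1, no padding); the values are made up for demonstration.

import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image, multiply element-wise and sum each window."""
    out_h = image.shape[0] - kernel.shape[0] + 1
    out_w = image.shape[1] - kernel.shape[1] + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            output[i, j] = np.sum(window * kernel)
    return output

# Illustrative 5x5 image and 3x3 kernel (values chosen arbitrarily).
image = np.arange(25).reshape(5, 5)
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

print(convolve2d(image, kernel).shape)  # (3, 3)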
The dimension of the convolved matrix depends on the size of the sliding window: the larger the window, the smaller the resulting dimension.
Another name associated with the kernel in the literature is feature detector because the weights can be fine-tuned to detect specific features in the input image.
For instance:
- A kernel that averages neighboring pixels can be used to blur the input image.
- A kernel that subtracts neighboring pixels is used to perform edge detection.
The more convolution layers the network has, the better it is at detecting abstract features.
Activation function
A ReLU activation function is applied after each convolution operation. This function helps the network learn non-linear relationships between the features in the image, hence making the network more robust for identifying different patterns. It also helps to mitigate the vanishing gradient problems.
Pooling layer
The goal of the pooling layer is to pull the most significant features from the convoluted matrix. This is done by applying some aggregation operations, which reduces the dimension of the feature map (convoluted matrix), hence reducing the memory used while training the network. Pooling is also relevant for mitigating overfitting.
The most common aggregation functions that can be applied are:
- Max pooling, which takes the maximum value within each window of the feature map
- Sum pooling, which corresponds to the sum of all the values in the window
- Average pooling, which is the average of all the values in the window.
Below is an illustration of each of the previous examples:
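As a small code-level illustration, the Keras pooling layers can be applied to a tiny, made-up feature map. Note that Keras provides max and average pooling layers out of the box, while sum pooling can be obtained by multiplying the average by the window size; the values below are arbitrary.

import tensorflow as tf

# A tiny 4x4 feature map with one channel, shaped (batch, height, width, channels).
feature_map = tf.constant([[2., 8., 1., 5.],
                           [4., 6., 3., 7.],
                           [9., 0., 2., 4.],
                           [1., 3., 6., 8.]])
feature_map = tf.reshape(feature_map, (1, 4, 4, 1))

max_pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(feature_map)
avg_pooled = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))(feature_map)

print(tf.squeeze(max_pooled))  # [[8. 7.] [9. 8.]]
print(tf.squeeze(avg_pooled))  # [[5.   4.  ] [3.25 5.  ]]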
Also, the dimension of the feature map becomes smaller as the pooling function is applied.
The last pooling layer flattens its feature map so that it can be processed by the fully connected layer.
Fully connected layers
These layers form the last part of the convolutional neural network, and their inputs correspond to the flattened one-dimensional matrix generated by the last pooling layer. ReLU activation functions are applied to them for non-linearity.
Finally, a softmax prediction layer is used to generate probability values for each of the possible output labels, and the final label predicted is the one with the highest probability score.
Dropout
Dropout is a regularization technique applied to improve the generalization capability of neural networks with a large number of parameters. It consists of randomly dropping some neurons during the training process, which forces the remaining neurons to learn new features from the input data.
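A minimal sketch of dropout in Keras is shown below; the layer sizes and the dropout rate of 0.5 are illustrative assumptions, not values from this article.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Dropout(0.5) randomly sets half of the previous layer's outputs to zero at each
# training step; at inference time all neurons are kept.
model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dropout(0.5),
    Dense(10, activation='softmax'),
])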
Since the technical implementation will be performed using TensorFlow 2, the next section aims to provide a complete overview of different components of this framework to efficiently build deep learning models.
There are 4 modules in this course
In the first course in this specialization, you had an introduction to TensorFlow, and how, with its high-level APIs, you could do basic image classification, and you learned a little bit about Convolutional Neural Networks (ConvNets). In this course you'll go deeper into using ConvNets with real-world data, and learn about techniques that you can use to improve your ConvNet performance, particularly when doing image classification! In Week 1, this week, you'll get started by looking at a much larger dataset than you've been using thus far: the Cats and Dogs dataset, which had been a Kaggle Challenge in image classification!
You’ve heard the term overfitting a number of times to this point. Overfitting is simply the concept of being over specialized in training — namely that your model is very good at classifying what it is trained for, but not so good at classifying things that it hasn’t seen. In order to generalize your model more effectively, you will of course need a greater breadth of samples to train it on. That’s not always possible, but a nice potential shortcut to this is Image Augmentation, where you tweak the training set to potentially increase the diversity of subjects it covers. You’ll learn all about that this week!
Building models for yourself is great, and can be very powerful. But, as you’ve seen, you can be limited by the data you have on hand. Not everybody has access to massive datasets or the compute power that’s needed to train them effectively. Transfer learning can help solve this — where people with models trained on large datasets train them, so that you can either use them directly, or, you can use the features that they have learned and apply them to your scenario. This is Transfer learning, and you’ll look into that this week!
You’ve come a long way, Congratulations! One more thing to do before we move off of ConvNets to the next module, and that’s to go beyond binary classification. Each of the examples you’ve done so far involved classifying one thing or another — horse or human, cat or dog. When moving beyond binary into Categorical classification there are some coding considerations you need to take into account. You’ll look at them this week!
Hybrid models
You don’t have to go from eight qubits to one qubit using quantum convolution—you could have done one or two rounds of quantum convolution and fed the results into a classical neural network. This section explores quantum-classical hybrid models.
2.1 Hybrid model with a single quantum filter
Apply one layer of quantum convolution, reading out \(\langle \hat{Z}_n \rangle\) on all bits, followed by a densely-connected neural network.
2.1.1 Model definition
# 1-local operators to read out
readouts = [cirq.Z(bit) for bit in cluster_state_bits[4:]]

def multi_readout_model_circuit(qubits):
    """Make a model circuit with less quantum pool and conv operations."""
    model_circuit = cirq.Circuit()
    symbols = sympy.symbols('qconv0:21')
    model_circuit += quantum_conv_circuit(qubits, symbols[0:15])
    model_circuit += quantum_pool_circuit(qubits[:4], qubits[4:], symbols[15:21])
    return model_circuit

# Build a model enacting the logic in 2.1 of this notebook.
excitation_input_dual = tf.keras.Input(shape=(), dtype=tf.dtypes.string)

cluster_state_dual = tfq.layers.AddCircuit()(
    excitation_input_dual, prepend=cluster_state_circuit(cluster_state_bits))

quantum_model_dual = tfq.layers.PQC(
    multi_readout_model_circuit(cluster_state_bits),
    readouts)(cluster_state_dual)

d1_dual = tf.keras.layers.Dense(8)(quantum_model_dual)
d2_dual = tf.keras.layers.Dense(1)(d1_dual)

hybrid_model = tf.keras.Model(inputs=[excitation_input_dual], outputs=[d2_dual])

# Display the model architecture
tf.keras.utils.plot_model(hybrid_model,
                          show_shapes=True,
                          show_layer_names=False,
                          dpi=70)
2.1.2 Train the model
hybrid_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
                     loss=tf.losses.mse,
                     metrics=[custom_accuracy])

hybrid_history = hybrid_model.fit(x=train_excitations,
                                  y=train_labels,
                                  batch_size=16,
                                  epochs=25,
                                  verbose=1,
                                  validation_data=(test_excitations, test_labels))
Epoch 1/25 7/7 [==============================] – 1s 128ms/step – loss: 0.9753 – custom_accuracy: 0.5268 – val_loss: 0.8405 – val_custom_accuracy: 0.9167 7/7 [==============================] – 1s 98ms/step – loss: 0.7181 – custom_accuracy: 0.9018 – val_loss: 0.5449 – val_custom_accuracy: 0.8958 Epoch 3/25 7/7 [==============================] – 1s 97ms/step – loss: 0.4624 – custom_accuracy: 0.9018 – val_loss: 0.4371 – val_custom_accuracy: 0.8750 Epoch 4/25 7/7 [==============================] – 1s 95ms/step – loss: 0.3186 – custom_accuracy: 0.9375 – val_loss: 0.2288 – val_custom_accuracy: 0.9583 Epoch 5/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2721 – custom_accuracy: 0.9643 – val_loss: 0.2586 – val_custom_accuracy: 0.9583 Epoch 6/25 7/7 [==============================] – 1s 96ms/step – loss: 0.2889 – custom_accuracy: 0.9286 – val_loss: 0.2618 – val_custom_accuracy: 0.9583 Epoch 7/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2488 – custom_accuracy: 0.9464 – val_loss: 0.3037 – val_custom_accuracy: 0.9583 Epoch 8/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2975 – custom_accuracy: 0.9018 – val_loss: 0.3480 – val_custom_accuracy: 0.8750 Epoch 9/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2853 – custom_accuracy: 0.9375 – val_loss: 0.2324 – val_custom_accuracy: 0.9375 Epoch 10/25 7/7 [==============================] – 1s 96ms/step – loss: 0.2439 – custom_accuracy: 0.9375 – val_loss: 0.2389 – val_custom_accuracy: 0.9375 Epoch 11/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2130 – custom_accuracy: 0.9732 – val_loss: 0.2436 – val_custom_accuracy: 0.9583 Epoch 12/25 7/7 [==============================] – 1s 94ms/step – loss: 0.2226 – custom_accuracy: 0.9643 – val_loss: 0.2430 – val_custom_accuracy: 0.9583 Epoch 13/25 7/7 [==============================] – 1s 94ms/step – loss: 0.2356 – custom_accuracy: 0.9554 – val_loss: 0.3146 – val_custom_accuracy: 0.9375 Epoch 14/25 7/7 [==============================] – 1s 94ms/step – loss: 0.2434 – custom_accuracy: 0.9554 – val_loss: 0.2291 – val_custom_accuracy: 0.9583 Epoch 15/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2438 – custom_accuracy: 0.9554 – val_loss: 0.2459 – val_custom_accuracy: 0.9792 Epoch 16/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2346 – custom_accuracy: 0.9286 – val_loss: 0.2400 – val_custom_accuracy: 0.9375 Epoch 17/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2121 – custom_accuracy: 0.9643 – val_loss: 0.2227 – val_custom_accuracy: 0.9583 Epoch 18/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2239 – custom_accuracy: 0.9821 – val_loss: 0.2249 – val_custom_accuracy: 0.9583 Epoch 19/25 7/7 [==============================] – 1s 93ms/step – loss: 0.2233 – custom_accuracy: 0.9643 – val_loss: 0.2326 – val_custom_accuracy: 0.9583 Epoch 20/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2178 – custom_accuracy: 0.9643 – val_loss: 0.2241 – val_custom_accuracy: 0.9792 Epoch 21/25 7/7 [==============================] – 1s 96ms/step – loss: 0.2120 – custom_accuracy: 0.9643 – val_loss: 0.2209 – val_custom_accuracy: 0.9792 Epoch 22/25 7/7 [==============================] – 1s 95ms/step – loss: 0.2268 – custom_accuracy: 0.9732 – val_loss: 0.2311 – val_custom_accuracy: 0.9583 Epoch 23/25 7/7 [==============================] – 1s 94ms/step – loss: 0.2114 – custom_accuracy: 0.9821 – val_loss: 0.2165 – val_custom_accuracy: 0.9792 Epoch 24/25 7/7 
[==============================] – 1s 94ms/step – loss: 0.2088 – custom_accuracy: 0.9732 – val_loss: 0.2294 – val_custom_accuracy: 0.9792 Epoch 25/25 7/7 [==============================] – 1s 94ms/step – loss: 0.2375 – custom_accuracy: 0.9732 – val_loss: 0.2622 – val_custom_accuracy: 0.9583
plt.plot(history.history['val_custom_accuracy'], label='QCNN')
plt.plot(hybrid_history.history['val_custom_accuracy'], label='Hybrid CNN')
plt.title('Quantum vs Hybrid CNN performance')
plt.xlabel('Epochs')
plt.legend()
plt.ylabel('Validation Accuracy')
plt.show()
As you can see, with very modest classical assistance, the hybrid model will usually converge faster than the purely quantum version.
2.2 Hybrid convolution with multiple quantum filters
Now let’s try an architecture that uses multiple quantum convolutions and a classical neural network to combine them.
2.2.1 Model definition
excitation_input_multi = tf.keras.Input(shape=(), dtype=tf.dtypes.string)

cluster_state_multi = tfq.layers.AddCircuit()(
    excitation_input_multi, prepend=cluster_state_circuit(cluster_state_bits))

# apply 3 different filters and measure expectation values
quantum_model_multi1 = tfq.layers.PQC(
    multi_readout_model_circuit(cluster_state_bits),
    readouts)(cluster_state_multi)

quantum_model_multi2 = tfq.layers.PQC(
    multi_readout_model_circuit(cluster_state_bits),
    readouts)(cluster_state_multi)

quantum_model_multi3 = tfq.layers.PQC(
    multi_readout_model_circuit(cluster_state_bits),
    readouts)(cluster_state_multi)

# concatenate outputs and feed into a small classical NN
concat_out = tf.keras.layers.concatenate(
    [quantum_model_multi1, quantum_model_multi2, quantum_model_multi3])

dense_1 = tf.keras.layers.Dense(8)(concat_out)
dense_2 = tf.keras.layers.Dense(1)(dense_1)

multi_qconv_model = tf.keras.Model(inputs=[excitation_input_multi],
                                   outputs=[dense_2])

# Display the model architecture
tf.keras.utils.plot_model(multi_qconv_model,
                          show_shapes=True,
                          show_layer_names=True,
                          dpi=70)
2.2.2 Train the model
multi_qconv_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
    loss=tf.losses.mse,
    metrics=[custom_accuracy])

multi_qconv_history = multi_qconv_model.fit(x=train_excitations,
                                            y=train_labels,
                                            batch_size=16,
                                            epochs=25,
                                            verbose=1,
                                            validation_data=(test_excitations, test_labels))
Epoch 1/25 7/7 [==============================] – 2s 152ms/step – loss: 0.9256 – custom_accuracy: 0.6696 – val_loss: 0.7474 – val_custom_accuracy: 0.7500 7/7 [==============================] – 1s 107ms/step – loss: 0.5610 – custom_accuracy: 0.8214 – val_loss: 0.5227 – val_custom_accuracy: 0.8125 Epoch 3/25 7/7 [==============================] – 1s 119ms/step – loss: 0.3334 – custom_accuracy: 0.9464 – val_loss: 0.3507 – val_custom_accuracy: 0.9167 7/7 [==============================] – 1s 126ms/step – loss: 0.3083 – custom_accuracy: 0.9375 – val_loss: 0.2617 – val_custom_accuracy: 0.9375 7/7 [==============================] – 1s 108ms/step – loss: 0.2863 – custom_accuracy: 0.9375 – val_loss: 0.3249 – val_custom_accuracy: 0.9792 Epoch 6/25 7/7 [==============================] – 1s 114ms/step – loss: 0.2682 – custom_accuracy: 0.9554 – val_loss: 0.2564 – val_custom_accuracy: 1.0000 Epoch 7/25 7/7 [==============================] – 1s 103ms/step – loss: 0.2340 – custom_accuracy: 0.9643 – val_loss: 0.2705 – val_custom_accuracy: 0.9583 Epoch 8/25 7/7 [==============================] – 1s 107ms/step – loss: 0.2273 – custom_accuracy: 0.9554 – val_loss: 0.2497 – val_custom_accuracy: 1.0000 Epoch 9/25 7/7 [==============================] – 1s 110ms/step – loss: 0.2416 – custom_accuracy: 0.9732 – val_loss: 0.2218 – val_custom_accuracy: 0.9583 Epoch 10/25 7/7 [==============================] – 1s 105ms/step – loss: 0.2410 – custom_accuracy: 0.9643 – val_loss: 0.2486 – val_custom_accuracy: 0.9583 Epoch 11/25 7/7 [==============================] – 1s 110ms/step – loss: 0.2327 – custom_accuracy: 0.9643 – val_loss: 0.2254 – val_custom_accuracy: 0.9583 Epoch 12/25 7/7 [==============================] – 1s 103ms/step – loss: 0.2096 – custom_accuracy: 0.9643 – val_loss: 0.2275 – val_custom_accuracy: 0.9792 Epoch 13/25 7/7 [==============================] – 1s 105ms/step – loss: 0.2137 – custom_accuracy: 0.9732 – val_loss: 0.2493 – val_custom_accuracy: 1.0000 Epoch 14/25 7/7 [==============================] – 1s 103ms/step – loss: 0.2436 – custom_accuracy: 0.9821 – val_loss: 0.2357 – val_custom_accuracy: 1.0000 Epoch 15/25 7/7 [==============================] – 1s 112ms/step – loss: 0.2255 – custom_accuracy: 0.9732 – val_loss: 0.2277 – val_custom_accuracy: 0.9792 Epoch 16/25 7/7 [==============================] – 1s 115ms/step – loss: 0.2115 – custom_accuracy: 0.9911 – val_loss: 0.2264 – val_custom_accuracy: 0.9792 Epoch 17/25 7/7 [==============================] – 1s 109ms/step – loss: 0.2156 – custom_accuracy: 0.9732 – val_loss: 0.2341 – val_custom_accuracy: 1.0000 7/7 [==============================] – 1s 110ms/step – loss: 0.2014 – custom_accuracy: 0.9732 – val_loss: 0.2275 – val_custom_accuracy: 1.0000 Epoch 19/25 7/7 [==============================] – 1s 109ms/step – loss: 0.2100 – custom_accuracy: 0.9732 – val_loss: 0.2167 – val_custom_accuracy: 1.0000 Epoch 20/25 7/7 [==============================] – 1s 104ms/step – loss: 0.2163 – custom_accuracy: 0.9911 – val_loss: 0.2167 – val_custom_accuracy: 0.9792 Epoch 21/25 7/7 [==============================] – 1s 115ms/step – loss: 0.2125 – custom_accuracy: 0.9911 – val_loss: 0.2386 – val_custom_accuracy: 1.0000 Epoch 22/25 7/7 [==============================] – 1s 110ms/step – loss: 0.2032 – custom_accuracy: 0.9732 – val_loss: 0.2249 – val_custom_accuracy: 1.0000 Epoch 23/25 7/7 [==============================] – 1s 115ms/step – loss: 0.2030 – custom_accuracy: 0.9821 – val_loss: 0.2273 – val_custom_accuracy: 0.9792 Epoch 24/25 7/7 
[==============================] – 1s 116ms/step – loss: 0.2022 – custom_accuracy: 0.9911 – val_loss: 0.2174 – val_custom_accuracy: 1.0000 Epoch 25/25 7/7 [==============================] – 1s 115ms/step – loss: 0.1975 – custom_accuracy: 0.9911 – val_loss: 0.2070 – val_custom_accuracy: 1.0000
plt.plot(history.history['val_custom_accuracy'][:25], label='QCNN')
plt.plot(hybrid_history.history['val_custom_accuracy'][:25], label='Hybrid CNN')
plt.plot(multi_qconv_history.history['val_custom_accuracy'][:25],
         label='Hybrid CNN \n Multiple Quantum Filters')
plt.title('Quantum vs Hybrid CNN performance')
plt.xlabel('Epochs')
plt.legend()
plt.ylabel('Validation Accuracy')
plt.show()
This article was published as a part of the Data Science Blogathon
This article aims to explain Convolutional Neural Network and how to Build CNN using the TensorFlow Keras library. This article will discuss the following topics.
Let’s first discuss Convolutional Neural Network.
Deep learning is a very significant subset of machine learning because of its high performance across various domains. A Convolutional Neural Network (CNN) is a powerful deep learning approach to image processing, often used in computer vision tasks such as image and video recognition, as well as in recommender systems and natural language processing (NLP).
A CNN uses a multilayer system consisting of an input layer, an output layer, and hidden layers that comprise multiple convolutional layers, pooling layers, and fully connected layers. We will discuss all the layers in the next section of the article while explaining how to build a CNN.
Let’s discuss the building of CNN using the Keras library along with an explanation of the working of CNN.
We will use the Malaria Cell Image dataset. This dataset consists of 27,558 images of microscopic blood samples, organized into two folders: Parasitized and Uninfected. Sample images:
a) parasitized blood sample
b) Uninfected blood sample
We will discuss building the CNN, along with how it works, in the following 6 steps:
Step 1 – Import required libraries
Step 2 – Initialize the CNN & add a convolutional layer
Step 3 – Pooling operation
Step 4 – Add two more convolutional layers
Step 5 – Flattening operation
Step 6 – Fully connected layer & output layer
These 6 steps will explain the working of CNN, which is shown in the below image –
Now, let’s discuss each step –
Kindly refer to the below link for detailed explanations of Keras modules.
https://keras.io/getting_started/
Python Code :
from tensorflow.keras.layers import Input, Lambda, Dense, Flatten, Conv2D
from tensorflow.keras.models import Model
from tensorflow.keras.applications.vgg19 import VGG19
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img
from tensorflow.keras.models import Sequential
import numpy as np
from glob import glob
import matplotlib.pyplot as plt
from tensorflow.keras.layers import MaxPooling2D
Python Code :
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, padding="same", activation="relu", input_shape=(224, 224, 3)))
We first need to instantiate the Sequential class, since there are various layers in the CNN that must all be in sequence. Then we add the first convolutional layer, where we need to specify 5 arguments. Let’s discuss each argument and its purpose.
The primary purpose of convolution is to find features in the image using a feature detector. Then put them into a feature map, which preserves distinct features of images.
The feature detector, also known as a filter, is initialized randomly; after many iterations, the filter matrix parameters that best separate the images are learned. For instance, an animal’s eyes, nose, etc. are features used for classifying images with filters or feature detectors. Here we are using 16 filters.
Kernel_size refers to filter matrix size. Here we are using a 2*2 filter size.
Let’s discuss the problems with the convolution operation and how padding solves them.
a. For a grayscale (n x n) image and an (f x f) filter/kernel, the dimensions of the image resulting from a convolution operation are (n - f + 1) x (n - f + 1).
For instance, for a 5×7 image and a 3×3 filter kernel, the output after the convolution operation would be of size 3×5. Thus, the image shrinks after every convolution operation.
b. Pixels located at the corners contribute very little to the output compared to the middle pixels.
So, to mitigate these problems, a padding operation is applied. Padding is a simple process of adding extra border pixels (typically zeros) around the input image to avoid the problems mentioned above.
Here we are using padding="same", which means that the output images have the same dimensions as the input images.
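To make the arithmetic concrete, here is a minimal sketch (assuming a stride of 1; the function name is just illustrative) that computes the output size along one dimension for "valid" (no padding) versus "same" padding:

def conv_output_size(n, f, padding='valid'):
    """Output size along one dimension, assuming a stride of 1."""
    if padding == 'same':
        return n               # padding keeps the input size
    return n - f + 1           # 'valid': no padding, the image shrinks

print(conv_output_size(5, 3), conv_output_size(7, 3))   # 3 5, as in the 5x7 example above
print(conv_output_size(224, 2, padding='same'))          # 224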
To introduce non-linearity, the ReLU activation function is applied after the convolution operation.
ReLU stands for rectified linear unit. The ReLU function outputs the input directly if it is positive; otherwise, it outputs zero.
The input_shape argument specifies the image size: 224×224×3. Since the images are in RGB format, the third dimension is 3.
Python Code :
model.add(MaxPooling2D(pool_size=2))
We need to apply the pooling operation after the convolutional layer. Pooling is a down-sampling operation on the image. The pooling layer is used to reduce the dimensions of the feature maps. Thus, the pooling layer reduces the number of parameters to learn and the amount of computation in the neural network.
Subsequent operations are performed on the summarized features created by the pooling layer instead of the precisely positioned features generated by the convolution layer. This makes the model more robust to variations in the position of features in the image.
There are mainly 3 types of pooling: –
1. Max Pooling
2. Average Pooling
3. Global Pooling
In order to add two more convolutional layers, we need to repeat steps 2 & 3 with a slight modification in the number of filters.
Python Code :
model.add(Conv2D(filters=32, kernel_size=2, padding="same", activation="relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=64, kernel_size=2, padding="same", activation="relu"))
model.add(MaxPooling2D(pool_size=2))
We modified the 2nd and 3rd convolutional layers with filter numbers 32 & 64 respectively.
Python Code :
model.add(Flatten())
The flattening operation converts the pooled feature maps into a 1-D array that serves as input to the next layer, which is the fully connected layer.
After finishing the previous steps, we now have a pooled feature map. We flatten this output into a single column because we need to feed this 1-D data into an artificial neural network layer.
The output of the flattening operation serves as input for the neural network. The artificial neural network is what makes the convolutional neural network capable of classifying images.
Here we are using the Dense class from the Keras library to create the fully connected layer and the output layer.
Python Code :
model.add(Dense(500, activation="relu"))
model.add(Dense(2, activation="softmax"))
The softmax activation function is used for building the output layer. Let’s discuss the softmax activation function.
Softmax Activation Function
It is used as the last activation function of a neural network to bring the output of the neural network to a probability distribution over predicting classes. The output of Softmax is in probabilities of each possible outcome for predicting class. The probabilities sum should be one for all possible predicting classes.
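Here is a minimal NumPy sketch of the softmax computation on illustrative raw scores (the score values are made up):

import numpy as np

def softmax(logits):
    exp = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])   # illustrative raw outputs for 3 classes
probs = softmax(scores)
print(probs)            # approximately [0.659 0.242 0.099]
print(probs.sum())      # 1.0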
Now, let’s discuss the training and evaluation of the convolutional neural network. We will cover this section in 3 steps:
Step 1 – Compile CNN model
Step 2 – Fit model on training set
Step 3 – Evaluate Result
Code line-
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Here we are using 3 arguments:-
· Loss function
We are using the categorical_crossentropy loss function that is used in the classification task. This loss is a very good measure of how distinguishable two discrete probability distributions are from each other.
Kindly refer to the below link for a detailed discussion of different types of loss function:
https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
· Optimizer
We are using the Adam optimizer, which updates the network weights during training using an adaptive learning rate. Optimizers minimize the loss function.
Kindly refer to the below link for a detailed explanation of different types of optimizer:
https://medium.datadriveninvestor.com/overview-of-different-optimizers-for-neural-networks-e0ed119440c3
· Metrics arguments
Here, we use accuracy as the metric to evaluate the performance of the convolutional neural network.
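As a small numeric illustration of the loss (the one-hot label and predicted probabilities below are made up), categorical cross-entropy can be evaluated directly:

import tensorflow as tf

y_true = tf.constant([[0., 1., 0.]])     # one-hot label: class 1
y_pred = tf.constant([[0.1, 0.8, 0.1]])  # predicted probabilities
loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
print(loss.numpy())  # roughly [0.223], i.e. -log(0.8)

The more probability the model places on the correct class, the smaller the loss.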
Code line:
model.fit_generator(training_set,
                    validation_data=test_set,
                    epochs=50,
                    steps_per_epoch=len(training_set),
                    validation_steps=len(test_set))
We fit the CNN model on the training dataset for 50 epochs; the number of training and validation steps per epoch is derived from the lengths of the training and test sets.
We compare the accuracy and loss function for both the training and test dataset.
Code: Plotting loss graph
plt.plot(r.history['loss'], label='train loss')
plt.plot(r.history['val_loss'], label='val loss')
plt.legend()
plt.show()
plt.savefig('LossVal_loss')
Output
Loss is the penalty for a bad prediction, and the aim is to make the validation loss as low as possible. A small gap between training and validation loss is common; what matters in the end is whether the validation loss is as low as you can get it.
Code: Plotting Accuracy graph
plt.plot(r.history['accuracy'], label='train acc')
plt.plot(r.history['val_accuracy'], label='val acc')
plt.legend()
plt.show()
plt.savefig('AccVal_acc')
Output
Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions our model got right. Here, we can observe that validation accuracy approaches 90%, which indicates the CNN model is performing well on this metric.
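Written as a formula:

\[ \text{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \]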
Thanks for reading! Happy Deep learning!!
I’m Jitendra Sharma, Data Science Intern at Nabler, pursuing PGDM-Big Data Analytics from Goa Institute of Management. You can contact me through LinkedIn and Github.
What is TensorFlow CNN?
Convolutional Neural Networks (CNN), a key technique in deep learning for computer vision, are little-known to the wider public but are the driving force behind major innovations, from unlocking your phone with face recognition to safe driverless vehicles.
CNNs are used for a variety of tasks in computer vision, primarily image classification and object detection. The open source TensorFlow framework allows you to create highly flexible CNN architectures for computer vision tasks. In this article we explain the basics of CNN on TensorFlow and present a quick hands-on tutorial to get you started.
If you are interested in learning how to work with CNNs in PyTorch, which is another popular deep learning framework, see our guide to Pytorch CNN.
Compile the Model
The next step is to compile the model, where we specify the optimizer type and loss function and any additional metrics we would like recorded during training. Here we specify `RMSProp` as the optimizer type for gradient descent, and we use a cross-entropy loss function, which is the standard loss function for classification problems. We specifically use `categorical_crossentropy` since our labels are one-hot encoded. Finally, we specify `accuracy` as an additional metric to record during training. The value of the loss function is always recorded by default, but if you want accuracy, you need to specify it.
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'],
             )
Train the Model
Since the dataset does not include a validation dataset, and since we did not previously split the training dataset to create a validation dataset, we will use the `validation_split` argument below so that 30% of the training dataset is automatically reserved for validation. In this case, this approach reserves the last 30% of the training dataset for validation. This is a very convenient approach, but if the training dataset has any specific ordering (say, ordered by classes), you will need to take steps to randomize the order before splitting.
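For example, a minimal sketch (assuming X_train and y_train are the NumPy arrays loaded earlier) of one way to randomize the order before relying on validation_split:

import numpy as np

# Shuffle images and labels together so the reserved 30% is a random subset.
indices = np.random.permutation(len(X_train))
X_train, y_train = X_train[indices], y_train[indices]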
history = model.fit(X_train,
                    y_train,
                    batch_size=TrainingConfig.BATCH_SIZE,
                    epochs=TrainingConfig.EPOCHS,
                    verbose=1,
                    validation_split=.3,
                   )
Epoch 1/31
137/137 [==============================] - 5s 28ms/step - loss: 1.9926 - accuracy: 0.2704 - val_loss: 1.7339 - val_accuracy: 0.3640
Epoch 2/31
137/137 [==============================] - 3s 25ms/step - loss: 1.5905 - accuracy: 0.4254 - val_loss: 1.4228 - val_accuracy: 0.4887
Epoch 3/31
137/137 [==============================] - 3s 25ms/step - loss: 1.3743 - accuracy: 0.5076 - val_loss: 1.2851 - val_accuracy: 0.5380
:
:
Epoch 29/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0569 - accuracy: 0.9814 - val_loss: 2.0920 - val_accuracy: 0.7137
Epoch 30/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0543 - accuracy: 0.9837 - val_loss: 2.1936 - val_accuracy: 0.7253
Epoch 31/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0520 - accuracy: 0.9834 - val_loss: 2.1732 - val_accuracy: 0.7227
Plot the Training Results
The function below is a convenience function to plot training and validation losses and training and validation accuracies. It has a single required argument which is a list of metrics to plot.
def plot_results(metrics, title=None, ylabel=None, ylim=None, metric_name=None, color=None):
    fig, ax = plt.subplots(figsize=(15, 4))

    if not (isinstance(metric_name, list) or isinstance(metric_name, tuple)):
        metrics = [metrics,]
        metric_name = [metric_name,]

    for idx, metric in enumerate(metrics):
        ax.plot(metric, color=color[idx])

    plt.xlabel("Epoch")
    plt.ylabel(ylabel)
    plt.title(title)
    plt.xlim([0, TrainingConfig.EPOCHS - 1])
    plt.ylim(ylim)

    # Tailor x-axis tick marks
    ax.xaxis.set_major_locator(MultipleLocator(5))
    ax.xaxis.set_major_formatter(FormatStrFormatter('%d'))
    ax.xaxis.set_minor_locator(MultipleLocator(1))

    plt.grid(True)
    plt.legend(metric_name)
    plt.show()
    plt.close()
The loss and accuracy metrics can be accessed from the `history` object returned from the fit method. We access the metrics using predefined dictionary keys, as shown below.
# Retrieve training results.
train_loss = history.history["loss"]
train_acc = history.history["accuracy"]
valid_loss = history.history["val_loss"]
valid_acc = history.history["val_accuracy"]

plot_results([train_loss, valid_loss],
             ylabel="Loss",
             ylim=[0.0, 5.0],
             metric_name=["Training Loss", "Validation Loss"],
             color=["g", "b"]);

plot_results([train_acc, valid_acc],
             ylabel="Accuracy",
             ylim=[0.0, 1.0],
             metric_name=["Training Accuracy", "Validation Accuracy"],
             color=["g", "b"])
The results from our baseline model reveal that the model is overfitting. Notice that the validation loss increases after about ten epochs of training while the training loss continues to decline. This means that the network learns how to model the training data well but does not generalize to unseen test data well. The accuracy plot shows a similar trend where the validation accuracy levels off after about ten epochs while the training accuracy continues to approach 100% as training progresses. This is a common problem when training neural networks and can occur for a number of reasons. One reason is that the model can fit the nuances of the training dataset, especially when the training dataset is small.
What are Tensors?
We mainly deal with high-dimensional data when building machine learning and deep learning models. Tensors are multi-dimensional arrays with a uniform type used to represent different features of the data.
Below is the graphical representation of the different types of dimensions of tensors.
- A 0-dimensional tensor contains a single value.
- A 1-dimensional tensor, also known as a "rank-1" tensor, is a list of values.
- A 2-dimensional tensor is a “rank-2” tensor.
- Finally, we can have a N-dimensional tensor, where N represents the number of dimensions within the tensor. In the previous cases, N is respectively 0, 1 and 2.
Below is an illustration of a zero to a 3-dimensional tensor. Each tensor is created using the constant() function from TensorFlow.
# Zero dimensional tensor zero_dim_tensor = tf.constant(20) print(zero_dim_tensor) # One dimensional tensor one_dim_tensor = tf.constant([12, 20, 53, 26, 11, 56]) print(one_dim_tensor) # Two dimensional tensor two_dim_array = [[3, 6, 7, 5], [9, 2, 3, 4], [7, 1, 10,6], [0, 8, 11,2]] two_dim_tensor = tf.constant(two_dim_array) print(two_dim_tensor)
A successful execution of the previous code generates the outputs below; the keyword "tf.Tensor" indicates that each result is a tensor. Each printed tensor shows three attributes (the expected output is shown after the list):
- The actual value of the tensor.
- The shape of the tensor, which is (), (6,), and (4, 4) respectively for the first, second, and third tensors.
- The data type, given by the dtype attribute; here all the tensors are int32.
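For reference, the printed output should look roughly like the following (exact formatting may vary slightly between TensorFlow versions):

tf.Tensor(20, shape=(), dtype=int32)
tf.Tensor([12 20 53 26 11 56], shape=(6,), dtype=int32)
tf.Tensor(
[[ 3  6  7  5]
 [ 9  2  3  4]
 [ 7  1 10  6]
 [ 0  8 11  2]], shape=(4, 4), dtype=int32)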
Our Tensorflow Tutorial for Beginners provides a complete overview of TensorFlow and teaches how to build and train models.
Quick Tutorial: Building a Basic Convolutional Neural Network (CNN) in TensorFlow
This quick tutorial can help you get started implementing CNN in TensorFlow. It is based on the Fashion-MNIST dataset, containing 28 x 28 grayscale images of 70,000 fashion products in 10 categories. With the split used here, there are 55,000 images in the training set, 5,000 in the validation set, and 10,000 images in the test set. Our code is based on the full tutorial by Aditya Sharma.
Loading Data
First import all the necessary modules: NumPy, matplotlib and Tensorflow, then import the Fashion-MNIST data as follows:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data  # classic TF1-style loader

# Use this for reading the data/fashion directory from the dataset,
# retrieving Fashion-MNIST from the Amazon S3 bucket.
data = input_data.read_data_sets('data/fashion', one_hot=True,
                                 source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')
CNN Architecture
We will use three convolutional layers, progressively adding more filters. All the filters are 3×3:
- Layer with 32 filters
- Layer with 64 filters
- Layer with 128 filters
In addition, we’ll have three max-pooling layers in between the convolutions, which are 2×2.
We’ll set basic hyperparameters of the CNN model:
training_iters = 10
learning_rate = 0.001
batch_size = 128
The batch size tells TensorFlow to process 128 images at a time, updating the weights after every batch.
Neural Network Parameters
The number of inputs to the CNN is 784, because the images have 784 pixels and are read as a 784 dimensional vector. We will rebuild this vector into a matrix of 28 x 28 x 1.
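As a hedged sketch of that reshaping step (attribute names such as data.train.images follow the classic TF1 read_data_sets interface used above; adjust if your loader differs):

# Reshape the flat 784-dimensional vectors into 28x28x1 image tensors.
train_X = data.train.images.reshape(-1, 28, 28, 1)
test_X = data.test.images.reshape(-1, 28, 28, 1)
train_y = data.train.labels
test_y = data.test.labels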
# Use this to specify 28 inputs, and 10 classes for the predicted label at the end
n_input = 28
n_classes = 10
Here we define an input placeholder x that will hold the training images, rebuilt as 28 x 28 x 1 matrices, and an output placeholder y for the labels of the training images, which will be a None x 10 matrix.
We set the first ("row") dimension to None because the batch size is only known at runtime; when the training set is loaded, the placeholders receive 128 rows at a time, matching batch_size.
# x is the input placeholder, rebuilding the image into a 28x28x1 matrix
x = tf.placeholder("float", [None, 28, 28, 1])
# y is the label set, using the number of classes
y = tf.placeholder("float", [None, n_classes])
Wrapper Functions
Because we have several layers of the same type in the model, it’s useful to create a wrapper function for each type of layer, to avoid duplicating code. You can get functions like this out of the box with Keras, which is included with Tensorflow. However, in this tutorial we show you how to do things from scratch in TensorFlow without Keras helper functions.
Here is a function creating a 2-dimensional convolutional layer, with bias and ReLU activation. The arguments are the input x, weights W, bias b, and the number of strides, i.e., how far the filter moves over the image at each step of the convolution.
def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)
Here is another function creating a 2D max-pool layer. The parameters are the input x and k, which specifies the pooling kernel/filter size.
def maxpool2d(x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
Now let’s define weights and biases.
weights = {
    'wc1': tf.get_variable('W0', shape=(3, 3, 1, 32), initializer=tf.contrib.layers.xavier_initializer()),
    'wc2': tf.get_variable('W1', shape=(3, 3, 32, 64), initializer=tf.contrib.layers.xavier_initializer()),
    'wc3': tf.get_variable('W2', shape=(3, 3, 64, 128), initializer=tf.contrib.layers.xavier_initializer()),
    'wd1': tf.get_variable('W3', shape=(4 * 4 * 128, 128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('W6', shape=(128, n_classes), initializer=tf.contrib.layers.xavier_initializer()),
}
biases = {
    'bc1': tf.get_variable('B0', shape=(32), initializer=tf.contrib.layers.xavier_initializer()),
    'bc2': tf.get_variable('B1', shape=(64), initializer=tf.contrib.layers.xavier_initializer()),
    'bc3': tf.get_variable('B2', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'bd1': tf.get_variable('B3', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('B4', shape=(10), initializer=tf.contrib.layers.xavier_initializer()),
}
Building the CNN
Now we build the CNN by feeding the weights and biases into the wrapper functions.
def conv_net(x, weights, biases):

    # Construct the first convolutional layer with 32 3x3 filters and 32 biases,
    # followed by a max-pool layer with the kernel size set to 2.
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Construct the second convolutional layer with 64 3x3 filters and 64 biases,
    # and add another max-pool layer.
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # Construct the third convolutional layer with 128 3x3 filters and 128 biases,
    # and add the last max-pool layer.
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    conv3 = maxpool2d(conv3, k=2)

    # Build the fully connected layer that will generate prediction labels.
    # Use reshape() to adapt the output of pooling to the input expected by
    # the fully connected layer.
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])

    # Apply the ReLU function and perform matrix multiplication with the output weights.
    fc1 = tf.nn.relu(fc1)
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
Loss and Optimizer Nodes
First build the model using the conv_net() function we showed above, passing in x, weights, and biases:

pred = conv_net(x, weights, biases)
This is a multi-class classification problem, so we will use the softmax activation function, which gives a probability between 0 and 1 for each class label (the label with the highest probability will be the prediction of the model). We’ll use cross-entropy as the loss function.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
Finally, we’ll define the Adam optimizer with a learning rate of 0.001 as defined in the model hyperparameters above:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
Evaluate the Model
To test the model, we first initialize weights and biases, and then define a correct_prediction and accuracy node that will evaluate model performance every time it is run.
init = tf.global_variables_initializer()
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Now you can start the computation graph, and run a training session as follows:
- Create a for loop over the number of training iterations specified above
- Create an inner for loop over the number of batches specified above
- Pass training images and labels using the variables batch_x and batch_y
- Feed the x and y placeholders with the training images and labels
- After each training iteration, run the loss function and check training accuracy
- After running through all the images, test accuracy by processing the 10,000 test images
See the original tutorial for the complete code that you can use to run the CNN model.
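For orientation only, here is a hedged sketch of what such a TF1-style training session might look like (variable names such as train_X, train_y, test_X and test_y are assumptions following the reshaping step above; this is not the tutorial's full code):

with tf.Session() as sess:
    sess.run(init)
    for i in range(training_iters):
        for batch in range(len(train_X) // batch_size):
            batch_x = train_X[batch * batch_size:(batch + 1) * batch_size]
            batch_y = train_y[batch * batch_size:(batch + 1) * batch_size]
            # Run the optimizer, then measure loss and accuracy on this batch.
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})
        print("Iter", i, "loss:", loss, "training accuracy:", acc)
        # After running through all the images, check accuracy on the 10,000 test images.
        test_acc = sess.run(accuracy, feed_dict={x: test_X, y: test_y})
        print("Testing accuracy:", test_acc)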
This tutorial demonstrates training a simple Convolutional Neural Network (CNN) to classify CIFAR images. Because this tutorial uses the Keras Sequential API, creating and training your model will take just a few lines of code.
Import TensorFlow
import tensorflow as tf from tensorflow.keras import datasets, layers, models import matplotlib.pyplot as plt
Download and prepare the CIFAR10 dataset
The CIFAR10 dataset contains 60,000 color images in 10 classes, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 testing images. The classes are mutually exclusive and there is no overlap between them.
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz 170498071/170498071 [==============================] – 2s 0us/step
Verify the data
To verify that the dataset looks correct, let’s plot the first 25 images from the training set and display the class name below each image:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
Create the convolutional base
The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.
As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure your CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument `input_shape` to your first layer.
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
Let’s display the architecture of your model so far:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                     Output Shape          Param #
=================================================================
 conv2d (Conv2D)                  (None, 30, 30, 32)    896
 max_pooling2d (MaxPooling2D)     (None, 15, 15, 32)    0
 conv2d_1 (Conv2D)                (None, 13, 13, 64)    18496
 max_pooling2d_1 (MaxPooling2D)   (None, 6, 6, 64)      0
 conv2d_2 (Conv2D)                (None, 4, 4, 64)      36928
=================================================================
Total params: 56320 (220.00 KB)
Trainable params: 56320 (220.00 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.
Add Dense layers on top
To complete the model, you will feed the last output tensor from the convolutional base (of shape (4, 4, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, you will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. CIFAR has 10 output classes, so you use a final Dense layer with 10 outputs.
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
Here’s the complete architecture of your model:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                     Output Shape          Param #
=================================================================
 conv2d (Conv2D)                  (None, 30, 30, 32)    896
 max_pooling2d (MaxPooling2D)     (None, 15, 15, 32)    0
 conv2d_1 (Conv2D)                (None, 13, 13, 64)    18496
 max_pooling2d_1 (MaxPooling2D)   (None, 6, 6, 64)      0
 conv2d_2 (Conv2D)                (None, 4, 4, 64)      36928
 flatten (Flatten)                (None, 1024)          0
 dense (Dense)                    (None, 64)            65600
 dense_1 (Dense)                  (None, 10)            650
=================================================================
Total params: 122570 (478.79 KB)
Trainable params: 122570 (478.79 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
The network summary shows that (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers.
Compile and train the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
Epoch 1/10
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1698386490.372362 489369 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
1563/1563 [==============================] - 10s 5ms/step - loss: 1.5211 - accuracy: 0.4429 - val_loss: 1.2497 - val_accuracy: 0.5531
Epoch 2/10
1563/1563 [==============================] - 6s 4ms/step - loss: 1.1408 - accuracy: 0.5974 - val_loss: 1.1474 - val_accuracy: 0.6023
Epoch 3/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.9862 - accuracy: 0.6538 - val_loss: 0.9759 - val_accuracy: 0.6582
Epoch 4/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8929 - accuracy: 0.6879 - val_loss: 0.9412 - val_accuracy: 0.6702
Epoch 5/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8183 - accuracy: 0.7131 - val_loss: 0.8830 - val_accuracy: 0.6967
Epoch 6/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7588 - accuracy: 0.7334 - val_loss: 0.8671 - val_accuracy: 0.7039
Epoch 7/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7126 - accuracy: 0.7518 - val_loss: 0.8972 - val_accuracy: 0.6897
Epoch 8/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6655 - accuracy: 0.7661 - val_loss: 0.8412 - val_accuracy: 0.7111
Epoch 9/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6205 - accuracy: 0.7851 - val_loss: 0.8581 - val_accuracy: 0.7109
Epoch 10/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.5872 - accuracy: 0.7937 - val_loss: 0.8817 - val_accuracy: 0.7113
Evaluate the model
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
313/313 – 1s – loss: 0.8817 – accuracy: 0.7113 – 655ms/epoch – 2ms/step
print(test_acc)
0.7113000154495239
Your simple CNN has achieved a test accuracy of over 70%. Not bad for a few lines of code! For another CNN style, check out the TensorFlow 2 quickstart for experts example that uses the Keras subclassing API and `tf.GradientTape`.
Convolutional Neural Networks (CNN) with TensorFlow Tutorial
Imagine being in a zoo trying to recognize if a given animal is a cheetah or a leopard. As a human, your brain can effortlessly analyze body and facial features to come to a valid conclusion. In the same way, Convolutional Neural Networks (CNNs) can be trained to perform the same recognition task, no matter how complex the patterns are. This makes them powerful in the field of computer vision.
This conceptual CNN tutorial will start by providing an overview of what CNNs are and their importance in machine learning. Then it will walk you through a step-by-step implementation of CNN in TensorFlow Framework 2.
Introduction to CNN
Yann LeCun, director of Facebook’s AI Research Group, is the pioneer of convolutional neural networks. He built the first convolutional neural network called LeNet in 1988. LeNet was used for character recognition tasks like reading zip codes and digits.
Have you ever wondered how facial recognition works on social media, or how object detection helps in building self-driving cars, or how disease detection is done using visual imagery in healthcare? It’s all possible thanks to convolutional neural networks (CNN). Here’s an example of convolutional neural networks that illustrates how they work:
Imagine there’s an image of a bird, and you want to identify whether it’s really a bird or some other object. The first thing you do is feed the pixels of the image in the form of arrays to the input layer of the neural network (multi-layer networks used to classify things). The hidden layers carry out feature extraction by performing different calculations and manipulations. There are multiple hidden layers like the convolution layer, the ReLU layer, and pooling layer, that perform feature extraction from the image. Finally, there’s a fully connected layer that identifies the object in the image.
Fig: Convolutional Neural Network to identify the image of a bird
CNN Model Implementation in Keras
In this section, we will define a simple CNN model in Keras and train it on the CIFAR-10 dataset. Recall from a previous post the following steps required to define and train a model in Keras.
- Build/Define a network model using predefined layers in Keras.
- Compile the model with `model.compile()`.
- Train the model with `model.fit()`.
Model Structure
Before we get into the coding details, let’s first take a look at the general structure of the model we’re proposing. Notice that the model has a similar structure to VGG-16 but has fewer layers and a much smaller input image size, and therefore far fewer trainable parameters. The model contains three convolutional blocks followed by a fully connected layer and an output layer. For reference, we’ve included the number of channels at key points in the architecture. We have also indicated the spatial size of the activation maps at the end of each convolutional block. This is a good visual to refer back to when studying the code below.
For convenience, we're going to define the model in a function. Notice that the function has one optional argument: the input shape for the model. We first start by instantiating the model by calling the `Sequential()` method. This allows us to build a model sequentially by adding one layer at a time. Notice that we define three convolutional blocks and that their structure is very similar.
Define the Convolutional Blocks for the CNN
Let's start with the very first convolutional layer in the first convolutional block. To define a convolutional layer in Keras, we call the `Conv2D()` function, which takes several input arguments. First, we defined the layer to have 32 filters. The kernel size for each filter is 3 (which is interpreted as 3×3). We use a padding option called `same`, which will pad the input tensor so that the output of the convolution operation has the same spatial size as the input. This is not required, but it's commonly used. If you don't explicitly specify this padding option, then the default behavior has no padding, and therefore the spatial size of the output from the convolutional layer will be slightly smaller than the input size. We use a `ReLU` activation function in all the layers in the network except for the output layer.
For the very first convolutional layer, we need to specify the shape of the input, but for all subsequent layers this is not necessary, since the shape of the input is automatically computed from the shape of the output of the previous layer. So we have two convolutional layers with 32 filters each, followed by a max pooling layer with a window size of 2×2; the output shape from this first convolutional block is therefore (16×16×32). Next, we have the second convolutional block, which is nearly identical to the first, except that each convolutional layer has 64 filters instead of 32. Finally, the third convolutional block is an exact copy of the second convolutional block.
Note
The number of filters in each convolutional layer is something that you will need to experiment with. A larger number of filters allows the model to have a greater learning capacity, but this also needs to be balanced with the amount of data available to train the model. Adding too many filters (or layers) can lead to overfitting, one of the most common issues encountered when training models.
def cnn_model(input_shape=(32, 32, 3)):
    model = Sequential()

    #------------------------------------
    # Conv Block 1: 32 Filters, MaxPool.
    #------------------------------------
    model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu', input_shape=input_shape))
    model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    #------------------------------------
    # Conv Block 2: 64 Filters, MaxPool.
    #------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    #------------------------------------
    # Conv Block 3: 64 Filters, MaxPool.
    #------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    #------------------------------------
    # Flatten the convolutional features.
    #------------------------------------
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dense(10, activation='softmax'))

    return model
Define the Classifier for the CNN
Before we define the fully connected layers for the classifier, we need to first flatten the two-dimensional activation maps that are produced by the last convolutional layer (which have a spatial shape of 4×4 with 64 channels). This is accomplished by calling the `Flatten()` layer to create a 1-dimensional vector of length 1024. We then add a densely connected layer with 512 neurons and a fully connected output layer with ten neurons, because we have ten classes in our dataset. And to avoid any confusion, we've also provided a detailed diagram of the fully connected layers.
Create the Model
We can now create an instance of the model by calling the function above and use the `summary()` method to display the model summary to the console.
# Create the model.
model = cnn_model()
model.summary()
Metal device set to: Apple M1 Max

Model: "sequential"
_________________________________________________________________
 Layer (type)                     Output Shape          Param #
=================================================================
 conv2d (Conv2D)                  (None, 32, 32, 32)    896
 conv2d_1 (Conv2D)                (None, 32, 32, 32)    9248
 max_pooling2d (MaxPooling2D)     (None, 16, 16, 32)    0
 conv2d_2 (Conv2D)                (None, 16, 16, 64)    18496
 conv2d_3 (Conv2D)                (None, 16, 16, 64)    36928
 max_pooling2d_1 (MaxPooling2D)   (None, 8, 8, 64)      0
 conv2d_4 (Conv2D)                (None, 8, 8, 64)      36928
 conv2d_5 (Conv2D)                (None, 8, 8, 64)      36928
 max_pooling2d_2 (MaxPooling2D)   (None, 4, 4, 64)      0
 flatten (Flatten)                (None, 1024)          0
 dense (Dense)                    (None, 512)           524800
 dense_1 (Dense)                  (None, 10)            5130
=================================================================
Total params: 669,354
Trainable params: 669,354
Non-trainable params: 0
_________________________________________________________________
CNN on TensorFlow Concepts
Tensor
Tensors represent deep learning data. They are multidimensional arrays, used to store multiple dimensions of a dataset. Each dimension is called a feature. For example, a cube storing data across an X, Y, and Z axis is represented as a 3-dimensional tensor. Tensors can store very high dimensionality, with hundreds of dimensions of features typically used in deep learning applications.
Computational graph
TensorFlow computational graphs represent the workflows that occur during deep learning model training. For a CNN model, the computational graph can be very complex. The image below demonstrates what a simple graph looks like. You can use TensorBoard, built into TensorFlow, to display the computational graph of your model.
Constant
In TensorFlow, a constant is used to store values that don’t change during the computation of the model. It is used for nodes that must remain the same during model training. A constant does not have parameters.
Placeholder
Placeholders are used to input training examples to your deep learning model. A placeholder can take parameters, and these parameters are changed at runtime as the model processes the training set.
Variable
Variables are used to add trainable nodes to the computation graph, such as weights and biases.
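A minimal TF1-style sketch (illustrative only, matching the placeholder-based API used in the tutorial above) showing the three node types side by side:

import tensorflow as tf

# Constant: a fixed value that never changes during training.
a = tf.constant(3.0)

# Placeholder: fed with training examples at runtime (TF1 API; in TF2 use tf.compat.v1).
x = tf.placeholder(tf.float32, shape=[None, 784])

# Variable: trainable parameters such as weights and biases.
W = tf.Variable(tf.zeros([784, 10]))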
Related content: read our guide to deep convolutional neural networks.
Setup
pip install tensorflow==2.7.0
Install TensorFlow Quantum:
pip install tensorflow-quantum==0.7.2
# Update package resources to account for version changes.
import importlib, pkg_resources
importlib.reload(pkg_resources)
Now import TensorFlow and the module dependencies:
import tensorflow as tf
import tensorflow_quantum as tfq

import cirq
import sympy
import numpy as np

# visualization tools
%matplotlib inline
import matplotlib.pyplot as plt
from cirq.contrib.svg import SVGCircuit
Tensors vs Matrices: Differences
Many people confuse tensors with matrices. Even though these two objects look similar, they have completely different properties. This section provides a better understanding of the difference between matrices and tensors.
- We can think of a matrix as a tensor with only two dimensions.
- Tensors, on the other hand, are a more general format that can have any number of dimensions.
As opposed to matrices, tensors are more suitable for deep learning problems for the following reasons:
- They can deal with any number of dimensions, which makes them a better fit for multi-dimensional data.
- Tensors’ ability to be compatible with a wide range of data types, shapes, and dimensions makes them more versatile than matrices.
- Tensorflow provides GPU and TPU support to speed up computations. Using tensors, machine learning engineers can automatically take advantage of these benefits.
- Tensors natively support broadcasting, i.e., performing arithmetic operations between tensors of different shapes, which is not always possible with matrices (see the short sketch below).
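A short broadcasting sketch (the values are illustrative): a column vector and a row vector are stretched to a common shape before the addition.

import tensorflow as tf

col = tf.constant([[1], [2], [3]])   # shape (3, 1)
row = tf.constant([10, 20, 30])      # shape (3,)
print((col + row).numpy())
# [[11 21 31]
#  [12 22 32]
#  [13 23 33]]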
Saving and Loading Models
Saving and loading models is very convenient: it enables you to develop and train a model, save it to the file system, and then load it at some future time for use. This section covers the basic operations for saving and loading models.
Saving Models
You can easily save a model using the `save()` method, which will save the model to the file system in the 'SavedModel' format. This method creates a folder on the file system. Within this folder, the model architecture and training configuration (including the optimizer, losses, and metrics) are stored in `saved_model.pb`. The `variables/` folder contains a standard training checkpoint file that includes the weights of the model. We will delve into these details in later modules. For now, let's save the trained model, and then we'll load it in the next code cell with a different name and continue using it in the remainder of the post.
# Using the save() method, the model will be saved to the file system in the 'SavedModel' format.
model_dropout.save('model_dropout')
INFO:tensorflow:Assets written to: CFIRAR_Classifier/assets
Loading Models
from tensorflow.keras import models

reloaded_model_dropout = models.load_model('model_dropout')
Final remarks
In this article, you have learned CNNs from their intuition to their applications in the real world. You have also seen that you can use existing architectures to hasten your model development process. Specifically, you have covered:
- what convolutional neural networks are
- how convolutional neural networks work
- using pre-trained convolutional neural networks to run image classification
- building convolutional neural networks from scratch using Keras and TensorFlow
- how to plot the learning curves of your neural network
- preventing overfitting using DropOut regularization and batch normalization
- saving your best model using the model checkpoint callback
- how to stop the training process of your CNN when it stops improving
- how you can save and load the model again
…just to mention a few.
And that's not the end of it: you can explore all the examples used in this article in this Google Colab Notebook. Feel free to play with the parameters of the models to see how they affect performance.
Build a QCNN
1.1 Assemble circuits in a TensorFlow graph
TensorFlow Quantum (TFQ) provides layer classes designed for in-graph circuit construction. One example is the `tfq.layers.AddCircuit` layer that inherits from `tf.keras.Layer`. This layer can either prepend or append to the input batch of circuits, as shown in the following figure.
The following snippet uses this layer:
qubit = cirq.GridQubit(0, 0)

# Define some circuits.
circuit1 = cirq.Circuit(cirq.X(qubit))
circuit2 = cirq.Circuit(cirq.H(qubit))

# Convert to a tensor.
input_circuit_tensor = tfq.convert_to_tensor([circuit1, circuit2])

# Define a circuit that we want to append
y_circuit = cirq.Circuit(cirq.Y(qubit))

# Instantiate our layer
y_appender = tfq.layers.AddCircuit()

# Run our circuit tensor through the layer and save the output.
output_circuit_tensor = y_appender(input_circuit_tensor, append=y_circuit)
Examine the input tensor:
print(tfq.from_tensor(input_circuit_tensor))
[cirq.Circuit([ cirq.Moment( cirq.X(cirq.GridQubit(0, 0)), ), ]) cirq.Circuit([ cirq.Moment( cirq.H(cirq.GridQubit(0, 0)), ), ]) ]
And examine the output tensor:
print(tfq.from_tensor(output_circuit_tensor))
[cirq.Circuit([ cirq.Moment( cirq.X(cirq.GridQubit(0, 0)), ), cirq.Moment( cirq.Y(cirq.GridQubit(0, 0)), ), ]) cirq.Circuit([ cirq.Moment( cirq.H(cirq.GridQubit(0, 0)), ), cirq.Moment( cirq.Y(cirq.GridQubit(0, 0)), ), ]) ]
While it is possible to run the examples below without using `tfq.layers.AddCircuit`, it's a good opportunity to understand how complex functionality can be embedded into TensorFlow compute graphs.
1.2 Problem overview
You will prepare a cluster state and train a quantum classifier to detect if it is “excited” or not. The cluster state is highly entangled but not necessarily difficult for a classical computer. For clarity, this is a simpler dataset than the one used in the paper.
For this classification task you will implement a deep MERA-like QCNN architecture since:
- Like the QCNN, the cluster state on a ring is translationally invariant.
- The cluster state is highly entangled.
This architecture should be effective at reducing entanglement, obtaining the classification by reading out a single qubit.
An "excited" cluster state is defined as a cluster state that had a `cirq.rx` gate applied to any of its qubits. Qconv and QPool are discussed later in this tutorial.
1.3 Building blocks for TensorFlow
One way to solve this problem with TensorFlow Quantum is to implement the following:
- The input to the model is a circuit tensor: either an empty circuit or an X gate on a particular qubit indicating an excitation.
- The rest of the model's quantum components are constructed with `tfq.layers.AddCircuit` layers.
- For inference, a `tfq.layers.PQC` layer is used. This reads \(\langle \hat{Z} \rangle\) and compares it to a label of 1 for an excited state, or -1 for a non-excited state.
1.4 Data
Before building your model, you can generate your data. In this case it's going to be excitations to the cluster state (the original paper uses a more complicated dataset). Excitations are represented with `cirq.rx` gates. A large enough rotation is deemed an excitation and is labeled `1`, and a rotation that isn't large enough is labeled `-1` and deemed not an excitation.
def generate_data(qubits):
    """Generate training and testing data."""
    n_rounds = 20  # Produces n_rounds * n_qubits datapoints.
    excitations = []
    labels = []
    for n in range(n_rounds):
        for bit in qubits:
            rng = np.random.uniform(-np.pi, np.pi)
            excitations.append(cirq.Circuit(cirq.rx(rng)(bit)))
            labels.append(1 if (-np.pi / 2) <= rng <= (np.pi / 2) else -1)

    split_ind = int(len(excitations) * 0.7)
    train_excitations = excitations[:split_ind]
    test_excitations = excitations[split_ind:]

    train_labels = labels[:split_ind]
    test_labels = labels[split_ind:]

    return tfq.convert_to_tensor(train_excitations), np.array(train_labels), \
        tfq.convert_to_tensor(test_excitations), np.array(test_labels)
You can see that just like with regular machine learning you create a training and testing set to use to benchmark the model. You can quickly look at some datapoints with:
sample_points, sample_labels, _, __ = generate_data(cirq.GridQubit.rect(1, 4))
print('Input:', tfq.from_tensor(sample_points)[0], 'Output:', sample_labels[0])
print('Input:', tfq.from_tensor(sample_points)[1], 'Output:', sample_labels[1])
Input: (0, 0): ───X^-0.225─── Output: 1
Input: (0, 1): ───X^-0.973─── Output: -1
1.5 Define layers
Now define the layers shown in the figure above in TensorFlow.
1.5.1 Cluster state
The first step is to define the cluster state using Cirq, a Google-provided framework for programming quantum circuits. Since this is a static part of the model, embed it using the `tfq.layers.AddCircuit` functionality.
def cluster_state_circuit(bits):
    """Return a cluster state on the qubits in `bits`."""
    circuit = cirq.Circuit()
    circuit.append(cirq.H.on_each(bits))
    for this_bit, next_bit in zip(bits, bits[1:] + [bits[0]]):
        circuit.append(cirq.CZ(this_bit, next_bit))
    return circuit
Display a cluster state circuit for a rectangle of `cirq.GridQubit`s:
SVGCircuit(cluster_state_circuit(cirq.GridQubit.rect(1, 4)))
1.5.2 QCNN layers
Define the layers that make up the model using the Cong and Lukin QCNN paper. There are a few prerequisites:
- The one- and two-qubit parameterized unitary matrices from the Tucci paper.
- A general parameterized two-qubit pooling operation.
def one_qubit_unitary(bit, symbols):
    """Make a Cirq circuit enacting a rotation of the bloch sphere about the X,
    Y and Z axis, that depends on the values in `symbols`.
    """
    return cirq.Circuit(
        cirq.X(bit)**symbols[0],
        cirq.Y(bit)**symbols[1],
        cirq.Z(bit)**symbols[2])


def two_qubit_unitary(bits, symbols):
    """Make a Cirq circuit that creates an arbitrary two qubit unitary."""
    circuit = cirq.Circuit()
    circuit += one_qubit_unitary(bits[0], symbols[0:3])
    circuit += one_qubit_unitary(bits[1], symbols[3:6])
    circuit += [cirq.ZZ(*bits)**symbols[6]]
    circuit += [cirq.YY(*bits)**symbols[7]]
    circuit += [cirq.XX(*bits)**symbols[8]]
    circuit += one_qubit_unitary(bits[0], symbols[9:12])
    circuit += one_qubit_unitary(bits[1], symbols[12:])
    return circuit


def two_qubit_pool(source_qubit, sink_qubit, symbols):
    """Make a Cirq circuit to do a parameterized 'pooling' operation, which
    attempts to reduce entanglement down from two qubits to just one."""
    pool_circuit = cirq.Circuit()
    sink_basis_selector = one_qubit_unitary(sink_qubit, symbols[0:3])
    source_basis_selector = one_qubit_unitary(source_qubit, symbols[3:6])
    pool_circuit.append(sink_basis_selector)
    pool_circuit.append(source_basis_selector)
    pool_circuit.append(cirq.CNOT(control=source_qubit, target=sink_qubit))
    pool_circuit.append(sink_basis_selector**-1)
    return pool_circuit
To see what you created, print out the one-qubit unitary circuit:
SVGCircuit(one_qubit_unitary(cirq.GridQubit(0, 0), sympy.symbols('x0:3')))
And the two-qubit unitary circuit:
SVGCircuit(two_qubit_unitary(cirq.GridQubit.rect(1, 2), sympy.symbols('x0:15')))
And the two-qubit pooling circuit:
SVGCircuit(two_qubit_pool(*cirq.GridQubit.rect(1, 2), sympy.symbols('x0:6')))
1.5.2.1 Quantum convolution
As in the Cong and Lukin paper, define the 1D quantum convolution as the application of a two-qubit parameterized unitary to every pair of adjacent qubits with a stride of one.
def quantum_conv_circuit(bits, symbols):
    """Quantum Convolution Layer following the above diagram.
    Return a Cirq circuit with the cascade of `two_qubit_unitary` applied
    to all pairs of qubits in `bits` as in the diagram above.
    """
    circuit = cirq.Circuit()
    for first, second in zip(bits[0::2], bits[1::2]):
        circuit += two_qubit_unitary([first, second], symbols)
    for first, second in zip(bits[1::2], bits[2::2] + [bits[0]]):
        circuit += two_qubit_unitary([first, second], symbols)
    return circuit
Display the (very horizontal) circuit:
SVGCircuit( quantum_conv_circuit(cirq.GridQubit.rect(1, 8), sympy.symbols('x0:15')))
1.5.2.2 Quantum pooling
A quantum pooling layer pools from \(N\) qubits to \(\frac{N}{2}\) qubits using the two-qubit pool defined above.
def quantum_pool_circuit(source_bits, sink_bits, symbols):
    """A layer that specifies a quantum pooling operation.
    A Quantum pool tries to learn to pool the relevant information from two
    qubits onto 1.
    """
    circuit = cirq.Circuit()
    for source, sink in zip(source_bits, sink_bits):
        circuit += two_qubit_pool(source, sink, symbols)
    return circuit
Examine a pooling component circuit:
test_bits = cirq.GridQubit.rect(1, 8)

SVGCircuit(
    quantum_pool_circuit(test_bits[:4], test_bits[4:], sympy.symbols('x0:6')))
1.6 Model definition
Now use the defined layers to construct a purely quantum CNN. Start with eight qubits, pool down to one, then measure \(\langle \hat{Z} \rangle\).
def create_model_circuit(qubits):
    """Create sequence of alternating convolution and pooling operators
    which gradually shrink over time."""
    model_circuit = cirq.Circuit()
    symbols = sympy.symbols('qconv0:63')
    # Cirq uses sympy.Symbols to map learnable variables. TensorFlow Quantum
    # scans incoming circuits and replaces these with TensorFlow variables.
    model_circuit += quantum_conv_circuit(qubits, symbols[0:15])
    model_circuit += quantum_pool_circuit(qubits[:4], qubits[4:],
                                          symbols[15:21])
    model_circuit += quantum_conv_circuit(qubits[4:], symbols[21:36])
    model_circuit += quantum_pool_circuit(qubits[4:6], qubits[6:],
                                          symbols[36:42])
    model_circuit += quantum_conv_circuit(qubits[6:], symbols[42:57])
    model_circuit += quantum_pool_circuit([qubits[6]], [qubits[7]],
                                          symbols[57:63])
    return model_circuit


# Create our qubits and readout operators in Cirq.
cluster_state_bits = cirq.GridQubit.rect(1, 8)
readout_operators = cirq.Z(cluster_state_bits[-1])

# Build a sequential model enacting the logic in 1.3 of this notebook.
# Here the static cluster state prep is made part of the AddCircuit, and the
# "quantum datapoints" arrive in the form of excitations.
excitation_input = tf.keras.Input(shape=(), dtype=tf.dtypes.string)
cluster_state = tfq.layers.AddCircuit()(
    excitation_input, prepend=cluster_state_circuit(cluster_state_bits))

quantum_model = tfq.layers.PQC(create_model_circuit(cluster_state_bits),
                               readout_operators)(cluster_state)

qcnn_model = tf.keras.Model(inputs=[excitation_input],
                            outputs=[quantum_model])

# Show the keras plot of the model
tf.keras.utils.plot_model(qcnn_model,
                          show_shapes=True,
                          show_layer_names=False,
                          dpi=70)
1.7 Train the model
Train the model over the full batch to simplify this example.
# Generate some training data.
train_excitations, train_labels, test_excitations, test_labels = generate_data(
    cluster_state_bits)


# Custom accuracy metric.
@tf.function
def custom_accuracy(y_true, y_pred):
    y_true = tf.squeeze(y_true)
    y_pred = tf.map_fn(lambda x: 1.0 if x >= 0 else -1.0, y_pred)
    return tf.keras.backend.mean(tf.keras.backend.equal(y_true, y_pred))


qcnn_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
                   loss=tf.losses.mse,
                   metrics=[custom_accuracy])

history = qcnn_model.fit(x=train_excitations,
                         y=train_labels,
                         batch_size=16,
                         epochs=25,
                         verbose=1,
                         validation_data=(test_excitations, test_labels))
Epoch 1/25
7/7 [==============================] - 2s 199ms/step - loss: 0.9405 - custom_accuracy: 0.6786 - val_loss: 0.9161 - val_custom_accuracy: 0.7292
...
Epoch 25/25
7/7 [==============================] - 1s 153ms/step - loss: 0.2758 - custom_accuracy: 0.9911 - val_loss: 0.2706 - val_custom_accuracy: 0.9792
plt.plot(history.history['loss'][1:], label='Training')
plt.plot(history.history['val_loss'][1:], label='Validation')
plt.title('Training a Quantum CNN to Detect Excited Cluster States')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
TensorFlow CNN in Production with Run:AI
Run:AI automates resource management and workload orchestration for deep learning infrastructure. With Run:AI, you can automatically run as many CNN experiments as needed in TensorFlow and other deep learning frameworks.
Here are some of the capabilities you gain when using Run:AI:
- Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
- No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
- A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.
Learn more about the Run:AI GPU virtualization platform.
Convolutional Neural Networks (CNN) with TensorFlow Tutorial
Imagine being in a zoo trying to recognize if a given animal is a cheetah or a leopard. As a human, your brain can effortlessly analyze body and facial features to come to a valid conclusion. In the same way, Convolutional Neural Networks (CNNs) can be trained to perform the same recognition task, no matter how complex the patterns are. This makes them powerful in the field of computer vision.
This conceptual CNN tutorial starts with an overview of what CNNs are and why they matter in machine learning, and then walks you through a step-by-step implementation of a CNN in the TensorFlow 2 framework.
CNN Step-by-Step Implementation
Let’s put everything we have learned previously into practice. This section will illustrate the end-to-end implementation of a convolutional neural network in TensorFlow applied to the CIFAR-10 dataset, which is a built-in dataset with the following properties:
- It contains 60,000 32×32 color images
- The dataset has 10 different classes
- Each class has 6,000 images
- There are 50,000 training images overall
- And 10,000 testing images overall
The source code of the article is available on DataCamp’s workspace.
Architecture of the network
Before getting into the technical implementation, let’s first understand the overall architecture of the network being implemented.
- The input of the model is a 32x32x3 tensor, respectively, for the width, height, and channels.
- We will have two convolutional layers. The first applies 32 filters of size 3×3 with a ReLU activation function, and the second applies 64 filters of size 3×3.
- The first pooling layer will apply a 2×2 max pooling
- The second pooling layer will apply a 2×2 max pooling as well
- The fully connected layer will have 128 units and a ReLU activation function
- Finally, the output will be 10 units corresponding to the 10 classes, and the activation function is a softmax to generate the probability distributions.
Load dataset
The built-in dataset is loaded from keras.datasets as follows:
from tensorflow.keras.datasets import cifar10

(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
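As an optional sanity check (not part of the original tutorial), you can confirm that the loaded arrays match the dataset description above:

# Optional sanity check: shapes should match the dataset description above.
print(train_images.shape)   # (50000, 32, 32, 3)
print(test_images.shape)    # (10000, 32, 32, 3)
print(train_labels.shape)   # (50000, 1) -- integer class ids in [0, 9]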
Exploratory Data Analysis
In this section, we will focus solely on showing some sample images since we already know the proportion of each class in both the training and testing data.
The helper function show_images() shows a total of 12 images by default and takes three main parameters:
- The training images
- The class names
- And the training labels.
import matplotlib.pyplot as plt

def show_images(train_images, class_names, train_labels, nb_samples=12, nb_row=4):
    plt.figure(figsize=(12, 12))
    for i in range(nb_samples):
        plt.subplot(nb_row, nb_row, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_images[i], cmap=plt.cm.binary)
        plt.xlabel(class_names[train_labels[i][0]])
    plt.show()
Now, we can call the function with the required parameters.
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

show_images(train_images, class_names, train_labels)
A successful execution of the previous code generates the images below.
Data preprocessing
Prior to training the model, we need to normalize the pixel values to a common range (e.g. 0 to 1). This is a standard preprocessing step for images and helps the training converge faster.
max_pixel_value = 255

train_images = train_images / max_pixel_value
test_images = test_images / max_pixel_value
Also, the labels are stored as integer class ids (0 for airplane, 1 for automobile, and so on). We one-hot encode them so they match the softmax output and the categorical cross-entropy loss used later.
from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_labels, len(class_names))
test_labels = to_categorical(test_labels, len(class_names))
Model architecture implementation
The next step is to implement the architecture of the network based on the previous description.
First, we define the model using the Sequential() class, and each layer is added to the model with the add() function.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Variables
INPUT_SHAPE = (32, 32, 3)
FILTER1_SIZE = 32
FILTER2_SIZE = 64
FILTER_SHAPE = (3, 3)
POOL_SHAPE = (2, 2)
FULLY_CONNECT_NUM = 128
NUM_CLASSES = len(class_names)

# Model architecture implementation
model = Sequential()
model.add(Conv2D(FILTER1_SIZE, FILTER_SHAPE, activation='relu', input_shape=INPUT_SHAPE))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Conv2D(FILTER2_SIZE, FILTER_SHAPE, activation='relu'))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Flatten())
model.add(Dense(FULLY_CONNECT_NUM, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))
After applying the summary() function to the model, we get a comprehensive summary of the model’s architecture, with information about each layer: its type, its output shape, and the total number of trainable parameters.
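The call itself is a one-liner (the exact layer names in the output will vary with your Keras version):

# Print the layer-by-layer summary of the model defined above.
model.summary()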
Model training
All the resources are finally available to configure and trigger the training of the model. This is done with the compile() and fit() functions, respectively, which take the following parameters:
- The Optimizer is responsible for updating the model’s weights and biases. In our case, we are using the Adam optimizer.
- The loss function is used to measure the misclassification errors, and we are using categorical cross-entropy since the labels are one-hot encoded.
- Finally, the metrics are used to measure the performance of the model; accuracy, precision, and recall will be displayed in our use case.
from tensorflow.keras.metrics import Precision, Recall

BATCH_SIZE = 32
EPOCHS = 30
METRICS = ['accuracy',
           Precision(name='precision'),
           Recall(name='recall')]

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=METRICS)

# Train the model
training_history = model.fit(train_images, train_labels,
                             epochs=EPOCHS, batch_size=BATCH_SIZE,
                             validation_data=(test_images, test_labels))
Model evaluation
After the model training, we can compare its performance on both the training and testing datasets by plotting the above metrics using the show_performance_curve() helper function in two dimensions.
- The horizontal axis (x) is the number of epochs
- The vertical one (y) is the underlying performance of the model.
- The curve represents the value of the metrics at a specific epoch.
For better visualization, a dashed vertical red line marks the epoch where the training and validation curves intersect, and the metric value at that point is annotated as the optimal value.
import numpy as np

def show_performance_curve(training_result, metric, metric_label):
    train_perf = training_result.history[str(metric)]
    validation_perf = training_result.history['val_' + str(metric)]
    intersection_idx = np.argwhere(np.isclose(train_perf,
                                              validation_perf,
                                              atol=1e-2)).flatten()[0]
    intersection_value = train_perf[intersection_idx]

    plt.plot(train_perf, label=metric_label)
    plt.plot(validation_perf, label='val_' + str(metric))
    plt.axvline(x=intersection_idx, color='r', linestyle='--', label='Intersection')

    plt.annotate(f'Optimal Value: {intersection_value:.4f}',
                 xy=(intersection_idx, intersection_value),
                 xycoords='data',
                 fontsize=10,
                 color='green')

    plt.xlabel('Epoch')
    plt.ylabel(metric_label)
    plt.legend(loc='lower right')
Then, the function is applied for both the accuracy and the precision of the model.
show_performance_curve(training_history, 'accuracy', 'accuracy')
show_performance_curve(training_history, 'precision', 'precision')
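Since recall was also tracked in METRICS, the same helper can plot it as well. This extra call is not in the original article, and it assumes the training and validation recall curves intersect at some epoch (otherwise the helper raises an IndexError):

# Recall was included in METRICS above, so the same helper works for it.
show_performance_curve(training_history, 'recall', 'recall')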
After training the model without any fine-tuning and pre-processing, we end up with:
- An accuracy score of 67.09%, meaning that the model correctly classifies roughly 67 out of every 100 samples.
- A precision of 76.55%, meaning that out of every 100 positive predictions, about 77 are true positives and the remaining 23 are false positives.
- These values correspond to the epochs where the training and validation curves intersect: the third epoch for accuracy and the second for precision.
These two metrics give a global understanding of the model behavior.
What if we want to know, for each class, whether the model predicts it well or struggles with it?
This can be achieved from the confusion matrix, which shows for each class the number of correct and wrong predictions. The implementation is given below. We start by making predictions on the test data, then compute the confusion matrix and show the final result.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

test_predictions = model.predict(test_images)
test_predicted_labels = np.argmax(test_predictions, axis=1)
test_true_labels = np.argmax(test_labels, axis=1)

cm = confusion_matrix(test_true_labels, test_predicted_labels)
cmd = ConfusionMatrixDisplay(confusion_matrix=cm)
cmd.plot(include_values=True, cmap='viridis', ax=None, xticks_rotation='horizontal')
plt.show()
- Classes 0, 1, 6, 7, 8, and 9 (airplane, automobile, frog, horse, ship, and truck, respectively) have the highest values on the diagonal. This means the model is better at predicting those classes.
- On the other hand, it seems to struggle with the remaining classes.
- The classes with the highest off-diagonal values are the ones the model confuses with the well-predicted classes. For instance, it confuses birds (class 2) with airplanes (class 0), and automobiles (class 1) with trucks (class 9). A quick way to quantify this per class is shown in the snippet after this list.
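The following short snippet is not part of the original article; it simply reuses the cm matrix from the previous code to compute per-class accuracy from the diagonal:

# Per-class accuracy: correct predictions for a class divided by its true count.
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
for name, acc in zip(class_names, per_class_accuracy):
    print(f'{name}: {acc:.2%}')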
Learn more about the confusion matrix in our tutorial Understanding the Confusion Matrix in R, which draws on material from DataCamp’s Machine Learning Toolbox course.
This model can be improved with additional tasks such as:
- Image augmentation
- Transfer learning using pre-trained models such as ResNet, MobileNet, or VGG. Our Transfer learning tutorial explains what transfer learning is and some of its applications in real life.
- Applying different regularization techniques, such as L1, L2, or dropout (see the sketch after this list).
- Fine-tuning hyperparameters such as the learning rate, the batch size, and the number of layers in the network.
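As a rough illustration of the first and third ideas, here is a minimal sketch, not part of the original tutorial, that adds Keras augmentation layers and a Dropout layer to a model like the one above. The names augmentation and regularized_model and the augmentation parameters are arbitrary examples, and the RandomFlip/RandomRotation layers assume a recent TensorFlow version (in older releases they live under layers.experimental.preprocessing).

from tensorflow.keras import Sequential
from tensorflow.keras.layers import (RandomFlip, RandomRotation,
                                     Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

# Simple on-the-fly augmentation: random horizontal flips and small rotations.
augmentation = Sequential([
    RandomFlip('horizontal', input_shape=(32, 32, 3)),
    RandomRotation(0.1),
])

regularized_model = Sequential([
    augmentation,
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),  # regularization: randomly drop half of the units during training
    Dense(10, activation='softmax'),
])

regularized_model.compile(optimizer='adam',
                          loss='categorical_crossentropy',
                          metrics=['accuracy'])

It can then be trained with the same fit() call as before; whether it actually improves accuracy depends on the augmentation strength and the dropout rate chosen.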
Making predictions
Returning to the binary cat-versus-dog classifier built earlier in the article, you can now use the preprocessed image to run a prediction.
prediction = model.predict(test_image)
When you print the prediction, you will see something similar to this:
prediction[0][0]
0.014393696
The question is, how do you interpret this? Remember that the network output layer has just one unit and uses the sigmoid activation function. The output of this network is therefore a number between 0 and 1. That number represents the probability that the image belongs to class 1. Class 1 in this case is dogs. You can therefore set a threshold of say 50% to separate the two classes.
if prediction[0][0] > 0.5:
    print("is a dog")
else:
    print("is a cat")
Since the predicted probability is less than 0.5, the image is classified as a cat.
You can repeat the same process with a dog image. First, download the image.
!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/1/18/Dog_Breeds.jpg \
    -O /tmp/dog.jpg
After that, load it while converting it to the required size.
test_image2 = image.load_img('/tmp/dog.jpg', target_size=(200, 200))
Next, convert the image to an array, expand its dimensions, and run the prediction.
test_image2 = image.img_to_array(test_image2)
test_image2 = np.expand_dims(test_image2, axis=0)
prediction = model.predict(test_image2)
Use the same threshold to determine if it is the image of a cat or a dog.
if prediction[0][0] > 0.5:
    print("is a dog")
else:
    print("is a cat")
With a predicted probability of 99%, the image is classified as a dog.