Model evaluation
You can also check the performance of the model on the validation set.
loss, accuracy = model.evaluate(validation_set)
print('Accuracy on validation dataset:', accuracy)
Let’s now try the model on new images. The `image` module from Keras will be used to load the image.
import numpy as np
from keras.preprocessing import image
Download some images from the internet and store them in a temporary folder. The images used here are provided under a permissive Creative Commons license.
!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/c/c7/Tabby_cat_with_blue_eyes3336579.jpg \
    -O /tmp/cat.jpg
Next, load the image while specifying the size used in training.
test_image = image.load_img('/tmp/cat.jpg', target_size=(200, 200))
After this, convert it into an array since the model expects array inputs.
test_image = image.img_to_array(test_image)
The next step is to expand the dimensions of the image to include the batch dimension. At this point the array has the shape (200, 200, 3): height, width, and color channels.
That needs to be amended to include a batch size of 1, because only one image is being used here. Expanding the dimensions is done using the `expand_dims` function from NumPy.
test_image = np.expand_dims(test_image, axis=0)
If you check the shape again, you will see that it’s in the form required by the model.
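As a minimal sketch of the shape change described above, the snippet below uses a zero-filled NumPy array as a stand-in for the array returned by `img_to_array` (the real pixel values are not needed to see the effect). The name `batched_image` is only illustrative.

```python
import numpy as np

# Stand-in for the output of img_to_array: a 200x200 RGB image.
test_image = np.zeros((200, 200, 3), dtype="float32")

# Add a leading batch dimension of size 1.
batched_image = np.expand_dims(test_image, axis=0)

print(test_image.shape)     # shape before: (height, width, channels)
print(batched_image.shape)  # shape after: (batch, height, width, channels)
```

Passing `axis=0` places the new dimension first, which is where Keras models expect the batch dimension.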
How Does CNN Recognize Images?
Consider the following images:
Colored boxes represent a pixel value of 1, and uncolored boxes represent a pixel value of 0.
When you press backslash (\), the image below is processed.
When you press forward slash (/), the image below is processed:
Here is another example to depict how a CNN recognizes an image:
As you can see from the above diagram, only the cells that have a value of 1 are lit.
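To make the idea of lit pixels concrete, here is a toy sketch that encodes a backslash and a forward slash as 3×3 grids of 0s and 1s and tells them apart by counting overlapping lit pixels. The grids, `match_score`, and `classify` are hypothetical illustrations. This is simple template matching, not what a trained CNN does, although a convolution filter computes a similar multiply-and-sum.

```python
# Hypothetical 3x3 binary images: 1 = colored (lit) pixel, 0 = blank.
BACKSLASH = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
FORWARD_SLASH = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]

def match_score(image, template):
    """Count the positions where both grids have a lit pixel."""
    return sum(
        pixel * weight
        for image_row, template_row in zip(image, template)
        for pixel, weight in zip(image_row, template_row)
    )

def classify(image):
    """Return the name of the template that overlaps the image the most."""
    scores = {
        "backslash": match_score(image, BACKSLASH),
        "forward slash": match_score(image, FORWARD_SLASH),
    }
    return max(scores, key=scores.get)

print(classify(BACKSLASH))
print(classify(FORWARD_SLASH))
```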
Dataset and Training Configuration Parameters
Before we describe the model implementation and training, we're going to apply a little more structure to our training process by using the `dataclasses` module in Python to create simple `DatasetConfig` and `TrainingConfig` classes to organize several data and training configuration parameters. This allows us to create data structures for configuration parameters, as shown below. The benefit of doing this is that we have a single place to go to make any desired changes.
@dataclass(frozen=True)
class DatasetConfig:
    NUM_CLASSES: int = 10
    IMG_HEIGHT: int = 32
    IMG_WIDTH: int = 32
    NUM_CHANNELS: int = 3

@dataclass(frozen=True)
class TrainingConfig:
    EPOCHS: int = 31
    BATCH_SIZE: int = 256
    LEARNING_RATE: float = 0.001
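The sketch below shows what `frozen=True` buys in practice, using a copy of the `TrainingConfig` class from above. The variable names `config` and `quick_run` and the override value are illustrative assumptions. `dataclasses.replace` is the standard-library way to derive a modified copy.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class TrainingConfig:
    EPOCHS: int = 31
    BATCH_SIZE: int = 256
    LEARNING_RATE: float = 0.001

config = TrainingConfig()

# A frozen dataclass rejects attribute assignment on its instances.
try:
    config.EPOCHS = 50
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

# To experiment with other settings, derive a new instance instead.
quick_run = replace(config, EPOCHS=5)

print(mutation_blocked, config.EPOCHS, quick_run.EPOCHS)
```

Because the defaults are also class attributes, expressions such as `TrainingConfig.BATCH_SIZE` work without creating an instance.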
What is TensorFlow CNN?
Convolutional Neural Networks (CNNs), a key technique in deep learning for computer vision, are little-known to the wider public but are the driving force behind major innovations, from unlocking your phone with face recognition to safe driverless vehicles.
CNNs are used for a variety of tasks in computer vision, primarily image classification and object detection. The open-source TensorFlow framework allows you to create highly flexible CNN architectures for computer vision tasks. In this article we explain the basics of CNNs in TensorFlow and present a quick hands-on tutorial to get you started.
If you are interested in learning how to work with CNNs in PyTorch, which is another popular deep learning framework, see our guide to PyTorch CNN.
In this article, you will learn:
Saving and Loading Models
Saving and loading models is very convenient: you can develop and train a model, save it to the file system, and then load it at some future time for use. This section covers the basic operations for saving and loading models.
Saving Models
You can easily save a model using the `save()` method, which will save the model to the file system in the ‘SavedModel’ format. This method creates a folder on the file system. Within this folder, the model architecture and training configuration (including the optimizer, losses, and metrics) are stored in `saved_model.pb`. The `variables/` folder contains a standard training checkpoint file that includes the weights of the model. We will delve into these details in later modules. For now, let’s save the trained model, and then we’ll load it in the next code cell with a different name and continue using it in the remainder of the post.
# Using the save() method, the model will be saved to the file system in the 'SavedModel' format.
model_dropout.save('model_dropout')
INFO:tensorflow:Assets written to: CFIRAR_Classifier/assets
Loading Models
from tensorflow.keras import models
reloaded_model_dropout = models.load_model('model_dropout')
Learn More about CNN and Deep Learning
This is how you build a CNN with multiple hidden layers and how to identify a bird using its pixel values. You’ve also completed a demo to classify images across 10 categories using the CIFAR dataset.
Implementing a CNN in TensorFlow & Keras
In this post, we’ll learn how to implement a Convolutional Neural Network (CNN) from scratch using Keras. Here, we show a CNN architecture similar to the structure of VGG-16 but with fewer layers. We will learn how to model this architecture and train it on a small dataset called CIFAR-10. We’ll also use this as an opportunity to introduce a new layer type called `Dropout`, which is often used in models to mitigate the effects of overfitting.
 Load the CIFAR-10 Dataset
 Dataset Preprocessing
 Dataset and Training Configuration Parameters
 CNN Model Implementation in Keras
 Adding Dropout to the Model
 Saving and Loading Models
 Model Evaluation
 Conclusion
import os
import random
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

from matplotlib.ticker import (MultipleLocator, FormatStrFormatter)
from dataclasses import dataclass

SEED_VALUE = 42

# Fix seed to make training deterministic.
random.seed(SEED_VALUE)
np.random.seed(SEED_VALUE)
tf.random.set_seed(SEED_VALUE)
Monitoring the model’s performance
Using the history object, the training losses and accuracies can be obtained.
import pandas as pd
metrics_df = pd.DataFrame(history.history)
You can plot them in order to see the learning curves. Let’s start by comparing the training and validation loss.
metrics_df[["loss", "val_loss"]].plot();
Next, on to the training and validation accuracy.
metrics_df[["binary_accuracy", "val_binary_accuracy"]].plot();
This tutorial demonstrates training a simple Convolutional Neural Network (CNN) to classify CIFAR images. Because this tutorial uses the Keras Sequential API, creating and training your model will take just a few lines of code.
Import TensorFlow
import tensorflow as tf

from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
Download and prepare the CIFAR-10 dataset
The CIFAR-10 dataset contains 60,000 color images in 10 classes, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 testing images. The classes are mutually exclusive and there is no overlap between them.
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
Downloading data from https://www.cs.toronto.edu/~kriz/cifar10python.tar.gz 170498071/170498071 [==============================] – 2s 0us/step
Verify the data
To verify that the dataset looks correct, let’s plot the first 25 images from the training set and display the class name below each image:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
Create the convolutional base
The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.
As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure your CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument `input_shape` to your first layer.
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
Let’s display the architecture of your model so far:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                     Output Shape           Param #
=================================================================
 conv2d (Conv2D)                  (None, 30, 30, 32)     896
 max_pooling2d (MaxPooling2D)     (None, 15, 15, 32)     0
 conv2d_1 (Conv2D)                (None, 13, 13, 64)     18496
 max_pooling2d_1 (MaxPooling2D)   (None, 6, 6, 64)       0
 conv2d_2 (Conv2D)                (None, 4, 4, 64)       36928
=================================================================
Total params: 56320 (220.00 KB)
Trainable params: 56320 (220.00 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.
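The shapes and parameter counts in the summary can be reproduced with a little arithmetic. The sketch below assumes what the model code implies: 3×3 kernels with stride 1 and no padding for `Conv2D`, and 2×2 non-overlapping windows for `MaxPooling2D`. The helper names are hypothetical and not part of Keras.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, pool):
    """Spatial size after non-overlapping pooling (stride equals pool size)."""
    return (size - pool) // pool + 1

def conv_param_count(kernel, in_channels, out_channels):
    """Each filter has kernel*kernel*in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * out_channels

size, channels, total_params = 32, 3, 0
shapes = []
# Layer stack of the convolutional base: (type, filters or pool size).
for layer, arg in [("conv", 32), ("pool", 2), ("conv", 64), ("pool", 2), ("conv", 64)]:
    if layer == "conv":
        total_params += conv_param_count(3, channels, arg)
        size, channels = conv_output_size(size, 3), arg
    else:
        size = pool_output_size(size, arg)
    shapes.append((size, size, channels))

print(shapes)
print(total_params)
```

The computed shapes and total match the `Output Shape` column and the `Total params` line of the summary.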
Add Dense layers on top
To complete the model, you will feed the last output tensor from the convolutional base (of shape (4, 4, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, you will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. CIFAR has 10 output classes, so you use a final Dense layer with 10 outputs.
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
Here’s the complete architecture of your model:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                     Output Shape           Param #
=================================================================
 conv2d (Conv2D)                  (None, 30, 30, 32)     896
 max_pooling2d (MaxPooling2D)     (None, 15, 15, 32)     0
 conv2d_1 (Conv2D)                (None, 13, 13, 64)     18496
 max_pooling2d_1 (MaxPooling2D)   (None, 6, 6, 64)       0
 conv2d_2 (Conv2D)                (None, 4, 4, 64)       36928
 flatten (Flatten)                (None, 1024)           0
 dense (Dense)                    (None, 64)             65600
 dense_1 (Dense)                  (None, 10)             650
=================================================================
Total params: 122570 (478.79 KB)
Trainable params: 122570 (478.79 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
The network summary shows that (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers.
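As a quick check of the numbers in the summary, the sketch below recomputes the flattened vector length and the Dense layer parameter counts. `dense_param_count` is a hypothetical helper. The convolutional-base total of 56320 is taken from the earlier summary.

```python
def dense_param_count(inputs, units):
    """A Dense layer has one weight per input per unit, plus one bias per unit."""
    return (inputs + 1) * units

flattened_length = 4 * 4 * 64          # Flatten turns (4, 4, 64) into one vector
hidden_params = dense_param_count(flattened_length, 64)
output_params = dense_param_count(64, 10)

conv_base_params = 56320               # from the convolutional base summary
total_params = conv_base_params + hidden_params + output_params

print(flattened_length, hidden_params, output_params, total_params)
```

Most of the parameters in the classifier sit in the first Dense layer, because it connects all 1024 flattened features to each of its 64 units.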
Compile and train the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
Epoch 1/10
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1698386490.372362 489369 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
1563/1563 [==============================] - 10s 5ms/step - loss: 1.5211 - accuracy: 0.4429 - val_loss: 1.2497 - val_accuracy: 0.5531
Epoch 2/10
1563/1563 [==============================] - 6s 4ms/step - loss: 1.1408 - accuracy: 0.5974 - val_loss: 1.1474 - val_accuracy: 0.6023
Epoch 3/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.9862 - accuracy: 0.6538 - val_loss: 0.9759 - val_accuracy: 0.6582
Epoch 4/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8929 - accuracy: 0.6879 - val_loss: 0.9412 - val_accuracy: 0.6702
Epoch 5/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8183 - accuracy: 0.7131 - val_loss: 0.8830 - val_accuracy: 0.6967
Epoch 6/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7588 - accuracy: 0.7334 - val_loss: 0.8671 - val_accuracy: 0.7039
Epoch 7/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7126 - accuracy: 0.7518 - val_loss: 0.8972 - val_accuracy: 0.6897
Epoch 8/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6655 - accuracy: 0.7661 - val_loss: 0.8412 - val_accuracy: 0.7111
Epoch 9/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6205 - accuracy: 0.7851 - val_loss: 0.8581 - val_accuracy: 0.7109
Epoch 10/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.5872 - accuracy: 0.7937 - val_loss: 0.8817 - val_accuracy: 0.7113
Evaluate the model
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
313/313 – 1s – loss: 0.8817 – accuracy: 0.7113 – 655ms/epoch – 2ms/step
print(test_acc)
0.7113000154495239
Your simple CNN has achieved a test accuracy of over 70%. Not bad for a few lines of code! For another CNN style, check out the TensorFlow 2 quickstart for experts example that uses the Keras subclassing API and `tf.GradientTape`.
TensorFlow: Constants, Variables, and Placeholders
Constants are not the only types of tensors. There are also variables and placeholders, which are all building blocks of a computational graph.
A computational graph is a representation of a sequence of operations and the flow of data between them.
Now, let’s understand the difference between these types of tensors.
Constants
Constants are tensors whose values do not change during the execution of the computational graph. They are created using the tf.constant() function and are mainly used to store fixed parameters that do not require any change during the model training.
Variables
Variables are tensors whose values can be changed during the execution of the computational graph, and they are created using the tf.Variable() function. For instance, in neural networks, weights and biases are defined as variables since they need to be updated during the training process.
Placeholders
These were used in the first version of TensorFlow as empty containers that do not have specific values. They just reserve a spot for data to be supplied in the future. This gives users the freedom to use different datasets and batch sizes during model training and validation.
In TensorFlow 2, placeholders are no longer needed: with eager execution, data is passed directly as arguments to Python functions, and the tf.function decorator can compile such functions into graphs. This is a more Pythonic and dynamic approach to feeding data into the computational graph.
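The following minimal sketch, assuming TensorFlow 2 is installed, puts the three ideas side by side. The names `learning_rate`, `weight`, and `apply_gradient` and the numeric values are illustrative only.

```python
import tensorflow as tf

# Constant: its value is fixed once created.
learning_rate = tf.constant(0.1)

# Variable: its value can be updated in place, like a trainable weight.
weight = tf.Variable(2.0)

@tf.function  # traces the Python function into a graph on first call
def apply_gradient(gradient):
    # `gradient` takes the role a placeholder had in TensorFlow 1:
    # the data is supplied as an ordinary argument at call time.
    weight.assign_sub(learning_rate * gradient)

apply_gradient(tf.constant(5.0))
print(weight.numpy())
```

After the call, `weight` holds its original value minus the learning rate times the gradient, while `learning_rate` is unchanged.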
What is a CNN?
A Convolutional Neural Network (CNN or ConvNet) is a deep learning algorithm specifically designed for tasks where object recognition is crucial, such as image classification, detection, and segmentation. Many real-life applications, such as self-driving cars and surveillance cameras, use CNNs.
The importance of CNNs
There are several reasons why CNNs are important, as highlighted below:
 Unlike traditional machine learning models such as SVMs and decision trees, which require manual feature extraction, CNNs can perform automatic feature extraction at scale, making them efficient.
 The convolutional layers, combined with pooling, make CNNs approximately translation invariant, meaning they can recognize patterns and extract features regardless of where they appear in the image. Robustness to rotation and scaling is not built in and is usually obtained through data augmentation.
 Multiple pre-trained CNN models such as VGG-16, ResNet50, Inception-v3, and EfficientNet have reached state-of-the-art results and can be fine-tuned on new tasks using a relatively small amount of data.
 CNNs are not limited to image classification: they can also be used for natural language processing, time series analysis, and speech recognition.
Architecture of a CNN
CNNs’ architecture tries to mimic the structure of neurons in the human visual system composed of multiple layers, where each one is responsible for detecting a specific feature in the data. As illustrated in the image below, the typical CNN is made of a combination of four main layers:
 Convolutional layers
 Rectified Linear Unit (ReLU for short)
 Pooling layers
 Fully connected layers
Let’s understand how each of these layers works using the following example of handwritten digit classification.
Convolution layers
This is the first building block of a CNN. As the name suggests, the main mathematical task performed is called convolution, which is the application of a sliding window function to a matrix of pixels representing an image. The sliding function applied to the matrix is called a kernel or a filter; the two terms are used interchangeably.
In the convolution layer, several filters of equal size are applied, and each filter is used to recognize a specific pattern from the image, such as the curving of the digits, the edges, the whole shape of the digits, and more.
Let’s consider this 32×32 grayscale image of a handwritten digit. The values in the matrix are given for illustration purposes.
Also, let’s consider the kernel used for the convolution. It is a 3×3 matrix. The weight of each element of the kernel is shown in the grid: zero weights are shown in the black cells and ones in the white cells.
Do we have to manually find these weights?
In real life, the weights of the kernels are determined during the training process of the neural network.
Using these two matrices, we can perform the convolution operation by applying the dot product, which works as follows:
 1. Place the kernel over the top-left corner of the image matrix.
 2. Perform element-wise multiplication.
 3. Sum the values of the products.
 4. The resulting value corresponds to the first value (top-left corner) in the convoluted matrix.
 5. Move the kernel to the next position according to the stride of the sliding window, going left to right and then down to the next row.
 6. Repeat steps 2 to 5 until the image matrix is fully covered.
The dimension of the convoluted matrix depends on the kernel size and on the stride of the sliding window: the larger the kernel or the stride, the smaller the output. Without padding, an n×n input and a k×k kernel with stride s give an output of floor((n - k) / s) + 1 values per side.
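The steps above can be sketched in plain Python. This is an illustrative implementation with no padding, not the optimized routine a framework uses. Like deep learning libraries, it slides the kernel without flipping it (strictly speaking, a cross-correlation). The example image and kernel values are made up.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over a 2D image and sum the element-wise products."""
    k = len(kernel)
    out_height = (len(image) - k) // stride + 1
    out_width = (len(image[0]) - k) // stride + 1
    output = []
    for i in range(out_height):
        row = []
        for j in range(out_width):
            top, left = i * stride, j * stride
            # Element-wise multiplication followed by a sum (steps 2 and 3).
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            row.append(total)
        output.append(row)
    return output

image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
diagonal_kernel = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

feature_map = convolve2d(image, diagonal_kernel)
print(feature_map)  # [[3, 0], [0, 3]]
```

The kernel responds strongly (value 3) where the diagonal pattern lines up with the image and not at all where it does not, which is why kernels are also called feature detectors.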
Another name associated with the kernel in the literature is feature detector, because the weights can be fine-tuned to detect specific features in the input image.
For instance:
 A kernel that averages neighboring pixels can be used to blur the input image.
 A kernel that subtracts neighboring pixels can be used to perform edge detection.
The more convolution layers the network has, the better it is at detecting more abstract features.
Activation function
A ReLU activation function is applied after each convolution operation. This function helps the network learn nonlinear relationships between the features in the image, making the network more robust at identifying different patterns. It also helps to mitigate the vanishing gradient problem.
Pooling layer
The goal of the pooling layer is to pull the most significant features from the convoluted matrix. This is done by applying some aggregation operations, which reduces the dimension of the feature map (convoluted matrix), hence reducing the memory used while training the network. Pooling is also relevant for mitigating overfitting.
The most common aggregation functions that can be applied are:
 Max pooling, which takes the maximum value of each window of the feature map
 Sum pooling, which takes the sum of all the values in each window of the feature map
 Average pooling, which takes the average of all the values in each window.
Below is an illustration of each of the previous examples:
Also, the dimension of the feature map becomes smaller as the pooling function is applied.
The output of the last pooling layer is flattened so that it can be processed by the fully connected layers.
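The three aggregation functions can be compared with a small plain-Python sketch. It assumes non-overlapping 2×2 windows (stride equal to the window size). `pool2d`, `mean`, and the feature map values are illustrative.

```python
def pool2d(feature_map, size=2, reduce=max):
    """Apply `reduce` to each non-overlapping size x size window."""
    pooled = []
    for top in range(0, len(feature_map) - size + 1, size):
        row = []
        for left in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[top + a][left + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(reduce(window))
        pooled.append(row)
    return pooled

def mean(values):
    return sum(values) / len(values)

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]

print(pool2d(feature_map, reduce=max))   # [[6, 4], [7, 9]]
print(pool2d(feature_map, reduce=sum))   # [[15, 9], [14, 20]]
print(pool2d(feature_map, reduce=mean))  # [[3.75, 2.25], [3.5, 5.0]]
```

In every case the 4×4 feature map shrinks to 2×2, which is the dimension reduction described above.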
Fully connected layers
These layers form the final part of the convolutional neural network, and their input is the flattened one-dimensional vector generated from the output of the last pooling layer. ReLU activation functions are applied to them for nonlinearity.
Finally, a softmax prediction layer is used to generate probability values for each of the possible output labels, and the final label predicted is the one with the highest probability score.
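A minimal sketch of the softmax step, using made-up scores for three classes. Subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]           # hypothetical raw outputs for 3 classes
probabilities = softmax(scores)
predicted_label = probabilities.index(max(probabilities))

print(probabilities)
print(predicted_label)
```

The predicted label is simply the index of the largest probability, which is also the index of the largest raw score.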
Dropout
Dropout is a regularization technique applied to improve the generalization capability of neural networks with a large number of parameters. It consists of randomly dropping some neurons during the training process, which forces the remaining neurons to learn new features from the input data.
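The sketch below illustrates the idea on a plain list of activations. It uses the "inverted dropout" convention, in which the surviving values are scaled up by 1 / (1 - rate) so the expected sum stays the same. The Keras `Dropout` layer applies the same scaling during training. The function name, the rate, and the seed are illustrative.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep_prob = 1.0 - rate
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]

rng = random.Random(0)              # seeded so the example is repeatable
activations = [1.0] * 1000
dropped = dropout(activations, rate=0.25, rng=rng)

fraction_zeroed = dropped.count(0.0) / len(dropped)
print(fraction_zeroed)              # close to the requested rate
```

At inference time no units are dropped, so the layer simply passes its input through unchanged.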
Since the technical implementation will be performed using TensorFlow 2, the next section aims to provide a complete overview of different components of this framework to efficiently build deep learning models.
Adding Dropout to the Model
To help mitigate overfitting, we can employ one or more regularization strategies to help the model generalize better. Regularization techniques help to restrict the model’s flexibility so that it doesn’t overfit the training data. One approach is called Dropout, which is built into Keras. Dropout is implemented in Keras as a special layer type that randomly drops a percentage of neurons during the training process. When dropout is used in convolutional layers, it is usually used after the max pooling layer and has the effect of eliminating a percentage of neurons in the feature maps. When used after a fully connected layer, a percentage of neurons in the fully connected layer are dropped.
In the diagram below, we add a `dropout` layer at the end of each convolutional block and also after the dense layer in the classifier. The input argument to the Dropout function is the fraction of neurons to (randomly) drop from the previous layer during the training process.
Define the Model (with Dropout)
def cnn_model_dropout(input_shape=(32, 32, 3)):

    model = Sequential()

    #------------------------------------
    # Conv Block 1: 32 Filters, MaxPool.
    #------------------------------------
    model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu', input_shape=input_shape))
    model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    #------------------------------------
    # Conv Block 2: 64 Filters, MaxPool.
    #------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    #------------------------------------
    # Conv Block 3: 64 Filters, MaxPool.
    #------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    #------------------------------------
    # Flatten the convolutional features.
    #------------------------------------
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))

    return model
Create the Model (with Dropout)
# Create the model.
model_dropout = cnn_model_dropout()
model_dropout.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                     Output Shape           Param #
=================================================================
 conv2d_6 (Conv2D)                (None, 32, 32, 32)     896
 conv2d_7 (Conv2D)                (None, 32, 32, 32)     9248
 max_pooling2d_3 (MaxPooling2D)   (None, 16, 16, 32)     0
 dropout (Dropout)                (None, 16, 16, 32)     0
 conv2d_8 (Conv2D)                (None, 16, 16, 64)     18496
 conv2d_9 (Conv2D)                (None, 16, 16, 64)     36928
 max_pooling2d_4 (MaxPooling2D)   (None, 8, 8, 64)       0
 dropout_1 (Dropout)              (None, 8, 8, 64)       0
 conv2d_10 (Conv2D)               (None, 8, 8, 64)       36928
 conv2d_11 (Conv2D)               (None, 8, 8, 64)       36928
 max_pooling2d_5 (MaxPooling2D)   (None, 4, 4, 64)       0
 dropout_2 (Dropout)              (None, 4, 4, 64)       0
 flatten_1 (Flatten)              (None, 1024)           0
 dense_2 (Dense)                  (None, 512)            524800
 dropout_3 (Dropout)              (None, 512)            0
 dense_3 (Dense)                  (None, 10)             5130
=================================================================
Total params: 669,354
Trainable params: 669,354
Non-trainable params: 0
_________________________________________________________________
Compile the Model (with Dropout)
model_dropout.compile(optimizer='rmsprop',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'],
                      )
Train the Model (with Dropout)
history = model_dropout.fit(X_train,
                            y_train,
                            batch_size=TrainingConfig.BATCH_SIZE,
                            epochs=TrainingConfig.EPOCHS,
                            verbose=1,
                            validation_split=.3,
                            )
Epoch 1/31
137/137 [==============================] - ETA: 0s - loss: 2.1302 - accuracy: 0.2181
137/137 [==============================] - 5s 31ms/step - loss: 2.1302 - accuracy: 0.2181 - val_loss: 1.9788 - val_accuracy: 0.2755
Epoch 2/31
137/137 [==============================] - 4s 27ms/step - loss: 1.7749 - accuracy: 0.3647 - val_loss: 1.8633 - val_accuracy: 0.3332
Epoch 3/31
137/137 [==============================] - 4s 27ms/step - loss: 1.5430 - accuracy: 0.4442 - val_loss: 1.5015 - val_accuracy: 0.4795
:
:
Epoch 29/31
137/137 [==============================] - 4s 27ms/step - loss: 0.4626 - accuracy: 0.8359 - val_loss: 0.6721 - val_accuracy: 0.7832
Epoch 30/31
137/137 [==============================] - 4s 27ms/step - loss: 0.4584 - accuracy: 0.8380 - val_loss: 0.6638 - val_accuracy: 0.7847
Epoch 31/31
137/137 [==============================] - 4s 27ms/step - loss: 0.4427 - accuracy: 0.8449 - val_loss: 0.6598 - val_accuracy: 0.7863
Plot the Training Results
# Retrieve training results.
train_loss = history.history["loss"]
train_acc  = history.history["accuracy"]
valid_loss = history.history["val_loss"]
valid_acc  = history.history["val_accuracy"]

plot_results([train_loss, valid_loss],
             ylabel="Loss",
             ylim=[0.0, 5.0],
             metric_name=["Training Loss", "Validation Loss"],
             color=["g", "b"]);

plot_results([train_acc, valid_acc],
             ylabel="Accuracy",
             ylim=[0.0, 1.0],
             metric_name=["Training Accuracy", "Validation Accuracy"],
             color=["g", "b"])
In the plots above, the training curves align very closely with the validation curves. Also, notice that we achieve a higher validation accuracy than the baseline model that did not contain dropout. Both sets of training plots are shown below for comparison.
Tensors vs Matrices: Differences
Many people confuse tensors with matrices. Even though these two objects look similar, they are not the same thing. This section provides a better understanding of the difference between matrices and tensors.
 We can think of a matrix as a tensor with only two dimensions.
 Tensors, on the other hand, are a more general format that can have any number of dimensions.
As opposed to matrices, tensors are more suitable for deep learning problems for the following reasons:
 They can deal with any number of dimensions, which makes them a better fit for multidimensional data.
 Tensors’ ability to be compatible with a wide range of data types, shapes, and dimensions makes them more versatile than matrices.
 TensorFlow provides GPU and TPU support to speed up computations. Using tensors, machine learning engineers can automatically take advantage of these benefits.
 Tensors natively support broadcasting, which consists of performing arithmetic operations between tensors of different shapes, something that is not always possible when dealing with matrices.
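To illustrate broadcasting, the plain-Python sketch below computes the shape that results from combining two tensor shapes, following the rule used by NumPy and TensorFlow: shapes are aligned from the last dimension, and two sizes are compatible when they are equal or one of them is 1. `broadcast_shape` is a hypothetical helper written for this example.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the last dimension; missing dimensions count as 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))

# An image of shape (32, 32, 3) combined with one value per color channel.
print(broadcast_shape((32, 32, 3), (3,)))   # (32, 32, 3)
# A column of 256 values combined with a row of 10 values.
print(broadcast_shape((256, 1), (1, 10)))   # (256, 10)
```

This is the rule that lets an expression such as `train_images / 255.0` divide every pixel by a single scalar.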
Installation
!git clone https://github.com/faizan1234567/ConvolutionalNeuralnetworksinTensorFlow
cd ConvolutionalNeuralnetworksinTensorFlow
Create a virtual environment
python -m venv --system-site-packages .\CNN_with_tf
Activate the virtual environment
.\CNN_with_tf\Scripts\Activate
Install packages within a virtual environment without affecting the host system setup. Start by upgrading pip:
pip install --upgrade pip
pip list  # show packages installed within the virtual environment
And to exit the virtual environment later:
deactivate  # don't exit until you're done using TensorFlow
Now install TensorFlow
pip install --upgrade tensorflow
To check that everything works, run the following command
python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
You are all set to use TensorFlow now. Please follow the notebooks to import or install modules.
What is TensorFlow CNN?
Convolutional Neural Networks (CNNs), a key technique in deep learning for computer vision, are little known to the wider public but are the driving force behind major innovations, from unlocking your phone with face recognition to safe driverless vehicles.
CNNs are used for a variety of tasks in computer vision, primarily image classification and object detection. The open-source TensorFlow framework allows you to create highly flexible CNN architectures for computer vision tasks. In this article we explain the basics of CNNs on TensorFlow and present a quick hands-on tutorial to get you started.
If you are interested in learning how to work with CNNs in PyTorch, which is another popular deep learning framework, see our guide to PyTorch CNN.
Compile the Model
The next step is to compile the model, where we specify the optimizer type, the loss function, and any additional metrics we would like recorded during training. Here we specify `RMSProp` as the optimizer type for gradient descent, and we use a cross-entropy loss function, which is the standard loss function for classification problems. We specifically use `categorical_crossentropy` since our labels are one-hot encoded. Finally, we specify `accuracy` as an additional metric to record during training. The value of the loss function is always recorded by default, but if you want accuracy, you need to specify it.
model.compile(
    optimizer="rmsprop",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
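As a rough illustration of what the categorical cross-entropy loss measures, the sketch below computes it by hand for a single one-hot label. This is a simplified pure-Python version for intuition, not the Keras implementation; the function name, the clipping constant `eps`, and the example probabilities are assumptions made for the example.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and a predicted probability vector."""
    # Clip probabilities away from zero so log() is always defined.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Hypothetical 3-class example: the true class is index 2.
y_true = [0.0, 0.0, 1.0]
confident = [0.05, 0.05, 0.90]  # most probability on the correct class
unsure = [0.40, 0.30, 0.30]     # probability spread across classes

print(categorical_crossentropy(y_true, confident))  # small loss
print(categorical_crossentropy(y_true, unsure))     # larger loss
```

Only the probability assigned to the true class contributes to the sum, so the loss shrinks toward 0 as that probability approaches 1.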
Train the Model
Since the dataset does not include a validation dataset, and since we did not previously split the training dataset to create one, we will use the `validation_split` argument below so that 30% of the training dataset is automatically reserved for validation. This approach reserves the last 30% of the training dataset for validation. It is very convenient, but if the training dataset has any specific ordering (say, ordered by classes), you will need to randomize the order before splitting.
history = model.fit(
    X_train,
    y_train,
    batch_size=TrainingConfig.BATCH_SIZE,
    epochs=TrainingConfig.EPOCHS,
    verbose=1,
    validation_split=.3,
)
Epoch 1/31
2023-01-16 07:36:41.659504: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2023-01-16 07:36:42.049455: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
137/137 [==============================] - ETA: 0s - loss: 1.9926 - accuracy: 0.2704
2023-01-16 07:36:45.531645: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
137/137 [==============================] - 5s 28ms/step - loss: 1.9926 - accuracy: 0.2704 - val_loss: 1.7339 - val_accuracy: 0.3640
Epoch 2/31
137/137 [==============================] - 3s 25ms/step - loss: 1.5905 - accuracy: 0.4254 - val_loss: 1.4228 - val_accuracy: 0.4887
Epoch 3/31
137/137 [==============================] - 3s 25ms/step - loss: 1.3743 - accuracy: 0.5076 - val_loss: 1.2851 - val_accuracy: 0.5380
:
:
Epoch 29/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0569 - accuracy: 0.9814 - val_loss: 2.0920 - val_accuracy: 0.7137
Epoch 30/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0543 - accuracy: 0.9837 - val_loss: 2.1936 - val_accuracy: 0.7253
Epoch 31/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0520 - accuracy: 0.9834 - val_loss: 2.1732 - val_accuracy: 0.7227
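The caveat about ordered data can be illustrated with a small sketch. The helper below is hypothetical (it is not a Keras function): it shuffles sample indices with a fixed seed before carving off the validation portion, which is one way to randomize the order before relying on a "last N%" split. The toy data is invented for the example.

```python
import random


def shuffled_validation_split(samples, labels, validation_split=0.3, seed=0):
    """Shuffle sample/label pairs together, then reserve a fraction for validation."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible

    n_val = round(len(samples) * validation_split)
    n_train = len(samples) - n_val
    train_idx, val_idx = indices[:n_train], indices[n_train:]

    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    val = ([samples[i] for i in val_idx], [labels[i] for i in val_idx])
    return train, val


# Toy dataset ordered by class: the unshuffled last 30% contains only class 1.
samples = list(range(10))
labels = [0] * 5 + [1] * 5
print("last 30% without shuffling:", labels[-3:])

(train_x, train_y), (val_x, val_y) = shuffled_validation_split(samples, labels)
print("validation labels after shuffling:", val_y)
```

With real arrays, the same idea applies: permute the inputs and labels with the same index order before calling `fit` with `validation_split`.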
Plot the Training Results
The function below is a convenience function to plot training and validation losses and training and validation accuracies. It has a single required argument which is a list of metrics to plot.
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter, MultipleLocator


def plot_results(metrics, title=None, ylabel=None, ylim=None, metric_name=None, color=None):
    fig, ax = plt.subplots(figsize=(15, 4))

    # Wrap a single metric (and its name) in a list so the loop below works.
    if not isinstance(metric_name, (list, tuple)):
        metrics = [metrics]
        metric_name = [metric_name]

    for idx, metric in enumerate(metrics):
        ax.plot(metric, color=color[idx])

    plt.xlabel("Epoch")
    plt.ylabel(ylabel)
    plt.title(title)
    plt.xlim([0, TrainingConfig.EPOCHS - 1])
    plt.ylim(ylim)

    # Tailor x-axis tick marks.
    ax.xaxis.set_major_locator(MultipleLocator(5))
    ax.xaxis.set_major_formatter(FormatStrFormatter("%d"))
    ax.xaxis.set_minor_locator(MultipleLocator(1))

    plt.grid(True)
    plt.legend(metric_name)
    plt.show()
    plt.close()
The loss and accuracy metrics can be accessed from the `history` object returned from the fit method. We access the metrics using predefined dictionary keys, as shown below.
# Retrieve training results.
train_loss = history.history["loss"]
train_acc = history.history["accuracy"]
valid_loss = history.history["val_loss"]
valid_acc = history.history["val_accuracy"]

plot_results(
    [train_loss, valid_loss],
    ylabel="Loss",
    ylim=[0.0, 5.0],
    metric_name=["Training Loss", "Validation Loss"],
    color=["g", "b"],
)

plot_results(
    [train_acc, valid_acc],
    ylabel="Accuracy",
    ylim=[0.0, 1.0],
    metric_name=["Training Accuracy", "Validation Accuracy"],
    color=["g", "b"],
)
The results from our baseline model reveal that the model is overfitting. Notice that the validation loss increases after about ten epochs of training while the training loss continues to decline. This means that the network learns how to model the training data well but does not generalize to unseen test data well. The accuracy plot shows a similar trend where the validation accuracy levels off after about ten epochs while the training accuracy continues to approach 100% as training progresses. This is a common problem when training neural networks and can occur for a number of reasons. One reason is that the model can fit the nuances of the training dataset, especially when the training dataset is small.
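The overfitting pattern described above can also be read off the numbers directly. The following sketch uses made-up loss values (they are not the results of the training run shown earlier) and a hypothetical helper, `best_epoch`, to find where the validation loss bottoms out and how far it has drifted from the training loss by the end.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch at which the validation loss is lowest."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1


# Hypothetical learning curves: training loss keeps falling,
# validation loss turns upward partway through.
train_loss = [1.99, 1.59, 1.37, 1.10, 0.80, 0.50, 0.20, 0.06]
val_loss = [1.73, 1.42, 1.29, 1.15, 1.10, 1.30, 1.80, 2.10]

epoch = best_epoch(val_loss)
final_gap = val_loss[-1] - train_loss[-1]

print(f"validation loss is lowest at epoch {epoch}")
print(f"gap between validation and training loss at the end: {final_gap:.2f}")
```

In practice the same function could be applied to `history.history["val_loss"]`; a minimum that occurs well before the final epoch, together with a growing gap, is the signature of overfitting.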
Final remarks
In this article, you have learned about CNNs, from the intuition behind them to their applications in the real world. You have also seen that you can use existing architectures to speed up your model development process. Specifically, you have covered:
 what convolutional neural networks are
 how convolutional neural networks work
 using pretrained convolutional neural networks to run image classification
 building convolutional neural networks from scratch using Keras and TensorFlow
 how to plot the learning curves of your neural network
 preventing overfitting using Dropout regularization and batch normalization
 saving your best model using the model checkpoint callback
 how to stop the training process of your CNN when it stops improving
 how you can save and load the model again
…just to mention a few.
And that’s not the end of it. You can explore all the examples used in this article on this Google Colab Notebook. Feel free to play with the parameters of the models to see how they affect performance.
March 19, 2022
Admin
23:49 19/03/2022
Convolutional Neural Network – Self-Study TensorFlow
Layers in a Convolutional Neural Network
A convolutional neural network has multiple hidden layers that help in extracting information from an image. The four important layers in a CNN are:
 Convolution layer
 ReLU layer
 Pooling layer
 Fully connected layer
Convolution Layer
This is the first step in the process of extracting valuable features from an image. A convolution layer has several filters that perform the convolution operation. Every image is considered as a matrix of pixel values.
Consider the following 5×5 image whose pixel values are either 0 or 1. There’s also a filter matrix with a dimension of 3×3. Slide the filter matrix over the image and compute the dot product to get the convolved feature matrix.
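Since the original figure is not reproduced here, the sliding dot product can be sketched in plain Python. The 5×5 image and 3×3 filter values below are assumptions chosen for illustration, and `convolve2d_valid` is a hypothetical helper. As in deep learning frameworks, the filter is applied without flipping (strictly speaking a cross-correlation), with stride 1 and no padding.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Illustrative binary image and filter (assumed values).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

for row in convolve2d_valid(image, kernel):
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

A 5×5 input and a 3×3 filter give a 3×3 convolved feature matrix, because the filter fits in three positions along each axis.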
ReLU layer
ReLU stands for the rectified linear unit. Once the feature maps are extracted, the next step is to move them to a ReLU layer.
ReLU performs an element-wise operation and sets all the negative pixels to 0. It introduces non-linearity to the network, and the generated output is a rectified feature map. Below is the graph of a ReLU function:
The original image is scanned with multiple convolutions and ReLU layers for locating the features.
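The element-wise behavior of ReLU is simple enough to sketch in one line. The feature map values below are invented for illustration, and `relu` is a hypothetical helper rather than a library call.

```python
def relu(feature_map):
    """Replace every negative value with 0 and leave the rest unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical feature map containing negative responses.
feature_map = [
    [-3, 2, 0],
    [5, -1, 4],
]
print(relu(feature_map))  # [[0, 2, 0], [5, 0, 4]]
```

The shape of the feature map is unchanged; only the negative entries are clamped.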
Pooling Layer
Pooling is a downsampling operation that reduces the dimensionality of the feature map. The rectified feature map now goes through a pooling layer to generate a pooled feature map.
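The pooling layer slides a small window (for example 2×2) over each rectified feature map and keeps one summary value per window, typically the maximum. The feature maps it condenses come from different convolution filters, which respond to different parts of the image like edges, corners, body, feathers, eyes, and beak.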
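A minimal sketch of max pooling, assuming a 2×2 window and a stride of 2 (a common configuration, but an assumption here). The helper name and the 4×4 feature map values are made up for the example.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep the largest value inside each pooling window."""
    out_h = (len(feature_map) - pool_size) // stride + 1
    out_w = (len(feature_map[0]) - pool_size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(pool_size)
                for j in range(pool_size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4x4 rectified feature map.
rectified = [
    [1, 4, 2, 1],
    [2, 8, 5, 6],
    [7, 2, 9, 1],
    [3, 4, 2, 7],
]
print(max_pool2d(rectified))  # [[8, 6], [7, 9]]
```

Each spatial dimension is halved, so a 4×4 map becomes 2×2 while the strongest response in each region is preserved.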
Here’s how the structure of the convolutional neural network looks so far:
The next step in the process is called flattening. Flattening is used to convert all the resultant 2-dimensional arrays from pooled feature maps into a single long continuous linear vector.
The flattened vector is fed as input to the fully connected layer to classify the image.
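Flattening can be sketched as follows. The pooled maps are illustrative values, and the helper concatenates the maps one after another; note that Keras stores images channels-last and therefore orders the flattened values differently, although the resulting vector has the same length.

```python
def flatten(pooled_maps):
    """Concatenate a list of 2-D pooled feature maps into one flat vector."""
    return [value for fmap in pooled_maps for row in fmap for value in row]


# Two hypothetical 2x2 pooled feature maps.
pooled_maps = [
    [[8, 6], [7, 9]],
    [[1, 0], [3, 2]],
]
vector = flatten(pooled_maps)
print(vector)       # [8, 6, 7, 9, 1, 0, 3, 2]
print(len(vector))  # 8
```

The vector length is simply height × width × number of maps, which is the input size the fully connected layer sees.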
Here’s how exactly CNN recognizes a bird:
 The pixels from the image are fed to the convolutional layer that performs the convolution operation
 It results in a convolved map
 The convolved map is applied to a ReLU function to generate a rectified feature map
 The image is processed with multiple convolutions and ReLU layers for locating the features
 Pooling layers downsample the rectified feature maps, keeping the strongest responses for specific parts of the image
 The pooled feature map is flattened and fed to a fully connected layer to get the final output
Convolutional Neural Networks (CNN) with TensorFlow Tutorial
Imagine being in a zoo trying to recognize if a given animal is a cheetah or a leopard. As a human, your brain can effortlessly analyze body and facial features to come to a valid conclusion. In the same way, Convolutional Neural Networks (CNNs) can be trained to perform the same recognition task, no matter how complex the patterns are. This makes them powerful in the field of computer vision.
This conceptual CNN tutorial will start by providing an overview of what CNNs are and their importance in machine learning. Then it will walk you through a step-by-step implementation of a CNN in TensorFlow 2.
Conclusion
This article has given a complete overview of CNNs in TensorFlow, providing details about each layer of the CNN architecture. It also gave a brief introduction to TensorFlow and how it helps machine learning engineers and researchers build sophisticated neural networks.
We applied all these skills to a real-world scenario: a multi-class classification task.
Our beginner’s guide to object detection could be a great next step to further your learning about computer vision. It explores the key components of object detection and explains how to implement it with SSD and Faster R-CNN, both available in TensorFlow.
Artificial Intelligence has come a long way and has been seamlessly bridging the gap between the potential of humans and machines. And data enthusiasts all around the globe work on numerous aspects of AI and turn visions into reality – and one such amazing area is the domain of Computer Vision. This field aims to enable and configure machines to view the world as humans do, and use the knowledge for several tasks and processes (such as Image Recognition, Image Analysis and Classification, and so on). And the advancements in Computer Vision with Deep Learning have been a considerable success, particularly with the Convolutional Neural Network algorithm.
CNN on TensorFlow Concepts
Tensor
Tensors represent deep learning data. They are multidimensional arrays, and each dimension (axis) indexes one aspect of the dataset, such as the example, the row, the column, or the feature. For example, a cube storing data along X, Y, and Z axes is represented as a 3-dimensional tensor. Individual axes can be very large: hundreds of features per example are typical in deep learning applications.
Computational graph
TensorFlow computational graphs represent the workflows that occur during deep learning model training. For a CNN model, the computational graph can be very complex. The image below demonstrates what a simple graph looks like. You can use TensorBoard, TensorFlow's visualization toolkit, to display the computational graph of your model.
Constant
In TensorFlow, a constant is used to store values that don’t change during the computation of the model. It is used for nodes that must remain the same during model training. A constant does not have parameters.
Placeholder
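Placeholders are used to input training examples to your deep learning model. A placeholder has no value of its own; it receives its values at runtime as the model processes the training set. Placeholders belong to the TensorFlow 1.x graph API. In TensorFlow 2 they are only available as `tf.compat.v1.placeholder`, and data is normally passed directly as tensors or `tf.data` datasets.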
Variable
Variables are used to add trainable nodes to the computation graph, such as weights and biases.
Training the model
The next step is to train the model. In this case, `y` is not passed in. That’s taken care of by the function used to generate the training set. Passing the validation data is critical so that the loss and accuracy can be accessed later and plotted. Let’s also reuse the callbacks that were defined in the last section.
history = model.fit(
    training_set,
    validation_data=validation_set,
    epochs=600,
    callbacks=callbacks,
)
Convolutional Neural Networks (CNN) in TensorFlow
Now that you understand how convolutional neural networks work, you can start building them using TensorFlow. However, you will first have to install TensorFlow. If you are working on a Google Colab environment, TensorFlow will already be installed.
How to install TensorFlow
TensorFlow can be installed via pip. Run the following command to install it.
pip install tensorflow
Alternatively, you can run TensorFlow in a container.
docker pull tensorflow/tensorflow:latest  # Download latest stable image
docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter  # Start Jupyter server
How to confirm TensorFlow is installed
After installation is complete via pip, you might want to check TensorFlow’s version or confirm its installation. If you manage to import TensorFlow without any errors, then it was installed successfully.
import tensorflow
print(tensorflow.__version__)
What are Keras and tf.keras?
As of TensorFlow 2.0, Keras has become the official high-level API for TensorFlow. It is an open-source package that has been integrated into TensorFlow to speed up the process of building deep learning models. It is accessible via `tf.keras`. That is what you will be using in this article.
Develop multilayer CNN models
Let’s now take a look at how you can build a convolutional neural network with Keras and TensorFlow. The CIFAR-10 dataset will be used. The dataset contains 60,000 32×32 color images in 10 classes, with 6,000 images per class.
Loading the dataset can be done directly by using Keras utilities. Other datasets that ship with TensorFlow can be loaded in a similar manner.
import tensorflow as tf
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
The dataset contains the following classes
‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’
You can use Matplotlib to visualize one of the images. Let’s visualize the image at index 785.
import matplotlib.pyplot as plt
image = X_train[785]
plt.imshow(image)
plt.show()
That looks like a cat. You can confirm that by checking `y_train[785]`: 3 is the label for a cat.
Data preprocessing
The weights of a neural network are initialized to very small numbers. Therefore, scaling the images to be within the same range is important. In this case, let’s scale the values to be numbers between 0 and 1.
X_train = X_train / 255
X_test = X_test / 255
Build the convolutional neural network
The next step is to define the convolutional neural network. Here is where the convolution, pooling, and flattening layers will be applied. The first layer is the `Conv2D` layer. It’s defined with the following parameters:
 32 output filters
 a 3 by 3 feature detector
 `same` padding to result in even padding for the input
 input shape of `(32, 32, 3)` because the images are of size 32 by 32, and the 3 is the number of color channels (RGB)
 the `relu` activation function so as to achieve non-linearity
The next layer is a max-pooling layer defined with the following parameters:
 a `pool_size` of (2, 2) that defines the size of the pooling window
 a stride of 2, the number of pixels the pooling window moves at each step
Remember that you can design your network as you like. You just have to monitor the metrics, tweak the design, and settle on the one that results in the best performance. In this case, another convolution and pooling layer is created. That is followed by the flatten layer whose results are passed to the dense layer. The final layer has 10 units because the dataset has 10 classes. Since it’s a multi-class problem, the softmax activation function is applied.
model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)
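To show what the softmax activation on the final layer does, here is a small pure-Python sketch. It is a simplified stand-in for the Keras implementation; the function name and the example scores are assumptions. It turns raw scores into probabilities that sum to 1, so the 10 outputs can be read as class probabilities.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum first avoids overflow for large scores.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print("predicted class:", probs.index(max(probs)))
```

The largest score always receives the largest probability, so taking the index of the maximum gives the predicted class.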
How to visualize a deep learning model
The quickest way to visualize your model is to use the model summary function.
model.summary()
You can also use the Keras `plot_model` utility to plot the model.
tf.keras.utils.plot_model(
    model,
    to_file="model.png",
    show_shapes=True,
    show_layer_names=True,
    rankdir="TB",
    expand_nested=True,
    dpi=96,
)
How to reduce overfitting with Dropout
One of the common ways to improve the performance of deep learning models is to introduce dropout regularization. In this process, a specified fraction of a layer's units is randomly set to zero at each training step. This forces the network to learn patterns from the data instead of memorizing the data, which is what reduces overfitting. In Keras, this can be achieved by introducing a Dropout layer in the network. Here is what the network looks like after applying the Dropout layer.
model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)
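The following sketch illustrates the idea behind a dropout layer in plain Python. It is a simplified model of the technique, not the Keras implementation: the `dropout` helper and its arguments are hypothetical. It uses the common "inverted dropout" formulation, in which the surviving values are scaled up during training so that nothing needs to change at inference time.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero a random fraction of values during training and rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)  # inference: values pass through unchanged
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(42)  # fixed seed so the example is reproducible
activations = [1.0] * 10

print(dropout(activations, rate=0.2, rng=rng))                  # some zeros, rest scaled up
print(dropout(activations, rate=0.2, rng=rng, training=False))  # unchanged
```

Because a different random subset is dropped at every step, no single unit can be relied on exclusively, which is the intuition behind the regularizing effect.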
Compiling the model
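The next step is to compile the model. The sparse categorical cross-entropy loss is used because the labels are integers rather than one-hot encoded vectors. If you one-hot encode the labels, you will have to use the categorical cross-entropy loss function instead. Because the final layer already applies softmax, the loss receives probabilities, so `from_logits` is set to `False`.
model.compile(
    optimizer="adam",
    # The model's last layer applies softmax, so its outputs are probabilities, not logits.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=["accuracy"],
)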
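The difference between the two losses is only in how the label is represented, which the sketch below tries to make concrete. The helper functions are simplified, hypothetical stand-ins for the Keras losses, and the probabilities are invented for the example.

```python
import math


def one_hot(label, num_classes):
    """Turn an integer label into a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def sparse_categorical_crossentropy(label, probs, eps=1e-7):
    """Loss for an integer label: look up the probability of the true class."""
    return -math.log(max(probs[label], eps))


def categorical_crossentropy(one_hot_label, probs, eps=1e-7):
    """Loss for a one-hot label: sum over all classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_label, probs))


# Hypothetical prediction over 4 classes; the true class is index 3.
probs = [0.1, 0.1, 0.2, 0.6]
label = 3

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot(label, 4), probs)
print(sparse_loss, dense_loss)  # the two values agree
```

The integer form avoids building a vector per label, which is convenient when a dataset such as CIFAR-10 ships with integer class ids.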
How to halt training at the right time with Early Stopping
Left to train for more epochs than needed, your model will most likely overfit on the training set. One of the ways to avoid that is to stop the training process when the model stops improving. This is done by monitoring the loss or the accuracy. To achieve that, the Keras EarlyStopping callback is used. By default, the callback monitors the validation loss. Patience is the number of epochs without improvement in the monitored metric to wait before stopping the training process. This callback will be used at the training stage. The callbacks should be passed as a list, even if it’s just one callback.
from tensorflow.keras.callbacks import EarlyStopping
callbacks = [
    EarlyStopping(patience=2)
]
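The patience logic can be sketched without Keras. The helper below is hypothetical and simplified (it ignores options such as a minimum improvement threshold): it walks through a list of per-epoch validation losses and reports the epoch at which training would stop. The loss values are invented for the example.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training stops, given per-epoch validation losses."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # patience exhausted
    return len(val_losses)  # never triggered: ran all epochs


# Hypothetical validation losses: the best value is reached at epoch 3.
val_losses = [1.70, 1.40, 1.30, 1.35, 1.32, 1.50]
print(early_stopping_epoch(val_losses, patience=2))  # 5
```

Note that epoch 5 is better than epoch 4 but still not better than the best value so far, so it counts as a second epoch without improvement.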
How to save the best model automatically
You might also be interested in automatically saving the best model or model weights during training. That can be done using a Keras ModelCheckpoint callback. With `save_best_only=True`, the callback checks the monitored metric after each epoch and overwrites the saved model only when the metric improves. You can instruct it to save the entire model or just the model weights. By default, it monitors the validation loss.
checkpoint_filepath = "/tmp/checkpoint"
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    monitor="val_loss",  # keep the model with the lowest validation loss
    mode="min",
    save_best_only=True,
)
Your callbacks will now look like this.
callbacks = [
    EarlyStopping(patience=2),
    model_checkpoint_callback,
]
After training, you can load the model again using the Keras `load_model` utility.
another_saved_model = tf.keras.models.load_model(checkpoint_filepath)
Training the model
Let’s now fit the model to the training set. The validation set is passed as well because the callbacks monitor the validation loss. In this case, you can define many epochs, but the training process will be stopped by the callback when the validation loss doesn’t improve for 2 epochs, as declared in the EarlyStopping callback.
history = model.fit(
    X_train,
    y_train,
    epochs=600,
    validation_data=(X_test, y_test),
    callbacks=callbacks,
)
How to plot model learning curves
Learning curves are important because they can inform you whether the model is learning or overfitting. If the validation loss increases significantly or the validation accuracy drops sharply, then your model is most likely overfitting. Since the result of `fit` was saved into the `history` variable, you can use it to access the losses and accuracy and plot them. You can also store them in a Pandas DataFrame.
import pandas as pd
metrics_df = pd.DataFrame(history.history)
Let’s now look at how you would plot the training and validation loss.
metrics_df[["loss", "val_loss"]].plot();
The same can be done for the training and validation accuracy.
metrics_df[["accuracy", "val_accuracy"]].plot();
How to save and load your model
You might be interested in saving the model for later use. Saving the model is important so that you don’t have to train the model again. This is especially critical for image models that take a long period to train. The H5 format is a common format for saving Keras models.
model.save("model.h5")
The Keras `load_model` is used for loading the model again.
load_saved_model = tf.keras.models.load_model("model.h5")
load_saved_model.summary()
How to accelerate training with Batch Normalization
The network you trained here was relatively small. However, in other cases, you might have to train a very deep neural network. Training such a network can be very slow. The training process can be sped up using Batch Normalization. It normalizes a layer's inputs so that their mean is close to 0 and their standard deviation is close to 1. The mean and variance are computed using the current batch of inputs. Since Batch Normalization offers some form of regularization, it is often used without Dropout. Here’s what the model looks like after adding the batch normalization layer.
model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)
The batch normalization layer works differently during training than during prediction and evaluation. During training the layer is called with `training=True`, while during prediction and evaluation it is called with `training=False`. When training, normalization is done using the mean and standard deviation of the current batch of inputs. At inference, i.e. prediction and evaluation, normalization is done using a moving average of the mean and the standard deviation of the batches seen during training. When fine-tuning a pretrained model that contains this layer, the batch normalization layers should be kept in inference mode (`training=False`). Otherwise the stored mean and standard deviation will be disrupted and much of the prior learning can be lost.
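The two modes can be sketched with a small class that normalizes a batch of scalar values. This is a deliberately reduced illustration, not the Keras layer: the class name is hypothetical, the learnable scale and offset parameters are omitted, and the `momentum` and `epsilon` values are assumed defaults.

```python
import math


class BatchNorm1D:
    """Simplified batch normalization over a list of scalars (no learnable scale/offset)."""

    def __init__(self, momentum=0.99, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, batch, training):
        if training:
            # Training mode: use statistics of the current batch ...
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ... and fold them into the moving averages used later at inference.
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # Inference mode: use the stored moving averages, leave them untouched.
            mean, var = self.moving_mean, self.moving_var
        return [(x - mean) / math.sqrt(var + self.epsilon) for x in batch]


layer = BatchNorm1D()
train_out = layer([2.0, 4.0, 6.0], training=True)
print(train_out)                          # centered on 0
print(layer.moving_mean)                  # nudged slightly toward the batch mean of 4.0
print(layer([2.0, 4.0, 6.0], training=False))  # normalized with the moving statistics
```

Calling the layer in training mode changes its stored statistics, which is why running a pretrained model's normalization layers in training mode can disturb what was learned.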
Saving and Loading Models
Saving and loading models are very convenient. This enables you to develop and train a model, save it to the file system and then load it at some future time for use. This section will cover the basic operations for saving and loading models.
Saving Models
You can easily save a model using the `save()` method, which will save the model to the file system in the ‘SavedModel’ format. This method creates a folder on the file system. Within this folder, the model architecture and training configuration (including the optimizer, losses, and metrics) are stored in `saved_model.pb`. The `variables/` folder contains a standard training checkpoint file that includes the weights of the model. We will delve into these details in later modules. For now, let’s save the trained model, and then we’ll load it in the next code cell with a different name and continue using it in the remainder of the post.
# Using the save() method, the model will be saved to the file system in the 'SavedModel' format.
model_dropout.save("model_dropout")
INFO:tensorflow:Assets written to: CFIRAR_Classifier/assets
Loading Models
from tensorflow.keras import models
reloaded_model_dropout = models.load_model("model_dropout")
Introduction to CNN
Yann LeCun, who went on to direct Facebook’s AI Research group, is a pioneer of convolutional neural networks. In the late 1980s he built LeNet, one of the first convolutional neural networks. LeNet was used for character recognition tasks like reading zip codes and digits.
Have you ever wondered how facial recognition works on social media, or how object detection helps in building self-driving cars, or how disease detection is done using visual imagery in healthcare? It’s all possible thanks to convolutional neural networks (CNNs). Here’s an example of convolutional neural networks that illustrates how they work:
Imagine there’s an image of a bird, and you want to identify whether it’s really a bird or some other object. The first thing you do is feed the pixels of the image in the form of arrays to the input layer of the neural network (multi-layer networks used to classify things). The hidden layers carry out feature extraction by performing different calculations and manipulations. There are multiple hidden layers, like the convolution layer, the ReLU layer, and the pooling layer, that perform feature extraction from the image. Finally, there’s a fully connected layer that identifies the object in the image.
Fig: Convolutional Neural Network to identify the image of a bird
CNN Model Implementation in Keras
In this section, we will define a simple CNN model in Keras and train it on the CIFAR-10 dataset. Recall from a previous post the following steps required to define and train a model in Keras.
 Build/Define a network model using predefined layers in Keras.
 Compile the model with `model.compile()`
 Train the model with `model.fit()`
Model Structure
Before we get into the coding details, let’s first take a look at the general structure of the model we’re proposing. Notice that the model has a similar structure to VGG16 but has fewer layers and a much smaller input image size, and therefore far fewer trainable parameters. The model contains three convolutional blocks followed by a fully connected layer and an output layer. For reference, we’ve included the number of channels at key points in the architecture. We have also indicated the spatial size of the activation maps at the end of each convolutional block. This is a good visual to refer back to when studying the code below.
For convenience, we’re going to define the model in a function. Notice that the function has one optional argument: the input shape for the model. We first start by instantiating the model by calling the `Sequential()` constructor. This allows us to build a model sequentially by adding one layer at a time. Notice that we define three convolutional blocks and that their structure is very similar.
Define the Convolutional Blocks for the CNN
Let’s start with the very first convolutional layer in the first convolutional block. To define a convolutional layer in Keras, we create a `Conv2D()` layer, which takes several input arguments. First, we define the layer to have 32 filters. The kernel size for each filter is 3 (which is interpreted as 3×3). We use a padding option called `same`, which will pad the input tensor so that the output of the convolution operation has the same spatial size as the input. This is not required, but it’s commonly used. If you don’t explicitly specify this padding option, then the default behavior has no padding, and therefore the spatial size of the output from the convolutional layer will be slightly smaller than the input size. We use a `ReLU` activation function in all the layers in the network except for the output layer.
For the very first convolutional layer, we need to specify the shape of the input. For all subsequent layers this is not necessary, since the shape of the input is automatically computed based on the shape of the output from the previous layer. The first block has two convolutional layers with 32 filters each, followed by a max pooling layer with a window size of 2×2, so the output shape from this first convolutional block is 16×16×32. The second convolutional block is nearly identical to the first, except that it has 64 filters in each convolutional layer instead of 32. Finally, the third convolutional block is an exact copy of the second.
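The spatial sizes quoted above follow from two small formulas, sketched below in plain Python. The helper names are hypothetical; the formulas assume a stride of 1 for the convolutions and a non-overlapping 2×2 pooling window, which matches the description of the blocks.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution along one axis."""
    if padding == "same":
        return math.ceil(size / stride)   # input is padded so size is preserved at stride 1
    return (size - kernel) // stride + 1  # "valid": no padding, the output shrinks


def pool_output_size(size, pool=2, stride=None):
    """Spatial output size of a pooling layer along one axis."""
    stride = stride or pool
    return (size - pool) // stride + 1


size = 32
for block, filters in enumerate([32, 64, 64], start=1):
    size = conv_output_size(size, kernel=3)  # first conv in the block
    size = conv_output_size(size, kernel=3)  # second conv in the block
    size = pool_output_size(size)            # 2x2 max pooling halves the size
    print(f"block {block}: {size}x{size}x{filters}")

print("flattened length:", size * size * 64)
print("without padding, one 3x3 conv would give:", conv_output_size(32, 3, padding="valid"))
```

Tracing the three blocks gives 16×16×32, 8×8×64 and 4×4×64, and the last of these flattens to a vector of length 1024.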
Note
The number of filters in each convolutional layer is something that you will need to experiment with. A larger number of filters allows the model to have a greater learning capacity, but this also needs to be balanced with the amount of data available to train the model. Adding too many filters (or layers) can lead to overfitting, one of the most common issues encountered when training models.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D


def cnn_model(input_shape=(32, 32, 3)):
    model = Sequential()

    # ------------------------------------
    # Conv Block 1: 32 Filters, MaxPool.
    # ------------------------------------
    model.add(Conv2D(filters=32, kernel_size=3, padding="same", activation="relu", input_shape=input_shape))
    model.add(Conv2D(filters=32, kernel_size=3, padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # ------------------------------------
    # Conv Block 2: 64 Filters, MaxPool.
    # ------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding="same", activation="relu"))
    model.add(Conv2D(filters=64, kernel_size=3, padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # ------------------------------------
    # Conv Block 3: 64 Filters, MaxPool.
    # ------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding="same", activation="relu"))
    model.add(Conv2D(filters=64, kernel_size=3, padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # ------------------------------------
    # Flatten the convolutional features.
    # ------------------------------------
    model.add(Flatten())
    model.add(Dense(512, activation="relu"))
    model.add(Dense(10, activation="softmax"))

    return model
Define the Classifier for the CNN
Before we define the fully connected layers for the classifier, we first need to flatten the two-dimensional activation maps produced by the last convolutional layer (which have a spatial shape of 4×4 with 64 channels). This is accomplished by adding a Flatten() layer, which creates a one-dimensional vector of length 1024 (4 × 4 × 64). We then add a densely connected layer with 512 neurons and a fully connected output layer with ten neurons, because we have ten classes in our dataset. To avoid any confusion, we’ve also provided a detailed diagram of the fully connected layers.
Create the Model
We can now create an instance of the model by calling the function above and use the summary() method to display the model summary to the console.
# Create the model.
model = cnn_model()
model.summary()
Metal device set to: Apple M1 Max
Model: "sequential"
_________________________________________________________________
 Layer (type)                      Output Shape          Param #
=================================================================
 conv2d (Conv2D)                   (None, 32, 32, 32)    896
 conv2d_1 (Conv2D)                 (None, 32, 32, 32)    9248
 max_pooling2d (MaxPooling2D)      (None, 16, 16, 32)    0
 conv2d_2 (Conv2D)                 (None, 16, 16, 64)    18496
 conv2d_3 (Conv2D)                 (None, 16, 16, 64)    36928
 max_pooling2d_1 (MaxPooling2D)    (None, 8, 8, 64)      0
 conv2d_4 (Conv2D)                 (None, 8, 8, 64)      36928
 conv2d_5 (Conv2D)                 (None, 8, 8, 64)      36928
 max_pooling2d_2 (MaxPooling2D)    (None, 4, 4, 64)      0
 flatten (Flatten)                 (None, 1024)          0
 dense (Dense)                     (None, 512)           524800
 dense_1 (Dense)                   (None, 10)            5130
=================================================================
Total params: 669,354
Trainable params: 669,354
Non-trainable params: 0
_________________________________________________________________
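The parameter counts in the summary can be reproduced by hand. The sketch below is plain Python and does not call Keras. It assumes the standard counting rule (one weight per kernel element and input channel, plus one bias per filter or unit), and the helper names `conv2d_params` and `dense_params` are made up for this illustration.

```python
def conv2d_params(kernel_size, in_channels, filters):
    # Each filter has kernel_size * kernel_size * in_channels weights plus one bias.
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(in_units, out_units):
    # One weight per input-output pair plus one bias per output unit.
    return (in_units + 1) * out_units


# (input channels, filters) for the six 3x3 convolutional layers, in order.
conv_layers = [(3, 32), (32, 32), (32, 64), (64, 64), (64, 64), (64, 64)]
conv_counts = [conv2d_params(3, c_in, f) for c_in, f in conv_layers]

# Three 2x2 max pooling layers halve the 32x32 input three times: 32 -> 16 -> 8 -> 4.
spatial = 32 // 2 // 2 // 2
flat_units = spatial * spatial * 64

dense_counts = [dense_params(flat_units, 512), dense_params(512, 10)]
total_params = sum(conv_counts) + sum(dense_counts)

print(conv_counts)   # per-layer convolution parameters
print(flat_units)    # length of the flattened vector
print(dense_counts)  # parameters of the two dense layers
print(total_params)  # should match "Total params" in the summary
```

Most of the parameters sit in the first dense layer, which is one reason the size of the flattened vector matters when designing the classifier.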
What is the TensorFlow Framework?
Google released TensorFlow as open source in November 2015. They describe it as an open-source machine learning framework for everyone, for several reasons.
- Open-source: released under the Apache 2.0 open-source license. This allows researchers, organizations, and developers to contribute to the library and build upon it with few restrictions.
- Machine learning framework: meaning that it has a set of libraries and tools that support the building process of machine learning models.
- For everyone: Using TensorFlow makes the implementation of machine learning models easier through common programming languages like Python. Furthermore, built-in libraries such as Keras make it even easier to create robust deep learning models.
All these functionalities make TensorFlow a good candidate for building neural networks.
Furthermore, installing TensorFlow 2 is straightforward and can be done with the Python package manager pip (pip install tensorflow), as explained in the official documentation.
After the installation, we can see that the version being used is 2.9.1.
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
Now, let’s further explore the main components for creating those networks.
Model definition
Let’s now create the convolutional neural network that will be used to classify the images. It will be similar to the previous one with a few cosmetic changes.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model = Sequential([
    data_augmentation,  # augmentation layer defined earlier
    tf.keras.layers.experimental.preprocessing.Rescaling(1./255),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(1, activation='sigmoid')
])
The notable changes are:
- the application of the augmentation layer
- using the `Rescaling` layer to scale the images in the model definition
What is a Convolutional Neural Network?
A convolutional neural network is a feed-forward neural network that is generally used to analyze visual images by processing data with a grid-like topology. It’s also known as a ConvNet. A convolutional neural network is used to detect and classify objects in an image.
Below is a neural network that identifies two types of flowers: Orchid and Rose.
In a CNN, every image is represented in the form of an array of pixel values.
The convolution operation forms the basis of any convolutional neural network. Let’s understand the convolution operation using two one-dimensional arrays, a and b.
a = [5,3,7,5,9,7]
b = [1,2,3]
In the convolution operation, the arrays are multiplied element-wise, and the products are summed to create a new array, which represents a*b.
The first three elements of array a are multiplied with the elements of array b. The products are summed to get the result.
The next three elements from array a are multiplied by the elements of array b, and the products are summed up.
This process continues until the convolution operation is complete.
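The procedure above can be sketched in a few lines of plain Python. This sketch assumes the window moves one position at a time (a stride of 1), which the text does not state explicitly, and the function name `sliding_dot_product` is made up for this illustration. Note that the kernel is not flipped, so strictly speaking this is a cross-correlation, which is what deep learning frameworks usually compute under the name "convolution".

```python
def sliding_dot_product(signal, kernel, stride=1):
    """Multiply each window of `signal` element-wise with `kernel` and sum the products."""
    k = len(kernel)
    return [
        sum(s * w for s, w in zip(signal[start:start + k], kernel))
        for start in range(0, len(signal) - k + 1, stride)
    ]


a = [5, 3, 7, 5, 9, 7]
b = [1, 2, 3]

result = sliding_dot_product(a, b)
print(result)  # [32, 32, 44, 44]
```

The first entry comes from 5·1 + 3·2 + 7·3 = 32. The output has four elements because a window of length 3 fits into an array of length 6 in four positions.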
Compile the Model
The next step is to compile the model, where we specify the optimizer type and loss function and any additional metrics we would like recorded during training. Here we specify RMSProp as the optimizer type for gradient descent, and we use a cross-entropy loss function, which is the standard loss function for classification problems. We specifically use categorical_crossentropy since our labels are one-hot encoded. Finally, we specify accuracy as an additional metric to record during training. The value of the loss function is always recorded by default, but if you want accuracy, you need to specify it.
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'],
              )
Train the Model
Since the dataset does not include a validation dataset, and since we did not previously split the training dataset to create one, we will use the validation_split argument below so that 30% of the training dataset is automatically reserved for validation. In this case, this approach reserves the last 30% of the training dataset for validation. This is a very convenient approach, but if the training dataset has any specific ordering (say, ordered by classes), you will need to take steps to randomize the order before splitting.
history = model.fit(X_train,
                    y_train,
                    batch_size=TrainingConfig.BATCH_SIZE,
                    epochs=TrainingConfig.EPOCHS,
                    verbose=1,
                    validation_split=.3,
                    )
Epoch 1/31
2023-01-16 07:36:41.659504: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2023-01-16 07:36:42.049455: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
137/137 [==============================] - ETA: 0s - loss: 1.9926 - accuracy: 0.2704
2023-01-16 07:36:45.531645: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
137/137 [==============================] - 5s 28ms/step - loss: 1.9926 - accuracy: 0.2704 - val_loss: 1.7339 - val_accuracy: 0.3640
Epoch 2/31
137/137 [==============================] - 3s 25ms/step - loss: 1.5905 - accuracy: 0.4254 - val_loss: 1.4228 - val_accuracy: 0.4887
Epoch 3/31
137/137 [==============================] - 3s 25ms/step - loss: 1.3743 - accuracy: 0.5076 - val_loss: 1.2851 - val_accuracy: 0.5380
:
:
Epoch 29/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0569 - accuracy: 0.9814 - val_loss: 2.0920 - val_accuracy: 0.7137
Epoch 30/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0543 - accuracy: 0.9837 - val_loss: 2.1936 - val_accuracy: 0.7253
Epoch 31/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0520 - accuracy: 0.9834 - val_loss: 2.1732 - val_accuracy: 0.7227
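Because validation_split holds out the last portion of the data before any shuffling, a class-ordered dataset should be shuffled first. The following is a minimal sketch of that idea in plain Python. The helper `shuffled_split`, its parameters, and the toy data are hypothetical, and with real NumPy arrays one would more likely index with a permuted index array.

```python
import random


def shuffled_split(samples, labels, validation_fraction=0.3, seed=42):
    """Shuffle samples and labels together, then hold out the last fraction."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # same permutation for samples and labels

    n_val = int(len(indices) * validation_fraction)
    split_at = len(indices) - n_val
    train_idx, val_idx = indices[:split_at], indices[split_at:]

    def take(seq, idx):
        return [seq[i] for i in idx]

    return (take(samples, train_idx), take(labels, train_idx),
            take(samples, val_idx), take(labels, val_idx))


# Toy data ordered by class: without shuffling, the last 30% would all be class 2.
samples = list(range(10))
labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]

x_tr, y_tr, x_val, y_val = shuffled_split(samples, labels)
print(len(x_tr), len(x_val))
```

Using one permutation for both lists keeps every sample paired with its own label, which is the property that matters here.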
Plot the Training Results
The function below is a convenience function to plot training and validation losses and training and validation accuracies. It has a single required argument which is a list of metrics to plot.
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter

def plot_results(metrics, title=None, ylabel=None, ylim=None, metric_name=None, color=None):
    fig, ax = plt.subplots(figsize=(15, 4))

    # Wrap a single metric and its name in lists so the loop below works.
    if not (isinstance(metric_name, list) or isinstance(metric_name, tuple)):
        metrics = [metrics,]
        metric_name = [metric_name,]

    for idx, metric in enumerate(metrics):
        ax.plot(metric, color=color[idx])

    plt.xlabel("Epoch")
    plt.ylabel(ylabel)
    plt.title(title)
    plt.xlim([0, TrainingConfig.EPOCHS - 1])
    plt.ylim(ylim)

    # Tailor x-axis tick marks.
    ax.xaxis.set_major_locator(MultipleLocator(5))
    ax.xaxis.set_major_formatter(FormatStrFormatter('%d'))
    ax.xaxis.set_minor_locator(MultipleLocator(1))

    plt.grid(True)
    plt.legend(metric_name)
    plt.show()
    plt.close()
The loss and accuracy metrics can be accessed from the history object returned from the fit method. We access the metrics using predefined dictionary keys, as shown below.
# Retrieve training results.
train_loss = history.history["loss"]
train_acc = history.history["accuracy"]
valid_loss = history.history["val_loss"]
valid_acc = history.history["val_accuracy"]

plot_results([train_loss, valid_loss],
             ylabel="Loss",
             ylim=[0.0, 5.0],
             metric_name=["Training Loss", "Validation Loss"],
             color=["g", "b"])

plot_results([train_acc, valid_acc],
             ylabel="Accuracy",
             ylim=[0.0, 1.0],
             metric_name=["Training Accuracy", "Validation Accuracy"],
             color=["g", "b"])
The results from our baseline model reveal that the model is overfitting. Notice that the validation loss increases after about ten epochs of training while the training loss continues to decline. This means that the network learns how to model the training data well but does not generalize to unseen test data well. The accuracy plot shows a similar trend where the validation accuracy levels off after about ten epochs while the training accuracy continues to approach 100% as training progresses. This is a common problem when training neural networks and can occur for a number of reasons. One reason is that the model can fit the nuances of the training dataset, especially when the training dataset is small.
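One common way to act on this pattern is to stop training once the validation loss has stopped improving for a few epochs. The sketch below illustrates a patience-based rule in plain Python. The function name, the patience value, and the loss values are all hypothetical and are not taken from the training run above. Keras ships a callback for this purpose (tf.keras.callbacks.EarlyStopping), which is not used here.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-based, for a patience-based rule."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: remember where it happened.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop here.
            return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses that fall at first, then rise as overfitting sets in.
val_losses = [1.73, 1.42, 1.29, 1.10, 0.98, 1.01, 1.05, 1.12, 1.30]

stop_epoch, best_epoch = early_stopping_epoch(val_losses)
print(stop_epoch, best_epoch)  # 8 5
```

In this toy run the best validation loss occurs at epoch 5, and training would stop at epoch 8 after three epochs without improvement.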
Model Evaluation
There are several things we can do to evaluate the trained model further. We can compute the model’s accuracy on the test dataset. We can visually inspect the results on a subset of the images in a dataset and plot the confusion matrix for a dataset. Let’s take a look at all three examples.
Evaluate the Model on the Test Dataset
We can now compute the loss and accuracy of the model on all the test images, as shown in the code below. Here, we call the evaluate() method, which returns the loss followed by each metric specified when the model was compiled. To look at individual predictions instead, you can call the predict() method, select a specific index from the test set, and print out the predicted scores for each class; the highest score is usually associated with the correct class indicated by the ground truth.
test_loss, test_acc = reloaded_model_dropout.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc*100:.3f}")
313/313 [==============================] - 3s 9ms/step - loss: 0.6736 - accuracy: 0.7833
Test accuracy: 78.330
Make Predictions on Sample Test Images
Here we create a convenience function that will allow us to evaluate the model on a subset of images from a dataset and display the results visually.
def evaluate_model(dataset, model):
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']
    num_rows = 3
    num_cols = 6

    # Retrieve a number of images from the dataset.
    data_batch = dataset[0:num_rows*num_cols]

    # Get predictions from model.
    predictions = model.predict(data_batch)

    plt.figure(figsize=(20, 8))
    num_matches = 0

    for idx in range(num_rows*num_cols):
        ax = plt.subplot(num_rows, num_cols, idx + 1)
        plt.axis("off")
        plt.imshow(data_batch[idx])

        pred_idx = tf.argmax(predictions[idx]).numpy()
        # y_test holds the one-hot ground truth labels; argmax recovers the class index.
        truth_idx = int(np.argmax(y_test[idx]))

        title = str(class_names[truth_idx]) + " : " + str(class_names[pred_idx])
        title_obj = plt.title(title, fontdict={'fontsize': 13})

        # Green title for a correct prediction, red for a wrong one.
        if pred_idx == truth_idx:
            num_matches += 1
            plt.setp(title_obj, color='g')
        else:
            plt.setp(title_obj, color='r')

    acc = num_matches/(idx+1)
    print("Prediction accuracy: ", int(100*acc)/100)
    return

evaluate_model(X_test, reloaded_model_dropout)
1/1 [==============================] - 0s 18ms/step
Prediction accuracy: 0.77
Confusion Matrix
A confusion matrix is a very common metric that is used to summarize the results of a classification problem. The information is presented in the form of a table or matrix where one axis represents the ground truth labels for each class, and the other axis represents the predicted labels from the network. The entries in the table represent the number of instances from an experiment (which are sometimes represented as percentages rather than counts). Generating a confusion matrix in TensorFlow is accomplished by calling the function tf.math.confusion_matrix(), which takes two required arguments: the list of ground truth labels and the associated predicted labels.
# Generate predictions for the test dataset.
predictions = reloaded_model_dropout.predict(X_test)

# For each sample image in the test dataset, select the class label with the highest probability.
predicted_labels = [np.argmax(i) for i in predictions]
313/313 [==============================] - 2s 6ms/step
# Convert one-hot encoded labels to integers.
y_test_integer_labels = tf.argmax(y_test, axis=1)

# Generate a confusion matrix for the test dataset.
cm = tf.math.confusion_matrix(labels=y_test_integer_labels, predictions=predicted_labels)

# Plot the confusion matrix as a heatmap.
import seaborn as sn
plt.figure(figsize=[14, 7])
sn.heatmap(cm, annot=True, fmt='d', annot_kws={"size": 12})
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Truth')
plt.show()
A confusion matrix is a content-rich representation of a model’s performance at the class level. It can be very informative to better understand where the model performs well and where it may have more difficulty. For example, a few things stand out right away. Two of the ten classes tend to be misclassified more than others: dogs and cats. More specifically, a large percentage of the time, the model confuses these two classes with each other. Let’s take a closer look. The ground truth label for a cat is 3, and the ground truth label for a dog is 5. Notice that when the input image is a cat (index 3), it is most often misclassified as a dog, with 176 misclassified samples. When the input image is a dog (index 5), it is most often misclassified as a cat, with 117 samples.
Also, notice that the last row, which represents trucks, is most often confused with automobiles. All of these observations make intuitive sense, given the similarity of the classes involved.
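To make the structure of the matrix concrete, here is a minimal plain-Python sketch that builds a confusion matrix from two label lists and finds the largest off-diagonal entry, that is, the most frequent confusion. The function names and the small three-class label lists are hypothetical; rows index the ground truth and columns the prediction, matching the heatmap axes above.

```python
def build_confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, pred in zip(true_labels, predicted_labels):
        matrix[truth][pred] += 1
    return matrix


def most_confused_pair(matrix):
    """Return (truth, predicted, count) for the largest off-diagonal entry."""
    best = (None, None, -1)
    for truth, row in enumerate(matrix):
        for pred, count in enumerate(row):
            if truth != pred and count > best[2]:
                best = (truth, pred, count)
    return best


# Hypothetical labels for a 3-class problem.
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 1, 1]

cm_small = build_confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm_small:
    print(row)
print(most_confused_pair(cm_small))  # (2, 1, 2): class 2 predicted as class 1, twice
```

The diagonal holds the correct predictions, so reading a row from left to right shows how the samples of one true class were distributed over the predicted classes.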
Load the CIFAR-10 Dataset
The CIFAR-10 dataset consists of 60,000 color images from 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. Several sample images are shown below, along with the class names.
Since the CIFAR-10 dataset is included in TensorFlow, we can load the dataset using the load_data() function as shown in the code cell below and confirm the number of samples and the shape of the data.
from tensorflow.keras.datasets import cifar10

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print(X_train.shape)
print(X_test.shape)
Display Sample Images from the Dataset
It’s always a good idea to inspect some images in a dataset, as shown below. Remember, the images in CIFAR-10 are quite small, only 32×32 pixels, so while they don’t have a lot of detail, there’s still enough information in these images to support an image classification task.
plt.figure(figsize=(18, 9))
num_rows = 4
num_cols = 8

# Plot each of the images in the batch.
for i in range(num_rows*num_cols):
    ax = plt.subplot(num_rows, num_cols, i + 1)
    plt.imshow(X_train[i, :, :])
    plt.axis("off")
CNN Step-by-Step Implementation
Let’s put everything we have learned previously into practice. This section will illustrate the end-to-end implementation of a convolutional neural network in TensorFlow applied to the CIFAR-10 dataset, which is a built-in dataset with the following properties:
- It contains 60,000 32 by 32 color images
- The dataset has 10 different classes
- Each class has 6,000 images
- There are overall 50,000 training images
- And overall 10,000 testing images
The source code of the article is available on DataCamp’s workspace
Architecture of the network
Before getting into the technical implementation, let’s first understand the overall architecture of the network being implemented.
- The input of the model is a 32×32×3 tensor, corresponding to the width, height, and channels.
- We will have two convolutional layers. The first layer applies 32 filters of size 3×3 each and a ReLU activation function. The second one applies 64 filters of size 3×3, also with a ReLU activation function.
- The first pooling layer will apply a 2×2 max pooling
- The second pooling layer will apply a 2×2 max pooling as well
- The fully connected layer will have 128 units and a ReLU activation function
- Finally, the output will be 10 units corresponding to the 10 classes, and the activation function is a softmax to generate the probability distributions.
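The spatial size of each layer's output in this architecture can be worked out by hand. The sketch below assumes the Keras defaults that apply when nothing else is specified: "valid" padding (no padding) and a stride of 1 for the convolutions, and a pooling stride equal to the pool size. The helper names are made up for this illustration.

```python
def conv_output_size(size, kernel, padding="valid", stride=1):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1


def pool_output_size(size, pool):
    """Spatial output size of non-overlapping pooling (stride equals pool size)."""
    return size // pool


size = 32
shapes = []
for filters in (32, 64):
    size = conv_output_size(size, kernel=3)  # 3x3 convolution, no padding
    shapes.append((size, size, filters))
    size = pool_output_size(size, pool=2)    # 2x2 max pooling
    shapes.append((size, size, filters))

flattened_units = size * size * 64

for shape in shapes:
    print(shape)
print(flattened_units)  # input length of the 128-unit fully connected layer
```

Under these assumptions the feature maps shrink from 32×32 to 30×30, 15×15, 13×13, and finally 6×6, so the fully connected layer receives 6 × 6 × 64 = 2,304 values.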
Load dataset
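The built-in dataset is loaded from the keras.datasets module as follows:
from tensorflow.keras.datasets import cifar10 as cf10  # assumed import: cf10 is taken to be an alias for the CIFAR-10 dataset module

(train_images, train_labels), (test_images, test_labels) = cf10.load_data()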
Exploratory Data Analysis
In this section, we will focus solely on showing some sample images since we already know the proportion of each class in both the training and testing data.
The helper function show_images() shows a total of 12 images by default and takes three main parameters:
 The training images
 The class names
 And the training labels.
import matplotlib.pyplot as plt

def show_images(train_images, class_names, train_labels, nb_samples=12, nb_row=4):
    plt.figure(figsize=(12, 12))
    for i in range(nb_samples):
        plt.subplot(nb_row, nb_row, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_images[i], cmap=plt.cm.binary)
        plt.xlabel(class_names[train_labels[i][0]])
    plt.show()
Now, we can call the function with the required parameters.
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

show_images(train_images, class_names, train_labels)
A successful execution of the previous code generates the images below.
Data preprocessing
Prior to training the model, we need to normalize the pixel values of the data in the same range (e.g. 0 to 1). This is a common preprocessing step when dealing with images to ensure scale invariance, and faster convergence during the training.
max_pixel_value = 255

train_images = train_images / max_pixel_value
test_images = test_images / max_pixel_value
Also, the labels are stored as integer class indices (0 to 9), each standing for a category such as cat, horse, bird, and so on. We need to convert them into one-hot encoded vectors so that they match the ten-unit softmax output and the categorical cross-entropy loss used later.
from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_labels, len(class_names))
test_labels = to_categorical(test_labels, len(class_names))
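To see what this conversion produces, here is a minimal plain-Python sketch of one-hot encoding. It is an illustration of the idea rather than the Keras implementation, and the function name `one_hot` and the three sample labels are hypothetical. CIFAR-10 labels arrive with shape (N, 1), which is why each label below is wrapped in its own list.

```python
def one_hot(labels, num_classes):
    """Turn integer class indices into vectors with a single 1.0 at the class position."""
    vectors = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        vectors.append(row)
    return vectors


# Three hypothetical labels in the (N, 1) layout: cat (3), horse (7), bird (2).
raw_labels = [[3], [7], [2]]
flat_labels = [row[0] for row in raw_labels]

encoded = one_hot(flat_labels, num_classes=10)
for vector in encoded:
    print(vector)
```

Taking the index of the largest entry of each vector recovers the original integer label, which is what the np.argmax(..., axis=1) calls later in the tutorial rely on.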
Model architecture implementation
The next step is to implement the architecture of the network based on the previous description.
First, we define the model using the Sequential() class, and each layer is added to the model with the add() function.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Variables
INPUT_SHAPE = (32, 32, 3)
FILTER1_SIZE = 32
FILTER2_SIZE = 64
FILTER_SHAPE = (3, 3)
POOL_SHAPE = (2, 2)
FULLY_CONNECT_NUM = 128
NUM_CLASSES = len(class_names)

# Model architecture implementation
model = Sequential()
model.add(Conv2D(FILTER1_SIZE, FILTER_SHAPE, activation='relu', input_shape=INPUT_SHAPE))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Conv2D(FILTER2_SIZE, FILTER_SHAPE, activation='relu'))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Flatten())
model.add(Dense(FULLY_CONNECT_NUM, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))
After applying the summary() function to the model, we get a comprehensive summary of the model’s architecture with information about each layer, its type, output shape and the total number of trainable parameters.
Model training
All the resources are finally available to configure and trigger the training of the model. This is done respectively with the compile() and fit() functions, which take the following parameters:
- The optimizer is responsible for updating the model’s weights and biases. In our case, we are using the Adam optimizer.
- The loss function is used to measure the misclassification errors, and we are using categorical cross-entropy.
- Finally, the metrics are used to measure the performance of the model, and accuracy, precision, and recall will be displayed in our use case.
from tensorflow.keras.metrics import Precision, Recall

BATCH_SIZE = 32
EPOCHS = 30
METRICS = ['accuracy', Precision(name='precision'), Recall(name='recall')]

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=METRICS)

# Train the model
training_history = model.fit(train_images, train_labels,
                             epochs=EPOCHS,
                             batch_size=BATCH_SIZE,
                             validation_data=(test_images, test_labels))
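As a rough illustration of what the categorical cross-entropy loss measures, the sketch below computes it by hand for a single sample with three classes. This is plain Python, not the Keras implementation. The function names, the clipping constant `eps`, and the example scores are assumptions made for this illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, probabilities))


# Hypothetical raw scores for a 3-class problem; class 0 gets the highest score.
probs = softmax([2.0, 1.0, 0.1])

loss_if_class_0 = categorical_crossentropy([1, 0, 0], probs)  # model agrees with the label
loss_if_class_2 = categorical_crossentropy([0, 0, 1], probs)  # model disagrees with the label

print(probs)
print(loss_if_class_0, loss_if_class_2)
```

The loss is small when the model assigns a high probability to the true class and grows quickly as that probability approaches zero, which is the signal the optimizer uses to adjust the weights.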
Model evaluation
After the model training, we can compare its performance on both the training and testing datasets by plotting the above metrics using the show_performance_curve() helper function in two dimensions.
 The horizontal axis (x) is the number of epochs
 The vertical one (y) is the underlying performance of the model.
 The curve represents the value of the metrics at a specific epoch.
For better visualization, a vertical red line is drawn through the intersection of the training and validation performance values along with the optimal value.
import numpy as np
import matplotlib.pyplot as plt

def show_performance_curve(training_result, metric, metric_label):
    train_perf = training_result.history[str(metric)]
    validation_perf = training_result.history['val_' + str(metric)]

    # First epoch at which the training and validation values are within about 1e-2
    # of each other (raises IndexError if the two curves never get that close).
    intersection_idx = np.argwhere(np.isclose(train_perf, validation_perf, atol=1e-2)).flatten()[0]
    intersection_value = train_perf[intersection_idx]

    plt.plot(train_perf, label=metric_label)
    plt.plot(validation_perf, label='val_' + str(metric))
    plt.axvline(x=intersection_idx, color='r', linestyle='--', label='Intersection')

    plt.annotate(f'Optimal Value: {intersection_value:.4f}',
                 xy=(intersection_idx, intersection_value),
                 xycoords='data',
                 fontsize=10,
                 color='green')

    plt.xlabel('Epoch')
    plt.ylabel(metric_label)
    plt.legend(loc='lower right')
Then, the function is applied for both the accuracy and the precision of the model.
show_performance_curve(training_history, 'accuracy', 'accuracy')
show_performance_curve(training_history, 'precision', 'precision')
After training the model without any fine-tuning and preprocessing, we end up with:
- An accuracy score of 67.09%, meaning that the model correctly classifies about 67 out of every 100 samples.
- And, a precision of 76.55%, meaning that out of every 100 positive predictions, almost 77 are true positives, and the remaining 23 are false positives.
- These scores are achieved respectively at the third and second epochs for accuracy and precision.
These two metrics give a global understanding of the model behavior.
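Precision and recall both come from simple counts, as the following plain-Python sketch shows. The function name is made up, and the counts are hypothetical round numbers chosen to mirror the "77 out of 100 positive predictions" reading above; they are not the model's actual counts. How Keras aggregates these counts over the ten output units is not covered here.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall


# Hypothetical counts: 100 positive predictions, 77 of them correct, 33 positives missed.
precision, recall = precision_recall(true_positives=77, false_positives=23, false_negatives=33)
print(round(precision, 2), round(recall, 2))  # 0.77 0.7
```

Precision answers "how many of the positive predictions were right", while recall answers "how many of the actual positives were found", so the two can differ substantially for the same model.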
What if we want to know which classes the model is good at predicting and which ones it struggles with?
This can be achieved from the confusion matrix, which shows for each class the number of correct and wrong predictions. The implementation is given below. We start by making predictions on the test data, then compute the confusion matrix and show the final result.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

test_predictions = model.predict(test_images)

# Convert predicted probabilities and one-hot labels back to class indices.
test_predicted_labels = np.argmax(test_predictions, axis=1)
test_true_labels = np.argmax(test_labels, axis=1)

cm = confusion_matrix(test_true_labels, test_predicted_labels)
cmd = ConfusionMatrixDisplay(confusion_matrix=cm)

cmd.plot(include_values=True, cmap='viridis', ax=None, xticks_rotation='horizontal')
plt.show()
- Classes 0, 1, 6, 7, 8, 9, respectively, for airplane, automobile, frog, horse, ship, and truck have the highest values at the diagonal. This means that the model is better at predicting those classes.
- On the other hand, it seems to struggle with the remaining classes.
- The highest off-diagonal values show which classes the model confuses with each other. For instance, it confuses birds (class 2) with airplanes, and automobiles with trucks (class 9).
Learn more about confusion matrix from our tutorial understanding confusion matrix in R, which takes course material from DataCamp’s Machine Learning toolbox course.
This model can be improved with additional tasks such as:
- Image augmentation
- Transfer learning using pre-trained models such as ResNet, MobileNet, or VGG. Our Transfer learning tutorial explains what transfer learning is and some of its applications in real life.
- Applying different regularization techniques such as L1, L2 or dropout.
- Fine-tuning different hyperparameters such as the learning rate, the batch size, and the number of layers in the network.
What are Tensors?
We mainly deal with high-dimensional data when building machine learning and deep learning models. Tensors are multi-dimensional arrays with a uniform type used to represent different features of the data.
Below is the graphical representation of the different types of dimensions of tensors.
- A 0-dimensional tensor contains a single value.
- A 1-dimensional tensor, also known as a “rank-1” tensor, is a list of values.
- A 2-dimensional tensor is a “rank-2” tensor.
- Finally, we can have an N-dimensional tensor, where N represents the number of dimensions within the tensor. In the previous cases, N is respectively 0, 1 and 2.
Below is an illustration of zero- to two-dimensional tensors. Each tensor is created using the constant() function from TensorFlow.
# Zero dimensional tensor
zero_dim_tensor = tf.constant(20)
print(zero_dim_tensor)

# One dimensional tensor
one_dim_tensor = tf.constant([12, 20, 53, 26, 11, 56])
print(one_dim_tensor)

# Two dimensional tensor
two_dim_array = [[3, 6, 7, 5],
                 [9, 2, 3, 4],
                 [7, 1, 10, 6],
                 [0, 8, 11, 2]]
two_dim_tensor = tf.constant(two_dim_array)
print(two_dim_tensor)
A successful execution of the previous code should generate the outputs below, and we can notice the keyword “tf.Tensor” to mean that the result is a tensor. It has three attributes:
- The actual value of the tensor.
- The shape of the tensor, which is (), (6,), and (4, 4), respectively for the first, second, and third tensors.
- The data type represented by the dtype attribute, and all the tensors are int32.
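The link between nesting depth, rank, and shape can be illustrated without TensorFlow. The sketch below computes the shape of a regularly nested Python list. The helper `shape_of` is hypothetical and assumes that every sub-list at a given depth has the same length, as a tensor requires.

```python
def shape_of(value):
    """Return the shape of a regularly nested list as a tuple; scalars give ()."""
    if not isinstance(value, list):
        return ()
    if not value:
        return (0,)
    # Length at this level, followed by the shape of the first element.
    return (len(value),) + shape_of(value[0])


scalar = 20
vector = [12, 20, 53, 26, 11, 56]
matrix = [[3, 6, 7, 5],
          [9, 2, 3, 4],
          [7, 1, 10, 6],
          [0, 8, 11, 2]]

for value in (scalar, vector, matrix):
    shape = shape_of(value)
    print("rank:", len(shape), "shape:", shape)
```

The rank is simply the length of the shape tuple, which is why a scalar has the empty shape () and rank 0.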
Our Tensorflow Tutorial for Beginners provides a complete overview of TensorFlow and teaches how to build and train models.
Architectures of CNNs
You don’t always have to design your convolutional neural networks from scratch. Instead, you can try architectures developed by experts that have proven to perform well on many image tasks.
Several of these architectures can be accessed via Keras applications. These applications have also been pre-trained on the ImageNet dataset, which contains over a million images. This makes these applications robust enough for use in the real world. When instantiating the model, you have the choice whether to include the pre-trained weights or not. When the weights are used, you can start using the model for classification right away. Other ways of using the pre-trained models are:
- extracting features and passing them to a new model
- fine-tuning a new model
Let’s take a look at how you can load the Xception architecture without weights. Since weights are not included, you can use your dataset to train the model.
model = tf.keras.applications.Xception(
    include_top=True,
    weights=None,  # random initialization; the default is "imagenet"
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
When you load the model with weights, you can start using it for prediction right away. The weights are stored in this location `~/.keras/models/`.
model = tf.keras.applications.Xception(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
After that, you can process the image and run the predictions. The Keras applications provide a function for doing that. Each of the architectures dictates the size of the image that should be passed to it. You should always confirm that from its documentation. Next, convert the image into an array and expand its dimensions in order to include the batch size.
from tensorflow.keras.preprocessing import image
import numpy as np

!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/b/b5/Lion_d%27Afrique.jpg \
    -O /tmp/lion.jpg

img_path = '/tmp/lion.jpg'
img = image.load_img(img_path, target_size=(299, 299))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = tf.keras.applications.xception.preprocess_input(x)

preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', tf.keras.applications.xception.decode_predictions(preds, top=3)[0])
The final step is to decode the predictions and print the results.
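Conceptually, decoding the top predictions means ranking the class probabilities and keeping the best few together with their names. The sketch below shows that idea in plain Python. It is not the Keras decode_predictions implementation, and the label names and probabilities are hypothetical.

```python
def top_k(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical class names and predicted probabilities for one image.
labels = ["lion", "tiger", "cheetah", "tabby cat"]
probabilities = [0.80, 0.05, 0.10, 0.05]

top3 = top_k(probabilities, labels)
for name, prob in top3:
    print(f"{name}: {prob:.2f}")
```

The real function additionally maps each of the 1,000 ImageNet output indices to its class identifier and human-readable description.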
Conclusion
In this post, we learned how to use TensorFlow and Keras to define and train a simple convolutional neural network. We showed that the model overfit the training data, and we learned how to use dropout layers to reduce the overfitting and improve the model’s performance on the validation dataset. We also covered how to save and load models to and from the file system. Finally, we reviewed three techniques used to evaluate the model on the test dataset.
Convolutional Neural Networks (CNN) have been used in state-of-the-art computer vision tasks such as face detection and self-driving cars. In this article, let’s take a look at the concepts required to understand CNNs in TensorFlow. Later you will also dive into some TensorFlow CNN examples.
There are 4 modules in this course
In the first course in this specialization, you had an introduction to TensorFlow, and how, with its high level APIs you could do basic image classification, and you learned a little bit about Convolutional Neural Networks (ConvNets). In this course you’ll go deeper into using ConvNets with real-world data, and learn about techniques that you can use to improve your ConvNet performance, particularly when doing image classification! In Week 1, this week, you’ll get started by looking at a much larger dataset than you’ve been using thus far: The Cats and Dogs dataset which had been a Kaggle Challenge in image classification!
You’ve heard the term overfitting a number of times to this point. Overfitting is simply the concept of being over specialized in training — namely that your model is very good at classifying what it is trained for, but not so good at classifying things that it hasn’t seen. In order to generalize your model more effectively, you will of course need a greater breadth of samples to train it on. That’s not always possible, but a nice potential shortcut to this is Image Augmentation, where you tweak the training set to potentially increase the diversity of subjects it covers. You’ll learn all about that this week!
Building models for yourself is great, and can be very powerful. But, as you’ve seen, you can be limited by the data you have on hand. Not everybody has access to massive datasets or the compute power that’s needed to train them effectively. Transfer learning can help solve this: other people train models on large datasets and share them, so that you can either use those models directly or take the features they have learned and apply them to your scenario. This is Transfer learning, and you’ll look into that this week!
You’ve come a long way, Congratulations! One more thing to do before we move off of ConvNets to the next module, and that’s to go beyond binary classification. Each of the examples you’ve done so far involved classifying one thing or another — horse or human, cat or dog. When moving beyond binary into Categorical classification there are some coding considerations you need to take into account. You’ll look at them this week!
What are Tensors?
We mainly deal with high-dimensional data when building machine learning and deep learning models. Tensors are multidimensional arrays with a uniform type used to represent different features of the data.
Below is the graphical representation of the different types of dimensions of tensors.
 A 0-dimensional tensor contains a single value.
 A 1-dimensional tensor, also known as a “rank-1” tensor, is a list of values.
 A 2-dimensional tensor is a “rank-2” tensor.
 Finally, we can have an N-dimensional tensor, where N represents the number of dimensions within the tensor. In the previous cases, N is respectively 0, 1 and 2.
Below is an illustration of zero-, one-, and two-dimensional tensors. Each tensor is created using the constant() function from TensorFlow.
import tensorflow as tf

# Zero-dimensional tensor
zero_dim_tensor = tf.constant(20)
print(zero_dim_tensor)

# One-dimensional tensor
one_dim_tensor = tf.constant([12, 20, 53, 26, 11, 56])
print(one_dim_tensor)

# Two-dimensional tensor
two_dim_array = [[3, 6, 7, 5],
                 [9, 2, 3, 4],
                 [7, 1, 10, 6],
                 [0, 8, 11, 2]]
two_dim_tensor = tf.constant(two_dim_array)
print(two_dim_tensor)
A successful execution of the previous code should generate the outputs below. The keyword “tf.Tensor” indicates that each result is a tensor, and each one is printed with three pieces of information:
 The actual value of the tensor.
 The shape of the tensor, which is (), (6,), and (4, 4) for the first, second, and third tensors, respectively.
 The data type, given by the dtype attribute. All three tensors are int32.
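To make the shape and rank values concrete, here is a minimal sketch that uses NumPy arrays as a stand-in for the three tensors above. Using NumPy instead of TensorFlow is an assumption made only so the snippet runs without TensorFlow installed; the values are the ones from the example.

```python
import numpy as np

# NumPy arrays standing in for the tf.constant tensors above (same values).
zero_dim = np.array(20, dtype=np.int32)
one_dim = np.array([12, 20, 53, 26, 11, 56], dtype=np.int32)
two_dim = np.array([[3, 6, 7, 5],
                    [9, 2, 3, 4],
                    [7, 1, 10, 6],
                    [0, 8, 11, 2]], dtype=np.int32)

for name, t in [("zero_dim", zero_dim), ("one_dim", one_dim), ("two_dim", two_dim)]:
    # ndim is the rank; shape holds one entry per dimension.
    print(name, "rank:", t.ndim, "shape:", t.shape, "dtype:", t.dtype)
```

The rank is simply the length of the shape tuple, which is why a single value has the empty shape ().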
TensorFlow: Constants, Variables, and Placeholders
Constants are not the only types of tensors. There are also variables and placeholders, which are all building blocks of a computational graph.
A computational graph is basically a representation of a sequence of operations and the flow of data between them.
Now, let’s understand the difference between these types of tensors.
Constants
Constants are tensors whose values do not change during the execution of the computational graph. They are created using the tf.constant() function and are mainly used to store fixed parameters that do not require any change during the model training.
Variables
Variables are tensors whose value can be changed during the execution of the computational graph and they are created using the tf.Variable() function. For instance, in the case of neural networks, weights, and biases can be defined as variables since they need to be updated during the training process.
Placeholders
These were used in the first version of TensorFlow as empty containers that do not have specific values. They are just used to reserve a spot for data to be supplied later. This gives users the freedom to use different datasets and batch sizes during model training and validation.
In TensorFlow version 2, placeholders are no longer needed: with eager execution, data is passed directly as function arguments, and tf.function() can compile a Python function into a graph. This is a more Pythonic and dynamic approach to feeding data into the computational graph.
Use case implementation using CNN
We’ll be using the CIFAR-10 dataset from the Canadian Institute For Advanced Research for classifying images across 10 categories using a CNN.
1. Download the data set:
2. Import the CIFAR data set:
3. Read the label names:
4. Display the images using matplotlib:
5. Use the helper function to handle data:
6. Create the model:
7. Apply the helper functions:
8. Create the layers for convolution and pooling:
9. Create the flattened layer by reshaping the pooling layer:
10. Create a fully connected layer:
11. Set the output to y_pred variable:
12. Apply the loss function:
13. Create the optimizer:
14. Create a variable to initialize all the global variables:
15. Run the model by creating a graph session:
Dataset Preprocessing
We normalize the image data to the range [0,1]. This is very common when working with image data and helps the model train more efficiently. We also convert the integer labels to one-hot encoded labels, as discussed in previous videos.
from tensorflow.keras.utils import to_categorical

# Normalize images to the range [0, 1].
X_train = X_train.astype("float32") / 255
X_test = X_test.astype("float32") / 255

# Change the labels from integer to categorical data.
print('Original (integer) label for the first training sample: ', y_train[0])

# Convert labels to one-hot encoding.
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

print('After conversion to categorical one-hot encoded labels: ', y_train[0])
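To see what these two preprocessing steps do to the data, here is a minimal NumPy sketch. The one_hot helper and the sample pixel and label values are hypothetical and only illustrate the idea; it is not the implementation of to_categorical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (number of labels, num_classes) float32 one-hot matrix."""
    labels = np.asarray(labels).reshape(-1)          # accept shape (n,) or (n, 1)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0    # one 1 per row, at the label index
    return encoded

# Hypothetical pixel values covering the full uint8 range.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = pixels.astype("float32") / 255              # now in [0, 1]

# Hypothetical integer labels in the (n, 1) layout used by CIFAR-10.
labels = np.array([[6], [9], [1]])
encoded = one_hot(labels, num_classes=10)

print(scaled)
print(encoded[0])   # a 1 at index 6, zeros elsewhere
```

Each row of the encoded matrix has the same length as the 10-unit softmax output, which is what a categorical cross-entropy loss expects.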
Conclusion
In this post, we learned how to use TensorFlow and Keras to define and train a simple convolutional neural network. We showed that the model overfit the training data, and we learned how to use dropout layers to reduce the overfitting and improve the model’s performance on the validation dataset. We also covered how to save and load models to and from the file system. Finally, we reviewed three techniques used to evaluate the model on the test dataset.
Running CNNs with TensorFlow in the real world
Loading datasets from TensorFlow is quite straightforward. However, consider a situation where you have to load data from the real world. The process for doing so is a little different. In this section, let’s look at how you can use this dataset from Kaggle to build a convolutional neural network. The goal here will be to build a model that can classify images of cats and dogs. Once you have built this model, you can tweak it and repurpose it for other classification problems.
Loading the images
Let’s start by downloading the images into a temporary folder on the virtual machine provided by Google Colab. Using Colab, in this case, is advantageous because you can use GPU compute to speed the model training.
!wget --no-check-certificate \
    https://namespace.co.ke/ml/dataset.zip \
    -O /tmp/catsdogs.zip
The next step will be to unzip this dataset.
import os
import zipfile

with zipfile.ZipFile('/tmp/catsdogs.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp/cats_dogs')
After that, set the paths to the training and testing sets.
base_dir = '/tmp/cats_dogs/dataset'
train_dir = os.path.join(base_dir, 'training_set')
test_dir = os.path.join(base_dir, 'test_set')
You can list the folders in order to see their arrangement.
os.listdir(base_dir)
CNN Step-by-Step Implementation
Let’s put everything we have learned previously into practice. This section will illustrate the end-to-end implementation of a convolutional neural network in TensorFlow applied to the CIFAR-10 dataset, which is a built-in dataset with the following properties:
 It contains 60,000 color images of 32 by 32 pixels
 The dataset has 10 different classes
 Each class has 6,000 images
 There are 50,000 training images overall
 And 10,000 testing images overall
The source code of the article is available on DataCamp’s workspace
Architecture of the network
Before getting into the technical implementation, let’s first understand the overall architecture of the network being implemented.
 The input of the model is a 32x32x3 tensor, respectively, for the width, height, and channels.
 We will have two convolutional layers. The first applies 32 filters of size 3×3 with a ReLU activation function, and the second applies 64 filters of size 3×3, also with ReLU.
 The first pooling layer will apply a 2×2 max pooling
 The second pooling layer will apply a 2×2 max pooling as well
 The fully connected layer will have 128 units and a ReLU activation function
 Finally, the output will be 10 units corresponding to the 10 classes, and the activation function is a softmax to generate the probability distributions.
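The architecture above can be checked with some simple arithmetic before any model is built. The sketch below assumes 'valid' padding and a stride of 1 for the convolutions and non-overlapping 2×2 pooling (the Keras defaults, which is an assumption about how the layers are configured); the helper names are hypothetical.

```python
def conv_out(size, kernel):
    # 'valid' padding with stride 1: the filter must fit entirely inside the input.
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter.
    return kernel * kernel * in_channels * filters + filters

size, channels, total_params = 32, 3, 0

# Conv layer 1: 32 filters of 3x3, then 2x2 max pooling.
total_params += conv_params(3, channels, 32)
size, channels = conv_out(size, 3), 32      # 30 x 30 x 32
size //= 2                                  # 15 x 15 x 32

# Conv layer 2: 64 filters of 3x3, then 2x2 max pooling.
total_params += conv_params(3, channels, 64)
size, channels = conv_out(size, 3), 64      # 13 x 13 x 64
size //= 2                                  # 6 x 6 x 64

flat_features = size * size * channels      # input size of the fully connected layer
total_params += flat_features * 128 + 128   # fully connected layer with 128 units
total_params += 128 * 10 + 10               # output layer with 10 units

print("Flattened features:", flat_features)
print("Total trainable parameters:", total_params)
```

Most of the parameters sit in the fully connected layer, which is typical for small CNNs of this kind.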
Load dataset
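The built-in dataset is loaded from the keras.datasets module as follows:
# cf10 is an alias for the CIFAR-10 loader in keras.datasets
from tensorflow.keras.datasets import cifar10 as cf10

(train_images, train_labels), (test_images, test_labels) = cf10.load_data()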
Exploratory Data Analysis
In this section, we will focus solely on showing some sample images since we already know the proportion of each class in both the training and testing data.
The helper function show_images() shows a total of 12 images by default and takes three main parameters:
 The training images
 The class names
 And the training labels.
import matplotlib.pyplot as plt

def show_images(train_images, class_names, train_labels, nb_samples=12, nb_row=4):
    plt.figure(figsize=(12, 12))
    for i in range(nb_samples):
        plt.subplot(nb_row, nb_row, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_images[i], cmap=plt.cm.binary)
        plt.xlabel(class_names[train_labels[i][0]])
    plt.show()
Now, we can call the function with the required parameters.
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

show_images(train_images, class_names, train_labels)
A successful execution of the previous code generates the images below.
Data preprocessing
Prior to training the model, we need to normalize the pixel values of the data in the same range (e.g. 0 to 1). This is a common preprocessing step when dealing with images to ensure scale invariance, and faster convergence during the training.
max_pixel_value = 255

train_images = train_images / max_pixel_value
test_images = test_images / max_pixel_value
Also, the labels are stored as integer class indices (0 to 9) that stand for categories such as cat, horse, bird, and so on. We need to convert them into one-hot vectors so that they match the 10-unit softmax output and can be processed by the categorical cross-entropy loss.
from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_labels, len(class_names))
test_labels = to_categorical(test_labels, len(class_names))
Model architecture implementation
The next step is to implement the architecture of the network based on the previous description.
First, we define the model using the Sequential() class, and each layer is added to the model with the add() function.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Variables
INPUT_SHAPE = (32, 32, 3)
FILTER1_SIZE = 32
FILTER2_SIZE = 64
FILTER_SHAPE = (3, 3)
POOL_SHAPE = (2, 2)
FULLY_CONNECT_NUM = 128
NUM_CLASSES = len(class_names)

# Model architecture implementation
model = Sequential()
model.add(Conv2D(FILTER1_SIZE, FILTER_SHAPE, activation='relu', input_shape=INPUT_SHAPE))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Conv2D(FILTER2_SIZE, FILTER_SHAPE, activation='relu'))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Flatten())
model.add(Dense(FULLY_CONNECT_NUM, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))
After applying the summary() function to the model, we get a comprehensive summary of the model’s architecture with information about each layer, its type, its output shape, and the total number of trainable parameters.
Model training
All the resources are finally available to configure and trigger the training of the model. This is done with the compile() and fit() functions, respectively, which take the following parameters:
 The optimizer is responsible for updating the model’s weights and biases. In our case, we are using the Adam optimizer.
 The loss function measures the misclassification errors, and we are using categorical cross-entropy.
 Finally, the metrics measure the performance of the model. Accuracy, precision, and recall will be displayed in our use case.
from tensorflow.keras.metrics import Precision, Recall

BATCH_SIZE = 32
EPOCHS = 30
METRICS = ['accuracy', Precision(name='precision'), Recall(name='recall')]

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=METRICS)

# Train the model
training_history = model.fit(train_images, train_labels, epochs=EPOCHS, batch_size=BATCH_SIZE, validation_data=(test_images, test_labels))
Model evaluation
After the model training, we can compare its performance on both the training and testing datasets by plotting the above metrics using the show_performance_curve() helper function in two dimensions.
 The horizontal axis (x) is the number of epochs
 The vertical one (y) is the underlying performance of the model.
 The curve represents the value of the metrics at a specific epoch.
For better visualization, a vertical red line is drawn through the intersection of the training and validation performance values along with the optimal value.
import numpy as np
import matplotlib.pyplot as plt

def show_performance_curve(training_result, metric, metric_label):
    train_perf = training_result.history[str(metric)]
    validation_perf = training_result.history['val_' + str(metric)]

    # First epoch at which the training and validation values are within 0.01 of each other.
    # Note: this raises an IndexError if the two curves never get that close.
    intersection_idx = np.argwhere(np.isclose(train_perf, validation_perf, atol=1e-2)).flatten()[0]
    intersection_value = train_perf[intersection_idx]

    plt.plot(train_perf, label=metric_label)
    plt.plot(validation_perf, label='val_' + str(metric))
    plt.axvline(x=intersection_idx, color='r', linestyle='--', label='Intersection')
    plt.annotate(f'Optimal Value: {intersection_value:.4f}', xy=(intersection_idx, intersection_value), xycoords='data', fontsize=10, color='green')
    plt.xlabel('Epoch')
    plt.ylabel(metric_label)
    plt.legend(loc='lower right')
Then, the function is applied for both the accuracy and the precision of the model.
show_performance_curve(training_history, 'accuracy', 'accuracy')
show_performance_curve(training_history, 'precision', 'precision')
After training the model without any fine-tuning or further preprocessing, we end up with:
 An accuracy score of 67.09%, meaning that the model correctly classifies about 67 out of every 100 samples.
 A precision of 76.55%, meaning that out of every 100 positive predictions, almost 77 are true positives and the remaining 23 are false positives.
 These scores are reached at the third and second epochs for accuracy and precision, respectively.
These two metrics give a global understanding of the model’s behavior.
What if we want to know which classes the model is good at predicting and which ones it struggles with?
This can be achieved from the confusion matrix, which shows for each class the number of correct and wrong predictions. The implementation is given below. We start by making predictions on the test data, then compute the confusion matrix and show the final result.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

test_predictions = model.predict(test_images)

test_predicted_labels = np.argmax(test_predictions, axis=1)
test_true_labels = np.argmax(test_labels, axis=1)

cm = confusion_matrix(test_true_labels, test_predicted_labels)
cmd = ConfusionMatrixDisplay(confusion_matrix=cm)

cmd.plot(include_values=True, cmap='viridis', ax=None, xticks_rotation='horizontal')
plt.show()
 Classes 0, 1, 6, 7, 8, and 9 (airplane, automobile, frog, horse, ship, and truck, respectively) have the highest values on the diagonal. This means that the model is better at predicting those classes.
 On the other hand, it seems to struggle with the remaining classes: 2, 3, 4, and 5 (bird, cat, deer, and dog).
 The highest off-diagonal values show which classes the model confuses with one another. For instance, it confuses birds (class 2) with airplanes, and automobiles with trucks (class 9).
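The same reading of the matrix can be done numerically. The sketch below uses a small hypothetical 3-class confusion matrix (made-up counts, not results from the model above) with rows as true classes and columns as predicted classes, which is the layout scikit-learn uses.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
cm = np.array([[8, 1, 1],
               [2, 6, 2],
               [0, 3, 7]])

true_positives = np.diag(cm)
recall = true_positives / cm.sum(axis=1)       # share of each true class that was found
precision = true_positives / cm.sum(axis=0)    # share of each predicted class that was right
accuracy = true_positives.sum() / cm.sum()

# Largest off-diagonal entry: the most frequent confusion.
off_diagonal = cm.copy()
np.fill_diagonal(off_diagonal, 0)
true_cls, pred_cls = np.unravel_index(off_diagonal.argmax(), off_diagonal.shape)

print("Per-class recall:", recall)
print("Per-class precision:", precision)
print("Overall accuracy:", accuracy)
print(f"Most common confusion: true class {true_cls} predicted as class {pred_cls}")
```

Per-class recall and precision make it easier to rank classes than reading raw counts off the plot, especially when classes have different numbers of samples.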
This model can be improved with additional tasks such as:
 Image augmentation
 Transfer learning using pretrained models such as ResNet, MobileNet, or VGG. Our transfer learning tutorial explains what transfer learning is and some of its applications in real life.
 Applying different regularization techniques such as L1, L2, or dropout.
 Fine-tuning different hyperparameters such as the learning rate, the batch size, and the number of layers in the network.
Conclusion
This article has given a complete overview of CNNs in TensorFlow, providing details about each layer of the CNN architecture. It also gave a brief introduction to TensorFlow and how it helps machine learning engineers and researchers build sophisticated neural networks.
We applied all these skills to a real-world scenario: a multiclass classification task.
Our beginner’s guide to object detection could be a great next step to further your learning about computer vision. It explores the key components in object detection and explains how to implement the SSD and Faster R-CNN models available in TensorFlow.
Tensors vs Matrices: Differences
Many people confuse tensors with matrices. Even though these two objects look similar, they have completely different properties. This section provides a better understanding of the difference between matrices and tensors.
 We can think of a matrix as a tensor with only two dimensions.
 Tensors, on the other hand, are a more general format that can have any number of dimensions.
As opposed to matrices, tensors are more suitable for deep learning problems for the following reasons:
 They can deal with any number of dimensions, which makes them a better fit for multidimensional data.
 Tensors are compatible with a wide range of data types, shapes, and dimensions, which makes them more versatile than matrices.
 TensorFlow provides GPU and TPU support to speed up computations. Using tensors, machine learning engineers can automatically take advantage of these benefits.
 Tensors natively support broadcasting, which means performing arithmetic operations between tensors of different but compatible shapes. This is not always possible with plain matrix operations.
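As a small illustration of broadcasting, here is a sketch in NumPy, whose broadcasting rules TensorFlow follows. The array shapes and values are hypothetical: a batch of two 4×4 RGB images and one mean value per color channel.

```python
import numpy as np

# Hypothetical batch of 2 RGB images of 4x4 pixels, all ones for simplicity.
images = np.ones((2, 4, 4, 3), dtype=np.float32)

# One value per color channel, shape (3,).
channel_mean = np.array([0.5, 0.25, 0.0], dtype=np.float32)

# The (3,) vector is broadcast across the batch, height, and width dimensions.
centered = images - channel_mean

print(centered.shape)       # same shape as the image batch
print(centered[0, 0, 0])    # the three channel values of one pixel
```

No explicit loop or tiling of channel_mean is needed; the shapes are aligned from the last dimension backwards.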
What is the TensorFlow Framework?
Google released TensorFlow as open source in November 2015. They define it as an open-source machine learning framework for everyone, for several reasons.
 Open-source: released under the Apache 2.0 open-source license. This allows researchers, organizations, and developers to use the library and contribute to it by building upon it, subject only to the terms of that license.
 Machine learning framework: meaning that it has a set of libraries and tools that support the building process of machine learning models.
 For everyone: Using TensorFlow makes the implementation of machine learning models easier through common programming languages like Python. Furthermore, built-in libraries such as Keras make it even easier to create robust deep learning models.
All these functionalities make Tensorflow a good candidate for building neural networks.
Furthermore, installing TensorFlow 2 is straightforward and can be done with the Python package manager pip (pip install tensorflow), as explained in the official documentation.
After the installation, we can see that the version being used is 2.9.1:
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
Now, let’s further explore the main components for creating those networks.
Quick Tutorial: Building a Basic Convolutional Neural Network (CNN) in TensorFlow
This quick tutorial can help you get started implementing a CNN in TensorFlow. It is based on the Fashion-MNIST dataset, which contains 70,000 grayscale images of 28 x 28 pixels showing fashion products in 10 categories. The loader used below yields 55,000 images in the training set and 10,000 images in the test set, with the remaining 5,000 held out for validation. Our code is based on the full tutorial by Aditya Sharma.
Loading Data
First import all the necessary modules: NumPy, matplotlib and TensorFlow, then import the Fashion-MNIST data as follows:
# TensorFlow 1.x only: the tutorials module is not part of TensorFlow 2.
from tensorflow.examples.tutorials.mnist import input_data

# Read the data from the data/fashion directory, retrieving the
# Fashion-MNIST files from the Amazon S3 bucket given in source_url.
data = input_data.read_data_sets('data/fashion', one_hot=True,
                                 source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')
CNN Architecture
We will use three convolutional layers, progressively adding more filters. All the filters are 3×3:
 Layer with 32 filters
 Layer with 64 filters
 Layer with 128 filters
In addition, we’ll have three 2×2 max-pooling layers, one after each convolution.
We’ll set basic hyperparameters of the CNN model:
training_iters = 10
learning_rate = 0.001
batch_size = 128
The batch size tells TensorFlow how many images to process in each training step; every batch contains that many images.
Neural Network Parameters
The number of inputs to the CNN is 784, because the images have 784 pixels and are read as a 784 dimensional vector. We will rebuild this vector into a matrix of 28 x 28 x 1.
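Here is a minimal sketch of that reshaping step using NumPy. The batch of two flattened images is hypothetical and filled with running numbers so that the pixel order is easy to follow.

```python
import numpy as np

# Hypothetical batch of 2 flattened images, 784 values each.
batch = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)

# -1 lets NumPy infer the batch dimension; each image becomes 28 x 28 x 1.
images = batch.reshape(-1, 28, 28, 1)

print(images.shape)
print(images[0, 1, 0, 0])   # first pixel of the second row of the first image
```

The reshape only changes how the 784 values are indexed: each consecutive run of 28 values becomes one image row.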
# Use this to specify 28 inputs, and 10 classes for the predicted label at the end
n_input = 28
n_classes = 10
Here we define an input placeholder x. Each image arrives as a 784-dimensional vector and is rebuilt into a 28 x 28 x 1 matrix, so x has shape None x 28 x 28 x 1. Similarly, we’ll define a placeholder y for the labels of the training images, which will be a None x 10 matrix.
We set the first (“row”) dimension to None so that the placeholders take their row size from whatever batch is fed in. During training, the row size will be 128, the batch_size defined above.
# x is the input placeholder, holding images rebuilt into 28x28x1 matrices
x = tf.placeholder("float", [None, 28, 28, 1])
# y is the label placeholder, with one column per class
y = tf.placeholder("float", [None, n_classes])
Wrapper Functions
Because we have several layers of the same type in the model, it’s useful to create a wrapper function for each type of layer, to avoid duplicating code. You can get functions like this out of the box with Keras, which is included with TensorFlow. However, in this tutorial we show you how to do things from scratch in TensorFlow without Keras helper functions.
Here is a function creating a 2-dimensional convolutional layer, with bias and ReLU activation. The arguments are the input images x, weights W, bias b, and the stride, meaning how far the filter moves over the image at each step of the convolution.
def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)
Here is another function creating a 2D max-pool layer. The parameters are the input x and k, which specifies the kernel/filter size.
def maxpool2d(x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
Now let’s define weights and biases.
weights = {
    'wc1': tf.get_variable('W0', shape=(3,3,1,32), initializer=tf.contrib.layers.xavier_initializer()),
    'wc2': tf.get_variable('W1', shape=(3,3,32,64), initializer=tf.contrib.layers.xavier_initializer()),
    'wc3': tf.get_variable('W2', shape=(3,3,64,128), initializer=tf.contrib.layers.xavier_initializer()),
    'wd1': tf.get_variable('W3', shape=(4*4*128,128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('W6', shape=(128,n_classes), initializer=tf.contrib.layers.xavier_initializer()),
}
biases = {
    'bc1': tf.get_variable('B0', shape=(32), initializer=tf.contrib.layers.xavier_initializer()),
    'bc2': tf.get_variable('B1', shape=(64), initializer=tf.contrib.layers.xavier_initializer()),
    'bc3': tf.get_variable('B2', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'bd1': tf.get_variable('B3', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('B4', shape=(10), initializer=tf.contrib.layers.xavier_initializer()),
}
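The 4*4*128 in the shape of the fully connected weights follows from the three pooling layers. A rough sketch of the arithmetic, assuming 'SAME' padding with a stride equal to the pool size as in the wrapper functions (the helper name is hypothetical):

```python
import math

def same_pool_out(size, k):
    # With padding='SAME' and stride k, the output size is ceil(size / k).
    return math.ceil(size / k)

size = 28          # a 'SAME' convolution with stride 1 keeps the 28 x 28 size
sizes = []
for _ in range(3): # three conv + max-pool stages
    size = same_pool_out(size, 2)
    sizes.append(size)

flat_features = size * size * 128   # 128 filters in the last convolutional layer

print(sizes)
print(flat_features)
```

The spatial size shrinks from 28 to 14, 7, and finally 4, so the flattened input to the fully connected layer has 4 * 4 * 128 values per image.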
Building the CNN
Now we build the CNN by feeding the weights and biases into the wrapper functions.
def conv_net(x, weights, biases):
    # First convolutional layer with 32 3×3 filters and 32 biases,
    # followed by a max-pool layer with the kernel size set to 2.
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Second convolutional layer with 64 3×3 filters and 64 biases,
    # followed by another max-pool layer.
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # Third convolutional layer with 128 3×3 filters and 128 biases,
    # followed by the last max-pool layer.
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    conv3 = maxpool2d(conv3, k=2)

    # Fully connected layer that will generate prediction labels.
    # reshape() adapts the pooling output to the input expected by this layer;
    # -1 lets TensorFlow infer the batch dimension.
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])

    # Apply the ReLU function, then multiply by the output weights
    # and add the output bias to get one score per class.
    fc1 = tf.nn.relu(fc1)
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
Loss and Optimizer Nodes
First build the model using the conv_net() function we showed above. Pass in x, weights, and biases:
pred = conv_net(x, weights, biases)
This is a multiclass classification problem, so we will use the softmax activation function, which gives a probability between 0 and 1 for each class label (the label with the highest probability will be the prediction of the model). We’ll use cross-entropy as the loss function.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
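To show what this loss computes, here is a minimal NumPy sketch of softmax followed by cross-entropy, averaged over a batch. It illustrates the idea only and is not the TensorFlow implementation; the logits and labels are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row maximum keeps the exponentials numerically stable.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, one_hot_labels):
    # Negative log-probability assigned to the true class, averaged over the batch.
    per_example = -np.sum(one_hot_labels * np.log(probs), axis=1)
    return per_example.mean()

# Hypothetical raw scores (logits) for 2 images and 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = cross_entropy(probs, labels)

print(probs.sum(axis=1))       # each row sums to 1
print(probs.argmax(axis=1))    # predicted class per image
print(loss)
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1, which is what the optimizer pushes the network toward.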
Finally, we’ll define the Adam optimizer with a learning rate of 0.001 as defined in the model hyperparameters above:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
Evaluate the Model
To test the model, we first initialize weights and biases, and then define a correct_prediction and accuracy node that will evaluate model performance every time it is run.
init = tf.global_variables_initializer()
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Now you can start the computation graph, and run a training session as follows:
 Create an outer for loop that runs for the number of training iterations specified above
 Create an inner for loop that steps through the batches, using the batch size specified above
 Pass training images and labels using the variables batch_x and batch_y
 Feed batch_x and batch_y into the x and y placeholders
 After each training iteration, run the loss function and check training accuracy
 After running through all the images, test accuracy by processing the 10,000 test images
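The two nested loops can be sketched without TensorFlow as follows. The iterate_batches helper and the zero-filled arrays are hypothetical placeholders for the real data; in the TF1 graph, the marked line is where the optimizer would be run with batch_x and batch_y fed into the placeholders.

```python
import numpy as np

def iterate_batches(images, labels, batch_size):
    """Yield consecutive (images, labels) slices of at most batch_size rows."""
    for start in range(0, len(images), batch_size):
        end = start + batch_size
        yield images[start:end], labels[start:end]

# Hypothetical data: 300 blank images with one-hot label rows.
images = np.zeros((300, 28, 28, 1), dtype=np.float32)
labels = np.zeros((300, 10), dtype=np.float32)

training_iters, batch_size = 2, 128
batch_sizes = []

for epoch in range(training_iters):                       # outer loop: training iterations
    for batch_x, batch_y in iterate_batches(images, labels, batch_size):
        # Here the session would run the optimizer with batch_x and batch_y.
        batch_sizes.append(len(batch_x))

print(batch_sizes)
```

The last batch of each pass is smaller when the dataset size is not a multiple of the batch size, which is one reason the placeholder row dimension is left as None.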
See the original tutorial by Aditya Sharma for the complete code that you can use to run the CNN model.
Adding Dropout to the Model
To help mitigate overfitting, we can employ one or more regularization strategies to help the model generalize better. Regularization techniques restrict the model’s flexibility so that it doesn’t overfit the training data. One approach is called Dropout, which is built into Keras. Dropout is implemented in Keras as a special layer type that randomly drops a percentage of neurons during the training process. When dropout is used in convolutional layers, it is usually applied after the max pooling layer and has the effect of eliminating a percentage of neurons in the feature maps. When used after a fully connected layer, a percentage of neurons in the fully connected layer are dropped.
In the diagram below, we add a dropout layer at the end of each convolutional block and also after the dense layer in the classifier. The input argument to the Dropout function is the fraction of neurons to (randomly) drop from the previous layer during the training process.
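The effect of a dropout layer during training can be sketched in a few lines of NumPy. This is an illustration of the common "inverted dropout" scheme, in which the surviving activations are scaled up by 1 / (1 - rate) so their expected sum is unchanged; the dropout function and the input values are hypothetical and this is not the Keras source.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero out roughly `rate` of the activations and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True where a unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 8), dtype=np.float32)        # hypothetical layer output
dropped = dropout(activations, rate=0.25, rng=rng)

print(dropped)
```

Every entry is either 0 (dropped) or 1 / 0.75 (kept and rescaled). At inference time no units are dropped and no scaling is applied.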
Define the Model (with Dropout)
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

def cnn_model_dropout(input_shape=(32, 32, 3)):

    model = Sequential()

    #------------------------------------
    # Conv Block 1: 32 Filters, MaxPool.
    #------------------------------------
    model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu', input_shape=input_shape))
    model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    #------------------------------------
    # Conv Block 2: 64 Filters, MaxPool.
    #------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    #------------------------------------
    # Conv Block 3: 64 Filters, MaxPool.
    #------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    #------------------------------------
    # Flatten the convolutional features.
    #------------------------------------
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))

    return model
Create the Model (with Dropout)
# Create the model.
model_dropout = cnn_model_dropout()
model_dropout.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                    Output Shape          Param #
=================================================================
 conv2d_6 (Conv2D)               (None, 32, 32, 32)    896
 conv2d_7 (Conv2D)               (None, 32, 32, 32)    9248
 max_pooling2d_3 (MaxPooling2D)  (None, 16, 16, 32)    0
 dropout (Dropout)               (None, 16, 16, 32)    0
 conv2d_8 (Conv2D)               (None, 16, 16, 64)    18496
 conv2d_9 (Conv2D)               (None, 16, 16, 64)    36928
 max_pooling2d_4 (MaxPooling2D)  (None, 8, 8, 64)      0
 dropout_1 (Dropout)             (None, 8, 8, 64)      0
 conv2d_10 (Conv2D)              (None, 8, 8, 64)      36928
 conv2d_11 (Conv2D)              (None, 8, 8, 64)      36928
 max_pooling2d_5 (MaxPooling2D)  (None, 4, 4, 64)      0
 dropout_2 (Dropout)             (None, 4, 4, 64)      0
 flatten_1 (Flatten)             (None, 1024)          0
 dense_2 (Dense)                 (None, 512)           524800
 dropout_3 (Dropout)             (None, 512)           0
 dense_3 (Dense)                 (None, 10)            5130
=================================================================
Total params: 669,354
Trainable params: 669,354
Non-trainable params: 0
_________________________________________________________________
Compile the Model (with Dropout)
model_dropout.compile(optimizer='rmsprop',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'],
                     )
Train the Model (with Dropout)
history = model_dropout.fit(X_train,
                            y_train,
                            batch_size=TrainingConfig.BATCH_SIZE,
                            epochs=TrainingConfig.EPOCHS,
                            verbose=1,
                            validation_split=.3,
                           )
Epoch 1/31
2023-01-16 07:38:29.760435: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
137/137 [==============================] - 5s 31ms/step - loss: 2.1302 - accuracy: 0.2181 - val_loss: 1.9788 - val_accuracy: 0.2755
Epoch 2/31
137/137 [==============================] - 4s 27ms/step - loss: 1.7749 - accuracy: 0.3647 - val_loss: 1.8633 - val_accuracy: 0.3332
Epoch 3/31
137/137 [==============================] - 4s 27ms/step - loss: 1.5430 - accuracy: 0.4442 - val_loss: 1.5015 - val_accuracy: 0.4795
:
:
Epoch 29/31
137/137 [==============================] - 4s 27ms/step - loss: 0.4626 - accuracy: 0.8359 - val_loss: 0.6721 - val_accuracy: 0.7832
Epoch 30/31
137/137 [==============================] - 4s 27ms/step - loss: 0.4584 - accuracy: 0.8380 - val_loss: 0.6638 - val_accuracy: 0.7847
Epoch 31/31
137/137 [==============================] - 4s 27ms/step - loss: 0.4427 - accuracy: 0.8449 - val_loss: 0.6598 - val_accuracy: 0.7863
Plot the Training Results
# Retrieve training results.
train_loss = history.history["loss"]
train_acc = history.history["accuracy"]
valid_loss = history.history["val_loss"]
valid_acc = history.history["val_accuracy"]

plot_results([train_loss, valid_loss],
             ylabel="Loss",
             ylim=[0.0, 5.0],
             metric_name=["Training Loss", "Validation Loss"],
             color=["g", "b"])

plot_results([train_acc, valid_acc],
             ylabel="Accuracy",
             ylim=[0.0, 1.0],
             metric_name=["Training Accuracy", "Validation Accuracy"],
             color=["g", "b"])
In the plots above, the training curves align very closely with the validation curves. Also, notice that we achieve a higher validation accuracy than the baseline model that did not contain dropout. Both sets of training plots are shown below for comparison.
Model Evaluation
There are several things we can do to evaluate the trained model further. We can compute the model’s accuracy on the test dataset. We can visually inspect the results on a subset of the images in a dataset and plot the confusion matrix for a dataset. Let’s take a look at all three examples.
Evaluate the Model on the Test Dataset
We can now evaluate the model on all the test images, as shown in the code below. Here, we call the
evaluate()
method to compute the loss and accuracy on the test set. To inspect individual predictions, you can also call predict(), select a specific index from the test set, and print out the predicted scores for each class. You can experiment by setting the test index to various values and see how the highest score is usually associated with the correct value indicated by the ground truth.
test_loss, test_acc = reloaded_model_dropout.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc*100:.3f}")
313/313 [==============================] – 3s 9ms/step – loss: 0.6736 – accuracy: 0.7833 Test accuracy: 78.330
Make Predictions on Sample Test Images
Here we create a convenience function that will allow us to evaluate the model on a subset of images from a dataset and display the results visually.
def evaluate_model(dataset, model):

    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']
    num_rows = 3
    num_cols = 6

    # Retrieve a number of images from the dataset.
    data_batch = dataset[0:num_rows*num_cols]

    # Get predictions from model.
    predictions = model.predict(data_batch)

    plt.figure(figsize=(20, 8))
    num_matches = 0

    for idx in range(num_rows*num_cols):
        ax = plt.subplot(num_rows, num_cols, idx + 1)
        plt.axis("off")
        plt.imshow(data_batch[idx])

        pred_idx = tf.argmax(predictions[idx]).numpy()
        # The ground truth is read from the global one-hot encoded y_test.
        truth_idx = int(np.argmax(y_test[idx]))

        title = str(class_names[truth_idx]) + " : " + str(class_names[pred_idx])
        title_obj = plt.title(title, fontdict={'fontsize': 13})

        # Green title for a correct prediction, red for an incorrect one.
        if pred_idx == truth_idx:
            num_matches += 1
            plt.setp(title_obj, color='g')
        else:
            plt.setp(title_obj, color='r')

    acc = num_matches/(idx+1)
    print("Prediction accuracy: ", int(100*acc)/100)

    return
evaluate_model(X_test, reloaded_model_dropout)
1/1 [==============================] – 0s 18ms/step Prediction accuracy: 0.77
Confusion Matrix
A confusion matrix is a very common metric that is used to summarize the results of a classification problem. The information is presented in the form of a table or matrix where one axis represents the ground truth labels for each class, and the other axis represents the predicted labels from the network. The entries in the table represent the number of instances from an experiment (which are sometimes represented as percentages rather than counts). Generating a confusion matrix in TensorFlow is accomplished by calling the
function tf.math.confusion_matrix()
, which takes two required arguments: the list of ground truth labels and the associated predicted labels.
# Generate predictions for the test dataset.
predictions = reloaded_model_dropout.predict(X_test)

# For each sample image in the test dataset, select the class label with the highest probability.
predicted_labels = [np.argmax(i) for i in predictions]
313/313 [==============================] – 2s 6ms/step
# Convert one-hot encoded labels to integers.
y_test_integer_labels = tf.argmax(y_test, axis=1)

# Generate a confusion matrix for the test dataset.
cm = tf.math.confusion_matrix(labels=y_test_integer_labels, predictions=predicted_labels)

# Plot the confusion matrix as a heatmap.
plt.figure(figsize=[14, 7])
import seaborn as sn
sn.heatmap(cm, annot=True, fmt='d', annot_kws={"size": 12})
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Truth')
plt.show()
A confusion matrix is a content-rich representation of a model’s performance at the class level. It can be very informative to better understand where the model performs well and where it may have more difficulty. For example, a few things stand out right away. Two of the ten classes tend to be misclassified more than others: dogs and cats. More specifically, a large percentage of the time, the model confuses these two classes with each other. Let’s take a closer look. The ground truth label for a cat is 3, and the ground truth label for a dog is 5. Notice that when the input image is a cat (index 3), it is most often misclassified as a dog, with 176 misclassified samples. When the input image is a dog (index 5), it is most often misclassified as a cat, with 117 samples.
Also, notice that the last row, which represents trucks, is most often confused with automobiles. So all of these observations make intuitive sense, given the similarity of the classes involved.
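To make the row and column convention concrete, here is a minimal sketch that builds a confusion matrix by hand for a made-up three-class problem. It uses plain Python rather than tf.math.confusion_matrix(), and the labels and the helper name most_confused_with are illustrative assumptions, not part of the CIFAR-10 experiment. Rows are ground truth and columns are predictions, matching the heatmap axes above.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, pred in zip(true_labels, predicted_labels):
        matrix[truth][pred] += 1
    return matrix


def most_confused_with(matrix, class_idx):
    """Return the wrong class most often predicted for class_idx (None if no errors)."""
    row = matrix[class_idx]
    errors = [(count, idx) for idx, count in enumerate(row)
              if idx != class_idx and count > 0]
    return max(errors)[1] if errors else None


# Hypothetical labels for a 3-class toy problem; not taken from CIFAR-10.
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 1, 2]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
print("Class 2 is most often mistaken for class", most_confused_with(cm, 2))
```

The diagonal holds the correctly classified samples, so reading along a row away from the diagonal shows which classes a given ground truth class is confused with.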
What is a CNN?
A Convolutional Neural Network (CNN or ConvNet) is a deep learning algorithm specifically designed for tasks where object recognition is crucial, such as image classification, detection, and segmentation. Many real-life applications, such as self-driving cars, surveillance cameras, and more, use CNNs.
The importance of CNNs
There are several reasons why CNNs are important, as highlighted below:
 Unlike traditional machine learning models like SVM and decision trees that require manual feature extraction, CNNs can perform automatic feature extraction at scale, making them efficient.
 The convolutional layers give CNNs a degree of translation invariance, meaning they can recognize patterns and extract features regardless of where they appear in the image. They are not inherently invariant to rotation or scaling, which is usually handled with techniques such as data augmentation.
 Multiple pretrained CNN models such as VGG16, ResNet50, Inception-v3, and EfficientNet have achieved state-of-the-art results and can be fine-tuned on new tasks using a relatively small amount of data.
 CNNs can also be used for non-image problems, including but not limited to natural language processing, time series analysis, and speech recognition.
Architecture of a CNN
CNNs’ architecture tries to mimic the structure of neurons in the human visual system composed of multiple layers, where each one is responsible for detecting a specific feature in the data. As illustrated in the image below, the typical CNN is made of a combination of four main layers:
 Convolutional layers
 Rectified Linear Unit (ReLU for short)
 Pooling layers
 Fully connected layers
Let’s understand how each of these layers works using the following example of classifying a handwritten digit.
Convolution layers
This is the first building block of a CNN. As the name suggests, the main mathematical task performed is called convolution, which is the application of a sliding window function to a matrix of pixels representing an image. The sliding function applied to the matrix is called a kernel or a filter; the two terms are used interchangeably.
In the convolution layer, several filters of equal size are applied, and each filter is used to recognize a specific pattern from the image, such as the curving of the digits, the edges, the whole shape of the digits, and more.
Let’s consider this 32×32 grayscale image of a handwritten digit. The values in the matrix are given for illustration purposes.
Also, let’s consider the kernel used for the convolution. It is a matrix with a dimension of 3×3. The weight of each element of the kernel is shown in the grid. Zero weights are shown as black cells and ones as white cells.
Do we have to manually find these weights?
In real life, the weights of the kernels are determined during the training process of the neural network.
Using these two matrices, we can perform the convolution operation by applying the dot product, which works as follows:
 Place the kernel matrix on the top-left corner of the image.
 Perform element-wise multiplication.
 Sum the values of the products.
 The resulting value corresponds to the first value (top-left corner) in the convoluted matrix.
 Move the kernel to the right, and then down at the end of each row, by the step size (stride) of the sliding window.
 Repeat steps 1 to 5 until the image matrix is fully covered.
The dimension of the convoluted matrix depends on the size of the sliding window and the step by which it moves. The larger either one is, the smaller the dimension.
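The steps above can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: no padding, a square kernel, and made-up pixel values. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The function name convolve2d is our own.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    k = len(kernel)
    out_rows = (len(image) - k) // stride + 1
    out_cols = (len(image[0]) - k) // stride + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            top, left = r * stride, c * stride
            # Element-wise multiplication of the window with the kernel, then sum.
            total = sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative 4x4 "image" and 3x3 kernel of zeros and ones (values are made up).
image = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

print(convolve2d(image, kernel))            # 2x2 feature map
print(convolve2d(image, kernel, stride=2))  # a larger stride gives a smaller map
```

The kernel responds strongly wherever the image patch matches its pattern of ones, which is why it is also described as a feature detector.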
Another name associated with the kernel in the literature is feature detector because the weights can be fine-tuned to detect specific features in the input image.
For instance:
 A kernel that averages neighboring pixels can be used to blur the input image.
 A kernel that subtracts neighboring pixels can be used to perform edge detection.
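As a rough illustration of these two kernel types, the sketch below applies an averaging kernel and a simple horizontal-difference kernel to a made-up image containing one vertical edge. The helper apply_kernel and the specific kernel values are assumptions chosen for the example; in a trained CNN the weights are learned rather than hand-picked.

```python
def apply_kernel(image, kernel):
    """Stride-1 sliding-window sum of products with no padding."""
    k = len(kernel)
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]


# Made-up 4x6 image: dark on the left (0), bright on the right (9).
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

# Averaging kernel: every weight is 1/9, so each output is the mean of a 3x3 patch.
blur_kernel = [[1 / 9] * 3 for _ in range(3)]

# Difference kernel: right-hand neighbors minus left-hand neighbors.
edge_kernel = [[-1, 0, 1] for _ in range(3)]

blurred = apply_kernel(image, blur_kernel)
edges = apply_kernel(image, edge_kernel)

print([round(v, 2) for v in blurred[0]])  # the sharp step becomes a gradual ramp
print(edges[0])                           # non-zero only where the edge is
```

The averaging kernel smooths the jump from dark to bright, while the difference kernel is zero in flat regions and large only around the edge.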
The more convolution layers the network has, the better its deeper layers become at detecting more abstract features.
Activation function
A ReLU activation function is applied after each convolution operation. This function helps the network learn non-linear relationships between the features in the image, hence making the network more robust for identifying different patterns. It also helps to mitigate the vanishing gradient problem.
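ReLU itself is a very small operation: it keeps positive values and replaces negative values with zero. A minimal sketch on a made-up feature map (the values are illustrative only):

```python
def relu(feature_map):
    """Apply ReLU element-wise: negative values become 0, others are unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


# Made-up output of a convolution, containing negative responses.
feature_map = [
    [-3, 5],
    [2, -1],
]

activated = relu(feature_map)
print(activated)
```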
Pooling layer
The goal of the pooling layer is to pull the most significant features from the convoluted matrix. This is done by applying some aggregation operations, which reduces the dimension of the feature map (convoluted matrix), hence reducing the memory used while training the network. Pooling is also relevant for mitigating overfitting.
The most common aggregation functions that can be applied are:
 Max pooling, which takes the maximum value in each window of the feature map
 Sum pooling, which takes the sum of all the values in each window of the feature map
 Average pooling, which takes the average of all the values in each window.
Below is an illustration of each of the previous examples:
Also, the dimension of the feature map becomes smaller as the pooling function is applied.
The output of the last pooling layer is flattened so that it can be processed by the fully connected layer.
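The three aggregation functions differ only in how each window is reduced to a single number, which the following sketch makes explicit. It assumes non-overlapping 2×2 windows and a made-up 4×4 feature map; the helper names pool2d and mean are our own.

```python
def pool2d(feature_map, size=2, reduce=max):
    """Apply `reduce` to each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

max_pooled = pool2d(feature_map, reduce=max)
sum_pooled = pool2d(feature_map, reduce=sum)
avg_pooled = pool2d(feature_map, reduce=mean)

# Flattening turns the 2x2 pooled map into a one-dimensional list.
flattened = [value for row in max_pooled for value in row]

print(max_pooled, sum_pooled, avg_pooled, flattened)
```

In every case the 4×4 map shrinks to 2×2, which is the dimension reduction described above.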
Fully connected layers
These layers form the last part of the convolutional neural network, and their inputs correspond to the flattened one-dimensional vector generated from the last pooling layer. ReLU activation functions are applied to them for non-linearity.
Finally, a softmax prediction layer is used to generate probability values for each of the possible output labels, and the final label predicted is the one with the highest probability score.
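As a small sketch of this last step, the code below converts made-up raw scores for three classes into probabilities with softmax and picks the most likely label. The scores are illustrative, and subtracting the maximum before exponentiating is a common numerical-stability habit rather than something the text above requires.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw outputs of the last layer for three classes.
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)
predicted_label = probs.index(max(probs))

print([round(p, 3) for p in probs])
print("Predicted label:", predicted_label)
```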
Dropout
Dropout is a regularization technique applied to improve the generalization capability of neural networks with a large number of parameters. It consists of randomly dropping some neurons during the training process, which forces the remaining neurons to learn new features from the input data.
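The idea can be sketched without any framework. The version below is the so-called inverted dropout, where surviving activations are scaled up during training so that nothing needs to change at inference time. The function name, the fixed seed, and the activation values are assumptions made for the illustration.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(42)          # fixed seed so repeated runs give the same mask
activations = [1.0] * 10         # made-up layer outputs

dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, training=False)

print(dropped)
print(unchanged)
```

With a rate of 0.5, roughly half of the values become 0.0 and the rest are doubled, so the expected sum of the layer output stays the same.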
Since the technical implementation will be performed using TensorFlow 2, the next section aims to provide a complete overview of different components of this framework to efficiently build deep learning models.
Load the CIFAR-10 Dataset
The CIFAR-10 dataset consists of 60,000 color images from 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. Several sample images are shown below, along with the class names.
Since the CIFAR-10 dataset is included in TensorFlow, we can load the dataset using the
load_data()
function as shown in the code cell below and confirm the number of samples and the shape of the data.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

print(X_train.shape)
print(X_test.shape)
Display Sample Images from the Dataset
It’s always a good idea to inspect some images in a dataset, as shown below. Remember, the images in CIFAR-10 are quite small, only 32×32 pixels, so while they don’t have a lot of detail, there’s still enough information in these images to support an image classification task.
plt.figure(figsize=(18, 9))

num_rows = 4
num_cols = 8

# Plot each of the images in the batch.
for i in range(num_rows*num_cols):
    ax = plt.subplot(num_rows, num_cols, i + 1)
    plt.imshow(X_train[i,:,:])
    plt.axis("off")
Dataset Preprocessing
We normalize the image data to the range
[0,1]
. This is very common when working with image data and helps the model train more efficiently. We also convert the integer labels to one-hot encoded labels, as discussed in previous videos.
# Normalize images to the range [0, 1].
X_train = X_train.astype("float32") / 255
X_test = X_test.astype("float32") / 255

# Change the labels from integer to categorical data.
print('Original (integer) label for the first training sample: ', y_train[0])

# Convert labels to one-hot encoding.
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

print('After conversion to categorical one-hot encoded labels: ', y_train[0])
Making predictions
You can now use this image to run a prediction.
prediction = model.predict(test_image)
When you print this you will see something similar to this.
prediction[0][0] 0.014393696
The question is, how do you interpret this? Remember that the network output layer has just one unit and uses the sigmoid activation function. The output of this network is therefore a number between 0 and 1. That number represents the probability that the image belongs to class 1. Class 1 in this case is dogs. You can therefore set a threshold of say 50% to separate the two classes.
if prediction[0][0] > 0.5:
    print(" is a dog")
else:
    print(" is a cat")
Since the obtained probability is less than 0.5, the image is classified as a cat.
You can repeat the same process with a dog image. First, start by downloading the image.
!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/1/18/Dog_Breeds.jpg \
    -O /tmp/dog.jpg
After that, load it while converting it to the required size.
test_image2 = image.load_img('/tmp/dog.jpg', target_size=(200, 200))
Next, convert the image to an array, expand the dimensions, and run the prediction.
test_image2 = image.img_to_array(test_image2)
test_image2 = np.expand_dims(test_image2, axis=0)
prediction = model.predict(test_image2)
Use the same threshold to determine if it is the image of a cat or a dog.
if prediction[0][0] > 0.5:
    print(" is a dog")
else:
    print(" is a cat")
With a predicted probability of 99%, the image is classified as a dog.
TensorFlow CNN in Production with Run:AI
Run:AI automates resource management and workload orchestration for deep learning infrastructure. With Run:AI, you can automatically run as many CNN experiments as needed in TensorFlow and other deep learning frameworks.
Here are some of the capabilities you gain when using Run:AI:
 Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
 No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
 A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.
Learn more about the Run:AI GPU virtualization platform.
Implementing a CNN in TensorFlow & Keras
In this post, we’ll learn how to implement a Convolutional Neural Network (CNN) from scratch using Keras. Here, we show a CNN architecture similar to the structure of VGG16 but with fewer layers. We will learn how to model this architecture and train it on a small dataset called CIFAR-10. We’ll also use this as an opportunity to introduce a new layer type called
Dropout
, which is often used in models to mitigate the effects of overfitting.
 Load the CIFAR-10 Dataset
 Dataset Preprocessing
 Dataset and Training Configuration Parameters
 CNN Model Implementation in Keras
 Adding Dropout to the Model
 Saving and Loading Models
 Model Evaluation
 Conclusion
import os
import random
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

from matplotlib.ticker import (MultipleLocator, FormatStrFormatter)
from dataclasses import dataclass
SEED_VALUE = 42

# Fix seed to make training deterministic.
random.seed(SEED_VALUE)
np.random.seed(SEED_VALUE)
tf.random.set_seed(SEED_VALUE)
Course Information
In this course, you will explore how to work with real-world images in different shapes and sizes, visualize the journey of an image through convolutions to understand how a computer “sees” information, plot loss and accuracy, and explore strategies to prevent overfitting, including augmentation and dropout. Finally, this course will introduce you to transfer learning and how learned features can be extracted from models.
How to easily run CNN with Tensorflow in cnvrg.io
Now, with cnvrg.io you can run this pipeline without configuring the different platforms which makes it much faster and easier to run. Using cnvrg.io, you can easily track training progress and serve the model as a REST endpoint. First, you can spin up a VS Code workspace inside cnvrg.io to build our training script from the notebook code. You can use the exact code and ensure that the model is saved at the end of the training.
Run your code as an experiment
Next, you can launch this training script as an experiment. cnvrg.io will provision resources to execute the script and monitor the performance automatically. Resource and training metrics are automatically visualized along with the logs, and all files that were written to disk during the experiment are saved as artifacts in cnvrg.io’s object store.
Make predictions in a few clicks
Now that you have your model, you’ll need to create a “predict” function. cnvrg.io makes this easy by automatically wrapping the function into a production-grade Flask application equipped with load balancing, autoscaling, and monitoring. This file loads the model into memory and uses it in the predict function, which will format the incoming data and return a prediction.
Deploy your predictions to an endpoint
Next, you’ll want to create an endpoint that routes to that function. You could also specify compute resources and autoscaling configurations here too.
Track and monitor your endpoints
cnvrg.io automatically displays metrics such as the number of requests and latency for the endpoint. It also comes with Grafana and Kibana integrated for increased visibility into model
Finally, if you want to trigger retraining and deploying the model as part of a CI/CD pipeline, cnvrg.io provides Flows. The pipeline could programmatically trigger the flow via cnvrg.io’s CLI or SDK.
You can test it out in cnvrg.io now by installing cnvrg.io CORE, our free community MLOps platform, on your Kubernetes cluster.
CNN on TensorFlow Concepts
Tensor
Tensors represent deep learning data. They are multidimensional arrays, used to store multiple dimensions of a dataset. Each dimension is called a feature. For example, a cube storing data across an X, Y, and Z axis is represented as a 3-dimensional tensor. Tensors can store very high dimensionality, with hundreds of dimensions of features typically used in deep learning applications.
Computational graph
TensorFlow computational graphs represent the workflows that occur during deep learning model training. For a CNN model, the computational graph can be very complex. The image below demonstrates what a simple graph looks like. You can use TensorBoard, built into TensorFlow, to display the computational graph of your model.
Constant
In TensorFlow, a constant is used to store values that don’t change during the computation of the model. It is used for nodes that must remain the same during model training. A constant does not have parameters.
Placeholder
Placeholders are used to input training examples to your deep learning model. A placeholder can take parameters, and these parameters are changed at runtime as the model processes the training set.
Variable
Variables are used to add trainable nodes to the computation graph, such as weights and biases.
Related content: read our guide to deep convolutional neural networks.
How do CNNs work?
Although they can be used for other tasks, CNNs are mostly used in tasks involving image data. Each image contains pixel data that can be represented in a numerical form. This numerical representation is what is passed to a CNN. As much as normal artificial neural networks can be used in processing image data, CNNs have proven to perform better, resulting in higher accuracy. Let’s now take a look at how CNNs work.
Convolution
A CNN does not classify an image directly from its raw pixels. Instead, it first extracts the features that are most important in classifying the image. The features are obtained through a process known as convolution. The convolution operation results in what is known as a feature map, also referred to as a convolved feature or an activation map. The feature map is obtained by applying a feature detector, also referred to as a kernel or a filter, to the input image. The filter is usually a 3 by 3 matrix, although other sizes can be used. Each value in the feature map is obtained through an element-wise multiplication of the filter with the patch of the input image it currently covers, followed by a sum of the products. The objective here is to reduce the size of the representation being passed through the CNN while maintaining the important features. The filter slides step by step across the input image. The size of these steps is known as the stride and can be defined when creating the CNN. When building the CNN you will also be able to define the number of filters you want for your network.
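How much the feature map shrinks follows from the input size, the filter size, the stride, and any zero padding added around the image. A minimal sketch of the standard formula, floor((n + 2p - k) / s) + 1, with a hypothetical helper name and illustrative sizes:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size of a feature map along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# A 3x3 filter on a 32-pixel-wide image, moving one pixel at a time.
print(conv_output_size(32, 3))                        # no padding
print(conv_output_size(32, 3, padding=1))             # one pixel of padding
print(conv_output_size(28, 3, stride=2))              # larger stride
```

Without padding, a 3 by 3 filter trims one pixel from each border, and a stride of 2 roughly halves the size.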
Once you obtain the feature map, the Rectified Linear Unit (ReLU) is applied in order to introduce non-linearity. This is needed because the relationships between pixels and the content of an image are not linear.
Pooling
Pooling results in what is known as a pooled feature map. Pooling ensures that the neural network is able to detect features in an image irrespective of their location in the image. This is what is known as spatial invariance. There are several types of pooling, for example, max pooling, average pooling, and min pooling. For instance, in max pooling a 2 by 2 window is slid over the feature map, picking the largest value in each window.
Pooling ensures that the main features of the image are maintained while reducing the size of the image further. This reduces the amount of information passed to the neural network and hence helps to reduce overfitting.
Flattening
The next step is to flatten the pooled feature map. This involves transforming the entire pooled feature map into a single column that can be passed to the fully connected layer.
Full connection
The flattened feature map is then passed to the input layer of the neural network. The result of that is passed to a fully connected layer. After that, the result of the entire process is emitted by the output layer. An activation function is usually applied depending on the type of classification problem. For binary classifications, the sigmoid activation function will be used whereas the softmax activation function is used for multiclass problems.
Quick Tutorial: Building a Basic Convolutional Neural Network (CNN) in TensorFlow
This quick tutorial can help you get started implementing a CNN in TensorFlow. It is based on the Fashion-MNIST dataset, containing 28 x 28 grayscale images of 70,000 fashion products in 10 categories. With the loader used below, there are 55,000 images in the training set, 5,000 in the validation set, and 10,000 images in the test set. Our code is based on the full tutorial by Aditya Sharma.
Loading Data
First import all the necessary modules: NumPy, matplotlib and TensorFlow, then import the Fashion-MNIST data as follows:
# input_data is the TensorFlow 1.x dataset helper used by this tutorial.
from tensorflow.examples.tutorials.mnist import input_data

# Read the data from the data/fashion directory. If it is not there,
# source_url retrieves the Fashion-MNIST dataset from an Amazon S3 bucket.
data = input_data.read_data_sets('data/fashion', one_hot=True,
                                 source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')
CNN Architecture
We will use three convolutional layers, progressively adding more filters. All the filters are 3×3:
 Layer with 32 filters
 Layer with 64 filters
 Layer with 128 filters
In addition, we’ll have three 2×2 max-pooling layers, one after each convolutional layer.
We’ll set basic hyperparameters of the CNN model:
training_iters = 10
learning_rate = 0.001
batch_size = 128
The batch size tells TensorFlow how many images to process in each training step. Here, every batch contains 128 images.
Neural Network Parameters
The number of inputs to the CNN is 784, because the images have 784 pixels and are read as a 784 dimensional vector. We will rebuild this vector into a matrix of 28 x 28 x 1.
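To show what rebuilding the vector means, here is a dependency-free sketch that turns a flat list of 784 values into a 28 x 28 x 1 nested list in row-major order. In the TensorFlow 1.x code of this tutorial the same step would typically be done with tf.reshape; the helper name and the stand-in pixel values below are assumptions for the illustration.

```python
def reshape_to_image(flat_pixels, height=28, width=28):
    """Rebuild a flat pixel vector into a height x width x 1 nested list."""
    if len(flat_pixels) != height * width:
        raise ValueError("expected %d values, got %d"
                         % (height * width, len(flat_pixels)))
    return [
        [[flat_pixels[row * width + col]] for col in range(width)]
        for row in range(height)
    ]


# Stand-in for one image: the values 0..783 instead of real pixel intensities.
flat = list(range(784))
img = reshape_to_image(flat)

print(len(img), len(img[0]), len(img[0][0]))   # height, width, channels
print(img[0][0], img[1][0], img[27][27])       # first pixel, start of row 2, last pixel
```

Each group of 28 consecutive values becomes one row of the image, and the trailing dimension of size 1 is the single grayscale channel.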
# Use this to specify 28 inputs, and 10 classes for the predicted label at the end.
n_input = 28
n_classes = 10
Here we define an input placeholder x with dimensionality None x 28 x 28 x 1, which holds the 784-pixel images rebuilt into matrices. Similarly, we’ll define a placeholder y for the labels of the training images, which will be a None x 10 matrix.
We are setting the “row” to None because we previously defined batch_size, meaning placeholders receive the row size when the training set is loaded. Row size will be set to 128, like the batch_size.
# x is the input placeholder, rebuilding the image into a 28x28x1 matrix.
x = tf.placeholder("float", [None, 28, 28, 1])
# y is the label set, using the number of classes.
y = tf.placeholder("float", [None, n_classes])
Wrapper Functions
Because we have several layers of the same type in the model, it’s useful to create a wrapper function for each type of layer, to avoid duplicating code. You can get functions like this out of the box with Keras, which is included with Tensorflow. However, in this tutorial we show you how to do things from scratch in TensorFlow without Keras helper functions.
Here is a function creating a 2-dimensional convolutional layer, with bias and ReLU activation. The arguments are the input images x, weights W, bias b, and the stride, meaning how far the filter moves over the image at each step of the convolution.
def conv2d(x, W, b, strides=1):
    # Convolution with 'SAME' padding, then bias addition and ReLU activation.
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)
Here is another function creating a 2D max-pool layer. Here the parameters are the input x, and k, specifying the kernel/filter size.
def maxpool2d(x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
Now let’s define weights and biases.
weights = {
    'wc1': tf.get_variable('W0', shape=(3,3,1,32), initializer=tf.contrib.layers.xavier_initializer()),
    'wc2': tf.get_variable('W1', shape=(3,3,32,64), initializer=tf.contrib.layers.xavier_initializer()),
    'wc3': tf.get_variable('W2', shape=(3,3,64,128), initializer=tf.contrib.layers.xavier_initializer()),
    'wd1': tf.get_variable('W3', shape=(4*4*128,128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('W6', shape=(128,n_classes), initializer=tf.contrib.layers.xavier_initializer()),
}
biases = {
    'bc1': tf.get_variable('B0', shape=(32), initializer=tf.contrib.layers.xavier_initializer()),
    'bc2': tf.get_variable('B1', shape=(64), initializer=tf.contrib.layers.xavier_initializer()),
    'bc3': tf.get_variable('B2', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'bd1': tf.get_variable('B3', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('B4', shape=(10), initializer=tf.contrib.layers.xavier_initializer()),
}
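The 4*4*128 in the shape of the fully connected weights is the number of values left after the three convolution and max-pool stages. A short sketch of that arithmetic, assuming (as in the wrapper functions of this tutorial) that 'SAME' padding is used, so the stride-1 convolutions keep the spatial size and each 2×2 pooling layer with stride 2 produces ceil(size / 2):

```python
import math


def same_pool_output(size, k=2):
    """Spatial size after a pooling layer with stride k and 'SAME' padding."""
    return math.ceil(size / k)


size = 28                      # Fashion-MNIST images are 28 x 28
sizes = [size]
for _ in range(3):             # three max-pool layers
    size = same_pool_output(size)
    sizes.append(size)

channels = 128                 # filters in the last convolutional layer
flattened = size * size * channels

print(sizes)                   # spatial size after each stage
print(flattened)               # inputs to the fully connected layer
```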
Building the CNN
Now we build the CNN by feeding the weights and biases into the wrapper functions.
def conv_net(x, weights, biases):
    # First convolutional layer with 32 3×3 filters and 32 biases,
    # followed by a max-pool layer with the kernel size set to 2.
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Second convolutional layer with 64 3×3 filters and 64 biases,
    # followed by another max-pool layer.
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # Third convolutional layer with 128 3×3 filters and 128 biases,
    # followed by the last max-pool layer.
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    conv3 = maxpool2d(conv3, k=2)

    # Fully connected layer: use reshape() to adapt the output of pooling
    # to the input expected by the fully connected layer (-1 keeps the batch size).
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])

    # Apply the ReLU function, then multiply by the output weights to get class scores.
    fc1 = tf.nn.relu(fc1)
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
Loss and Optimizer Nodes
First build the model using the conv_net() function we showed above, passing in x, weights, and biases:
pred = conv_net(x, weights, biases)
This is a multi-class classification problem, so we will use the softmax activation function, which gives a probability between 0 and 1 for each class label (the label with the highest probability will be the prediction of the model). We’ll use cross-entropy as the loss function.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
Finally, we’ll define the Adam optimizer with a learning rate of 0.001 as defined in the model hyperparameters above:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
Evaluate the Model
To test the model, we first initialize weights and biases, and then define a correct_prediction and accuracy node that will evaluate model performance every time it is run.
init = tf.global_variables_initializer()
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Now you can start the computation graph, and run a training session as follows:
 Create a for loop over the number of training iterations specified above
 Create an inner for loop over the batches, using the batch size specified above
 Select the training images and labels for each batch as batch_x and batch_y
 Feed batch_x and batch_y into the x and y placeholders
 After each training iteration, run the loss function and check training accuracy
 After running through all the images, test accuracy by processing the 10,000 test images
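The loop structure in the steps above can be sketched independently of TensorFlow. The code below only shows the batching logic, using integers as stand-ins for images and labels; iterate_batches is a hypothetical helper, and keeping the final, smaller batch is an assumption (the original tutorial may handle the remainder differently). The point where the TensorFlow session would run the optimizer is marked with a comment.

```python
def iterate_batches(images, labels, batch_size):
    """Yield consecutive (batch_x, batch_y) slices; the last batch may be smaller."""
    for start in range(0, len(images), batch_size):
        yield images[start:start + batch_size], labels[start:start + batch_size]


# Stand-in data: integers instead of real images and labels.
num_train = 55000
train_images = list(range(num_train))
train_labels = list(range(num_train))

training_iters = 2   # the tutorial uses 10; kept small for the illustration
batch_size = 128

for iteration in range(training_iters):
    num_batches = 0
    for batch_x, batch_y in iterate_batches(train_images, train_labels, batch_size):
        # In the TensorFlow 1.x tutorial, this is where the optimizer, cost, and
        # accuracy nodes would be run with batch_x and batch_y fed to x and y.
        num_batches += 1
    print("Iteration", iteration, "processed", num_batches, "batches")
```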
See the original tutorial for the complete code that you can use to run the CNN model.
CNN Model Implementation in Keras
In this section, we will define a simple CNN model in Keras and train it on the CIFAR-10 dataset. Recall from a previous post the following steps required to define and train a model in Keras.
 Build/Define a network model using predefined layers in Keras.
 Compile the model with model.compile()
 Train the model with model.fit()
Model Structure
Before we get into the coding details, let’s first take a look at the general structure of the model we’re proposing. Notice that the model has a similar structure to VGG16 but has fewer layers and a much smaller input image size, and therefore far fewer trainable parameters. The model contains three convolutional blocks followed by a fully connected layer and an output layer. For reference, we’ve included the number of channels at key points in the architecture. We have also indicated the spatial size of the activation maps at the end of each convolutional block. This is a good visual to refer back to when studying the code below.
For convenience, we're going to define the model in a function. Notice that the function has one optional argument: the input shape for the model. We start by instantiating the model with the `Sequential()` constructor. This allows us to build a model sequentially by adding one layer at a time. Notice that we define three convolutional blocks and that their structure is very similar.
Define the Convolutional Blocks for the CNN
Let's start with the very first convolutional layer in the first convolutional block. To define a convolutional layer in Keras, we use the `Conv2D()` layer, which takes several input arguments. First, we define the layer to have 32 filters. The kernel size for each filter is 3 (which is interpreted as 3×3). We use a padding option called `same`, which pads the input tensor so that the output of the convolution operation has the same spatial size as the input. This is not required, but it's commonly used. If you don't explicitly specify this padding option, the default behavior applies no padding, and therefore the spatial size of the output from the convolutional layer will be slightly smaller than the input size. We use a `ReLU` activation function in all the layers in the network except for the output layer.
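The effect of the padding option on the spatial size can be worked out with a small helper. This is a sketch of the usual output-size arithmetic for `same` and `valid` (no padding) convolutions; the function name `conv_output_size` is an illustrative assumption and not a Keras API.

```python
import math


def conv_output_size(input_size, kernel_size, padding="same", stride=1):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        # Input is padded so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


# A 32-pixel-wide input with a 3x3 kernel and stride 1.
same_size = conv_output_size(32, 3, padding="same")    # stays at 32
valid_size = conv_output_size(32, 3, padding="valid")  # shrinks to 30
```

With a 3×3 kernel and no padding, each convolution removes one pixel from every border, which is why the output is "slightly smaller" than the input.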
For the very first convolutional layer, we need to specify the shape of the input. For all subsequent layers this is not necessary, since the shape of the input is automatically computed from the shape of the output of the previous layer. We have two convolutional layers with 32 filters each, followed by a max pooling layer with a window size of 2×2, so the output shape from this first convolutional block is 16×16×32. Next, we have the second convolutional block, which is nearly identical to the first, except that it has 64 filters in each convolutional layer instead of 32. Finally, the third convolutional block is an exact copy of the second.
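As a quick check of these shapes, the following sketch traces the activation map size through the three blocks, assuming `same` padding in every convolution and a 2×2 max pool at the end of each block. The function name `trace_shapes` is a hypothetical helper for illustration.

```python
def trace_shapes(input_shape=(32, 32, 3), block_filters=(32, 64, 64)):
    """Return the (height, width, channels) shape after each conv block."""
    height, width, _ = input_shape
    shapes = []
    for filters in block_filters:
        # 'same' convolutions keep height and width unchanged;
        # the 2x2 max pool at the end of the block halves both.
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    return shapes


block_shapes = trace_shapes()
# Number of values left once the last activation maps are flattened.
last_h, last_w, last_c = block_shapes[-1]
flattened_length = last_h * last_w * last_c
```

For the default 32×32×3 input this gives 16×16×32, 8×8×64, and 4×4×64, so the flattened vector has 4 × 4 × 64 = 1024 elements.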
Note
The number of filters in each convolutional layer is something that you will need to experiment with. A larger number of filters allows the model to have a greater learning capacity, but this also needs to be balanced with the amount of data available to train the model. Adding too many filters (or layers) can lead to overfitting, one of the most common issues encountered when training models.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D

def cnn_model(input_shape=(32, 32, 3)):
    model = Sequential()

    # ------------------------------------
    # Conv Block 1: 32 Filters, MaxPool.
    # ------------------------------------
    model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu', input_shape=input_shape))
    model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # ------------------------------------
    # Conv Block 2: 64 Filters, MaxPool.
    # ------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # ------------------------------------
    # Conv Block 3: 64 Filters, MaxPool.
    # ------------------------------------
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # ------------------------------------
    # Flatten the convolutional features.
    # ------------------------------------
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dense(10, activation='softmax'))

    return model
Define the Classifier for the CNN
Before we define the fully connected layers for the classifier, we first need to flatten the two-dimensional activation maps produced by the last convolutional block (which have a spatial shape of 4×4 with 64 channels). This is accomplished by adding a `Flatten()` layer, which creates a one-dimensional vector of length 1024 (4 × 4 × 64). We then add a densely connected layer with 512 neurons and a fully connected output layer with ten neurons, because we have ten classes in our dataset. To avoid any confusion, we've also provided a detailed diagram of the fully connected layers.
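The output layer in the model code uses a softmax activation, which turns the ten raw scores into class probabilities that sum to 1. The following plain-Python sketch shows that computation on made-up scores; the function name `softmax` and the example values are illustrative assumptions, and Keras performs the equivalent operation internally.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability; the result is unchanged.
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical raw scores for the ten classes; class 3 scores highest.
logits = [0.0] * 10
logits[3] = 2.0
probs = softmax(logits)
predicted_class = probs.index(max(probs))
```

The predicted class is simply the index of the largest probability, which is the same index as the largest raw score.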
Create the Model
We can now create an instance of the model by calling the function above and use the `summary()` method to display the model summary to the console.
# Create the model.
model = cnn_model()
model.summary()
Metal device set to: Apple M1 Max
Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape           Param #
=================================================================
 conv2d (Conv2D)                 (None, 32, 32, 32)     896
 conv2d_1 (Conv2D)               (None, 32, 32, 32)     9248
 max_pooling2d (MaxPooling2D)    (None, 16, 16, 32)     0
 conv2d_2 (Conv2D)               (None, 16, 16, 64)     18496
 conv2d_3 (Conv2D)               (None, 16, 16, 64)     36928
 max_pooling2d_1 (MaxPooling2D)  (None, 8, 8, 64)       0
 conv2d_4 (Conv2D)               (None, 8, 8, 64)       36928
 conv2d_5 (Conv2D)               (None, 8, 8, 64)       36928
 max_pooling2d_2 (MaxPooling2D)  (None, 4, 4, 64)       0
 flatten (Flatten)               (None, 1024)           0
 dense (Dense)                   (None, 512)            524800
 dense_1 (Dense)                 (None, 10)             5130
=================================================================
Total params: 669,354
Trainable params: 669,354
Non-trainable params: 0
_________________________________________________________________
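The parameter counts in the summary can be reproduced by hand. A convolutional layer with a k×k kernel has (k · k · input channels + 1) · filters parameters, where the +1 accounts for the bias of each filter, and a dense layer has (inputs + 1) · units. The helper names below are illustrative assumptions; the numbers come from the summary itself.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return (inputs + 1) * units


layer_params = [
    conv2d_params(3, 3, 32),    # conv2d:   RGB input -> 32 filters
    conv2d_params(3, 32, 32),   # conv2d_1
    conv2d_params(3, 32, 64),   # conv2d_2
    conv2d_params(3, 64, 64),   # conv2d_3
    conv2d_params(3, 64, 64),   # conv2d_4
    conv2d_params(3, 64, 64),   # conv2d_5
    dense_params(1024, 512),    # dense:    flattened 4*4*64 -> 512
    dense_params(512, 10),      # dense_1:  512 -> 10 classes
]
total_params = sum(layer_params)
```

Pooling and flatten layers have no trainable parameters, which is why they show 0 in the summary. Most of the parameters sit in the first dense layer rather than in the convolutional blocks.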
Generate a tf.data.Dataset
The next step is to create a TensorFlow dataset from the images. That can be done using the `image_dataset_from_directory` function. Since it infers the classes from the folder names, your data should be structured with one subfolder per class, as shown below.
When using the function to generate the dataset, you will need to define the following parameters:
 the path to the data
 an optional seed for shuffling and transformations
 `image_size`, the size the images will be resized to after being loaded from disk
 `label_mode`, which is binary since this is a binary classification problem
 `batch_size=32`, which means that the images will be loaded in batches of 32
In the absence of a validation set, you can also define a `validation_split`. If it is set, the `subset` also needs to be passed. That is to indicate whether the split is a validation or training split. In this case, let’s use the testing set for validation.
training_set = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    seed=101,
    image_size=(200, 200),
    batch_size=32)
By default, the classes will be represented using integers. You can see the representation by using `class_names` of the generated training set.
class_names = training_set.class_names
In this case, cats will be represented by 0 and dogs by 1. This is based on the directory structure of the dataset. Since `class_names` isn't specified, alphanumerical order is used.
Generate the validation split as well. The arguments are similar to those for the training set:
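The alphanumerical ordering can be illustrated with a few lines of plain Python. This sketch only mimics the ordering rule described above; the folder names `cats` and `dogs` and the helper `class_indices` are assumptions for illustration and not part of the Keras API.

```python
def class_indices(directory_names):
    """Map class folder names to integer labels in alphanumerical order."""
    return {name: index for index, name in enumerate(sorted(directory_names))}


# Hypothetical subfolders of the training directory, in arbitrary order.
mapping = class_indices(["dogs", "cats"])
```

Because the labels depend only on the sorted folder names, the training and validation directories need the same subfolder names to get consistent labels.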
 the directory containing the images
 an optional seed
 how to resize the images
 the size of the batches
validation_set = tf.keras.preprocessing.image_dataset_from_directory(
    test_dir,
    seed=101,
    image_size=(200, 200),
    batch_size=32)
TensorFlow CNN in Production with Run:AI
Run:AI automates resource management and workload orchestration for deep learning infrastructure. With Run:AI, you can automatically run as many CNN experiments as needed in TensorFlow and other deep learning frameworks.
Here are some of the capabilities you gain when using Run:AI:
 Advanced visibility: create an efficient pipeline of resource sharing by pooling GPU compute resources.
 No more bottlenecks: you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
 A higher level of control: Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.
Learn more about the Run:AI GPU virtualization platform.
Convolutional Neural Networks (CNN) with TensorFlow Tutorial
Imagine being in a zoo trying to recognize if a given animal is a cheetah or a leopard. As a human, your brain can effortlessly analyze body and facial features to come to a valid conclusion. In the same way, Convolutional Neural Networks (CNNs) can be trained to perform the same recognition task, no matter how complex the patterns are. This makes them powerful in the field of computer vision.
This conceptual CNN tutorial will start by providing an overview of what CNNs are and their importance in machine learning. Then it will walk you through a step-by-step implementation of a CNN in the TensorFlow 2 framework.
Data augmentation
Data augmentation is usually applied in order to prevent overfitting. Augmenting the images effectively enlarges the dataset and exposes the model to more variations of the data. Augmentation can be achieved by applying random transformations such as flipping and rotating the images. Fortunately, Keras provides layers that can do just that.
data_augmentation = keras.Sequential(
    [
        tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal", input_shape=(200, 200, 3)),
        tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),
        tf.keras.layers.experimental.preprocessing.RandomZoom(0.2),
    ]
)
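To show conceptually what a random horizontal flip does, here is a plain-Python sketch that operates on an image stored as a nested list of pixel rows. It is not how the Keras layer is implemented; the function name `random_horizontal_flip`, the `probability` parameter, and the tiny example image are assumptions made for illustration.

```python
import random


def random_horizontal_flip(image, rng, probability=0.5):
    """Mirror the image left to right with the given probability.

    image: list of rows, each row a list of pixel values.
    rng:   a random.Random instance, passed in for reproducibility.
    """
    if rng.random() < probability:
        # Reverse every row so the left and right sides swap.
        return [row[::-1] for row in image]
    # Return a copy so the caller's image is never modified.
    return [row[:] for row in image]


# A tiny 2x3 single-channel "image".
image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(101)
always_flipped = random_horizontal_flip(image, rng, probability=1.0)
never_flipped = random_horizontal_flip(image, rng, probability=0.0)
```

Because the flip is applied at random each time an image is seen, the model encounters both orientations over the course of training without the dataset on disk changing.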