Conclusion
This article has covered a complete overview of CNNs in TensorFlow, providing details about each layer of the CNN architecture. It also gave a brief introduction to TensorFlow and how it helps machine learning engineers and researchers build sophisticated neural networks.
We applied all of these skills to a real-world multiclass classification task.
Our beginner’s guide to object detection could be a great next step to further your learning about computer vision. It explores the key components of object detection and explains how to implement SSD and Faster RCNN, both available in TensorFlow.
Artificial Intelligence has come a long way and has been seamlessly bridging the gap between the potential of humans and machines. And data enthusiasts all around the globe work on numerous aspects of AI and turn visions into reality – and one such amazing area is the domain of Computer Vision. This field aims to enable and configure machines to view the world as humans do, and use the knowledge for several tasks and processes (such as Image Recognition, Image Analysis and Classification, and so on). And the advancements in Computer Vision with Deep Learning have been a considerable success, particularly with the Convolutional Neural Network algorithm.
What is the TensorFlow Framework?
Google released TensorFlow in November 2015 and defines it as an open-source machine learning framework for everyone, for several reasons:
- Open-source: released under the Apache 2.0 open-source license. This allows researchers, organizations, and developers to make their contribution to the library by building upon it without any restrictions.
- Machine learning framework: meaning that it has a set of libraries and tools that support the building process of machine learning models.
- For everyone: Using TensorFlow makes the implementation of machine learning models easier through common programming languages like Python. Furthermore, built-in libraries such as Keras make it even easier to create robust deep learning models.
All these functionalities make TensorFlow a good candidate for building neural networks.
Furthermore, installing TensorFlow 2 is straightforward and can be done with the Python package manager pip, as explained in the official documentation.
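For reference, the standard install command from the official documentation is shown below; depending on your environment you may prefer a virtual environment, or pin the exact version used in this article with `pip install tensorflow==2.9.1`.
pip install tensorflow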
After the installation, we can check that the version being used is 2.9.1:
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
Now, let’s further explore the main components for creating those networks.
Generate a tf.data.Dataset
The next step is to create a TensorFlow dataset from the images. That can be done using the `image_dataset_from_directory` function. Since it infers the classes from the folder structure, your data should be organized as shown below.
When using the function to generate the dataset, you will need to define the following parameters:
- the path to the data
- an optional seed for shuffling and transformations
- the `image_size` is the size the images will be resized to after being loaded from the disk
- since this is a binary classification problem the `label_mode` is binary
- `batch_size=32` means that the images will be loaded in batches of 32
In the absence of a separate validation set, you can also define a `validation_split`. If it is set, the `subset` argument also needs to be passed to indicate whether the split is a validation or training split. In this case, let’s use the testing set for validation.
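For illustration only (this tutorial instead uses the separate test directory below), a hypothetical call with `validation_split` could look like the following; the 0.2 split ratio is an assumed value:
# Hypothetical example: carve 20% of the training folder out as a validation subset.
validation_subset = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    validation_split=0.2,
    subset="validation",  # use subset="training" for the remaining 80%
    seed=101,
    image_size=(200, 200),
    batch_size=32)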
training_set = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    seed=101,
    image_size=(200, 200),
    batch_size=32)
By default, the classes will be represented using integers. You can see the representation by using `class_names` of the generated training set.
class_names = training_set.class_names
In this case, cats will be represented by 0 and dogs by 1. This is based on the directory structure of the dataset. Since `class_names` isn’t specified explicitly, alphanumerical order is used.
Generate the validation set as well. The arguments are similar to those used for the training set:
- the directory containing the images
- an optional seed
- how to resize the images
- the size of the batches
validation_set = tf.keras.preprocessing.image_dataset_from_directory(
    test_dir,
    seed=101,
    image_size=(200, 200),
    batch_size=32)
What is a CNN?
A Convolutional Neural Network (CNN or ConvNet) is a deep learning algorithm specifically designed for tasks where object recognition is crucial, such as image classification, detection, and segmentation. Many real-life applications, such as self-driving cars, surveillance cameras, and more, use CNNs.
The importance of CNNs
There are several reasons why CNNs are important, as highlighted below:
- Unlike traditional machine learning models like SVM and decision trees that require manual feature extractions, CNNs can perform automatic feature extraction at scale, making them efficient.
- The convolution layers make CNNs translation invariant, meaning they can recognize patterns and extract features regardless of their position in the image, for example when the object is shifted.
- Multiple pre-trained CNN models such as VGG-16, ResNet50, Inceptionv3, and EfficientNet have achieved state-of-the-art results and can be fine-tuned on new tasks using a relatively small amount of data.
- CNNs are not limited to images: they can also be used for other classification problems, such as natural language processing, time series analysis, and speech recognition.
Architecture of a CNN
A CNN’s architecture tries to mimic the structure of neurons in the human visual system. It is composed of multiple layers, where each one is responsible for detecting a specific feature in the data. As illustrated in the image below, the typical CNN is made of a combination of four main layers:
- Convolutional layers
- Rectified Linear Unit (ReLU for short)
- Pooling layers
- Fully connected layers
Let’s understand how each of these layers works using the following example of handwritten digit classification.
Convolution layers
This is the first building block of a CNN. As the name suggests, the main mathematical task performed is called convolution, which is the application of a sliding window function to a matrix of pixels representing an image. The sliding function applied to the matrix is called a kernel or a filter, and the two terms are used interchangeably.
In the convolution layer, several filters of equal size are applied, and each filter is used to recognize a specific pattern from the image, such as the curving of the digits, the edges, the whole shape of the digits, and more.
Let’s consider this 32×32 grayscale image of a handwritten digit. The values in the matrix are given for illustration purposes.
Also, let’s consider the kernel used for the convolution. It is a matrix with a dimension of 3×3. The weight of each element of the kernel is shown in the grid: zero weights are shown in the black cells and ones in the white cells.
Do we have to manually find these weights?
In real life, the weights of the kernels are determined during the training process of the neural network.
Using these two matrices, we can perform the convolution operation by applying a dot product, which works as follows:
- Apply the kernel matrix from the top-left corner to the right.
- Perform element-wise multiplication.
- Sum the values of the products.
- The resulting value corresponds to the first value (top-left corner) in the convoluted matrix.
- Slide the kernel over the image by the size of the sliding window (the stride).
- Repeat steps 2 to 5 until the image matrix is fully covered.
The dimension of the convolved matrix depends on the size of the sliding window: the larger the window and stride, the smaller the resulting dimension.
Another name associated with the kernel in the literature is feature detector because the weights can be fine-tuned to detect specific features in the input image.
For instance:
- A kernel that averages neighboring pixels can be used to blur the input image.
- A kernel that subtracts neighboring pixels can be used to perform edge detection, as sketched below.
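As a rough sketch (not part of the original example), here is how such kernels could be applied with `tf.nn.conv2d`; the input here is a random placeholder image rather than the digit above:
import tensorflow as tf

image = tf.random.uniform((1, 8, 8, 1))             # (batch, height, width, channels)

blur_kernel = tf.ones((3, 3, 1, 1)) / 9.0           # averages neighboring pixels
edge_kernel = tf.constant([[0., -1., 0.],
                           [-1., 4., -1.],
                           [0., -1., 0.]])
edge_kernel = tf.reshape(edge_kernel, (3, 3, 1, 1)) # subtracts neighbors from the center pixel

blurred = tf.nn.conv2d(image, blur_kernel, strides=1, padding="SAME")
edges = tf.nn.conv2d(image, edge_kernel, strides=1, padding="SAME")
print(blurred.shape, edges.shape)                   # (1, 8, 8, 1) (1, 8, 8, 1)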
The more convolution layers the network has, the better it is at detecting more abstract features.
Activation function
A ReLU activation function is applied after each convolution operation. This function helps the network learn non-linear relationships between the features in the image, hence making the network more robust for identifying different patterns. It also helps to mitigate the vanishing gradient problems.
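As a quick illustration, ReLU simply replaces negative values with zero:
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
print(tf.nn.relu(x).numpy())  # [0.  0.  0.  1.5 3. ]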
Pooling layer
The goal of the pooling layer is to pull the most significant features from the convoluted matrix. This is done by applying some aggregation operations, which reduces the dimension of the feature map (convoluted matrix), hence reducing the memory used while training the network. Pooling is also relevant for mitigating overfitting.
The most common aggregation functions that can be applied are:
- Max pooling which is the maximum value of the feature map
- Sum pooling corresponds to the sum of all the values of the feature map
- Average pooling is the average of all the values.
Below is an illustration of each of these pooling operations.
Also, the dimension of the feature map becomes smaller as the pooling function is applied.
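A small sketch (with made-up values, not from the article) showing max and average pooling with Keras layers and the resulting reduction in dimension:
import tensorflow as tf
from tensorflow.keras import layers

# A 4x4 single-channel feature map with a batch dimension.
feature_map = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))

max_pooled = layers.MaxPooling2D(pool_size=(2, 2))(feature_map)
avg_pooled = layers.AveragePooling2D(pool_size=(2, 2))(feature_map)

print(max_pooled.shape)                  # (1, 2, 2, 1) -- the feature map shrinks
print(tf.squeeze(max_pooled).numpy())    # [[ 5.  7.] [13. 15.]]
print(tf.squeeze(avg_pooled).numpy())    # [[ 2.5  4.5] [10.5 12.5]]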
The last pooling layer flattens its feature map so that it can be processed by the fully connected layer.
Fully connected layers
These layers form the last part of the convolutional neural network, and their inputs correspond to the flattened one-dimensional matrix generated by the last pooling layer. ReLU activation functions are applied to them for non-linearity.
Finally, a softmax prediction layer is used to generate probability values for each of the possible output labels, and the final label predicted is the one with the highest probability score.
Dropout
Dropout is a regularization technique applied to improve the generalization capability of neural networks with a large number of parameters. It consists of randomly dropping some neurons during the training process, which forces the remaining neurons to learn new features from the input data.
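A minimal sketch of adding dropout between layers in Keras; the 0.5 rate and layer sizes are illustrative choices, not values from the article:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dropout(0.5),           # randomly zero out 50% of the activations during training
    Dense(10, activation='softmax')
])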
Since the technical implementation will be performed using TensorFlow 2, the next section aims to provide a complete overview of different components of this framework to efficiently build deep learning models.
TensorFlow: Constants, Variables, and Placeholders
Constants are not the only types of tensors. There are also variables and placeholders, which are all building blocks of a computational graph.
A computational graph is basically a representation of a sequence of operations and the flow of data between them.
Now, let’s understand the difference between these types of tensors.
Constants
Constants are tensors whose values do not change during the execution of the computational graph. They are created using the tf.constant() function and are mainly used to store fixed parameters that do not require any change during the model training.
Variables
Variables are tensors whose value can be changed during the execution of the computational graph, and they are created using the tf.Variable() function. For instance, in the case of neural networks, weights and biases can be defined as variables since they need to be updated during the training process.
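A short sketch contrasting the two kinds of tensors:
import tensorflow as tf

c = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # fixed value, cannot be reassigned
w = tf.Variable(tf.zeros((2, 2)))           # trainable state, e.g. layer weights

w.assign_add(c)          # variables can be updated in place
print(w.numpy())         # [[1. 2.] [3. 4.]]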
Placeholders
These were used in the first version of TensorFlow as empty containers that do not hold specific values. They are just used to reserve a spot for data to be fed in later. This gives users the freedom to use different datasets and batch sizes during model training and validation.
In TensorFlow 2, placeholders are no longer needed: data is passed directly to Python functions thanks to eager execution, and `tf.function` can trace those functions into computational graphs when needed. This is a more Pythonic and dynamic approach to feeding data into the graph.
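For instance, a sketch of a `tf.function`-decorated computation that accepts batches of any size, doing the job a placeholder would have done in TensorFlow 1.x (the shapes here are illustrative):
import tensorflow as tf

@tf.function
def forward(x, w, b):
    # Traced into a graph on the first call; works for any batch size.
    return tf.nn.relu(tf.matmul(x, w) + b)

w = tf.Variable(tf.random.normal((784, 10)))
b = tf.Variable(tf.zeros((10,)))
batch = tf.random.normal((32, 784))          # data is fed directly, no placeholder needed
print(forward(batch, w, b).shape)            # (32, 10)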
Create the model
The following 3D convolutional neural network model is based off the paper A Closer Look at Spatiotemporal Convolutions for Action Recognition by D. Tran et al. (2017). The paper compares several versions of 3D ResNets. Instead of operating on a single image with dimensions `(height, width)`, like standard ResNets, these operate on video volumes of shape `(time, height, width)`. The most obvious approach to this problem would be to replace each 2D convolution (`layers.Conv2D`) with a 3D convolution (`layers.Conv3D`).
This tutorial uses a (2 + 1)D convolution with residual connections. The (2 + 1)D convolution allows for the decomposition of the spatial and temporal dimensions, therefore creating two separate steps. An advantage of this approach is that factorizing the convolutions into spatial and temporal dimensions saves parameters.
For each output location a 3D convolution combines all the vectors from a 3D patch of the volume to create one vector in the output volume. This operation takes `time * height * width * channels` inputs and produces `channels` outputs (assuming the number of input and output channels are the same). So a 3D convolution layer with a kernel size of `(3 x 3 x 3)` would need a weight matrix with `27 * channels ** 2` entries. The reference paper found that a more effective and efficient approach was to factorize the convolution. Instead of a single 3D convolution to process the time and space dimensions, they proposed a “(2+1)D” convolution which processes the space and time dimensions separately. The figure below shows the factored spatial and temporal convolutions of a (2 + 1)D convolution.
The main advantage of this approach is that it reduces the number of parameters. In the (2 + 1)D convolution the spatial convolution takes in data of the shape `(1, width, height)`, while the temporal convolution takes in data of the shape `(time, 1, 1)`. For example, a (2 + 1)D convolution with kernel size `(3 x 3 x 3)` would need weight matrices of size `(9 * channels**2) + (3 * channels**2)`, less than half as many as the full 3D convolution. This tutorial implements (2 + 1)D ResNet18, where each convolution in the resnet is replaced by a (2+1)D convolution.
# Define the dimensions of one frame in the set of frames created
HEIGHT = 224
WIDTH = 224
class Conv2Plus1D(keras.layers.Layer):
  def __init__(self, filters, kernel_size, padding):
    """
      A sequence of convolutional layers that first apply the convolution operation over the
      spatial dimensions, and then the temporal dimension.
    """
    super().__init__()
    self.seq = keras.Sequential([
        # Spatial decomposition
        layers.Conv3D(filters=filters,
                      kernel_size=(1, kernel_size[1], kernel_size[2]),
                      padding=padding),
        # Temporal decomposition
        layers.Conv3D(filters=filters,
                      kernel_size=(kernel_size[0], 1, 1),
                      padding=padding)
        ])

  def call(self, x):
    return self.seq(x)
A ResNet model is made from a sequence of residual blocks. A residual block has two branches. The main branch performs the calculation, but is difficult for gradients to flow through. The residual branch bypasses the main calculation and mostly just adds the input to the output of the main branch. Gradients flow easily through this branch. Therefore, an easy path from the loss function to any of the residual block’s main branch will be present. This avoids the vanishing gradient problem.
Create the main branch of the residual block with the following class. In contrast to the standard ResNet structure this uses the custom `Conv2Plus1D` layer instead of `layers.Conv2D`.
class ResidualMain(keras.layers.Layer):
  """
    Residual block of the model with convolution, layer normalization, and the
    activation function, ReLU.
  """
  def __init__(self, filters, kernel_size):
    super().__init__()
    self.seq = keras.Sequential([
        Conv2Plus1D(filters=filters,
                    kernel_size=kernel_size,
                    padding='same'),
        layers.LayerNormalization(),
        layers.ReLU(),
        Conv2Plus1D(filters=filters,
                    kernel_size=kernel_size,
                    padding='same'),
        layers.LayerNormalization()
    ])

  def call(self, x):
    return self.seq(x)
To add the residual branch to the main branch it needs to have the same size. The `Project` layer below deals with cases where the number of channels is changed on the branch. In particular, a densely connected layer followed by normalization is added.
class Project(keras.layers.Layer):
  """
    Project certain dimensions of the tensor as the data is passed through different
    sized filters and downsampled.
  """
  def __init__(self, units):
    super().__init__()
    self.seq = keras.Sequential([
        layers.Dense(units),
        layers.LayerNormalization()
    ])

  def call(self, x):
    return self.seq(x)
Use `add_residual_block` to introduce a skip connection between the layers of the model.
def add_residual_block(input, filters, kernel_size):
  """
    Add residual blocks to the model. If the last dimensions of the input data
    and filter size does not match, project it such that last dimension matches.
  """
  out = ResidualMain(filters, kernel_size)(input)

  res = input
  # Using the Keras functional APIs, project the last dimension of the tensor to
  # match the new filter size
  if out.shape[-1] != input.shape[-1]:
    res = Project(out.shape[-1])(res)

  return layers.add([res, out])
Resizing the video is necessary to perform downsampling of the data. In particular, downsampling the video frames allows the model to examine specific parts of frames to detect patterns that may be specific to a certain action. Through downsampling, non-essential information can be discarded. Moreover, resizing the video allows for dimensionality reduction and therefore faster processing through the model.
class ResizeVideo(keras.layers.Layer):
  def __init__(self, height, width):
    super().__init__()
    self.height = height
    self.width = width
    self.resizing_layer = layers.Resizing(self.height, self.width)

  def call(self, video):
    """
      Use the einops library to resize the tensor.

      Args:
        video: Tensor representation of the video, in the form of a set of frames.

      Return:
        A downsampled size of the video according to the new height and width it should be resized to.
    """
    # b stands for batch size, t stands for time, h stands for height,
    # w stands for width, and c stands for the number of channels.
    old_shape = einops.parse_shape(video, 'b t h w c')
    images = einops.rearrange(video, 'b t h w c -> (b t) h w c')
    images = self.resizing_layer(images)
    videos = einops.rearrange(
        images, '(b t) h w c -> b t h w c',
        t = old_shape['t'])
    return videos
Use the Keras functional API to build the residual network.
input_shape = (None, 10, HEIGHT, WIDTH, 3)
input = layers.Input(shape=(input_shape[1:]))
x = input

x = Conv2Plus1D(filters=16, kernel_size=(3, 7, 7), padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)
x = ResizeVideo(HEIGHT // 2, WIDTH // 2)(x)

# Block 1
x = add_residual_block(x, 16, (3, 3, 3))
x = ResizeVideo(HEIGHT // 4, WIDTH // 4)(x)

# Block 2
x = add_residual_block(x, 32, (3, 3, 3))
x = ResizeVideo(HEIGHT // 8, WIDTH // 8)(x)

# Block 3
x = add_residual_block(x, 64, (3, 3, 3))
x = ResizeVideo(HEIGHT // 16, WIDTH // 16)(x)

# Block 4
x = add_residual_block(x, 128, (3, 3, 3))

x = layers.GlobalAveragePooling3D()(x)
x = layers.Flatten()(x)
x = layers.Dense(10)(x)

model = keras.Model(input, x)
frames, label = next(iter(train_ds))
model.build(frames)
# Visualize the model
keras.utils.plot_model(model, expand_nested=True, dpi=60, show_shapes=True)
CNN Step-by-Step Implementation
Let’s put everything we have learned previously into practice. This section will illustrate the end-to-end implementation of a convolutional neural network in TensorFlow applied to the CIFAR-10 dataset, which is a built-in dataset with the following properties:
- It contains 60,000 32 by 32 color images
- The dataset has 10 different classes
- Each class has 6,000 images
- There are 50,000 training images overall
- And 10,000 testing images overall
The source code of the article is available on DataCamp’s workspace
Architecture of the network
Before getting into the technical implementation, let’s first understand the overall architecture of the network being implemented.
- The input of the model is a 32x32x3 tensor, respectively, for the width, height, and channels.
- We will have two convolutional layers. The first layer applies 32 filters of size 3×3, each followed by a ReLU activation function, and the second one applies 64 filters of size 3×3.
- The first pooling layer will apply a 2×2 max pooling
- The second pooling layer will apply a 2×2 max pooling as well
- The fully connected layer will have 128 units and a ReLU activation function
- Finally, the output will be 10 units corresponding to the 10 classes, and the activation function is a softmax to generate the probability distributions.
Load dataset
The built-in dataset is loaded from `keras.datasets` as follows:
from tensorflow.keras import datasets

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
Exploratory Data Analysis
In this section, we will focus solely on showing some sample images since we already know the proportion of each class in both the training and testing data.
The helper function show_images() shows a total of 12 images by default and takes three main parameters:
- The training images
- The class names
- And the training labels.
import matplotlib.pyplot as plt

def show_images(train_images,
                class_names,
                train_labels,
                nb_samples = 12, nb_row = 4):
    plt.figure(figsize=(12, 12))
    for i in range(nb_samples):
        plt.subplot(nb_row, nb_row, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_images[i], cmap=plt.cm.binary)
        plt.xlabel(class_names[train_labels[i][0]])
    plt.show()
Now, we can call the function with the required parameters.
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

show_images(train_images, class_names, train_labels)
A successful execution of the previous code generates the images below.
Data preprocessing
Prior to training the model, we need to normalize the pixel values of the data to the same range (e.g., 0 to 1). This is a common preprocessing step when dealing with images, ensuring consistent scaling and faster convergence during training.
max_pixel_value = 255

train_images = train_images / max_pixel_value
test_images = test_images / max_pixel_value
Also, the labels represent categories such as cat, horse, and bird. We need to convert them into a numerical (one-hot) format so that they can be easily processed by the neural network.
from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_labels, len(class_names))
test_labels = to_categorical(test_labels, len(class_names))
Model architecture implementation
The next step is to implement the architecture of the network based on the previous description.
First, we define the model using the Sequential() class, and each layer is added to the model with the add() function.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Variables
INPUT_SHAPE = (32, 32, 3)
FILTER1_SIZE = 32
FILTER2_SIZE = 64
FILTER_SHAPE = (3, 3)
POOL_SHAPE = (2, 2)
FULLY_CONNECT_NUM = 128
NUM_CLASSES = len(class_names)

# Model architecture implementation
model = Sequential()
model.add(Conv2D(FILTER1_SIZE, FILTER_SHAPE, activation='relu', input_shape=INPUT_SHAPE))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Conv2D(FILTER2_SIZE, FILTER_SHAPE, activation='relu'))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Flatten())
model.add(Dense(FULLY_CONNECT_NUM, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))
After applying the summary() function to the model, we get a comprehensive summary of the model’s architecture, with information about each layer, its type, its output shape, and the total number of trainable parameters.
Model training
All the resources are finally available to configure and trigger the training of the model. This is done respectively with the compile() and fit() functions, which take the following parameters:
- The optimizer is responsible for updating the model’s weights and biases. In our case, we are using the Adam optimizer.
- The loss function is used to measure the misclassification errors; we are using CategoricalCrossentropy().
- Finally, the metrics are used to measure the performance of the model; accuracy, precision, and recall will be displayed in our use case.
from tensorflow.keras.metrics import Precision, Recall

BATCH_SIZE = 32
EPOCHS = 30

METRICS = ['accuracy',
           Precision(name='precision'),
           Recall(name='recall')]

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=METRICS)

# Train the model
training_history = model.fit(train_images, train_labels,
                             epochs=EPOCHS, batch_size=BATCH_SIZE,
                             validation_data=(test_images, test_labels))
Model evaluation
After the model training, we can compare its performance on both the training and testing datasets by plotting the above metrics using the show_performance_curve() helper function in two dimensions.
- The horizontal axis (x) is the number of epochs
- The vertical one (y) is the underlying performance of the model.
- The curve represents the value of the metrics at a specific epoch.
For better visualization, a vertical red line is drawn through the intersection of the training and validation performance values along with the optimal value.
import numpy as np

def show_performance_curve(training_result, metric, metric_label):
    train_perf = training_result.history[str(metric)]
    validation_perf = training_result.history['val_'+str(metric)]
    intersection_idx = np.argwhere(np.isclose(train_perf,
                                              validation_perf, atol=1e-2)).flatten()[0]
    intersection_value = train_perf[intersection_idx]

    plt.plot(train_perf, label=metric_label)
    plt.plot(validation_perf, label = 'val_'+str(metric))
    plt.axvline(x=intersection_idx, color='r', linestyle='--', label='Intersection')

    plt.annotate(f'Optimal Value: {intersection_value:.4f}',
                 xy=(intersection_idx, intersection_value),
                 xycoords='data',
                 fontsize=10,
                 color='green')

    plt.xlabel('Epoch')
    plt.ylabel(metric_label)
    plt.legend(loc='lower right')
Then, the function is applied for both the accuracy and the precision of the model.
show_performance_curve(training_history, 'accuracy', 'accuracy')
show_performance_curve(training_history, 'precision', 'precision')
After training the model without any fine-tuning and pre-processing, we end up with:
- An accuracy score of 67.09%, meaning that the model correctly classifies about 67 out of every 100 samples.
- A precision of 76.55%, meaning that out of every 100 positive predictions, almost 77 are true positives and the remaining 23 are false positives.
- These scores are achieved at the third epoch for accuracy and the second epoch for precision.
These two metrics give a global understanding of the model behavior.
What if we want to know which classes the model is good at predicting and which ones it struggles with?
This can be achieved from the confusion matrix, which shows for each class the number of correct and wrong predictions. The implementation is given below. We start by making predictions on the test data, then compute the confusion matrix and show the final result.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

test_predictions = model.predict(test_images)

test_predicted_labels = np.argmax(test_predictions, axis=1)
test_true_labels = np.argmax(test_labels, axis=1)

cm = confusion_matrix(test_true_labels, test_predicted_labels)

cmd = ConfusionMatrixDisplay(confusion_matrix=cm)
cmd.plot(include_values=True, cmap='viridis', ax=None, xticks_rotation='horizontal')
plt.show()
- Classes 0, 1, 6, 7, 8, 9, respectively, for airplane, automobile, frog, horse, ship, and truck have the highest values at the diagonal. This means that the model is better at predicting those classes.
- On the other hand, it seems to struggle with the remaining classes:
- The classes with the highest off-diagonal values are those the model confuses with the correct classes. For instance, it confuses birds (class 2) with airplanes (class 0), and automobiles (class 1) with trucks (class 9).
Learn more about confusion matrix from our tutorial understanding confusion matrix in R, which takes course material from DataCamp’s Machine Learning toolbox course.
This model can be improved with additional tasks such as:
- Image augmentation
- Transfer learning using pre-trained models such as ResNet, MobileNet, or VGG. Our Transfer learning tutorial explains what transfer learning is and some of its applications in real life.
- Applying different regularization techniques such as L1, L2, or dropout.
- Fine-tuning different hyperparameters such as the learning rate, the batch size, or the number of layers in the network.
Evaluate the model
Use Keras `Model.evaluate` to get the loss and accuracy on the test dataset.
model.evaluate(test_ds, return_dict=True)
13/13 [==============================] - 15s 1s/step - loss: 0.8879 - accuracy: 0.7000
{'loss': 0.8878847360610962, 'accuracy': 0.699999988079071}
To visualize model performance further, use a confusion matrix. The confusion matrix allows you to assess the performance of the classification model beyond accuracy. In order to build the confusion matrix for this multi-class classification problem, get the actual values in the test set and the predicted values.
def get_actual_predicted_labels(dataset):
  """
    Create a list of actual ground truth values and the predictions from the model.

    Args:
      dataset: An iterable data structure, such as a TensorFlow Dataset, with features and labels.

    Return:
      Ground truth and predicted values for a particular dataset.
  """
  actual = [labels for _, labels in dataset.unbatch()]
  predicted = model.predict(dataset)

  actual = tf.stack(actual, axis=0)
  predicted = tf.concat(predicted, axis=0)
  predicted = tf.argmax(predicted, axis=1)

  return actual, predicted
def plot_confusion_matrix(actual, predicted, labels, ds_type):
  cm = tf.math.confusion_matrix(actual, predicted)
  ax = sns.heatmap(cm, annot=True, fmt='g')
  sns.set(rc={'figure.figsize':(12, 12)})
  sns.set(font_scale=1.4)
  ax.set_title('Confusion matrix of action recognition for ' + ds_type)
  ax.set_xlabel('Predicted Action')
  ax.set_ylabel('Actual Action')
  plt.xticks(rotation=90)
  plt.yticks(rotation=0)
  ax.xaxis.set_ticklabels(labels)
  ax.yaxis.set_ticklabels(labels)
fg = FrameGenerator(subset_paths['train'], n_frames, training=True)
labels = list(fg.class_ids_for_name.keys())
actual, predicted = get_actual_predicted_labels(train_ds)
plot_confusion_matrix(actual, predicted, labels, 'training')
38/38 [==============================] - 46s 1s/step
actual, predicted = get_actual_predicted_labels(test_ds)
plot_confusion_matrix(actual, predicted, labels, 'test')
13/13 [==============================] - 15s 1s/step
The precision and recall values for each class can also be calculated using a confusion matrix.
def calculate_classification_metrics(y_actual, y_pred, labels):
  """
    Calculate the precision and recall of a classification model using the ground truth and
    predicted values.

    Args:
      y_actual: Ground truth labels.
      y_pred: Predicted labels.
      labels: List of classification labels.

    Return:
      Precision and recall measures.
  """
  cm = tf.math.confusion_matrix(y_actual, y_pred)
  tp = np.diag(cm) # Diagonal represents true positives
  precision = dict()
  recall = dict()
  for i in range(len(labels)):
    col = cm[:, i]
    fp = np.sum(col) - tp[i] # Sum of column minus true positive is false positive

    row = cm[i, :]
    fn = np.sum(row) - tp[i] # Sum of row minus true positive is false negative

    precision[labels[i]] = tp[i] / (tp[i] + fp) # Precision
    recall[labels[i]] = tp[i] / (tp[i] + fn) # Recall

  return precision, recall
precision, recall = calculate_classification_metrics(actual, predicted, labels) # Test dataset
precision
{'ApplyEyeMakeup': 0.6666666666666666,
 'ApplyLipstick': 0.5714285714285714,
 'Archery': 0.6,
 'BabyCrawling': 0.5,
 'BalanceBeam': 0.5714285714285714,
 'BandMarching': 1.0,
 'BaseballPitch': 1.0,
 'Basketball': 0.5,
 'BasketballDunk': 0.8181818181818182,
 'BenchPress': 0.8333333333333334}
recall
{'ApplyEyeMakeup': 0.6,
 'ApplyLipstick': 0.4,
 'Archery': 0.9,
 'BabyCrawling': 0.8,
 'BalanceBeam': 0.4,
 'BandMarching': 0.8,
 'BaseballPitch': 0.9,
 'Basketball': 0.3,
 'BasketballDunk': 0.9,
 'BenchPress': 1.0}
What is a Convolutional Neural Network?
A convolutional neural network is a feed-forward neural network that is generally used to analyze visual images by processing data with grid-like topology. It’s also known as a ConvNet. A convolutional neural network is used to detect and classify objects in an image.
Below is a neural network that identifies two types of flowers: Orchid and Rose.
In CNN, every image is represented in the form of an array of pixel values.
The convolution operation forms the basis of any convolutional neural network. Let’s understand the convolution operation using two matrices, a and b, of 1 dimension.
a = [5,3,7,5,9,7]
b = [1,2,3]
In convolution operation, the arrays are multiplied element-wise, and the product is summed to create a new array, which represents a*b.
The first three elements of the matrix a are multiplied with the elements of matrix b. The product is summed to get the result.
The next three elements from the matrix a are multiplied by the elements in matrix b, and the product is summed up.
This process continues until the convolution operation is complete.
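A small NumPy sketch of this sliding dot product, using the values of a and b given above:
import numpy as np

a = np.array([5, 3, 7, 5, 9, 7])
b = np.array([1, 2, 3])

# Slide b over a, multiply element-wise, and sum each window.
result = np.array([np.sum(a[i:i + len(b)] * b) for i in range(len(a) - len(b) + 1)])
print(result)   # [32 32 44 44]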
Layers in a Convolutional Neural Network
A convolution neural network has multiple hidden layers that help in extracting information from an image. The four important layers in CNN are:
- Convolution layer
- ReLU layer
- Pooling layer
- Fully connected layer
Convolution Layer
This is the first step in the process of extracting valuable features from an image. A convolution layer has several filters that perform the convolution operation. Every image is considered as a matrix of pixel values.
Consider the following 5×5 image whose pixel values are either 0 or 1. There’s also a filter matrix with a dimension of 3×3. Slide the filter matrix over the image and compute the dot product to get the convolved feature matrix.
ReLU layer
ReLU stands for the rectified linear unit. Once the feature maps are extracted, the next step is to move them to a ReLU layer.
ReLU performs an element-wise operation and sets all the negative pixels to 0. It introduces non-linearity to the network, and the generated output is a rectified feature map. Below is the graph of a ReLU function:
The original image is scanned with multiple convolutions and ReLU layers for locating the features.
Pooling Layer
Pooling is a down-sampling operation that reduces the dimensionality of the feature map. The rectified feature map now goes through a pooling layer to generate a pooled feature map.
The pooling layer uses various filters to identify different parts of the image like edges, corners, body, feathers, eyes, and beak.
Here’s how the structure of the convolution neural network looks so far:
The next step in the process is called flattening. Flattening is used to convert all the resultant 2-Dimensional arrays from pooled feature maps into a single long continuous linear vector.
The flattened matrix is fed as input to the fully connected layer to classify the image.
Here’s how exactly CNN recognizes a bird:
- The pixels from the image are fed to the convolutional layer that performs the convolution operation
- It results in a convolved map
- The convolved map is applied to a ReLU function to generate a rectified feature map
- The image is processed with multiple convolutions and ReLU layers for locating the features
- Different pooling layers with various filters are used to identify specific parts of the image
- The pooled feature map is flattened and fed to a fully connected layer to get the final output
Quick Tutorial: Building a Basic Convolutional Neural Network (CNN) in TensorFlow
This quick tutorial can help you get started implementing CNN in TensorFlow. It is based on the Fashion-MNIST dataset, containing 28 x 28 grayscale images of 65,000 fashion products in 10 categories. There are 55,000 images in the training set and 10,000 images in the test set. Our code is based on the full tutorial by Aditya Sharma.
Loading Data
First import all the necessary modules: NumPy, matplotlib and Tensorflow, then import the Fashion-MNIST data as follows:
from tensorflow.examples.tutorials.mnist import input_data  # TF 1.x data helper (assumed import)

# Use this for reading the data/fashion directory from the dataset, and retrieve
# the Fashion-MNIST dataset from the Amazon S3 bucket
data = input_data.read_data_sets('data/fashion', one_hot=True,
                                 source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')
CNN Architecture
We will use three convolutional layers, progressively adding more filters. All the filters are 3×3:
- Layer with 32 filters
- Layer with 64 filters
- Layer with 128 filters
In addition, we’ll have three max-pooling layers in between the convolutions, which are 2×2.
We’ll set basic hyperparameters of the CNN model:
training_iters = 10
learning_rate = 0.001
batch_size = 128
The batch size setting tells TensorFlow to process the specified number of images in every training batch.
Neural Network Parameters
The number of inputs to the CNN is 784, because the images have 784 pixels and are read as a 784 dimensional vector. We will rebuild this vector into a matrix of 28 x 28 x 1.
# Use this to specify 28 inputs, and 10 classes for the predicted label at the end
n_input = 28
n_classes = 10
Here we define an input placeholder x with dimensionality None x 784, and output placeholder size of None x 10. Similarly, we’ll define a placeholder y for the label of the training images, which will be a None x 10 matrix.
We are setting the “row” to None because we previously defined batch_size, meaning placeholders receive the row size when the training set is loaded. Row size will be set to 128, like the batch_size.
# x is the input placeholder, rebuilding the image into a 28x28x1 matrix
x = tf.placeholder("float", [None, 28, 28, 1])
# y is the label set, using the number of classes
y = tf.placeholder("float", [None, n_classes])
Wrapper Functions
Because we have several layers of the same type in the model, it’s useful to create a wrapper function for each type of layer, to avoid duplicating code. You can get functions like this out of the box with Keras, which is included with Tensorflow. However, in this tutorial we show you how to do things from scratch in TensorFlow without Keras helper functions.
Here is a function creating a 2-dimensional convolutional layer, with bias and Relu activation. The arguments are the test images x, weights W, bias b, and number of strides, meaning how quickly the filter moves over the image during the convolution.
def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)
Here is another function creating a 2D max-pool layer. Here the parameters are test images x, and k, specifying the kernel/filter size.
def maxpool2d(x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
Now let’s define weights and biases.
weights = {
    'wc1': tf.get_variable('W0', shape=(3, 3, 1, 32), initializer=tf.contrib.layers.xavier_initializer()),
    'wc2': tf.get_variable('W1', shape=(3, 3, 32, 64), initializer=tf.contrib.layers.xavier_initializer()),
    'wc3': tf.get_variable('W2', shape=(3, 3, 64, 128), initializer=tf.contrib.layers.xavier_initializer()),
    'wd1': tf.get_variable('W3', shape=(4*4*128, 128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('W6', shape=(128, n_classes), initializer=tf.contrib.layers.xavier_initializer()),
}
biases = {
    'bc1': tf.get_variable('B0', shape=(32), initializer=tf.contrib.layers.xavier_initializer()),
    'bc2': tf.get_variable('B1', shape=(64), initializer=tf.contrib.layers.xavier_initializer()),
    'bc3': tf.get_variable('B2', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'bd1': tf.get_variable('B3', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('B4', shape=(10), initializer=tf.contrib.layers.xavier_initializer()),
}
Building the CNN
Now we build the CNN by feeding the weights and biases into the wrapper functions.
def conv_net(x, weights, biases):

    # This constructs the first convolutional layer with 32 3x3 filters and 32 biases.
    # The next line specifies the max-pool layer with the kernel size set to 2.
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Use this to construct the second convolutional layer with 64 3x3 filters and
    # 64 biases, and to add another max-pool layer.
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # This helps you construct the third convolutional layer with 128 3x3 filters and
    # 128 biases, and add the last max-pool layer.
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    conv3 = maxpool2d(conv3, k=2)

    # Now you need to build the fully connected layer that will generate prediction labels.
    # To do this, use reshape() to adapt the output of pooling to the input expected by
    # the fully connected layer.
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])

    # In this last part, apply the Relu function and perform matrix multiplication on the weights
    fc1 = tf.nn.relu(fc1)
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
Loss and Optimizer Nodes
First build the model using the conv_net() function we showed above, passing in x, weights, and biases:
pred = conv_net(x, weights, biases)
This is a multi-class classification problem, so we will use the softmax activation function, which gives a probability between 0 and 1 for each class label (the label with the highest probability will be the prediction of the model). We’ll use cross-entropy as the loss function.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
Finally, we’ll define the Adam optimizer with a learning rate of 0.001 as defined in the model hyperparameters above:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
Evaluate the Model
To test the model, we first initialize weights and biases, and then define a correct_prediction and accuracy node that will evaluate model performance every time it is run.
init = tf.global_variables_initializer()

correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Now you can start the computation graph and run a training session as follows:
- Create a for loop over the number of training iterations specified above
- Create an inner for loop over the number of batches specified above
- Pass training images and labels using the variables batch_x and batch_y
- Feed the training images and labels into the x and y placeholders
- After each training iteration, run the loss function and check training accuracy
- After running through all the images, test accuracy by processing the 10,000 test images
See the original tutorial for the complete code that you can use to run the CNN model.
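For orientation only, a condensed sketch of such a session loop (assuming the TensorFlow 1.x API used above, and that `data.train.next_batch` is available as with the classic MNIST helpers) might look like this:
# Hedged sketch of the training session described above (TensorFlow 1.x style).
with tf.Session() as sess:
    sess.run(init)
    for i in range(training_iters):
        for batch in range(len(data.train.images) // batch_size):
            batch_x, batch_y = data.train.next_batch(batch_size)
            batch_x = batch_x.reshape(-1, 28, 28, 1)   # rebuild the 784-vector into 28x28x1
            # Run one optimization step and fetch the loss and training accuracy.
            _, loss = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
        print("Iter", i, "loss:", loss, "train acc:", acc)

    # Evaluate on the 10,000 test images.
    test_x = data.test.images.reshape(-1, 28, 28, 1)
    test_acc = sess.run(accuracy, feed_dict={x: test_x, y: data.test.labels})
    print("Test accuracy:", test_acc)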
Learn More about CNN and Deep Learning
This is how you build a CNN with multiple hidden layers and how to identify a bird using its pixel values. You’ve also completed a demo to classify images across 10 categories using the CIFAR dataset.
Convolutional Neural Networks (CNN) have been used in state-of-the-art computer vision tasks such as face detection and self-driving cars. In this article, let’s take a look at the concepts required to understand CNNs in TensorFlow. Later you will also dive into some TensorFlow CNN examples.
TensorFlow CNN in Production with Run:AI
Run:AI automates resource management and workload orchestration for deep learning infrastructure. With Run:AI, you can automatically run as many CNN experiments as needed in TensorFlow and other deep learning frameworks.
Here are some of the capabilities you gain when using Run:AI:
- Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
- No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
- A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.
Learn more about the Run:AI GPU virtualization platform.
Convolutional Neural Networks (CNN) with TensorFlow Tutorial
Imagine being in a zoo trying to recognize if a given animal is a cheetah or a leopard. As a human, your brain can effortlessly analyze body and facial features to come to a valid conclusion. In the same way, Convolutional Neural Networks (CNNs) can be trained to perform the same recognition task, no matter how complex the patterns are. This makes them powerful in the field of computer vision.
This conceptual CNN tutorial will start by providing an overview of what CNNs are and their importance in machine learning. Then it will walk you through a step-by-step implementation of CNN in TensorFlow Framework 2.
Data augmentation
Data augmentation is usually applied in order to prevent overfitting. Augmenting the images effectively increases the size of the dataset and exposes the model to more varied aspects of the data. Augmentation can be achieved by applying random transformations such as flipping and rotating the images. Fortunately, Keras provides layers that can do just that.
data_augmentation = keras.Sequential(
  [
    tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal",
                                                          input_shape=(200, 200, 3)),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),
    tf.keras.layers.experimental.preprocessing.RandomZoom(0.2),
  ]
)
This tutorial demonstrates training a 3D convolutional neural network (CNN) for video classification using the UCF101 action recognition dataset. A 3D CNN uses a three-dimensional filter to perform convolutions. The kernel is able to slide in three directions, whereas in a 2D CNN it can slide in two dimensions. The model is based on the work published in A Closer Look at Spatiotemporal Convolutions for Action Recognition by D. Tran et al. (2017). In this tutorial, you will:
- Build an input pipeline
- Build a 3D convolutional neural network model with residual connections using Keras functional API
- Train the model
- Evaluate and test the model
This video classification tutorial is the second part in a series of TensorFlow video tutorials. Here are the other three tutorials:
- Load video data: This tutorial explains much of the code used in this document.
- MoViNet for streaming action recognition: Get familiar with the MoViNet models that are available on TF Hub.
- Transfer learning for video classification with MoViNet: This tutorial explains how to use a pre-trained video classification model trained on a different dataset with the UCF-101 dataset.
How do CNNs work?
Although they can be used for other tasks, CNNs are mostly used on image data. Each image contains pixel data that can be represented in a numerical form, and this numerical representation is what is passed to a CNN. While normal artificial neural networks can also process image data, CNNs have proven to perform better, resulting in higher accuracy. Let’s now take a look at how CNNs work.
Convolution
Usually, you will not feed the entire image to a CNN. You will feed the features that are most important in classifying the image. The features are obtained through a process known as convolution. The convolution operation results in what is known as a feature map. It is also referred to as the convolved feature or an activation map. The feature map is obtained by applying a feature detector to the input image. The feature detector is also referred to as a kernel or a filter. The filter is usually a 3 by 3 matrix. However, other types of matrices can be used. The feature map is obtained through an element-wise multiplication of the filter with the matrix representation of the input image. The objective here is to reduce the size of the image being passed to the CNN while maintaining the important features. The filter slides step by step through each of the elements in the input image. These steps are known as strides and can be defined when creating the CNN. When building the CNN you will be able to define the number of filters you want for your network.
Once you obtain the feature map, the Rectified Linear Unit (ReLU) is applied in order to prevent the operation from being purely linear, because the patterns in image data are themselves non-linear.
Pooling
Pooling results in what is known as a pooled feature map. Pooling ensures that the neural network is able to detect features in an image irrespective of their location in the image. This is what is known as spatial invariance. There are several types of pooling, for example, max pooling, average pooling, and min pooling. For instance, in max pooling a 2 by 2 matrix is slid over the feature map while picking the largest value in a given box.
Pooling ensures that the main features of the image are maintained while reducing the size of the image further. This reduces the amount of information passed to the neural network and hence helps to reduce overfitting.
Flattening
The next step is to flatten the pooled feature map. This involves transforming the entire pooled feature map into a single column that can be passed to the fully connected layer.
Full connection
The flattened feature map is then passed to the input layer of the neural network. The result of that is passed to a fully connected layer. After that, the result of the entire process is emitted by the output layer. An activation function is usually applied depending on the type of classification problem. For binary classifications, the sigmoid activation function will be used whereas the softmax activation function is used for multiclass problems.
What is TensorFlow CNN?
Convolutional Neural Networks (CNN), a key technique in deep learning for computer vision, are little-known to the wider public but are the driving force behind major innovations, from unlocking your phone with face recognition to safe driverless vehicles.
CNNs are used for a variety of tasks in computer vision, primarily image classification and object detection. The open source TensorFlow framework allows you to create highly flexible CNN architectures for computer vision tasks. In this article we explain the basics of CNN on TensorFlow and present a quick hands-on tutorial to get you started.
If you are interested in learning how to work with CNNs in PyTorch, which is another popular deep learning framework, see our guide to Pytorch CNN.
Running CNNs with TensorFlow in the real world
Loading datasets from TensorFlow is quite straightforward. However, consider a situation where you have to load data from the real world. The process for doing so is a little different. In this section, let’s look at how you can use this dataset from Kaggle to build a convolutional neural network. The goal here will be to build a model that can classify images of cats and dogs. Once you have built this model, you can tweak it and repurpose it for other classification problems.
Loading the images
Let’s start by downloading the images into a temporary folder on the virtual machine provided by Google Colab. Using Colab, in this case, is advantageous because you can use GPU compute to speed the model training.
!wget --no-check-certificate \
 https://namespace.co.ke/ml/dataset.zip \
 -O /tmp/catsdogs.zip
The next step will be to unzip this dataset.
import os
import zipfile

with zipfile.ZipFile('/tmp/catsdogs.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp/cats_dogs')
After that, set the paths to the training and testing sets.
base_dir = '/tmp/cats_dogs/dataset'
train_dir = os.path.join(base_dir, 'training_set')
test_dir = os.path.join(base_dir, 'test_set')
You can list the folders in order to see their arrangement.
import os
os.listdir(base_dir)
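With the folders in place, you can generate the training and validation datasets. The following is a minimal sketch using tf.keras.utils.image_dataset_from_directory; the image size, batch size, and seed are assumptions for illustration.

import tensorflow as tf

# Assumed settings for illustration; adjust image_size and batch_size to your needs
training_set = tf.keras.utils.image_dataset_from_directory(
    train_dir,
    seed=101,
    image_size=(200, 200),
    batch_size=32,
    label_mode="binary",   # cats vs. dogs is a binary problem
)
validation_set = tf.keras.utils.image_dataset_from_directory(
    test_dir,
    seed=101,
    image_size=(200, 200),
    batch_size=32,
    label_mode="binary",
)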
This tutorial demonstrates training a simple Convolutional Neural Network (CNN) to classify CIFAR images. Because this tutorial uses the Keras Sequential API, creating and training your model will take just a few lines of code.
Import TensorFlow
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
Download and prepare the CIFAR10 dataset
The CIFAR10 dataset contains 60,000 color images in 10 classes, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 testing images. The classes are mutually exclusive and there is no overlap between them.
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz 170498071/170498071 [==============================] – 2s 0us/step
Verify the data
To verify that the dataset looks correct, let’s plot the first 25 images from the training set and display the class name below each image:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
Create the convolutional base
The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.
As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure your CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to your first layer.
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
Let’s display the architecture of your model so far:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape           Param #
=================================================================
 conv2d (Conv2D)                 (None, 30, 30, 32)     896
 max_pooling2d (MaxPooling2D)    (None, 15, 15, 32)     0
 conv2d_1 (Conv2D)               (None, 13, 13, 64)     18496
 max_pooling2d_1 (MaxPooling2D)  (None, 6, 6, 64)       0
 conv2d_2 (Conv2D)               (None, 4, 4, 64)       36928
=================================================================
Total params: 56320 (220.00 KB)
Trainable params: 56320 (220.00 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.
Add Dense layers on top
To complete the model, you will feed the last output tensor from the convolutional base (of shape (4, 4, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, you will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. CIFAR has 10 output classes, so you use a final Dense layer with 10 outputs.
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
Here’s the complete architecture of your model:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape           Param #
=================================================================
 conv2d (Conv2D)                 (None, 30, 30, 32)     896
 max_pooling2d (MaxPooling2D)    (None, 15, 15, 32)     0
 conv2d_1 (Conv2D)               (None, 13, 13, 64)     18496
 max_pooling2d_1 (MaxPooling2D)  (None, 6, 6, 64)       0
 conv2d_2 (Conv2D)               (None, 4, 4, 64)       36928
 flatten (Flatten)               (None, 1024)           0
 dense (Dense)                   (None, 64)             65600
 dense_1 (Dense)                 (None, 10)             650
=================================================================
Total params: 122570 (478.79 KB)
Trainable params: 122570 (478.79 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
The network summary shows that (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers.
Compile and train the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
Epoch 1/10
1563/1563 [==============================] - 10s 5ms/step - loss: 1.5211 - accuracy: 0.4429 - val_loss: 1.2497 - val_accuracy: 0.5531
Epoch 2/10
1563/1563 [==============================] - 6s 4ms/step - loss: 1.1408 - accuracy: 0.5974 - val_loss: 1.1474 - val_accuracy: 0.6023
Epoch 3/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.9862 - accuracy: 0.6538 - val_loss: 0.9759 - val_accuracy: 0.6582
Epoch 4/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8929 - accuracy: 0.6879 - val_loss: 0.9412 - val_accuracy: 0.6702
Epoch 5/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8183 - accuracy: 0.7131 - val_loss: 0.8830 - val_accuracy: 0.6967
Epoch 6/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7588 - accuracy: 0.7334 - val_loss: 0.8671 - val_accuracy: 0.7039
Epoch 7/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7126 - accuracy: 0.7518 - val_loss: 0.8972 - val_accuracy: 0.6897
Epoch 8/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6655 - accuracy: 0.7661 - val_loss: 0.8412 - val_accuracy: 0.7111
Epoch 9/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6205 - accuracy: 0.7851 - val_loss: 0.8581 - val_accuracy: 0.7109
Epoch 10/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.5872 - accuracy: 0.7937 - val_loss: 0.8817 - val_accuracy: 0.7113
Evaluate the model
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
313/313 – 1s – loss: 0.8817 – accuracy: 0.7113 – 655ms/epoch – 2ms/step
print(test_acc)
0.7113000154495239
Your simple CNN has achieved a test accuracy of over 70%. Not bad for a few lines of code! For another CNN style, check out the TensorFlow 2 quickstart for experts example that uses the Keras subclassing API and tf.GradientTape.
Convolutional Neural Networks (CNN) with TensorFlow Tutorial
Imagine being in a zoo trying to recognize if a given animal is a cheetah or a leopard. As a human, your brain can effortlessly analyze body and facial features to come to a valid conclusion. In the same way, Convolutional Neural Networks (CNNs) can be trained to perform the same recognition task, no matter how complex the patterns are. This makes them powerful in the field of computer vision.
This conceptual CNN tutorial will start by providing an overview of what CNNs are and their importance in machine learning. Then it will walk you through a step-by-step implementation of CNN in TensorFlow Framework 2.
There are 4 modules in this course
In the first course in this specialization, you had an introduction to TensorFlow and how, with its high-level APIs, you could do basic image classification, and you learned a little bit about Convolutional Neural Networks (ConvNets). In this course, you’ll go deeper into using ConvNets with real-world data and learn about techniques you can use to improve your ConvNet performance, particularly when doing image classification. In Week 1, you’ll get started by looking at a much larger dataset than you’ve been using thus far: the Cats and Dogs dataset, which had been a Kaggle challenge in image classification!
You’ve heard the term overfitting a number of times to this point. Overfitting is simply the concept of being over specialized in training — namely that your model is very good at classifying what it is trained for, but not so good at classifying things that it hasn’t seen. In order to generalize your model more effectively, you will of course need a greater breadth of samples to train it on. That’s not always possible, but a nice potential shortcut to this is Image Augmentation, where you tweak the training set to potentially increase the diversity of subjects it covers. You’ll learn all about that this week!
Building models for yourself is great, and can be very powerful. But, as you’ve seen, you can be limited by the data you have on hand. Not everybody has access to massive datasets or the compute power that’s needed to train them effectively. Transfer learning can help solve this — where people with models trained on large datasets train them, so that you can either use them directly, or, you can use the features that they have learned and apply them to your scenario. This is Transfer learning, and you’ll look into that this week!
You’ve come a long way, Congratulations! One more thing to do before we move off of ConvNets to the next module, and that’s to go beyond binary classification. Each of the examples you’ve done so far involved classifying one thing or another — horse or human, cat or dog. When moving beyond binary into Categorical classification there are some coding considerations you need to take into account. You’ll look at them this week!
Tensors vs Matrices: Differences
Many people confuse tensors with matrices. Even though these two objects look similar, they have completely different properties. This section provides a better understanding of the difference between matrices and tensors.
- We can think of a matrix as a tensor with only two dimensions.
- Tensors, on the other hand, are a more general format that can have any number of dimensions.
As opposed to matrices, tensors are more suitable for deep learning problems for the following reasons:
- They can deal with any number of dimensions, which makes them a better fit for multi-dimensional data.
- Tensors’ ability to be compatible with a wide range of data types, shapes, and dimensions makes them more versatile than matrices.
- Tensorflow provides GPU and TPU support to speed up computations. Using tensors, machine learning engineers can automatically take advantage of these benefits.
- Tensors natively support broadcasting, which consists of making arithmetic operations between tensors of different shapes, something that is not always possible when dealing with matrices (see the short sketch after this list).
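Here is a minimal sketch of the broadcasting behavior mentioned in the last point; the values are illustrative only.

import tensorflow as tf

# A 2x3 tensor added to a tensor of shape (3,): broadcasting stretches the
# smaller tensor along the first dimension so the shapes become compatible.
a = tf.constant([[1, 2, 3],
                 [4, 5, 6]])
b = tf.constant([10, 20, 30])
print(a + b)  # tf.Tensor([[11 22 33] [14 25 36]], shape=(2, 3), dtype=int32)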
How Does CNN Recognize Images?
Consider the following images:
The boxes that are colored represent a pixel value of 1, and 0 if not colored.
When you press backslash (\), the below image gets processed.
When you press forward-slash (/), the below image is processed:
Here is another example to depict how CNN recognizes an image:
As you can see from the above diagram, only those values are lit that have a value of 1.
How to easily run CNN with Tensorflow in cnvrg.io
Now, with cnvrg.io you can run this pipeline without configuring the different platforms which makes it much faster and easier to run. Using cnvrg.io, you can easily track training progress and serve the model as a REST endpoint. First, you can spin up a VS Code workspace inside cnvrg.io to build our training script from the notebook code. You can use the exact code and ensure that the model is saved at the end of the training.
Run your code as an experiment
Next, you can launch this training script as an experiment. cnvrg.io will provision resources to execute the script and monitor the performance automatically. Resource and training metrics are automatically visualized along with the logs, and all files that were written to disk during the experiment are saved as artifacts in cnvrg.io’s object store.
Make predictions in a few clicks
Now that you have your model, you’ll need to create a “predict” function. cnvrg.io makes it easy by automatically wrapping this function into a production-grade Flask application equipped with load balancing, autoscaling, and monitoring. This file loads the model into memory and uses it in the predict function, which formats the incoming data and returns a prediction.
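As a rough illustration, a predict function of this kind might look like the sketch below; the model path, input format, and field names are assumptions for the example rather than cnvrg.io requirements.

import numpy as np
import tensorflow as tf

# Hypothetical predict.py sketch: the model is loaded once and kept in memory.
model = tf.keras.models.load_model("model.h5")

def predict(data):
    # Assume the request carries one image as a nested list, e.g. {"img": [[...], ...]}
    img = np.array(data["img"], dtype="float32") / 255.0
    img = np.expand_dims(img, axis=0)             # add the batch dimension
    prob = float(model.predict(img)[0][0])        # sigmoid output between 0 and 1
    return {"label": "dog" if prob > 0.5 else "cat", "probability": prob}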
Deploy your predictions to an endpoint
Next, you’ll want to create an endpoint that routes to that function. You could also specify compute resources and autoscaling configurations here too.
Track and monitor your endpoints
cnvrg.io automatically displays metrics such as the number of requests and latency for the endpoint. It also comes with Grafana and Kibana integrated for increased visibility into model behavior in production.
Finally, if you want to trigger retraining and redeployment of the model as part of a CI/CD pipeline, cnvrg.io provides Flows. The pipeline can programmatically trigger the flow via cnvrg.io’s CLI or SDK.
You can test it out in cnvrg.io now by installing cnvrg.io CORE, the free community MLOps platform, on your Kubernetes cluster.
Training the model
The next step is to train the model. In this case, `y` is not passed in. That’s taken care of by the function used to generate the training set. Passing the validation data is critical so that the loss and accuracy can be accessed later and plotted. Let’s also reuse the callbacks that were defined in the last section.
history = model.fit(training_set,validation_data=validation_set, epochs=600,callbacks=callbacks)
Monitoring the model’s performance
Using the history object, the training losses and accuracies can be obtained.
import pandas as pd

metrics_df = pd.DataFrame(history.history)
You can plot them in order to see the learning curves. Let’s start by comparing the training and validation loss.
metrics_df[["loss", "val_loss"]].plot();
Next, on to the training and validation accuracy.
metrics_df[["binary_accuracy", "val_binary_accuracy"]].plot();
Installation
!git clone https://github.com/faizan1234567/Convolutional-Neural-networks-in-TensorFlow
cd Convolutional-Neural-networks-in-TensorFlow
create virtual environment
python -m venv --system-site-packages .\CNN_with_tf
Activate the virtual environment
.\CNN_with_tf\Scripts\Activate
Install packages within a virtual environment without affecting the host system setup. Start by upgrading pip:
pip install --upgrade pip
pip list  # show packages installed within the virtual environment
And to exit the virtual environment later:
deactivate # don’t exit until you’re done using TensorFlow
Now install TensorFlow
pip install --upgrade tensorflow
To check everything works perfectly, run the following command
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
You are all set to use tensorflow now. Please follow the notebooks to import or install modules.
Quick Tutorial: Building a Basic Convolutional Neural Network (CNN) in TensorFlow
This quick tutorial can help you get started implementing CNN in TensorFlow. It is based on the Fashion-MNIST dataset, which contains 28 x 28 grayscale images of fashion products in 10 categories. In this split, there are 55,000 images in the training set and 10,000 images in the test set. Our code is based on the full tutorial by Aditya Sharma.
Loading Data
First import all the necessary modules: NumPy, matplotlib and Tensorflow, then import the Fashion-MNIST data as follows:
# The input_data helper ships with older TensorFlow 1.x releases
from tensorflow.examples.tutorials.mnist import input_data

# Read the data/fashion directory, retrieving the Fashion-MNIST dataset
# from the Amazon S3 bucket if it is not present locally
data = input_data.read_data_sets('data/fashion', one_hot=True,
                                 source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')
CNN Architecture
We will use three convolutional layers, progressively adding more filters. All the filters are 3×3:
- Layer with 32 filters
- Layer with 64 filters
- Layer with 128 filters
In addition, we’ll have three max-pooling layers in between the convolutions, which are 2×2.
We’ll set basic hyperparameters of the CNN model:
training_iters = 10
learning_rate = 0.001
batch_size = 128
The batch size tells TensorFlow to process 128 training images at a time, repeating this for every batch.
Neural Network Parameters
The number of inputs to the CNN is 784, because the images have 784 pixels and are read as a 784 dimensional vector. We will rebuild this vector into a matrix of 28 x 28 x 1.
# 28 is the input image dimension (28 x 28), and there are 10 classes for the predicted label
n_input = 28
n_classes = 10
Here we define an input placeholder x with dimensionality None x 784, and output placeholder size of None x 10. Similarly, we’ll define a placeholder y for the label of the training images, which will be a None x 10 matrix.
We are setting the “row” to None because we previously defined batch_size, meaning placeholders receive the row size when the training set is loaded. Row size will be set to 128, like the batch_size.
# x is the input placeholder; the image is rebuilt into a 28x28x1 matrix
x = tf.placeholder("float", [None, 28, 28, 1])
# y is the label set, using the number of classes
y = tf.placeholder("float", [None, n_classes])
Wrapper Functions
Because we have several layers of the same type in the model, it’s useful to create a wrapper function for each type of layer, to avoid duplicating code. You can get functions like this out of the box with Keras, which is included with Tensorflow. However, in this tutorial we show you how to do things from scratch in TensorFlow without Keras helper functions.
Here is a function creating a 2-dimensional convolutional layer, with bias and Relu activation. The arguments are the test images x, weights W, bias b, and number of strides, meaning how quickly the filter moves over the image during the convolution.
def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)
Here is another function creating a 2D max-pool layer. Here the parameters are test images x, and k, specifying the kernel/filter size.
def maxpool2d(x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
Now let’s define weights and biases.
weights = {
    'wc1': tf.get_variable('W0', shape=(3, 3, 1, 32), initializer=tf.contrib.layers.xavier_initializer()),
    'wc2': tf.get_variable('W1', shape=(3, 3, 32, 64), initializer=tf.contrib.layers.xavier_initializer()),
    'wc3': tf.get_variable('W2', shape=(3, 3, 64, 128), initializer=tf.contrib.layers.xavier_initializer()),
    'wd1': tf.get_variable('W3', shape=(4 * 4 * 128, 128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('W6', shape=(128, n_classes), initializer=tf.contrib.layers.xavier_initializer()),
}
biases = {
    'bc1': tf.get_variable('B0', shape=(32), initializer=tf.contrib.layers.xavier_initializer()),
    'bc2': tf.get_variable('B1', shape=(64), initializer=tf.contrib.layers.xavier_initializer()),
    'bc3': tf.get_variable('B2', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'bd1': tf.get_variable('B3', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('B4', shape=(10), initializer=tf.contrib.layers.xavier_initializer()),
}
Building the CNN
Now we build the CNN by feeding the weights and biases into the wrapper functions.
def conv_net(x, weights, biases):
    # The first convolutional layer with 32 3x3 filters and 32 biases,
    # followed by a max-pool layer with the kernel size set to 2.
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # The second convolutional layer with 64 3x3 filters and 64 biases,
    # followed by another max-pool layer.
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # The third convolutional layer with 128 3x3 filters and 128 biases,
    # followed by the last max-pool layer.
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    conv3 = maxpool2d(conv3, k=2)

    # Build the fully connected layer that will generate prediction labels.
    # reshape() adapts the output of pooling to the input expected by the fully connected layer.
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])

    # Apply the Relu function, then compute the output logits.
    fc1 = tf.nn.relu(fc1)
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
Loss and Optimizer Nodes
First build the model using the conv_net() function we showed above, passing in x, weights, and biases:

pred = conv_net(x, weights, biases)
This is a multi-class classification problem, so we will use the softmax activation function, which gives a probability between 0 and 1 for each class label (the label with the highest probability will be the prediction of the model). We’ll use cross-entropy as the loss function.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
Finally, we’ll define the Adam optimizer with a learning rate of 0.001 as defined in the model hyperparameters above:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
Evaluate the Model
To test the model, we first initialize weights and biases, and then define a correct_prediction and accuracy node that will evaluate model performance every time it is run.
init = tf.global_variables_initializer()
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Now you can start the computation graph, and run a training session as follows:
- Create a for loop over the number of training iterations specified above
- Create an inner for loop over the number of batches of the size specified above
- Pass the training images and labels using the variables batch_x and batch_y
- Feed batch_x and batch_y to the x and y placeholders
- After each training iteration, run the loss function and check training accuracy
- After running through all the images, test accuracy by processing the 10,000 test images
See the original tutorial for the complete code that you can use to run the CNN model; a condensed sketch of the training loop is shown below.
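This sketch is a hedged, simplified version of that loop in TensorFlow 1.x style; it reuses the nodes defined above (x, y, cost, optimizer, accuracy, init) and the data object loaded earlier, and the batching details are assumptions made for illustration.

# Condensed training-loop sketch (TensorFlow 1.x style)
with tf.Session() as sess:
    sess.run(init)
    for i in range(training_iters):
        for batch in range(len(data.train.images) // batch_size):
            batch_x = data.train.images[batch * batch_size:(batch + 1) * batch_size]
            batch_x = batch_x.reshape(-1, 28, 28, 1)   # rebuild the 784-vector into 28x28x1
            batch_y = data.train.labels[batch * batch_size:(batch + 1) * batch_size]
            # One optimization step on the current batch
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        # Check loss and training accuracy after each iteration
        loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})
        print("Iter", i, "loss:", loss, "train acc:", acc)

    # Evaluate on the 10,000 test images
    test_x = data.test.images.reshape(-1, 28, 28, 1)
    test_acc = sess.run(accuracy, feed_dict={x: test_x, y: data.test.labels})
    print("Test accuracy:", test_acc)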
What is a CNN?
A Convolutional Neural Network (CNN or ConvNet) is a deep learning algorithm specifically designed for any task where object recognition is crucial such as image classification, detection, and segmentation. Many real-life applications, such as self-driving cars, surveillance cameras, and more, use CNNs.
The importance of CNNs
There are several reasons why CNNs are important, as highlighted below:
- Unlike traditional machine learning models like SVMs and decision trees that require manual feature extraction, CNNs perform automatic feature extraction at scale, making them efficient.
- The convolution layers make CNNs translation invariant, meaning they can recognize patterns and extract features regardless of where they appear in the image.
- Multiple pre-trained CNN models, such as VGG-16, ResNet50, Inceptionv3, and EfficientNet, have achieved state-of-the-art results and can be fine-tuned on new tasks using a relatively small amount of data.
- CNNs are not limited to image data; they can also be applied to problems such as natural language processing, time series analysis, and speech recognition.
Architecture of a CNN
CNNs’ architecture tries to mimic the structure of neurons in the human visual system composed of multiple layers, where each one is responsible for detecting a specific feature in the data. As illustrated in the image below, the typical CNN is made of a combination of four main layers:
- Convolutional layers
- Rectified Linear Unit (ReLU for short)
- Pooling layers
- Fully connected layers
Let’s understand how each of these layers works using the following example of classification of the handwritten digit.
Convolution layers
This is the first building block of a CNN. As the name suggests, the main mathematical task performed is called convolution, which is the application of a sliding window function to a matrix of pixels representing an image. The sliding function applied to the matrix is called kernel or filter, and both can be used interchangeably.
In the convolution layer, several filters of equal size are applied, and each filter is used to recognize a specific pattern from the image, such as the curving of the digits, the edges, the whole shape of the digits, and more.
Let’s consider this 32×32 grayscale image of a handwritten digit. The values in the matrix are given for illustration purposes.
Also, let’s consider the kernel used for the convolution. It is a matrix with a dimension of 3×3. The weights of each element of the kernel are represented in the grid: zero weights are shown in the black grids and ones in the white grids.
Do we have to manually find these weights?
In real life, the weights of the kernels are determined during the training process of the neural network.
Using these two matrices, we can perform the convolution operation by applying the dot product, which works as follows (a small numeric sketch of these steps appears after the list):
- Apply the kernel matrix from the top-left corner to the right.
- Perform element-wise multiplication.
- Sum the values of the products.
- The resulting value corresponds to the first value (top-left corner) in the convoluted matrix.
- Move the kernel down with respect to the size of the sliding window.
- Repeat from step 1 to 5 until the image matrix is fully covered.
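To make these steps concrete, here is a tiny NumPy sketch of a 3x3 kernel sliding over a 4x4 input with a stride of 1; the values are illustrative only.

import numpy as np

# A 4x4 image patch and a 3x3 kernel (illustrative values)
image = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

output = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = image[i:i + 3, j:j + 3]          # slide the kernel over the image
        output[i, j] = np.sum(window * kernel)    # element-wise multiply, then sum
print(output)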
The dimension of the convolved matrix depends on the size and stride of the sliding window: the larger the window and its stride, the smaller the resulting dimension.
Another name associated with the kernel in the literature is feature detector because the weights can be fine-tuned to detect specific features in the input image.
For instance:
- A kernel that averages neighboring pixels can be used to blur the input image.
- A kernel that subtracts neighboring pixels can be used to perform edge detection.
The more convolution layers the network has, the better it is at detecting abstract features.
Activation function
A ReLU activation function is applied after each convolution operation. This function helps the network learn non-linear relationships between the features in the image, hence making the network more robust for identifying different patterns. It also helps to mitigate the vanishing gradient problems.
Pooling layer
The goal of the pooling layer is to pull the most significant features from the convoluted matrix. This is done by applying some aggregation operations, which reduces the dimension of the feature map (convoluted matrix), hence reducing the memory used while training the network. Pooling is also relevant for mitigating overfitting.
The most common aggregation functions that can be applied are:
- Max pooling, which takes the maximum value within each pooling window
- Sum pooling, which takes the sum of the values within each window
- Average pooling, which takes the average of the values within each window
Below is an illustration of each of the previous example:
Also, the dimension of the feature map becomes smaller as the pooling function is applied.
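As a small illustration, here is a sketch of 2x2 max pooling with a stride of 2 applied to a 4x4 feature map; the values are made up for the example.

import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [1, 2, 0, 1],
                        [3, 4, 2, 5]])

pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # take the maximum of each non-overlapping 2x2 window
        pooled[i, j] = feature_map[2 * i:2 * i + 2, 2 * j:2 * j + 2].max()
print(pooled)  # [[6. 4.] [4. 5.]]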
The last pooling layer flattens its feature map so that it can be processed by the fully connected layer.
Fully connected layers
These layers form the last part of the convolutional neural network, and their inputs correspond to the flattened one-dimensional matrix generated by the last pooling layer. ReLU activation functions are applied to them for non-linearity.
Finally, a softmax prediction layer is used to generate probability values for each of the possible output labels, and the final label predicted is the one with the highest probability score.
Dropout
Dropout is a regularization technique applied to improve the generalization capability of neural networks with a large number of parameters. It consists of randomly dropping some neurons during the training process, which forces the remaining neurons to learn new features from the input data.
Since the technical implementation will be performed using TensorFlow 2, the next section aims to provide a complete overview of different components of this framework to efficiently build deep learning models.
Setup
Begin by installing and importing some necessary libraries, including:
remotezip to inspect the contents of a ZIP file, tqdm to use a progress bar, OpenCV to process video files, einops for performing more complex tensor operations, and tensorflow_docs for embedding data in a Jupyter notebook.
pip install remotezip tqdm opencv-python einops
# Install TensorFlow 2.10
pip install tensorflow==2.10.0
import tqdm
import random
import pathlib
import itertools
import collections

import cv2
import einops
import numpy as np
import remotezip as rz
import seaborn as sns
import matplotlib.pyplot as plt

import tensorflow as tf
import keras
from keras import layers
Introduction to CNN
Yann LeCun, director of Facebook’s AI Research Group, is the pioneer of convolutional neural networks. He built the first convolutional neural network called LeNet in 1988. LeNet was used for character recognition tasks like reading zip codes and digits.
Have you ever wondered how facial recognition works on social media, or how object detection helps in building self-driving cars, or how disease detection is done using visual imagery in healthcare? It’s all possible thanks to convolutional neural networks (CNN). Here’s an example of convolutional neural networks that illustrates how they work:
Imagine there’s an image of a bird, and you want to identify whether it’s really a bird or some other object. The first thing you do is feed the pixels of the image in the form of arrays to the input layer of the neural network (multi-layer networks used to classify things). The hidden layers carry out feature extraction by performing different calculations and manipulations. There are multiple hidden layers like the convolution layer, the ReLU layer, and pooling layer, that perform feature extraction from the image. Finally, there’s a fully connected layer that identifies the object in the image.
Fig: Convolutional Neural Network to identify the image of a bird
If you are a software developer who wants to build scalable AI-powered algorithms, you need to understand how to use the tools to build them. This course is part of the upcoming Machine Learning in Tensorflow Specialization and will teach you best practices for using TensorFlow, a popular open-source framework for machine learning.
Convolutional Neural Networks in TensorFlow
This course is part of DeepLearning.AI TensorFlow Developer Professional Certificate
Convolutional Neural Networks (CNN) in TensorFlow
Now that you understand how convolutional neural networks work, you can start building them using TensorFlow. However, you will first have to install TensorFlow. If you are working on a Google Colab environment, TensorFlow will already be installed.
How to install TensorFlow
TensorFlow can be installed via pip. Run the following command to install it.
pip install tensorflow
Alternatively, you can run TensorFlow in a container.
docker pull tensorflow/tensorflow:latest                          # Download latest stable image
docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter  # Start Jupyter server
How to confirm TensorFlow is installed
After installation is complete via pip, you might want to check TensorFlow’s version or confirm its installation. If you manage to import TensorFlow without any errors, then it was installed successfully.
import tensorflow
print(tensorflow.__version__)
What are Keras and tf.keras?
As of TensorFlow 2.0, Keras has become the official high-level API for TensorFlow. It is an open-source package that has been integrated into TensorFlow in order to quicken the process of building deep learning models. It is accessible via `tf.keras`. That is what you will be using in this article.
Develop multilayer CNN models
Let’s now take a look at how you can build a convolutional neural network with Keras and TensorFlow. The CIFAR-10 dataset will be used. The dataset contains 60000 32×32 color images in 10 classes, with 6000 images per class.
Loading the dataset can be done directly by using Keras utilities. Other datasets that ship with TensorFlow can be loaded in a similar manner.
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
The dataset contains the following classes
‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’
You can use Matplotlib to visualize one of the images. Let’s visualize the image at index 785.
import matplotlib.pyplot as plt

image = X_train[785]
plt.imshow(image)
plt.show()
That looks like a cat. You can confirm that from the `y_train`. 3 is the label for a cat.
Data preprocessing
The weights of a neural network are initialized to very small numbers. Therefore, scaling the images to be within the same range is important. In this case, let’s scale the values to be numbers between 0 and 1.
X_train = X_train / 255
X_test = X_test / 255
Build the convolutional neural network
The next step is to define the convolutional neural network. Here is where the convolution, pooling, and flattening layers will be applied. The first layer is the `Conv2D`layer. It’s defined with the following parameters:
- 32 output filters
- a 3 by 3 feature detector
- `same` padding to result in even padding for the input
- input shape of `(32, 32, 3)` because the images are of size 32 by 32, and the 3 indicates that they have three color channels (RGB)
- the `relu` activation function so as to achieve non-linearity
The next layer is a max-pooling layer defined with the following parameters:
- a `pool_size` of (2, 2) that defines the size of the pooling window
- 2 strides that define the number of steps taken by the pooling window
Remember that you can design your network as you like. You just have to monitor the metrics and tweak the design and settle on the one that results in the best performance. In this case, another convolution and pooling layer is created. That is followed by the flatten layer whose results are passed to the dense layer. The final layer has 10 units because the dataset has 10 classes. Since it’s a multiclass problem, the Softmax activation function is applied.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])
How to visualize a deep learning model
The quickest way to visualize your model is to use the model summary function.
model.summary()
You can also use the Keras `plot_model` utility to plot the model.
tf.keras.utils.plot_model(
    model,
    to_file="model.png",
    show_shapes=True,
    show_layer_names=True,
    rankdir="TB",
    expand_nested=True,
    dpi=96,
)
How to reduce overfitting with Dropout
One of the common ways to improve the performance of deep learning models is to introduce dropout regularization. In this process, a specified percentage of connections are dropped during the training process. This forces the network to learn patterns from the data instead of memorizing it, which is what reduces overfitting. In Keras, this is achieved by introducing a Dropout layer in the network. Here is how the network would look after applying the Dropout layer.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax")
])
Compiling the model
The next step is to compile the model. The Sparse Categorical Cross-Entropy loss is used because the labels are not one-hot encoded. In the event that you want to encode the labels, you will have to use the Categorical Cross-Entropy loss function instead. Since the model’s final layer already applies a softmax, `from_logits` is left at its default of `False`.
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
How to halt training at the right time with Early Stopping
Left to train for more epochs than needed, your model will most likely overfit on the training set. One of the ways to avoid that is to stop the training process when the model stops improving. This is done by monitoring the loss or the accuracy. In order to achieve that, the Keras EarlyStopping callback is used. By default, the callback monitors the validation loss. Patience is the number of epochs to wait before stopping the training process if there is no improvement in the model loss. This callback will be used at the training stage. The callbacks should be passed as a list, even if it’s just one callback.
from tensorflow.keras.callbacks import EarlyStopping

callbacks = [
    EarlyStopping(patience=2)
]
How to save the best model automatically
You might also be interested in automatically saving the best model or model weights during training. That can be applied using a Keras ModelCheckpoint callback. The callback will save the best model after each epoch. You can instruct it to save the entire model or just the model weights. By default, it will save the models where the validation loss is minimum.
checkpoint_filepath = '/tmp/checkpoint'

model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    monitor='loss',
    mode='min',
    save_best_only=True)
Your callbacks will now look like this.
callbacks = [ EarlyStopping(patience=2), model_checkpoint_callback, ]
After training, you can load the model again using the Keras `load_model` utility.
another_saved_model = tf.keras.models.load_model(checkpoint_filepath)
Training the model
Let’s now fit the data to the training set. The validation set is passed as well because the callback monitors the validation set. In this case, you can define many epochs but the training process will be stopped by the callback when the loss doesn’t improve after 2 epochs as declared in the EarlyStopping callback.
history = model.fit(X_train,y_train, epochs=600,validation_data=(X_test,y_test),callbacks=callbacks)
How to plot model learning curves
Learning curves are important because they can inform you whether the model is learning or overfitting. If the validation loss increases significantly or the validation accuracy reduces sharply then your model is most likely overfitting. Since the model was saved into a history variable, you can use that to access the losses and accuracy and plot them. You can also store them in a Pandas DataFrame.
import pandas as pd

metrics_df = pd.DataFrame(history.history)
Let’s now look at how you would plot the training and validation loss.
metrics_df[["loss", "val_loss"]].plot();
The same can be done for the training and validation accuracy.
metrics_df[["accuracy", "val_accuracy"]].plot();
How to save and load your model
You might be interested in saving the model for later use. Saving the model is important so that you don’t have to train the model again. This is especially critical for image models that take a long period to train. The H5 format is a common format for saving Keras models.
model.save("model.h5")
The Keras `load_model` is used for loading the model again.
load_saved_model = tf.keras.models.load_model("model.h5")
load_saved_model.summary()
How to accelerate training with Batch Normalization
The network you trained here was relatively small. However, in other cases, you might have to train a very deep neural network, and training such a network can be very slow. The training process can be hastened using Batch Normalization. It transforms the data, ensuring that the mean output is close to zero and the output standard deviation is close to 1. The mean and variance are computed using the current batch of inputs. Since Batch Normalization offers some form of regularization, it is usually not used together with Dropout. Here is how the model would look after adding the batch normalization layer.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10, activation="softmax")
])
The batch normalization layer behaves differently during training than during prediction and evaluation. During training, normalization is done using the mean and standard deviation of the current batch of inputs. At inference time, i.e. during prediction and evaluation, normalization is done using a moving average of the mean and standard deviation of the batches seen during training. When using a pre-trained model that contains this layer, its `trainable` attribute has to be set to False; otherwise the learned mean and standard deviation will be disrupted and all the prior learning lost.
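As a minimal sketch, assuming a Keras model that contains batch normalization layers, freezing them before fine-tuning might look like this:

# Freeze every BatchNormalization layer so it keeps using its moving statistics
for layer in model.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = False  # keep the learned mean/variance intact during fine-tuning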
What are Tensors?
We mainly deal with high-dimensional data when building machine learning and deep learning models. Tensors are multi-dimensional arrays with a uniform type used to represent different features of the data.
Below is the graphical representation of the different types of dimensions of tensors.
- A 0-dimensional tensor contains a single value.
- A 1-dimensional tensor, also known as a "rank-1" tensor, is a list of values.
- A 2-dimensional tensor is a “rank-2” tensor.
- Finally, we can have a N-dimensional tensor, where N represents the number of dimensions within the tensor. In the previous cases, N is respectively 0, 1 and 2.
Below is an illustration of a zero to a 3-dimensional tensor. Each tensor is created using the constant() function from TensorFlow.
# Zero dimensional tensor
zero_dim_tensor = tf.constant(20)
print(zero_dim_tensor)

# One dimensional tensor
one_dim_tensor = tf.constant([12, 20, 53, 26, 11, 56])
print(one_dim_tensor)

# Two dimensional tensor
two_dim_array = [[3, 6, 7, 5],
                 [9, 2, 3, 4],
                 [7, 1, 10, 6],
                 [0, 8, 11, 2]]
two_dim_tensor = tf.constant(two_dim_array)
print(two_dim_tensor)
A successful execution of the previous code should generate the outputs below, and we can notice the keyword "tf.Tensor", meaning that each result is a tensor. Each printed tensor shows three attributes:
- The actual value of the tensor.
- The shape of the tensor, which is (), (6,), and (4, 4), respectively for the first, second, and third tensors.
- The data type represented by the dtype attribute, and all the tensors are int32.
Our Tensorflow Tutorial for Beginners provides a complete overview of TensorFlow and teaches how to build and train models.
Architectures of CNNs
You don’t always have to design your convolutional neural networks from scratch. Instead, you can try architectures developed by experts, which have proven to perform well on many image tasks. Some of these architectures include VGG16, ResNet50, InceptionV3, EfficientNet, and Xception.
They can be accessed via Keras applications. These applications have also been pre-trained on the ImageNet dataset. The dataset contains over a million images. This makes these applications robust enough for use in the real world. When instantiating the model, you have the choice whether to include the pre-trained weights or not. When the weights are used, you can start using the model for classification right away. Other ways of using the pre-trained models are:
- extracting features and passing them to a new model
- fine-tuning a new model
Let’s take a look at how you can load the Xception architecture without weights. Since weights are not included, you can use your dataset to train the model.
model = tf.keras.applications.Xception(
    include_top=True,
    weights=None,  # no pre-trained weights; the model is randomly initialized
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
When you load the model with weights, you can start using it for prediction right away. The weights are stored in this location `~/.keras/models/`.
model = tf.keras.applications.Xception(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
After that, you can process the image and run the predictions. The Keras applications provide a function for doing that. Each of the architectures dictates the size of the image that should be passed to it. You should always confirm that from its documentation. Next, convert the image into an array and expand its dimensions in order to include the batch size.
from tensorflow.keras.preprocessing import image
import numpy as np

!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/b/b5/Lion_d%27Afrique.jpg \
    -O /tmp/lion.jpg

img_path = '/tmp/lion.jpg'
img = image.load_img(img_path, target_size=(299, 299))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = tf.keras.applications.xception.preprocess_input(x)

preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', tf.keras.applications.xception.decode_predictions(preds, top=3)[0])
The final step is to decode the predictions and print the results.
TensorFlow: Constants, Variables, and Placeholders
Constants are not the only types of tensors. There are also variables and placeholders, which are all building blocks of a computational graph.
A computational graph is basically a representation of a sequence of operations and the flow of data between them.
Now, let’s understand the difference between these types of tensors.
Constants
Constants are tensors whose values do not change during the execution of the computational graph. They are created using the tf.constant() function and are mainly used to store fixed parameters that do not require any change during the model training.
Variables
Variables are tensors whose value can be changed during the execution of the computational graph and they are created using the tf.Variable() function. For instance, in the case of neural networks, weights, and biases can be defined as variables since they need to be updated during the training process.
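Here is a small sketch of a variable being updated in place, similar to how weights are adjusted during training; the shapes and the stand-in gradient are assumptions for illustration.

import tensorflow as tf

# A weight matrix and a bias vector defined as trainable variables
w = tf.Variable(tf.random.normal([3, 2]), name="weights")
b = tf.Variable(tf.zeros([2]), name="bias")

gradient = tf.ones_like(w)     # stand-in for a real gradient
w.assign_sub(0.01 * gradient)  # w <- w - learning_rate * gradient
print(w.numpy())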
Placeholders
These were used in the first version of TensorFlow as empty containers that do not hold specific values. They are just used to reserve a spot for data to be supplied in the future. This gives users the freedom to use different datasets and batch sizes during model training and validation.
In TensorFlow version 2, placeholders have been removed: data is passed directly to functions, and the tf.function decorator provides a more Pythonic and dynamic way of building computational graphs.
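As a minimal sketch, a function decorated with tf.function simply receives the data as an argument, which replaces the role a placeholder used to play:

import tensorflow as tf

@tf.function
def scale_and_sum(batch):
    # Any tensor of pixel values can be passed in, regardless of its batch size
    return tf.reduce_sum(batch / 255.0)

print(scale_and_sum(tf.constant([[255.0, 0.0], [127.5, 127.5]])))  # tf.Tensor(2.0, ...)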
Making predictions
You can now use this image to run a prediction.
prediction = model.predict(test_image)
When you print this you will see something similar to this.
prediction[0][0] 0.014393696
The question is, how do you interpret this? Remember that the network output layer has just one unit and uses the sigmoid activation function. The output of this network is therefore a number between 0 and 1. That number represents the probability that the image belongs to class 1. Class 1 in this case is dogs. You can therefore set a threshold of say 50% to separate the two classes.
if prediction[0][0] > 0.5:
    print("is a dog")
else:
    print("is a cat")
Since the obtained probability is less than 0.5, the image is classified as a cat.
You can repeat the same process with a dog image. First, start by downloading the image.
!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/1/18/Dog_Breeds.jpg \
    -O /tmp/dog.jpg
After that, load it while converting it to the required size.
test_image2 = image.load_img('/tmp/dog.jpg', target_size=(200, 200))
Next, expand the dimensions and run the prediction.
test_image2 = np.expand_dims(test_image2, axis=0)
prediction = model.predict(test_image2)
Use the same threshold to determine if it is the image of a cat or a dog.
if prediction[0][0] > 0.5:
    print("is a dog")
else:
    print("is a cat")
With a predicted probability of about 99%, the image is classified as a dog.
Use case implementation using CNN
We’ll be using the CIFAR-10 dataset from the Canadian Institute For Advanced Research for classifying images across 10 categories using CNN.
1. Download the data set:
2. Import the CIFAR data set:
3. Read the label names:
4. Display the images using matplotlib:
5. Use the helper function to handle data:
6. Create the model:
7. Apply the helper functions:
8. Create the layers for convolution and pooling:
9. Create the flattened layer by reshaping the pooling layer:
10. Create a fully connected layer:
11. Set the output to y_pred variable:
12. Apply the loss function:
13. Create the optimizer:
14. Create a variable to initialize all the global variables:
15. Run the model by creating a graph session:
Build deep learning models in TensorFlow and learn the TensorFlow open-source framework with the Deep Learning Course (with Keras &TensorFlow). Enroll now!
Final remarks
In this article, you have learned CNNs from their intuition to their applications in the real world. You have also seen that you can use existing architectures to hasten your model development process. Specifically, you have covered:
- what convolutional neural networks are
- how convolutional neural networks work
- using pre-trained convolutional neural networks to run image classification
- building convolutional neural networks from scratch using Keras and TensorFlow
- how to plot the learning curves of your neural network
- preventing overfitting using DropOut regularization and batch normalization
- saving your best model using the model checkpoint callback
- how to stop the training process of your CNN when it stops improving
- how you can save and load the model again
…just to mention a few.
And that’s not the end of it, you can explore all the examples used in this article on this Google Colab Notebook. Feel free to play with the parameters of the models to see how they affect the performance of the model.
TensorFlow – Convolutional Neural Networks
After understanding machine-learning concepts, we can now shift our focus to deep learning concepts. Deep learning is a division of machine learning and is considered as a crucial step taken by researchers in recent decades. The examples of deep learning implementation include applications like image recognition and speech recognition.
Following are the two important types of deep neural networks −
- Convolutional Neural Networks
- Recurrent Neural Networks
In this chapter, we will focus on the CNN, Convolutional Neural Networks.
CNN on TensorFlow Concepts
Tensor
Tensors represent deep learning data. They are multidimensional arrays used to store multiple dimensions of a dataset, where each dimension is called a feature. For example, a cube storing data across X, Y, and Z axes is represented as a 3-dimensional tensor. Tensors can store very high dimensionality, with hundreds of dimensions of features typically used in deep learning applications.
Computational graph
TensorFlow computational graphs represent the workflows that occur during deep learning model training. For a CNN model, the computational graph can be very complex. You can use TensorBoard, built into TensorFlow, to display the computational graph of your model.
Constant
In TensorFlow, a constant is used to store values that don’t change during the computation of the model. It is used for nodes that must remain the same during model training. A constant does not have parameters.
Placeholder
Placeholders are used to input training examples to your deep learning model. A placeholder can take parameters, and these parameters are changed at runtime as the model processes the training set.
Variable
Variables are used to add trainable nodes to the computation graph, such as weights and biases.
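To make the constant and variable concepts concrete, here is a minimal TensorFlow 2 sketch (the shapes are arbitrary illustrations); placeholders belong to the TensorFlow 1.x graph workflow and are only mentioned in a comment:

import tensorflow as tf

# A constant holds a fixed value that never changes during training
learning_rate = tf.constant(0.0001)

# Variables hold trainable state such as weights and biases;
# these are the nodes the optimizer updates
weights = tf.Variable(tf.random.normal([784, 10]), name='weights')
bias = tf.Variable(tf.zeros([10]), name='bias')

print(learning_rate)              # tf.Tensor(0.0001, shape=(), dtype=float32)
print(weights.shape, bias.shape)  # (784, 10) (10,)

# tf.compat.v1.placeholder is the TensorFlow 1.x mechanism for feeding
# training examples at runtime and requires disabling eager execution.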
Related content: read our guide to deep convolutional neural networks.
TensorFlow Implementation of CNN
In this section, we will learn about the TensorFlow implementation of a CNN. The steps required to build and run the entire network, with the proper dimensions at every stage, are shown below. Note that this walkthrough uses the TensorFlow 1.x API (placeholders and sessions).
Step 1 − Include the necessary modules for TensorFlow and the data set modules, which are needed to compute the CNN model.
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
Step 2 − Declare a function called run_cnn(), which includes various parameters and optimization variables, along with the declaration of data placeholders. These optimization variables define the training configuration.
def run_cnn():
   mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)
   learning_rate = 0.0001
   epochs = 10
   batch_size = 50
Step 3 − In this step, we will declare the training data placeholders with input parameters – for 28 x 28 pixels = 784. This is the flattened image data that is drawn from mnist.train.next_batch().
We can reshape the tensor according to our requirements. The first value (-1) tells the function to dynamically shape that dimension based on the amount of data passed to it. The two middle dimensions are set to the image size (i.e. 28 x 28).
x = tf.placeholder(tf.float32, [None, 784])
x_shaped = tf.reshape(x, [-1, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, 10])
Step 4 − Now it is important to create some convolutional layers −
layer1 = create_new_conv_layer(x_shaped, 1, 32, [5, 5], [2, 2], name = 'layer1')
layer2 = create_new_conv_layer(layer1, 32, 64, [5, 5], [2, 2], name = 'layer2')
Step 5 − Let us flatten the output so it is ready for the fully connected stage. After two layers of stride-2 pooling, the 28 x 28 spatial dimensions are reduced first to 14 x 14 and then to 7 x 7, with 64 output channels. To connect this to the fully connected ("dense") layer, the new shape needs to be [-1, 7 x 7 x 64]. We then set up some weights and bias values for this layer and activate it with ReLU.
flattened = tf.reshape(layer2, [-1, 7 * 7 * 64])
wd1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1000], stddev = 0.03), name = 'wd1')
bd1 = tf.Variable(tf.truncated_normal([1000], stddev = 0.01), name = 'bd1')
dense_layer1 = tf.matmul(flattened, wd1) + bd1
dense_layer1 = tf.nn.relu(dense_layer1)
Step 6 − Add another fully connected layer with a softmax activation, define the cross-entropy loss with the required optimizer, set up the accuracy assessment, and create the initialization operator.
wd2 = tf.Variable(tf.truncated_normal([1000, 10], stddev = 0.03), name = 'wd2')
bd2 = tf.Variable(tf.truncated_normal([10], stddev = 0.01), name = 'bd2')
dense_layer2 = tf.matmul(dense_layer1, wd2) + bd2
y_ = tf.nn.softmax(dense_layer2)

cross_entropy = tf.reduce_mean(
   tf.nn.softmax_cross_entropy_with_logits(logits = dense_layer2, labels = y))

optimiser = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init_op = tf.global_variables_initializer()
Step 7 − We should set up the recording variables. This adds a summary op that stores the accuracy, runs the training loop inside a session, and also defines the create_new_conv_layer() helper used in Step 4 along with the entry point that calls run_cnn().
tf.summary.scalar('accuracy', accuracy)
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('E:\TensorFlowProject')

with tf.Session() as sess:
   sess.run(init_op)
   total_batch = int(len(mnist.train.labels) / batch_size)
   for epoch in range(epochs):
      avg_cost = 0
      for i in range(total_batch):
         batch_x, batch_y = mnist.train.next_batch(batch_size = batch_size)
         _, c = sess.run([optimiser, cross_entropy], feed_dict = {x: batch_x, y: batch_y})
         avg_cost += c / total_batch
      test_acc = sess.run(accuracy, feed_dict = {x: mnist.test.images, y: mnist.test.labels})
      summary = sess.run(merged, feed_dict = {x: mnist.test.images, y: mnist.test.labels})
      writer.add_summary(summary, epoch)
      # Report progress per epoch (matches the output shown below)
      print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost),
            "test accuracy: {:.3f}".format(test_acc))

   print("\nTraining complete!")
   writer.add_graph(sess.graph)
   print(sess.run(accuracy, feed_dict = {x: mnist.test.images, y: mnist.test.labels}))

def create_new_conv_layer(input_data, num_input_channels, num_filters, filter_shape, pool_shape, name):
   conv_filt_shape = [filter_shape[0], filter_shape[1], num_input_channels, num_filters]

   weights = tf.Variable(tf.truncated_normal(conv_filt_shape, stddev = 0.03), name = name + '_W')
   bias = tf.Variable(tf.truncated_normal([num_filters]), name = name + '_b')

   # Out layer defines the output
   out_layer = tf.nn.conv2d(input_data, weights, [1, 1, 1, 1], padding = 'SAME')
   out_layer += bias
   out_layer = tf.nn.relu(out_layer)

   ksize = [1, pool_shape[0], pool_shape[1], 1]
   strides = [1, 2, 2, 1]
   out_layer = tf.nn.max_pool(out_layer, ksize = ksize, strides = strides, padding = 'SAME')
   return out_layer

if __name__ == "__main__":
   run_cnn()
Following is the output generated by the above code −
See @{tf.nn.softmax_cross_entropy_with_logits_v2}. 2018-09-19 17:22:58.802268: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 2018-09-19 17:25:41.522845: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:101] Allocation of 1003520000 exceeds 10% of system memory. 2018-09-19 17:25:44.630941: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:101] Allocation of 501760000 exceeds 10% of system memory. Epoch: 1 cost = 0.676 test accuracy: 0.940 2018-09-19 17:26:51.987554: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:101] Allocation of 1003520000 exceeds 10% of system memory.
What are Tensors?
We mainly deal with high-dimensional data when building machine learning and deep learning models. Tensors are multi-dimensional arrays with a uniform type used to represent different features of the data.
Below is the graphical representation of the different types of dimensions of tensors.
- A 0-dimensional tensor contains a single value.
- A 1-dimensional tensor, also known as a “rank-1” tensor, is a list of values.
- A 2-dimensional tensor is a “rank-2” tensor.
- Finally, we can have an N-dimensional tensor, where N represents the number of dimensions within the tensor. In the previous cases, N is respectively 0, 1, and 2.
Below is an illustration of a zero to a 3-dimensional tensor. Each tensor is created using the constant() function from TensorFlow.
# Zero dimensional tensor
zero_dim_tensor = tf.constant(20)
print(zero_dim_tensor)

# One dimensional tensor
one_dim_tensor = tf.constant([12, 20, 53, 26, 11, 56])
print(one_dim_tensor)

# Two dimensional tensor
two_dim_array = [[3, 6, 7, 5],
                 [9, 2, 3, 4],
                 [7, 1, 10, 6],
                 [0, 8, 11, 2]]
two_dim_tensor = tf.constant(two_dim_array)
print(two_dim_tensor)
A successful execution of the previous code should generate the output sketched after this list, and we can notice the keyword “tf.Tensor”, meaning that each result is a tensor. Every printed tensor has three attributes:
- The actual value of the tensor.
- The shape of the tensor, which is (), (6,), and (4, 4), respectively, for the first, second, and third tensors.
- The data type represented by the dtype attribute; all three tensors are int32.
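For reference, the printed output should look roughly as follows (exact formatting may vary slightly between TensorFlow versions):

tf.Tensor(20, shape=(), dtype=int32)
tf.Tensor([12 20 53 26 11 56], shape=(6,), dtype=int32)
tf.Tensor(
[[ 3  6  7  5]
 [ 9  2  3  4]
 [ 7  1 10  6]
 [ 0  8 11  2]], shape=(4, 4), dtype=int32)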
Our Tensorflow Tutorial for Beginners provides a complete overview of TensorFlow and teaches how to build and train models.
CNN Step-by-Step Implementation
Let’s put everything we have learned previously into practice. This section will illustrate the end-to-end implementation of a convolutional neural network in TensorFlow applied to the CIFAR-10 dataset, which is a built-in dataset with the following properties:
- It contains 60,000 color images of size 32 by 32
- The dataset has 10 different classes
- Each class has 6,000 images
- There are 50,000 training images in total
- And 10,000 testing images in total
The source code of the article is available on DataCamp’s workspace
Architecture of the network
Before getting into the technical implementation, let’s first understand the overall architecture of the network being implemented.
- The input of the model is a 32x32x3 tensor, respectively, for the width, height, and channels.
- We will have two convolutional layers. The first layer applies 32 filters of size 3×3 each, with a ReLU activation function, and the second one applies 64 filters of size 3×3.
- The first pooling layer will apply a 2×2 max pooling
- The second pooling layer will apply a 2×2 max pooling as well
- The fully connected layer will have 128 units and a ReLU activation function
- Finally, the output will be 10 units corresponding to the 10 classes, and the activation function is a softmax to generate the probability distributions.
Load dataset
The built-in dataset is loaded from keras.datasets as follows:
from tensorflow.keras.datasets import cifar10 as cf10

(train_images, train_labels), (test_images, test_labels) = cf10.load_data()
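As a quick sanity check (an optional step, not part of the original workspace code), you can inspect the shapes of the loaded arrays:

print(train_images.shape)  # (50000, 32, 32, 3)
print(test_images.shape)   # (10000, 32, 32, 3)
print(train_labels.shape)  # (50000, 1)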
Exploratory Data Analysis
In this section, we will focus solely on showing some sample images since we already know the proportion of each class in both the training and testing data.
The helper function show_images() shows a total of 12 images by default and takes three main parameters:
- The training images
- The class names
- And the training labels.
import matplotlib.pyplot as plt

def show_images(train_images, class_names, train_labels, nb_samples = 12, nb_row = 4):
    plt.figure(figsize=(12, 12))
    for i in range(nb_samples):
        plt.subplot(nb_row, nb_row, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_images[i], cmap=plt.cm.binary)
        plt.xlabel(class_names[train_labels[i][0]])
    plt.show()
Now, we can call the function with the required parameters.
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

show_images(train_images, class_names, train_labels)
A successful execution of the previous code generates the images below.
Data preprocessing
Prior to training the model, we need to normalize the pixel values of the data into the same range (e.g. 0 to 1). This is a common preprocessing step when dealing with images; it ensures scale invariance and faster convergence during training.
max_pixel_value = 255

train_images = train_images / max_pixel_value
test_images = test_images / max_pixel_value
Also, we notice that the labels are represented in a categorical format like cat, horse, bird, and so on. We need to convert them into a numerical format so that they can be easily processed by the neural network.
from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_labels, len(class_names))
test_labels = to_categorical(test_labels, len(class_names))
Model architecture implementation
The next step is to implement the architecture of the network based on the previous description.
First, we define the model using the Sequential() class, and each layer is added to the model with the add() function.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Variables
INPUT_SHAPE = (32, 32, 3)
FILTER1_SIZE = 32
FILTER2_SIZE = 64
FILTER_SHAPE = (3, 3)
POOL_SHAPE = (2, 2)
FULLY_CONNECT_NUM = 128
NUM_CLASSES = len(class_names)

# Model architecture implementation
model = Sequential()
model.add(Conv2D(FILTER1_SIZE, FILTER_SHAPE, activation='relu', input_shape=INPUT_SHAPE))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Conv2D(FILTER2_SIZE, FILTER_SHAPE, activation='relu'))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Flatten())
model.add(Dense(FULLY_CONNECT_NUM, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))
After applying the summary() function to the model, we get a comprehensive summary of the model’s architecture, with information about each layer: its type, its output shape, and the total number of trainable parameters.
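The call itself, not shown in the snippet above, is a one-liner:

model.summary()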
Model training
All the resources are finally available to configure and trigger the training of the model. This is done with the compile() and fit() functions, respectively, which take the following parameters:
- The optimizer is responsible for updating the model’s weights and biases. In our case, we are using the Adam optimizer.
- The loss function is used to measure the misclassification errors, and we are using CategoricalCrossentropy().
- Finally, the metrics are used to measure the performance of the model; accuracy, precision, and recall will be displayed in our use case.
from tensorflow.keras.metrics import Precision, Recall

BATCH_SIZE = 32
EPOCHS = 30
METRICS = ['accuracy',
           Precision(name='precision'),
           Recall(name='recall')]

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=METRICS)

# Train the model
training_history = model.fit(train_images, train_labels,
                             epochs=EPOCHS, batch_size=BATCH_SIZE,
                             validation_data=(test_images, test_labels))
Model evaluation
After the model training, we can compare its performance on both the training and testing datasets by plotting the above metrics using the show_performance_curve() helper function in two dimensions.
- The horizontal axis (x) is the number of epochs
- The vertical one (y) is the underlying performance of the model.
- The curve represents the value of the metrics at a specific epoch.
For better visualization, a vertical red line is drawn at the epoch where the training and validation curves intersect, and the corresponding value is annotated as the optimal value.
import numpy as np

def show_performance_curve(training_result, metric, metric_label):
    train_perf = training_result.history[str(metric)]
    validation_perf = training_result.history['val_' + str(metric)]
    intersection_idx = np.argwhere(np.isclose(train_perf,
                                              validation_perf, atol=1e-2)).flatten()[0]
    intersection_value = train_perf[intersection_idx]

    plt.plot(train_perf, label=metric_label)
    plt.plot(validation_perf, label='val_' + str(metric))
    plt.axvline(x=intersection_idx, color='r', linestyle='--', label='Intersection')

    plt.annotate(f'Optimal Value: {intersection_value:.4f}',
                 xy=(intersection_idx, intersection_value),
                 xycoords='data',
                 fontsize=10,
                 color='green')

    plt.xlabel('Epoch')
    plt.ylabel(metric_label)
    plt.legend(loc='lower right')
Then, the function is applied for both the accuracy and the precision of the model.
show_performance_curve(training_history, 'accuracy', 'accuracy')
show_performance_curve(training_history, 'precision', 'precision')
After training the model without any fine-tuning and pre-processing, we end up with:
- An accuracy score of 67.09%, meaning that the model correctly classifies about 67 out of every 100 samples.
- And a precision of 76.55%, meaning that out of every 100 positive predictions, almost 77 are true positives and the remaining 23 are false positives.
- These scores are achieved respectively at the third and second epochs for accuracy and precision.
These two metrics give a global understanding of the model behavior.
What if we want to know which classes the model is good at predicting and which ones it struggles with?
This can be achieved from the confusion matrix, which shows for each class the number of correct and wrong predictions. The implementation is given below. We start by making predictions on the test data, then compute the confusion matrix and show the final result.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

test_predictions = model.predict(test_images)
test_predicted_labels = np.argmax(test_predictions, axis=1)
test_true_labels = np.argmax(test_labels, axis=1)

cm = confusion_matrix(test_true_labels, test_predicted_labels)
cmd = ConfusionMatrixDisplay(confusion_matrix=cm)
cmd.plot(include_values=True, cmap='viridis', ax=None, xticks_rotation='horizontal')
plt.show()
- Classes 0, 1, 6, 7, 8, 9, respectively, for airplane, automobile, frog, horse, ship, and truck have the highest values at the diagonal. This means that the model is better at predicting those classes.
- On the other hand, it seems to struggle with the remaining classes:
- The classes with the highest off-diagonal values are those the model confuses with the correct classes. For instance, it confuses birds (class 2) with airplanes, and automobiles with trucks (class 9).
Learn more about confusion matrix from our tutorial understanding confusion matrix in R, which takes course material from DataCamp’s Machine Learning toolbox course.
This model can be improved with additional tasks such as:
- Image augmentation
- Transfer learning using pre-trained models such as ResNet, MobileNet, or VGG (a minimal sketch follows after this list). Our Transfer learning tutorial explains what transfer learning is and some of its applications in real life.
- Applying different regularization techniques such as L1, L2, or dropout.
- Fine-tuning different hyperparameters such as the learning rate, the batch size, and the number of layers in the network.
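To illustrate the transfer-learning idea, here is a minimal, hypothetical sketch (not part of the original workspace code) that freezes a pre-trained ResNet50 backbone and adds a small classification head. In practice, the 32×32 CIFAR-10 images are often resized to a larger resolution and passed through the backbone’s own preprocess_input function first, which usually improves results.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.applications import ResNet50

# Load ImageNet weights and freeze the convolutional backbone
base_model = ResNet50(input_shape=(32, 32, 3), include_top=False, weights='imagenet')
base_model.trainable = False

transfer_model = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(128, activation='relu'),
    Dense(len(class_names), activation='softmax'),
])

transfer_model.compile(optimizer='adam',
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])

# transfer_model.fit(train_images, train_labels, epochs=5,
#                    validation_data=(test_images, test_labels))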
Load and preprocess video data
The hidden cell below defines helper functions to download a slice of data from the UCF-101 dataset and load it into a tf.data.Dataset. You can learn more about the specific preprocessing steps in the Loading video data tutorial, which walks you through this code in more detail.
The FrameGenerator class at the end of the hidden block is the most important utility here. It creates an iterable object that can feed data into the TensorFlow data pipeline. Specifically, this class contains a Python generator that loads the video frames along with their encoded label. The generator (__call__) function yields the frame array produced by frames_from_video_file and a one-hot encoded vector of the label associated with the set of frames.
def list_files_per_class(zip_url): """ List the files in each class of the dataset given the zip URL. Args: zip_url: URL from which the files can be unzipped. Return: files: List of files in each of the classes. """ files = [] with rz.RemoteZip(URL) as zip: for zip_info in zip.infolist(): files.append(zip_info.filename) return files def get_class(fname): """ Retrieve the name of the class given a filename. Args: fname: Name of the file in the UCF101 dataset. Return: Class that the file belongs to. """ return fname.split('_')[-3] def get_files_per_class(files): """ Retrieve the files that belong to each class. Args: files: List of files in the dataset. Return: Dictionary of class names (key) and files (values). """ files_for_class = collections.defaultdict(list) for fname in files: class_name = get_class(fname) files_for_class[class_name].append(fname) return files_for_class def download_from_zip(zip_url, to_dir, file_names): """ Download the contents of the zip file from the zip URL. Args: zip_url: Zip URL containing data. to_dir: Directory to download data to. file_names: Names of files to download. """ with rz.RemoteZip(zip_url) as zip: for fn in tqdm.tqdm(file_names): class_name = get_class(fn) zip.extract(fn, str(to_dir / class_name)) unzipped_file = to_dir / class_name / fn fn = pathlib.Path(fn).parts[-1] output_file = to_dir / class_name / fn unzipped_file.rename(output_file,) def split_class_lists(files_for_class, count): """ Returns the list of files belonging to a subset of data as well as the remainder of files that need to be downloaded. Args: files_for_class: Files belonging to a particular class of data. count: Number of files to download. Return: split_files: Files belonging to the subset of data. remainder: Dictionary of the remainder of files that need to be downloaded. """ split_files = [] remainder = {} for cls in files_for_class: split_files.extend(files_for_class[cls][:count]) remainder[cls] = files_for_class[cls][count:] return split_files, remainder def download_ufc_101_subset(zip_url, num_classes, splits, download_dir): """ Download a subset of the UFC101 dataset and split them into various parts, such as training, validation, and test. Args: zip_url: Zip URL containing data. num_classes: Number of labels. splits: Dictionary specifying the training, validation, test, etc. (key) division of data (value is number of files per split). download_dir: Directory to download data to. Return: dir: Posix path of the resulting directories containing the splits of data. """ files = list_files_per_class(zip_url) for f in files: tokens = f.split('/') if len(tokens) <= 2: files.remove(f) # Remove that item from the list if it does not have a filename files_for_class = get_files_per_class(files) classes = list(files_for_class.keys())[:num_classes] for cls in classes: new_files_for_class = files_for_class[cls] random.shuffle(new_files_for_class) files_for_class[cls] = new_files_for_class # Only use the number of classes you want in the dictionary files_for_class = {x: files_for_class[x] for x in list(files_for_class)[:num_classes]} dirs = {} for split_name, split_count in splits.items(): print(split_name, ":") split_dir = download_dir / split_name split_files, files_for_class = split_class_lists(files_for_class, split_count) download_from_zip(zip_url, split_dir, split_files) dirs[split_name] = split_dir return dirs def format_frames(frame, output_size): """ Pad and resize an image from a video. Args: frame: Image that needs to resized and padded. 
output_size: Pixel size of the output frame image. Return: Formatted frame with padding of specified output size. """ frame = tf.image.convert_image_dtype(frame, tf.float32) frame = tf.image.resize_with_pad(frame, *output_size) return frame def frames_from_video_file(video_path, n_frames, output_size = (224,224), frame_step = 15): """ Creates frames from each video file present for each category. Args: video_path: File path to the video. n_frames: Number of frames to be created per video file. output_size: Pixel size of the output frame image. Return: An NumPy array of frames in the shape of (n_frames, height, width, channels). """ # Read each video frame by frame result = [] src = cv2.VideoCapture(str(video_path)) video_length = src.get(cv2.CAP_PROP_FRAME_COUNT) need_length = 1 + (n_frames - 1) * frame_step if need_length > video_length: start = 0 else: max_start = video_length - need_length start = random.randint(0, max_start + 1) src.set(cv2.CAP_PROP_POS_FRAMES, start) # ret is a boolean indicating whether read was successful, frame is the image itself ret, frame = src.read() result.append(format_frames(frame, output_size)) for _ in range(n_frames - 1): for _ in range(frame_step): ret, frame = src.read() if ret: frame = format_frames(frame, output_size) result.append(frame) else: result.append(np.zeros_like(result[0])) src.release() result = np.array(result)[..., [2, 1, 0]] return result class FrameGenerator: def __init__(self, path, n_frames, training = False): """ Returns a set of frames with their associated label. Args: path: Video file paths. n_frames: Number of frames. training: Boolean to determine if training dataset is being created. """ self.path = path self.n_frames = n_frames self.training = training self.class_names = sorted(set(p.name for p in self.path.iterdir() if p.is_dir())) self.class_ids_for_name = dict((name, idx) for idx, name in enumerate(self.class_names)) def get_files_and_class_names(self): video_paths = list(self.path.glob('*/*.avi')) classes = [p.parent.name for p in video_paths] return video_paths, classes def __call__(self): video_paths, classes = self.get_files_and_class_names() pairs = list(zip(video_paths, classes)) if self.training: random.shuffle(pairs) for path, name in pairs: video_frames = frames_from_video_file(path, self.n_frames) label = self.class_ids_for_name[name] # Encode labels yield video_frames, label
URL = 'https://storage.googleapis.com/thumos14_files/UCF101_videos.zip'
download_dir = pathlib.Path('./UCF101_subset/')
subset_paths = download_ufc_101_subset(URL,
                                        num_classes = 10,
                                        splits = {"train": 30, "val": 10, "test": 10},
                                        download_dir = download_dir)
train : 100%|██████████| 300/300 [00:28<00:00, 10.61it/s] val : 100%|██████████| 100/100 [00:09<00:00, 10.75it/s] test : 100%|██████████| 100/100 [00:08<00:00, 12.31it/s]
Create the training, validation, and test sets (train_ds, val_ds, and test_ds).
n_frames = 10
batch_size = 8

output_signature = (tf.TensorSpec(shape = (None, None, None, 3), dtype = tf.float32),
                    tf.TensorSpec(shape = (), dtype = tf.int16))

train_ds = tf.data.Dataset.from_generator(FrameGenerator(subset_paths['train'], n_frames, training=True),
                                          output_signature = output_signature)

# Batch the data
train_ds = train_ds.batch(batch_size)

val_ds = tf.data.Dataset.from_generator(FrameGenerator(subset_paths['val'], n_frames),
                                        output_signature = output_signature)
val_ds = val_ds.batch(batch_size)

test_ds = tf.data.Dataset.from_generator(FrameGenerator(subset_paths['test'], n_frames),
                                         output_signature = output_signature)
test_ds = test_ds.batch(batch_size)
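To verify the pipeline (an optional check, not part of the original tutorial code), you can pull one batch and inspect its shape; with the settings above, each batch should contain 8 clips of 10 frames of 224 × 224 RGB images:

for frames, labels in train_ds.take(1):
    print(frames.shape)  # (8, 10, 224, 224, 3)
    print(labels.shape)  # (8,)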
Convolutional Neural Networks
Convolutional neural networks are designed to process data through multiple layers of arrays. This type of neural network is used in applications like image recognition or face recognition. The primary difference between a CNN and any other ordinary neural network is that a CNN takes its input as a two-dimensional array and operates directly on the images, rather than relying on the separate feature extraction step that other neural networks depend on.
CNNs are the dominant approach for recognition problems. Top companies like Google and Facebook have invested in research and development on recognition projects to get these activities done with greater speed.
A convolutional neural network uses three basic ideas −
- Local receptive fields
- Convolution
- Pooling
Let us understand these ideas in detail.
CNNs utilize the spatial correlations that exist within the input data. Each layer of the network connects only to a small region of input neurons. This specific region is called the local receptive field. Each hidden neuron processes the input data inside its receptive field and ignores changes outside that boundary.
Following is a diagram representation of generating local receptive fields −
If we observe the above representation, each connection learns a weight for the hidden neuron as the receptive field slides across the input from one position to another. This sliding process is called “convolution”.
The mapping of connections from the input layer to the hidden feature map is defined as “shared weights”, and the included bias is called the “shared bias”.
Convolutional neural networks also use pooling layers, which are positioned immediately after the convolutional layers. A pooling layer takes the feature map produced by the convolutional layer and prepares a condensed feature map, summarizing the neurons of the previous layer. A small numeric example follows below.
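As a standalone illustration of how 2×2 max pooling condenses a feature map (an added sketch, not part of the original text), consider a single 4×4 feature map:

import tensorflow as tf

# One 4x4 feature map with one channel, shaped [batch, height, width, channels]
feature_map = tf.constant([[1., 3., 2., 4.],
                           [5., 6., 1., 2.],
                           [7., 2., 9., 0.],
                           [3., 8., 4., 5.]])
feature_map = tf.reshape(feature_map, [1, 4, 4, 1])

# 2x2 max pooling with stride 2 keeps the largest value in each 2x2 window
pooled = tf.nn.max_pool2d(feature_map, ksize=2, strides=2, padding='VALID')
print(tf.reshape(pooled, [2, 2]))
# tf.Tensor([[6. 4.]
#            [8. 9.]], shape=(2, 2), dtype=float32)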
Tensors vs Matrices: Differences
Many people confuse tensors with matrices. Even though these two objects look similar, they have completely different properties. This section provides a better understanding of the difference between matrices and tensors.
- We can think of a matrix as a tensor with only two dimensions.
- Tensors, on the other hand, are a more general format that can have any number of dimensions.
As opposed to matrices, tensors are more suitable for deep learning problems for the following reasons:
- They can deal with any number of dimensions, which makes them a better fit for multi-dimensional data.
- Tensors’ ability to be compatible with a wide range of data types, shapes, and dimensions makes them more versatile than matrices.
- Tensorflow provides GPU and TPU support to speed up computations. Using tensors, machine learning engineers can automatically take advantage of these benefits.
- Tensors natively support broadcasting, which consists of performing arithmetic operations between tensors of different shapes, something that is not always possible when dealing with matrices. A small example is shown below.
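Here is a tiny broadcasting example (an added sketch): adding a rank-1 tensor to a rank-2 tensor stretches the smaller tensor across the matching dimension.

import tensorflow as tf

matrix = tf.constant([[1, 2, 3],
                      [4, 5, 6]])   # shape (2, 3)
row = tf.constant([10, 20, 30])     # shape (3,)

# The row is broadcast across both rows of the matrix
print(matrix + row)
# tf.Tensor([[11 22 33]
#            [14 25 36]], shape=(2, 3), dtype=int32)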
Model definition
Let’s now create the convolutional neural network that will be used to classify the images. It will be similar to the previous one with a few cosmetic changes.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model = Sequential([
    data_augmentation,
    tf.keras.layers.experimental.preprocessing.Rescaling(1./255),
    Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Dropout(0.25),
    Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(1, activation='sigmoid')
])
The notable changes are:
- the application of the augmentation layer (a possible definition is sketched below)
- using the `Rescaling` layer to scale the images in the model definition
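The data_augmentation layer referenced in the snippet above is not defined there; a plausible definition (an assumption, using the same experimental preprocessing module as the Rescaling layer and the 200 × 200 input size used later for inference) would be:

data_augmentation = Sequential([
    tf.keras.layers.experimental.preprocessing.RandomFlip('horizontal',
                                                          input_shape=(200, 200, 3)),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.1),
    tf.keras.layers.experimental.preprocessing.RandomZoom(0.1),
])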
TensorFlow CNN in Production with Run:AI
Run:AI automates resource management and workload orchestration for deep learning infrastructure. With Run:AI, you can automatically run as many CNN experiments as needed in TensorFlow and other deep learning frameworks.
Here are some of the capabilities you gain when using Run:AI:
- Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
- No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
- A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.
Learn more about the Run:AI GPU virtualization platform.
Model evaluation
You can also check the performance of the model on the validation set.
loss, accuracy = model.evaluate(validation_set)
print('Accuracy on test dataset:', accuracy)
Let’s now try the model on new images. The `image` module from Keras will be used to load the image.
import numpy as np
from keras.preprocessing import image
Download some images from the internet and store them in a temporary folder. The images used here are provided under a permissive Creative Commons license.
!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/c/c7/Tabby_cat_with_blue_eyes-3336579.jpg \
    -O /tmp/cat.jpg
Next, load the image while specifying the size used in training.
test_image = image.load_img('/tmp/cat.jpg', target_size=(200, 200))
After this, convert it into an array since the model expects array inputs.
test_image = image.img_to_array(test_image)
The next step is to expand the dimensions of the image in order to include the batch size. Let’s take a look at the shape of the image at the moment.
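A quick check (the print itself is an illustrative addition) shows the current shape of the loaded image:

print(test_image.shape)  # (200, 200, 3)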
That needs to be amended to include a batch size of 1, because only one image is being used here. Expanding the dimensions is done using the `expand_dims` function from NumPy.
test_image = np.expand_dims(test_image, axis=0)
If you check the shape again, you will see that it’s in the form required by the model.
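With the image now shaped (1, 200, 200, 3), it can be passed to the model. The following prediction step is a hypothetical sketch of how the single sigmoid output could be interpreted; the actual class names depend on how the training folders were labeled.

prediction = model.predict(test_image)

# The sigmoid output is the probability of the positive class;
# a threshold of 0.5 maps it to one of the two classes
if prediction[0][0] > 0.5:
    print('Predicted: class 1')
else:
    print('Predicted: class 0')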
Train the model
For this tutorial, choose the tf.keras.optimizers.Adam optimizer and the tf.keras.losses.SparseCategoricalCrossentropy loss function. Use the metrics argument to view the accuracy of the model performance at every step.
model.compile(loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer = keras.optimizers.Adam(learning_rate = 0.0001),
              metrics = ['accuracy'])
Train the model for 50 epochs with the Keras Model.fit method.
history = model.fit(x = train_ds, epochs = 50, validation_data = val_ds)
Epoch 1/50 38/38 [==============================] – 230s 6s/step – loss: 2.4452 – accuracy: 0.1233 – val_loss: 2.4661 – val_accuracy: 0.1400 Epoch 2/50 38/38 [==============================] – 223s 6s/step – loss: 2.1898 – accuracy: 0.2067 – val_loss: 2.5864 – val_accuracy: 0.1500 Epoch 3/50 38/38 [==============================] – 222s 6s/step – loss: 2.0602 – accuracy: 0.2667 – val_loss: 2.7133 – val_accuracy: 0.1200 Epoch 4/50 38/38 [==============================] – 221s 6s/step – loss: 1.8716 – accuracy: 0.3633 – val_loss: 2.4647 – val_accuracy: 0.1800 Epoch 5/50 38/38 [==============================] – 220s 6s/step – loss: 1.7901 – accuracy: 0.3667 – val_loss: 2.7002 – val_accuracy: 0.1500 Epoch 6/50 38/38 [==============================] – 221s 6s/step – loss: 1.7632 – accuracy: 0.3867 – val_loss: 2.6759 – val_accuracy: 0.1600 Epoch 7/50 38/38 [==============================] – 223s 6s/step – loss: 1.7130 – accuracy: 0.3833 – val_loss: 2.3038 – val_accuracy: 0.2200 Epoch 8/50 38/38 [==============================] – 222s 6s/step – loss: 1.6025 – accuracy: 0.4000 – val_loss: 2.6929 – val_accuracy: 0.1700 Epoch 9/50 38/38 [==============================] – 220s 6s/step – loss: 1.5444 – accuracy: 0.4767 – val_loss: 2.6629 – val_accuracy: 0.1800 Epoch 10/50 38/38 [==============================] – 220s 6s/step – loss: 1.4557 – accuracy: 0.4767 – val_loss: 2.2244 – val_accuracy: 0.2300 Epoch 11/50 38/38 [==============================] – 220s 6s/step – loss: 1.3617 – accuracy: 0.5200 – val_loss: 2.1875 – val_accuracy: 0.3200 Epoch 12/50 38/38 [==============================] – 221s 6s/step – loss: 1.3553 – accuracy: 0.5333 – val_loss: 2.1145 – val_accuracy: 0.2700 Epoch 13/50 38/38 [==============================] – 220s 6s/step – loss: 1.3947 – accuracy: 0.5000 – val_loss: 1.8937 – val_accuracy: 0.3700 Epoch 14/50 38/38 [==============================] – 221s 6s/step – loss: 1.3361 – accuracy: 0.5267 – val_loss: 1.6443 – val_accuracy: 0.4600 Epoch 15/50 38/38 [==============================] – 219s 6s/step – loss: 1.2380 – accuracy: 0.5533 – val_loss: 1.4356 – val_accuracy: 0.5400 Epoch 16/50 38/38 [==============================] – 221s 6s/step – loss: 1.1703 – accuracy: 0.5767 – val_loss: 1.8729 – val_accuracy: 0.4000 Epoch 17/50 38/38 [==============================] – 221s 6s/step – loss: 1.1605 – accuracy: 0.6033 – val_loss: 1.7361 – val_accuracy: 0.4600 Epoch 18/50 38/38 [==============================] – 220s 6s/step – loss: 1.0992 – accuracy: 0.6067 – val_loss: 1.2881 – val_accuracy: 0.5700 Epoch 19/50 38/38 [==============================] – 220s 6s/step – loss: 1.1121 – accuracy: 0.6133 – val_loss: 1.4157 – val_accuracy: 0.5200 Epoch 20/50 38/38 [==============================] – 218s 6s/step – loss: 1.0310 – accuracy: 0.6300 – val_loss: 1.2908 – val_accuracy: 0.5800 Epoch 21/50 38/38 [==============================] – 219s 6s/step – loss: 1.0134 – accuracy: 0.6467 – val_loss: 1.3474 – val_accuracy: 0.5800 Epoch 22/50 38/38 [==============================] – 219s 6s/step – loss: 0.9677 – accuracy: 0.6300 – val_loss: 1.3569 – val_accuracy: 0.5300 Epoch 23/50 38/38 [==============================] – 219s 6s/step – loss: 0.9292 – accuracy: 0.6433 – val_loss: 1.1130 – val_accuracy: 0.5900 Epoch 24/50 38/38 [==============================] – 219s 6s/step – loss: 0.9134 – accuracy: 0.6833 – val_loss: 1.3144 – val_accuracy: 0.5200 Epoch 25/50 38/38 [==============================] – 218s 6s/step – loss: 0.8948 – accuracy: 0.7000 – val_loss: 1.1649 – val_accuracy: 0.5900 Epoch 26/50 
38/38 [==============================] – 219s 6s/step – loss: 0.8968 – accuracy: 0.6500 – val_loss: 1.1370 – val_accuracy: 0.6400 Epoch 27/50 38/38 [==============================] – 219s 6s/step – loss: 0.9460 – accuracy: 0.6533 – val_loss: 2.2827 – val_accuracy: 0.3800 Epoch 28/50 38/38 [==============================] – 218s 6s/step – loss: 1.0633 – accuracy: 0.6333 – val_loss: 1.2745 – val_accuracy: 0.5400 Epoch 29/50 38/38 [==============================] – 219s 6s/step – loss: 0.9378 – accuracy: 0.6733 – val_loss: 1.2241 – val_accuracy: 0.6500 Epoch 30/50 38/38 [==============================] – 219s 6s/step – loss: 0.8682 – accuracy: 0.7200 – val_loss: 1.1828 – val_accuracy: 0.6500 Epoch 31/50 38/38 [==============================] – 218s 6s/step – loss: 0.8379 – accuracy: 0.6833 – val_loss: 1.1417 – val_accuracy: 0.6000 Epoch 32/50 38/38 [==============================] – 218s 6s/step – loss: 0.7856 – accuracy: 0.6900 – val_loss: 1.2292 – val_accuracy: 0.5600 Epoch 33/50 38/38 [==============================] – 219s 6s/step – loss: 0.8056 – accuracy: 0.7233 – val_loss: 1.0834 – val_accuracy: 0.6200 Epoch 34/50 38/38 [==============================] – 220s 6s/step – loss: 0.8262 – accuracy: 0.6867 – val_loss: 1.1120 – val_accuracy: 0.6000 Epoch 35/50 38/38 [==============================] – 218s 6s/step – loss: 0.7472 – accuracy: 0.7367 – val_loss: 0.9757 – val_accuracy: 0.6700 Epoch 36/50 38/38 [==============================] – 219s 6s/step – loss: 0.6969 – accuracy: 0.7500 – val_loss: 0.9642 – val_accuracy: 0.6400 Epoch 37/50 38/38 [==============================] – 219s 6s/step – loss: 0.7518 – accuracy: 0.7467 – val_loss: 1.1454 – val_accuracy: 0.5100 Epoch 38/50 38/38 [==============================] – 220s 6s/step – loss: 0.7360 – accuracy: 0.7267 – val_loss: 0.9619 – val_accuracy: 0.6800 Epoch 39/50 38/38 [==============================] – 220s 6s/step – loss: 0.6887 – accuracy: 0.7600 – val_loss: 1.1292 – val_accuracy: 0.6100 Epoch 40/50 38/38 [==============================] – 220s 6s/step – loss: 0.7217 – accuracy: 0.7567 – val_loss: 1.2201 – val_accuracy: 0.6100 Epoch 41/50 38/38 [==============================] – 219s 6s/step – loss: 0.7505 – accuracy: 0.7200 – val_loss: 0.9450 – val_accuracy: 0.6800 Epoch 42/50 38/38 [==============================] – 218s 6s/step – loss: 0.6737 – accuracy: 0.7433 – val_loss: 0.9566 – val_accuracy: 0.6500 Epoch 43/50 38/38 [==============================] – 219s 6s/step – loss: 0.6232 – accuracy: 0.7867 – val_loss: 0.9072 – val_accuracy: 0.7100 Epoch 44/50 38/38 [==============================] – 220s 6s/step – loss: 0.5908 – accuracy: 0.8100 – val_loss: 0.9052 – val_accuracy: 0.7200 Epoch 45/50 38/38 [==============================] – 219s 6s/step – loss: 0.5901 – accuracy: 0.7767 – val_loss: 0.8087 – val_accuracy: 0.7100 Epoch 46/50 38/38 [==============================] – 218s 6s/step – loss: 0.6202 – accuracy: 0.7833 – val_loss: 1.0201 – val_accuracy: 0.7000 Epoch 47/50 38/38 [==============================] – 217s 6s/step – loss: 0.6777 – accuracy: 0.7567 – val_loss: 1.5742 – val_accuracy: 0.4800 Epoch 48/50 38/38 [==============================] – 217s 6s/step – loss: 0.8462 – accuracy: 0.6767 – val_loss: 1.6540 – val_accuracy: 0.4400 Epoch 49/50 38/38 [==============================] – 219s 6s/step – loss: 0.7168 – accuracy: 0.7333 – val_loss: 1.2454 – val_accuracy: 0.6000 Epoch 50/50 38/38 [==============================] – 216s 6s/step – loss: 0.6592 – accuracy: 0.7433 – val_loss: 0.9307 – val_accuracy: 0.6700
Visualize the results
Create plots of the loss and accuracy on the training and validation sets:
def plot_history(history):
    """
    Plotting training and validation learning curves.

    Args:
      history: model history with all the metric measures
    """
    fig, (ax1, ax2) = plt.subplots(2)
    fig.set_size_inches(18.5, 10.5)

    # Plot loss
    ax1.set_title('Loss')
    ax1.plot(history.history['loss'], label = 'train')
    ax1.plot(history.history['val_loss'], label = 'test')
    ax1.set_ylabel('Loss')

    # Determine upper bound of y-axis
    max_loss = max(history.history['loss'] + history.history['val_loss'])

    ax1.set_ylim([0, np.ceil(max_loss)])
    ax1.set_xlabel('Epoch')
    ax1.legend(['Train', 'Validation'])

    # Plot accuracy
    ax2.set_title('Accuracy')
    ax2.plot(history.history['accuracy'], label = 'train')
    ax2.plot(history.history['val_accuracy'], label = 'test')
    ax2.set_ylabel('Accuracy')
    ax2.set_ylim([0, 1])
    ax2.set_xlabel('Epoch')
    ax2.legend(['Train', 'Validation'])

    plt.show()

plot_history(history)