
Tensorflow Convolutional Neural Network | Create The Model

Introducing convolutional neural networks (ML Zero to Hero - Part 3)

CNN on TensorFlow Concepts

Tensor

Tensors represent deep learning data. They are multidimensional arrays that store a dataset along one or more axes. For example, a cube storing data across the X, Y, and Z axes is represented as a 3-dimensional tensor. Deep learning applications often work with high-dimensional data, in which each example is described by hundreds of features.

Computational graph

TensorFlow computational graphs represent the workflows that occur during deep learning model training. For a CNN model, the computational graph can be very complex. The image below shows what a simple graph looks like. You can use TensorBoard, built into TensorFlow, to display the computational graph of your model.

Constant

In TensorFlow, a constant is used to store values that don’t change during the computation of the model. It is used for nodes that must remain the same during model training. A constant does not have parameters.

Placeholder

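Placeholders are used to feed training examples into your deep learning model. A placeholder declares an input whose value is supplied at runtime, so it changes as the model processes the training set. Placeholders belong to the TensorFlow 1.x graph API (tf.placeholder); in TensorFlow 2.x they are available only as tf.compat.v1.placeholder.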

Variable

Variables are used to add trainable nodes to the computation graph, such as weights and biases.
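As a minimal sketch of how constants and variables differ in practice, assuming TensorFlow 2.x with eager execution (where a function argument takes over the role a placeholder had in TensorFlow 1.x). The names scale, weight, and predict are illustrative and do not come from the original text:

```python
import tensorflow as tf

# A constant holds a fixed value for the whole computation.
scale = tf.constant(2.0)

# A variable holds trainable state that can be updated in place.
weight = tf.Variable(0.5)

@tf.function
def predict(x):
    # The argument x is supplied at call time, much like a TF 1.x placeholder.
    return scale * weight * x

before = float(predict(tf.constant(3.0)))  # computed with weight = 0.5
weight.assign(1.5)                         # update the variable's value
after = float(predict(tf.constant(3.0)))   # computed with weight = 1.5
print(before, after)
```

The constant stays the same across both calls, while the change to the variable is picked up by the second call without rebuilding the function.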


Load and preprocess video data

The hidden cell below defines helper functions to download a slice of data from the UCF-101 dataset, and load it into a tf.data.Dataset. You can learn more about the specific preprocessing steps in the Loading video data tutorial, which walks you through this code in more detail.

The FrameGenerator class at the end of the hidden block is the most important utility here. It creates an iterable object that can feed data into the TensorFlow data pipeline. Specifically, this class contains a Python generator that loads the video frames along with their encoded label. The generator (__call__) function yields the frame array produced by frames_from_video_file and the integer-encoded label associated with the set of frames.
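Before reading the helper code, it may help to see which frames frames_from_video_file ends up keeping. The sketch below is a simplified, pure-Python illustration of the sampling arithmetic (one frame kept, then frame_step frames read and only the last one kept). The function names sampled_frame_indices and needed_length are hypothetical and are not part of the tutorial's helpers:

```python
def sampled_frame_indices(start, n_frames, frame_step):
    """Indices of the frames kept when every frame_step-th frame is sampled."""
    return [start + i * frame_step for i in range(n_frames)]

def needed_length(n_frames, frame_step):
    """Number of consecutive source frames spanned by one sample."""
    return 1 + (n_frames - 1) * frame_step

# Values matching the defaults used later in this tutorial.
indices = sampled_frame_indices(start=0, n_frames=10, frame_step=15)
span = needed_length(n_frames=10, frame_step=15)
print(indices)
print(span)
```

A video shorter than the computed span cannot supply every sampled frame, which is why the helper pads missing frames with zeros.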


import collections
import pathlib
import random

import cv2
import numpy as np
import remotezip as rz
import tensorflow as tf
import tqdm

def list_files_per_class(zip_url):
    """
    List the files in each class of the dataset given the zip URL.

    Args:
      zip_url: URL from which the files can be unzipped.

    Return:
      files: List of files in each of the classes.
    """
    files = []
    # Use the zip_url argument rather than a global URL.
    with rz.RemoteZip(zip_url) as zip:
        for zip_info in zip.infolist():
            files.append(zip_info.filename)
    return files

def get_class(fname):
    """
    Retrieve the name of the class given a filename.

    Args:
      fname: Name of the file in the UCF101 dataset.

    Return:
      Class that the file belongs to.
    """
    return fname.split('_')[-3]

def get_files_per_class(files):
    """
    Retrieve the files that belong to each class.

    Args:
      files: List of files in the dataset.

    Return:
      Dictionary of class names (key) and files (values).
    """
    files_for_class = collections.defaultdict(list)
    for fname in files:
        class_name = get_class(fname)
        files_for_class[class_name].append(fname)
    return files_for_class

def download_from_zip(zip_url, to_dir, file_names):
    """
    Download the contents of the zip file from the zip URL.

    Args:
      zip_url: Zip URL containing data.
      to_dir: Directory to download data to.
      file_names: Names of files to download.
    """
    with rz.RemoteZip(zip_url) as zip:
        for fn in tqdm.tqdm(file_names):
            class_name = get_class(fn)
            zip.extract(fn, str(to_dir / class_name))
            unzipped_file = to_dir / class_name / fn

            # Move the file up so it sits directly in its class directory.
            fn = pathlib.Path(fn).parts[-1]
            output_file = to_dir / class_name / fn
            unzipped_file.rename(output_file)

def split_class_lists(files_for_class, count):
    """
    Returns the list of files belonging to a subset of data as well as the remainder of
    files that need to be downloaded.

    Args:
      files_for_class: Files belonging to a particular class of data.
      count: Number of files to download.

    Return:
      split_files: Files belonging to the subset of data.
      remainder: Dictionary of the remainder of files that need to be downloaded.
    """
    split_files = []
    remainder = {}
    for cls in files_for_class:
        split_files.extend(files_for_class[cls][:count])
        remainder[cls] = files_for_class[cls][count:]
    return split_files, remainder

def download_ufc_101_subset(zip_url, num_classes, splits, download_dir):
    """
    Download a subset of the UFC101 dataset and split them into various parts, such as
    training, validation, and test.

    Args:
      zip_url: Zip URL containing data.
      num_classes: Number of labels.
      splits: Dictionary specifying the training, validation, test, etc. (key) division of data
              (value is number of files per split).
      download_dir: Directory to download data to.

    Return:
      dir: Posix path of the resulting directories containing the splits of data.
    """
    files = list_files_per_class(zip_url)
    # Keep only entries that include a filename. Building a new list avoids
    # removing items from a list while iterating over it.
    files = [f for f in files if len(f.split('/')) > 2]

    files_for_class = get_files_per_class(files)

    classes = list(files_for_class.keys())[:num_classes]

    for cls in classes:
        new_files_for_class = files_for_class[cls]
        random.shuffle(new_files_for_class)
        files_for_class[cls] = new_files_for_class

    # Only use the number of classes you want in the dictionary
    files_for_class = {x: files_for_class[x] for x in list(files_for_class)[:num_classes]}

    dirs = {}
    for split_name, split_count in splits.items():
        print(split_name, ":")
        split_dir = download_dir / split_name
        split_files, files_for_class = split_class_lists(files_for_class, split_count)
        download_from_zip(zip_url, split_dir, split_files)
        dirs[split_name] = split_dir

    return dirs

def format_frames(frame, output_size):
    """
    Pad and resize an image from a video.

    Args:
      frame: Image that needs to resized and padded.
      output_size: Pixel size of the output frame image.

    Return:
      Formatted frame with padding of specified output size.
    """
    frame = tf.image.convert_image_dtype(frame, tf.float32)
    frame = tf.image.resize_with_pad(frame, *output_size)
    return frame

def frames_from_video_file(video_path, n_frames, output_size = (224,224), frame_step = 15):
    """
    Creates frames from each video file present for each category.

    Args:
      video_path: File path to the video.
      n_frames: Number of frames to be created per video file.
      output_size: Pixel size of the output frame image.
      frame_step: Number of source frames between two sampled frames.

    Return:
      An NumPy array of frames in the shape of (n_frames, height, width, channels).
    """
    # Read each video frame by frame
    result = []
    src = cv2.VideoCapture(str(video_path))

    # OpenCV reports the frame count as a float; random.randint needs integers.
    video_length = int(src.get(cv2.CAP_PROP_FRAME_COUNT))

    need_length = 1 + (n_frames - 1) * frame_step

    if need_length > video_length:
        start = 0
    else:
        max_start = video_length - need_length
        # randint is inclusive at both ends, so max_start is the last valid start.
        start = random.randint(0, max_start)

    src.set(cv2.CAP_PROP_POS_FRAMES, start)
    # ret is a boolean indicating whether read was successful, frame is the image itself
    ret, frame = src.read()
    result.append(format_frames(frame, output_size))

    for _ in range(n_frames - 1):
        for _ in range(frame_step):
            ret, frame = src.read()
        if ret:
            frame = format_frames(frame, output_size)
            result.append(frame)
        else:
            # Pad with a black frame when the video runs out.
            result.append(np.zeros_like(result[0]))
    src.release()
    # OpenCV returns BGR; reorder the channels to RGB.
    result = np.array(result)[..., [2, 1, 0]]

    return result

class FrameGenerator:
    def __init__(self, path, n_frames, training = False):
        """
        Returns a set of frames with their associated label.

        Args:
          path: Video file paths.
          n_frames: Number of frames.
          training: Boolean to determine if training dataset is being created.
        """
        self.path = path
        self.n_frames = n_frames
        self.training = training
        self.class_names = sorted(set(p.name for p in self.path.iterdir() if p.is_dir()))
        self.class_ids_for_name = dict((name, idx) for idx, name in enumerate(self.class_names))

    def get_files_and_class_names(self):
        video_paths = list(self.path.glob('*/*.avi'))
        classes = [p.parent.name for p in video_paths]
        return video_paths, classes

    def __call__(self):
        video_paths, classes = self.get_files_and_class_names()

        pairs = list(zip(video_paths, classes))

        if self.training:
            random.shuffle(pairs)

        for path, name in pairs:
            video_frames = frames_from_video_file(path, self.n_frames)
            label = self.class_ids_for_name[name] # Encode labels
            yield video_frames, label


URL = 'https://storage.googleapis.com/thumos14_files/UCF101_videos.zip'
download_dir = pathlib.Path('./UCF101_subset/')
subset_paths = download_ufc_101_subset(URL,
                                       num_classes = 10,
                                       splits = {"train": 30, "val": 10, "test": 10},
                                       download_dir = download_dir)

train :
100%|██████████| 300/300 [00:28<00:00, 10.61it/s]
val :
100%|██████████| 100/100 [00:09<00:00, 10.75it/s]
test :
100%|██████████| 100/100 [00:08<00:00, 12.31it/s]

Create the training, validation, and test sets (train_ds, val_ds, and test_ds).


n_frames = 10
batch_size = 8

output_signature = (tf.TensorSpec(shape = (None, None, None, 3), dtype = tf.float32),
                    tf.TensorSpec(shape = (), dtype = tf.int16))

train_ds = tf.data.Dataset.from_generator(FrameGenerator(subset_paths['train'], n_frames, training=True),
                                          output_signature = output_signature)

# Batch the data
train_ds = train_ds.batch(batch_size)

val_ds = tf.data.Dataset.from_generator(FrameGenerator(subset_paths['val'], n_frames),
                                        output_signature = output_signature)
val_ds = val_ds.batch(batch_size)

test_ds = tf.data.Dataset.from_generator(FrameGenerator(subset_paths['test'], n_frames),
                                         output_signature = output_signature)
test_ds = test_ds.batch(batch_size)
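To make the effect of batching concrete, here is a small pure-Python sketch that groups items the way Dataset.batch does with its default setting of keeping a final, smaller batch. The helper batch_items is hypothetical and only mimics the grouping; it is not how tf.data is implemented:

```python
from itertools import islice

def batch_items(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# 10 classes with 30 training videos each, as in the download step above.
num_train_videos = 10 * 30
batches = list(batch_items(range(num_train_videos), batch_size=8))
print(len(batches), len(batches[-1]))
```

With 300 training videos and a batch size of 8, this gives 38 batches, the last of which holds only 4 videos. That count is the number of steps per epoch reported during training.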



What is TensorFlow CNN?

Convolutional Neural Networks (CNN), a key technique in deep learning for computer vision, are little-known to the wider public but are the driving force behind major innovations, from unlocking your phone with face recognition to safe driverless vehicles.

CNNs are used for a variety of tasks in computer vision, primarily image classification and object detection. The open source TensorFlow framework allows you to create highly flexible CNN architectures for computer vision tasks. In this article we explain the basics of CNN on TensorFlow and present a quick hands-on tutorial to get you started.

If you are interested in learning how to work with CNNs in PyTorch, which is another popular deep learning framework, see our guide to PyTorch CNN.


What are Tensors?

We mainly deal with high-dimensional data when building machine learning and deep learning models. Tensors are multi-dimensional arrays with a uniform type used to represent different features of the data.

Below is the graphical representation of the different types of dimensions of tensors.

  • A 0-dimensional tensor contains a single value.
  • A 1-dimensional tensor, also known as a “rank-1” tensor, is a list of values.
  • A 2-dimensional tensor, or “rank-2” tensor, is a table of values arranged in rows and columns.
  • Finally, we can have an N-dimensional tensor, where N represents the number of dimensions within the tensor. In the previous cases, N is respectively 0, 1 and 2.

Below is an illustration of zero- to two-dimensional tensors. Each tensor is created using the constant() function from TensorFlow.


import tensorflow as tf

# Zero dimensional tensor
zero_dim_tensor = tf.constant(20)
print(zero_dim_tensor)

# One dimensional tensor
one_dim_tensor = tf.constant([12, 20, 53, 26, 11, 56])
print(one_dim_tensor)

# Two dimensional tensor
two_dim_array = [[3, 6, 7, 5],
                 [9, 2, 3, 4],
                 [7, 1, 10, 6],
                 [0, 8, 11, 2]]
two_dim_tensor = tf.constant(two_dim_array)
print(two_dim_tensor)

A successful execution of the previous code prints each result with the keyword “tf.Tensor”, which indicates that the result is a tensor. Each printed tensor shows three pieces of information:

  • The actual value of the tensor.
  • The shape of the tensor, which is (), (6,), and (4, 4), respectively for the first, second, and third tensors.
  • The data type, represented by the dtype attribute; all the tensors here are int32.
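The relationship between rank and shape can also be checked without TensorFlow. The sketch below uses NumPy arrays as a stand-in for tensors (an assumption made here for illustration; NumPy's ndim and shape play the role of a tensor's rank and shape), and adds a hypothetical 3-dimensional example that is not in the original code:

```python
import numpy as np

zero_dim = np.array(20)
one_dim = np.array([12, 20, 53, 26, 11, 56])
two_dim = np.array([[3, 6, 7, 5],
                    [9, 2, 3, 4],
                    [7, 1, 10, 6],
                    [0, 8, 11, 2]])
# Hypothetical 3-dimensional example: 2 blocks of 3 rows and 4 columns.
three_dim = np.zeros((2, 3, 4))

for name, arr in [("zero_dim", zero_dim), ("one_dim", one_dim),
                  ("two_dim", two_dim), ("three_dim", three_dim)]:
    # ndim is the number of dimensions (the rank); shape lists the size of each one.
    print(name, arr.ndim, arr.shape)
```

The length of the shape tuple always equals the rank, so a scalar has the empty shape ().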

Our Tensorflow Tutorial for Beginners provides a complete overview of TensorFlow and teaches how to build and train models.

TensorFlow Tutorial 05 - Convolutional Neural Network (CNN)

TensorFlow Implementation of CNN

In this section, we will learn about the TensorFlow implementation of a CNN. The steps, each of which must be executed in order and with matching dimensions throughout the network, are as shown below. The code uses the TensorFlow 1.x API (placeholders and sessions) −

Step 1 − Import the necessary TensorFlow modules and the dataset module needed to build the CNN model.

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

Step 2 − Declare a function called run_cnn(), which loads the data and sets the optimization hyperparameters (learning rate, number of epochs, and batch size) that control training. The data placeholders are declared inside this function in the next step.

def run_cnn():
    mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)
    learning_rate = 0.0001
    epochs = 10
    batch_size = 50

Step 3 − In this step, we will declare the training data placeholders. Each input image has 28 x 28 pixels = 784 values; this is the flattened image data that is drawn from mnist.train.next_batch().

We can reshape the tensor according to our requirements. The first value (-1) tells the function to dynamically shape that dimension based on the amount of data passed to it. The two middle dimensions are set to the image size (i.e. 28 x 28), and the last dimension is the number of color channels (1 for grayscale images).

    x = tf.placeholder(tf.float32, [None, 784])
    x_shaped = tf.reshape(x, [-1, 28, 28, 1])
    y = tf.placeholder(tf.float32, [None, 10])
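The effect of the -1 dimension can be tried out in isolation. This is a minimal sketch that uses NumPy's reshape, which follows the same -1 convention, on a hypothetical batch of 50 blank images:

```python
import numpy as np

# 50 flattened images of 28 x 28 = 784 pixels each (all zeros, for illustration only).
batch = np.zeros((50, 784), dtype=np.float32)

# -1 asks reshape to infer the batch dimension from the total number of elements.
images = batch.reshape(-1, 28, 28, 1)
print(images.shape)
```

Because the other three dimensions account for 784 values per image, the inferred first dimension is the batch size.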

Step 4 − Now create the convolutional layers −

    layer1 = create_new_conv_layer(x_shaped, 1, 32, [5, 5], [2, 2], name = 'layer1')
    layer2 = create_new_conv_layer(layer1, 32, 64, [5, 5], [2, 2], name = 'layer2')

Step 5 − Flatten the output so that it is ready for the fully connected output stage. After two layers of stride 2 pooling, the 28 x 28 input is reduced to 14 x 14 and then to 7 x 7 x,y co-ordinates, but with 64 output channels. To create the fully connected “dense” layer, the new shape needs to be [-1, 7 x 7 x 64]. We can set up some weights and bias values for this layer, then activate with ReLU.

    flattened = tf.reshape(layer2, [-1, 7 * 7 * 64])
    wd1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1000], stddev = 0.03), name = 'wd1')
    bd1 = tf.Variable(tf.truncated_normal([1000], stddev = 0.01), name = 'bd1')
    dense_layer1 = tf.matmul(flattened, wd1) + bd1
    dense_layer1 = tf.nn.relu(dense_layer1)
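The 7 x 7 x 64 figure follows from simple arithmetic. As a sketch, assuming 'SAME' padding (so the stride-1 convolutions keep the spatial size and each stride-2 pooling step produces ceil(size / stride) outputs); the helper name same_pool_output is illustrative:

```python
import math

def same_pool_output(size, stride):
    """Spatial size after a 'SAME'-padded pooling step: ceil(size / stride)."""
    return math.ceil(size / stride)

size = 28                      # MNIST images are 28 x 28
for _ in range(2):             # two conv + pool layers
    size = same_pool_output(size, stride=2)

flattened_units = size * size * 64   # 64 output channels after the second layer
print(size, flattened_units)
```

The result is the number of input units that the first dense layer has to accept.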

Step 6 − Add another dense layer with softmax activation, define the cross-entropy loss and the optimizer, set up the accuracy assessment, and create the initialization operator.

    wd2 = tf.Variable(tf.truncated_normal([1000, 10], stddev = 0.03), name = 'wd2')
    bd2 = tf.Variable(tf.truncated_normal([10], stddev = 0.01), name = 'bd2')

    dense_layer2 = tf.matmul(dense_layer1, wd2) + bd2
    y_ = tf.nn.softmax(dense_layer2)

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits = dense_layer2, labels = y))

    optimiser = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cross_entropy)

    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    init_op = tf.global_variables_initializer()
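To see what the softmax, cross-entropy, and accuracy operations compute, here is a small NumPy sketch on made-up logits and one-hot labels. It mirrors the math only; the function names and the example values are assumptions for illustration, not TensorFlow's implementation:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    log_probs = np.log(softmax(logits))
    return float(-(one_hot_labels * log_probs).sum(axis=1).mean())

# Two hypothetical examples with three classes each.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = cross_entropy(logits, labels)
# A prediction is correct when the most probable class matches the label.
accuracy = float((probs.argmax(axis=1) == labels.argmax(axis=1)).mean())
print(probs.sum(axis=1), loss, accuracy)
```

Note that the loss is computed from the raw logits rather than from the softmax output, which is also why the TensorFlow code passes dense_layer2 and not y_ to the loss function.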

Step 7 − Set up the recording variables. This adds a summary that stores the accuracy of the model. The training loop, the helper that builds each convolutional layer, and the entry point complete the script.

    tf.summary.scalar('accuracy', accuracy)
    merged = tf.summary.merge_all()
    # Raw string so the backslash in the Windows path is taken literally.
    writer = tf.summary.FileWriter(r'E:\TensorFlowProject')

    with tf.Session() as sess:
        sess.run(init_op)
        total_batch = int(len(mnist.train.labels) / batch_size)

        for epoch in range(epochs):
            avg_cost = 0
            for i in range(total_batch):
                batch_x, batch_y = mnist.train.next_batch(batch_size = batch_size)
                _, c = sess.run([optimiser, cross_entropy], feed_dict = {x: batch_x, y: batch_y})
                avg_cost += c / total_batch
            test_acc = sess.run(accuracy, feed_dict = {x: mnist.test.images, y: mnist.test.labels})
            # Report per-epoch progress, in the format of the sample output.
            print("Epoch:", epoch + 1, "cost =", "{:.3f}".format(avg_cost),
                  "test accuracy: {:.3f}".format(test_acc))
            summary = sess.run(merged, feed_dict = {x: mnist.test.images, y: mnist.test.labels})
            writer.add_summary(summary, epoch)

        print("\nTraining complete!")
        writer.add_graph(sess.graph)
        print(sess.run(accuracy, feed_dict = {x: mnist.test.images, y: mnist.test.labels}))

def create_new_conv_layer(
    input_data, num_input_channels, num_filters, filter_shape, pool_shape, name):

    # Filter shape: [height, width, input channels, output channels]
    conv_filt_shape = [
        filter_shape[0], filter_shape[1], num_input_channels, num_filters]

    weights = tf.Variable(
        tf.truncated_normal(conv_filt_shape, stddev = 0.03), name = name+'_W')
    bias = tf.Variable(tf.truncated_normal([num_filters]), name = name+'_b')

    # Out layer defines the output
    out_layer = tf.nn.conv2d(input_data, weights, [1, 1, 1, 1], padding = 'SAME')
    out_layer += bias
    out_layer = tf.nn.relu(out_layer)

    # Max pooling with the given window and a stride of 2 in x and y
    ksize = [1, pool_shape[0], pool_shape[1], 1]
    strides = [1, 2, 2, 1]
    out_layer = tf.nn.max_pool(
        out_layer, ksize = ksize, strides = strides, padding = 'SAME')

    return out_layer

if __name__ == "__main__":
    run_cnn()
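The pooling step inside create_new_conv_layer can be illustrated on a single small feature map. This is a NumPy sketch of 2 x 2 max pooling with stride 2, assuming an input with even height and width (so no padding is needed); the function name and the example values are hypothetical:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = feature_map.shape
    # Split the map into non-overlapping 2 x 2 blocks, then take each block's maximum.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [7, 1, 2, 8]])
pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each output value is the largest activation in its 2 x 2 window, so the spatial size is halved in both directions while the strongest responses are kept.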

Following is the output generated by the above code −

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
2018-09-19 17:22:58.802268: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-19 17:25:41.522845: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:101] Allocation of 1003520000 exceeds 10% of system memory.
2018-09-19 17:25:44.630941: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:101] Allocation of 501760000 exceeds 10% of system memory.
Epoch: 1 cost = 0.676 test accuracy: 0.940
2018-09-19 17:26:51.987554: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:101] Allocation of 1003520000 exceeds 10% of system memory.

Train the model

For this tutorial, choose the tf.keras.optimizers.Adam optimizer and the tf.keras.losses.SparseCategoricalCrossentropy loss function. Use the metrics argument to view the accuracy of the model performance at every step.


model.compile(loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer = keras.optimizers.Adam(learning_rate = 0.0001),
              metrics = ['accuracy'])
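The "sparse" in the loss name means the labels are plain integer class ids, and from_logits=True means the model outputs raw scores that have not been passed through softmax. A NumPy sketch of that computation, with made-up scores and a hypothetical function name (this illustrates the math, not the Keras implementation):

```python
import numpy as np

def sparse_crossentropy_from_logits(labels, logits):
    """Mean negative log-probability of the true class, computed from raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of each example's true class by integer index.
    picked = log_probs[np.arange(len(labels)), labels]
    return float(-picked.mean())

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 1])          # integer class ids, not one-hot vectors
loss = sparse_crossentropy_from_logits(labels, logits)
print(loss)
```

Integer labels avoid building a one-hot matrix, which matches the integer labels produced by the data generator in this tutorial.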

Train the model for 50 epochs with the Keras Model.fit method.


history = model.fit(x = train_ds, epochs = 50, validation_data = val_ds)

Epoch 1/50
38/38 [==============================] - 230s 6s/step - loss: 2.4452 - accuracy: 0.1233 - val_loss: 2.4661 - val_accuracy: 0.1400
Epoch 2/50
38/38 [==============================] - 223s 6s/step - loss: 2.1898 - accuracy: 0.2067 - val_loss: 2.5864 - val_accuracy: 0.1500
Epoch 3/50
38/38 [==============================] - 222s 6s/step - loss: 2.0602 - accuracy: 0.2667 - val_loss: 2.7133 - val_accuracy: 0.1200
Epoch 4/50
38/38 [==============================] - 221s 6s/step - loss: 1.8716 - accuracy: 0.3633 - val_loss: 2.4647 - val_accuracy: 0.1800
Epoch 5/50
38/38 [==============================] - 220s 6s/step - loss: 1.7901 - accuracy: 0.3667 - val_loss: 2.7002 - val_accuracy: 0.1500
Epoch 6/50
38/38 [==============================] - 221s 6s/step - loss: 1.7632 - accuracy: 0.3867 - val_loss: 2.6759 - val_accuracy: 0.1600
Epoch 7/50
38/38 [==============================] - 223s 6s/step - loss: 1.7130 - accuracy: 0.3833 - val_loss: 2.3038 - val_accuracy: 0.2200
Epoch 8/50
38/38 [==============================] - 222s 6s/step - loss: 1.6025 - accuracy: 0.4000 - val_loss: 2.6929 - val_accuracy: 0.1700
Epoch 9/50
38/38 [==============================] - 220s 6s/step - loss: 1.5444 - accuracy: 0.4767 - val_loss: 2.6629 - val_accuracy: 0.1800
Epoch 10/50
38/38 [==============================] - 220s 6s/step - loss: 1.4557 - accuracy: 0.4767 - val_loss: 2.2244 - val_accuracy: 0.2300
Epoch 11/50
38/38 [==============================] - 220s 6s/step - loss: 1.3617 - accuracy: 0.5200 - val_loss: 2.1875 - val_accuracy: 0.3200
Epoch 12/50
38/38 [==============================] - 221s 6s/step - loss: 1.3553 - accuracy: 0.5333 - val_loss: 2.1145 - val_accuracy: 0.2700
Epoch 13/50
38/38 [==============================] - 220s 6s/step - loss: 1.3947 - accuracy: 0.5000 - val_loss: 1.8937 - val_accuracy: 0.3700
Epoch 14/50
38/38 [==============================] - 221s 6s/step - loss: 1.3361 - accuracy: 0.5267 - val_loss: 1.6443 - val_accuracy: 0.4600
Epoch 15/50
38/38 [==============================] - 219s 6s/step - loss: 1.2380 - accuracy: 0.5533 - val_loss: 1.4356 - val_accuracy: 0.5400
Epoch 16/50
38/38 [==============================] - 221s 6s/step - loss: 1.1703 - accuracy: 0.5767 - val_loss: 1.8729 - val_accuracy: 0.4000
Epoch 17/50
38/38 [==============================] - 221s 6s/step - loss: 1.1605 - accuracy: 0.6033 - val_loss: 1.7361 - val_accuracy: 0.4600
Epoch 18/50
38/38 [==============================] - 220s 6s/step - loss: 1.0992 - accuracy: 0.6067 - val_loss: 1.2881 - val_accuracy: 0.5700
Epoch 19/50
38/38 [==============================] - 220s 6s/step - loss: 1.1121 - accuracy: 0.6133 - val_loss: 1.4157 - val_accuracy: 0.5200
Epoch 20/50
38/38 [==============================] - 218s 6s/step - loss: 1.0310 - accuracy: 0.6300 - val_loss: 1.2908 - val_accuracy: 0.5800
Epoch 21/50
38/38 [==============================] - 219s 6s/step - loss: 1.0134 - accuracy: 0.6467 - val_loss: 1.3474 - val_accuracy: 0.5800
Epoch 22/50
38/38 [==============================] - 219s 6s/step - loss: 0.9677 - accuracy: 0.6300 - val_loss: 1.3569 - val_accuracy: 0.5300
Epoch 23/50
38/38 [==============================] - 219s 6s/step - loss: 0.9292 - accuracy: 0.6433 - val_loss: 1.1130 - val_accuracy: 0.5900
Epoch 24/50
38/38 [==============================] - 219s 6s/step - loss: 0.9134 - accuracy: 0.6833 - val_loss: 1.3144 - val_accuracy: 0.5200
Epoch 25/50
38/38 [==============================] - 218s 6s/step - loss: 0.8948 - accuracy: 0.7000 - val_loss: 1.1649 - val_accuracy: 0.5900
Epoch 26/50
38/38 [==============================] - 219s 6s/step - loss: 0.8968 - accuracy: 0.6500 - val_loss: 1.1370 - val_accuracy: 0.6400
Epoch 27/50
38/38 [==============================] - 219s 6s/step - loss: 0.9460 - accuracy: 0.6533 - val_loss: 2.2827 - val_accuracy: 0.3800
Epoch 28/50
38/38 [==============================] - 218s 6s/step - loss: 1.0633 - accuracy: 0.6333 - val_loss: 1.2745 - val_accuracy: 0.5400
Epoch 29/50
38/38 [==============================] - 219s 6s/step - loss: 0.9378 - accuracy: 0.6733 - val_loss: 1.2241 - val_accuracy: 0.6500
Epoch 30/50
38/38 [==============================] - 219s 6s/step - loss: 0.8682 - accuracy: 0.7200 - val_loss: 1.1828 - val_accuracy: 0.6500
Epoch 31/50
38/38 [==============================] - 218s 6s/step - loss: 0.8379 - accuracy: 0.6833 - val_loss: 1.1417 - val_accuracy: 0.6000
Epoch 32/50
38/38 [==============================] - 218s 6s/step - loss: 0.7856 - accuracy: 0.6900 - val_loss: 1.2292 - val_accuracy: 0.5600
Epoch 33/50
38/38 [==============================] - 219s 6s/step - loss: 0.8056 - accuracy: 0.7233 - val_loss: 1.0834 - val_accuracy: 0.6200
Epoch 34/50
38/38 [==============================] - 220s 6s/step - loss: 0.8262 - accuracy: 0.6867 - val_loss: 1.1120 - val_accuracy: 0.6000
Epoch 35/50
38/38 [==============================] - 218s 6s/step - loss: 0.7472 - accuracy: 0.7367 - val_loss: 0.9757 - val_accuracy: 0.6700
Epoch 36/50
38/38 [==============================] - 219s 6s/step - loss: 0.6969 - accuracy: 0.7500 - val_loss: 0.9642 - val_accuracy: 0.6400
Epoch 37/50
38/38 [==============================] - 219s 6s/step - loss: 0.7518 - accuracy: 0.7467 - val_loss: 1.1454 - val_accuracy: 0.5100
Epoch 38/50
38/38 [==============================] - 220s 6s/step - loss: 0.7360 - accuracy: 0.7267 - val_loss: 0.9619 - val_accuracy: 0.6800
Epoch 39/50
38/38 [==============================] - 220s 6s/step - loss: 0.6887 - accuracy: 0.7600 - val_loss: 1.1292 - val_accuracy: 0.6100
Epoch 40/50
38/38 [==============================] - 220s 6s/step - loss: 0.7217 - accuracy: 0.7567 - val_loss: 1.2201 - val_accuracy: 0.6100
Epoch 41/50
38/38 [==============================] - 219s 6s/step - loss: 0.7505 - accuracy: 0.7200 - val_loss: 0.9450 - val_accuracy: 0.6800
Epoch 42/50
38/38 [==============================] - 218s 6s/step - loss: 0.6737 - accuracy: 0.7433 - val_loss: 0.9566 - val_accuracy: 0.6500
Epoch 43/50
38/38 [==============================] - 219s 6s/step - loss: 0.6232 - accuracy: 0.7867 - val_loss: 0.9072 - val_accuracy: 0.7100
Epoch 44/50
38/38 [==============================] - 220s 6s/step - loss: 0.5908 - accuracy: 0.8100 - val_loss: 0.9052 - val_accuracy: 0.7200
Epoch 45/50
38/38 [==============================] - 219s 6s/step - loss: 0.5901 - accuracy: 0.7767 - val_loss: 0.8087 - val_accuracy: 0.7100
Epoch 46/50
38/38 [==============================] - 218s 6s/step - loss: 0.6202 - accuracy: 0.7833 - val_loss: 1.0201 - val_accuracy: 0.7000
Epoch 47/50
38/38 [==============================] - 217s 6s/step - loss: 0.6777 - accuracy: 0.7567 - val_loss: 1.5742 - val_accuracy: 0.4800
Epoch 48/50
38/38 [==============================] - 217s 6s/step - loss: 0.8462 - accuracy: 0.6767 - val_loss: 1.6540 - val_accuracy: 0.4400
Epoch 49/50
38/38 [==============================] - 219s 6s/step - loss: 0.7168 - accuracy: 0.7333 - val_loss: 1.2454 - val_accuracy: 0.6000
Epoch 50/50
38/38 [==============================] - 216s 6s/step - loss: 0.6592 - accuracy: 0.7433 - val_loss: 0.9307 - val_accuracy: 0.6700

Visualize the results

Create plots of the loss and accuracy on the training and validation sets:


import matplotlib.pyplot as plt
import numpy as np

def plot_history(history):
    """
    Plotting training and validation learning curves.

    Args:
      history: model history with all the metric measures
    """
    fig, (ax1, ax2) = plt.subplots(2)

    fig.set_size_inches(18.5, 10.5)

    # Plot loss
    ax1.set_title('Loss')
    ax1.plot(history.history['loss'], label = 'train')
    ax1.plot(history.history['val_loss'], label = 'test')
    ax1.set_ylabel('Loss')

    # Determine upper bound of y-axis
    max_loss = max(history.history['loss'] + history.history['val_loss'])

    ax1.set_ylim([0, np.ceil(max_loss)])
    ax1.set_xlabel('Epoch')
    ax1.legend(['Train', 'Validation'])

    # Plot accuracy
    ax2.set_title('Accuracy')
    ax2.plot(history.history['accuracy'], label = 'train')
    ax2.plot(history.history['val_accuracy'], label = 'test')
    ax2.set_ylabel('Accuracy')
    ax2.set_ylim([0, 1])
    ax2.set_xlabel('Epoch')
    ax2.legend(['Train', 'Validation'])

    plt.show()

plot_history(history)

Convolutional Neural Networks - Deep Learning basics with Python, TensorFlow and Keras p.3

Monitoring the model’s performance

Using the history object, the training losses and accuracies can be obtained.

import pandas as pd
metrics_df = pd.DataFrame(history.history)

You can plot them in order to see the learning curves. Let’s start by comparing the training and validation loss.

metrics_df[["loss","val_loss"]].plot();

Next, on to the training and validation accuracy.

metrics_df[["binary_accuracy","val_binary_accuracy"]].plot();

This tutorial demonstrates training a simple Convolutional Neural Network (CNN) to classify CIFAR images. Because this tutorial uses the Keras Sequential API, creating and training your model will take just a few lines of code.

Import TensorFlow


import tensorflow as tf

from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt


Download and prepare the CIFAR10 dataset

The CIFAR10 dataset contains 60,000 color images in 10 classes, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 testing images. The classes are mutually exclusive and there is no overlap between them.


(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
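The effect of dividing by 255.0 can be checked on a tiny array. This is a minimal sketch with made-up pixel values, assuming the images are stored as 8-bit unsigned integers:

```python
import numpy as np

# Stand-in for image data: 8-bit pixel values in the range 0 to 255.
raw = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by a float converts the array to floating point and rescales it.
scaled = raw / 255.0
print(scaled.dtype, scaled.min(), scaled.max())
```

After scaling, every value lies between 0 and 1, which keeps the inputs in a small, consistent range for training.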

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz 170498071/170498071 [==============================] – 2s 0us/step

Verify the data

To verify that the dataset looks correct, let’s plot the first 25 images from the training set and display the class name below each image:


class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()

Create the convolutional base

The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.

As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure your CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to your first layer.


model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

Let’s display the architecture of your model so far:


model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape          Param #
=================================================================
 conv2d (Conv2D)                 (None, 30, 30, 32)    896
 max_pooling2d (MaxPooling2D)    (None, 15, 15, 32)    0
 conv2d_1 (Conv2D)               (None, 13, 13, 64)    18496
 max_pooling2d_1 (MaxPooling2D)  (None, 6, 6, 64)      0
 conv2d_2 (Conv2D)               (None, 4, 4, 64)      36928
=================================================================
Total params: 56320 (220.00 KB)
Trainable params: 56320 (220.00 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.
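The shrinking heights and widths in the summary can be reproduced with simple arithmetic. The following is a minimal sketch, assuming the Keras defaults used above (Conv2D with 'valid' padding and stride 1, MaxPooling2D with a stride equal to the pool size). The helper names conv_out and pool_out are illustrative and are not Keras APIs.

```python
def conv_out(size, kernel, stride=1):
    """Output length along one axis for a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool):
    """Output length along one axis for non-overlapping 'valid' pooling."""
    return size // pool


# Layer stack from the convolutional base: (kind, kernel or pool size)
stack = [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2), ("conv", 3)]

size = 32  # CIFAR images are 32 x 32
trace = [size]
for kind, k in stack:
    size = conv_out(size, k) if kind == "conv" else pool_out(size, k)
    trace.append(size)

print(trace)  # [32, 30, 15, 13, 6, 4]
```

Each 3 x 3 convolution trims two pixels from the height and width, and each 2 x 2 pooling halves them (rounding down), which matches the 30, 15, 13, 6, 4 sequence in the summary.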

Add Dense layers on top

To complete the model, you will feed the last output tensor from the convolutional base (of shape (4, 4, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, you will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. CIFAR has 10 output classes, so you use a final Dense layer with 10 outputs.


model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))

Here’s the complete architecture of your model:


model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape          Param #
=================================================================
 conv2d (Conv2D)                 (None, 30, 30, 32)    896
 max_pooling2d (MaxPooling2D)    (None, 15, 15, 32)    0
 conv2d_1 (Conv2D)               (None, 13, 13, 64)    18496
 max_pooling2d_1 (MaxPooling2D)  (None, 6, 6, 64)      0
 conv2d_2 (Conv2D)               (None, 4, 4, 64)      36928
 flatten (Flatten)               (None, 1024)          0
 dense (Dense)                   (None, 64)            65600
 dense_1 (Dense)                 (None, 10)            650
=================================================================
Total params: 122570 (478.79 KB)
Trainable params: 122570 (478.79 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

The network summary shows that (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers.
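The parameter counts in the summary can be checked by hand. This is a small sketch, assuming every layer has one bias per output unit or filter (the Keras default); the helper names conv2d_params and dense_params are made up for this example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (inputs + 1) * units


counts = {
    "conv2d": conv2d_params(3, 3, 3, 32),     # RGB input, 32 filters
    "conv2d_1": conv2d_params(3, 3, 32, 64),
    "conv2d_2": conv2d_params(3, 3, 64, 64),
    "dense": dense_params(4 * 4 * 64, 64),    # flattened (4, 4, 64) = 1024 inputs
    "dense_1": dense_params(64, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:10s} {n}")
print("total     ", total)  # 122570
```

Note that the first Dense layer alone holds more than half of the parameters, because it connects all 1024 flattened values to each of its 64 units.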

Compile and train the model


model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
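The final Dense(10) layer has no activation, so the model outputs raw scores (logits) rather than probabilities, which is why the loss is created with from_logits=True. The sketch below shows, in plain Python, the computation this kind of loss performs for a single example with an integer label. It is an illustration of the math under that assumption, not the Keras implementation, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


logits = [2.0, 0.5, -1.0]   # made-up scores for a 3-class example
probs = softmax(logits)
loss = sparse_ce_from_logits(logits, label=0)

print([round(p, 3) for p in probs])
print(round(loss, 4))
```

With ten equally likely classes the loss would be log(10), roughly 2.30, so a first-epoch loss well below that indicates the model has already started to learn.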

Epoch 1/10
1563/1563 [==============================] - 10s 5ms/step - loss: 1.5211 - accuracy: 0.4429 - val_loss: 1.2497 - val_accuracy: 0.5531
Epoch 2/10
1563/1563 [==============================] - 6s 4ms/step - loss: 1.1408 - accuracy: 0.5974 - val_loss: 1.1474 - val_accuracy: 0.6023
Epoch 3/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.9862 - accuracy: 0.6538 - val_loss: 0.9759 - val_accuracy: 0.6582
Epoch 4/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8929 - accuracy: 0.6879 - val_loss: 0.9412 - val_accuracy: 0.6702
Epoch 5/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8183 - accuracy: 0.7131 - val_loss: 0.8830 - val_accuracy: 0.6967
Epoch 6/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7588 - accuracy: 0.7334 - val_loss: 0.8671 - val_accuracy: 0.7039
Epoch 7/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7126 - accuracy: 0.7518 - val_loss: 0.8972 - val_accuracy: 0.6897
Epoch 8/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6655 - accuracy: 0.7661 - val_loss: 0.8412 - val_accuracy: 0.7111
Epoch 9/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6205 - accuracy: 0.7851 - val_loss: 0.8581 - val_accuracy: 0.7109
Epoch 10/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.5872 - accuracy: 0.7937 - val_loss: 0.8817 - val_accuracy: 0.7113

Evaluate the model


plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

313/313 – 1s – loss: 0.8817 – accuracy: 0.7113 – 655ms/epoch – 2ms/step


print(test_acc)

0.7113000154495239

Your simple CNN has achieved a test accuracy of over 70%. Not bad for a few lines of code! For another CNN style, check out the TensorFlow 2 quickstart for experts example that uses the Keras subclassing API and tf.GradientTape.


Convolutional Neural Networks (CNN) with TensorFlow Tutorial

Imagine being in a zoo trying to recognize if a given animal is a cheetah or a leopard. As a human, your brain can effortlessly analyze body and facial features to come to a valid conclusion. In the same way, Convolutional Neural Networks (CNNs) can be trained to perform the same recognition task, no matter how complex the patterns are. This makes them powerful in the field of computer vision.

This conceptual CNN tutorial will start by providing an overview of what CNNs are and their importance in machine learning. Then it will walk you through a step-by-step implementation of CNN in TensorFlow Framework 2.


Create the model

The following 3D convolutional neural network model is based on the paper A Closer Look at Spatiotemporal Convolutions for Action Recognition by D. Tran et al. (2017). The paper compares several versions of 3D ResNets. Instead of operating on a single image with dimensions (height, width), like standard ResNets, these operate on a video volume (time, height, width). The most obvious approach to this problem would be to replace each 2D convolution (layers.Conv2D) with a 3D convolution (layers.Conv3D).

This tutorial uses a (2 + 1)D convolution with residual connections. The (2 + 1)D convolution allows for the decomposition of the spatial and temporal dimensions, therefore creating two separate steps. An advantage of this approach is that factorizing the convolutions into spatial and temporal dimensions saves parameters.

For each output location a 3D convolution combines all the vectors from a 3D patch of the volume to create one vector in the output volume.

This operation takes time * height * width * channels inputs and produces channels outputs (assuming the number of input and output channels is the same). So a 3D convolution layer with a kernel size of (3 x 3 x 3) would need a weight matrix with 27 * channels ** 2 entries. The reference paper found that a more effective and efficient approach was to factorize the convolution. Instead of a single 3D convolution to process the time and space dimensions, they proposed a "(2+1)D" convolution which processes the space and time dimensions separately. The figure below shows the factored spatial and temporal convolutions of a (2 + 1)D convolution.

The main advantage of this approach is that it reduces the number of parameters. In the (2 + 1)D convolution the spatial convolution takes in data of the shape (1, width, height), while the temporal convolution takes in data of the shape (time, 1, 1). For example, a (2 + 1)D convolution with kernel size (3 x 3 x 3) would need weight matrices of size (9 * channels**2) + (3 * channels**2), less than half as many as the full 3D convolution. This tutorial implements (2 + 1)D ResNet18, where each convolution in the ResNet is replaced by a (2+1)D convolution.
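To make the saving concrete, the two weight counts can be compared for a specific channel width. This is a rough sketch under the same simplification the text uses (equal input and output channels, biases ignored); the choice of 64 channels and the function names are assumptions made for illustration.

```python
def conv3d_weights(kt, kh, kw, channels):
    """Weights in a full 3D convolution with equal in/out channels."""
    return kt * kh * kw * channels ** 2


def conv2plus1d_weights(kt, kh, kw, channels):
    """Weights in a spatial (1, kh, kw) conv followed by a temporal (kt, 1, 1) conv."""
    spatial = kh * kw * channels ** 2
    temporal = kt * channels ** 2
    return spatial + temporal


channels = 64  # illustrative width, not taken from the tutorial
full = conv3d_weights(3, 3, 3, channels)
factored = conv2plus1d_weights(3, 3, 3, channels)
ratio = factored / full

print(full)             # 110592
print(factored)         # 49152
print(round(ratio, 3))  # 0.444
```

The ratio is 12/27 regardless of the channel count, which is where "less than half" comes from.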


# Define the dimensions of one frame in the set of frames created
HEIGHT = 224
WIDTH = 224


class Conv2Plus1D(keras.layers.Layer):
    def __init__(self, filters, kernel_size, padding):
        """
        A sequence of convolutional layers that first apply the convolution operation over the
        spatial dimensions, and then the temporal dimension.
        """
        super().__init__()
        self.seq = keras.Sequential([
            # Spatial decomposition
            layers.Conv3D(filters=filters,
                          kernel_size=(1, kernel_size[1], kernel_size[2]),
                          padding=padding),
            # Temporal decomposition
            layers.Conv3D(filters=filters,
                          kernel_size=(kernel_size[0], 1, 1),
                          padding=padding)
        ])

    def call(self, x):
        return self.seq(x)

A ResNet model is made from a sequence of residual blocks. A residual block has two branches. The main branch performs the calculation, but is difficult for gradients to flow through. The residual branch bypasses the main calculation and mostly just adds the input to the output of the main branch. Gradients flow easily through this branch. Therefore, an easy path from the loss function to any of the residual block’s main branch will be present. This avoids the vanishing gradient problem.
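The effect of the residual branch on gradient flow can be seen in a toy scalar example. This is only an illustration, assuming a main branch whose gradient is very small (here a multiplication by 0.001); it does not use the tutorial's layers, and all names are hypothetical.

```python
def main_branch(x, weight=1e-3):
    """Stand-in for a main branch through which very little gradient flows."""
    return weight * x


def residual_block(x, weight=1e-3):
    """Main branch output plus the unchanged input (the skip connection)."""
    return main_branch(x, weight) + x


def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


grad_plain = numerical_grad(main_branch, 2.0)
grad_residual = numerical_grad(residual_block, 2.0)

print(round(grad_plain, 6))     # 0.001
print(round(grad_residual, 6))  # 1.001
```

Stacking many plain layers multiplies small gradients together, while each residual block contributes the extra 1 from the skip path, so the gradient reaching earlier layers does not collapse toward zero.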

Create the main branch of the residual block with the following class. In contrast to the standard ResNet structure this uses the custom Conv2Plus1D layer instead of layers.Conv2D.


class ResidualMain(keras.layers.Layer):
    """
    Residual block of the model with convolution, layer normalization, and the
    activation function, ReLU.
    """
    def __init__(self, filters, kernel_size):
        super().__init__()
        self.seq = keras.Sequential([
            Conv2Plus1D(filters=filters,
                        kernel_size=kernel_size,
                        padding='same'),
            layers.LayerNormalization(),
            layers.ReLU(),
            Conv2Plus1D(filters=filters,
                        kernel_size=kernel_size,
                        padding='same'),
            layers.LayerNormalization()
        ])

    def call(self, x):
        return self.seq(x)

To add the residual branch to the main branch, the two need to have the same size. The Project layer below deals with cases where the number of channels is changed on the branch. In particular, it adds a densely-connected layer followed by normalization.


class Project(keras.layers.Layer):
    """
    Project certain dimensions of the tensor as the data is passed through different
    sized filters and downsampled.
    """
    def __init__(self, units):
        super().__init__()
        self.seq = keras.Sequential([
            layers.Dense(units),
            layers.LayerNormalization()
        ])

    def call(self, x):
        return self.seq(x)

Use add_residual_block to introduce a skip connection between the layers of the model.


def add_residual_block(input, filters, kernel_size):
    """
    Add residual blocks to the model. If the last dimensions of the input data
    and filter size does not match, project it such that last dimension matches.
    """
    out = ResidualMain(filters, kernel_size)(input)

    res = input
    # Using the Keras functional APIs, project the last dimension of the tensor to
    # match the new filter size
    if out.shape[-1] != input.shape[-1]:
        res = Project(out.shape[-1])(res)

    return layers.add([res, out])

Resizing the video is necessary to perform downsampling of the data. In particular, downsampling the video frames allows the model to examine specific parts of frames to detect patterns that may be specific to a certain action. Through downsampling, non-essential information can be discarded. Moreover, resizing the video reduces dimensionality and therefore speeds up processing through the model.


class ResizeVideo(keras.layers.Layer):
    def __init__(self, height, width):
        super().__init__()
        self.height = height
        self.width = width
        self.resizing_layer = layers.Resizing(self.height, self.width)

    def call(self, video):
        """
        Use the einops library to resize the tensor.

        Args:
          video: Tensor representation of the video, in the form of a set of frames.

        Return:
          A downsampled size of the video according to the new height and width it should be resized to.
        """
        # b stands for batch size, t stands for time, h stands for height,
        # w stands for width, and c stands for the number of channels.
        old_shape = einops.parse_shape(video, 'b t h w c')
        images = einops.rearrange(video, 'b t h w c -> (b t) h w c')
        images = self.resizing_layer(images)
        videos = einops.rearrange(
            images, '(b t) h w c -> b t h w c',
            t = old_shape['t'])
        return videos

Use the Keras functional API to build the residual network.


input_shape = (None, 10, HEIGHT, WIDTH, 3)
input = layers.Input(shape=(input_shape[1:]))
x = input

x = Conv2Plus1D(filters=16, kernel_size=(3, 7, 7), padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)
x = ResizeVideo(HEIGHT // 2, WIDTH // 2)(x)

# Block 1
x = add_residual_block(x, 16, (3, 3, 3))
x = ResizeVideo(HEIGHT // 4, WIDTH // 4)(x)

# Block 2
x = add_residual_block(x, 32, (3, 3, 3))
x = ResizeVideo(HEIGHT // 8, WIDTH // 8)(x)

# Block 3
x = add_residual_block(x, 64, (3, 3, 3))
x = ResizeVideo(HEIGHT // 16, WIDTH // 16)(x)

# Block 4
x = add_residual_block(x, 128, (3, 3, 3))

x = layers.GlobalAveragePooling3D()(x)
x = layers.Flatten()(x)
x = layers.Dense(10)(x)

model = keras.Model(input, x)


frames, label = next(iter(train_ds))
model.build(frames)


# Visualize the model
keras.utils.plot_model(model, expand_nested=True, dpi=60, show_shapes=True)


Compile the Model

The next step is to compile the model, where we specify the optimizer type and loss function and any additional metrics we would like recorded during training. Here we specify RMSProp as the optimizer type for gradient descent, and we use a cross-entropy loss function which is the standard loss function for classification problems. We specifically use categorical_crossentropy since our labels are one-hot encoded. Finally, we specify accuracy as an additional metric to record during training. The value of the loss function is always recorded by default, but if you want accuracy, you need to specify it.

model.compile(
    optimizer="rmsprop",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

Train the Model

Since the dataset does not include a validation dataset, and since we did not previously split the training dataset to create a validation dataset, we will use the validation_split argument below so that 30% of the training dataset is automatically reserved for validation. In this case, this approach reserves the last 30% of the training dataset for validation. This is a very convenient approach, but if the training dataset has any specific ordering (say, ordered by classes), you will need to take steps to randomize the order before splitting.
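The ordering caveat is easy to demonstrate without any training. The sketch below mimics a "take the last 30%" split on a tiny class-ordered label list and then repeats it after shuffling. The helper tail_split and the toy labels are hypothetical and only imitate the behavior described above; in real code the images and labels must be shuffled together with the same permutation.

```python
import random


def tail_split(samples, validation_fraction):
    """Reserve the last fraction of the samples for validation."""
    n_val = round(len(samples) * validation_fraction)
    split_at = len(samples) - n_val
    return samples[:split_at], samples[split_at:]


# Toy labels ordered by class: four 0s, three 1s, three 2s
labels = [0] * 4 + [1] * 3 + [2] * 3

train, val = tail_split(labels, 0.3)
print(train)  # [0, 0, 0, 0, 1, 1, 1]
print(val)    # [2, 2, 2]  -> class 2 never appears in training

# Randomize the order first, then split
rng = random.Random(0)  # fixed seed so the shuffle is repeatable
shuffled = labels[:]
rng.shuffle(shuffled)
train_s, val_s = tail_split(shuffled, 0.3)
print(train_s, val_s)
```

Without shuffling, the validation set contains only the last class and the model never sees that class during training, which makes both the training and the validation metrics misleading.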

history = model.fit(
    X_train,
    y_train,
    batch_size=TrainingConfig.BATCH_SIZE,
    epochs=TrainingConfig.EPOCHS,
    verbose=1,
    validation_split=.3,
)

Epoch 1/31
137/137 [==============================] - 5s 28ms/step - loss: 1.9926 - accuracy: 0.2704 - val_loss: 1.7339 - val_accuracy: 0.3640
Epoch 2/31
137/137 [==============================] - 3s 25ms/step - loss: 1.5905 - accuracy: 0.4254 - val_loss: 1.4228 - val_accuracy: 0.4887
Epoch 3/31
137/137 [==============================] - 3s 25ms/step - loss: 1.3743 - accuracy: 0.5076 - val_loss: 1.2851 - val_accuracy: 0.5380
:
:
Epoch 29/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0569 - accuracy: 0.9814 - val_loss: 2.0920 - val_accuracy: 0.7137
Epoch 30/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0543 - accuracy: 0.9837 - val_loss: 2.1936 - val_accuracy: 0.7253
Epoch 31/31
137/137 [==============================] - 3s 25ms/step - loss: 0.0520 - accuracy: 0.9834 - val_loss: 2.1732 - val_accuracy: 0.7227

Plot the Training Results

The function below is a convenience function to plot training and validation losses and training and validation accuracies. It has a single required argument which is a list of metrics to plot.

# Tick helpers used below (import added here so the function is self-contained)
from matplotlib.ticker import MultipleLocator, FormatStrFormatter

def plot_results(metrics, title=None, ylabel=None, ylim=None, metric_name=None, color=None):
    fig, ax = plt.subplots(figsize=(15, 4))

    # Wrap a single metric so that it can be handled like a list of metrics
    if not (isinstance(metric_name, list) or isinstance(metric_name, tuple)):
        metrics = [metrics,]
        metric_name = [metric_name,]

    for idx, metric in enumerate(metrics):
        ax.plot(metric, color=color[idx])

    plt.xlabel("Epoch")
    plt.ylabel(ylabel)
    plt.title(title)
    plt.xlim([0, TrainingConfig.EPOCHS-1])
    plt.ylim(ylim)
    # Tailor x-axis tick marks
    ax.xaxis.set_major_locator(MultipleLocator(5))
    ax.xaxis.set_major_formatter(FormatStrFormatter('%d'))
    ax.xaxis.set_minor_locator(MultipleLocator(1))
    plt.grid(True)
    plt.legend(metric_name)
    plt.show()
    plt.close()

The loss and accuracy metrics can be accessed from the history object returned from the fit method. We access the metrics using predefined dictionary keys, as shown below.

# Retrieve training results.
train_loss = history.history["loss"]
train_acc = history.history["accuracy"]
valid_loss = history.history["val_loss"]
valid_acc = history.history["val_accuracy"]

plot_results(
    [train_loss, valid_loss],
    ylabel="Loss",
    ylim=[0.0, 5.0],
    metric_name=["Training Loss", "Validation Loss"],
    color=["g", "b"],
)

plot_results(
    [train_acc, valid_acc],
    ylabel="Accuracy",
    ylim=[0.0, 1.0],
    metric_name=["Training Accuracy", "Validation Accuracy"],
    color=["g", "b"],
)

The results from our baseline model reveal that the model is overfitting. Notice that the validation loss increases after about ten epochs of training while the training loss continues to decline. This means that the network learns how to model the training data well but does not generalize to unseen test data well. The accuracy plot shows a similar trend where the validation accuracy levels off after about ten epochs while the training accuracy continues to approach 100% as training progresses. This is a common problem when training neural networks and can occur for a number of reasons. One reason is that the model can fit the nuances of the training dataset, especially when the training dataset is small.
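The turning point described above can also be located programmatically from the per-epoch validation losses. The following is a minimal sketch; the helper names are hypothetical and the loss values are invented for illustration (they are not the values from the training run above, which are only partially shown).

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


def epochs_since_best(val_losses):
    """Number of epochs trained after the validation loss bottomed out."""
    return len(val_losses) - best_epoch(val_losses)


# Invented curves showing the typical overfitting pattern
train_loss = [1.99, 1.59, 1.37, 1.10, 0.85, 0.60, 0.40, 0.25, 0.12, 0.05]
val_loss = [1.73, 1.42, 1.29, 1.20, 1.15, 1.18, 1.30, 1.55, 1.85, 2.17]

print(best_epoch(val_loss))         # 5
print(epochs_since_best(val_loss))  # 5
```

In practice the same lists would come from history.history["loss"] and history.history["val_loss"]. A growing gap between the two after the best epoch is the signature of overfitting.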


TensorFlow CNN in Production with Run:AI

Run:AI automates resource management and workload orchestration for deep learning infrastructure. With Run:AI, you can automatically run as many CNN experiments as needed in TensorFlow and other deep learning frameworks.

Here are some of the capabilities you gain when using Run:AI:

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time

Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.

Learn more about the Run:AI GPU virtualization platform.

This tutorial demonstrates training a simple Convolutional Neural Network (CNN) to classify CIFAR images. Because this tutorial uses the Keras Sequential API, creating and training your model will take just a few lines of code.

Import TensorFlow


import tensorflow as tf

from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt



Image classification using Convolutional Neural Networks (CNN) has revolutionized computer vision tasks by enabling automated and accurate recognition of objects within images. CNN-based image classification algorithms have gained immense popularity due to their ability to learn and extract intricate features from raw image data automatically. This article will explore the principles, techniques, and applications of image classification using CNNs. We will delve into the architecture, training process, and CNN image classification evaluation metrics. By understanding the workings of CNNs for image classification, we can unlock many possibilities for object recognition, scene understanding, and visual data analysis.

This article was published as a part of the Data Science Blogathon.

Image classification using CNN involves the extraction of features from the image to observe some patterns in the dataset. Using an ANN for the purpose of image classification would end up being very costly in terms of computation since the trainable parameters become extremely large.

For example, if we have a 50 x 50 image of a cat and want to train a traditional ANN with a 100-unit hidden layer to classify it as a dog or a cat, the number of trainable parameters becomes: (50 * 50) * 100 weights between the image pixels and the hidden layer + 100 hidden-layer biases + 2 * 100 weights to the output neurons + 2 output biases = 250,302.
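The arithmetic can be verified in a few lines. This sketch assumes one value per pixel (a grayscale image), as the 50 * 50 term in the example implies; the helper name is made up for illustration.

```python
def dense_layer_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


hidden = dense_layer_params(50 * 50, 100)  # pixels -> 100 hidden units
output = dense_layer_params(100, 2)        # hidden units -> 2 classes
total = hidden + output

print(hidden, output, total)  # 250100 202 250302
```

Doubling the image side to 100 x 100 would quadruple the first term, which is why fully connected networks scale poorly to larger images.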

We use filters when using CNNs. Filters come in many different types, depending on their purpose.

Filters help us exploit the spatial locality of a particular image by enforcing a local connectivity pattern between neurons.

Convolution combines two functions to produce a third function. Here one function is our image pixel matrix and the other is our filter. We slide the filter over the image and, at each position, compute the dot product of the filter and the image patch beneath it. The resulting matrix is called an "Activation Map" or "Feature Map".
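The sliding dot product can be written out directly. Below is a small pure-Python sketch with no padding and a stride of 1; the function name, the toy image, and the edge-detecting filter are all assumptions chosen for illustration. Like deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product of the kernel with the patch whose top-left corner is (i, j)
            total = sum(image[i + di][j + dj] * kernel[di][dj]
                        for di in range(kh) for dj in range(kw))
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4 x 4 image: dark on the left, bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Filter that responds to a dark-to-bright transition from left to right
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d_valid(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the middle column, exactly where the filter straddles the edge in the image.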

Image classification involves assigning labels or classes to input images. It is a supervised learning task where a model is trained on labeled image data to predict the class of unseen images. CNNs are commonly used for image classification because they can automatically learn hierarchical spatial features such as edges, textures, and shapes, enabling accurate object recognition in images. Here are the different layers involved in the process:

The input layer of a CNN takes in the raw image data as input. The images are typically represented as matrices of pixel values. The dimensions of the input layer correspond to the size of the input images (e.g., height, width, and color channels).

Convolutional layers are responsible for feature extraction. They consist of filters (also known as kernels) that are convolved with the input images to capture relevant patterns and features. These layers learn to detect edges, textures, shapes, and other important visual elements.

Pooling layers reduce the spatial dimensions of the feature maps produced by the convolutional layers. They perform downsampling operations (e.g., max pooling) to retain the most salient information while discarding unnecessary details. This helps in achieving translation invariance and reducing computational complexity.
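Max pooling is simple enough to sketch by hand. The example below assumes non-overlapping 2 x 2 windows (stride equal to the pool size); the function name and the sample values are illustrative only.

```python
def max_pool2d(feature_map, pool=2):
    """Keep the largest value in each non-overlapping pool x pool window."""
    out_h = len(feature_map) // pool
    out_w = len(feature_map[0]) // pool
    return [
        [max(feature_map[i * pool + di][j * pool + dj]
             for di in range(pool) for dj in range(pool))
         for j in range(out_w)]
        for i in range(out_h)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4 x 4 map shrinks to 2 x 2 while the strongest response in each region is kept, so a small shift of a feature within a window leaves the pooled output unchanged.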

The output of the last pooling layer is flattened and connected to one or more fully connected layers. These layers function as traditional neural network layers and classify the extracted features. The fully connected layers learn complex relationships between features and output class probabilities or predictions.

The output layer represents the final layer of the CNN. It consists of neurons equal to the number of distinct classes in the classification task. The output layer provides each class’s classification probabilities or predictions, indicating the likelihood of the input image belonging to a particular class.

I will be working on Google Colab and I have connected the dataset through Google Drive, so the code provided by me should work if the same setup is being used. Remember to make appropriate changes according to your setup.

Choose a dataset of your interest or you can also create your own image dataset for solving your own image classification problem. An easy place to choose a dataset is on kaggle.com.

The dataset I’m going with can be found here.

This dataset contains 12,500 augmented images of blood cells (JPEG) with accompanying cell type labels (CSV). There are approximately 3,000 images for each of 4 different cell types grouped into 4 different folders (according to cell type). The cell types are Eosinophil, Lymphocyte, Monocyte, and Neutrophil.

Here are all the libraries that we would require and the code for importing them:


# Only the libraries used in the steps below are imported; Keras is accessed through tf.keras.
import os
import random

import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

Preparing our dataset for training involves assigning paths, creating the categories (labels), and resizing our images.

Resizing images into 200 X 200


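path_test = "/content/drive/My Drive/semester 5 - ai ml/datasetHomeAssign/TRAIN"
CATEGORIES = ["EOSINOPHIL", "LYMPHOCYTE", "MONOCYTE", "NEUTROPHIL"]
IMG_SIZE = 200

# Assumption: img_array is one sample image, loaded here so its shape can be inspected.
sample_dir = os.path.join(path_test, CATEGORIES[0])
img_array = cv2.imread(os.path.join(sample_dir, os.listdir(sample_dir)[0]))
print(img_array.shape)

new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))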

The training list will contain, for each image, its pixel values and the index of its category in the CATEGORIES list.


training = []

def createTrainingData():
    # Read every image in each category folder, resize it, and store it with its label index.
    for category in CATEGORIES:
        path = os.path.join(path_test, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            img_array = cv2.imread(os.path.join(path, img))
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            training.append([new_array, class_num])

createTrainingData()


random.shuffle(training)

Next, we split the shuffled data into a feature list X and a label list y, and reshape X into the (samples, height, width, channels) array that the neural network expects.


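X = []
y = []
for features, label in training:
    X.append(features)
    y.append(label)

# Stack into arrays: X has shape (num_images, IMG_SIZE, IMG_SIZE, 3), y holds the integer labels.
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
y = np.array(y)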


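# Scale pixel values to the [0, 1] range.
X = X.astype('float32')
X /= 255

# One-hot encode the labels for inspection; the model below trains on the integer labels in y.
Y = tf.keras.utils.to_categorical(y, 4)
print(Y[100])
print(Y.shape)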


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 4)


batch_size = 16
nb_classes = 4
nb_epochs = 5
img_rows, img_columns = 200, 200
img_channel = 3
nb_filters = 32
nb_pool = 2
nb_conv = 3

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation=tf.nn.relu, input_shape=(200, 200, 3)),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(4, activation=tf.nn.softmax)
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=batch_size, epochs=nb_epochs, verbose=1, validation_data=(X_test, y_test))


score = model.evaluate(X_test, y_test, verbose=0)
print("Test Score: ", score[0])
print("Test accuracy: ", score[1])

In these 9 simple steps, you would be ready to train your own Convolutional Neural Networks model and solve real-world problems using these skills. You can practice these skills on platforms like Analytics Vidhya and Kaggle. You can also play around by changing different parameters and discovering how you would get the best accuracy and score. Try changing the batch_size, the number of epochs or even adding/removing layers in the CNN model, and have fun!


In conclusion, image classification using CNN has revolutionized the field of computer vision, enabling accurate recognition of objects within images. With its ability to automatically learn and extract complex features, CNNs have become a powerful tool for various applications. To further enhance your understanding and skills in image classification using CNN and other advanced data science techniques, consider enrolling in our Blackbelt Program. This comprehensive program offers in-depth knowledge and practical experience, empowering you to become a proficient data scientist. Ready to take the next step? Explore the possibilities of our Blackbelt Program today!

A. To use CNN for image classification, you need to define the architecture of the CNN, preprocess the input images, train the model on labeled data, and assess its performance on test images. Afterward, the trained CNN can classify new images based on the learned features.

A. CNN classifier for image classification is a CNN-based model specifically designed to classify images into different predefined classes. It learns to extract relevant features from input images and map them to the corresponding classes, enabling accurate image classification.

A. CNN in image captioning refers to using Convolutional Neural Networks as a component in the image captioning pipeline. CNNs are employed to extract visual features from input images, combined with text-based models to generate descriptive captions for the images.

A. You can train a CNN-based model on a dataset of noisy and corresponding clean images to denoise an image using CNN. The model learns to map the noisy images to their corresponding denoised versions. Once trained, the CNN can denoise new images by passing them through the network and obtaining the reconstructed clean images.

The media shown in this article are not owned by Analytics Vidhya and is used at the Author’s discretion.

Hello. Thank you so much for the step to step guide to implement this. I have a question to ask. I hope to hear from you soon. I can not understand how was ‘img_array’ initialized in your work? What does img_array contain and how did you do that?

“print(img_array.shape)” when is try to run this line in colab I get error like “print(img_array.shape)” please help me to rectify this

Hello! I was just wondering, in Step 2: Prepare Dataset for Training, where does the img_array variable come from? Is this just the path_test?

please kindly correct your codes!

Review on code structure would make this post better, imports and declared variables are everywhere, reduces clarity.

Convolutional Neural Network – Tự Học TensorFlow (Self-Study TensorFlow) · Admin · 23:49, March 19, 2022

Conclusion

This article has given a complete overview of CNNs in TensorFlow, with details about each layer of the CNN architecture. It also gave a brief introduction to TensorFlow and how it helps machine learning engineers and researchers build sophisticated neural networks.

We applied all these skill sets to a real-world scenario related to a multiclass classification task.

Our beginner’s guide to object detection could be a great next step to further your learning about computer vision. It explores the key components of object detection and explains how to implement it with the SSD and Faster R-CNN models available in TensorFlow.


Convolutional Neural Networks (CNN) have been used in state-of-the-art computer vision tasks such as face detection and self-driving cars. In this article, let’s take a look at the concepts required to understand CNNs in TensorFlow. Later you will also dive into some TensorFlow CNN examples.

Simple explanation of convolutional neural network | Deep Learning Tutorial 23 (Tensorflow & Python)

There are 4 modules in this course

In the first course in this specialization, you had an introduction to TensorFlow and how, with its high-level APIs, you could do basic image classification, and you learned a little bit about Convolutional Neural Networks (ConvNets). In this course you’ll go deeper into using ConvNets with real-world data, and learn about techniques that you can use to improve your ConvNet performance, particularly when doing image classification! In Week 1, this week, you’ll get started by looking at a much larger dataset than you’ve been using thus far: the Cats and Dogs dataset, which had been a Kaggle Challenge in image classification!

What’s included

8 videos, 8 readings, 1 quiz, 1 programming assignment

You’ve heard the term overfitting a number of times to this point. Overfitting is simply the concept of being over specialized in training — namely that your model is very good at classifying what it is trained for, but not so good at classifying things that it hasn’t seen. In order to generalize your model more effectively, you will of course need a greater breadth of samples to train it on. That’s not always possible, but a nice potential shortcut to this is Image Augmentation, where you tweak the training set to potentially increase the diversity of subjects it covers. You’ll learn all about that this week!

What’s included

7 videos, 7 readings, 1 quiz, 1 programming assignment
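To give a feel for what image augmentation does to a training sample, here is a small framework-free sketch. It is not the tooling used in the course; the function name augment, the 50% flip probability, and the shift range are assumptions chosen for illustration. Note that np.roll wraps pixels around the border, which a real augmentation pipeline would usually replace with padding or cropping.

```python
import numpy as np

def augment(image, rng, max_shift=2):
    """Return a randomly flipped and horizontally shifted copy of an (H, W, C) image."""
    if rng.random() < 0.5:
        # Mirror the image left to right.
        image = image[:, ::-1, :]
    # Shift by a few pixels along the width axis (wraps around the border).
    dx = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(image, dx, axis=1)

rng = np.random.default_rng(0)
image = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)

augmented = augment(image, rng)
print(augmented.shape)
```

Each call produces a slightly different view of the same labeled image, which is how augmentation increases the diversity of the training set without collecting new data.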

Building models for yourself is great, and can be very powerful. But, as you’ve seen, you can be limited by the data you have on hand. Not everybody has access to massive datasets or the compute power that’s needed to train them effectively. Transfer learning can help solve this — where people with models trained on large datasets train them, so that you can either use them directly, or, you can use the features that they have learned and apply them to your scenario. This is Transfer learning, and you’ll look into that this week!

What’s included

7 videos, 5 readings, 1 quiz, 1 programming assignment

You’ve come a long way, Congratulations! One more thing to do before we move off of ConvNets to the next module, and that’s to go beyond binary classification. Each of the examples you’ve done so far involved classifying one thing or another — horse or human, cat or dog. When moving beyond binary into Categorical classification there are some coding considerations you need to take into account. You’ll look at them this week!

What’s included

6 videos, 8 readings, 1 quiz, 1 programming assignment


CNN Step-by-Step Implementation

Let’s put everything we have learned previously into practice. This section will illustrate the end-to-end implementation of a convolutional neural network in TensorFlow applied to the CIFAR-10 dataset, which is a built-in dataset with the following properties:

  • It contains 60,000 32 by 32 color images
  • The dataset has 10 different classes
  • Each class has 6,000 images
  • There are overall 50,000 training images
  • And overall 10,000 testing images

The source code of the article is available on DataCamp’s workspace

Architecture of the network

Before getting into the technical implementation, let’s first understand the overall architecture of the network being implemented.

  • The input of the model is a 32x32x3 tensor, respectively, for the width, height, and channels.
  • We will have two convolutional layers. The first layer applies 32 filters of size 3×3 each and a ReLU activation function. And the second one applies 64 filters of size 3×3
  • The first pooling layer will apply a 2×2 max pooling
  • The second pooling layer will apply a 2×2 max pooling as well
  • The fully connected layer will have 128 units and a ReLU activation function
  • Finally, the output will be 10 units corresponding to the 10 classes, and the activation function is a softmax to generate the probability distributions.
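The architecture listed above can be sanity-checked with a few lines of arithmetic. The sketch below traces the spatial size through the network and counts trainable parameters, assuming stride-1 convolutions with no padding (the Keras default when no padding argument is given) and 2×2 pooling with stride 2. The helper names are made up for this example.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

size = 32                                   # 32x32x3 input
size = conv_output_size(size)               # conv 1 -> 30x30x32
p_conv1 = conv_params(3, 3, 32)
size //= 2                                  # pool 1 -> 15x15x32
size = conv_output_size(size)               # conv 2 -> 13x13x64
p_conv2 = conv_params(3, 32, 64)
size //= 2                                  # pool 2 -> 6x6x64
flat = size * size * 64                     # flattened feature vector
p_dense = dense_params(flat, 128)
p_out = dense_params(128, 10)

total_params = p_conv1 + p_conv2 + p_dense + p_out
print(flat, total_params)
```

Under these assumptions the flattened vector has 2,304 values and the network has 315,722 trainable parameters, most of them in the first fully connected layer.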

Load dataset

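The built-in dataset is loaded from the keras.datasets module as follows:


# Assumption: cf10 is an alias for the CIFAR-10 loader; the original import is not shown.
from tensorflow.keras.datasets import cifar10 as cf10

(train_images, train_labels), (test_images, test_labels) = cf10.load_data()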

Exploratory Data Analysis

In this section, we will focus solely on showing some sample images since we already know the proportion of each class in both the training and testing data.

The helper function show_images() shows a total of 12 images by default and takes three main parameters:

  • The training images
  • The class names
  • And the training labels.


import matplotlib.pyplot as plt

def show_images(train_images, class_names, train_labels, nb_samples=12, nb_row=4):
    plt.figure(figsize=(12, 12))
    for i in range(nb_samples):
        plt.subplot(nb_row, nb_row, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_images[i], cmap=plt.cm.binary)
        plt.xlabel(class_names[train_labels[i][0]])
    plt.show()

Now, we can call the function with the required parameters.


class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

show_images(train_images, class_names, train_labels)

A successful execution of the previous code generates the images below.

Data preprocessing

Prior to training the model, we need to normalize the pixel values of the data in the same range (e.g. 0 to 1). This is a common preprocessing step when dealing with images to ensure scale invariance, and faster convergence during the training.


max_pixel_value = 255 train_images = train_images / max_pixel_value test_images = test_images / max_pixel_value

Also, the labels are stored as integer class indices (0 to 9), each standing for a category such as cat, horse, or bird. We convert them to one-hot vectors so that they match the 10-unit softmax output and the categorical cross-entropy loss used later.


from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_labels, len(class_names))
test_labels = to_categorical(test_labels, len(class_names))
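For intuition about what this conversion produces, the following NumPy sketch builds the same kind of one-hot matrix by hand. It is an illustrative stand-in, not the Keras implementation; the helper name one_hot and the three sample labels are assumptions.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class indices into rows with a single 1 at the class position."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# CIFAR-10 style labels: one integer per image, stored as a column.
labels = np.array([[3], [7], [2]])

encoded = one_hot(labels, num_classes=10)
print(encoded)
```

Each row has exactly one non-zero entry, and taking the argmax of a row recovers the original integer label.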

Model architecture implementation

The next step is to implement the architecture of the network based on the previous description.

First, we define the model using the Sequential() class, and each layer is added to the model with the add() function.


from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Variables
INPUT_SHAPE = (32, 32, 3)
FILTER1_SIZE = 32
FILTER2_SIZE = 64
FILTER_SHAPE = (3, 3)
POOL_SHAPE = (2, 2)
FULLY_CONNECT_NUM = 128
NUM_CLASSES = len(class_names)

# Model architecture implementation
model = Sequential()
model.add(Conv2D(FILTER1_SIZE, FILTER_SHAPE, activation='relu', input_shape=INPUT_SHAPE))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Conv2D(FILTER2_SIZE, FILTER_SHAPE, activation='relu'))
model.add(MaxPooling2D(POOL_SHAPE))
model.add(Flatten())
model.add(Dense(FULLY_CONNECT_NUM, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))

After applying the summary() function to the model, we get a comprehensive summary of the model’s architecture with information about each layer, its type, its output shape, and the total number of trainable parameters.

Model training

Everything is now in place to configure and launch the training of the model. This is done with the compile() and fit() functions, respectively, which take the following parameters:

  • The optimizer is responsible for updating the model’s weights and biases. In our case, we are using the Adam optimizer.
  • The loss function measures the misclassification error; we are using categorical cross-entropy.
  • Finally, the metrics measure the performance of the model; accuracy, precision, and recall are displayed in our use case.


from tensorflow.keras.metrics import Precision, Recall

BATCH_SIZE = 32
EPOCHS = 30
METRICS = ['accuracy', Precision(name='precision'), Recall(name='recall')]

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=METRICS)

# Train the model
training_history = model.fit(train_images, train_labels,
                             epochs=EPOCHS, batch_size=BATCH_SIZE,
                             validation_data=(test_images, test_labels))

Model evaluation

After the model training, we can compare its performance on both the training and testing datasets by plotting the above metrics using the show_performance_curve() helper function in two dimensions.

  • The horizontal axis (x) is the number of epochs
  • The vertical one (y) is the underlying performance of the model.
  • The curve represents the value of the metrics at a specific epoch.

For better visualization, a vertical red line is drawn through the intersection of the training and validation performance values along with the optimal value.


import numpy as np

def show_performance_curve(training_result, metric, metric_label):
    train_perf = training_result.history[str(metric)]
    validation_perf = training_result.history['val_' + str(metric)]

    # First epoch at which the training and validation values are within 0.01 of each other.
    intersection_idx = np.argwhere(np.isclose(train_perf, validation_perf, atol=1e-2)).flatten()[0]
    intersection_value = train_perf[intersection_idx]

    plt.plot(train_perf, label=metric_label)
    plt.plot(validation_perf, label='val_' + str(metric))
    plt.axvline(x=intersection_idx, color='r', linestyle='--', label='Intersection')

    plt.annotate(f'Optimal Value: {intersection_value:.4f}',
                 xy=(intersection_idx, intersection_value),
                 xycoords='data', fontsize=10, color='green')

    plt.xlabel('Epoch')
    plt.ylabel(metric_label)
    plt.legend(loc='lower right')
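The intersection lookup in the helper takes the first element of the matching indices, which raises an IndexError if the two curves never come within the tolerance. A possible guard is sketched below; the function name first_close_epoch and the sample curves are assumptions for illustration, not part of the original code.

```python
import numpy as np

def first_close_epoch(train_values, val_values, atol=1e-2):
    """Return the first epoch index where both curves are within `atol`, or None."""
    train_values = np.asarray(train_values, dtype=float)
    val_values = np.asarray(val_values, dtype=float)
    close = np.flatnonzero(np.isclose(train_values, val_values, atol=atol))
    return int(close[0]) if close.size else None

# Made-up accuracy curves over five epochs.
train_acc = [0.45, 0.58, 0.66, 0.72, 0.78]
val_acc = [0.52, 0.61, 0.665, 0.68, 0.69]

crossing = first_close_epoch(train_acc, val_acc)
no_crossing = first_close_epoch([0.1, 0.2], [0.5, 0.6])
print(crossing, no_crossing)
```

With these sample curves the first close point is at index 2, and the second call returns None instead of failing, so the plotting code can skip the vertical line when there is nothing to mark.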

Then, the function is applied for both the accuracy and the precision of the model.


show_performance_curve(training_history, 'accuracy', 'accuracy')


show_performance_curve(training_history, 'precision', 'precision')

After training the model without any fine-tuning and pre-processing, we end up with:

  • An accuracy score of 67.09%, meaning that the model correctly classifies 67% of the samples out of every 100 samples.
  • And, a precision of 76.55%, meaning that out of each 100 positive predictions, almost 77 of them are true positives, and the remaining 23 are false positives.
  • These scores are achieved respectively at the third and second epochs for accuracy and precision.

These two metrics give a global understanding of the model behavior.

What if we want to know which classes the model predicts well and which ones it struggles with?

This can be achieved from the confusion matrix, which shows for each class the number of correct and wrong predictions. The implementation is given below. We start by making predictions on the test data, then compute the confusion matrix and show the final result.


from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

test_predictions = model.predict(test_images)
test_predicted_labels = np.argmax(test_predictions, axis=1)
test_true_labels = np.argmax(test_labels, axis=1)

cm = confusion_matrix(test_true_labels, test_predicted_labels)
cmd = ConfusionMatrixDisplay(confusion_matrix=cm)
cmd.plot(include_values=True, cmap='viridis', ax=None, xticks_rotation='horizontal')
plt.show()

  • Classes 0, 1, 6, 7, 8, 9, respectively, for airplane, automobile, frog, horse, ship, and truck have the highest values at the diagonal. This means that the model is better at predicting those classes.
  • On the other hand, it seems to struggle with the remaining classes:
  • The classes with the highest off-diagonal values are those the model confuses the good classes with. For instance, it confuses birds (class 2) with an airplane, and automobile with trucks (class 9).
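Reading the diagonal by eye works, but the same information can be turned into per-class numbers. The sketch below uses a small made-up 3-class matrix (not the CIFAR-10 results above) laid out in the scikit-learn convention, with true classes as rows and predicted classes as columns.

```python
import numpy as np

# Toy confusion matrix: rows are true classes, columns are predicted classes.
cm = np.array([[50, 2, 3],
               [4, 40, 6],
               [5, 10, 30]])

correct = np.diag(cm)
recall = correct / cm.sum(axis=1)      # share of each true class that was found
precision = correct / cm.sum(axis=0)   # share of each predicted class that was right
accuracy = correct.sum() / cm.sum()

hardest_class = int(np.argmin(recall))
print(recall.round(3), precision.round(3), accuracy, hardest_class)
```

The class with the lowest recall is the one the model misses most often, which is a more precise version of "the remaining classes" in the reading above.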

Learn more about confusion matrix from our tutorial understanding confusion matrix in R, which takes course material from DataCamp’s Machine Learning toolbox course.

This model can be improved with additional tasks such as:

  • Image augmentation
  • Transfer learning using pre-trained models such as ResNet, MobileNet, or VGG. Our Transfer learning tutorial explains what transfer learning is and some of its applications in real life.
  • Applying different regularization techniques such as L1, L2, or dropout.
  • Fine-tuning different hyperparameters such as learning rate, the batch size, number of layers in the network.
TensorFlow 2.0 Complete Course - Python Neural Networks for Beginners Tutorial

What is TensorFlow CNN?

Convolutional Neural Networks (CNN), a key technique in deep learning for computer vision, are little-known to the wider public but are the driving force behind major innovations, from unlocking your phone with face recognition to safe driverless vehicles.

CNNs are used for a variety of tasks in computer vision, primarily image classification and object detection. The open source TensorFlow framework allows you to create highly flexible CNN architectures for computer vision tasks. In this article we explain the basics of CNN on TensorFlow and present a quick hands-on tutorial to get you started.

If you are interested in learning how to work with CNNs in PyTorch, which is another popular deep learning framework, see our guide to Pytorch CNN.

In this article, you will learn:

Saving and Loading Models

Saving and loading models are very convenient. This enables you to develop and train a model, save it to the file system and then load it at some future time for use. This section will cover the basic operations for saving and loading models.

Saving Models

You can easily save a model using the save() method, which saves the model to the file system in the ‘SavedModel’ format. This method creates a folder on the file system. Within this folder, the model architecture and training configuration (including the optimizer, losses, and metrics) are stored in saved_model.pb. The variables/ folder contains a standard training checkpoint file that includes the weights of the model. We will delve into these details in later modules. For now, let’s save the trained model, and then we’ll load it in the next code cell with a different name and continue using it in the remainder of the post.

# Using the save() method, the model will be saved to the file system in the 'SavedModel' format.
model_dropout.save('model_dropout')

INFO:tensorflow:Assets written to: CFIRAR_Classifier/assets

Loading Models

from tensorflow.keras import models

reloaded_model_dropout = models.load_model('model_dropout')

Building an artificial neural network from zero | Writing a Neural Network from scratch

Setup

Begin by installing and importing some necessary libraries, including: remotezip to inspect the contents of a ZIP file, tqdm to use a progress bar, OpenCV to process video files, einops for performing more complex tensor operations, and tensorflow_docs for embedding data in a Jupyter notebook.


pip install remotezip tqdm opencv-python einops


# Install TensorFlow 2.10


pip install tensorflow==2.10.0


import tqdm
import random
import pathlib
import itertools
import collections

import cv2
import einops
import numpy as np
import remotezip as rz
import seaborn as sns
import matplotlib.pyplot as plt

import tensorflow as tf
import keras
from keras import layers

2023-10-27 01:29:52.291653: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library ‘libcudart.so.11.0’; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/cv2/../../lib64: 2023-10-27 01:29:52.327803: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2023-10-27 01:29:52.949253: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library ‘libnvinfer.so.7’; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/cv2/../../lib64: 2023-10-27 01:29:52.949379: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library ‘libnvinfer_plugin.so.7’; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/cv2/../../lib64: 2023-10-27 01:29:52.949390: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Model Evaluation

There are several things we can do to evaluate the trained model further. We can compute the model’s accuracy on the test dataset. We can visually inspect the results on a subset of the images in a dataset and plot the confusion matrix for a dataset. Let’s take a look at all three examples.

Evaluate the Model on the Test Dataset

We can now predict the results for all the test images, as shown in the code below. Here, we call the predict() method to retrieve all the predictions, and then we select a specific index from the test set and print out the predicted scores for each class. You can experiment with the code below by setting the test index to various values and see how the highest score is usually associated with the correct value indicated by the ground truth.

test_loss, test_acc = reloaded_model_dropout.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc*100:.3f}")

313/313 [==============================] – 3s 9ms/step – loss: 0.6736 – accuracy: 0.7833 Test accuracy: 78.330

Make Predictions on Sample Test Images

Here we create a convenience function that will allow us to evaluate the model on a subset of images from a dataset and display the results visually.

def evaluate_model(dataset, model):
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']
    num_rows = 3
    num_cols = 6

    # Retrieve a number of images from the dataset.
    data_batch = dataset[0:num_rows*num_cols]

    # Get predictions from model.
    predictions = model.predict(data_batch)

    plt.figure(figsize=(20, 8))
    num_matches = 0

    for idx in range(num_rows*num_cols):
        ax = plt.subplot(num_rows, num_cols, idx + 1)
        plt.axis("off")
        plt.imshow(data_batch[idx])

        pred_idx = tf.argmax(predictions[idx]).numpy()
        truth_idx = np.nonzero(y_test[idx])

        title = str(class_names[truth_idx[0][0]]) + " : " + str(class_names[pred_idx])
        title_obj = plt.title(title, fontdict={'fontsize': 13})

        if pred_idx == truth_idx:
            num_matches += 1
            plt.setp(title_obj, color='g')
        else:
            plt.setp(title_obj, color='r')

    acc = num_matches/(idx+1)
    print("Prediction accuracy: ", int(100*acc)/100)

    return

evaluate_model(X_test, reloaded_model_dropout)

1/1 [==============================] – 0s 18ms/step Prediction accuracy: 0.77

Confusion Matrix

A confusion matrix is a very common metric that is used to summarize the results of a classification problem. The information is presented in the form of a table or matrix where one axis represents the ground truth labels for each class, and the other axis represents the predicted labels from the network. The entries in the table represent the number of instances from an experiment (which are sometimes represented as percentages rather than counts). Generating a confusion matrix in TensorFlow is accomplished by calling the function tf.math.confusion_matrix(), which takes two required arguments: the list of ground truth labels and the associated predicted labels.

# Generate predictions for the test dataset.
predictions = reloaded_model_dropout.predict(X_test)

# For each sample image in the test dataset, select the class label with the highest probability.
predicted_labels = [np.argmax(i) for i in predictions]

313/313 [==============================] – 2s 6ms/step

# Convert one-hot encoded labels to integers.
y_test_integer_labels = tf.argmax(y_test, axis=1)

# Generate a confusion matrix for the test dataset.
cm = tf.math.confusion_matrix(labels=y_test_integer_labels, predictions=predicted_labels)

# Plot the confusion matrix as a heatmap.
plt.figure(figsize=[14, 7])

import seaborn as sn

sn.heatmap(cm, annot=True, fmt='d', annot_kws={"size": 12})
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Truth')
plt.show()

A confusion matrix is a content-rich representation of a model’s performance at the class level. It can be very informative to better understand where the model performs well and where it may have more difficulty. For example, a few things stand out right away. Two of the ten classes tend to be misclassified more than others: dogs and cats. More specifically, a large percentage of the time, the model confuses these two classes with each other. Let’s take a closer look. The ground truth label for a cat is 3, and the ground truth label for a dog is 5. Notice that when the input image is a cat (index 3), it is most often misclassified as a dog, with 176 misclassified samples. When the input image is a dog (index 5), it is most often misclassified as a cat, with 117 samples.

Also, notice that the last row, which represents trucks, is most often confused with automobiles. So all of these observations make intuitive sense, given the similarity of the classes involved.
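Pairs like these can also be pulled out of a confusion matrix programmatically instead of by inspection. The sketch below is one way to do it with NumPy; the helper name most_confused and the small 3-class matrix are made up for the example and are not the results shown above.

```python
import numpy as np

def most_confused(cm, class_names, top_k=3):
    """List the largest off-diagonal entries as (true class, predicted class, count)."""
    off_diag = np.array(cm, dtype=int)   # work on a copy
    np.fill_diagonal(off_diag, 0)        # ignore correct predictions
    order = np.argsort(off_diag, axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(order, off_diag.shape)
    return [(class_names[r], class_names[c], int(off_diag[r, c]))
            for r, c in zip(rows, cols)]

# Toy matrix: rows are ground truth, columns are predictions.
class_names = ["cat", "dog", "truck"]
cm = np.array([[80, 15, 5],
               [12, 85, 3],
               [2, 1, 97]])

pairs = most_confused(cm, class_names, top_k=2)
print(pairs)
```

For this toy matrix the two largest confusions are cat predicted as dog and dog predicted as cat, mirroring the kind of symmetric confusion discussed above.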

Introduction to machine learning (Machine Learning: Zero to Hero, part 1)

CNN Model Implementation in Keras

In this section, we will define a simple CNN model in Keras and train it on the CIFAR-10 dataset. Recall from a previous post the following steps required to define and train a model in Keras.

  1. Build/Define a network model using predefined layers in Keras.
  2. Compile the model with model.compile()
  3. Train the model with model.fit()

Model Structure

Before we get into the coding details, let’s first take a look at the general structure of the model we’re proposing. Notice that the model has a similar structure to VGG-16 but has fewer layers and a much smaller input image size, and therefore far fewer trainable parameters. The model contains three convolutional blocks followed by a fully connected layer and an output layer. For reference, we’ve included the number of channels at key points in the architecture. We have also indicated the spatial size of the activation maps at the end of each convolutional block. This is a good visual to refer back to when studying the code below.

For convenience, we’re going to define the model in a function. Notice that the function has one optional argument: the input shape for the model. We first start by instantiating the model by calling the Sequential() constructor. This allows us to build a model sequentially by adding one layer at a time. Notice that we define three convolutional blocks and that their structure is very similar.

Define the Convolutional Blocks for the CNN

Let’s start with the very first convolutional layer in the first convolutional block. To define a convolutional layer in Keras, we call the Conv2D() function, which takes several input arguments. First, we define the layer to have 32 filters. The kernel size for each filter is 3 (which is interpreted as 3×3). We use a padding option called same, which pads the input tensor so that the output of the convolution operation has the same spatial size as the input. This is not required, but it’s commonly used. If you don’t explicitly specify this padding option, the default behavior applies no padding, and the spatial size of the output from the convolutional layer will be slightly smaller than the input size. We use a ReLU activation function in all the layers in the network except for the output layer.

For the very first convolutional layer, we need to specify the shape of the input. For all subsequent layers this is not necessary, since the shape of the input is automatically computed from the shape of the output of the previous layer. The first block has two convolutional layers with 32 filters each, followed by a max pooling layer with a window size of (2×2), so the output shape from this first convolutional block is (16×16×32). The second convolutional block is nearly identical to the first, except that it has 64 filters in each convolutional layer instead of 32. Finally, the third convolutional block is an exact copy of the second convolutional block.
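The shape bookkeeping described above can be checked with a few lines of plain Python. This is a minimal sketch rather than Keras code: the helper names `conv_output_size` and `trace_blocks` are made up for illustration, and it assumes stride-1 convolutions and non-overlapping 2×2 pooling, as in the text.

```python
import math


def conv_output_size(size, kernel_size, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        # 'same' pads the input, so only the stride can shrink the output.
        return math.ceil(size / stride)
    # 'valid' applies no padding, so the kernel must fit inside the input.
    return (size - kernel_size) // stride + 1


def trace_blocks(size, filters_per_block, kernel_size=3, pool_size=2):
    """Return the (height, width, channels) shape after each conv block."""
    shapes = []
    for filters in filters_per_block:
        # Two 'same'-padded convolutions leave the spatial size unchanged.
        for _ in range(2):
            size = conv_output_size(size, kernel_size, padding="same")
        # A 2x2 max pooling window with stride 2 halves the spatial size.
        size = size // pool_size
        shapes.append((size, size, filters))
    return shapes


print(conv_output_size(32, 3, padding="same"))   # 32
print(conv_output_size(32, 3, padding="valid"))  # 30
print(trace_blocks(32, [32, 64, 64]))
# [(16, 16, 32), (8, 8, 64), (4, 4, 64)]
```

The last tuple, (4, 4, 64), is the activation map that the classifier later flattens.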

Note

The number of filters in each convolutional layer is something that you will need to experiment with. A larger number of filters allows the model to have a greater learning capacity, but this also needs to be balanced with the amount of data available to train the model. Adding too many filters (or layers) can lead to overfitting, one of the most common issues encountered when training models.

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    def cnn_model(input_shape=(32, 32, 3)):

        model = Sequential()

        #------------------------------------
        # Conv Block 1: 32 Filters, MaxPool.
        #------------------------------------
        model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu',
                         input_shape=input_shape))
        model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))

        #------------------------------------
        # Conv Block 2: 64 Filters, MaxPool.
        #------------------------------------
        model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
        model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))

        #------------------------------------
        # Conv Block 3: 64 Filters, MaxPool.
        #------------------------------------
        model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
        model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))

        #------------------------------------
        # Flatten the convolutional features.
        #------------------------------------
        model.add(Flatten())
        model.add(Dense(512, activation='relu'))
        model.add(Dense(10, activation='softmax'))

        return model

Define the Classifier for the CNN

Before we define the fully connected layers for the classifier, we need to first flatten the two-dimensional activation maps that are produced by the last convolutional block (which have a spatial shape of 4×4 with 64 channels). This is accomplished by adding a `Flatten()` layer, which creates a 1-dimensional vector of length 1024 (4 × 4 × 64). We then add a densely connected layer with 512 neurons and a fully connected output layer with ten neurons because we have ten classes in our dataset. To avoid any confusion, we’ve also provided a detailed diagram of the fully connected layers.

Create the Model

We can now create an instance of the model by calling the function above and use the `summary()` method to display the model summary to the console.

    # Create the model.
    model = cnn_model()
    model.summary()

    Metal device set to: Apple M1 Max
    Model: "sequential"
    _________________________________________________________________
     Layer (type)                    Output Shape          Param #
    =================================================================
     conv2d (Conv2D)                 (None, 32, 32, 32)    896
     conv2d_1 (Conv2D)               (None, 32, 32, 32)    9248
     max_pooling2d (MaxPooling2D)    (None, 16, 16, 32)    0
     conv2d_2 (Conv2D)               (None, 16, 16, 64)    18496
     conv2d_3 (Conv2D)               (None, 16, 16, 64)    36928
     max_pooling2d_1 (MaxPooling2D)  (None, 8, 8, 64)      0
     conv2d_4 (Conv2D)               (None, 8, 8, 64)      36928
     conv2d_5 (Conv2D)               (None, 8, 8, 64)      36928
     max_pooling2d_2 (MaxPooling2D)  (None, 4, 4, 64)      0
     flatten (Flatten)               (None, 1024)          0
     dense (Dense)                   (None, 512)           524800
     dense_1 (Dense)                 (None, 10)            5130
    =================================================================
    Total params: 669,354
    Trainable params: 669,354
    Non-trainable params: 0
    _________________________________________________________________
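The parameter counts in the summary can be reproduced by hand. The sketch below is plain Python, not part of the Keras API; the helper names `conv2d_params` and `dense_params` are illustrative. It assumes every layer has a bias term, which is the Keras default.

```python
def conv2d_params(kernel_size, in_channels, filters):
    # Each filter holds kernel_size x kernel_size x in_channels weights plus one bias.
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(inputs, units):
    # One weight per (input, unit) pair plus one bias per unit.
    return (inputs + 1) * units


# (input channels, filters) for the six convolutional layers of the model.
conv_layers = [(3, 32), (32, 32), (32, 64), (64, 64), (64, 64), (64, 64)]
conv_counts = [conv2d_params(3, c_in, f) for c_in, f in conv_layers]

flat_length = 4 * 4 * 64  # size of the flattened 4x4x64 activation map
dense_counts = [dense_params(flat_length, 512), dense_params(512, 10)]

total = sum(conv_counts) + sum(dense_counts)
print(conv_counts)   # [896, 9248, 18496, 36928, 36928, 36928]
print(dense_counts)  # [524800, 5130]
print(total)         # 669354
```

Most of the parameters sit in the first dense layer, which is why reducing the spatial size before flattening keeps the model small.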

Final remarks

In this article, you have learned CNNs from their intuition to their applications in the real world. You have also seen that you can use existing architectures to hasten your model development process. Specifically, you have covered:

  • what convolutional neural networks are
  • how convolutional neural networks work
  • using pre-trained convolutional neural networks to run image classification
  • building convolutional neural networks from scratch using Keras and TensorFlow
  • how to plot the learning curves of your neural network
  • preventing overfitting using DropOut regularization and batch normalization
  • saving your best model using the model checkpoint callback
  • how to stop the training process of your CNN when it stops improving
  • how you can save and load the model again

…just to mention a few.

And that’s not the end of it: you can explore all the examples used in this article on this Google Colab Notebook. Feel free to play with the parameters of the models to see how they affect performance.

Convolutional Neural Networks Explained (CNN Visualized)

Dataset and Training Configuration Parameters

Before we describe the model implementation and training, we’re going to apply a little more structure to our training process by using the `dataclasses` module in Python to create simple `DatasetConfig` and `TrainingConfig` classes to organize several data and training configuration parameters. This allows us to create data structures for configuration parameters, as shown below. The benefit of doing this is that we have a single place to go to make any desired changes.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DatasetConfig:
        NUM_CLASSES:  int = 10
        IMG_HEIGHT:   int = 32
        IMG_WIDTH:    int = 32
        NUM_CHANNELS: int = 3

    @dataclass(frozen=True)
    class TrainingConfig:
        EPOCHS:        int = 31
        BATCH_SIZE:    int = 256
        LEARNING_RATE: float = 0.001
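As a brief illustration of what `frozen=True` buys you, the sketch below repeats the `TrainingConfig` class so that it runs on its own. The variable names `config` and `small_batch` are illustrative and not part of the original code.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class TrainingConfig:
    EPOCHS: int = 31
    BATCH_SIZE: int = 256
    LEARNING_RATE: float = 0.001


config = TrainingConfig()
print(config.BATCH_SIZE)  # 256

# A frozen dataclass rejects attribute assignment after construction.
try:
    config.BATCH_SIZE = 128
except FrozenInstanceError:
    print("TrainingConfig is read-only")

# To change a value, build a new instance and override the default.
small_batch = TrainingConfig(BATCH_SIZE=64)
print(small_batch.BATCH_SIZE)  # 64
```

Because the defaults are also class attributes, expressions such as `TrainingConfig.BATCH_SIZE` work without creating an instance, which is how the training call later in this section reads them.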

Generate a tf.data.Dataset

The next step is to create a TensorFlow dataset from the images. That can be done using the `image_dataset_from_directory` function. Since it infers the classes from the folder names, your data should be structured as shown below.

When using the function to generate the dataset, you will need to define the following parameters:

  • the path to the data
  • an optional seed for shuffling and transformations
  • the `image_size` is the size the images will be resized to after being loaded from the disk
  • since this is a binary classification problem the `label_mode` is binary
  • `batch_size=32` means that the images will be loaded in batches of 32

In the absence of a validation set, you can also define a `validation_split`. If it is set, the `subset` also needs to be passed. That is to indicate whether the split is a validation or training split. In this case, let’s use the testing set for validation.

    training_set = tf.keras.preprocessing.image_dataset_from_directory(
        train_dir,
        seed=101,
        image_size=(200, 200),
        batch_size=32)

By default, the classes will be represented using integers. You can see the representation by using `class_names` of the generated training set.

    class_names = training_set.class_names

In this case, cats will be represented by 0 and dogs by 1. This is based on the directory structure of the dataset. Since `class_names` isn’t specified, the alphanumerical order of the folder names is used.
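The mapping from folder names to integer labels can be pictured with a small plain-Python sketch. The function name `infer_class_indices` and the folder names `cats` and `dogs` are assumptions made for illustration; this imitates the alphanumerical ordering described above and does not call TensorFlow.

```python
def infer_class_indices(subdirectory_names):
    """Assign integer labels to class folders in alphanumerical order."""
    class_names = sorted(subdirectory_names)
    return {name: index for index, name in enumerate(class_names)}


# The order in which the folders are listed does not matter.
print(infer_class_indices(["dogs", "cats"]))  # {'cats': 0, 'dogs': 1}
```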

Generate the validation split as well. The arguments are similar to those of the training set:

  • the directory containing the images
  • an optional seed
  • how to resize the images
  • the size of the batches

    validation_set = tf.keras.preprocessing.image_dataset_from_directory(
        test_dir,
        seed=101,
        image_size=(200, 200),
        batch_size=32)

Lecture 5 | Convolutional Neural Networks

Convolutional Neural Networks (CNN) in TensorFlow

Now that you understand how convolutional neural networks work, you can start building them using TensorFlow. However, you will first have to install TensorFlow. If you are working on a Google Colab environment, TensorFlow will already be installed.

How to install TensorFlow

TensorFlow can be installed via pip. Run the following command to install it.

    pip install tensorflow

Alternatively, you can run TensorFlow in a container.

    docker pull tensorflow/tensorflow:latest  # Download latest stable image
    docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter  # Start Jupyter server

How to confirm TensorFlow is installed

After installation is complete via pip, you might want to check TensorFlow’s version or confirm its installation. If you manage to import TensorFlow without any errors, then it was installed successfully.

    import tensorflow
    print(tensorflow.__version__)

What are Keras and tf.keras?

As of TensorFlow 2.0, Keras has become the official high-level API for TensorFlow. It is an open-source package that has been integrated into TensorFlow in order to quicken the process of building deep learning models. It is accessible via `tf.keras`. That is what you will be using in this article.

Develop multilayer CNN models

Let’s now take a look at how you can build a convolutional neural network with Keras and TensorFlow. The CIFAR-10 dataset will be used. The dataset contains 60000 32×32 color images in 10 classes, with 6000 images per class.

Loading the dataset can be done directly by using Keras utilities. Other datasets that ship with TensorFlow can be loaded in a similar manner.

    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()

The dataset contains the following classes:

    'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'

You can use Matplotlib to visualize one of the images. Let’s visualize the image at index 785.

    import matplotlib.pyplot as plt

    image = X_train[785]
    plt.imshow(image)
    plt.show()

That looks like a cat. You can confirm that from the `y_train`. 3 is the label for a cat.

Data preprocessing

The weights of a neural network are initialized to very small numbers. Therefore, scaling the images to be within the same range is important. In this case, let’s scale the values to be numbers between 0 and 1.

    X_train = X_train / 255
    X_test = X_test / 255

Build the convolutional neural network

The next step is to define the convolutional neural network. Here is where the convolution, pooling, and flattening layers will be applied. The first layer is the `Conv2D` layer. It’s defined with the following parameters:

  • 32 output filters
  • a 3 by 3 feature detector
  • `same` padding, which pads the input evenly so that the output keeps the input’s spatial size
  • input shape of `(32, 32, 3)` because the images are of size 32 by 32; the 3 tells the network that the images have three color channels (RGB)
  • the `relu` activation function so as to achieve non-linearity

The next layer is a max-pooling layer defined with the following parameters:

  • a `pool_size` of (2, 2) that defines the size of the pooling window
  • a stride of 2, which defines the number of steps the pooling window moves each time

Remember that you can design your network as you like. You just have to monitor the metrics and tweak the design and settle on the one that results in the best performance. In this case, another convolution and pooling layer is created. That is followed by the flatten layer whose results are passed to the dense layer. The final layer has 10 units because the dataset has 10 classes. Since it’s a multiclass problem, the Softmax activation function is applied.

    model = tf.keras.Sequential(
        [
            tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation="relu", input_shape=(32, 32, 3)),
            tf.keras.layers.MaxPooling2D((2, 2), strides=2),
            tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation="relu"),
            tf.keras.layers.MaxPooling2D((2, 2), strides=2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(100, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax")
        ]
    )

How to visualize a deep learning model

The quickest way to visualize your model is to use the model summary function.

    model.summary()

You can also use the Keras `plot_model` utility to plot the model.

    tf.keras.utils.plot_model(
        model,
        to_file="model.png",
        show_shapes=True,
        show_layer_names=True,
        rankdir="TB",
        expand_nested=True,
        dpi=96,
    )

How to reduce overfitting with Dropout

One of the common ways to improve the performance of deep learning models is to introduce dropout regularization. In this process, a specified fraction of a layer’s units is randomly dropped (set to zero) at each step of the training process. This forces the network to learn patterns from the data instead of memorizing the data, which reduces overfitting. In Keras, this can be achieved by introducing a Dropout layer in the network. Here is what the network looks like after adding the Dropout layer.

    model = tf.keras.Sequential(
        [
            tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation="relu", input_shape=(32, 32, 3)),
            tf.keras.layers.MaxPooling2D((2, 2), strides=2),
            tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation="relu"),
            tf.keras.layers.MaxPooling2D((2, 2), strides=2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(100, activation="relu"),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(10, activation="softmax")
        ]
    )
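To make the effect of a dropout layer concrete, here is a minimal plain-Python sketch of "inverted" dropout, the variant in which the surviving values are rescaled by 1 / (1 - rate) so that their expected sum is unchanged. The function name `dropout` and the input values are illustrative; this is a simplified model of the idea, not the Keras implementation.

```python
import random


def dropout(activations, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10
dropped = dropout(activations, rate=0.2, rng=rng)
print(dropped)  # a mix of 0.0 (dropped) and 1.25 (kept and rescaled)
```

At prediction time no values are dropped, which in this sketch corresponds to calling the function with `rate=0.0`.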

Compiling the model

The next step is to compile the model. The Sparse Categorical Cross-Entropy loss is used because the labels are not one-hot encoded. In the event that you want to encode the labels, then you will have to use the Categorical Cross-Entropy loss function.

    model.compile(optimizer='adam',
                  # The last layer already applies softmax, so the loss receives
                  # probabilities rather than logits: from_logits must be False.
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                  metrics=['accuracy'])
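The difference between the two losses is only in how the label is supplied. The sketch below, in plain Python with illustrative function names and made-up probabilities, computes both for a single example and shows that they agree when the one-hot vector encodes the same class as the integer label.

```python
import math


def categorical_crossentropy(one_hot_label, probabilities):
    """Cross-entropy when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probabilities))


def sparse_categorical_crossentropy(class_index, probabilities):
    """Cross-entropy when the label is an integer class index."""
    return -math.log(probabilities[class_index])


probabilities = [0.1, 0.7, 0.2]  # softmax output for one image (illustrative)
sparse_loss = sparse_categorical_crossentropy(1, probabilities)
dense_loss = categorical_crossentropy([0, 1, 0], probabilities)
print(math.isclose(sparse_loss, dense_loss))  # True
```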

How to halt training at the right time with Early Stopping

Left to train for more epochs than needed, your model will most likely overfit on the training set. One of the ways to avoid that is to stop the training process when the model stops improving. This is done by monitoring the loss or the accuracy. In order to achieve that, the Keras EarlyStopping callback is used. By default, the callback monitors the validation loss. Patience is the number of epochs to wait before stopping the training process if there is no improvement in the model loss. This callback will be used at the training stage. The callbacks should be passed as a list, even if it’s just one callback.

    from tensorflow.keras.callbacks import EarlyStopping

    callbacks = [
        EarlyStopping(patience=2)
    ]
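The patience rule can be sketched in plain Python. The function `early_stopping_epoch` and the loss values are made up for illustration; the logic is a simplified model of the behavior described above (it ignores options such as a minimum improvement threshold) and not the Keras source.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best value: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# The loss improves for two epochs, then fails to improve twice in a row.
print(early_stopping_epoch([1.0, 0.9, 0.95, 0.97, 0.8], patience=2))  # 4
```

Note that the fifth value (0.8) would have been a new best, but training had already stopped; a larger patience trades extra epochs for the chance to catch such late improvements.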

How to save the best model automatically

You might also be interested in automatically saving the best model or model weights during training. That can be done using a Keras ModelCheckpoint callback. The callback checks the monitored metric after each epoch and, with `save_best_only=True`, keeps only the best model seen so far. You can instruct it to save the entire model or just the model weights. By default, the monitored metric is the validation loss; the example below monitors the training loss instead.

    checkpoint_filepath = '/tmp/checkpoint'
    model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_filepath,
        save_weights_only=False,
        monitor='loss',  # training loss; use 'val_loss' to track the validation loss
        mode='min',
        save_best_only=True)

Your callbacks will now look like this.

    callbacks = [
        EarlyStopping(patience=2),
        model_checkpoint_callback,
    ]

After training, you can load the model again using the Keras `load_model` utility.

    another_saved_model = tf.keras.models.load_model(checkpoint_filepath)

Training the model

Let’s now fit the model to the training data. The validation set is passed as well because the EarlyStopping callback monitors the validation loss. In this case, you can define many epochs, but the training process will be stopped by the callback once the validation loss has failed to improve for 2 consecutive epochs, as declared in the EarlyStopping callback.

    history = model.fit(X_train, y_train,
                        epochs=600,
                        validation_data=(X_test, y_test),
                        callbacks=callbacks)

How to plot model learning curves

Learning curves are important because they can tell you whether the model is learning or overfitting. If the validation loss increases significantly or the validation accuracy drops sharply while the training metrics keep improving, then your model is most likely overfitting. Since the return value of `fit` was stored in the `history` variable, you can use it to access the losses and accuracies and plot them. You can also store them in a Pandas DataFrame.

    import pandas as pd

    metrics_df = pd.DataFrame(history.history)

Let’s now look at how you would plot the training and validation loss.

    metrics_df[["loss", "val_loss"]].plot();

The same can be done for the training and validation accuracy.

    metrics_df[["accuracy", "val_accuracy"]].plot();

How to save and load your model

You might be interested in saving the model for later use. Saving the model is important so that you don’t have to train the model again. This is especially critical for image models that take a long period to train. The H5 format is a common format for saving Keras models.

    model.save("model.h5")

The Keras `load_model` is used for loading the model again.

    load_saved_model = tf.keras.models.load_model("model.h5")
    load_saved_model.summary()

How to accelerate training with Batch Normalization

The network you trained here was relatively small. However, in other cases, you might have to train a very deep neural network, and training such a network can be very slow. The training process can be sped up using Batch Normalization. It transforms the data so that the mean output is close to zero and the output standard deviation is close to 1. The mean and variance are computed using the current batch of inputs. Since Batch Normalization offers some form of regularization, it is usually not used together with Dropout. Here’s what the model looks like after adding the batch normalization layer.

    model = tf.keras.Sequential(
        [
            tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation="relu", input_shape=(32, 32, 3)),
            tf.keras.layers.MaxPooling2D((2, 2), strides=2),
            tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation="relu"),
            tf.keras.layers.MaxPooling2D((2, 2), strides=2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(100, activation="relu"),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dense(10, activation="softmax")
        ]
    )

The batch normalization layer behaves differently during training than during prediction and evaluation. During training the layer runs with `training=True`, while during prediction and evaluation it runs with `training=False`. In training mode, normalization is done using the mean and standard deviation of the current batch of inputs. At inference (i.e., prediction and evaluation), normalization is done using a moving average of the mean and the standard deviation of the batches seen during training. When fine-tuning a pre-trained model that contains this layer, the batch normalization layers should be kept in inference mode (in Keras, setting `trainable = False` on the layer does this). Otherwise the updates to the moving mean and standard deviation can disrupt what the model has already learned.
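The two modes can be illustrated with a toy, single-feature version of the layer in plain Python. The class name `BatchNorm1D` is made up, and the learned scale and shift parameters are left out for brevity; the momentum of 0.99 and epsilon of 0.001 are chosen to match the Keras defaults. Treat it as a sketch of the idea rather than the real implementation.

```python
import math


class BatchNorm1D:
    """Toy batch normalization for one feature (no learned scale or shift)."""

    def __init__(self, momentum=0.99, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, batch, training):
        if training:
            # Training mode: normalize with the statistics of the current batch.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # Update the running statistics that inference will use later.
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # Inference mode: normalize with the running statistics.
            mean, var = self.moving_mean, self.moving_var
        return [(x - mean) / math.sqrt(var + self.epsilon) for x in batch]


layer = BatchNorm1D()
batch = [2.0, 4.0, 6.0, 8.0]
train_out = layer(batch, training=True)
infer_out = layer(batch, training=False)
print(sum(train_out) / len(train_out))  # approximately 0
print(layer.moving_mean)                # close to 0.05 after a single update
```

After only one update the running statistics are still far from the batch statistics, so `train_out` and `infer_out` differ noticeably; they converge as more batches are seen.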

Adding Dropout to the Model

To help mitigate overfitting, we can employ one or more regularization strategies to help the model generalize better. Regularization techniques help to restrict the model’s flexibility so that it doesn’t overfit the training data. One approach is called Dropout, which is built into Keras. Dropout is implemented in Keras as a special layer type that randomly drops a percentage of neurons during the training process. When dropout is used in convolutional layers, it is usually used after the max pooling layer and has the effect of eliminating a percentage of neurons in the feature maps. When used after a fully connected layer, a percentage of neurons in the fully connected layer are dropped.

In the diagram below, we add a `Dropout` layer at the end of each convolutional block and also after the dense layer in the classifier. The input argument to the `Dropout` layer is the fraction of neurons to (randomly) drop from the previous layer during the training process.

Define the Model (with Dropout)

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

    def cnn_model_dropout(input_shape=(32, 32, 3)):

        model = Sequential()

        #------------------------------------
        # Conv Block 1: 32 Filters, MaxPool.
        #------------------------------------
        model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu',
                         input_shape=input_shape))
        model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        #------------------------------------
        # Conv Block 2: 64 Filters, MaxPool.
        #------------------------------------
        model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
        model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        #------------------------------------
        # Conv Block 3: 64 Filters, MaxPool.
        #------------------------------------
        model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
        model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        #------------------------------------
        # Flatten the convolutional features.
        #------------------------------------
        model.add(Flatten())
        model.add(Dense(512, activation='relu'))
        model.add(Dropout(0.5))
        model.add(Dense(10, activation='softmax'))

        return model

Create the Model (with Dropout)

    # Create the model.
    model_dropout = cnn_model_dropout()
    model_dropout.summary()

    Model: "sequential_1"
    _________________________________________________________________
     Layer (type)                    Output Shape          Param #
    =================================================================
     conv2d_6 (Conv2D)               (None, 32, 32, 32)    896
     conv2d_7 (Conv2D)               (None, 32, 32, 32)    9248
     max_pooling2d_3 (MaxPooling2D)  (None, 16, 16, 32)    0
     dropout (Dropout)               (None, 16, 16, 32)    0
     conv2d_8 (Conv2D)               (None, 16, 16, 64)    18496
     conv2d_9 (Conv2D)               (None, 16, 16, 64)    36928
     max_pooling2d_4 (MaxPooling2D)  (None, 8, 8, 64)      0
     dropout_1 (Dropout)             (None, 8, 8, 64)      0
     conv2d_10 (Conv2D)              (None, 8, 8, 64)      36928
     conv2d_11 (Conv2D)              (None, 8, 8, 64)      36928
     max_pooling2d_5 (MaxPooling2D)  (None, 4, 4, 64)      0
     dropout_2 (Dropout)             (None, 4, 4, 64)      0
     flatten_1 (Flatten)             (None, 1024)          0
     dense_2 (Dense)                 (None, 512)           524800
     dropout_3 (Dropout)             (None, 512)           0
     dense_3 (Dense)                 (None, 10)            5130
    =================================================================
    Total params: 669,354
    Trainable params: 669,354
    Non-trainable params: 0
    _________________________________________________________________

Compile the Model (with Dropout)

    model_dropout.compile(optimizer='rmsprop',
                          loss='categorical_crossentropy',
                          metrics=['accuracy'],
                         )

Train the Model (with Dropout)

    history = model_dropout.fit(X_train,
                                y_train,
                                batch_size=TrainingConfig.BATCH_SIZE,
                                epochs=TrainingConfig.EPOCHS,
                                verbose=1,
                                validation_split=.3,
                               )

    Epoch 1/31
    2023-01-16 07:38:29.760435: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
    137/137 [==============================] - 5s 31ms/step - loss: 2.1302 - accuracy: 0.2181 - val_loss: 1.9788 - val_accuracy: 0.2755
    Epoch 2/31
    137/137 [==============================] - 4s 27ms/step - loss: 1.7749 - accuracy: 0.3647 - val_loss: 1.8633 - val_accuracy: 0.3332
    Epoch 3/31
    137/137 [==============================] - 4s 27ms/step - loss: 1.5430 - accuracy: 0.4442 - val_loss: 1.5015 - val_accuracy: 0.4795
    :
    :
    Epoch 29/31
    137/137 [==============================] - 4s 27ms/step - loss: 0.4626 - accuracy: 0.8359 - val_loss: 0.6721 - val_accuracy: 0.7832
    Epoch 30/31
    137/137 [==============================] - 4s 27ms/step - loss: 0.4584 - accuracy: 0.8380 - val_loss: 0.6638 - val_accuracy: 0.7847
    Epoch 31/31
    137/137 [==============================] - 4s 27ms/step - loss: 0.4427 - accuracy: 0.8449 - val_loss: 0.6598 - val_accuracy: 0.7863
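The "137/137" in the log is the number of batches per epoch. A short plain-Python sketch shows where it comes from, assuming `X_train` holds the 50,000 CIFAR-10 training images (an assumption, since the loading code is not shown in this part of the article). The function name `steps_per_epoch` is illustrative.

```python
import math


def steps_per_epoch(num_samples, validation_percent, batch_size):
    """Number of batches per epoch after holding out a validation share."""
    num_validation = num_samples * validation_percent // 100
    num_training = num_samples - num_validation
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(num_training / batch_size)


# validation_split=.3 holds out 30% of the data; the batch size is 256.
print(steps_per_epoch(50_000, 30, 256))  # 137
```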

Plot the Training Results

    # Retrieve training results.
    train_loss = history.history["loss"]
    train_acc  = history.history["accuracy"]
    valid_loss = history.history["val_loss"]
    valid_acc  = history.history["val_accuracy"]

    plot_results([train_loss, valid_loss],
                 ylabel="Loss",
                 ylim=[0.0, 5.0],
                 metric_name=["Training Loss", "Validation Loss"],
                 color=["g", "b"]);

    plot_results([train_acc, valid_acc],
                 ylabel="Accuracy",
                 ylim=[0.0, 1.0],
                 metric_name=["Training Accuracy", "Validation Accuracy"],
                 color=["g", "b"])

In the plots above, the training curves align very closely with the validation curves. Also, notice that we achieve a higher validation accuracy than the baseline model that did not contain dropout. Both sets of training plots are shown below for comparison.

Visualizing Convolutional Neural Networks | Layer by Layer

What is a CNN?

A Convolutional Neural Network (CNN or ConvNet) is a deep learning algorithm specifically designed for any task where object recognition is crucial such as image classification, detection, and segmentation. Many real-life applications, such as self-driving cars, surveillance cameras, and more, use CNNs.

The importance of CNNs

There are several reasons why CNNs are important, as highlighted below:

  • Unlike traditional machine learning models such as SVMs and decision trees, which require manual feature extraction, CNNs can perform automatic feature extraction at scale, making them efficient.
  • The convolution layers, combined with pooling, make CNNs approximately translation invariant, meaning they can recognize a pattern and extract features wherever it appears in the image. Robustness to rotation and scaling is not built in and usually has to be learned, for example through data augmentation.
  • Multiple pre-trained CNN models such as VGG-16, ResNet50, Inceptionv3, and EfficientNet have reached state-of-the-art results and can be fine-tuned on new tasks using a relatively small amount of data.
  • CNNs are not limited to image classification: they can also be used for problems such as natural language processing, time series analysis, and speech recognition.

Architecture of a CNN

CNNs’ architecture tries to mimic the structure of neurons in the human visual system composed of multiple layers, where each one is responsible for detecting a specific feature in the data. As illustrated in the image below, the typical CNN is made of a combination of four main layers:

  • Convolutional layers
  • Rectified Linear Unit (ReLU for short)
  • Pooling layers
  • Fully connected layers

Let’s understand how each of these layers works using the following example of handwritten digit classification.

Convolution layers

This is the first building block of a CNN. As the name suggests, the main mathematical task performed is called convolution, which is the application of a sliding window function to a matrix of pixels representing an image. The sliding function applied to the matrix is called kernel or filter, and both can be used interchangeably.

In the convolution layer, several filters of equal size are applied, and each filter is used to recognize a specific pattern from the image, such as the curving of the digits, the edges, the whole shape of the digits, and more.

Let’s consider this 32×32 grayscale image of a handwritten digit. The values in the matrix are given for illustration purposes.

Also, let’s consider the kernel used for the convolution. It is a matrix with a dimension of 3×3. The weight of each element of the kernel is shown in the grid: zero weights are shown in the black cells and ones in the white cells.

Do we have to manually find these weights?

In real life, the weights of the kernels are determined during the training process of the neural network.

Using these two matrices, we can perform the convolution operation by applying the dot product, which works as follows:

  1. Place the kernel matrix on the top-left corner of the image.
  2. Perform element-wise multiplication.
  3. Sum the values of the products.
  4. The resulting value corresponds to the first value (top-left corner) in the convoluted matrix.
  5. Move the kernel to the right by the step size of the sliding window (the stride); at the end of a row, move it down by the same step and return to the left edge.
  6. Repeat steps 2 to 5 until the image matrix is fully covered.

The dimension of the convoluted matrix depends on the kernel size and on the step size of the sliding window. The larger the step, the smaller the dimension.
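The steps above can be sketched in plain Python. The function name `convolve2d`, the 4×4 image, and the kernel values are made up for illustration; the sketch assumes a square image, a square kernel, and no padding.

```python
def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image and sum the element-wise products."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    output = []
    for row in range(0, out_size * stride, stride):
        output_row = []
        for col in range(0, out_size * stride, stride):
            total = 0
            for i in range(k):
                for j in range(k):
                    # Element-wise multiplication, accumulated into a sum.
                    total += image[row + i][col + j] * kernel[i][j]
            output_row.append(total)
        output.append(output_row)
    return output


image = [
    [1, 2, 0, 1],
    [0, 1, 3, 2],
    [2, 0, 1, 0],
    [1, 1, 0, 2],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
print(convolve2d(image, kernel))  # [[5, 6], [4, 7]]
```

A 4×4 image and a 3×3 kernel with a step of 1 give a 2×2 result, following (4 - 3) / 1 + 1 = 2 positions along each dimension.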

Another name associated with the kernel in the literature is feature detector because the weights can be fine-tuned to detect specific features in the input image.

For instance:

  • A kernel that averages neighboring pixels can be used to blur the input image.
  • A kernel that subtracts neighboring pixels can be used to perform edge detection.

The more convolution layers the network has, the better its deeper layers are at detecting more abstract features.

Activation function

A ReLU activation function is applied after each convolution operation. This function helps the network learn non-linear relationships between the features in the image, hence making the network more robust for identifying different patterns. It also helps to mitigate the vanishing gradient problems.

Pooling layer

The goal of the pooling layer is to pull the most significant features from the convoluted matrix. This is done by applying some aggregation operations, which reduces the dimension of the feature map (convoluted matrix), hence reducing the memory used while training the network. Pooling is also relevant for mitigating overfitting.

The most common aggregation functions that can be applied are:

  • Max pooling, which takes the maximum value in each window of the feature map
  • Sum pooling, which takes the sum of all the values in each window of the feature map
  • Average pooling, which takes the average of all the values in each window.

Below is an illustration of each of the previous examples:

Also, the dimension of the feature map becomes smaller as the pooling function is applied.
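The three aggregation functions can be compared with a small plain-Python sketch. The function names `pool2d` and `average` and the 4×4 feature map are made up for illustration; the sketch assumes a square feature map and non-overlapping 2×2 windows.

```python
def pool2d(feature_map, window=2, reduce=max):
    """Apply `reduce` to each non-overlapping window x window patch."""
    size = len(feature_map) // window
    pooled = []
    for row in range(size):
        pooled_row = []
        for col in range(size):
            patch = [
                feature_map[row * window + i][col * window + j]
                for i in range(window)
                for j in range(window)
            ]
            pooled_row.append(reduce(patch))
        pooled.append(pooled_row)
    return pooled


def average(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 2],
]
print(pool2d(feature_map, reduce=max))      # [[7, 2], [2, 8]]
print(pool2d(feature_map, reduce=sum))      # [[16, 4], [4, 20]]
print(pool2d(feature_map, reduce=average))  # [[4.0, 1.0], [1.0, 5.0]]
```

In each case the 4×4 feature map shrinks to 2×2, which is the size reduction described above.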

The output of the last pooling layer is flattened so that it can be processed by the fully connected layer.

Fully connected layers

These layers form the final stage of the convolutional neural network, and their input is the flattened one-dimensional vector generated from the last pooling layer. ReLU activation functions are applied to them for non-linearity.

Finally, a softmax prediction layer is used to generate probability values for each of the possible output labels, and the final label predicted is the one with the highest probability score.
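As a minimal sketch of this last step, the plain-Python function below turns raw scores into probabilities and picks the most likely label. The name `softmax` follows the text; the three scores are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # illustrative scores for three classes
probabilities = softmax(logits)
predicted = probabilities.index(max(probabilities))
print(predicted)  # 0
```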

Dropout

Dropout is a regularization technique applied to improve the generalization capability of neural networks with a large number of parameters. It consists of randomly dropping some neurons during the training process, which forces the remaining neurons to learn new features from the input data.

Since the technical implementation will be performed using TensorFlow 2, the next section aims to provide a complete overview of different components of this framework to efficiently build deep learning models.

Neural Networks explained in 60 seconds!
Neural Networks explained in 60 seconds!

Quick Tutorial: Building a Basic Convolutional Neural Network (CNN) in TensorFlow

This quick tutorial can help you get started implementing CNN in TensorFlow. It is based on the Fashion-MNIST dataset, which contains 70,000 28 x 28 grayscale images of fashion products in 10 categories. As loaded here, there are 55,000 images in the training set and 10,000 images in the test set, with the remaining 5,000 held out for validation. Our code is based on the full tutorial by Aditya Sharma.

Loading Data

First import all the necessary modules: NumPy, matplotlib and TensorFlow, then import the Fashion-MNIST data as follows:

import tensorflow as tf
# input_data is the dataset helper that ships with the TensorFlow 1.x tutorials
from tensorflow.examples.tutorials.mnist import input_data

# Use this for reading the data/fashion directory from the dataset
data = input_data.read_data_sets(
    'data/fashion',
    one_hot=True,
    # Use this for retrieving Fashion-MNIST dataset from Amazon S3 bucket
    source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')

CNN Architecture

We will use three convolutional layers, progressively adding more filters. All the filters are 3×3:

  1. Layer with 32 filters
  2. Layer with 64 filters
  3. Layer with 128 filters

In addition, we’ll have three 2×2 max-pooling layers, one after each convolution.

We’ll set basic hyperparameters of the CNN model:

training_iters = 10
learning_rate = 0.001
batch_size = 128

Here batch_size is the number of images the model processes in each training step, and training_iters is the number of passes made over the training set.

Neural Network Parameters

The number of inputs to the CNN is 784, because the images have 784 pixels and are read as a 784 dimensional vector. We will rebuild this vector into a matrix of 28 x 28 x 1.

# Use this to specify 28 inputs, and 10 classes for the predicted label at the end
n_input = 28
n_classes = 10

Here we define an input placeholder x with shape None x 28 x 28 x 1, which holds a batch of images already rebuilt into matrices. Similarly, we’ll define a placeholder y for the labels of the training images, which will be a None x 10 matrix.

We set the first dimension (the “row”) to None so that the placeholders accept any batch size; the actual size is supplied when data is fed in. During training it will be 128, matching batch_size.

# x is the input placeholder, rebuilding the image into 28x28x1 matrix
x = tf.placeholder("float", [None, 28, 28, 1])
# y is the label set, using the number of classes
y = tf.placeholder("float", [None, n_classes])

Wrapper Functions

Because we have several layers of the same type in the model, it’s useful to create a wrapper function for each type of layer, to avoid duplicating code. You can get functions like this out of the box with Keras, which is included with TensorFlow. However, in this tutorial we show you how to do things from scratch in TensorFlow without Keras helper functions.

Here is a function creating a 2-dimensional convolutional layer, with bias and ReLU activation. The arguments are the input images x, weights W, bias b, and the stride, meaning how many pixels the filter moves at each step during the convolution.

def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

Here is another function creating a 2D max-pool layer. Here the parameters are the input images x, and k, which sets both the kernel/filter size and the stride.

def maxpool2d(x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')

Now let’s define weights and biases.

weights = {
    'wc1': tf.get_variable('W0', shape=(3,3,1,32), initializer=tf.contrib.layers.xavier_initializer()),
    'wc2': tf.get_variable('W1', shape=(3,3,32,64), initializer=tf.contrib.layers.xavier_initializer()),
    'wc3': tf.get_variable('W2', shape=(3,3,64,128), initializer=tf.contrib.layers.xavier_initializer()),
    'wd1': tf.get_variable('W3', shape=(4*4*128,128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('W6', shape=(128,n_classes), initializer=tf.contrib.layers.xavier_initializer()),
}
biases = {
    'bc1': tf.get_variable('B0', shape=(32), initializer=tf.contrib.layers.xavier_initializer()),
    'bc2': tf.get_variable('B1', shape=(64), initializer=tf.contrib.layers.xavier_initializer()),
    'bc3': tf.get_variable('B2', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'bd1': tf.get_variable('B3', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('B4', shape=(10), initializer=tf.contrib.layers.xavier_initializer()),
}
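
The 4*4*128 in the shape of the fully connected weights follows from the layer sizes. The short sketch below traces the spatial size through the three convolution and pooling stages, assuming the 'SAME' padding used by the wrapper functions, where the output size is the input size divided by the stride, rounded up. The helper name is made up for this example.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size of a layer that uses 'SAME' padding."""
    return math.ceil(size / stride)

size = 28            # Fashion-MNIST images are 28 x 28
sizes = [size]
for _ in range(3):
    size = same_padding_output(size, stride=1)  # convolution keeps the size
    size = same_padding_output(size, stride=2)  # 2x2 max-pool halves it, rounding up
    sizes.append(size)

flattened = size * size * 128  # the last convolution has 128 filters
print(sizes)      # [28, 14, 7, 4]
print(flattened)  # 2048
```

So the third pooling layer produces a 4 x 4 x 128 volume, which is flattened into 2,048 inputs for the fully connected layer.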

Building the CNN

Now we build the CNN by feeding the weights and biases into the wrapper functions.

def conv_net(x, weights, biases):
    # This constructs the first convolutional layer with 32 3×3 filters and 32 biases.
    # The next line specifies the max-pool layer with the kernel size set to 2.
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Second convolutional layer with 64 3×3 filters and 64 biases, and another max-pool layer.
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # Third convolutional layer with 128 3×3 filters and 128 biases, and the last max-pool layer.
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    conv3 = maxpool2d(conv3, k=2)

    # Build the fully connected layer that will generate prediction labels.
    # reshape() adapts the output of pooling to the input expected by the fully connected layer.
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])

    # In this last part, apply the ReLU function and perform matrix multiplication on the weights.
    fc1 = tf.nn.relu(fc1)
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

Loss and Optimizer Nodes

First build the model using the conv_net() function we showed above, passing in x, weights, and biases:

pred = conv_net(x, weights, biases)

This is a multi-class classification problem, so we will use the softmax activation function, which gives a probability between 0 and 1 for each class label (the label with the highest probability will be the prediction of the model). We’ll use cross-entropy as the loss function. Note that conv_net() returns raw scores (logits); the softmax is applied inside the loss function below.

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
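
To make the loss concrete, here is a small NumPy sketch of softmax followed by cross-entropy on made-up logits for two samples and three classes. It illustrates the arithmetic only and is not TensorFlow's implementation; the function names and values are assumptions for this example.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, one_hot_labels):
    """Per-sample cross-entropy between predicted probabilities and one-hot labels."""
    return -np.sum(one_hot_labels * np.log(probs), axis=1)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],   # sample 0 belongs to class 0
                   [0.0, 0.0, 1.0]])  # sample 1 belongs to class 2

probs = softmax(logits)
mean_loss = cross_entropy(probs, labels).mean()  # plays the role of `cost`
predicted = probs.argmax(axis=1)                 # label with the highest probability
print(probs.sum(axis=1), predicted, mean_loss)
```

The second sample is predicted as class 1 although its label is class 2, so it contributes the larger share of the mean loss.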

Finally, we’ll define the Adam optimizer with a learning rate of 0.001 as defined in the model hyperparameters above:

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

Evaluate the Model

To test the model, we first initialize weights and biases, and then define a correct_prediction and accuracy node that will evaluate model performance every time it is run.

init = tf.global_variables_initializer()
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Now you can start the computation graph, and run a training session as follows:

  • Create an outer for loop that runs for the number of training iterations specified above
  • Create an inner for loop that steps through the training set in batches of the size specified above
  • Take each batch of training images and labels as the variables batch_x and batch_y
  • Feed batch_x and batch_y into the x and y placeholders
  • After each training iteration, run the loss function and check training accuracy
  • After running through all the images, test accuracy by processing the 10,000 test images

See the original tutorial for the complete code that you can use to run the CNN model.
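
The nested loop structure described in the steps above can be sketched without TensorFlow. The arrays below are zero-filled stand-ins for the training data, and iterate_batches is a hypothetical helper; in the TensorFlow 1.x code, the marked line is where the session would run the optimizer with batch_x and batch_y fed into the x and y placeholders.

```python
import numpy as np

def iterate_batches(images, labels, batch_size):
    """Yield consecutive (images, labels) slices of at most batch_size rows."""
    for start in range(0, len(images), batch_size):
        end = start + batch_size
        yield images[start:end], labels[start:end]

# Stand-in data: 300 blank 28x28x1 images with one-hot labels for 10 classes.
train_x = np.zeros((300, 28, 28, 1), dtype=np.float32)
train_y = np.zeros((300, 10), dtype=np.float32)

training_iters = 2   # kept small for the example
batch_size = 128

batch_sizes = []
for epoch in range(training_iters):
    for batch_x, batch_y in iterate_batches(train_x, train_y, batch_size):
        # Training step goes here (optimizer run with batch_x and batch_y).
        batch_sizes.append(batch_x.shape[0])

print(batch_sizes)  # [128, 128, 44, 128, 128, 44]
```

The last batch of each pass is smaller than batch_size, which is one reason the placeholders leave their first dimension as None.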

Model evaluation

You can also check the performance of the model on the validation set.

loss, accuracy = model.evaluate(validation_set)
print('Accuracy on validation dataset:', accuracy)

Let’s now try the model on new images. The `image` module from Keras will be used to load the image.

import numpy as np
from keras.preprocessing import image

Download some images from the internet and store them in a temporary folder. The images used here are provided under a permissive Creative Commons license.

!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/c/c7/Tabby_cat_with_blue_eyes-3336579.jpg \
    -O /tmp/cat.jpg

Next, load the image while specifying the size used in training.

test_image = image.load_img('/tmp/cat.jpg', target_size=(200, 200))

After this, convert it into an array since the model expects array inputs.

test_image = image.img_to_array(test_image)

The next step is to expand the dimensions of the image in order to include the batch size. Let’s take a look at the shape of the image at the moment.

print(test_image.shape)  # (200, 200, 3)

That needs to be amended to include a batch size of 1, because only one image is being used here. Expanding the dimensions is done using the `expand_dims` function from NumPy.

test_image = np.expand_dims(test_image, axis=0)

If you check the shape again, you will see that it’s in the form required by the model.
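
The effect of `expand_dims` can be checked on its own with a blank array of the same size. The zero-filled array here is a stand-in for the loaded cat image, used only so the example runs without downloading anything.

```python
import numpy as np

# Stand-in for the 200x200 RGB image after img_to_array.
test_image = np.zeros((200, 200, 3), dtype=np.float32)
shape_before = test_image.shape

# Insert a new leading axis so the array represents a batch of one image.
batched = np.expand_dims(test_image, axis=0)
shape_after = batched.shape

print(shape_before)  # (200, 200, 3)
print(shape_after)   # (1, 200, 200, 3)
```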

Convolutional Neural Networks

Convolutional neural networks are designed to process data that comes in the form of multiple arrays, such as images. This type of neural network is used in applications like image recognition or face recognition. The primary difference between a CNN and an ordinary neural network is that a CNN takes its input as a two-dimensional array and operates directly on the image, learning its own features rather than relying on a separate feature extraction step.

CNNs are the dominant approach to recognition problems. Top companies like Google and Facebook have invested in research and development on recognition projects to get these tasks done with greater speed.

A convolutional neural network uses three basic ideas −

  • Local receptive fields
  • Convolution
  • Pooling

Let us understand these ideas in detail.

CNN utilizes spatial correlations that exist within the input data. Each hidden neuron connects to only a small region of the input neurons. This region is called the local receptive field of that hidden neuron. The hidden neuron processes the input data inside its field and is unaffected by changes outside that boundary.

Following is a diagram representation of generating local receptive fields −

In this representation, each connection learns a weight, and each hidden neuron also learns a bias. The local receptive field is then shifted across the input one step at a time, with each position feeding a different hidden neuron. This process is called “convolution”.

The hidden neurons produced this way form a feature map. Because every neuron in the map uses the same weights and the same bias, these are called the “shared weights” and the “shared bias”.
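
A minimal NumPy sketch of this process is shown below, assuming a single-channel input, no padding, and a stride of 1. As in most deep learning libraries, the kernel is applied without flipping (strictly speaking a cross-correlation). The function name, the 5 x 5 input, and the averaging kernel are illustrative choices.

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared kernel over the image; each position is one hidden neuron."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The local receptive field of the hidden neuron at (i, j).
            field = image[i:i + kh, j:j + kw]
            # The same weights and bias are reused at every position.
            feature_map[i, j] = np.sum(field * kernel) + bias
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0   # shared weights: a 3x3 averaging filter
feature_map = conv2d_valid(image, kernel)

print(feature_map.shape)  # (3, 3)
print(feature_map[0, 0])  # mean of the top-left 3x3 field
```

Only nine weights and one bias are learned for the whole feature map, no matter how large the input image is.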

CNNs also use pooling layers, which are positioned immediately after the convolutional layers. A pooling layer takes the feature map that comes out of a convolutional layer as its input and prepares a condensed feature map that summarizes the neurons of the previous layer.

Implementing a CNN in TensorFlow & Keras

In this post, we’ll learn how to implement a Convolutional Neural Network (CNN) from scratch using Keras. Here, we show a CNN architecture similar to the structure of VGG-16 but with fewer layers. We will learn how to model this architecture and train it on a small dataset called CIFAR-10. We’ll also use this as an opportunity to introduce a new layer type called Dropout, which is often used in models to mitigate the effects of overfitting.

  • Load the CIFAR-10 Dataset
  • Dataset Preprocessing
  • Dataset and Training Configuration Parameters
  • CNN Model Implementation in Keras
  • Adding Dropout to the Model
  • Saving and Loading Models
  • Model Evaluation
  • Conclusion

import os
import random
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

from matplotlib.ticker import (MultipleLocator, FormatStrFormatter)
from dataclasses import dataclass

SEED_VALUE = 42

# Fix seed to make training deterministic.
random.seed(SEED_VALUE)
np.random.seed(SEED_VALUE)
tf.random.set_seed(SEED_VALUE)

Dataset Preprocessing

We normalize the image data to the range [0,1]. This is very common when working with image data and helps the model train more efficiently. We also convert the integer labels to one-hot encoded labels, as discussed in previous videos.

# Normalize images to the range [0, 1].
X_train = X_train.astype("float32") / 255
X_test = X_test.astype("float32") / 255

# Change the labels from integer to categorical data.
print('Original (integer) label for the first training sample: ', y_train[0])

# Convert labels to one-hot encoding.
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

print('After conversion to categorical one-hot encoded labels: ', y_train[0])
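
To show what one-hot encoding produces, here is a NumPy-only sketch. The one_hot helper is a hypothetical stand-in written for this example, not the Keras implementation, and the three labels are made up; they are stored as an (N, 1) column, the way the CIFAR-10 labels are.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows (illustrative helper)."""
    labels = np.asarray(labels).reshape(-1)   # flatten an (N, 1) column to (N,)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype=np.float32)[labels]

y = np.array([[6], [9], [0]])   # three integer labels
encoded = one_hot(y, num_classes=10)

print(encoded.shape)  # (3, 10)
print(encoded[0])     # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
```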

Architectures of CNNs

You don’t always have to design your convolutional neural networks from scratch. Instead, you can try architectures developed by experts, which have proven to perform well on many image tasks. Xception, used in the example below, is one of these architectures.

They can be accessed via Keras applications. These applications have also been pre-trained on the ImageNet dataset. The dataset contains over a million images. This makes these applications robust enough for use in the real world. When instantiating the model, you have the choice whether to include the pre-trained weights or not. When the weights are used, you can start using the model for classification right away. Other ways of using the pre-trained models are:

  • extracting features and passing them to a new model
  • fine-tuning a new model

Let’s take a look at how you can load the Xception architecture without weights. Since weights are not included, you can use your dataset to train the model.

model = tf.keras.applications.Xception(
    include_top=True,
    weights=None,  # random initialization instead of the default ImageNet weights
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

When you load the model with weights, you can start using it for prediction right away. The weights are stored in this location `~/.keras/models/`.

model = tf.keras.applications.Xception(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

After that, you can process the image and run the predictions. The Keras applications provide a function for doing that. Each of the architectures dictates the size of the image that should be passed to it. You should always confirm that from its documentation. Next, convert the image into an array and expand its dimensions in order to include the batch size.

from tensorflow.keras.preprocessing import image
import numpy as np

!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/b/b5/Lion_d%27Afrique.jpg \
    -O /tmp/lion.jpg

img_path = '/tmp/lion.jpg'
img = image.load_img(img_path, target_size=(299, 299))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = tf.keras.applications.xception.preprocess_input(x)

preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', tf.keras.applications.xception.decode_predictions(preds, top=3)[0])

The final step is to decode the predictions and print the results.

Model definition

Let’s now create the convolutional neural network that will be used to classify the images. It will be similar to the previous one with a few cosmetic changes.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# data_augmentation is the pipeline defined in the Data augmentation section.
model = Sequential([
    data_augmentation,
    tf.keras.layers.experimental.preprocessing.Rescaling(1./255),
    Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Dropout(0.25),
    Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(1, activation='sigmoid')
])

The notable changes are:

  • the application of the augmentation layer
  • using the `Rescaling` layer to scale the images in the model definition

Tensors vs Matrices: Differences

Many people confuse tensors with matrices. Even though these two objects look similar, they are not the same thing. This section explains the difference between matrices and tensors.

  • We can think of a matrix as a tensor with only two dimensions.
  • Tensors, on the other hand, are a more general format that can have any number of dimensions.

As opposed to matrices, tensors are more suitable for deep learning problems for the following reasons:

  • They can deal with any number of dimensions, which makes them a better fit for multi-dimensional data.
  • Tensors’ ability to be compatible with a wide range of data types, shapes, and dimensions makes them more versatile than matrices.
  • TensorFlow provides GPU and TPU support to speed up computations. Using tensors, machine learning engineers can automatically take advantage of these benefits.
  • Tensors natively support broadcasting, which means performing arithmetic operations between tensors of different but compatible shapes, something the rules of matrix arithmetic do not always allow.
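
Broadcasting is easiest to see on a small case. The sketch below uses NumPy arrays as stand-ins for tensors, on the assumption that TensorFlow tensors follow the same NumPy-style broadcasting rules; the array values are made up for illustration.

```python
import numpy as np

# A "batch" of 2 samples with 3 features each.
batch = np.arange(6, dtype=float).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# One mean per feature: shape (3,), not (2, 3).
per_feature_mean = batch.mean(axis=0)             # [1.5, 2.5, 3.5]

# Shapes (2, 3) and (3,) differ, but the smaller array is
# stretched across the rows automatically.
centered = batch - per_feature_mean

print(centered)
# [[-1.5 -1.5 -1.5]
#  [ 1.5  1.5  1.5]]
```

Subtracting a 3-element vector from a 2 x 3 array has no meaning in strict matrix arithmetic, but broadcasting applies it row by row.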

Data augmentation

Data augmentation is usually applied in order to prevent overfitting. Augmenting the images increases the dataset as well as exposes the model to various aspects of the data. Augmentation can be achieved by applying random transformations such as flipping and rotating the images. Fortunately, Keras provides layers that can do just that.

data_augmentation = keras.Sequential(
    [
        tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal", input_shape=(200, 200, 3)),
        tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),
        tf.keras.layers.experimental.preprocessing.RandomZoom(0.2),
    ]
)

TensorFlow – Convolutional Neural Networks

After understanding machine-learning concepts, we can now shift our focus to deep learning concepts. Deep learning is a subfield of machine learning and is considered one of the most important advances made by researchers in recent decades. Examples of deep learning applications include image recognition and speech recognition.

Following are the two important types of deep neural networks −

  • Convolutional Neural Networks
  • Recurrent Neural Networks

In this chapter, we will focus on the CNN, Convolutional Neural Networks.

Conclusion

In this post, we learned how to use TensorFlow and Keras to define and train a simple convolutional neural network. We showed that the model overfit the training data, and we learned how to use dropout layers to reduce the overfitting and improve the model’s performance on the validation dataset. We also covered how to save and load models to and from the file system. Finally, we reviewed three techniques used to evaluate the model on the test dataset.

This tutorial demonstrates training a 3D convolutional neural network (CNN) for video classification using the UCF101 action recognition dataset. A 3D CNN uses a three-dimensional filter to perform convolutions. The kernel is able to slide in three directions, whereas in a 2D CNN it can slide in two dimensions. The model is based on the work published in A Closer Look at Spatiotemporal Convolutions for Action Recognition by D. Tran et al. (2017). In this tutorial, you will:

  • Build an input pipeline
  • Build a 3D convolutional neural network model with residual connections using Keras functional API
  • Train the model
  • Evaluate and test the model
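
To make the idea of a kernel sliding in three directions concrete, here is a minimal NumPy sketch of a 3D convolution without padding. It is an illustration only, not the tutorial's model: the function name `conv3d_valid`, the clip size, and the all-ones kernel are assumptions chosen for readability.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive 3D convolution without padding (cross-correlation, as in deep learning libraries)."""
    t, h, w = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):          # slide along time (frames)
        for j in range(out.shape[1]):      # slide along height
            for k in range(out.shape[2]):  # slide along width
                patch = volume[i:i + kt, j:j + kh, k:k + kw]
                out[i, j, k] = np.sum(patch * kernel)
    return out

# Hypothetical clip: 8 grayscale frames of 16x16 pixels, all ones.
video = np.ones((8, 16, 16))
kernel = np.ones((3, 3, 3))

features = conv3d_valid(video, kernel)
print(features.shape)  # (6, 14, 14)
```

A 2D convolution would drop the outermost loop, so each frame would be processed on its own and no information would be shared across time.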

This video classification tutorial is the second part in a series of TensorFlow video tutorials. Here are the other three tutorials:

  • Load video data: This tutorial explains much of the code used in this document.
  • MoViNet for streaming action recognition: Get familiar with the MoViNet models that are available on TF Hub.
  • Transfer learning for video classification with MoViNet: This tutorial explains how to use a pre-trained video classification model trained on a different dataset with the UCF-101 dataset.

TensorFlow CNN in Production with Run:AI

Run:AI automates resource management and workload orchestration for deep learning infrastructure. With Run:AI, you can automatically run as many CNN experiments as needed in TensorFlow and other deep learning frameworks.

Here are some of the capabilities you gain when using Run:AI:

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time

Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.

Learn more about the Run:AI GPU virtualization platform.

If you are a software developer who wants to build scalable AI-powered algorithms, you need to understand how to use the tools to build them. This course is part of the upcoming Machine Learning in Tensorflow Specialization and will teach you best practices for using TensorFlow, a popular open-source framework for machine learning.

Convolutional Neural Networks in TensorFlow

This course is part of DeepLearning.AI TensorFlow Developer Professional Certificate


Build a Deep CNN Image Classifier with ANY Images

TensorFlow: Constants, Variables, and Placeholders

Constants are not the only types of tensors. There are also variables and placeholders, which are all building blocks of a computational graph.

A computational graph is a representation of a sequence of operations and the flow of data between them.

Now, let’s understand the difference between these types of tensors.

Constants

Constants are tensors whose values do not change during the execution of the computational graph. They are created using the tf.constant() function and are mainly used to store fixed parameters that do not require any change during the model training.

Variables

Variables are tensors whose values can be changed during the execution of the computational graph. They are created using the tf.Variable() function. For instance, in the case of neural networks, weights and biases can be defined as variables since they need to be updated during the training process.

Placeholders

These were used in the first version of TensorFlow as empty containers that do not have specific values. They simply reserve a spot for data that will be supplied later. This gives users the freedom to use different datasets and batch sizes during model training and validation.

In TensorFlow 2, placeholders are no longer needed. Eager execution is the default, so data is passed directly as arguments to Python functions, and the tf.function decorator can compile such functions into graphs. This is a more Pythonic and dynamic approach to feeding data into the computational graph.
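
The following is a minimal sketch of how these building blocks fit together in TensorFlow 2, assuming a single trainable weight and a squared-error loss. The names `learning_rate`, `weight`, and `train_step` are illustrative and do not come from the original tutorial.

```python
import tensorflow as tf

# A constant: its value is fixed once created.
learning_rate = tf.constant(0.1)

# A variable: a value that can be updated in place during training.
weight = tf.Variable(2.0)

@tf.function  # traces the Python function into a graph on its first call
def train_step(x, y_true):
    # The arguments x and y_true play the role that placeholders had in TensorFlow 1.
    with tf.GradientTape() as tape:
        loss = tf.square(weight * x - y_true)
    grad = tape.gradient(loss, weight)
    weight.assign_sub(learning_rate * grad)  # gradient descent update
    return loss

loss = train_step(tf.constant(3.0), tf.constant(9.0))
print(float(loss), float(weight.numpy()))
```

Calling `train_step` again with different inputs reuses the same traced graph, as long as the input shapes and dtypes stay the same.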

Training the model

The next step is to train the model. In this case, `y` is not passed in. That’s taken care of by the function used to generate the training set. Passing the validation data is critical so that the loss and accuracy can be accessed later and plotted. Let’s also reuse the callbacks that were defined in the last section.

history = model.fit(training_set, validation_data=validation_set, epochs=600, callbacks=callbacks)

Image classification using CNN (CIFAR10 dataset) | Deep Learning Tutorial 24 (Tensorflow & Python)

How to easily run CNN with Tensorflow in cnvrg.io

With cnvrg.io, you can run this pipeline without configuring the different platforms, which makes it much faster and easier to run. Using cnvrg.io, you can easily track training progress and serve the model as a REST endpoint. First, you can spin up a VS Code workspace inside cnvrg.io to build your training script from the notebook code. You can use the exact code and ensure that the model is saved at the end of the training.

Run your code as an experiment

Next, you can launch this training script as an experiment. cnvrg.io will provision resources to execute the script and monitor the performance automatically. Resource and training metrics are automatically visualized along with the logs, and all files that were written to disk during the experiment are saved as artifacts in cnvrg.io’s object store.

Make predictions in a few clicks

Now that you have your model, you’ll need to create a “predict” function. cnvrg.io makes it easy by automatically wrapping this function into a production-grade Flask application equipped with load balancing, autoscaling, and monitoring. This file loads the model into memory and uses it in the predict function, which formats the incoming data and returns a prediction.

Deploy your predictions to an endpoint

Next, you’ll want to create an endpoint that routes to that function. You could also specify compute resources and autoscaling configurations here too.

Track and monitor your endpoints

cnvrg.io automatically displays metrics such as the number of requests and latency for the endpoint. It also comes with Grafana and Kibana integrated for increased visibility into model

Finally, if you want to trigger retraining and deploying the model as part of a CI/CD pipeline, cnvrg.io provides Flows. The pipeline could programmatically trigger the flow via cnvrg.io’s CLI or SDK.

You can test it out now by installing cnvrg.io CORE, our free community MLOps platform, on your Kubernetes cluster.

How do CNNs work?

Although they can be used for other tasks, CNNs are mostly used in tasks involving image data. Each image contains pixel data that can be represented in a numerical form. This numerical representation is what is passed to a CNN. As much as normal artificial neural networks can be used in processing image data, CNNs have proven to perform better, resulting in higher accuracy. Let’s now take a look at how CNNs work.

Convolution

Rather than working on raw pixels directly, a CNN first extracts the features that are most important for classifying the image. The features are obtained through a process known as convolution. The convolution operation results in what is known as a feature map, also referred to as the convolved feature or an activation map. The feature map is obtained by applying a feature detector, also called a kernel or a filter, to the input image. The filter is usually a 3 by 3 matrix, although other sizes can be used. At each position, the filter is multiplied element-wise with the patch of the image it covers, and the products are summed to give one value of the feature map. The objective here is to reduce the size of the image being passed to the CNN while maintaining the important features. The filter slides step by step across the input image. The size of each step is known as the stride and can be defined when creating the CNN. When building the CNN, you can also define the number of filters you want for your network.

Once you obtain the feature map, the Rectified Linear Unit (ReLU) activation is applied to introduce non-linearity. This matters because the relationships in image data are rarely linear, and a stack of purely linear operations could only model linear relationships.
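
The convolution and ReLU steps can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: a single-channel image, no padding, and a hand-written vertical-edge filter. The helper names `conv2d` and `relu` are hypothetical, not a library API.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0)

# 5x5 image with a bright vertical band in columns 2 and 3.
image = np.tile(np.array([0.0, 0.0, 1.0, 1.0, 0.0]), (5, 1))

# 3x3 filter that responds to dark-to-bright transitions from left to right.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d(image, kernel)  # shape (3, 3), each row is [3, 3, -3]
activated = relu(feature_map)        # each row becomes [3, 3, 0]
print(feature_map)
print(activated)
```

Note how the 5 by 5 input shrinks to a 3 by 3 feature map, and how a stride of 2 would shrink it further to 2 by 2.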

Pooling

Pooling results in what is known as a pooled feature map. Pooling helps the neural network detect features in an image irrespective of their location in the image. This is what is known as spatial invariance. There are several types of pooling, for example, max pooling, average pooling, and min pooling. For instance, in max pooling a 2 by 2 window is slid over the feature map, and the largest value in each window is kept.

Pooling ensures that the main features of the image are maintained while reducing the size of the image further. This reduces the amount of information passed to the neural network and hence helps to reduce overfitting.
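
As a rough illustration, max pooling with a 2 by 2 window and a stride of 2 can be written as follows. The function name `max_pool2d` and the sample values are assumptions made for this sketch.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            pooled[i, j] = feature_map[r:r + size, c:c + size].max()
    return pooled

feature_map = np.array([[1.0, 3.0, 2.0, 0.0],
                        [4.0, 6.0, 1.0, 1.0],
                        [0.0, 2.0, 9.0, 5.0],
                        [1.0, 1.0, 3.0, 7.0]])

pooled = max_pool2d(feature_map)
print(pooled)  # [[6. 2.] [2. 9.]]
```

The 4 by 4 map is reduced to 2 by 2, yet the strongest response in each region survives, which is why small shifts of a feature inside a window do not change the pooled output.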

Flattening

The next step is to flatten the pooled feature map. This involves transforming the entire pooled feature map into a single column that can be passed to the fully connected layer.

Full connection

The flattened feature map is then passed to the input layer of the neural network. The result of that is passed to a fully connected layer. After that, the result of the entire process is emitted by the output layer. An activation function is usually applied depending on the type of classification problem. For binary classifications, the sigmoid activation function will be used whereas the softmax activation function is used for multiclass problems.
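
The flattening step and the two output activations can be sketched as follows. The weights here are fixed, made-up numbers chosen so the arithmetic is easy to follow; in a real network they would be learned during training.

```python
import numpy as np

def sigmoid(z):
    """Squash a single logit into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turn a vector of logits into probabilities that sum to 1."""
    exp = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return exp / exp.sum()

pooled = np.array([[6.0, 2.0],
                   [2.0, 9.0]])
flat = pooled.flatten()  # [6, 2, 2, 9], ready for a fully connected layer

# Binary classification: one output unit with a sigmoid activation.
w_binary = np.array([0.1, -0.2, 0.05, 0.0])  # hypothetical weights
p_class1 = sigmoid(flat @ w_binary)          # logit is 0.3

# Multiclass classification: one output unit per class with softmax.
w_multi = np.array([[0.1, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.1],
                    [0.0, 0.1, 0.0, 0.0]])   # hypothetical weights, 3 classes
probs = softmax(w_multi @ flat)              # logits are [0.6, 0.9, 0.2]

print(p_class1, probs, int(np.argmax(probs)))
```

With the sigmoid output, a single number is compared against a threshold. With softmax, the predicted class is the index of the largest probability.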

2D Convolution Neural Network Animation

Load the CIFAR-10 Dataset

The CIFAR-10 dataset consists of 60,000 color images from 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. Several sample images are shown below, along with the class names.

Since the CIFAR-10 dataset is included in TensorFlow, we can load it using the load_data() function, as shown in the code cell below, and confirm the number of samples and the shape of the data.

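from tensorflow.keras.datasets import cifar10

# Load the training and test splits, then confirm their shapes.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

print(X_train.shape)
print(X_test.shape)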

Display Sample Images from the Dataset

It’s always a good idea to inspect some images in a dataset, as shown below. Remember, the images in CIFAR-10 are quite small, only 32×32 pixels, so while they don’t have a lot of detail, there’s still enough information in these images to support an image classification task.

import matplotlib.pyplot as plt

plt.figure(figsize=(18, 9))

num_rows = 4
num_cols = 8

# Plot the first num_rows * num_cols training images in a grid.
for i in range(num_rows * num_cols):
    ax = plt.subplot(num_rows, num_cols, i + 1)
    plt.imshow(X_train[i, :, :])
    plt.axis("off")

Evaluate the model

Use Keras Model.evaluate to get the loss and accuracy on the test dataset.


model.evaluate(test_ds, return_dict=True)

13/13 [==============================] - 15s 1s/step - loss: 0.8879 - accuracy: 0.7000
{'loss': 0.8878847360610962, 'accuracy': 0.699999988079071}

To visualize model performance further, use a confusion matrix. The confusion matrix allows you to assess the performance of the classification model beyond accuracy. In order to build the confusion matrix for this multi-class classification problem, get the actual values in the test set and the predicted values.


def get_actual_predicted_labels(dataset):
    """
    Create a list of actual ground truth values and the predictions from the model.

    Args:
      dataset: An iterable data structure, such as a TensorFlow Dataset, with features and labels.

    Return:
      Ground truth and predicted values for a particular dataset.
    """
    actual = [labels for _, labels in dataset.unbatch()]
    predicted = model.predict(dataset)

    actual = tf.stack(actual, axis=0)
    predicted = tf.concat(predicted, axis=0)
    predicted = tf.argmax(predicted, axis=1)

    return actual, predicted


def plot_confusion_matrix(actual, predicted, labels, ds_type):
    cm = tf.math.confusion_matrix(actual, predicted)
    ax = sns.heatmap(cm, annot=True, fmt='g')
    sns.set(rc={'figure.figsize': (12, 12)})
    sns.set(font_scale=1.4)
    ax.set_title('Confusion matrix of action recognition for ' + ds_type)
    ax.set_xlabel('Predicted Action')
    ax.set_ylabel('Actual Action')
    plt.xticks(rotation=90)
    plt.yticks(rotation=0)
    ax.xaxis.set_ticklabels(labels)
    ax.yaxis.set_ticklabels(labels)


fg = FrameGenerator(subset_paths['train'], n_frames, training=True)
labels = list(fg.class_ids_for_name.keys())


actual, predicted = get_actual_predicted_labels(train_ds)
plot_confusion_matrix(actual, predicted, labels, 'training')

38/38 [==============================] – 46s 1s/step


actual, predicted = get_actual_predicted_labels(test_ds)
plot_confusion_matrix(actual, predicted, labels, 'test')

13/13 [==============================] – 15s 1s/step

The precision and recall values for each class can also be calculated using a confusion matrix.


def calculate_classification_metrics(y_actual, y_pred, labels):
    """
    Calculate the precision and recall of a classification model using the ground truth and
    predicted values.

    Args:
      y_actual: Ground truth labels.
      y_pred: Predicted labels.
      labels: List of classification labels.

    Return:
      Precision and recall measures.
    """
    cm = tf.math.confusion_matrix(y_actual, y_pred)
    tp = np.diag(cm)  # Diagonal represents true positives
    precision = dict()
    recall = dict()
    for i in range(len(labels)):
        col = cm[:, i]
        fp = np.sum(col) - tp[i]  # Sum of column minus true positive is false positive

        row = cm[i, :]
        fn = np.sum(row) - tp[i]  # Sum of row minus true positive is false negative

        precision[labels[i]] = tp[i] / (tp[i] + fp)  # Precision

        recall[labels[i]] = tp[i] / (tp[i] + fn)  # Recall

    return precision, recall


precision, recall = calculate_classification_metrics(actual, predicted, labels) # Test dataset


precision

{'ApplyEyeMakeup': 0.6666666666666666, 'ApplyLipstick': 0.5714285714285714, 'Archery': 0.6, 'BabyCrawling': 0.5, 'BalanceBeam': 0.5714285714285714, 'BandMarching': 1.0, 'BaseballPitch': 1.0, 'Basketball': 0.5, 'BasketballDunk': 0.8181818181818182, 'BenchPress': 0.8333333333333334}


recall

{'ApplyEyeMakeup': 0.6, 'ApplyLipstick': 0.4, 'Archery': 0.9, 'BabyCrawling': 0.8, 'BalanceBeam': 0.4, 'BandMarching': 0.8, 'BaseballPitch': 0.9, 'Basketball': 0.3, 'BasketballDunk': 0.9, 'BenchPress': 1.0}
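
To see where such numbers come from, here is a small worked example in plain NumPy. The 3-class confusion matrix is made up for illustration and is not taken from the UCF101 results. It follows the same convention as tf.math.confusion_matrix: rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual class, columns = predicted class.
cm = np.array([[5, 1, 0],
               [2, 3, 1],
               [0, 0, 4]])

tp = np.diag(cm)                  # correct predictions per class: [5, 3, 4]
precision = tp / cm.sum(axis=0)   # divide by column sums (all predictions of that class)
recall = tp / cm.sum(axis=1)      # divide by row sums (all actual members of that class)

print(precision)  # 5/7, 3/4, 4/5
print(recall)     # 5/6, 3/6, 4/4
```

Class 2 is never missed (recall 1.0), but one sample from class 1 is mistaken for it, so its precision drops to 0.8.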

TensorFlow Tutorial #02 Convolutional Neural Network

Running CNNs with TensorFlow in the real world

Loading datasets from TensorFlow is quite straightforward. However, consider a situation where you have to load data from the real world. The process for doing so is a little different. In this section, let’s look at how you can use this dataset from Kaggle to build a convolutional neural network. The goal here will be to build a model that can classify images of cats and dogs. Once you have built this model, you can tweak it and repurpose it for other classification problems.

Loading the images

Let’s start by downloading the images into a temporary folder on the virtual machine provided by Google Colab. Using Colab is advantageous in this case because you can use GPU compute to speed up model training.

!wget --no-check-certificate \
    https://namespace.co.ke/ml/dataset.zip \
    -O /tmp/catsdogs.zip

The next step will be to unzip this dataset.

import os
import zipfile

with zipfile.ZipFile('/tmp/catsdogs.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp/cats_dogs')

After that, set the paths to the training and test sets.

base_dir = '/tmp/cats_dogs/dataset'
train_dir = os.path.join(base_dir, 'training_set')
test_dir = os.path.join(base_dir, 'test_set')

You can list the folders in order to see their arrangement.

import os

os.listdir(base_dir)

Making predictions

You can now use this image to run a prediction.

prediction = model.predict(test_image)

When you print this you will see something similar to this.

prediction[0][0]
0.014393696

The question is, how do you interpret this? Remember that the network output layer has just one unit and uses the sigmoid activation function. The output of this network is therefore a number between 0 and 1. That number represents the probability that the image belongs to class 1. Class 1 in this case is dogs. You can therefore set a threshold of say 50% to separate the two classes.

if prediction[0][0] > 0.5:
    print(" is a dog")
else:
    print(" is a cat")

Since the obtained probability is well below 0.5, the image is classified as a cat.

You can repeat the same process with a dog image. First, start by downloading the image.

!wget --no-check-certificate \
    https://upload.wikimedia.org/wikipedia/commons/1/18/Dog_Breeds.jpg \
    -O /tmp/dog.jpg

After that, load it while converting it to the required size.

test_image2 = image.load_img('/tmp/dog.jpg', target_size=(200, 200))

Next, expand the dimensions and run the prediction.

test_image2 = np.expand_dims(test_image2, axis=0)
prediction = model.predict(test_image2)

Use the same threshold to determine if it is the image of a cat or a dog.

if prediction[0][0] > 0.5:
    print(" is a dog")
else:
    print(" is a cat")

With a predicted probability of 99%, the image is classified as a dog.

Introducing Convolutional Neural Networks | Convolutional Neural Networks with TensorFlow

What is the TensorFlow Framework?

Google open-sourced TensorFlow in November 2015. Google describes it as an open-source machine learning framework for everyone, for the following reasons.

  • Open-source: released under the Apache 2.0 open-source license. This allows researchers, organizations, and developers to contribute to the library and build upon it with very few restrictions.
  • Machine learning framework: meaning that it has a set of libraries and tools that support the building process of machine learning models.
  • For everyone: Using TensorFlow makes the implementation of machine learning models easier through common programming languages like Python. Furthermore, built-in libraries such as Keras make it even easier to create robust deep learning models.

All these functionalities make Tensorflow a good candidate for building neural networks.

Furthermore, installing TensorFlow 2 is straightforward and can be done with the Python package manager pip (pip install tensorflow), as explained in the official documentation.

After the installation, we can confirm the installed version, which is 2.9.1 in this case.


import tensorflow as tf

print("TensorFlow version:", tf.__version__)

Now, let’s further explore the main components for creating those networks.
