Suppose you have downloaded an image from social media for work and need to know its source. You go to Google and paste the image directly into the search bar. Within seconds, Google returns similar images, the websites where the picture originally appeared, and search results for the words it recognized in the image.
Advanced image recognition, analysis, and classification processes make this possible. Artificial intelligence, combined with machine learning, has dramatically improved these processes and broadened the range of applications that image recognition can support.
In this article, we will discuss a critical architecture used for image recognition: the convolutional neural network (CNN). We will cover the concept and use examples to show how it is applied. For further learning, we will also recommend an AI and ML program to help you grow your skills and knowledge in this field.
What is a CNN?
In the late 1980s, Yann LeCun, who later directed Facebook's AI Research group, developed one of the first convolutional neural networks, or CNNs. This network, named LeNet, could read zip codes, handwritten digits, and other characters.
A CNN is a feed-forward neural network that processes data with a grid-like topology, such as images. Also called a ConvNet, it can identify and categorize the objects in an image.
Suppose you must recognize whether an image contains a bird or something else. The first step is to feed the image's pixels, in the form of an array, into the first layer of the neural network. The successive layers extract features using multiple filters, and the final fully connected layer then classifies the image based on those features.
Fig: Convolutional Neural Network for extracting features of a bird from an image
Let’s see another example of a neural network for identifying two flowers—an orchid and a rose.
Each image is represented as an array of pixel values and then subjected to the convolution operation.
Suppose a convolution operation uses two matrices of one dimension. The matrices are:
a = [5,3,2,5,9,7]
b = [1,2,3]
The operation multiplies the arrays element-wise, one window at a time. It begins with the first three elements of a, [5, 3, 2], multiplied by the elements of b. The result is [5, 6, 6], which is then summed, i.e., (5 + 6 + 6), giving 17 as the first element of a*b.
The window then slides one position to [3, 2, 5]. The product is [3, 4, 15], which sums to 22, the second element of a*b.
Next, the window [2, 5, 9] is multiplied by b to get the product [2, 10, 27], which sums to 39. The process continues until the window reaches the end of a; the final window, [5, 9, 7], sums to 44, giving a*b = [17, 22, 39, 44].
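To make the arithmetic concrete, here is a minimal NumPy sketch of this one-dimensional sliding-window operation (strictly speaking, the cross-correlation that deep learning libraries compute, since the filter is not flipped):

```python
import numpy as np

# The two one-dimensional arrays from the example above.
a = np.array([5, 3, 2, 5, 9, 7])
b = np.array([1, 2, 3])

# Slide b over a one position at a time, multiply element-wise, and sum each window.
result = np.array([np.sum(a[i:i + len(b)] * b)
                   for i in range(len(a) - len(b) + 1)])
print(result)  # [17 22 39 44]
```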
The same sliding-window operation, extended to two dimensions, is what a CNN uses to assess and recognize images. Let's see how.
How Does CNN Recognize Images?
Take the example of the two images below. The first image represents the backslash key, while the second represents the forward slash key. The white boxes depict a pixel value of 0, while the colored ones depict a pixel value of 1.
If you press the backslash key '\', the left image is processed. When you press the forward slash key '/', the right image is processed. So, in this case, each image corresponds to a keystroke.
Now, let us see how such boxes help recognize an image. Let's say the CNN is given a picture of a smiling face. The picture is divided into a grid of boxes, where the black boxes represent the eyes, nose, and smile. This grid is converted into a matrix of 1s and 0s, as shown on the right.
The boxes representing the eyes, nose, and smile are marked with the number '1'. Thus, when the CNN needs to recognize another smiling face, it compares that image's 0-1 matrix to this one to see whether they match.
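As a toy illustration of this comparison, here is a hypothetical sketch; the 5x5 grid and the pixel positions are made up for illustration:

```python
import numpy as np

# Hypothetical 5x5 "smiling face": 1s mark the eyes, nose, and smile, 0s are background.
reference = np.array([
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
])

candidate = reference.copy()  # a new image to test against the reference pattern

# A naive match: the fraction of pixels whose 0/1 values agree.
match_score = np.mean(candidate == reference)
print(f"{match_score:.0%} of pixels match")  # 100% of pixels match
```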
Convolutional Neural Network Layers
A convolution network is like an onion, with several layers that must be peeled to reveal information about an image. Here are the four critical layers.
#1. Convolution layer
The convolution layer is the first step in extracting features from an image. Each image is converted into a matrix of pixel values, and several filters then perform the convolution operation on it.
For instance, the 5×5 image below has pixel values of 1 or 0, and you have a 3×3 filter matrix. The filter matrix is slid over the image, and the dot product is calculated at each position to obtain the convolved feature matrix.
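A minimal NumPy sketch of this step is shown below; the 5×5 image and the 3×3 filter values are hypothetical stand-ins for the ones in the figure:

```python
import numpy as np

# Hypothetical 5x5 binary image and 3x3 filter matrix.
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

# Slide the filter over the image and take the dot product at each position.
h, w = image.shape
kh, kw = kernel.shape
feature_map = np.array([
    [np.sum(image[i:i + kh, j:j + kw] * kernel) for j in range(w - kw + 1)]
    for i in range(h - kh + 1)
])
print(feature_map)  # the 3x3 convolved feature matrix
```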
#2. ReLU layer
ReLU is short for rectified linear unit. The ReLU layer is the next step after extracting feature maps from the images.
In the ReLU layer, an element-wise operation converts all negative pixel values to 0, introducing non-linearity into the network; the output is a rectified feature map. The ReLU function is simply f(x) = max(0, x), as the graph below shows.
The CNN then scans the original image with multiple convolution and ReLU layers to locate the required features.
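In code, ReLU is a one-liner. Here is a small NumPy sketch with made-up feature-map values:

```python
import numpy as np

# A hypothetical convolved feature map with some negative values.
feature_map = np.array([
    [ 2.0, -1.5,  0.7],
    [-0.3,  4.2, -2.1],
    [ 1.1, -0.8,  3.0],
])

# ReLU: every negative value becomes 0, positive values pass through unchanged.
rectified = np.maximum(feature_map, 0)
print(rectified)
```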
#3. Pooling layer
Pooling is a down-sampling operation that reduces the dimensionality of the feature map. The rectified feature map obtained from the ReLU layer passes through the pooling layer and is converted into a pooled feature map.
In practice, the pooled outputs of the different filters respond to different features, such as corners, edges, eyes, a body, a beak, or feathers.
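The most common form is max pooling, which keeps only the largest value in each window. Here is a small NumPy sketch with hypothetical values:

```python
import numpy as np

# A hypothetical 4x4 rectified feature map.
rectified = np.array([
    [1, 3, 2, 1],
    [0, 6, 1, 2],
    [4, 2, 8, 0],
    [3, 1, 0, 5],
])

# 2x2 max pooling with stride 2: keep the largest value in each 2x2 block.
pooled = rectified.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 2]
               #  [4 8]]
```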
Here’s how the three steps appear when put together.
#4. Flattening
In this step, all the two-dimensional arrays generated from pooled feature maps are converted into a single long continuous linear vector, as shown in the image below.
This flattened vector is the input to the fully connected layer that classifies the image.
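Continuing the hypothetical pooled map from the pooling sketch above, flattening is a single reshape:

```python
import numpy as np

# The hypothetical 2x2 pooled feature map from the pooling sketch.
pooled = np.array([[6, 2],
                   [4, 8]])

# Flatten into a single 1D vector for the fully connected layer.
flattened = pooled.flatten()
print(flattened)  # [6 2 4 8]
```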
Now let us summarize the steps and see how a CNN recognizes a bird in an image (a compact code sketch follows the list below):
- The pixels from the image of a bird are input into the convolutional layer.
- They undergo a convolution operation to form a convolved map.
- The ReLU function transforms the convolved map into a rectified feature map.
- Several convolutions and ReLU layers process the image to isolate the features.
- Multiple pooling layers then down-sample these maps while preserving the characteristics the filters have located.
- The flattening operation converts the final pooled feature map into a vector and feeds it into a fully connected layer, which produces the final output.
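These steps map directly onto the layers of a deep learning framework. Below is a minimal, hypothetical Keras sketch of such a network; the layer counts and sizes are illustrative and not taken from the figures above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A small CNN mirroring the steps above (illustrative sizes).
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),               # pixels of the input image
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                   # pooling
    layers.Conv2D(64, (3, 3), activation="relu"),  # second convolution + ReLU
    layers.MaxPooling2D((2, 2)),                   # second pooling
    layers.Flatten(),                              # flattening
    layers.Dense(64, activation="relu"),           # fully connected layer
    layers.Dense(10, activation="softmax"),        # final output over 10 classes
])
model.summary()
```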
CNN Use Case Implementation
Let us understand this better with an example based on the CIFAR-10 dataset from the Canadian Institute For Advanced Research. We will use a CNN to classify images across its ten classes (a consolidated code sketch follows the steps below).
#1. You begin by downloading the dataset from the official links in the image below.
#2. Then, you import the CIFAR data set using the steps in the following image.
#3. This step is crucial: you read the class labels and must ensure that all of them are loaded correctly.
#4. Further, you use matplotlib to display the images, as demonstrated in the following image.
#5. The next step is to employ the helper function to process data.
#6. Now, you generate the model using the following script.
#7. Then, you use the helper functions.
#8. You further generate convolution and pooling layers.
#9. Reshape the pooling layer to generate the flattened layer.
#10. The next step involves generating a fully connected layer using the following script.
#11. Then, you assign the output to the y_pred variable.
#12. Next, use the loss function shown in the image below.
#13. You now generate the optimizer.
#14. Then, generate a variable to initialize all the global variables.
#15. Finally, you run the model by forming a graph session as shown in the image below.
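The scripts shown in the screenshots above follow a TensorFlow 1.x graph-session style. As a hedged approximation of the same steps (loading CIFAR-10, building the convolution, pooling, flattening, and fully connected layers, picking a loss function and an optimizer, and running training), here is a hypothetical TensorFlow 2 / Keras sketch; layer sizes and training settings are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Steps 1-5: tf.keras downloads CIFAR-10 automatically; scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Steps 6-11: convolution, pooling, flattening, and fully connected layers.
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10),  # logits for the ten CIFAR-10 classes (the y_pred of step 11)
])

# Steps 12-13: loss function and optimizer.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Steps 14-15: training and evaluation replace the graph-session boilerplate.
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
model.evaluate(x_test, y_test)
```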
This process demonstrates how AI algorithms are applied to accomplish image recognition. So, how do you build on this knowledge? You can do so by learning the concepts of AI and ML in depth. Let's see how.
Solidify Your AI ML Knowledge for a Rewarding Career
The examples presented in this article represent AI and ML’s exceptional capabilities. To work on more complex problems, aspiring AI ML professionals must master the fundamentals, skills, and tools. This is where online AI and machine learning training can help you.
This program trains you in hands-on data analysis using a Jupyter-based lab environment, supervised learning, regression models, and classification algorithms. You will work on critical skills such as Z-test, T-test, and ANOVA. Finally, you will work on Capstone projects with industry leaders. Learn tools such as TensorFlow 2 and Keras and boost your career.
FAQs
- What is CNN and how does it work?
A CNN, short for convolutional neural network, is a form of artificial neural network capable of pattern recognition. It is used for image processing and recognition, applying a multi-layer convolutional process to analyze and interpret an image as pixels.
- What are the layers of a CNN?
A convolutional neural network consists of four main kinds of layers: convolution, activation (such as ReLU), pooling, and fully connected. These layers work sequentially to isolate and identify the features in images.
- Why is CNN used in image processing?
In image processing, a CNN detects, extracts, and recognizes image features and patterns. It is also useful for image segmentation, classification, and object detection.