Somehow, I came across Convolutional Neural Networks (CNNs) via YouTube, which sparked my curiosity about how they work. Programs using CNNs can analyze an image and confidently say, “This is a guinea pig.”
But how does that actually work?
A simplified image of the process would be as follows.

In this blog post, I will simplify the process significantly, focusing solely on identifying a white horizontal line in an image. A guinea pig would be way more complex.
To understand it properly, I turned to ChatGPT and other resources. When reading about CNNs (the magic factory), you’ll encounter many terms like tensor, neuron, kernel, and filter. It can get complicated very quickly, so I decided to break it down into simpler terms for myself—and now, for you.
- The input layer
Let’s start with a simplified example. Imagine we have an image that is 4×4 pixels in size. This image is 2D (two-dimensional) and contains a horizontal line. I want to use pattern detection to identify that line.

Now, the process is that when you upload an image, it gets converted into a matrix. This happens because, in computer vision, images are represented as grids of pixel values. For grayscale images, each pixel corresponds to a single number (e.g., 0 for black, 255 for white).
This processed image is now called a tensor (essentially just a matrix) in a CNN.
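As a sketch of this step (using NumPy, with pixel values matching the example that follows, assuming the white line sits in the two middle rows):

```python
import numpy as np

# 4x4 grayscale image: 0 = black, 255 = white.
# The two middle rows form the white horizontal line we want to detect.
image = np.array([
    [  0,   0,   0,   0],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
    [  0,   0,   0,   0],
])
print(image.shape)  # (4, 4)
```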
- The Convolutional Layer (Pattern Detection Layer)
1. Dividing the Image into Smaller Sections
When the image enters the system, it is divided into smaller sections called patches.
2. Defining a Filter
Filters are typically 3×3 pixels in size, which allows them to focus on smaller parts of the image (patches). This size is well suited to identifying detailed patterns such as edges, lines, or textures.
In practice, filter weights are learned during training. In this example, however, we define the weights manually as follows:
- 1 represents detecting white.
- 0 means no detection.
- If the filter had -1, it could be used to detect black horizontal lines.
Example Filter:

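As a sketch, the filter described above (1s in the middle row so it responds to a white horizontal line) could be written as:

```python
import numpy as np

# 3x3 filter: the 1s in the middle row respond to a white horizontal line,
# while the 0s ignore the surrounding pixels.
line_filter = np.array([
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
])
```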
3. Analyzing the Patches
- Each patch is analyzed by a neuron, a computational unit like a factory worker.
- Each neuron focuses on its assigned patch, ensuring the job is done systematically.
- The neuron takes the pixel values from its patch, applies the filter, and performs a calculation.
Overview

4. Neuron(worker) Output
So how does this “filter” actually work? The above image shows Out1, Out2, and so on, which are just symbolic. Let’s dig into the first patch of the image.

This calculation is based on convolution, a mathematical operation that has been in use for well over a century.
Multiplying each patch value by the matching filter weight and summing the results amplifies what we are looking for, in this case 255 (white).
For patch 1 (top-left), the calculated sum is:
(0⋅0)+(0⋅0)+(0⋅0)+(255⋅1)+(255⋅1)+(255⋅1)+(255⋅0)+(255⋅0)+(255⋅0)=765
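That arithmetic can be checked with a few lines of NumPy (a sketch, using the patch and filter values from the example):

```python
import numpy as np

# Top-left 3x3 patch of the image (values from the example above).
patch1 = np.array([
    [  0,   0,   0],
    [255, 255, 255],
    [255, 255, 255],
])

# The horizontal-line filter.
line_filter = np.array([
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
])

# One neuron's job: multiply element-wise, then sum.
out1 = int((patch1 * line_filter).sum())
print(out1)  # 765
```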
In this example, every patch sums to 765, producing a combined map called the feature map (the final result of the convolution).

The number 765 itself doesn’t inherently mean “white horizontal line.” However, we assume that the model has been trained to associate the number 765 with this pattern.
Therefore, the presence of 765 in the feature map strongly indicates a white horizontal line.
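Putting the pieces together, here is a minimal sketch of the full convolution: sliding the 3×3 filter over every 3×3 patch of the 4×4 image to build the feature map.

```python
import numpy as np

image = np.array([
    [  0,   0,   0,   0],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
    [  0,   0,   0,   0],
])
line_filter = np.array([
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
])

# A 4x4 image and a 3x3 filter yield a 2x2 feature map.
feature_map = np.zeros((2, 2), dtype=int)
for i in range(2):
    for j in range(2):
        patch = image[i:i + 3, j:j + 3]
        feature_map[i, j] = (patch * line_filter).sum()

print(feature_map)
# [[765 765]
#  [765 765]]
```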
- The Max Pooling Layer (Flattening Layer)
The purpose of this layer is to apply a mathematical function that extracts the most significant value for detecting patterns, such as a white horizontal line. In our case, the maximum value detected is 765, which indicates a strong match with the desired feature.
This value, 765, will be represented as a vector: [765].
If we had another filter, such as one detecting edges, it might produce a value like 563. In that case, the final vector after flattening would be: [765, 563].
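A sketch of max pooling over the 2×2 feature map, followed by flattening into a vector (563 is the value from the hypothetical edge filter mentioned above):

```python
import numpy as np

feature_map = np.array([[765, 765],
                        [765, 765]])

# Max pooling: keep only the strongest response.
pooled = int(feature_map.max())        # 765

# With a second (hypothetical) edge filter pooling to 563,
# flattening produces the final vector.
vector = [pooled, 563]
print(vector)  # [765, 563]
```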
- Fully Connected Layer (Decision Layer)
The vector is passed to the fully connected layer, which uses the values (e.g., 765 for horizontal lines and 563 for edges) to classify the image.
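A toy sketch of that decision step (the weights, bias, and threshold here are invented for illustration, not learned):

```python
import numpy as np

# Flattened feature vector: [horizontal-line response, edge response].
features = np.array([765.0, 563.0])

# Hypothetical weights: this output neuron only cares
# about the horizontal-line feature.
weights = np.array([0.001, 0.0])
bias = -0.5

score = float(features @ weights + bias)   # 0.765 - 0.5 = 0.265
has_horizontal_line = score > 0
print(has_horizontal_line)  # True
```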
Identifying a guinea pig in real-world scenarios typically requires hundreds to thousands of filters, distributed across different layers to detect both low-level details and high-level patterns.
- Final Flow Overview

- Performance and vectors
Why vectors? Representing features as vectors is a design choice that plays to the hardware acceleration strengths of GPUs, which combine:
- A high core count (thousands of cores) enabling massive parallelism.
- Hardware optimized for matrix and vector operations.
- Massive memory bandwidth for handling large amounts of data.
Over&Out