CONVOLUTIONAL NEURAL NETWORKS

In this blog post I introduce convolutional neural networks (CNNs), a deep learning architecture specialized for computer vision.


When processing images, CNNs outperform feedforward networks (FNNs) because they overcome the curse of dimensionality.


FNNs assign an artificial neuron to each pixel in an image, and, because FNNs are densely interconnected, the number of weights required to process a large image grows disproportionately. This results in poorly trained FNNs and a taxing computational overhead: the curse of dimensionality (Figure 1).


Figure 1. Image processing by a feedforward network. Image adapted from the cover of the novel Tres tristes tigres, written by Guillermo Cabrera Infante. https://www.penguinlibros.com/es/literatura-contemporanea/261521-libro-tres-tristes-tigres-9788420451466

Before I explain how CNNs process images efficiently, we must first review the concept of the artificial neuron, which I covered in a previous blog post (WHAT IS AN ARTIFICIAL NEURAL NETWORK?).


Artificial neurons were first described in the 1940s and later, in the 1950s, implemented in the perceptron, a single-neuron model that performs binary classification tasks (Figure 2).


Figure 2. Perceptron.

When activated, the perceptron receives a collection of numeric inputs, each multiplied by a tunable weight (W). Inside the artificial neuron (perceptron), a bias constant (B) is added to the weighted sum of the inputs. Because the perceptron is a binary classifier, this biased sum (W + B) is transformed by a Heaviside step function, which generates a yes/no output with 1/0 values (Figure 2).
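The forward pass described above can be sketched in a few lines of Python. The weights and bias below are illustrative, hand-picked values, not trained ones:

```python
# Minimal perceptron sketch: weighted sum of inputs, plus a bias,
# passed through a Heaviside step function.

def perceptron(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Heaviside step: fire (1) if the biased sum is non-negative, else 0.
    return 1 if weighted_sum + bias >= 0 else 0

# Example: two inputs with weights and bias chosen so the neuron
# behaves like a logical AND.
weights = [1.0, 1.0]
bias = -1.5
print(perceptron([1, 1], weights, bias))  # 1
print(perceptron([1, 0], weights, bias))  # 0
```

In a trained model, `weights` and `bias` would be the values found by backpropagation rather than constants chosen by hand.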


The weight (W) and bias (B) constants, which control neuron activation strength in the perceptron (and in all other artificial neurons), are fine-tuned by backpropagation when the model is trained (WHAT IS AN ARTIFICIAL NEURAL NETWORK?).


Depending on the type of layer in the network, different activation functions are used (Figure 3).

Hidden layers:

  • Rectified linear unit (ReLU) — 0 to infinity

  • Hyperbolic tangent (tanh) — -1 to 1

Output layers:

  • Sigmoid function — 0 to 1

  • Softmax — 0 to 1
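The four activation functions listed above can be written directly from their definitions; this is a minimal sketch using only the Python standard library:

```python
import math

def relu(x):
    # Hidden layers: maps x to [0, infinity)
    return max(0.0, x)

def tanh(x):
    # Hidden layers: maps x to (-1, 1)
    return math.tanh(x)

def sigmoid(x):
    # Output layer (binary classification): maps x to (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Output layer (multi-class): values in (0, 1) that sum to 1.
    # Subtracting the max before exponentiating is a standard
    # numerical-stability trick.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(sigmoid(0.0))            # 0.5
print(sum(softmax([1.0, 2.0, 3.0])))  # 1.0 (probabilities sum to one)
```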


Figure 3. Activation functions used in deep learning networks.


Below I explain the CNN architecture and the approach it uses to process images. I will walk you through six sections:

  1. Image input

  2. CNN architecture

  3. Convolving filter

  4. Activation map

  5. Pooling layer

  6. FNN classifier



  1. IMAGE INPUT

When processing an image, CNNs ingest normalized pixel values, which have been previously embedded in a multidimensional matrix, also called a tensor (Figure 4). In the three-dimensional (3D) tensors most commonly used in deep learning, each slice along the third dimension corresponds to a primary color channel: red (R), green (G), and blue (B) (Figure 4).


Figure 4. Image input in CNNs. Each channel is represented by a matrix of normalized pixel values.
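As a concrete illustration, here is a tiny, made-up 2x2 RGB image embedded in a 3D tensor (channels x height x width), with the raw 8-bit pixel values (0-255) normalized to the [0, 1] range:

```python
# Raw 8-bit pixel values for a hypothetical 2x2 image, one matrix per channel.
raw = {
    "R": [[255, 0], [128, 64]],
    "G": [[0, 255], [128, 64]],
    "B": [[0, 0], [255, 32]],
}

# Normalize each pixel to [0, 1] and stack the channels into a 3D tensor.
tensor = [[[value / 255.0 for value in row] for row in raw[channel]]
          for channel in ("R", "G", "B")]

print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # 3 2 2
print(tensor[0][0][0])  # 1.0 (red channel, top-left pixel)
```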


  2. CNN ARCHITECTURE

The CNN architecture has two modules: a body composed of multiple convolutional and pooling layers; and a head, which carries one or more FNNs (Figure 5).


Figure 5. CNN architecture.

Arranged in sequence, blocks of convolutional and pooling layers extract features from an input image at increasing levels of detail.


The FNN in the head of a CNN processes the features extracted by the convolutional layers and uses this information to classify the input image.



  3. CONVOLVING FILTER

Convolutional layers use one or more convolving filters to extract features from the pixels in an input image (Figure 6). The region of an input image covered by a convolving filter is called the receptive field.


Figure 6. Convolving filter.

Convolving filters are N-dimensional matrices populated with tunable weights, typically arranged in a 3x3 or 5x5 grid. Each slice of the filter is a kernel, and each kernel corresponds to one color channel (dimension) of the input tensor (Figure 7).


Figure 7. Kernels in a convolving filter.

During feature extraction, the convolving filter “strides” across the input matrix, one or more grid cells (steps) at a time (Figure 8).


Figure 8. Convolving filter strides.
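The stride length determines how many positions the filter visits, and therefore the size of the resulting activation map. A small sketch of the standard size formula, assuming no padding (a common simplification):

```python
# Number of positions a k x k filter visits along one axis of an n x n input,
# with a given stride and no padding: (n - k) // stride + 1.

def output_size(n, k, stride):
    return (n - k) // stride + 1

# A 3x3 filter sliding over a 7x7 input:
print(output_size(7, 3, stride=1))  # 5 -> a 5x5 activation map
print(output_size(7, 3, stride=2))  # 3 -> a 3x3 activation map
```

Larger strides shrink the output faster, trading spatial detail for fewer computations.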

  4. ACTIVATION MAP

The activation map in a convolutional layer is an N-dimensional matrix with artificial neurons embedded in it. The neurons in the activation map capture the features extracted by the convolving filter.


Every stride the convolving filter makes produces a dot product between its weights and the pixel intensities in the receptive field. The outcome is a weighted sum (W) to which a bias constant is added (W + Bias) (Figure 9).


Each neuron in the activation map is activated when its signal intensity (W + Bias) is transformed by the ReLU function (Figure 9).


CNNs learn hierarchically by using multiple convolving filters per convolutional layer. Each convolving filter is trained to identify a specific type of content in the input image (e.g., edges or shapes) and has a corresponding activation map. This is why each convolutional layer in a CNN produces multiple activation maps.


Figure 9. Activation map in a CNN layer.
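Putting the last two sections together, here is a minimal sketch of one convolutional step on a single-channel input: the filter strides across the image, takes a dot product at each receptive field, adds a bias, and applies ReLU. The kernel weights and bias are illustrative; in a real CNN they are learned during training:

```python
# Convolve a single-channel image with a square kernel, then apply ReLU.

def conv2d_relu(image, kernel, bias, stride=1):
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    activation_map = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Dot product of the kernel with the current receptive field.
            s = sum(kernel[a][b] * image[i * stride + a][j * stride + b]
                    for a in range(k) for b in range(k))
            row.append(max(0.0, s + bias))  # ReLU activation
        activation_map.append(row)
    return activation_map

# A vertical-edge-like kernel on a 4x4 image with a bright central stripe.
image = [[0, 1, 1, 0]] * 4
kernel = [[1, 0, -1]] * 3
print(conv2d_relu(image, kernel, bias=0.0))  # [[0.0, 3.0], [0.0, 3.0]]
```

Note how ReLU zeroes the negative responses, so only receptive fields matching the filter's pattern light up in the activation map.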


  5. POOLING LAYER

To stabilize the neural network, pooling layers reduce the dimensionality of each convolutional layer's output. Max pooling is the preferred approach in CNNs: the neurons with the highest activations are kept in the downsized activation map (Figure 10).

Figure 10. Max Pooling. The activation map becomes the next convolutional layer in a CNN. Downsizing the output of an activation map stabilizes the CNN.
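Max pooling is simple enough to sketch directly. The common 2x2 window with stride 2 keeps only the strongest activation in each region, halving each spatial dimension:

```python
# 2x2 max pooling with stride 2 over a 2D activation map.

def max_pool_2x2(activation_map):
    pooled = []
    for i in range(0, len(activation_map) - 1, 2):
        row = []
        for j in range(0, len(activation_map[0]) - 1, 2):
            # Keep the maximum activation within the 2x2 window.
            row.append(max(activation_map[i][j], activation_map[i][j + 1],
                           activation_map[i + 1][j], activation_map[i + 1][j + 1]))
        pooled.append(row)
    return pooled

amap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
print(max_pool_2x2(amap))  # [[4, 2], [2, 8]]
```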

  6. FNN CLASSIFIER

The convolutional layers in a CNN are multidimensional. Because FNNs ingest a one-dimensional (1D) vector, the output from the last convolutional layer in a CNN must be flattened (Figure 11).


Flattening a convolutional layer entails reshaping the multidimensional matrix (tensor) so that its numeric values are concatenated into a 1D vector.
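Flattening is just a concatenation of all the tensor's values in a fixed order; a minimal sketch:

```python
# Flatten a 3D tensor (channels x height x width) into the 1D vector
# that a feedforward classifier can ingest.

def flatten(tensor):
    return [value
            for channel in tensor
            for row in channel
            for value in row]

tensor = [[[1, 2], [3, 4]],   # channel 1
          [[5, 6], [7, 8]]]   # channel 2
print(flatten(tensor))  # [1, 2, 3, 4, 5, 6, 7, 8]
```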


By abstracting the information it finds in the flattened input, the FNN refines the features previously extracted by the upstream convolutional layers. The deeper the hidden layers in the FNN, the more refined the classification of the input image.


Figure 11. FNN classifier.


CNNs IN BIOINFORMATICS


In shotgun proteomics, CNNs are used to predict fragment ion masses and retention times. Relevant examples are MS2CNN for fragment ion prediction and DeepLC for retention time prediction.


In actionable genomics, CNNs are used to predict genomic variants in DNA-seq reads. An example is DeepVariant.



FURTHER READING AND WATCHING

There is obviously much more to say about CNNs.

Useful resources include:

LearnOpenCV


DeepLearningAI—C4W1L01 Computer Vision


MIT 6.S191: Convolutional Neural Networks


Stay tuned!

GPR


Disclosure: At BioTech Writing and Consulting we believe in the use of AI in Data Science, but we do not use AI to generate text or images.
