---
marp: true
---

Convolutional Neural Networks


Learning objectives

  • Discover the general architecture of convolutional neural networks.
  • Understand why they perform better than plain neural networks for image-related tasks.

Architecture


Justification

The visual world has the following properties:

  • Translation invariance.
  • Locality: nearby pixels are more strongly correlated than distant ones.
  • Spatial hierarchy: complex and abstract concepts are composed from simple, local elements.

Classical models are not designed to detect local patterns in images.

Visual world


Topological structure

From edges to objects


General CNN design

General CNN architecture


The convolution operation

Apply a kernel to the data. The result is called a feature map.

Convolution with a 3x3 filter of depth 1 applied on 5x5 data
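
A minimal NumPy sketch of this operation (the `conv2d_valid` helper and the random 5x5 / 3x3 values are illustrative, not part of the course material):

```python
import numpy as np

def conv2d_valid(data, kernel):
    """Slide a kernel over 2D data (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = data.shape[0] - kh + 1
    out_w = data.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product between the kernel and the current window, then sum
            feature_map[i, j] = np.sum(data[i:i + kh, j:j + kw] * kernel)
    return feature_map

data = np.random.rand(5, 5)    # 5x5 input
kernel = np.random.rand(3, 3)  # 3x3 filter of depth 1
print(conv2d_valid(data, kernel).shape)  # (3, 3)
```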


Convolution example


Convolution parameters

  • Filter dimensions: 2D for images.
  • Filter size: generally 3x3 or 5x5.
  • Number of filters: determines the number of feature maps created by the convolution operation.
  • Stride: step for sliding the convolution window. Generally equal to 1.
  • Padding: blank rows/columns with all-zero values added on the sides of the input feature map.
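
As a hedged illustration, these parameters map directly onto the arguments of a Keras Conv2D layer (the concrete values below are arbitrary examples):

```python
from tensorflow.keras import layers

conv = layers.Conv2D(
    filters=32,          # number of filters: the output will have 32 feature maps
    kernel_size=(3, 3),  # filter size
    strides=(1, 1),      # step for sliding the convolution window
    padding="same",      # "valid" (no padding) or "same" (pad to preserve spatial size)
)
```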

Preserving output dimensions with padding

Preserving output dimensions with padding


Valid padding

Output size = input size - kernel size + 1

Valid padding


Full padding

Output size = input size + kernel size - 1

Full padding


Same padding

Output size = input size

Same padding
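
A small sketch checking the three formulas above for a 5x5 input and a 3x3 kernel with a stride of 1 (the `output_size` helper is illustrative):

```python
def output_size(input_size, kernel_size, padding):
    """Output size of a stride-1 convolution for the three padding modes above."""
    if padding == "valid":
        return input_size - kernel_size + 1  # 5 - 3 + 1 = 3
    if padding == "full":
        return input_size + kernel_size - 1  # 5 + 3 - 1 = 7
    if padding == "same":
        return input_size                    # 5
    raise ValueError(f"unknown padding mode: {padding}")

for mode in ("valid", "full", "same"):
    print(mode, output_size(5, 3, mode))  # valid 3, full 7, same 5
```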


Convolution inputs and outputs

Convolution inputs and outputs


2D convolutions on 3D tensors

  • Convolution input data is 3-dimensional: images with height, width and color channels, or feature maps produced by previous layers.
  • Each convolution filter is a collection of kernels with distinct weights, one for every input channel.
  • At each location, every input channel is convolved with the corresponding kernel. The results are summed to compute the (scalar) filter output for the location.
  • Sliding one filter over the input data produces a 2D output feature map.

2D convolution on a 32x32x3 image with 10 filters
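
A hedged Keras sketch matching the figure above: 10 filters applied to a 32x32x3 image. Each filter holds one 3x3 kernel per input channel, and the layer outputs one feature map per filter.

```python
import tensorflow as tf
from tensorflow.keras import layers

images = tf.random.normal((1, 32, 32, 3))  # batch of one 32x32 RGB image
conv = layers.Conv2D(filters=10, kernel_size=3, padding="same")
print(conv(images).shape)  # (1, 32, 32, 10): one feature map per filter
print(conv.kernel.shape)   # (3, 3, 3, 10): one 3x3 kernel per input channel, per filter
```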


2D convolution over RGB image


Activation function

  • Applied to the (scalar) convolution result.
  • Introduces non-linearity in the model.
  • Standard choice: ReLU.
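
For instance, ReLU keeps positive values and zeroes out negative ones; a minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x), applied element-wise to the convolution output
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.5, 3.0])))  # [0.  0.5 3. ]
```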

The pooling operation

  • Reduces the dimensionality of feature maps.
  • Often done by selecting maximum values (max pooling).

Max pooling with 2x2 filter and stride of 2
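
A minimal NumPy sketch of this operation (the `max_pool_2x2` helper is illustrative):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block (stride 2)."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]  # drop the last row/column if size is odd
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))
# [[ 5  7]
#  [13 15]]
```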


Pooling result

Pooling result


Pooling output

Pooling with a 2x2 filter and stride of 2 on 10 32x32 feature maps


Training process

Same principle as a dense neural network: backpropagation + gradient descent.

For convolution layers, the learned parameters are the values of the different kernels.
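
As a hedged illustration, the number of trainable parameters follows directly from the kernels: with 10 filters of size 3x3 on a 3-channel input, each filter learns 3x3x3 weights plus one bias, i.e. (3 × 3 × 3 + 1) × 10 = 280 parameters (assuming a Keras-style layer with biases).

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),            # 32x32 RGB input
    layers.Conv2D(filters=10, kernel_size=3),  # 10 filters of size 3x3
])
model.summary()  # Conv2D trainable parameters: (3 * 3 * 3 + 1) * 10 = 280
```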

Backpropagation In Convolutional Neural Networks


Interpretation

  • Convolution layers act as feature extractors.
  • Dense layers use the extracted features to classify data.

A convnet
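
A hedged Keras sketch of such a convnet for 28x28 grayscale images (e.g. MNIST, used below); the layer sizes are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    # Feature extraction: alternating convolution and pooling layers
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    # Classification: dense layers applied to the flattened feature maps
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
```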


Feature extraction with a CNN


Visualizing convnet layers on MNIST


History

Humble beginnings: LeNet5 (1998)

LeNet5


Bell Labs demo


The breakthrough: ILSVRC

ILSVRC results


AlexNet (2012)

Trained on 2 GPUs for 5 to 6 days.

AlexNet


VGG (2014)

VGG16


GoogLeNet/Inception (2014)

  • 9 Inception modules, more than 100 layers.
  • Trained on several GPUs for about a week.

Inception


Microsoft ResNet (2015)

  • 152 layers, trained on 8 GPUs for 2 to 3 weeks.
  • Smaller error rate than an average human.

ResNet


Deeper model


Depth: challenges and solutions

  • Challenges

    • Computational complexity
    • Optimization difficulties
  • Solutions

    • Careful initialization
    • Sophisticated optimizers
    • Normalization layers
    • Network design
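
As one hedged illustration of the "network design" point above: a residual connection (the key idea behind ResNet, shown earlier) adds the input of a block to its output, which eases the optimization of very deep networks. The sketch below uses the Keras functional style; the filter count is illustrative and batch normalization is omitted for brevity.

```python
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """Two 3x3 convolutions plus a skip connection adding the block input to its output.

    Assumes x already has `filters` channels so the addition is valid.
    """
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])  # skip connection: output = F(x) + x
    return layers.Activation("relu")(y)
```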

Using a pretrained network


An efficient strategy

A pretrained convnet is a saved network that was previously trained on a large dataset (typically on a large-scale image classification task). If the training set was general enough, it can act as a generic model and its learned features can be useful for many problems.

It is an example of transfer learning.

There are two ways to use a pretrained model: feature extraction and fine-tuning.


Feature extraction

Reuse the convolution base of a pretrained model, and add a custom classifier trained from scratch on top of it.

State-of-the-art models (VGG, ResNet, Inception...) are regularly published by top AI institutions.
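
A hedged Keras sketch of feature extraction with a pretrained VGG16 convolution base (input shape, classifier size and the binary output are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

# Convolution base pretrained on ImageNet, without its original dense classifier
conv_base = VGG16(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
conv_base.trainable = False  # freeze the pretrained weights

model = keras.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # custom classifier trained from scratch
    layers.Dense(1, activation="sigmoid"),  # e.g. a binary classification task
])
```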

Fine-tuning

Slightly adjusts the top feature extraction layers of the reused model, to make them more relevant to the new context.

These top layers and the custom classification layers on top of them are jointly trained.
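
Continuing the sketch above: unfreeze the top convolution block of the pretrained base and retrain it jointly with the classifier at a low learning rate (the "block5" prefix matches the layer names of the Keras VGG16 implementation; the learning rate is illustrative).

```python
from tensorflow import keras

# Unfreeze only the last convolution block of VGG16, keep the earlier layers frozen
conv_base.trainable = True
for layer in conv_base.layers:
    layer.trainable = layer.name.startswith("block5")

# Recompile with a low learning rate so the pretrained weights are only slightly adjusted
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```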


Fine-tuning