Traditional Convolutional Neural Network Architectures
In the 1990s, Yann LeCun developed the first applications of Convolutional Networks. His paper ''Gradient-based learning applied to document recognition'' documents LeNet-5, the first applied Convolutional Neural Network.
This paper is historically important for Convolutional Neural Networks. In it he states:
''Multilayer Neural Networks trained with the backpropagation algorithm constitute the best example of a successful Gradient-Based Learning technique. Given an appropriate network architecture, Gradient-Based Learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns such as handwritten characters, with minimal preprocessing.''
While reviewing methods for handwritten character recognition in this paper, LeCun demonstrated that Convolutional Neural Networks outperform other methods, because they are designed to deal with 2D shapes. (1) In the course of this research he created LeNet, the first Convolutional Neural Network architecture. In this chapter on traditional CNN architectures we will look at how modules are combined. These combinations are based on ''What is the Best Multi-Stage Architecture for Object Recognition?'', another paper published by Yann LeCun in 2009. The next step will be a look at the LeNet architecture.
Layers in Traditional Convolutional Neural Network Architectures
Generally, the architecture aims to build a hierarchical structure for fast feature extraction and classification. This hierarchy consists of several layers: a filter bank layer, a non-linear transformation layer, and a pooling layer. The pooling layer combines filter responses over local neighborhoods by averaging them or taking their maximum, which makes the representation invariant to small distortions.(2)
Traditional architectures differ from the modern ones. Below is a list with short descriptions of the layers used to build traditional CNN models.
- Filter Bank Layer $F_{CSG}$: This layer acts as a special form of convolutional layer. The only addition is that the convolution output is squashed by the $\tanh$ function and scaled by a trainable gain coefficient. The layer calculates the output $y_j$ with:
$$y_j = g_j \tanh\left(\sum_i k_{ij} * x_i\right)$$
where the $x_i$ are the input feature maps, the $k_{ij}$ are the filter kernels, and the $g_j$ are the gain coefficients (a minimal code sketch follows this list).
- Rectification Layer $R_{abs}$: This layer applies the pointwise absolute value to its input.
- Local Contrast Normalization Layer $N$: This layer performs local subtractive and divisive normalizations. It enforces local competition between features in a feature map and between features at the same spatial location in different feature maps.
- Average Pooling and Subsampling Layer $P_A$
- Max-Pooling and Subsampling Layer $P_M$
Information on the Convolutional, Pooling and Rectification Layers can be found here.
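To make the $F_{CSG}$ definition concrete, here is a minimal NumPy/SciPy sketch of this layer. The function name f_csg and the array shapes are our own illustrative choices, not from the original papers.

```python
import numpy as np
from scipy.signal import correlate2d

def f_csg(x, kernels, gains):
    """F_CSG filter bank layer: y_j = g_j * tanh(sum_i k_ij * x_i).

    x:       (n_in, H, W) input feature maps x_i
    kernels: (n_out, n_in, kH, kW) filter bank k_ij
    gains:   (n_out,) trainable gain coefficients g_j
    """
    maps = []
    for j in range(kernels.shape[0]):
        # sum of 'valid' cross-correlations of every input map with k_ij
        acc = sum(correlate2d(x[i], kernels[j, i], mode="valid")
                  for i in range(x.shape[0]))
        maps.append(gains[j] * np.tanh(acc))
    return np.stack(maps)
```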
Combination of Modules in Traditional Architecture:
Different modules can be built from these layers. A feature extraction stage is formed by a filtering layer combined with different combinations of rectification, normalization and pooling layers. Most of the time one or two stages of feature extraction followed by a classifier are enough to make an architecture for recognition.(3)
- $F_{CSG} - P_A$: This combination is one of the most common blocks for building traditional convolutional networks. Several sequences of $F_{CSG} - P_A$ followed by a linear classifier add up to a complete traditional network.
Figure 1: the structure of $F_{CSG} - P_A$
- $F_{CSG} - R_{abs} - P_A$: In this module the filter bank layer is followed by a rectification layer and an average pooling layer. The input values are squashed by $\tanh$, then the non-linear absolute value is calculated, and finally the average is taken and downsampled.
Figure 2: the structure of $F_{CSG} - R_{abs} - P_A$
- $F_{CSG} - R_{abs} - N - P_A$: This module is very similar to the previous one; the only difference is that a local contrast normalization layer is added between the rectification layer and the average pooling layer. Compared to the previous module, after the calculation of the non-linear absolute value the values are normalized and sent to the pooling layer, where their average is taken and downsampled.
Figure 3: the structure of $F_{CSG} - R_{abs} - N - P_A$ (Image source (4))
- $F_{CSG} - P_M$: This module is another common module for convolutional networks. It forms the basis of the HMAX architecture.
Figure 4: the structure of $F_{CSG} - P_M$
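To make the module combinations concrete, here is a minimal NumPy/SciPy sketch of the remaining layers and of one complete $F_{CSG} - R_{abs} - N - P_A$ stage, reusing f_csg from above. The normalization here is a simplified per-map version of $N$ (the original layer also normalizes across feature maps); all function names and the neighborhood size are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def r_abs(x):
    # R_abs rectification layer: pointwise absolute value
    return np.abs(x)

def n_local(x, size=3, eps=1e-8):
    # simplified local contrast normalization N: subtractive, then divisive
    # normalization over size x size neighborhoods within each feature map
    mean = uniform_filter(x, size=(1, size, size))
    centered = x - mean
    std = np.sqrt(uniform_filter(centered ** 2, size=(1, size, size)))
    return centered / np.maximum(std, eps)

def p_a(x, size=2):
    # P_A: average pooling over non-overlapping size x size windows
    n, H, W = x.shape
    Hc, Wc = H - H % size, W - W % size
    x = x[:, :Hc, :Wc].reshape(n, Hc // size, size, Wc // size, size)
    return x.mean(axis=(2, 4))

def p_m(x, size=2):
    # P_M: max pooling over non-overlapping size x size windows
    n, H, W = x.shape
    Hc, Wc = H - H % size, W - W % size
    x = x[:, :Hc, :Wc].reshape(n, Hc // size, size, Wc // size, size)
    return x.max(axis=(2, 4))

def stage_fcsg_rabs_n_pa(x, kernels, gains):
    # one F_CSG - R_abs - N - P_A feature extraction stage (Figure 3)
    return p_a(n_local(r_abs(f_csg(x, kernels, gains))))
```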
Modern Convolutional Neural Network Architecture:
This chapter offers basic knowledge on how to build simple, reliable modern architectures and demonstrates some well-known examples from the literature.
Layers used in Modern Convolutional Neural Networks:
Layers in modern architectures are very similar to the traditional layers, yet there are certain differences; in particular, ReLU is a special implementation of the rectification layer (a one-line sketch follows). You can find more information about ReLU and the Fully Connected Layer here.
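As a small illustration of this difference, a ReLU can be sketched in NumPy as follows (the function name relu is our own choice):

```python
import numpy as np

def relu(x):
    # ReLU rectification: elementwise max(0, x); unlike R_abs it keeps
    # positive values unchanged and zeroes out negative values
    return np.maximum(0.0, x)
```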
For a simple Convolutional Network the following layers are used:
- Input Layer
- Convolutional Layer
- RELU Layer
- Pooling Layer
- Fully Connected Layer
The main idea is that at the start the neural network architecture takes the input, which is an image of size $w \times h \times d$ (width, height, and depth, e.g. $d = 3$ color channels).
A few examples for building a net architecture:
- only a single Fully Connected Layer: this is just a linear classifier
- Convolutional → RELU → Fully Connected
- Convolutional → RELU → Pooling → Convolutional → RELU → Pooling → Fully Connected → RELU → Fully Connected: a single Convolutional Layer between every Pooling Layer
- Convolutional → RELU → Convolutional → RELU → Pooling → Convolutional → RELU → Convolutional → RELU → Pooling → Convolutional → RELU → Convolutional → RELU → Pooling → Fully Connected → RELU → Fully Connected → RELU → Fully Connected: this architectural form has two convolutional layers before each Pooling Layer, which is useful when building large and deep networks, because multiple convolutional layers can extract more detailed and complex features of the input before it is sent to the Pooling Layer, where some portion of the information is lost (a sketch follows this list).
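Below is a minimal PyTorch sketch of the last pattern, shortened to two convolutional stages. The channel counts, the $32 \times 32 \times 3$ input and the 10 output classes are illustrative assumptions, not from the original text.

```python
import torch
import torch.nn as nn

# two [Conv -> ReLU -> Conv -> ReLU -> Pool] stages followed by the
# Fully Connected -> RELU -> ... tail of the last example above
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# for a 32x32 input, two 2x2 poolings leave 8x8 maps, hence 64 * 8 * 8
# inputs to the first Fully Connected Layer
scores = model(torch.randn(1, 3, 32, 32))  # -> tensor of shape (1, 10)
```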
How to build the layers:
Convolutional Layer: Generally, we want to use small filters. When building layers, stacks of smaller convolutional filters are preferred over a single layer with a large receptive field. Assume that we have three connected $3 \times 3$ convolutional layers on top of each other (with non-linearities between them). Each neuron in the first layer sees a $3 \times 3$ region of the input; a neuron in the second layer sees a $3 \times 3$ region of the first layer and therefore a $5 \times 5$ region of the input; a neuron in the third layer correspondingly sees a $7 \times 7$ region of the input. The stack thus has the same effective receptive field as a single $7 \times 7$ convolutional layer, but it expresses more powerful features because of the additional non-linearities, and it needs fewer parameters: with $C$ channels per layer, a single $7 \times 7$ layer has $C \times (7 \times 7 \times C) = 49C^2$ weights, while the three $3 \times 3$ layers together have only $3 \times (C \times (3 \times 3 \times C)) = 27C^2$.
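A quick sanity check of this parameter count, with an assumed channel width of $C = 64$:

```python
C = 64  # assumed number of channels, kept constant across layers
single_7x7 = C * (7 * 7 * C)        # 49 * C**2 = 200704 weights
three_3x3 = 3 * (C * (3 * 3 * C))   # 27 * C**2 = 110592 weights
print(three_3x3 / single_7x7)       # ~0.55: the stack needs ~45% fewer
```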
Pooling Layer: Max-pooling with $2 \times 2$ receptive fields and a stride of $2$ is the most common form. It downsamples every depth slice of the input by a factor of $2$ along both width and height, discarding $75\%$ of the activations.
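A small self-contained NumPy illustration of this $2 \times 2$ max-pooling and its $75\%$ reduction:

```python
import numpy as np

x = np.arange(16.0).reshape(4, 4)           # a single 4x4 feature map
y = x.reshape(2, 2, 2, 2).max(axis=(1, 3))  # 2x2 max-pooling with stride 2
print(x.size, "->", y.size)                 # 16 -> 4: 75% of activations discarded
```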
Literature
(1) LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 1998.
(2) Jarrett, K., Kavukcuoglu, K., Ranzato, M., LeCun, Y.: What is the Best Multi-Stage Architecture for Object Recognition? Proceedings of ICCV 2009.
(3) See (2).
(4) See (2); source of the figure images.