In the research fields of Artificial Intelligence (AI), Deep Learning and CNNs, “overfitting” is a common problem. It generally describes the use of models or procedures that include more terms than necessary or use overly complex approaches. It can be categorized into two types:
- Using a network that is more flexible than it needs to be.
- Using a model that includes irrelevant components.
There are several reasons why overfitting should be avoided:
- Unused predictors waste resources.
- Overly complex models increase the probability of prediction mistakes.
- Irrelevant predictors add random variation to subsequent predictions.
- Overly complex models show low portability to other situations, as they are fitted too closely to one specific environment.
To avoid overfitting in CNN models, this WIKI proposes the Dropout method (1).
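Dropout randomly deactivates units during training so that the network cannot rely on any single unit, which acts as a regularizer against overfitting. The following is a minimal sketch in plain Python of the commonly used "inverted" variant, which rescales the surviving activations during training instead of rescaling the weights at test time. The function name `dropout` and its parameters are illustrative assumptions, not the API of any particular framework.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], p: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout (illustrative sketch): zero each unit with probability p.

    Surviving units are scaled by 1 / (1 - p) so that the expected value of
    each activation is unchanged; at inference time the input passes through.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # Each unit is kept independently with probability 1 - p.
    return [a * scale if rng.random() >= p else 0.0 for a in activations]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    # Training: some units are zeroed, the others are doubled (p = 0.5).
    print(dropout(x, p=0.5, rng=random.Random(0)))
    # Inference: the activations are returned unchanged.
    print(dropout(x, p=0.5, training=False))
```

In practice, deep learning frameworks provide dropout as a ready-made layer; the drop probability `p` is a hyperparameter, and values around 0.5 are a common starting point for fully connected layers.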
Once deeper networks are able to start converging, a degradation problem is exposed: as the network depth increases, accuracy saturates and then degrades rapidly. This degradation is not caused by overfitting; instead, adding more layers to a suitably deep model leads to a higher training error. More detailed information on this problem can be found in (2).
To address this problem, deep residual networks have been proposed (2).
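The core idea in (2) is to let a stack of layers learn a residual function F(x) = H(x) - x instead of the desired mapping H(x) directly, and to add the input back through an identity shortcut, so that the block outputs F(x) + x. If an identity mapping is optimal, the layers only have to drive F(x) towards zero. The sketch below illustrates this with a toy fully connected block in plain Python. It is a simplification under stated assumptions: the blocks in (2) use convolutions and batch normalization, and the helper names here are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Toy residual block: y = relu(F(x) + x) with F(x) = w2 @ relu(w1 @ x).

    The identity shortcut requires F(x) to have the same size as x.
    """
    f = matvec(w2, relu(matvec(w1, x)))
    if len(f) != len(x):
        raise ValueError("identity shortcut requires matching dimensions")
    return relu([fi + xi for fi, xi in zip(f, x)])


if __name__ == "__main__":
    x = [0.5, 1.0, 2.0]
    zeros = [[0.0] * 3 for _ in range(3)]
    # With all-zero weights F(x) = 0, so the block passes (non-negative) x
    # through unchanged: an extra block does not have to hurt the model.
    print(residual_block(x, zeros, zeros))  # [0.5, 1.0, 2.0]
```

The design choice worth noting is that an identity mapping is the "default" of such a block, whereas a plain stack of layers has to learn the identity explicitly.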
Another problem in the field of image classification is spatial invariance, as objects can vary in size, rotation and several other transformations. So far, the CNNs presented in this WIKI show only limited robustness to such transformed objects.
Have a look at Spatial Transformer Networks for an approach to resolving this issue and for more information on spatial invariance.
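A Spatial Transformer (Jaderberg et al., 2015) consists of a localization network that predicts transformation parameters, a grid generator and a differentiable sampler. The minimal sketch below shows only the last two steps, for a fixed 2x3 affine matrix `theta` on a single-channel image, using normalized coordinates in [-1, 1] and bilinear interpolation. In a real network, `theta` would be predicted by the localization network and learned end to end; the function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]
Grid = List[List[Tuple[float, float]]]


def affine_grid(theta: Sequence[Sequence[float]], height: int, width: int) -> Grid:
    """Map each output pixel's normalized coords (x_t, y_t) in [-1, 1] to
    source coords (x_s, y_s) = theta @ (x_t, y_t, 1); theta is 2x3."""
    grid = []
    for i in range(height):
        y_t = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x_t = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(image: Image, grid: Grid) -> Image:
    """Sample image at normalized grid positions; pixels outside count as 0."""
    h, w = len(image), len(image[0])

    def pixel(r: int, c: int) -> float:
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Convert normalized coordinates back to fractional pixel indices.
            u = (x_s + 1.0) * (w - 1) / 2.0
            v = (y_s + 1.0) * (h - 1) / 2.0
            c0, r0 = math.floor(u), math.floor(v)
            du, dv = u - c0, v - r0
            # Weighted average of the four neighbouring pixels.
            value = (pixel(r0, c0) * (1 - du) * (1 - dv)
                     + pixel(r0, c0 + 1) * du * (1 - dv)
                     + pixel(r0 + 1, c0) * (1 - du) * dv
                     + pixel(r0 + 1, c0 + 1) * du * dv)
            out_row.append(value)
        out.append(out_row)
    return out
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the output equals the input, `[[-1, 0, 0], [0, 1, 0]]` mirrors it horizontally, and scaling the first two columns (for example by 0.5) zooms into the centre of the image.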
Long training times / performance issues
Large networks need long training times. Complex models can take days to train, depending on the hardware they run on. In addition, CNN complexity has increased steadily in the past, meaning more layers, more filters and other processing steps, and this trend can be expected to continue for future networks. Growing training data sets further increase the already high computational demand of CNNs. As a result, researchers have to optimize the performance of their networks in order to keep training times reasonable.
Many approaches therefore train networks on GPUs.
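To get a feeling for how quickly the computational cost grows with more layers and filters, the multiply-accumulate operations (MACs) of a convolutional layer can be estimated: each of the H_out × W_out × C_out output values needs k × k × C_in MACs. The following sketch counts only the forward pass of the convolutions (no bias, activation, pooling or backward pass), and the layer sizes and data set size are hypothetical examples.

```python
from typing import Iterable, Tuple

# (h_out, w_out, c_in, c_out, k) for one convolutional layer (hypothetical layout)
ConvLayer = Tuple[int, int, int, int, int]


def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Forward-pass multiply-accumulates of one k x k convolution (no bias)."""
    # Each of the h_out * w_out * c_out outputs needs k * k * c_in MACs.
    return h_out * w_out * c_out * k * k * c_in


def network_macs(layers: Iterable[ConvLayer]) -> int:
    """Sum of the convolution MACs of all layers for a single input image."""
    return sum(conv2d_macs(*layer) for layer in layers)


if __name__ == "__main__":
    layer = (224, 224, 64, 64, 3)
    print(conv2d_macs(*layer))  # 1849688064 MACs for one image
    wider = (224, 224, 128, 128, 3)
    # Doubling the filters doubles both c_in and c_out: about 4x the cost.
    print(conv2d_macs(*wider) // conv2d_macs(*layer))  # 4
    images_per_epoch = 1_000_000  # hypothetical training set size
    print(network_macs([layer, wider]) * images_per_epoch)
```

Multiplied by the number of training images and epochs, even this rough count shows why hardware that executes many of these operations in parallel, such as GPUs, is attractive for training.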
Gathering training data
As CNNs become able to distinguish between more and more features while also detecting them more accurately, it is getting harder to provide enough labeled training data for new fields of application. So far, there are several well-known labeled data sets, such as MNIST, PASCAL and ImageNet.
These data sets are well suited to topics like handwriting recognition and common image processing tasks. Unusual tasks, on the other hand, are not covered by such large data sets, so dedicated data sets have to be created for them. Unfortunately, constructing and verifying such data sets can be a time- and labor-intensive task.
One proposed approach to this issue is generative adversarial networks (GANs), which learn to generate new, synthetic samples that resemble an existing labeled data set and can thus be used to extend the training data.
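In the original GAN formulation by Goodfellow et al. (2014), a generator G produces samples from random noise z while a discriminator D estimates the probability that a sample is real; D is trained to maximize log D(x) + log(1 - D(G(z))), and G is trained to fool D. The sketch below only computes these two losses for one mini-batch of discriminator outputs, using the "non-saturating" generator loss -log D(G(z)) that the same paper suggests for practical training. The networks themselves are left out, and the function name `gan_losses` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def gan_losses(d_real: Sequence[float], d_fake: Sequence[float],
               eps: float = 1e-12) -> Tuple[float, float]:
    """Return (discriminator_loss, generator_loss) for one mini-batch.

    d_real: discriminator outputs D(x) in (0, 1) for real training samples x.
    d_fake: discriminator outputs D(G(z)) for generated samples G(z).
    eps guards against log(0).
    """
    # Discriminator: maximize log D(x) + log(1 - D(G(z))), i.e. minimize the negative.
    d_loss = -(sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
               + sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake))
    # Generator ("non-saturating" form): maximize log D(G(z)).
    g_loss = -sum(math.log(max(p, eps)) for p in d_fake) / len(d_fake)
    return d_loss, g_loss


if __name__ == "__main__":
    # A discriminator that cannot tell real from fake outputs 0.5 everywhere.
    print(gan_losses([0.5, 0.5], [0.5, 0.5]))
    # A confident discriminator: low D loss, high G loss.
    print(gan_losses([0.99, 0.95], [0.01, 0.05]))
```

Training alternates gradient steps on the two losses; once trained, samples from G can be added to a small labeled data set, although whether this helps depends on the task.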
Understanding trained networks
Large convolutional network models have recently demonstrated impressive classification performance on the ImageNet benchmark. However, there is no clear understanding of why they perform so well, or how they might be improved (3). Scientifically and from a safety point of view, this is deeply unsatisfying: when CNNs are used in safety-relevant fields such as autonomous driving, a high assurance of correct functionality is required.
First steps towards visualizing and better understanding CNNs were made with “Deconvolutional Networks”. Using them, researchers were able to observe individual feature maps in any layer of the model, watch the evolution of features during training and diagnose potential problems with the model. Read more in their paper (3).
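One building block of the Deconvolutional Network approach in (3) is "unpooling": max pooling is not invertible, so during the forward pass the locations of the maxima ("switches") are recorded, and the deconvnet later uses them to place activations back at their original positions. The following plain-Python sketch shows non-overlapping 2x2 max pooling with switches and the corresponding unpooling; the function names are hypothetical, and the other deconvnet steps are omitted.

```python
from typing import List, Tuple

Map = List[List[float]]
Switches = List[List[Tuple[int, int]]]


def max_pool_with_switches(fmap: Map, size: int = 2) -> Tuple[Map, Switches]:
    """Non-overlapping max pooling that also records where each maximum was."""
    h, w = len(fmap), len(fmap[0])
    pooled, switches = [], []
    for r in range(0, h - size + 1, size):
        p_row, s_row = [], []
        for c in range(0, w - size + 1, size):
            window = [(fmap[i][j], (i, j))
                      for i in range(r, r + size) for j in range(c, c + size)]
            # max() returns the first maximum if several values tie.
            value, pos = max(window, key=lambda t: t[0])
            p_row.append(value)
            s_row.append(pos)
        pooled.append(p_row)
        switches.append(s_row)
    return pooled, switches


def unpool(pooled: Map, switches: Switches, height: int, width: int) -> Map:
    """Approximate inverse of max pooling: put each value back at its switch
    location and fill all other positions with zeros."""
    out = [[0.0] * width for _ in range(height)]
    for p_row, s_row in zip(pooled, switches):
        for value, (i, j) in zip(p_row, s_row):
            out[i][j] = value
    return out


if __name__ == "__main__":
    fmap = [[1, 3, 0, 2], [4, 2, 1, 5], [0, 1, 7, 0], [2, 0, 0, 6]]
    pooled, switches = max_pool_with_switches(fmap)
    print(pooled)  # [[4, 5], [2, 7]]
    print(unpool(pooled, switches, 4, 4))
```

The remaining steps of the deconvnet in (3), rectification and filtering with transposed versions of the learned filters, are not shown here.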
As CNNs are a young and growing field of research with much potential, it is still hard to say where this technology will find its limits. Many approaches to overcoming the limitations described above have already been proposed. CNNs have shown brilliant results in categorizing and processing 2D images. The future of deep learning will consist of combinations of different AI technologies, each contributing its own specialization.
- (1) Douglas M. Hawkins, The problem of overfitting, Journal of Chemical Information and Computer Sciences 44 (2004), no. 1, 1–12, PMID: 14741005. http://pubs.acs.org/doi/abs/10.1021/ci0342472
- (2) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep residual learning for image recognition, CoRR abs/1512.03385 (2015). https://arxiv.org/abs/1512.03385
- (3) Matthew D. Zeiler and Rob Fergus, Visualizing and understanding convolutional networks, pp. 818–833, Springer International Publishing, Cham, 2014. http://link.springer.com/chapter/10.1007/978-3-319-10590-1_53
Web Links to Data Sets
MNIST: http://yann.lecun.com/exdb/mnist/ (Last visited 29.01.2017)
PASCAL: http://host.robots.ox.ac.uk/pascal/VOC/databases.html (Last visited 29.01.2017)
ImageNet: http://image-net.org (Last visited 29.01.2017)