Review of “ImageNet Classification with Deep Convolutional Neural Networks” article



In this blog post I’m going to summarize the article “ImageNet Classification with Deep Convolutional Neural Networks”, written by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton from the University of Toronto. The article explains how they trained a deep neural network for the ImageNet contest (2010 and 2012). They used five convolutional layers, max-pooling layers after some of the convolutional layers, three fully-connected layers, and a softmax layer. They also applied the dropout regularization method. Together, these gave them top-1 and top-5 error rates that beat the previous state of the art.


In this part, I’m going to describe the steps they used to build the model: how they scaled the images, the activation function, the regularization techniques, the GPUs, and how they trained the model.

The images in ImageNet come in different sizes, but their model needed a fixed input size, so they rescaled the images to 256 x 256. Apart from subtracting the training-set mean from each pixel, they did no other preprocessing: they trained on “the (centered) raw RGB values of the pixels”.
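To make the rescaling step concrete, here is a minimal sketch of the arithmetic. It assumes the procedure described in the paper (scale the shorter side to 256, then crop the central 256 x 256 patch). The function names `resize_then_center_crop` and `center` are my own, and the code works on plain Python lists instead of a real image library.

```python
def resize_then_center_crop(width, height, target=256):
    """Return (scaled_size, crop_box) for a width x height image.

    The shorter side is scaled to `target`, then the central
    target x target patch is selected.
    """
    scale = target / min(width, height)
    # Round to whole pixels; max() guards against rounding below target.
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


def center(values, mean_values):
    """Subtract the training-set mean from each pixel value."""
    return [v - m for v, m in zip(values, mean_values)]


# A hypothetical 500 x 375 photo: the height becomes 256, the width is cropped.
size, box = resize_then_center_crop(500, 375)
centered = center([10, 20], [4, 5])
```

In a real pipeline the resizing and cropping would be done with an image library; the point here is only the order of the operations.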

They chose the ReLU activation function after comparing it with the hyperbolic tangent (tanh). Another interesting technique was spreading the network across two GPUs. This was necessary because each GPU at that time had only 3 GB of memory, which was not enough to hold the network they wanted to train. Each GPU held half of the neurons, and the key to the split was that the GPUs communicate only in certain layers: the third layer takes input from all of the second layer’s outputs, while the fourth layer takes input only from the third-layer outputs on the same GPU.
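The effect of this communication pattern can be sketched by counting how many input kernel maps each kernel sees. The kernel counts below (96, 256, 384, 384, 256) come from the paper, not from this summary, and the names `KERNELS`, `CROSS_GPU`, and `input_depth` are my own for illustration.

```python
# Kernels per convolutional layer as reported in the paper (both GPUs combined).
KERNELS = {"conv1": 96, "conv2": 256, "conv3": 384, "conv4": 384, "conv5": 256}

# True when a layer reads the previous layer's kernel maps from both GPUs.
CROSS_GPU = {"conv2": False, "conv3": True, "conv4": False, "conv5": False}

NUM_GPUS = 2


def input_depth(layer, previous_layer):
    """Number of input kernel maps that each kernel in `layer` sees."""
    total = KERNELS[previous_layer]
    # Without cross-GPU communication a kernel only sees its own GPU's half.
    return total if CROSS_GPU[layer] else total // NUM_GPUS


depths = {
    "conv2": input_depth("conv2", "conv1"),
    "conv3": input_depth("conv3", "conv2"),
    "conv4": input_depth("conv4", "conv3"),
    "conv5": input_depth("conv5", "conv4"),
}
```

Only the third layer sees the full depth of the previous layer. The other layers see half, which is what keeps the communication between the GPUs small.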

They also used other techniques such as local response normalization and overlapping pooling, plus regularization techniques such as dropout and data augmentation. Something that may not be so typical is the dropout probability they used, which was 0.5.
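As a rough illustration of dropout with probability 0.5, the sketch below follows the scheme described in the paper: during training each hidden output is set to zero with probability 0.5, and at test time all neurons are kept but their outputs are multiplied by 0.5. The function names and the sample activations are hypothetical.

```python
import random


def dropout_train(activations, p_drop=0.5, rng=random):
    """Training pass: zero each activation with probability p_drop."""
    return [0.0 if rng.random() < p_drop else a for a in activations]


def dropout_test(activations, p_drop=0.5):
    """Test pass: keep every neuron but scale outputs by the keep probability."""
    keep = 1.0 - p_drop
    return [a * keep for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [0.8, 1.5, 0.0, 2.1, 0.3, 1.1]  # made-up hidden-layer outputs
train_out = dropout_train(hidden, rng=rng)
test_out = dropout_test(hidden)
```

Many modern libraries instead rescale the surviving activations during training ("inverted" dropout), so this sketch should not be read as how any particular framework implements it.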

They also described their training method: stochastic gradient descent with a batch size of 128 samples, momentum of 0.9, and weight decay of 0.0005.
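The update rule behind these numbers can be sketched for a single weight. The momentum and weight decay values are the ones quoted above. The learning rate of 0.01 is the initial value reported in the paper, not something stated in this summary, and the function name `sgd_step` is my own.

```python
def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and weight decay for a single weight.

    `grad` is the gradient of the loss with respect to `w`,
    averaged over the current batch (128 samples in the paper).
    """
    v_new = momentum * v - weight_decay * lr * w - lr * grad
    w_new = w + v_new
    return w_new, v_new


# Two updates on a hypothetical weight with a constant gradient of 0.5.
w, v = 1.0, 0.0
w, v = sgd_step(w, v, grad=0.5)
first_w, first_v = w, v
w, v = sgd_step(w, v, grad=0.5)
```

On the second step the momentum term carries over 0.9 of the previous velocity, so the weight moves further than on the first step even though the gradient is the same.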


They used the ReLU activation function because it reached a 25% training error rate on CIFAR-10 six times faster than the hyperbolic tangent function, as shown in the following figure:

Taken from ImageNet Classification with Deep Convolutional Neural Networks
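One way to get an intuition for this speed-up is to compare the gradients of the two functions. The paper describes tanh as a saturating nonlinearity and ReLU as a non-saturating one. The sketch below only illustrates that difference and does not reproduce the experiment; the helper names are my own.

```python
import math


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which shrinks as |x| grows."""
    return 1.0 - math.tanh(x) ** 2


# Compare the two gradients for a few positive inputs.
for x in (0.5, 2.0, 5.0):
    print(f"x={x}: relu'={relu_grad(x)}, tanh'={tanh_grad(x):.6f}")
```

For large inputs the tanh gradient becomes very small, so the weights feeding that neuron barely move, while the ReLU gradient stays at 1 for any positive input.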

Spreading the network across two GPUs reduced their top-1 and top-5 error rates by 1.7% and 1.2%, respectively.

The data augmentation that alters the RGB channel intensities reduced the top-1 error rate by over 1%.

Removing any one of the middle layers cost them about 2% in top-1 performance.

Together, these techniques gave them the following final results:

Taken from ImageNet Classification with Deep Convolutional Neural Networks
Taken from ImageNet Classification with Deep Convolutional Neural Networks


The results showed that a deeper network could achieve better results, and we can now see that this holds, because later models went deeper and got better predictions. The dropout regularization technique may also have reduced overfitting and helped them improve the results.

There are many ways to approach a problem, but we should understand that in some cases we need more machine resources to improve the results. For this reason, it is always necessary to think about the best option given the resources you have at the moment. Limited resources do not justify bad results; they are a barrier that you should either break or learn to live with, as the authors showed us.

Personal Notes

It’s very interesting to see that 10 years ago researchers had few resources for training big networks, but that was not a wall that stopped them from solving the problem. Instead, they found different ways to approach the best option and got an incredible result. It makes me think about the results that advances in technology will make possible for new models.

Today these results may not seem impressive, because we now have models with better predictions, but it is interesting to understand the origin of the architectures we are using at the moment.


[Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton] ImageNet Classification with Deep Convolutional Neural Networks