As I was studying the case of Image Classification for Airbnb Product Listings, I came across ResNet50. This is a brief overview of what a Resnet is.
Deep Learning Neural Networks
Above is the figure of a normal two-layer neural network. It starts with a[l], to which we apply the linear operator z[l+1] = W[l+1]a[l] + b[l+1], where W is the weight matrix and b is the bias. Applying an activation function, say ReLU, to z[l+1] gives a[l+1]. We then repeat the same step on a[l+1] to get a[l+2]. Andrew Ng calls this the main path in his video on the topic.
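To make the main path concrete, here is a minimal sketch of the two-layer forward pass in plain Python. The function names and the toy weights are assumptions chosen for illustration; they do not come from any particular library.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def linear(W, a, b):
    """Compute z = W a + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

def plain_two_layer(a_l, W1, b1, W2, b2):
    """Main path: a[l] -> linear -> ReLU -> a[l+1] -> linear -> ReLU -> a[l+2]."""
    a_l1 = relu(linear(W1, a_l, b1))
    a_l2 = relu(linear(W2, a_l1, b2))
    return a_l2

# Hypothetical toy values, chosen only for illustration.
a_l = [1.0, -2.0]
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.5]
W2 = [[2.0, 1.0], [-1.0, 0.0]]
b2 = [0.0, 0.0]
out = plain_two_layer(a_l, W1, b1, W2, b2)
print(out)
```

Each layer only ever sees the output of the layer directly before it; nothing from a[l] reaches a[l+2] except through a[l+1].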
What is wrong with this?
Deep learning is thought of as learning a hierarchical set of features or representations. So in the simplest sense, stacking more layers on a deep neural network should mean improved performance. In theory, more layers learn more levels of features. The deeper, the better. Right?
Not really. The problem of vanishing/exploding gradients hampers convergence from the beginning, although normalized initialization and intermediate normalization layers largely address it. A second problem shows up even when the network does converge. In the ResNet paper, we read
"When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error, as reported in [11, 42] and thoroughly verified by our experiments."
In simple words, it means when you increase the number of layers, at first the error goes down, but then after a while, the error starts going back up.
Thought process to Resnets
If we compare a shallow neural network with a deeper one built by adding layers to it, the deeper model could in theory copy the layers of the shallow model and make the added layers identity mappings. This constructed solution suggests that a deeper model shouldn't produce any more training error than its shallow counterpart.
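The identity-mapping argument can be checked on a toy example. The sketch below is an illustration under assumptions (hypothetical helper names, made-up weights, ReLU after every layer): it appends identity layers to a one-layer network and compares the outputs.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def linear(W, a, b):
    """Compute z = W a + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

def forward(layers, a):
    """Run a through a list of (W, b) layers, applying ReLU after each."""
    for W, b in layers:
        a = relu(linear(W, a, b))
    return a

def identity_layer(n):
    """Layer with W = I and b = 0. Its input comes out of a ReLU, so it is
    non-negative and the following ReLU leaves it unchanged."""
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return W, [0.0] * n

# Hypothetical shallow network with a single layer.
shallow = [([[1.0, -1.0], [0.5, 2.0]], [0.0, 1.0])]
# Deeper network: the same layer followed by three identity layers.
deeper = shallow + [identity_layer(2) for _ in range(3)]

x = [3.0, 1.0]
shallow_out = forward(shallow, x)
deeper_out = forward(deeper, x)
print(shallow_out, deeper_out)
```

The two outputs match, so a solution that is at least as good as the shallow one exists for the deeper network. Whether a solver actually finds it is a separate question.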
But experiments show that our current solvers on hand are unable to find solutions that are comparably good or better than the constructed solution.
The ResNet paper addresses the degradation problem by introducing a deep residual learning framework. Instead of hoping that a few stacked layers directly fit the desired underlying mapping, the layers are explicitly made to fit a residual mapping.
The paper hypothesized that it is easier to optimize the residual mapping than to optimize the initial, unreferenced mapping. To the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping by a stack of nonlinear layers.
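In symbols, the reformulation can be written as below. The notation is an assumption borrowed from the paper rather than from this post: x is the input to a stack of layers, H(x) is the mapping we would like the stack to compute, and F(x) is what the stacked layers actually learn.

```latex
\begin{aligned}
F(x) &:= H(x) - x            && \text{(the layers fit the residual)} \\
H(x) &= F(x) + x             && \text{(the block's output adds the input back)} \\
H(x) = x &\iff F(x) = 0      && \text{(identity is recovered by a zero residual)}
\end{aligned}
```

Driving F(x) to zero only requires pushing the weights of the stacked layers towards zero, which is the sense in which the identity mapping becomes easy to reach.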
This is the basic idea behind Resnets.
A residual block is the building block of a ResNet. Let us look at residual blocks in contrast with plain networks.
Remember an earlier use of the term - main path. Well, in the residual block, a skip connection is introduced, as shown below.
Basically, it is a connection that lets us add a[l] to z[l+2] just before the final activation function is applied, so that a[l+2] = g(z[l+2] + a[l]), where g is the activation function. This helps carry a[l] much deeper into the neural network.
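The skip connection described above can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the paper's implementation: the helper names are hypothetical, ReLU is used as the activation, and a[l] and z[l+2] are assumed to have the same size so they can be added directly.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def linear(W, a, b):
    """Compute z = W a + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

def residual_block(a_l, W1, b1, W2, b2):
    """Main path plus skip connection: a[l] is added to z[l+2]
    before the final ReLU is applied."""
    a_l1 = relu(linear(W1, a_l, b1))
    z_l2 = linear(W2, a_l1, b2)
    return relu([z + a for z, a in zip(z_l2, a_l)])

# With all weights and biases at zero the main path contributes nothing,
# so the block passes its (non-negative) input straight through.
zeros_W = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
a_l = [0.7, 2.0]  # hypothetical input, as if produced by an earlier ReLU
out = residual_block(a_l, zeros_W, zeros_b, zeros_W, zeros_b)
print(out)
```

With zero weights the block reduces to the identity on non-negative inputs, which is the "push the residual to zero" case from the previous section.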
When neural networks are composed of the residual blocks described above instead of plain layers, the results below were observed.
We notice that the errors don't go back up or get saturated.
The original paper is "Deep Residual Learning for Image Recognition" by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
This is my understanding of Resnets. Will improve the blog when I get a better understanding. You can find an application of the same in my blog on Airbnb Listing Room Images Classification.
Authors of the Paper:
An alumnus of Tsinghua University, Kaiming He served as Lead Researcher at Microsoft Research Asia before joining Facebook AI Research as a Research Scientist in 2016.
After earning his doctorate from Xi'an Jiaotong University, Dr. Xiangyu Zhang joined MEGVII Technology, where he now serves as Research Lead & Senior Researcher.
Shaoqing Ren earned his doctorate from the University of Science and Technology of China. This was a joint PhD, during which he worked as a Research Intern with Microsoft. He later went on to build a startup of his own, Momenta. He now works with NIO as VP, Autonomous Driving Algorithm.
Jian Sun received B.S., M.S., and Ph.D. degrees from Xi'an Jiaotong University in 1997, 2000, and 2003, respectively. Immediately afterwards, he joined Microsoft Research Asia, where he worked in the fields of computer vision and computer graphics. After over 14 years with Microsoft, he is now working as Chief Scientist / Managing Director of Megvii Research.