Hi everyone. This is an introduction to the course on Deep Learning, which will be offered through NPTEL. Over the past decade or so, deep learning has become very prevalent, and it finds applications in a wide range of areas,
such as speech, computer vision, and natural language processing. In fact, most of the state-of-the-art systems in these areas, even from companies like Google, Facebook, and so on, use deep learning as the underlying solution. So, in this course, we will learn some of the foundational or fundamental building blocks of deep learning.
In particular, we will start right from the basics, with the perceptron or a sigmoid neuron, a single neuron, and from there we will go to a multilayer network of neurons, or a multilayer perceptron as it is commonly known. We will look at algorithms for training such networks, and the specific algorithm that we look at is backpropagation, which uses gradient descent. We will then look at several applications of feedforward neural networks, like autoencoders, word2vec, and so on.
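The lecture does not go into this level of detail, but roughly speaking, a sigmoid neuron takes a weighted sum of its inputs and squashes it between 0 and 1, and a multilayer perceptron just stacks layers of such units. A minimal numpy sketch, with toy untrained weights, might look like this:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# A single sigmoid neuron: weighted sum of the inputs followed by the sigmoid
x = np.array([0.5, -1.2, 3.0])        # toy input
w = np.array([0.1, 0.4, -0.2])        # weights (in practice learned by gradient descent)
b = 0.05                              # bias
y_neuron = sigmoid(np.dot(w, x) + b)

# A tiny multilayer perceptron: two such layers stacked
W1, b1 = np.random.randn(4, 3), np.zeros(4)   # hidden layer: 3 inputs -> 4 units
W2, b2 = np.random.randn(1, 4), np.zeros(1)   # output layer: 4 units -> 1 output
h = sigmoid(W1 @ x + b1)
y_mlp = sigmoid(W2 @ h + b2)
print(y_neuron, y_mlp)
```

Backpropagation then computes the gradient of a loss with respect to every weight in such a stack, and gradient descent updates the weights in the direction that reduces the loss.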
Then we will move on to the next type of neural network, recurrent neural networks, which find applications in areas where you have to deal with sequences. Sequences are omnipresent: you have sequences in natural language text, where a sentence can be thought of as a sequence of words.
In fact, words themselves can be thought of as sequences of characters. Such sequences also occur in other areas: in speech you have a sequence of phonemes, and videos can be treated as sequences of images. So, how do you deal with such sequential data when you want to do various things on top of it? You might want to do classification, or you might want to do sequence prediction, for example, given a sentence in a source language, predict the equivalent sentence in the target language.
All these applications require something known as recurrent neural networks. So, we will be looking at that, and we will also look at the algorithm for training recurrent neural networks, which is again backpropagation, but with a twist, known as backpropagation through time. We will look at the math behind that and some of the challenges in training recurrent neural networks.
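The course covers the details later, but the core idea is that a recurrent network applies the same weights at every time step, carrying a hidden state forward; backpropagation through time unrolls this loop and sends gradients back across all the steps. A rough numpy sketch of the forward recurrence, with made-up shapes and random weights, is:

```python
import numpy as np

def rnn_forward(xs, Wxh, Whh, b):
    # xs: list of input vectors, one per time step.
    # The same weights (Wxh, Whh, b) are reused at every step; backpropagation
    # through time unrolls this loop and propagates gradients back through it.
    h = np.zeros(Whh.shape[0])
    states = []
    for x_t in xs:
        h = np.tanh(Wxh @ x_t + Whh @ h + b)
        states.append(h)
    return states

# Toy sequence of three 5-dimensional inputs, hidden size 4
xs = [np.random.randn(5) for _ in range(3)]
states = rnn_forward(xs, np.random.randn(4, 5) * 0.1,
                     np.random.randn(4, 4) * 0.1, np.zeros(4))
print(len(states), states[-1].shape)
```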
To overcome these challenges, we look at other types of RNNs, namely LSTMs and gated recurrent units, which address some of the difficulties that occur while training plain recurrent neural networks. The third type of neural network that we look at is convolutional neural networks, which largely find application in the vision domain. So, when you have an image, how do you come up with a good representation for the image and then do various tasks on top of that, such as classification, object detection, segmentation, and so on?
The underlying block here, which is prevalent in almost all computer vision or image processing applications, is something known as a convolutional neural network, which uses the convolution operation to come up with an abstract representation of an image, and not just one representation but a deep, hierarchical set of abstract representations of the image. So, we will look at what a convolutional neural network is, how it is different from a feedforward neural network, and so on.
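As a rough illustration of the convolution operation itself (the course develops this properly later), a small filter slides over the image and each output value summarizes one local patch; stacking many such layers is what gives the hierarchical representations. A toy numpy sketch:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide a small filter over the image; each output value summarizes
    # one local patch. Stacking many such layers produces increasingly
    # abstract, hierarchical representations of the image.
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(6, 6)                  # toy grayscale image
edge_filter = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])       # responds to vertical edges
print(convolve2d(image, edge_filter).shape)   # (4, 4) feature map
```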
Once we are done with these three fundamental blocks, namely feedforward neural networks, recurrent neural networks, and convolutional neural networks, we will put them all together and look at something known as encoder-decoder models, which are used to take any kind of input, say an image, speech, or text, encode it into a representation, and decode some output from it. This output could either be a classification output or it could be a sequence in itself.
For example, you could encode an image and then try to generate a caption for the image, or you could encode a video and then try to generate a caption for the video, and so on. These encoder-decoder models use a combination of the fundamental blocks, RNNs, CNNs, and feedforward neural networks, and combine them in interesting ways to apply them to various downstream tasks like image captioning, machine translation, document summarization, and so on.
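To show just the encode-then-decode structure, here is a minimal sketch with random, untrained weights (the actual encoders and decoders in the course are RNNs, CNNs, or feedforward networks trained end to end):

```python
import numpy as np

def encode(xs, Wxh, Whh):
    # Encoder: run a simple RNN over the input and keep the final hidden
    # state as a summary representation of the whole input.
    h = np.zeros(Whh.shape[0])
    for x_t in xs:
        h = np.tanh(Wxh @ x_t + Whh @ h)
    return h

def decode(h, Why, steps):
    # Decoder: from the summary, emit one output symbol per step
    # (here just the argmax over a toy output vocabulary).
    outputs = []
    for _ in range(steps):
        scores = Why @ h
        outputs.append(int(np.argmax(scores)))
        h = np.tanh(h)            # stand-in for the decoder's state update
    return outputs

xs = [np.random.randn(5) for _ in range(4)]          # e.g. an encoded sentence
h = encode(xs, np.random.randn(8, 5), np.random.randn(8, 8))
print(decode(h, np.random.randn(10, 8), steps=3))    # 3 output token ids
```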
Another critical component of these models is something known as the attention network, which learns to pay attention to important parts of the input. For example, if you are trying to write a caption for an image showing a boy throwing a frisbee in a park, the main components of the image, amid all the background, are just the boy, the frisbee, and the green grass that indicates the park. All the other pixels get summarized into these three main objects in the image. So, a model which can generate a good caption for this image should learn to pay attention to these critical components of the input. And this is not restricted to images; the same happens in document classification.
Say you want to find out whether a document talks about politics, sports, or finance. There will be some important words in the document that you need to focus on, which reveal the type or class of the document. So, for such tasks also, it is very important to find the important words in the input and pay attention to them. This is done by something known as the attention mechanism. We will look at what the attention mechanism is and how to integrate it with encoder-decoder models.
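The course derives this carefully later, but one common form (the exact scoring function is not specified here, so take this as an assumed dot-product variant) scores each encoder state against the current decoder state, turns the scores into weights with a softmax, and returns a weighted sum of the encoder states:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention(decoder_state, encoder_states):
    # Score each encoder state against the current decoder state,
    # normalize the scores with a softmax, and return the weighted sum.
    # The weights tell the model which parts of the input to focus on.
    scores = np.array([np.dot(decoder_state, h) for h in encoder_states])
    weights = softmax(scores)
    context = np.sum(weights[:, None] * np.stack(encoder_states), axis=0)
    return context, weights

encoder_states = [np.random.randn(8) for _ in range(5)]   # one per word or image region
decoder_state = np.random.randn(8)
context, weights = attention(decoder_state, encoder_states)
print(weights.round(2), context.shape)
```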
That is the main part of the course, and we will be structuring it into 30 hours, or 12 weeks, of teaching. Beyond these basic models, we also have an extended version of the course where we will talk about deep generative models, where we look at the use of neural networks for learning probability distributions. The four main paradigms that we look at here are restricted Boltzmann machines, variational autoencoders, autoregressive models, and generative adversarial networks. We will look at some of the theory behind these, how they all connect together, their relative advantages and disadvantages, and the taxonomy under which all these different models fall. That is going to be an extended version of the course, which may not be a part of the main syllabus.
The main syllabus will only contain feedforward neural networks, RNNs, CNNs, and sequence-to-sequence models with the attention mechanism, that is, encoder-decoder models with attention. So that is all; that is the introduction to the course.
I hope you enroll for it and enjoy the course. Thank you.