Transcript for:
Convolutional Neural Networks and their Applications in Computer Vision

Welcome to this course on convolutional neural networks. Computer vision is one of the areas that's been advancing rapidly thanks to deep learning. Deep learning computer vision is now helping self-driving cars figure out where the other cars and pedestrians around them are, so they can avoid them. It's making face recognition work much better than ever before, so that perhaps some of you will soon, or perhaps already, be able to unlock a phone, or even a door, using just your face. And if you look on your cell phone, I bet you have many apps that show you pictures of food, or pictures of a hotel, or just fun pictures of scenery, and some of the companies that build those apps are using deep learning to help show you the most attractive, the most beautiful, or the most relevant pictures. I think deep learning is even enabling new types of art to be created.

So there are two reasons I'm excited about deep learning for computer vision, and why I think you might be too. First, rapid advances in computer vision are enabling brand new applications to be built that were just impossible a few years ago, and by learning these tools, perhaps you will be able to invent some of these new products and applications. Second, even if you don't end up building computer vision systems per se, I've found that because the computer vision research community has been so creative and so inventive in coming up with new neural network architectures and new algorithms, it has inspired a lot of cross-fertilization into other areas as well. For example, when I was working on speech recognition, I sometimes took inspiration from ideas from computer vision and borrowed them into the speech literature. So even if you don't end up working on computer vision, I hope you find some of the ideas you learn about in this course helpful for some of your algorithms and your architectures.

So with that, let's get started. Here are some examples of computer vision problems we'll study in this course. You've already seen image classification, sometimes also called image recognition, where you might take as input, say, a 64 by 64 image and try to figure out if it's a cat. Another example of a computer vision problem is object detection. If you're building a self-driving car, maybe you don't just need to figure out that there are other cars in this image; instead, you need to figure out the positions of the other cars in the picture so that your car can avoid them. So in object detection, we usually have to not just figure out that these other objects, say cars, are in the picture, but also draw boxes around them or have some other way of recognizing where in the picture these objects are. Notice also in this example that there can be multiple cars in the same picture, and you need to find every one of them, or at least every one within a certain distance of your car.

Here's another example, maybe a more fun one: neural style transfer. Let's say you have a picture and you want this picture repainted in a different style. So in neural style transfer, you have a content image and you have a style image. The image on the right is actually a Picasso, and you can have a neural network put them together to repaint the content image, that is, the image on the left, but in the style of the image on the right, and you end up with the image at the bottom. So algorithms like these are enabling new types of artwork to be created, and in this course you'll learn how to do this yourself as well.

One of the challenges of computer vision problems is that the inputs can get really big. For example, in previous courses you've worked with 64 by 64 images, and so that's 64 by 64 by 3, because there are three color channels. If you multiply that out, that's 12,288, so x, the input features, has dimension 12,288. That's not too bad, but 64 by 64 is actually a very small image. If you work with larger images, maybe this is a 1000 pixel by 1000 pixel image, and that's actually just one megapixel. But the dimension of the input features will be 1000 by 1000 by 3, because you have three RGB channels, and that's three million. If you are viewing this on a smaller screen, this might not be apparent, but this is actually a low-res 64 by 64 image, and this is a higher-res 1000 by 1000 image.

But if you have three million input features, then x here would be three-million-dimensional. And so if in the first hidden layer you have just 1000 hidden units, and you use a standard fully connected network like we had in courses one and two, then the total number of weights, that is, the matrix W1, will be a 1000 by 3 million dimensional matrix, because x is now in R^(3m), where I'm using m to denote million. This means that this matrix will have 3 billion parameters, which is just very, very large. With that many parameters, it's difficult to get enough data to prevent a neural network from overfitting, and the computational and memory requirements to train a neural network with three billion parameters are just a bit infeasible. But for computer vision applications, you don't want to be stuck using only tiny little images; you want to use large images. To do that, you need to implement the convolution operation, which is one of the fundamental building blocks of convolutional neural networks. Let's see what this means and how you can implement it in the next video, where we'll illustrate convolutions using the example of edge detection.
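To make the arithmetic above concrete, here is a small back-of-the-envelope sketch that recomputes the input dimensions and the size of W1 for a fully connected first layer with 1000 hidden units. The helper names (input_dim, dense_layer_params) are hypothetical, and two assumptions are not from the lecture: weights are stored as 32-bit floats (4 bytes each), and bias terms are counted separately from W1.

```python
# Back-of-the-envelope sketch of the lecture's numbers.
# Assumptions (not from the lecture): float32 weights (4 bytes each),
# and biases counted separately from the weight matrix W1.

BYTES_PER_FLOAT32 = 4  # assumed storage precision


def input_dim(height: int, width: int, channels: int = 3) -> int:
    """Length of the flattened input vector x for an RGB image."""
    return height * width * channels


def dense_layer_params(n_inputs: int, n_hidden: int) -> tuple[int, int]:
    """Return (weights, biases) for a fully connected layer.

    W1 has shape (n_hidden, n_inputs), so it holds n_hidden * n_inputs
    weights; b1 has one bias per hidden unit.
    """
    return n_hidden * n_inputs, n_hidden


small = input_dim(64, 64)      # 64 * 64 * 3
large = input_dim(1000, 1000)  # 1000 * 1000 * 3

weights, biases = dense_layer_params(large, n_hidden=1000)
gigabytes = weights * BYTES_PER_FLOAT32 / 1e9  # memory for W1 alone

print(f"64x64x3 input dimension:        {small:,}")
print(f"1000x1000x3 input dimension:    {large:,}")
print(f"W1 weights (1000 hidden units): {weights:,}")
print(f"W1 memory at float32:           ~{gigabytes:.0f} GB")
```

Under these assumptions the weight matrix alone would take roughly 12 GB; training would also need to store gradients of the same size, so the real footprint would be larger still.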