Transcript for:

okay well welcome to part two of Deep Learning for Coders. Part one was Practical Deep Learning for Coders; part two is not impractical deep learning for coders, but it is a little different, as we'll discuss. This was probably a really dumb idea, but last year I started numbering part two not from lesson one but from lesson eight, because it continues the same sequence, so I've done that again. Sometimes I'll probably forget and call things "lesson one", so read "lesson one in part two" and "lesson eight" as the same thing if I ever make that mistake.

We're going to be talking about object detection today, which refers to not just figuring out what a picture is a picture of, but also whereabouts the thing is. But in general, the idea of each lesson in this part is not so much that I particularly want you to care about, say, object detection, but rather that I'm trying to pick topics which allow me to teach you foundational skills that you haven't got yet. So for example, object detection is going to be all about creating much richer convolutional network structures, with a lot more interesting stuff going on, and a lot more of the fastai library that we have to customize to get there. At the end of these seven weeks I can't possibly cover the hundreds of interesting things people are doing with deep learning right now, but the good news is that all of those hundreds of things are, as you'll see once you later read the papers, minor tweaks on a reasonably small number of concepts. We covered a bunch of those concepts in part one, and we're going to go a lot deeper into them and build on them to get to deeper concepts in part two.

In terms of what we covered in part one, there are a few key takeaways, and we'll go through each of them. One is the idea — you might have seen recently that Yann LeCun has been promoting this — that we shouldn't call this deep learning but
differentiable programming. The idea is that all the stuff we did in part one was really about setting up a differentiable function, and a loss function that describes how good the parameters are, and then pressing go and letting it work. I think it's quite a good way of thinking about it, this idea that if you can configure a loss function that scores how well something is doing your task, and you have a reasonably flexible neural network architecture, you're kind of done. The example here comes from playground.tensorflow.org, which is a cool website where you can play interactively with creating your own little differentiable functions manually.

The second thing we learnt about is transfer learning, and it's basically that transfer learning is the single most important thing to be able to do in order to use deep learning effectively. Nearly all courses, nearly all papers, nearly everything in deep learning education and research focuses on starting with random weights, which is ridiculous, because you almost never would want to or need to do that. You would only want or need to do that if nobody had ever trained a model on a vaguely similar set of data with an even remotely connected kind of problem to what you're doing now — which almost never happens. So this is where the fastai library, and the stuff we talk about in this class, is vastly different to any other library or course: it's all focused on transfer learning, and it turns out that you do a lot of things quite differently. The basic idea of transfer learning is: here's a network that does thing A; remove the last layer or so; replace it with a few random layers at the end; fine-tune those layers to do thing B, taking advantage of the features the original network learned; and then optionally fine-tune the whole thing end-to-end.
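The recipe just described can be sketched in plain PyTorch. This is a minimal toy illustration, not the fastai implementation: the tiny "pretrained" network, its sizes, and the data are all made up for the example, whereas in practice you'd start from something like a pretrained ResNet.

```python
import torch
import torch.nn as nn

# Pretend this body was already trained on task A
body = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU())

# Step 1: freeze the pretrained layers
for p in body.parameters():
    p.requires_grad = False

# Step 2: bolt a new, randomly initialised head on for task B
head = nn.Linear(16, 3)
model = nn.Sequential(body, head)

# Step 3: fine-tune -- only the trainable (head) parameters go to the optimiser
opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)
x, y = torch.randn(4, 8), torch.tensor([0, 1, 2, 0])
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()   # frozen body gets no gradients
opt.step()

# Step 4 (optional): unfreeze everything and fine-tune end-to-end
for p in model.parameters():
    p.requires_grad = True
```

The key design point is that the body keeps the features learned on task A, so only the small random head needs to learn from scratch.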
And you've now got something which probably uses orders of magnitude less data than if you started with random weights, is probably a lot more accurate, and probably trains a lot faster.

We didn't talk a hell of a lot about architecture design in part one, and that's because architecture design is getting less and less interesting. There's a pretty small range of architectures that generally works pretty well quite a lot of the time. We've been focusing on using CNNs for generally fixed-size, somehow ordered data, RNNs for sequences with some state, and fiddling around a tiny bit with activation functions — i.e. softmax if you've got a single categorical outcome, or sigmoid if you've got multiple outcomes, and so forth. Some of the architecture design we'll be doing in this part gets more interesting, particularly this first session about object detection, but on the whole I think we'll probably spend less time talking about architecture design than most courses or papers, because it's generally not the hard bit.

The third thing we looked at was how to avoid overfitting. The general idea I tried to explain is that the way I like to build a model is to first of all create something that's definitely terribly over-parameterised and will massively overfit for sure, train it, and make sure it does overfit. At that point you know, OK, I've got a model that is capable of reflecting the training set, and then it's as simple as doing these things to reduce that overfitting. If you don't start with something that's overfitting, you're kind of lost. So you start with something that's overfitting, and then to make it overfit less you can: add more data; add more data augmentation; use things like more batch norm layers, or DenseNets, or various things that can handle less data; add regularization like weight decay and dropout; and then finally there's the last lever, which
is often the thing people do first — but should be the thing you do last — which is to reduce the complexity of your architecture: have fewer layers or fewer activations.

We talked quite a bit about embeddings, both for NLP and for the general idea that any kind of categorical data is something you can now model with neural nets. It's been interesting to see how, since part one came out — at which point there were almost no papers or blog posts about using tabular or categorical data in deep learning — this has suddenly taken off, and it's kind of everywhere. So this is becoming a more and more popular approach. It's still little enough known that when I say to people "we use neural nets for time series and tabular data analysis", it's often "wait, really?", but it's definitely not such a far-out idea any more, and there are more and more resources available, including recent Kaggle-competition-winning approaches using this technique.

So part one — which particularly had those five messages — was really all about introducing you to best practices in deep learning. It was trying to show you techniques which were mature enough that they definitely work reasonably reliably for practical real-world problems, and which I had researched and tuned enough, over quite a long period of time, that I could say: here's the sequence of steps and architectures and whatever, and if you use this you'll almost certainly get pretty good results. And then I'd put that into the fastai library in a way that you could use pretty quickly and easily. That's what Practical Deep Learning for Coders was designed to do.

This part two is Cutting Edge Deep Learning for Coders, and what that means is that I often don't know the exact best parameters, architecture details, and so forth to solve your problem. We don't necessarily know if it's going to solve a problem well enough to be
practically useful, and it almost certainly won't be integrated well enough into fastai, or any library, that you can just press a few buttons and it'll start working. But I'm not going to teach anything unless I'm very confident that there is now, or will be soon, a very practically useful technique there. I don't take stuff which just appeared, where I don't know enough about it to know what its trajectory is going to be. So if I'm teaching it in this course, I'm saying: this either works well in the research literature now and is going to be well worth learning about, or it's pretty close to being there. It may take a lot of fiddling around and experimenting to get it to work on your particular problem, because we don't know the details well enough to know how to make it work for every dataset or every example.

So it's exciting to be working at this point. It means that rather than fastai and PyTorch being obscure black boxes which you just know recipes for, you're going to learn the details of them well enough that you can customize them exactly the way you want, debug them, and read the source code to see what's happening. So if you're not pretty confident with object-oriented Python and stuff like that, that's something you'll need to work on, because we assume it and I'm not going to be spending time on it. But I will be trying to introduce you to some tools that I think are particularly helpful, like the Python debugger, and how to use your editor to jump through the code, stuff like that. In general there'll be a lot more detailed, specific code walkthroughs and coding technique discussions, as well as more detailed walkthroughs of papers and so on. So anytime we cover one of these things, if you notice something where you're like, "you know, this is assuming some
knowledge that I don't have", that's fine. It just means that's something you can ask about on the forum: "hey, Jeremy was talking about static methods in Python and I don't really know what a static method is, or why he was using it here — can somebody suggest some resources?" These things are not rocket science; just because you don't happen to have come across something yet doesn't mean it's hard — it's just something you learn.

I will mention that as I cover these research-level topics and develop these courses, I often refer to code that academics have put up to go along with their papers, or example code that somebody else has written on GitHub, and I nearly always find that there's some massive critical flaw. So be careful of taking code from online resources and just assuming that if it doesn't work for you, you've made a mistake. This kind of research-level code is just good enough that they were able to run their particular experiments every second Tuesday, so you should be ready to do some debugging.

On that note, I just wanted to remind you about something from our old course wiki that we sometimes talk about. People often ask, "what should I do after the lesson?" and "how do I know if I've got it right?", and we basically have this thing called "how to use the provided notebooks". The idea is this: don't open up the notebook — and I know I said this in part one as well, but I'll say it again — and go shift-enter, shift-enter, shift-enter until a bug appears, then go to the forums and say "the notebook's broken". The idea of the notebook is to be a little crutch to help you through each step. The idea is that you start with an empty notebook and think, "OK, I now want to complete this process", and that might initially require you flipping over to
the provided notebook and reading it, figuring out what it says. But whatever you do, don't copy and paste it into your notebook — type it out yourself. That way you make sure you can repeat the process, and as you're typing it out you'll be thinking, "what am I typing? why am I typing it?" So if you can get to the point where you can solve an object detection problem yourself in a new, empty notebook — even if it's using the exact same dataset we used in the course — that's a great sign that you're getting it. That'll take a while, but the idea is that by practicing, the second time you try to do it, the third time you try to do it, you check the provided notebook less and less.

And if there's anything in the notebook where you think, "I don't know what it's doing", I hope to teach you enough techniques in this part that you'll know how to experiment to find out what it's doing — so you shouldn't have to ask that. But you may well want to ask why it's doing that: that's the conceptual bit, and that's something where you may need to go to the forums and say, "before this step Jeremy did this, after this step Jeremy did that, and there's this bit in the middle where he does this other thing and I don't quite know why". Then try to add, "here are my hypotheses as to why" — work through it as much as possible, and that way you'll both be helping yourself, and other people will help you fill in the gaps.

If you wish to, and you have the financial resources, now is a good time to build a deep learning box for yourself. When I say "a good time", I don't mean a good time in the history of GPU pricing — GPUs are currently by far the most expensive they've ever been as I say this, because of the cryptocurrency mining boom. I mean a good time in your study cycle. The fact is, if you're paying somewhere between 60 cents and 90 cents an hour for doing your deep learning on a cloud provider,
particularly if you're still on a K80, like an Amazon P2 — or Google Colab, which now lets you train on a K80 for free, but those are very slow GPUs — you can buy a GPU that's going to be maybe three times faster for six or seven hundred dollars. You need a box to put it in, of course, but the example in the bottom right here, from the forum, was something somebody put together during last year's course, so a year ago they were able to put together a decent box for a bit over five hundred dollars. I've created a new forum thread where you can talk about options and parts and ask questions and so forth. Generally speaking, if you can afford it right now, the GTX 1080 Ti is almost certainly what you want in terms of the best price/performance mix. If you can't afford it, a 1070 is fine; if you can't afford that, you should probably be looking for a secondhand 980 or something like that. If you can afford to spend more money, it's worth getting a second GPU, so you can do what I do, which is to have one GPU training while I'm running an interactive Jupyter notebook session on the other.

RAM is very useful — try to get 32 GB if you can; RAM is not terribly expensive. A lot of people find that their vendor tries to persuade them to buy one of these business-class Xeon CPUs; that's a total waste of money. You can get an i5 or i7 consumer CPU far, far cheaper, and actually a lot of them are faster. You often hear that CPU speed doesn't matter — if you're doing computer vision, that's definitely not true. It's very common now, with these 1080 Tis and so forth, to find that the speed of the data augmentation, which happens on the CPU, is actually the slow bit, so it's worth getting a decent CPU. Again, if your GPU is running quickly but the hard drive isn't fast enough to feed it data, then that's a waste as well. So if you can afford an NVMe drive, they're super fast. You
don't have to get a big one — you can get a little one that you just copy your current set of data onto, and have some big RAID array that sits there for the rest of your data when you're not using it. There's a slightly arcane thing about PCI lanes, which are basically the size of the highway that connects your GPU to your computer. A lot of people claim you need 16 lanes to feed your GPU; it actually turns out, based on some analysis I've seen recently, that that's not true — you need eight lanes per GPU. So hopefully that helps you save some money on your motherboard. If you've never heard of PCI lanes before, trust me: by the end of putting together this box you'll be sick of hearing about them. You can buy all the parts and put it together yourself — it's not that hard, and it can be a useful learning experience, though it can also be kind of frustrating and annoying — or you can always go somewhere like Central Computers and they'll put it together for you; lots of online vendors will do the same thing, will generally make sure it turns on and runs properly, and generally don't charge much of a markup, so it's not a bad idea.

We're going to be doing a lot of reading papers — basically each week we'll be implementing a paper or a few papers. If you haven't looked at papers before, they look something like the thing on the left, which is an extract from the paper that introduces Adam. You may also have seen Adam as a single Excel formula in one of our spreadsheets — they're the same thing. The difference is that in academic papers people love to use Greek letters, and they also hate to refactor, so you'll often see a page-long formula where, when you actually look at it carefully, you'll realize the same sub-equation appears eight times. They didn't think to say "let t equal this sub-equation", which would have made it much shorter. I don't know why this is a thing, but I guess all this is to say: once you've
read and understood a paper, you then go back to it and look at it and think, "wow, how did they make something so simple look so complicated?" Adam, for instance, is just momentum on the gradient and momentum on the squared gradient — that's it. The other reason papers are big and long is that they have things like theorems and corollaries, where they're saying "here's our theoretical reasoning behind why this ought to work". For whatever reason, a lot of conferences and journals don't like to accept papers that don't have a lot of this theoretical justification. Geoffrey Hinton has talked about this a bit: a decade or two ago, when no conferences would really accept any neural network papers, there was this one abstract theoretical result that came out — practically unimportant but theoretically interesting — and suddenly people could start submitting things to journals, because they had this theoretical justification. So academic papers are a bit weird, but in the end they're the way the research community communicates its findings, and so we need to learn to read them.

Something that could be a great thing to do is to take a paper, put in the effort to understand it, and then write a blog where you explain it in code and in normal English. Lots of people who do that end up getting quite a following, and end up getting some pretty great job offers, because it's such a useful skill to demonstrate: "I can understand these papers, I can implement them in code, I can explain them in English."
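As a tiny taste of what "implementing a paper" can look like, here is that "momentum on the gradient and momentum on the squared gradient" summary of Adam written out in plain Python. This is a sketch for a single scalar parameter; the hyperparameter names follow the paper's notation, and the toy loss (minimise w²) is made up for the example.

```python
def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # momentum on the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # momentum on the squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v

# Minimise f(w) = w**2, whose gradient is 2*w
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    w, m, v = adam_step(w, 2 * w, m, v, t)
# after 200 steps w is close to the minimum at 0
```

Five lines of arithmetic, once the page of Greek letters has been decoded.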
One thing I will mention is that it's very hard to read or understand something you can't verbalize. If you don't know the names of the Greek letters, it sounds weird, but it's actually very difficult to understand, remember, or take in a formula that appears again and again when all you can say is "squiggle". You need to know that that squiggle is called delta, or that one is sigma, or whatever. So just spending some time learning the names of the Greek letters sounds like a strange thing to do, but it means you no longer look at these things and go "squiggle, other weird squiggle, the thing that looks like a y" — they've all got names.

So now that we're at the cutting-edge stage, a lot of the stuff we'll be learning in this class is stuff that almost nobody else knows about. That's a great opportunity for you to be the first person to create an understandable and generalizable code library that implements something, or the first person to write a blog post that explains it in clear English, or the first person to try applying it to a slightly different area where it's obviously going to work just as well. So when we say cutting-edge research, that doesn't mean you have to come up with the next batch norm, or the next Adam, or the next convolution variant. It can mean: take this thing that was used for translation and apply it to a very similar parallel NLP task, or take this thing that was tested on skin lesions and test it on this other dataset for some other classification problem. That kind of thing is a super great learning experience and incredibly useful, because to the vast majority of the world, which knows nothing about this whole field, it just looks like magic: "I've, for the first time, shown greater than 90 percent accuracy at finding this kind of lesion in this kind of data."

So when I say "experiment in your area of expertise": one of the things we particularly look for in this class is to bring in people who are pretty good at something else — pretty good at meteorology, or pretty good at
de novo drug design, or pretty good at goat dairy farming, or whatever — those are examples of people we've had in the class. Probably the best thing you can do is to take that thing you're already pretty good at and add these new skills to it, because otherwise, if you try to go into some different domain, you're going to have to figure out how to get data for that domain and how problems are solved in that domain. Whereas it will often seem pretty trivial to you to take a technique and apply it to a dataset you've already got sitting on your hard drive — but that's often going to be the super interesting thing for the rest of the world to see: "oh, that's interesting — when you apply it to meteorology data and use this RNN or whatever, it allows you to forecast over larger areas or longer time periods."

So communicating what you're doing is super helpful. We've talked about that before, but something a lot of people on the forums ask those who have already written is: when somebody's written a blog, people will often say, "how did you get up the guts to do that?", or "what point did you get to before you decided to start publishing something?" And the answer is always the same: "I was sure I wasn't good enough to do it; I felt terrified and intimidated; but I wrote it and posted it anyway." I don't think there's ever a time when any of us don't feel like total frauds and imposters. But we all know more about what we're doing than we did six months ago, and there's somebody else in the world who knows as much as you did six months ago — so if you write something now that would have helped the you of six months ago, you're helping some people. And honestly, if you wait another six months, the you of twelve months ago probably won't even understand it any more, because it's too advanced. So it's great to communicate wherever you're up to, in a way
that you think could be helpful to the person you were before you knew that thing. And of course, something the forums have been useful for is getting feedback about drafts: if you post a draft of something you're thinking of releasing, the folks here can point out things they find unclear or that they think need some corrections.

The overarching theme of part two I've described as "generative models", but unfortunately Rachel asked me this afternoon exactly what I meant by generative models, and I realized I don't really know. What I really mean is that in part one, the output of our neural networks was generally a number or a category, whereas the outputs of a lot of the stuff in part two are going to be a whole lot of things: the top-left and bottom-right location of every object in an image, along with what each object is; or a complete picture, with the class of every single pixel in that picture; or an enhanced super-resolution version of the input image; or the entire original input paragraph translated into French. It often just requires some different ways of thinking about things, and some different architectures and so forth, and that, I guess, is the main theme of the techniques we'll be looking at.

The vast majority, possibly all, of the data we'll be looking at will be either text or image data. It would be fairly trivial to do most of these things with audio as well; it's just not something I've spent much time on myself yet. Somebody asked on the forum about whether we'd be doing more stuff with time series and tabular data, and my answer was: I've already taught you everything I know about that, and I'm not sure there's much else to say — particularly if you check out the machine learning course, which goes into a lot of that in much more detail. So I don't feel there's more to tell you. I think that's a super
important area, but I think we're done with that.

We'll be looking at some larger datasets, both in terms of the number of objects in the dataset and the size of each of those objects. For those of you working with limited computational resources, please don't let that put you off — feel free to replace the data with something smaller and simpler. In fact, when I was designing this course I did quite a lot of it in Australia, when I went to visit my mom. My mom had decided to book a nice holiday house for us with fast Wi-Fi, and we turned up at the holiday house with fast Wi-Fi, and indeed it did have Wi-Fi that was fast — but the Wi-Fi was not connected to the internet. So I called up the agent and said, "I found the ADSL router, and it's got an ADSL cable plugged in, and I followed the cable down, and the other end of the cable has nothing to plug into." So she called the people renting out the house, called me back the next day, and said, "it turns out that actually Point Leo has no internet." Wait, what? The good old Australian government had decided to replace the ADSL in Point Leo with the new National Broadband Network, and had therefore disconnected the ADSL before the NBN was actually connected. So we had fast Wi-Fi, which we could use to Skype chat from one side of the house to the other, but I had no internet. Luckily I did have a new Surface Book 15-inch, which has a GTX 1070 in it, and so I wrote a lot of this course entirely on my laptop, which means I had to practice with relatively small resources — I mean, not tiny: 16 GB of RAM and a 6 GB GPU. And it was all on Windows, so I can tell you that most of this course works well on Windows on a laptop. You can always use smaller batch sizes, or a cut-down version of the dataset, whatever — but if you have the resources, you'll get better results if you
can use the bigger datasets when they're available.

OK, now is a good time, I think, to take a somewhat early break, so the forums can chill down. Let's come back at 7:25.

So let's start talking about object detection. Here is an example of object detection, and hopefully you'll see two main differences from what we're used to with classification. The first is that we have multiple things that we're classifying, which is not unheard of — we did that in the Planet satellite data, for example. But what is unheard of is that, as well as saying what we see, we've also got what are called bounding boxes around what we see. A bounding box has a very specific definition: it's a box — a rectangle — and the rectangle has the object entirely fitting within it, but it's no bigger than it has to be. You'll see this bounding box, for the horse at least, is slightly imperfect: it looks like there's a bit of tail here, so it probably should be a bit wider, and maybe it's leaving out a little bit of hoof here, so maybe it should be a bit longer. So the bounding boxes won't be perfect, but they're generally pretty good in most datasets. Our job will be to take data that has been labeled in this way and, on data that is unlabeled, generate the classes of the objects and each one of those bounding boxes.

One thing I'll note to start with is that labeling this kind of data is generally more expensive. It's generally quicker to say "horse, person, person, horse, car, dog, jumbo jet" than — if there's a whole horse race going on — to label the exact location of every rider and every horse. And of course it also depends on which classes you want to label: do you want to label every fence post, or whatever? So generally, just like in ImageNet, it's not "tell me any object you see in this picture".
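The bounding-box definition above — the smallest rectangle that entirely contains the object — can be made concrete with a toy example. The function name, the binary-mask input, and the (top, left, bottom, right) coordinate convention here are all just for this illustration, not the format the course's datasets actually use.

```python
def mask_to_bbox(mask):
    """Smallest rectangle entirely containing the 1-pixels of a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]   # top, left, bottom, right

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_bbox(mask))  # -> (1, 1, 2, 3)
```

Note the box touches the outermost object pixels on all four sides: any smaller and it would cut the object, any larger and it wouldn't be minimal.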
In ImageNet it's: here are the thousand classes that we asked you to look for; tell us which one of those thousand classes you find — just tell me one thing. For these object detection datasets it's: here's a list of object classes that we want you to tell us about; find every single one of them, anywhere in the picture, along with where they are. So in this case, why isn't the tree, or the jump, labeled? Because for this particular dataset they weren't among the classes that the annotators were asked to find, and therefore they're not part of this particular problem. OK, so that's the specification of the object detection problem.

So let me describe stage one — and stage one is actually going to be surprisingly straightforward. We're going to start at the top and work down. We're going to start out by classifying the largest object in each image. So we're going to try to say "person" — actually this one is wrong: "dog" is not the largest object, "sofa" is the largest object, so here's an example of a misclassified one — "bird", correct; "person", correct. That'll be the first thing we try to do, and it's not going to require anything new, so it'll just be a bit of a warm-up for us. The second thing will be to tell us the location of the largest object in each image — again, here this one is actually incorrect, it should have labeled the sofa, but you can see where it's coming from. And then finally we will try to do both at the same time, which is to label what it is and where it is, for the largest thing in the picture. This is going to be fairly straightforward, actually, so it'll be a good warm-up to get us going again, but I'm going to use it as an opportunity to show you some useful coding techniques and a couple of little fastai details, before we then get on to multi-label classification and then object detection proper.

So let's start here. The notebook we're using is the pascal notebook, and all of the notebooks are in the dl2 folder. One thing you'll see in some of my notebooks is torch
.cuda.set_device — you may have even seen it in the last part. Just in case you're wondering why that's there: I have four GPUs on the university server that I use, so I can put a number from 0 to 3 in here to pick one. This is how I prefer to use multiple GPUs — rather than running one model across multiple GPUs, which doesn't always speed it up that much and is kind of awkward, I generally like to have different GPUs running different things. So in this case I was running something on device 1 and doing something else on device 2. Obviously, if you see this in a notebook, it was left behind by mistake — if you don't have more than one GPU you're going to get an error, so just change it to 0 or delete the line entirely.

There are a number of standard object detection datasets, just like ImageNet is a standard object classification dataset, and the old classic — the ImageNet equivalent, if you like — is Pascal VOC, Visual Object Classes, or something like that. The actual main website for it is, I don't know, running on somebody's coffee warmer or something — it goes down all the time, every time he makes coffee. So some folks have mirrored it, and you might find it easier to grab from the mirror. You'll see when you download it that there's a 2007 dataset and a 2012 dataset; there were academic competitions in those different years, just like the ImageNet dataset we tend to use is actually from the ImageNet 2012 competition. We'll be using the 2007 version in this particular notebook; feel free to use 2012 instead — it's a bit bigger, and you might get better results. A lot of people, in fact most people, in research papers now actually combine the two. You do have to be careful, because there's some leakage between the validation sets of the two, so if you do decide to do that, make sure you do some reading about the datasets to make sure you know how
to combine them correctly. The first thing you'll notice in terms of coding here is this; we haven't used it before, but I'm going to be using it all the time now. It's part of the Python 3 standard library, called pathlib, and it's super handy: it basically gives you object-oriented access to a directory or a file. You can see if I go path-dot-something, there are lots of things I can do; one of them is iterdir. Now, path.iterdir() returns a generator. Hopefully you've come across generators by now, because we do quite a lot of stuff that uses them behind the scenes without talking about them too much, but basically a generator is something in Python 3 that you can iterate over. So you can go "for o in that generator", for instance; or of course you could do the same thing as a list comprehension; or you can just stick list() around it to turn the generator into a list. Any time you see me put list() around something, that's normally because it returned a generator — it's not particularly interesting. The reason things generally return generators is: what if the directory had ten million items in it? You don't necessarily want a ten-million-long list; with a generator, the for loop just grabs one, does the thing, throws it away, grabs the next — it lets you do things lazily. You'll see that the things it's returning aren't actually strings; they're some kind of object — if you're using Windows it'll be a WindowsPath, on Linux it'd be a PosixPath. Most of the time you can use them as if they were strings — if you pass one to any of the os.path functions in Python, it'll just work — but with some external libraries it won't work. That's fine: let's grab one of these, say o equals one of these. In general, you can change data types in Python just by naming the data type you want and treating it like a function, and that will cast it.
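The lazy-generator behaviour just described can be sketched in a few lines (the names here are illustrative, not from the notebook):

```python
# Generators are iterated lazily: each item is produced on demand,
# so you never hold a ten-million-long list in memory unless you ask to.
gen = (x * x for x in range(5))

first = next(gen)   # grab just one item, lazily
rest = list(gen)    # list() materializes whatever remains
```

The same pattern applies to `path.iterdir()`: iterate it directly in a for loop, or wrap it in `list()` when you actually want everything at once.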
So any time you try to use one of these pathlib objects and you pass it to something which complains "I was expecting a string, this is not a string", just wrap it in str(). You'll see there are quite a lot of convenient things you can do; one fun thing is that the slash operator here is not "divided by" — they've overridden the slash operator in Python so that it works on paths. You can say path / 'whatever', and — see how that's not inside a string? — this is applying not the division operator but the overridden slash operator, which means "get the child of that path". And you'll see if you run that, it doesn't return a string; it returns a pathlib object. One of the things the path object can do is that it has an open method. It's actually pretty cool once you start getting the hang of it, and you'll also find that the open method takes all the arguments you're familiar with — write, binary, encoding, and so on. In this case I want to load up these JSON files, which contain not the images but the bounding boxes and the classes of the objects. In Python the easiest way to do that is with the json library — there are some faster API-compatible versions, but this data is pretty small, so you won't need them. You go json.load and pass it an open file object, and the easy way to do that, since we're using pathlib, is just path.open(). We'll look inside these JSON files in a moment; if you haven't used it before, JSON is JavaScript Object Notation, and it's kind of the most standard way to pass around hierarchical structured data now — obviously not just with JavaScript. You'll see I've got some JSON files in here; they actually did not come from the mirror I mentioned. The original Pascal annotations were in an XML format, but the cool kids don't use XML any more — we have to use JSON — so somebody converted them all to JSON.
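A minimal sketch of the pathlib usage just described — the directory layout and file name below are placeholders, not necessarily the actual dataset paths:

```python
from pathlib import Path

# Hypothetical data directory; adjust to wherever you unpacked the data.
PATH = Path('data/pascal')

# The overridden `/` operator joins path components — no string concatenation:
p = PATH / 'pascal_train2007.json'

# Most APIs accept a Path directly; for the few that insist on a string,
# casting works: str(p). Loading the JSON would then be:
#   import json
#   trn_j = json.load(p.open())
```

The commented-out `json.load(p.open())` line is the pattern the lecture describes: pathlib's `open` hands `json.load` an open file object directly.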
You'll find the second link here has all the JSON files, so if you just pop them in the same location that I've put them here, everything will work. These annotation JSONs basically contain a dictionary — once you open up the JSON, it becomes a Python dictionary — and it's got a few different things in it. The first is images: it's got a list of all of the images, how big they are, and the unique ID for each one. One thing you'll notice here is that I've taken the word "images" and put it inside a constant called IMAGES. That may seem kind of weird, but if you're using a notebook or any kind of IDE, it means I can tab-complete all of my strings and won't accidentally type one slightly wrong — it's just a handy trick. So here are the contents of the first few entries in images. More interestingly, here are some of the annotations. You'll see an annotation basically contains a bounding box, and the bounding box tells you the column and row of the top left, and its height and width; and it tells you that this particular bounding box is for this particular image — so you'd have to join that up over here to find which file it actually is. And it's of category ID 7. Some of them, at least, also have a polygon segmentation, not just a bounding box — we're not going to be using that; some of them have an ignore flag — we'll ignore the ignore flags; and some of them have something telling you it's a crowd of that object, not just one of them. So that's what these annotations look like. Then you saw there's a category ID, and then the categories: basically, each ID has a name — here we go. So what I did then was turn this categories list into a dictionary from ID to name, I created a dictionary from ID to image file name, and I created a list of all of the image IDs, just to make life easier.
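Here is a hedged sketch of the lookups being built. `trn_j` is a toy stand-in for the loaded JSON, with made-up values in the general shape just described; the real one comes from `json.load`:

```python
# Toy dict mimicking the annotation JSON structure described above.
trn_j = {
    'images': [{'id': 12, 'file_name': '000012.jpg',
                'height': 333, 'width': 500}],
    'annotations': [{'image_id': 12, 'bbox': [155, 96, 196, 174],
                     'category_id': 7, 'ignore': 0}],
    'categories': [{'id': 7, 'name': 'car'}],
}

# Constants instead of raw strings, so a notebook can tab-complete them:
IMAGES, ANNOTATIONS, CATEGORIES = 'images', 'annotations', 'categories'

cats = {o['id']: o['name'] for o in trn_j[CATEGORIES]}      # id -> class name
trn_fns = {o['id']: o['file_name'] for o in trn_j[IMAGES]}  # id -> file name
trn_ids = [o['id'] for o in trn_j[IMAGES]]                  # all image ids
```

The three comprehensions at the bottom are the "quick manipulation" the lecture refers to: reshaping the raw JSON into lookups that are convenient to work with.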
Generally, when I'm working with a new dataset, I try to make it look the way I would want it to look if I'd designed that dataset, so I do a quick manipulation. The steps you see here, and that you'll see in each class, are basically the sequence of steps I took as I started working with this dataset — except without the thousands of screw-ups that I did along the way. The one thing people most comment on when they see me working in real time, having seen my classes, is "wow, you actually don't know what you're doing" — and it's true: 99% of the things I do don't work, and the small percentage of things that do work end up here. I mention that because machine learning, and particularly deep learning, is incredibly frustrating. In theory, you just define the correct loss function and a flexible enough architecture, you press train, and you're done — but if that were actually all it took, then nothing would take any time. The problem is that at all the steps along the way, until it works, it doesn't work: it goes off to infinity, or crashes with an incorrect tensor size, or whatever. I will endeavour to show you some debugging techniques as we go, but it's one of the hardest things to teach — I don't know, maybe I just haven't quite figured out how yet — because the main thing it requires is tenacity. The biggest difference between the people I've worked with who are super effective and the ones who don't seem to get very far has never been about intellect; it's always been about sticking with it — basically never giving up. That's particularly important with this kind of deep learning work, because you don't get that continuous reward cycle. With normal programming you've got, say, twelve things to do until you've got your Flask endpoints stood up, and at each stage it's like: OK, we're successfully processing the JSON; now I've successfully got the callback from that promise;
now I've successfully created the authentication system — it's this constant sequence of stuff that works, whereas generally with training a model it's a constant stream of "it doesn't work, it doesn't work, it doesn't work" until it does. OK, so let's have a look at the images. You'll find that inside the VOC devkit there are 2007 and 2012 directories, and in there is a bunch of stuff — mainly XML files; the ones we care about are the JPEG images. Again, here you've got pathlib's slash operator, and inside there are a few examples of the images. So what I wanted to do was create a dictionary where the key was the image ID and the value was a list of all of its annotations. Basically, I wanted to go through each of the annotations that doesn't say to ignore it, and append the bounding box and the class to the appropriate dictionary item, where that dictionary item is a list. The annoying thing, of course, is that if the dictionary item doesn't exist yet, then there's no list to append to. One super handy trick in Python is that there's a class called collections.defaultdict, which is just like a dictionary, except that if you try to access a key that doesn't exist, it magically makes it exist, and sets it equal to the return value of this function. Now, this could be the name of some function that you've defined, or it can be a lambda function. A lambda function simply means a function that you define in place; we'll be seeing lots of them. So here's an example: all the arguments to the function are listed on the left — so there are no arguments to this function — and lambda functions are special in that you don't have to write return; the return is assumed. So this is a lambda function that takes no arguments and returns an empty list. In other words, every time I try to access something in the training annotations that doesn't exist, it now does exist, as an empty list, which means I can append to it.
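The defaultdict pattern can be sketched like this; the `records` list is made-up data standing in for the parsed annotations:

```python
import collections

# defaultdict(lambda: []) hands every missing key a fresh empty list,
# so we can append without first checking whether the key exists.
trn_anno = collections.defaultdict(lambda: [])

# Hypothetical records: (image_id, bounding_box, category_id)
records = [(12, [96, 155, 269, 350], 7),
           (17, [61, 184, 198, 278], 15),
           (17, [77, 89, 335, 402], 13)]

for img_id, bb, cat_id in records:
    trn_anno[img_id].append((bb, cat_id))
```

The lambda here is exactly the zero-argument, implicit-return function described above: it takes nothing and returns `[]`.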
One comment on variable naming: when I read through these notebooks, I'll generally try to speak out the English words that the variable name is a mnemonic for. A reasonable question would be: why don't I write the full name of the variable in English, rather than using a short mnemonic? It's a personal preference I have, based on a number of programming communities, where the basic thesis is that the more you can see in a single eye-grab of the screen, the more you can understand intuitively in one go; every time your eye has to jump around, it's kind of like a context change that reduces your understanding. It's a style of programming I've found super helpful. So, generally speaking, I particularly try to reduce the vertical height, so things don't scroll off the screen, but I also try to reduce the size of things, so that there's a mnemonic there which — once you know it means "training annotations" — doesn't take long for you to recognise throughout the whole thing. I'm not saying you have to do it this way; I'm just saying there are some very big programming communities, some of which have been around for 50 or 60 years, which have used this approach, and it's interesting to compare. I guess my philosophy is somewhere between maths and Java: in maths, everything is a single character — the same single character can be used in the same paper for five different things, depending on whether it's in italics, or boldface, or capitals — while in Java, variable names sometimes seem to require a few pages. For me, I personally like names which are short enough not to take too much of my perception to see at once, but long enough to be a mnemonic. However, a lot of the time the variable will be describing a mathematical object as it exists in a paper, and there isn't really an English name for it, and in those cases I will use the same — often single-letter — name that the paper uses.
So if you see something called delta, or a, or whatever, and it's something inside an equation from a paper, I generally try to use the same name, just to make the correspondence clear. By no means do you have to do the same thing. I will say, however: if you contribute to fastai — I'm not particularly fastidious about coding style or whatever — but it helps if you write things more like the way I do than the way other people do. OK, so by the end of this we now have a dictionary from file names to tuples, and here's an example of looking up that dictionary: we get back a bounding box and a class. You'll see when I create the bounding box, I've done a couple of things. The first is that I've switched the x and y coordinates, and the reason for this — I think we mentioned it briefly in the last course — is that the computer vision world, when you say "my screen is 640 by 480", means width by height, whereas the maths world, when you say "my array is 640 by 480", means rows by columns. So a lot of things like PIL, the Pillow image library in Python, tend to do things in this width-by-height, columns-by-rows way; numpy is the opposite way around. Again, my view is: don't put up with this kind of incredibly annoying inconsistency — fix it. I've decided that for fastai the numpy/PyTorch way is the right way, so it's always rows by columns, and you'll see here I've switched to rows by columns. I've also decided that we're going to do things by describing the top-left coordinate and the bottom-right coordinate of the bounding box, rather than the x, y and the width, height — so you'll see here I was converting the height and width into the bottom-right corner. Again, I often find when dealing with junior programmers, and in particular junior data scientists, that they get given datasets in shitty formats, or crappy APIs, and they just act as if everything has to be that way; but your life will be much easier if you take a couple of moments to make things consistent — make them the way you want them to be.
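The coordinate conversion just described might look like this — a sketch in the spirit of the notebook's helpers, converting between VOC's x/y/width/height order and a rows-by-columns, corner-to-corner convention, and back:

```python
import numpy as np

def hw_bb(bb):
    # VOC [x, y, width, height] -> [top-row, left-col, bottom-row, right-col]
    # (switches to row/column order and replaces width/height with the
    # bottom-right corner; -1 because the corner pixel is inclusive)
    return np.array([bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1])

def bb_hw(a):
    # the exact inverse: corners back to [x, y, width, height], for
    # libraries that expect that order
    return np.array([a[1], a[0], a[3] - a[1] + 1, a[2] - a[0] + 1])
```

Round-tripping a box through `hw_bb` then `bb_hw` should give back the original values, which makes these easy to sanity-check.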
So earlier on I took all of our classes and created a categories list, and if we look up category number 7, which is what this image is annotated with, 7 is "car". Let's look at another example: image number 17 has two bounding boxes, one of type 15 and one of type 13 — that is, a person and a horse. This would be much easier to understand if we could see a picture of these things, so let's create some pictures. Having just turned our height/width stuff into top-left/bottom-right stuff, we're now going to create a function to do the exact opposite, because any time I want to call some library that expects the opposite convention, I'm going to need to pass it in that form. So here is something that converts a bounding box back to height and width — bb_hw, bounding box to height/width — reversing the order again and giving us the height and width. We can now open an image in order to display it, and where we're going to get to is showing that car we just sorted out. Now, one thing I often get asked on the forums, or through GitHub, is: how do I find out about this open_image thing — where did it come from, what does it mean, who uses it? I wanted to take a moment on this, because it's something we're going to be doing a lot, and although a lot of you aren't professional coders — you have backgrounds in statistics or meteorology or physics or whatever (and I apologise to those of you who are professional coders; you know this already) — because we're going to be doing a lot of work with the fastai library and other libraries, you need to be able to navigate through them very quickly. So let me give you a quick overview of how to navigate through code, and for those of you who haven't used an editor properly before, this is going to blow your minds; for those of you who have, you'll be like "yeah, yeah". For the demo, I'm going to show you Visual
Studio Code. Personally, my view is that on pretty much every platform — unless you're prepared to put in decades of your life to learn Vim or Emacs well — Visual Studio Code is probably the best editor out there. It's free, it's open source; there are other perfectly good ones as well. Also, if you download a recent version of Anaconda, it will offer to install Visual Studio Code for you; it integrates with Anaconda, sets it up with your Python interpreter, and comes with the Python extensions and everything, so it's a good choice if you're not sure. If you've got some other editor you like, search for the right keywords in its help. So if I fire up Visual Studio Code, the first thing to do, of course, is a git clone of the fastai library to your laptop. You'll find in the root of the repo, as well as the environment.yml file that sets up a conda environment for you, that one of the students has been kind enough to create an environment-cpu.yml file — perhaps one of you who knows how to do this can add some notes to the wiki — and basically you can use that to create a local, CPU-only fastai installation. The reason you might want to do that is so that, as you navigate the code, you'll be able to navigate into PyTorch and see everything that's there as well. So I open up Visual Studio Code, and it's as simple as saying Open Folder and pointing it at the fastai GitHub folder you just cloned. The next thing you need to do is set up Visual Studio Code to say "I want to use the fastai conda environment, please". The way you do that is with the Select Interpreter command, and there's a really nice idea here, which is kind of the best of both worlds between a command-line interface and a GUI: the only command you need to know in this mode is Ctrl-Shift-P. You hit Ctrl-Shift-P, then you start typing what you want to do — "I want to change my interpreter" — and watch what happens: it
appears. If you're not sure, you can try a few different phrasings. So here we are — Python: Select Interpreter — and you can see that generally you can type stuff in and it'll give you a list of matches if it can. Here's a list of all the environments and interpreters I have set up, and here's my fastai environment. So that's basically the only setup you have to do. The only other thing you might want to know is that there's an integrated terminal: if you hit Ctrl-backtick, it brings up the terminal, and the first time you do it, it'll ask which terminal you want — on Windows that'd be PowerShell or Command Prompt; on Linux, whichever shells you've got installed. As you can see, I've got it set up to use bash, and you'll see it automatically goes to the right directory. So the main thing we want to do right now is find out what open_image is, and the only thing you need to know to do that is Ctrl-T. If you hit Ctrl-T, you can now type the name of a class, a function, pretty much anything, and find out about it. So: open_image — you can see it appears. And it's kind of cool: if something's got CamelCase capitals, or words separated by underscores, you can just type the first few letters of each piece, and look — it's found the function; it's also found some other things that match. There it is. So that's a good way to see exactly where something came from and find out exactly what it is. The next question would be: what's it used for? If it's used inside fastai, you can say Find All References, which is Shift-F12, and it brings up something saying it's used twice in this codebase, and I can go and have a look at each of those examples. If it's used in multiple different files, it'll tell you the different files it's used in.
Another thing that's really handy: as you look at the code, you'll find that certain bits of the code call other parts of the code. For example, if you're inside FilesDataset and you see it calling something called open_image and wonder what it is, you can hover your pointer over it and it'll give you the docstring, or you can hit F12 and it jumps straight to its definition. Often it's easy to get a bit lost — things call things which call things — and if you have to manually navigate to each bit, it's tedious; this way, it's always one keystroke away: Ctrl-T to go to something whose name you know, or F12 to jump to the definition of something you're clicking on. And when you're done, you probably want to go back to where you came from, so Alt-Left takes you back to where you were. Whatever you use — Vim, Emacs, Atom, whatever — they all have this functionality, as long as you have an appropriate extension installed; if you use PyCharm, you get it for free without any extension, since it's built for Python. Whatever you're using, you want to know how to do this stuff. Finally, I mentioned there's a nice thing called Zen Mode, Ctrl-K Z, which basically gets rid of everything else so you can focus, but it does keep this nice little map on the right-hand side which shows you where you are. So that's something you should practise during the week if you haven't played around with it before, because we're increasingly going to be digging deeper and deeper into the fastai and PyTorch libraries. As I say, if you're already a professional coder and know all this stuff, apologies for telling you things you know. OK, so since we did that, let's just talk about open_image. You'll see that we're using cv2; cv2 is actually the OpenCV library. You might wonder why we're using OpenCV, and I want to explain some of the internals of fastai to you, because some of them are kind of
interesting and might be helpful to you. torchvision — the standard PyTorch vision library — actually uses PyTorch tensors for all of its data augmentation and so on, and a lot of people use Pillow, PIL, the standard Python imaging library. I did a lot of testing of all of these, and I found OpenCV was about 5 to 10 times faster than torchvision. Early on, I teamed up with one of the students from an earlier class to do the Planet satellite competition back when that was on, and we used torchvision; because it was so slow, we could only get about 25% GPU utilisation, since we were doing a lot of data augmentation. So I used the profiler to find out what was going on and realised it was all in torchvision. Pillow, or PIL, is quite a bit faster, but it's not as fast as OpenCV, and it's also not nearly as thread-safe. Python has this thing called the global interpreter lock, the GIL, which basically means that two threads can't do Pythonic things at the same time — it makes Python a really shitty language for modern programming, but we're stuck with it. I spoke on Twitter to the guy who actually made it so that OpenCV releases the GIL. So one of the reasons the fastai library is so amazingly fast is that we don't use multiple processes, like every other library does, for our data augmentation — we actually use multiple threads, and the reason we can use multiple threads is that we use OpenCV. Unfortunately, OpenCV has a pretty inscrutable API, and a lot of what it does is poorly documented — it is documented, but in really obtuse ways. So I've tried to make it so that no one using fastai needs to know that it's using OpenCV. Do you really need to know that you have to pass these particular flags to imread to
actually make it work? Do you actually need to know that if the reading fails, it doesn't raise an exception — it just silently returns None? It's these kinds of things that we try to handle so that it actually works reliably. But as you start to dig into it, you'll find yourself in these places, and you'll want to know what's going on. And I mention this in particular to say: don't start using other libraries for your data augmentation — don't start bringing in Pillow. You'll find that suddenly things slow down horribly, or the multithreading won't work any more, or whatever. Try to stick to using OpenCV for your processing. OK, so we've got our image, and we're just going to use it to demonstrate the Pascal data. The next thing I wanted to show you, in terms of important coding stuff we're going to be using throughout this course, is using matplotlib a lot better. Matplotlib is so named because it was originally a clone of MATLAB's plotting library. Unfortunately, MATLAB's plotting library is awful, but at the time it was what everybody knew. At some point the matplotlib folks realised — or probably always knew — that the MATLAB plotting library is awful, so they added a second API to it: an object-oriented API. Unfortunately, because nobody who originally learned matplotlib learned the OO API, they then taught the next generation of people the old MATLAB-style API, and now there are basically no examples or tutorials online that I'm aware of that use the much better, easier-to-understand, simpler OO API. Because plotting is so important in deep learning, one of the things I'm going to try to show you is how to use this API, and I've discovered some simple little tricks. One simple little trick is that plt.subplots is just a super handy wrapper, and I'm going to use it lots. What it does is return two things; one of them you probably won't care about, and the other is an axes object. Basically, anywhere you used to say plt.
something, you now say ax.something, and it will do that plotting in that particular subplot. A lot of the time during this course you'll see this used to plot multiple plots that we can compare next to each other, but even in this case, where I'm creating a single plot, it's just nice to only have to know one way rather than lots of ways. So regardless of whether I'm doing one plot or lots of plots, I now always start with this subplots call. And the nice thing is that, this way, I can pass in an axes object if I want to plot into a figure I've already created, or, if one hasn't been passed in, I can create one. This is also a nice way to make your matplotlib functions really versatile, and you'll see it used throughout this course. So now, rather than plt.imshow, it's ax.imshow. And rather than the kind of weird stateful settings in the old-style API, you can now use the OO style: get_axis returns an object; set_visible is a method on it. It's all pretty normal, straightforward stuff, and once you start getting the hang of a small number of these OO matplotlib things, hopefully you'll find life a little easier. Now let me show you what I think is a cool example. One thing that drives me crazy about people putting text on images — whether it's subtitles on TV or people doing stuff with computer vision — is when it's white text on a white background, or black text on a black background, and you can't read it. A really simple thing that I like to do every time I draw on an image is to make my text and boxes white with a little black border, or vice versa. And here's a cool little thing you can do in matplotlib: you can take a matplotlib plotting object and go set_path_effects, and say: add a black stroke around it.
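A minimal show_image along the lines described — written here as a sketch, using the headless Agg backend so it runs anywhere:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

def show_image(im, figsize=None, ax=None):
    # Make an axes with plt.subplots if the caller didn't pass one,
    # then do everything through the OO API on ax.
    if ax is None:
        fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    return ax  # returning the axes lets callers keep drawing on it

ax = show_image(np.zeros((10, 10, 3)))
```

This is the versatility trick the lecture mentions: the same function works standalone (it creates its own axes) or as one panel in a grid of subplots (you pass the axes in).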
You can see that when you draw it that way, it doesn't matter that here it's white on a white background, or here it's on a black background — it's equally readable. It's a simple little thing, but it just makes life so much better when you can actually see your bounding boxes and actually read the text. So you'll see that rather than just saying "add a rectangle", I get the object that the call creates, and then pass that object to draw_outline, so everything I do gets this nice path-effect treatment. You can see matplotlib has a perfectly convenient way of drawing stuff: when I want to draw a rectangle, matplotlib calls that a patch, and you can pass it all different kinds of patches. So here again, rather than having to remember all of that every time, I just make another function. You can use that function every time — you don't have to put it in a library somewhere. I always put lots of functions inside my notebooks; if I use one in, say, three notebooks, then I know it's useful enough that I'll stick it in a separate library. You can also draw text — and notice all of these take an axes object, so they'll always be added to whatever thing I want to add them to — and I can add text with an outline around it. Having done all that, I can now take my show_image — and notice that show_image, if you didn't pass it an axes, returns the axes it created, so show_image returns the axes the image is on. I then turn my bounding box into height/width form for this particular image's bounding box, I can then draw the rectangle, and I can then draw the text in the top-left corner — remember, the bounding box x and y are the first two coordinates, the column and row of the top left. And remember the tuple contains two things, the bounding box and then the class; so this is the class, and to get the text for it, I just pass it into my categories list — and there we go. So now that I've got all that set up, I can use it for all of my object detection work from here.
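The outline trick just described can be sketched as follows; the line widths and font size are illustrative defaults:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt
import matplotlib.patheffects as patheffects
from matplotlib import patches

def draw_outline(o, lw):
    # White marks with a black stroke stay readable on any background.
    o.set_path_effects([patheffects.Stroke(linewidth=lw, foreground='black'),
                        patheffects.Normal()])

def draw_rect(ax, b):
    # b is [x, y, width, height]; a rectangle is a "patch" in matplotlib.
    patch = ax.add_patch(patches.Rectangle(b[:2], *b[-2:], fill=False,
                                           edgecolor='white', lw=2))
    draw_outline(patch, 4)

def draw_text(ax, xy, txt, sz=14):
    text = ax.text(*xy, txt, verticalalignment='top', color='white',
                   fontsize=sz, weight='bold')
    draw_outline(text, 1)

fig, ax = plt.subplots()
draw_rect(ax, [10, 20, 100, 50])
draw_text(ax, (10, 20), 'car')
```

Note how every helper takes the axes object as its first argument, so each one draws onto whichever plot you hand it.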
What I really want to do, though, is to package all that up — so here it is, all packaged up: here's something that draws an image with some annotations. It shows the image, then goes through each annotation, turns it into height and width, draws the rectangle, draws the text. If you haven't seen this before: each annotation, remember, contains a bounding box and a class, so rather than going "for o in ann" and then indexing o[0] and o[1], I can destructure it. This is a destructuring assignment: if you put two names on the left, then the two parts of the tuple or list on the right are bound to those two names. So: for the bounding box and the class in the annotations, go ahead and do that. And then I can say: draw the image at a particular index, by grabbing the image ID, opening it up, and then calling that draw function. Let's test it out — and there it is. That kind of seems like quite a few steps, but to me, when you're working with a new dataset, getting to the point where you can rapidly explore it pays off. You'll see as we start building our model that we'll keep using these functions to see how things are going. All right, so step one from our plan is to do a classifier. I think this approach is always good — for me, I didn't really have much experience in doing this kind of object detection stuff before I started preparing this course a few months ago, so I wanted to get this feeling, even though it's deep learning, of continual progress. So: what could I make work? Well, why don't I find the biggest object in each image and classify it? I know how to do that. This is one of the biggest problems I find today with younger students: they figure out the whole big solution they want, generally involving a whole lot of new speculative ideas nobody has tried before, and they spend six months
doing it, and then the day before the presentation, none of it works. Whereas — I've talked about my approach to Kaggle competitions before — it's: spend half an hour, and at the end of that half an hour, submit something, and try to make it a little bit better than yesterday's. So I tried to do the same thing in preparing this lesson: try to create something that's a bit better than the last thing. The easiest thing I could come up with first was my largest-item classifier, and the first thing I needed to do for that was to go through each of the bounding boxes in an image and get the largest one. Actually, I didn't write that first — I wrote this other part first; normally I like to pretend that somebody else has created the exact API I want, and then go back and write it. So I wrote this line first, and it told me: I need something which takes all of the bounding boxes for a particular image and finds the largest. Well, that's pretty straightforward: I can just sort the bounding boxes, and here again we've got a lambda function. If you haven't used lambda functions before, this is something you should study during the week — they're used all over the place to quickly define a function in place, a once-off function. In this case, Python's built-in sorted function lets you pass in a function to say how to decide whether something comes earlier or later in the sort order. Here, I took the last two items of my bounding box list — i.e. the bottom-right corner — minus the first two items of my bounding box list — i.e. the top-left corner. Bottom right minus top left gives the two side lengths, and if you take the product of those two things, you get the area of the box. So that's the key function, sorting in descending order. Often you can take something that's going to be a few lines of code and turn it into one line of code.
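The largest-box helper might look like this sketch, using the bounding-box convention from earlier (top-row, left-col, bottom-row, right-col) and a made-up annotation list:

```python
import numpy as np

def get_lrg(b):
    # Each annotation is (bbox, class); sort by box area, largest first.
    # bottom-right corner minus top-left corner gives (height, width),
    # and their product is the area.
    if not b:
        raise Exception('no annotations for this image')
    b = sorted(b,
               key=lambda x: np.prod(np.array(x[0][-2:]) - np.array(x[0][:2])),
               reverse=True)
    return b[0]

# Toy annotations: a 10x10 box of class 1 and a 100x50 box of class 2.
anns = [([0, 0, 10, 10], 1), ([0, 0, 100, 50], 2)]
largest = get_lrg(anns)
```

The lambda passed as `key` is the once-off sort function described above: it maps each annotation to its box area.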
of code, and sometimes you can take that too far, but for me I like to do it where I reasonably can, because again, rather than having to understand a whole big chain of things, my brain can just look at it at once and say, okay, there it is. I also find that over time my brain builds up this little library of idioms, and more and more I can look at a single line and know what's going on. Okay, so this builds a dictionary, and it's a dictionary because this is a dictionary comprehension. A dictionary comprehension is just like a list comprehension — and I'm going to use it a lot in this part of the course — except it goes inside curly brackets, and it's got a key, colon, value. So here the key is going to be the image ID, and the value is the largest bounding box and its class. So now that we've got that, we can look at an example, and here's the largest bounding box for this image. Obviously there are a lot of objects here — there are three bicycles and three people — but here's the largest: the bus. And I feel like this ought to go without saying, but it definitely needs to be said because so many people don't do it: you need to look at every stage when you've got any kind of processing pipeline. If you're as bad at coding as I am, everything you do will be wrong the first time you do it — and there are lots of people as bad at coding as me, and yet lots of people write lines and lines of code assuming they're all correct, and then at the very end they've got a mistake and they don't know where it came from. So particularly when you're working with images or text — things that humans can look at and understand — keep looking at it. So here I have it: yep, that looks like the biggest thing, and it certainly looks like a bus. So let's move on. Here's another nice thing in pathlib: the mkdir method. So I'm going to create a path called CSV, which is a path to my largest objects CSV
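The dictionary comprehension mapping image IDs to their largest annotation can be sketched like this, again with invented data and a hypothetical `trn_anno` name standing in for the real annotation structure:

```python
# Hypothetical per-image annotations keyed by image id
trn_anno = {
    12: [([96, 155, 269, 350], 7), ([77, 89, 335, 402], 6)],
    17: [([10, 10, 20, 20], 3)],
}

def get_largest(anns):
    # sort each image's annotations by box area, biggest first
    return sorted(anns,
                  key=lambda a: (a[0][2] - a[0][0]) * (a[0][3] - a[0][1]),
                  reverse=True)[0]

# Dictionary comprehension: curly brackets, key colon value —
# each image id maps to its single largest (box, class) pair
trn_lrg_anno = {img_id: get_largest(anns) for img_id, anns in trn_anno.items()}
```

Same shape as a list comprehension, but producing key/value pairs instead of elements.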
file. Why am I going to create a CSV file? Pure laziness. We have ImageClassifierData.from_csv, and I could go through a whole lot of work to create a custom dataset and blah blah blah to use this particular format I have — but why, when it's so easy to create the CSV, chuck it inside a temporary folder, and then use something that already exists? This is something I've seen a lot of times on the forum: people will ask, how do I convert this weird structure into a form that fastai can accept, and normally somebody on the forum will say, print it to a CSV file. So that's a good simple tip, and the easiest way to create a CSV file is to create a pandas DataFrame. Here's my pandas DataFrame: I can just give it a dictionary with the name of a column and the list of things in that column — so there's the file name, there's the category. And then you'll see here — why do I have this columns argument? I've already named the columns in the dictionary, so why is it here? Because the order of columns matters, and a dictionary does not have a guaranteed order, so this says the file name comes first and then the category. That's a good trick for creating a CSV. So now it's just dogs and cats, right? I have a CSV file, it contains a bunch of file names, and for each one it contains the class of that object, so this is the same few lines of code you've seen many times. The one thing that's different is crop type. You might remember the default strategy for creating, say, a 224 by 224 image in fastai is to first resize it so the smallest side is 224, and then take a random square crop during training; during validation we take the center crop, unless we're using data augmentation, in which case we do a few random crops. For bounding boxes we don't want to do that, because unlike ImageNet, where the thing we
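The columns-order trick can be sketched like this — file names and categories are made up, and the column names (`fn`, `cat`) are just illustrative:

```python
import pandas as pd

# Toy file names and classes standing in for the real Pascal data
fnames = ["000012.jpg", "000017.jpg"]
cats = ["car", "person"]

# Build the frame from a dict, then force the column order explicitly:
# without columns=..., the order depends on the dict, which historically
# wasn't guaranteed
df = pd.DataFrame({"fn": fnames, "cat": cats}, columns=["fn", "cat"])

# to_csv with no path returns the CSV as a string; normally you'd pass a path
csv_text = df.to_csv(index=False)
```

`index=False` drops pandas' row index column, which the downstream CSV reader doesn't want.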
care about is pretty much in the middle and it's pretty big, a lot of the stuff in object detection is quite small and close to the edge, so we could crop it out, and that would be bad. So when you create your transforms, you can choose crop_type=CropType.NO, and NO means don't crop — and therefore, to make it square, it squishes it instead. So you'll see this guy now looks a bit strangely wide, and that's because he's been squished. Generally speaking, a lot of computer vision models work a little bit better if you crop rather than squish, but they still work pretty well if you squish, and in this case we definitely don't want to crop, so this is perfectly fine. If you had very long or very tall images, such that if a human looked at the squished version they'd say that looks really weird, then that's difficult — but in this case we're fine. Okay, so I'm going to quite often dig a little bit into some more depths of fastai and PyTorch, and in this case I want to look at data loaders a little bit more. You already know — let's just make sure this has all run — you already know that inside a model data object (and there are lots of model data subclasses, like ImageClassifierData) we have a bunch of things, which include a training data loader and a training data set, and we'll talk much more about this. But the main thing to know about a data loader is that it's an iterator: each time you grab the next iteration of stuff from it, you get a mini-batch, and the mini-batch you get is of whatever size you asked for — by default the batch size is 64, but you can pass bs= whatever you like. However, in Python, the way you grab the next thing from an iterator is with next — and you can't just do that here. Why not? The reason is that you need to say, start a new epoch now. In general, this
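The iterator point can be shown with a minimal stand-in loader — this is not the real PyTorch DataLoader, just a sketch of the same protocol:

```python
# A toy data loader: iterable over mini-batches.
# Calling iter() starts a fresh pass (a fresh "epoch"); next() yields batches.
class ToyLoader:
    def __init__(self, data, bs):
        self.data, self.bs = data, bs

    def __iter__(self):
        # a generator supplies the __next__ machinery for free
        for i in range(0, len(self.data), self.bs):
            yield self.data[i:i + self.bs]

dl = ToyLoader(list(range(10)), bs=4)

it = iter(dl)      # start an epoch
first = next(it)
second = next(it)

# next(dl) without iter() would raise TypeError:
# the loader itself is iterable but is not an iterator
```

This is why `next(iter(dl))` is the idiom for grabbing a single batch.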
isn't just in PyTorch — for any Python iterator, you need to say, please start at the beginning of the sequence. And the way you do that — this is a general Python concept — is you write iter(...), which says, please grab an iterator out of this object. Specifically, as we'll learn later, it means this class has to have defined an __iter__ method, which returns some different object which then has a __next__ method. So if you want to grab just a single batch, this is how you do it: x, y = next(iter(md.trn_dl)). X comma y because our data loaders — or the data sets behind the data loaders — always have an x, the independent variable, and a y, the dependent variable. So here we can grab a mini-batch of x's and y's, and now I'm going to pass that to that show image command we had earlier — but we can't send it straight to show image. For one thing, it's not a numpy array, it's not on the CPU, and its shape is all wrong: it's not 224 by 224 by 3, it's 3 by 224 by 224. Furthermore, these are not numbers between 0 and 1. Why not? Because, remember, all of the standard ImageNet pre-trained models expect their data to have been normalized to a zero mean and a one standard deviation. So if you look inside — let's use Visual Studio Code for this, that's what we've been doing — if you look inside tfms_from_model (Ctrl-T, tfms_from_model), which in turn calls tfms_from_stats, you can see normalize, and it normalizes with some set of image statistics, and those statistics are basically hard-coded: these are the ImageNet statistics, these are the statistics used for Inception models. So there's a whole bunch of stuff that's been done to the input to get it ready to be passed to a pre-trained model, and so we have a function called
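What normalization and denormalization do can be sketched with plain numpy — the mean/std values below are the commonly used ImageNet channel statistics, and this `denorm` is a simplified stand-in for fastai's version, not its actual implementation:

```python
import numpy as np

# Widely used ImageNet channel statistics
mean = np.array([0.485, 0.456, 0.406])
std  = np.array([0.229, 0.224, 0.225])

# A fake 3 x 4 x 4 image in channels-first order, values in [0, 1]
img = np.random.rand(3, 4, 4)

# What the transform pipeline does before the model sees the image
normed = (img - mean[:, None, None]) / std[:, None, None]

# Sketch of what a denorm function undoes: reverse the normalization
# and move channels last so plotting libraries can display it
def denorm(x):
    x = x * std[:, None, None] + mean[:, None, None]
    return x.transpose(1, 2, 0)   # (3, H, W) -> (H, W, 3)

restored = denorm(normed)
```

Round-tripping recovers the original pixels, just in channels-last order.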
denorm, for denormalize — it doesn't only denormalize, it also fixes up the dimension order and all that stuff. The denormalization depends on the transform, and the data set knows what transform was used to create it, so that's why you have to go model data dot, then some data set, dot denorm — that's a function stored for you that will undo all of that. Then you can pass it a mini-batch, but you have to turn it into numpy first. So this is all the stuff you need to be able to do to grab batches and look at them, and after you've done all that, you can show the image — and we've got our picture back, so that's looking good. So in the end we've just got these three or four lines of code: we've got our transforms, we've got our model data, ConvLearner.pretrained — we're using a ResNet-34 here — I'll add accuracy as a metric, pick an optimization function, do an lr_find, and that looks kind of weird, not particularly helpful. Normally we would expect to see an uptick on the right; the reason we don't see it is because we intentionally remove the first few points and the last few points. The reason is that often the last few points shoot so high up towards infinity that you basically can't see anything, so the vast majority of the time removing the last few points is a good idea. However, when you've got very few mini-batches, sometimes it's not a good idea, and a lot of people asked about this on the forum, so here's how you fix it: by default it skips 10 at the start, so in this case we just say 5; by default it skips 5 at the end, and we'll just say 1, and now we can see the shape properly. If your data set is really tiny, you may need to use a smaller batch size — if you only have three or four batches' worth, there's not going to be much to see — but in this case it's fine, we just had to plot a little bit more. Okay, so we pick a learning rate, we say fit, and after one epoch, just training the last layer,
it's decent. Let's unfreeze a couple of layers, do another epoch — a couple of percent better — unfreeze the whole thing, do another epoch — not really improving. Why are we stuck at around 80%? It kind of makes sense: unlike ImageNet, or dogs versus cats, where each image has one major thing, and that one major thing is what you're asked to look for, a lot of the Pascal data set has lots of little things, and so a largest-object classifier is not necessarily going to do great. But of course, we really need to see the results to judge whether it makes sense, so we're going to write something that creates this. After working with this for a while, I know what the 20 Pascal classes are, so I know there's a person and a bicycle class, I know there's a dog, and so forth — so I know this one's wrong, it should be sofa; that's correct; chair — that's wrong, I think the table's bigger; motorbike's correct; there's no cactus class, so that should be bus; person's correct; that's correct; potted plant — great; car's correct. So that's looking pretty good. Now, when you see a piece of code like this, if you're not familiar with all the steps to get there, it can be a little overwhelming — and I feel the same way when I see a few lines of code in something I'm not familiar with. But it turns out there are two ways to make it super simple to understand the code. The high-level way is: run each line of code step by step, print out the inputs, print out the outputs, and most of the time that'll be enough. If there's a line of code where you don't understand how the outputs relate to the inputs, go and have a look at the source. So now all you need to know is: what are the two ways you can step through the lines of code one at a time? The way I use probably the most often is to take the contents of the loop, copy it, create a cell above it, paste it, outdent it, write i = 0,
and then put them all in separate cells and run each one, one at a time, printing out the inputs. I know that's obvious, but the number of times I actually see people do that when they ask me for help is basically zero — because if they had done that, they wouldn't be asking for help. Another method that's super handy — and there are particular situations where it's super, super handy — is to use the Python debugger. Who here has used a debugger before? About two thirds — so for the rest of you, this could be life-changing. Actually, a guy I know, a deep learning researcher, wrote on Twitter this morning, and his message was: how come nobody told me about the Python debugger before, my life has changed. This guy's an expert, but because nobody teaches basic software engineering skills in academic courses, nobody thought to say to him, hey Mark, you know what, there's something that shows you everything your code does, one step at a time. So I replied on Twitter: good news, Mark — not only that, but every single language in existence, on every single operating system, also has a debugger, and if you google for the language name plus "debugger" it will tell you how it works. So there's a meta piece of information. In Python, the standard debugger is called pdb, and there are two main ways to use it. The first is to go into your code — and the reason I'm mentioning this now is because during the next few weeks, if you're anything like me, 99% of the time you'll be in a situation where your code's not working, and very often it'll only happen on, say, the fourteenth mini-batch inside the forward method of your custom module. So what do you do? The answer is: you go inside your module and you write pdb.set_trace(), and if you know it was only happening on the 14th iteration, you write if i == 13: pdb.set_trace() — so you can set a conditional breakpoint. pdb is the Python debugger; fastai imports it for you, so if you
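The conditional breakpoint idea can be sketched like this — the loop body is a made-up stand-in for a real forward pass, and the actual `set_trace()` call is guarded by a flag so the sketch runs non-interactively:

```python
import pdb

DEBUG = False  # flip to True to actually drop into the debugger

def train_loop(batches):
    results = []
    for i, batch in enumerate(batches):
        # Conditional breakpoint: only stop on the 14th mini-batch (i == 13),
        # exactly the pattern described for debugging a custom module's forward
        if DEBUG and i == 13:
            pdb.set_trace()
        results.append(batch * 2)  # stand-in for the real computation
    return results

out = train_loop(list(range(20)))
```

With `DEBUG = True`, execution pauses on iteration 13 with the full local state available for inspection.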
get a message that pdb doesn't exist, you can just say import pdb. Okay, so let's try it. Admittedly, it's not the most user-friendly experience — it just pops up a box — but the first cool thing to notice is that the debugger even works in a notebook, so that's pretty nifty. It also works in the terminal, of course. So what can you do? You can type h for help, and there are plenty of tutorials out there, but the main thing to know is that this is one of those situations where you definitely want to know the one-letter mnemonics. You could type out "next", but you definitely just want n; you could type out "continue", but you definitely want c. I've listed the main ones you need. So, sitting here, it shows me the line it's about to run. One thing I might want to do is print something out, and I can write any Python expression, hit enter, and it evaluates it — that's a useful thing to do. I might want to find out where I am in the code more generally: I don't just want to see this line, I want to see what's before it and after it, so I want l for list — and you can see I'm about to run that line, and these are the lines above it and below. So I might say, okay, let's run this line and see what happens: n goes to the next line, and you can see it's now about to run the next one. One handy tip: you don't even have to type n — if you just hit enter, it repeats the last thing you did. So I now should have a thing called b. Unfortunately, single letters are often used for debugger commands, so if I just type b it'll run the break command rather than print b for me — so to force it to print, use p: p b. There it is. All right, let's do next again. At this point, if I hit next, it'll draw the text — but I don't just want to draw the text, I want to know how it's going to draw the text, so I don't want next over it, I want to step into
it. So if I now hit s to step into it, I'm now inside draw_text, and if I hit n, I can step through each line of draw_text and so forth. And then when I know everything I want to know about this, I'll continue until I hit the next breakpoint, so c will continue. What if I was zipping along — this happens quite often — let's step into denorm; here I am inside denorm. What will often happen is, if you're debugging something in your PyTorch module and it's hit an exception, you'll find yourself six layers deep inside PyTorch, but you actually want to see back up to what was happening where you called it from. So in this case I'm inside this property, but I actually want to know what was going on up the call stack, so I just hit u — that doesn't actually run anything, it just changes the context of the debugger to show me what called it, and now I can type things to find out about that environment. And if I want to go down again, it's d. So I'm not going to show you everything about the debugger, but I've just shown you all of those key commands. Yes? — Oh, something that we found helpful as we've been doing this is using from IPython.core.debugger import set_trace, and then you get it all prettily colored. — That's an excellent tip. Let's learn about some of our students. I know you're doing an interesting project — can you tell us about it? — Okay, hello everyone. I'm here with my collaborator Brit, and we're using this kind of stuff to try to build a Google Translate for animal communication. — Yeah, so that involves playing around a lot with unsupervised neural machine translation and doing it on top of audio. — Where do you get data for that? — Ah, that's sort of the hard problem. We're talking to a number of researchers to try to collect and collate large data sets, but if we can't get it that way, we're thinking about building a living library of the
audio of the species of Earth — that involves going out and collecting, like, a hundred thousand hours of gelada monkey vocalizations. — All right, that's great, thanks. Okay, so let's get rid of that set_trace. The other place the debugger comes in particularly handy is, as I say, when you've got an exception — particularly if it's deep inside PyTorch. So if I put a times 100 in here, obviously that's going to throw an exception; I've got rid of the set_trace, so if I run this now — okay, something's wrong. In this case it's easy to see what's wrong, but often it's not — so what do I do? %debug pops open the debugger at the point the exception happened. So now I can check: okay, preds — the length is 64 — and the index I asked for, oh, no wonder. And you can go up and down the stack. So I do all of my development, both on the library and on the lessons, in Jupyter notebook; I do it all interactively, and I use %debug all the time, along with this idea of copying stuff out of a function, putting it in a bunch of separate cells, and running it step by step. There are similar things you can do inside, for example, Visual Studio Code — there's actually a Jupyter extension which lets you select any line of code inside Visual Studio Code, say "run in Jupyter", and it will run it in Jupyter and create a little window showing you the output. There's neat little stuff like that; personally I think Jupyter notebook is better, and perhaps by the time you watch this on the video, JupyterLab — the next version of Jupyter notebook, which is pretty similar — will be the main thing. Wow, I just broke it totally. Okay, well, we know exactly how to fix it — I'll debug it this evening. Okay, so for the next stage we want to create the bounding box, and creating the bounding box around the largest object may seem like something you haven't done before, but actually it's totally
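The %debug magic is Jupyter-specific, but the same post-mortem idea works in plain Python via the standard library — a sketch with a made-up bug (the real call is commented out because it would block waiting for input):

```python
import pdb
import sys

def buggy(preds, i):
    # preds has 64 entries, but i * 100 indexes far past the end
    return preds[i * 100]

preds = list(range(64))
try:
    buggy(preds, 1)
except IndexError:
    exc_type, exc_value, tb = sys.exc_info()
    # In an interactive session you would now call pdb.post_mortem(tb)
    # to land in the frame where the exception happened — the same
    # idea as %debug in Jupyter. Commented out so this sketch runs
    # non-interactively:
    # pdb.post_mortem(tb)
    caught = exc_value
```

`pdb.pm()` is a shortcut that does the same for the most recent traceback.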
something you've done before. And the reason it's something you've done before is that we know we can create a regression network rather than a classification network. In other words, a classification network is just one that has a sigmoid or softmax output, and where we use a cross-entropy or binary cross-entropy loss function — that's basically what makes it a classifier. If we don't have the softmax or sigmoid at the end, and we use mean squared error as a loss function, it's now a regression model, and we can use it to predict a continuous number rather than a category. We also know that we can have multiple outputs — in the Planet competition we did a multiple-label classification. What if we combine the two ideas and do a multiple-column regression? In this case we've got four numbers: top-left x and y, bottom-right x and y. So we could create a neural net with four activations, have no softmax or sigmoid, and use a mean squared error loss function. This is where thinking about it as differentiable programming comes in — it's not "how do I create a bounding box model?", it's: what do I need? I need four numbers, therefore I need a neural network with four activations. That's half of what I need to know; the other half is a loss function — in other words, what's a function that, when it is lower, means the four numbers are better? Because if I can do those two things — if the x is close to the first activation, and the y is close to the second, and so forth — then I'm done. So I just need a model with four activations and a mean squared error loss function, and that should be it; we don't need anything new. So let's try it. Again we'll use a CSV, and if you remember from part one, to do a multiple-label classification your multiple labels have to be space-separated, and the file name is comma-separated. So I'll take my largest item dictionary,
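The "four activations plus a regression loss" idea can be sketched in a few lines of PyTorch — this is a toy head on random features, not the actual lesson model, and the sizes are invented:

```python
import torch
import torch.nn as nn

# Toy regression head: a feature vector in, four raw activations out —
# no softmax or sigmoid, just continuous numbers
model = nn.Linear(16, 4)
loss_fn = nn.MSELoss()

feats = torch.randn(8, 16)    # batch of 8 fake feature vectors
target = torch.rand(8, 4)     # four target coordinates per "image"

pred = model(feats)
loss = loss_fn(pred, target)
loss.backward()               # gradients flow exactly as for a classifier
```

Swapping the loss and dropping the final non-linearity is the whole difference between classification and regression here.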
create a string of bounding box coordinates for each one, separated by spaces, using a list comprehension. I'll then create a DataFrame like I did before, turn that into a CSV, and now I've got something with the file name and the four bounding box coordinates. I'll then pass that to from_csv, and again I'll use crop_type=CropType.NO. Next week we'll look at TfmType.COORD; for now, just realize that when we're doing scaling and data augmentation, that needs to happen to the bounding boxes, not just the images. ImageClassifierData.from_csv gets us to a situation where we can grab one mini-batch of data, denormalize it, turn the bounding box back into a height-width form so we can show it — and here it is. Remember, we're not doing classification, so I don't know what kind of thing this is; it's just a thing, but there is the thing. So now I want to create a ConvNet based on ResNet-34, but I don't want to add the standard set of fully connected layers that create a classifier — I want to add just a single linear layer with four outputs. fastai has this concept of a custom head: if you say my model has a custom head — the head being the thing added on top of the model — then it won't create any of that fully connected network for you, and it won't add the adaptive average pooling; instead it adds whatever module you ask for. In this case I've created a tiny model: it flattens out the previous layer — remember, I'd normally have a 7 by 7 by, I think, 512 previous layer in ResNet-34, so it flattens that into a single vector of length 25088 — and then I just add a linear layer that goes from 25088 to four: there are my four outputs. So that's the simplest possible final layer you could add. I stick that on top of my pre-trained ResNet-34 model, so this is exactly the same as usual except I've got this custom head. All right, optimize it with
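The custom head can be sketched like this. Note the lesson's fastai version defined its own Flatten module; modern PyTorch ships `nn.Flatten`, which this sketch uses instead, with a random tensor standing in for the backbone output:

```python
import torch
import torch.nn as nn

# Sketch of the custom head: ResNet-34's last conv block gives 512 x 7 x 7
# feature maps, so flattening yields 25088 features, mapped straight to
# the 4 bounding box coordinates
head_reg4 = nn.Sequential(
    nn.Flatten(),                 # (N, 512, 7, 7) -> (N, 25088)
    nn.Linear(512 * 7 * 7, 4),    # 25088 -> 4
)

# Stand-in for what the pre-trained backbone would produce
fake_features = torch.randn(2, 512, 7, 7)
out = head_reg4(fake_features)
```

fastai then bolts this onto the backbone in place of its default pooling-plus-classifier head.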
Adam. As a criterion, I'm actually not going to use MSE — I'm going to use L1 loss. I can't remember if we covered this last week; we can revise it next week if we didn't, but L1 loss means rather than adding up the squared errors, you add up the absolute values of the errors. It's normally actually what you want: adding up the squared errors penalizes bad misses by too much, so L1 loss is generally better to work with. I'll come back to this next week, but basically you can see what we do now: we do our lr_find to find our learning rate, learn for a while, freeze to -2, learn a bit more, freeze to -3, learn a bit more — and you can see this validation loss, which, remember, is the mean of the absolute value of the number of pixels we're off by, gets lower and lower. And when we're done, we can print out the bounding boxes, and lo and behold, it's done a damn good job. We'll revise this a bit more next week, but you can see this idea: if I'd asked you before this class, do you know how to create a bounding box model, you might have said no, nobody's taught me that. But the question actually is: can you create a model with four continuous outputs? Yes. Can you create a loss function that is lower when those four outputs are near four other numbers? Yes. Then you're done. Now you'll see, if I scroll a bit further down, it starts looking a bit crappy any time we've got more than one object — and that's not surprising, because how on earth do you decide which bird? So it's just said, I'll pick the middle. Which cow? I'll pick the middle. How much of this is actually potted plant? This one it could probably improve — it's got close to the car, but it's pretty weird. But nonetheless, for the ones that are reasonably clear, I'd say it's done a pretty good job. All right, so that's it for this week — it's been a kind of gentle introduction for the first lesson. If you're a professional coder,
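The point about squared errors over-penalizing bad misses can be shown with made-up numbers — one wildly wrong prediction dominates MSE but contributes proportionally to L1:

```python
# One prediction is way off; the rest are close
preds   = [10.0, 10.0, 10.0, 50.0]
targets = [10.0, 11.0,  9.0, 10.0]

n = len(preds)

# L1 (mean absolute error): the bad miss contributes linearly
l1 = sum(abs(p - t) for p, t in zip(preds, targets)) / n    # (0+1+1+40)/4

# MSE: the bad miss contributes quadratically and swamps everything else
mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / n  # (0+1+1+1600)/4
```

The 40-pixel miss accounts for 40/42 of the L1 loss but 1600/1602 of the MSE, which is why L1 is often the friendlier choice for coordinates.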
there's probably not heaps of new stuff here for you, and in that case I'd suggest practicing and learning about bounding boxes and such. If you aren't experienced with things like debuggers and the matplotlib API and stuff like that, there's going to be a lot for you to practice, because we're going to be assuming you know it well from next week. Okay, thanks everybody — see you next Monday. [Applause]