Welcome to the course Programming, Data Structures and Algorithms using Python. To begin with, we will look at some of the things that you are already familiar with, just to get a refresher on Python. One of the things I will start with is the environment in which we are going to run Python code, so I am going to be talking to you today about Jupyter notebooks.

Normally, when we want to write and run code, there are many options available to us. Perhaps the simplest option is what you see here. You have a text editor, you type your code, then you open a console or a terminal, load the code that you typed in your text editor, and run it. This is of course easy to do, but it is a little tedious: if you decide to make a change, you have to go back and reload it. Going around in this cycle of edit and run is cumbersome if you are using a separate text editor and using the console directly.

This gives rise to a more convenient interface, which you are familiar with from Replit, called an integrated development environment or IDE. In an IDE you have on one side something where you can edit your code, and side by side you have a command window where you can run the code and see the output. Because you can now see the output and edit your code in the same interface, it is much easier to cycle back and forth and see how the output changes as your code changes. So you have this quick cycle of updating your code and running it.

Of course, an IDE has other features also. It typically offers you a debugger, which allows you to interrupt your code and inspect the values of various variables to see what is going on. You might also be able to prepare some test cases in advance and run them to see how your code performs, and so on. So this is the default that most people use for developing serious amounts of code. And of course different IDEs are
designed for different programming languages. Some, like Replit, support multiple languages; other IDEs you might find are dedicated to certain languages like Python or Java or C++, and so on.

So we already have an IDE, but I am going to present to you yet another way of writing and running code, and the question is: why would one want one more when you already have an IDE? The main point that I would like to emphasize is that writing code, whether for yourself or for teaching like I am doing now, also involves collaboration. Very rarely do you write code only for yourself; you usually work in a team. As a team you might be developing the code together, or this might be a larger team. For instance, it could be some kind of a research environment, and this is very common, say, in machine learning: you are trying to solve a problem, and coding is part of the solution but is not the solution in itself. So not only is it possible that more than one person may be running the code or developing it, but there may be people who are not running the code, who are not even aware of how the code works, but who want to know what the code does. So you want to be able to share your results.

Now notice that with a typical IDE, when you run the code you can see the results, but what happens after you run the code? You can't carry the open IDE to another office and show somebody what you have done. You would like to have some way of preserving the output for later without having to go through a cumbersome process of saving it to a file or preparing a report or something like that.

Here on the right you see a typical view of a Jupyter notebook, which is the interface that I am going to introduce today. You have on one side the ability to write text. In this case it is just a simple title, but you can also, as we will see, introduce more annotations which describe what you have done or what you are
planning to do. Then, in this block, you have the actual code, exactly as you would have typed it into a standard editor or into an IDE. And at the bottom, outside this grey area, you have the output of the code, which you would normally see in a separate window on the side in an IDE.

So the documentation is interleaved with the code. You have documentation here, and you can have more documentation in between your code fragments or different parts of your code, without having to put it in the comments of a Python program. In Python, of course, you can include comments by writing a hash (#) and then writing something after that, but this does not look very neat and it is also very difficult to format. The idea of the notebook is that you can actually format your text and interleave it with the code so that it is readable.

The other thing that you can do in this environment, which is difficult to do in an IDE, is to replace one piece of code by another piece of code without abandoning the previous one. You do not want to throw it away; you want to ask, what if I replace this by that? So I might have two different definitions of the same function and I may want to try out both of them without having to go through a tedious process of loading a different file, or deleting one and replacing it by the other. I want both, but only one to be active. We will see that in a notebook this is possible, because the active definition is simply whichever copy of that function you executed last.

Finally, as I said, you would like to keep the outputs available for somebody else to see. This could be for two reasons. One is that you are just reporting: you want to prepare a kind of documented output after you have run your code, so that you can evaluate whether it did the job that it was supposed to do, or whether you want to run it again. The other is that if somebody else wants to run your code,
they should be able to see the output that you got, so that when they run it they can see whether they get the same output or some other output. So preserving the output is part of saving the project, as it were, for future use.

The concrete interface that we are going to look at is called a Jupyter notebook. You may be familiar with a spreadsheet. What is a spreadsheet? A spreadsheet is basically a large, in some sense indefinitely large, grid of cells, and in each of these cells you can put a value, some text, or a formula. It is unstructured in the sense that it does not tell you specifically what each cell should contain, but it is structured in the sense that each cell has a location, given by a column and a row number. You can then write formulas saying that this cell should be the sum of two other cells, or the sum of a column of cells or a row of cells, and so on.

A Jupyter notebook is like a spreadsheet, except that you have only one column. You don't have multiple columns; you have one cell, then another cell, then another cell, so you have a sequence of cells from top to bottom. That is what a notebook looks like.

Now, in a spreadsheet you can do whatever you want with a cell: you can put text, numbers, formulas, even charts and diagrams. In a Jupyter notebook, each cell holds either a piece of code or some text which you want to insert to explain something about the code to somebody else. So it is either code or text. As I said before, the text is not just a comment as you would put in a Python program; it is something that has formatting. How do you specify the formatting? The Jupyter notebook supports a format called Markdown. This is a very simple format, which is like the type of
formatting you would do if you were just composing some text in plain text mode without a word processor. For example, you would use a hyphen to indicate a bulleted item, and so on, and Markdown converts this into nicely formatted output. There are various resources on the internet, and here is one which will tell you how to use Markdown to generate formatted output. I will leave you to look that up; the details are not very important for this particular course, except to know that it can be done.

The main thing about a Jupyter notebook is that it is dynamic, like a spreadsheet. In a spreadsheet we can change the contents by updating a cell with a new value, and when we update some values, formulas are rerun, either explicitly or implicitly, to recompute other values. In the same way, in a Jupyter notebook you can add code or update code and then rerun that code. You can change definitions of functions, rerun a cell, and see how the output changes with each edit.

You might wonder about the name Jupyter. The notebook as such, this spreadsheet-like format, is not specific to Python. It is a generic interface to any programming environment which interprets your code, where you can write code and immediately have it recognized and run. That is what happens in Python: we change the code, and the Python interpreter is able to read the updated code and run it. Jupyter was designed to work with three different languages: Julia, Python and R. You may have come across R in some statistical context, and Julia is also a kind of scripting language for these kinds of calculations. If you take the first few letters of Julia, Python and R, you get Jupyter, and that is the origin of the name. We, of course, will not be using it for anything other than Python, but it is important to note that Jupyter as a framework is not limited to Python. You can use
different kernels, as they are called, so that different programming languages can be supported behind the notebook for the code that you run.

One reason we are focusing on the Jupyter notebook format is not that it is necessary for understanding data structures and algorithms in this course, but that it is an extremely popular format in machine learning, which is the broader scope of this whole program. If you look at the code available for many machine learning projects, it is typically saved and disseminated in the Jupyter notebook format. In particular, you may know about the site called Kaggle. Kaggle hosts machine learning competitions where problems are posted. Some of these are synthetic problems and some are real problems; some even have prize money, because somebody wants a problem solved. It is like crowdsourcing: they want a number of people to attempt it. After the competition closes, many people post their solutions, and these solutions are typically in this notebook format. You can then access a solution, which will have both the code and the documentation, and examine it, rerun it, tinker with it, and so on. In particular, this means that you get the opportunity to take somebody else's code for a given problem and see how it works by modifying it and trying to enhance it.

The Jupyter notebook has become so successful, especially with the growth of interest in machine learning, that it has won a number of awards. In particular, the Jupyter project, which is behind the Jupyter notebook, won the ACM Software System Award in 2017, which is given for significant software systems that are used by the community. So it is a fairly powerful tool.

Of course, you can run Jupyter Notebook on an individual system, whatever system you have, whether it is Windows or Mac or Linux, by installing it. As with any other Python setup, you also have to install and import the relevant
libraries that you need. We, on the other hand, will be using a publicly available form of the Jupyter notebook which is put up by Google, called Colab, which is short for Colaboratory. So colab.research.google.com leads you to an interface where you can create these notebooks and save them, and most importantly it is free to use. It has a slightly different look and feel from the Jupyter notebook that you would install on your own system, but essentially it has the same broad structure, customized by Google for their internal use and then released for public use. So it is a customized Jupyter notebook.

One of the things which will be useful for you, not necessarily in this course but in the other courses that you are doing, is that Colab has all the standard packages for machine learning preloaded. In particular, there is a very popular library called scikit-learn, which has a lot of the standard machine learning models already implemented, which you can call and use, and there is Google's own library called TensorFlow, which is used for deep learning, that is, deep neural networks. Plus, because it is running on the cloud, you also have access to hardware which is beyond the limitations of your personal computer. In particular, Google makes it possible for you to run some of this machine learning code on what is called a GPU. A GPU is a graphics processing unit, and it is very useful for running the kind of large-scale matrix calculations which run behind the scenes in machine learning. So many calculations which would take an enormous amount of time on your laptop will run in a much more reasonable amount of time on Colab using a GPU.

To summarize, we will use Jupyter notebooks because they are a convenient interface for developing Python code. In particular, this ability to edit, save and share the code is also useful for me as an instructor, to be able to distribute the code that we discuss in this class to you after
the class. You can incrementally update and run your code, and you can write documentation in between your code using Markdown syntax. Most importantly, as I said, you can preserve the state of the notebook, in terms of what you have run and what outputs you have generated, and export it. This is extremely useful for collaboration and for sharing, whether you are sharing with a colleague or for teaching purposes, like we are doing here. The particular version of the Jupyter notebook that we are going to use is Google's Colab, which is free to use and is configured for machine learning.
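
To make the point about trying out two definitions of the same function a little more concrete, here is a small sketch in plain Python. It is only a toy model and not how Jupyter itself is implemented: the list `cells`, the dictionary `namespace` and the helper `run_cell` are hypothetical names made up for this illustration. It assumes a notebook can be pictured as a sequence of text and code cells that share one running Python state, and it shows that the active definition is whichever one was executed last, and that each cell's output can be kept with the cell.

```python
import contextlib
import io

# Toy model of a notebook: an ordered list of cells, each holding text or code.
# The cell layout and the names below are hypothetical, chosen for illustration.
cells = [
    {"type": "markdown", "source": "# Two versions of greet"},
    {"type": "code", "source": "def greet(name):\n    return 'Hello, ' + name"},
    {"type": "code", "source": "def greet(name):\n    return 'Hi, ' + name + '!'"},
    {"type": "code", "source": "print(greet('world'))"},
]

# Shared state that persists between cell executions.
namespace = {}


def run_cell(index):
    """Run one cell in the shared namespace; store and return what it printed."""
    cell = cells[index]
    if cell["type"] != "code":
        return ""  # text cells are only displayed, never executed
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(cell["source"], namespace)
    cell["output"] = buffer.getvalue()  # the output is kept with the cell
    return cell["output"]


run_cell(1)                 # execute the first definition
print(run_cell(3), end="")  # Hello, world
run_cell(2)                 # execute the second definition instead
print(run_cell(3), end="")  # Hi, world!
run_cell(1)                 # go back to the first one; the second is still there
print(run_cell(3), end="")  # Hello, world
```

Notice that both definitions stay in the list the whole time. What changes the result of the last cell is only the order in which the cells were run, not their position in the notebook.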