Transcript for:
Introduction to R for Marketing Analytics

[Music] Hello everybody, welcome to the Marketing Analytics course. This is the first module, Introduction to R Programming, and my name is Dr. [name inaudible], Vinod Gupta School of Management, IIT Kharagpur. I will be taking this course for you.

So, Introduction to R Programming. The first thing we have to discuss before we jump in is why we are doing R programming. We will do hands-on marketing analytics. Hands-on work can be done using various software, but we have chosen Excel for some of the smaller problems and R programming for the bigger problems.

Excel is almost inevitable in today's management era. Almost all managers use spreadsheet modeling, and even the academicians who teach managers and future managers need to know the intricacies of Excel. We will focus mainly on how the various features of Excel can be used in marketing problem solving. On the other hand, from what we have seen till now, Excel becomes a little limited when the data size is big. If the number of rows is more than six lakh, Excel has some problems, so it is better to use software which is handy for dealing with larger data.

Now, we have multiple software options in hand, proprietary and non-proprietary (open source), but we have chosen R. The first reason is that it is open source. The second reason is that it has huge support: there are lots of resources available online, so you can learn it on your own. We will be teaching you a little bit, but you can learn it on your own as well. Another option is Python, obviously, but Python is used more for deployment, when you create software that will be deployed for automated problem solving, whereas R is better suited to research-oriented work, and marketing analytics
is often back-end, research-oriented work, so we will focus on R programming.

As I told you, R is open source, available online and freely downloadable. So before we jump in, we have to learn how to download and install R. We will have a few sessions on R programming before we jump into the actual marketing analytics, so that you become handy with the software. This presentation will first show you how to install R and RStudio, and I will discuss RStudio as well. Then we will cover aspects like vectors, matrices, data manipulation, a little bit of if-else, and functions, in probably one or two sessions.

Installation of R is the first job. If you want to install R, you have to go to this particular link. You can also go to Google and search for CRAN, or R download, or something like that; it will ultimately lead you to the same place. I am choosing Windows because I use Windows. If you use some other OS, you can use the corresponding R software. So you go to this link and click on "install R for the first time"; it will be written there. That will direct you to the latest version of R available. Currently the latest version is R 3.6.1, so once you go there you download R 3.6.1. When you see this video, a more recent version of R might be available; if so, download that. After downloading, install it by double-clicking on the installer, like you do for any other exe file, and it will get installed.

Now, we actually use RStudio over R. R has its own UI (user interface) as well, but we use RStudio because, in my personal opinion, and probably many other people will agree with me, RStudio's UI is more user-friendly than R's UI. There are lots more options available,
lots more drag-and-drop or click options, which makes it easier. RStudio is also free; it is open source software, so it is better to use that.

We go to this particular link to download RStudio. When you go to the link, there are various versions of RStudio available, and we will use RStudio Desktop, which is a free version. Again, in the free version, different installers are available for various OSs. We will use the latest version of RStudio for Windows 10, 8 or 7, because I am doing it on Windows. The latest version currently available is RStudio 1.2.5 1, but when you do it, a later version may have come out; install whichever is the latest version available.

Now, a little bit of a problem with RStudio is that the newer versions are focused on 64-bit systems. If by chance you have a 32-bit system, you have to go to this link, where the older versions are listed. From here you can download and install the most recent of the older versions that still handles 32-bit systems. For marketing analytics purposes, both of these, whether RStudio for 64-bit or the one compatible with 32-bit, will work, so there is no problem; install either one of them. Once you have installed it, we will be able to go ahead with learning the basic R programming that will be used in the marketing analytics course as we proceed.

Now I will show you how to start with RStudio. Once you have installed it, you can click on the Start button; RStudio might be shown there, or you can just search for RStudio and it will come up. Click on that, and something like this opens. In my case there was already one file open, but for most of you, if you are doing this for the first time, this is something
that you will see; this view, or something similar to it, is what you will see. This is the UI which, as I told you, is more friendly than R's own UI, and there are various aspects of it. Currently you can see there are three boxes, two in this part and one on this side, and I will explain step by step what the job of each box is.

Now, in any software, whether it is Microsoft Word or Excel or whatever, you first have to start with a blank page where you will write anything, and save that file. Here also we will start with a blank page; that is the first job. To do that, I will go to File, which you will see at the left corner, then New File, and then R Script. You can also press Ctrl+Shift+N. That will open something like this: a new Untitled file, meaning a new file with no name.

I believe it is a good practice, when you have opened a new file in, let's say, Microsoft Word or Excel, to save it, so that at a later point of time, if you write something, you just press Ctrl+S and it gets saved. Otherwise, if by chance your computer hangs, or, since this is a programming language, if you run a program and it gets disturbed, you could lose your work. So, in order not to lose your data and your code, it is a good practice to start saving from the very first. What I will ask you to do is click on this blue button, which looks like a floppy disk; it is the Save button. If you click on that, it will ask you where to save, and you can choose anywhere. I will save on my desktop, where I have created a folder, and I will write something like practice.R.

Once I write that and save, this particular file gets saved, and you are ready to use RStudio. There are lots of options, and not all of them can be taught on the first day, so we will slowly see what the various options are and how we can go ahead as per our requirement.

First of all, there are four boxes that you can see now: one box is here, one here, one here, and another here. I love to say that these are the four quadrants of my screen, and each quadrant has some job to do. The second quadrant, where practice.R is written, is the editor. This is where you write your code and where you save your code so that you can use it at a later point of time. On the other hand, in this part there are three tabs: Console, Terminal and Jobs. We will talk about Console. The console is the place where your code runs, so when you run the code, all your results come in the console.

On the right-hand side there are the first quadrant and the fourth quadrant. At the top right, in what I love to call the first quadrant, are Environment, History, Connections and so on. I will focus on the Environment part. The environment is where whatever data set, variable or matrix you want to save, so that you can use it later, is stored. For example, if you have ever done any coding in, let's say, C or C++, you have written int i = 0. With that i = 0, you are saving some value under the name i, and that value of i will be saved in this global environment. In the fourth quadrant there are lots of tabs: Files is one tab, Plots is another, Packages is another, and at the right time I will discuss all these tabs. So these are the four
quadrants. As I told you, here I will write the code, and here I will run it.

Now, I have already written a set of codes, and you can follow the code that I have shared with you. But it is a good practice to type it on your own and run it, because when you type on your own, you make mistakes, and when you make mistakes, you learn from them. It is very important to make mistakes: until and unless you make mistakes in coding, you will not learn how to code. If I code, everything I write will be right, and that will not teach you anything. So it is better that you type in the editor, on your own, whatever I am showing.

In the Files section you will find a file called W1S1.R, that is, Week 1 Session 1. I am double-clicking on that file, and it will open something like this. Other than that file, I had opened practice.R previously; I am closing it, so at this moment I am closing everything else. This is again a good practice. You will see that some people, when they work in Word, keep multiple Word files open, even files they are not writing in right now. What happens is, if by chance Microsoft Word hangs, all of those files will have a problem. So it is a good practice to keep open only the file you are using and close all other files. Closing is nothing but clicking on this cross sign. So if by chance you have any other tabs open here, close them; W1S1.R is what we will work on.

Those who want to code on their own, which as I told you is better, should open a new file, save it, and type there. Whatever is written here, you can type in there one by one and then run it. Then you will know what kind of mistakes you are making and how you can rectify them.

Now let's say you have written these codes; I will go through them one by one. The first good job is cleanliness. Cleanliness is another very important thing in any coding, because then you will get less confused. Also, don't write down whatever I am telling you now. Again, it is a good practice to learn coding by practicing rather than by memorizing. It should come from inside you, or, if it is not coming from inside, you should have a resource to fall back on, and that resource should not be your notes. So don't write it down.

The first thing I will do is clean my console, and to do that I will press Ctrl+L. I just pressed Ctrl and L, keeping
Ctrl and L pressed together. So Ctrl+L cleans the console at any point of time; that is what I use to keep the console clean. Oftentimes we write lots of code in the editor and run multiple codes to see which one is actually working, which one is actually giving the output that I want. The previous ones I do not need, so I just press Ctrl+L to clean my console. The console is where the codes run.

Now let's start. The first thing you have to understand in R is that R has certain objects: one object is called a vector, another is called a matrix, another is called a data set. Depending on the dimensions, the contents and various other aspects, the objects differ. The most basic object of R is called a vector, or a variable. You can imagine a vector as one column or one row of an Excel sheet. It is better to imagine one column, which has a name at the top and then certain values in it. It can have one value or multiple values. Even if a name contains only one value, that is a vector; with multiple values, that is also a vector.

So, for example, I have written this: "start with a vector". This is a comment. Anything that starts with a hash sign (#) is a comment. A comment means you can run it, but it will not give any result. How to run it? There are two ways. One is to select the area, the code that you want to run, and then press this Run button. See, the moment I clicked Run, it came down into the console: "start with a vector". It got run, but it is a comment, so it gave no output; nothing changed. That is the first step. Now, if you want to run two or three lines together rather than one at a time, you have to select the whole area,
let's say I want to run all these three, and then click on the Run button; it will also run. Again, I believe a good practice is to select exactly the portion of the code that you want to run. Sometimes we want to run line by line, so instead of selecting the whole line, I can just put my cursor on it and press Run; that will run that single line of code.

Here the second line of code is a = 0, so that gets run. The moment I run a = 0, you will see in the global environment that zero gets saved as the value of a. That means a vector gets created whose name is a, and the corresponding value that gets saved is zero. How will I use that? If I write just a in my console and press Enter, it gives me zero as the output. It is like print(a). There is something written in the third bracket, [1]; I will discuss what that is later, at the right point.

Similarly, to store a value of five in b, I have written b = 5. If I run this, see here, five gets saved under the name b. Just check these three or four lines. In the first line, a = 0, zero gets saved in a, but nothing comes as the output. In the next line I have written a and pressed Enter; now I am calling a, so whatever value is in a comes out. Next, I have written b = 5, so five gets saved in b, but there is no output; some code has run and something has been done, but nothing comes out. Only if I ask for b and press Enter does it give me the output of b.

How can I use this? Let's say I want to know what is a + b, what is the
summation of a plus b. I will write a + b and press Enter, and it gives me the output of a + b: a is zero, b is five, and they add up to five. Similarly, a - b gives me minus 5, a * b gives me zero, and so on. So a + b, a - b and a * b all give me some output, whereas a = 0 or b = 5 stores a value in a or b. That is the first step. So a and b are two vectors.

Now, a vector can be a little longer also. In real-life situations, our columns in Excel are longer than one single value. Here I have created vectors in two ways. One is, I have written a = 1:10. I will select this and run it. The moment I run it, you see that the previous value of a got changed. It is written that a is int: int stands for integer. Then it shows [1:10], which says that it has 10 units, 10 values, I would say 10 addresses, and then those values are 1 to 10. Even if the values were something different, it would still have written [1:10] there. So this 1:10 in the environment and this 1:10 in my code are not the same thing. The one in the environment says that the addresses start from 1 and go up to 10: there are 10 addresses, 10 locations, 10 cells where some value is stored, and those values are 1 2 3 4 5 6 7 8 9 10.

Similarly, I can run b = c(something, something, something). Now, this is what we have to understand, so think carefully. There are two types of things that you will see in the code. One is where some name is written, let's say xyz, and then a first bracket, (. A name followed by a first bracket generally signifies a function. What is the job of a function? It takes certain input, it works on it, and it gives the output. So that's the job of a function.
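
As a recap of the steps up to this point, here is a minimal sketch in R. It uses only the names a and b from the session; the file shared in the course may be laid out differently, so treat the exact lines as an assumption. The comments show what the console prints.

```r
# start with a vector   (a comment: running it gives no output)

a = 0      # stores 0 under the name a; nothing is printed
a          # typing the name prints its value: [1] 0

b = 5      # stores 5 under the name b; nothing is printed
b          # [1] 5

a + b      # [1] 5
a - b      # [1] -5
a * b      # [1] 0

a = 1:10   # replaces the old a with the integers 1 to 10
print(a)   # print is a function: a name followed by a first bracket
```

After the last assignment, the Environment tab should list a as int [1:10], as described in the session.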
On the other hand, if by chance you see xyz and then a third bracket, [, it talks about a location, an address, a cell. So a third bracket talks about an address, and a first bracket talks about a function, which has some job.

Similarly, here I have written b = c and then a first bracket, so that means there is a function called c. What is its job? Its job is to give you a vector, a combination of whatever inputs you give. Here the inputs are 2, 5, 6, 8, 9. These five numbers are the inputs, and the c function converts the inputs to vector form. If I just run this line, see, here it is written [1:5], because there are five cells, from the first cell to the fifth, and then the contents follow.

The difference between the first one and the second one is that a is an integer and b is a numeric. The first one is an integer because the moment I write a = 1:10, R knows that I am asking for only the integer values from 1 to 10, so it puts int there. But when I write b = c(2, 5, 6, 8, 9), R does not know whether next time I will write 9, 9.5, 10.3, 11.6 or whatever. It does not know whether the next entries of this series will be non-integers or not, so it plays safe and marks it as numeric.

Now, how to print them? Here only the allocation happened, nothing else. If I type a and press Enter, I get 1 2 3 4 5 6 7 8 9 10. If I type b and press Enter, I get 2 5 6 8 9, and so on. I will talk about the third thing in a few minutes.

The next thing is the length of a: how to find out how long a particular vector is. Oftentimes, to run a loop, we have to know at what point it ends. So what is the length of a? length is a function, and if I select length(a) and press Run, it gives me that the length of a is 10, because a has 10
inputs. Easy, nothing fancy till now.

Next, c = length(b) finds the length of b and saves it. As I told you, whatever is written after a hash is a comment and has no meaning; whatever is written before the hash is not a comment, so that will run. So with c = length(b), what happens? length(b) gives me five, because b has five entries, and that value of five gets saved in c. Why I have written this is that you have to identify that this c, the one I highlighted, and this other c are different: one is the name of a particular vector, and the other is a function. R more or less understands that, but you often have to be careful about what you are writing.

class is another function, which gives you what class it is, meaning what type of object it is. So the class of a is integer, the class of b is numeric, and the class of c is integer again, because c is also an integer. At any point of time you can use class to find out.

Now I will show you another interesting thing. Let's say I want to create a sequence that starts from 1 and ends at 30 with a jump of two: 1 3 5 7 9 11 and so on, till it reaches 30. I know that the function for this is called the sequence function, the seq function. If you know that the function's name is seq, just as previously I knew class and length, you can ask for help like this: help(seq). That shows the various aspects, and on the right side you will see the help documentation coming up. You can read it a little bit. So in the fourth quadrant I am showing one tab, which is called the Help tab; there are other tabs, and we will come to them when required. In the Help tab there is Description and there is Usage. In the Usage you will see seq written with from equal to this, to equal to this, and by equal to
something. As you go down, it says from and to are the starting and end values of the sequence, and by is the increment of the sequence. You have to read it carefully. If you come down further, examples of usage are also given. One usage is seq(1, 9, by = 2), so I can copy this, paste it here and try what it gives. Okay, it gives 1 3 5 7 9. That means it starts at one and ends at nine, and each jump is two. So if I run my line now and then print a, I get 1 3 5 7 and so on, because it starts at 1, ends at 30, by two.

Now, you can do this as long as you know the function's name, seq. That is how I found out: I ran help and it gave me all the help. But in real-life situations you will often not know the function. There are lots of functions, if not millions then lakhs of functions at least, and for a single human being it is almost impossible to remember all the functions, their syntax and so on. You don't even remember the names. So what to do? We will see in the next video; we will continue from this particular line in the next video.

Thank you for being in my class. I hope you will have a wonderful learning journey. Thank you.
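
For reference, the later commands from this session can be sketched in R as below. This is a reconstruction from the spoken description, not the shared course file, so the exact lines are an assumption. The b[2] line is added only to illustrate the "third bracket" addressing mentioned in the session, and the call to help is wrapped in interactive() so that the sketch also runs as a script.

```r
a = 1:10                 # integer vector with ten cells
b = c(2, 5, 6, 8, 9)     # c() combines its inputs into a numeric vector

b[2]                     # third bracket: the value at address 2, here 5

length(a)                # 10
c = length(b)            # saves 5 under the name c; the function c() still works

class(a)                 # "integer"
class(b)                 # "numeric"
class(c)                 # "integer"

if (interactive()) help(seq)   # opens the documentation in the Help tab

seq(1, 9, by = 2)        # 1 3 5 7 9
a = seq(1, 30, by = 2)   # 1 3 5 ... 29 (31 would go past 30)
a
```

Note that the sequence stops at 29, the last value that does not exceed the end value of 30.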