
Hi everyone, welcome to this tutorial on R for beginners. Let's see what R programming is and how it helps. R is well known as a language of data science. If you look at rankings from surveys of data mining experts, based on the software they often use in their work, R is used more than Python for data science. Python is also used, but R is predominantly used for data science activities. It is an open source programming language for statistical computing and one of the most popular programming languages today. It was inspired by S-PLUS and is similar to the S programming language. R is used across the globe, it is free and open source, and it is optimized for vector operations, which we will learn about later. It has an amazing community, with 9,000-plus contributed packages that allow us to do almost anything using R.

Now, the features of R. As I said, it is an open source programming language, so you can install R for free and start working straight away; you don't need a licensed version or to pay for the software. Non-coders can also understand and program in R, as it is easy to understand. It has various data structures and operators. It can be integrated with other programming languages like C, C++, Java, and Python. It comes with various built-in packages and a lot of sample data sets, and it makes reporting the results of an analysis easier.

Before we start learning about variables, loops, and how you work with R, it is good to know how to set up R. For that, go to r-project.org. Once we get to the home page of the R Project for Statistical Computing, we can click on download R, and that brings you to a page to
download it. There are various links here: it shows you the Comprehensive R Archive Network (CRAN) mirrors, available at different URLs. I would choose the first one, which is 0-Cloud. Click on it, and then, based on your operating system, whether you are on a Linux machine, a MacBook, or Windows, you can install it. I'm using a Windows machine, so I click on Download R for Windows, and that takes me to a link that says binaries for base distribution. This is what we can use to work with R straight away. There is one more tool, RStudio, and we will see how to set that up. This link takes us to the best possible mirror for our location, from where we can download R. Click on base and then download by clicking on the link. I have already downloaded it; once you click, you can just save it. I have it in my downloads, and that's all you need. Then double-click it and go through the instructions to set up R. That also lets you set up a desktop shortcut, which I have already done on my machine. If I go in here, I see base R; click on it and that brings you to the window you can use to start working with R straight away.

There is one more tool called RStudio, which sits on top of base R and makes working with R easier, but you can start working here too. It shows you the R console. You can click on File, and if you already have scripts written in R, you can use those: I click on Open script, and that takes me to a folder with some existing files. I select one and click Open, and that gives me an editor with some commands: for example, loading a library to use built-in data sets, summarizing the data, doing a clean up,
and we'll see all of this, but I would suggest using RStudio rather than just base R. Installing base R is still required, and it depends on your machine configuration: mine is 64-bit, so I chose 64-bit while setting up base R.

RStudio is a tool that makes working with R easier. To install RStudio, go to the RStudio home page, or just go to Google and type RStudio download, and it takes you to this page. Click on the link that says Download RStudio and choose your version; you can go for the free version, which is RStudio Desktop. Click on Download, and then download RStudio for Windows, which I have already done. Then you run through the steps. I already have RStudio here, but for example, if I go to downloads, look for RStudio, and double-click, I can say yes and it takes me to the RStudio setup. Click Next; here you can choose the location if you want to place it somewhere specific. Click Next again, and it says select the Start Menu folder, so let RStudio be chosen here. Click Install and it starts installing into that location. In my case it already exists. We can click on Show details to see what it is doing and which executables it is extracting. Once this is done, you will be able to use RStudio. You can also add a shortcut to your taskbar. This might take a couple of seconds, so just wait for it to complete, and you will have RStudio, which is an easier way of working with R. A lot of developers across the globe use RStudio when they work with R on their data science or programming requirements. Let's just wait; it is almost done, and now I can
click on Finish. So that part is done, and you can add it as a shortcut. RStudio has consistent commands and a unified interface, it makes it easy to navigate and manage your work in R, and it sits on top of base R. If I click and open it, that's my RStudio coming up. Here you see the console, where you give your commands and get text output. Again, I can choose a file: I say Open File, go to the location where I have downloaded some data, choose, for example, the RStudio file, and that brings me here. Now you have your script, which has some commands. At the bottom left you have the console, where you see the output. On the right side you have the environment, which shows your variables, and you also have plots, which we can see here.

We can look at this as an example. Here I am loading the built-in data sets: I place my cursor on the line and press Ctrl+Enter, and that loads the built-in data sets, which we can see has been done. There is a built-in iris data set, and we can use head to look at the first six rows of it. Place your cursor and press Ctrl+Enter, and that shows you the first six rows of the data set. We will look into this data set later; it is a default data set that you can easily find when you're working with R. You can also place your cursor on summary and press Ctrl+Enter, and that shows you summary statistics for the iris data. You can do a plot, and that shows you the plot, which you can also maximize and view full screen with Zoom. We will discuss what kind of information we can infer from plots later. When it comes to cleaning up, you can use detach, and then we can say
package:datasets. We had loaded those data sets, so we are just detaching them, and we can say unload equals TRUE. I'll press Ctrl+Enter. I can also clear whatever plots we had by doing this, and to clear the console we can either go to Edit and choose Clear Console or use the shortcut, Ctrl+L. So that's a simple way of starting to work with R by installing RStudio.

Let's continue learning about working with R. The first thing we should learn about is variables. A variable, as in any programming language, is a way to store data: a single data value, a vector or list of values, a data set, or any other object in R. It lets us conveniently reference the value by name, which saves us from rewriting the data value or object many times in our program. So variables in R are mainly used to store data in named locations that your programs can manipulate.

A variable name can be a combination of letters, digits, periods, and underscores. Some valid names are ones such as total or sum. You can also use dot notation: there are different naming or style conventions in R, and we can use a dot to separate the words in a descriptive variable name. We can also start a variable name with a dot, and we can include numbers in a name. Remember that R is case sensitive, so whenever we declare a variable we need to remember which case was used in its name. There are other conventions too, such as using an underscore or using mixed case between the words. So variable names can only consist of letters, numbers, periods, and underscores, and they must start with a letter, or with a dot that is followed by a letter, not a number. We can declare our variables, and we can also look at the type of a variable and the class it belongs to. There are also some invalid variable names, which we are seeing here, so that needs to be remembered as well. So this is an example where you can use an assignment operator, which you see here between x and 10,
to assign a value to a variable. You could also do that with .y and assign it a value, or with z, where you compute something from x and y, and finally you could print the result.

Let's see some examples before we move further; for that I'll bring up my RStudio. As I said, we can have different kinds of variable names and naming conventions. For example, I could write something like model1 and assign to it. This is just a variable, and I could assign anything to it, including any of the different data types that are available. For example, I could do something like this and press Ctrl+Enter, so that's my variable. I can always use typeof to check the type of my variable, and it tells me it's a character. I can also use class, and that shows me it belongs to the character class. We'll learn about data types later; for now we are just using the assignment operator. If I ask for model1, it shows me the value, but if I type it like this, it says the object is not found. Why? Because R is case sensitive: the variable we created was all in lower case, and the one we tried to call started with an upper case letter.

I could also do something like hello_string, where we are using an underscore, and then give it some value, and that becomes my variable, which you can always call to check its value. You could also do something like this, using mixed case in the name, assign something, and that's also a variable, and I can look at its value. Now, what happens if we try to create a variable whose name starts with a number? If I say something like this and then try to assign a value to it,
for example 100, this throws an error, because a variable name cannot start with a number. But what if I start with a period and then give it a number? As you see here, since the name starts with a period, the rule is that the period must be followed by a letter and not a number. So I can just remove the number and that works perfectly fine. These are naming conventions that you will learn as you practice. So now I can create a variable such as .pairs and assign any value to it, but always remember: if the name starts with a period, the period should always be followed by a letter.

One more thing that is always followed in practice is that we cannot have spaces in variable names. For example, if I say first num and try to assign it a value, it fails. But I could have written it with an underscore, and that works perfectly fine, and then you can call the value.

One more standard practice in real projects is to give variables meaningful names. For example, if I create a variable called bird and assign it the value tiger, it works fine, but it does not really make sense, and it would create a lot of ambiguity in our code. So it is better to call it animal, since a tiger is an animal. That not only lets me assign a value to the variable, it is also a little more meaningful.

When we talk about variables, it is also good to know the different data types available in R. Like any other programming language, R supports different data types. You have the logical data type, TRUE and FALSE; you have numeric values, which is, say,
these numbers. You can also create an integer, which is written as 3L, 40L, and so on. You can have a complex number; you can have characters, which can be a single letter, a set of letters, or anything within quotes; and you can even have raw data. So these are the different data types, and we can see some quick examples. Let me come out of this one. As we saw when we created model1, that was a character. Now I say x and assign it 100. This is not going to be an integer: by default it is a double. If I want an integer, I write it like this, with the L suffix, and you can check it using typeof and see the value; this is an integer. Similarly, you can have character, complex, raw, and numeric values, so all of these are different data types. You could also check a boolean: I assign this, select it, and now when I check the value of a it is TRUE. We will learn about logical operators, where we use the values assigned to variables to compare and compute between different variables. So this is a small example of using variables: we have seen the assignment operator, assigning values to variables, and different naming conventions, and we can use the different data types that R supports with our variables.

Now that we have learned about variables and data types, let's learn about operators and how they are used in R. We might want to do calculations on numeric values, find differences between values, or compare values, and for that we use different kinds of operators. We have various operators: we have arithmetic operators, we have
relational operators, and we also have logical operators. Before we look into logical operators, let's understand the basics, the arithmetic operators. Let me pull up a notepad file here. The arithmetic operators are addition, subtraction, multiplication, division, remainder (or modulus), and exponent.

What is also important when you use arithmetic operators is the order of operations. The first priority always goes to parentheses. Then comes the exponent, if the computation involves one. That is followed by multiplication and division, which are evaluated left to right, whichever comes first. Similarly, addition and subtraction come last and are evaluated left to right, whichever comes first.

So these are the arithmetic operators, and we can quickly see some simple examples. I can say 100 plus 100, and that gives me the value. You can do 100 minus 50, a multiplication, or 100 divided by 2. You could also use modulus, which gives me an error here with a single percent sign; just give me a minute. Let's add one more percent sign, so it is %%, and that tells us what the remainder is.

To look at the ordering when we use these arithmetic operators, here is an example. If I say 34 plus 46 divided by 2, that gives me 57. However, if I put 34 plus 46 in parentheses, which gets the priority, and then divide, my result is different. So it is very important to understand which arithmetic operators you can use and the order in which the computation is carried out. So we can use
all of these arithmetic operators, and to control the ordering we can use parentheses, or we can order our computations according to the kind of operation we want, whether that is multiplication or division, addition or subtraction. At any point I can press Ctrl+L, and that clears my console.

Let's continue learning about operators. Arithmetic operators let us do computations, but we also have relational and logical operators, which help us compare values or find differences between values, whether those are groups of values or individual values. With relational and logical operators you can compare data values: you can check whether values match or do not match, or whether they are above, below, or equal to something, and so on.

The relational operators are greater than, less than, greater than or equal to, less than or equal to, equal to, and not equal to. The logical operators are and, or, and not. And compares two conditions: it returns TRUE if both conditions are true, otherwise it returns FALSE. For example, if I have 10 greater than 20 and 10 less than 20, we are checking whether both conditions are true. The first one is not true, so we see the value FALSE. If I replace the and with or, it checks whether at least one of the conditions is true, and so it shows me TRUE. You can also use the not operator, which takes each element of a vector and gives the opposite value. So we can use any one
of these operators and then do our computations. Let's see some examples of these operators. You could either assign values to your variables and check them, or pick up a data set from your machine and use the operators on that.

For example, say x has been assigned 100 and y has been assigned 200. If I check whether x is equal to y, it compares the values and tells me that's not true; it is FALSE. If I use the not-equal operator instead, it tells me TRUE. So I can check simple conditions like this. I can ask whether y is greater than x, and that tells me TRUE. If I ask whether y is greater than or equal to x, it still says TRUE, because y is greater than x, so greater than or equal to holds as well.

We can also pick up a data set. I'll pick one from my machine: I go in here, where I have some data sets, and I'm interested in taking this auction data set and loading the values. I'll get the path and come back here. I can use auction as my variable name, or you could give a dot-separated name, for example auction.data, if that is what you want, and then assign the variable a value. Here I'll use read.csv, and since I intend to pick up a file, I give the path. When we are working on a Windows machine, we need to use a double backslash in the path, so I'll say auction.csv at the end. I could give other arguments, like header being TRUE, what the separator is, or whether to fill values to take care of missing values; we can look at all of those later. So here I'll add a backslash, and another backslash, and press Ctrl+Enter. Now I can look at the values by just typing auction.data, and I can see what it contains. It has a lot of
data here. You could use some other functions, which we will see later; for example, I can use head to see the first six rows. So we can assign data to a variable and continue working on it. Let's keep it simple: let me repeat this step, and here I will use auction as my variable name and assign to it. I can also do a View on auction, and that shows me the data in a tabular format, which lets me look into the data and understand it, and then I can use it to work with variables.

What I can do here is create x and assign it some value computed from my auction data set. So what do I want to do? Let's use auction, then a dollar symbol, and choose the column I'm interested in. For example, let's choose bidder and compare it to a value; let's pick a name, say tweet. I could assign all the matching values to x, or I could add another condition. So I'll say auction, dollar, and take the bid column, and check that it is equal to 100, and then end with a comma. When I try this, it gives me a problem, because we did not use the right operator between the two conditions. So we will use and: x is assigned the rows of auction where the bidder is that name and the bid value is 100. Once we do this, I can look at the value of x, and it shows me the result. This is just a simple example of using a logical operator.

Instead of and, I could have used or, which is the pipe symbol, and that gives you the or condition. I hit Enter, and if I now look at the values of x, it shows me a lot of rows, because with an or condition a row only has to match one of the conditions. In this way we can use logical operators and continue working and
continue doing our computations.

Let's learn about print formatting and how print can be used to view your data. R uses the print function to display variables. For example, if I have assigned the number 10 to x, I can do print(x), and that shows me the value of x. The 1 in square brackets that we see here also has a meaning: it tells us the value is a vector, and we will learn about vectors later. R uses the paste and paste0 functions to format strings and variables together for printing in a few different ways. For example, if I do print(paste(...)) and pass in two strings, such as hello and world, they are printed together with a space between them. I could also do print(paste(...)) with a separator, and my output would use that separator. If I use paste0, that avoids any space between the two words, or, for example, three words.

Let's see some basic examples of print. I bring up my RStudio, and here is an example. We have x and the assignment operator, which we already discussed, and I can assign a value. I place my cursor here and hit Ctrl+Enter, so the value has been assigned. Now let's look at the value of x. I could also print x explicitly by using the print function. Similarly, I assign hello to message, and then I can print the message using print. If I only do an assignment like this, nothing is printed until I call the variable or use the print function. For example, if I type y, auto-printing shows us the value, or I could do it explicitly with the print function, which is explicit printing. Whenever we see this number 1, as I mentioned, it means y is a vector and 5 is its first element.

You can also use the colon operator to create integer sequences. We'll learn about sequences and lists later, but this is just a simple example, so I am creating an integer sequence
of 21 values: I place my cursor here, and it starts at 10 and ends at 30. Let's look at the values of our sequence of integers. At any point you can use class to look at the class of, say, x, and that shows me the class is integer. Looking further at the data types we learned about a few minutes ago, R has five basic, or atomic, classes of objects: character, numeric (that is, real numbers), integer, complex, and logical.

Let's spend some time on basic arithmetic operations and how you do them in R. I've opened RStudio, and these are some basic examples of arithmetic operations. For example, we can add two numbers: I place my cursor here and press Ctrl+Enter, and that shows me the sum. I can do a subtraction, multiplication, or division, raise to a power with the exponent, or use modulo, which returns the remainder.

When we perform operations, we can also change the order of operations, and for that we use parentheses. Here I am putting 500 times 2 in parentheses, plus 80 divided by 2. It first evaluates what is in the parentheses, and that's why I get the result 1040. Similarly, I can change the order of operations: here I give 500 times something in parentheses, so that part gets evaluated first, and hence you get a result of 1500.

We have already discussed the assignment operator, and here we can assign values to variables. For example, I create a variable called selling and assign it a value, and do the same for cost, and then we can do a calculation: profit is selling minus cost. We can run that, and when I look at the value of profit it shows me 250.
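The arithmetic, ordering, and assignment steps described above can be sketched roughly as follows. This is a minimal illustration rather than the script shown in the video: the result variable names are hypothetical, and the selling and cost values are assumed, chosen only so that the profit comes out to the 250 mentioned in the tutorial.

```r
# Arithmetic operators (result names are hypothetical)
sum_val  <- 100 + 100   # addition
diff_val <- 100 - 50    # subtraction
prod_val <- 100 * 2     # multiplication
quot_val <- 100 / 2     # division
rem_val  <- 10 %% 3     # modulus: remainder of 10 divided by 3
pow_val  <- 2^3         # exponent

# Order of operations: parentheses first, then exponent,
# then * and / left to right, then + and - left to right
no_parens   <- 34 + 46 / 2        # division happens first: 57
with_parens <- (34 + 46) / 2      # parentheses happen first: 40
grouped     <- (500 * 2) + 80 / 2 # 1000 + 40 = 1040

# Assignment and a simple computation
selling <- 1000  # assumed value for illustration
cost    <- 750   # assumed value for illustration
profit  <- selling - cost
print(profit)    # [1] 250
```

Storing each result in a named variable is only a convenience for inspecting values in the environment pane; typing the bare expressions in the console auto-prints the same results.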
Now let's also spend some time on data types in R. We can have different types of data. This one shows an example of assigning a decimal value, which is part of the numeric class. I run this, and if I want to see the value of num, I just type num. If I want to look at the type of num, I type typeof, select it, and pass in num, and it shows me the type is double. I can also look at what class it belongs to, and that shows me it is numeric. So in this way you can not only assign values to a variable but also look at its class and type.

Here we can assign whole numbers, which are also known as integers. If I look at the type of this, it shows me double. So if I want to explicitly assign an integer, I could, for example, take j, use the assignment operator, and write it like this. If I look at the value of j it shows me the value, but what we are interested in is the class of j, so we check that, and it shows me it is an integer. So I can either assign an integer explicitly by using a capital L, or I could use a function called as.integer, which we'll see later.

We can also assign boolean values, or logicals. Here we assign TRUE and then FALSE, and when we look at the type of t, it tells me it is logical. Similarly, you might want to work with text or string values, and here we do that by taking ch and assigning it a value. Look at the class of this and it tells me the data type is character, and if you look at the type of it, it also says character. R also supports complex data types, so we can do that too by running this, and when we look at its class it tells me it is complex. You can also get the length; here we are doing a length on the character value, so let's look at this one, and it
shows me the length.

One of the useful functions we usually use in R is print. I can simply do print with hey, and that prints whatever value is passed to print. I can assign a value to a variable and then print it, which is also fine. Without using the function, you could also just type y, and that also shows the value. However, sometimes using print as an explicit function is useful, because it makes your code more readable. Here we use a built-in data set, mtcars, and if you print the data set, it shows me the values: the car models and other features such as mileage, cylinders, horsepower, and so on.

One use case of print with the paste function can also be seen here. I'm doing print(paste(...)), and that prints whatever was passed, concatenated. I could also do print(paste(...)) with a separator if I want to format my data in a particular way; here I've used a comma as the separator. There is one more function, paste0, which can be used: I'm calling paste0 here, and that concatenates these values without any space, so paste0 shows no space between the two elements that were passed. We can also do formatted printing, and for that I'm using the sprintf function. I pass in %s, which is for a string, and %f for a float, and we can print the values. So these are some basic operations and functions for doing computations or looking at your results.

The most basic type of R object is a vector. Empty vectors can be created with the vector function. A vector can contain only objects of the same class. A list, on the other hand, is a vector that can contain objects of different classes. So these are some basic examples: apart from print formatting, we can look at R objects such as vectors, lists, and so on. So when we talk about
vectors, a vector is a sequence of data elements of the same basic type. We use the c function to declare a vector. For example, here we are creating a variable v1 and assigning it a vector by using c and giving it elements of one basic type, such as the numbers 1 to 5, or, for example, words. You can always print it, or you can use class to find out the class of the elements that have been passed to the object.

We can look at some examples. As we said, a list is a vector that contains objects of different classes. You can have numeric objects, that is, numbers such as 1, 2, and so on. For example, here we are assigning the value 1 to a, and that can then be used: I can either print it or just use auto-printing. I can also assign another value to a, or I could do something like this, which shows me 0, which can stand in for a missing value. If I want to use auto-printing, I can just call a, and it shows me the value that has been assigned to it. You can always use typeof to look at the type of a, which is double by default, and if I look at the type of the value we assigned with L, that is an integer, because we used L there. In this way we can continue working with the different classes of objects.

For example, let's create a vector here. I say v1 and assign it using the c function, passing in the values, and that gives me a variable, and you can look at the values assigned to it. If I look at the class of v1, it shows me numeric. If you use typeof on v1, it shows me the values are double. As we were seeing, we can look at the class, so, for example, if I create one more variable and assign values to it using c, passing in some words, say hello and world, then I can do this
and look at the values of this one. I could also explicitly print, as we discussed earlier, by doing a print(v2). We could also use the paste function. For example, if I do a paste here, this is missing a bracket, so let's complete this, and that shows me the value. I could also have used the paste0 function, and that also works fine, so it depends on what we are looking for. If I look at the class of v1, it is numeric, and v2 has elements which are of the class character. So this is just a simple example of using print functions, creating vectors, printing out their values, and printing out the class and type of these.

To continue our learning on vectors: as I mentioned earlier, the c function can be used to create vectors of objects by concatenating things together. For example, if I say x and use the c function with 0.5 and 0.6, we have a vector of numeric type. Let's do this and look at the value of x: it shows me my vector, which has 0.5 and 0.6. I can also have a vector of logical values, and now the value of x has TRUE and FALSE. Or we could have done it using the short form, with capital T and F. I can create a vector of character type and look at the values. I can also create a sequence of integers, as we saw in the previous example, and look at the values, which start at 9 and end at 29. You can also create a vector of complex type and look at the values. So these are some simple examples of creating vectors.

We can also use the vector function to initialize vectors. For example, here I am saying my vector will be of type numeric and the length is 10, and when I look at the values it shows me a vector which has all 0s and a length of 10.
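As a rough sketch of the printing and vector steps described so far, the commands might look like the following. The exact strings and the variable names (`joined_space`, `msg`, `a_int`, `zeros`) are made up for illustration and are not taken from the screen recording.

```r
# Explicit printing versus auto-printing
y <- "hey"
print(y)   # explicit call to print()
y          # auto-printing: typing the name shows the value

# paste() joins values with a space by default, sep changes the separator,
# and paste0() joins with no separator at all
joined_space <- paste("hello", "world")
joined_comma <- paste("hello", "world", sep = ",")
joined_none  <- paste0("hello", "world")
print(joined_comma)

# sprintf(): %s formats a string, %f formats a floating-point number
msg <- sprintf("%s weighs %f kg", "box", 2.5)
print(msg)

# c() builds vectors; class() and typeof() report what they hold
v1 <- c(1, 2, 3, 4, 5)
v2 <- c("hello", "world")
class(v1)    # "numeric"
typeof(v1)   # "double"
class(v2)    # "character"

# The L suffix asks for an integer instead of the default double
a_int <- 1L
typeof(a_int)   # "integer"

# vector() initialises a vector of a given type and length
zeros <- vector("numeric", length = 10)
print(zeros)
```

Running this in a fresh R session should be enough to see the difference between the three ways of joining strings and between double and integer storage.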
Now you can create a vector of numbers by doing this, as we saw in the previous example, and use explicit printing to look at the values, or it might be letters, and then use the print function to look at the values of the vector. We can also try concatenating the above two, which creates a mixed vector that has two different types. I can make a mixed vector by using the c function and passing in my numbers, which has numeric type, and letters, which has character type, and then we can print this, which shows me the values. But here what we see is coercion, which you may know as casting in other programming languages. It coerces the numbers to character, as characters cannot be coerced into numbers, and when you print the values of this mixed vector, everything is of character type. At this point, if I do a class of the mixed vector, it shows me everything is of character type.

The data type of different vectors can be returned by the function class, as we saw just now, so it is common to use the class function to interrogate an object, asking what its class is. We can create a one-dimensional object such as an integer vector, which we have done earlier, and look at the class of it, which tells me it is an integer. I can also create a numeric vector by giving in some values here, so I have used the c function and given in the values, and when I look at the class it shows me it is numeric. You can create a character vector and look at the values of it. At any point in all of these, for example if I do num, I can see the values assigned to it, and I can do letters and see the values of this. So let me just create some space here. Now I can create a factor vector and then look at the values
of it, or you can see what the values in this factor vector are. Here we said as.factor, so the factor function is being used. We are creating a vector of letters, and then we look at the class and also at the values in this particular vector. If you look at all of these vector examples, initially we were using the assignment operator with the c function, and when we started creating vectors by concatenating, or vectors of particular types, we are using equals, and that is also fine.

Looking further, when we concatenate two different kinds of vectors, for example numbers and letters as we discussed earlier, R will do coercion, that is, change one type into the other. When we talk about one-dimensional objects, we can have integer vectors or, say, float, which we saw just now ending at 10.5. When we say c of 1:10, it starts with 1, but there is also coercion happening here, and you have the values ending at 10.5, which is a float. I can look at the class of it. Did we do a class here? Let's come here and do a class of this one: it shows me it is numeric. You can look at the values of it. Similarly you can create a character vector which is 1 to 10 and look at the class or the values of this vector, or as we did with the factor vector. For two-dimensional objects, we will explore that when we are learning about matrices, so as of now let's set that aside.

When you talk about mixing objects, there are occasions when classes of R objects get mixed together. That could be accidental or it could be intentional. In this example we have y, which has been given the values 1.7 and a. If I look at the value of y, that's my vector, and if you look at the class of y, it shows me it is character. Now let's look at some other examples, so let's pass in
logical and numeric values. What would happen in this case? We can again use class of y, and that is numeric, and if you look at the value of y, that shows me 1 and 2. Let's go further and look at the value of the next one, so y, and see what the value of y is: it is a and TRUE, and you can also look at the class of it. Now, we are mixing objects of two different classes in a vector. Remember, when we talk about a vector, we always talk about a vector having elements of the same type, whereas a list, which we will learn about later, can have each element of a different type. For vectors that is not allowed, so when different objects are mixed in a vector, coercion occurs so that every element in the vector is of the same class.

We have seen implicit coercion, where R tries to find a way to represent all the objects, or elements, in the vector in a reasonable fashion. We can also do explicit coercion, that is, from one class to another, by using as. followed by the relevant function name. If I have x here and I look at the class of x, it tells me it is an integer, but I can convert that by doing as.numeric, as.logical or as.character to change the class of the objects. If R cannot figure out how to coerce an object, this will result in NAs being produced, which we can relate to missing or not-available values. For example, if we create x and look at the class of x, it tells me it is character. Let's try changing character to numeric, which will not work, and it says NAs are introduced. If you do it with as.logical, that would not work either, and it shows me NA values, and if you do as.complex, it again says NAs have been introduced. At this point, if I look at the value of x, it tells me it was assigned a, b, c, and we tried to convert that into a different class. Now, when we talk about vectors,
it is also good to know about attributes in brief. R objects can have attributes, that is, metadata for the object. When you talk about R object attributes, you could have names and dimension names, you can have dimensions, as for matrices and arrays, you can have the class, such as integer, numeric and so on, and you can have the length, as well as other user-defined attributes. If I say x, we are assigning a value to x, and my value of x is 1. But not all objects necessarily have attributes, and in that case the attributes function returns NULL. So at this point, if I look at the attributes of x, it shows me NULL. These are some of the basics which help us in working with R: using the vector function, or looking at the coercion which happens implicitly or can be done explicitly by us using an as.* function.

Now let's learn about lists and how we can work with lists in R. A vector, which we saw in previous examples, is a one-dimensional array, and it can hold elements only of the same type. A list is a generic vector that can contain objects of different types. Matrices, for example, also hold elements of the same type, but a matrix is a two-dimensional array; we will learn about matrices as well. Lists can contain all kinds of R objects, so you can have dates, data frames, vectors and many more. In a list no coercion, that is, changing of data type, is required, there is no loss of functionality, and lists do not follow any predefined structure. We can create lists using the list function, as shown here. You can create a variable and assign a list to it, where you can either pass in a vector or what you can do is you can
simply create a list by using the list function. Let's see an example. I can bring up my RStudio, where we can see an example of a list and how it works. Let me close this one, and this one. What we can do is say, for example, test, and I can give something here: I can say music tracks, then how many of them, so let's give 100 as the number, and then how many of them got five stars. I can check this, and it shows me all the elements. Now, when we do this, what we are creating is a vector, and a vector can undergo coercion depending on which elements are passed, because whenever you use c to create a vector, it will only hold elements of the same type. For example, if I do a class on test, it shows me that all the objects are of type character, and you can also use typeof on our test variable, and it has all the objects as character.

Now how would you create a list? We can use the list function. Let's again do a test, but this time I'm interested in creating a list, and a list can have objects of different types. Let's say music tracks, then I give 100, and then the rating, 5. If I now look at my test, it shows me all the elements of the list, and here we see each element with a double bracket. We can also use the is.list function and pass in test to check what it is, and it is a list. If we take the previous example, where we were creating a vector, and do an is.list, it would show me FALSE. So we just created a simple list, and we can also assign labels, or we can use a
names function to give names. Let me create a list first, like this, and now I can use the names function on this test and pass labels. Here I can give some names: for example, let's give the name product, say we are talking about a product of a company, then I can give count, and then rating, and this is to give names. There's some error here, let me just check this. Let's use the names function here, and now let's do a test, and that shows me the names we have assigned to our list objects. We can always access the elements of a list using indices, even using double square brackets. For example, I have test here and I can give something like this, which gives me, based on the index, the element at that position of the list.

We can also specify names when creating a list. For example, I can say product.category and use the list function, and I want to assign names while creating the list. I can say product, and this would be music tracks, then count, which would be 100, and then ratings, and I can say 5. Now we can access this list which we have created. So unlike earlier, when we created a list and then used the names function to assign a name to each object, here we passed in the names while creating the list itself. If you want to compactly display the structure of a list, you can always use the str function, and here I can pass in the name, so let's choose this one, and this lists the elements of your list in a more compact way.
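To make the difference between c and list concrete, here is a minimal sketch of the steps just described. The values follow the spoken example (music tracks, 100, 5); the name `test_vec` is introduced here only so the vector and the list can sit side by side, and is an assumption rather than what was typed in the video.

```r
# c() coerces everything to one type, so the numbers become strings
test_vec <- c("music tracks", 100, 5)
class(test_vec)     # "character"
is.list(test_vec)   # FALSE

# list() keeps each element's own type
test <- list("music tracks", 100, 5)
is.list(test)       # TRUE
class(test[[2]])    # "numeric"

# Assign names after creation with names()
names(test) <- c("product", "count", "rating")
print(test)

# Double brackets return the element itself,
# single brackets return a list of length one
test[[2]]
test[2]

# Names can also be supplied while creating the list
product.category <- list(product = "music tracks", count = 100, ratings = 5)

# str() shows the structure compactly
str(product.category)
```

Note that names(test) only works once the closing bracket of the character vector is in place, which is an easy thing to miss when typing live.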
A list can contain other lists as well. For example, I create one more list, and I can say similar.prod, and here I give a list again, where I say product equals film, then I give a count, and then ratings, say 4. Here I have just created one more list, but my intention is not just to create a list; I want to add this to our existing list. So we can take our previous list, that is product.category, like what we did earlier, and say list, and here, let's copy this: this is what we were doing when we were creating a list using product, giving the names while creating the list. What I also want to do here is say similar and then pass in similar.prod. Now if you look at our list, we have just added new elements. So this is one more way we can create a list, and our list can contain another list.

When we talk about subsetting or extending lists, one of the main ways, as I said, to access a specific element is to use double brackets. For example, we take our prod.category, and to access a particular element I give the index position, and I can access the elements of my list. So this is one way. If we use a single bracket instead of a double bracket, the output is a list. So if I look at this one, this is a list, but if you use double brackets, you are accessing the particular object itself. If we were working with a vector, we could just take a subset by using the c function. We can also subset by names or even by logicals. We can take this product category, and if we have defined names, I can say I
would be interested in music tracks, and this is the name we had given. We can close this one and try accessing the elements. So what's the name we had given? No, it's not music tracks, that's the value; the name is product. We do this, and then we can access the elements. We can also subset based on logicals: we can give something like this and pass in values like this, and we missed a bracket. That's also a way of pulling out the values. So you can subset using the names which you have assigned to the elements within your list, or by using logicals.

We can also use the dollar operator. If you see here, we are looking at the name, and that is preceded by a dollar. We can always pull out the values from our list by giving the list name, then the dollar symbol, and then the name. For example, if I choose product, I can list the values here, or I can say dollar and then choose count. This is also one way of accessing elements from the list, using the dollar symbol. To add elements to a list, as I said, you can add a vector of names, and that can be passed to your list. So these are different ways in which you can work with a list, and you can access the elements using indices, using names, or using the dollar symbol and pointing to the right names. So this is one simple example of working with lists. Now I can just do a Ctrl+L and clear that off. Always remember, a list is a generic vector that can contain objects of different types.

Now let's talk about matrices. A matrix is a collection of data elements arranged in a two-dimensional rectangular layout, and we can use the matrix function to create a matrix, as shown here. We already know that a vector is a one-dimensional array of data
elements, or a sequence of data elements, but a matrix is a two-dimensional collection of data elements arranged in a fixed number of rows and columns. Here you see that we are creating a matrix, and we have specified that the number of rows is 3, the number of columns is 3, and we want it to be filled by row, where we have given the value TRUE. Always remember, a matrix is two-dimensional and can have only one atomic vector type, unlike a list. It's a natural extension of a vector, going from one dimension to two. A matrix needs a vector containing the values that you place in the matrix, and at least one matrix dimension, so we can choose to specify the number of rows or the number of columns when we are creating a matrix.

Let's see a quick example of working with a matrix. I could say matrix, which will have the values 1 to 6, and then give nrow with a value, and that's my matrix. Similarly you could give the number of columns: I can say ncol and pass in the value. That's a matrix where R fills the values column by column. If you intend to fill the matrix in a row-wise fashion, so that your values 1, 2 and 3 are in the first row, we have to modify this a little: we say matrix of 1:6, nrow is 2, and then I give byrow. You always have these helper arguments which allow you to lay out the values. I do this and then a Ctrl+Enter, and now you see the values 1, 2 and 3 in your first row.

When we pass the matrix function a vector that is too short to fill up the entire matrix, something different happens. Let's have a look. Say you pass a vector containing the values 1 to 3 to the matrix function and say explicitly that you want a matrix with two rows and three columns. How do we do that? I can say matrix, and here I can say 1:3. Now I can give
nrow, and the number of rows we want is 2, and then I say ncol, and for this one I'll say 3. So I have given the values 1 to 3, and I have said the number of rows is 2 and the number of columns is 3. Here R fills the matrix column by column and simply repeats the vector. If you try to fill a 6-element matrix using a 4-element vector, R will generate a warning message.

Apart from the simple matrix function which we are seeing, you also have functions such as rbind and cbind, which R offers when you are working with matrices. For example, I could say cbind of 1:3 and 1:3, and that's my cbind, that is column bind, where I'm passing the values 1 to 3, which are stacked as columns. I can also do rbind and similarly pass in the values, and that arranges the values row-wise. Let's create a variable, say n, and create a matrix here. I'll say matrix, which will contain 1 to 6, then byrow with the value TRUE, and then the number of rows is going to be 2. Let's look at the value of n: you created a matrix with 1 to 6, arranged the values row-wise, and the number of rows you have chosen is two. We can also use rbind to add values to it. For example, if I want to add the values 7 to 9, I can do an rbind, say I want to add to my n, and pass in the values. I do this, and it has appended the values to the existing matrix as a row. Similarly you could have done a column bind and added values to your existing matrix. For example, if I take my n, I could do a cbind, take my n, and then pass in values to
this one, let's say 10 and 11, and I've added 10 and 11 as a column to my existing matrix. So this is one simple way you work with a matrix, appending values either at the row level or at the column level.

Let's also look at some other examples. If you want to work with a matrix, one of the useful things is naming: in the case of matrices, we can assign names to either the columns or the rows. If you don't, we see the default labels here, which follow a numbering. We can use two functions here: one is rownames and the other is colnames. Let's do a Ctrl+L and get our n, and this is what we have. We want to give it some names, so I'll use rownames and pass in a vector which has the row names, or colnames with a vector which has the column names. I can say I want to give row names to n, and then I give some values, let's say row 1 and row 2, and now I can look at my n, which has the row names assigned to my rows. Similarly I could give column names. All I need to do is say column one and column two, use colnames, and let's look at this one. So what went wrong here? We have three columns, and we forgot that, so we have to add one more column name, and then it should be fine.
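The matrix steps above can be sketched roughly as follows. The values 1 to 6, 7 to 9 and 10, 11 come from the walkthrough; the result names (`m_col`, `m_row`, `recycled`, `n_more_rows`, `n_more_cols`) and the label spellings `row1`, `col1` and so on are assumptions chosen for illustration.

```r
# By default matrix() fills column by column
m_col <- matrix(1:6, nrow = 2)
print(m_col)

# byrow = TRUE fills row by row, so 1, 2, 3 land in the first row
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
print(m_row)

# A vector that is too short is recycled to fill the matrix
recycled <- matrix(1:3, nrow = 2, ncol = 3)
print(recycled)

# cbind() stacks vectors as columns, rbind() stacks them as rows
cbind(1:3, 1:3)
rbind(1:3, 1:3)

# Appending a row or a column to an existing matrix
n <- matrix(1:6, byrow = TRUE, nrow = 2)
n_more_rows <- rbind(n, 7:9)          # 3 rows, 3 columns
n_more_cols <- cbind(n, c(10, 11))    # 2 rows, 4 columns

# Naming rows and columns: one name per row and one per column,
# otherwise R reports an error about the length of dimnames
rownames(n) <- c("row1", "row2")
colnames(n) <- c("col1", "col2", "col3")
print(n)
```

Since rbind and cbind return a new matrix, n itself stays 2 by 3 unless the result is assigned back to it, which is why three column names are needed at the end.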
Now if you look at this one, we have given row names and column names, and naming the columns or rows in your matrices can be very useful. As the previous error says, there is also a function called dimnames, and dimnames is also an argument of the matrix function. We could do something like this: let's have our n, and then you can do a dimnames, to which you assign a list, and in this list you pass in a vector with row one and row two, then a comma, and then c with your column names, which are column 1, column 2 and column 3. Now if you look at dimnames, you can see that you have given the row names and column names, assigned by using a list.

If you try to store different kinds of objects in a matrix, what would happen? Coercion would take place. For example, I have x, and let's create a matrix which will have 1 to 8, and let's say the number of columns is going to be 2. Let's look at our x, and this has the values. Now what if I create, say, l, which will be a matrix of letters? Let's say letters, and here with letters I'll say 1:6.
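Before continuing with the letters matrix, here is a small sketch of the dimnames step described above. The label strings are illustrative assumptions, and `n2` is a hypothetical second matrix added only to show the dimnames argument of matrix.

```r
# dimnames() takes a list: row names first, then column names
n <- matrix(1:6, nrow = 2, byrow = TRUE)
dimnames(n) <- list(c("row1", "row2"), c("col1", "col2", "col3"))
print(n)
dimnames(n)

# Once named, elements can be picked out by label
n["row2", "col3"]

# The same list can be passed as the dimnames argument of matrix()
n2 <- matrix(1:6, nrow = 2, byrow = TRUE,
             dimnames = list(c("row1", "row2"), c("col1", "col2", "col3")))
print(n2)
```

Both routes should give the same labelled matrix; which one to use is mostly a matter of whether the names are known when the matrix is created.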
Now I want to give the number of rows, let's say 4, and the number of columns, let's say 3, and let's look at the value of l. So l has letters and x has numbers. What if we bind them together using cbind, which is for column-wise binding? If I do a cbind and pass in x, l, you see there is coercion which has happened, where everything is converted into character, and you can always do a class to check. So this is a simple example of working with matrices. There is much more you can do, such as subsetting like what we saw with lists, but that we can learn later.

Now let's learn about data frames: what a data frame is and how you use R to work with one. A data frame is used to store data in the form of a table, and we have the function data.frame to create a data frame. What we know already is that data sets are comprised of observations, or what we call instances, and variables, and we always have observations to which some variables are associated. For example, we can talk about a data set of, say, five people. Let's look at the information here. Here we look at the body mass index, BMI, where we are using the data.frame function and passing in gender, using the c function to pass in the values, and then height, weight and age, and these then become the columns of your data frame. Or say we want to create a data frame for people, where each person is an instance, and properties of each person such as name, age and child, that is, whether the person has a child, become the variables. Such information cannot easily be stored in a matrix or a list, and data frames can be used for such cases. The data frame is a fundamental data structure for storing data sets. It is pretty similar to a matrix, as it has rows and columns, and here rows correspond to observations, so here we can talk about every
individual or every person, and columns correspond to variables, that is, properties of each person. The difference between a data frame and a matrix is that data frames can contain elements of different data types. For example, we can have one column being character, another being numeric, and yet another being logical. The restriction is that elements in one column should be of the same data type.

Now how do we work with data frames? Let's see some examples, so we can bring up our R. When we talk about data frames, usually we don't create them ourselves; we import data from sources such as a CSV file, an RDBMS, or even Excel or SPSS, and then we create data frames. Of course, R has ways to create data frames manually using the data.frame function. We can create three vectors first and then pass those vectors in to create our data frame. Let's do that. Let's say name, and here I will use the assignment operator which we learnt earlier, then c, and then I can give some names: let's say John, Peter, Patrick, Julie, and one more name, Bob. This is the vector which we are creating, and we can check it. You can do a class to check what this is, and that says it is a character vector. Similarly we can create one more vector, age, and give some numbers: 28, 30, 31, 38 and 35, and these are the values for the age. Similarly we can record whether each person has children, so I say children and create one more vector, and here I give values which are logicals. I'm not giving any numerics or characters; I'm using logicals for whether a particular person has children or not. Let's have this vector created, and now we have three vectors, name, age and children, which we can use to create our data frame. So we can
just call our data frame df, and we can use the data.frame function and pass our vectors within it, name, age and children, and that should create my data frame. Let's have a look, and this shows me that the data frame is created. The column names are inferred from the variables which are passed to the data.frame function, so name, age and children become the column headings of my data frame. We could also have created it in a different way. I could have said df, used my data.frame function, and in data.frame said name is going to be name, age is age, and children is children. This is one more way of creating a data frame, and in this way we now have rows in the data frame, like in a matrix.

To look into the data frame's structure we can always use str and pass in our data frame, and this prints out something similar to that of a list. We also need to know that under the hood a data frame is a list, and in this case it is a list with three elements. Each list element is a vector of length five, corresponding to the number of observations. If we create a data frame with vectors that are not of the same length, we would typically get an error. Now, when we look at our data frame, we know that name is a column. The name column, which we gave as character, can actually be stored as a factor instead of character. To suppress this behavior, we can always use the stringsAsFactors = FALSE argument. So I can create a data frame like this: use my data.frame function, pass in our vectors, name, age and children, and then say stringsAsFactors and set this value to FALSE. If I do this and now look at my data frame structure,
sorry, yeah, now let's look at this one. This shows me that now we are creating a data frame where name contains characters. Earlier, too, it was showing as character by default, because in this case the value is set to FALSE by default; otherwise it would have created factors from the characters.

Now how do we subset, extend and sort data frames in R? As we have learned so far, in brief, a data frame is something like an intersection between matrices and lists. If you want to subset a data frame, we can use the single square brackets from matrices, the double square brackets from lists, or the dollar symbol, so all of these can be used to subset a data frame. Let's use our data frame which contains information about people. We can select a single element from our data frame. I can say df, use a single bracket, and do 3 comma 2. It would be good to first print the data frame, and that's my data. Now let's do the single bracket and look at this one. This uses the row index first, which is 3, so we go to row number three, and then we pass in our column index, which is 2. We could have done it in a different way too: df, then the row index, then the name of the column you are interested in, and that also gives me the value. Just like with matrices, we can choose to omit one of the two indices to end up with an entire row or an entire column. For example, if we are interested in the information for Patrick, I could have just done df, 3, comma, and this shows me the entire row. Always remember, the result we see here is a data frame with a single
observation because there has to be a way to store different data types and that's why the result is also a data frame what we can also do is to get entire age column we can just use our data frame and then we can pass in the column name here like this and that gives me just the column now here the point to notice is result is a vector because columns contain elements of the same type in previous example we were seeing a row and that row was not a vector it was a data frame because values were of different data types now subsetting a data frame that results in a data frame and contains multiple observations can also be done by doing something like this for example i will do df and then i will say let me get rows 3 comma 5 and then i can just say age and children for example so let's say age and children and i can be pulling out the values in this way so i could also be just getting the results in the age column if i'm interested in by just saying df and here i can just pass in the column number and that also gives me the age column now we know data frame is a list containing vectors of same length this means we can use list syntax to select elements also and what we can do is we can use our dollar symbol and then choose the column name and this is also one way wherein you can pull out the values or you can use double brackets as i mentioned earlier and pass in the column name so that's also fine or you can give a column number and that also would work and in all these cases the result is a vector now with single brackets you can still do it always remember if you use single brackets then that will result in a data frame so what we are seeing here is a list which contains only age column having the data elements so these are different ways in which you can do a subsetting of a data frame now using single brackets or double brackets can have serious consequences so we need to always think about what we are dealing with and how are we 
handling it now what we can also do is we can extend our data frames that is we can add variables we can add columns that is adding variables or we can add rows which are nothing but observations so adding columns is like adding new elements to the list and for which we can obviously use dollar or double brackets say for example now this is my data frame and if we would want to add height whose information is in a vector so let's say height let's create a vector here and this one is what i would want to add for each person so let me do this and let me pass in some values here and the last one something like this so this is a vector created now what i can do is so we have data frame is called df so we can say df dollar height and then i will pass in this vector here and now if i look at my data frame you see the fourth column has been added and that's my height column now what i can do is i could have done it in a different way basically if i had my data frame i could have just done df double brackets and then give it a name and then i could have passed my vector in this way however so this is also one way of doing it we have already added the column so we don't need to repeat the step now what we can also do is we can use a c bind function and if you remember c bind that is for column binding so for example let's create a weight vector now and let's pass in some values here so for example let's say 75 65 54 34 78 and these are my values of weight now what i can do is i can just do a c bind and then pass in my data frame and then pass in this vector and in this way i'm just adding columns or i'm extending my data frames by adding more columns to it now obviously if we can use c bind then we can also use r bind to add new rows so for r bind creating a new vector won't work because we need to create a new data frame with one single observation remember row will have values of different data types so we cannot create a vector we have to create a new data frame and then 
we can add it using r bind so let me create a data frame here for example let's call the data frame tom and let's pass in some values here so i will say data dot frame function and then let's give name what we can do is we can give age then we can give the logical value then we can give say height and since we have added weight let's also add weight and this is my data frame now we can use r bind function so i can say r bind and then i can pass in my data frame and this new data frame which we have created and this tells me that the number of columns of arguments do not match so we will have to check this one so we have our data frame which has just height so it does not have the weight that was only as the result of c bind so let's create tom again without weight and now let's do a r bind and let's again check what is the reason here so this is height and let me just check this so to look at this this is the error we were getting because i was creating a data frame with a different number of columns than the data frame i was trying to add it to now yes we had done a c bind and c bind was showing us one more column but the original data frame did not have that column so what i did here was i did tom and then basically i created a data frame whose columns match with my original data frame and then i could use r bind to basically add one more row so what we did was we used r bind and r bind was used to add a new row to our data frame now when it comes to sorting or ordering your data frame say for example we want to sort data frame by age now how do we do that so we could easily do sort df and then select our column and we could just do a sorting now if we do this it is good but not really what we need now other clear way of doing that would be using ranks so for example if i do a ranks and instead of doing a sort i would use order and then basically pass in my column so i would say df and then i 
would use age now in this case if i look at ranks it shows me a vector of ranks with rank position of each element now if i do a df dollar age it shows me the values and if you look at the ranks it will tell you where the lowest value is and that's the lowest value and that's why we see the rank as one and so on we can look at the ranks so what we can also do is we can just do a df and then basically use ranks and we can just look at the result so this shows data frame which is an ordered data frame now based on ranks now if we would want to do it in a descending order what we could also do is we could do a df and then use order and within order i will basically pass my data frame i will choose my column and then i could say decreasing equals true and i could do this and here this could show me the value so it says undefined column names so what i would have to check is what is my data frame here so we have age and then what we would have to do is we would have to select a particular column so let's do that and here i have just selected the column and then there is a comma missing that was showing an error so now we can have the data ordered in a descending way so there are dozens of packages such as dplyr and data dot table which can help you manipulate filter merge and sort your data frames so this is in brief about the data frames working with data frames subsetting them and also sorting the data in your data frames now one more important type of object in your r is vector and that really helps us in various ways so let's see how we work on vectors here so to create a vector we can use the c function and pass in the values those will be the objects or elements within the vector and then you can look at the value of the vector or also at the class of it which tells me the values are numeric now in case of vector all the values have to be of the same type or belong to the same class we can say so here we are creating a vector looking at the value of it and then 
looking at the class which says the values passed in here are character similarly we can do it for logicals that is true false and then look at the value of this and this class is logical now what we can also do is we can print all the three vectors at once and here we will use semicolon to separate two or more variables and we can pull out the values of all the vectors which we see here now what happens if we pass in the values which belong to different classes or you can say different data types so within a vector if you do that there is something called as coercion which takes place which will convert all the values into one type and in this case it has converted everything into character similarly we can pass in values wherein we can pass logical and numeric and in this case it's not going to go for character it is going to convert everything into numeric now if i had done this where i passed a character and numeric and if you look at this then it has converted everything into character so character always takes a precedence if it is one of the values of vector and you have other values which are not characters then in that case coercion will happen there is one more way of creating a vector and that is by providing a range to your c function so we can do that here wherein i said c 1 colon 20 and then basically look at the value of vector 7 so it shows me all the values starting from 1 till 20 however there is one more way you can use the sequence function to do the same thing now i could avoid the bracket i could avoid the c function and i can straight away pass a range and that is also fine to create our vector starting from 1 ending at 25 so what if i want to create a vector with odd values between 1 to 20. 
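A minimal sketch of the vector creation and coercion steps described so far. The variable names and the values here are assumptions made for illustration; the ones typed on screen are not in the transcript.

```r
# Hypothetical vectors, one per basic class.
vec_num <- c(10, 20, 30)          # numeric
vec_chr <- c("a", "b", "c")       # character
vec_lgl <- c(TRUE, FALSE, TRUE)   # logical

# Several expressions on one line, separated by semicolons.
class(vec_num); class(vec_chr); class(vec_lgl)

# Coercion: mixing types converts everything to one common type.
mixed_chr <- c(1, "a", TRUE)      # character wins: "1" "a" "TRUE"
mixed_num <- c(TRUE, 2, FALSE)    # logical + numeric becomes numeric: 1 2 0

# A range with the colon operator, with or without c(), and with seq().
vec_range <- c(1:20)
vec_plain <- 1:25
vec_seq   <- seq(1, 20)
```

Checking `class()` after building a vector from mixed inputs is a cheap way to notice unintended coercion early.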
now in this case i am going to say how many values to skip or to jump so i'm creating a variable called odd value i'm using sequence function and then to that i'm passing the beginning number the ending number and then the skip or the jump and now if you look at the values it shows me only the odd values well you could have done the same thing to get even values and that's not very complicated so you can start from 2 and then you can do skip wherein after 2 it basically gives you every second value so we are looking at the even values and this is how you can create a vector which is having odd or even values now what if you want to create a vector with 10 odd values starting from 20 so you are basically giving a length so here you can say from where you would want to start what is your skip and then the length of the vector which tells me it gives me 10 odd values from 20 onwards that is we take it from 21. now one of the requirements is always to name the values so that we can access the values either by indexing or by their name which have been passed to the value so let's see that so let's create a vector which is called temperature so variable is temperature pass in the values to this look at the values of temperature now what we would want to do is we would want to assign these names to each value which makes it more readable more accessible so i can use the names function pass in my temperature as a vector to names function and then assign the names to each value of temperature now if you look at temperature it shows me the names which have been assigned well we could have done it in a different way we could have created a vector of names something like this and then what i could have done is i could have created one more vector such as temperature and instead of assigning values we could have assigned the vector to our existing vector so if you do this so you are assigning the names vector to the temperature one and now look at the 
values it still does the same thing so this is where you are assigning names to every value of your existing vector now there is one more way and that is using your sequence so here i'm creating a sequence which starts with 100 and goes up to 220 with a skip of 20 values or every jump would be 20 values so let's do that use your names function on price and then what i'm going to do is i'm going to use my paste 0 function which takes p and then 1 to 7 as the value so we know paste 0 basically skips the space and we are going to assign those values as names to price and now let's look at our price so that basically gives me the names as we desire so these are some smarter ways of assigning names to every element or every object within your vector now how do we perform some basic operations let's have a look so let's create a vector passing in the values and then you can simply do an addition on two vectors where each element is getting added to the corresponding element of the other vector you can subtract two vectors that is element to element subtraction element to element multiplication or division and you can basically perform operations on the vectors now how do we use some inbuilt basic math functions and that's pretty easy this is my vector now let's do a sum which sums up all the elements let's find out a standard deviation for all the values let's find out the variance for all the values here let's do a product of vector values find the maximum or find the minimum value so these are some basic inbuilt math functions which sometimes are useful in our data science or data analysis kind of activities now one more requirement might be comparing the vectors using comparison operators and this is where i create a vector 1 create a vector 2 and let's find out the values in v1 which are smaller than v2 values and that gives me the logicals as the response that is false true and false similarly you can do v1 greater than v2 or you can say where v1 values are not equal to v2 or equal to 
v2 so these are some simple comparison examples now i can create a different vector and then i can find out individually if the elements in the vector are lesser than three by just doing a v lesser than three so it compares each element with this so you are actually using one scalar value to compare it with all the elements and you can do that it gives you the logicals so you can also be doing slicing and indexing on vectors and this is very much important when you are storing your data in vectors how do you access them so let's create a vector using sequence let's give it some names as we have seen in past and let's look at our price one so that tells me the name and the values now you can access the elements using indexing so let's get the third element and it shows me 590. remember the indexing here starts with one unlike other programming languages like python where indexing starts with zero now i can also get the third and fourth value by doing a three colon four i can also specify the vector and say one comma four and that shows me the first and the fourth position or second and sixth position so this is one way where you are using indexing to access the elements similarly i can give the names now that's where we see the benefit of giving names to every element so i can use c function pass in the name and look at the value for that particular name or selectively select different columns or different names or we can also use the square brackets wherein we pass the names so sometimes it can also be useful to use logical positioning that is we would want to find out the logical position if the value exists and we can do that using true and false and then look at the values so there is one useful way where you can exclude a particular position might be that is an n a value might be a value which you are not interested in and that's where you will say minus 2 which will skip the p 2 value or minus 2 and minus 5 where we are skipping p 2 and p five and we can 
exclude particular values from our vector now how do we use a comparison operator on the values of vector so you can just say price 1 and i would want all the values which are greater than 600 or you can assign this to a filter and then basically pass in the filter for your vector so these are some simple basic operations which you can run using your r programming where you would want to manipulate where you would want to store some data and extract that data use your different logical operators or other operators and perform your basic easy computations now that we have seen some basic operations using r let's look at some more operations when you're working with vectors such as one of the common issues is handling the missing values now here we are assigning a vector to a variable order detail and this one has a missing value now let's see how this is handled and you see all the values in the vector are assigned what you can also do is you can assign names as we have seen earlier by using the names function and then look at the value of order detail so you see the names and these are your missing values which are also taken care now what we can also do is we can perform an operation on a particular vector which will be applied to all values of the vector so for example here i will just add a scalar value plus 5 to the elements in the vector and that shows me number five has been added to each element or each object in the vector now if you would want to work on two vectors for example to add two vectors let's create a vector called new order and then let's add it to order detail now in this case what we are doing is we have a vector which is 5 and 10 and what we are doing is we are adding values to order detail now our order detail earlier was 10 20 30 n a 50 and 60 and what i have done is i have passed in a vector which is 5 and 10 and you are adding it to the elements so 5 gets added to 10 and then your value 10 gets added to 20 and then you have again 5 
which is added to 30. now you cannot add in anything to a missing value so that remains as it is then you add again 5 to 50 and then 10 is added to 60. so in this way you are adding two vectors which are not of same length and the shorter one is repeated as you add these values now what i can also do is i can update the order by doing this so i'm creating an update order and now let's look at the value of update order what does it show so you are basically doing the same thing so if you would want to work on a subset of vector how do you do that so here you are using some indexes so i'm saying order detail and this is my order detail so let's take one colon two and assign it to first two so if we look at the value which is assigned to first two we have just sliced and added a subset of vector to this one and if i would want to take the length of order detail it shows me the length here which is six elements here including the missing value also what we can also do is we can do some more operations so for example from order detail what i'm doing is i'm saying length minus 1 and then up to the length so let's do this and let's see the result of this so what we have done is we had our order detail which had these values and what we have done is we have said length minus 1 colon length so you have taken the last two elements and you have assigned that to your v1 similarly we can do length minus 1 colon 2 so i can do this and now let's look at the value of v2 so this shows me the value where you are taking length minus 1 and then you are taking it till the second position of the index element which is 20 so you are getting in the values here so you get your 50 n a 30 and 20 because you started with length minus 1 and went up till the second index position similarly we can use the length and we can take it from this element and let's look at the value of v3 so that shows me that i'm doing some slicing or i'm getting subset of my vector so similarly you can also do this one so v4 and let's do 
this and then let's look at the value of v4 so it gives me the values based on our subsetting or slicing now you can extract all the values below 30 and this is where you are doing a comparison so you will take your vector and then you would want to compare each value if it is less than 30 and you would want to take all the values here so it gives me the logicals or the response for all the values which are lesser than 30 what we can do is we can also use the square brackets and do this this will show me the actual values here we were just getting the logicals but here we are getting the values now to omit n a value from the vector we can use n a dot omit and this one will help me in getting rid of the n a values plus i'm also checking the values if they are less than 30 and then i am basically doing using this n a dot omit so you can do something like this you can look at the values what you can also do is you can find the order details that are multiples of 3 and here we would want to use modulus and we would want to find out if the remainder is 0 then i am getting the numbers which are divisible or multiples of 3. 
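The missing-value handling, recycling and slicing steps above can be sketched as follows. The values of `order_detail` and `new_order` are the ones read out in the tutorial; the remaining variable names are assumptions for illustration.

```r
order_detail <- c(10, 20, 30, NA, 50, 60)
new_order    <- c(5, 10)

# The shorter vector is recycled: 5, 10, 5, 10, 5, 10. NA stays NA.
update_order <- order_detail + new_order        # 15 30 35 NA 55 70

n <- length(order_detail)                       # 6, the NA is counted
last_two  <- order_detail[(n - 1):n]            # 50 60
backwards <- order_detail[(n - 1):2]            # 50 NA 30 20

# A comparison gives logicals; inside [ ] it gives the values.
is_below_30 <- order_detail < 30                # TRUE TRUE FALSE NA FALSE FALSE
below_30    <- order_detail[order_detail < 30]  # 10 20 NA

# na.omit() drops the NA; as.vector() strips the bookkeeping attribute.
below_30_clean <- as.vector(na.omit(below_30))  # 10 20

# Multiples of 3 via the modulus operator.
mult_of_3 <- order_detail[order_detail %% 3 == 0]  # 30 NA 60
```

Note the parentheses in `(n - 1):n`: the colon operator binds tighter than subtraction, so `n - 1:n` would compute something else.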
so let's do this and it gives me again the logical values of all the values which are divisible by 3 giving us a remainder of 0 or if you would want to look at the values then you can say order detail open up a square bracket and then pass in your condition now we can then omit n a from this one and then we can look at the values so this is simple way where you are subsetting a vector or extracting the values which you are interested in which might be one of the requirements of your data wrangling or data manipulation or just data extraction now i can also use a sum function now if we do this it returns n a because there is already a missing value and you cannot do a sum on the values now what i can do is i can do a n a dot r m to remove the n a values so i can do a sum on order detail where i intend to add up all the values but what i also want to do is i want to remove the n a value so i'm giving it a value as true and then if i do it it gives me the sum of all the values so similarly you can do a mean you can do a maximum you can find out the minimum value standard deviation or even square root now these are some simple operations what we are doing on vector where we are interested in extracting some specific values now let's look at matrix which we have also discussed and matrix is also one way where you can use the matrix function to create a matrix which is multi-dimensional so for example if i do this and if i look at the value of v i get a vector which starts with a value of 20 ends with 30 and at any point of time you can convert this to matrix so first we created a vector and now i'll create a matrix out of it wherein i am seeing the row numbers i am seeing the column number and i am seeing the values in that particular column so you have already done that now let's take it to the next level so let's create a matrix wherein we are using the matrix function we will say 0 comma 3 comma 3 and now let's look what it has done so you have created a matrix which 
is of three columns and three rows and by default the row number and column numbers have been assigned to them we can also create a matrix by passing in values so we can say 1 colon 9 and then give the dimensions that is number of columns is 3 number of rows is 3 and if i look at the matrix now i have passed in the values to my matrix so sometimes you may want to arrange the data in a matrix for particular kind of calculations you can also use n row and by row so you can say how many number of rows you would want and you would want to assign the data row wise so when we are doing this now if you notice the difference between the previous one where we just gave the values and we said three rows and three columns so it was doing it column wise so one two three four five six seven eight nine but here we said by row is true so it has arranged the values in a row wise fashion so it goes one two three four five six and seven eight nine similarly i could have just done this by giving the dimension and selecting by row and if i do this it is still doing the same thing now what we can also do is we can create matrix using vectors so here let's create a vector stock1 and then stock2 now we would want to merge both the vectors so you can always do a c function and then create a new vector that is stocks which is a merged result of stock 1 and stock 2 and let's look at the results so that's my stocks that's a vector and now what i would want to do is i would want to create a matrix using the stocks so i'm giving it a name that is stock dot matrix i'm using the matrix function wherein i will pass my vector i will say by row so i want the values to be arranged row wise and i'm also selecting the number of rows so if you look at this one so the values which we had in our stock which was all the values have now been arranged row wise and in two rows so it starts with 450 51 52 45 and 68 that's my first row and the rest five values are arranged in the second row so one of the main 
requirements is instead of going for default column names and default row names we can give specific names to our columns and rows to make more sense to the data how do we do that so we can basically say days so this is a vector which we are creating and then what we want to do is we want to create a new variable which is stock 1 and stock 2. now this is for my columns and this will be for my rows now how do we assign that so we can say column names and this is where i will say on my stock dot matrix i will assign days which has five values and that will become my column names and similarly using row names function i can basically assign row names to my matrix so if i look at my matrix now it shows me the column names and row names which we have assigned or which we have passed to our matrix now there are different functions which are associated with the matrix and let's look at some examples so these are some simple basic examples now if i say let me find out the number of rows and that gives me the number of rows or number of columns or get a dimension that is the number of rows and columns of your matrix now we might be just interested in getting the row names or column names or even the dimension names which basically returns the row and column names so in this way you can use these simple functions which are associated with matrix to extract information about your matrix or data which has been transformed into matrix to pull out some information about that one of the requirements which data scientist or data analyst might face is carrying out arithmetic operations on your matrix now what we can do is we can create a matrix which takes values one to fifty we want to arrange it by rows and we will say number of rows is five so that's my values starting from one now i can do an addition here by just doing 5 plus mat 1 and if you notice number five as a scalar value has been added to every element of the matrix similarly you can do a 
multiplication you can do a division you can basically return the quotient if you would want to do that or go for exponential values so you can perform simple arithmetic operations for every element of the matrix and what if you want to have arithmetic operations done on multiple matrices so let's do mat 1 plus mat 1 and we get a total where every element is added to the corresponding element you can do a subtraction you can do a multiplication and you can get the value so this might be also very useful when you are working on multi-dimensional data you can also do some more operations on matrix such as returning the sum for each column say you are doing a summation or at a row level or you want to do a mean for every row you can do that by using these simple functions now you can add rows and columns to a matrix using r bind and c bind functions so r bind is for row bind and c bind is for column bind but for that we have to first create a vector so let me create a vector of same length which will then be added as a row to my existing matrix now my matrix has five columns so let's create a vector with five elements and then i can basically add this as a row to my existing matrix by doing this and now if i look at my values i will see the new values as the third row and if you also see the variable name becomes the row name and we have added a row to our matrix now similarly i can find out row means that is we have seen earlier by calculating the mean or average so i can do that and i can find out the value of average now what i can do is i have got the average for every row and what we can do is we can basically do a column bind by using a c bind function and i will say i'm going to take the total stock which has three rows and then get the average and now let's look at the total stock which shows me the average value which is the new column which has been added to the matrix so these are some simple very simple operations which you can do but that 
gives you good insight in what can be done at a matrix level where your data is arranged in multi dimensions now how do we do a selection and indexing in matrix so in vectors we were using either names or we were using positions or we were using indexing now here let's create a matrix called student and we are using the matrix function but within the matrix function we are using the c function to create a vector which will pass in all the values which also has n a values if you closely notice we will split these values into number of rows is four so that means the number of values in this vector should be a multiple of four i'm saying columns is four and i would want to arrange this data row wise so i've done that and if you would want to set the dimension names on this i can do a dim names so what i'm doing here is on my student i am assigning a list which will basically have these names which are basically assigned and now if you look at your student it basically shows me the values which were first applied to the row names that is john matthew sam and alice and then you have one more vector which goes as the column names for the values so you have not only created a matrix by using a vector by defining your dimensions that is number of rows and columns you have arranged the data in a row order and what you have also done is using a list function you have passed in the values which will be applied as row names and column names to your matrix now how do we extract particular columns here so we can take our matrix and we can just say comma 1 and that basically gives me the values for john matthew sam and alice and what we are looking at is the first column now i can also say from first column onwards i would want to look at how many columns so i can do this and now here i'm selecting first and second column i can also be using a vector function here and that also does the same thing where i am saying 1 comma 3 and i am getting the values from first and 
second column so third is not included here now if you would want to do row wise then you have to give the row position first so if i do a student 1 that gives me the row values and this is giving me values for my student which we are seeing here so for john we have 20 30 n a and 70 and that's what we get here when you do a row wise operation you can also do a row wise and how many rows do you want you can use the vector function to do that you can also select or slice out a value where you are getting an intersection of row 2 and column 2 and then you can also start from a particular position and then onwards get your rows so these are different ways in which you are slicing the values from your matrix by columns or by rows so at this point of time let me just type in student here and let's look at the value of student and then here we are interested in three colon four and then two colon three so what does that give me so you are looking at third to fourth row so you are looking at sam and alice and then you are looking at columns two and three so that basically gives you your 26 32 24 and n a so first is you're giving your row positions or how many rows you want and then you are giving your column so similarly you can do this you can say from row number 2 to 4 and then column wise you can say 1 to 3 so if we do this so this tells me two columns which is first and second and it shows me rows which is from second to fourth so in this way we can extract data based on rows and columns now if you would be interested in finding out a specific value so for example if i again bring up student this is my student and what i would be interested in is getting the value of john and for specific subjects so might be we are looking for 2 colon 3 now if i do this it shows me for john and what we are interested in is 2 colon 3 so that gives me the value for chemistry and biology so you are giving the columns so row wise you have already specified the name and that basically 
selects the particular row. I could also have given a number to choose which row or rows to pull the values from. Now, what if I want the values for John and Sam? I could use positions, but a range has to be continuous, and here John and Sam have Matthew in between. So we pass the names John and Sam, and then we ask for column 4. That gives me the values in the fourth column, which are 70 and 75. Going further, you can look at the maths and bio scores of Sam and Alice: you give the row names Sam and Alice, and then the columns for bio and maths, which are the third and fourth columns.

How do you find an average? That's pretty simple. You can use the mean function on student, selecting the row name John. You also want to get rid of NA values, otherwise the result will itself be NA, so you say na.rm equals TRUE, and then you get the average score of John.

Now, how do I do further computation, say the average and total score of all students? In this case I can use the apply function. I'm working on student, and I give the margin as 1, which means the function is applied over rows. I want the mean, and I want to get rid of the NA values. If I look at the help for apply, it tells me that apply works over the margins of an array. Then I do an apply on student with margin 1 again, I say I want the sum, and I get rid of the NA values. This gives me the sum for each student, and the earlier call gave a mean for each student. To see what is happening, let's look at student again so that we avoid confusion. We have student, and then we have
physics, chemistry, bio and maths as columns, and I have said margin 1, that is rows. So for John we want the total: 20 plus 30, skipping NA, and then 70, which gives me 120. Then you look at Matthew, where there is no NA value, and it again does the totaling. So when we chose the apply function we worked on student, we were interested in the sum of all the values in each row, and I said take care of NA and then give me a sum. Similarly you did a mean, and that gave you a mean for each student. These are some simple operations.

What we can also do is create a vector called passing score and find out in how many subjects Alice has passed. How do we do that? We have to check whether Alice's score is greater than or equal to the passing score. So we create a variable called pass. I take student, I give Alice as the row name, and I compare that row with the passing score we created. That gives me a logical value for each subject showing where Alice has passed. I can then get rid of the NA values and look at the result, which tells me there was one subject in which Alice passed, and the rest were either FALSE or NA. We can do the same thing for Sam: compare Sam's row with the passing score and get rid of the NA values.

So these are some easy operations and uses of functions on your matrices, which hold values at row and column level; you can apply one or more of these functions to extract values that carry more meaning. That's it for matrices. Now let's also look at data
frames. A data frame, as we know, is data ordered in rows and columns, where we can assign row names and column names and do operations on it. Let's look at an example. If I call data with no arguments, that lists the sample data sets we have available. The output says you can use data with the package argument to list the data sets in all available packages, and you can look at all the R data sets in the window that has opened. I am interested in the AirPassengers data, so I pass that to the data function, and if I do a head to see the initial data from AirPassengers, it shows me the values. Similarly we can do that on the iris data set and look at the head values. I can use View to look at the values in a tabular format if that is easier for analysis. I can do a View on state.x77, and that shows me the population, income and so on for different U.S. states. You can do a View on any of these data sets to understand the data in a more readable format, and you can do a tail to get the data at the end. The head and tail functions give you, by default, the first six or the last six entries of the data set.

Now the question is how we work on this data. I can get a statistical summary. If I do a head on the iris data set, this is a popular data set which shows the sepal length and width and the petal length and width of particular flowers, and the species each flower belongs to. Here we can get a summary, that is a statistical summary of the data set, which gives me the minimum, first quartile, median, mean, third quartile and maximum values, and it shows the count of entries for each species under the Species column. Now what
I can do is check the structure of this data set using str. I can also create a data frame of my own using the data.frame function. Let's see how we do that. First we create a vector of days, then a vector of temperatures and one for rain, and then we create a data frame out of these. I use the data.frame function and pass in days, temp and rain as the vectors. Now if you look at the data frame, you see days, temp and rain: those were the vector names, and they have become the column names. Row names are assigned automatically, and we see the values that were passed into the data frame.

I can do a summary on this to look at the length, that is how many values we have in each column, and the class of the elements, which is character for days. For temp you get the numeric summary with minimum, first quartile, median, mean and so on, and for rain it shows the mode, which is logical, and how many FALSE and how many TRUE values we have. You can also look at the structure of this data frame with str, which gives me how many observations and how many variables we have, and what the variables are, that is days, temperature and rain, along with their values. If you notice, days is of type character, temperature is numeric, and rain is logical.

Now, how do we do data frame indexing? Like a matrix, a data frame has rows and columns, so you can index it in the same way. I can extract the first row with the square brackets, and you can always compare the result by typing df, which is my data frame. Extracting the first row shows me Monday, 25.6, and the rain value TRUE. I can also do it column-wise. For example, I could do it
in this way. Here I am extracting the second column, not the second row, and it gives me 25.6, 30.1, 40.0 and 37.3, so you have extracted the values of that column. Now, selecting by column name is the easiest way to extract the values of a particular column. Instead of giving the position of the column, I give the column name, and that gives me all the values of temperature. If I say 2 colon 4 and then give the column names, it gets me the second, third and fourth rows for days and temperature. So you have given your row positions and then selected your columns. You can also use the dollar sign if you want all the values of a particular column: I can do df dollar days or df dollar rain, and it shows me those values from my data frame. One more way is to use the single-bracket notation to return the same information in data frame format. If you want the result as a data frame, you can do df with rain or with temperature in square brackets, and that gives a data frame. If I had assigned this to a variable and looked at its class, it would be data.frame.

Another thing we often need is filtering data frames using the subset function, that is subsetting the information in a data frame. Let's look at our data frame again to remind ourselves what values we have, and then get a subset out of it. I pass in my data frame and give a condition on the rain column, that is wherever the value is TRUE. That returns all the rows where it has rained. Similarly, I can subset by giving a condition on temperature, wherever the value is greater than 25,
and that shows me the matching rows. So this is how you filter data in a data frame with the subset function: you provide the data frame and then a condition on a column.

One more important thing you might need is sorting your data frame using the order function. I can create a variable named sorted.temp and call order on the temp column of the data frame. If I look at the value of sorted.temp, it gives me the ordering, that is the row positions ranked by temperature in ascending order, which we have discussed in another section as well. Using that to index the data frame gives the rows sorted by temperature in ascending order. What I can also do is return all the columns with temperature sorted in descending order. Here I'm creating a variable descending.temp, and when I do the ordering I use the minus symbol. If you index the data frame with this, it shows me the rows ordered in descending order based on the temperature column. Another way of sorting is to call order directly on the column you want to sort by inside the square brackets, and the result shows the values ordered by temp. This can be very useful when you want to sort your data in a particular way to understand it better.

Similarly, one more requirement might be merging your data frames. Here I'm creating a data frame called authors using the data.frame function, and instead of creating three vectors first, I am creating them within the data.frame call. Let's do that, and at this point I can check what my
authors look like. So this is my authors. Here we have the vector Tukey, Venables, Tierney, Ripley and McNeil, which becomes my first column, surname. Then you have nationality, and then you have deceased, where one value is given directly and the other value is repeated four times. That's something new you might be seeing: you are creating a vector where you pass in one value, and for the remaining values you use the rep function. Similarly we can create a data frame called books, which has a name column, a title column and an other.author column, and you pass in the values. If you look at books now, it looks something like this. Look closely at the data.frame call: you have the names, you have the titles, and remember that when you pass multiple vectors they are separated by commas, so do not forget that. Then you have other.author, which is the name of the column, and in its values you have also passed some NA values.

So you can look at authors, and this is your books, and our intention is to merge these data frames, because we might be interested in getting the data together. What I'm doing here is creating m1 with the merge function, passing in my data frames authors and books. If we look closely, authors has three columns and five rows, and books has three columns and seven rows. In the merge we give authors and books, and then by.x and by.y, which is where I choose the columns to merge on: by.x is surname and by.y is name. So we merge the data on matching values of surname and name. You see there is Tukey here and there is
Tukey here, we have Venables in both, we have Tierney in both, we have Ripley, which has multiple entries in books, and then you have McNeil. Books also has R Core, which is not there in authors, so let's see what happens when we do a merge here. Now we see the result of the merge, where it has taken the matching values from both data frames. You have surname, nationality and deceased, and you get the title and other.author coming in from books. The name column does not appear, because we merged with by.x as surname and by.y as name, so the key shows up once, as surname. You can do a random check. If I look at McNeil, that's the surname, or in books it was the name; you have the nationality and deceased from the first data frame, then the title Interactive Data Analysis, and then other.author. What you don't see in the merge is R Core, because it has no matching value in your authors data frame. So you can merge your data frames using the merge function. Please try it out: create different data frames and try to use this.

Similarly, you can manipulate a data frame. Here we are creating one more data frame called sales report with data.frame. You give an id, product has some values, and unit price and quantity get integer values. If I look at my sales report, these are the values I have, so let's spend a couple of seconds on them. The id values are 101 to 110, product is A or B, and unit price has values such as 101, 140 and 184. We are using as.integer to convert the generated values into integers and assigning them to unit price, and similarly for quantity we are
assigning the values with as.integer on the output of runif. Once we have done that, we have created a data frame.

Now, how do you transpose, and what do we mean by transpose? Transpose is when you are swapping your axes. If I do a transpose on sales report and view the result, you will see that the positions have changed: the row names become the column headings, and the column headings become the row names. That is what you achieve with a transpose. You can do a head to look at some initial values. You can sort this data frame using the order function, choosing the column and also whether you want ascending or descending, that is increasing or decreasing, order. You can choose a particular column, such as product, or take the values of sales report in descending order of unit price. So ordering or sorting the values in a data frame is pretty easy. Please spend some time practicing with these examples, and you will learn more about these functions; you can always create an example of your own and try them out.

What about subsetting the data frame? Let's use the subset function as we did earlier. I create subset.ProductA using the subset function, and here I get the subset where the product value is A. Looking at it, it shows only the rows where product matches A. Next, extract the rows for which product is A and unit price is greater than 150. You are still subsetting and still passing your data frame; you give product as A and unit price greater than 150 as the conditions, and then look at the values. Now if you are only
interested in particular columns, say only the first and the fourth column where product is A and unit price is greater than 150, you still use the subset function and pass in your data frame, with product given as A and unit price greater than 150, but you also select the first and the fourth columns. Now it shows me only the values from those two columns.

What we can also do is create two subsets from the data frame, set A where the product is A and set B where the product is B, and look at the values: this one is just A, and this one is just B. We can then combine them using cbind, the column bind. When I say cbind with set A and set B, it places the data frames side by side, column-wise, and if you do rbind it stacks the data frames row-wise. So we can bind either by column or by row. This is one way to combine. In the previous example, merging was based on a condition being met on columns that have matching values: you were comparing the values of the first and second data frames and then merging. Here we have just used cbind and rbind, so we are not merging on a condition; we are just stacking them either column-wise or row-wise.

Now we can also look at some aggregate operations, which is going deeper into data frames. When you use the aggregate function, you pass in the quantity column of your data frame, and then you use the list function on the product column of your sales report to say what to group by. At this point let's look at sales report and its values. So this is
my sales report, and we want to aggregate the values in the quantity column. For that I give the product column as the grouping, and I ask for a sum, ignoring the NA values. Let's look at this, and that gives me an aggregated value. Remember that here the aggregate function is doing a summing up, grouped by the product column of sales report. We have two products, A and B. We have given the quantity column first because that is what we want to total, so we are summing all the values for A and all the values for B, and we see that here; if there are any NA values, we ignore them. So these are some basic operations on data frames and matrices: subsetting them, and extracting useful information with inbuilt functions that do a transformation or computation.

Similarly, we can also work on lists. Now that we have looked at data frames, matrices and vectors, let's look at one more structure. We create a list by using the list function, and here I am passing in three vectors, each created with the c function. In a vector, we know that all the elements are of the same type. Here the list holds three vectors of three different types. Let's create this list and look at it: it has elements with values of different types. We can create a different list that has a sequence of elements, that is 1 to 10, a matrix of three by three, and also another list. Let's look at list two: it has a vector which
has values one to ten, it has a matrix of three by three, and it has a list in which a is 10 and b is 20. So this is how you can create a list that holds objects of different types. A list is also a recursive variable, that is a variable that can store values of its own type. To check that, you use the is.recursive function on your list.

One of the main requirements when you're working with lists is indexing. I have created a list, and I can access its elements by using an index; if I do this, it shows me the matrix. I could also have used the dollar symbol to choose a particular element by name, such as mat, the name given to our matrix, or the name given to the vector. So you can access the elements using an index, the dollar notation, or the name of a particular element. I can also get the second value of the third element, and that shows me 20; you give 3 for the third element and then, within that, ask for the second element. I can get the length of the list, and I can get the class of the list, which shows me list.

I can also convert vectors into lists. Here we are creating a variable price, which is assigned a vector with 10, 20 and 30, and I want to convert this vector into a list. I create a variable called price list and use as.list, which converts my vector into a list. Now look at price list, which shows me a list, and at price, which is a vector. How do you convert your list back into a vector? That can be done with the unlist function. So I can basically
work on price list, where we converted the vector to a list, and just do an unlist on it, which converts my list into a vector, and then look at the values of the vector. Sometimes we may want to set dimensions, so we can use the dim function to convert a vector into a matrix so that it has multiple dimensions. Here we create a vector with four values, and then I give it dimensions of 2 comma 2 so that it is converted from a vector into a matrix. If you look at price 1 now, it has changed into rows and columns, two by two. So these are some simple examples of working with lists.

Now, about basic data type functions. We have seen how you use the assignment operator and how you get the data type of a variable or the class it belongs to. I can assign different values, such as 10.5. The previous one showed me numeric, and now when we assign 10.5 and look at its class it says numeric, and its type shows double, so by default a number is stored as a double. I can check whether the value in n1 is numeric, and that shows me TRUE, and similarly for n2, and that shows me TRUE.
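The conversions and type checks just described can be sketched as follows. This is only a minimal illustration: the names price, price_list and price1 follow the spoken names, and the four values in price1 are made up, since the transcript does not say what they were.

```r
# Vector to list and back again
price <- c(10, 20, 30)
price_list <- as.list(price)     # each element becomes its own list entry
class(price_list)                # "list"
price_vec <- unlist(price_list)  # flattens the list back into a vector

# Giving a vector dimensions turns it into a matrix
price1 <- c(89, 23, 64, 34)      # illustrative values, not from the video
dim(price1) <- c(2, 2)           # now 2 rows by 2 columns

# A plain number is numeric and is stored as a double
n1 <- 10.5
class(n1)                        # "numeric"
typeof(n1)                       # "double"
is.numeric(n1)                   # TRUE
```

Note that dim fills the matrix column by column, which is the opposite of the row-wise filling used for the student matrix earlier.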
So you are using the is.numeric function, which returns TRUE if the given value is numeric. Similarly, we can assign an integer to a variable, and for that I can either use as.integer or write the value with a capital L suffix. I can do this and look at the value of i1, and similarly i2, and if I want to check whether each is an integer, i2 was an integer, i1 was an integer and i3 is an integer. So here we have assigned integer values to variables. All integers are numeric, but not all numerics are integers, so let's check that. If I do is.numeric on i1, which was assigned as.integer of 10, that shows me TRUE, and if I do is.integer on i1 it also shows me TRUE.

Now let's look at character values. If we create c1 and c2 and look at the class, it shows character for c1 and similarly for c2, and you can always validate that with the is.character function. You can also use some inbuilt functions, such as converting to upper case, or getting a substring from a starting position to the position you want. I can use the paste function, which gives me the values combined, or you can say concatenated. You can also use paste0, which gets rid of the space and concatenates without a separator, and I can also give a specific separator, as we have seen in earlier examples. We can also replace a set of characters: here I am doing a substitution, and if I look at the value it has replaced rob with cena, and then we look at the number of characters in it.

So these are some basic operations on your matrices, data frames, lists and also variables, where you are either assigning values of a particular type or changing the data types. You can also run into coercion in the case of vectors; we have seen that where if
you are passing in values of different types, they are coerced into the same type. Later we can learn more about functions; for now, let's learn how R handles flow control, that is how I would use an if else condition when I want to check some values. An if statement consists of a boolean expression followed by one or more statements. So we write if, pass in a boolean expression that compares or checks a particular value, and then whatever is in the statement gets executed when the expression is true. Here we can use the assignment operator to give x a value. We can do a typeof, which tells me that x is an integer. Now I can write my if with is.integer to check the value of x, and if it is an integer, I open curly braces and pass a statement: let me say print, x is an integer. We can execute this, and it prints the message, which tells me the boolean value was TRUE.

Now suppose we had done something else, say instead of is.integer I had used is.character. When we run this, it says there is an error with the bracket, because we missed a bracket after x, so let's add that and try again. Now it doesn't show me any result. So how could we handle a case where the boolean expression is not true? In that case we can use an else statement. If the boolean expression is true, the first statement is executed, and if it is false, the statement after else is executed. So we keep the same condition here, where the print for the true case would not run because x is not a character, and what I could do is I can
here, after this one, say else, open one more set of curly braces, and say print, x is not a character. We know that x is not a character, so that is what gets printed. An else statement is executed when the condition in the if statement evaluates to false. This is a simple way to use if else and control the flow by passing in conditions.

Now, what about the while loop? That can also be useful when you are programming in R. Let's pass in a set of words: say v, and then we use the c function to create a vector with hello and world. If you look at v and at its class, it is character, and typeof v also shows that the elements are character. Then we create count and assign it a value of 2. What I want is to do something while the count is less than 5. We have already given count the value 2, so I write while count is less than 5, open a curly brace, say print and pass v, and then we also increment the value of count by saying count is count plus 1. Here it gives me an error, probably because we have missed a bracket, so let's check this again. So here it is: we created v, which has two elements of type character, we assigned count a value of 2, and while the count is less than 5 we print the value of v. So we say while, pass in an expression that checks the value of count, do a print, and then increment the value of count. Now this is a simple example
where you are using while to test an expression, and while that expression is true you keep doing whatever is inside your curly braces.

We could also go for a for loop. A for loop is used to iterate over a list of elements or a range of numbers. For example, if I have a vector like fruit which has some values, I could say for i in fruit and print something. Let's try an example to test our for loop. We create names and assign values to it, let's say vj, aj, dj and sj, and look at the value of names. Now I can use a for loop and say for i in names. What do you want to do? Open your curly braces, say print i, and close the brace. You see that for every element in this vector it prints the name, one by one, so you are iterating through a set of objects by using a for loop. So if else, while and for can be very useful when you want to iterate, check the value of an expression, or loop and do a particular task.

It's always good to understand how you manage flow control in R with your for loops and while loops, and also how you can use logical operators when working with your data. So let's look at some examples of logical operations. You could have an and, where all conditions must hold, an or, where at least one condition must hold, or a not. Here I can assign a value to x and then check whether x is less than 10, and it shows me FALSE. Is it greater than 10? That's TRUE. Now I can use logical operations here, so I can say and: is my
x value less than 20, and is my x value greater than 10? Now, these conditions are not both true, so in this case we get the result as FALSE. But if I say x is greater than 20, which is true, and x is greater than 5, that's also true, and x is equal to 25, then whenever we are talking about and, all the conditions have to be true. So let's look at this, and we get the value as TRUE. But if I say x is greater than 10 or x is greater than 5, then only one of the conditions has to be true, which is the case here, so we get the result as TRUE. We can take a different example. We can say is x less than 20, which is not true, but is x equal to 30, and that's also not true, so in this case we get the result as FALSE. Now, we can also compare some numbers directly. We can say is 12 equal to 3, and that's FALSE, and if I say not, then that will give me the result as TRUE. So these are some simple logical operations which help you when you're working with your data in R.

Now, we can create a data frame by using a built-in data set, mtcars, and let's look at our data frame. That shows us the values with all the different car models and the different column names. The car models are the row names, and then you have other things like mileage and cylinders and so on, which are the specifications in the data. Now I can filter out values here using indexing. So I can say data frame, and in that data frame I want the rows where the value of mileage is greater than or equal to 30, and then I end it with a comma. That gives me the rows wherever the mileage is 30 or more.
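The comparisons and the row filter described above can be sketched in R as follows. This is a minimal sketch, not the presenter's exact code: the value x <- 25 is inferred from the results quoted in the walkthrough, the name df is an assumption, and the operator in the "x less than 20 ... x equal to 30" example is assumed to be or.

```r
# x <- 25 is inferred from the TRUE/FALSE results quoted in the walkthrough.
x <- 25

x < 10                      # FALSE
x > 10                      # TRUE
x < 20 & x > 10             # FALSE: the first condition fails
x > 20 & x > 5 & x == 25    # TRUE: every condition holds
x > 10 | x > 5              # TRUE: at least one condition holds
x < 20 | x == 30            # FALSE: neither condition holds (operator assumed)
12 == 3                     # FALSE
!(12 == 3)                  # TRUE: not flips the result

# Row filtering on the built-in mtcars data set.
# The name df is an assumption; the walkthrough only says "data frame".
df <- mtcars
high_mpg <- df[df$mpg >= 30, ]   # the trailing comma keeps every column
print(high_mpg)
```

The condition inside the square brackets produces one TRUE or FALSE per row, and only the rows marked TRUE are kept.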
I can also do a subset on the data frame where I select particular values. So we can do this, or we can use square brackets, and we can also use the dollar sign to pick a column and compare its values. Now we will use our knowledge of logical operations here. We will work on the data frame where I am interested in the mileage which is greater than 20, and I am also looking at the column hp, horsepower, which should be greater than 100. Remember, when we are doing an and, both conditions have to be true. That shows me the result where, looking at the mileage and the horsepower columns, both conditions are met, and that's why we get these rows. So these are some simple examples of using your logical operations when you're working on a data frame. The same thing can be done on a matrix, on a list, on a vector, or on individual values.

Now let's also learn about flow control, that is, how if else or else if is handled in R. You can do a single condition check. For example, I assign a value to hot, which is FALSE, and I say temperature is 50. Now what I want to check is whether the temperature value is greater than 60, which in our case will not be true, because temperature has been assigned 50.
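The two ideas in this passage, filtering a data frame on two conditions at once and a single-condition if check, can be sketched as follows. This is a minimal sketch rather than the presenter's exact code: the names df, by_index, by_subset and temp are assumptions, and the second run with temp set to 100 is included only for contrast.

```r
# Combined filter on the built-in mtcars data set (df is an assumed name).
df <- mtcars

# Both conditions must be TRUE for a row to be kept.
by_index  <- df[df$mpg > 20 & df$hp > 100, ]
by_subset <- subset(df, mpg > 20 & hp > 100)   # same rows via subset()
print(by_index)

# Single-condition check.
hot  <- FALSE
temp <- 50
if (temp > 60) {
  hot <- TRUE        # skipped: 50 is not greater than 60
}
hot_at_50 <- hot
print(hot_at_50)     # FALSE

temp <- 100          # assumed second run with a higher value
if (temp > 60) {
  hot <- TRUE        # runs: 100 is greater than 60
}
print(hot)           # TRUE
```

When the condition is FALSE, the body of the if is skipped entirely, so hot keeps whatever value it had before.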
So is it greater than 60? No. So if I do this if condition, I am saying that if the condition is true then I want to assign the value TRUE to hot. Now if you look at the value of hot, it is still FALSE. Why? Because the condition which we passed to our if is not true. It has not been met, so whatever was passed within the statement has not been done. Now let's change the value of temperature to a hundred. If we do the same thing, we ask is my temperature greater than 60, which it is, so whatever is passed in the curly brackets will be applied. hot will be assigned a new value, and now you see the hot value is set to TRUE. So this is a simple single condition check.

Now, at times there can be multiple conditions to check, and that's where we use else. In this case we assign a value to score, which is 63, so let's do that. Now let's ask, is my score value greater than 80? That is not true, so whatever is passed in here, which is print it's a good score, will not be done. Instead it will jump to else, and whatever we have passed in else will be done, so it will say it's not a good score. Let's run this if, and it says it's not a good score. This is a simple way of using if else, where you are checking a condition, and if the condition is not met, control is passed to the else block.

Now I can also do an else if. I can say score is 63, and I can ask is my score greater than 80. That's my first condition, and for it it would print good score. But maybe I want to check something else, so I'll say else if, and I'll ask is my score greater than 60, and is it less than 80? Remember the and, which has to evaluate to TRUE for both conditions. For that I'll say print decent score. I can keep on giving conditions here: in another else if, score less than 60 and score greater than 33. That is not met, so it will be ignored. And then you have else, which says print poor. So first it checks or
evaluates the condition which you have passed to if. If that doesn't hold, then it goes to else if, and if a condition in an else if is met, it takes that branch and does not go on to else. If neither the if nor the else if conditions are met, then it goes to else. And we see decent score printed here. So that's a simple example of if else and of if, else if, else, wherein we are evaluating a condition but have multiple other things to check.

Now, how do you work with while loops in R? That's very simple. We can assign a value to x, and then I will say while my x is less than 10. So I'm going to create a loop. I have assigned x a value of 0, and that is less than 10, but if we just do this then it will keep running and get into an infinite loop, so we'll see how we handle that. We'll say while x is less than 10, I want to print the value of x, and I want to print x is still less than 10, adding 1 to x. What we are then doing is incrementing the value of x. If you do not do this step, it will get into an infinite loop, because x will always be less than 10. So we are incrementing the value of x by one, and then we are giving a condition: if at any point x is equal to 10, I want to say x is equal to 10, terminating the loop, and then my while loop ends. So let's do this. Let's say x is 0 and then run this while loop. Now you see, at every step it prints that x is still less than 10, adding 1 to x, and it also gives you the value of x. We say x is currently and print out the value of x, so it shows me 0, next time you increment it, it becomes 1, then 2, and so on. So this is where you are using a while loop: you are looping based on a particular condition, and once x reaches 10, the loop completes.

Now let's look at, let me take this one
here. We'll look into functions at a later stage, so let me take this function and get rid of it for now. I also want to talk about break statements in a while loop, and once we are done with flow control in while loops we can look at functions, either functions we create ourselves or built-in functions. So let's look at this one and continue with our while loop.

We just saw a simple while loop, and what we also want to see is how, when you are working with a while loop, you break out if a particular condition is met. In the simple example we were printing out something, we were incrementing the value of x, and at one point within the while loop we were also checking the value of x, and when it reached the limit we said we are terminating the loop and it comes out. If that does not happen, then we continue looping. How about a break statement? A break statement is for when you want to end the while loop if a particular condition is met. For example, here I assign a value to x, which is 0. Now I want to compare this against 5, which means I will be incrementing the value of x. So I'll create my while loop, and I'll give a condition that x is less than 5. Now I want to use the cat function, which prints the value, so I am saying x is currently and printing out the value of x. Then I say print x is less than 5, because we have not yet incremented the value of x. We are adding 1 to x, like what we saw in the previous example, so x is then incremented by 1. And here I am saying if x reaches 5: while we keep incrementing the value within the while loop, we check whether x's value is 5, and if so we print it is equal to 5 and we do a break. Now, if you do not use a break you can still end the while loop, but break ends the loop right here, based on the condition that is met. We can do this and then run this while loop. So you see
here x reached 5 and we just broke out of the loop. So those are the simple while loops we are seeing.

Similarly, we can work with for loops, which can also be useful. Among the conditionals, what we saw is if else or else if. Your while loop is where, as long as the loop condition still holds, you keep looping and keep doing some actions. Now you can also work with for loops. Here I am creating a vector, and then I am going to loop, that is, I am going to iterate through every element. So I will say for, and when you're using for loops you say for and then you can give any name; I can say i, I can say x. So I'm just giving a temporary variable, then in, then the vector, and then I'm printing it out. This prints all the values one by one. There is one more way to do it. You can say for i in, and I take the length of the vector, so 1 to the length of the vector, that is, until the last element is reached, and I print the vector elements using the value of i. So what is i here? It's the index position, and I can do it in this way.

If you are looping over a list, so I'm creating a list, it's very simple. You can just do a for loop where you say for i in list, I want to print i, and that gives me the list elements. Or you say for i in, and you give the range from the starting position, that is 1, to the length of the list, and you print every element. Here we can also use double square brackets.

You may also want to loop through a matrix; sometimes that might be required. So let's create a matrix which has the values 1 to 25, arranged by row, and look at the matrix. Now you want to iterate through the matrix, so I'll say for i in matrix, I want to print out the values, and that prints out all the values in the matrix.

Now what if I want to print the squares and square roots of the numbers from 1 to 25? I can say for i, where the value starts at 1 and ends at 25, and then within my for
loop I can give this body, where I am saying get me the square, that is i into i, and get me the square root of i, and just print it out. So I am saying message: i is this, the square is this, and the square root is this. If I look at the values here, I am looking at all the values from 1 to 25, the square of each value and its square root. So what we did was a for, we passed in the elements by saying i in 1 to 25, and within the curly brackets I have said what I want to do for every element. I have calculated a square, I have calculated a square root, and then I am printing them out using the message function, which takes the text you are passing in, comma, the value of i, and similarly the square and the square root.

So these are some simple examples of understanding flow control in R, that is, using your for loops, your while loops and also your if else. Later we will spend time learning about functions, which could be either created by the user or built in, and also factors in R. All the best, happy learning, take care.

Hi there. If you like this video, subscribe to the Simplilearn YouTube channel and click here to watch similar videos. To nerd up and get certified, click here.
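As a recap, the flow control covered in this part of the tutorial (if, else if, else; a while loop with break; for loops over a vector, a list and a matrix; squares and square roots) can be sketched in R as follows. This is a minimal sketch under stated assumptions, not the presenter's exact code: the names grade, v, li and mat, the sample values in v and li, the "average" label, and the exact message wording are all hypothetical. seq_along() is used in place of the spoken "1 to the length of the vector" because it also behaves correctly for an empty vector.

```r
# if / else if / else with the score from the walkthrough.
score <- 63
if (score > 80) {
  grade <- "good score"
} else if (score > 60 && score < 80) {   # && is the scalar form of and
  grade <- "decent score"
} else if (score < 60 && score > 33) {
  grade <- "average"                     # label assumed, not from the source
} else {
  grade <- "poor"
}
print(grade)

# while loop that ends early with break.
x <- 0
while (x < 5) {
  cat("x is currently:", x, "\n")
  print("x is still less than 5, adding 1 to x")
  x <- x + 1                             # without this step the loop never ends
  if (x == 5) {
    print("x is equal to 5, terminating the loop")
    break
  }
}

# for loop over a vector, by value and by index (sample values assumed).
v <- c(10, 20, 30)
for (value in v) print(value)
for (i in seq_along(v)) print(v[i])

# for loop over a list; double square brackets extract the element itself.
li <- list(1, "two", TRUE)
for (item in li) print(item)
for (i in seq_along(li)) print(li[[i]])

# for loop over a matrix filled by row; values are visited column by column.
mat <- matrix(1:25, nrow = 5, byrow = TRUE)
for (num in mat) print(num)

# Squares and square roots of 1 to 25.
for (i in 1:25) {
  sq   <- i * i
  root <- sqrt(i)
  message("i = ", i, ", square = ", sq, ", square root = ", root)
}
```

Note that message() writes to the standard error stream while print() and cat() write to standard output, which matters if the output is being redirected to a file.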

Transcript for: