Transcript for:
Advanced Data Representation

hello everyone welcome back to this Channel and today we're going to start off a new series called Advanced a level computer science we starting from chapter 16 and in this video we're going to look into the advanced version of data representation so this is how the chapter looks like first we'll talk about the what data type is and how are files different types of files being stored and access and lastly we'll learn how to use binary to represent real number which is decimal numbers in mathematical term by the way if you like my videos and you have not subscribed yet I would appreciate it if you could hit the Subscribe button because there are over 70% of you who have not subscribed yet and that will be the only favor I ask from you thank you so much so let's move on to data type so this is the topic that we mentioned briefly in the programming chapter but why is data type important to the computer look data type tells the computer the type of data that a variable can store so I just want to give you an example here and when we are learning about data representation we know that we can represent um character using asy and using we can also use binary to represent numbers in denary and this bite of binaries here they can you know it can mean a in asky or it can also mean 97 in denary therefore it is important for the computer to know all right this bite of code is it used to represent a character or is it a number and then only then they can interpret accordingly so this is a few data types I want you to know there are two types of data buil-in data type and userdefined data type so build-in data type are predefined data type provided by the programming language so the programming language will Define the following stuff the range of possible value for example Java integer data type this is the range that is acceptable and what operations that are available for manipulating values assigned to that variable so if you were to add up characters you get a concatenation of it but if you were to add up numbers you get you know some of the number whereas the other type of data type is called user defined data type and one of this is that you can create userdefined data type in Python using classes something we'll talk about in a chap in a few more chapters basically you create your own type in this example I create a type called point which represent coordinates like there's an x value it's a y value now there's also something called a composite composite and non-composite data type and the composite data type refers to data structure that contains multiple values for example array list dictionary so these are the data types that inside one data type it contains many other different primitive data type and in sud code we create something like recall data type you can see that it has many different data types like string value real date pin so this called composite data type similarly in Python we can create a class whereas non-composite data type known as primitive data type represent a single value and does not have internal subcomponents so one example is integer floating Point numbers character bullan and enumerated data type which is something we'll talk more on the other side which is here what is enumerated data type it is a user defined collection of name constant value that represent a set of related items or options meaning that this particular data type they have a fixed amount of possible values for example I can create an enumerated data type called days of week that contain only Monday to Sunday RGB color red green blue and this is how I create data type like that in sudo code um you can see that if I were to assign the other day like uh fifth day then it wouldn't work because it is not part of the enumerated data type okay by the way um a note here is that this is not a string it's called inum enumerated okay and this is how I do it in Python you first need to import the inum class and then you can create a class called day of week and then create a new variable using this enumerated data type that has limited number of values defined by you the next type is called pointer data type um it stores the memory address of another variable or data element in the computer memory so if we look look at this example here line by line so we can first create a pointer type that points to the integer data type and in sudo code this is done by the upper Arrow not sure that's right the upper arrow symbol that will point to the integer data type and when you declare a variable you want the integer data type that's when you can use the type that you just declared which points to integer declaring variable so this is um basic one how we declare variable and assign a value to it and assign the address of number one to my integer pointer and if you want to find out the address which a variable is store that's when you use the a symbol all right and then you can de referencing it de referencing means that look we are using the mind integ pointer which contains not a variable but an address but then if you want to get the value within that address that's called the reference L you can use this uper a here and it will then take 100 out of this pointer multiply by two so number two is actually assigned to 200 I know it's a bit complicated you might say what's the point but then this is just an example that I want to show you to illustrate what um how to point to a data type as an and um then how to get an address of a variable and how to Def referencing an address next set data type in programming set data type is an unordered collection of unique elements without duplicate values and set is unordered it has unique elements and it is also unstructured meaning no indexing can be associated with the elements in the set so this is some examples some of you might have learned it in mathematics all right and what some of the set data type operation meaning once you created this data set this operation you can do you can check if a value exists in a set and you can add a new data value remove an existing data value adding one set to another and this is how I do it in Python basically I create a list and to convert it to a set you can just use a set function and what it will do is that this set function it will contain only the unique element the duplicat one will be removed okay if you want you can to convert this set back to list using the list function with that you will have removed all the duplicator item inside your list so another thing you can do is call intersection meaning um you find out what is the common elements that both sides have and this is the code use to find the intersection you feel free to copy and paste to use it now that's all about data types um hopefully now you have a better understanding let's move on to file organization what are the different types of files you know files are used to St data what but then what are different types of files used to s data so we are look into all types of files is stored using specific binary code Z and one that allows the files to be used as intended now we have different types of file called the text file which is Source data in human readable format according to character code that's the txt file normal one whereas a binary file store data in the format that the computer can interpret directly as a sequence of bytes so the organization of binary Fe file is based on the concept of record so let's say I have a file here this file will have two records and within each record it will have fs and also what's the value of that VI it can be a bit abstract to look like that therefore I provide an um example let's say for an image file the record would be what is in within each pixel and the F would be what is the red blue component of that particular pixel okay it could be RGB green also can be involved but then this just an example pixel one pixel 2 pixel 3 pixel 4 whereas the audio file could contain each record could be like the first millisecond of that voice what's the amplitude what's the right amplitude and second third fourth so on and so forth so that's is how I do to help you understand the concept of record and um f and value now let's look into the three different types of files that is mentioned in your syllabus serial file sequential file and direct access file and we'll learn how file deletion and access work in all three different types of files so serial file is a file organization method where records arranged in a linear sequence one after the other so real life scenario when serial file is used is that daily transaction locks in a financial institution each day the institution records transaction sequentially in the loog file pending new transaction at the end where sequential file is when records are arranged sequentially and access in a predefined order there's an order we want it to maintain typically from the beginning to the end and modification often require rewriting the entire file meaning if you want to sort it then you have to rewrite the whole file um real life example order processing you want to know which order comes in first so that you know which file which who to ship the product to First the next one is the fastest F type of file called direct access a file organization method allowing data retriever or storage at any location within the file without needing to sequentially Traverse the entire file so for both serial and sequential file to find a particular record you need to go through this entire file one after the other direct access mean you can directly access a record without going through the file from beginning to the end now how do we achieve direct access the first way is to use a sequential while file with a separate index file so look this is my um table here so instead of going through the record one by one I can create an index file that stores ID in asending order and then and this ID here it has another column called pointer which points to the specific record meaning that once we get the ID we can just Travers through this list which is sorted which is easy to find and then once we find the record we can look at the pointer and then point us directly to the record that we want um in this example here it says that scanning through the index table is a lot faster than scanning through the entire table due to the usage of efficient data structure which we'll learn in the future but then in essence it says that going through here is faster because it's sorted we can use some specific technique to make the process a lot quicker and effective the second way to achieve direct access is to use a hashing algorithm what hashing does is that it takes in data as input and produce an output that tells us the location which we should store the file for example my hashing algorithm I created a simple one let's say my algorithm is that I will divide the number by 10 to get the remainder of the ID and that remainder will be where I st the file for example 2076 if you divide by 10 your remainder will be six and this is when you put this record into this file location six and with that you can see that different records are put in different types of the file and this allow a direct access meaning if you want to access the file you just find the remainder of the student ideal when it's divided by 10 and just go to that location instead of transversing one after the other now now um of course a collision Could Happen um when the ID has seven as the least significant figure this is when what we call Collision and one way to avoid Collision is that we need to use a good hashing function mine is really bad uh this will prevent Collision of in other words we minimize it the produce hash value that seem random without predictable patterns so one of the method is to use prime number of a similar size so this is something that you will learn um when you started coding hashing algorithm for um the next few chapters so dealing with Collision there are a few ways Implement a sequential search method to locate an empty address immediately following the calulated position meaning once you got a collision you just do a sequential search after the current position to find an empty splot or you can reserve a set of additional address at the end of the file to accommodate overflow data meaning if there's a Collision move the F there establish a link list accessible from each address for efficient data Retriever and management now um of course sometimes we might not have numbers let's say you want to use names as the input to your hashing algorithm then we can use the asky code for the alphabetic characters so this is just a side note to help you understand how to use it now file access for the different types of file will also be different access occurs sequentially from start to end with data retriev or pended in a linear order that's for serial file very ineffective sequential files also take place sequentially but if the key value is known for the record containing the one data the process is faster because only key few value needs to be read instead of all the columns you only look at one direct access is pretty fast easy you just use the hashing algorithm reverse the process and find the um place which stores the data you want to and of course just now as mentioned Collision could happen this means that you might need algorithms to um handle the Collision properly whereas for file deltion involve marking the records as delete for serial file sequential file this is a bit tedious meaning because the data need to be transferred from the old file to new one until the targeted record for deletion or modification is encountered and and after which any changes are made and the remaining requests are then copied to new file so that's um sequential file direct access file involves marking the specific location as available meaning we don't just take out the data we simply mark it as this is available meaning if there's a collision in the future in that particular spot then it can replace the value directly now choice of organization like which type of file should you use for serial file use it when they needs to be maintained in a chronological order or when continuous appending of data is required and sequential use it when data needs to be processed in a predefined order direct access if you want to access your file in a very quick manner so direct access should be the ideal file if you prefer quick speed now the third part of the video talks about how to represent real number so far we have learned how to represent numbers um negative numbers in using binary but we haven't learned how to represent real number like this decimal point now to understand how to use binary to represent real number you need to understand how a decimal number can be written in different type of standard form these are three ways are equivalent to 64.7 um so these are the only sensible choice we can encod it into our program and I want to show you the first method which is um not the best method but then hopefully after this segment you can appreciate the letter method it's called fixed Point representation so it's a numerical format used in Computing where specific number of bits are allocated for the fractional part which is the decimal point and integers meaning icate four bits for example for whole number three bits for fractional part that's why it call fixed Point representation the number of bits is fixed so for example if I only have eight bits then I could I need to allocate one bit for sign bit positive or negative four bits for whole number three bits for fractional so if you were to use tree bits these are the only possible decimal point you can get so 0 represent 0.125 0 0 1 0.125 meaning one step is equal to .125 of course using this method let's see if I have this bite of code here so this bite of code stand for -1.7 5 so you can see 101 if we refer to the tables it is 0.75 so hopefully now you can see that if I were to increase my bits for fractional part I will have more um choices regarding on what I pick um so using eight bits um Let's do an analysis on what's the largest positive value and negative one so the largest positive value that it can represent is 15.875 smallest 0.125 and negative value -125 and it's the same so you can see that the range is not very big and if you have something special like 15 14.99 it can't be represented using 8 Bits fix Point representation so this is when floating Point representation comes in as more flexible but it's Al can also be a bit harder to understand but it can represent a bigger range of number now how FL Point representation work is that um it has the term mantisa exponent and R which is the base implied to now what happened is that in this method we want to specify bits used to represent mantisa and the other bits used to represent the exponent for example um for bits to the mantisa and four bits to the exponent look we let's look in into the mantisa part U for now it can be a bit harder to understand but let me finish the whole picture and hopefully at the end of it you'll understand it so there are multiple ways we can let's say we using four bits to represent just the mantisa and if there are many ways to um organize where the binary point is meaning where do we put the point part of it so if I were to put the point part that means I use three binary digits for integer and one for fractional part then these are the only possible value the fractional part can only be one or zero therefore I can only have 0.5.0 and integer part uh I can only have from value from zero to four because I I need to use two bits for the integer and also the leftmost bit as the sign bit here so these are the Tre largest and three smallest representation for if I put a floating Point binary Point here but if if I were to move the point to this side here meaning I only start I only have zero because I don't allocate any bits for the integer and that's the sign bit look I not if when I have more bits for fractional then I can have more choices in regards on how instead of 0.5 and one 0.0 I can have 0125 25 7375 so again this is the largest and smallest repr presentation of course if you don't want any point you can also write use all the bits just for the thinnery now let's look into what we can use we can do with um this floating Point representation what can we represent if you look here let's say this the leftmost bit is the sign bit and this is like um the three bits we Ed to represent the Mena values you can see that the largest possible value imagine that I'm using all the bits for floating Point meaning like this fractional and these are the largest positive value I can represent so you can see that E uses 4 point to to the power of seven so the biggest number I can represent is 112 and the smallest positive number after calculating it I got one over 248 whereas the smallest one is also the same now you can see that um this is how we do it so basically for exponent this one is also the sign bit if we were to compare fixed Point representation that we do at the beginning and floating Point represent that we do at the end you can see that floating Point representation even though it's more complicated it can represent a larger range of values now again we are not done done yet with floating Point representation I still want to show you how to convert a floating point to um using floating Point representation so let's see First Step that you need to understand is called normalization to achieve maximum Precision we need to normalize a floating Point number in practice the largest possible magnitude for the value represented by the Matha will achieve Optimum Precision so what does it mean so for Diner 2 these are the following um form that represent dinary 2 they all dinary 2 okay but then you can see that um it's the last row that has the biggest mantisa okay so this is dinary ne4 that is when we increase the mantisa value like from 125 25 to 0.5 as we increase it it will be it will achieve more Precision for us and when the number is represented with the highest magnitude from mantisa the two most significant bits are different so if you look at this the two most significant bit they are both zero this means that the mantisa value is not at its maximum whereas 0.5 you can see that the first two bits they are different similarly goes to4 1.0 the two most significant bits are different so what we want to achieve is that we want the most we want the Mena to have the two most significant bit different so let me explain how to do it um we keep decreasing the exponent and then we increase the mantisa to achieve normal normalization so this is the same thing for dinary 2 I keep increasing it until I got the two most significant bit as different dinary ne4 is the same go go down keep going down one step at a time now let's do some conversion to really understand the whole thing I know that most of you might feel very confused um hopefully after this part it will be fine now let's convert 9.7 to floating Point representation so to do this we'll convert the whole number part first We'll add a leading zero to it so nine if you convert it to dinary it will be 1 Z 0 1 and we add a leading zero to it and then we'll convert the fractional part 0.75 you know that we can use one one so if we were to use only two bits this is something we can represent so after that you combine step number one and step number two using a point the next one is when we want to normalize it to achieve maximum Precision so I'm going to move the everything to to achieve one zero to achieve um normalization so meaning I'm going to move the point go two three four until that the first two bit is one zero and one so this is how I move my binary point and after that this gives an exponent of value of four meaning we shift the point four times to the left so assuming that 14 bits are allocated four bits 10 bits for the menesa and four bits for the component this will be our floating Point representation so 0 1 0 0 is the exponent metisa whatever we calculated here we add four bits to it to make it like a 10 bit so can you see how I got the exponent okay now let's convert another floating Point representation which is like 9.73 which is a little bit harder to do because we can no longer do this part so let's try to do again we convert the whole number part 0 1 0 0 one add a leading zero to it and to convert the fractional part the method is that we keep multiply lying it by two and then we only record the whole number part for example 0.73 multip by two I got 1.7 46 I'll keep the one and then followed by 0.46 * 2 0.92 I'll keep the zero um 0.92 * 2 1.84 keep the one keep the one then I will have gotten my fractional part 01011 and then combine the two part the whole number part and also the fractional part and again we move the floating point the binary point until we see the one zero part okay so looking at this example we move four steps so this give an exponent value of four meaning we put our exponent value as four and assume that 40 bits I used I basically just add um everything to it one z z 1 one Z one one there are eight bits so I add two zero to it okay so one zero to it to make it a 10it bit so this is my mentis and component exponent for to represent 9.73 now let's try to solve negative last one of the video persever so started by converting 7.75 to Binary 7111 at a leading zero 75 use one one combine the two part for negative one is a little bit different we need to convert to two's compliment form firstly you convert first to one compliment basically just invert all the value and then two complement you add one to it and then move the binary Point values until you see one zero okay after that since we move four times the exponent value is four then you just add a few bits to make this mantisa a 10 bits mantisa after that add everything together and that will be your floating Point repres fix floating Point representation for 7.75 all right um problem with using floating Point numbers approximation and routing errors so this as you can see we we are approximating a lot all right and also another problem is that there's limited range of numbers and this might mean that we might need to use a lot of bits to represent the floating Point okay so that's something that you can read to understand what's the problem with floating Point number and that's the also the end of this video I know that this the floating Point number can be a bit harder to understand so just watch a few more times to understand the joice of it and with that I will close off the video thank you so much for watching I'll see you in the next video goodbye