Welcome, students, to the IIT Madras online BSc program. I am very happy to meet you in this program. I am Professor Partha Pratim Das; I teach in the Computer Science and Engineering department of IIT Kharagpur. I have been teaching for several decades now, and I am really happy to be part of this course, which is unique in India: for the first time, you are going to get an opportunity to earn an online BSc degree from an IIT. So let us get started with the course.

As you know, this course is on database management systems, and this is the first module, in which I will primarily take you through an overview of the course. So the objective is to understand the importance of database management systems today. I am sure all of you, in some form or other, are familiar with the need for and use of database systems in today's life, but we will take a deeper and deeper look as the course progresses. Also in this module, we will give you the necessary information about the course. So the outline is: why databases, and then know your course: prerequisites, outline, and the course textbook.

So, why databases? A database management system contains information about a particular enterprise. When we use the term enterprise, it could mean a small business, a big business, a nation, a bank, a global organization, or even a small group or an individual. So any entity that needs a collection of interrelated data, a set of programs to process that data, and an environment that is convenient and efficient to use will need a database system, and that is what the database system provides.

Now, if we quickly talk about the different kinds of applications that databases have, the most familiar is possibly banking. We regularly do various forms of net banking transactions; now you have UPI, Paytm, BHIM, Google Pay. All of these are very large
database applications at the back. We have reservation applications: airlines, railways, IRCTC; all of those are very important, critical applications. Then we have a lot of academic applications. In universities we deal with admissions, examinations, registration, grading, all of that; we typically call it an ERP system, and all the IITs, NITs, and several other institutes use such systems, which are very heavy database applications. Then there are different kinds of sales: the whole of e-commerce, retailers like Amazon, eBay, Snapdeal, all of those are very big database applications. Even in other areas, like manufacturing, human resources, and travel, databases are the kind of applications on which we necessarily survive today.

What might come as a bit of a surprise to you is that several applications that you do not really associate with a typical database application like net banking are database applications too. Your email system, say Gmail, Yahoo Mail, or Hotmail, is also a database application, because it keeps all the information about your mails, your contacts, your send and receive information, and so on. Even social media, Facebook, Instagram, Twitter, and chat applications like WhatsApp and Telegram, all need databases. It is so pervasive in our daily life and daily applications today that it is very difficult to isolate and talk about only one or a few areas where databases will be useful. So what you are going to learn here is going to be useful across the board, and therefore database skills, that is, skills of database programming, database design, and database administration, are in deep demand in the industry, both in the core IT infrastructure industry and in the IT applications industry, in varied kinds of BPOs, and so on.

So databases, as I said, touch all aspects of our life. Take a simple application, a university database application. You can add new students, new faculty,
courses, instructors, timetables, and classrooms; you can register students for courses, generate rosters for classes, prepare the timetable, assign grades on examinations, and compute the grade point average. All of these are part of a university database application.

Naturally, in earlier days these database applications were built directly on top of the file system. You all know by now that a computer system always has a file system. If you are working on Windows, what you get to see through Windows Explorer, or if you are working on Linux, what you get to see through the hierarchy of directories, is a view of the file system. So you can keep any document electronically in that system, you can bundle documents into folders, you can have a variety of types of files, and you can have a hierarchy of folders. This is, in simple terms, what a file system is all about. So naturally data can also be kept in files, in what we typically call a flat file. The nearest approximation of a flat file that we widely use today is what is known as CSV. You must have seen files ending in .csv; it stands for comma-separated values. You write the values one after another, separated by commas, meaning that they are the values of the different fields, and it is a very common way to share data on a small scale through the file system today.

So the question is, what are the problems of using a file system simply to store data? In fact, as we will see in the next module, historically file systems were used for databases for a long time. Some of the key issues are data redundancy and inconsistency. We will see that when you keep data in flat files, parts of the same data need to be duplicated and kept at multiple places, which leads to duplication of information, that is, bloating of data, and it also becomes difficult to keep them consistent: consistent in
the sense that if you have the same data in two files or at two places, then when you update one, it is quite possible that you will forget to update the other.

The next issue is difficulty in accessing the data. If you have data in a CSV file, about the only way to access it is to open it in Notepad and read it like text, or at best to open it with a spreadsheet application like Excel and traverse it as a table, but you cannot easily do anything in an automated fashion. Data isolation is another problem: if there are multiple files and formats, it is very difficult to keep them integrated and consistent.

Integrity is a big question, because in several places we need integrity to be maintained. For example, if I hold a savings account in a bank, then most banks will mandate that I maintain a minimum balance. So I cannot make a transaction that debits an amount large enough for my balance to go below that minimum; if I try, the bank will not allow it. The minimum balance could be 0 or it could be a positive number. So for any action that you perform, you have to remember what this minimum balance is and that this integrity has to be maintained. That is very difficult if you use files, where this condition, the fact that the account balance has to be at least 0 or at least a minimum amount, gets buried somewhere deep down in the code. So with all these issues it becomes quite problematic to use a file system for managing data.

Further, there are finer issues. Atomicity is a big question. You are probably not so familiar with atomicity unless you have studied it in an operating systems course. Atomicity says that, to keep the data consistent, it is important that I either do the entire task or do not do it at all. So what am I talking about? Let us say we are talking about a basic net
banking transaction in which I am paying my friend an amount of hundred rupees. This has two parts: first, hundred rupees has to be debited from my account, after checking that I have enough balance to pay hundred rupees, and then the hundred rupees debited has to be credited to my friend's account. If this whole thing is done, then the transaction is fine. But suppose that after the hundred rupees has been debited from my account, the system fails and the rest of the program cannot be executed. What will happen? My account has been debited by hundred rupees, but my friend's account has not been credited by hundred rupees, which means that in the entire banking system hundred rupees has simply got lost. This is a very big consistency problem, and it mandates that we do such operations in an atomic fashion, which means that they cannot be divided further: either the whole transaction, the debit, the credit, the logging, whatever, happens together, or it does not happen at all.

Concurrency of access, of course, you understand, because the same system is being used by several users. Earlier it used to be tens or hundreds of users; now more than a couple of million users use the same system concurrently from different parts of an organization, a nation, or the world. So at any point of time two people might want to reserve a berth on a train at the same time. Naturally, if only one berth is left, you cannot allow both of them to take it. You have to manage it in such a way that the system remains fair and the berth goes to only one of them, not to both. There are several such issues with concurrency.

Then security, of course. You know what is going on in the world, with all the spyware and identity theft issues happening, so security is a very big problem. We talked about a few points in the earlier slide,
and these points of atomicity, concurrency, and security are finer and more difficult points to maintain in a file-system-based data management application. So you need a database, and a database can offer a solution to all these issues, and actually to a lot more.

So let me move on to give you a quick roundup of the course. First I will talk about the prerequisites: to follow this course comfortably, you must know a couple of things. I am sure most of you know them, but I have still made a moderately exhaustive list, so that in case you do not, you can take your time in between and brush them up, and you do not have any difficulty in following the course.

The first is set theory. I am sure all of you know what a set is, the basic definition, and the different ways a set can be given, like a set of people, a set of horses, a set of accounts, or the set of prime numbers, which is a subset of the natural numbers: those greater than 1 that no natural number other than 1 and the number itself can divide. You must be aware of these different forms of definition. Membership, subset, superset, power set, universal set: all these concepts should be clear to you. Then the basic operations like union, intersection, complementation, set difference, and Cartesian product; we will use them rampantly and widely. We will use De Morgan's laws very often, so please get yourself familiar with them. There are several references from which you can study this. I have mentioned some online courses; the courses may not be running, or you may not have time to attend them, but you can just go through their videos. For example, among the NPTEL MOOCs there is a discrete mathematics course which is very useful; you can look at those videos if you are not sure of some of these prerequisite topics. For set theory at least, I think you have already had Mathematics for Data Science 1, which should have covered these topics
anyway, so in that case you would not need to study anything further.

Then you need to know about relations and functions: what makes a relation, which builds on sets; what binary relations and ordered pairs are; the notions of domain, range, image, pre-image, and inverse; the properties of relations; and what a function is and what the properties of functions are. Again, the discrete mathematics course and your data science course would be sufficient to give you the required knowledge, in case you need it.

We need some knowledge of propositional logic, which in simple terms is a slightly different representation of Boolean algebra. I am sure you have done it at some point, but specifically we need the notions of truth values, truth tables, operations like conjunction, disjunction, negation, implication, and equivalence, closure under an operation, and so on. In case you are not very familiar with these, please refer to the videos in the discrete mathematics course; you can obviously take up some books as well.

More importantly, we need familiarity with predicate logic as well, which is the next order of logic beyond propositional logic, where we talk about quantifiers. Take "all men are mortal": this "for all" is a property that does not belong to a specific person but to the entire collection, the set. Or we can say that for every child there is a mother; it is not specific to any particular child, but pick any child and that child will have a mother. These are the notions of universal and existential quantification, and you must be familiar with them. When I use them, I will try to give a quick definition roundup, but that may not be enough to use them in the context where we need them, so better get prepared with these prerequisites. You can use this course or some other book as well.

It will be good to have a good knowledge
of data structures. After all, a database is a data structure in a very large sense; I will talk about the differences, as to when you call something a data structure and when you call it a database system. But a lot of data structure notions are rampantly used in a database course: arrays, lists, and particularly search trees, that is, binary search trees and other kinds of search trees, balanced trees, B-trees, hashing, and hash maps. We will not need very esoteric data structures like splay trees, but knowing these basic data structures is very critical. You could refer to these NPTEL MOOC videos or to a standard textbook on algorithms or data structures.

Next, this course needs you to know Python. Programming in Python is going to be staple food for this course as well. I think you already have a course on programming in Python, and that would be enough; whatever additional may be required, in terms of Python libraries and features like database connectivity from Python, will be covered in this course itself. But try to be very proficient, not merely familiar, with Python; try to write and read as much Python code as you can. So these were the essential prerequisites of the course.

Finally, before I end, there are the desirable prerequisites: it will not significantly harm you if you do not know them, or do not know them well, but knowing them will certainly help you understand some of the intricate issues of databases better. In particular, I will mention algorithms and programming in C. We will not use C in this course, but it comes in handy for understanding several algorithmic issues; sorting and searching in particular will be widely used. In any case this is going to be very useful to you in whatever you wish to do, so if you are not very familiar with them, try to be
familiar with them as well.

Last but not least is object-oriented analysis and design, or object-oriented programming, a concept where you look at data elements as capsules described by certain properties and by their interrelationships. In some ways this comes very close to database system concepts, and at the same time it differs in certain significant ways. So if you are familiar with object-oriented analysis and design, it will have a good overlap with database system design and some of its core issues. Again, it is not an essential prerequisite, but it is good to know. If you know a related programming language like Java or C++, that would also be of great help. But these are the desirable prerequisites.

In terms of the course outline, as you are aware, this is a 12-week course, with 5 modules per week, so we will have a total of 60 modules. Of these twelve weeks, the first eight will be primarily about application programming, which is, I should say, the simpler part of database management systems and the part that more people need to do in the industry. If you look at the number or the variety of jobs available with database knowledge, they are primarily in application programming. We will talk about what application programming is, but just know that application programmers need to know the content of weeks one to eight very well, and that gives you a good job opportunity. Some of the more advanced tasks, and some of the jobs that are a little more difficult to get but better paying, are related to database administration and design. Those are the aspects that we discuss from weeks nine to twelve, and that forms the whole gamut of what we need to do in this course.

This is the textbook that we will follow: Silberschatz, Korth, and Sudarshan. There are multiple editions available; the
content that you see here has been prepared based on the sixth edition, but obviously the seventh edition will also do; it has some more material, which you will have to skip anyway. Of course, even of the sixth edition we are not able to cover everything; we have taken a few things selectively. But it will be good to buy this book if you do not have it already, because you will need to refer to it in almost every module, almost at every minute, and it is an excellent book for learning and practicing database management systems.

So to summarize, we have talked about the importance of database management systems in a wide range of modern-day applications, and I have introduced you to the basic aspects of the course, giving you the KYC, the know your course, for it. From the next module onward we will move on to the actual content of the course. Have a nice day, and let us meet in the next module.
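
As a small illustration of two points from this module, the minimum-balance integrity rule and the atomic hundred-rupee transfer, here is a minimal sketch in Python. It assumes Python's built-in sqlite3 module as a stand-in database; it is not the system used in this course. The table name account, the account names me and friend, the starting balances, the helper functions open_bank, transfer, and balances, and the fail_midway flag are all hypothetical and exist only for illustration.

```python
import sqlite3

MIN_BALANCE = 0  # assumed minimum balance; a bank could choose a positive number


def open_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    # The integrity rule is declared in the schema, not buried in program code.
    conn.execute(
        "CREATE TABLE account ("
        " name TEXT PRIMARY KEY,"
        f" balance INTEGER NOT NULL CHECK (balance >= {MIN_BALANCE}))"
    )
    conn.executemany(
        "INSERT INTO account (name, balance) VALUES (?, ?)",
        [("me", 300), ("friend", 50)],
    )
    conn.commit()
    return conn


def transfer(conn, payer, payee, amount, fail_midway=False):
    """Debit payer and credit payee as one atomic unit; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, payer),
            )
            if fail_midway:
                # Simulated failure after the debit but before the credit.
                raise RuntimeError("simulated system failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, payee),
            )
        return True
    except (sqlite3.IntegrityError, RuntimeError):
        # Either the CHECK constraint rejected the debit or the failure was
        # simulated; in both cases the debit has been rolled back.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))


if __name__ == "__main__":
    bank = open_bank()
    print(transfer(bank, "me", "friend", 100), balances(bank))
    print(transfer(bank, "me", "friend", 100, fail_midway=True), balances(bank))
    print(transfer(bank, "me", "friend", 500), balances(bank))
```

In this sketch the second call fails after the debit and the third would take the payer below the minimum balance; in both cases the balances are left exactly as they were before the call, so no money is lost inside the system.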