Welcome to Database Management Systems. In this course, we will have 40 modules. Each module will be about half an hour long. This is the first module, where we will give an overview of the course.
We will discuss the importance of database management systems in modern-day applications, and we will familiarize you with different aspects of the course. This will be the outline: first we will try to explain why we need databases, and then we will run through a KYC of the course: the prerequisites, the course outline, the textbook, and the TAs who will help us in this course. So first, why do we need databases? A database management system contains information about a particular enterprise.
So it deals with a collection of interrelated data. We have a set of programs which access and manipulate that data, and together a DBMS presents an environment for convenient as well as efficient use of the data of the enterprise under consideration. There can be many different database applications.
Actually, if we look around, in the world we live in today, every aspect of our life is touched by some database application. Some of the most common and widespread applications include banking. We do net banking very regularly.
Net banking is a highly distributed database application that allows us to do different kinds of retail as well as corporate banking. We make different kinds of bookings and reservations: railway reservations, airline reservations, hotel reservations. All of these are now possible over the internet as well. These are all big database applications.
When we attend colleges and universities, the students, courses, teachers, course enrollments, performance of the students in different courses, and examinations are all parts of huge university database applications. There are different sales applications. I am sure all of you have been using online retailing applications like Amazon, Flipkart, eBay, or Snapdeal. All these are online retail database applications which allow us to select and order items, make payments, and track the deliveries of the items that we have ordered. There are database applications in manufacturing, production systems, and inventory management. Any big manufacturing factory needs to use a huge database application to manage the supply chain, the inventory, the orders, everything.
There are database applications in human resources. Applications like LinkedIn are huge human resource database applications. The social media applications all involve different kinds of database applications. So in this database management systems course we are going to understand how such applications can be designed, developed, and managed over a period of time. Database applications are typically characterized by the fact that they are very large.
If we have a relatively small set of data, then possibly we would be able to manage it using an Excel sheet or a couple of Excel sheets, but once it goes beyond a certain size we need a database application. So being large is a critical characteristic of any DBMS application. With all this, we observe that databases cover all aspects of our life, and hence understanding DBMS is a critical requirement for any computer science or information technology student.
So, if we take a specific database example, say a university database, then we will have several application programs to meet several requirements. When new students join, we need to add the students. We need to add new courses when courses are floated. We need to add new instructors when faculty join. We need to do allotments, register students for courses, conduct examinations, assign grades to students for different courses, compute their GPA, and so on. So all the activities that a university has to deal with are application programs of varied kinds that the university database needs to work with.
In the earlier days, before the days of databases, such applications typically used to be managed through file systems. We all know that an operating system has a file system where different files, text as well as data files, can be stored, and information can be written in these files in a certain order.
And just to quickly recall, file systems typically have a large number of sequential files, which are written and read in a certain order, and some random access files, where you can reach a particular point in the file to do certain access or manipulation operations. So in the earlier days, it was a collection of files in file systems which managed large enterprise data.
Over time it was observed that using file systems to store and manage data has a lot of drawbacks. For example, in a file system there is often a lot of data redundancy and inconsistency. These are terms which we will loosely define here, and as we go along in the course we will understand them better. Redundancy is where the same data is written at multiple places, possibly in different forms, and that may give rise to several forms of inconsistency. You write the same data in multiple files because you need to deal with many aspects: there is a file for students, a file for teachers, a file for particular courses, a file for enrollment, and so on. Several data items may be redundantly copied in multiple files, and once that is done it is very easy for the data to become inconsistent, because you may update the data in one file and forget to update it in another file.
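The redundancy problem described above can be made concrete with a small sketch. The two dictionaries and lists below are hypothetical in-memory stand-ins for a students file and an enrollment file that both store the same email address; the names and values are illustrative assumptions, not part of the course material.

```python
# Hypothetical stand-ins for two separate files that both store a
# student's email address (redundant copies of the same fact).
students_file = {"S101": {"name": "Asha", "email": "asha@old.example"}}
enrollment_file = [
    {"student_id": "S101", "course": "DBMS", "email": "asha@old.example"},
]


def update_email_in_students(student_id, new_email):
    """Update only the students file, as a careless program might."""
    students_file[student_id]["email"] = new_email


def find_inconsistencies():
    """Return enrollment rows whose email disagrees with the students file."""
    return [
        row
        for row in enrollment_file
        if row["email"] != students_file[row["student_id"]]["email"]
    ]


update_email_in_students("S101", "asha@new.example")
stale_rows = find_inconsistencies()
print(len(stale_rows))  # 1: the enrollment copy was never updated
```

Storing the email in only one place would make this kind of disagreement impossible, which is one of the ideas database design builds on.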
So, that is one of the first problems with using file systems. Then there is difficulty in accessing the data because, as I said, the files are often sequential in nature. Even if they are random access, every task might need to use data from multiple files, and opening those files and reaching the appropriate point of access is a non-trivial task.
Then there are issues of data isolation, because there are multiple types of files and multiple formats used in them. Very importantly, there are a lot of integrity problems. Any database application needs to maintain a lot of integrity. For example, if you want to withdraw money from a bank account, then certainly the balance must not go negative: you can withdraw only up to the amount which exists in the account. So any application will need to check for this. If you use a file system based application to store the data, then at every point where you are updating the balance you will need to make such checks, which makes the application quite complicated and often creates the possibility that certain integrity checks may be missed.
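As a minimal sketch of the balance rule above, the example below states the constraint once, in the table definition, instead of repeating the check in every program. It uses Python's built-in `sqlite3` module with an in-memory database; the table name `account` and the helper `withdraw` are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "balance may not go negative" is declared once, with the data.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Try to withdraw; return True on success, False if the rule is violated."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the CHECK constraint would be violated.
        return False


ok_small = withdraw(conn, 1, 60)   # allowed, leaves 40
ok_large = withdraw(conn, 1, 500)  # rejected, balance would become negative
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(ok_small, ok_large, balance)  # True False 40
```

Because the database enforces the rule, a program that forgets the check still cannot store a negative balance.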
So, it is hard to code new constraints over a period of time. Then there are issues of atomicity of updates. What atomicity means is the ability to do certain operations as a single unit. What you want is that either the operation happens in full, in totality, or it does not happen at all. For example, consider funds being transferred from one account to another. This means the account from which the funds are being transferred needs to be debited a certain amount, and that same amount has to be credited to the account to which it is being paid. Now if, because of some failure, say there were link issues or something, you are not able to complete this whole transfer, then it is possible that you have already debited one account but have not been able to credit the other.
Now, this will be a major cause of inconsistency in the database. So, what you want is that if the transfer can happen, then it must happen in totality, that is, both the debit and the credit must happen together, or nothing should happen at all. There are several such examples of the requirement of atomicity of updates, which is critical for maintaining consistency of the data. The other aspect which has become very deeply required in every database application is concurrency of access.
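The funds-transfer scenario above can be sketched with a transaction. This is an illustrative example using Python's built-in `sqlite3` module, not a description of how any particular bank implements transfers; the table `account`, the function `transfer`, and the `fail_midway` flag that simulates a link failure are all assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one unit; return True if both happened."""
    try:
        with conn:  # one transaction: commit on success, rollback on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulated failure after the debit but before the credit.
                raise RuntimeError("simulated link failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


failed = transfer(conn, 1, 2, 200, fail_midway=True)
after_failure = balances(conn)  # {1: 500, 2: 100}: the debit was rolled back
succeeded = transfer(conn, 1, 2, 200)
after_success = balances(conn)  # {1: 300, 2: 300}
```

After the simulated failure, neither account has changed, which is the "all or nothing" behaviour that atomicity asks for.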
If there is a database, then certainly there is not a single user; there are hundreds of users. Think about net banking, think about railway reservation: multiple users are trying to make bookings on multiple trains, from various stations to various stations, in different classes, and so on. All of this must go on at the same time. That is what is called concurrency of access. So it is quite possible that while you are checking the berth availability on a certain train on a certain date that you intend to travel, someone else is checking the berth availability on the same train on the same date, and there could be a conflict. There may be one berth available. You have seen that one berth is available, so you go ahead and try to book it, and there is another user who also saw that one berth is available, makes the payment, and tries to book it. Concurrency control needs to make sure that both of these users are not allowed to book the same berth, because that would be a disaster. So, uncontrolled concurrency can lead to several inconsistencies in the application.
Then, as you would all be very familiar, today we are living under a whole lot of security threats.
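The berth-booking conflict described above can be illustrated with a simplified sketch. It does not run real concurrent users; the two requests are issued one after the other on a single `sqlite3` connection. The point it shows is that "check that the berth is free" and "book it" are done in one conditional statement, so a user who saw the berth as free earlier cannot overwrite someone else's booking. The table `berth` and the function `try_book` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE berth (id INTEGER PRIMARY KEY, booked_by TEXT)")
conn.execute("INSERT INTO berth (id, booked_by) VALUES (1, NULL)")  # one free berth
conn.commit()


def try_book(conn, berth_id, user):
    """Book the berth only if it is still free; return True if this user got it."""
    with conn:
        cur = conn.execute(
            "UPDATE berth SET booked_by = ? WHERE id = ? AND booked_by IS NULL",
            (user, berth_id),
        )
    # rowcount is 1 if the berth was free and is now booked, 0 otherwise.
    return cur.rowcount == 1


# Both users saw the berth as available earlier; only the first request wins.
first = try_book(conn, 1, "user_a")
second = try_book(conn, 1, "user_b")
owner = conn.execute("SELECT booked_by FROM berth WHERE id = 1").fetchone()[0]
print(first, second, owner)  # True False user_a
```

A real DBMS handles truly simultaneous requests through its concurrency control mechanisms, which this sketch does not attempt to reproduce.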
So, there has to be proper security: a user should be able to access the data only to the extent that the user is allowed to. As a user you should be able to access certain parts of the data, while the manager of the system, the administrator, should possibly be able to access a bigger part of the data. Security is hard to provide in file system based applications that store data.
So, with all this we conclude that database systems provide solutions to all of the above problems, and we are going to learn how they do that. Moving on, I will quickly familiarize you with the overall plan of the course. What I talk about first are the course prerequisites.
These prerequisites are certain elementary-level knowledge in computer science and related discrete mathematics that would make it easier for you to understand and follow the course. If at any stage you find any of these aspects difficult to follow in the course, then I would advise that you go back to some of the background material and study it. So I have tried to list these prerequisite topics. One certainly is set theory, because that is the basic premise on which databases are designed. Starting from the definition of a set, membership, the concepts of subset, superset, and power set, the operations of union, intersection, complementation, difference, and Cartesian product, and De Morgan's laws: you should be very familiar and conversant with all this basic set theory.
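As a quick refresher on the set operations listed above, here is a small sketch using Python's built-in `set` type. The universe `U` and the sets `A` and `B` are arbitrary example values, and `power_set` is a helper written for this illustration.

```python
from itertools import chain, combinations, product

U = {1, 2, 3, 4, 5}  # universe, needed to talk about complements
A = {1, 2, 3}
B = {3, 4}

union = A | B                    # {1, 2, 3, 4}
intersection = A & B             # {3}
difference = A - B               # {1, 2}
complement_A = U - A             # {4, 5}
cartesian = set(product(A, B))   # all pairs (a, b) with a in A, b in B
is_subset = {3} <= A             # True: {3} is a subset of A


def power_set(s):
    """Return all subsets of s as a list of sets."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [set(c) for c in subsets]


# De Morgan's law: the complement of a union is the intersection of complements.
de_morgan_holds = (U - (A | B)) == ((U - A) & (U - B))

print(len(cartesian), len(power_set(B)), de_morgan_holds)  # 6 4 True
```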
If you are not comfortable with these topics, I have mentioned one MOOCs course. It is a past course, but you can access the videos and the contents, which have a very nice discussion of these aspects of set theory that you may refer to. Moving on, the next topic, which builds on top of sets, is the concept of relations and functions. We all know that a relation is a subset of a Cartesian product of sets. So, if I have a set A, then I can define a binary relation over that set A, which is basically a set of pairs of elements from A, or in other words, a binary relation is a subset of the cross product of A with itself, where elements of the domain and the range are related to each other. So, there are concepts of the image of the domain, the pre-image of the range, the inverse relation, and several basic properties of relations, like a relation being reflexive, symmetric, anti-symmetric, or transitive, total relations, and so on. You should be familiar with these, otherwise you will find it difficult to follow major concepts in database systems, because the database system that we are primarily going to take you through in this course is relational in model. So, it is heavily based on relations and functions, and you should understand when specifically a relation becomes a function, what is meant by functions being injective, surjective, or bijective, what is meant by composition and inverse of functions, and so on.
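The properties of relations mentioned above can be checked mechanically on small examples. The sketch below represents a binary relation on a set `A` as a Python set of pairs; the helper names and the two example relations (`leq` for "less than or equal" and `succ` for a cyclic successor) are assumptions made for illustration.

```python
def is_reflexive(A, R):
    return all((a, a) in R for a in A)


def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)


def is_antisymmetric(R):
    # If both (a, b) and (b, a) are present, then a must equal b.
    return all(a == b for (a, b) in R if (b, a) in R)


def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)


def is_function(A, R):
    # A relation is a function on A when every a in A has exactly one image.
    return all(sum(1 for (x, _) in R if x == a) == 1 for a in A)


def is_injective(R):
    # Assumes R is already a function: distinct inputs give distinct outputs.
    return len({y for (_, y) in R}) == len(R)


A = {1, 2, 3}
leq = {(a, b) for a in A for b in A if a <= b}  # "less than or equal"
succ = {(1, 2), (2, 3), (3, 1)}                 # cyclic successor

print(is_reflexive(A, leq), is_symmetric(leq),
      is_antisymmetric(leq), is_transitive(leq))  # True False True True
print(is_function(A, leq), is_function(A, succ), is_injective(succ))  # False True True
```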
Again, if you need to, you can refer to this MOOCs course on discrete mathematics to brush up your knowledge of relations and functions. We also need you to have a basic understanding of propositional logic: the truth values true and false, the operations of conjunction, disjunction, and negation, that is, and, or, and not, what is meant by implication, and what is meant by equivalence. You know that given two propositions, each of which can take the value true or false, their conjunction can be represented in terms of a truth table, where we say that only when both propositions are true does the resulting conjunctive proposition become true; otherwise the conjunctive proposition is false.
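The truth table for conjunction described above, together with the other connectives, can be generated in a few lines. This is only a refresher sketch; the helper names `implies` and `equivalent` are chosen for this example.

```python
from itertools import product


def implies(p, q):
    """p -> q is false only when p is true and q is false."""
    return (not p) or q


def equivalent(p, q):
    """p <-> q is true when both propositions have the same truth value."""
    return p == q


# One row per combination of truth values: (p, q, and, or, implies, equivalent)
rows = [
    (p, q, p and q, p or q, implies(p, q), equivalent(p, q))
    for p, q in product([True, False], repeat=2)
]

for row in rows:
    print(row)

# The conjunction is true in exactly one row: when both p and q are true.
conjunction_true_rows = [(p, q) for (p, q, conj, *_rest) in rows if conj]
```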
So, you should be familiar with these concepts. If you are not, please brush up your ideas about propositional logic. We need a little bit of predicate logic as well. Predicate logic, in contrast to propositional logic, deals with quantification. You need knowledge of the existential and universal quantifiers, where we ask whether a certain predicate holds for all values in the domain, or whether there exists some value in the domain for which it holds, and based on that predicate logic is built up.
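Over a finite domain, the two quantifiers above correspond directly to Python's built-in `all` and `any`. The domain and the predicate `is_even` below are arbitrary choices for this small sketch.

```python
domain = range(1, 11)  # the integers 1 through 10


def is_even(n):
    return n % 2 == 0


# Universal quantifier: does the predicate hold for every value in the domain?
for_all_even = all(is_even(n) for n in domain)    # False
for_all_positive = all(n > 0 for n in domain)     # True

# Existential quantifier: does the predicate hold for at least one value?
exists_even = any(is_even(n) for n in domain)     # True

# Duality: "not for all n, P(n)" is the same as "there exists n with not P(n)".
duality_holds = (not for_all_even) == any(not is_even(n) for n in domain)

print(for_all_even, for_all_positive, exists_even, duality_holds)
# False True True True
```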
We do not need very advanced concepts here; just a basic level of familiarity will help, and the same MOOCs course on discrete mathematics will be of help in case you need to brush it up further. On the computer science side, certainly you need a good familiarity with data structures: arrays, lists, and particularly binary search trees. What is a binary search tree, what is meant by the height of a binary search tree, when do we say that a binary search tree is balanced, what are the ways and conditions of balancing, and particularly B-trees for organizing good search trees. Also hash tables and what hashing is. We need you to be familiar with these concepts.
Databases are designed heavily on the concepts of B-trees, hash tables, and so on. There are courses on design and analysis of algorithms and fundamentals of algorithms. I have mentioned two excellent MOOCs courses from which you can brush up if you need to.
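As a small refresher on the height and balance of a binary search tree mentioned above, the sketch below builds two trees from the same seven keys. It is a plain, unbalanced binary search tree, not a B-tree, and the names `Node`, `insert`, `height`, and `build` are written for this example only.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the tree rooted at root and return the (new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored


def height(root):
    """Number of nodes on the longest root-to-leaf path (0 for an empty tree)."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


# Sorted insertion order degenerates into a chain; a good order stays shallow.
skewed_height = height(build([1, 2, 3, 4, 5, 6, 7]))    # 7
balanced_height = height(build([4, 2, 6, 1, 3, 5, 7]))  # 3
print(skewed_height, balanced_height)
```

A search visits at most one node per level, so the shallower tree needs at most 3 comparisons where the skewed one may need 7. Keeping search trees shallow is the motivation behind balancing.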
Certainly you need some familiarity with common algorithms. In particular I would mention sorting and searching algorithms, because these are critical for database applications, and again the same MOOCs courses will be of help. It will also be good to have familiarity with programming in C, because several of the applications need some high-level application programming to be performed, and we will describe those aspects in C because it is a fundamental and most commonly known language. Besides these prerequisites, which I have marked as essential because they will certainly be required for the major parts of the course, it will be good if you have some familiarity with object-oriented analysis and design and with some language which is more heavily object-oriented in nature, like C++ or Java. Again, the related MOOCs courses are mentioned here in case you need to brush up. Moving on, this is your course outline.
So, as I said, the course comprises 40 modules, divided into 8 weeks. This plan shows what we do in the different weeks, from week 1 to week 8. As the course unfolds, we will take you through these modules on these topics. And on the right, you can see what I have marked.
The initial part of the course, which we cover from week 1 to the first half of week 5, is primarily meant for application programming. This means that the database system has already been designed and the basic premises, the schemas and constraints, have been set up, but now you want to write different data queries and different data manipulation applications. That is where a large number of database engineers work; they are called application programmers. That is, I should say, the first level of database understanding, and you must aim to become a master of application programming to get started.
The other half of the course, which starts from the middle of week 5 with storage and file structure and goes on until the end, is meant for the analysts who are responsible for designing a particular database which the application programmer can use, tuning it for performance, indexing it properly, and designing queries to be efficient. These kinds of analysts will be more involved with the second part of the course. The second part of the course will also be useful for DBMS designers. If you want to really become an advanced programmer and work on database engineering in terms of creating database management systems, not merely creating databases or database applications, then you need to have a good initial command of the second half of the course. So while you go through the modules, please keep in mind that your familiarity with application programming must be at the highest level, and the later parts will be a little more advanced, but they are required for good design and development of consistent, efficient systems in the future.
We will follow a textbook. As you can see, it is the sixth edition that I am following in this course. It is called Database System Concepts, by Silberschatz, Korth, and Sudarshan. This is a classic book in database systems.
The current version actually is the 7th edition. So, if you get access to the 7th edition, you can use that as well. But for whatever we are following in this course, the 6th edition is good enough. So, I advise that you try to get a copy of this book for yourself. Moving on, we have three TAs who will help you in this course: Srijani Majumdar, Himadri Bhushanbhuya and Guru Nath Reddy.
So these are the three TAs, with their emails. I have also put their mobile numbers. However, I would advise that unless you are really stuck, you avoid calling them on the mobile, because they are research students as well, busy with their own work. But you can certainly put all your questions on the forum, where they will be promptly answered by one of these TAs or by myself. And in case you have very specific follow-ups, you can write an email to one or all of these TAs. So, this is the course overview.
In this module, we have discussed the importance of database management systems in modern-day applications, and we have tried to familiarize you with different aspects of the course. I reiterate: please give due consideration to the prerequisites. As mentioned, we have floated an assignment called assignment 0, with questions on different aspects of these prerequisites. Please try to solve that assignment and see how you perform. I would like to mention that this assignment will not count in the final evaluation for the course. It is just to give you an idea, for self-assessment, of your preparedness for this course. So if you find that you have not been able to answer the questions well on certain topics, say on relations and functions or on data structures, then it will be good to go back through the prerequisites once more. Please keep in mind that database management systems is a course which depends on this background knowledge quite heavily, so if you have gaps in understanding those topics, then all through the course you will run into difficulties in understanding and problem solving.
So that is our first module. From the next module, we will start introducing database management systems. Enjoy the course, try to learn, and try to become a good database engineer.