Hello and welcome to CareerRide. I'm Nishant and today I'm gonna talk about top 20 DBMS interview questions and answers for freshers and experienced professionals. So without wasting time let's get started now.
So the first one is what is a database? A database is an organized collection of logically related data where it can be accessed and manipulated. A database could be big or small depending on the requirements of its purpose. For example, a local contractor could set up a small database of his workers details consisting of names, phone numbers, address, salary etc. A large corporation could set up a giant database where it uses the data to make better and efficient business decisions.
Now what is DBMS. DBMS or database management system is the middleman between the user and the database. It is a software that provides a systematic approach of creating, storing, updating and retrieving the data stored in a database. Or in other words, a DBMS is a software for database management.
There are four common types of DBMS and the first one is hierarchical database system. And this type of database is structured as a tree hierarchy. Here a record can have single parent and multiple child records. And this database is limited to a specific use due to its limitation.
Next is network database system. These databases also have hierarchical structure. But here each record has multiple parents and multiple child records.
Next is object-oriented database system. In this type of system the information is represented as objects and this type of database uses an object-oriented programming language. Next one is relational database management system and this is the most popular DBMS.
Here the data is stored on tables and all the tables are related to one another. The data is stored in A row based table. Yes, RDBMS is most commonly used database. Because it is a collection of organized set of tables, the data can be accessed easily in RDBMS.
Now what are the advantages of a DBMS? There are many advantages of DBMS. And the first one is data security. RDBMS provides an improved data security. It allows only users who are authorized to access the database, preventing data integrity from being compromised.
Next is improved data quality. Data quality includes accuracy and reliability. DBMS provides the capability to enforce constraints on database, helping it to store accurate data.
Now the next is faster data accessibility. Now DBMS helps to retrieve and display data quickly, making data accessing faster and easier. Next advantage is Data Backup.
Data loss is a major concern for all businesses. But a DBMS can take periodical backup automatically and restore the data in case of any technical glitch. Simplicity. A DBMS operations like deletion, Insertion or creation of table are easy to implement. Reduce data redundancy.
DBMS doesn't allow redundant data in multiple places. Yes, duplicate data can cause data inconsistency and can provide company with unreliable information. Now what are the differences between a DBMS and RDBMS?
DBMS i.e. Database Management System is a software that provides a systematic approach of creating, storing, updating and retrieving the data stored in a database. Or in other words, a DBMS is a software for database management.
RDBMS i.e. Relational Database Management System is actually an extension of DBMS. RDBMS stores data in a database.
tabular format consisting of rows and columns. and all the tables are related to one another. Now let us understand differences between them.
DBMS stores data in a flat file format whereas RDBMS stores data in tables consisting of rows and columns. In DBMS there is no relationship between its data. Whereas RDBMS stores data in multiple tables and there is A relationship between the tables. In DBMS, normalization process is absent. Whereas normalization is there in RDBMS.
DBMS is usually preferred for small datasets. RDBMS deals with larger datasets. DBMS does not support distributed database.
Whereas RDBMS supports distributed database. Next is, what do you understand by query optimization? Query optimization is the process of finding the best methods for implementing each query that minimizes time needed to evaluate a query.
The goal for query optimization is to speed up the process of fetching informations from the database and reducing the system resources required to run a query. It enhances the performance of a query. especially for the execution of complex SQL statements. Now, what is data partitioning and what is the importance of partitioning? Partitioning in database is a process where a large table is divided into smaller and manageable parts.
Partitioning is done mainly to improve maintenance, performance or management. There are two types of partitioning methods. And the first one is Vertical Partitioning and second is Horizontal Partitioning. Now let us try to understand each of them. And the first one is Vertical Partitioning.
In Vertical Partitioning, the table is split by column wise. This way a table is created with fewer columns and an additional table is added for infrequent used columns or very wide columns. Now consider a table called Employee Table.
Now in this table in order to improve maintenance performance or management, we can partition the table vertically and keep infrequent used columns or very wide columns in a separate table. Now instead of one table called employee we can have two tables that is employee and employee job description. Coming to horizontal partitioning. In horizontal partitioning the table is split row wise. Now consider a table.
In this table all reports are arranged in a single table. This could be partitioned horizontally by year where each table would store monthly reports of that particular year. Therefore when a report is queried, it could look at a particular year and access the required table which would result in better performance because of small size of table and scanning through fewer data.
Now let us understand advantages of data partitioning. And the first one is improved performance. It optimizes performance since it is scanning through fewer data. And second is manageability and control.
Yes, it is more manageable since it is divided into small parts and provides better controls. Alright, so the next one is what is data replication? Data replication is a process in which copies of the same data are distributed in multiple locations or servers. And this is done to make the data more accessible.
Data replication helps in creating a distributed database system. Even though copying of the same data across multiple servers increases data redundancy, there are major advantages to this process. And the first advantage is disaster recovery. In case of any loss of data due to a disaster or technical failure at a particular server, we can recover the data from the other servers and this could save a lot of money and time. Second advantage is improved data accessibility.
With multiple copies of data at different locations, users accessing from different regions can choose to access data from closest server, resulting in better response time. Third advantage is enhanced server performance. There is fewer loads on a particular server since users are accessing from different servers from different locations. Hence boosting the performance of servers which also improve users experience.
Now I explain the types of data replication. There are different types of data replication depending on its requirement. In data replication, the source database is referred to as a publisher, whereas the destination database is referred to as a subscriber.
Now coming to types of data replication and the first type is full table replication. Full table replication copies the entire data to all of its subscribers, including new, updated, as well as existing data. And this process could involve higher cost and computing power. since copying all of the data requires a higher network bandwidth especially with large databases next type is transactional replication in transactional replication the publisher database is fully copied only initially and then the subscribers would only receive updates whenever a data is modified and this process proves to be more efficient than the full table replication Next is a snapshot replication.
In a snapshot replication, a snapshot of publisher database is replicated in all the subscribers database and it doesn't monitor for the changes made to data. And this type of replication is generally used when changes made to data tends to be infrequent. And the last type is merge replication. In merge replication, data is merged from two or more databases. into single database.
It is very complex process compared to other replication methods Because both the publisher and subscriber database are allowed to make changes dynamically. Now, what is normalization and what are different types of normalization? Normalization is a process of organizing data in such a way that it reduces data redundancy.
Yes, redundant data is duplicate data stored in multiple places in the same database. Normalization is done by dividing the large table into multiple tables and establishing a relationship between them. Now there are six types of normalization, which are first normal form, second normal form, third normal form, BCNF that is voice code, normal form, fourth normal form, and fifth normal form.
But we will only explain the first three in details here. So let's talk about rules of first normal form. In first normal form, each cell in a table must contain only one value, and every entity or column must contain a unique name meaning no two columns can share the same name.
Now let's have one table. So in this table we can see in the first row column subject has two values. in a single cell. Therefore applying first normal form will be important over here.
And after applying first normal form, the table would be like this. Coming to rules of second normal form. In the second normal form, the table should be in first normal form and all non-key columns of the tables must depend on the primary key and there should not be any partial dependency between its data. Now in order to understand second normal form in details, let's have few concepts clear.
The first concept is candidate key. Now consider a table first. Okay, so candidate keys are entities or columns that can uniquely identify the entire row.
And in this table, there are two candidate keys, that is student ID and phone number. Since their values are are always unique and there can be two or more candidate keys in a table. Now next concept is primary key.
Again consider the same table and a primary key is a column that can uniquely identify the entire row. There can only be one primary key in a table. Basically a primary key can be chosen from any of the candidate keys which can best uniquely identify the entire row.
In this table, the primary key would be student ID. Now let's understand partial dependency. Partial dependency can arise if there are two or more candidate keys in a table. And the attributes or values depend on part of candidate keys present in the table. Now let's consider a table called student table.
In this table there are two candidate keys student id and sport id and the column name age and sport id depend on student id and sports and coach depend on sport id. Thus the values are not fully dependent on a primary key. but are partially dependent on both the candidate keys. Now in order to achieve second normal form, we must eliminate the partial dependency by dividing the table as follows. Now let's try to understand rules of third normal form.
In third normal form, a table should be in second normal form and there should be no transitive dependency. Now what is this transitive dependency? Transitive dependency arise when non-primary attributes are dependent on non-prime attributes. And since non-prime attributes can store null values, this can cause problems. Now let's consider a table.
And in this table, the primary key is student ID. But the column total marks is dependent on exam type, which is not a primary key. Hence, the transitive dependency occurs.
Therefore, third normal form solves this by dividing the table as follows. Now here the column total marks depends on the primary key exam ID which eliminates transitive dependency and is in the third form. Alright, so the next is what is denormalization? Denormalization is a technique that is used to add redundant data to a table in order to optimize data access speed.
And in this process, we actually add redundant data to one or more tables in a normalized database in order to get rid of complex join operations. One important point of denormalization is that it is done only on a normalized database. You might ask why we need denormalization. after we already normalized the table.
Suppose we have two tables that is student details and sports details. In these table if we want to retrieve the student details along with the name of coach of the sports they are playing we would have to do a joint operation and this is fine for a small database but if the database is large it would be very time consuming and inefficient. To be efficient in this situation, we have to denormalize it by adding redundant data to the student details table.
Now that the table is denormalized, if we want to query for student details along with their coach's name, we will get better read performance since there is no need to do a joint operation. However, we have sacrificed the write performance of the database in this process. Now the next question is what are the different types of keys in a database? Now in order to discuss keys in a database, let's have a sample table over here.
Now the first key is candidate key. Candidate keys are keys that can uniquely identify a row. In this table, there are two columns that can uniquely identify a row and they are student id and phone numbers and these two keys are candidate keys because they contain unique values and by just specifying one of them we can get the entire row and the value for the candidate key is unique and not null and there can be more than one candidate keys in a table. Next key is primary key. Primary key is a key that is selected from the set of candidate keys with which can uniquely identify a row in a table and there can be only one primary key in a table.
A primary key must contain unique values and cannot contain null values. Now in this table the best key that can uniquely identify a row is student id as every student is assigned a unique student id. Now the next key is super key. Superkey is an attribute or a set of multiple attributes that can uniquely identify a row. A superkey is a superset of candidate keys, where all candidate keys are superkeys, but not all superkeys are candidate keys.
Super key attributes can contain null values. And in this table, the super IDs are student ID, phone number, student ID and student name, student ID and phone number, student name and phone number. And these are just a few examples of keys that can uniquely identify a row. Now let's see these keys, that is age and department. Together these keys cannot be super key.
because there can be many students in department that are of same age, hence cannot uniquely identify a row. Now coming to foreign key. A foreign key is a key that establishes a relationship between the two tables. It is an attribute that references the primary key of another table. Now the next is composite key.
A composite key is a combination of two or more attributes that help to identify a record. in a table uniquely. The attributes cannot uniquely identify a row independently and only through combination they can identify a row. Now in this table student name age department is a composite key because together they can uniquely identify a row whereas independently they cannot guarantee uniqueness.
Now try to understand alternate key or secondary key. Candidate keys which are not selected for primary key are called alternate key or secondary key. And in this table, the alternate key is phone number since student ID is selected as primary key. Now the next question is explain the difference between unique key and primary key. Let's have a table before discussing this question.
Now primary key is a key that can uniquely identify a row. There can only be one primary key. A primary key is selected from any one of the candidate keys and cannot contain a null value. And in this table, the best key that can uniquely identify a row is student ID.
As every student are assigned a unique student ID. Coming to unique ID. Unique ID is also a key that can uniquely identify a row but can contain null values. The not null values in a unique key must be unique. And in this table, the key phone number is a unique key, since it can uniquely identify a row, but it can have null values.
Because there might be a student that doesn't have phone number. So in a primary key, the values are unique and cannot be null. Whereas in unique key, the values are unique, but it can contain null values.
There can only be one primary key but there can be multiple unique keys. In a table, we cannot alter primary key but we can alter unique key as long as it is unique. Coming to the next question, what are the differences between trigger and stored procedures? Stored procedures are pre-written codes consisting of a group of SQL statements that are stored and executed to perform a particular task. They are invoked explicitly which means they are executed only when called by the users.
It can accept inputs and also return a result after execution. Now triggers are a special type of stored procedures and the only difference is that they are invoked implicitly which means they are executed when they encountered an event such as insert, update, delete. An example of a trigger is when you sign up for a particular website and you get a welcome mail in your email.
Here a trigger is invoked when it encounters an insert event in the database while signing up. Now let's find differences between trigger and stored procedures. A trigger is invoked implicitly whereas a stored procedure is invoked explicitly.
Stored procedures can take inputs in the form of parameters whereas Triggers cannot take any inputs. Stored procedures can return result but triggers cannot. Stored procedures can be scheduled to execute at a particular time but a trigger cannot.
Stored procedures are used for performing a task that the user requires whereas triggers are normally used for maintaining the integrity of the database. Coming to the next question, what are indexes? Explain the difference between clustered index and non-clustered index. Indexing is a data structure technique that is created for the purpose of optimizing the performance of a database by quickly locating and accessing the data in a database.
An index is a small table that has only two columns. The first column contains a copy of the primary key of the table and the second column contains the address or location of the disk block where that particular key value are stored. An index in a database is similar to an index of a book where if you want to read a particular page, you can just refer from the index page and go directly to the page without having to scan through the entire book.
There are primarily two methods of indexing, clustered indexing and non-clustered indexing. Now clustered index is an index in which the order of the rows of the index corresponds to the rows of the table. The index contains pointer to block but not to the location where the data is actually stored.
Actually similar records are grouped together and indexes are created for these groups. Clustered index is like a dictionary where the data is stored in alphabetical order. So in the clustered indexing records are physically reordered to match the index. Since the table can be stored only in one order, there can only be one clustered index for a table.
By default, the cluster index are created automatically whenever we apply primary key constraint on the table column. Now coming to non-clustered index. In the non-clustered index, the index contains pointer to the location where the data is actually stored.
It is similar to the index of a book. If you want to read certain chapter or a topic, then you can directly go to that page by using index of that book. In the non-clustered index, the order of the rows of the table and the index does not have to be same. And the data and non-clustered index are stored separately.
Hence, there can be multiple non-clustered index in the table. Okay, so the next question is, what do you understand by cursor? Mention the different types of cursor.
A cursor is a temporary work area reserved in the memory. and is allocated by database server when performing DML operations. This temporary area contains data that are retrieved using SQL queries, and the data can be accessed in row-by-row manner.
This collection of data in the cursor is known as active set. The main purpose of using a cursor is to be able to manipulate through the result set in row-by-row manner. Now there are two types of cursors based on how they are accessed.
That is implicit cursor and explicit cursors. Now implicit cursors are programmed to run automatically when a DML operation such as insert delete update is performed. Implicit cursors are also known as default cursors. Now coming to explicit cursors.
Explicit cursors are defined by the programmer and are triggered When we execute SELECT statement that returns a result set. And this result set can now be processed one by one which provides more control to the programmers. Now the next question is what is SQL and where it is used?
SQL stands for Structured Query Language. And it is a language used for interacting with the database such as storing, retrieving. or manipulating data in a database.
It is also considered as standard language in relational database management system. And popular RDBMS such as MySQL, MSXS, Oracle, SQL Server, SQLite, all uses SQL. Okay, so the next question is, what do you understand by subqueries in SQL? A subquery is simply defined as a query inside another query.
It is also called as inner query and the outer query is called as the main query. A subquery must always be enclosed by parenthesis. In a nested query, the subquery is always executed first and its output is supplied to the main query to perform its operation.
In order to understand this, let's have a student table here. And in this table, if you want to retrieve the name and roll number of a student who belongs to computer science department, we can perform a nested query. So here you can see sub query is placed inside parenthesis and in the where clause. Alright, so coming to the next question.
What are the differences between having and where clause? Where clause is used to filter records in a database. In the where clause, there is specified condition and the records are retrieved according to that specified condition.
Where clause filters the data. by individual rows. Now in order to understand let's have an employee table. So in this table if we want to retrieve only those records whose country is India, we can use where clause as follows. Now having clause is also used to filter data in a database.
But why do we need it if we already have where clause and this is because the where clause cannot work with aggregate functions. Hence, having clause is introduced. So in a table if you want to display only countries where there are more than 4 customers, we have to execute a query in this manner. So in this SQL query, we have seen that having clause is used along with an aggregate function count. Now the difference between where and having clause is that where clause filters by individual rows whereas having clause filters based on groups of data.
Having clause cannot be used without group by clause whereas where clause can be used without group by clause. Where clause cannot work with aggregate function whereas having clause can work with aggregate function. Where clause can be used with select update delete statement whereas having clause can only be used with select statement. Now what are joins in SQL, what are different types of joins? Joins are used to retrieve data from two or more tables based on related column between them.
Now there are different types of joins in SQL and the first type is inner join. This join returns records on a condition that there is a match in both the tables. Now let's have two tables called customer and order.
And in these tables, if we want to retrieve the customer name and their order, we would have to run an inner join. Next is left join. Left join returns all the values selected from the left table and only the matching values from the right table.
And when there is no match on the right table, the result will include the row but with a null value in each column from the right. So left join returns all records from the left table, even if there is no matches in the right table. Now let's have a left join query.
Here the left table is customer, since it is mentioned to the left or before the join statement. And this will return the values as follows. Now right join. Right join returns...
all the values selected from the right table and only the matching values from the left table. And when there is no match on the left table, the result will include this row but with a null value in each column from the left. Now let's have a write join query.
Now here the right table is orders since it is mentioned to the right or after the joint statement. So the result would be like this. Now coming to full outer joins.
This join will return all the records from both the tables. And when there is no matching value, it will show value as null. Now let's have a full outer join query.
And the output of this query would be like this. Next one is self join. In self join the table is joined to itself as if there were two tables and a temporary name is given to each table and this is useful when comparing data within a table. Now in order to understand it clearly let's have a table called employee.
Now if we want to match the employees that belong to the same city we'll have to fire following query. Where A is the temporary name of the first table and B is the temporary name of the second table. And the result would be like this.
Okay so the next question is what are the differences between drop, truncate and delete command. Okay so the first one is truncate. Truncate command falls under DDL commands type of SQL.
Truncate is used to delete all the rows of a table at once. but doesn't delete the table. The structure of table remains the same. This command once performed cannot be rolled back.
And the syntax of truncate command is now delete command. Delete falls under data manipulation language. It is used to delete existing records from a table. It can delete a single record or multiple records depending on conditions specified.
And the syntax of delete command is, now if the where clause is omitted, it will delete all existing records from the table, just like the truncate command. Now coming to drop command, drop command falls under data definition language. It is used to delete the database objects. In case it is a table, it will delete the whole table, which includes existing records.
as well as the structure of the table. And here is the syntax for drop command.