Transcript for:
Introduction to Time Series Databases

Hi guys, welcome back to the channel. Today I'll go over an introduction to time series databases. First, we will talk about what time series data is and look at some examples of it. Then we'll take a look at what makes time series data unique and how it differs from other kinds of data, such as relational data. From there, we will look at a few example queries that time series databases can answer efficiently; these queries tend to be much less performant in other databases, such as relational, document, or graph databases. Finally, we're going to wrap up by looking at some existing time series databases that are being used in various companies. So with that being said, let's get started.

By definition, a time series database is a database purpose-built to store time series data. That means that when you design a time series database, the storage engine is designed from the ground up with only time series data in mind. That's because time series data comes with its own unique properties and challenges.

That brings us to the question: what exactly is time series data, and how is it different from any other data that we have in our databases? By definition, time series data is a sequence of data points over a time interval. So let's say you have an interval of 10 minutes. All the data coming into the system within these 10 minutes is going to arrive in increasing time order: one data point at minute one, the next at minute two, the next at minute three. So every piece of data coming into your system is associated with a timestamp, and most commonly the data only comes in in order of increasing time. If the first data point you get is from minute one, the next one is going to be from minute two. There are cases where the data can be out of order, but more often than not time series data arrives in the order of the actual time.

So this is an example of a table that contains some time series data. Over here you can see the table has four fields: id, timestamp, air quality, and temperature. For example,
you can look at this as data coming from some kind of sensor that is measuring the air quality and temperature at a particular location. If you focus on the timestamp field, you can see all the values are very similar except the last digit, which is the lowest unit of time. The first row has one, and then two, three, and four. That means the data coming into the database arrives in increasing time order. Then you have the air quality and temperature, which are the metrics the sensor is measuring, so you always get this information together with a timestamp. This is a pretty good example of what time series data looks like. The takeaway is that every row has a timestamp associated with it, and that's what makes this time series data.

So let's talk about a few examples of time series data. The first one I have is sensor data, which is the example we just looked at. Sensor data is very commonly time series data because sensors are designed to emit data at a fixed rate, say every second or every millisecond. So if you have a sensor, no matter what it's measuring, it will be emitting data every second or every millisecond: a perfect example of time series data.

Then we have weather data. If you have some kind of instrument that's measuring what the weather is like, or what the temperature is like over time, that tends to be time series data, because every measurement of the weather is associated with a timestamp.

The next one is system monitoring data. This is one of the most common use cases of time series data. An example would be the temperature of your laptop over a period of time, or the CPU usage of your laptop over a period of time. If you're talking about servers, you can look at queries per second over a period of time, or the number of errors occurring in your system over a period of time. All of these are different metrics for system monitoring, and all of them tend to have an associated timestamp so that you can see the changes over time.

We also have website
activity data. If you have a web app that people are visiting every day, you can track the activities of the users within your website. You can use their IP or some other data to see whether the same users are coming back every day or every month, or how frequently people are visiting. Or, if you have a timestamp associated with every click, you can see the user flow: how your users are interacting with your website over a period of time. So website activity data is a very good example of time series data.

And, more commonly, you can look at stock prices. Any stock price is associated with a time. Over a given day the stock price changes with time; it can go up and down. Maybe you're writing a script, or maybe you're just using an API, but every time you call the API or scrape a website for a stock price, that price is going to be associated with a time, and you can take a look at how the stock price is changing over a period of time.

So now that we have looked at a few examples of what time series data looks like, let's see what makes it unique. The data must be unique if we're talking about a purpose-built database just to store this kind of data. So let's take a look at what exactly makes time series data unique.

The first property is that time series databases tend to have very high write throughput. Just think about the examples we just went through. The first one we talked about was sensor data. Sensors are designed to measure whatever they're supposed to measure more than once every second, and if you're writing all of this data into the database, your write throughput is going to be very, very high. The same goes for a website activity tracker: if you have many users coming to your website very often, your write throughput is going to be very high. Stock prices are very similar. Depending on how granular you want your data to be, you may want to record the stock price every
second, or maybe more than once every second. Once again, very, very high write throughput.

Your writes in a time series database can be regular or irregular. You can have regular writes where, let's say, your sensor emits data five times every second, so every second you're going to have five writes into your database. That's what a regular write pattern looks like. You can also have irregular writes. Let's say you have a sensor connected to the internet, but the sensor is intermittently losing its connection and establishing it again. The data the sensor emits is not going to arrive at regular intervals; you might get certain bursts of data. So you can have either pattern in a time series database.

The data needs to be highly compressed in a time series database. That's because, if you look at the examples, every row is associated with a timestamp, but consecutive rows are going to be very similar to each other. So the data in a time series database tends to be very similar when you compare it row by row, and just because of the sheer volume of data that you need to write into these databases, you want your data to be highly compressed so that you're not using a ton of space. That's a very important consideration when you're using a time series database, because if your sensor or any other source writes a ton of data every minute, you want the database to compress it, so that if you have to write 40 million rows in a given day, you don't use the space that 40 million rows in a MySQL database would take, because that would be very, very inefficient.

Next, in a time series database it's very common to have large range scans of many records. Let's look at an example. Let's say you have over a month's worth of data in your database and you want to look at the average price of a stock over a month. To get the average price over a month, you would have to go through all the rows you wrote to your database over 30 days. That's a lot of rows that you have to go through,
calculate, and then find the average, or the maximum, or the minimum. So it's very common in a time series database to have queries that scan a bunch of records at once.

The next one is writes to the latest time entry only. We already talked about it before, but in a time series database, chances are that every time a piece of data comes in, it's associated with the current time. So as you keep writing to the table, you only keep writing with increasing time. You don't write past data, you only write present data, and the timestamp associated with the next piece of data is going to be the next minute. So you don't really write historical data; more often than not, you write the current data.

And lastly, you want native support for things like summaries, aggregations, or rollups. Later on we're going to take a look at a few common time series queries that the database lets us get results for, and you want these queries to be natively supported. You don't want to do fancy joins or fancy clauses to get this data from the table, because you want every read to be very, very performant. We're going to talk about all these aggregations, summaries, and rollups in a later slide. So that's what makes time series data and time series databases unique.

So now let's take a look at some queries that a time series database can help us answer. We can get the daily stock price of Amazon. Let's say you have the price of Amazon stock over one year in your database. What you can do is look at the average price for every day over a year, and this query can be very, very performant in a time series database, whereas if you were doing a GROUP BY or an average in MySQL or some other relational database, the query would work but it would take a long time.

Another example is the daily average temperature in San Francisco. Let's say you have a database full of temperature readings from all over the world, and
you want to see a chart of how the daily average temperature changed over a year. It's very easy in a time series database to run this query and get the result in a very short amount of time.

Similarly, you can have unique website visits every month. Let's say your application logs one line into your database every time someone visits the website. Using a query, you can chart how many visits you're getting on your website every month. In a relational database you can do the same thing, but it's going to be way less performant than in a time series database.

You can also look at the highest revenue months in the last five years. Let's say every row in the database is the amount of revenue you are getting on a daily basis. Using a time series database, you can roll it up by month for every year and then look at which month, in which year, was your highest revenue month.

And you can also look at things like the average Uber ride prices by city. Let's say you have a database where each row is one Uber ride with the associated price. You can group by city, and after grouping by city you can look at the average price of an Uber ride per city. So you can get very, very granular with your time data, and this would be much harder to do efficiently in a relational database, a document database, or even a graph database. We just looked at a few examples, but as you can imagine, the use case can be whatever you can think of, and time series databases are built to answer the questions you have related to time data.

So what are some time series databases that are used in production right now? The first four are purpose-built time series databases. We have QuestDB, TimescaleDB, and InfluxDB. All three of them can be used very easily; some of them are open source, and some of them have companies behind them. And then we have the Amazon offering, called Timestream. This is a cloud database which is managed by AWS and is very performant when it comes to storing
or reading time series data.

The last one is Cassandra, which is an open source NoSQL database that you can use for multiple use cases. Cassandra is very good at time series data, but your data model will have to be designed with that in mind. You can make it perform very well, but you would need to put in the work to design the best schema before you start writing to Cassandra. The other four that I have are more out of the box, so you can just connect them to your application and start writing immediately, and it should work well, whereas for Cassandra you would need to put in some manual work to get the best performance out of it.

And yeah, that's all I had for you all today. Hopefully this was a good introduction to time series databases and why we need them. In a future video I'm going to go deeper into one of the databases and show you how you can ingest data into it, how you can run different queries, and compare the performance of these databases to that of maybe MySQL, Postgres, or MongoDB. So hopefully you'll check it out when I release that one. Hopefully the video was helpful. If you have any questions, just leave them in the comments and I'll get back to you as soon as I can. With that being said, hopefully you learned something, and I'll catch you on the next one. Bye.
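
Editor's note on the rollup queries mentioned in the video (daily average temperature, daily average stock price): the idea of bucketing timestamped rows by day and averaging each bucket can be sketched in plain Python. This is only an illustration of what such a query computes, not how any of the databases named above implement it. The sample readings and the function name daily_average are made up for this sketch and do not come from the video.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample readings: (ISO timestamp, temperature).
# These values are invented for illustration only.
READINGS = [
    ("2023-01-01T00:00:00", 10.0),
    ("2023-01-01T12:00:00", 14.0),
    ("2023-01-02T00:00:00", 8.0),
    ("2023-01-02T12:00:00", 12.0),
    ("2023-01-02T18:00:00", 13.0),
]


def daily_average(readings):
    """Roll up (timestamp, value) pairs into one average per calendar day."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Truncate each timestamp to its calendar day to pick the bucket.
        day = datetime.fromisoformat(ts).date()
        buckets[day].append(value)
    # Emit the days in increasing time order.
    return {day.isoformat(): mean(values) for day, values in sorted(buckets.items())}


if __name__ == "__main__":
    for day, avg in daily_average(READINGS).items():
        print(day, avg)
```

Note that this version has to touch every row in the range, which is the large range scan described in the video; a time series database aims to make that scan and the per-bucket aggregation cheap through its storage layout and native rollup support.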