Transcript for:
BigQuery Pricing Overview

hello all welcome to Tech Capture so in this video we are going to discuss about very crucial topic in Google cloud that is about BigQuery pricing so you might heard about a lot of issues due to the unexpected billing in the BigQuery so I heard from lot of people that BigQuery pricing is complex it's difficult to understand and I get lot of requests from people to consult me on the BigQuery pricing or help them to reduce the BigQuery cost so I thought okay let's create one video where I could explain the bitquery pricing because once we understood the bitquery pricing we can take the preventive measure to avoid unnecessary cost instead of the corrective measures okay so that's why I started working on this uh bigquery pricing i detailed and gathered some information which I could showcase as a demo and then I also created one uh detailed blog on the bigquery pricing i will just add the link in the description of this video so mostly yeah this video will be for them maybe for the architect or data engineer who want to understand the bigquery pricing and this would be mostly a theoretical video conceptual video so there won't be much demo part or practical part here because this is kind of this topic is related to understanding how BigQuery pricing work as and when I will show you demo as and when I will show you artifact like how BigQuery pricing will work but mostly I'll just walk you through the BigQuery pricing different components how it cost how BigQuery or Google cloud charges for which operation it charges so all things I'll just navigate you through and I will try to explain you in a more better way okay so let's start and first see how BigQuery pricing works so BigQuery actually charges for couple of things actually four to five things actually so what are these things or what are these item so first and more major is BigQuery cost us for the storage okay so where our store the data how much data we are storing on BigQuery so definitely there is a free tier for storage as well so 10 GB per month is free for BigQuery storage pricing then the compute pricing means how much queries we are executing how much data we are analyzing so it's not about the number of queries it's about how much data you are processing let's say suppose you are having one table with one terabyte of data and you are executing one query so it's processing uh kind of if you're executing one query with select star from the table it's processing whole one TB of data okay so you executed one query only but you will be charged for whole 1 TB of data okay and let's say suppose on other hand you are has just having small table that is kind of megabytes okay only MBs of data there you even execute like 20 queries also it will not going to cost you that much apart from instead of that you will get charged for only single query on the very huge table so you have to take care how you are writing your query because select star always cost more than using select individual column names okay because query is a column database so second is a compute pricing we will go detail in each of these pricing layers so do not worry I'll just go in detail currently I'm just showing the topics like on which operation or which items BigQuery charges so for compute uh so it is 1TB per month is free so you can query 1TB of data per month which is free okay so that is for on demand pricing so in computer also there are some uh pricing models in storage also there are some pricing models so we'll see that in detail then it also charges for the streaming insert okay then it charges for the BI engine memory that is one of the feature we are having for the query acceleration then it also charges for the data transfer and exports okay so then now we'll talk about the storage pricing okay so just before before just uh jumping into the storage pricing okay so let me just go navigate you also to the BigQuery pricing page okay so just go to the BigQuery pricing page as well so this is BigQuery pricing page here you will see the detailed pricing now you can see here also uh how BigQuery is charging us you can see BigQuery pricing it's uh for the compute mainly it is for compute and storage and you can see BigQuery also charge for the operation like BigQuery Omni Bitquery ML BI engine read write all these things okay uh and there are free usage there which I mentioned already for the compute you can query 1TB of data for a free per month and you can store up to 10 GB of data free per month okay so we'll just take a reference from this page as well as a source of truth because this is uh official Google cloud documentation and whatever the pricing here would be the exact pricing in terms of numbers okay so let me just go back to our slide and as and when required we'll just come back to this page to have the reference and just to quickly check how you are utilizing the bigquery cost so let's go back to our slide now coming to our slide let's talk about storage pricing how BigQuery charges for the storage so BigQuery offer two primary storage option okay so how we are storing the data first is active storage let's say we are having some table some data which is frequently used okay so that is considered as active storage so how frequently so we can check it is 90 consecutive days so if your data is not used for 90 days it's considered as a long-term storage okay uh and the active storage it means your data is keep accessed throughout the 90 days okay and BigQuery automatically transition it to long-term storage if our data is not accessed for the 90 days and definitely there would be a different pricing for the active storage and long-term storage so just check what is the difference in pricing for the active and long-term storage i'll just go back to a BigQuery pricing page now here you can check the active storage long-term storage so you can clearly see the statement the pricing storage of the table automatically drops by 50% if it is not accessed for the 90 days so it will automatically go into the long-term storage mode and your pricing will drop so let's check this is the active the pricing is 0.02 GB per month and for the long term the pricing is 0.01 so which is completely half so if you are not accessing your data the pricing will be lower and if you are very frequently access data then your pricing will be more okay I mean for active data pricing will be more and for long-term storage data which is not accessed for 90 days pricing will be automatically lower so this is for the active and long-term storage but here you can see uh logical storage then physical storage what is this logical storage and physical storage okay so okay let's check that also what is logical storage and what is physical storage so let me just go back to our slide so for storage also this is kind of the storage option but for the billing also BigQuery offers two storage billing models okay so you can see here BigQuery offers two billing models one is for logical one is for physical physical means data is compressed and then stored on the disk so that is kind of physical storage logical is to your total size of the data and you can choose the billing model if you want a logical or physical but you should first understand which is which will give you the more benefit logical or physical okay and each of the table you will see what is the logical size of your data when it is stored and what is the physical size of your data when it is compressed and storage okay so I will show you that but you can see for logical storage building data is built based on the logical size of data it include indexes metadata and additional overhead it reflects uncompressed size of your data okay it's logically representative bigquery so it is kind of uncompressed data physical storage billing model so data is built based on the physical data which is compressed and then stored on the disk after compression this model may result lower storage cost but BigQuery user compression techniques to reduce physical subs of the data now definitely let's say suppose your logical size is 1GB your physical size might be around 500 MB or 400 MB because the data is compressed and then storage then you might think okay then why we need logical storage model if it is the less data then we will go with the physical storage model right but even the cloud providers are smart okay so they kept more pricing for the physical and less pricing for the logical uh I will just show you one example I have one table where I will try to show you both physical storage and logical storage and then try to show you the cost difference we'll just calculate the cost and then we'll see how much cost we have to pay for that table okay so let me just go to my Google cloud console now so I'm having uh this is my BigQuery project and let's say I will check one huge table so I am having one this tag table okay this if I go for this tag table so I will show you the details what is the logical size of the table because this table if I show you it is having more than 10 million records okay so I will show you so you can see the logical size is 18 GB but physical size is only 6 GB so it is more lower than your kind of the logical data okay then we'll just try to calculate the cost which is best suited for us logical or physical so I'll just go to the BigQuery pricing and check the logical price and physical price okay so I will check the active data okay so active logical price 0.02 per GB per month so I'll just open my calculator okay and just try to calculate So it is per okay so 0.02 per GB and we are having 18 GB of data so let's take the exact value 18.75 right so go back to Bitquery pricing 18.75 into 02 so how much is the value 0.375 for 80GB of data okay I'll just note it down somewhere i'll just note down here for physical oh sorry for logical we are paying this much dollar okay let's check for physical now because for physical our data size uh for physical this is the price 0.04 and data size is lower so physical size is 6.12 only so just clear it and check for 6.12 into uh what is the physical price active physical is 0.04 0.04 as if 0.248 physical okay so now you could understand even though the size or price is more for the physical uh we could save some cost here right so that's how we have to calculate case by case based on your data okay so sometimes if uh the compressed data is not making more difference then you can go with the uh logical as well okay so it depend upon the type of data you are having how complex data you are having okay but how we can check what is these The storage billing model is used for your data set so the storage billing model is on the data set level if you go here and just click on the edit details and go in advance option so you will see storage billing model unspecified so whenever it's unspecified by default it use the logical billing you can change it to logical or physical so by default it's logical you can change it to physical okay but I think it takes some time whenever you change it to physical around 24 hours of time it take to reflect your updated billing model okay so that's one thing you have to remember that you can choose your billing model at the data set level and physical is kind of lesser size but the cost is more so you have to choose accordingly which billing model you want logical storage billing model or a physical storage billing model hope this is clear what is active logical long-term logical active physical and long-term physical for both 10 GB free for each month now this is clear let's go back to our slide and check for the next pricing component okay so we are done with the storage pricing let's check what is next compute pricing so compute pricing is how much data we are quering uh BigQuery is a columnar database because uh yeah based on that it will cost us based on how much column we are querying instead of how much uh record we are quering so I will just show you one example okay let me just go back to Google cloud console i will take example of the same table okay this tag table so let go back and if I try to query select star from uh this table remove the limit 1,00 so this table is having 10 million records okay so whenever I'm using select star it is costing me for 18.75 GB it means for whole table it is costing me as a compute cost let's say I will just query one record limit one so it will query only one record still I will be charged for 18.75GB because a big query doesn't look how much records you are fetching it will look for how much columns you are fetching okay so let's say now I will instead of the select star I will fetch whole table data 10 million record but instead of all column I will just only fetch few columns let's say what columns okay i will just fetch two columns ID and title okay so instead of select star I will just write another query and here I will just use select ID comma title so here I'm fetching 10 million record but only ID and title so I will be only build for 600 MB and here let's say I'm just fetching limit one records so which should cost me more one record or 10 million records one record is costing me more because we are using select star and it will cost us 18.75GB but here we are fetching 10 million record but it is costing us very low that is for 684 MB so you understood most of the mistakes happening in the writing bigquery query that we should avoid using select star as it is costing us more we should not look for how much records it is fetching we should look for how much column it is fetching so this is how query columnar database works and how that query pricing is work for BigQuery okay hope you understood that now let's go back to the compute pricing and see how this pricing will work okay so let me just go back to the compute pricing okay now let's check in compute also we are having two pricing model one is on demand it means just pay for whatever you are using so under this model users are charged based on amount of data processed by each query we just saw in current query this is measured in TB so charge is cost is per TB so in our latest query we saw we were fetching 18 GB data for single query so it will be per month it is calculated how much terabyte of data you fetched and your costing will be based on that terabyte per unit okay so this approach is suitable for variable workload where query frequency and data volume can fluctuate so this is easy or easy to calculate however your query is running you will be charged on your query only so just like on demand pay for whatever you are using so key point consider cost calculation charges are applied per TB of data processed no upfront cost you don't need to pay additional thing just pay whatever you are using automatic scaling so resources are scaled what resources so each query execution happen on the compute at the back end so more you use more automatic scaling will happen at the back end consideration cost predictability cost can vary based on your query complexity when we want to optimize cost it recommend to structure our queries as we discussed instead of select star use the columns while designing the table use partitioning and clustering if you're not aware about what is partitioning and clustering I created a comprehensive guide what is partitioning and clustering with a detailed demo how we can create partition table and how we can create cluster table and how it reduce the cost how it improve the query performance i have explained everything in that demo so this is for ondemand pricing we have another pricing model here is flat rate pricing so here it this models allow users to purchase dedicated query processing capacity so I will give you simple example let's say you are going for a picnic and you are booking a hotel so you will pay only for the days you will be living there so let's say you are living there for 4 days you will pay only for 4 days but let's say now you know that you are going to stay there for 1 month so you can book hotel for whole 1 month and you can pay discounted price even though you are living in like 27 days 28 days but you already paid for 30 days so that 3 days amount of 3 days money is gone okay but you know that you are going to stay there minimum for 30 days maximum can be 35 36 you can afford that but if you are staying you know you know go stay there for minimum 30 days you can book hotel for 30 days at a discounted price so this on demand pricing is sometimes look we don't know how many go how many days you are going to stay so you can go with the on demand pricing flat rate pricing is sometimes you know there is a huge workload there is huge query going to run in your organization so you can see here this model allow user to purchase dedicated query processing capacity known as a slot so at flat monthly rate it's idle for organization with consistence high query workload seeking predictable cost so if it is predictable that okay I going to use this month of terabyte of quering then it's better you can just use or you can just use the flat rate pricing key point so users received specific number of slot ensure dedicated resource for query execution predictable billing so it's not like something unexpected come up because of the because you already purchased the flat rate pricing so you are going to pay only for that one whichever you purchased Okay and the reserved option you are reserving the slot sometimes so you will have the capacity automatically and definitely autoscaling will be there consideration like upfront commitment so you have to commit for a certain level of capacity as I mentioned I am committing for 30 days of my stay in hotel that's why they provided me the discounted price similar way here also happening if you are going with the flat rate pricing you are paying for the fixed capacity okay if you use less you already paid for the upfront cost okay so that is how your compute pricing for the on demand and flat rate pricing work you want to check the difference how much it is for on demand how much for flat rate or you can check on the pricing page so let me just go over to the pricing page so here we are on the on demand so you can see for on demand you are paying 6.25 TB uh 6.25 per TB 1tb per month is free so you are paying 6.25 per TB per month for this one and it might vary based on the regions minor okay for this region which is AWS region okay I don't want to check in AWS let's say this one so there might be slight differences yeah Europe is more US is less so based on your region your pricing will vary okay so while calculating cost let make sure you are using the correct region okay and uh so this is for the on demand let's check for the uh flat rate so let me check here flat rate pricing so flat rate pricing is based on the slot so if you are using the slot you are paying based on the slot slot is like computational unit you need for executing query or you can calculate the slot based on how many slots you are using you can calculate from the information schema metadata you can check how much slot you used for previous month based on that you can calculate your query slot okay so then you can decide you want to go with the flat rate pricing or uh uh what we can say on demand pricing also like if you already associated with the Google cloud you are already paying very high to the Google cloud you are partnered with Google cloud then Google also help you to understand which billing model is best suited for you okay so you can take their consultation as well in terms of choosing your billing model so hope this is clear on the compute size and then we are having some other uh pricing so I'll just walk you through here we considered uh storage we checked capacity then there are some data transfer so whenever you are transferring the data okay so for each data transfer there is associated cost for BigQuery Omni you can see the pricing if BigQuery Omni is used to connect your uh I mean you can query the data from AWS also you can query data from Azure so that is how you can use the BigQuery Omni so I already created a BigQuery omni video how we can carry the data from Amazon S3 into BigQuery so you can see the pricing here this is for per TB okay then for data injection there is a separate pricing if you are loading batch loading so it is free using the slot pool if it is streaming insert then there is certain pricing it is 0.0134 per 200 MB okay uh and for data extraction if you are doing the batch export it is free if you are using the streaming reads then it is kind of $1.43 per TB okay uh so then data replication pricing is also there so if it is the cross region replication or cross region turbo replication so if you can see here if you are copy operation in the same location US east one to US east one it is free replicate one time from bigquery to US multi-reion from US multi-reion to US central one it is also free because it is part of the US replicate from bigquery US multi-reion to other one you can see the following table so pricing is there it is like if it is cross region it is more costly so we can see if you moving data from US region to US what is the pricing US to North America what is the pricing uh then based on like if it is source location and destination you can see this table so this is your replication charges okay and uh for the turbo it is kind of the faster one here also you can check the pricing uh then what else we are having ML so BigQuery ML pricing is also here based on the model so you can see this is very high if you are using the regression model so model creation so it is like $400 per TB okay so this is very high cost so make sure you gone through the pricing before you start working as a BigQuery ML operations uh then here what are the free operations it is there batch load this is free copy data free export data free data delete operation free metadata operations are free then storage first 10 GB per month is free queries 1TB of data you can query for a free so this is kind of free operation so I think we covered most of the things okay uh let me just go back to the slide and let's see if anything is remaining oh I think yeah We covered everything so that's how the Bitquery pricing works if you still having any confusion just let me know i'll just try to help you out but yeah understanding Bitquery pricing before start using Bitquery before start designing the data warehouse on BigQuery is very crucial and is very important okay so thank you for watching this video this video got bit longer bit theoretical but yeah you are here till the end it means you definitely learned something and thank you for watching this video and just keep watching all the videos as well so we'll see you in one more next amazing video