Back of the envelope math is a very useful tool in a system design toolbox. In this video, we'll go over how and when to use it, and share some tips on using it effectively. Let's dive right in. Experienced developers use back of the envelope math to quickly sanity check a design. In these cases, absolute accuracy is not important.
Usually it is good enough to get within an order of magnitude or two of the actual numbers we are looking for. For example, if the math says that at our scale our web service needs to handle 1 million requests per second, and each web server can only handle about 10,000 requests per second, we learn two things quickly. One, we'll need a cluster of web servers with a load balancer in front of them. Two, we'll need about 100 web servers. Another example: if the math shows that the database needs to handle about 10 queries per second at peak, it means that a single database server can handle the load for a while, and there's no need to consider sharding or caching yet.
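As a quick sketch of that first sanity check in Python (the function name servers_needed is ours, purely for illustration), the server count is just the peak load divided by what one server can take, rounded up:

```python
import math


def servers_needed(peak_rps: int, rps_per_server: int) -> int:
    """Round up, because a fraction of a server still means one more machine."""
    return math.ceil(peak_rps / rps_per_server)


# Numbers from the example: 1 million requests per second at peak,
# about 10,000 requests per second per web server.
web_servers = servers_needed(1_000_000, 10_000)
print(web_servers)  # 100
```

Anything above one server already tells us we need a load balancer; the exact count matters much less than its order of magnitude.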
Now let's go over some of the most popular numbers to estimate. The most useful by far is requests per second at the service level or queries per second at the database level. Let's go over the common inputs and the request per second calculations. First input is DAU or daily active users.
This number should be easy to obtain. Sometimes the only available number would be monthly active users. In that case, estimate the DAU as a percentage of the MAU.
The second input is the estimate of the usage per DAU of the service we're designing for. For example, not everyone active on Twitter makes a post; only a percentage does. Somewhere around 10 to 25% seems reasonable.
Again, it doesn't have to be exact. Getting within an order of magnitude is usually fine. Now the third input is a scaling factor.
Usage rate for a service usually has peaks and valleys throughout the day. We need to estimate how much higher the traffic would peak compared to the average. This would reflect the estimated request per second peak where the design could potentially break.
For example, for a service like Google Maps, the usage rate during commute hours could be five times higher than average. Another example, for a ride-sharing service like Uber, weekend nights could have twice as many riders as average. Now let's go over an example. We'll estimate the number of tweets created per second on Twitter.
Note that these numbers are made up; they are not official numbers from Twitter. Let's assume Twitter has 300 million MAU and 50% of the MAU use Twitter daily, so that's about 150 million DAU. Next, we estimate that about 25% of Twitter DAU make tweets, and each one on average makes two tweets.
That is 25% times two, so this is 0.5 tweets per DAU. For the scaling factor, we estimate that most people tweet in the morning when they get up and can't wait to share what they dreamed about the night before. That spikes the tweets created per second to twice the average when the US East Coast wakes up, let's say. Now we have enough to calculate the peak tweets created per second. We have 150 million DAU, times 0.5 tweets per DAU, times a scaling factor of two, divided by 86,400 seconds in a day.
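Here is the same calculation as a small Python sketch. The function and variable names are ours, for illustration only, and the inputs are the made-up numbers from this example. We run it twice: once with the exact 86,400 seconds in a day, and once with the day rounded up to 100,000 seconds.

```python
SECONDS_PER_DAY = 86_400


def peak_per_second(dau, actions_per_dau, peak_factor,
                    seconds_per_day=SECONDS_PER_DAY):
    """Peak events per second = DAU x usage per DAU x scaling factor / seconds."""
    return dau * actions_per_dau * peak_factor / seconds_per_day


mau = 300_000_000
dau = mau * 0.5            # 50% of MAU are active daily: 150 million
tweets_per_dau = 0.25 * 2  # 25% of DAU tweet, two tweets each: 0.5

exact = peak_per_second(dau, tweets_per_dau, peak_factor=2)
rounded = peak_per_second(dau, tweets_per_dau, peak_factor=2,
                          seconds_per_day=100_000)

print(round(exact))    # 1736
print(round(rounded))  # 1500
```

The two answers differ by less than 20%, which is far inside the order-of-magnitude accuracy we are after.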
That is roughly 1,500 tweets created per second. Let's go over the techniques we use to simplify the calculations. First, we convert all big numbers to scientific notation.
Doing math on really big numbers is very error-prone. By converting big numbers to scientific notation, part of the multiplication becomes simple addition of exponents, and division becomes subtraction. In the example above, 150 million DAU becomes 150 times 10 to the sixth, or 1.5 times 10 to the eighth. There are 86,400 seconds in a day.
We round it up to 100,000 seconds, and that becomes 10 to the fifth seconds. And since it's a division, 10 to the fifth becomes 10 to the minus fifth. Next, we group all the powers of ten together, and then all the other numbers together.
So the math becomes 1.5 times 0.5 times 2, which is 1.5, and 10 to the eighth times 10 to the minus fifth, which equals 10 to the 8 minus 5, which is 10 to the third. Putting it all together, that's 1.5 times 10 to the third, or 1,500. Now, with practice, we should be able to convert a large number to scientific notation in seconds. And here are some handy conversions we should memorize. As an example, we should know by heart that 10 to the 12th is a trillion, or a terabyte.
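To make the grouping trick and the conversions concrete, here is a minimal Python sketch. The POWERS table and the multiply helper are our own illustrative names, and the table uses decimal units only.

```python
# Handy powers of ten worth memorizing (decimal units).
POWERS = {
    3: ("thousand", "kilobyte"),
    6: ("million", "megabyte"),
    9: ("billion", "gigabyte"),
    12: ("trillion", "terabyte"),
    15: ("quadrillion", "petabyte"),
}


def multiply(*terms):
    """Multiply numbers given as (mantissa, exponent) pairs.

    Each pair means mantissa * 10**exponent. The mantissas are multiplied
    together and the exponents are added. A division by 10**n is written
    as a term with exponent -n.
    """
    mantissa = 1.0
    exponent = 0
    for m, e in terms:
        mantissa *= m
        exponent += e
    return mantissa, exponent


# 150 million DAU x 0.5 tweets per DAU x 2 (peak) / 10**5 seconds in a day
m, e = multiply((1.5, 8), (0.5, 0), (2, 0), (1, -5))
print(m, e)        # 1.5 3
print(m * 10**e)   # 1500.0
print(POWERS[12])  # ('trillion', 'terabyte')
```

Keeping the small numbers and the exponents in two separate piles is what makes the math easy to do in our heads.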
And when we see a number like 50 terabytes, we should be able to convert it quickly to 50 times 10 to the 12th, which is 5 times 10 to the 13th. We're going to ignore the fact that 1 kilobyte is actually 2 to the 10th bytes, or 1,024 bytes, and not a thousand bytes. We don't need that degree of accuracy for back of the envelope math. Let's wrap up by going through one last example. We'll estimate how much storage is required for storing multimedia files for tweets. Taking the peak rate from the previous example as an upper bound, about 1,500 tweets per second over roughly 10 to the fifth seconds, we have about 150 million tweets per day.
Now we need an estimate of the percentage of tweets that contain multimedia content and how large those files are on average. With our meticulous research, we estimate that 10% of tweets contain pictures, at about 100 kilobytes each, and 1% of all tweets contain videos, at about 100 megabytes each. We further assume that the files are replicated with three copies each, and that Twitter will keep the media for five years.
Now here's the math. For storing pictures, we have the following. We have 150 million tweets per day, times 1 in 10 tweets with pictures, times 100 kilobytes per picture, times 400 days in a year (365 rounded up), times 5 years, times 3 copies. So that turns into 1.5 times 10 to the eighth, times 10 to the minus 1, times 10 to the fifth, times 4 times 10 to the second, times 5, times 3.
Again, we group the powers of ten together. The other numbers become 1.5 times 4 times 5 times 3, which is 90, and the exponents become 8 minus 1 plus 5 plus 2, which gives 10 to the 14th. So that is 90 times 10 to the 14th, or 9 times 10 to the 15th, which is, from the table, 9 petabytes. Now for storing videos, we take yet another shortcut. First, since videos on average are 100 megabytes each, while pictures are 100 kilobytes, a video is a thousand times bigger than a picture on average. Second, only 1% of tweets contain a video, while pictures appear in 10% of all tweets.
So videos are one tenth as popular. Putting the math together, the total video storage is a thousand times one tenth of the picture storage, which is 100 times 9 petabytes, or 900 petabytes. In conclusion, back of the envelope math is a very useful tool in our system design toolbox. Don't over-index on precision. Getting within an order of magnitude is usually enough to inform and validate our design.
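To double-check the storage numbers from that last example, here is a short Python sketch under the same made-up assumptions: 150 million tweets per day, 400 days in a year, 5 years of retention, and 3 copies. The function and constant names are ours, for illustration, and we stick to decimal units and whole numbers so the arithmetic stays exact.

```python
KB, MB, PB = 10**3, 10**6, 10**15  # decimal units, as in the estimate


def storage_bytes(tweets_per_day, one_in, bytes_per_file,
                  days_per_year=400, years=5, copies=3):
    """Total bytes stored when one in every `one_in` tweets carries a file."""
    files_per_day = tweets_per_day // one_in
    return files_per_day * bytes_per_file * days_per_year * years * copies


tweets_per_day = 150_000_000

pictures = storage_bytes(tweets_per_day, one_in=10, bytes_per_file=100 * KB)
videos = storage_bytes(tweets_per_day, one_in=100, bytes_per_file=100 * MB)

print(pictures // PB)      # 9
print(videos // PB)        # 900
print(videos // pictures)  # 100
```

The last line confirms the shortcut: videos are a thousand times bigger and one tenth as common, so video storage is 100 times picture storage.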
If you'd like to learn more about system design, check out our books and weekly newsletter. Please subscribe if you learned something new. Thank you so much and we'll see you next time.