Transcript for:
Comprehensive Guide to System Design

This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies. Haik will teach you the core concepts you need to know for a system design interview. This is a complete crash course on system design interview concepts that you need to know to ace your job interview.

The system design interview doesn't have much to do with coding. People don't want to see you write actual code, but rather how you glue an entire system together, and that is exactly what we're going to cover in this tutorial. We'll go through all of the concepts that you need to know to ace your job interview. Before designing large-scale distributed systems, it's important to understand the high-level architecture of the individual computer.

Let's see how different parts of the computer work together to execute our code. Computers function through a layered system, each layer optimized for varying tasks. At the core, computers understand only binary, zeros and ones, which are represented as bits.

One bit is the smallest data unit in computing; it can be either zero or one. One byte consists of 8 bits and it's used to represent a single character like A or a number like 1. Expanding from here we have kilobytes, megabytes, gigabytes and terabytes. To store this data, we have computer disk storage, which holds the primary data. It can be either HDD or SSD type. The disk storage is non-volatile.

It maintains data without power, meaning if you turn off or restart the computer, the data will still be there. It contains the OS, applications and all user files. In terms of size, disks typically range from hundreds of gigabytes to multiple terabytes.

While SSDs are more expensive, they offer significantly faster data retrieval than HDDs. For instance, an SSD may have a read speed of 500 to 3,500 MB/s, while an HDD might offer 80 to 160 MB/s. The next immediate access point after disk is the RAM, or Random Access Memory.

RAM serves as the primary active data holder and it holds data structures, variables and application data that are currently in use or being processed. When a program runs, its variables, intermediate computations, runtime stack and more are stored in RAM because it allows for quick read and write access. This is volatile memory, which means that it requires power to retain its contents, and after you restart the computer, the data is not persisted. In terms of size, RAM ranges from a few GB in consumer devices to hundreds of GB in high-end servers.

Their read-write speed often surpasses 5000 MB per second, which is faster than even the fastest SSD's disk speed. But sometimes even this speed isn't enough, which brings us to the cache. The cache is smaller than RAM, typically it's measured in MB.

But access times for cache memory are even faster than RAM, often just a few nanoseconds for the L1 cache. The CPU first checks the L1 cache for the data. If it's not found, it checks the L2 and L3 cache, and then finally it checks the RAM.
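The lookup order described above (L1, then L2, then L3, then RAM) can be sketched as a small simulation. This is only an illustration: the latency numbers and the `lookup` helper are hypothetical, and real hardware does this in silicon rather than in software.

```python
# Hypothetical per-level latencies in nanoseconds; real values vary by hardware.
LEVELS = [("L1", 1), ("L2", 4), ("L3", 40), ("RAM", 100)]

def lookup(address, contents):
    """Check each level in order and return (level_found, total_time_ns).

    `contents` maps a level name to the set of addresses it currently holds.
    """
    elapsed = 0
    for name, latency_ns in LEVELS:
        elapsed += latency_ns  # pay the cost of checking this level
        if address in contents.get(name, set()):
            return name, elapsed
    raise KeyError(f"address {address!r} not found in any level")

contents = {
    "L1": {0x10},
    "L2": {0x10, 0x20},
    "L3": {0x10, 0x20, 0x30},
    "RAM": {0x10, 0x20, 0x30, 0x40},
}
print(lookup(0x10, contents))  # served from L1
print(lookup(0x40, contents))  # falls all the way through to RAM
```

The further down the hierarchy a lookup has to go, the more time it accumulates, which is why keeping frequently used data in the upper levels lowers the average access time.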

The purpose of a cache is to reduce the average time to access data. That's why we store frequently used data here, to optimize CPU performance. And what about the CPU?

CPU is the brain of the computer. It fetches, decodes and executes instructions. When you run your code, it's the CPU that processes the operations defined in that program.

But before it can run our code, which is written in high-level languages like Java, C++, Python or other languages, our code first needs to be translated into machine code. A compiler (or an interpreter, in the case of languages like Python) handles this translation, and once the code is in machine form, the CPU can execute it. It can read from and write to our RAM, disk and cache.

And finally we have the motherboard or mainboard, which is what you might think of as the component that connects everything. It provides the pathways that allow data to flow between these components.

Now let's have a look at a very high level architecture of a production ready app. Our first key area is the CI/CD pipeline, continuous integration and continuous deployment. This ensures that our code goes from the repository, through a series of tests and pipeline checks, and onto the production server without any manual intervention.

It's configured with platforms like Jenkins or GitHub Actions for automating our deployment processes. And once our app is in production it has to handle lots of user requests. This is managed by our load balancers and reverse proxies like NGINX. They ensure that the user requests are evenly distributed across multiple servers, maintaining a smooth user experience even during traffic spikes.

Our server is also going to need to store data. For that we have an external storage server that is not running on the same production server. Instead, it's connected over a network. Our servers might also be communicating with other servers, and we can have many such services, not just one. To ensure everything runs smoothly, we have logging and monitoring systems keeping a keen eye on every micro-interaction, storing logs and analyzing data. It's standard practice to store logs on external services, often outside of our primary production server.

For the backend, tools like PM2 can be used for logging and monitoring. On the frontend, platforms like Sentry can be used to capture and report errors in real time. And when things don't go as planned, meaning our logging systems detect failing requests or anomalies, first they inform our alerting service. After that, push notifications are sent to keep users informed, from a generic "something went wrong" to a specific "payment failed". And a modern practice is to integrate these alerts directly into platforms we commonly use, like Slack.

Imagine a dedicated Slack channel where alerts pop up at the moment an issue arises. This allows developers to jump into action almost instantly, addressing the root cause before it escalates. And after that, developers have to debug the issue. First and foremost, the issue needs to be identified.

Those logs we spoke about earlier, they are our first port of call. Developers go through them searching for patterns or anomalies that could point to the source of the problem. After that it needs to be replicated in a safe environment.

The golden rule is to never debug directly in the production environment. Instead, developers recreate the issue in a staging or test environment. This ensures users don't get affected by the debugging process. Then developers use tools to peer into the running application and start debugging. Once the bug is fixed, a hotfix is rolled out.

This is a quick temporary fix designed to get things running again; it's like a patch before a more permanent solution can be implemented.

In this section let's understand the pillars of system design and what it really takes to create a robust and resilient application. Now before we jump into the technicalities, let's talk about what actually makes a good design. When we talk about good design in system architecture, we are really focusing on a few key principles. Scalability, which is how well our system grows with its user base.

Maintainability, which is ensuring future developers can understand and improve our system. And efficiency, which is making the best use of our resources. But good design also means planning for failure and building a system that not only performs well when everything is running smoothly, but also maintains its composure when things go wrong.

At the heart of system design are three key elements: moving data, storing data, and transforming data. Moving data is about ensuring that data can flow seamlessly from one part of our system to another. Whether it's user requests reaching our servers or data transfers between databases, we need to optimize for speed and security. Storing data isn't just about choosing between SQL or NoSQL databases.

It's about understanding access patterns, indexing strategies, and backup solutions. We need to ensure that our data is not only stored securely, but is also readily available when needed. And data transformation is about taking raw data and turning it into meaningful information.

This could be aggregating log files for analysis or converting user input into a different format.

Now let's take a moment to understand a crucial concept in system design: the CAP theorem, also known as Brewer's theorem, named after computer scientist Eric Brewer. This theorem is a set of principles that guide us in making informed trade-offs between three key components of a distributed system: consistency, availability, and partition tolerance.

Consistency ensures that all nodes in the distributed system have the same data at the same time. If you make a change to one node, that change should also be reflected across all nodes. Think of it like updating a Google Doc: if one person makes an edit, everyone else sees that edit immediately. Availability means that the system is always operational and responsive to requests, regardless of what might be happening behind the scenes. Like a reliable online store, no matter when you visit, it's always open and ready to take your order.

And partition tolerance refers to the system's ability to continue functioning even when a network partition occurs. Meaning if there is a disruption in communication between nodes, the system still works. It's like having a group chat where even if one person loses connection, the rest of the group can continue chatting.

And according to the CAP theorem, a distributed system can only achieve two out of these three properties at the same time. If you prioritize consistency and partition tolerance, you might have to compromise on availability, and vice versa. For example, a banking system needs to be consistent and partition tolerant to ensure financial accuracy, even if it means some transactions take longer to process, temporarily compromising availability.

So every design decision comes with trade-offs. For example, a system optimized for read operations might perform poorly on write operations.

Or in order to gain performance we might have to accept a bit more complexity. So it's not about finding the perfect solution, it's about finding the best solution for our specific use case. And that means making informed decisions about where we can afford to compromise.

So one important measurement of a system is availability. This is the measure of a system's operational performance and reliability.

When we talk about availability, we are essentially asking: is our system up and running when our users need it? This is often measured as a percentage, aiming for that golden five nines (99.999%) availability. Let's say we are running a critical service with 99.9% availability.

That allows for around 8.76 hours of downtime per year. But if we add two nines to it, we are talking just about 5 minutes of downtime per year. And that's a massive difference, especially for services where every second counts.
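The downtime figures above follow directly from the availability percentage. Here is a minimal sketch of that arithmetic, assuming a 365-day year; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years

def downtime_hours_per_year(availability_percent):
    """Hours per year the system may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

for pct in (99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

At 99.9% this works out to about 8.76 hours per year, and at 99.999% to a little over 5 minutes, matching the numbers mentioned above.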

We often measure it in terms of uptime and downtime, and here is where service level objectives and service level agreements come into play. SLOs are like setting goals for our system's performance and availability. For example, we might set an SLO stating that our web service should respond to requests within 300 milliseconds 99.9% of the time.
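An SLO like the one in this example can be checked against measured latencies. The sketch below is one possible way to do it; the `meets_slo` helper and the sample data are made up for illustration, and a real system would usually pull these numbers from its monitoring stack.

```python
def meets_slo(latencies_ms, threshold_ms=300, target_fraction=0.999):
    """Return True if enough requests finished within the latency threshold."""
    if not latencies_ms:
        return True  # no traffic means nothing violated the objective
    within = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return within / len(latencies_ms) >= target_fraction

# Hypothetical measurements: 9,995 fast requests and 5 slow ones.
healthy = [120] * 9995 + [450] * 5
# Hypothetical measurements: 20 slow requests out of 10,000.
degraded = [120] * 9980 + [450] * 20

print(meets_slo(healthy))   # 99.95% within 300 ms, objective met
print(meets_slo(degraded))  # 99.80% within 300 ms, objective missed
```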

SLAs, on the other hand, are like formal contracts with our users or customers. They define the minimum level of service we are committing to provide. So if our SLA guarantees 99.99% availability and we drop below that, we might have to provide refunds or other compensations to our customers. Building resilience into our system means expecting the unexpected. This could mean implementing redundant systems, ensuring there is always a backup ready to take over in case of failure.

Or it could mean designing our system to degrade gracefully so even if certain features are unavailable, the core functionality remains intact. To measure this aspect we use reliability, fault tolerance, and redundancy. Reliability means ensuring that our system works correctly and consistently.

Fault tolerance is about preparing for when things go wrong, how does our system handle unexpected failures or attacks. And redundancy is about having backups, ensuring that if one part of our system fails, there is another ready to take its place. We also need to measure the speed of our system, and for that we have throughput and latency.

Throughput measures how much data our system can handle over a certain period of time. We have server throughput which is measured in requests per second. This metric provides an indication of how many client requests a server can handle in a given time frame.

A higher RPS value typically indicates better performance and the ability to handle more concurrent users. We have database throughput which is measured in queries per second. This quantifies the number of queries a database can process in a second.

Like server throughput, a higher QPS value usually signifies better performance. And we also have data throughput, which is measured in bytes per second. This reflects the amount of data transferred over a network or processed by a system in a given period of time.

On the other hand, latency measures how long it takes to handle a single request. It's the time it takes for a request to get a response. And optimizing for one can often lead to sacrifices in the other, for example, batching operations can increase throughput but might also increase latency.

And designing a system poorly can lead to a lot of issues down the line, from performance bottlenecks to security vulnerabilities. And unlike code, which can be refactored easily, redesigning a system can be a monumental task. That's why it's crucial to invest time and resources into getting the design right from the start, laying a solid foundation that can support the weight of future features and user growth.

Now let's talk about networking basics. When we talk about networking basics, we are essentially discussing how computers communicate with each other.

At the heart of this communication is the IP address, a unique identifier for each device on a network. IPv4 addresses are 32-bit, which allows for approximately 4 billion unique addresses. However, with the increasing number of devices, we are moving to IPv6, which uses 128-bit addresses, significantly increasing the number of available unique addresses. When two computers communicate over a network, they send and receive packets of data. And each packet contains an IP header, which contains essential information, like the sender's and receiver's IP addresses, ensuring that the data reaches the correct destination.
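The IPv4 and IPv6 address sizes mentioned above can be checked with Python's standard `ipaddress` module. The two addresses used here come from ranges reserved for documentation and are only examples.

```python
import ipaddress

# Size of each address space, derived from the number of bits per address.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128
print(f"IPv4 addresses: {ipv4_total:,}")   # roughly 4.3 billion
print(f"IPv6 addresses: {ipv6_total:.3e}")

# Example addresses from documentation-reserved ranges.
addr4 = ipaddress.ip_address("192.0.2.1")
addr6 = ipaddress.ip_address("2001:db8::1")
for addr in (addr4, addr6):
    print(addr, "is IPv", addr.version, "with", addr.max_prefixlen, "bits")
```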

This process is governed by the Internet Protocol, which is a set of rules that defines how data is sent and received. Besides the IP layer, we also have the application layer, where data specific to the application protocol is stored. The data in these packets is formatted according to a specific application protocol, like HTTP for web browsing, so that the data is interpreted correctly by the receiving device.

Once we understand the basics of IP addressing and data packets, we can dive into transport layer, where TCP and UDP come into play. TCP operates at the transport layer and ensures reliable communication. It's like a delivery guy who makes sure that your package not only arrives, but also checks that nothing is missing.

So each data packet also includes a TCP header, which is carrying essential information like port numbers and control flags necessary for managing the connection and data flow. TCP is known for its reliability, it ensures the complete and correct delivery of data packets. It accomplishes this through features like sequence numbers, which keep track of the order of packets.

And the process known as the three-way handshake, which establishes a stable connection between two devices. In contrast, UDP is faster, but less reliable than TCP. It doesn't establish a connection before sending data and doesn't guarantee the delivery or order of the packets. But this makes UDP preferable for time-sensitive communications like video calls or live streaming, where speed is crucial and some data loss is acceptable. To tie all these concepts together, let's talk about DNS, Domain Name System.

DNS acts like the internet's phonebook, translating human-friendly domain names into IP addresses. When you enter a URL in your browser, the browser sends a DNS query to find the corresponding IP address, allowing it to establish a connection to the server and retrieve the webpage. The functioning of DNS is overseen by ICANN, which coordinates the global IP address space and domain name system.

And domain name registrars like Namecheap or GoDaddy are accredited by ICANN to sell domain names to the public. DNS uses different types of records, like A records, which map the domain to its corresponding IP address, ensuring that your request reaches the correct server. Or AAAA records, which map a domain name to an IPv6 address. And finally, let's talk about the networking infrastructure which supports all this communication.

Devices on a network have either public or private IP addresses. Public IP addresses are unique across the Internet, while private IP addresses are unique within a local network. An IP address can be static, permanently assigned to a device, or dynamic, changing over time. Dynamic IP addresses are commonly used for residential Internet connections. And devices connected in a local area network can communicate with each other directly.

And to protect these networks we use firewalls, which monitor and control incoming and outgoing network traffic. And within a device, specific processes or services are identified by ports, which when combined with an IP address create a unique identifier for a network service. Some ports are reserved for specific protocols, like 80 for HTTP or 22 for SSH.

Now let's cover all the essential application layer protocols. The most common protocol out of these is HTTP, which stands for Hypertext Transfer Protocol and is built on TCP/IP. It's a request-response protocol, but imagine it as a conversation with no memory.

Each interaction is separate, with no recollection of the past. This means that the server doesn't have to store any context between requests; instead, each request contains all the necessary information. And notice how the headers include details like URL and method, while the body carries the substance of the request or response. Each response also includes the status code, which provides feedback about the result of a client's request on a server. For instance, 200 series are success codes. These indicate that the request was successfully received and processed.

300 series are redirection codes. These signify that further action needs to be taken by the user agent in order to fulfill the request. 400 series are client error codes.

These are used when the request contains bad syntax or cannot be fulfilled. And 500 series are server error codes. This indicates that something went wrong on the server. We also have a method on each request.
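The status code families described above can be told apart by the first digit of the code. A minimal sketch, using the standard `http.HTTPStatus` enum for a few well-known codes; the `status_class` helper is hypothetical.

```python
from http import HTTPStatus

def status_class(code):
    """Map a status code to the family described above, based on its first digit."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    return classes.get(int(code) // 100, "other")

for code in (HTTPStatus.OK, HTTPStatus.MOVED_PERMANENTLY,
             HTTPStatus.NOT_FOUND, HTTPStatus.INTERNAL_SERVER_ERROR):
    print(int(code), code.phrase, "->", status_class(code))
```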

The most common methods are GET, POST, PUT, PATCH and DELETE. GET is used for fetching data, POST is usually for creating data on the server, PUT and PATCH are for updating a record, and DELETE is for removing a record from the database. HTTP is a one-way connection, but for real-time updates we use WebSockets, which provide a two-way communication channel over a single long-lived connection, allowing servers to push real-time updates to clients. This is very important for applications requiring constant data updates without the overhead of repeated HTTP request-response cycles.

It is commonly used for chat applications, live sport updates, or stock market feeds where the action never stops and neither does the conversation. From email-related protocols, SMTP is the standard for email transmission over the internet. It is the protocol for sending email messages between servers.

Most email clients use SMTP for sending emails and either IMAP or POP3 for retrieving them. IMAP is used to retrieve emails from a server allowing a client to access and manipulate messages. This is ideal for users who need to access their emails from multiple devices.

POP3 is used for downloading emails from a server to a local client, typically used when emails are managed from a single device. Moving on to file transfer and management protocols, the traditional protocol for transferring files over the internet is FTP, which is often used in website maintenance and large data transfers. It is used for the transfer of files between a client and server, useful for uploading files to server or backing up files. And we also have SSH or secure shell, which is for operating network services securely on an unsecured network. It's commonly used for logging into a remote machine and executing commands or transferring files.

There are also real-time communication protocols like WebRTC, which enables browser-to-browser applications for voice calling, video chat and file sharing without internal or external plugins. This is essential for applications like video conferencing and live streaming. Another one is MQTT, which is a lightweight messaging protocol, ideal for devices with limited processing power and in scenarios requiring low bandwidth, such as IoT devices. And AMQP is a protocol for message-oriented middleware, providing robustness and security for enterprise-level message communication. For example, it is used in tools like RabbitMQ.

Let's also talk about RPC which is a protocol that allows a program on one computer to execute code on a server or another computer. It's a method used to invoke a function as if it were a local call when in reality the function is executed on a remote machine. So it abstracts the details of the network communication allowing the developer to interact with remote functions seamlessly as if they were local to the application.
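To illustrate the idea of calling a remote function as if it were local, here is a toy sketch. Everything in it is hypothetical: the "network" is just a direct function call in the same process, and JSON is used as the wire format only for readability. Real RPC frameworks add transport, error handling and schemas on top of this.

```python
import json

# "Server" side: functions that are allowed to be invoked remotely.
def add(a, b):
    return a + b

REGISTRY = {"add": add}

def handle_request(raw):
    """Decode a request, run the named function, and encode the result."""
    request = json.loads(raw)
    result = REGISTRY[request["method"]](*request["params"])
    return json.dumps({"result": result})

# "Client" side: a proxy that turns attribute access into serialized requests.
class RemoteProxy:
    def __init__(self, transport):
        self._transport = transport  # callable that delivers a request string

    def __getattr__(self, name):
        def call(*params):
            raw = json.dumps({"method": name, "params": list(params)})
            return json.loads(self._transport(raw))["result"]
        return call

remote = RemoteProxy(handle_request)  # in-process stand-in for a network hop
print(remote.add(2, 3))  # looks like a local call, but goes through serialization
```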

And many application layer protocols use RPC mechanisms to perform their operations, for example in web services, HTTP requests can result in RPC calls being made on backend to process data or perform actions on behalf of the client. Or SMTP servers might use RPC calls internally to process email messages or interact with databases. Of course, there are numerous other application layer protocols, but the ones covered here are among the most commonly used and essential for web development.

In this section let's go through API design, starting from the basics and advancing towards the best practices that define exceptional APIs. Let's consider an API for an eCommerce platform like Shopify, which, if you are not familiar with it, is a well-known eCommerce platform that allows businesses to set up online stores. In API design we are concerned with defining the inputs, like product details for a new product, which are provided by a seller, and the outputs, like the information returned when someone queries a product through the API.

So the focus is mainly on defining how the CRUD operations are exposed to the user interface. CRUD stands for Create, Read, Update and Delete, which are basic operations of any data-driven application. For example, to add a new product, we need to send a POST request to /api/products, where the product details are sent in the request body. To retrieve these products, we need to send a GET request to /api/products. For updating, we use PUT or PATCH requests to /products/:id, where :id is the ID of that product.

And removing is similar to updating. It's again /products/:id, with the ID of the product we need to remove. And similarly we might also have another GET request to /products/:id, which fetches a single product.
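The CRUD endpoints above can be sketched as plain functions over an in-memory dictionary. This is only an illustration: there is no web framework here, the function names are made up, the paths in the docstrings assume a common /api prefix, and the specific status codes (201, 204, 404) are common conventions rather than something the endpoints above prescribe.

```python
import itertools

products = {}                   # in-memory stand-in for a database table
_next_id = itertools.count(1)   # generates 1, 2, 3, ...

def create_product(body):
    """POST /api/products"""
    product_id = next(_next_id)
    products[product_id] = {"id": product_id, **body}
    return 201, products[product_id]

def list_products():
    """GET /api/products"""
    return 200, list(products.values())

def get_product(product_id):
    """GET /api/products/:id"""
    if product_id not in products:
        return 404, {"error": "product not found"}
    return 200, products[product_id]

def update_product(product_id, changes):
    """PATCH /api/products/:id"""
    if product_id not in products:
        return 404, {"error": "product not found"}
    products[product_id].update(changes)
    return 200, products[product_id]

def delete_product(product_id):
    """DELETE /api/products/:id"""
    if products.pop(product_id, None) is None:
        return 404, {"error": "product not found"}
    return 204, None

status, mug = create_product({"name": "Mug", "price": 12})
print(status, mug)
```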

Another part is to decide on the communication protocol that will be used, like HTTP, WebSockets or other protocols, and the data transport mechanism, which can be JSON, XML or Protocol Buffers. This is usually the case for RESTful APIs, but we also have GraphQL and gRPC paradigms. So, APIs come in different paradigms, each with its own set of protocols and standards.

The most common one is REST, which stands for Representational State Transfer. It is stateless, which means that each request from a client to a server must contain all the information needed to understand and complete the request. It uses the standard HTTP methods GET, POST, PUT, and DELETE. And it's easily consumable by different clients, browsers, or mobile apps.

The downside of RESTful APIs is that they can lead to overfetching or underfetching of data, because more endpoints may be required to access specific data. And usually RESTful APIs use JSON for data exchange. On the other hand, GraphQL APIs allow clients to request exactly what they need, avoiding overfetching and underfetching of data.

They have strongly typed queries, but complex queries can impact server performance. All the requests are sent as POST requests, and a GraphQL API typically responds with an HTTP 200 status code, even in case of errors, with error details in the response body. gRPC stands for Google Remote Procedure Call. It is built on HTTP/2, which provides advanced features like multiplexing and server push. It uses Protocol Buffers, which is a way of serializing structured data.

And because of that it's efficient in terms of bandwidth and resources, especially suitable for microservices. The downside is that it's less human readable compared to JSON and it requires HTTP/2 support to operate. In an ecommerce setting you might have relationships like user to orders or orders to products, and you need to design endpoints to reflect these relationships. For example, to fetch the orders for a specific user, you send a GET request to /users/:userId/orders.

Common query parameters also include limit and offset for pagination, or start and end date for filtering products within a certain date range. This allows users or the client to retrieve specific sets of data without overwhelming the system. A well-designed GET request should be idempotent, meaning calling it multiple times doesn't change anything on the server and it returns the same result each time. And GET requests should never mutate data; they are meant only for retrieval. If you need to update or create data, you need to do a PUT or POST request. When modifying endpoints it's important to maintain backward compatibility, which means that we need to ensure that changes don't break existing clients.

A common practice is to introduce new versions, like /v2/products, so that the version 1 API can still serve the old clients while the version 2 API serves the current clients. This is in the case of RESTful APIs. In the case of GraphQL APIs, adding new fields like v2 fields without removing the old ones helps in evolving the API without breaking existing clients.
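Going back to the limit and offset query parameters mentioned earlier, here is a minimal sketch of how a server might apply them to a list of results. The `paginate` helper and the sample order names are hypothetical, and a real API would usually push limit and offset down into the database query instead of slicing in memory.

```python
def paginate(items, limit=10, offset=0):
    """Return one page of items plus the metadata a client needs to continue."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    page = items[offset:offset + limit]
    return {"items": page, "limit": limit, "offset": offset, "total": len(items)}

# Hypothetical data: 25 orders belonging to one user.
orders = [f"order-{n}" for n in range(1, 26)]

# Roughly what GET /users/:userId/orders?limit=10&offset=10 could return.
second_page = paginate(orders, limit=10, offset=10)
print(second_page["items"][0], "...", second_page["items"][-1])
```

Because the function only reads from the list, calling it repeatedly with the same arguments returns the same page, which fits the idempotency expected of GET requests.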

Another best practice is to set rate limits. This can protect the API from DDoS attacks. Rate limiting is used to control the number of requests a user can make in a certain timeframe, and it prevents a single user from sending too many requests to your API. A common practice is to also set CORS settings, which stands for cross-origin resource sharing.
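Rate limiting as described above can be implemented in several ways. The sketch below uses a fixed-window counter, which is just one simple option chosen for illustration; the class name and parameters are hypothetical, and production systems often keep these counters in a shared store and expire old windows.

```python
import time
from collections import defaultdict

class FixedWindowRateLimiter:
    """Allow at most `max_requests` per user in each window of `window_seconds`."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock                 # injectable so the example is deterministic
        self._counts = defaultdict(int)    # (user, window index) -> request count

    def allow(self, user_id):
        window = int(self.clock() // self.window_seconds)
        key = (user_id, window)
        if self._counts[key] >= self.max_requests:
            return False                   # over the limit for this window
        self._counts[key] += 1
        return True

# Use a fake clock so the behaviour does not depend on real time.
now = [0.0]
limiter = FixedWindowRateLimiter(max_requests=3, window_seconds=60, clock=lambda: now[0])

first_window = [limiter.allow("alice") for _ in range(5)]
print(first_window)            # three allowed, then two rejected

now[0] = 61.0                  # move into the next window
print(limiter.allow("alice"))  # allowed again
```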

With CORS (cross-origin resource sharing) settings you can control which domains can access your API, preventing unwanted cross-site interactions.

Now imagine a company is hosting a website on a server in Google Cloud data centers in Finland. It may take around 100 milliseconds to load for users in Europe, but it takes 3-5 seconds to load for users in Mexico.

Fortunately, there are strategies to minimize this request latency for users who are far away. These strategies are called caching and content delivery networks, which are two important concepts in modern web development and system design. Caching is a technique used to improve the performance and efficiency of a system.

It involves storing a copy of certain data in a temporary storage, so that future requests for that data can be served faster. There are four common places where cache can be stored. The first one is browser caching, where we store website resources on a user's local computer. So when a user revisits a site, the browser can load the site from the local cache, rather than fetching everything from the server again.

Users can disable caching by adjusting the browser settings. In most browsers, developers can disable the cache from the developer tools. For instance, in Chrome, we have the Disable Cache option in the Developer Tools Network tab.

Browser caches store HTML, CSS, and JS bundle files on the user's local machine, typically in a dedicated cache directory on the hard drive managed by the browser. We use the Cache-Control header to tell the browser how long this content should be cached. For example here the Cache-Control is set to 7200 seconds, which is equivalent to 2 hours. When the requested data is found in the cache we call that a cache hit, and on the other hand we have a cache miss, which happens when the requested data is not in the cache, necessitating a fetch from the original source.
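To make the Cache-Control example concrete, here is a small sketch that reads the max-age value from a header string and decides whether a cached copy is still fresh. Both helper functions are hypothetical and deliberately simplified; real browsers also honor other directives such as no-store and no-cache.

```python
def max_age_seconds(cache_control):
    """Extract the max-age value from a Cache-Control header, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None

def is_fresh(age_seconds, cache_control):
    """A cached copy is fresh while its age is below max-age."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age

header = "public, max-age=7200"      # 7200 seconds is 2 hours
print(max_age_seconds(header))
print(is_fresh(3600, header))        # cached one hour ago
print(is_fresh(3 * 3600, header))    # cached three hours ago
```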

And the cache hit ratio is the percentage of requests that are served from the cache compared to all requests, and a higher ratio indicates a more effective cache. You can check if the cache was hit or missed from the X-Cache header. For example, in this case it says miss, so the cache was missed, and in case the cache is found we will have hit here.

We also have server caching, which involves storing frequently accessed data on the server side, reducing the need to perform expensive operations like database queries. Server-side caches are stored on a server or on a separate cache server, either in memory like Redis or on disk. Typically, the server checks the cache for the data before querying the database. If the data is in the cache, it is returned directly. Otherwise, the server retrieves it from the database, returns it to the user, and then stores it in the cache for future requests.
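The read path described above (check the cache first, fall back to the database, then store the result) can be sketched as follows. Plain dictionaries stand in for both the cache and the database, and the class name is hypothetical. The sketch also counts hits and misses so the hit ratio can be computed.

```python
class CachedReader:
    """Serve reads from a cache when possible, falling back to the database."""

    def __init__(self, database):
        self.database = database   # dict standing in for a real database
        self.cache = {}            # dict standing in for Redis or similar
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1         # cache hit: no database work needed
            return self.cache[key]
        self.misses += 1           # cache miss: query the database
        value = self.database[key]
        self.cache[key] = value    # store it for future requests
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

reader = CachedReader({"user:1": "Ada"})
reader.get("user:1")   # first read misses and fills the cache
reader.get("user:1")   # second read is served from the cache
print(reader.hits, reader.misses, reader.hit_ratio())
```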

The flow just described is the case of a write-around cache, where data is written directly to permanent storage, bypassing the cache. It is used when write performance is less critical. We also have write-through cache, where data is simultaneously written to the cache and the permanent storage.

It ensures data consistency but can be slower than write-around cache. And we also have write-back cache, where data is first written to the cache and then to permanent storage at a later time. This improves write performance, but you risk losing that data if the server crashes.
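The difference between write-through and write-back can be seen in a short sketch. Dictionaries stand in for the cache and the permanent storage, and the class names and the explicit `flush` method are assumptions made for illustration; real systems flush on a timer or on eviction.

```python
class WriteThroughCache:
    """Every write goes to the cache and to permanent storage together."""

    def __init__(self, storage):
        self.storage = storage
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.storage[key] = value   # storage is always up to date


class WriteBackCache:
    """Writes land in the cache first and reach storage only on flush."""

    def __init__(self, storage):
        self.storage = storage
        self.cache = {}
        self.dirty = set()          # keys not yet written to storage

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.storage[key] = self.cache[key]
        self.dirty.clear()


through_store, back_store = {}, {}
through = WriteThroughCache(through_store)
back = WriteBackCache(back_store)

through.write("cart:1", ["mug"])
back.write("cart:1", ["mug"])
print(through_store)   # already persisted
print(back_store)      # still empty: a crash now would lose the write

back.flush()
print(back_store)      # persisted after the delayed write
```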

But what happens if the cache is full and we need to free up some space to use our cache again? For that we have eviction policies which are rules that determine which items to remove from the cache when it's full. Common policies are to remove least recently used ones or first in first out where we remove the ones that were added first or removing the least frequently used ones.
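Of the eviction policies above, least recently used (LRU) is a common choice. Here is a minimal sketch built on the standard library's `OrderedDict`; the class itself and its `keys` helper are hypothetical and kept small for readability.

```python
from collections import OrderedDict

class LRUCache:
    """Keep at most `capacity` items, evicting the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()   # oldest entries first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def keys(self):
        return list(self._items)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now more recently used than "b"
cache.put("c", 3)     # cache is full, so "b" is evicted
print(cache.keys())
```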

Database caching is another crucial aspect, and it refers to the practice of caching database query results to improve the performance of database-driven applications. It is often done either within the database system itself or via an external caching layer like Redis or Memcached. When a query is made, we first check the cache to see if the result of that query has been stored. If it has, we return the cached data, avoiding the need to execute the query against the database.

But if the data is not found in the cache the query is executed against the database and the result is stored in the cache for future requests. This is beneficial for read-heavy applications where some queries are executed frequently. And we use the same eviction policies as we have for server-side caching. Another type of caching is CDNs which are a network of servers distributed geographically. They are generally used to serve static content such as JavaScript, HTML, CSS, or image and video files.

They cache the content from the original server and deliver it to users from the nearest CDN server. When a user requests a file, like an image or a website, the request is redirected to the nearest CDN server. If the CDN server has the cached content, it delivers it to the user.

If not, it fetches the content from the origin server, caches it, and then forwards it to the user. This is the pull-based type of CDN, where the CDN automatically pulls the content from the origin server when it's first requested by a user. It's ideal for websites with a lot of static content that is updated regularly.

It requires less active management because the CDN automatically keeps the content up to date. Another type is push-based CDNs. This is where you upload the content to the origin server and then it distributes these files to the CDNs.

This is useful when you have large files that are infrequently updated but need to be quickly distributed when updated. It requires more active management of what content is stored on the CDNs. We again use the Cache-Control header to tell the browser how long it should cache the content from the CDN. CDNs are usually used for delivering static assets like images, CSS files, JavaScript bundles or video content.
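
Here is a rough sketch of how a server might choose the Cache-Control value for a response. The list of extensions, the one-day default and the function name are all assumptions for illustration; public, max-age and no-store are standard Cache-Control directives.

```python
import os

# Hypothetical set of extensions we treat as static, cacheable assets.
STATIC_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".mp4"}

def cache_control_for(path, max_age_seconds=86400):
    """Return a Cache-Control header value for the given request path."""
    extension = os.path.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        # Browsers and CDNs may keep this for max_age_seconds.
        return f"public, max-age={max_age_seconds}"
    # Dynamic responses should not be stored by any cache.
    return "no-store"
```

For example, a request for /static/app.js would get a one-day lifetime, while a dynamic endpoint like /api/cart would not be cached at all.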

A CDN is useful if you need to ensure high availability and performance for users. It can also reduce the load on the origin server. But there are some instances where we still need to hit our origin server, for example when serving dynamic content that changes frequently, when handling tasks that require real-time processing, and when the application requires complex server-side logic that cannot run on the CDN. The first benefit we get from CDNs is reduced latency. By serving content from locations closer to the user, CDNs significantly reduce latency.

It also adds high availability and scalability. CDNs can handle high traffic loads and are resilient against hardware failures. It also adds improved security because many CDNs offer security features like DDoS protection and traffic encryption.

And the benefits of caching in general are similar. We get reduced latency, because the data is fetched from a nearby cache rather than a remote server. We get lower server load, because fewer requests reach the primary data source. And overall, faster load times lead to a better user experience. Now let's talk about proxy servers, which act as an intermediary between a client requesting a resource and the server providing that resource.

It can serve various purposes like caching resources for faster access, anonymizing requests and load balancing among multiple servers. Essentially, it receives requests from clients, forwards them to the relevant servers, and then returns the server's response back to the client. There are several types of proxy servers, each serving different purposes.

Here are some of the main types. The first one is forward proxy, which sits in front of clients and is used to send requests to other servers on the internet. It's often used within internal networks to control internet access.

Next one is reverse proxy, which sits in front of one or more web servers, intercepting requests from the internet. It is used for load balancing, web acceleration, and as a security layer. Another type is open proxy, which allows any user to connect and utilize the proxy server, and is often used to anonymize web browsing and bypass content restrictions.

We also have transparent proxies, which pass along requests and resources without modifying them, but are visible to the client, and are often used for caching and content filtering. Next type is anonymous proxy, which is identifiable as a proxy server, but does not make the original IP address available. This type is used for anonymous browsing. We also have distorting proxies, which provide an incorrect original IP to the destination server.

This is similar to an anonymous proxy, but with purposeful IP misinformation. And the next popular type is high anonymity proxy or elite proxy, which makes detecting the proxy use very difficult. These proxies do not send X-Forwarded-For or other identifying headers, and they ensure maximum anonymity. The most commonly used proxy servers are forward and reverse proxies. A forward proxy acts as a middle layer between the client and the server.

It sits between the client, which can be a computer on an internal network, and the external servers, which can be websites on the internet. When the client makes a request, it is first sent to the forward proxy. The proxy then evaluates the request and decides, based on its configuration and rules, whether to allow the request, modify it, or block it. One of the primary functions of a forward proxy is to hide the client's IP address.

When it forwards the request to the target server, it appears as if the request is coming from the proxy server itself. Let's look at some example use cases of forward proxies. One popular example is Instagram proxies. These are a specific type of forward proxy used to manage multiple Instagram accounts without triggering bans or restrictions.

Marketers and social media managers use Instagram proxies to appear as if they are located in different areas or are different users, which allows them to manage multiple accounts, automate tasks or gather data without being flagged for suspicious activity. Next example is internet use control and monitoring proxies. Some organizations use forward proxies to monitor and control employee internet usage.

They can block access to non-work-related sites and protect against web-based threats. They can also scan for viruses and malware in incoming content. Next common use case is caching frequently accessed content.

Forward proxies can also cache popular websites or content, reducing bandwidth usage and speeding up access for users within the network. This is especially beneficial in networks where bandwidth is costly or limited. Forward proxies can also be used for anonymizing web access.

People who are concerned about privacy can use forward proxies to hide their IP address and other identifying information from the websites they visit, making it difficult to track their web browsing activities. On the other hand, a reverse proxy is a type of proxy server that sits in front of one or more web servers, intercepting requests from clients before they reach the servers. While a forward proxy hides the client's identity, a reverse proxy essentially hides the server's identity or the existence of multiple servers behind it. The client interacts only with the reverse proxy and may not know about the servers behind it. It also distributes client requests across multiple servers, balancing load and ensuring no single server becomes overwhelmed.

A reverse proxy can also compress inbound and outbound data, cache files and manage SSL encryption, thereby speeding up load time and reducing server load. One common use case of reverse proxies is load balancing. Load balancers distribute incoming network traffic across multiple servers, ensuring no single server gets too much load.

And by distributing traffic, we prevent any single server from becoming a bottleneck and maintain optimal service speed and reliability. CDNs are also a type of reverse proxy. They are a network of servers that deliver cached static content from websites to users based on the geographical location of the user.

They act as reverse proxies by retrieving content from the origin server and caching it so that it's closer to the user for faster delivery. Another example is web application firewalls, which are positioned in front of web applications. They inspect incoming traffic to block hacking attempts and filter out unwanted traffic.

Firewalls also protect the application from common web exploits. And another example is SSL offloading or acceleration. Some reverse proxies handle the encryption and decryption of SSL/TLS traffic, offloading that task from web servers to optimize their performance. Load balancers are perhaps the most popular use case of proxy servers.

They distribute incoming traffic across multiple servers to make sure that no server bears too much load. By spreading the requests effectively, they increase the capacity and reliability of applications. Here are some common strategies and algorithms used in load balancing. First one is round robin, which is the simplest form of load balancing, where each server in the pool gets a request in sequential rotating order. When the last server is reached, it loops back to the first one.
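
The round robin rotation is simple enough to sketch in a few lines. This is an illustrative example only; the class name and server names are made up, and it ignores things like health checks.

```python
from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, servers):
        # cycle() loops back to the first server after the last one.
        self._rotation = cycle(servers)

    def next_server(self):
        return next(self._rotation)
```

With three servers a, b and c, four consecutive requests would go to a, b, c and then a again.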

This type works well for servers with similar specifications and when the load can be distributed uniformly. Next one is the least connections algorithm, which directs traffic to the server with the fewest active connections. It's ideal for longer tasks or when the server load is not evenly distributed.
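
A least connections balancer needs to track how many requests each server is currently handling. Here is a minimal sketch under that assumption; the acquire and release method names are hypothetical and stand for "a request started" and "a request finished".

```python
class LeastConnectionsBalancer:
    def __init__(self, servers):
        # Number of active connections per server.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Pick the server with the fewest active connections.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the request on that server has finished.
        self.active[server] -= 1
```

A server that is stuck on long-running requests keeps a high count, so new requests naturally flow to the less busy servers.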

Next we have the least response time algorithm, which chooses the server with the lowest response time and fewest active connections. This is effective when the goal is to provide the fastest response to requests. Next algorithm is IP hashing, which determines which server receives the request based on the hash of the client's IP address. This ensures a client consistently connects to the same server, which is useful for session persistence. The variants of these methods can also be weighted, which brings us to the weighted algorithms.
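
Before getting to weights, here is a small sketch of the IP hashing idea. It assumes a fixed list of servers and uses SHA-256 only to get a hash that is stable between runs; the function name is made up for the example.

```python
import hashlib

def pick_server(client_ip, servers):
    """Map a client IP to one server, the same one every time."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

As long as the server list stays the same, a given client IP always lands on the same server, which is what makes session persistence work.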

For example, in weighted round robin or weighted least connections, servers are assigned weights, typically based on their capacity or performance metrics. And the servers which are more capable handle more of the requests. This is effective when the servers in the pool have different capabilities, like different CPUs or different amounts of RAM.
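
A very simple way to picture weighted round robin is to repeat each server in the rotation as many times as its weight. This is a naive sketch with made-up names; real load balancers usually interleave the servers more smoothly, but the proportions come out the same.

```python
from itertools import cycle

def weighted_rotation(weights):
    """weights maps server name to an integer weight, e.g. {"big": 3}."""
    expanded = [
        server
        for server, weight in weights.items()
        for _ in range(weight)
    ]
    return cycle(expanded)
```

With weights of 3 for a larger server and 1 for a smaller one, the larger server receives three out of every four requests.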

You also have geographical algorithms, which direct requests to the server geographically closest to the user or based on specific regional requirements. This is useful for global services where latency reduction is a priority. And the next common algorithm is consistent hashing, which uses a hash function to distribute data across various nodes. Imagine a hash space that forms a circle, where the end wraps around to the beginning, often referred to as a hash ring. Both the nodes and the data, like keys or stored values, are hashed onto this ring.

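Each key is then served by the first node found moving clockwise from the key's position on the ring. This makes sure that the client consistently connects to the same server every time, and when a node is added or removed, only the keys next to it on the ring move. An essential feature of load balancers is continuous health checking of servers to ensure traffic is only directed to servers that are online and responsive. If a server fails, the load balancer will stop sending traffic to it until it is back online.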
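
Going back to the hash ring for a moment, here is a minimal sketch of consistent hashing. It places each node at a single position on the ring and assigns a key to the first node at or after the key's position, wrapping around at the end. The HashRing class and its method names are assumptions for the example.

```python
import bisect
import hashlib

def ring_hash(value):
    """Stable position on the ring for a node name or a key."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes):
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        bisect.insort(self.ring, (ring_hash(node), node))

    def remove(self, node):
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def lookup(self, key):
        # First node at or after the key's position, wrapping around.
        index = bisect.bisect_left(self.ring, (ring_hash(key), ""))
        if index == len(self.ring):
            index = 0
        return self.ring[index][1]
```

The useful property is that removing a node only moves the keys that were assigned to that node; every other key stays where it was.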

And load balancers come in different forms, including hardware appliances, software solutions, and cloud-based services. One popular hardware load balancer is F5 BIG-IP, which is widely used and known for its high performance and extensive feature set. It offers local traffic management, global server load balancing, and application security. Another example is Citrix ADC, formerly known as NetScaler, which provides load balancing, content switching, and application acceleration.

Some popular software load balancers are HAProxy, which is a popular open-source software load balancer and proxy server for TCP and HTTP-based applications, and of course NGINX, which is often used as a web server but also functions as a load balancer and reverse proxy for HTTP and other network protocols. And some popular cloud-based load balancers are AWS's Elastic Load Balancing, Microsoft Azure Load Balancer and Google Cloud's Load Balancer. There are even some virtual load balancers like VMware's Advanced Load Balancer, which offers a software-defined application delivery controller that can be deployed on premises or in the cloud. Now let's see what happens when a load balancer goes down.

When the load balancer goes down, it can impact the availability and performance of the whole application or the services it fronts. It's basically a single point of failure, and in case it goes down, all of the servers become unavailable for the clients. To avoid or minimize the impact of a load balancer failure, there are several strategies we can employ. First one is implementing redundant load balancing by using more than one load balancer, often in pairs, which is a common approach.

If one of them fails, the other one takes over, which is a method known as failover. Next strategy is to continuously monitor and health check the load balancer itself. This can ensure that any issues are detected early and can be addressed before causing significant disruption.

We can also implement auto scaling and self-healing systems. Some modern infrastructures are designed to automatically detect the failure of a load balancer and replace it with a new instance without manual intervention. And in some configurations, DNS failover can reroute traffic away from an IP address that is no longer accepting connections, like a failed load balancer, to a pre-configured standby IP, which is our new load balancer.

System design interviews are incomplete without a deep dive into databases. In the next few minutes I'll take you through the database essentials you need to understand to ace that interview. We'll explore the role of databases in system design, sharding and replication techniques, and the key ACID properties. We'll also discuss different types of databases, vertical and horizontal scaling options, and database performance techniques. We have different types of databases, each designed for specific tasks and challenges.

Let's explore them. First type is relational databases. Think of a relational database like a well-organized filing cabinet where all the files are neatly sorted into different drawers and folders. Some popular examples of SQL databases are PostgreSQL, MySQL, and SQLite.

All of the SQL databases use tables for data storage, and they use SQL as a query language. They are great for transactions, complex queries, and integrity. Relational databases are also ACID compliant, meaning they maintain the ACID properties. A stands for atomicity, which means that transactions are all or nothing.

C stands for consistency, which means that after a transaction your database should be in a consistent state. I is isolation, which means that transactions should be independent. And D is for durability, which means that once a transaction is committed, the data is there to stay. We also have NoSQL databases, which often relax the consistency property of ACID.
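
To make the atomicity property of relational databases concrete, here is a small sketch using Python's built-in sqlite3 module. The accounts table, the names and the amounts are invented for the example. The transfer runs two updates inside one transaction, and if the second one violates the balance check, the first one is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.execute("INSERT INTO accounts VALUES ('bob', 0)")
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move money between accounts; all or nothing."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer of 500 from alice fails the balance check, so bob's credit is undone too and both balances stay as they were.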

Imagine a NoSQL database as a brainstorming board with sticky notes. You can add or remove notes in any shape or form, it's flexible. Some popular examples are MongoDB, Cassandra, and Redis. There are different types of NoSQL databases, such as key-value stores like Redis, document-based databases like MongoDB, or graph-based databases like Neo4j. NoSQL databases are typically schema-less, and they don't rely on foreign keys between tables to link the data together.

They are good for unstructured data, ideal for scalability, quick iteration and simple queries. There are also in-memory databases. This is like having a whiteboard for quick calculations and temporary sketches. It's fast because everything is in memory. Some examples are Redis and Memcached.

They have lightning fast data retrieval and are used primarily for caching and session storage. Now let's see how we can scale databases. The first option is vertical scaling or scale up.

In vertical scaling you improve the performance of your database by enhancing the capabilities of the individual server where the database is running. This could involve increasing CPU power, adding more RAM, adding faster or more disk storage, or upgrading the network. But there is a maximum limit to the resources you can add to a single machine, so this approach only goes so far.

The next option is horizontal scaling or scale out, which involves adding more machines to the existing pool of resources, rather than upgrading a single unit. Databases that support horizontal scaling distribute data across a cluster of machines.

This could involve database sharding or data replication. The first option is database sharding, which is distributing different portions, or shards, of the dataset across multiple servers. This means you split the data into smaller chunks and distribute it across multiple servers. Some of the sharding strategies include range-based sharding, where you distribute data based on the range of a given key, and directory-based sharding, which uses a lookup service to direct traffic to the correct database.
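
The two strategies just described can be sketched as simple routing functions. The ranges, shard names and tenant names below are made up for illustration, and a plain dictionary stands in for the lookup service.

```python
import bisect

# Range-based: each shard owns user IDs up to and including its upper bound.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]
RANGE_SHARDS = ["shard-0", "shard-1", "shard-2"]

def range_shard(user_id):
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if index == len(RANGE_SHARDS):
        raise ValueError(f"no shard covers user_id {user_id}")
    return RANGE_SHARDS[index]

# Directory-based: a lookup table says where each tenant's data lives.
DIRECTORY = {"tenant-a": "shard-0", "tenant-b": "shard-2"}

def directory_shard(tenant):
    return DIRECTORY[tenant]
```

Range-based routing needs no extra lookup but can produce hot shards if the keys are unevenly used, while the directory can move a tenant to another shard just by updating one entry.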

We also have geographical sharding, which is splitting databases based on geographical locations. And the next horizontal scaling option is data replication. This is keeping copies of data on multiple servers for high availability. We have master-slave replication, where you have one master database that handles writes and several read-only slave databases. Or you can have master-master replication, where multiple databases can all handle both reads and writes.
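
Here is a toy model of the master-slave setup, where writes go to the master and reads are spread across the read-only copies. Dictionaries stand in for the databases, the class and method names are hypothetical, and the copy to the replicas is done synchronously to keep the sketch short; real systems often replicate asynchronously.

```python
from itertools import cycle

class ReplicatedDatabase:
    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = cycle(self.replicas)

    def write(self, key, value):
        # All writes go to the primary (master) first...
        self.primary[key] = value
        # ...and are then copied to every read-only replica.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads rotate across the replicas to spread the load.
        return next(self._next_replica).get(key)
```

Adding more replicas increases read capacity, but every write still has to pass through the single primary.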

Scaling your database is one thing, but you also want to access it faster, so let's talk about different performance techniques that can help you access your data faster. The most obvious one is caching. Caching isn't just for web servers. Database caching can be done through in-memory databases like Redis. You can use it to cache frequent queries and boost your performance. The next technique is indexing.

Indexes are another way to boost the performance of your database. Creating an index for frequently accessed columns will significantly speed up retrieval times. And the next technique is query optimization.
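
To see what an index changes, here is a small sketch with Python's built-in sqlite3 module. The users table and index name are invented for the example, and EXPLAIN QUERY PLAN is SQLite's way of showing how a query will run; other databases have their own EXPLAIN variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

lookup = "SELECT name FROM users WHERE email = ?"
plan_before = query_plan(conn, lookup, ("user42@example.com",))

# Index the column we filter on frequently.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, ("user42@example.com",))

print(plan_before)
print(plan_after)
```

Before the index, the plan reports a scan of the whole table. After it, the plan reports a search using idx_users_email, which is why lookups on that column get faster.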

You can also consider optimizing queries for fast data access. This includes minimizing joins and using tools like SQL Query Analyzer or Explain Plan to understand your query's performance. In all cases, you should remember the CAP theorem, which states that you can only have two of these three, consistency, availability, and partition tolerance.

When designing a system, you should prioritize two of these based on the requirements that you are given in the interview. If you enjoyed this crash course then consider watching my other videos about system design concepts and interviews. See you next time.