Latency is the time taken by data to travel from one location on a network to another. For example, if server A in Delhi sends some data to server B in Mumbai, the time the data takes to get from Delhi to Mumbai is the latency of the system.
Latency is usually measured between a client (web, mobile app, or IoT device) and the server, and it helps developers understand how quickly a client will receive a response.
There are several reasons why latency occurs. The major cause is distance, specifically the distance between the client making the requests and the server serving them. For example, suppose a backend system is deployed on a server in Mumbai. For a client in Pune, roughly 150 km away, the latency might be around 5-10 ms. For a client in Delhi, roughly 1,400 km away, the latency may go up to 30-40 ms.
Although 30-40 ms sounds very small, loading a website can involve hundreds of back-and-forth request-response cycles, so these small delays add up to a noticeable slowdown. Besides distance, the size of the request, internet speed, problems with cables, and other factors can also introduce latency. A related term is round trip time (RTT): if a request takes 10 ms to travel from the client to the server and the response takes 10 ms to travel back, the RTT is 20 ms, which is double the latency.
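To make the idea concrete, here is a minimal Python sketch that approximates RTT by timing a TCP connection handshake. The host example.com is only a placeholder, and dedicated tools such as ping give more accurate measurements; this is just an illustration of the concept.

```python
# A minimal sketch: approximate RTT by timing a TCP connection handshake.
# The host is a placeholder; substitute a server you actually want to measure.
import socket
import time

def measure_rtt(host: str, port: int = 443) -> float:
    """Time how long it takes to establish a TCP connection (roughly one round trip)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass  # connection established; we only care about the elapsed time
    return (time.perf_counter() - start) * 1000  # convert seconds to milliseconds

if __name__ == "__main__":
    # A geographically closer server will usually report a lower RTT.
    print(f"RTT: {measure_rtt('example.com'):.1f} ms")
```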
On the internet, data traverses a network, and there can be multiple network nodes in its path before it reaches the server and returns to the client. These nodes can be gateways, proxy servers, and so on, and each additional hop introduces more delay and adds to the overall latency.
There are several ways to reduce latency. The most common approach is to use a CDN (Content Delivery Network). A CDN caches frequently requested static data, such as images and static HTML pages, on servers located closer to the client. For example, if a client is in Delhi and the origin server is in Mumbai, we can place a CDN server in Delhi that caches the most common responses. Whenever the client repeats such a request, instead of travelling all the way to the Mumbai server, it is served instantly from the cache on the Delhi CDN server, reducing the latency from around 40 ms to around 10 ms.
We will discuss CDNs and caching in detail in upcoming chapters.
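As a rough illustration of the caching idea, here is a minimal Python sketch of an edge server that answers repeated requests from its local cache and only contacts the distant origin on a miss. The function fetch_from_origin and the paths are hypothetical stand-ins for real network calls.

```python
# A simplified sketch of how a CDN edge server serves cached responses.
# fetch_from_origin is a hypothetical placeholder for a real request to the origin.
edge_cache: dict[str, bytes] = {}

def fetch_from_origin(path: str) -> bytes:
    # In reality this is a slow request to the distant origin server (e.g. Mumbai).
    return f"<html>content for {path}</html>".encode()

def handle_request(path: str) -> bytes:
    """Serve from the nearby edge cache if possible; otherwise fall back to the origin."""
    if path in edge_cache:
        return edge_cache[path]          # cache hit: short hop, low latency
    response = fetch_from_origin(path)   # cache miss: full trip to the origin
    edge_cache[path] = response          # store it so the next nearby client is served fast
    return response
```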
Throughput can be defined as the volume of data that a system can process in a specific timeframe, and it is an important performance indicator for benchmarking computing systems. Depending on the system, it may be expressed in units such as requests per second, transactions per second, or bits per second.
Systems can be designed for higher throughput, sometimes at the cost of consistency and durability, so we always need to strike the right balance based on the requirements.
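A simple way to reason about throughput is to count how many operations a system completes in a fixed time window. The sketch below assumes a hypothetical process_request function standing in for real work.

```python
# A minimal sketch of measuring throughput as operations processed per second.
import time

def process_request() -> None:
    pass  # hypothetical placeholder for actual request handling

def measure_throughput(duration_seconds: float = 1.0) -> float:
    """Count how many requests are processed within the time window."""
    processed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_seconds:
        process_request()
        processed += 1
    return processed / duration_seconds  # requests per second

if __name__ == "__main__":
    print(f"Throughput: {measure_throughput():.0f} requests/second")
```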
Throughput is a little different from bandwidth. Bandwidth is the maximum amount of data that can be transferred in a given amount of time, while throughput is the amount of data actually transferred in that time.
For example, your internet provider may offer a 100 Mbps connection. That is the bandwidth: you cannot transfer data faster than 100 Mbps. However, due to traffic congestion, latency, and other factors, you may only achieve 75 Mbps in practice; that is your throughput.
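The tiny calculation below restates this distinction in code; the numbers simply mirror the example above and are not real measurements.

```python
# Bandwidth vs. throughput, using the numbers from the example above.
bandwidth_mbps = 100                     # link capacity: the theoretical maximum
bytes_transferred = 75 * 1_000_000 / 8   # bytes actually moved during the window
elapsed_seconds = 1.0

throughput_mbps = (bytes_transferred * 8) / 1_000_000 / elapsed_seconds
print(f"Bandwidth:  {bandwidth_mbps} Mbps (maximum possible)")
print(f"Throughput: {throughput_mbps:.0f} Mbps (actually achieved)")
```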
In the next chapter, we will discuss availability and consistency.