I. Why scale?
When we talk about system design, we talk about multiple dimensions: availability, scalability, latency, throughput, performance, and so on. Scalability sometimes scares us. Why do we have to scale a system, and how do we do it? In short: to maintain availability. At peak load, the system is not strong enough on its own to handle the massive throughput, so we need more workers and more power to handle the requests.
II. Workload Distribution
The core of scaling is the power of physical devices. No matter what we do in software, it runs inside physical devices, so we have to distribute the software across them. We already know there are two dimensions of scaling: vertical and horizontal. In practice, when we talk about scaling we mean horizontal scaling, because a single physical device has hard limits, while the number of devices is limited only by how many we can create, or how many the cloud provider allows us to run.
2.1 Low level scale
Getting back to the core of scaling, the power of physical devices: we want the software to use resources efficiently without exceeding the limits of the device. How? We distribute the work. There are two places to do that: inside an individual device, and among devices. Let's say the device here is a server, an EC2 instance, used for background job processing. We have to make sure the workload fits the power of this server at any point in time. How? When we talk about distribution, we can't forget the queue.
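The queue idea above can be sketched in a few lines. This is a minimal, in-process illustration, assuming a hypothetical `MAX_IN_FLIGHT` limit and a trivial stand-in for the real job processing: the bounded queue applies backpressure, so the server never holds more unprocessed work than it can handle.

```python
import queue
import threading

# Hypothetical limit: the server never buffers more than
# MAX_IN_FLIGHT unprocessed messages at once.
MAX_IN_FLIGHT = 4

work = queue.Queue(maxsize=MAX_IN_FLIGHT)  # put() blocks when full
processed = []

def worker():
    while True:
        job = work.get()           # pull one message at a time
        if job is None:            # sentinel value: shut down
            break
        processed.append(job * 2)  # stand-in for real processing
        work.task_done()

t = threading.Thread(target=worker)
t.start()

for job in range(10):
    # If the worker falls behind, this blocks: backpressure keeps
    # the workload within the capacity of the device.
    work.put(job)
work.put(None)
t.join()

print(len(processed))  # → 10
```

The key design choice is `maxsize`: the producer is slowed down by the consumer's pace rather than the consumer being overwhelmed by the producer's.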
This way, we limit processing to at most N messages at a time, keeping the workload under control. Suppose we are building an IoT system for an enterprise: one hundred IoT devices need to push data to our cloud every minute. How do we do that at scale? There are a few requirements to clarify in this case:
- Each record carries a timestamp and must be delivered.
- 100 records from 100 devices arrive every minute.
- Data loss must be prevented, so plain REST calls are not enough.
Well, we know that IoT has its own standard protocol, MQTT, which is a message protocol. The same messaging idea appears in RabbitMQ, AWS SQS, EventBridge, and Kafka. No matter which one we implement, the main idea is the distribution of workload: a message queue buffers the records, and the pub/sub mechanism lets us decide how many messages we want to process at a time. This is the low level of scaling.
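The IoT scenario can be sketched without a real broker. In this hedged example, an in-memory queue stands in for the broker (MQTT/RabbitMQ/SQS in practice), the 100 device records are fabricated, and `poll` is a hypothetical helper showing the consumer choosing its own batch size:

```python
import queue

# In-memory stand-in for the message broker.
broker = queue.Queue()

# 100 hypothetical IoT devices each publish one timestamped record
# for this minute. Nothing is lost: the broker buffers everything.
for device_id in range(100):
    broker.put({"device": device_id, "ts": "2024-01-01T00:00:00Z"})

def poll(n):
    """Consumer side: take at most n messages, at our own pace."""
    batch = []
    while len(batch) < n and not broker.empty():
        batch.append(broker.get())
    return batch

batch = poll(10)
print(len(batch), broker.qsize())  # → 10 90
```

The burst of 100 records per minute never hits the workers directly; the queue absorbs it, and each subscriber drains it at whatever rate its server can sustain.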
2.2 High level scale
Now, we talk about the high level of scaling. We already have good tools to control the workload of a single server by distributing the proper number of messages to it. However, the system can accumulate a large number of messages in the queue, waiting to be processed, and over time the backlog becomes massive. What then? It's like running a restaurant: when we have a lot of customers, we need to hire more workers, which here means creating more servers. Eventually, we make this flow automatic: by monitoring itself, the system decides when and how to scale.
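The monitoring-driven decision can be reduced to a toy rule. This is a sketch, not any cloud provider's actual policy: it assumes we monitor the queue backlog, that each worker has a known per-minute capacity, and that the fleet size is clamped to hypothetical bounds:

```python
import math

def desired_workers(backlog, per_worker_capacity, lo=1, hi=20):
    """Toy autoscaling rule: enough workers to drain the backlog,
    never fewer than lo, never more than hi (cost/quota limits)."""
    needed = math.ceil(backlog / per_worker_capacity)
    return max(lo, min(hi, needed))

print(desired_workers(0, 50))       # quiet period → 1 (the minimum)
print(desired_workers(500, 50))     # busy → 10 workers
print(desired_workers(10_000, 50))  # peak → clamped at 20
```

Real autoscalers (for example, target-tracking policies) work on the same principle: a monitored metric in, a desired worker count out, with guardrails on both ends.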
That’s it. Scaling is scaling: distributing the workload and hiring more workers.