Introduction
If you are a data scientist, ML engineer or a backend developer who likes to build models and deploying them as a service, learning the basic of designing scalable systems will help you design better systems. I feel the knowledge of designing scalable machine learning systems is not enough among the data community.
Designing a system is a complex task. There are many concepts, technologies and patterns which we can use. At the core, it requires knowing the trade-off between the choices you are going to make.
Basic idea is to start small and gradually build a complex system. In this tutorial, I’ll share my structured plan on how to go about designing ML systems in your companies.
Table of Contents
- Must answer key questions
- Functional and Non Functional Requirements
- High Availability
- Scalability
- Durability
- Consistency
- Maintainability
- Security
It is important to understand the business before thinking of which technologies we are going to start with:
Question | Explanation |
Who are the Users? | Here we want to understand how the users are going to interact with our service, what steps are they going to perform to complete an action. |
What is going to be the scale of the system? | Here we mainly want to understand the volume of request we should expect. Also, to be prepared in case there could be unexpected spikes in traffic at certain time of the day. |
What is the expected performance? | Here we want to clarify about Service Level Agreement (SLA) (eg. p99 latency) to achieve. Also, answering this questions helps in making decision about which technology should be used. For example: if you want to show video recommendations to users based on their real time interaction, then using batch prediction process wouldn’t be a great idea, we would be better off using stream processing. |
How much cost will it incur? | Here we want to understand the financial implications of the decision we make related to technology, tools we are going to use to build the service. |
Next step is to define the requirements under two categories:
- Functional Requirements
- Non-Functional Requirements
Functional Requirements
Talks about behavior of the system. Think in terms of how the user is going to interact with the system, what actions is she going to perform. For example, a functional requirement while designing instagram could be that a user should be able to comment on posts or upload pictures.
When listing down functional requirements, always think of the customer first and work backwards. This helps us think about the features/endpoints/resources we would need to implement in our code.
Deriving functional requirements for system which aren’t user facing (specially backend ML systems like fraud detection, recommendation) can be tricky. In such cases, the functional requirement boils down to some API calls which the system needs to respond to.
Non-Functional Requirements
Talks about qualities of the system such as:
Explaining each of these topics would require a separate blog in itself. But, for now lets understand each one in brief:
1. High Availability
The definition of availability can be multiple. It could be either based on:
Count Based Availability | Time Based Availability |
Refers to the success ratio of the requests. For example: 99% means the system didn’t respond for 1/100 requests. | Refers to the percentage of time the system has been working and responsive. For example: 99% means the system was down for 3.65 days a year i.e. 1/100 request fails. |
We can ensure high availability using methods such as: eliminating single point of failure by deploying the service across multiple regions and availability zone can ensure, if one zone/region goes down, there are others which can actively respond to requests.
2. Scalability
Refers to the ability of the system to handle existing/growing load of users without slowing down. Here, load is defined as requests per second or volume of data transferred or number of concurrent active connections.
There are two ways to scale systems: Horizontal and Vertical Scaling.
We implement horizontal scaling by adding more machines to the system. It is always preferred over vertical scaling since we can always add more and more servers to handle the load. However, having multiple machines does introduce some level of complexity to the system. For example: a horizontally scaled system needs to handle service discovery, load balancing, request routing etc effectively.
We implement vertical scaling by increasing the resources (cpu, memory, disk etc.) on a single machine. It is easier to implement. Smaller companies, new startups usually start with this approach since their load is limited and expected to grow at a relatively slow pace.
3. Durability
Refers to the ability of the system to ensure the data stored is not lost. For example: when a user uploads a video on facebook, the system must ensure that the video is not lost or corrupted and displayed to the user as expected.
We achieve durability by maintaining several copies of data. Introducing this sort of data redundancy ensures that the system has eliminated a single point of failure in data storage.
4. Consistency
There are several acronyms which mention consistency as one of the factor of system performance such as ACID, BASE, CAP, Tunable consistency. But they have different meaning.
ACID acronym is quite popular among engineers working closely with databases. C in ACID stands for strong consistency which refers to ensuring the database constraints are not violated. That means, say, a user made a transaction of $100. The transaction must not take place if the user doesn’t have sufficient balance.
But among BASE, CAP, Tunable consistency, consistency refers to maintaining same copies of data across all replicas (of the database). We can largely categorize consistency into:
- Strong consistency: Common use cases are banking transactions, wallet transactions, online payments, booking systems etc.
- Eventual consistency: Usually works in places where data inconsistency for a tiny time period is not a big problem. For example: Lets say you have deleted your post on instagram but your friend can still see it. After some minutes, your friend confirms that she also can’t see your post. Here, the system became consistent “eventually”.
- Weak consistency: This happens when the system shows different values of an object to different users. For example: Say your friend has sent you multiple messages on Whatsapp. But you either don’t see all of those messages or you see them out of conversational order.
5. Maintainability
Maintainability is a quite long process taken to keep a system alive and running. After we launch a system, maintaining it to ensure the performance isn’t getting degraded over time is critical. Some of the aspects that we should think of while maintaining a system are:
- Failure handling
- Setup a process to handle service failures and network partitions
- Monitoring
- Setup a process to monitor the health of the system, focus on key metrics which better show the health of the system
- Testing
- Increase test coverage. Ensure testing of individual components and end to end data flows.
- Deployment
- We should aim for automatic deployments and rollback (when needed) to ensure deployments are done safely with less human involvement
6. Security
Security is the last fundamental block of building scalable systems. As an ML engineer/Data Scientist, you probably don’t need to worry about this in case your service is a backend service and doesn’t need to access internet.
To ensure that a system is secure, we should look at the following:
- Service access
- Here we should ensure non authorized users shouldn’t get access to the system. Using identity and permission management service usually take care of the authorization process.
- Service Protection
- Here we should worry about how to protect our infrastructure from common attacks like DDoS, SQL Injection, cross site scripting etc. Cloudfare is the most popular and reliable security solution available.
- Data Protection
- Here we should establish safe guards to protect the data available in the databases or during transfer. Safe guard the data from corruption or stolen during transit.
Summary
In this blog, we discussed various concepts used in designing scalable systems. Be it consistency, scalability or availability, all these characteristics of a system depend on the questions we discussed at the beginning of the blog. Answering those questions will help not only chose which technology to start with, but also protect you from surprises later.
Having worked as a data scientist/ML engineer myself, I can’t emphasize enough the skill to build scalable data systems is quite helpful. Specially, when dealing with chatGPT, LLM based models which struggle with latency issues quite frequently, being mindful of the concepts shared in this blog will surely help.
In case you have questions/suggestions, feel free to write them below in the comments section. Stay tuned for more posts:)
You have covered a lot of topics in this nice blog. Keep it up!