Description
Database sharding is the process of storing a large database across multiple machines. A single machine can only hold and process so much data, so some systems eventually scale beyond what one machine can handle. Further, as systems scale, they may also need to split data between machines due to security and location considerations. Database sharding addresses these problems by splitting the data into smaller chunks, allowing work to be done either in parallel or only in the locations with the relevant data.
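To make the two access patterns concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `SHARDS` dictionary stands in for separate database machines, and the region names, customer rows, and function names are made up. One query touches only the shard that holds the relevant data, while the other fans out to every shard in parallel and combines the results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards": each list stands in for a separate database machine.
SHARDS = {
    "us": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}],
    "eu": [{"id": 3, "name": "Linus"}],
    "apac": [{"id": 4, "name": "Yukihiro"}],
}


def customers_in_region(region):
    """Targeted query: only the shard holding the relevant data does any work."""
    return list(SHARDS[region])


def count_rows(rows):
    """Work done locally on a single shard."""
    return len(rows)


def total_customers():
    """Fan-out query: each shard counts its own rows in parallel, then the counts are summed."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(count_rows, SHARDS.values()))
```

In a real system each shard would be a network call to a different database server, which is where the parallelism pays off; the thread pool here only mimics that shape.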
How you split up your data matters a lot. For instance, splitting a customer table by customer last name is unlikely to be as helpful in a large distributed system as splitting customers by location. You probably also want shards that are roughly the same size. The idea behind sharding is to improve performance, specifically via parallelization, but it’s also helpful if it provides some resilience to outages, so that will need to be a consideration when you start thinking about sharding.
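One way to check the "roughly the same size" goal before committing to a shard key is to measure how evenly a candidate key spreads your rows. The Python sketch below is only an illustration under stated assumptions: the helper names (`shard_for`, `shard_sizes`, `skew`), the four-shard count, and the customer data are all hypothetical, and the sample data deliberately puts 90% of customers in one region. It shows the tension between the two goals above: a location key keeps related data together, but it can produce lopsided shards, while a hashed customer ID spreads rows evenly at the cost of locality.

```python
import hashlib
from collections import Counter


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used here to keep routing repeatable.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_sizes(rows, key_func, shard_count):
    """Count how many rows land on each shard for a candidate shard key."""
    counts = Counter(shard_for(key_func(row), shard_count) for row in rows)
    return [counts.get(i, 0) for i in range(shard_count)]


def skew(sizes):
    """Largest shard divided by the average shard size; 1.0 means perfectly even."""
    average = sum(sizes) / len(sizes)
    return max(sizes) / average if average else 0.0


# Made-up data: 1,000 customers, 90% of them in a single region.
customers = [{"id": i, "region": "eu" if i % 10 == 0 else "us"} for i in range(1000)]

by_id = shard_sizes(customers, lambda c: c["id"], 4)
by_region = shard_sizes(customers, lambda c: c["region"], 4)
```

Comparing `skew(by_id)` with `skew(by_region)` on your own data gives a quick, rough signal of whether a key will leave one machine doing most of the work.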
Database sharding can be a very useful tool for making your application more resilient to load. However, it’s complex, and you need to think it through carefully if you are considering using it in your environment. There are several different ways to do it, each with different advantages and disadvantages, and these need to be thoroughly considered before starting. Plus, sharding is a fairly drastic operation, requiring support and extra work for the remaining lifetime of your application. This means that you shouldn’t really consider it until most other options have been exhausted.
Links
Join Us On Patreon
Level Up Financial Planning