Preempting System Issues
Listen now
Description
Simple systems fail simply. Complex systems also fail simply, but their interconnectedness with other systems makes mitigating failures much more complex. Past a certain level of complexity, system failures are an emergent property of the system – that is, the set of system parts has a set of failure cases that the individual parts do not have by themselves. This means that it is more difficult to predict what can go wrong with a system. At some level, prediction is nearly impossible. However, you can predict many of the things that are likely to cause problems, simply by engaging in a few fairly simple thought exercises, you can greatly reduce the number of unexpected problems that your system encounters. While it can be tempting to wait until a problem occurs to try to mitigate it, this is unwise in a production system that other people are dependent on. A system failure usually costs money at a minimum, and the problems can be far more severe than that. As a result, it’s common for software services to include a Service Level Agreement or SLA, that dictates expectations about the frequency of system outages, response times, and time expected to complete work. Even if your system is engineered so that it doesn’t completely fall over when a problem occurs, it can still violate an SLA and cost money. The consumers of your application probably have their own clients who have their own expectations. SLAs tend to bleed inward from clients to the services that they use and then to the services that those services use. In contrast to SLAs, systemic problems, including both errors and latency tend to bleed outward from one service to its clients and then to the clients of that service. As a result, when you are thinking about how to find potential systemic problems, it’s often best to think of these problems from two different angles. That is, you need to consider how errors and latency will bleed out as a result of a problem, while also considering how SLAs bleed in to put more stringent expectations on your system than you might expect. In effect, you are dealing with a balance between tolerance for errors and difficulty in error mitigation. Depending on how critical your system is to your clients, these expectations will vary. You can’t prevent every problem in a system, but you can usually prevent a large percentage of them by planning ahead. However, until you’ve encountered enough unexpected problems, it can be difficult to envision how something can go wrong, or even have a realistic thought process for thinking about what can go wrong. However, if you go through the thought exercises we’ve outlined here, then you have a good chance of preventing most of the problems that will plague a complicated application. While this doesn’t fix everything, it can give you enough breathing room to fix the truly unusual problems that you’ll occasionally encounter. Links Join Us On Patreon Level Up Financial Planning
More Episodes
Podcasting has definitely been a journey for both of us. When we started BJ wasn’t even a developer and Will was working for himself. Now 8 years later BJ is leading a team of developers and Will is back working for himself. It has been an amazing journey with you all this past years. We have...
Published 07/20/23
Published 07/20/23
Feedback is any information, observation, or even opinion about the performance or behavior of another individual our group. It can be formal as in performance or peer evaluations or informal such as with mentoring a junior developer. It is a form of communication designed to provide guidance...
Published 07/06/23