Mind Your Nines

Or Shoshani

October 6, 2020


TL;DR

In today's world, nearly every business is an internet business and depends on code. And guess what? If your business depends on code, downtime can have a direct and devastating impact. As soon as your service degrades or crashes, you start losing money, real money. To illustrate, in 2017, companies lost an average of $100,000 for every hour of downtime on their site. So a single outage can cost a company millions of dollars.

And it’s not just money. You might find the brand reputation you worked so hard to establish going down the drain. Consider the customer who reaches a 404 page instead of their target page. You know what happens next? They leave your site, and they may never come back.

Focus on zero downtime (Availability)
To help put this into tangible terms, let’s talk about Availability. This is the ability of the user community to access the system when needed. Availability is measured in terms of nines (9s), which express system uptime as a percentage of total system time.

100% system uptime, like a perfect circle, is more of a theoretical concept than a physical reality in system architecture. In the real world, no service is perfect. Most services fall somewhere between 99% and 100% uptime. And while 99% sounds perfectly acceptable, it actually represents more than seven hours of downtime per month, which is not okay if you want to retain your customers. Let’s take a closer look at what these nines mean:

Five-nines, or 99.999% availability, means 5 minutes, 15 seconds or less of downtime per year.
Four-nines, or 99.99% availability, means 52 minutes, 36 seconds of downtime per year.
Three-nines, or 99.9% availability, means 8 hours, 46 minutes of downtime per year.
Two-nines, or 99% availability, means 3 days, 15 hours, and 40 minutes of downtime per year.
One-nine, or 90% availability, means more than 36 days of downtime per year.
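
The figures in the list above follow from simple arithmetic: yearly downtime is the unavailable fraction multiplied by the length of a year. Here is a minimal sketch of that calculation. It assumes an average year of 365.25 days, and the function name `yearly_downtime` is only illustrative.

```python
from datetime import timedelta

# Assumption: an average year of 365.25 days (leap years included).
HOURS_PER_YEAR = 365.25 * 24


def yearly_downtime(availability_percent: float) -> timedelta:
    """Return the downtime per year implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    downtime_fraction = 1 - availability_percent / 100
    return timedelta(hours=HOURS_PER_YEAR * downtime_fraction)


# Print the yearly downtime for five, four, three, and two nines.
for percent in (99.999, 99.99, 99.9, 99.0):
    print(f"{percent}% availability -> {yearly_downtime(percent)} per year")
```

Dividing the result by 12 gives the monthly view: at 99%, that is a little over seven hours a month.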

Most cloud vendors offer some type of Service Level Agreement around availability. Amazon, Google, and Microsoft set their cloud SLAs at 99.9%. Hitting 99.9% availability is decent, at just under nine hours of downtime a year. But consider how many people rely on web tools to run their lives and businesses. A whole lot can go wrong in those hours of downtime.

What Do Your Customers Think?

As we move these systems to public or private clouds, users rightly expect and demand no downtime at all. This is a difficult task to accomplish, given that cloud computing platforms are relatively new, and yet they are marketed as being more reliable and more scalable than traditional systems in enterprise data centers.

Ultimately, your end-users and customers really only care if your site is available when they need to use it. It isn’t the nines that matter so much as the end-user experience.

Be Proactive About Your Nines

Today the game isn’t about uptime anymore; it’s about how quickly you can fix complex systems when the inevitable occurs. Customers don’t care how many nines of uptime your system has if they can’t get the information they need because of a cloud-internal failure. Preventing a negative impact on your customer is essential to providing a good customer experience.
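
One common way to see why repair speed matters is the steady-state relation availability = MTBF / (MTBF + MTTR), where MTBF is the mean operating time between failures and MTTR is the mean time to repair. The sketch below is not from the original discussion. The failure rate and repair times are hypothetical, chosen only to show how a shorter MTTR raises availability while the failure rate stays the same.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean uptime between failures and mean repair time."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical service that fails about once every 30 days of operation.
MTBF_HOURS = 30 * 24

# Same failure rate, three hypothetical repair times.
for mttr_minutes in (240, 60, 5):
    value = availability(MTBF_HOURS, mttr_minutes / 60)
    print(f"MTTR {mttr_minutes:>3} min -> {value:.4%} available")
```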

Modern software architectures are more distributed than ever. At first glance, distributed services would seem to be more reliable, and in many ways, this is true. But they’re also incredibly complex, and usually a mixed bag of third-party services running together.

What’s more, organizations have to deal with networks. With so many interconnected infrastructure elements in place, networks include countless points of failure and require adherence to incredibly strict standards for management, maintenance, and resiliency to offer high levels of availability.

Consider this issue in your cloud infrastructure configurations. For a service to be available, the DevOps team must maintain uptime among VMs, databases, microservice orchestrators, virtual storage machines, and internal network elements (load balancers, firewalls), among other elements. Maintaining the “nines” reliably means ensuring high availability across all of these interdependent parts of the configuration.
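
To see why interdependent parts make the nines harder to hold, here is a rough sketch. It assumes that every component must be up for the service to work and that components fail independently, in which case the end-to-end availability is the product of the individual availabilities. The per-component figures are hypothetical, not measurements.

```python
from math import prod


def composite_availability(component_availabilities) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(component_availabilities)


# Hypothetical availabilities for the kinds of elements listed above.
components = {
    "load balancer": 0.9999,
    "firewall": 0.9999,
    "VMs": 0.999,
    "microservice orchestrator": 0.9995,
    "database": 0.9995,
    "virtual storage": 0.999,
}

total = composite_availability(components.values())
print(f"End-to-end availability: {total:.4%}")
```

Under these assumptions the whole is always less available than its weakest part, so six components that each look healthy on their own can still add up to a service below three nines.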

A proactive operations approach aims to prevent failures from happening in the first place. When you adopt a proactive approach, you can solve problems before customers are affected, and you can identify the weaknesses in your system before they can be exploited. At Lightlytics, we help DevOps teams automatically predict, pre-empt, and prevent failures, downtime, and business disruptions caused by infrastructure development or updates by simulating all possible dependencies and impacts on operations before deployment. We’re helping businesses create amazing customer experiences by proactively keeping their nines as high as possible at all times. With Lightlytics, you can ensure reliable and scalable infrastructure, and score top marks with your customers as a result.


Want to learn more about our solution?

Contact us
