AWS and The Rush to the Cloud | Axess Systems


How the AWS Outage Exposed the Problems with the Rush to the Cloud



You don’t need to be a tech expert to know about the recent AWS outage. Hundreds of major sites were affected – from Snapchat to HMRC – but the impact extended far beyond household names.

Outages have happened before, and likely will again. But the scale of this one has exposed how much overreliance on a single cloud platform is hurting companies.

It makes sense to cast a critical eye back at this rush to the cloud – and how companies can better shore up their defences against another future outage.

The Rush to the Cloud: What’s Behind It?

If the last few years have taught us anything, it’s that the world of work is constantly changing. Every company wants to keep up with that change – but in trying to do so, many jumped to then-new cloud technology before they were ready.

This was especially the case at the height of the COVID-19 pandemic, when restrictions forced a shift to remote working and intensified interest in the cloud.

The downside was that many companies couldn’t take the time to assess if moving to the cloud was right for them in the long term. As a result, these organisations were left with what Brian Kirsch calls a “cloud hangover” – with touted benefits being outweighed by real-life problems.

4 Lessons on Risk and Resilience from the AWS Outage

As we mentioned, no company can protect itself from every cloud outage that will happen. But there are plenty of lessons you can take forward from what happened with AWS.

1. Incident response planning should be a priority

Incident response planning is a bit like healthy eating – we all know we should do it, but actually putting the pieces together is more difficult!

The rarity of outages makes it easy to fall into a mindset of “it’ll never happen to us”. But the AWS outage showed that this is, at best, indifference – and at worst, ignorance.

The first test of your incident response shouldn’t be in real time. Sit down with your IT team and methodically talk through what your company would do if your chosen cloud platform went down, from start to finish. That way, you won’t be left floundering when an outage happens again.

2. Diversification is essential

You’ve heard the phrase “don’t put all your eggs in one basket”. It’s exactly the same when it comes to IT. Relying on a single cloud provider creates a single point of failure – meaning that if you’re only on one platform, you’re constantly running the risk of your entire operation going down.

Hybrid and multicloud systems guard against this by spreading risk across different vendors, but they also allow for a more nuanced approach to the cloud in general. You can choose which workloads are best suited to the cloud, rather than moving everything at once.

3. The cost vs resilience trade-off

Of course, we can’t ignore the fact that many companies choose a single cloud provider for a single reason – cost. However, these initial savings are often casualties of the cloud hangover.

CyberCube has estimated that losses for the AWS outage will range from £28.5m to £436m – not including secondary effects like reputational damage.

Resilience investments – like a multicloud or hybrid setup – may involve more initial outlay, but they offer you more protection against the impact of major outages. Over time, they can pay for themselves.

4. Big names aren’t infallible

AWS is one of the Big Three hyperscalers – but its size means businesses sometimes assume uptime is guaranteed.

In reality, this isn’t the case. It’s yet another example of why you should always read the small print. When you look at your cloud provider’s Service Level Agreement (SLA), interrogate the detail – if a cloud provider promises “99.9% uptime”, see what that means in hours per year.

Also, align the SLAs with what your business needs. Say your business is a critical one – like a hospital or a bank. In this case, you might not be able to afford more than 30 minutes of downtime – and any SLA you agree to should reflect that.
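To see what an uptime percentage means in practice, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the 30-minute budget is assumed to be per year, and a real SLA may measure uptime per month and exclude planned maintenance, so always check the contract's own definitions.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Return the yearly downtime, in hours, that an uptime percentage still permits."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def meets_budget(uptime_percent: float, budget_minutes: float) -> bool:
    """Check whether the permitted yearly downtime fits a budget given in minutes."""
    return allowed_downtime_hours(uptime_percent) * 60 <= budget_minutes


if __name__ == "__main__":
    # Assumed budget: 30 minutes of downtime per year (hypothetical figure).
    for sla in (99.0, 99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(sla)
        verdict = "fits" if meets_budget(sla, 30) else "exceeds"
        print(f"{sla}% uptime allows about {hours:.2f} hours a year ({verdict} a 30-minute budget)")
```

On these assumptions, “99.9% uptime” still permits roughly 8.76 hours of downtime a year, which is far more than a 30-minute budget allows.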

Case Study: How Netflix Weathered the AWS Outage

Though it didn’t entirely escape the outage, Netflix came out of it better than a lot of its contemporaries. The reasons behind this are the lessons above put into action.

Firstly, its infrastructure was built to be resilient. Instead of running from one location or system, it runs across multiple regions, allowing for real-time failover.
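The idea behind failover can be shown with a very small Python sketch. It is not Netflix's implementation: the region names, the `primary` and `secondary` handlers, and the `call_with_failover` helper are all hypothetical, and a production setup would also involve health checks, data replication, and traffic routing.

```python
from typing import Callable, Sequence, Tuple


class AllRegionsFailed(Exception):
    """Raised when no region could serve the request."""


def call_with_failover(regions: Sequence[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Try each (name, handler) pair in order and return the first success."""
    errors = []
    for name, handler in regions:
        try:
            return name, handler()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllRegionsFailed("; ".join(errors))


def primary() -> str:
    """Hypothetical primary region, simulating an outage."""
    raise ConnectionError("region unavailable")


def secondary() -> str:
    """Hypothetical secondary region that is still healthy."""
    return "ok"


if __name__ == "__main__":
    region, response = call_with_failover([("region-a", primary), ("region-b", secondary)])
    print(f"served by {region}: {response}")
```

The point of the sketch is the ordering: the request only fails outright when every region in the list has failed.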

Netflix are also masters of preparedness. They regularly simulate outages using purpose-built software, a practice known as chaos engineering, to identify weaknesses in their architecture.
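A toy version of this kind of fault-injection experiment might look like the Python below. It is a simplified sketch in the spirit of chaos engineering, not a description of Netflix's tooling: the replica names and the `chaos_experiment` function are invented for illustration, and the "service" is just a rule that at least one replica must stay healthy.

```python
import random
from typing import Dict, List


def serve(replicas: Dict[str, bool]) -> bool:
    """The toy service stays up as long as at least one replica is healthy."""
    return any(replicas.values())


def chaos_experiment(replica_names: List[str], kills: int, rng: random.Random) -> bool:
    """Disable `kills` randomly chosen replicas and report whether the service survives."""
    replicas = {name: True for name in replica_names}
    for victim in rng.sample(replica_names, kills):
        replicas[victim] = False  # injected failure
    return serve(replicas)


if __name__ == "__main__":
    rng = random.Random()
    survived = chaos_experiment(["replica-1", "replica-2", "replica-3"], kills=2, rng=rng)
    print("service survived" if survived else "service went down")
```

Running the same experiment against a single replica fails every time, which is exactly the kind of weakness these simulations are meant to surface before a real outage does.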

The scale of Netflix isn’t the important thing here. Companies of any size can learn from them about the value of diversification and forward planning.

The cloud isn’t inherently bad – and outages as wide-ranging as the one with AWS are still rare. However, blind adoption without planning is the bigger danger, opening up your system to vulnerabilities that affect the real people your business is meant to help.


With Citrix Universal Licensing, you can easily use and manage your Citrix products across hybrid, cloud, and on-premise environments – cutting costs and making you more adaptable.

To find out more, get in touch with us today or book a free consultation.
