Amazon AWS Outage

Amazon Web Services (AWS) is the market leader among cloud computing service providers. Its reputation rests on a sturdy network of data centers and an extensive assortment of products and services. However, AWS has its share of flaws, and one of them recently came to public attention: the Amazon AWS outage.

According to news reports, AWS experienced an outage in one of its data centers in Northern Virginia on August 31, 2019.

A failure of utility power caused the outage, but the damage went beyond downtime for AWS services. So, what actually happened? Let us explore the news further to get a comprehensive picture of the impact of the AWS outage.

Know about the AWS Outage Incident

On paper, a power outage should not be a huge issue for AWS. With backup generators at each datacenter, AWS assures customers that their data is safe. But the contrary happened on August 31, 2019, when a datacenter in the US-EAST-1 region in Northern Virginia lost utility power at 4:33 AM. The backup generators came online but began failing at around 6:00 AM. As a result, about 7.5% of the EC2 instances and EBS volumes in the affected Availability Zone became unavailable.

According to Amazon’s summary report of the incident, power at the data center was fully restored at 7:45 AM.

Three hours after power was restored, at 10:45 AM, Amazon reported that all but about 1% of the affected instances had been recovered. Amazon also reported that the remaining EC2 instances and EBS volumes were hosted on hardware that had been damaged by the power failure, and that it was working to recover all of them. However, the real twist in the tale comes here.

Unraveling the Actual Problem

Andy Hunt, an author and programmer who was among the AWS customers affected by the power outage on August 31, 2019, brought some interesting observations to light. On September 3, 2019, three days after the incident, he published a tweet describing the risks of AWS data storage. Hunt repeatedly requested status updates from Amazon following the power outage. However, he described the experience as exhausting and said he did not receive any useful updates.

Reportedly, Amazon sent a message stating that its engineers were investigating the affected instances and that the investigation might take some time.

Amazon also told Hunt that it had no specific ETA and could not provide any further information until the engineers finished their investigation. The final piece of this interesting yet appalling story came on September 3, 2019, when Hunt received an update from Amazon.

The update stated that the EBS servers underlying the affected volumes could not be recovered. It added that repeated attempts to recover the volumes had failed, so the data was considered unrecoverable. Fortunately for Hunt, he had working backups of his data. However, customers who trusted the advertised redundancy and durability of EBS and kept no backups of their own could have faced far bigger problems.

Reddit Also Bore the Brunt

One of the notable names hit by this AWS outage was Reddit. Reddit users had trouble accessing the website on August 31, 2019. Many of them received a 503 error, the HTTP status code that indicates a server is temporarily unable to handle requests. Although Reddit initially attributed the problem to "an elevated level of errors," it ultimately traced the source to its hosting provider, AWS.

Seven Reddit components were affected: desktop web, mobile web, native mobile apps, vote processing, comment processing, spam processing, and modmail.

What to Learn from This Incident?

The notable takeaway from such a major failure on the part of AWS is the need to focus on data security. AWS markets its cloud resources and services on a few key strengths. Cost-effectiveness, scalability, and security are three prominent claims that you would find in almost all AWS advertisements. For example, the AWS Backup and Restore service advertises the ability to recover all data types with AWS.

However, that is not what happened on August 31, 2019. Many customers are unlikely to have kept working backups of their own for data hosted on the AWS cloud. Why? When the cloud service platform assures you of comprehensive data backup and restoration features, it is hard not to take its claims seriously.

However, the actual picture is quite different from what you might assume. For instance, the advertisement for Amazon EBS states 99.999% availability and an annual failure rate of 0.1% to 0.2%. Seems quite appealing, doesn't it? Yet Amazon also has clauses that shield it from any responsibility for data loss. You can claim service credits for lost availability, but credits are of little use if you lose significant data.
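To put those advertised numbers in perspective, here is a rough Python sketch of the arithmetic. It assumes volume failures are independent of each other, which a datacenter-wide power failure clearly violates, and the fleet of 1,000 volumes is a hypothetical example rather than a figure from AWS.

```python
# Back-of-the-envelope arithmetic for the advertised EBS figures.
# Assumption: failures are independent; the fleet size is hypothetical.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year a service may be down at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60


def expected_failures(volumes: int, annual_failure_rate: float) -> float:
    """Average number of volumes lost per year across a fleet."""
    return volumes * annual_failure_rate


def prob_at_least_one_failure(volumes: int, annual_failure_rate: float) -> float:
    """Chance that at least one volume in the fleet fails within a year."""
    return 1.0 - (1.0 - annual_failure_rate) ** volumes


if __name__ == "__main__":
    fleet = 1000  # hypothetical number of EBS volumes
    minutes = downtime_minutes_per_year(0.99999)
    print(f"99.999% availability allows {minutes:.2f} minutes of downtime per year")
    for afr in (0.001, 0.002):  # the advertised 0.1% and 0.2%
        lost = expected_failures(fleet, afr)
        chance = prob_at_least_one_failure(fleet, afr)
        print(f"AFR {afr:.1%}: about {lost:.0f} of {fleet} volumes lost per year, "
              f"{chance:.0%} chance of losing at least one")
```

Under these assumptions, five nines of availability still allows a little over five minutes of downtime a year, and a fleet of 1,000 volumes should expect to lose one or two volumes annually. Availability and durability are separate promises, and only the first one is covered by service credits.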

According to Amazon’s terms and conditions for using Amazon EC2, you agree that AWS may terminate or replace EC2 resources due to retirement, failure, or other AWS requirements. In addition, Amazon states that it is not liable for any damages, liabilities, or losses such as corruption, deletion, or loss of data, applications, or profits. This implies that you cannot claim any compensation from Amazon if you lose data due to the unexpected termination or failure of AWS systems.

Is Your Organizational Data Really Safe on Cloud?

A close look at the AWS outage of August 31, 2019, raises many doubts about the integrity of data on the cloud. If AWS customers like Hunt lost their data after a power failure at an Amazon data center, you could be next in line. The loss of significant data could bring down a business or lead to other critical situations.

One fact is clear: downtime hurts a business. However, downtime is usually short-lived, and your services can come back online. If you lose data without a backup, on the other hand, recovering it is practically impossible. So, the AWS outage definitely calls for better data backup strategies rather than depending solely on AWS.

So, What’s the Next Course of Action to Secure Your Data?

If you think an AWS outage is something new, it is not. AWS has been through many major outages in the past. In June 2016, AWS data centers in Sydney were affected by a storm. As a result, many prominent websites and online businesses were down for almost 10 hours over the weekend. Going by this history, AWS outages are bound to happen again. So we have to ensure appropriate backups for our data hosted on AWS rather than depending on AWS alone for it.

What is the most promising solution in this case?

The answer is a secondary backup!

Yes, you need a secondary backup strategy by your side. Blaming hardware failure will not bring back lost business. So, the smart decision to use a secondary backup provider could go a long way in safeguarding your precious data. Most important of all, your backup provider should be completely independent of your cloud data hosting provider. Safety from all sides never hurts, does it?
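As a rough illustration of the idea, the Python sketch below copies files to a second location and verifies every copy by checksum. The function names and the local directory standing in for an independent backup provider are assumptions made for this example; a real setup would send the copies to a provider outside AWS.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source_dir: Path, backup_dir: Path) -> list:
    """Copy every file under source_dir into backup_dir and verify each copy.

    backup_dir is a stand-in for an independent backup target and must not
    live inside source_dir. Returns the verified relative paths; raises
    IOError if a copy does not match its source.
    """
    verified = []
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        target = backup_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and file metadata
        if sha256_of(source) != sha256_of(target):
            raise IOError(f"checksum mismatch for {relative}")
        verified.append(relative.as_posix())
    return verified


if __name__ == "__main__":
    # Demo with throwaway directories; real paths would replace these.
    with tempfile.TemporaryDirectory() as tmp:
        primary = Path(tmp) / "primary"
        secondary = Path(tmp) / "secondary"
        primary.mkdir()
        (primary / "orders.csv").write_text("id,total\n1,9.99\n")
        for name in backup_and_verify(primary, secondary):
            print(f"verified backup of {name}")
```

The checksum comparison is the important part: a backup that has never been verified is only a hope, as the customers who relied on EBS redundancy alone found out.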

About Pavan Gumaste

Pavan Rao is a programmer and developer by profession and a cloud computing professional by choice, with in-depth knowledge of AWS, Azure, and Google Cloud Platform. He helps organisations figure out what to build, ensure successful delivery, and incorporate user learning to further improve the strategy and product.
