Ed Boris

~ Expert in digital transformation

British Airways Technology Chaos, really caused by a power surge?

18 Sunday Jun 2017

Posted by Edouard Boris in Business Continuity, Cloud, Innovation, SAAS

Tags

Capacity Management, Incident Management, Service Management

Most people are questioning BA’s version of how its entire information system went down on May 27th 2017, an outage that affected 75,000 passengers for up to 48 hours and could cost up to £80m.

British Airways states that human error was the cause: an engineer disconnected a power supply cable, which triggered a major outage due to a power surge.
The question is how such an outage lasted so long. The term “power surge” is misleading, because most people will think of power in terms of electricity rather than the information ecosystem.
In terms of service outage communication, the challenge is to inform without revealing embarrassing facts: to tell part of the truth without lying. In this instance, I must admit that BA is doing a good job.
My theory is that BA’s systems crashed as a result of the power outage, but BA’s team did not restart the entire ecosystem in sequence. My assumption is that BA’s systems were all restarted simultaneously, causing what they have called the “power surge”. The question is whether BA had a datacenter restart runbook and, if such documentation existed, whether it was ever tested.
Complex ecosystems require key infrastructure components to be restarted in a pre-established sequence: for example, the core storage first, then the database caching infrastructure, followed by the database systems. This is even more true of architectures based on microservices.
In other words, backend systems should be restarted first, followed by frontends. If you do not follow a pre-established sequence, the different components of the ecosystem will resume their operations in random order, start “talking” to each other and expect answers. When a non-synchronised datacenter restart is performed, it is likely to end in data corruption. Furthermore, as the front-end caching infrastructure is not warm, the backend will crash under the load, preventing the reopening of services.
If this scenario happened at BA, the databases storing flight reservations, flight plans and customer details got corrupted to the point where it became impossible to resume operations from the second datacenter, which was by then also partially corrupted as a result of the active-active synchronisation performed between the two datacenters.
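The idea of a pre-established restart sequence can be sketched as a dependency graph that is sorted before anything is started. This is only a minimal illustration: the component names and the dependencies between them are assumptions based on the example in this post, not BA’s actual architecture.

```python
from graphlib import TopologicalSorter

# Hypothetical components: each one maps to the set of components that
# must already be running before it can be started.
DEPENDENCIES = {
    "core_storage": set(),
    "database_cache": {"core_storage"},
    "database": {"database_cache"},
    "backend_services": {"database"},
    "frontend": {"backend_services"},
}


def restart_order(dependencies):
    """Return a start order in which every component follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, component in enumerate(restart_order(DEPENDENCIES), start=1):
        print(f"{step}. start {component}")
```

A real runbook would also wait for each component to report healthy before moving on to the next one. With this standard-library sorter, a circular dependency raises `graphlib.CycleError` instead of producing an order, so an impossible sequence is caught before any restart begins.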

British Airways then had no option other than to restore backups, replay the system logs of the unsynchronised systems, and only then resume synchronisation with the second datacenter.
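The recovery path described above, restoring a backup and then replaying the logs written after it, can be illustrated with a toy example. The record layout, the sequence numbers and the function name are hypothetical and chosen only for illustration; they say nothing about how BA’s systems actually store their data.

```python
def restore_and_replay(backup, log):
    """Rebuild state from a backup, then apply only the log entries
    recorded after the backup was taken, in sequence order."""
    state = dict(backup["data"])  # start from the restored backup
    for entry in sorted(log, key=lambda e: e["seq"]):
        if entry["seq"] <= backup["seq"]:
            continue  # this change is already contained in the backup
        state[entry["key"]] = entry["value"]
    return state


# Hypothetical backup taken after log entry 2, plus the full log.
backup = {"seq": 2, "data": {"booking-1": "confirmed"}}
log = [
    {"seq": 1, "key": "booking-1", "value": "pending"},
    {"seq": 2, "key": "booking-1", "value": "confirmed"},
    {"seq": 3, "key": "booking-2", "value": "confirmed"},
    {"seq": 4, "key": "booking-1", "value": "cancelled"},
]
state = restore_and_replay(backup, log)
```

Only once every system has been brought back to a consistent state in this way would it be safe to resume synchronisation with the second site.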

Obviously, this is a much more difficult reality to explain, but I talked to several IT experts and no one, absolutely nobody, is buying the power surge story.
I’m looking forward to hearing the findings of the internal investigation that BA’s chief executive has already launched.