You probably know by now that Facebook went down this morning for 50 minutes.
Some hackers claimed to have caused it, but FB reported that the outage was actually triggered by a change they introduced in their configuration management system.
As I wrote previously, and as I have tested with so many interviewees:
“What is the leading cause of incidents in the industry?”
Forget people, software, hardware, your grandmother: the leading cause of incidents is CHANGE. I’m sure you have heard the idiom “If it ain’t broke, don’t fix it.”
So FB introduced a change in their configuration management system, which triggered an outage for billions of people. Work out 1 billion × 50 minutes and you get the time people recovered to actually socialize with humans they could see! Great news.
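Taking the post's own round figure of 1 billion affected users (a rough assumption, not a reported number), the back-of-the-envelope product works out to:

```latex
10^{9}\ \text{users} \times 50\ \text{min} = 5 \times 10^{10}\ \text{min}
= \frac{5 \times 10^{10}}{60 \times 24 \times 365.25}\ \text{years}
\approx 95{,}000\ \text{years}
```

That is roughly 95,000 years of collective screen time handed back in under an hour.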
In recent years FB has pushed an initiative called the Open Compute Project, designed to drive standardization and automation right through the datacenter.
It is very surprising that, despite all that resiliency and multiple datacenters, one single change was able to take down such a service for 50 minutes.