As I wrote in a previous post, I have noticed in my career that the engineers who make the biggest contribution aren’t necessarily the ones with the strongest specific software, architecture or mathematical skills. Engineers need to develop a new mindset built on critical thinking and problem-solving competencies.

Do not confuse problem-solving with ‘continuous improvement’, which was initially developed back in 1880 and is also known as the suggestion system. http://www.answers.com/topic/employee-suggestion-systems

Toyota still uses it, but this is not a problem-solving approach. The suggestion system requires an individual to make suggestions on whatever improvement they have an idea about, whereas problem-solving requires a team.

Now, do you remember Doctor House and his team writing down the patient’s previous conditions and symptoms (known previous problems, known previous changes)? Do you remember House’s whiteboard?

If you have ever managed large, complex real-time information systems, you know how difficult it is to determine the actual root cause of a problem. You know how frustrating it can be to confuse a symptom with a cause, and how difficult it is to determine with certainty the sequence of causes that eventually triggered the problem.

How does this problem-solving requirement translate into real life? One option is to follow the principle of a differential diagnostic procedure (DDP) for incident management.

This means putting together a DDP team composed of experts in different technical and non-technical fields, and setting up the right DDP structure and culture, in particular a no-blame policy to ensure all opinions can be expressed (do not operate like House!). Over time, the DDP principles and discipline will increase the troubleshooting and analytical skills of the team until they become a mindset. Ultimately, DDP will reduce the MTTR, i.e. the Mean Time to Restore/Resolve.
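To make the MTTR target concrete, here is a minimal sketch of how the mean time to restore could be computed from incident records. The function name and the sample timestamps are hypothetical, and it assumes each incident is recorded as a (detected, restored) pair of timestamps.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Average the time between detection and restoration.

    `incidents` is a list of (detected, restored) datetime pairs.
    This record layout is an assumption made for illustration.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = (restored - detected for detected, restored in incidents)
    total = sum(durations, timedelta())
    return total / len(incidents)


# Hypothetical sample data: one 90-minute and one 30-minute outage.
incidents = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 10, 30)),
    (datetime(2024, 2, 11, 14, 0), datetime(2024, 2, 11, 14, 30)),
]
mttr = mean_time_to_restore(incidents)
print(mttr)
```

Tracking this figure before and after introducing DDP is one way to check whether the practice is actually shortening incidents.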

DDP team members will vary from phase to phase, but could include Application Development, the Operation Center, the Performance team, Infrastructure and a business function (e.g. a customer-facing team).

A typical script used during a DDP session has 4 steps:

Step 1: Information gathering. The chairperson gathers information on the impacted service, such as the timing of the occurrence, the symptoms of the issue, logs, and the list of recent changes.

Step 2: Candidate conditions. The team lists the potential root causes.

Step 3: Sequencing. The candidate conditions are sorted from most to least likely. Specific analysis is then requested for the most likely causes in order to rule each candidate condition in or out.

Step 4: Fixing. The candidate condition that has been ruled in should be mitigated, and proper monitoring should be implemented to measure the effectiveness of the solution.

The analysis, options, solutions, and validation at each step should be discussed by the DDP team so that assumptions and solutions are cross-challenged and validated.
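The four steps above can be sketched as a small data model, a kind of electronic whiteboard. This is only an illustration under stated assumptions: the class names, the fields, and the idea of scoring each candidate with a likelihood between 0 and 1 are all hypothetical, and the scores stand in for the team's shared judgment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    """A candidate condition, i.e. a potential root cause (step 2)."""
    description: str
    likelihood: float  # team estimate between 0.0 and 1.0 (assumption)
    ruled_in: Optional[bool] = None  # None until analysis is done


@dataclass
class Incident:
    """Information gathered by the chairperson (step 1)."""
    service: str
    symptoms: List[str] = field(default_factory=list)
    recent_changes: List[str] = field(default_factory=list)
    candidates: List[Candidate] = field(default_factory=list)


def sequence(incident: Incident) -> List[Candidate]:
    """Step 3: sort candidates from most to least likely."""
    return sorted(incident.candidates, key=lambda c: c.likelihood, reverse=True)


def next_to_analyse(incident: Incident) -> Optional[Candidate]:
    """Return the most likely candidate not yet ruled in or out."""
    for candidate in sequence(incident):
        if candidate.ruled_in is None:
            return candidate
    return None


# Hypothetical incident used only to exercise the sketch.
incident = Incident(
    service="order-api",
    symptoms=["slow responses", "timeouts"],
    recent_changes=["database driver upgrade"],
)
incident.candidates.append(Candidate("network saturation", 0.2))
incident.candidates.append(Candidate("driver upgrade regression", 0.7))
incident.candidates.append(Candidate("disk full", 0.1))

first = next_to_analyse(incident)
first.ruled_in = False  # analysis ruled this candidate out
second = next_to_analyse(incident)
print(first.description, "->", second.description)
```

Step 4 is deliberately left out of the code: once a candidate is ruled in, the mitigation and the monitoring are specific to the system at hand. Keeping ruled-out candidates on the record, rather than deleting them, preserves the reasoning for later review.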