Preparing for a Root Cause Analysis
By Michael W. Blanchard, CRE, PE, Life Cycle Engineering
Introduction
The effectiveness of a root cause investigation depends on several elements, but the most important is the time spent preparing for the analysis. Conducting a thorough preliminary investigation, identifying the right team members, and anticipating problems at the analysis meeting could mean the difference between a highly reliable asset and recurring failures.
To drive this point home, consider the analogy of assembling a puzzle. You start with a box of puzzle pieces and fit them together until the picture is complete. An experienced puzzle builder develops tricks and techniques to complete the puzzle efficiently, while someone who has never built a puzzle will likely struggle. However, even the most skilled puzzle builder cannot complete the puzzle if pieces are missing or there is no suitable place to work. The same holds true for a root cause analysis (RCA): the team cannot complete the analysis if critical evidence is missing, team members are absent, or the meeting facilities are inadequate.
I’ve participated in several RCAs where we were unable to complete the analysis because either evidence was unavailable or key personnel were not present. I’ve also participated in analysis meetings where the facilitator was unable to follow the agenda because of inadequate equipment or facilities. We’ll discuss practical ways to anticipate these problems and to contain their negative impact on RCA effectiveness.
Collecting Evidence - Strike While the Iron is Hot
Although every investigation is unique, there is critical evidence that must always be collected, and some of it must be collected urgently. You need to “strike while the iron is hot”. Unplanned events sometimes occur on weekends and at night when most support staff are off site. You must either make yourself available for call-outs or train operators and maintenance craft to collect time-sensitive evidence. Eyewitness testimony, failed parts, process data held only in short-term storage, and environmental conditions at the time of the incident may be lost forever if they’re not gathered promptly.
These actions will help you collect the necessary evidence:
- Take pictures of the asset before, during, and after repairs.
- Evaluate failed components and send them to applicable subject matter experts (SMEs) for analysis.
- Mine your computerized maintenance management system (CMMS) for equipment history of repair, preventive and predictive maintenance.
- Search the equipment library for installation documentation, operational and maintenance manuals, drawings, and records of the asset life cycle.
- Review operational logbooks, either electronic or hard copy, for additional details of long-term and short-term history.
- Include standard operating procedures for the asset and ancillary equipment in the evidence package.
- Record a snapshot of process control screens that reveal the failure through key parameters. This will be the basis for developing the timeline of events, a prerequisite for the subsequent analysis.
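The checklist above can be tracked with a small script that merges evidence from different sources into a single timeline and flags sources not yet collected. This is only a minimal sketch: the `Evidence` record, the source labels, and the sample entries are hypothetical and invented for illustration, not taken from a real investigation or a particular CMMS.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence (hypothetical record layout)."""
    timestamp: datetime  # when the observation was made or logged
    source: str          # which checklist item it came from
    note: str            # short description of what it shows


# Hypothetical labels, one per checklist item above.
REQUIRED_SOURCES = [
    "photos", "failed_parts", "cmms_history", "equipment_library",
    "logbook", "sop", "control_screen",
]


def build_timeline(items):
    """Return the evidence ordered from earliest to latest."""
    return sorted(items, key=lambda e: e.timestamp)


def missing_sources(items, required):
    """Return the required sources that have no evidence yet."""
    seen = {e.source for e in items}
    return sorted(set(required) - seen)


# Made-up sample entries, deliberately out of order.
evidence = [
    Evidence(datetime(2024, 1, 6, 2, 40), "photos", "Asset before repair"),
    Evidence(datetime(2024, 1, 6, 1, 50), "logbook", "Operator notes noise"),
    Evidence(datetime(2024, 1, 6, 2, 15), "control_screen", "Trip recorded"),
]

timeline = build_timeline(evidence)
gaps = missing_sources(evidence, REQUIRED_SOURCES)

for item in timeline:
    print(item.timestamp.isoformat(), item.source, item.note)
print("Still to collect:", ", ".join(gaps))
```

Reviewing the list of gaps before the analysis meeting is one way to confirm that the evidence package is complete while time-sensitive items can still be recovered.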
The RCA Team – Identifying Key Players
The effectiveness of the RCA in mitigating or eliminating unplanned events also depends on having the right roles present at the analysis. Too many people at the RCA may pose a problem, but the absence of key players will likely result in a stalled or ineffective analysis. The RCA team should consist of a trained and unbiased facilitator, those directly involved in the incident, equipment and process specialists, operators and maintenance craft, a maintenance and reliability engineer, and possibly an environmental and safety specialist. Additional team members may be named based on the data and evidence collected. The process owner should be included as an ad hoc member to gain support during the solutions and implementation phases. The process owner will also likely possess expertise, historical perspective, or knowledge of the specific event. The report generated by experts analyzing failed components also requires interpretation, which may be provided by the author of the report or by local SMEs. Communicate the importance of attending team meetings to each team member. Building an RCA team of the right people is critical to the outcome of the analysis.
RCA Meeting Logistics – Prepare for the Unexpected
It would be unfortunate, to say the least, to invest precious time and resources in a thorough preliminary investigation and in forming an RCA team, only to have the analysis fail because of poorly planned meeting logistics. I’m a firm believer in Murphy’s Law: anything that can go wrong will go wrong.
Here are some tips to ensure RCA team attendance and effective use of meeting time:
- Develop and stick to an agenda that doesn’t last more than two hours.
- Reserve a centrally located meeting room with sufficient seating and IT equipment. Off-site team members and ad hoc resources may need to attend remotely, so provide conference call numbers and online meeting information.
- Schedule the RCA at a time when attendees are free or flexible. Team members on shift may need to come in on overtime to accommodate the majority of schedules, and others may need to reschedule lower-priority meetings.
- Send an email one to two days prior to the RCA emphasizing the importance of attending the meeting.
- If a follow-up meeting is required, schedule it at the end of this meeting.
Taking the time to develop contingencies for “what could go wrong” at the analysis meeting will help ensure efficient use of everyone’s time and create an environment for success.
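Two of the tips above, the two-hour agenda limit and the reminder sent one to two days ahead, lend themselves to a quick pre-meeting check. The sketch below is illustrative only: the agenda items, their durations, and the meeting date are made up, and the reminder window is counted in calendar days rather than working days, which is an assumption.

```python
from datetime import date, timedelta

MAX_AGENDA_MINUTES = 120  # the two-hour limit from the tips above


def agenda_minutes(agenda):
    """Total planned duration of (topic, minutes) agenda items."""
    return sum(minutes for _topic, minutes in agenda)


def reminder_window(meeting_day):
    """Return (earliest, latest) reminder dates: two days and one day
    before the meeting, counted in calendar days (an assumption)."""
    return meeting_day - timedelta(days=2), meeting_day - timedelta(days=1)


# Made-up agenda and meeting date for illustration.
agenda = [
    ("Review timeline of events", 30),
    ("Cause analysis", 60),
    ("Assign actions and schedule follow-up", 20),
]

total = agenda_minutes(agenda)
fits = total <= MAX_AGENDA_MINUTES
earliest, latest = reminder_window(date(2024, 3, 14))

print(f"Agenda: {total} min, within limit: {fits}")
print(f"Send reminder between {earliest} and {latest}")
```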
Summary
Reliable organizations make it a part of their daily work to prepare for the unexpected. The same holds true for root cause investigations. The analysis is unlikely to produce solutions capable of preventing recurrence if the preparation is not carried out with due diligence. Thorough preparation for the unexpected will set the stage for a successful analysis phase. The practices detailed in this article address basic problems that I’ve personally experienced. You will need to assess your own investigation to ensure that you collect sufficient evidence, form the right team, and properly plan the meeting logistics and format.
Michael Blanchard is a Reliability Engineering Subject Matter Expert with Life Cycle Engineering (LCE). He has more than 25 years of experience as a reliability leader in a variety of industries. Mike is a licensed Professional Engineer, a Certified Reliability Engineer, and a Certified Lean Six Sigma Master Black Belt. You can reach Mike at mblanchard@LCE.com.
© Life Cycle Engineering