Has the Role of the Reliability Engineer Changed?
By Michael Blanchard, PE, CRE, Life Cycle Engineering
Look up Reliability Engineer (RE) jobs on Indeed.com and you’ll find more than 20,000 openings listed. I would argue that emerging technologies are driving some of this huge demand, which raises some questions. Has the role of the manufacturing plant RE changed? What new skills and competencies are required and what do REs need to do to stay on top of the changes and continue their vital role in managing the life cycle of assets?
Elements of the "Traditional" Reliability Engineer Role Which Have Not Changed
Building Strategic Partnerships – The reliability program cannot function effectively and achieve its goals in a vacuum. Its success depends on the support from key partnerships across functional areas including operations, maintenance, quality, design engineering, information technology, work management, materials management, procurement, health, safety and environmental. The RE must develop and nurture these strategic partnerships to achieve maximum results.
Root Cause Failure Analysis (RCFA) – This has been the sledgehammer in the RE’s toolbox and I don’t see that changing. The RE’s RCFA responsibilities include:
- Developing and updating trigger criteria for RCFA
- Preparing for the subsequent analyses including thorough preliminary investigation, gathering evidence, identifying the right team members, interviewing witnesses
- Selecting the most appropriate tools to facilitate the analysis and validate the probable root causes (5 Why, Design/Application review, Ishikawa (Fishbone), Sequence of events, Fault tree analysis, Change analysis, FMEA, Event and causal factor analysis)
- Identifying and evaluating solutions to prevent recurrence
- Verifying solutions
- Documenting and leveraging results
Proactive root cause analysis, on the other hand, has changed and I discuss that in the next section.
Leading Teams – The RE must be able to facilitate RCFAs, and lead Reliability Centered Maintenance (RCM) activities and change initiatives. This requires cultivating competency in leading people, managing tasks and facilitating decisions.
Life Cycle Asset Management (LCAM) – REs have responsibilities for optimizing each phase of the asset life cycle, beginning at conceptual design and continuing through shut down and decommissioning.
- Concept – Design for reliability and maintainability, comparing design options
- Create/Acquire – Configuration management, commission plan, install for reliability
- Operate and Maintain – Risk plan, operating plan, maintenance plan, capital plan
- Decommission and Dispose – Decommissioning plan, asset disposal process
Management of Change (MOC) – In my experience, many reliability problems are caused by design and uncontrolled changes. The RE is the process owner of this best practice, which ensures that safety, environmental and value stream risks are controlled when an organization makes changes to its facilities, documentation, personnel, or operations.
Risk Management – REs are risk managers and will continue to apply a risk-based asset management (RBAM) strategy across the entire life cycle of an asset, minimizing risk to the value stream. Many RBAM competencies remain the same. REs apply a risk-based approach to asset maintenance and operations; prioritize reliability efforts on critical equipment and failures that impact operations; and incorporate RCM principles to decrease downtime, lower maintenance expenditures and minimize total cost of ownership.
Asset condition monitoring, however, has changed and I discuss it in the next section.
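As a rough illustration of the risk-based prioritization described above, the sketch below ranks assets by a simple risk score, taken here as failure likelihood multiplied by consequence. The 1-to-5 scales, the asset names and the `rank_by_risk` helper are assumptions made for this example; they are not part of any RBAM standard.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    likelihood: int   # assumed scale: 1 (rare) to 5 (frequent)
    consequence: int  # assumed scale: 1 (minor) to 5 (severe)

    @property
    def risk_score(self) -> int:
        # Simple risk-matrix product; real programs may weight factors differently.
        return self.likelihood * self.consequence


def rank_by_risk(assets):
    """Return assets sorted from highest to lowest risk score."""
    return sorted(assets, key=lambda a: a.risk_score, reverse=True)


# Hypothetical assets, for illustration only.
assets = [
    Asset("Cooling water pump", likelihood=4, consequence=5),
    Asset("Packaging conveyor", likelihood=3, consequence=2),
    Asset("Air compressor", likelihood=2, consequence=4),
]

for asset in rank_by_risk(assets):
    print(f"{asset.name}: {asset.risk_score}")
```

In this made-up data set the cooling water pump ranks first, so reliability effort would be directed there before the compressor or the conveyor.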
Continuous Improvement (CI) – REs spearhead efforts to improve performance using plan-do-check-act (PDCA) methodology, data mining and modeling, and advanced analytics skills. CI competencies that have not changed include:
- Opportunity identification
- Measuring defective performance
- Proactive root cause analysis process
- Cost/Benefit analysis of improvements
- Sustainability
New Elements in the Reliability Engineer Role Driven by New Technologies
Predictive Maintenance Strategy and Internet of Things
The Internet of Things (IoT) is now used extensively in industry to support asset health and the RE can serve a vital role. Leveraging IoT and complementary technologies requires new skills and expertise that most organizations don’t have in-house. The RE will likely partner with an IoT service provider to set up, access and analyze data from sensors and devices, and convert them into actionable instructions. The RE, as gatekeeper in the feedback loop, uses this intelligence for condition monitoring, risk management and reliability improvement initiatives.
Figure 1 - Internet of Things (IoT)
Predictive technology sensors are strategically located on critical machines and communicate via the cloud. The cloud provides the infrastructure for streaming, analyzing and storing data for more thorough advanced analytics later. These systems can gather information and statistics from the data to be used for process and reliability optimization. This feedback goes through the RE for validation and follow-up action. It essentially allows for on-line condition monitoring of plant equipment so failures can be predicted and any necessary repairs are planned and executed prior to functional failure.
Because one of the RE’s responsibilities is to manage the Predictive Maintenance Strategy (PdMS), they must possess or develop the skills necessary to manage an online condition monitoring program for their area of responsibility. The aim is to predict when equipment failure may occur and to prevent it by performing planned corrective maintenance. Predicting failure has typically been done using one of many technologies, including vibration analysis, thermography, ultrasonics, tribology and motor analysis. Data collection has historically been route-based using handheld devices. With the advent of the IoT, online condition monitoring with advanced analytics is now available on a large scale. Equipment condition is continuously monitored by comparing readings to pre-defined parameters. This enables the tracking of patterns, or combinations of patterns, that might indicate equipment failure. To manage the PdMS, the RE must have a fundamental understanding of the predictive technologies. Manual data collection with handheld devices will still be necessary to validate cloud-based failure predictions.
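The comparison of readings to pre-defined parameters can be sketched in a few lines. The example below is a minimal illustration, assuming hypothetical alert and alarm limits for a vibration reading in mm/s. The limit values and the `classify_reading` function are invented for this example and are not drawn from any standard or vendor platform.

```python
def classify_reading(value, alert_limit, alarm_limit):
    """Classify one reading against pre-defined limits (alert below alarm)."""
    if alert_limit >= alarm_limit:
        raise ValueError("alert_limit must be below alarm_limit")
    if value >= alarm_limit:
        return "ALARM"
    if value >= alert_limit:
        return "ALERT"
    return "NORMAL"


# Hypothetical vibration readings (mm/s) from one sensor.
readings = [2.1, 2.3, 4.8, 7.5]
statuses = [classify_reading(r, alert_limit=4.5, alarm_limit=7.1) for r in readings]
print(statuses)  # ['NORMAL', 'NORMAL', 'ALERT', 'ALARM']
```

A real platform would evaluate many parameters together, but the principle of comparing each reading with limits the RE has defined is the same.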
Reliability Engineers need to develop a working familiarity with the following areas of technology to remain effective in their role:
Predictive Technology Sensors
The RE will need to understand the fundamentals, application and maintenance of wireless sensors. In addition to process sensors (e.g. proximity, pressure, water quality, chemical, gas, smoke, level, motion detection, and humidity), the following predictive technology sensors are widely used in online condition monitoring:
- Vibration sensors to monitor the vibration of equipment
- Temperature sensors to monitor temperature variation
- Infrared sensors to measure the heat being emitted by the object
- Oil Level sensors to measure the variation in oil levels
- Acoustic sensors to detect changes in ultrasonic sound made by the equipment
- Motor Voltage and Current sensors to monitor for corona, arcing, tracking and imbalance
Real-time Condition Monitoring
The IoT platform has the ability to process real-time streaming data as fast as it can be collected, allowing for quick response to changing conditions. IoT software captures and aggregates huge amounts of data from connected machines and immediately analyzes it using predictive modelling to ultimately deliver intelligence for corrective action. The RE will likely partner with information technology, process engineering and cloud services to maximize system capabilities.
Online condition monitoring also helps the RE to determine when assets are nearing end of life. This allows the operations team to plan for their replacement and disposal.
Big Data Analytics
Big data analytics relies on cloud-based software that monitors and analyzes signals from wireless sensors, typically thousands of them, strategically placed on critical assets. The software then triggers the necessary maintenance or operations actions based on rules, conditions, algorithms and models defined by the RE and process engineering. The RE should become familiar with:
- Streaming analytics, used to analyze huge dynamic data sets. Real-time data streams are analyzed to detect situations that require urgent and immediate actions.
- Spatial analytics, used to analyze geographic patterns to determine the spatial relationship between objects.
- Time series analytics, used to analyze time-based data to identify trends and patterns.
- Prescriptive analytics, a hybrid of descriptive and predictive analytics used to determine the best course of action in a particular situation.
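To make the time series idea concrete, here is a minimal sketch that smooths a series of temperature readings with a moving average and flags a sustained upward trend. The window size, the rise threshold, the sample data and the function names are all assumptions chosen for illustration; a commercial analytics platform would apply its own models.

```python
from collections import deque


def moving_average(values, window):
    """Yield the mean of the last `window` values once the window is full."""
    buf = deque(maxlen=window)
    for v in values:
        buf.append(v)
        if len(buf) == window:
            yield sum(buf) / window


def is_rising(values, window=3, min_rise=1.0):
    """Return True if the smoothed series never decreases from point to point
    and rises by at least `min_rise` from its first to its last point."""
    smoothed = list(moving_average(values, window))
    if len(smoothed) < 2:
        return False
    monotonic = all(b >= a for a, b in zip(smoothed, smoothed[1:]))
    return monotonic and (smoothed[-1] - smoothed[0]) >= min_rise


# Hypothetical bearing temperatures in degrees C.
stable = [60.1, 59.8, 60.2, 60.0, 59.9, 60.1]
drifting = [60.0, 60.4, 61.1, 61.9, 62.8, 63.5]

print(is_rising(stable))    # False
print(is_rising(drifting))  # True
```

Smoothing before testing for a trend keeps ordinary sensor noise from triggering an alert, which is one reason the RE needs to understand how such rules are defined.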
Proactive root cause analysis (RCA) methods have not changed, but the huge amount of cloud data now available gives the RE the ability to statistically validate root causes. There is always risk in applying solutions to probable root causes that have not been validated. Big data analytics complements the RCA process and also helps the RE manage risk.
Building Failure Models and Machine Learning
The RE will have the opportunity to build failure models to generate PF (potential failure to functional failure) curves for planning corrective action. This requires knowledge of the reasons for failure or failure mechanisms, identifying the combination of key parameter values that indicate failure, and using statistical data analytics and mathematics to build the model. These failure models are used to establish PF intervals.
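One very simple way to picture such a model is to fit a straight line to readings taken after a defect is first detected and extrapolate it to a functional failure limit. The sketch below does that with ordinary least squares. Linear degradation, the failure limit of 7.0 mm/s, the sample readings and the function names are all assumptions for illustration; real failure mechanisms are often nonlinear and call for more careful statistical modeling.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def days_to_threshold(times, values, failure_limit):
    """Extrapolate the fitted line to `failure_limit`.

    Returns the estimated days remaining after the last reading, or None
    if the trend is flat or improving.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    t_fail = (failure_limit - intercept) / slope
    return max(0.0, t_fail - times[-1])


# Hypothetical vibration readings (mm/s), one every 10 days after detection.
days = [0, 10, 20, 30]
vibration = [3.0, 3.5, 4.0, 4.5]

remaining = days_to_threshold(days, vibration, failure_limit=7.0)
print(f"Estimated days to functional failure: {remaining:.0f}")
```

With these made-up readings the line reaches the limit about 50 days after the last measurement, which is the window available for planning and executing the repair.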
Machine learning is used in those instances where you can’t define a failure model for your equipment using advanced data analytics. Machine learning is integral to the way data is processed, allowing algorithms to find impending failures. The RE should first build competency with data mining and modeling techniques before attempting machine learning.
Digital Twin Technology
Digital twins are virtual representations of assets and processes that are used to understand, predict, and optimize performance. A digital twin is built with asset data by simulating asset performance in different usage scenarios under varying conditions. Models based on input factors such as associated risks, operating scenarios, and system configuration can be used to simulate a range of business outcomes such as total expected cost of maintenance and system unavailability over a period of time. The RE will need to develop advanced analytical skills to design and deploy different simulation models.
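A full digital twin is far richer than anything that fits in a short listing, but the simulation idea can be sketched with a small Monte Carlo model. The example below assumes exponentially distributed time between failures and a fixed repair duration, then compares expected maintenance cost and unavailability over one year for two configurations. The MTBF values, repair time, cost per failure and function names are hypothetical.

```python
import random

HOURS_PER_YEAR = 8760


def simulate_year(mtbf_hours, repair_hours, cost_per_failure, rng):
    """Simulate one year for a single asset.

    Assumes exponential time between failures and a fixed repair duration.
    Returns (downtime_hours, maintenance_cost).
    """
    clock = 0.0
    downtime = 0.0
    cost = 0.0
    while True:
        clock += rng.expovariate(1.0 / mtbf_hours)  # time to next failure
        if clock >= HOURS_PER_YEAR:
            break
        repair = min(repair_hours, HOURS_PER_YEAR - clock)
        downtime += repair
        cost += cost_per_failure
        clock += repair
    return downtime, cost


def run_scenario(mtbf_hours, repair_hours, cost_per_failure, runs=2000, seed=1):
    """Average unavailability and yearly cost over many simulated years."""
    rng = random.Random(seed)  # seeded so results are repeatable
    total_down = 0.0
    total_cost = 0.0
    for _ in range(runs):
        down, cost = simulate_year(mtbf_hours, repair_hours, cost_per_failure, rng)
        total_down += down
        total_cost += cost
    return total_down / (runs * HOURS_PER_YEAR), total_cost / runs


# Hypothetical scenarios: current configuration versus an upgraded component.
baseline = run_scenario(mtbf_hours=1000, repair_hours=24, cost_per_failure=5000)
upgraded = run_scenario(mtbf_hours=2500, repair_hours=24, cost_per_failure=5000)

print(f"Baseline: unavailability={baseline[0]:.2%}, cost={baseline[1]:,.0f}")
print(f"Upgraded: unavailability={upgraded[0]:.2%}, cost={upgraded[1]:,.0f}")
```

Running many simulated years for each scenario is what lets the RE compare configurations on expected cost and unavailability rather than on a single outcome.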
Prescriptive Maintenance
Prescriptive maintenance works with cloud technology by detecting asset degradation before functional failure, and prescribing corrective action options to mitigate the problem. Multiple scenarios are run, possible outcomes are weighed, and then a decision is made for the operations and maintenance of the systems. This approach has the ability to significantly improve the effectiveness of the maintenance organization as well as minimize maintenance spending. The RE should have a fundamental understanding of the technology and apply it where benefits outweigh the costs.
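The weighing of scenarios can be illustrated with a simple expected-cost comparison. In the sketch below each option carries an action cost, an assumed probability of functional failure after the action, and a failure cost; the option with the lowest expected cost is recommended. All figures, option names and the `recommend` helper are hypothetical, and a real prescriptive system would draw its probabilities from failure models and live condition data.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    action_cost: float          # cost of carrying out the option
    failure_probability: float  # assumed chance of functional failure afterwards
    failure_cost: float         # downtime and repair cost if failure occurs

    @property
    def expected_cost(self) -> float:
        return self.action_cost + self.failure_probability * self.failure_cost


def recommend(options):
    """Return the option with the lowest expected cost."""
    return min(options, key=lambda o: o.expected_cost)


# Hypothetical options for a degrading bearing.
options = [
    Option("Run to failure", 0, 0.60, 80_000),
    Option("Replace bearing at next planned stop", 6_000, 0.10, 80_000),
    Option("Immediate shutdown and overhaul", 25_000, 0.01, 80_000),
]

for opt in options:
    print(f"{opt.name}: expected cost {opt.expected_cost:,.0f}")
print("Recommended:", recommend(options).name)
```

With these made-up numbers, replacing the bearing at the next planned stop has the lowest expected cost, so it is the option the sketch recommends.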
Augmented Reality
A significant amount of the RE’s time is spent tracking down equipment history, drawings and other key information while troubleshooting equipment in distress and during root cause failure analysis. Augmented reality complements online condition monitoring by providing the reliability professional with on-the-spot visualization and resolution of maintenance problems in their infancy. The troubleshooter uses smart “visual displays” that guide them to the asset in distress and overlay key information about the equipment (e.g. O&M manuals, schematics, maintenance history, cloud data and advanced analytics) to support the work. This technology will minimize diagnostic time and improve repair quality, thereby positively impacting plant reliability.
Drones and Unmanned Vehicle Technology
Some failure modes remain unmitigated because of difficulty accessing the affected areas. Unmanned vehicles are being used by maintenance organizations to conduct inspections on infrastructure and other facility assets in hard-to-reach areas. REs should consider using drones and reality modeling to enhance productivity and safety in asset inspections.
New technologies are developing rapidly and many of them have the potential to help manufacturers operate more safely and more profitably. The Reliability Engineer who can harness new technological approaches effectively will remain an influential and highly valued member of the engineering team.
Michael Blanchard is a Reliability Engineering Subject Matter Expert with Life Cycle Engineering (LCE). Mike is a licensed Professional Engineer, a Certified Reliability Engineer, and a Certified Lean-Six Sigma Master Black Belt. A leader dedicated to helping teams achieve their goals, Mike maintains a strong focus on sustaining gains and delivering value. You can reach him at mblanchard@LCE.com.