Planning to Fail: Necessary for Success
By Benjamin Moses, Technical Director at AMT – The Association For Manufacturing Technology (owner of the International Manufacturing Technology Show).
Failure modes and effects analysis (FMEA) is a tool for preventing errors that lead to defects, overproduction and overprocessing. Common variants cover design (DFMEA), process (PFMEA) and function, but the core principles are the same: identify failures and implement countermeasures. Documenting an FMEA is straightforward, and a few best practices can streamline the process. General business practices can benefit from this tool as well.
These are steps for creating an FMEA:
- List the step, process or product.
- Brainstorm potential failures for each step, process, or product.
- Identify the potential consequences, or effects, of each failure.
- Rate the severity of the consequence.
- Identify the cause of the effects.
- Rate the likelihood of the consequence occurring.
- Rate the ability to detect failure modes.
- Determine the risk priority number (RPN). This score is created by multiplying the three ratings (severity, likelihood and detection) together. The failure with the highest RPN carries the highest risk, so the RPN ranking sets the order in which tasks should be tackled to reduce the overall risk.
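The steps above can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the `FailureMode` class, the `prioritize` helper and the sample machining entries are all hypothetical, and the sketch assumes the common convention that a higher detection rating means the failure is harder to detect.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA worksheet."""
    step: str
    failure: str
    severity: int    # assumed scale: 1 (minor) to 10 (severe)
    likelihood: int  # assumed scale: 1 (rare) to 10 (frequent)
    detection: int   # assumed scale: 1 (easy to detect) to 10 (hard to detect)

    @property
    def rpn(self) -> int:
        # Risk priority number: the product of the three ratings.
        return self.severity * self.likelihood * self.detection


def prioritize(modes):
    """Return the failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Illustrative entries only; the ratings are made up for the example.
modes = [
    FailureMode("Drill hole", "Hole off center", 5, 5, 1),
    FailureMode("Deburr edge", "Burr left on part", 1, 10, 5),
    FailureMode("Final inspection", "Cracked part shipped", 10, 1, 10),
]

ranked = prioritize(modes)
for m in ranked:
    print(f"{m.rpn:>4}  {m.step}: {m.failure}")
```

Sorting by RPN puts the countermeasure work in priority order, so the team starts with the failure at the top of the list.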
While the instructions are straightforward, there are best practices that make the process easier to use. The first is coarse scoring. It is common to use every integer from 1 to 10 when scoring, but that level of resolution tends to cause confusion and frustration. Restricting scores to 1, 5 or 10 improves usability.
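As a rough illustration of the 1-5-10 scale, the sketch below maps a rating on a full 1 to 10 scale to the nearest coarse value and lists every RPN the coarse scale can produce. The names `COARSE_SCALE` and `snap_to_coarse` are hypothetical, and rounding a tie upward (a 3 becomes a 5) is an assumption made here to err on the side of caution.

```python
from itertools import product

# The three values allowed under the coarse scoring scheme.
COARSE_SCALE = (1, 5, 10)


def snap_to_coarse(score: int) -> int:
    """Map a 1-10 rating to the nearest value on the 1-5-10 scale.

    Ties are resolved toward the higher value (an assumption of this sketch).
    """
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    return min(COARSE_SCALE, key=lambda level: (abs(level - score), -level))


# Every RPN that three coarse ratings can produce, in ascending order.
coarse_rpns = sorted({s * l * d for s, l, d in product(COARSE_SCALE, repeat=3)})
print(coarse_rpns)
```

With only three choices per rating, the team debates broad categories rather than whether a failure deserves a 6 or a 7.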
The first iteration will take the longest. Completing it successfully depends on understanding the dynamics of the cross-functional team. The most efficient teams balance individual and group sessions. The FMEA is then revisited throughout the life of the process or product.
As countermeasures are implemented, the process will change, and these changes should be reflected in the FMEA. Resource allocation for this work is generally low considering the impact. The initial iteration may take anywhere from a few dozen to 100 man-hours, depending on the size of the system. The process and its vocabulary carry over directly to daily business practices.
Jeff Traver, AMT Vice President - Asset Management and Operations, provided the following case:
Recently, AMT relocated its headquarters from its existing building to a temporary space to facilitate the redevelopment of its property in McLean, Va. The main objective of the move was to transfer operations in a manner that would not impact AMT's members or working partners. Therefore, the move was planned to begin on a Friday afternoon and end on Saturday, with Sunday as a contingency. There was a specific focus on AMT's IT operations, which need to operate 24/7. For the IT transfer to take place "invisibly," careful project planning, as well as contingency planning, was required.
Contingency planning assumes a "failure event," which is why the concepts of FMEA are easily applicable. The first step for the team was to identify the inventory of all of the IT systems that touch any outside transaction, individual, or system. Once these were identified, it became necessary to understand any combination of internal systems that are required for any of those "touches" to occur. This allowed AMT to create a list of "foundational IT systems" that would have to remain operational throughout the move.
The team treated each resource/device as an individual "patient on life support." A list of possible failures was created for each "patient." Next, a pro forma plan of intervention was developed for each of the statistically significant failure modes. An extra step also was taken to create a potential list of failure modes for the interventions.
The analysis showed that the critical element was time. The bottom line was that the team needed to allocate four days to the IT move in order to maintain a seamless transition. The original plan of moving over the weekend therefore wouldn't be adequate.
To maintain 24/7 IT operations while cutting all the connections and physically moving the servers, the team created a parallel redundant system. AMT's IT team worked with a backup supplier to "virtualize" AMT's servers at a remote location. The team was then able to switch over from the AMT servers to the "virtual" servers on the Thursday prior to the actual physical move. This gave them the four days needed to account for any disruptions to the plan. As fate would have it, that extra time was necessary. The revised plan called for the physical move of the servers to take place on Thursday night. Due to a logistical conflict, the servers couldn't be moved until Friday. Then, it took an additional day to bring operations back to normal.
All in all, making four days available for the move became necessary. Without the up-front analysis and contingencies to foresee possible failures, the essential time might not have been allocated, and the results would have come up short.
FMEA is a broad tool to help mitigate risks and prevent waste. While it is heavily used in manufacturing and design, it has other strong business applications. Here are two tools to aid in your FMEA journey: