What is FMEA?
FMEA (Failure Mode and Effects Analysis) is a structured approach used to identify potential failure modes in a system, process, or product, and to evaluate the potential impact of those failures.
The goal of performing an FMEA is to proactively identify and address potential failure modes before they occur, to minimize risks, and improve the overall reliability and safety of the system, process, or product. FMEA is widely used in various industries, including medical, aerospace, automotive, and consumer electronics, among others.
When is FMEA used?
FMEAs are often used when:
- Developing a new product design or process
- Modifying an existing product design or process
- Complying with company policy
- Mandated by a customer
- Required by regulations or standards
- Needed for other quality and reliability tools
FMEAs are most beneficial when they are initiated early in the development of a new product or process, when changes can still be made easily and at a low cost. FMEAs are intended to be maintained throughout the life of the product/process, as new data becomes available and opportunities for improvement are identified.
Types of FMEAs
There are several different types of FMEAs. The FMEA types differ in their focus, scope, and timing but are grounded in the same principles and fundamentals. By using multiple types of FMEA throughout the design and production process, it is possible to identify and address potential issues at various levels of the system, ensuring that the final product is reliable and safe. Below we list and describe six types of FMEAs commonly used.
Figure 1: 6 Types of FMEAs
1. System FMEA
- System FMEA is the highest-level analysis of an entire system. It focuses on system-related deficiencies, including interfaces and interactions between subsystems or with other systems, and evaluates their potential failure modes and effects. In System FMEA, the goal is to ensure the product or system will accomplish its intended functions in a safe and reliable manner and to ensure the overall risk of the system is low.
- System FMEA should be initiated during the earliest conceptual phases of design. Typically this is not later than the start of the architecture and technology feasibility phase 2A when there may be opportunities to influence system-level decisions.
2. Design FMEA
- Design FMEA focuses on individual components or subsystems within the system and evaluates their potential failure modes and effects. In Design FMEA, the goal is to improve the design to ensure the product operation is safe and reliable during its useful life. Design FMEA usually assumes the product will be manufactured according to specifications.
- Design FMEA should be initiated as soon as the design architecture/concept is established. Typically, this is not later than the start of the detailed design phase 2B, when there may be opportunities to influence component and subsystem level design.
3. Process FMEA
- Process FMEA focuses on the manufacturing or assembly process and evaluates each step in the process or assembly to identify potential failure modes and their effects. In Process FMEA, the goal is to improve the manufacturing process to ensure that a product can be reliably built to design requirements in a safe manner, with minimal downtime, scrap, and rework.
- Process FMEA should be initiated as soon as the process concept is established. Typically, this is not later than the start of detailed design phase 2C prior to design transfer when there may be opportunities to influence the process and process control plan for production.
4. Software FMEA
- Software FMEA is a type of Design FMEA that analyzes the software elements, focusing on potential software-related deficiencies, with emphasis on improving the software design and ensuring product operation is safe and reliable during useful life.
- Software FMEA should be initiated as soon as the initial software design architecture is established. Typically, this is not later than the start of detailed design phase 2B, when there may be opportunities to influence the software design.
5. Use FMEA
- Use FMEA focuses on the user and how they will interface with the product, considering human factors, ergonomics, and user interface design. It is used to evaluate the potential safety and usability risks associated with a product or system, and to develop strategies to mitigate or eliminate those risks. Sometimes a Use FMEA will be included in the scope of a System FMEA.
- Use FMEA should be initiated as soon as the initial design concept is established. Typically, this is no later than the start of the detailed design phase 2B when there may be opportunities to influence the design for use.
6. Service FMEA
- Service FMEA focuses on the serviceability of the product, identifying and preventing failures that might occur due to improper installation, operation, maintenance, or repair.
- Service FMEAs should be initiated as soon as planning for serviceability begins. Typically, this would begin no later than the detailed design phase 2B when there may be opportunities to influence the design for service.
Figure 2: Product Development Workflow with typical starting points for FMEAs
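The typical starting points described above can also be kept as a small lookup table. The Python sketch below is only an illustration: the phase labels come from the descriptions above, while the phase ordering (2A, then 2B, then 2C) and the function name fmeas_due_by are our own assumptions.

```python
# Latest typical start phase for each FMEA type, taken from the descriptions above.
LATEST_START_PHASE = {
    "System": "2A",
    "Design": "2B",
    "Process": "2C",
    "Software": "2B",
    "Use": "2B",
    "Service": "2B",
}

# Assumption: the development phases run in this order.
PHASE_ORDER = ["2A", "2B", "2C"]


def fmeas_due_by(phase):
    """Return the FMEA types that should typically be started by the start of `phase`."""
    cutoff = PHASE_ORDER.index(phase)
    return sorted(
        name
        for name, start in LATEST_START_PHASE.items()
        if PHASE_ORDER.index(start) <= cutoff
    )


print(fmeas_due_by("2B"))
```

A table like this is mainly useful as a project checklist; the actual timing should follow your own development workflow.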
Why should FMEA be used in product development?
- Provides a methodical process for identifying design and process failure modes
- Is an effective tool to aid in performing a product risk analysis
- Improves the quality, reliability, and safety of the product or process
- Reduces development time and cost when applied early in the design process
- Helps to identify critical-to-quality characteristics
- Reduces costs associated with rework and warranty claims
- Documents and tracks risk reduction activities
- Ultimately increases customer satisfaction
How do you Perform an FMEA?
Performing an FMEA can be summarized into 5 key stages as shown in Figure 3 below. FMEAs are documented in a worksheet similar to the generic one shown in Figure 4 below.
Figure 3: 5 Key Stages in Performing an FMEA
Figure 4: Generic FMEA Worksheet. Column headers are described in FMEA Worksheet Definitions at the end of the blog
1. Planning and Preparation
a. Determine Type and Scope of the FMEA
FMEAs can take a significant amount of time and can be costly. When determining if an FMEA is needed, it is important to consider the areas of greatest potential risk, areas of a design or process that may present new technological challenges, potential safety concerns, important regulations, or mission-critical applications. New product development projects often start with a System FMEA, which may help to identify risky subsystems or components that require a follow-on Design FMEA. The exact scope of the FMEA should be determined by a cross-functional FMEA team.
b. Assemble a cross-functional team and assign a facilitator
Assembling the appropriate cross-functional team is an essential step in the process of conducting an FMEA. An FMEA requires subject-matter experts from a variety of disciplines to ensure all necessary perspectives have been included in the analysis. Well-defined teams can provide a synergistic environment that can discover things that individuals alone may not see.
When performing a System or Design FMEA, a typical core cross-functional team includes representatives from systems engineering, design engineering (electrical, mechanical, software), manufacturing, test engineering, and quality or reliability. Additional representatives and disciplines may be included on an as-needed basis. Each type of FMEA should have a core cross-functional team that ensures the applicable subject-matter expertise is represented.
FMEA teams work best when there is a facilitator who is knowledgeable in the process steps of an FMEA and can lead the team through the process. The facilitator can prepare and manage the updates to the documentation and schedule the meetings. Often, Quality or Reliability takes the role of the facilitator, but this can be anyone on the team.
c. Training, Establish Ground Rules, and Assumptions
Before diving into the FMEA with the team, make sure to provide refresher training on the process steps, clarify the ground rules for how the team will work together and any assumptions (e.g., the Design FMEA assumes the product will be manufactured within specifications, but may include any design deficiencies that could result in unacceptable variation in the manufacturing process).
d. Gather Relevant Data
It is important to adequately prepare for the start of an FMEA by pulling together relevant data that will help you identify the items for analysis, potential failure modes, current controls, etc. This data will vary depending upon the type of FMEA you are performing.
Some examples include:
– Bill of Materials
– Warranty, recalls, and other field history for comparable products
– Engineering or process requirements
– Drawings, schematics, or other design specifications
– Regulations
– System Block Diagram
– Process Flow Chart
– Actual parts (or similar design intent)
– Past FMEAs
– Operator Instructions
– User Instructions
– Quality Performance Data
e. List each item and its function, and unique ID
Using the information gathered from above, the team can begin the FMEA by listing each item (design component, subsystem, or process step that is being analyzed) and its function (what it is intended to do or main purpose). When possible, it is best to identify the function relative to a given standard of performance or requirement. Note: some people may choose to break out requirements into a separate column. Assign a unique ID to each row of the analysis for future traceability. See Design FMEA example in Figure 5 below.
Figure 5: Preparation Stage from sample Design FMEA
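If you keep the worksheet in a script rather than a spreadsheet, the preparation stage could be sketched in Python as shown here. This is a minimal illustration, not a prescribed format: the class name, the "DFMEA" ID prefix, and the sample items and functions are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class PreparationRow:
    """One worksheet row after the preparation stage."""
    row_id: str    # unique ID for traceability
    item: str      # component, subsystem, or process step being analyzed
    function: str  # what the item is intended to do


def build_rows(items, prefix="DFMEA"):
    """Assign a sequential unique ID to each (item, function) pair."""
    counter = count(1)
    return [
        PreparationRow(f"{prefix}-{next(counter):03d}", item, function)
        for item, function in items
    ]


# Hypothetical items and functions, for illustration only.
rows = build_rows([
    ("AC adapter", "Convert mains power to the DC voltage required by the device"),
    ("Enclosure", "Protect internal electronics from impact and dust"),
])
for row in rows:
    print(row.row_id, "|", row.item, "|", row.function)
```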
2. Failure Analysis
The Failure Analysis stage of the FMEA includes identification of all potential failure modes, causes, effects, and controls. See example Design FMEA in Figure 6 below.
Figure 6: Failure Analysis Stage from sample Design FMEA
a. Brainstorm Potential Failure Modes
Brainstorm all potential failure modes (ways the item can fail) using the data gathered above to trigger your thoughts. You may find it helpful to use a stack of post-it notes and a whiteboard when brainstorming to allow a quick flow of ideas. Continue to ask yourself how the item might fail to deliver its intended function. If there is more than one potential failure mode, add a new row and assign a new ID for each new failure mode.
b. List all Potential Effects for each Failure Mode
An effect is the impact or consequence of the failure on the system, the end user, property, or the environment. There can be more than one effect for each failure mode; however, in most cases the most severe of the end effects is used for the analysis.
c. List all Potential Causes for each Failure Mode
Here we want to drill down to the root cause or specific cause of the failure. This often requires the team to keep asking why, until you reach the root cause. You may also be aware of previous design or process failures with this type of design or material. List all potential causes for each failure mode. Often there may be more than one cause per failure mode. Add a new row and assign a new ID for each potential cause.
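The "new row and new ID" rule for failure modes and causes can be sketched as a simple expansion: one worksheet row per failure mode and cause pair. The Python below is only an illustration; the function name, the ID format, and the sample failure modes and causes are hypothetical.

```python
def expand_rows(item, failure_modes, prefix="DFMEA"):
    """Create one row per (failure mode, cause) pair, each with its own ID.

    failure_modes maps each failure mode to a list of its potential causes.
    """
    rows = []
    for mode, causes in failure_modes.items():
        for cause in causes:
            rows.append({
                "id": f"{prefix}-{len(rows) + 1:03d}",
                "item": item,
                "failure_mode": mode,
                "cause": cause,
            })
    return rows


# Hypothetical brainstorm output for a single item.
brainstorm = {
    "No output voltage": ["Open solder joint", "Blown input fuse"],
    "Output voltage too high": ["Regulator feedback resistor out of tolerance"],
}
for row in expand_rows("AC adapter", brainstorm):
    print(row["id"], "|", row["failure_mode"], "|", row["cause"])
```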
d. List Current Controls
Current controls are the methods or actions currently planned or already in place to reduce or eliminate the risk associated with each potential cause. Current controls can be broken down into Preventive and Detection.
Preventive Type Controls are the methods or actions currently planned or already in place that describe how a failure mode, effect, or cause is prevented. Prevention type controls are intended to reduce the likelihood that the failure will occur and are used as input to the occurrence rating.
Detection Type Controls are the methods or actions currently planned or already in place that describe how a failure mode or cause can be detected before the product is released to production. Detection type controls are intended to improve the likelihood that the failure will be detected before it reaches the end user and are used as input to the detection rating.
3. Initial Risk Analysis
The initial risk analysis occurs prior to the implementation of any risk mitigations. An FMEA assesses risk based on the Risk Priority Number (RPN) which is a numerical rating of the risk for each potential failure mode/cause calculated from the product of Severity, Occurrence, and Detection. The steps below will take you through the process of determining the ratings for Severity, Probability of Occurrence, Probability of Detection and the calculation and prioritization of the RPN for each failure mode/cause. See example Design FMEA in Figure 7 below.
Figure 7: Initial Risk Analysis Stage – Design FMEA Example
a. Rate Severity
Severity is a rating number associated with the most serious effect for a given failure mode, based on criteria from a severity scale, similar to the example shown in Table 1: Severity Ratings.
Table 1: Severity Ratings
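Because Severity follows the most serious effect, the rating for a failure mode with several effects is simply the highest of the individual effect ratings. A minimal Python sketch, assuming a scale on which a higher number means a more severe effect; the effects and their ratings below are hypothetical and not taken from Table 1.

```python
def rate_severity(effect_ratings):
    """Return the severity for a failure mode: the rating of its most serious effect.

    effect_ratings maps each potential effect to its rating on the severity scale.
    Assumption: a higher rating means a more severe effect.
    """
    if not effect_ratings:
        raise ValueError("at least one effect is needed to rate severity")
    return max(effect_ratings.values())


# Hypothetical effects and ratings for one failure mode.
effects = {
    "Device does not power on": 6,
    "User exposed to electrical shock": 10,
}
print(rate_severity(effects))
```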
b. Rate Probability of Occurrence
Occurrence is a rating number associated with the likelihood that the failure mode and its associated cause will occur for the item being analyzed, based on criteria from a probability of occurrence scale similar to the example shown in Table 2 below. In the example Design FMEA in Figure 7 above, the Current Controls (Prevention Type) are intended to reduce the likelihood of occurrence. Without these current controls, the initial rating of occurrence would likely be higher.
Table 2: Occurrence Ratings
c. Rate Probability of Detection
Detection is a rating number associated with the best control from the list of detection-type controls based on criteria from a detection scale, similar to the example shown in Table 3 below. In the example Design FMEA in Figure 7 above, the Current Controls (Detection Type) are intended to improve the likelihood of detection. However, the Detection Type control listed is a design analysis without good correlation, which has a poor chance of detecting failure modes.
Table 3: Detection Ratings
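Since Detection follows the best available detection-type control, the rating can be sketched as the most favorable rating among the listed controls. The Python below assumes the common convention that a lower detection rating means a better chance of detection, and that a cause with no detection control gets the worst rating (10 here). Both are assumptions, as are the sample controls and ratings; check them against your own detection scale.

```python
def rate_detection(control_ratings, no_control_rating=10):
    """Return the detection rating for a failure mode/cause.

    control_ratings maps each detection-type control to its rating.
    Assumption: a lower rating means the control is more likely to detect the failure,
    so the best control is the one with the lowest rating.
    """
    if not control_ratings:
        # Assumption: with no detection control, use the worst rating on the scale.
        return no_control_rating
    return min(control_ratings.values())


# Hypothetical controls and ratings.
controls = {
    "Design analysis without good correlation": 8,
    "Design verification test": 4,
}
print(rate_detection(controls))
```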
d. Calculate Risk Priority Number (RPN)
Calculate the Risk Priority Number (RPN) for each potential failure mode/cause by multiplying Severity, Occurrence, and Detection. Most FMEA worksheets have the RPN set up as an automatic calculation.
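For teams that script the worksheet rather than rely on a spreadsheet formula, the calculation is a single multiplication. The Python sketch below adds a range check that assumes each rating runs from 1 to 10; that scale, the function name, and the sample ratings are assumptions for illustration.

```python
def risk_priority_number(severity, occurrence, detection, scale_max=10):
    """Return RPN = Severity x Occurrence x Detection.

    Assumption: each rating is an integer from 1 to scale_max.
    """
    ratings = (("severity", severity), ("occurrence", occurrence), ("detection", detection))
    for name, value in ratings:
        if not 1 <= value <= scale_max:
            raise ValueError(f"{name} rating {value} is outside 1..{scale_max}")
    return severity * occurrence * detection


# Hypothetical ratings: 8 x 3 x 5
print(risk_priority_number(8, 3, 5))  # prints 120
```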
The RPN can now be used to rank failure modes from highest to lowest risk so that actions can be prioritized, starting with the highest RPN value. It is also very common to establish criteria/thresholds for determining which failure modes require action first and which can be addressed later or accepted with no further action. See Table 4 below for an example of criteria established for determining action needed. In the example Design FMEA in Figure 7 above, we’ve calculated an RPN = 40. Using the criteria established in Table 4, action is required, and it is high priority.
Table 4: Criteria for Determining Actions Needed
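Ranking and thresholding can be sketched in a few lines of Python. The cut-off values passed in below (act_now and act_later) are placeholders that the team would supply; they are not the criteria from Table 4, and the row IDs and RPN values are hypothetical.

```python
def prioritize(rows, act_now, act_later):
    """Rank rows by RPN (highest first) and label each with an action category.

    rows: list of (row_id, rpn) tuples.
    act_now, act_later: RPN cut-offs chosen by the team (hypothetical here).
    """
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    labelled = []
    for row_id, rpn in ranked:
        if rpn >= act_now:
            action = "action required now"
        elif rpn >= act_later:
            action = "address later"
        else:
            action = "accept, no further action"
        labelled.append((row_id, rpn, action))
    return labelled


# Hypothetical worksheet rows and cut-offs.
worksheet = [("DFMEA-001", 24), ("DFMEA-002", 160), ("DFMEA-003", 72)]
for entry in prioritize(worksheet, act_now=100, act_later=50):
    print(entry)
```

Many teams also apply criteria based on Severity alone, so an RPN cut-off like this should be treated as one possible input rather than the whole decision rule.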
4. Risk Mitigation
When it has been determined that actions are recommended, the team must investigate and identify additional controls to reduce the likelihood of occurrence of the failure mode or improve the likelihood of detecting it. In the Design FMEA example shown in Figure 8 below, the recommended action was added to test and certify the AC adapter to a well-recognized international safety standard. The responsibility and target completion have been documented to ensure ownership and timeliness.
Figure 8: Risk Mitigation Stage – Design FMEA Example
5. Reassess Risk
After the implementation of the recommended actions, document the actions taken to reduce the RPN to an acceptable level. Include a reference to the objective evidence (e.g., a released design or process document, report, engineering change order, etc.) for traceability. Reassess the Severity, Occurrence and Detection ratings based on the actions taken. In the Design FMEA example shown in Figure 9 below, there was no change in Severity based on the actions, which was no surprise, but there was a slight reduction in the probability of occurrence and a significant improvement in the probability of detection of a failure mode.
Figure 9: Reassess Risk Stage – Design FMEA Example
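The before-and-after comparison can be sketched as follows. The ratings in this Python example are hypothetical and are not the values from Figure 9; they simply mirror the pattern described above (Severity unchanged, Occurrence slightly lower, Detection much better).

```python
def reassess(initial, revised):
    """Compare the RPN before and after the recommended actions.

    initial, revised: (severity, occurrence, detection) tuples.
    """
    s0, o0, d0 = initial
    s1, o1, d1 = revised
    initial_rpn = s0 * o0 * d0
    revised_rpn = s1 * o1 * d1
    reduction_pct = 100.0 * (initial_rpn - revised_rpn) / initial_rpn
    return {
        "initial_rpn": initial_rpn,
        "revised_rpn": revised_rpn,
        "reduction_pct": round(reduction_pct, 1),
    }


# Hypothetical ratings before and after mitigation.
print(reassess(initial=(8, 3, 5), revised=(8, 2, 2)))
```

If the revised RPN still falls above the team's action criteria, the mitigation and reassessment steps are repeated.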
Conclusion
Failure Mode and Effects Analysis (FMEA) is an effective risk management tool for identifying and addressing potential failures, improving product or process reliability and safety, and reducing development time and costs associated with design, rework, and warranty claims.
Multiple types of FMEAs can be used throughout the product lifecycle, from design and development to production and maintenance. Following a systematic process for conducting FMEAs can be time intensive, but when applied early and strategically, it will end up saving both time and money in the development of a safe and reliable product.
FMEA Worksheet Definitions
- ID: Unique identifier for traceability.
- Item: The design component, subsystem, or process step that is being analyzed.
- Function: What the item is intended to do (e.g., its main purpose). When possible, it is best to identify the function relative to a given standard of performance or requirement. Note: some people may choose to break out requirements into a separate column.
- Potential Failure Mode(s): The way in which the item may fail to deliver the intended function and associated requirements.
- Potential Failure Effect(s): The potential impact of the failure on the system, the next step, or the end user, property, or environment.
- Severity: A rating number associated with the most serious effect for a given failure mode, based on criteria from a severity scale, like the example shown in Table 1.
- Potential Causes: The root cause(s) or specific reason(s) for the failure.
- Occurrence: A rating number associated with the likelihood that the failure mode and its associated cause will occur for the item being analyzed.
- Current Controls (Prevention): The methods or actions currently planned or already in place that describe how a failure mode, effect, or cause is prevented. Prevention type controls are intended to reduce the likelihood that the failure will occur and are used as input to the occurrence rating.
- Current Controls (Detection): The methods or actions currently planned or already in place that describe how a failure mode or cause can be detected before the product is released to production. Detection type controls are intended to improve the likelihood that the failure will be detected before it reaches the end user and are used as input to the detection rating.
- Detection: A rating number associated with the best control from the list of detection-type controls based on criteria from a detection scale, like the example shown in Table 3.
- Risk Priority Number (RPN): A numerical rating of the risk for each potential failure mode/cause calculated from the product of Severity, Occurrence, and Detection. The RPN is commonly used to rank failure modes from highest to lowest risk and then focus on recommended actions for those with the highest RPN value. It is also quite common to establish criteria/thresholds for determining which failure modes require immediate action and which can be addressed later or accepted with no further action. See Table 4 for an example of criteria for determining actions needed.
- Recommended Actions: Actions for reducing the Severity, Occurrence, and Detection of a failure mode and its associated cause.
- Responsibility and Target Completion Date: Responsibility is the person responsible for implementing the recommended action. The Target Completion Date is the target date for implementation, whether it is in the design or the process.
- Actions Taken: The actions taken to reduce the RPN to an acceptable level. These should include a reference to the objective evidence (e.g., a released design or process document, report, engineering change order, etc.).