A Primer on Functional Safety in Auto

October 3, 2024

Deep thoughts - if two self-driving cars get into an accident… who would be at fault? The automotive industry’s case for functional safety.

Hopefully this is a somewhat tongue-in-cheek way to grab your attention. To be clear, self-driving cars aren’t the primary driver of automotive functional safety (FuSa). Automotive functional safety has been an area of focus since 2011, when ISO 26262, the standard that defines the Automotive Safety Integrity Level (ASIL), was first published.

However, the growing awareness of cars with increased levels of autonomy is indeed driving an even greater focus on FuSa. Understandably so, as increasing levels of autonomy also lead to significant increases in the number of semiconductors employed in the vehicle. In turn, these semiconductors have greater levels of physical control of the vehicle. The increase in semiconductors also drives explosive growth in overall system complexity.

As an illustration of the complexity associated with today’s car, a high-end vehicle contains over 300 million lines of software. This is expected to grow to 1 billion by the end of this decade.  With these levels of complexity, there is a good reason why there’s such a keen focus on FuSa.

Functional safety, as it’s defined, is the discipline of implementing safety measures to prevent or reduce the risk of harm caused by a vehicle’s electrical or electronic systems failing or behaving unexpectedly.

There are a lot of key components to unpack from what appears to be a fairly innocuous statement.

  • Implementing safety measures to prevent or reduce risk: Safety measures typically translate into additional hardware overhead that needs to be added when a semiconductor or system is being designed.  This overhead needs to accomplish several key things:
    • Know what “normal operation” looks like & detect when a system is not acting “normally”
    • Once abnormal behavior has been detected, either
      • Provide a corrective action that restores the failed system to “normal operation”
        or
      • Provide a flag or warning from the failed system so that the vehicle can take the appropriate action in response to the departure from “normal operation.” Depending on the severity of the failure, the right response could range from “take no action” (one pixel on a display is not working) to crippling the vehicle and pulling it over to the side of the road (something affecting vehicle safety has failed).
  • Risk of harm caused by a vehicle’s electrical or electronic systems: This risk is classified by ASIL, which is expressed with the letters A, B, C, and D, where each progressive letter brings increasing levels of scrutiny. The elements typically considered in establishing the required safety level (A vs. D, for example) for a given component (e.g. Antilock Braking System, Air Bag, Lane Keeping, Headlight, etc.) include the following:
    • Hazard analysis
      • What could happen to the vehicle in the case of a failure?
    • Risk analysis
      • How likely is a failure to happen?
      • How severe will the impact be if a failure happens?

The ASIL is then determined by:

  • The failure rate: how likely is the failure to occur?
  • The system’s level of tolerance to undetected failures

While one might argue that undetected failures should never occur, higher levels of ASIL typically come with significantly higher costs due to the additional hardware overhead required to detect failures. So if the inherent failure rate of a component is very low and the impact of an undetected failure is quite limited, then specifying ASIL D for that component will likely result in an unprofitable product, especially in the cost-conscious automotive industry.

As an extreme example, ASIL D solutions can be based upon triple modular redundancy (TMR). TMR employs three identical copies of the same hardware, plus additional hardware that checks whether the outputs of the three copies match. When they don’t match, typically one of the three outputs differs from the other two. The checking hardware flags that there was an error but relies on the two matching outputs for the correct value. As you can begin to see, achieving ASIL D certification can get expensive.
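The voting behavior described above can be sketched in a few lines of Python. This is only an illustration of majority voting, not how any particular ASIL D device implements it; the function name `tmr_vote` and the choice to raise an exception when all three copies disagree are assumptions made for the example.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def tmr_vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, bool]:
    """Majority-vote three redundant outputs.

    Returns (voted_value, error_flag). error_flag is True when one copy
    disagreed with the other two. Hypothetical helper for illustration.
    """
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three outputs")

    # Find the most common output and how many copies produced it.
    value, count = Counter(outputs).most_common(1)[0]

    if count == 1:
        # All three copies disagree, so there is no majority to trust.
        raise RuntimeError("no majority among the three copies")

    # count == 3: all agree, no error. count == 2: one copy was outvoted.
    return value, count != 3


if __name__ == "__main__":
    print(tmr_vote([42, 42, 42]))  # all copies agree
    print(tmr_vote([42, 17, 42]))  # one copy is outvoted and flagged
```

In real hardware the voter is itself a potential point of failure, which is one more reason the overhead adds up.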

  • Systems failing or behaving unexpectedly: This is another important point that can easily be overlooked in the statement. As just alluded to, the ASILs correlate to different rates at which failures may go undetected; an ASIL does not specify the absence of failure. It is anticipated that failures will naturally occur in a system for a number of reasons.

While functional safety is not primarily focused on preventing a random failure, it is the responsibility of the system to “always” be able to recognize when it is failing or behaving unexpectedly.

FIT (Failures In Time) is the term used in FuSa to specify how many failures are acceptable over a given period of operation; here, a failure means missing an event when a component isn’t working properly. The target is specified in the ISO 26262 spec and reflected in the ASIL rating. ASIL D, with a rating of 10 FIT, implies that only 10 failures are acceptable in 1 billion hours of operation. That is equal to 10 failures in roughly 114,000 years, or 1 failure in just over 11,400 years. To the best of my knowledge, there are no cars on the road that have reached that age, yet. In other words, this specification is very stringent.
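The arithmetic behind those numbers is easy to check. The sketch below assumes a 365-day year of continuous operation (8,760 hours); the function names and the fleet figures in the usage lines are hypothetical and chosen only to show that FIT is a statistical rate over many device-hours rather than the lifetime of a single car.

```python
FIT_REFERENCE_HOURS = 1_000_000_000  # 1 FIT = 1 failure per billion hours
HOURS_PER_YEAR = 24 * 365            # assumption: 365-day year, always on


def failures_per_hour(fit: float) -> float:
    """Convert a FIT rate into failures per hour of operation."""
    return fit / FIT_REFERENCE_HOURS


def years_per_failure(fit: float) -> float:
    """Mean years of continuous operation per failure at a given FIT."""
    return FIT_REFERENCE_HOURS / fit / HOURS_PER_YEAR


def expected_failures(fit: float, units: int, hours_each: float) -> float:
    """Expected failures across a population of identical units."""
    return failures_per_hour(fit) * units * hours_each


if __name__ == "__main__":
    # 10 FIT works out to about 11,415 years per failure.
    print(round(years_per_failure(10)))
    # Hypothetical fleet: 1,000,000 units, each operated for 500 hours.
    print(expected_failures(10, units=1_000_000, hours_each=500))
```

Seen this way, even a 10 FIT rate produces a handful of expected failures once enough vehicles are on the road, which is why the rate is specified so tightly.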

It’s also key to note that I have been trying to use the word “component” carefully. While ASIL certification is specified at the semiconductor device level, it is also measured and specified at the system level, which takes into account all the devices associated with a given system or component in the vehicle. So while an ASIL D FIT rate of 10 is required at the system level, this budget of 10 is distributed across the different semiconductor devices that make up the system. The FIT rate at the device level must therefore be some fraction of that 10, which further increases the difficulty of designing a device for automotive use while still remaining profitable.
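As a rough sketch of what distributing the budget means, the Python below splits a system-level FIT target across devices in proportion to assigned weights and checks whether a set of device FIT rates fits within the budget. It assumes the simplest case, where device failure rates simply add up; the device names, weights, and function names are hypothetical.

```python
from typing import Dict


def allocate_fit_budget(system_fit: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split a system-level FIT budget across devices by relative weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: system_fit * w / total for name, w in weights.items()}


def within_budget(system_fit: float, device_fits: Dict[str, float]) -> bool:
    """Check that device FIT rates sum to no more than the system budget.

    Assumption: failure rates of the devices are additive.
    """
    return sum(device_fits.values()) <= system_fit


if __name__ == "__main__":
    # Hypothetical three-device system sharing a 10 FIT budget.
    budget = allocate_fit_budget(10, {"mcu": 5, "power_ic": 3, "sensor": 2})
    print(budget)
    print(within_budget(10, {"mcu": 6.0, "power_ic": 3.0, "sensor": 2.0}))
```

Even in this toy split, no single device gets more than half of the 10 FIT, which illustrates why device-level targets end up tighter than the system-level number.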

These insights into the stringent specifications should instill increased confidence that the electronic systems in the automobile receive very significant scrutiny, with levels of scrutiny that are directly correlated to the impact of failure of a given system.

There are many different topics that could be discussed when looking at FuSa and it’s easy to go down a rabbit hole. To put the scale and importance of FuSa into perspective, an automotive OEM typically will have their own safety department with teams of engineers focused purely on FuSa. The same level of staffing also exists for the semiconductor manufacturer who is selling components into automotive applications. Both the OEM and semiconductor companies will require a “Safety Office” staffed with a “Safety Manager” and multiple safety engineers. This is also true for the Tier 1s.

Lastly, two more topics to cover – systematic fault coverage and random fault coverage.

Systematic fault coverage evaluates the design, test, verification, documentation, and other such processes to ensure that no faults or errors have been systematically introduced through poor hygiene in the design and test of the device. Systematic fault coverage also extends to the manner in which software is developed, both firmware and overall system software. Stringent processes and methodologies are called out in the ISO 26262 specification, with levels of scrutiny dictated by the given ASIL.

The importance of addressing systematic fault coverage requirements cannot be overstated. In a highly visible case more than 10 years ago, a vehicle that suffered from a faulty “stuck accelerator” was found not to have employed best practices in the development of the software that controlled the accelerator. Realizing that poor software development practices, resulting in “spaghetti code,” were at fault, the OEM quickly settled the case, with combined financial settlements in excess of $2.5 billion.

Random fault coverage focuses on the random hardware failures which can occur unpredictably during the lifetime of a component. These failures can occur even if there have been no flaws in the development and manufacturing of the component. These failures typically are caused by cosmic neutron strikes or alpha particles from the package material. Here again, there are different ASILs that correspond to the rate at which these random failures go undetected. ASIL D specifies 10 FIT for the probabilistic metric for random hardware failures (PMHF).

ASIL D is a very difficult specification to achieve at the component level, so systems are typically designed using devices that have a lesser ASIL random fault coverage rating, e.g. ASIL B. Through ASIL decomposition, which is a structured way of adding redundancy to the system, the requisite ASIL can be achieved. Random faults ultimately will be detected via the redundancy.
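To make decomposition a little more concrete, the sketch below encodes the pairings as they are commonly summarized from ISO 26262-9, for example an ASIL D requirement split across two redundant ASIL B elements. Treat the table as an assumption to verify against the standard itself; the names are hypothetical, and the check ignores the additional requirement that the redundant elements be sufficiently independent of each other.

```python
# Commonly cited decomposition pairings (assumption: verify against
# ISO 26262-9 before relying on them). "QM" means quality managed,
# i.e. no ASIL requirement.
DECOMPOSITIONS = {
    "D": {("C", "A"), ("B", "B"), ("D", "QM")},
    "C": {("B", "A"), ("C", "QM")},
    "B": {("A", "A"), ("B", "QM")},
    "A": {("A", "QM")},
}


def is_valid_decomposition(target: str, first: str, second: str) -> bool:
    """Return True if two element ratings are a listed split of target.

    Order of the two elements does not matter. This only looks up the
    pairing; it does not assess independence between the elements.
    """
    pairs = DECOMPOSITIONS.get(target, set())
    return (first, second) in pairs or (second, first) in pairs


if __name__ == "__main__":
    print(is_valid_decomposition("D", "B", "B"))  # two ASIL B elements
    print(is_valid_decomposition("D", "A", "A"))  # not a listed pairing
```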

FuSa is a very complex topic of which I've barely scratched the surface. These rigid processes ensure that the vehicle is designed to stringent specifications that minimize the effect of a component failure. As mentioned, with an ever-increasing number of complex systems taking over control of the vehicle, the need for rigid safety processes can’t be overstated. As you have hopefully been able to see in this blog, a lot of scrutiny and rigor goes into designing a system to achieve a given ASIL, all to prevent two self-driving cars from getting into an accident.

Robert Bielby

Automotive System Architecture & Product Planning Consultant
