
Safety Engineering In The Time Of Covid-19


(This is the third in a series of articles I’m calling ‘Opening the Starsky Kimono,’ where I reveal Starsky Robotics’ key insights we previously kept top-secret. The first covers the end of Starsky and the limitations of AI and can be found here. The second (here) covers the business use case for AV trucks and the commercial irrelevance of true AI to that aim.)

A few months ago I wrote that part of the challenge of deploying an autonomous truck was that people didn’t really value, or understand, statistical safety.

How things have changed.

In the last two months everyone has developed a qualified opinion on statistical safety.  At least, that is, when it comes to public health in response to Covid-19. VCs who told me they thought the risk of unmanned trucks was too great are now tweeting that we should accept a higher death rate so as to re-open the American economy.

The statistical arguments that underpin proposed responses to Covid-19 aren’t that different from the models we used at Starsky to perform our public unmanned test.  The entire Covid-19 crisis, in fact, presents a surprising parallel to explain what safety engineering really is.

Safety is not the absence of risk, but the absence of unacceptable risk.  Just as every public health policy will lead to some number of fatal Covid-19 cases, any deployed AV will have a greater-than-zero fatality rate.  Making that system safe is a matter of understanding how, why, and when it will hurt people and ensuring that those reasons are acceptable.  

It is unacceptable to deploy a system that regularly hurts people while it’s working as expected in normal driving conditions.  On the other hand, it can be acceptable to deploy one that might hurt people while failing in rare ways in uncommon driving situations.  As long as you know the exact risks you’re taking.

Think of it this way – if you walk through a Covid-19 ward you won’t necessarily die.  To die you’d first need to be in contact with the virus, catch it, have a particularly bad case, and ultimately succumb to the illness.  If you need to walk through that ward, you can mitigate those circumstances by taking precautions while walking (6’ apart, masks, hand-washing), responding quickly to potential exposure (testing and going on early treatments), and quickly going on a full course of treatment.  While the chance of fatality is still greater than zero, it’s significantly lower.

For AVs you can break the problem down the same way.  A failed system or freak incident doesn’t necessitate a fatality.  The freak incident might happen when the AV isn’t nearby, or the system failure might occur when the AV isn’t near a person; if it is, the AV’s onboard diagnostics have the opportunity to catch the failure, and assuming they do, the system then has the opportunity to avoid an incident.

Through a decent amount of work you can figure out the statistical likelihood of each of those steps.  For some you can look at road safety data, your design team can conduct FMEAs to understand which failures pose the risk of harm and their causes, and you can do an incredible amount of real world testing — I’d estimate we drove on the same 8mi stretch of road 1500-2500 times for our unmanned run.  
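Multiplied together, those conditional steps give a per-run probability of harm. Here’s a toy sketch of that arithmetic, using entirely made-up numbers, not Starsky’s actual figures:

```python
# Hypothetical failure-tree arithmetic: every probability below is an
# illustrative placeholder, not a real measurement from Starsky's program.

# Chain of conditional events that must ALL occur for a failure to hurt someone:
p_system_failure_per_run   = 1e-3   # a relevant component fails during a run
p_person_nearby            = 1e-1   # failure happens near other road users
p_diagnostics_miss_failure = 1e-2   # onboard diagnostics fail to catch it
p_cannot_avoid_incident    = 1e-1   # system can't reach a safe stop anyway

# Because each step is conditional on the previous one, the probabilities
# multiply down the branch of the failure tree.
p_harm_per_run = (p_system_failure_per_run
                  * p_person_nearby
                  * p_diagnostics_miss_failure
                  * p_cannot_avoid_incident)

print(f"Illustrative chance of harm per run: {p_harm_per_run:.0e}")
```

The point of the structure is that each mitigation attacks one factor independently, so improving any single link (say, better diagnostics) shrinks the whole product.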

One of the surprise MVPs of the entire Unmanned program was diagnostics.  At Starsky we were able to build a highly modular and measurable system, where each node was only supposed to do very specific and measurable things.  The front normal camera, for example, was supposed to spit out an image every so many milliseconds.  If it failed for a few milliseconds we would log a failure, and if that failure continued we would go to a minimal risk condition.  The lane detection model was similarly supposed to spit out lane lines every few milliseconds, and those lane lines should look fairly similar to the previous set (give or take a few radians).  If that failed for too long we would pull over.  In the two months before our unmanned run, every safety-driver disengagement was predicted by the diagnostic system.
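A heartbeat-style diagnostic like the ones described above can be sketched in a few lines. The class, names, and thresholds here are illustrative assumptions, not Starsky’s actual code:

```python
# Minimal sketch of heartbeat diagnostics for modular nodes: each node is
# expected to emit output on a fixed cadence, and prolonged silence triggers
# a minimal risk condition (e.g. pulling over). Illustrative only.
import time

class NodeMonitor:
    """Watches one node (e.g. a camera) that must emit output regularly."""

    def __init__(self, name: str, max_stale_s: float):
        self.name = name
        self.max_stale_s = max_stale_s        # silence beyond this => unhealthy
        self.last_output = time.monotonic()   # timestamp of most recent output

    def record_output(self) -> None:
        """Call whenever the node produces output (an image, lane lines, ...)."""
        self.last_output = time.monotonic()

    def is_healthy(self) -> bool:
        return (time.monotonic() - self.last_output) <= self.max_stale_s

def should_enter_minimal_risk(monitors) -> bool:
    """Pull over if any monitored node has been silent too long."""
    return any(not m.is_healthy() for m in monitors)
```

In a real system there would be a second layer of checks on the *content* of each output (such as comparing new lane lines against the previous set), not just its timing.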

That is to say, if the safety driver hadn’t been in the vehicle we wouldn’t have crashed.  We would have come to a safe stop.

For some branches in the failure tree we didn’t like our odds.  For example – even if we successfully avoided the accident, there was a measurable likelihood that someone would rear-end our truck while it was pulled over on the side of the road.  We could, however, mitigate that risk by having a safety driver in a follow car who was rated as able to get in and start the truck in under 60 seconds, meaningfully decreasing our exposure to that risk.

Doing an unmanned run is a matter of certainty – we needed to be statistically confident that we wouldn’t need a safety driver for the test that didn’t have one.  To stretch the parallel – we needed to be incredibly sure that we knew that our precautions would make us unlikely to catch Covid-19 if we walked through that ward as the first step towards a broader re-opening.

Our simple high level metric was the number of consecutive zero-disengagement runs we had completed.  A zero-disengagement run is a run where the safety driver wasn’t needed from the beginning of the test to the very end.  

When we did our first zero-disengagement run, back in Aug ’17, it was a matter of luck.  We had been trying nonstop for three days and everything finally worked as planned.  That would have been the first time we could have taken the person out of the truck, but we would have truly been rolling the dice.  As a metric, consecutive zero-disengagement runs are useful because if you haven’t needed a safety driver for 1,000 consecutive tests, it’s highly likely that you won’t on your 1,001st test and could therefore take the safety driver out.

You can then do additional work to lower that required number of consecutive tests.  By doing an incredible amount of documentation to understand how the system worked and make sure it was safe as intended, by building rigorous diagnostics which allowed the system to know when it was failing, by controlling the conditions we drove in, and through countless smaller mitigations, we were able to model out that 80 zero-disengagement runs in a row would indicate a one-in-a-million chance of a fatal accident were we to take the driver out on the 81st test.
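For a sense of the raw statistics behind consecutive clean runs, the standard “rule of three” gives an approximate 95% upper confidence bound of 3/n on the per-run failure rate after n failure-free runs. Note that this alone only gets you to percent-level bounds; the one-in-a-million figure above came from Starsky’s full model, with the failure-tree mitigations layered on top, which this sketch doesn’t capture:

```python
# Rule-of-three sketch: after n consecutive failure-free trials, an
# approximate 95% upper confidence bound on the per-trial failure
# probability is 3/n. Standard statistics, not Starsky's actual model.

def rule_of_three_upper_bound(n_clean_runs: int) -> float:
    return 3.0 / n_clean_runs

for n in (80, 141, 1000):
    bound = rule_of_three_upper_bound(n)
    print(f"{n} clean runs -> per-run failure rate below ~{bound:.4f}")
```

The gap between a ~4% per-run bound at 80 runs and a one-in-a-million fatality estimate is exactly what the rest of the failure tree buys you: most disengagement-worthy failures still have to coincide with several other unlikely conditions before anyone gets hurt.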

On June 11th, 2019, at Starsky Robotics we completed our 80th consecutive zero-disengagement run.  On June 15th we completed our 141st.  And on the 16th, we took the person out and completed the first ever unmanned public highway test.

Which is to say, we walked through the Covid-19 ward and didn’t get infected, let alone die.  For us to healthily live full-time in the Covid-19 ward there would have been a fair amount more work.  Throughout the second half of last year we were ruggedizing our system to support full-time unmanned operations; we would have needed to drive the pre-selected routes thousands more times, and probably would have found a whole lot more diagnostics to write.

Just like wide-scale re-opening of the economy without mass Covid-19 deaths, it was possible for our approach to lead to the deployment of unmanned vehicles.  And someday, someone will do it.


