The article “Is Progress In Autonomous Technology Gated By Research In Animal Communication” explored the poorly understood “low level” perception and communication that helps humans navigate the physical world. Humans (and animals) innately predict the future physical behavior of other actors in the environment by observing factors such as speed, direction, body positioning, and even gestures.
One of the key signals is the other actor’s eyes: the point of the other actor’s focus. Simple examples that show its importance include a distracted driver on a cell phone heading straight toward you, or the stereotypical professor absent-mindedly walking toward a busy street. Both situations raise an alarm for the observer. Observed focus is used today to determine driver engagement in ADAS safety systems and to infer pedestrian intent at intersections for autonomous vehicles.
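The idea of observed focus can be illustrated with a small geometric sketch: treat the actor’s gaze as a direction vector and check whether it points at the observer. This is a toy illustration only; real ADAS gaze systems estimate head pose and eye direction from camera images, and the function name, angular threshold, and flat 2-D geometry below are assumptions made for clarity.

```python
import math

def is_focused_on(observer_pos, actor_pos, actor_gaze, threshold_deg=10.0):
    """Toy check: does the actor's gaze direction point at the observer?

    observer_pos, actor_pos: (x, y) positions on a shared ground plane.
    actor_gaze: (dx, dy) direction the actor is looking (any length).
    threshold_deg: how tightly gaze must align to count as "focus".
    """
    to_observer = (observer_pos[0] - actor_pos[0], observer_pos[1] - actor_pos[1])
    dist = math.hypot(*to_observer)
    gaze_len = math.hypot(*actor_gaze)
    if dist == 0 or gaze_len == 0:
        return False
    # Angle between the gaze vector and the actor-to-observer vector.
    cos_angle = (to_observer[0] * actor_gaze[0] +
                 to_observer[1] * actor_gaze[1]) / (dist * gaze_len)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle <= threshold_deg

# A pedestrian at (10, 0) looking along -x is looking straight at an
# observer at the origin; the same pedestrian looking along +y is not.
```

The same test in reverse (is the driver’s gaze on the road ahead?) is the essence of a driver-engagement check.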
Focus is an interesting concept. We all have an intuitive sense of it, and the analogy most often used is that of a camera: the region in focus is clear, and the regions outside it are blurred. In human perception, focus means a concentration of senses and brain cycles such that the fidelity of the objects under focus is at its most detailed. When something is not in focus, it is implicitly held in an abstract manner. That is, we know something is in that physical space, but we are not exactly sure what it is. A simple example in the act of driving: the object directly ahead of us is recognized explicitly as a Tesla, while a vehicle at the edge of our vision is held abstractly as just “a car.”
In contrast, conventional artificial intelligence systems in autonomous vehicles such as those from Waymo and Tesla work toward a divergent model. They opt for something more akin to a knowledge oracle, in which the autonomous vehicle observes its whole environment constantly and in full detail. In fact, this is a key part of the safety value proposition offered by autonomous vehicles for addressing the distracted driving problem.
The conventional methodology for autonomous vehicles is to train the artificial intelligence engines to recognize labeled objects using ever-growing databases. Of course, recognizing an object in a pixel map (or lidar point cloud, or imaging radar return) across all potential orientations is a very difficult problem. Invariably, confusion is caused by objects such as a van with a person’s image wrapped on its side, or a pedestrian walking with a bicycle (the Phoenix Uber fatality). Similar to Google image search, this “data-up” method of solving the problem has serious robustness issues because one is potentially always missing the next interesting training set.
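The fragility of the “data-up” approach can be sketched with a toy example: a recognizer restricted to a fixed label set has no way to answer “unknown,” so a novel object is forced into the closest existing label. The labels, feature vectors, and nearest-neighbor rule below are purely illustrative, not how any production AV stack works.

```python
# Toy "data-up" recognizer: hypothetical feature vectors of
# (height_m, width_m, speed_mps) for each trained label.
training_set = {
    "car":        (1.5, 1.8, 15.0),
    "pedestrian": (1.7, 0.5, 1.5),
    "bicycle":    (1.1, 0.6, 5.0),
}

def recognize(features):
    """Return the nearest trained label -- there is no 'unknown' option."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training_set, key=lambda label: sq_dist(training_set[label], features))

# A pedestrian walking a bicycle sits between two classes, yet the
# model must still answer with one of its known labels.
print(recognize((1.7, 0.8, 1.6)))  # prints "pedestrian"
```

The model’s answer looks confident, but the combined object behaves like neither class alone, which is exactly the failure mode the article describes.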
In response, AV manufacturers have pushed the training out to the physical world. Tesla, in particular, touts its fleet of nearly one million Autopilot-equipped vehicles as a mechanism for reaching closure on this training task. There are three reasons this approach may not be wise and ultimately may not work as currently constructed.
- Reactive Methodology: Fundamentally, the process is reactive in nature. That is, one records the world, builds a process to analyze the data, performs root-cause analysis, and then solves the problem. Each step has significant challenges and is expensive.
- Execution Velocity: Operating a real-world test bed has its advantages, but the real world moves very slowly. Most cars are parked most of the time (less than 5% utilization is the norm), so the number of driven miles for even a million cars is not very high.
- Sampling Bias: Perhaps the most important shortcoming is sampling bias. The Tesla road fleet is highly concentrated in specific areas, so whatever validation is done is limited to those situations. What about the next interesting situation? How does one define a clear expectation for the next customer of where the AV will work?
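The execution-velocity point above can be made with back-of-the-envelope arithmetic. Every number below (utilization, average speed, the one-in-ten-million-mile rare-event rate) is an assumption chosen for illustration, not measured fleet data.

```python
# Hypothetical fleet numbers -- for illustration only.
fleet_size = 1_000_000      # cars with the driver-assist feature
utilization = 0.05          # fraction of each day a car is driven (<5% per the text)
avg_speed_mph = 30          # assumed mixed city/highway average speed

miles_per_car_per_day = 24 * utilization * avg_speed_mph   # ~36 miles
fleet_miles_per_day = fleet_size * miles_per_car_per_day   # ~36 million miles

# If a rare failure mode occurs once per 10 million miles (assumed),
# even this huge fleet samples it only a handful of times per day.
rare_event_rate = 1 / 10_000_000                           # events per mile
rare_events_per_day = fleet_miles_per_day * rare_event_rate

print(f"{fleet_miles_per_day:,.0f} fleet miles/day, "
      f"{rare_events_per_day:.1f} rare events/day")
```

Even under these optimistic assumptions, any given one-in-ten-million-mile event is observed only a few times per day across the entire fleet, which is why closing the long tail through driving alone is slow.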
Given the robustness challenges of the current approach, it would seem that the ideas of focus and abstraction from the human world would be useful. In fact, Joel Pazhayampallil from a Silicon Valley startup called BlueSpace claims there is utility in such an approach for autonomous vehicles. Joel was a cofounder of drive.ai and was intimately involved with the GM Super Cruise project. “With our next generation technology, we are able to detect and reason about objects without the need for detailed training sets. This removes many potential points of failure in the current AV stack and creates a generalizable software solution,” said Joel.
Will this approach work? Time will tell. However, in the world of chess, the initial solutions were based on raw data and computation, but eventually the winning solutions combined human insight with computing power. One gets the feeling that AV technology is on a similar technological arc. That is, it seems reasonable that one does not need to know the ultimate details of every object to enable autonomous operation in an automobile, and that combining higher-level insights can add robustness while simultaneously lowering power and cost. After all, the human brain is able to drive while multitasking on only about 100 watts of power (roughly a conventional laptop).
For those interested in this topic, you may enjoy “Sustainability, COVID-19, Elon Musk And A Tale Of The Upper And Lower Brain” or “How Safe Is Safe For An AV? The Answer (Expectation And Communication).”