What is AV perception anyway?

I’ve been researching and thinking a lot about perception for autonomous
vehicles these days, and I’ve been doing quite a bit of educating people on
perception. Here is a long-overdue but short 101 on visual sensing for cars.

Stop now if you’re already an expert.

Autonomous vehicles logically require certain things in order to function:

  1. Knowing where the vehicle is in time and space – sometimes called
     self-localization
  2. Knowing where it wants to go and the route to get there – often called
     path finding
  3. Seeing and recognizing objects – known as object recognition
  4. Understanding and predicting how a recognized object will behave –
     perception
  5. Making a decision based on that perception, like changing the route
There are many technologies that can be involved and work together to make
vehicles autonomous. And there is, as of yet, no single best way.
Perception is particularly challenging. The vehicle’s control systems and
actuators physically make the car move forward, move backward, and turn. But
knowing when to perform which action in a particular situation requires
perception.

Google defines perception as “the ability to see, hear, or become aware of
something through the senses.” And “a way of regarding, understanding, or
interpreting something; a mental impression.”

I have taken to describing perception as the vehicle’s ability to recognize
an object, have a very good idea how the object will behave, and then make
a decision about what to do.
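
To make that three-part description concrete, here is a minimal sketch of recognize, predict, decide. Everything in it is an assumption made for illustration: the `TrackedObject` fields, the constant-velocity prediction, and the threshold values are hypothetical and far simpler than anything a real vehicle would use.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    """A recognized object, in a hypothetical vehicle-centred frame."""
    label: str  # what object recognition says it is, e.g. "cyclist"
    x: float    # metres ahead of the vehicle (positive = in front)
    y: float    # metres left of the vehicle's centreline (negative = right)
    vx: float   # forward velocity in m/s, relative to the vehicle
    vy: float   # sideways velocity in m/s


def predict(obj: TrackedObject, seconds: float) -> tuple[float, float]:
    """Guess where the object will be, assuming it keeps its velocity."""
    return obj.x + obj.vx * seconds, obj.y + obj.vy * seconds


def decide(obj: TrackedObject,
           horizon_s: float = 2.0,
           lane_half_width_m: float = 1.5,
           stop_distance_m: float = 10.0) -> str:
    """Brake if the object is predicted to be in our path, else carry on."""
    x, y = predict(obj, horizon_s)
    in_path = abs(y) <= lane_half_width_m and 0.0 <= x <= stop_distance_m
    return "brake" if in_path else "continue"


# A cyclist 8 m ahead and 4 m to the right, moving toward our lane.
cyclist = TrackedObject("cyclist", x=8.0, y=-4.0, vx=0.0, vy=1.5)
# A parked car in the same spot that is not moving at all.
parked = TrackedObject("car", x=8.0, y=-4.0, vx=0.0, vy=0.0)

print(decide(cyclist))  # brake
print(decide(parked))   # continue
```

The two objects start in the same place; only the predicted behavior differs, and that is what changes the decision.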

Vehicle perception is predominantly aided by sight, since vehicles have yet
to be able to respond accurately to smells or sounds, although understanding
sounds will clearly help. (Just yesterday, I was turning right on Alma and
looking at the traffic coming from my left, when a cyclist came up riding on
the sidewalk in the wrong direction. Luckily, he had enough sense to know
that when breaking many rules, use your voice! He shouted at me to look up
at him, which I was very grateful for. In my urge to break into traffic, I
would have taken only a cursory glance to the right before gunning it, and I
may not have responded in time if he had ridden out in front of me. Since I
had my windows rolled down, I was able to thank him for being vocal.) I can
see a future where vehicles will have sound sensors to aid in their decision
making as well, but for now, it seems researchers are focusing on making
cars see well.

Ultimately, the vehicle needs to be able to detect objects; figure out its
own position relative to those objects and know which free, open space it
can fit into (occupancy grid); track objects (they tend to move); know the
lanes on the road in which it is legally allowed to move (lane
segmentation); and know where it is in time and space (self-localization).
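
The occupancy grid idea is easy to sketch. The version below is a toy, assuming a small 2D grid of true/false cells and a rectangular vehicle footprint; the function names and grid size are made up for illustration, and real systems typically store a probability per cell rather than a yes/no.

```python
from typing import Iterable, List, Tuple

Grid = List[List[bool]]  # True means the cell is occupied


def make_grid(rows: int, cols: int) -> Grid:
    """Start with every cell free."""
    return [[False] * cols for _ in range(rows)]


def mark_occupied(grid: Grid, cells: Iterable[Tuple[int, int]]) -> None:
    """Record detected objects as occupied (row, col) cells."""
    for row, col in cells:
        grid[row][col] = True


def footprint_is_free(grid: Grid, top: int, left: int,
                      height: int, width: int) -> bool:
    """Can a height x width vehicle sit with its corner at (top, left)?"""
    rows, cols = len(grid), len(grid[0])
    if top < 0 or left < 0 or top + height > rows or left + width > cols:
        return False  # part of the vehicle would be off the grid
    return not any(
        grid[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


grid = make_grid(5, 5)
mark_occupied(grid, [(2, 2)])  # one detected obstacle in the middle

print(footprint_is_free(grid, 0, 0, 2, 2))  # True: clear corner
print(footprint_is_free(grid, 1, 1, 2, 2))  # False: overlaps the obstacle
```

The point of the grid is the last question: not "what is that object?" but "is there room for me here?"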

The core technologies helping a vehicle see and some of their advantages
and disadvantages are below:

[Figure: LiDAR chart]

Cameras and LiDAR can perform many functions, including object detection,
occupancy grid, object tracking, lane segmentation, and self-localization.
Radar can help with object detection, object tracking, and lane
segmentation. Ultrasonic sensors can detect objects and obstacles only.
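
The same breakdown can be written as a small lookup table. This sketch only encodes the sentences above; the dictionary layout and the helper names `sensors_for` and `missing_functions` are my own, not from any library.

```python
# Which perception functions each sensor type can help with,
# taken directly from the paragraph above.
CAPABILITIES = {
    "camera": {"object detection", "occupancy grid", "object tracking",
               "lane segmentation", "self-localization"},
    "lidar": {"object detection", "occupancy grid", "object tracking",
              "lane segmentation", "self-localization"},
    "radar": {"object detection", "object tracking", "lane segmentation"},
    "ultrasonic": {"object detection"},
}

ALL_FUNCTIONS = set().union(*CAPABILITIES.values())


def sensors_for(function: str) -> list[str]:
    """List the sensor types that can help with a given function."""
    return sorted(name for name, funcs in CAPABILITIES.items()
                  if function in funcs)


def missing_functions(suite: list[str]) -> list[str]:
    """List the functions that no sensor in the suite covers."""
    covered = set().union(*(CAPABILITIES[name] for name in suite))
    return sorted(ALL_FUNCTIONS - covered)


print(sensors_for("occupancy grid"))
# ['camera', 'lidar']
print(missing_functions(["radar", "ultrasonic"]))
# ['occupancy grid', 'self-localization']
```

Seen this way, a radar-plus-ultrasonic car has gaps that only a camera or LiDAR can fill, which is one reason vehicles carry suites of sensors rather than one.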

[Commercial Break] Companies like DeepScale have created software to help
sensors, such as cameras, which tend to be less expensive and more robust
than LiDAR, improve their ability to see. DeepScale’s software also helps
suites of sensors eliminate errors when sensor data is conflicting. Their
software improves hardware and aids in perception.

Object recognition, a critical aspect of perception, is done with
software. This is where the (sometimes controversial) machine learning or
deep learning comes into play. Object recognition is hard and tedious,
especially if done primarily by humans. A friend told me about a mobile
app that gamifies object detection. Instead of playing Flappy Bird while
waiting for your Starbucks mocha whipped skinny Frappuccino, you can play
this game and identify objects for the machines. Supposedly, this
goodwill crowdsourcing will help everyone – a social good and part of the
on-going edification of snowflakes.

The controversial part arises from deep learning. Some believe that deep
learning is an unknown process, and that the murky steps along the way
produce errors, as interim steps to the right answer, that are simply not
acceptable when deciding where a 3,200 lb projectile places itself in time
and space. I am not sure whether I agree or disagree. It’s like only
believing Elon Musk’s opinion about artificial intelligence. My jury is
still out.

Anyhoo, the point about perception in autonomous vehicles is that it is
hard and it is vital to do it well. It is important that the vehicle
understand the world it is operating in and make good decisions. The
autonomous vehicle will have to become aware.

Personally, I’m a believer that autonomous vehicles are not only a good
idea, but an idea whose time has come.

We cannot continue on the path we are on. Too many people are dying and
being injured, too much of our scarce resources is being consumed by
automobiles, and we cannot build enough roads or enough cars to address the
world’s mobility needs. We simply have to try something else.

I think the perception problem can be solved for vehicles in the next
several years and will continue to improve rapidly over time. I’m ready
to get into an AV and begin changing the world.