Helping Autonomous Vehicles Not Get Pulled Over By The Cops: Making a Road Sign Classifier

Rishi Mehta
6 min read · Nov 9, 2019

Being only 13 years old, I have to wait a couple of years before I can drive. My parents, on the other hand, know how to drive and do it on a daily basis. Being the curious child I am, I wanted to know what my parents feared when driving. Not getting into an accident, but the everyday things that make them constantly worried.

To my surprise, it wasn’t what I expected. They said they probably fear not obeying traffic signs the most. That was when it hit me: if being able to obey traffic signs is essential for humans when driving, then it’s essential for an autonomous vehicle too. After all, if we want to achieve level 5 autonomy, AVs have to do everything a human can do when driving.

So I got to work. First I figured out how it all worked, then I translated that into code. The most important thing to remember is that there are 3 steps a sample image must go through in order to be classified by the program: the preprocessing, the convolutional neural network, and lastly, the output.

Step 1: Preprocessing

Converting the image into greyscale

If I showed you 3 images and told you to extract information from them, it wouldn’t be that hard. But you know what would be even easier? Extracting that same amount of information from 1 image. Instead of analyzing the 3 channels of a coloured image (red, green, and blue), the computer just analyzes one channel (shades of black and white).

This makes the process far more efficient and simpler for the computer to understand.
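
Here’s roughly what that conversion looks like in code. This is just a minimal sketch using OpenCV; the file name is a placeholder, not my actual pipeline.

```python
import cv2

# Load a sample sign image (the path is just a placeholder)
img = cv2.imread("stop_sign.png")

# Collapse the 3 colour channels (B, G, R in OpenCV) into 1 channel
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

print(img.shape)   # e.g. (32, 32, 3) -> 3 channels
print(grey.shape)  # e.g. (32, 32)    -> 1 channel
```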

Example of an image being converted to greyscale

Getting Equalized

The issue with having just a greyscale image is that images usually have a small range of pixel values. This means the shades of white and black are very close together, so there is low contrast, which makes it much harder for our algorithm to actually analyze the image.

When we equalize the image, it spreads out the value distribution, creating more contrast in the image.
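
In code, equalization is a one-liner with OpenCV. Again, just a sketch, assuming the greyscale image from the previous step:

```python
import cv2

# Greyscale image from the previous step (path is a placeholder)
grey = cv2.imread("stop_sign.png", cv2.IMREAD_GRAYSCALE)

# Histogram equalization spreads the pixel values across the whole
# 0-255 range, raising the contrast of a washed-out image
equalized = cv2.equalizeHist(grey)
```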

The same image before and after the preprocessing

Now that the preprocessing has finally been completed, the computer can actually understand different parts of the image. The next step is to determine which sign our image shows, using a CNN.

Step 2: Convolutional Neural Network (CNN)

The goal of our CNN is to point out features that make up certain signs. For example, it might look for the letters STOP, or an octagonal outline, to decide that an image is a stop sign.

It’s similar to me trying to identify that a tree is a tree by subconsciously thinking, “There are branches and leaves, so it must be a tree.”

In our image below, the CNN essentially says, “Almost every feature of the sample image matches a car, but it also might be a truck or a bike. I think there is a 95% chance it is a car.” The CNN goes with whichever class has the highest chance of being correct, so it says the image is a car.
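
That “pick the most likely class” step is simpler than it sounds. Here’s a toy sketch where the classes and probabilities are made up:

```python
import numpy as np

# Made-up output probabilities from the network's final layer
classes = ["car", "truck", "bike"]
probs = np.array([0.95, 0.03, 0.02])

# The prediction is simply the class with the highest probability
print(classes[int(np.argmax(probs))])  # car
```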

An example of the CNN process

How do we get those learned features?

In the image above, we can clearly see that all the magic happens in the middle, where it says learned features. The middle is where the neural network actually identifies that this is a car, using several different parameters.

So how does it do it? The answer is through kernels and strides. Think of the kernel as a small window that looks for a feature in the image, and the stride as how far that window moves each step.

Example of kernels and strides working together to analyze an image

Each kernel checks how closely parts of the image match the feature it is looking for. There are different kernels for different features, and each kernel produces its very own feature map.

These feature maps are then stacked on top of each other. After each map has gathered data about its respective feature, they join together to give an output.
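
Putting kernels, strides, and stacked feature maps together, a CNN like mine could be sketched in Keras like this. The layer sizes and the 43-class output are assumptions (43 is the number of classes in the German Traffic Sign dataset, a common choice for this task), not my exact architecture:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # 32 kernels of size 5x5 slide over the 32x32 greyscale image,
    # producing 32 stacked feature maps
    layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1),
                  activation="relu", input_shape=(32, 32, 1)),
    layers.MaxPooling2D(pool_size=(2, 2)),
    # Deeper kernels look for bigger features built from smaller ones
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    # One probability per sign class
    layers.Dense(43, activation="softmax"),
])
```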

Overfitting

There is one issue: overfitting. This is where the model starts to memorize the images in the dataset. It might seem like a good thing at first, but the catch is that it reduces accuracy on images the model has never seen before.

My model learns through trial and error. Images in the training dataset are run through the model, and the model makes a prediction about which sign it thinks each one is.

Using a labelled dataset speeds up the process significantly, even though there are signs the dataset doesn’t account for. If the prediction is incorrect, the model adjusts its parameters, which are the weights and bias values.

As training continues the model becomes more and more accurate, but with too many trials, the model begins to memorize the training data and overfit.
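
Here’s a sketch of that training loop, with a validation split so overfitting shows up as training accuracy climbing while validation accuracy stalls. The dummy data and epoch count are placeholders, and `model` is the CNN sketched earlier:

```python
import numpy as np

# Dummy stand-ins for the real training images and one-hot labels
x_train = np.random.rand(500, 32, 32, 1)
y_train = np.eye(43)[np.random.randint(0, 43, 500)]

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# If val_accuracy stops improving while accuracy keeps rising,
# the model has started to memorize the training data
history = model.fit(x_train, y_train,
                    validation_split=0.2,
                    epochs=10)
```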

Step 3: The Prediction

The last part of the process is for the algorithm to actually give an output. As mentioned previously, the Convolutional Neural Network uses kernels and strides to identify features of an image that it takes from a video stream.

After all of that, the algorithm should have a pretty good idea of which sign the image shows, because it chooses whichever sign has the highest chance of being correct.

It outputs the predicted sign by attaching the image to whichever class it thinks it belongs to.
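
In code, that final step might look like this, assuming the model and the preprocessed image from the earlier sketches:

```python
import cv2
import numpy as np

# `equalized` is the preprocessed image from step 1; resize it to
# the 32x32 input size assumed by the model sketch above
small = cv2.resize(equalized, (32, 32))
batch = small.reshape(1, 32, 32, 1) / 255.0

probs = model.predict(batch)

# The predicted class is the index with the highest probability
print(int(np.argmax(probs)))
```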

The program outputs class 34, which in my dataset means the turn left sign

So yeah, that’s how I built my very own traffic sign classifier using a convolutional neural network.

Final Thoughts

Similar to when I made my lane detection algorithm, I found that the more I failed, the more frustrated I got, but I also worked harder to make it work.

For me, the difference was that this time, instead of using anger as a tool for motivation, I thought about the future. I envisioned myself completing the project and celebrating it. That’s what really kept me going in the end.

It reminded me of when I first joined The Knowledge Society (TKS). In my application, I said that competition drives me because I wanted to be the best. But now I have an entirely different perspective.

My new motivation for working hard is thinking about the future and envisioning a world where I will be a major contributor. What I’ve recently been thinking about is:

You can’t connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future.

Steve Jobs
