Developing Patience and Understanding with CNNs

Posted by Andrew Rohlman on March 22, 2019

My first attempt at creating a convolutional neural network tested my patience as much as my understanding of how they work. On a machine with limited capabilities, I set out to build a Convolutional Neural Network (CNN) that could identify images of numerous traffic signs with high accuracy. This problem is a great way to learn and understand convolutional neural networks, since the same kind of system is used within self-driving cars. Self-driving cars need to detect traffic signs, correctly identify what each sign means, and act on it to safely transport passengers to their destination. I retrieved a dataset of over 30,000 images covering 43 different traffic signs, and created a CNN that could identify them with as high an accuracy as I could manage.

My approach to this problem is a common one for many CNN-type problems. My first step was to take all of my training images and preprocess them so that the only difference between the images was the difference in the signs themselves. The images come in different sizes, color hues, and angles, and processing them onto the same playing field makes the learning step of the CNN a little easier. The next step, which does differ amongst CNN approaches, was to store the information of each image in NumPy arrays. With how the data was given to me, it would have been very difficult to split it into training and validation sets for the CNN to learn and validate with without doing this step. By converting the images into numerical arrays, I was able to easily split them into 80% training images and 20% validation images with one line of code, as in the sketch below.
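Here is a minimal sketch of the kind of preprocessing and split I'm describing. The directory layout (one folder per class, named by class ID), the 32×32 target size, and the use of PIL and scikit-learn are assumptions for illustration, not details from my actual code.

```python
import numpy as np
from pathlib import Path
from PIL import Image
from sklearn.model_selection import train_test_split

IMG_SIZE = (32, 32)  # assumed target size; every image gets resized to the same shape

def load_images(data_dir):
    """Load each sign image, resize it, and collect pixels and labels as NumPy arrays."""
    images, labels = [], []
    for class_dir in sorted(Path(data_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        for img_path in class_dir.glob("*.png"):
            img = Image.open(img_path).convert("RGB").resize(IMG_SIZE)
            images.append(np.asarray(img, dtype=np.float32) / 255.0)  # normalize to [0, 1]
            labels.append(int(class_dir.name))  # hypothetical: folder name is the class ID
    return np.stack(images), np.array(labels)

X, y = load_images("traffic_signs/train")  # hypothetical path

# The one-line 80/20 split mentioned above
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
```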

Given the computational limitations of my machine, I only used 10,000 of the 30,000 training images. This reduced the time it took my machine to run through each epoch in the learning phase. As for the CNN architecture, I wanted to make it a few convolutional layers deep. I started with 4 convolutional layers, with every other layer followed by a pooling layer and a dropout layer. This meant that after every 2 convolutional layers of learning, we pooled our features for computational simplicity, and then applied dropout, randomly discarding a fraction of the activations to reduce the likelihood of overfitting.
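A sketch of that architecture might look like the following, assuming a Keras stack; the filter counts, kernel sizes, dropout rates, and dense layer width are my own illustrative choices rather than the exact values I used.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 43  # 43 different traffic signs

# Two conv layers, then pooling + dropout, repeated twice, followed by a dense classifier.
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),   # pool features for computational simplicity
    layers.Dropout(0.25),          # randomly drop activations to fight overfitting
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```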

After running 30 epochs through my initial CNN, we ended up with a model that was 97% accurate on our validation images. For a first and simple network that took roughly 45 minutes to train (on a relatively slow machine), I thought this initial model served its purpose in showing me how powerful a simple CNN can be at learning images. I then ran the initial CNN on our 10,000 testing images, and we ended up with an accuracy of about 94%.
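Continuing the sketch above, training and evaluation amounts to a couple of calls; the batch size and the `X_test`/`y_test` names are assumptions.

```python
# Train for 30 epochs, monitoring accuracy on the held-out validation set.
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=30,
                    batch_size=64)

# Evaluate on the separate test images after training.
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc:.2%}")
```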

In a real-world setting, there are a couple of other steps I would have taken to develop this CNN further. First off, if I had the time and computing power, I would have deepened the CNN by adding more layers, as well as layers with more neurons. I believe this would have increased our accuracy on both the validation and testing images. When it comes to traffic sign classification in a real-world setting with self-driving cars, we need this accuracy to be as high as possible to reduce the likelihood of accidents. The other step I would take would be to augment all of my training images with different shades of color, angles, and amounts of blur, so that the model sees as many of the scenarios in which a self-driving car might capture an image of a traffic sign as possible. I did create an augmented dataset, but my model was taking over 30 minutes to run a single epoch through the CNN.
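A rough sketch of that kind of augmentation, again assuming Keras; the specific rotation, shift, zoom, and brightness ranges here are illustrative guesses, not the settings I actually used.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Randomly rotate, shift, zoom, shear, and re-brighten training images so the model
# sees signs under more of the conditions a car's camera might encounter.
augmenter = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.15,
    shear_range=0.1,
    brightness_range=(0.6, 1.4),
)

# Feed the generator to fit() instead of the raw arrays; each epoch now sees
# freshly transformed copies of the training images.
history = model.fit(augmenter.flow(X_train, y_train, batch_size=64),
                    validation_data=(X_val, y_val),
                    epochs=30)
```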

I learned a lot about how powerful a CNN can be for image classification, as well as how patience pushes a machine learning engineer toward carefully designing their CNN architectures. Given more powerful machines, CNNs tend to be the industry leader in how machines recognize and act upon images within artificial intelligence.