Applying deep learning for a driver assessment module
Convolutional neural networks (CNNs) have proven to be powerful identifiers of road features in self-driving automotive technologies. A team of engineers trained a CNN to detect road types and roadside features. With the help of the datasets used for this training, the model teaches an automobile driving skills in much the same way that a toddler learns to walk.
This article describes how a CNN model can be trained and integrated into an existing driver assessor system. The essentials of this model and of an improved variant, VGGNet, are also described. Using VGGNet provides a significant boost in prediction accuracy, particularly when in-cabin, driver-facing cameras are considered. As embedded systems, driver assessor systems come with limited computational capabilities. The engineering team set out to develop the CNN using real-world data and to evaluate the model against the existing driver assessor system; the outcome is reported in the conclusion section of this article. The article also describes the integration of the CNN model and the optimization of the training process, while emphasizing the reusability of neural networks.
Periodic driver evaluation is a significant step in fleet management and service industries. Periodic evaluation with the right feedback can improve a driver’s performance. With recent advances in machine learning, combined with instrumentation, we can readily supply reliable and precise data to a central logging computer. Such a computer, along with other modules, can be used as a combined tool to evaluate a driver’s performance.
Figure 1. Traditional driver evaluation system (Source: Einfochips)
Figure 1 gives a simple description of the traditional driver evaluation system. A system like this takes data from various sensors, stores it on a low-grade on-board computer and then transfers all of it to an offline data processing unit. The offline data processing unit presents the data to the supervisor when the driver comes in for evaluation.
Under traditional circumstances, the video has to be reviewed manually alongside the sensor data, and manual supervision is prone to error.
Given this scenario, a mechanism that intelligently abstracts, classifies and extracts information from the video, used together with OBD-II data, would be very helpful. Here, we discuss how these two strands of information can be fused to produce a sound outcome for the driver as well as the supervisor.
What is different here?
Sensor data alone may not be sufficient at the server end to evaluate a driver. Consider a real-world scenario: a downward slope leads to continuous acceleration for, say, 8 to 10 seconds, taking the vehicle beyond the posted limit. Judged on sensor data alone, this will count against the driver’s performance. In reality it is a false positive: a violation is recorded although the driver is not at fault.
We propose a system that can identify upward as well as downward slopes, so that such false positives are detected, leading to improved assessment of the driver.
Figure 2. Trained neural network based driver evaluation system (Source: Einfochips)
As shown in Figure 2, the offline data processing block is replaced with an intelligent data processing block. This processing is achieved with the help of trained neural networks. The CNN offloads the task of continuously observing video from a human being to an intelligent machine. The idea here is to classify the feature (in this case, upward and downward slopes) and use that information along with OBD-II data on a common time scale to determine the performance of the driver.
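The fusion idea can be illustrated with a minimal sketch. It is not the team's implementation: the `Sample` record, the label name `downward_slope`, the 0.6 confidence threshold and the 60 km/h limit are all assumptions made for illustration. Each time-stamped sample pairs an OBD-II speed reading with the road label and confidence that the classifier produced for the matching video frame.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float           # seconds since trip start (shared time scale)
    speed_kmh: float   # vehicle speed read over OBD-II
    road_label: str    # classifier label for the frame at time t (hypothetical names)
    confidence: float  # classifier confidence for that label, 0..1


def flag_speeding(samples, limit_kmh, min_confidence=0.6):
    """Return (t, verdict) for every sample above the posted limit.

    A speeding sample is excused only when the classifier reports a
    downward slope with at least min_confidence (an assumed threshold).
    """
    verdicts = []
    for s in samples:
        if s.speed_kmh <= limit_kmh:
            continue  # within the limit, nothing to assess
        on_downslope = (s.road_label == "downward_slope"
                        and s.confidence >= min_confidence)
        verdicts.append((s.t, "excused_downslope" if on_downslope else "violation"))
    return verdicts


samples = [
    Sample(0.0, 58.0, "straight_road", 0.91),
    Sample(1.0, 63.0, "downward_slope", 0.88),
    Sample(2.0, 66.0, "downward_slope", 0.42),  # low confidence, not excused
    Sample(3.0, 64.0, "straight_road", 0.95),
]
result = flag_speeding(samples, limit_kmh=60.0)
print(result)
```

In this sketch only the sample at t = 1.0 is excused; the low-confidence downslope reading and the straight-road reading both remain violations. A real system would probably evaluate windows of several seconds rather than single samples.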
NeuralTalk2 is used to classify the image. It uses a convolutional neural network to do so, and for each detected feature it provides a confidence value.
Figure 3. General deep learning flow diagram (Source: Einfochips)
A CNN works in the following manner:
Step 1: Filters, parameters and weights are initialized with random values. The weights belong to the connections of the fully connected network, and the filters are the convolution kernels.
Step 2: The network takes the input image and starts forward propagation (convolution, non-linear activation, spatial pooling and, finally, the fully connected (FC) layer), which yields the probability of each feature. Refer to the convolution, non-linearity and pooling sections in Figure 3.
The network can be viewed as a chain of functions f_1, …, f_D. Here f_d takes as input a value x_d and a parameter value w_d and produces as output a result x_(d+1). While the type and sequence of the functions are chosen manually, the parameters w = (w_1, …, w_D) are learned from training.
Non-linear activation, ReLU (Rectified Linear Unit):
f(x) = max(0, x)
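The forward-propagation stages of Step 2 can be sketched in plain Python for a single-channel image. This is a minimal illustration and not the NeuralTalk2 code: the 4x4 image and the vertical-edge kernel are made-up values, and the fully connected layer is left out. ReLU is taken in its standard form, max(0, x).

```python
def conv2d(image, kernel, stride=1, pad=0):
    """Slide a square kernel over a zero-padded 2-D list (cross-correlation)."""
    k = len(kernel)
    width = len(image[0]) + 2 * pad
    padded = [[0.0] * width for _ in range(pad)]
    padded += [[0.0] * pad + list(row) + [0.0] * pad for row in image]
    padded += [[0.0] * width for _ in range(pad)]
    out = []
    for i in range(0, len(padded) - k + 1, stride):
        row = []
        for j in range(0, width - k + 1, stride):
            row.append(sum(padded[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out


def relu(fmap):
    """Apply f(x) = max(0, x) to every element."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, stride)]
            for i in range(0, len(fmap) - size + 1, stride)]


image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 1, 0, 0],
         [1, 0, 1, 2]]
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]  # illustrative filter; real filters are learned

features = conv2d(image, edge_kernel, stride=1, pad=1)  # stays 4x4
activated = relu(features)                              # negatives become 0
pooled = max_pool(activated, size=2, stride=2)          # shrinks to 2x2
print(pooled)
```

With a 3x3 kernel, stride 1 and pad 1, the convolution preserves the 4x4 size, and the 2x2 pooling with stride 2 halves each dimension.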
Step 3: Calculate the total error.
Total Error = ∑ ½ (target probability – output probability)²
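As a small worked example of this error measure, the sketch below sums half the squared difference between the target and the output probability for each class. The three-class ordering and the probability values are hypothetical.

```python
def total_error(target, output):
    """Sum of 1/2 * (target - output)^2 over all output classes."""
    return sum(0.5 * (t - o) ** 2 for t, o in zip(target, output))


# Hypothetical class order: straight road, downward slope, upward slope
target = [0.0, 1.0, 0.0]  # the frame truly shows a downward slope
output = [0.2, 0.7, 0.1]  # probabilities produced by the network
error = total_error(target, output)
print(error)  # approximately 0.07
```

A perfect prediction gives an error of zero; the further the output probabilities drift from the target, the larger the error.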
Step 4: Use backpropagation to minimize the error. To do this, compute the gradient of the error with respect to each weight and use gradient descent to update the filter values and parameters so that the output error decreases. The values that change in this step are the filter values and the connection weights of the fully connected layer.
w = wi – η * dL/dw
w = Updated weight
wi = Initial weight
η = Learning rate
dL/dw = Gradient of the error L with respect to the weight
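The update rule can be tried on the simplest possible case, a single weight. The sketch below assumes a one-weight model, output = w * x, with the squared error from Step 3; the input, target and learning rate are illustrative values. A real network applies the same rule to every filter value and connection weight, with the gradients supplied by backpropagation.

```python
def loss(w, x, target):
    """Squared error for the one-weight model output = w * x."""
    return 0.5 * (target - w * x) ** 2


def grad(w, x, target):
    """dL/dw for the loss above, derived by the chain rule."""
    return -(target - w * x) * x


def gradient_step(w_i, x, target, lr):
    """One update: w = wi - lr * dL/dw."""
    return w_i - lr * grad(w_i, x, target)


x, target, lr = 2.0, 1.0, 0.1  # illustrative input, target and learning rate
w = 0.0                        # initial weight
history = [loss(w, x, target)]
for _ in range(20):
    w = gradient_step(w, x, target, lr)
    history.append(loss(w, x, target))

print(w)  # approaches 0.5, where w * x equals the target
```

The loss falls on every step in this example. A learning rate that is too large would make the updates overshoot instead of converging.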
Since this application is concerned with assessing the driver, the scene in which he or she is driving is of prime importance, rather than the instrument readings alone. To understand the scene (visualize a road, for example), we need a very good classifier. For this we used NeuralTalk2, which internally uses VGGNet. VGGNet is a convolutional network that is very good at locating an object and at classifying the object itself. The input is an image and the output is a set of class probabilities (Figure 4).
Figure 4. General VGGNet blocks (Source: Einfochips)
VGGNet achieves this by relying on one idea: the depth of the network is key to recognizing and classifying an image. The deeper the analysis, the more accurately the image can generally be classified; in other words, more convolution layers tend to give better classification of objects. The 16-layer VGGNet configuration has 16 weight layers (13 convolution layers followed by 3 fully connected layers), which serve to boost the accuracy of prediction. Each CONV layer performs 3x3 convolution with stride 1 and pad 1. Each POOL layer performs 2x2 max pooling with stride 2 and pad 0.
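The effect of these CONV and POOL settings on feature-map size can be checked with the standard output-size formula, (n + 2 * pad - kernel) / stride + 1. The sketch below assumes a 224x224 input and five pooling stages, as in the commonly published 16-layer VGGNet configuration; neither figure is stated in this article.

```python
def output_size(n, kernel, stride, pad):
    """Spatial size after a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1


size = 224   # assumed input width/height
sizes = []
for _ in range(5):  # assumed number of CONV/POOL stages
    size = output_size(size, kernel=3, stride=1, pad=1)  # CONV keeps the size
    size = output_size(size, kernel=2, stride=2, pad=0)  # POOL halves it
    sizes.append(size)

print(sizes)  # [112, 56, 28, 14, 7]
```

Because the 3x3 convolutions with pad 1 preserve the spatial size, many of them can be stacked, and only the pooling layers reduce resolution.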
A training set of inputs was provided to NeuralTalk2 during the training phase. We limited the training set to straight roads, upward slopes, downward slopes, curves, left turns and right turns. Because the number of input classes was limited, the model could be trained more efficiently. We did this because we wanted to achieve a higher degree of accuracy in our assessment.
The resulting trained network/model looks specifically for the features in the training set, such as upward slopes, left turns and right turns. This model can be ported to most x86 Linux computers.
- Input Sensors: OBD-II is a standard interface within the vehicle. The hardware gathers engine rpm, heat and speed data from the end sensors over OBD-II and provides that data as input to the compute module.
- Assessment Dashboard: The dashboard gives a first-level analysis of the data. It shows ideal vs. original data graphs. Clicking any point on the original data graph shows the complete details available in the data set at that point.