Applying deep learning for a driver assessment module


Convolutional Neural Networks (CNNs) have proven to be powerful identifiers of road features in self-driving automotive technologies. A team of engineers trained a CNN to detect road types and roadside features. With the help of the various datasets used for such training, the model teaches driving skills to an automobile much as a toddler learns to walk.

This article describes how a CNN model can be trained and integrated into an existing Driver Assessor System. The essence of this model, along with its improved version, VGGNet, is also described for a better understanding of this relatively unknown topic. Using VGGNet provides a large boost in prediction accuracy, specifically when in-cabin driver-facing cameras are considered. As embedded systems, driver assessor systems come with limited computational capabilities. The engineering team set out to develop a CNN and evaluate it with real-world data alongside the existing Driver Assessor System; the outcome is presented in the conclusion of this article. The article also describes CNN model integration and training process optimization, while emphasizing the reusability of neural networks.


Periodic driver evaluation is a significant step in fleet management and service industries. Periodic evaluation with the right feedback measures can improve a driver’s performance. With recent advances in machine learning, combined with instrumentation, we can supply reliable and precise data to a central logging computer. Such a computer, along with other modules, can serve as a combined tool to evaluate a driver’s performance.

Figure 1. Traditional driver evaluation system (Source: Einfochips)

Figure 1 gives a simple description of the traditional driver evaluation system. A system like this takes data from various sensors, passes it to a low-grade on-board computer for storage, and transfers it in full to an offline data processing unit. The offline data processing unit publishes the data to the supervisor when the driver comes in for evaluation.

Under traditional circumstances, the video has to be reviewed manually along with the sensor data, and manual supervision is prone to error.

Given this scenario, a mechanism that intelligently abstracts, classifies and extracts information from the video, used together with OBD-II data, would be of great help. Here, we discuss how to fuse these two strands of information to produce a fair outcome for the driver as well as the supervisor.

What is different here?

Sensor data alone may not be sufficient at the server end to evaluate a driver. Consider a real-world scenario: a downhill slope, which tends to cause continuous acceleration beyond the posted limit for, say, 8 to 10 seconds. This will certainly count against the driver in the performance data, even though in reality it is a false positive: the system flags a violation that the driver did not cause.

We propose a system that can detect uphill as well as downhill slopes, so that such false positives are caught, leading to an improved assessment of the driver.

Figure 2. Trained neural network based driver evaluation system (Source: Einfochips)

As shown in Figure 2, the offline data processing block is replaced with an intelligent data processing block. This processing is achieved with the help of trained neural networks. The CNN offloads the task of continuously watching video from a human being to an intelligent machine. The idea here is to classify the feature (in this case, upward and downward slopes) and use that information along with OBD-II data on a common time scale to determine the performance of the driver.

Implementation Details

NeuralTalk2 is used to classify the image. It uses a convolutional neural network to do so and provides a confidence value for each detected feature.

Figure 3. General deep learning flow diagram (Source: Einfochips)

A CNN works in the following manner:

Step 1 : Filters, parameters and weights are initialized with random values. The weights belong to the fully connected layers, and the filters are the convolution filters.

Step 2 : The network takes the input image and starts forward propagation (convolution, non-linear activation, spatial pooling, and finally the fully connected (FC) layer), then finds the probability of each feature. Refer to the convolution, non-linearity and pooling sections in Figure 3.

Convolution: $f(x) = f_D(\dots f_2(f_1(x; w_1); w_2) \dots ; w_D)$

Here $f_d$ takes as input a value $x_d$ and a parameter value $w_d$ and produces as output a result $x_{d+1}$. While the type and sequence of the functions are chosen by hand, the parameters $w = (w_1, \dots, w_D)$ are learned from training.
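
As a rough illustration of this composition, the sketch below chains a few toy layer functions in Python. The layer functions `scale` and `shift` and their parameter values are assumptions chosen for illustration; they are not layers of the model described in this article.

```python
from functools import reduce


def scale(x, w):
    """Toy layer (hypothetical): multiply the input by the parameter w."""
    return x * w


def shift(x, w):
    """Toy layer (hypothetical): add the parameter w to the input."""
    return x + w


def forward(x, layers, weights):
    """Apply f_1, f_2, ..., f_D in order; each layer's output feeds the next."""
    return reduce(lambda value, pair: pair[0](value, pair[1]),
                  zip(layers, weights), x)


# The type and order of the layers are fixed by hand; only the weights
# would be learned during training.
layers = [scale, shift, scale]
weights = [2.0, 1.0, 3.0]
result = forward(1.0, layers, weights)  # ((1.0 * 2.0) + 1.0) * 3.0
print(result)
```

In a real CNN each function would be a convolution, activation or pooling layer operating on an image tensor rather than a single number.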

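Nonlinear Activation ReLU (Rectified Linear Unit):

$f(x) = \max(0, x)$

Spatial Pooling (max pooling):

$f(x) = \max_{i \in W} x_i$, where $W$ is the pooling window, which moves across the input by the stride.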
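
The two operations above can be sketched in plain Python on a small feature map. The 4×4 input values are made up for illustration, and the 2×2 window with stride 2 is an assumption for this example.

```python
def relu(feature_map):
    """Apply f(x) = max(0, x) to every element of a 2-D feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window, moving by `stride`."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map produced by a convolution layer.
feature_map = [
    [1.0, -2.0, 3.0, 0.5],
    [-1.0, 4.0, -3.0, 2.0],
    [0.0, -5.0, 6.0, -1.0],
    [2.0, 1.0, -4.0, 7.0],
]
activated = relu(feature_map)   # negative values become 0.0
pooled = max_pool(activated)    # 4x4 map reduced to 2x2
print(pooled)
```

Pooling keeps the strongest response in each window, so the feature map shrinks while the most prominent features survive.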

Step 3 : Calculate the total error.

$\text{Total Error} = \sum \frac{1}{2} (\text{target} - \text{output})^2$, where the sum runs over the output classes.
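
As a worked example of this error, the sketch below computes it for a three-class output. The target and output probabilities are hypothetical values, not results from the model described here.

```python
def total_error(target, output):
    """Sum of 0.5 * (target - output)^2 over all output classes."""
    return sum(0.5 * (t - o) ** 2 for t, o in zip(target, output))


# Hypothetical one-hot target (the true class is the first one)
# and a hypothetical network output.
target = [1.0, 0.0, 0.0]
output = [0.7, 0.2, 0.1]
error = total_error(target, output)  # 0.5 * (0.09 + 0.04 + 0.01), about 0.07
print(error)
```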

Step 4 : Use back propagation to minimize the error. To do this, compute the gradient of the error with respect to each weight and use gradient descent to update the filter values and parameters so that the output error decreases. Both the filter values and the connection weights are updated in this step. The error is computed at the output of the fully connected layer and propagated backward through the network.

$w = w_i - \eta \, \frac{\partial L}{\partial w}$

$w$ = Weight (after the update)

$w_i$ = Initial weight

$\eta$ = Learning rate

$L$ = Total error from Step 3
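
To make the update rule concrete, here is a minimal sketch that applies it to a toy "network" with a single weight, whose output is simply w · x. The toy model, the input, the target and the learning rate are all assumptions for illustration; a real CNN updates many filter values and weights at once using the same rule.

```python
def output(w, x):
    """Toy one-weight network (hypothetical): the output is w * x."""
    return w * x


def loss(w, x, target):
    """L = 0.5 * (target - output)^2, the same form as the total error."""
    return 0.5 * (target - output(w, x)) ** 2


def gradient(w, x, target):
    """dL/dw for the toy network, obtained with the chain rule."""
    return -(target - output(w, x)) * x


def train(w, x, target, learning_rate, steps):
    """Repeat the update w = w - learning_rate * dL/dw."""
    for _ in range(steps):
        w = w - learning_rate * gradient(w, x, target)
    return w


# One update: the gradient is -(3.0 - 1.0) * 2.0 = -4.0, so w moves from 0.5 to about 0.9.
w_after_one = train(0.5, x=2.0, target=3.0, learning_rate=0.1, steps=1)
# Many updates: w approaches 1.5, where w * x equals the target.
w_trained = train(0.5, x=2.0, target=3.0, learning_rate=0.1, steps=50)
print(w_after_one, w_trained)
```

The learning rate controls the step size: too large a value can overshoot the minimum, while too small a value slows training.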

Since this application is concerned with the assessment of the driver, the scene in which the driver is operating the vehicle is of prime importance, rather than the instrumentation readings alone. To understand the scene (visualize a road), we require a very good classifier. For this, we used NeuralTalk2, which internally uses VGGNet. VGGNet is a convolutional network that performs very well at locating an object in an image and is also very good at classifying the object itself. The input is an image and the output is a set of class probabilities (Figure 4).

Figure 4. General VGGNet blocks (Source: Einfochips)

VGGNet achieves this by relying on one idea: depth. To recognize or classify an image, it helps to analyze the image through many successive layers. The deeper the analysis, the more accurately the image can be classified; in other words, more convolution layers generally give better classification of objects. The 16-layer VGGNet (VGG-16) has 16 weight layers, of which 13 are convolution layers and 3 are fully connected layers, and this depth serves to boost the accuracy of prediction. Each CONV layer performs a 3×3 convolution with stride 1 and pad 1. Each POOL layer performs 2×2 max pooling with stride 2 and pad 0.
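
The layer settings above determine how the spatial size of a feature map changes through the network. The sketch below applies the standard output-size formula, (input + 2 × pad − kernel) / stride + 1. It assumes a 224×224 input and the five pooling stages of VGG-16; neither figure is stated in this article.

```python
def output_size(size, kernel, stride, pad):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


def conv(size):
    """3x3 convolution, stride 1, pad 1: the spatial size is preserved."""
    return output_size(size, kernel=3, stride=1, pad=1)


def pool(size):
    """2x2 max pooling, stride 2, pad 0: the spatial size is halved."""
    return output_size(size, kernel=2, stride=2, pad=0)


size = 224  # assumed input width and height
sizes = [size]
for _ in range(5):  # assumed: five pooling stages, as in VGG-16
    # Each stage has several CONV layers, but they all preserve the size,
    # so applying conv() once per stage is enough for this calculation.
    size = pool(conv(size))
    sizes.append(size)
print(sizes)
```

Because the padded 3×3 convolutions keep the size unchanged, only the pooling layers shrink the feature map, which is what allows many convolution layers to be stacked.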

A training set of inputs was provided to NeuralTalk2 during the training phase. We limited the training set to straight roads, upward slopes, downward slopes, curves, left turns and right turns. This ultimately helped train the model more efficiently, as the number of input categories was limited. We did this because we wanted to achieve a higher degree of accuracy in our assessment.

The resulting trained model looks specifically for the features in the training set, such as upward slopes, left turns and right turns. This model can be ported to most x86 Linux computers.

  • Input Sensors : OBD-II is a standard interface within the vehicle. The hardware gathers engine rpm, heat and speed data from the end sensors over OBD-II and provides that data as input to the compute module.
  • Assessment Dashboard : The dashboard gives a first-level analysis of the data. It shows ideal vs. original data graphs. Clicking on any point on the original data graph shows the complete details available in the data set at that point.


A front camera records video from the front of the vehicle. When the driver is maneuvering the vehicle uphill, the driver will naturally apply more gas to maintain the thrust of the vehicle. This in turn is recorded on a storage device attached to the hardware, along with the video from the front camera.

The recorded data is analyzed once the driver’s journey is over and the vehicle is parked in the parking lot. Video and images are analyzed on a computer that already has a trained classification model built in. Along with the video, other vehicle parameters such as rpm, speed and heat are also analyzed. Because the computer is already trained to detect an upslope, it automatically discards the increased-rpm data for that stretch, which was not possible in a traditional setup. This mechanism is highly useful in avoiding incorrect assessment of the driver.
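
The fusion of scene classification with OBD-II data described above could look roughly like the sketch below. The `Sample` record, its field names, the rpm limit and the confidence threshold are all hypothetical; the article does not specify the actual data format or thresholds.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One time-aligned record; all field names are hypothetical."""
    timestamp: float   # seconds since the start of the trip
    rpm: float         # engine speed read over OBD-II
    scene: str         # label predicted for the video frame
    confidence: float  # classifier confidence for that label


def flag_high_rpm(samples, rpm_limit=3000.0, min_confidence=0.8):
    """Return timestamps where high rpm is not explained by an upward slope."""
    flagged = []
    for s in samples:
        if s.rpm <= rpm_limit:
            continue  # rpm is within the limit, nothing to assess
        explained = s.scene == "upward slope" and s.confidence >= min_confidence
        if not explained:
            flagged.append(s.timestamp)
    return flagged


# Hypothetical trip data.
trip = [
    Sample(0.0, 2200.0, "straight road", 0.95),
    Sample(1.0, 3400.0, "upward slope", 0.91),   # high rpm, explained by the slope
    Sample(2.0, 3500.0, "straight road", 0.88),  # high rpm on a flat road
    Sample(3.0, 3300.0, "upward slope", 0.40),   # label too uncertain to excuse
]
flagged = flag_high_rpm(trip)
print(flagged)
```

Requiring a minimum confidence before discarding a reading is a design choice in this sketch: a low-confidence slope label does not excuse the driver, so uncertain frames still reach the supervisor.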

Detection of Field Images

Figure 5. Detection of Straight path (Source: Einfochips)

Figure 6. Detection of Downward slope (Source: Einfochips)

Figure 7. Detection of Upward slope (Source: Einfochips)

The trained model was able to identify straight paths and slopes with a high degree of accuracy.

Pseudo Code of Training Algorithm
The following is the complete pseudo code/flow of the training algorithm:

Pseudo Code of Testing Algorithm

Note: Probability values mentioned in pseudo code are hypothetical.


A CNN that is efficiently trained on various kinds of road types and features can be a powerful enabler of self-driving technologies in the future. The model can also be integrated into the software platform to enhance the accuracy of existing driver assessor systems. The neural network can also be retrained on other datasets for tasks such as object identification, object classification and snapshot identification, which makes it efficient and reusable. We can further deduce that integrating the CNN model with other software will provide more flexibility. Future work can focus on optimizing the overall training process of the model. Optimizing the low-level library functions and other frequently used functions can greatly improve the overall training process.


Hemang Bhimani is a senior engineer at eInfochips. He has over 4 years of experience in designing and debugging various software applications, including emerging technologies like Machine Learning and Computer Vision, Image/Video processing, Algorithm porting and optimization, Network security and Authentication, Platform migration, etc.
Ankur Devani is a senior engineer at eInfochips. He has done his Masters in Embedded Systems and has 4+ years of experience in designing and developing various applications of different domains like Machine Learning and Computer Vision, Image/Video processing, Networking, Algorithm porting and optimization, Platform migration, etc.
Samir Bhatt is a senior technical lead at eInfochips and has over a decade of experience in various domains of software programming like Video/Graphic processing, Automotive, Medical Instrumentation, Streaming, and Algorithm Optimizations, etc. He has been guiding a team of focused engineers to execute Neural Networks for many engineering solutions at eInfochips.
