2018 has seen great advances in computer vision capabilities. The accuracy of object detection and facial recognition continues to improve, and the number of readily available options based on state-of-the-art deep learning technologies, including convolutional and recurrent neural networks, continues to grow. These improvements come at a cost: an increase in the complexity and processing requirements of the technologies. YOLOv3, for example, a popular object detection model, has a 106-layer fully convolutional architecture, more than double that of the previous version. Other models, such as RetinaNet and SSD variants, are also making huge strides in accuracy, but again at the cost of increased complexity and reduced performance.
Keeping Up with New Demands
While the complexity and computational requirements of advanced computer vision technology increase, there is growing demand for applying these technologies to large numbers of high-resolution live video streams. The number of video surveillance cameras is increasing at a dramatic rate, along with the expectation that they provide proactive intelligence. A passive video system is no longer enough. Cameras, quite simply, need to be a lot smarter.
The reality of rolling out advanced machine learning technologies requires a new way of thinking about implementations. Streaming full-resolution video to the cloud for processing is prohibitively expensive, requires too much bandwidth, and introduces high latency. Putting large numbers of high-powered servers on-site has its own set of issues: it consumes precious space and power, and it can be cost-prohibitive when rolling out across large numbers of cameras. It also does not address the realities of multi-location environments, which become increasingly important for making use of the data. Processing live video from one or two cameras is one thing. Processing video from hundreds of cameras in real time, across one or more locations and often with limited resources, requires us to think entirely differently.
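To make the bandwidth problem concrete, a back-of-envelope calculation helps. The per-stream bitrate below is an assumption for illustration (roughly what an H.264 encoder produces for 1080p at 30 fps), not a figure from this article:

```python
# Back-of-envelope estimate of the upstream bandwidth required to
# continuously stream full-resolution video from many cameras to the cloud.
# The per-stream bitrate is an illustrative assumption (~4 Mbps is a
# typical H.264 setting for 1080p @ 30 fps), not a measured figure.
BITRATE_MBPS = 4.0

def total_bandwidth_mbps(num_cameras: int, bitrate_mbps: float = BITRATE_MBPS) -> float:
    """Aggregate upstream bandwidth for num_cameras continuous streams."""
    return num_cameras * bitrate_mbps

for cameras in (2, 100, 500):
    print(f"{cameras:>4} cameras -> {total_bandwidth_mbps(cameras):,.0f} Mbps upstream")
```

Even under these modest assumptions, a 500-camera deployment would need on the order of 2 Gbps of sustained upstream capacity, which is exactly the kind of cost that pushes processing toward the edge.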
The Solution: Video at the Edge
The answer lies at the edge. Putting the intelligence at the edge allows the workload to be distributed across many devices. This can mean either embedding stronger processing capabilities into the camera itself or adding highly efficient edge appliances that sit between the cameras and the cloud. To enable this edge processing, companies are beginning to release fast, power-efficient, specialized AI processors. Nvidia has launched several modules in its Jetson series for performing real-time inference in embedded devices, and Intel, through its acquisition of Movidius, offers its Myriad series of processors and the Neural Compute Stick. The last few years have also seen a huge amount of investor funding going to a new generation of chip companies offering low-cost, high-performance deep learning processors; companies such as Mythic and Graphcore have received hundreds of millions of dollars in venture funding. Recently, even Google and Amazon announced their own edge processing chips, a remarkable acknowledgement by two pure-play cloud companies of the importance of processing machine learning at the edge.
What’s to Come
Edge-based processing will enable an entirely new kind of real-time intelligence. What are currently passive video recorders will soon be watching for children at risk of drowning in a swimming pool, detecting weapons near a school, or opening doors for employees without a key. They will look for defects on manufacturing lines, spot workers who aren't wearing safety equipment, and learn how people move through a retail environment to optimize flow and reduce wait times. Cameras will finally provide real-time, actionable data, driving huge improvements in security, manufacturing reliability, in-store shopper satisfaction, and safety.
With over 1 billion cameras already deployed and the next billion on the way, edge processing offers the potential to finally make them smart.
Already, video intelligence providers such as Kogniz offer services that identify people and patterns in real time. The Kogniz approach leverages edge-based appliances, including standalone cameras and adapters for existing IP cameras, allowing on-demand deployment with minimal infrastructure. The solution works with an unlimited number of cameras across any number of locations.
Jed Putterman serves as the Co-CEO of Kogniz. Mr. Putterman has started several technology companies including Snapcentric, acquired by VeriSign, and Allerez, acquired by Mercury Interactive Corporation. Mr. Putterman started his career at Oracle Corporation and spent many years as a consultant for large companies including Sun Microsystems, SGI and Aspect Communications. He graduated from the University of California, Berkeley.