
BLOG
2021/11/04 | Time to read: 4 min
Ananya H.A. is a principal architect of neural networks at Drishti.
Have you ever watched a task performed over and over for hours? It doesn't take long before you become distracted or mesmerized, zone out, or start to focus on the other things going on around you. Now imagine trying to observe and count hundreds of hours of assembly activity on a factory line to make sure that the nuts and screws are tightened correctly or that your worker followed steps A to D in the right sequence.
Real-time assembly line observation is difficult, tedious, and time-consuming. Manual observation and measurement have been the technique of choice for more than 100 years, yet it has long been clear that the process is inadequate and inherently flawed. Drishti has developed a breakthrough: technology that helps manufacturers measure productivity, efficiency, and accuracy in manual assembly.
At Drishti, we specialize in video analytics and deep learning inventions. We use state-of-the-art research in deep learning and computer vision to apply machine intelligence across manual assembly lines. With our video analytics, manufacturers are able to identify quality issues, increase throughput, reduce waste, improve workplace safety, enhance traceability, foster standardized work adherence, fulfill regulatory requirements, and more. Drishti's solutions bring business intelligence out of the boardroom and into real-time operational settings by providing live, actionable insights from hundreds of hours of footage straight from the assembly line.
Drishti's offerings continue to push the boundaries of video analytics in manufacturing. We are exploring all areas of computer vision and deep learning, where the applications are numerous, pervasive, and largely untapped. The job at hand can almost feel like "Mission Impossible," but our top research team has been building these systems for years. Although we are not at liberty to discuss some of the exact details of our proprietary algorithms, this blog provides a broad overview of the solutions we've developed at Drishti.
Deep learning inventions from Drishti
At Drishti, we don't work with off-the-shelf models. Our model architectures are proprietary, and all of them were invented in-house.
Cycle detection using almost unsupervised anchored variational autoencoders (AVAE) (patent pending)
One of the not-so-cool things about machine learning is that you have to generate a lot of manually labeled training data. Our in-house invention (patent pending) uses a technology called anchored variational autoencoders, which can considerably reduce the amount of labeled data needed. Similar experiments are underway at Google; however, we have managed to improve significantly on those results, and our paper extends that research with considerably higher accuracy. Our architecture is designed to generate high-quality results without requiring much input data, so we can launch models with very little training data. Because these results need little supervision, you don't have to spend a significant amount of time or money on human expertise.
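The anchoring mechanism itself is proprietary and is not described here, so the following is only a minimal sketch of the standard variational autoencoder objective that such a model would presumably build on. It is not Drishti's AVAE. The function names and the `beta` weight are illustrative assumptions. The point it shows is that the loss is computed from the input and its reconstruction alone, with no labels involved, which is why this family of models can learn from mostly unlabeled footage.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL divergence from N(mu, diag(exp(logvar))) to N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def vae_loss(x, x_reconstructed, mu, logvar, beta=1.0):
    """Squared-error reconstruction term plus a beta-weighted KL term (no labels used)."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + beta * kl_to_standard_normal(mu, logvar)

# Toy values standing in for an encoder's output and a decoder's reconstruction.
rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
loss = vae_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 0.0])
```

In a real system the encoder and decoder would be neural networks trained by gradient descent on this loss; the small amount of labeled data would then be used only to attach meaning (for example, cycle boundaries) to the learned representation.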
Action detection using cutting-edge 3D convolution
We use a 3D convolutional neural network structure for action detection. It's a very flexible paradigm that suits many applications. For example, if I show you a still photo of a half-open door, would you be able to tell me whether the door is closing or opening? But if I show you a video of successive frames of the door at different stages (half open, then a quarter open, and so on), you will easily be able to tell me whether the door is opening or closing because you can see the movement over time. This is fundamentally why a 3D network, which convolves across time as well as across the two spatial dimensions, tends to be more accurate at recognizing actions than a model that looks at one frame at a time. Using this technology, we have been able to build action detection with state-of-the-art accuracy.
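The door example can be made concrete with a toy sketch. The code below is not Drishti's network: it is a plain 3D convolution over (time, height, width) with a single hand-set temporal-difference kernel, whereas a real 3D CNN learns many kernels from data. The names `conv3d_valid`, `door_frame`, and `motion_score` are illustrative assumptions.

```python
def conv3d_valid(video, kernel):
    """'Valid' 3D cross-correlation over nested lists indexed [time][row][col]."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += kernel[dt][dy][dx] * video[t + dt][y + dy][x + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

def door_frame(opening_width, height=3, width=6):
    """Toy frame: the first `opening_width` columns are the visible gap (1.0)."""
    return [[1.0 if x < opening_width else 0.0 for x in range(width)]
            for _ in range(height)]

def motion_score(video):
    """Sum the response of a kernel that spans two frames: next frame minus current."""
    kernel = [[[-1.0]], [[1.0]]]  # shape (2, 1, 1): looks only along the time axis
    response = conv3d_valid(video, kernel)
    return sum(v for plane in response for row in plane for v in row)

opening = [door_frame(w) for w in (1, 2, 3, 4)]  # gap grows over time
closing = list(reversed(opening))                # same frames, opposite order

# The two videos share an identical frame, so a single image cannot tell them apart,
# but the sign of the temporal response can.
same_still_frame = opening[1] == closing[2]
is_opening = motion_score(opening) > 0
is_closing = motion_score(closing) < 0
```

Because the kernel extends across frames, its output depends on the order of the frames, which is exactly the information a 2D convolution applied to each frame separately never sees.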
3D spatio-temporal neural networks
Structure from motion is one of the most fascinating techniques in computer vision. Given a set of video snippets (each spanning multiple frames), we are able to construct a 3D representation of the scene. In this work, we use a novel approach based on spatio-temporal modeling. We formulate the task as an optimization problem: estimating the scene's intrinsic parameters by integrating cues within each frame and across frames, using motion cues and geometric constraints.
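To give a sense of what such an optimization looks like, here is a minimal sketch of the classic structure-from-motion objective, the reprojection error under a pinhole camera. This is the textbook formulation, not Drishti's spatio-temporal method, and it makes simplifying assumptions that are ours: the camera only translates (no rotation), and `focal_length`, `cx`, and `cy` are the only intrinsic parameters.

```python
def project(point, camera_position, focal_length, cx, cy):
    """Pinhole projection of a 3D point for a camera that is translated but not rotated."""
    x = point[0] - camera_position[0]
    y = point[1] - camera_position[1]
    z = point[2] - camera_position[2]
    return (focal_length * x / z + cx, focal_length * y / z + cy)

def reprojection_error(points, camera_positions, observations, focal_length, cx, cy):
    """Sum of squared pixel distances between predicted and observed projections.

    observations[f][i] is the observed pixel of point i in frame f.
    """
    total = 0.0
    for cam, frame_obs in zip(camera_positions, observations):
        for point, (u_obs, v_obs) in zip(points, frame_obs):
            u, v = project(point, cam, focal_length, cx, cy)
            total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total

# Synthetic scene: two points seen from two camera positions (two frames).
points = [(1.0, 2.0, 4.0), (-1.0, 0.5, 5.0)]
cameras = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
observations = [[project(p, cam, 100.0, 50.0, 40.0) for p in points] for cam in cameras]

error_at_true_focal = reprojection_error(points, cameras, observations, 100.0, 50.0, 40.0)
error_at_wrong_focal = reprojection_error(points, cameras, observations, 90.0, 50.0, 40.0)
```

An optimizer would adjust the unknowns (here the focal length, and in a full system also the points and camera poses) to drive this error toward zero; the geometric constraint is that every frame must agree on one consistent 3D scene.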
Extending people, cycle, and action detection to other areas such as smart cities
Today, we are largely focused on industrial manufacturing. However, the paradigms we have invented (observing manual work and developing analytics from it) extend readily to other industries. For example, there is a company that is planning to install cameras on buses and trains so they can anonymously count the number of people boarding and deboarding at each station. It's not as easy as it sounds: people could be standing shoulder to shoulder, making it hard for the neural network to count, and there could be perspective distortion.
In a very similar vein, one can perform queue length detection at kiosks. For example, a mobile phone outlet could install a camera to collect a detailed analysis of the time associates take to resolve each customer's query, allowing for better planning of resources. Other examples include compliance detection and criminal activity detection from CCTV footage. Cameras can also be installed to keep a constant check on inventory. The range of industries that can leverage our technology is vast.
Want to know more about the pioneering AI/deep learning work we do at Drishti? Visit our careers page and drop us a line.