
CAREERS BLOG
2022/02/17
Devashish Shankar is a principal architect at Drishti.
I recently presented on behalf of Drishti at the Machine Learning Developers Summit (MLDS). Drishti is excited about developments in machine learning (ML) and artificial intelligence (AI) and their implications as AI becomes more prevalent globally and its use cases multiply. We are focused on improving our offerings for manufacturers around the world and applying our technology to help improve human capability within manufacturing. At MLDS, I shared what we do, why it’s unique and what exciting new developments are on the horizon. Here’s a quick summary, but you can view the full session here.
Drishti is the pioneer in mining video streams with AI to create new data from the factory floor, a process we call action recognition. Manufacturers use that data to improve manual assembly lines and make sweeping improvements in quality and productivity in the factory.
What our technology does
At its core, Drishti measures cycles and specific actions in a video stream. A particular object entering a camera’s defined frame of reference marks the start of a cycle, while exiting the frame signals its end. Our technology then detects all of the prescribed actions that were supposed to occur and flags any deviations, so the manufacturer can investigate the anomalies. In manufacturing, this data is invaluable: it saves training time, increases efficiency, reduces waste, improves quality and reduces risk in high-value assemblies.
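The enter/exit cycle-marking idea above can be sketched in a few lines. This is a hypothetical illustration, not Drishti’s actual system: `detect_unit` stands in for any per-frame object detector.

```python
# Hypothetical sketch of the enter/exit cycle-marking idea.
# `detect_unit(frame)` stands in for any per-frame object detector;
# it is an assumption, not Drishti's actual API.

def segment_cycles(frames, detect_unit):
    """Yield (start_idx, end_idx) pairs: a cycle starts when a unit
    enters the camera's frame of reference and ends when it exits."""
    cycles = []
    start = None
    for i, frame in enumerate(frames):
        present = detect_unit(frame)   # True if the unit is in view
        if present and start is None:
            start = i                  # unit entered: cycle begins
        elif not present and start is not None:
            cycles.append((start, i))  # unit exited: cycle ends
            start = None
    return cycles

# Toy run: the "unit" is visible in frames 2-5 and 8-10.
visibility = [False, False, True, True, True, True,
              False, False, True, True, True, False]
print(segment_cycles(visibility, lambda f: f))  # → [(2, 6), (8, 11)]
```

Per-cycle timing and action checks would then run inside each (start, end) window.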
Beneath the simple exterior: A complex new tech
While the concept is quite simple and the benefits obvious, Drishti is making massive leaps in the space of AI to create this ultra-useful manufacturing software.
To detect an action accurately, our systems have to recognize objects in an x, y space and then add the element of time for a three-dimensional view. Considering that fast, accurate object recognition still isn’t a given, adding this third dimension is even more challenging. On a manual assembly line, Drishti’s system must account for things like:
variations in unit size
indeterminate locations
irregular trajectories
multiple units within a field of view
camera occlusion (when hands, heads and tools are blocking portions of the view)
variations of the line associates
lighting changes and background variation
Our technology has to do this for every frame of a live feed, and make sense of it with only 2-3 seconds of latency, for it to be helpful on the manufacturing floor in real time. We’ve pushed the boundaries of 3D convolution, in which we stack all the frames of a video on top of one another within a 3D grid to contextualize actions. Drishti was one of the pioneers of getting 3D convolution right, and we’ve since increased our capabilities in this area.
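The frame-stacking idea can be illustrated with a tiny, naive 3D convolution. The "video", kernel size and values below are made up for illustration; the point is that the kernel slides across time as well as height and width, so the output can respond to motion, not just appearance.

```python
import numpy as np

# Toy "video": 4 frames of 5x5 grayscale pixels stacked into a (T, H, W) grid.
video = np.zeros((4, 5, 5))
video[1:3, 2, 2] = 1.0  # a bright spot present only in frames 1 and 2

# A 2x3x3 averaging kernel spanning two consecutive frames and a 3x3
# spatial patch. Convolving it across T as well as H and W mixes
# information across neighboring frames.
kernel = np.ones((2, 3, 3)) / 18.0

def conv3d(vol, k):
    """Naive valid-mode 3D convolution (cross-correlation)."""
    t, h, w = np.array(vol.shape) - np.array(k.shape) + 1
    out = np.zeros((t, h, w))
    for i in range(t):
        for j in range(h):
            for l in range(w):
                out[i, j, l] = np.sum(vol[i:i + k.shape[0],
                                          j:j + k.shape[1],
                                          l:l + k.shape[2]] * k)
    return out

features = conv3d(video, kernel)
print(features.shape)  # (3, 3, 3)
```

Production systems use learned kernels in deep-learning frameworks rather than loops like this, but the sliding-window geometry is the same.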
We’ve innovated the process to the point where we are doing temporal action localization, not just video classification (which is the industry standard). We’re also making leaps in our semi-supervised process, making our learning curve far less steep than it was initially and requiring up to 4x less labeled data to train our systems on specific workflows and anomaly detections. For our customers, this means a shorter time to full efficacy.
We’ve made a system that can detect anomalies within 2-3 seconds, which is tricky in itself. It’s difficult for all of the reasons mentioned above, and because humans are remarkably accurate in their work: despite visible variations in how they move, line associates are generally over 99% accurate in their work actions. Good neural networks (NNs) typically achieve around 95% precision in their detections. If the system is not trained correctly, we risk creating false positives for anomaly detection.
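A back-of-the-envelope calculation shows why human accuracy makes this so hard. The numbers below are illustrative assumptions, not Drishti’s measurements: when over 99% of cycles are correct, true anomalies are rare, so even a small false-positive rate can swamp the real detections.

```python
# Made-up numbers illustrating the base-rate problem: rare anomalies
# make false positives dominate unless the detector is very well trained.

cycles = 10_000
anomaly_rate = 0.01          # humans ~99% accurate -> ~1% true anomalies
recall = 0.90                # detector catches 90% of true anomalies
false_positive_rate = 0.02   # flags 2% of normal cycles by mistake

true_anomalies = cycles * anomaly_rate                             # 100
true_positives = true_anomalies * recall                           # 90
false_positives = (cycles - true_anomalies) * false_positive_rate  # 198

precision = true_positives / (true_positives + false_positives)
print(round(precision, 2))  # 0.31
```

Under these assumptions, more than two out of three flagged cycles would be false alarms, which is why driving down false positives matters so much.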
How Drishti ensures continuous improvement in its own tech
Drishti has reduced false positives and increased the accuracy of its systems through continuous improvement via its MLGym process. Customer engineers begin by helping to define, as best they can, the correct workflow and what would be considered an anomaly.
Raw video is fed into our MLGym, at which point our Drishti annotation service takes the video and defines points of interest. The annotated data then passes back through MLGym into ML training workflows and pipelines (along with any required NN interfaces), and through MLGym one final time before the trained model is deployed. Every day, a selection of data and errors is sent for human verification; the few errors are corrected and fed back into the training data, so the system continuously becomes more accurate.
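The daily verify-and-correct step can be sketched as a simple loop. Everything here is hypothetical: the function names and sampling policy are illustrative, not the MLGym API.

```python
# Hypothetical sketch of the daily human-in-the-loop correction step;
# names and sample size are illustrative, not Drishti's actual pipeline.
import random

def daily_feedback_loop(model_predict, training_data, todays_clips, verify):
    """Sample today's predictions for human review and fold any
    corrections back into the training set."""
    sample = random.sample(todays_clips, k=min(20, len(todays_clips)))
    for clip in sample:
        predicted = model_predict(clip)
        corrected = verify(clip, predicted)  # human verifier fixes errors
        if corrected != predicted:
            # the corrected example rejoins the training data
            training_data.append((clip, corrected))
    return training_data

# Toy run: the model calls everything "normal"; the verifier catches one error.
updated = daily_feedback_loop(
    model_predict=lambda clip: "normal",
    training_data=[],
    todays_clips=["clip_a", "clip_b"],
    verify=lambda clip, pred: "anomaly" if clip == "clip_a" else pred,
)
print(updated)  # [('clip_a', 'anomaly')]
```

Retraining on the growing corrected set is what closes the loop and makes the deployed model more accurate over time.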
What’s next
We’ve created this special technology and workflow, and it has allowed us to help manufacturers improve operations while also continuously improving our own technology. As we continue to improve and hone our methods, we’re beginning to experiment with self-serve AI. One day soon, most of the process will be available at the click of a button.
At Drishti, we believe that the future of AI should be human focused, and to that end, we were excited to share these developments and look forward to the future of AI.
For more on humans and machines, visit our blog.