Keywords: edge AI, onboard computer vision, efficient computer vision, novel deep neural networks
Summary: Battlefield success is critically tied to the quality of the available common operating picture, where quality is measured by the timeliness, detail, and accuracy of the picture. The common operating picture consists of the current battlefield understanding held by command, derived from a variety of intelligence sources. We propose a system to automatically generate localized, real-time battlefield mapping and movement tracking. Our system uses currently available on-turret rotating cameras and computer vision to detect, recognize, and track the battlefield environment.

Computer vision applications have seen recent successes in the automated detection and recognition of visual patterns. These applications span an extremely wide range of industries, from cancer cell detection in medical imaging to inventory management, manufacturing, security, and infrastructure maintenance. All of these applications share a common pattern: defining a list of known objects, learning the most identifiable characteristics of those objects, and recognizing those objects in new scenarios. The differentiation among these applications lies in the definitions of those objects. For cancer detection, the learned objects are the cell types under consideration. For inventory management, the learned objects are the items tracked in the inventory process. We propose to extend this current work toward battlefield situational awareness by augmenting the proven success of recent visual recognition methods with higher-level reasoning in post-process analyses. The two primary improvements we propose pertain to locally persistent object identities and an aggregated battlefield state. One current shortcoming of visual recognition techniques is their limited ability to distinguish individual objects within a group of similar objects. This limits the ability to track the individual movements of a group, and therefore limits detailed scene understanding.
Our proposed system implements state-of-the-art object description generation for the re-detection task. After an object is detected, the collected object imagery is compressed into an object description and added to a roster of identified objects. As the scene progresses through time, future object detections are compared against and associated with the roster of identified objects, generating a map of individual object movements. Object description generation and re-association are far from novel concepts; our advantage lies in combining few-shot learning and multi-modal Bayesian estimation within this process. These additions primarily improve the flexibility of the system.

Traditionally, deep neural networks (DNNs) are trained on a curated dataset. Few-shot learning aims to reduce the training dataset requirements by adapting prior knowledge to the new scenario. For example, suppose we have trained a DNN for recognition in the sunny cornfields of Ohio but want to apply it to a scenario in Arizona. One option would be to collect and process a comprehensive dataset specifically for Arizona; with few-shot learning, we instead adapt the Ohio-trained network using only a few example images from Arizona. In summary, few-shot learning improves flexibility in new scenarios.
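The roster-based re-detection loop described above might be sketched as follows. This is a minimal illustration, not the proposed implementation: `embed` is a hypothetical stand-in for the DNN that compresses object imagery into a descriptor, and the similarity threshold is an assumed parameter.

```python
import numpy as np

def embed(image_patch):
    """Hypothetical descriptor generator. In the proposed system a DNN
    would compress the detection crop into a compact description vector;
    here we simply L2-normalize the input as a stand-in."""
    return image_patch / (np.linalg.norm(image_patch) + 1e-8)

class ObjectRoster:
    """Roster of identified objects. New detections are matched to prior
    identities by descriptor similarity; unmatched detections register
    a new identity. The threshold value is an assumption."""

    def __init__(self, match_threshold=0.8):
        self.descriptors = []   # one descriptor per known identity
        self.tracks = []        # movement history per identity
        self.threshold = match_threshold

    def associate(self, detection_patch, position):
        """Return the identity index for a new detection, updating
        that identity's movement track."""
        desc = embed(detection_patch)
        if self.descriptors:
            sims = [float(d @ desc) for d in self.descriptors]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                # Re-detection of a known object: extend its track.
                self.tracks[best].append(position)
                return best
        # No sufficiently similar identity: add a new roster entry.
        self.descriptors.append(desc)
        self.tracks.append([position])
        return len(self.descriptors) - 1
```

Accumulating the per-identity tracks over time yields the map of individual object movements; in the full system, the greedy best-match step would be replaced by the multi-modal Bayesian association.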
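The few-shot adaptation step can also be sketched. The example below uses a prototypical-network-style approach (class prototypes built from a handful of labeled examples under a frozen embedding); this is one common few-shot technique, used here for illustration rather than the proposal's specific method, and `pretrained_embed` is a stand-in for a backbone trained on the source domain.

```python
import numpy as np

def pretrained_embed(x):
    """Stand-in for a frozen backbone trained on the source domain
    (e.g. the Ohio imagery); returns an L2-normalized descriptor."""
    return x / (np.linalg.norm(x) + 1e-8)

def adapt_few_shot(support_images, support_labels):
    """Adapt to a new domain (e.g. Arizona) from a few labeled
    examples by averaging their embeddings into class prototypes."""
    prototypes = {}
    for label in set(support_labels):
        embs = [pretrained_embed(img)
                for img, lbl in zip(support_images, support_labels)
                if lbl == label]
        prototypes[label] = np.mean(embs, axis=0)
    return prototypes

def classify(query_image, prototypes):
    """Label a new detection by its nearest class prototype."""
    q = pretrained_embed(query_image)
    return max(prototypes, key=lambda label: float(prototypes[label] @ q))
```

The appeal of this style of adaptation is that only a few support examples per class are needed in the new domain, rather than a comprehensive retraining dataset.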