Automated Scene Understanding for Battlefield Awareness
Sensors & Electronics Technology
Autonomy, Data Fusion, Deep Learning, Image Processing, Information Sensing
The state of the art in digital video information extraction and exploitation has advanced rapidly over the past decade, driven largely by commercial industry as vendors seek to understand the vast amounts of video and image data traversing the internet. For these vendors, understanding the content and context of this data is the key to delivering targeted advertising and new products such as self-driving cars. That understanding has been enabled by the simultaneous growth of deep learning techniques and of the large annotated data sets made widely available by social media.

While NATO partners have collaborated on specific portions of the information extraction problem, such as human detection and activity recognition, the general task of converting sensor data into actionable information remains unsolved. The successes of industry have not yet been leveraged to address the military problem because militarily relevant data needed for training models does not exist in sufficient quantity and is typically not available, or of interest, to industry leaders. To transition these algorithms to military systems, representative data sets will need to be developed using multi-modal sensors and scenarios and targets that are relevant to the military. Furthermore, commercially available algorithms should be studied, and joint algorithm evaluations performed, to better understand which approaches best address NATO partner requirements and which areas are most in need of additional research. Data collection, standardization and annotation efforts would all benefit from joint NATO cooperation, as would assessment and evaluation of the vibrant commercial industry in video content exploitation.
- Develop a common, multi-modal, annotated set of military-relevant sensor data and metadata consisting of a combination of existing data contributed by partner nations, simulated data and data from a joint data collection.
- Develop a common set of data labels and annotation methodology defining objects and activities of interest.
- Develop a common set of evaluation procedures and metrics for comparing algorithm performance in the areas of target detection, segmentation, tracking, classification and activity recognition.
- Compare the performance of artificial intelligence approaches to object detection, object segmentation and activity recognition in military scenarios and operating conditions (noise, low contrast, compression, degraded visual environments, etc.) where representative training data is currently limited. Approaches to be considered include transfer learning (domain adaptation), one-shot and few-shot learning, and simulated/augmented training data.
- Compare strategies for multi-view and multi-modal data exploitation leveraging artificial intelligence techniques.
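As a concrete illustration of the kind of common evaluation metric the objectives above call for, the sketch below scores a detector's output against ground truth using intersection-over-union (IoU) and greedy one-to-one matching. It is a minimal illustration only: the box format (x1, y1, x2, y2), the 0.5 threshold and the function names are assumptions for this sketch, not part of any agreed evaluation procedure.

```python
# Illustrative sketch of IoU-based detection scoring. Box format (x1, y1, x2, y2)
# and the 0.5 IoU threshold are assumptions, not an agreed NATO standard.

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedily match each prediction to at most one ground-truth box."""
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            score = iou(pred, gt)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

In practice, community benchmarks extend this idea by sweeping confidence scores and IoU thresholds to produce average precision, which is one candidate for a shared metric across participants.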
Deliverables will be:
- Common set of labels and annotation methodology to be used by all participants in compilation of data sets.
- Annotated data set compiled using the proposed labels and annotation methodology.
- Common set of evaluation metrics and evaluation procedures.
- Final report with summary of compiled data set, known gaps remaining in data set, algorithm evaluation results and recommendations on research strategies going forward.
- Cooperative Demonstration of Technology (CDT) as part of this RTG or a follow-on.
- Algorithms for: automated/aided detection, classification, segmentation and tracking of objects of interest; activity recognition; and situational awareness / understanding, with a focus on EO/IR image processing.
- An understanding of how modern algorithms, such as deep learning, perform at these tasks in demanding environments, including but not limited to cluttered urban environments, considering both tactical and Intelligence, Surveillance and Reconnaissance (ISR) missions from both air and ground platforms.
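Evaluating algorithms under the demanding conditions listed above is often approximated by synthetically degrading clean test imagery. The pure-Python sketch below shows two such perturbations, additive Gaussian noise and contrast reduction, applied to a grayscale image represented as a 2-D list; the parameter values and function names are illustrative assumptions, not a prescribed degradation model.

```python
# Illustrative sketch: synthetic degradations for robustness evaluation.
# Images are 2-D lists of grayscale values in [0, 255]; sigma, factor and
# pivot values are placeholders chosen for illustration only.
import random

def add_gaussian_noise(image, sigma=10.0, seed=None):
    """Add zero-mean Gaussian noise, clamping results to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]

def reduce_contrast(image, factor=0.5, pivot=128.0):
    """Scale each pixel's deviation from a pivot grey level toward the pivot."""
    return [[pivot + factor * (px - pivot) for px in row] for row in image]
```

Running the same detector on clean and degraded copies of a test set, and reporting the resulting drop in the agreed metrics, is one simple way joint evaluations could quantify robustness to these operating conditions.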