
Activity title

Content-Based Multi-media Analytics (CBMA)

Activity Reference

IST-144 (AI2S)

Panel

Information Systems Technology

Keywords

Analytics, decision support, deep learning, situational awareness

Background

Content-based multi-media information retrieval and analytics, or Content-Based Multi-media Analytics (CBMA), allows military experts to exploit data from multiple sources rapidly for sense making, decision support and knowledge generation. Elements of CBMA include contextual understanding of complex events through computational and human processing techniques; event prediction through the automated extraction of spatio-temporal features, hidden clusters, network structures and resource flows; and the use of machine learning and natural language processing for automated translation, parsing, information extraction and summarization of unstructured and semi-structured data from multiple streams. These complex analyses cannot be done in isolation: NATO and coalition military leaders, commanders and intelligence analysts need interoperable tools that cross-cue knowledge obtained from one method to generate taskings in another. This requires building the cross-cued solution from advances in multi-media data analytics (e.g. text, image and audio). The results will significantly improve NATO's ability to generate knowledge from extremely large stores of text, imagery and video, speeding situational awareness and decision making.

Objectives

The main objective is to take forward research into the defence and security gaps identified by the Exploratory Team IST-ET-086. That team reviewed and reported on the state of research and development of theoretical and algorithmic tools supporting joint exploitation of multi-media data sources. A Research Task Group is proposed to address these gaps, starting with a study of three multi-media types: image, video and text. A common framework will be developed to allow sharing of data, algorithms, and machine learning tool outputs for detecting and classifying events. This investigation will include identifying real-world scenarios and datasets in which two or more multi-media types co-occur, in order to design features that best lead to efficient extraction of actionable information.

Technical enablers that will be exploited include the following. First, in video analytics, automation technologies have so far focused on providing a foundation for activity identification, while our ability to retrieve video based on what is inside a clip remains extremely limited. The ability to automatically identify and mark up activities would greatly enhance a decision maker's ability to take full advantage of the real strength of video: its recording of change over time. Such developments could also enable high-performance, frame-rate computation and analysis of ever-increasing volumes of streamed video data. Second, a major improvement in video analytics is expected once contextual information can be successfully incorporated, so that the algorithm knows when and where to focus on a subset of the scene and actors. Predictions would become more accurate because much of the scene can be ignored, letting the algorithm concentrate on a few candidate persons and/or parts of the scene. Third, Deep Learning (DL) will be considered for applications across image processing, video processing, and text analytics.

DL is a dominant approach to pattern recognition and understanding; it uses multi-layered neural networks, such as convolutional neural networks, to derive the underlying meaning of a data source, whether an image, text or audio stream. DL shows great promise for advancing text and video analytics through its ability to derive higher-level understanding. Fourth, research is needed to advance DL techniques by improving understanding of the theoretical and mathematical foundations of current algorithms, and by developing new algorithms that better exploit advances in high performance computing. Technical areas include hardware for DL (including neuromorphic chips and low-power, high-performance Field Programmable Gate Arrays (FPGAs)), new optimisation methods, training with fewer examples, and distributed learning across systems. Finally, the research will consider, at a conceptual level, the engineering and integration of these developments in order to advance the architectural design of a demonstration framework that can show the progress and performance of the research. As with any high-risk collaborative research, the research plans and their implementation will be reviewed and modified to pursue the areas that show the most promise.

Topics

1. Capture and indexing of motion imagery: further investigate intelligent capture and initial processing by sensor systems, including initial video indexing and key-frame information recorded in audio and metadata entries.
2. Exploit imagery indexing through hierarchical methods using semantic identifiers and human evaluation of exploitation results.
3. Explore motion-based index generation to enable rapid and robust retrieval of context. Types of motion include apparent background motion of static structures caused by sensor flight, background motion generated by normal patterns such as traffic flow, and events such as an explosion and its after-effects at a location.
4. Expand the Deep Learning approach for semantic video analytics through a semantic hierarchy of full motion video. The long-term impact is the rapid provision of optimal semantic information to users while adapting to dynamically varying computational resources.
5. Explore the mechanisms by which text analysis results can be used to drive and exploit video and imagery indexing and retrieval.
6. Explore frameworks for optimising multi-media analytics via systems engineering and architectural design concepts.
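As a rough illustration of topics 1 and 3, the sketch below builds a minimal motion-based key-frame index by frame differencing. It is not the activity's method. It assumes grayscale frames given as nested lists of pixel intensities, and the function names (`mean_abs_diff`, `motion_keyframe_index`) and the threshold value are hypothetical. Plain differencing also does not compensate for apparent motion caused by sensor flight, which a real system would need to handle.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale frame as rows of 0-255 intensities.
Frame = List[List[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def motion_keyframe_index(frames: List[Frame], threshold: float) -> List[Tuple[int, float]]:
    """Return (frame_number, motion_score) for each selected key frame.

    The first frame is always a key frame. A later frame becomes a key frame
    when it differs from the most recent key frame by more than `threshold`
    (an assumed, hand-tuned parameter).
    """
    if not frames:
        return []
    index = [(0, 0.0)]
    last_key = frames[0]
    for i in range(1, len(frames)):
        score = mean_abs_diff(last_key, frames[i])
        if score > threshold:
            index.append((i, score))
            last_key = frames[i]
    return index


def uniform_frame(value: int, size: int = 4) -> Frame:
    """Hypothetical helper: a size x size frame filled with one intensity."""
    return [[value] * size for _ in range(size)]


# Usage: small changes (10 -> 12, 80 -> 82) are ignored, large ones are indexed.
frames = [uniform_frame(v) for v in (10, 12, 80, 82, 200)]
print(motion_keyframe_index(frames, threshold=20.0))
# prints [(0, 0.0), (2, 70.0), (4, 120.0)]
```

Comparing each frame against the last key frame rather than the previous frame means slow, gradual drift eventually triggers a new key frame instead of being missed.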
