
Activity title: Exploiting Reinforcement Learning to Achieve Decision Advantage
Activity Reference: SAS-181
Panel: SAS
Security Classification: PUBLIC RELEASE
Status: Active
Activity type: RTG
Start date: 2023-02-14T00:00:00Z
Actual End date: 2026-02-14T00:00:00Z
Keywords: Artificial intelligence, Decision policy, Decision support, Modelling & Simulation, Optimization, Reinforcement learning, Sequential Decision Making, Strategy
Background: Many decisions are not made in isolation. Rather, they follow a cyclical pattern: decisions are made; new information that was previously uncertain is observed; further decisions are made using this new information; more new information arrives; and so on. This process is known as sequential decision making under uncertainty, and if the decisions do not account for the relationships between them, the outcomes achieved may be neither efficient nor effective. Dynamic Programming provides an elegant framework for finding optimal decision policies for these types of problems, but its usefulness in many real-world applications is limited by the curses of dimensionality and modelling. In recent years, the field of Reinforcement Learning (RL), known as Approximate Dynamic Programming (ADP) in operations research, which aims to overcome these curses and falls under the umbrella of Artificial Intelligence (AI), has produced highly effective decision policies. Although there are many examples, perhaps the best known is DeepMind's 2018 report of AlphaZero achieving superhuman performance in the games of chess, shogi, and Go (Silver et al. (2018), A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science, 362, 1140-1144).
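The decide-observe-decide-again loop described above can be illustrated with a minimal tabular Q-learning sketch. This is purely an illustrative toy, not part of the RTG's scope: a hypothetical five-state corridor in which an agent starting at state 0 earns a reward of 1 for reaching state 4, and all names, states, and parameter values below are invented for the example.

```python
import random

# Toy corridor: states 0..4, reward 1.0 on reaching the goal state 4.
# Actions: 0 = move left, 1 = move right.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Deterministic environment dynamics for the toy corridor."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table: q[state][action]
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s2, r = step(s, a)
            # Q-learning update: bootstrap from the best action in the next state.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
# Greedy policy extracted from the learned values; for the non-terminal
# states this typically learns to always move right.
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(N_STATES)]
```

Each pass through the inner loop is one turn of the cycle from the paragraph above: act, observe the new state and reward, then revise the decision policy via the Q-value update. Dynamic Programming would instead sweep over every state exactly; RL / ADP methods like this one learn from sampled trajectories, which is what lets them scale past the curse of dimensionality.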
To date, the overwhelming majority of military-focused research on applying RL / ADP has centred on autonomous agents, such as those employed in unmanned aerial vehicles, communication networks, and training simulators. However, recent reviews have highlighted that:
(i) there is “an increasing role for AI techniques in high-level command and control, strategic planning and decision making” (Moy et al. (2020), Recent Advances in Artificial Intelligence and their impact on Defence, DST-Group-TR-3716; p. 18);
(ii) while the majority of ADP-based decision support applications have tended to focus on force employment, a wide variety of potential applications within force development and generation exist (Rempel and Cai (2021), A review of approximate dynamic programming applications within military operations research, Operations Research Perspectives, 8, 100204); and
(iii) there is a desire by several nations, such as the United Kingdom, “to apply artificial intelligence techniques to military wargames to allow them to deliver outputs more quickly, cheaply, and/or of higher quality” (Goodman et al. (2020), AI and Wargaming, https://arxiv.org/abs/2009.08922; p. 4).
Collectively these reviews suggest that RL / ADP has the potential to provide a significant advantage to decision makers across a broad spectrum of defence and security domains when used to support decision making. The purpose of this Research Task Group (RTG) is to investigate how RL / ADP may be best applied to provide highly effective and reliable support to defence and security decision making within NATO, its member Nations, and its partners.
Objectives: The objective of this RTG is to investigate how RL / ADP may be best applied to support defence and security decision making within NATO, its member Nations, and its partners. To achieve this objective, the RTG will undertake four tasks:
(i) classify, compare, and contrast existing national efforts and those in the open literature that focus on using RL / ADP in the defence and security domain to support decision making across a range of factors, such as:
a. problem characteristics (e.g., single- vs multi-domain, single- vs multi-objective, stationary vs non-stationary environment, symmetric vs non-symmetric, reward type, etc.)
b. approaches to modelling, uncertainty quantification, decision policies, algorithms, benchmarking; and
c. human-machine interactions, trust and bias, explainability, contextualization, computing resources, and data.
(ii) based on the results of the review:
a. identify trends and gaps in the literature;
b. map the characteristics of the problems studied, the abstractions selected, the RL / ADP methods employed, and the degree of decision advantage achieved into a framework intended to guide those designing and developing future RL / ADP applications in support of defence and security decision making;
(iii) based on (i) and (ii), identify lessons learned and best practices with respect to both the art and science of using RL / ADP within the defence and security domain; and
(iv) develop recommendations, such as: designing a framework to identify the readiness, impact, and risks of RL / ADP applications; potential novel RL / ADP applications in support of decision making; collaboration opportunities; and follow-on STO activities.
Topics: With a particular focus on supporting defence and security decision making, the scientific topics to be covered are:
RL / ADP – informed by existing literature reviews (e.g., Rempel and Cai, 2021), develop a structured approach to classify, contrast, and compare RL / ADP applications;
Frameworks – develop a novel framework whose aim is to guide those designing and developing future RL / ADP applications; and
Lessons learned – identify lessons learned and best practices with the aim to accelerate the adoption of RL / ADP.
Created at 01/12/2022 18:00 by System Account
Last modified at 16/05/2024 10:00 by System Account