STRATEGIC AI
Exploring the advancements in strategic AI and how large language models fit into the bigger picture
Prologue
May 11, 1997, New York City.
It was a beautiful spring day in New York City. The skies were clear, and temperatures were climbing toward 20 degrees Celsius. The Yankees prepared to play the Kansas City Royals at Yankee Stadium, and the Rangers were facing off against the Devils at Madison Square Garden. Nothing seemed out of the ordinary, yet the people gathering at the Equitable Center in Midtown Manhattan were about to experience something truly unique: the first time a computer would beat a reigning world chess champion in a match under standard tournament conditions.
Representing humans was Garry Kasparov, widely recognized as the world’s top chess player at the time. And representing the machines, Deep Blue — a chess computer developed by IBM. Going into the sixth and final game of the match, both players had 2.5 points; the winner would be decided that day.
Kasparov played black, made an early error, and faced a strong, aggressive attack from Deep Blue. After just 19 moves it was all over. Kasparov, demoralized and under pressure, resigned, believing his position was untenable. It was a symbolic moment, hailed by many as one of the most important in the history of man versus machine. This landmark event marked a turning point in AI development, highlighting the potential — and challenges — of strategic AI.
Introduction
Inspired by the recent advancements in generative AI — and my own experiments with large language models and their strategic capabilities — I have increasingly been thinking about strategic AI. How have we tried to approach this topic in the past? What are the challenges and what remains to be solved before we have a more generalist strategic AI agent?
As data scientists, we are increasingly implementing AI solutions for our clients and employers. For society at large, the ever-increasing interaction with AI makes it critical to understand the development of AI and especially strategic AI. Once we have autonomous agents with the ability to maneuver well in strategic contexts, this will have profound implications for everyone.
But what exactly do we mean when we say strategic AI? At its core, strategic AI involves machines making decisions that not only consider potential actions, but also anticipate and influence the responses of others. It’s about maximizing expected outcomes in complex, uncertain environments.
In this article, we’ll define strategic AI and explore how it has developed in the years since IBM’s Deep Blue beat Kasparov in 1997. We will try to understand the general architecture of some of the models, and also examine how large language models (LLMs) fit into the picture. By understanding these trends and developments, we can better prepare for a world where autonomous AI agents are integrated into society.
Defining Strategic AI
A deeper discussion around strategic AI starts with a well-formulated definition of the topic.
When we consider strategy in a commercial setting, we tend to associate it with topics like long-term thinking, resource allocation and optimization, a holistic understanding of interdependencies in an organization, alignment of decisions with the purpose and mission of the company, and so on. While these topics are useful to consider, I often prefer a more game-theoretic definition of strategy when dealing with AI and autonomous agents. In this case we define being strategic as:
Choosing a course of action that maximizes your expected payoff by considering not just your own potential actions but also how others will respond to those actions and how your decisions impact the overall dynamics of the environment.
The critical part of this definition is that strategic choices are choices that do not occur in a vacuum, but rather in the context of other participants, be they humans, organizations or other AIs. These other entities can have similar or conflicting goals of their own and may also try to act strategically to further their own interests.
Also, strategic choices always seek to maximize expected payoffs, whether those payoffs are measured in money, utility, or other measures of value. If we wanted to incorporate the more traditional “commercial” topics related to strategy, we could imagine that we want to maximize the value of a company 10 years from now. In this case, to formulate a good strategy we would need to take a long-term view, and might also consider the purpose and mission of the company to ensure alignment with the strategy. However, pursuing these exercises is merely a consequence of what it actually means to act strategically.
The game-theoretic view of strategy captures the essence of strategic decision-making and consequently lets us clearly define what we mean by strategic AI. From the definition we see that if an AI system or agent is to act strategically, it needs to have a few core capabilities. Specifically, it will need to be able to:
- Model other agents (using predictive techniques or probabilistic reasoning), whether those agents are humans, other AIs, or organizations.
- Optimize actions based on expected utility.
- Adapt dynamically as it gathers new information about other agents’ strategies.
There is currently no well-known or well-documented system capable of all of these actions autonomously in the real world. However, given the recent advances in AI systems and the rapid rise of LLMs, that might be about to change!
Other Important Concepts from Game Theory
Before we proceed with a further discussion of strategic AI, it is useful to review some concepts and ideas from game theory. Much of the work done on strategic AI has its foundation in game-theoretic concepts, and theorems from game theory can establish properties that make some games and situations easier to deal with than others. Game theory also helps to highlight its own shortcomings when it comes to real-world situations, and where we might be better off looking in other directions for inspiration.
What is a Game?
We define a game as a mathematical model comprising three key components:
- Players: The individuals or entities making decisions.
- Strategies: The possible actions or plans each player can adopt.
- Payoffs: The rewards or outcomes each player receives based on the chosen strategies.
This formal structure allows for the systematic study of strategic interactions and decision-making processes.
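To make the definition concrete, here is a minimal sketch (in Python, with a made-up NormalFormGame class) of how players, strategies and payoffs can be represented, using the classic Prisoner’s Dilemma as the example instance.

```python
from dataclasses import dataclass
import itertools

# A minimal sketch of a two-player normal-form game: players, strategies,
# and a payoff table keyed by the strategy profile (one strategy per player).
@dataclass
class NormalFormGame:
    players: list
    strategies: dict                 # player -> available strategies
    payoffs: dict                    # (s1, s2) -> (payoff to player 1, payoff to player 2)

# Classic Prisoner's Dilemma as an example instance.
pd = NormalFormGame(
    players=["A", "B"],
    strategies={"A": ["cooperate", "defect"], "B": ["cooperate", "defect"]},
    payoffs={
        ("cooperate", "cooperate"): (-1, -1),
        ("cooperate", "defect"):    (-3,  0),
        ("defect",    "cooperate"): ( 0, -3),
        ("defect",    "defect"):    (-2, -2),
    },
)

# Enumerate all strategy profiles and their payoffs.
for profile in itertools.product(pd.strategies["A"], pd.strategies["B"]):
    print(profile, "->", pd.payoffs[profile])
```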
Finite vs Infinite Games
When speaking of games, it also makes sense to look at the distinction between finite and infinite games.
Finite games have a fixed set of players, defined rules, and a clear endpoint. The objective is to win, and examples include chess, go, checkers, and most traditional board games.
Infinite games on the other hand have no predetermined endpoint, and the rules can evolve over time. The objective is not to win but to continue playing. Real-world scenarios like business competition or societal evolution can be viewed as infinite games. The Cold War can be viewed as an example of an infinite game. It was a prolonged geopolitical struggle between the United States and its allies (the West) and the Soviet Union and its allies (the East). The conflict had no fixed endpoint, and the strategies and “rules” evolved over time.
Subgames
Sometimes we might be able to find smaller games within a larger game context. Mathematically, subgames are self-contained games in their own right, and they need to satisfy a few criteria:
- A subgame starts at a point where the player knows exactly where they are in the game.
- It includes every possible action and outcome that could follow from that point.
- It encompasses all the players’ knowledge and uncertainties relevant to those actions.
We can visualize a subgame if we imagine a large tree representing an entire game. A subgame is like selecting a branch of this tree starting from a certain point (node) and including everything that extends from it, while also ensuring that any uncertainties are fully represented within this branch.
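As a rough illustration of this tree picture, the toy sketch below (hypothetical Node and subgame names) builds a tiny two-move game tree and pulls out the branch rooted at one of its nodes; a real implementation would also need to check that no unresolved uncertainty cuts across the branch.

```python
from dataclasses import dataclass, field
from typing import Optional

# A toy game tree: each node records whose turn it is, the actions available,
# and (at terminal nodes) a payoff.
@dataclass
class Node:
    player: str
    children: dict = field(default_factory=dict)   # action -> Node
    payoff: Optional[float] = None                  # only set at terminal nodes

def subgame(root: Node, path: list) -> Node:
    """Follow a sequence of actions from the root and return the node reached.

    The subtree hanging off that node is a candidate subgame: a self-contained
    game in its own right, provided the player there knows exactly where they
    are in the larger game."""
    node = root
    for action in path:
        node = node.children[action]
    return node

# Tiny example: a two-move game; the branch after "escalate" is its own subgame.
leaf = lambda v: Node(player="-", payoff=v)
tree = Node("A", {"negotiate": Node("B", {"accept": leaf(1), "reject": leaf(-1)}),
                  "escalate":  Node("B", {"back_down": leaf(2), "retaliate": leaf(-5)})})
print(subgame(tree, ["escalate"]).children.keys())   # dict_keys(['back_down', 'retaliate'])
```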
The core idea behind a subgame is what makes it useful for our discussion of strategic AI. Some infinite games between players may be far too complex to model directly, but if we look at smaller games within the larger game, we can have more success applying game-theoretic analysis.
Coming back to our example with the Cold War as an infinite game, we can recognize several subgames within that context. Some examples include:
The Cuban Missile Crisis (1962):
- Players: The United States and the Soviet Union.
- Strategies: The U.S. considered options ranging from diplomatic negotiations to military invasion, while the Soviet Union had to decide whether to remove the missiles or escalate the confrontation.
- Payoffs: Avoiding nuclear war, maintaining global image, and strategic military positioning.
The Berlin Blockade and Airlift (1948–1949):
- Players: The Western Allies and the Soviet Union.
- Strategies: The Soviets blocked Berlin to push the Allies out, while the Allies had to decide between abandoning the city or supplying it via air.
- Payoffs: Control over Berlin, demonstrating political resolve, and influencing European alignment.
Although very difficult and complex in their own right, both “subgames” are easier to analyze and respond to than the Cold War as a whole. They had a defined set of players, a limited set of strategies and payoffs, and a clearer time frame, which made them more amenable to game-theoretic analysis.
In the context of strategic AI, analyzing these sub-games is crucial for developing intelligent systems capable of making optimal decisions in complex, dynamic environments.
Two Player Games
A two-player game is simply a game between two players. This could, for example, be a game between two chess players or, coming back to our Cold War example, the West versus the East. Having only two players simplifies the analysis but still captures essential competitive or cooperative dynamics. Many of the results in game theory are based on two-player games.
Zero-Sum Games
Zero-sum games are a subset of games where one player’s gain is another player’s loss. The total payoff remains constant, and the players are in direct competition.
Nash Equilibrium and Optimal Actions
A Nash Equilibrium (NE) is a set of strategies where no player can gain additional benefit by unilaterally changing their own strategy, assuming the other players keep theirs unchanged. In this state, each player’s strategy is the best response to the strategies of the others, leading to a stable outcome where no player has an incentive to deviate.
For example, in the game Rock-Paper-Scissors (RPS), the NE is the state where all players play rock, paper, and scissors randomly, each with equal probability. If you as a player choose to play the NE strategy, you ensure that no other player can exploit your play, and in a two-player zero-sum game it can be shown that you will not lose in expectation; the worst you can do is break even.
However, playing a NE strategy might not always be the optimal strategy, especially if your opponent is playing in a predictably sub-optimal way. Consider a scenario with two players, A and B. If player B starts playing paper more often, player A could recognize this and increase its frequency of playing scissors. But this deviation by A could in turn be exploited by B, who could switch to playing more rock.
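A small numerical sketch can make this concrete. Using the row player’s payoff matrix for RPS, the uniform NE mix yields zero expected payoff against every pure strategy (so it cannot be exploited), while a predictable, paper-heavy opponent can be exploited by shifting towards scissors.

```python
import numpy as np

# Row player's payoff matrix for Rock-Paper-Scissors
# (rows: our move, cols: opponent's move; 1 = win, 0 = tie, -1 = loss).
A = np.array([
    [ 0, -1,  1],   # rock     vs rock, paper, scissors
    [ 1,  0, -1],   # paper
    [-1,  1,  0],   # scissors
])

uniform = np.array([1/3, 1/3, 1/3])        # the Nash equilibrium mix
paper_heavy = np.array([0.2, 0.6, 0.2])    # a predictable, paper-heavy opponent

# Expected payoff of each of our pure strategies against a given opponent mix.
print(A @ uniform)       # [0, 0, 0] -> nothing beats the NE mix in expectation
print(A @ paper_heavy)   # scissors has positive expectation against the biased player
```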
Key Takeaways Regarding Strategic AI
Reviewing the game-theoretic concepts, the idea of a subgame seems especially useful for strategic AI. The ability to find smaller, easier-to-analyze games within a larger context makes it easier to apply already-known solutions and solvers.
For example, say you are working on developing your career, something which could be classified as an infinite game and is difficult to “solve”, but suddenly you get the opportunity to negotiate a new contract. This negotiation constitutes a subgame within your career and would be much more approachable for a strategic AI using game-theoretic concepts.
Indeed, humans have been creating subgames within our lives for thousands of years. About 1,500 years ago in India, we created the origins of what is now known as chess. Chess turned out to be quite a challenge for AI to master, but it also allowed us to develop more mature tools and techniques that could be applied to even more complicated and difficult strategic situations.
A Short History of Strategic AI in Games
Games have provided an amazing proving ground for developing strategic AI. The closed nature of games makes it easier to train models and develop solution techniques than in open ended systems. Games are clearly defined; the players are known and so are the payoffs. One of the biggest and earliest milestones was Deep Blue, the machine that beat the world champion in chess.
Early Milestones: Deep Blue
Deep Blue was a chess-playing supercomputer developed by IBM in the 1990s. As stated in the prologue, it made history in May 1997 by defeating the reigning world chess champion, Garry Kasparov, in a six-game match. Deep Blue utilized specialized hardware and algorithms capable of evaluating 200 million chess positions per second. It combined brute-force search techniques with heuristic evaluation functions, enabling it to search deeper into potential move sequences than any previous system. What made Deep Blue special was its ability to process vast numbers of positions quickly, effectively handling the combinatorial complexity of chess and marking a significant milestone in artificial intelligence.
However, as Garry Kasparov notes in his interview with Lex Fridman¹, Deep Blue was more of a brute-force machine than anything else, so it’s perhaps hard to qualify it as any type of intelligence. The core of the search is basically just trial and error. And speaking of errors, it makes significantly fewer errors than humans, and according to Kasparov this is one of the features that made it hard to beat.
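Deep Blue’s actual engine combined custom hardware with many hand-tuned refinements, but the core idea of depth-limited search with a heuristic evaluation can be sketched generically. The snippet below is a minimal minimax search with alpha-beta pruning over a hypothetical game interface; it is not Deep Blue’s implementation.

```python
import math

def minimax(state, depth, alpha, beta, maximizing, game):
    """Depth-limited minimax with alpha-beta pruning.

    `game` is any object providing is_terminal(state), evaluate(state)
    (a heuristic score from the maximizing player's point of view),
    legal_moves(state) and apply(state, move) -- a hypothetical interface."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    if maximizing:
        best = -math.inf
        for move in game.legal_moves(state):
            best = max(best, minimax(game.apply(state, move), depth - 1,
                                     alpha, beta, False, game))
            alpha = max(alpha, best)
            if beta <= alpha:          # prune: the opponent will never allow this line
                break
        return best
    else:
        best = math.inf
        for move in game.legal_moves(state):
            best = min(best, minimax(game.apply(state, move), depth - 1,
                                     alpha, beta, True, game))
            beta = min(beta, best)
            if beta <= alpha:
                break
        return best
```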
Advancements in Complex Games: AlphaGo
19 years after Deep Blue’s victory in chess, a team from Google’s DeepMind produced another model that would contribute to a special moment in the history of AI. In 2016, AlphaGo became the first AI model to defeat a world champion Go player, Lee Sedol.
Go is a very old board game with origins in Asia, known for its deep complexity and vast number of possible positions, far exceeding those in chess. AlphaGo combined deep neural networks with Monte Carlo tree search, allowing it to evaluate positions and plan moves effectively. The more time AlphaGo was given at inference, the better it performed.
The AI trained on a dataset of human expert games and improved further through self-play. What made AlphaGo special was its ability to handle the complexity of Go, utilizing advanced machine learning techniques to achieve superhuman performance in a domain previously thought to be resistant to AI mastery.
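AlphaGo’s search uses learned policy and value networks to guide Monte Carlo tree search; the sketch below shows only the bare MCTS loop (select, expand, simulate, backpropagate) with random rollouts in place of a neural network, over a hypothetical game interface, and tracks a single reward perspective for brevity.

```python
import math, random

class MCTSNode:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # move -> MCTSNode
        self.visits, self.value = 0, 0.0

def ucb(child, parent_visits, c=1.4):
    # Upper confidence bound: balance exploitation (mean value) and exploration.
    if child.visits == 0:
        return math.inf
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root_state, game, n_simulations=1000):
    root = MCTSNode(root_state)
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend via UCB while the node already has children.
        while node.children and not game.is_terminal(node.state):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node.visits))
        # 2. Expansion: add children for all legal moves, step into one of them.
        if not game.is_terminal(node.state):
            for move in game.legal_moves(node.state):
                node.children[move] = MCTSNode(game.apply(node.state, move), parent=node)
            node = random.choice(list(node.children.values()))
        # 3. Simulation: random rollout to a terminal state.
        state = node.state
        while not game.is_terminal(state):
            state = game.apply(state, random.choice(game.legal_moves(state)))
        reward = game.result(state)    # single-perspective reward, for simplicity
        # 4. Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Play the most visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)
```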
One could argue AlphaGo exhibits more intelligence than Deep Blue, given its exceptional ability to deeply evaluate board states and select moves. Move 37 from its 2016 game against Lee Sedol is a classic example. For those acquainted with Go, it was a shoulder hit on the fifth line that initially baffled commentators, including Lee Sedol himself. But as would later become clear, the move was a brilliant play and showcased how AlphaGo would explore strategies that human players might overlook.
Combining Chess and Go: AlphaZero
One year later, Google DeepMind made headlines again. This time, they took many of the learnings from AlphaGo and created AlphaZero, which was more of a general-purpose AI system that mastered chess, as well as Go and shogi. The researchers were able to build the AI solely through self-play and reinforcement learning without prior human knowledge or data. Unlike traditional chess engines that rely on handcrafted evaluation functions and extensive opening libraries, AlphaZero used deep neural networks and a novel algorithm combining Monte Carlo tree search with self-learning.
The system started with only the basic rules and learned optimal strategies by playing millions of games against itself. What made AlphaZero special was its ability to discover creative and efficient strategies, showcasing a new paradigm in AI that leverages self-learning over human-engineered knowledge.
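A rough sketch of what such a self-play loop can look like is given below. The game, net and search_policy objects are hypothetical placeholders, and the loop is only meant to show the overall shape: self-play games produce (state, search policy, outcome) targets that the network is then trained on.

```python
import random

def self_play_training(game, net, search_policy, n_iterations=10, games_per_iter=100):
    """Minimal sketch of an AlphaZero-style self-play loop (hypothetical interfaces).

    `search_policy(state, game, net)` should return a dict of move -> probability
    (e.g. visit counts from an MCTS guided by the network, as sketched above);
    `net.train(examples)` fits the network's policy and value heads to the targets."""
    for _ in range(n_iterations):
        examples = []
        for _ in range(games_per_iter):
            state, history = game.initial_state(), []
            while not game.is_terminal(state):
                pi = search_policy(state, game, net)       # improved move probabilities
                history.append((state, pi))
                moves, probs = zip(*pi.items())
                state = game.apply(state, random.choices(moves, weights=probs)[0])
            outcome = game.result(state)                   # final result of the game
            examples.extend((s, pi, outcome) for s, pi in history)
        net.train(examples)                                # learn from the self-play data
    return net
```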
Integrating Speed and Strategy: StarCraft II
Continuing its domination in the AI space, the Google DeepMind team changed its focus to a highly popular computer game, StarCraft II. In 2019 they developed an AI called AlphaStar² which was able to achieve Grandmaster level play and rank higher than 99.8% of human players on the competitive leaderboard.
StarCraft II is a real-time strategy game that provided several novel challenges for the team at DeepMind. The goal of the game is to conquer the opposing player or players by gathering resources, constructing buildings and amassing armies that can defeat the opponent. The main challenges in this game arise from the enormous action space that needs to be considered, the real-time decision making, partial observability due to fog of war, and the need for long-term strategic planning, as some games can last for hours.
By building on techniques developed for previous AIs, like reinforcement learning through self-play and deep neural networks, the team was able to build a unique agent. First, they trained a neural net using supervised learning on human play. Then, they used that to seed another algorithm that could play against itself in a multi-agent game framework. The DeepMind team created a virtual league where the agents could explore strategies against each other and where the dominant strategies would be rewarded. Ultimately, they combined the strategies from the league into a super strategy that could be effective against many different opponents and strategies. In their own words³:
The final AlphaStar agent consists of the components of the Nash distribution of the league — in other words, the most effective mixture of strategies that have been discovered — that run on a single desktop GPU.
Deep Dive into Pluribus and Poker
I love playing poker, and when I was living and studying in Trondheim, we used to have a weekly cash game which could get quite intense! One of the last milestones to be reached by strategic AI was in the game of poker, specifically in one of its most popular forms, 6-player no-limit Texas hold’em. In this game we use a regular 52-card deck, and play proceeds through the following stages:
- The Preflop: All players are given 2 cards (hole cards) which only they themselves know the value of.
- The Flop: 3 cards are drawn and laid face up so that all players can see them.
- The Turn: Another card is drawn and laid face up.
- The River: A final 5th card is drawn and laid face up.
The players can use the cards on the table and the two cards in their hand to assemble a 5-card poker hand. For each round of the game, the players take turns placing bets, and the game can end at any round if one player places a bet that no one else is willing to call.
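For readers who like to see the structure in code, here is a small sketch that deals a 6-player hand in this format; the card encoding and function name are made up for illustration.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]   # the 52-card deck

def deal_hand(n_players=6, seed=None):
    """Deal hole cards and the five community cards following the structure above."""
    deck = DECK[:]
    random.Random(seed).shuffle(deck)
    hole_cards = {f"player_{i}": [deck.pop(), deck.pop()] for i in range(n_players)}
    flop = [deck.pop() for _ in range(3)]   # three shared cards
    turn = deck.pop()                        # fourth shared card
    river = deck.pop()                       # fifth and final shared card
    return hole_cards, flop + [turn, river]

hands, board = deal_hand(seed=42)
print(hands["player_0"], board)
```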
Though reasonably simple to learn (one only needs to know the hierarchy of the various poker hands), this game proved very difficult for AI to solve, despite ongoing efforts spanning several decades.
There are multiple factors contributing to the difficulty of solving poker. Firstly, there is hidden information, because you don’t know which cards the other players hold. Secondly, there is the multiplayer setup, with each extra player increasing the number of possible interactions and strategies exponentially. Thirdly, the no-limit betting rules allow for a complex betting structure where a player can suddenly decide to bet their entire stack. Fourth, the combinations of hole cards, community cards, and betting sequences create an enormous game tree. On top of that, there is complexity from the stochastic nature of the cards, the potential for bluffing, and the need for opponent modelling!
It was only in 2019 that two researchers, Noam Brown and Tuomas Sandholm, finally cracked the code. In a paper published in Science, they described a novel poker AI — Pluribus — that managed to beat the best players in the world in 6-player no-limit Texas hold’em.⁴ They conducted two different experiments, each consisting of 10,000 poker hands, and both experiments clearly showed the dominance of Pluribus.
In the first experiment, Pluribus played against 5 human opponents, achieving an average win rate of 48 mbb/game with a standard deviation of 25 mbb/game. (mbb/game stands for milli-big-blinds per game: thousandths of a big blind won per hand played, or equivalently how many big blinds are won per 1,000 hands.) 48 mbb/game is considered a very high win rate, especially among elite poker players, and implies that Pluribus is stronger than its human opponents.
In the second experiment, the researchers had 5 versions of Pluribus play against 1 human. They set up the experiment so that 2 different humans would each play 5,000 hands against the 5 machines. Pluribus ended up beating the humans by an average of 32 mbb/game with a standard error of 15 mbb/game, again showing its strategic superiority.
The dominance of Pluribus is quite amazing, especially given all the complexities the researchers had to overcome. Brown and Sandholm came up with several smart strategies that helped Pluribus to become superhuman and computationally much more efficient than previous top poker AIs. Some of their techniques include:
- The use of two different algorithms for evaluating moves. They would first use a so-called “blueprint strategy”, created by having the program play against itself using a method called Monte Carlo counterfactual regret minimization (a minimal sketch of its regret-matching building block follows this list). This blueprint strategy would be used in the first round of betting, but in subsequent betting rounds, Pluribus conducts a real-time search to find a better, more granular strategy.
- To make its real-time search more computationally efficient, they used a depth-limited search. Rather than searching to the end of the game, the search would look only a couple of moves ahead, and beyond that point assume that each opponent plays one of four continuation strategies: the original blueprint strategy, a blueprint strategy biased towards folding, one biased towards calling, and one biased towards raising.
- They also used various abstraction techniques to reduce the number of possible game states. For example, a 9-high straight is fundamentally similar to an 8-high straight, so the two can be treated in a similar way.
- Pluribus would discretize the continuous betting space into a limited set of buckets, making it easier to consider and evaluate various betting sizes.
- In addition, Pluribus balances its strategy in such a way that, for any given hand it is playing, it also considers the other possible hands it could hold in that situation and evaluates how it would play those hands, so that its final play is balanced and thus harder to counter.
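Pluribus’s blueprint strategy is computed with Monte Carlo counterfactual regret minimization, whose building block is regret matching. The sketch below is not Pluribus’s code; it is a minimal regret-matching demo on Rock-Paper-Scissors showing how the players’ average strategies drift towards the equilibrium mix.

```python
import numpy as np

# Regret matching, the building block of counterfactual regret minimization (CFR).
# Two players repeatedly play Rock-Paper-Scissors; each tracks how much better
# every alternative action would have done (the regret) and plays actions with
# positive regret proportionally more often. The *average* strategy converges
# towards the Nash equilibrium (roughly uniform play in RPS).
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])   # row player's payoffs

def strategy_from(regrets):
    positive = np.maximum(regrets, 0)
    return positive / positive.sum() if positive.sum() > 0 else np.ones(3) / 3

regrets = [np.zeros(3), np.zeros(3)]
strategy_sums = [np.zeros(3), np.zeros(3)]
rng = np.random.default_rng(0)

for _ in range(50_000):
    strats = [strategy_from(r) for r in regrets]
    actions = [rng.choice(3, p=s) for s in strats]
    payoffs = [A[actions[0], actions[1]], -A[actions[0], actions[1]]]
    for p, opp in [(0, 1), (1, 0)]:
        # Payoff each alternative action would have earned against the opponent's move.
        counterfactual = A[:, actions[opp]] if p == 0 else -A[actions[opp], :]
        regrets[p] += counterfactual - payoffs[p]
        strategy_sums[p] += strats[p]

print(strategy_sums[0] / strategy_sums[0].sum())      # roughly [0.33, 0.33, 0.33]
```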
There are quite a few interesting observations to draw from Pluribus, but perhaps the most interesting is that it doesn’t vary its play against different opponents, but instead has developed a robust strategy that is effective against a wide variety of players. Since a lot of poker players think they have to adjust their play to various situations and people, Pluribus shows us that this is not needed and probably not even optimal, given how it beat all the humans it played against.
In our short foray into game theory, we noted that if you play the NE strategy in a two-player zero-sum game, you are guaranteed not to lose in expectation. However, for a multiplayer game like 6-player poker there is no such guarantee. Noam Brown speculates⁵ that it is perhaps the adversarial nature of a game like poker which still makes it suitable for an NE strategy. Conversely, in a game like Risk where players can cooperate more, pursuing a NE strategy is not guaranteed to work: if you are playing Risk with 6 people, there is nothing you can do if your 5 opponents decide to gang up on you and eliminate you.
Evaluating the Trend in Strategic AI
Summarizing the history of strategic AI in games, we see a clear trend emerging. The games are slowly but surely becoming closer to the real-world strategic situations that humans find themselves in on an everyday basis.
Firstly, we are moving from two-player to multiplayer settings, as seen in the shift from initial successes in two-player games to multiplayer games like 6-player poker. Secondly, we are seeing an increase in the mastery of games with hidden information. Thirdly, we are seeing increasing mastery of games with more stochastic elements.
Hidden information, multiplayer settings and stochastic events are the norm rather than the exception in strategic interactions among humans, so mastering these complexities is key in achieving a more general superhuman strategic AI that can navigate in the real world.
Large Language Models and Strategic AI
I recently ran an experiment where I let LLMs play the board game Risk against each other. My objective was to gauge how well the LLMs could perform in a strategic setting, more or less out of the box. The agents were given quite a lot of detailed prompting to provide the right context; however, and perhaps not surprisingly, the LLM performance was rather mediocre.
You can find an article about the experiment here:
Exploring the Strategic Capabilities of LLMs in a Risk Game Setting
Summarizing some of the key findings from the experiment, the current generation of LLMs struggles with basic strategic concepts like fortification and recognizing winning moves. They also fail to eliminate other players when it would have been strategically beneficial for them to do so.
The above experiment indicates that even though we have seen rapid improvement in LLMs, they still lack the sophistication needed for strategic reasoning. Given their very general training data and how they have been constructed, this shouldn’t come as a surprise.
So how do they fit into the discussion around strategic AI? To understand that, we need to understand what LLMs really excel at. Perhaps their most promising feature is the ability to digest and generate vast amounts of text, and now, with multimodal models, video and audio too. In other words, LLMs are great for interacting with the real world, both in human and other contexts. Recently, an AI team at Meta was able to combine the general language capabilities of a language model with the strategic insights of a strategy engine.
Case Study: Cicero and Diplomacy
The game of Diplomacy is a 2 to 7-player strategy game, which Meta describes as a mix between Risk, Poker and the TV show Survivor. The players start out with a map of Europe ca. 1900, and the objective is to gain control over a majority of supply centers. Specifically, a player aims to control 18 out of 34 supply centers to achieve victory. By doing so, a player effectively dominates the map, representing their nation’s ascendancy over Europe in the period leading up to World War I.
What sets Diplomacy apart from many of the other games we have discussed so far is its reliance on negotiation between players. It’s a much more cooperative form of play than, for example, poker. Each player uses natural language to communicate with the other players before each turn, making plans and forming alliances. When the preparations are finished, all players reveal their plans at the same time and the turn is executed. This type of game resembles actual diplomacy and real-life negotiation more closely than most other board games; however, because of the natural-language component, it has been very difficult for AI to master.
This changed in 2022, when the AI team at Meta developed Cicero. Using the latest advancements in language modelling combined with a strategic module, Cicero was an agent able to achieve more than “double the average score of the human players and ranked in the top 10% of participants who played more than one game.”⁶ As Meta describes it, their model is able to produce strategy-grounded dialogue and generate a dialogue-aware strategy.
Differences Between Cicero and Other Strategic AI Models
There are a few key differences between Diplomacy and some of the other games where we have seen recent strategic AI advancements. Most notable are the cooperative nature of the game — compared to the adversarial nature of the other games — and the open-ended natural-language format it uses. I would argue that these differences make the game more like real human interaction; however, they also placed restrictions on how the researchers could train the algorithms that power Cicero.
Unlike Pluribus and AlphaZero, Cicero is not primarily trained through self-play and reinforcement learning. Instead, the Meta team used a data set with over 125,000 games and 40,000,000 messages to help train the algorithm. They thought that given the negotiating, persuading and trust-building aspects of the game, they might see strange behavior if they let the AI negotiate with itself through self-play, and that it might not capture the essence of human interaction. Quoting their research article:
“…we found that a self-play algorithm that achieved superhuman performance in 2p0s versions of the game performed poorly in games with multiple human players owing to learning a policy inconsistent with the norms and expectations of potential human allies.”
However, reinforcement learning was used to train part of the strategy engine; specifically, it was used to train Cicero’s value function, which it needs to predict the utility of its actions. The researchers used a modified version of behavioral cloning, piKL, which seeks to maximize the expected utility of an action while at the same time minimizing the divergence from human behavior.⁶ Simply put, they wanted the model to find strategically sound actions while staying close to human play.
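The exact piKL procedure is iterative, but its spirit can be sketched with a one-step objective: pick an action distribution that trades off expected utility against the KL divergence from a human anchor policy. The snippet below uses the closed-form solution of that trade-off; the numbers and function name are illustrative only.

```python
import numpy as np

def pikl_style_policy(utilities, human_policy, lam=1.0):
    """Sketch of a piKL-flavoured action distribution (not Cicero's actual code).

    Maximizing  E_pi[u(a)] - lam * KL(pi || pi_human)  over distributions pi
    has the closed-form solution  pi(a) proportional to pi_human(a) * exp(u(a) / lam):
    a high lam stays close to human play, a low lam leans towards pure utility."""
    weights = np.array(human_policy) * np.exp(np.array(utilities) / lam)
    return weights / weights.sum()

u = np.array([1.0, 0.2, 0.0])                 # expected utility of three actions
human = np.array([0.1, 0.6, 0.3])             # how often humans pick each action
print(pikl_style_policy(u, human, lam=5.0))   # stays close to the human distribution
print(pikl_style_policy(u, human, lam=0.1))   # concentrates on the highest-utility action
```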
The above features of Diplomacy highlight some important issues related to creating a strategic AI that can operate in a real-world human setting, and need to be taken into consideration when we evaluate how strategic AI will evolve moving forward.
The Future of Strategic AI
Predicting the future is always tricky, however, one approach can be to use the current trends and extrapolate into future scenarios. Below, we investigate a few topics that closely relate to our previous discussion and evaluate how they can influence the future of strategic AI.
General Symbolic Strategy Engines vs. Specialized Modules
If we examine the trajectory of strategic AI engines so far, one thing that strikes us is how specialized each game engine is. Even though the architectures can be similar — as with AlphaZero learning how to play multiple different games — the AI still plays millions of games against itself for each specific game. For chess, AlphaZero played 44 million games, and for Go, 130 million games!⁷ A natural question to ask is whether we should try to build more general strategy engines or continue to focus on specialized modules for specific tasks.
A general strategy engine would aim to understand and apply broad strategic principles across different situations. Perhaps by creating games that capture many aspects of human strategic interaction, AI could learn through play against itself and develop strategies that apply to real-world scenarios. This approach could help AI generalize its learning, making it useful in various contexts.
On the other hand, specialized modules are AI systems designed for particular scenarios or tasks. We could envision that we could create a general strategic AI by combining multiple specialized agents. AI agents could be trained to excel in each specific area, providing deep expertise where it’s most needed. While this method might limit the AI’s ability to generalize, it ensures high performance in specific domains, which can lead to practical applications more quickly.
Given the issues with using AI for self-play in cooperative settings — as we observed with Diplomacy — and the current trend which seems to favor specialized modules for different strategic situations, it seems likely that for the near future we will have specialized strategic modules for different contexts. However, one could also envision a mixed system where we used general strategy engines to provide insights into broader topics, while specialized modules handle complex, specific challenges. This balance could allow AI systems to apply general strategic insights while adapting to the details of particular situations.
LLMs Bridging the Gap Between Strategic Modules and Real-World Applications
Large language models have changed how AI interacts with human language, offering a powerful way to connect strategic AI modules with real-world use cases. LLMs are great at understanding and generating human-like text, making them ideal as an intermediary that can translate real-world situations into structured data that strategy engines can process. As seen with Meta’s Cicero, combining LLMs with strategic reasoning allowed the AI to understand human communication, negotiate, and plan actions in collaborative environments.
Given the current trend towards more multimodal models, LLMs are also increasingly able to translate not just text, but any real-world context, into a machine-readable syntax. This makes the models even more useful as intermediaries.
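As a rough illustration of this intermediary role, the sketch below uses a placeholder call_llm function (any chat model API could stand in) to turn a free-form negotiation message into structured data that a separate, hypothetical strategy engine could consume.

```python
import json

def parse_negotiation_message(message: str, call_llm) -> dict:
    """Sketch of using an LLM as the interface layer (hypothetical `call_llm`).

    The language model turns free-form text into structured data that a
    separate strategy engine can reason over; the JSON schema below is made up
    for illustration."""
    prompt = (
        "Extract the proposal from the message below as JSON with keys "
        "'sender_intent', 'requested_action', 'offered_in_return'.\n\n"
        f"Message: {message}"
    )
    return json.loads(call_llm(prompt))

# Example flow: the structured output feeds a strategy engine, whose chosen action
# could then be turned back into natural language by another LLM call.
# proposal = parse_negotiation_message("I'll support your move into Munich if "
#                                      "you stay out of the North Sea.", call_llm)
# action = strategy_engine.best_response(proposal)   # hypothetical engine
```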
If we build on the ideas developed for Cicero, we could also envision fine-tuning different language models for specific tasks — like diplomatic communication — perhaps by fine tuning the models on historic diplomatic correspondence and then training separate strategy engines to come up with optimal actions.
Human-AI Collaboration: The Centaur Model
The future of strategic AI isn’t just about machines taking over decision-making; for a transition period it’s also about humans and AI working together effectively. This partnership is often called the “Centaur Model,” combining human intuition with AI’s computing power. In this model, humans bring creativity, ethical judgment, and flexibility, while AI systems offer powerful data processing and consistent application of strategic principles.
Real-world examples of this model include areas where human-AI teams outperform either humans or machines working alone. In chess, for example, Garry Kasparov promoted the idea of teaming up with AI, combining human strategic insight with AI’s precise calculations. The centaur model seemed to work well in chess until the programs became really good; at that point, the human contribution added little and was, in the worst case, detrimental.
However, in other areas that are more open-ended and real-world-like than chess, the centaur model is probably a good bet going forward. Simply consider how human collaboration with modern LLMs has the potential to drastically improve productivity.
This collaborative approach improves decision-making by combining human judgment with AI analysis, possibly leading to more informed and balanced outcomes. It allows for quick adaptation to new and unexpected situations, as humans can adjust strategies in real-time with AI support.
Real-World Applications Beyond Games
Games have been a great testing ground for developing strategic AI, but the real impact comes from applying these advancements to real-world challenges. Below we highlight a few examples.
One field that has seen tremendous development in the last few years is self-driving cars, and how they use strategic AI to navigate roads safely. They must predict and respond to the actions of other drivers, pedestrians, and cyclists. For example, an autonomous vehicle needs to anticipate if a pedestrian is about to cross the street or if another driver is about to change lanes unexpectedly.
Just this year, Waymo — a company that develops autonomous vehicles and ride-hailing services — started using fully autonomous taxis in three US cities: Phoenix, Arizona, and California’s Los Angeles and San Francisco. In the coming years we can probably expect to see a massive rise in fully autonomous vehicles due to the improvements in strategic AI.
In the financial markets, AI-driven trading systems analyze enormous amounts of data to make investment decisions. These systems consider the likely actions of other market participants, such as traders and institutions, to anticipate market movements. They use strategic reasoning to execute trades that maximize returns while minimizing risks, often in highly volatile environments.
AI systems also optimize supply chains by considering the actions of suppliers, competitors, and customers. They can strategically adjust production schedules, inventory levels, and logistics based on anticipated demand and competitor behavior. For example, if a competitor is expected to launch a new product, the AI can recommend increasing stock levels to meet potential increases in demand.
Strategic AI is also used to manage energy distribution efficiently. Smart grids employ AI to predict consumption patterns and adjust supply accordingly. They consider how consumers might change their usage in response to pricing signals or environmental factors. The AI strategically allocates resources to balance load, prevent outages, and integrate renewable energy sources.
The examples above clearly show how strategic AI is being integrated into various industries and fields. By considering the actions of others, these AI systems make informed decisions that optimize outcomes, enhance efficiency, and often provide a competitive advantage. As strategic AI continues to improve so will these systems, and we will likely see their emergence in many other domains as well.
Conclusion
Strategic AI has come a long way since Deep Blue’s victory over Garry Kasparov. From mastering complex board games to engaging in human-like negotiations, AI systems are increasingly exhibiting strategic reasoning abilities.
In this article we investigated the foundational concepts of strategic AI, emphasizing the importance of game theory and how some of the concepts from the field can be applied to strategic AI. We also looked at how specialized AI systems have achieved superhuman performance in specific games by focusing on narrow domains and extensive self-play. This raises the question of whether the future of strategic AI lies in developing general symbolic strategy engines capable of broader application or continuing with specialized modules tailored to specific tasks.
As we saw with Cicero, language models will also likely have a future in the space of strategic AI. The new models from providers like OpenAI, Anthropic and Meta make it easier than ever before to integrate these tools into autonomous agents that can use them to translate the real-world into structured data that AI systems can process.
However, the journey toward a general-purpose strategic AI that can navigate the complexities of the real world is just beginning. Challenges remain in developing systems that can generalize across domains, adapt to unforeseen situations, and integrate ethical considerations into their decision-making processes.
Thanks for reading!
If you enjoyed reading this article and would like to access more content from me, feel free to connect with me on LinkedIn at https://www.linkedin.com/in/hans-christian-ekne-1760a259/ or visit my webpage at https://www.ekneconsulting.com/ to explore some of the services I offer. Don’t hesitate to reach out via email at [email protected]
References
- Lex Fridman. (2019, October 27). Garry Kasparov: Chess, Deep Blue, AI, and Putin | Lex Fridman Podcast #46 [Video]. YouTube. https://youtu.be/8RVa0THWUWw?si=1ErCnwlAn4myoK9W
- Vinyals, O., Babuschkin, I., Czarnecki, W.M. et al. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354. https://doi.org/10.1038/s41586-019-1724-z
- DeepMind. AlphaStar: Mastering the real-time strategy game StarCraft II. https://deepmind.google/discover/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii/
- Brown, N., & Sandholm, T. (2019). Superhuman AI for multiplayer poker. Science 365, 885–890. https://www.science.org/doi/epdf/10.1126/science.aay2400
- Lex Fridman. (2022, December 6). Noam Brown: AI vs Humans in Poker and Games of Strategic Negotiation | Lex Fridman Podcast #344 [Video]. YouTube. https://youtu.be/2oHH4aClJQs?si=AvE_Esb42GNGIPRG
- Meta Fundamental AI Research Diplomacy Team (FAIR) et al. (2022). Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science 378, 1067–1074. DOI: 10.1126/science.ade9097. https://noambrown.github.io/papers/22-Science-Diplomacy-TR.pdf
- Silver, D. et al. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144. DOI: 10.1126/science.aar6404. https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphazero-shedding-new-light-on-chess-shogi-and-go/alphazero_preprint.pdf