A paper published by researchers at Carnegie Mellon University, San Francisco research firm OpenAI, Facebook AI Research, the University of California at Berkeley, and Shanghai Jiao Tong University describes a paradigm for scaling up multi-agent reinforcement learning, in which AI models learn by having agents interact within an environment, by increasing the size of the agent population over time. By maintaining multiple sets of agents in each training stage and performing mix-and-match and fine-tuning steps over these sets, the coauthors say, the paradigm, called Evolutionary Population Curriculum, is able to promote the agents with the best adaptability to the next stage.
In computer science, evolutionary computation is a family of algorithms for global optimization inspired by biological evolution. Instead of following explicit mathematical gradients, these algorithms generate variants, test them, and retain the top performers. They’ve shown promise in early work by OpenAI, Google, Uber, and others, but they’re somewhat tough to prototype because there’s a dearth of tools targeting evolutionary algorithms and natural evolution strategies (NES).
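To make the generate, test, and retain loop concrete, here is a minimal sketch in Python. It is not taken from the paper: the toy objective, the function names, and every parameter value are assumptions chosen only for illustration.

```python
import random

def fitness(x):
    # Toy objective (an assumption for illustration): highest at x = 3.
    return -(x - 3.0) ** 2

def evolve(generations=50, population_size=20, keep=5, sigma=0.5, seed=0):
    rng = random.Random(seed)
    # Start from random candidate solutions.
    population = [rng.uniform(-10.0, 10.0) for _ in range(population_size)]
    for _ in range(generations):
        # Test: rank candidates by fitness, best first.
        population.sort(key=fitness, reverse=True)
        # Retain the top performers.
        parents = population[:keep]
        # Generate variants: perturb randomly chosen parents with Gaussian noise.
        children = [rng.choice(parents) + rng.gauss(0.0, sigma)
                    for _ in range(population_size - keep)]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best)  # a value close to 3.0
```

No gradient of the objective is ever computed; the search relies only on being able to score each candidate.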
As the coauthors explain, Evolutionary Population Curriculum allows the number of agents to be scaled up exponentially. The core idea is to divide the learning procedure into multiple stages with an increasing number of agents in the environment, so that the agents first learn to interact in simpler scenarios with fewer agents and then leverage these experiences to adapt to more agents.
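As a rough illustration of what an exponentially growing, stage-wise schedule looks like, the snippet below multiplies the population by a fixed factor at each stage. The factor of 2, the starting size of 3, and the function name are assumptions made for this sketch; the article does not state the exact values.

```python
def stage_sizes(initial_agents, num_stages, growth=2):
    """Number of agents in each stage, multiplied by `growth` per stage.

    `growth=2` (doubling) is an illustrative assumption.
    """
    return [initial_agents * growth ** stage for stage in range(num_stages)]

sizes = stage_sizes(initial_agents=3, num_stages=4)
print(sizes)  # [3, 6, 12, 24]
```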
Evolutionary Population Curriculum introduces new agents by directly cloning existing ones from the previous stage, but it incorporates techniques to ensure that only agents with the best adaptation abilities move on to the next stage as the population is scaled up. Crossover, mutation, and selection are performed among sets of agents in each stage in parallel so that the influence on overall training time is minimized.
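One transition between stages might be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each agent is reduced to a single number, `fine_tune` is a hypothetical stand-in for real reinforcement learning, `score` is a toy evaluation, and the candidate sets are processed one after another rather than in parallel.

```python
import itertools
import random

def mix_and_match(agent_sets):
    """Crossover: join every pair of sets (a set may pair with itself,
    which clones its agents) into a candidate set twice as large."""
    return [a + b for a, b in
            itertools.combinations_with_replacement(agent_sets, 2)]

def fine_tune(agent_set, rng, sigma=0.1):
    """Mutation stand-in (hypothetical): perturb each agent's parameter.
    Real multi-agent training would go here."""
    return [w + rng.gauss(0.0, sigma) for w in agent_set]

def score(agent_set):
    """Toy evaluation (assumption): higher when parameters are near 1.0."""
    return -sum((w - 1.0) ** 2 for w in agent_set) / len(agent_set)

def next_stage(agent_sets, keep, rng):
    """Build larger candidate sets, fine-tune them, and keep the best."""
    candidates = [fine_tune(c, rng) for c in mix_and_match(agent_sets)]
    candidates.sort(key=score, reverse=True)  # selection: best first
    return candidates[:keep]

rng = random.Random(0)
# Three parallel sets of two agents each from the previous stage.
sets = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(3)]
sets = next_stage(sets, keep=3, rng=rng)
print([len(s) for s in sets])  # [4, 4, 4]
```

Because each candidate set can be fine-tuned and scored independently, the work in `next_stage` lends itself to parallel execution, which is how the coauthors say they limit the impact on training time.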
The researchers experimented on three challenging environments: a predator-prey-style Grassland game, a mixed cooperative and competitive Adversarial Battle game, and a fully cooperative Food Collection game. They report that agents trained with the approach “significantly” improved over baselines in terms of performance and training stability, indicating that Evolutionary Population Curriculum is general and could potentially help scale other algorithms.
“Most real-world problems involve interactions between multiple agents and the problem becomes significantly harder when there exist complex cooperation and competition among agents,” wrote the coauthors. “We hope that learning with a large population of agents can also lead to the emergence of swarm intelligence in environments with simple rules in the future.”
If indeed Evolutionary Population Curriculum is an effective way of isolating the best algorithms for various target tasks, it could help to automate the most laborious bits of AI model engineering. According to an Algorithmia study, 50% of companies spend between 8 and 90 days deploying a single AI model.
The code is available in open source on GitHub.