
OpenAI Dota 2 5v5 | Comparisons with Other Game AI Systems


Comparisons with other game AI systems[edit]

Prior to OpenAI Five, AI systems had been successfully pitted against human players in other games, such as Jeopardy! with Watson, chess with Deep Blue, and Go with AlphaGo.[22][23][24] Compared with those games, Dota 2 differs in the ways explained below:[19]

Long run view: Dota 2 games run at 30 frames per second for an average match time of 45 minutes, resulting in roughly 80,000 ticks per game. OpenAI Five observes every fourth frame, yielding about 20,000 moves per game. By comparison, chess usually ends before 40 moves, while Go ends before 150 moves.
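The tick and move counts above follow from simple arithmetic (the figures in the text round 81,000 down to ~80,000 and 20,250 down to ~20,000):

```python
FPS = 30           # Dota 2 runs at 30 frames per second
MATCH_MINUTES = 45 # average match length
FRAME_SKIP = 4     # OpenAI Five acts on every 4th frame

ticks_per_game = FPS * MATCH_MINUTES * 60      # 81,000 ticks (~80,000)
moves_per_game = ticks_per_game // FRAME_SKIP  # 20,250 moves (~20,000)

print(ticks_per_game, moves_per_game)
```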

Partially observed state of the game: Players and their allies can only see the map directly around them; the rest is covered in a fog of war that hides enemy units and their movements. Playing Dota 2 therefore requires making inferences from incomplete data, as well as predicting what the opponent may be doing at the same time. By comparison, chess and Go are “full-information games” that hide nothing from the opposing player.[25]

Continuous action space: Each playable character in a Dota 2 game, known as a hero, can take dozens of actions that target either another unit or a position. The OpenAI Five developers discretized the space into 170,000 possible actions per hero. Not counting the continuous aspects of the game, there are an average of ~1,000 valid actions each tick. By comparison, the average number of available actions is about 35 in chess and 250 in Go.

Continuous observation space: Dota 2 is played on a large map with ten heroes, five on each team, along with dozens of buildings and non-player character (NPC) units. The OpenAI system observes the state of a game through the developers’ bot API, as 20,000 numbers that constitute all the information a human is allowed to access. By comparison, a chess board can be represented by about 70 enumeration values and a Go board by about 400.

Coordination

OpenAI Five does not contain an explicit communication channel between the heroes’ neural networks. Teamwork is controlled by a hyperparameter we dubbed “team spirit”. Team spirit ranges from 0 to 1, putting a weight on how much each of OpenAI Five’s heroes should care about its individual reward function versus the average of the team’s reward functions. We anneal its value from 0 to 1 over training.
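Concretely, the blend between individual and team rewards could be sketched like this (the function and variable names are illustrative, not OpenAI's code):

```python
def blended_rewards(individual_rewards, team_spirit):
    """Mix each hero's own reward with the team-average reward.

    team_spirit = 0 -> each hero cares only about its own reward;
    team_spirit = 1 -> every hero receives the team's average reward.
    """
    team_mean = sum(individual_rewards) / len(individual_rewards)
    return [(1 - team_spirit) * r + team_spirit * team_mean
            for r in individual_rewards]

def team_spirit_at(step, anneal_steps):
    """Linearly anneal team spirit from 0 to 1 over training (one possible schedule)."""
    return min(1.0, step / anneal_steps)
```

At team_spirit = 1 all five heroes receive identical rewards, so a kill by one hero benefits the whole team equally.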


Exploration

Given a learning algorithm capable of handling long horizons, we still need to explore the environment. Even with our restrictions, there are hundreds of items, dozens of buildings, spells, and unit types, and a long tail of game mechanics to learn about—many of which yield powerful combinations. It’s not easy to explore this combinatorially-vast space efficiently.

OpenAI Five learns from self-play (starting from random weights), which provides a natural curriculum for exploring the environment. To avoid “strategy collapse”, the agent trains 80% of its games against itself and the other 20% against its past selves. In the first games, the heroes walk aimlessly around the map. After several hours of training, concepts such as laning, farming, or fighting over mid emerge. After several days, they consistently adopt basic human strategies: attempt to steal Bounty runes from their opponents, walk to their tier one towers to farm, and rotate heroes around the map to gain lane advantage. And with further training, they become proficient at high-level strategies like 5-hero push.
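The 80/20 self-play mixture described above can be sketched as a simple opponent sampler (a hypothetical helper, not OpenAI's actual code):

```python
import random

def pick_opponent(current_params, past_checkpoints, p_self=0.8):
    """Choose an opponent for the next self-play game.

    80% of games are mirror matches against the latest agent; the remaining
    20% are played against a uniformly sampled past checkpoint, which guards
    against "strategy collapse" (forgetting how to beat older strategies).
    """
    if not past_checkpoints or random.random() < p_self:
        return current_params
    return random.choice(past_checkpoints)
```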

In March 2017, our first agent defeated bots but got confused against humans. To force exploration in strategy space, during training (and only during training) we randomized the properties (health, speed, start level, etc.) of the units, and it began beating humans. Later on, when a test player was consistently beating our 1v1 bot, we increased our training randomizations and the test player started to lose. (Our robotics team concurrently applied similar randomization techniques to physical robots to transfer from simulation to the real world.)
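This kind of training-time property randomization can be sketched as follows (the jitter ranges here are made up for illustration; the source does not state them):

```python
import random

def randomize_unit(base):
    """Return a randomized copy of a unit's properties, applied during
    training only, to force exploration in strategy space."""
    return {
        "health":      base["health"] * random.uniform(0.5, 1.5),  # illustrative range
        "speed":       base["speed"]  * random.uniform(0.8, 1.2),  # illustrative range
        "start_level": random.randint(1, 5),                       # illustrative range
    }
```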

OpenAI Five uses the randomizations we wrote for our 1v1 bot, plus a new “lane assignment” randomization. At the beginning of each training game, we randomly “assign” each hero to some subset of lanes and penalize it for straying from those lanes until a randomly chosen time in the game.

Exploration is also helped by a good reward. Our reward consists mostly of metrics humans track to decide how they’re doing in the game: net worth, kills, deaths, assists, last hits, and the like. We postprocess each agent’s reward by subtracting the other team’s average reward to prevent the agents from finding positive-sum situations.
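The postprocessing step amounts to subtracting the opposing team's mean reward from each agent's reward; a minimal sketch:

```python
def zero_sum_rewards(team_a, team_b):
    """Subtract the opposing team's average reward from each agent's reward.

    If both teams discover the same positive-sum exploit (e.g. safe farming
    everywhere), the bonus cancels out, so only relative advantage is rewarded.
    """
    mean_a = sum(team_a) / len(team_a)
    mean_b = sum(team_b) / len(team_b)
    return ([r - mean_b for r in team_a],
            [r - mean_a for r in team_b])
```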

We hardcode item and skill builds (originally written for our scripted baseline), and choose which of the builds to use at random. Courier management is also imported from the scripted baseline.

Cooperative mode

It actually felt nice; my Viper gave his life for me at some point. He tried to help me, thinking “I’m sure she knows what she’s doing” and then obviously I didn’t. But, you know, he believed in me. I don’t get that a lot with [human] teammates. —Sheever

OpenAI Five’s ability to play with humans presents a compelling vision for the future of human-AI interaction, one where AI systems collaborate with humans and enhance the human experience. Our testers reported feeling supported by their bot teammates, that they learned from playing alongside these advanced systems, and that it was a fun experience overall.

Note that OpenAI Five exhibits zero-shot transfer learning—it was trained to have all heroes controlled by copies of itself, but generalizes to controlling a subset of heroes, playing with or against humans. We were very surprised this worked as well as it did. In fact, we’d considered doing a cooperative match at The International but assumed it’d require dedicated training.


Arena

We’re launching OpenAI Five Arena, a public experiment where we’ll let anyone play OpenAI Five in both competitive and cooperative modes. We’d known that our 1v1 bot would be exploitable through clever strategies; we don’t know to what extent the same is true of OpenAI Five, but we’re excited to invite the community to help us find out!

Arena opens Thursday, April 18 at 6pm PST and will close 11:59pm PST on Sunday, April 21. Please register so we can ensure there’s enough server capacity in your region! Results of all games will be automatically reported to the Arena public leaderboard.

We’re incredibly grateful for all the support the Dota community has shown us over the past two years, and we hope that Arena will also serve as one small way of giving back. Have fun with it!

Mastering Dota 2 with OpenAI 5

Table of Contents

  1. Introduction
  2. What is Artificial Intelligence?
  3. The Role of Artificial Intelligence in Games
  4. Evolution of Artificial Intelligence in Games
  5. The Game of Chess: An Early Example of AI in Games
  6. Dota 2: An Unpredictable Gaming Environment
  7. Understanding Dota 2
  8. The International: The World Cup of Dota 2
  9. OpenAI and Dota 2
  10. OpenAI’s Breakthrough in 1v1 Matchup
  11. OpenAI’s Challenge: Moving to a 5v5 Situation
  12. The Complexity of Dota 2: Challenges for AI
  13. OpenAI’s Five Neural Networks Approach
  14. The Training Process of OpenAI Five
  15. Reinforcement Learning and Proximal Policy Optimization
  16. Coordinated Actions and Strategic Planning in Dota 2
  17. OpenAI’s Remarkable Achievement
  18. Simulating the Human Brain with Artificial Intelligence
  19. Conclusion

Artificial Intelligence in Dota 2: The Rise of OpenAI

Introduction Artificial Intelligence has been making significant strides in various fields, including gaming. The concept of AI in games is not new, with early examples such as chess programs paving the way for more complex applications. One such game that has witnessed the integration of AI is Dota 2. Dota 2, short for Defense of the Ancients, is a multiplayer online battle arena (MOBA) game that incorporates real-time strategies and unpredictable gameplay. In recent years, OpenAI, a non-profit research agency, has made remarkable progress in developing AI-powered bots capable of defeating top Dota 2 players. This article explores the evolution of artificial intelligence in games, the challenges of applying AI in Dota 2, and the groundbreaking achievements of OpenAI.

What is Artificial Intelligence?

To understand the significance of artificial intelligence in Dota 2, it is essential to grasp the fundamental concepts of AI. Artificial intelligence refers to the development of computer systems capable of performing tasks that require human intelligence. It involves techniques that allow machines to learn from experience, adapt to changing scenarios, and make intelligent decisions. AI has found applications in various industries, including gaming, where it enhances gameplay, creates more dynamic experiences, and offers challenging opponents for players.

The Role of Artificial Intelligence in Games

Artificial intelligence plays a vital role in shaping the gaming experience, making it more interactive and immersive. In the early days of AI in gaming, chess programs demonstrated the capabilities of AI by defeating human players. These programs utilized sophisticated algorithms to analyze potential moves and outcomes. Since then, AI has evolved to handle the complexities of modern games, incorporating machine learning and deep neural networks to simulate human-like behavior and decision-making. AI in games aims to provide challenging opponents, generate dynamic environments, and enhance player engagement.

Evolution of Artificial Intelligence in Games

The concept of artificial intelligence in games has evolved over the years, keeping pace with advancements in technology and computing power. Early examples, such as chess programs, demonstrated the potential of AI, albeit in a limited environment. These programs relied on pre-defined rules and decision trees to calculate optimal moves. As technology progressed, AI in games became more sophisticated, incorporating machine learning algorithms to adapt to player behavior and improve performance. The integration of neural networks and deep learning techniques has allowed AI to simulate human-like decision-making, creating more immersive and challenging gaming experiences.

The Game of Chess: An Early Example of AI in Games

Chess, a game of strategy and foresight, has long been a testbed for artificial intelligence. Chess programs have been developed to compete against human players, leveraging AI techniques to analyze potential moves, calculate outcomes, and select the most optimal strategy. While these early chess programs lacked the extensive data sets and training that AI systems utilize today, they laid the foundation for the AI revolution in gaming. They showcased the potential of AI to outperform human players in strategically complex games.

Dota 2: An Unpredictable Gaming Environment

Dota 2, an online multiplayer battle arena game, presents unique challenges for artificial intelligence. Unlike chess, which operates within a defined set of rules and a controlled environment, Dota 2 thrives on chaos and unpredictability. Real-time strategies, teamwork, and split-second decision-making are crucial in this game. Dota 2 features 114 different heroes, each with unique abilities and interactions. The outcome of a match depends on the players’ ability to adapt to changing situations, exploit weaknesses in the enemy team, and coordinate effective strategies. Dota 2’s unpredictable nature poses a significant hurdle for AI, making it a challenging environment in which to develop intelligent bots.

Understanding Dota 2

Before delving deeper into the impact of AI in Dota 2, it is important to have a basic understanding of the game itself. Dota 2 is a multiplayer online battle arena game in which two teams of five players compete to destroy the opposing team’s Ancient, a structure located in their base. The game has been likened to a mix of football and chess, requiring teamwork, strategic planning, and individual skill. Players choose from a diverse pool of heroes, each with unique abilities, roles, and playstyles. The game unfolds in real time, with matches lasting from 30 minutes to over an hour.

The International: The World Cup of Dota 2

To understand the significance of AI in Dota 2, one must grasp the magnitude of the game’s competitive scene. Dota 2’s premier tournament is called The International, organized by Valve Corporation, the developers of the game. The International brings together 16 of the best Dota 2 teams from around the world to compete for a multi-million dollar prize pool. This tournament serves as a celebration of the game and the pinnacle of competitive Dota 2. The International has become a global phenomenon, attracting millions of viewers and cementing Dota 2’s status as one of the most widely watched esports of our generation.

OpenAI and Dota 2

OpenAI, an artificial intelligence research lab, made waves in the gaming community when it successfully developed AI bots capable of defeating human players in Dota 2. OpenAI’s foray into Dota 2 began with a 1v1 matchup, where its bot outperformed top players. Building upon this success, OpenAI set out to create a team of AI bots capable of challenging professional Dota 2 teams in a 5v5 scenario. This ambitious goal required overcoming several challenges related to AI’s ability to adapt to the dynamic nature of Dota 2’s gameplay.

OpenAI’s Breakthrough in 1v1 Matchup

OpenAI’s first major milestone in Dota 2 was the development of an AI bot that could outperform human players in a 1v1 matchup. This bot, trained through reinforcement learning, showcased an impressive understanding of game mechanics and strategic decision-making. OpenAI trained the bot by playing millions of games against itself, gradually improving its performance through iteration and refinement. The AI bot learned various techniques employed by professional players, such as creep manipulation, animation canceling, and strategic positioning.

OpenAI’s Challenge: Moving to a 5v5 Situation

While the success of OpenAI’s 1v1 bot was commendable, the transition to a 5v5 scenario presented a whole new set of challenges. Dota 2’s gameplay revolves around teamwork, coordination, and strategic planning. These aspects are difficult to simulate with AI due to the complex interactions between multiple players and the unpredictability of human decision-making. OpenAI faced the task of developing AI bots that could effectively collaborate, communicate, and adapt to the ever-changing dynamics of a 5v5 Dota 2 match.

The Complexity of Dota 2: Challenges for AI

Dota 2’s complexity poses significant challenges for AI systems. In a 5v5 game, teams must make decisions based on incomplete information. Decisions such as when to reveal heroes in certain areas of the map, how to secure favorable laning matchups, and how to outmaneuver opponents require strategic planning and coordination. Furthermore, the implications of these decisions extend beyond immediate rewards, often influencing the course of the game for several minutes. AI bots must navigate through the chaos, anticipate their opponents’ strategies, and optimize long-term rewards over short-term gains.

OpenAI’s Five Neural Networks Approach

To tackle the complexities of Dota 2 and enable coordinated actions, OpenAI adopted a five-neural-network approach: each neural network controlled a different hero, creating a team of five coordinated agents. This approach, known as OpenAI Five, utilized state-of-the-art reinforcement learning techniques, particularly Proximal Policy Optimization (PPO). PPO allowed the neural networks to learn and refine their strategies through iterative gameplay, optimizing their decision-making processes and improving performance.
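The core of PPO is a clipped surrogate objective; a minimal sketch of that objective follows (this illustrates the technique from the PPO paper, not OpenAI Five's actual implementation):

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective of Proximal Policy Optimization.

    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio to
    [1 - clip_eps, 1 + clip_eps] removes the incentive to move the policy
    too far from the old policy in a single update, which stabilizes training.
    """
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Take the pessimistic (minimum) of the unclipped and clipped objectives.
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))
```

When the new policy equals the old one, the ratio is 1 and the objective reduces to the mean advantage; gradient ascent on this quantity then improves the policy in small, trust-region-like steps.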

The Training Process of OpenAI Five

Training OpenAI Five involved extensive gameplay and reinforcement learning. OpenAI used Rapid, a general-purpose training system compatible with any Gym environment, to train the bots. The training process comprised rollout workers, which ran multiple instances of the game, and optimizer nodes, which performed synchronous gradient descent across a fleet of GPUs. The bots’ rewards were determined by aggregated metrics such as net worth, kills, deaths, assists, and last hits. Training spanned millions of games, generating a vast amount of experience to refine the bots’ strategies.

Reinforcement Learning and Proximal Policy Optimization

Reinforcement learning played a crucial role in training OpenAI Five. The bots learned from their interactions with the game environment, receiving rewards or penalties based on their performance. The reinforcement learning algorithm, aided by Proximal Policy Optimization techniques, fine-tuned the bots’ decision-making and strategic planning. Over time, the neural networks of OpenAI Five developed advanced strategies, resembling those employed by professional players. They demonstrated concepts like laning, farming, map rotation, and objective prioritization, showcasing their ability to adapt and optimize long-term rewards.

Coordinated Actions and Strategic Planning in Dota 2

One of the remarkable achievements of OpenAI Five was the coordination of actions among the five neural networks. Although each network operated independently, without an explicit communication channel, the team’s play was coordinated through a shared “team spirit” weighting on rewards. This coordination model, while relatively simple, showcased the adaptability of the AI system. It demonstrated the potential of AI to coordinate actions and make professional-level decisions in a chaotic gaming environment like Dota 2.

OpenAI’s Remarkable Achievement

OpenAI’s breakthrough in Dota 2 is a testament to the power of artificial intelligence and its potential to revolutionize gaming. Despite the challenges posed by Dota 2’s unpredictability, OpenAI Five displayed remarkable teamwork, strategic planning, and decision-making. The bots, trained through reinforcement learning and coordinated neural networks, were capable of maximizing long-term rewards, outmaneuvering opponents, and adapting to dynamic gameplay. OpenAI’s achievement showcases the remarkable progress made in AI and its potential to transform gaming on a global scale.

Simulating the Human Brain with Artificial Intelligence

OpenAI’s success in Dota 2 demonstrates the ability of artificial intelligence to simulate human-like decision-making and strategic planning. The coordinated actions of the neural networks of OpenAI Five emulate the teamwork and coordination observed in professional players. This remarkable achievement highlights the potential of AI to replicate complex human behaviors and optimize strategies to achieve specific goals. By combining advanced algorithms with extensive training, artificial intelligence can push the boundaries of what is possible in gaming.

Conclusion

The integration of artificial intelligence in Dota 2 marks a significant milestone in the evolution of gaming. OpenAI’s groundbreaking achievements in developing AI bots capable of challenging top Dota 2 teams demonstrate the potential of AI to adapt to complex and unpredictable gaming environments. While challenges persist, such as the need for continuous refinement and addressing the intricacies of team coordination, AI in gaming holds immense promise. The fusion of human ingenuity and artificial intelligence has the potential to revolutionize the way we play and experience games, creating more immersive and dynamic experiences for players worldwide.

Highlights

  • Artificial intelligence has made significant advancements in gaming, including Dota 2.
  • Dota 2 is a multiplayer online battle arena game that incorporates real-time strategies and unpredictable gameplay.
  • OpenAI, a non-profit research agency, developed AI bots capable of challenging professional Dota 2 teams.
  • Dota 2’s complexity poses challenges for AI, requiring adaptability, coordination, and strategic planning.
  • OpenAI utilized reinforcement learning and coordinated neural networks to train and refine their AI bots.
  • OpenAI’s achievement showcases the potential of AI to replicate human-like decision-making and optimize strategies in a dynamic gaming environment.

FAQ

Q: What is Dota 2? A: Dota 2 is a multiplayer online battle arena game where two teams of five players each battle to destroy the opposing team’s Ancient structure.

Q: What is the International in Dota 2? A: The International is a premier Dota 2 tournament organized by Valve Corporation, attracting the best teams from around the world to compete for a multi-million dollar prize pool.

Q: What challenges did OpenAI face in developing AI bots for Dota 2? A: OpenAI faced challenges such as adapting AI to the dynamic and unpredictable nature of Dota 2 gameplay, coordinating actions among multiple agents, and optimizing long-term rewards over short-term gains.

Q: What techniques did OpenAI use to train their AI bots? A: OpenAI utilized reinforcement learning, particularly Proximal Policy Optimization, to train their AI bots. The bots learned from playing millions of games against themselves and refined their strategies iteratively.

Q: How did OpenAI’s AI bots coordinate actions in Dota 2? A: OpenAI used a coordinated approach with five independent neural networks, each representing a different player. These networks communicated and synchronized their strategies to achieve team objectives.



At OpenAI Five Finals, we also shared two surprises:

  1. OpenAI Five discovered a rudimentary ability to be a teammate with humans, even though our training process focuses exclusively on beating other bots. The ease with which we turned a competitive AI into a cooperative one makes us hopeful that future AI systems can be very beneficial for humans given active development effort.
  2. From April 18–21, we’re scaling up OpenAI Five to play the Internet, whether as a competitor or teammate. This final test will let us answer an important research question—to what extent OpenAI Five is exploitable or can otherwise be reliably beaten—and be potentially the largest-ever deployment of a highly-competent deep reinforcement learning agent that people can knowingly interact with.

Our approach

Our system learns using a massively-scaled version of Proximal Policy Optimization. Both OpenAI Five and our earlier 1v1 bot learn entirely from self-play. They start with random parameters and do not use search or bootstrap from human replays.

|                                      | OpenAI 1v1 bot            | OpenAI Five                          |
|--------------------------------------|---------------------------|--------------------------------------|
| CPUs                                 | 60,000 CPU cores on Azure | 128,000 preemptible CPU cores on GCP |
| GPUs                                 | 256 K80 GPUs on Azure     | 256 P100 GPUs on GCP                 |
| Experience collected                 | ~300 years per day        | ~180 years per day (~900 years per day counting each hero separately) |
| Size of observation                  | ~3.3 kB                   | ~36.8 kB                             |
| Observations per second of gameplay  | 10                        | 7.5                                  |
| Batch size                           | 8,388,608 observations    | 1,048,576 observations               |
| Batches per minute                   | ~20                       | ~60                                  |

RL researchers (including ourselves) have generally believed that long time horizons would require fundamentally new advances, such as hierarchical reinforcement learning. Our results suggest that we haven’t been giving today’s algorithms enough credit — at least when they’re run at sufficient scale and with a reasonable way of exploring.

Our agent is trained to maximize the exponentially decayed sum of future rewards, weighted by an exponential decay factor called γ. During the latest training run of OpenAI Five, we annealed γ from 0.998 (valuing future rewards with a half-life of 46 seconds) to 0.9997 (valuing future rewards with a half-life of five minutes). For comparison, the longest horizon in the PPO paper was a half-life of 0.5 seconds, the longest in the Rainbow paper was a half-life of 4.4 seconds, and the Observe and Look Further paper used a half-life of 46 seconds.
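As a sanity check on these half-lives: assuming 7.5 observations per second (every 4th frame of a 30 fps game), the half-life implied by a per-step discount γ can be computed directly:

```python
import math

def half_life_seconds(gamma, obs_per_sec=7.5):
    """Seconds until a future reward is discounted to half its value.

    A reward n steps in the future is weighted by gamma**n, so the half-life
    in steps solves gamma**n = 0.5, i.e. n = log(0.5) / log(gamma).
    """
    steps = math.log(0.5) / math.log(gamma)
    return steps / obs_per_sec

print(round(half_life_seconds(0.998)))           # 46 (seconds)
print(round(half_life_seconds(0.9997) / 60, 1))  # 5.1 (minutes)
```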

While the current version of OpenAI Five is weak at last-hitting (observing our test matches, the professional Dota commentator Blitz estimated it to be around the median for Dota players), its objective prioritization matches a common professional strategy. Gaining long-term rewards such as strategic map control often requires sacrificing short-term rewards such as gold gained from farming, since grouping up to attack towers takes time. This observation reinforces our belief that the system is truly optimizing over a long horizon.

Rapid

Our system is implemented as a general-purpose RL training system called Rapid, which can be applied to any Gym environment. We’ve used Rapid to solve other problems at OpenAI, including Competitive Self-Play.

The training system is separated into rollout workers, which run a copy of the game and an agent gathering experience, and optimizer nodes, which perform synchronous gradient descent across a fleet of GPUs. The rollout workers sync their experience through Redis to the optimizers. Each experiment also contains workers evaluating the trained agent versus reference agents, as well as monitoring software such as TensorBoard, Sentry, and Grafana.

During synchronous gradient descent, each GPU computes a gradient on its part of the batch, and then the gradients are globally averaged. We originally used MPI’s allreduce for averaging, but now use our own NCCL2 wrappers that parallelize GPU computations and network data transfer. The latency for synchronizing 58 MB of data (the size of OpenAI Five’s parameters) grows with the number of GPUs, but remains low enough to be largely masked by the GPU computation that runs in parallel with it.
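The averaging step is conceptually just an allreduce over per-GPU gradients; a toy sketch with plain Python lists standing in for GPU tensors:

```python
def allreduce_average(per_gpu_grads):
    """Average one gradient vector across workers, as an NCCL/MPI allreduce would.

    per_gpu_grads: list with one entry per GPU, each an equal-length list of
    gradient components. Every worker ends up with the same averaged gradient,
    so all replicas apply identical updates and stay in sync.
    """
    n = len(per_gpu_grads)
    return [sum(g[i] for g in per_gpu_grads) / n
            for i in range(len(per_gpu_grads[0]))]
```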

We’ve implemented Kubernetes, Azure, and GCP backends for Rapid.


History[edit]

Development on the algorithms used for the bots began in November 2016. OpenAI chose Dota 2, a competitive five-on-five video game, as a base because of its popularity on the live streaming platform Twitch, its native support for Linux, and its available application programming interface (API).[1] The first public demonstration occurred at The International 2017 in August, the annual premiere championship tournament for the game, where Dendi, a professional Ukrainian player, lost to an OpenAI bot in a live one-on-one matchup.[2][3] After the match, CTO Greg Brockman explained that the bot had learned by playing against itself for two weeks of real time, and that the learning software was a step toward creating software that can handle complex tasks “like being a surgeon”.[4][5] OpenAI used a methodology called reinforcement learning, in which the bots learned over time by playing against themselves hundreds of times a day for months, and were rewarded for actions such as killing an enemy and destroying towers.[6][7][8]

By June 2018, the ability of the bots expanded to play together as a full team of five and were able to defeat teams of amateur and semi-professional players.[9][6][10][11] At The International 2018, OpenAI Five played in two games against professional teams, one against the Brazilian-based paiN Gaming and the other against an all-star team of former Chinese players.[12][13] Although the bots lost both matches, OpenAI still considered it a successful venture, stating that playing against some of the best players in Dota 2 allowed them to analyze and adjust their algorithms for future games.[14] The bots’ final public demonstration occurred in April 2019, where they won a best-of-three series against The International 2018 champions OG at a live event in San Francisco.[15] A four-day online event to play against the bots, open to the public, occurred the same month.[16] There, the bots played in 42,729 public games, winning 99.4% of those games.[17]

Differences versus humans

OpenAI Five is given access to the same information as humans, but instantly sees data like positions, health, and item inventories that humans have to check manually. Our method isn’t fundamentally tied to observing state, but rendering pixels from the game would require thousands of GPUs.

OpenAI Five averages around 150-170 actions per minute (and has a theoretical maximum of 450 due to observing every 4th frame). Frame-perfect timing, while possible for skilled players, is trivial for OpenAI Five. OpenAI Five has an average reaction time of 80ms, which is faster than humans.
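The theoretical ceiling of 450 actions per minute falls directly out of the frame skip:

```python
FPS = 30        # game frames per second
FRAME_SKIP = 4  # OpenAI Five can act at most once per 4 frames

max_actions_per_minute = FPS / FRAME_SKIP * 60  # 450.0
print(max_actions_per_minute)
```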

These differences matter most in 1v1 (where our bot had a reaction time of 67ms), but the playing field is relatively equitable as we’ve seen humans learn from and adapt to the bot. Dozens of professionals used our 1v1 bot for training in the months after last year’s TI. According to Blitz, the 1v1 bot has changed the way people think about 1v1s (the bot adopted a fast-paced playstyle, and everyone has now adapted to keep up).


What’s next

Our team is focused on making our August goal. We don’t know if it will be achievable, but we believe that with hard work (and some luck) we have a real shot.

This post described a snapshot of our system as of June 6th. We’ll release updates along the way to surpassing human performance and write a report on our final system once we complete the project. Please join us on August 5th virtually or in person, when we’ll play a team of top players!

Our underlying motivation reaches beyond Dota. Real-world AI deployments will need to deal with the challenges raised by Dota which are not reflected in Chess, Go, Atari games, or Mujoco benchmark tasks. Ultimately, we will measure the success of our Dota system in its application to real-world tasks. If you’d like to be part of what comes next, we’re hiring!


OpenAI Five Finals

From Liquipedia Dota 2 Wiki

League Information
Type: Offline
Venue: Bay Area
Format: 5v5
Date: 2019-04-13

What’s next

We will be releasing a more technical analysis of OpenAI Five once we’ve reviewed the outcomes of OpenAI Five Arena.

Afterwards, we’ll continue working with the Dota 2 environment within OpenAI. We’ve seen rapid progress in the past two years on RL capabilities, and we think that Dota 2 will continue to help us push forward what’s possible—whether with achieving competent performance from less data or true human-AI cooperation.

If you are interested in advancing AI capabilities and helping further our mission of ensuring they benefit humanity, we’re hiring!

OpenAI Five

OpenAI Five is a computer program by OpenAI that plays the five-on-five video game Dota 2. Its first public appearance occurred in 2017, where it was demonstrated in a live one-on-one game against the professional player Dendi, who lost to it. The following year, the system had advanced to the point of performing as a full team of five, and began playing against and showing the capability to defeat professional teams.

By choosing a game as complex as Dota 2 to study machine learning, OpenAI thought they could more accurately capture the unpredictability and continuity seen in the real world, thus constructing more general problem-solving systems. The algorithms and code used by OpenAI Five were eventually borrowed by another neural network in development by the company, one which controlled a physical robotic hand. OpenAI Five has been compared to other similar cases of artificial intelligence (AI) playing against and defeating humans, such as AlphaStar in the video game StarCraft II, AlphaGo in the board game Go, Deep Blue in chess, and Watson on the television game show Jeopardy!.


Reception[edit]

OpenAI Five has received acknowledgement from the AI, tech, and video game communities at large. Microsoft co-founder Bill Gates called it a “big deal”, as their victories “required teamwork and collaboration”.[8][26] Chess player Garry Kasparov, who lost against the Deep Blue AI in 1997, stated that despite their losing performance at The International 2018, the bots would eventually “get there, and sooner than expected”.[27]

In a conversation with MIT Technology Review, AI experts also considered the OpenAI Five system a significant achievement, noting that Dota 2 was an “extremely complicated game”, so even beating non-professional players was impressive.[25] PC Gamer wrote that its wins against professional players were a significant event in machine learning.[28] In contrast, Motherboard wrote that the victory was “basically cheating” due to the simplified hero pools on both sides, as well as the fact that the bots were given direct access to the API, as opposed to using computer vision to interpret pixels on the screen.[29] The Verge wrote that the bots were evidence that the company’s approach to reinforcement learning and its general philosophy about AI was “yielding milestones”.[16]

In 2019, DeepMind unveiled a similar bot for Starcraft II, AlphaStar. Like OpenAI Five, AlphaStar used reinforcement learning and self-play. The Verge reported that “the goal with this type of AI research is not just to crush humans in various games just to prove it can be done. Instead, it’s to prove that — with enough time, effort, and resources — sophisticated AI software can best humans at virtually any competitive cognitive challenge, be it a board game or a modern video game.” They added that the DeepMind and OpenAI victories were also a testament to the power of certain uses of reinforcement learning.[30]

It was OpenAI’s hope that the technology could have applications outside of the digital realm. In 2018, they were able to reuse the same reinforcement learning algorithms and training code from OpenAI Five for Dactyl, a human-like robot hand with a neural network built to manipulate physical objects.[31] In 2019, Dactyl solved the Rubik’s Cube.[32]

Surprising findings

  • Binary rewards can give good performance. Our 1v1 model had a shaped reward, including rewards for last hits, kills, and the like. We ran an experiment where we only rewarded the agent for winning or losing, and it trained an order of magnitude slower and somewhat plateaued in the middle, in contrast to the smooth learning curves we usually see. The experiment ran on 4,500 cores and 16 K80 GPUs, training to the level of semi-pros (70 TrueSkill) rather than the 90 TrueSkill of our best 1v1 bot.
  • Creep blocking can be learned from scratch. For 1v1, we learned creep blocking using traditional RL with a “creep block” reward. One of our team members left a 2v2 model training when he went on vacation (proposing to his now wife!), intending to see how much longer training would boost performance. To his surprise, the model had learned to creep block without any special guidance or reward.
  • We’re still fixing bugs. The chart shows a training run of the code that defeated amateur players, compared to a version where we simply fixed a number of bugs, such as rare crashes during training, or a bug which resulted in a large negative reward for reaching level 25. It turns out it’s possible to beat good humans while still hiding serious bugs!
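The shaped-versus-binary comparison above can be made concrete with a toy sketch. The event names and weights here are our own illustrative stand-ins, not OpenAI's actual reward terms:

```python
# Illustrative only: these event names and weights are stand-ins,
# not OpenAI's actual reward shaping.
SHAPING_WEIGHTS = {"last_hit": 0.16, "kill": 1.0, "death": -1.0,
                   "win": 5.0, "loss": -5.0}

def shaped_reward(events):
    """Dense signal: partial credit for intermediate achievements."""
    return sum(SHAPING_WEIGHTS.get(e, 0.0) for e in events)

def binary_reward(events):
    """Sparse signal: only the final game outcome is rewarded."""
    return 1.0 if "win" in events else (-1.0 if "loss" in events else 0.0)

# A mid-game snapshot gives the shaped agent feedback immediately,
# while the binary agent sees nothing until the game ends.
snapshot = ["last_hit", "last_hit", "kill"]
print(shaped_reward(snapshot), binary_reward(snapshot))
```

The denser signal is what makes learning so much faster, at the cost of hand-tuned weights.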

A subset of the OpenAI Dota team, holding the laptop that defeated the world’s top professionals at Dota 1v1 at The International last year.


References[edit]

  1. ^ OpenAI. “OpenAI Five”. openai.com/five. Archived from the original on 1 September 2018. Retrieved 10 October 2018.
  2. ^ Savov, Vlad (14 August 2017). “My favorite game has been invaded by killer AI bots and Elon Musk hype”. The Verge. Archived from the original on 26 June 2018. Retrieved 25 June 2018.
  3. ^ Frank, Blair Hanley. “OpenAI’s bot beats top Dota 2 player so badly that he quits”. Venture Beat. Archived from the original on 12 August 2017. Retrieved 12 August 2017.
  4. ^ OpenAI (11 August 2017). “Dota 2”. blog.openai.com. Archived from the original on 11 August 2017. Retrieved 12 August 2017.
  5. ^ OpenAI (16 August 2017). “More on Dota 2”. blog.openai.com. Archived from the original on 16 August 2017. Retrieved 16 August 2017.
  6. ^ a b Simonite, Tom (25 June 2018). “Can Bots Outwit Humans in One of the Biggest Esports Games?”. Wired. Archived from the original on 25 June 2018. Retrieved 25 June 2018.
  7. ^ Kahn, Jeremy (25 June 2018). “A Bot Backed by Elon Musk Has Made an AI Breakthrough in Video Game World”. Bloomberg.com. Archived from the original on 27 June 2018. Retrieved 27 June 2018.
  8. ^ a b “Bill Gates says gamer bots from Elon Musk-backed nonprofit are ‘huge milestone’ in A.I.” CNBC. 28 June 2018. Archived from the original on 28 June 2018. Retrieved 28 June 2018.
  9. ^ OpenAI (18 July 2018). “OpenAI Five Benchmark”. blog.openai.com. Archived from the original on 26 August 2018. Retrieved 25 August 2018.
  10. ^ Vincent, James (25 June 2018). “AI bots trained for 180 years a day to beat humans at Dota 2”. The Verge. Archived from the original on 25 June 2018. Retrieved 25 June 2018.
  11. ^ Savov, Vlad (6 August 2018). “The OpenAI Dota 2 bots just defeated a team of former pros”. The Verge. Archived from the original on 7 August 2018. Retrieved 7 August 2018.
  12. ^ Simonite, Tom. “Pro Gamers Fend off Elon Musk-Backed AI Bots—for Now”. Wired. Archived from the original on 24 August 2018. Retrieved 25 August 2018.
  13. ^ Quach, Katyanna. “Game over, machines: Humans defeat OpenAI bots once again at video games Olympics”. The Register. Archived from the original on 25 August 2018. Retrieved 25 August 2018.
  14. ^ OpenAI (24 August 2018). “The International 2018: Results”. blog.openai.com. Archived from the original on 24 August 2018. Retrieved 25 August 2018.
  15. ^ Wiggers, Kyle (13 April 2019). “OpenAI Five defeats professional Dota 2 team, twice”. Venture Beat. Archived from the original on 13 April 2019. Retrieved 13 April 2019.
  16. ^ a b Statt, Nick (13 April 2019). “OpenAI’s Dota 2 AI steamrolls world champion e-sports team with back-to-back victories”. The Verge. Vox Media. Archived from the original on 15 April 2019. Retrieved 15 April 2019.
  17. ^ Wiggers, Kyle (22 April 2019). “OpenAI’s Dota 2 bot defeated 99.4% of players in public matches”. Venture Beat. Retrieved 22 April 2019.
  18. ^ “Understanding LSTM Networks”. colah’s blog. Archived from the original on 1 August 2017. Retrieved 27 August 2015.
  19. ^ a b c OpenAI (25 June 2018). “OpenAI Five”. blog.openai.com. Archived from the original on 25 June 2018. Retrieved 25 June 2018.
  20. ^ “Why are AI researchers so obsessed with games?”. QUARTZ. 4 August 2018. Archived from the original on 4 August 2018. Retrieved 4 August 2018.
  21. ^ Schulman, John; Wolski, Filip; Dhariwal, Prafulla; Radford, Alec; Klimov, Oleg (2017). “Proximal Policy Optimization Algorithms”. arXiv:1707.06347 [cs.LG].
  22. ^ Gabbatt, Adam (17 February 2011). “IBM computer Watson wins Jeopardy clash”. The Guardian. Archived from the original on 21 September 2013. Retrieved 17 February 2011.
  23. ^ “Chess grandmaster Garry Kasparov on what happens when machines ‘reach the level that is impossible for humans to compete'”. Business Insider. Archived from the original on 29 December 2017. Retrieved 29 December 2017.
  24. ^ “DeepMind’s Go-playing AI doesn’t need human help to beat us anymore”. Verge. 18 October 2017. Archived from the original on 18 October 2017. Retrieved 18 October 2017.
  25. ^ a b Knight, Will (25 June 2018). “A team of AI algorithms just crushed humans in a complex computer game”. MIT Tech Review. Retrieved 25 June 2018.
  26. ^ “Bill Gates hails ‘huge milestone’ for AI as bots work in a team to destroy humans at video game ‘Dota 2′”. Business Insider. Archived from the original on 27 June 2018. Retrieved 27 June 2018.
  27. ^ “Garry Kasparov’s Twitter”. 24 August 2018. Retrieved 24 August 2018.
  28. ^ Park, Morgan (11 August 2018). “How the OpenAI Five tore apart a team of Dota 2 pros”. PC Gamer. Retrieved 25 May 2020.
  29. ^ Gault, Matthew (17 August 2018). “OpenAI Is Beating Humans at ‘Dota 2’ Because It’s Basically Cheating”. Vice. Retrieved 25 May 2020.
  30. ^ Statt, Nick (30 October 2019). “DeepMind’s StarCraft 2 AI is now better than 99.8 percent of all human players”. The Verge. Retrieved 25 May 2020.
  31. ^ OpenAI; Andrychowicz, Marcin; Baker, Bowen; Chociej, Maciek; Józefowicz, Rafał; McGrew, Bob; Pachocki, Jakub; Petron, Arthur; Plappert, Matthias; Powell, Glenn; Ray, Alex; Schneider, Jonas; Sidor, Szymon; Tobin, Josh; Welinder, Peter; Weng, Lilian; Zaremba, Wojciech (2019). “Learning Dexterous In-Hand Manipulation”. arXiv:1808.00177v5 [cs.LG].
  32. ^ OpenAI; Akkaya, Ilge; Andrychowicz, Marcin; Chociej, Maciek; Litwin, Mateusz; McGrew, Bob; Petron, Arthur; Paino, Alex; Plappert, Matthias; Powell, Glenn; Ribas, Raphael (2019). “Solving Rubik’s Cube with a Robot Hand”. arXiv:1910.07113v1 [cs.LG].

In the early hours of April 14 (Vietnam time), in San Francisco, a Dota 2 match was held between the OpenAI artificial intelligence and the reigning Dota 2 world champions, the five players of team OG. The result: the humans lost 2-0 to the AI. You can watch highlights of the two games below.

Of course, if OpenAI's opponents had been five merely above-average Dota 2 players, this would be unremarkable, but this was team OG. Last August they beat 15 other teams from around the world to claim the top prize of more than 11 million USD, after a tense best-of-five grand final against the Chinese team PSG.LGD. Yet it seems that, faced with this technological achievement, humans still had to concede.

OG faced OpenAI in a best-of-three match: the first side to win two games would take the series. Just like humans, OpenAI's bots learned to play Dota 2 through continuous practice, in endless trial-and-error loops, playing against themselves to refine their skills. This match was arguably the most impressive demonstration of OpenAI's capabilities to date, after losing two games last August to opponents "weaker" than OG. According to OpenAI co-founder and CTO Greg Brockman, OpenAI's five bots taught themselves to play in a high-speed virtual environment, rather like Doctor Strange fast-forwarding through 14 million outcomes of Infinity War.


Brockman said: “OpenAI Five improves its play by teaching itself. We did not program this AI to know how to play Dota; we programmed it to learn how to play. Over its 10 months of existence, the time this AI has spent learning the game amounts to roughly 45,000 years in real time, and unlike humans, an AI never gets bored.”

Making decisions like a “pro”

For those who have never played it, Dota 2 is a strategically very complex game. More than 100 heroes, plus systems of abilities, items, and tactical choices in every match, keep Dota 2 from ever becoming boring. Because the learning curve is so steep, OpenAI played under a few restrictions when facing professional opponents, one of which limited the pool of heroes both sides were allowed to pick.

In the match against OG, both OpenAI Five and OG could pick only from a pool of 17 Dota 2 heroes. OpenAI also chose Captain’s Draft mode, which lets both sides pick or ban heroes at will to deny the opposing captain their ideal strategy. Under this draft format, each team’s captain must act as the strategic brain so that all five players can work together as smoothly as possible.

OpenAI beat the humans, that much is beyond dispute, but the humans also had to concede a few things to the AI: besides the restricted hero pool, abilities that create illusions or summon units (copies of various Dota 2 characters) were disabled. These abilities, according to Brockman, are things OpenAI has not yet learned to handle.

Apart from those two concessions, the match was played on nearly even terms. In the first game, OpenAI Five surprised OG with several notably aggressive tactics, among them the decision to spend earned gold to immediately revive heroes OG had killed and turn the tide, a Dota 2 mechanic called buyback. According to Brockman, OpenAI favors fast, decisive tactics, which exposes a weakness in drawn-out games, where humans excel at managing resources and mapping out concrete strategies. In this game, however, those early buybacks proved too much for OG to overcome, and the match ended after nearly 40 minutes of play.


The second game was even more impressive for OpenAI Five. A snowball strategy that seized an advantage from the opening minutes, and never gave the opponent a chance to turn the game around, forced the reigning world champions to concede after just 20 minutes of play, with a kill score of 44-6 in OpenAI's favor.

Mike Cook, a Dota 2 player who researches AI in games, observed that in the second game OpenAI Five played far more aggressively than usual, while OG could find almost no way to stop the AI's advance down all three lanes of the map. Cook also stressed how effectively OpenAI exploited the abilities of the heroes it picked.

A public trial is coming: anyone will be able to play against OpenAI

For OpenAI, this victory is not a cause for celebration so much as another milestone demonstrating the effectiveness of designing an artificial intelligence that can teach itself and correct its own mistakes over time. Going forward there will be no more showmatches like the one against OG; instead, OpenAI's engineers will refine the software to turn OpenAI Five into a tool that plays games alongside humans, or helps humans practice. They also announced that on April 18, OpenAI Five Arena will open, letting anyone play Dota 2 against this AI.

Sam Altman, OpenAI co-founder and CEO, believes that future AI will make very important contributions, and of course not only in playing games: “This match is a very important lesson about how the world will work once self-learning AI exists alongside humans. AI and humans working together is one of the positive visions we hold when we think about the future of the world, helping people work better, be happier, and have a more positive impact.”


Altman said OpenAI will keep developing artificial intelligence that plays Dota 2 and other games, simply because games are an excellent way to stress-test the quality of an AI, and a way to measure the achievements of the engineering team developing it. One small point, in Altman's view: there is no game in the world in which OpenAI could not surpass human ability. For the AI field as a whole, though, games are only a modest tool that will soon become obsolete once people find tasks AI can do better, tasks that bring more tangible benefits to the world. That is also the highest goal OpenAI is pursuing.

Source: The Verge

Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams at Dota 2. While today we play with restrictions, we aim to beat a team of top professionals at The International in August subject only to a limited set of heroes. We may not succeed: Dota 2 is one of the most popular and complex esports games in the world, with creative and motivated professionals who train year-round to earn part of Dota’s annual $40M prize pool (the largest of any esports game).

OpenAI Five plays 180 years worth of games against itself every day, learning via self-play. It trains using a scaled-up version of Proximal Policy Optimization running on 256 GPUs and 128,000 CPU cores—a larger-scale version of the system we built to play the much-simpler solo variant of the game last year. Using a separate LSTM for each hero and no human data, it learns recognizable strategies. This indicates that reinforcement learning can yield long-term planning with large but achievable scale—without fundamental advances, contrary to our own expectations upon starting the project.
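The core of Proximal Policy Optimization is a clipped surrogate objective that limits how far each update can move the policy from the one that collected the data. A minimal NumPy sketch of that objective (not the Rapid implementation):

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Clipped surrogate objective from the PPO paper (maximized in training).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    ratio = np.exp(new_logp - old_logp)            # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))  # pessimistic bound

# With identical policies the ratio is 1 and the objective is the mean advantage.
print(ppo_clip_objective(np.zeros(3), np.zeros(3), np.array([1.0, 2.0, 3.0])))  # 2.0
```

The clipping removes the incentive to push the probability ratio outside [1 − eps, 1 + eps], which is what makes very large distributed batches stable.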

To benchmark our progress, we’ll host a match versus top players on August 5th. Follow us on Twitch to view the live broadcast, or request an invite to attend in person!

Why Dota?

We started OpenAI Five in order to work on a problem that felt outside of the reach of existing deep reinforcement learning[^footnote-learning] algorithms. We hoped that by working on a problem that was unsolvable by current methods, we’d need to make a big increase in the capability of our tools. We were expecting to need sophisticated algorithmic ideas, such as hierarchical reinforcement learning, but we were surprised by what we found: the fundamental improvement we needed for this problem was scale. Achieving and utilizing that scale wasn’t easy and was the bulk of our research effort!

To build OpenAI Five, we created a system called Rapid which let us run PPO at previously unprecedented scale. The results exceeded our wildest expectations, and we produced a world-class Dota bot without hitting any fundamental performance limits.

The surprising power of today’s RL algorithms comes at the cost of massive amounts of experience, which can be impractical outside of a game or simulated environment. This limitation may not be as bad as it sounds—for example, we used Rapid to control a robotic hand to dexterously reorient a block, trained entirely in simulation and executed on a physical robot. But we think decreasing the amount of experience is a next challenge for RL.

We are retiring OpenAI Five as a competitor today, but progress made and technology developed will continue to drive our future work. This isn’t the end of our Dota work—we think that Dota is a much more intrinsically interesting and difficult (and now well-understood!) environment for RL development than the standard ones used today.


Transfer learning

The current version of OpenAI Five has been training continuously since June 2018, despite changes to the model size and the game rules (including some fairly large game patch updates and newly implemented features). In each case, we were able to transfer the model over and continue training—something that is an open challenge for RL in other domains. To the best of our knowledge, this is the first time an RL agent has been trained using such a long-lived training run.

To make this work, we’ve continued to flesh out our surgery tooling so that we can start from trained parameters even across substantial architecture changes.

The problem

One AI milestone is to exceed human capabilities in a complex video game like StarCraft or Dota. Relative to previous AI milestones like Chess or Go, complex video games start to capture the messiness and continuous nature of the real world. The hope is that systems which solve complex video games will be highly general, with applications outside of games.

Dota 2 is a real-time strategy game played between two teams of five players, with each player controlling a character called a “hero”. A Dota-playing AI must master the following:

  • Long time horizons. Dota games run at 30 frames per second for an average of 45 minutes, resulting in 80,000 ticks per game. Most actions (like ordering a hero to move to a location) have minor impact individually, but some individual actions like town portal usage can affect the game strategically; some strategies can play out over an entire game. OpenAI Five observes every fourth frame, yielding 20,000 moves. Chess usually ends before 40 moves, Go before 150 moves, with almost every move being strategic.
  • Partially-observed state. Units and buildings can only see the area around them. The rest of the map is covered in a fog hiding enemies and their strategies. Strong play requires making inferences based on incomplete data, as well as modeling what one’s opponent might be up to. Both chess and Go are full-information games.
  • High-dimensional, continuous action space. In Dota, each hero can take dozens of actions, and many actions target either another unit or a position on the ground. We discretize the space into 170,000 possible actions per hero (not all valid each tick, such as using a spell on cooldown); not counting the continuous parts, there are an average of ~1,000 valid actions each tick. The average number of actions in chess is 35; in Go, 250.
  • High-dimensional, continuous observation space. Dota is played on a large continuous map containing ten heroes, dozens of buildings, dozens of NPC units, and a long tail of game features such as runes, trees, and wards. Our model observes the state of a Dota game via Valve’s Bot API as 20,000 (mostly floating-point) numbers representing all information a human is allowed to access. A chess board is naturally represented as about 70 enumeration values (an 8×8 board of 6 piece types and minor historical info); a Go board as about 400 enumeration values (a 19×19 board of 2 piece types plus Ko).
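The tick and move counts quoted in the list above follow from simple arithmetic:

```python
FPS, AVG_GAME_MINUTES, FRAME_SKIP = 30, 45, 4

ticks_per_game = FPS * AVG_GAME_MINUTES * 60    # 81,000, quoted as ~80,000
moves_per_game = ticks_per_game // FRAME_SKIP   # 20,250, quoted as ~20,000

# Decision-sequence lengths across the games compared above
for game, moves in [("chess", 40), ("Go", 150), ("Dota 2", moves_per_game)]:
    print(f"{game}: ~{moves} moves per game")
```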

The Dota rules are also very complex — the game has been actively developed for over a decade, with game logic implemented in hundreds of thousands of lines of code. This logic takes milliseconds per tick to execute, versus nanoseconds for Chess or Go engines. The game also gets an update about once every two weeks, constantly changing the environment semantics.


More heroes

We saw very little slowdown in training going from 5 to 18 heroes. We hypothesized the same would be true going to even more heroes, and after The International, we put a lot of effort into integrating new ones.

We spent several weeks training with hero pools up to 25 heroes, bringing those heroes to approximately 5k MMR (about 95th percentile of Dota players). Although they were still improving, they weren’t learning fast enough to reach pro level before Finals. We haven’t yet had time to investigate why, but our hypotheses range from insufficient model capacity to needing better matchmaking for the expanded hero pool to requiring more training time for new heroes to catch up to old heroes. Imagine how hard it is for a human to learn a new hero when everyone else has mastered theirs!

We believe these issues are fundamentally solvable, and solving them could be interesting in its own right. The Finals version plays with 17 heroes—we removed Lich because his abilities were changed significantly in Dota version 7.20.

Model structure

Each of OpenAI Five’s networks contain a single-layer, 1024-unit LSTM that sees the current game state (extracted from Valve’s Bot API) and emits actions through several possible action heads. Each head has semantic meaning, for example, the number of ticks to delay this action, which action to select, the X or Y coordinate of this action in a grid around the unit, etc. Action heads are computed independently.
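The independent-head structure described above can be sketched in a few lines of NumPy. The head names and sizes below are illustrative stand-ins taken loosely from the description, not the real architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 1024  # LSTM hidden size from the text

# Illustrative head names/sizes; the real model's heads differ.
HEAD_SIZES = {
    "action_type": 30,  # which action to select
    "delay": 4,         # number of ticks to delay this action
    "offset_x": 9,      # X coordinate in a grid around the unit
    "offset_y": 9,      # Y coordinate in a grid around the unit
}
head_weights = {name: rng.normal(0.0, 0.01, size=(HIDDEN, n))
                for name, n in HEAD_SIZES.items()}

def act(lstm_state):
    """Compute every head independently from the shared LSTM state."""
    return {name: int(np.argmax(lstm_state @ w))  # greedy pick per head
            for name, w in head_weights.items()}

action = act(rng.normal(size=HIDDEN))
print(action)
```

Because each head only reads the shared hidden state, the combinatorially huge action space factorizes into a handful of small, independent softmax decisions.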

Demonstration of the observation space and action space used by OpenAI Five: it views the world as a list of 20,000 numbers, and takes an action by emitting a list of 8 enumeration values. The image shows the scene as a human would see it.

OpenAI Five can react to missing pieces of state that correlate with what it does see. For example, until recently OpenAI Five’s observations did not include shrapnel zones (areas where projectiles rain down on enemies), which humans see on screen. However, we observed OpenAI Five learning to walk out of (though not avoid entering) active shrapnel zones, since it could see its health decreasing.


The games

Thus far OpenAI Five has played (with our restrictions) versus each of these teams:

  1. Best OpenAI employee team: 2.5k MMR (46th percentile)
  2. Best audience players watching OpenAI employee match (including Blitz, who commentated the first OpenAI employee match): 4–6k MMR (90th-99th percentile), though they’d never played as a team.
  3. Valve employee team: 2.5–4k MMR (46th-90th percentile).
  4. Amateur team: 4.2k MMR (93rd percentile), trains as a team.
  5. Semi-pro team: 5.5k MMR (99th percentile), trains as a team.

The April 23rd version of OpenAI Five was the first to beat our scripted baseline. The May 15th version of OpenAI Five was evenly matched versus team 1, winning one game and losing another. The June 6th version of OpenAI Five decisively won all its games versus teams 1–3. We set up informal scrims with teams 4 & 5, expecting to lose soundly, but OpenAI Five won two of its first three games versus both.

The teamwork aspect of the bot was just overwhelming. It feels like five selfless players that know a good general strategy.

We observed that OpenAI Five:

  • Repeatedly sacrificed its own safe lane (top lane for dire; bottom lane for radiant) in exchange for controlling the enemy’s safe lane, forcing the fight onto the side that is harder for their opponent to defend. This strategy emerged in the professional scene in the last few years, and is now considered to be the prevailing tactic. Blitz commented that he only learned this after eight years of play, when Team Liquid told him about it.
  • Pushed the transitions from early- to mid-game faster than its opponents. It did this by: (1) setting up successful ganks (when players move around the map to ambush an enemy hero—see animation) when players overextended in their lane, and (2) by grouping up to take towers before the opponents could organize a counterplay.
  • Deviated from current playstyle in a few areas, such as giving support heroes (which usually do not take priority for resources) lots of early experience and gold. OpenAI Five’s prioritization allows for its damage to peak sooner and push its advantage harder, winning team fights and capitalizing on mistakes to ensure a fast win.

Trophies awarded after the match between the best players at OpenAI and our bot team. One trophy for the humans, one trophy for the bots (represented by Susan Zhang from our team!)

Compute

OpenAI Five’s victories on Saturday, as compared to its losses at The International 2018, are due to a major change: 8x more training compute. In many previous phases of the project, we’d drive further progress by increasing our training scale. But after The International, we’d already dedicated the vast majority of our project’s compute to training a single OpenAI Five model. So we increased the scale of compute in the only way available to us: training for longer.

In total, the current version of OpenAI Five has consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over 10 realtime months (up from about 10,000 years over 1.5 realtime months as of The International), for an average of 250 years of simulated experience per day. The Finals version of OpenAI Five has a 99.9% winrate versus the TI version.[^footnote-winrate]
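For scale: a petaflop/s-day is 10^15 operations per second sustained for one day, so the training budget above works out to roughly 7×10^22 operations:

```python
PFS_DAY = 1e15 * 86400     # operations in one petaflop/s-day
total_ops = 800 * PFS_DAY  # training budget quoted above

print(f"{total_ops:.3e} operations")  # 6.912e+22 operations
```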


Architecture[edit]

Each OpenAI Five bot is a neural network containing a single-layer, 4096-unit LSTM[18] that observes the current game state extracted from the Dota developer’s API. The neural network conducts actions via numerous possible action heads (no human data involved), and every head has a semantic meaning: for instance, the number of ticks to delay an action, which action to select, or the X or Y coordinate of that action in a grid around the unit. Action heads are computed independently. The AI system observes the world as a list of 20,000 numbers and takes an action by emitting a list of eight enumeration values.[19]

OpenAI Five has been developed as a general-purpose reinforcement learning training system on the “Rapid” infrastructure. Rapid consists of two layers: the first spins up thousands of machines and lets them communicate with each other, while the second runs the training software on top of that hardware. By 2018, OpenAI Five had been playing around 180 years’ worth of games per day in reinforcement learning, running on 256 GPUs and 128,000 CPU cores,[20] using Proximal Policy Optimization, a policy gradient method.[19][21]
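The core of Proximal Policy Optimization mentioned above is its clipped surrogate objective, which limits how far a single update can push the policy. A minimal per-sample sketch (illustrative values; a real trainer averages this over large batches):

```python
# PPO clipped surrogate objective for one sample:
#   L^CLIP = min(r * A, clip(r, 1 - eps, 1 + eps) * A)
# where r is the new/old policy probability ratio and A the advantage.

def ppo_clip_objective(ratio: float, advantage: float,
                       epsilon: float = 0.2) -> float:
    clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped * advantage)

# When the new policy overweights an already-advantaged action,
# clipping caps the incentive at (1 + eps) * A:
print(ppo_clip_objective(1.5, 2.0))   # 2.4 == (1 + 0.2) * 2.0
# For a negative advantage, the min keeps the more pessimistic,
# unclipped term:
print(ppo_clip_objective(1.5, -2.0))  # -3.0
```

Capping the objective this way is what makes PPO stable enough to run at the massive batch sizes listed in the table below.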

| | OpenAI 1v1 bot (2017) | OpenAI Five (2018) |
|---|---|---|
| CPUs | 60,000 CPU cores on Microsoft Azure | 128,000 pre-emptible CPU cores on Google Cloud Platform (GCP) |
| GPUs | 256 K80 GPUs on Azure | 256 P100 GPUs on GCP |
| Experience collected | ~300 years per day | ~180 years per day |
| Size of observation | ~3.3 kB | ~36.8 kB |
| Observations per second of gameplay | 10 | 7.5 |
| Batch size | 8,388,608 observations | 1,048,576 observations |
| Batches per minute | ~20 | ~60 |

