Meta-World is an open-source benchmark for developing and evaluating multi-task and meta-reinforcement learning algorithms on continuous-control robotic manipulation environments. It bundles several benchmarks, each designed to evaluate a different aspect of a reinforcement learning algorithm.
The documentation website is at metaworld.farama.org, and we have a public Discord server (which we also use to coordinate development work) that you can join here: https://discord.gg/bnJ6kubTg6
To install Meta-World, use pip install metaworld
We support and test for Python 3.8, 3.9, 3.10, 3.11 on Linux and macOS. We will accept PRs related to Windows, but do not officially support it.
The Meta-World API follows the Gymnasium API for environment creation and environment interactions.
To create a benchmark and interact with it:
import gymnasium as gym
import metaworld
env = gym.make("Meta-World/reach-V3")
observation, info = env.reset()
for _ in range(500):
    action = env.action_space.sample()  # random policy as a placeholder
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()
The MT1, MT10, and MT50 benchmarks are the multi-task benchmarks, used to train a single policy on 1, 10, or 50 tasks simultaneously. MT1 benchmarks can be created with any of the 50 tasks available in Meta-World. In the MT10 and MT50 benchmarks, the observations returned by the benchmark have a one-hot task ID appended to the state.
The ML1, ML10, and ML45 benchmarks are the three meta-reinforcement learning benchmarks available in Meta-World. The ML1 benchmark can be used with any of the 50 tasks available in Meta-World and tests few-shot adaptation to goal variations within a single task. The ML10 and ML45 benchmarks both test few-shot adaptation to entirely new tasks: ML10 comprises 10 training tasks and 5 test tasks, while ML45 comprises 45 training tasks and 5 test tasks.
import gymnasium as gym
import metaworld
seed = 42 # for reproducibility
env = gym.make('Meta-World/reach-V3', seed=seed) # MT1 with the reach environment
obs, info = env.reset()
a = env.action_space.sample() # randomly sample an action
obs, reward, terminated, truncated, info = env.step(a) # apply the randomly sampled action
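To track task completion, you can inspect the info dictionary returned by step. The short sketch below assumes info carries a 'success' flag, as Meta-World tasks have historically reported; adjust the key if your version differs.
import gymnasium as gym
import metaworld

# Sketch: count steps flagged as successful during a short random rollout.
# Assumes info contains a 'success' entry (0.0 or 1.0); this key name is an
# assumption based on earlier Meta-World versions.
env = gym.make('Meta-World/reach-V3', seed=42)
obs, info = env.reset()
successes = 0
for _ in range(200):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    successes += int(info.get('success', 0.0))
    if terminated or truncated:
        obs, info = env.reset()
print(f'successful steps: {successes}')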
MT10 has two different versions that can be returned by gym.make. The first is the synchronous version, where all environments run in the same process; it needs the fewest resources and is the best option for users with limited compute.
import gymnasium as gym
import metaworld
seed = 42
envs = gym.make('Meta-World/MT10-sync', seed=seed) # this returns a Synchronous Vector Environment with 10 environments
obs, info = envs.reset() # reset all 10 environments
a = envs.action_space.sample() # sample an action for each environment
obs, reward, terminated, truncated, info = envs.step(a) # step all 10 environments
Alternatively, for users with more compute, we also provide the asynchronous version of the MT10 benchmark, where each environment runs in its own process and communicates via inter-process messaging over pipes.
envs = gym.make('Meta-World/MT10-async', seed=seed) # this returns an Asynchronous Vector Environment with 10 environments
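Because the MT10 and MT50 observations have a one-hot task ID appended to the state, you can recover which task each sub-environment is running. The snippet below is a minimal sketch that assumes the one-hot block occupies the last 10 entries of each MT10 observation; check the observation space of your installed version to confirm the layout.
import gymnasium as gym
import metaworld
import numpy as np

# Sketch: read the one-hot task ID appended to each observation.
# Assumes the one-hot block is the last `num_tasks` entries per observation.
seed = 42
envs = gym.make('Meta-World/MT10-sync', seed=seed)
num_tasks = 10
obs, info = envs.reset()
print(envs.observation_space)  # batched observation space for the 10 environments
task_ids = np.argmax(obs[:, -num_tasks:], axis=1)  # one task index per sub-environment
print(task_ids)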
MT50 likewise comes in two versions of the environments, one synchronous and one asynchronous.
import gymnasium as gym
import metaworld
seed = 42
envs = gym.make('Meta-World/MT50-sync', seed=seed) # this returns a Synchronous Vector Environment with 50 environments
obs, info = envs.reset() # reset all 50 environments
a = envs.action_space.sample() # sample an action for each environment
obs, reward, terminated, truncated, info = envs.step(a) # step all 50 environments
envs = gym.make('Meta-World/MT50-async', seed=seed) # this returns an Asynchronous Vector Environment with 50 environments
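A full interaction loop over the vectorized benchmarks looks much like the single-environment case. The sketch below drives all 50 environments with random actions; Gymnasium's vector environments reset finished sub-environments automatically, so no manual reset is needed inside the loop.
import gymnasium as gym
import metaworld

# Sketch: a random-action rollout over the vectorized MT50 benchmark.
seed = 42
envs = gym.make('Meta-World/MT50-sync', seed=seed)
obs, info = envs.reset()
for _ in range(500):
    actions = envs.action_space.sample()  # one action per sub-environment
    obs, rewards, terminations, truncations, infos = envs.step(actions)
envs.close()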
Each meta-reinforcement learning benchmark has training and testing environments, which must be created separately as follows.
import gymnasium as gym
import metaworld
seed = 42
train_envs = gym.make('Meta-World/ML1-train-reach-V3', seed=seed)
test_envs = gym.make('Meta-World/ML1-test-reach-V3', seed=seed)
# use train_envs for the training procedure
# use test_envs for the testing procedure
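As a rough sketch of how the two sets are used, adaptation data comes from the training environments and the adapted policy is then scored on the held-out test environments. In the sketch below, random actions stand in for a real meta-RL algorithm, and the loop assumes the ML1 environments behave as single (non-vectorized) Gymnasium environments; the vectorized ML10/ML45 benchmarks would use batched actions as in the MT examples above.
import gymnasium as gym
import metaworld

# Sketch: random actions stand in for a meta-RL algorithm's adaptation
# (train) and evaluation (test) phases.
seed = 42
train_envs = gym.make('Meta-World/ML1-train-reach-V3', seed=seed)
test_envs = gym.make('Meta-World/ML1-test-reach-V3', seed=seed)
for phase, env in (('meta-train', train_envs), ('meta-test', test_envs)):
    obs, info = env.reset()
    episode_return = 0.0
    for _ in range(500):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        episode_return += float(reward)
        if terminated or truncated:
            obs, info = env.reset()
    print(phase, episode_return)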
Similar to the Multi-Task benchmarks, the ML10 and ML45 environments can be run in synchronous or asynchronous modes.
import gymnasium as gym
import metaworld
seed = 42
train_envs = gym.make('Meta-World/ML10-train-sync', seed=seed) # or ML10-train-async
test_envs = gym.make('Meta-World/ML10-test-sync', seed=seed) # or ML10-test-async
import gymnasium as gym
import metaworld
seed = 42
train_envs = gym.make('Meta-World/ML45-train-sync', seed=seed) # or ML45-train-async
test_envs = gym.make('Meta-World/ML45-test-sync', seed=seed) # or ML45-test-async
Finally, we also provide support for creating custom benchmarks by combining any number of Meta-World environments.
The prefix 'mt' returns goal-observable environments for multi-task reinforcement learning, while the prefix 'ml' returns partially observable environments for meta-reinforcement learning. Like the included MT and ML benchmarks, these custom environments can also be run in synchronous or asynchronous mode. To create a custom benchmark, provide a list of environment names with the suffix '-V3'.
import gymnasium as gym
import metaworld
seed = 42
envs = gym.make('Meta-World/mt-custom-sync', envs_list=['env_name_1-V3', 'env_name_2-V3', 'env_name_3-V3'], seed=seed)
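By analogy, a partially observable custom benchmark for meta-reinforcement learning would use the 'ml' prefix. The ID below ('Meta-World/ml-custom-sync') is an assumption based on the naming pattern above rather than a confirmed registration, so check the registered IDs (for example with gym.pprint_registry()) in your installed version.
# Assumed by analogy with 'mt-custom-sync' above; verify the exact registered ID first.
envs = gym.make('Meta-World/ml-custom-sync', envs_list=['env_name_1-V3', 'env_name_2-V3', 'env_name_3-V3'], seed=seed)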
We have a roadmap for future development work for Meta-World available here: Farama-Foundation#500