A standard API for MORL and a diverse set of reference environments.

MO-Gymnasium: Multi-Objective Reinforcement Learning Environments

MO-Gymnasium is an open source Python library for developing and comparing multi-objective reinforcement learning algorithms by providing a standard API to communicate between learning algorithms and environments, as well as a standard set of environments compliant with that API. Essentially, the environments follow the standard Gymnasium API, but return vectorized rewards as numpy arrays.

The documentation website is at mo-gymnasium.farama.org, and we have a public discord server (which we also use to coordinate development work) that you can join here: https://discord.gg/bnJ6kubTg6.

Environments

MO-Gymnasium includes environments taken from the MORL literature, as well as multi-objective versions of classical environments, such as MuJoCo.

| Env | Obs/Action spaces | Objectives | Description |
|-----|-------------------|------------|-------------|
| deep-sea-treasure-v0 | Discrete / Discrete | [treasure, time_penalty] | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasure values taken from Yang et al. 2019. |
| resource-gathering-v0 | Discrete / Discrete | [enemy, gold, gem] | Agent must collect gold or gems. Enemies have a 10% chance of killing the agent. From Barrett & Narayanan 2008. |
| fishwood-v0 | Discrete / Discrete | [fish_amount, wood_amount] | ESR environment: the agent must collect fish and wood to light a fire and eat. From Roijers et al. 2018. |
| breakable-bottles-v0 | Discrete (Dictionary) / Discrete | [time_penalty, bottles_delivered, potential] | Gridworld with 5 cells. The agent must collect bottles from the source location and deliver them to the destination. From Vamplew et al. 2021. |
| four-room-v0 | Discrete / Discrete | [item1, item2, item3] | Agent must collect three different types of items in the map and reach the goal. From Alegre et al. 2022. |
| mo-mountaincar-v0 | Continuous / Discrete | [time_penalty, reverse_penalty, forward_penalty] | Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From Vamplew et al. 2011. |
| mo-mountaincarcontinuous-v0 | Continuous / Continuous | [time_penalty, fuel_consumption_penalty] | Continuous Mountain Car env, but with a penalty for fuel consumption. |
| mo-lunar-lander-v2 | Continuous / Discrete or Continuous | [landed, shaped_reward, main_engine_fuel, side_engine_fuel] | MO version of the LunarLander-v2 environment. Objectives defined similarly to Hung et al. 2022. |
| minecart-v0 | Continuous or Image / Discrete | [ore1, ore2, fuel] | Agent must collect two types of ores while minimizing fuel consumption. From Abels et al. 2019. |
| mo-highway-v0 and mo-highway-fast-v0 | Continuous / Discrete | [speed, right_lane, collision] | The agent's objective is to reach a high speed while avoiding collisions with neighbouring vehicles and staying in the rightmost lane. From highway-env. |
| mo-reacher-v4 | Continuous / Discrete | [target_1, target_2, target_3, target_4] | Mujoco version of mo-reacher-v0, based on the Reacher-v4 environment. |
| mo-halfcheetah-v4 | Continuous / Continuous | [velocity, energy] | Multi-objective version of the HalfCheetah-v4 env. Similar to Xu et al. 2020. |
| mo-hopper-v4 | Continuous / Continuous | [velocity, height, energy] | Multi-objective version of the Hopper-v4 env. |
| water-reservoir-v0 | Continuous / Continuous | [cost_flooding, deficit_water] | A water reservoir environment. The agent executes a continuous action corresponding to the amount of water released by the dam. From Pianosi et al. 2013. |
| fruit-tree-v0 | Discrete / Discrete | [nutri1, ..., nutri6] | Full binary tree of depth d = 5, 6, or 7. Every leaf contains a fruit with a value for the nutrients Protein, Carbs, Fats, Vitamins, Minerals and Water. From Yang et al. 2019. |
| mo-reacher-v0 | Continuous / Discrete | [target_1, target_2, target_3, target_4] | [:warning: PyBullet support is limited.] Reacher robot from PyBullet with 4 different target positions. From Alegre et al. 2022. |
| mo-supermario-v0 | Image / Discrete | [x_pos, time, death, coin, enemy] | [:warning: SuperMarioBrosEnv support is limited.] Multi-objective version of SuperMarioBrosEnv. Objectives defined similarly to Yang et al. 2019. |
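Each environment's reward vector has one entry per objective listed above. As a hedged sketch (assuming mo_gymnasium is installed and that the unwrapped environment exposes a `reward_space` attribute describing the reward vector), you can check how many objectives an environment returns; the fallback value here is simply the documented objective count for deep-sea-treasure-v0:

```python
# Sketch: checking the number of objectives of an environment.
# Assumption: MO-Gymnasium environments expose `reward_space` on the
# unwrapped env; if the package is unavailable, fall back to the
# documented value for deep-sea-treasure-v0 ([treasure, time_penalty]).
try:
    import mo_gymnasium as mo_gym

    env = mo_gym.make("deep-sea-treasure-v0")
    reward_dim = env.unwrapped.reward_space.shape[0]
except ImportError:
    reward_dim = 2  # deep-sea-treasure-v0 has two objectives

print(reward_dim)
```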

Installation

To install MO-Gymnasium, use:

pip install mo-gymnasium

This does not include dependencies for all families of environments (some can be problematic to install on certain systems). You can install the dependencies for a single family with pip install "mo-gymnasium[mujoco]", or install everything with pip install "mo-gymnasium[all]".

API

As in Gymnasium, the MO-Gymnasium API models environments as simple Python env classes. Creating environment instances and interacting with them is straightforward - here's an example using the "minecart-v0" environment:

import gymnasium as gym
import mo_gymnasium as mo_gym
import numpy as np

# It follows the original Gymnasium API ...
env = mo_gym.make('minecart-v0')

obs, info = env.reset()
# but vector_reward is a numpy array!
next_obs, vector_reward, terminated, truncated, info = env.step(your_agent.act(obs))

# Optionally, you can scalarize the reward function with the LinearReward wrapper
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))

For details on multi-objective MDPs (MOMDPs) and other MORL definitions, see A practical guide to multi-objective reinforcement learning and planning.
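To make the scalarization above concrete, here is a minimal pure-NumPy sketch of what a linear scalarization such as the LinearReward wrapper computes (assumption: a weighted sum, i.e. the dot product of the reward vector with the weight vector; the `scalarize` helper is hypothetical, not part of the library):

```python
import numpy as np

def scalarize(vector_reward: np.ndarray, weight: np.ndarray) -> float:
    """Collapse a multi-objective reward into a scalar via a weighted sum."""
    return float(np.dot(vector_reward, weight))

# e.g. minecart-v0 returns a 3-dimensional reward: [ore1, ore2, fuel]
vector_reward = np.array([1.0, 0.0, -0.5])
weight = np.array([0.8, 0.2, 0.2])
print(round(scalarize(vector_reward, weight), 6))  # 0.8*1.0 + 0.2*0.0 + 0.2*(-0.5) = 0.7
```

A scalar-valued wrapper like this lets standard single-objective RL algorithms train on multi-objective environments for a fixed preference over objectives.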

More examples are available in the MO-Gym Demo in Colab notebook.

Notable related libraries

MORL-Baselines is a repository containing various implementations of MORL algorithms by the same authors as MO-Gymnasium. It relies on the MO-Gymnasium API and shows various examples of the usage of wrappers and environments.

Environment Versioning

MO-Gymnasium keeps strict versioning for reproducibility reasons. All environments end in a suffix like "-v0". When changes are made to environments that might impact learning results, the number is increased by one to prevent potential confusion.

Citing

If you use this repository in your work, please cite:

@inproceedings{Alegre+2022bnaic,
  author = {Lucas N. Alegre and Florian Felten and El-Ghazali Talbi and Gr{\'e}goire Danoy and Ann Now{\'e} and Ana L. C. Bazzan and Bruno C. da Silva},
  title = {{MO-Gym}: A Library of Multi-Objective Reinforcement Learning Environments},
  booktitle = {Proceedings of the 34th Benelux Conference on Artificial Intelligence BNAIC/Benelearn 2022},
  year = {2022}
}

Acknowledgments
