SAILenv is a platform that makes it easy to customize and interface 3D Virtual Environments with your Machine Learning algorithms. It is powered by Unity, and it is capable of generating frames at real-time speed, providing full pixel-wise annotations (semantic and instance labeling, depth, optical flow). It includes 3+1 pre-built scenes.
SAILenv comes with a Python API, designed to easily integrate with the most common learning frameworks.

SAILenv was developed in the context of some of the activities of the PRIN 2017 project RexLearn, funded by the Italian Ministry of Education, University and Research (grant no. 2017TWNMH2).

The original SAILenv article, “SAILenv: Learning in Virtual Visual Environments Made Simple” can be found here.

    title={SAILenv: Learning in Virtual Visual Environments Made Simple},
    author={Enrico Meloni and Luca Pasqualini and Matteo Tiezzi and Marco Gori and Stefano Melacci},
    booktitle = {International Conference on Pattern Recognition ({ICPR})},


You can also read the new SAILenv paper, “Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments” at this link.

      title={Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments}, 
      author={Enrico Meloni and Alessandro Betti and Lapo Faggi and Simone Marullo and Matteo Tiezzi and Stefano Melacci},


You can find the latest SAILenv paper, “Messing Up 3D Virtual Environments: Transferable Adversarial 3D Objects” at this link.

      title={Messing Up 3D Virtual Environments: Transferable Adversarial 3D Objects}, 
      author={Enrico Meloni and Matteo Tiezzi and Luca Pasqualini and Marco Gori and Stefano Melacci},



Pixel-wise Annotations

SAILenv yields the classic RGB view, depth information, optical flow, semantic segmentation, instance labeling, as shown in the figure above.
Each view includes information for all the pixels acquired by the camera. The RGB view uses the classic 8-bit encoding for each channel of each pixel, while depth information is inherited from the Unity engine as a gray-scale texture representing the distance of the rendered objects from the camera. In the Depth View, lighter pixels indicate elements that are closer to the agent.
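
As a minimal sketch of how a depth view could be prepared for display or saving, the snippet below rescales a depth map to 8-bit gray levels. It assumes the depth view arrives as a 2D float array in which larger values mean closer objects; the helper name `depth_to_uint8` and the toy array are illustrative and not part of the SAILenv API.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Rescale a depth map to 8-bit gray levels (0..255).

    Assumption: `depth` is a 2D float array where lighter (larger)
    values correspond to elements closer to the agent.
    """
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = float(depth.min()), float(depth.max())
    if hi == lo:
        # Flat map: avoid dividing by zero and return mid-gray.
        return np.full(depth.shape, 128, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # values now lie in [0, 1]
    return np.round(scaled * 255).astype(np.uint8)


# Toy 2x2 depth map with made-up values.
demo = np.array([[0.0, 0.5], [0.25, 1.0]])
depth_u8 = depth_to_uint8(demo)
```

The min-max rescaling is only one possible choice; a fixed scale would be preferable when gray levels must be comparable across frames.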

Each pixel is fully annotated with a category identifier (semantic labeling) and an instance identifier, which are encoded in the category and instance views, respectively, as shown above. While the instance identifier is implicitly given by the unique Unity identifier of each object in the scene, categories can be added or edited using the Unity editor GUI, without any code-level operations. In particular, categories are represented as Unity objects, and they can be attached to any other object by a drag-and-drop operation. SAILenv also includes a “category holder” to organize sets of categories and to allow the user to quickly add them to the current scene.
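
Since every pixel carries a category identifier, simple per-frame statistics follow directly from the category view. The sketch below counts how many pixels belong to each category; it assumes the view is available as an integer NumPy array, and the function name and the toy identifiers are hypothetical.

```python
import numpy as np


def category_pixel_counts(category_map: np.ndarray) -> dict:
    """Return {category_id: number_of_pixels} for one category view."""
    flat = np.asarray(category_map).ravel()
    ids, counts = np.unique(flat, return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts)}


# Toy 2x3 category view with made-up identifiers 0, 3 and 7.
toy_view = np.array([[0, 0, 3], [7, 3, 3]])
counts = category_pixel_counts(toy_view)
```

The same function works on a flattened view, because the input is raveled before counting.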


Optical Flow

SAILenv also yields highly precise, dense motion information about the environment. Unlike the most common optical flow algorithms, SAILenv does not estimate the flow by observing consecutive frames: it is fully computed by the physics engine of Unity.
Unity has access to the information about the motion of the objects in the scene and the agent viewpoint, and it uses them to drive the simulation of physics of the environment. SAILenv inherits such information and adapts it to generate a view that includes the motion vectors for all the pixels of the frame.
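
To illustrate how per-pixel motion vectors can be inspected, the following sketch splits a dense flow field into magnitude and direction. It assumes the flow view is an array of shape (height, width, 2) holding one (dx, dy) displacement per pixel; this layout, the function name and the toy values are assumptions made for the example.

```python
import numpy as np


def flow_to_polar(flow: np.ndarray):
    """Split a dense flow field into per-pixel magnitude and angle.

    Assumption: `flow` has shape (H, W, 2) with (dx, dy) on the last axis.
    The angle is returned in radians, in the range [-pi, pi].
    """
    flow = np.asarray(flow, dtype=np.float64)
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx)
    return magnitude, angle


# Toy 2x2 flow field: two moving pixels, two static ones.
toy_flow = np.zeros((2, 2, 2))
toy_flow[0, 0] = (3.0, 4.0)
toy_flow[1, 1] = (0.0, -2.0)
magnitude, angle = flow_to_polar(toy_flow)
```

Magnitude and angle are the quantities usually mapped to brightness and hue when a flow field is rendered as a color image.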

In the images shown above we report three examples of the optical flow computed in a scene populated solely by a rotating cube. The cube has a uniform color and no special textures, which makes it hard to estimate pixel-level motion using classic algorithms, while SAILenv correctly captures the rotation of the cube. Widely used implementations, such as the Farneback algorithm implemented in OpenCV, or modern approaches based on convolutional neural networks (FlowNetLite), fail to capture the motion correctly, as noticeable in the above figures. Despite its very high precision, computing the optical flow adds almost no computational burden on the Unity server, as shown in the plot below.


Object Library


Relying on the Unity engine to handle the virtual environments allows SAILenv to exploit all the facilities of the powerful 3D editor that comes with Unity.
However, creating new scenes in virtual environments can quickly become a time-consuming procedure that requires experience in 3D graphics. This is even more evident when preparing photo-realistic objects, which requires the user to pay attention to a large number of aspects in order to reach a certain target appearance for the object.

To partially mitigate these issues, SAILenv includes more than 65 objects that can be placed in any scene, plus some objects related to the structure of the sample scenes (walls, windows, etc.). See the images above for some examples of the available objects. Most object meshes were originally taken from the AI2-THOR project and heavily reworked to improve the quality of their appearance, reaching a more advanced photo-realistic level.


Ready-To-Go Scenes

SAILenv currently includes a ready-to-go Unity project with all the photo-realistic elements and 4 sample scenes based on them, meant to demonstrate the capabilities of the framework and to run some experiments in simple contexts. The user can either edit one of these scenes or create a new one, using the SAILenv objects or other 3D elements. The sample scenes depict different rooms populated with a variety of objects, including moving objects for evaluating motion-based algorithms. The agent follows a predefined motion pattern, so that it automatically moves around the scenes and explores the available areas.



Server Executables (Sample Scenes): Version Directory

Source Unity Project (Customizable): Source Code

Client Python API: Source Code (GitHub) | Pip Package

3D Models .OBJ for Adversarial Attacks: ZIP archive

Python Downloader:

Server executables and the source Unity project can also be downloaded using the Python API. These instructions will download zip files into the working directory.

import sailenv.downloader as dwn



SAILenv supports all main Operating Systems (Windows, Linux, Mac OSX).


Windows

  1. Install the Python package using pip or from source
  2. Download the executable from the version directory or with Python (see above)
  3. Double click on SAILenv.exe
  4. Run the Python Code



Linux

  1. Install the Python package using pip or from source
  2. Download the executable from the version directory or with Python (see above)
  3. If you don’t have a monitor attached to the server, you need to set up an X11 server with virtual monitors (see the example reported right after this list)
  4. Add execution permissions: chmod +x ./
  5. Run ./ on the chosen display
  6. Run the Python Code

To run X on virtual monitors, we provide the following example as a reference (Ubuntu):

# Install XServer, if not already there
sudo apt-get install xserver-xorg

# If you have an NVIDIA card, you can use nvidia-xconfig to create a valid configuration file for X11 based on virtual monitors (/etc/X11/xorg.conf):
sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024

# Run X on a display ID (that is 0 in this example)
sudo /usr/bin/X :0 &

# Select the display and run the SAILenv server executable (assuming you are in the folder in which such executable is)

# In order to allow non-sudoers to run X, edit /etc/X11/Xwrapper.config, changing the allowed_users field as follows:



Mac OSX

  1. Install the Python package using pip or from source
  2. Download the executable from the version directory or with Python (see above)
  3. If you don’t have a monitor attached to the server, you need to set up an X11 server with virtual monitors.
  4. Add execution permissions: chmod +x
  5. Double Click on
  6. Run the Python Code


Extended Tutorial

We provide an extended tutorial that will guide you in taking your first steps with SAILenv and with the Unity editor [Download].


Further Questions

If you have any unanswered questions or you just want to get in contact with us, please write to any of the following email addresses:


Sample Script: Get Data from SAILenv

import time
import numpy as np
import cv2
from random import randint

from sailenv.agent import Agent
from sailenv.utilities import draw_flow_map

def convert_color_space(array: np.ndarray):
    """
    Convert the given numpy array from RGB to BGR.

    :param array: the numpy array to convert
    :return: the converted image that can be displayed
    """
    image = cv2.cvtColor(array, cv2.COLOR_RGB2BGR)
    return image

frames: int = 1000
host = ""
if __name__ == '__main__':
    print("Generating agent...")
    agent = Agent(depth_frame_active=True,
                  category_frame_active=True, width=256, height=192, host=host, port=8085, use_gzip=False)
    print("Registering agent on server...")
    agent.register()
    print(f"Agent registered with ID: {agent.id}")
    last_unity_time: float = 0.0

    print(f"Available scenes: {agent.scenes}")

    scene = agent.scenes[2]
    print(f"Changing scene to {scene}")
    agent.change_scene(scene)

    print(f"Available categories: {agent.categories}")

    try:
        print("Press ESC to close")
        while True:
            frame = agent.get_frame()

            # get RGB view
            if frame["main"] is not None:
                main_img = convert_color_space(frame["main"])
                cv2.imshow("PBR", main_img)

            # get instance segmentation view
            if frame["object"] is not None:
                obj_img = convert_color_space(frame["object"])
                cv2.imshow("Object ID", obj_img)

            # get class segmentation view
            if frame["category"] is not None:
                k = np.array(list(agent.cat_colors.keys()))
                v = np.array(list(agent.cat_colors.values()))

                # create a lookup table from category id to color
                mapping_ar = np.zeros((np.maximum(np.max(k) + 1, 256), 3), dtype=v.dtype)
                mapping_ar[k] = v

                # color every pixel according to its category id
                cat_img = mapping_ar[frame["category"]]
                cat_img = np.reshape(cat_img, (agent.height, agent.width, 3))
                cat_img = cat_img.astype(np.uint8)
                cv2.imshow("Category ID", cat_img)

            # get optical flow view
            if frame["flow"] is not None:
                flow = frame["flow"]
                # utility for converting to HSV color space
                flow_img = draw_flow_map(flow)
                cv2.imshow("Optical Flow", flow_img)

            # get depth view
            if frame["depth"] is not None:
                depth = frame["depth"]
                cv2.imshow("Depth", depth)

            key = cv2.waitKey(1)
            if key == 27:  # ESC Pressed
                break
    finally:
        # always release the agent on the server, even after an error
        print(f"Closing agent {agent.id}")
        agent.delete()
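
The category branch of the script above builds a lookup table on every frame. As a rough sketch, the same logic can be factored into a standalone helper that runs without a SAILenv server. The function name and the toy palette are hypothetical; `cat_colors` is assumed to map integer category ids to (R, G, B) triples, in the same spirit as `agent.cat_colors`.

```python
import numpy as np


def colorize_categories(category_ids, cat_colors, height, width):
    """Turn a category view into a (height, width, 3) uint8 color image."""
    keys = np.array(list(cat_colors.keys()), dtype=np.int64)
    values = np.array(list(cat_colors.values()), dtype=np.uint8)
    # Lookup table with one row per possible id; unknown ids stay black.
    lut = np.zeros((max(int(keys.max()) + 1, 256), 3), dtype=np.uint8)
    lut[keys] = values
    flat_ids = np.asarray(category_ids, dtype=np.int64).ravel()
    return lut[flat_ids].reshape(height, width, 3)


# Made-up palette and a flattened 2x3 category view.
toy_colors = {0: (0, 0, 0), 1: (255, 0, 0), 5: (0, 128, 255)}
toy_ids = [1, 5, 0, 1, 1, 5]
image = colorize_categories(toy_ids, toy_colors, height=2, width=3)
```

When the palette does not change between frames, the lookup table could be built once outside the loop and reused.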