# Domain Adaptive Imitation Learning with Visual Observation

This repository contains the under-review implementation of "Domain Adaptive Imitation Learning with Visual Observation", which is submitted to NeurIPS 2023 and provided as supplementary material for that submission.

Our implementation builds on the code of "Domain-Robust Visual Imitation Learning with Mutual Information Constraints" (ICLR 2021), which is available [here](https://github.com/Aladoro/domain-robust-visual-il).


## Requirements

We use Anaconda to set up the environment. 

The environments used in the experiments are based on OpenAI Gym and the DeepMind Control Suite.

Before installing mujoco-py and running the experiments, you need a MuJoCo activation key, which can be downloaded from https://www.roboti.us/license.html (the key expires on October 18, 2031).

We ran our experiments on a Titan Xp GPU with CUDA 10.0 and cuDNN 7.6.4 installed.


To create the conda environment with all requirements:

```
conda env create -f environment.yml
```
**Important.** If you use an NVIDIA GeForce RTX 3090, you need at least CUDA 11.4, cuDNN 8.2.4, tensorflow-gpu==2.5, and tensorflow-probability==0.12.0. In this case, install the matching versions of CUDA and cuDNN, then run the following command after installing the requirements:
```
pip install tensorflow==2.5 gym==0.15.4 tensorflow-probability==0.12.0 
```
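If you follow the RTX 3090 route, it can help to confirm that the pinned packages actually ended up in the active environment. The snippet below is a minimal sketch and not part of this repository: the names `RTX3090_PINS`, `pin_matches`, and `check_pins` are hypothetical, it assumes Python 3.8+ (for `importlib.metadata`), and its version comparison only approximates pip's `==` for plain dotted versions such as `2.5` vs `2.5.0`.

```python
"""Report whether installed packages match the RTX 3090 pins (hypothetical helper)."""
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional, Tuple

# Pins copied from the pip command above.
RTX3090_PINS = {
    "tensorflow": "2.5",
    "gym": "0.15.4",
    "tensorflow-probability": "0.12.0",
}


def _normalize(v: str) -> List[str]:
    """Split a dotted version; numeric parts lose leading zeros."""
    return [str(int(p)) if p.isdigit() else p for p in v.split(".")]


def pin_matches(installed: str, pinned: str) -> bool:
    """Approximate pip's `==` for plain versions: 2.5 matches 2.5.0, not 2.5.1."""
    a, b = _normalize(installed), _normalize(pinned)
    width = max(len(a), len(b))
    a += ["0"] * (width - len(a))  # zero-pad the shorter version
    b += ["0"] * (width - len(b))
    return a == b


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each package to (installed version or None, whether it matches the pin)."""
    report: Dict[str, Tuple[Optional[str], bool]] = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, pin_matches(installed, pinned))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins(RTX3090_PINS).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} pinned={RTX3090_PINS[name]} -> {status}")
```

Run it with `python` inside the activated conda environment; a `MISMATCH` line points to a package that likely needs to be reinstalled with the pip command above.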

To activate the conda environment:
```
conda activate d3il
```

If you have trouble rendering MuJoCo environments off-screen, installing the following packages may solve the problem:
```
sudo apt install libsm6 libxext6 libxrender-dev libosmesa6-dev libgl1-mesa-glx libglfw3
```



## [Optional] Train expert policies

We provide trained expert policies in the `sac_expert` directory, so you do not need to train them from scratch.
To train your own expert instead, delete the corresponding trained expert and run the command below.

For example, the following command trains an expert policy for the "InvertedPendulum" RL task, collects source-domain expert demonstrations, and saves the expert:
```
python -m custom_code.collect_demo.collect_data_for_imitation --demo_type=expert --env_name=InvertedPendulum-v2 --save_expert --demo_timesteps=10000
```

The following environments are available via `--env_name` for training expert policies:
* InvertedPendulum-v2
* InvertedDoublePendulum-v2
* Reacher2-v2
* Reacher3-v2
* HalfCheetah-v2 (`--demo_timesteps=20000`)
* PointUMaze-v2
* DMCartPoleBalance
* DMCartPoleSwingUp
* DMPendulum
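To train experts for several environments in one go, the commands can be generated programmatically. This is a hedged sketch rather than a script shipped with the repository: `EXPERT_ENVS`, `DEMO_TIMESTEPS`, and `expert_command` are hypothetical names, and the 10000-timestep default for every environment other than HalfCheetah-v2 is an assumption based on the InvertedPendulum example. It uses only the module path and flags shown in this README; setting `load_expert=True` swaps `--save_expert` for `--load_expert`, which corresponds to collecting demonstrations from an already saved expert.

```python
"""Build expert-training commands for the environments listed above (sketch)."""
import shlex
from typing import List

EXPERT_ENVS = [
    "InvertedPendulum-v2",
    "InvertedDoublePendulum-v2",
    "Reacher2-v2",
    "Reacher3-v2",
    "HalfCheetah-v2",
    "PointUMaze-v2",
    "DMCartPoleBalance",
    "DMCartPoleSwingUp",
    "DMPendulum",
]

# HalfCheetah-v2 uses 20000 timesteps; 10000 for the rest is an ASSUMPTION
# taken from the InvertedPendulum example command.
DEMO_TIMESTEPS = {"HalfCheetah-v2": 20000}
DEFAULT_TIMESTEPS = 10000


def expert_command(env_name: str, load_expert: bool = False) -> List[str]:
    """Return argv that trains and saves an expert, or reuses a saved one."""
    if env_name not in EXPERT_ENVS:
        raise ValueError(f"unknown env_name: {env_name!r}")
    timesteps = DEMO_TIMESTEPS.get(env_name, DEFAULT_TIMESTEPS)
    return [
        "python", "-m", "custom_code.collect_demo.collect_data_for_imitation",
        "--demo_type=expert",
        f"--env_name={env_name}",
        "--load_expert" if load_expert else "--save_expert",
        f"--demo_timesteps={timesteps}",
    ]


if __name__ == "__main__":
    # Only print the commands; subprocess.run(cmd, check=True) would execute one.
    for env in EXPERT_ENVS:
        print(shlex.join(expert_command(env)))
```

Printing instead of executing keeps the sketch safe to run; the generated lines can be pasted into a shell or passed to `subprocess.run` one at a time.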


## Collect expert demonstrations / non-expert datasets 

This command collects source-domain expert demonstrations for "InvertedPendulum" using the saved expert:
```
python -m custom_code.collect_demo.collect_data_for_imitation --demo_type=expert --env_name=InvertedPendulum-v2 --load_expert --demo_timesteps=10000
```
The environments available via `--env_name` for collecting expert demonstrations are the same as those listed above.

This command collects non-expert datasets for "InvertedPendulum" and "InvertedDoublePendulum" in both the source and target domains:
```
python -m custom_code.collect_demo.collect_data_for_imitation --demo_type=prior --realm_name=InvertedPendulum --demo_timesteps=10000
```

The following realms are available via `--realm_name` for collecting non-expert datasets:
* InvertedPendulum
* Reacher
* HalfCheetah (`--demo_timesteps=20000`)
* PointUMaze
* DMCartPoleBalance
* DMCartPoleSwingUp
* DMPendulum
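The non-expert datasets for all realms can be requested the same way. The sketch below is illustrative only: `REALMS` and `prior_command` are hypothetical names, and the 10000-timestep value for realms other than HalfCheetah is an assumption taken from the InvertedPendulum example. The module and flags are exactly those of the command above.

```python
"""Build non-expert (prior) dataset collection commands per realm (sketch)."""
import shlex
from typing import List

REALMS = [
    "InvertedPendulum",
    "Reacher",
    "HalfCheetah",
    "PointUMaze",
    "DMCartPoleBalance",
    "DMCartPoleSwingUp",
    "DMPendulum",
]


def prior_command(realm_name: str) -> List[str]:
    """Return argv collecting source- and target-domain non-expert data for a realm."""
    if realm_name not in REALMS:
        raise ValueError(f"unknown realm_name: {realm_name!r}")
    # HalfCheetah needs 20000 timesteps; 10000 elsewhere is an ASSUMPTION.
    timesteps = 20000 if realm_name == "HalfCheetah" else 10000
    return [
        "python", "-m", "custom_code.collect_demo.collect_data_for_imitation",
        "--demo_type=prior",
        f"--realm_name={realm_name}",
        f"--demo_timesteps={timesteps}",
    ]


if __name__ == "__main__":
    for realm in REALMS:
        print(shlex.join(prior_command(realm)))
```

Note that a realm name (for example `InvertedPendulum`) covers several `--env_name` values at once, which is why this list is shorter than the expert list.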


## Train feature extraction model & image generation model

This command trains the feature extraction model and the image generation model for the "InvertedPendulum-to-colored" IL task and saves the models.
```
python -m custom_code.run_imitation.run_d3il --env_name=InvertedPendulum-v2 --env_type=to_colored --exp_id=230525_0000 --gpu_id=0 --debug_save_pretrained_it_model --debug_only_pretrain
```

The available combinations of `--env_name` and `--env_type` are:
* `--env_name=InvertedPendulum-v2` &rarr; `--env_type=` `to_colored` or `to_two`
* `--env_name=InvertedDoublePendulum-v2` &rarr; `--env_type=` `to_colored` or `to_one`
* `--env_name=Reacher2-v2` &rarr; `--env_type=` `to_tilted` or `to_three`
* `--env_name=Reacher3-v2` &rarr; `--env_type=` `to_tilted` or `to_two`
* `--env_name=HalfCheetah-v2` &rarr; `--env_type=locked_legs`
* `--env_name=PointUMaze-v2` &rarr; `--env_type=to_ant`
* `--env_name=DMCartPoleBalance` &rarr; `--env_type=` `to_cartpoleswingup` or `to_pendulum`
* `--env_name=DMCartPoleSwingUp` &rarr; `--env_type=` `to_cartpolebalance`, `to_pendulum` or `to_acrobot`
* `--env_name=DMPendulum` &rarr; `--env_type=` `to_cartpolebalance`, `to_cartpoleswingup` or `to_acrobot`
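Because each `--env_name` accepts only specific `--env_type` values, a small lookup table can catch typos before a long run starts. This is a sketch under stated assumptions: `VALID_ENV_TYPES`, `pretrain_command`, and `all_pairs` are hypothetical names not found in the repository, and the table simply transcribes the list above. The generated command mirrors the pretraining example exactly.

```python
"""Validate (env_name, env_type) pairs and build pretraining commands (sketch)."""
import shlex
from typing import Dict, List, Tuple

# Transcribed from the list of available combinations above.
VALID_ENV_TYPES: Dict[str, Tuple[str, ...]] = {
    "InvertedPendulum-v2": ("to_colored", "to_two"),
    "InvertedDoublePendulum-v2": ("to_colored", "to_one"),
    "Reacher2-v2": ("to_tilted", "to_three"),
    "Reacher3-v2": ("to_tilted", "to_two"),
    "HalfCheetah-v2": ("locked_legs",),
    "PointUMaze-v2": ("to_ant",),
    "DMCartPoleBalance": ("to_cartpoleswingup", "to_pendulum"),
    "DMCartPoleSwingUp": ("to_cartpolebalance", "to_pendulum", "to_acrobot"),
    "DMPendulum": ("to_cartpolebalance", "to_cartpoleswingup", "to_acrobot"),
}


def all_pairs() -> List[Tuple[str, str]]:
    """Every supported (env_name, env_type) combination."""
    return [(name, t) for name, types in VALID_ENV_TYPES.items() for t in types]


def pretrain_command(env_name: str, env_type: str, exp_id: str, gpu_id: int = 0) -> List[str]:
    """Return argv that pretrains and saves the feature extraction / image generation models."""
    allowed = VALID_ENV_TYPES.get(env_name, ())
    if env_type not in allowed:
        raise ValueError(
            f"{env_type!r} is not valid for {env_name!r}; expected one of {allowed}"
        )
    return [
        "python", "-m", "custom_code.run_imitation.run_d3il",
        f"--env_name={env_name}",
        f"--env_type={env_type}",
        f"--exp_id={exp_id}",
        f"--gpu_id={gpu_id}",
        "--debug_save_pretrained_it_model",
        "--debug_only_pretrain",
    ]


if __name__ == "__main__":
    print(shlex.join(pretrain_command("InvertedPendulum-v2", "to_colored", "230525_0000")))
```

Failing fast on an invalid pair is cheaper than discovering the mistake after the environment and models have been constructed.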


## Train policy

This command loads the feature extraction model and the image generation model for the "InvertedPendulum-to-colored" IL task and trains the policy in the target domain with seed 0.
```
python -m custom_code.run_imitation.run_d3il --env_name=InvertedPendulum-v2 --env_type=to_colored --exp_id=230525_0000 --exp_num=0 --gpu_id=0 --debug_load_pretrained_it_model
```
The available combinations of `--env_name` and `--env_type` are the same as those listed above.
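Since the two example commands share `--exp_id=230525_0000`, a natural workflow is to pretrain once and then train the policy for several seeds against the saved models. The sketch below rests on two assumptions not stated explicitly in this README: that `--exp_id` identifies the saved pretrained models, and that `--exp_num` selects the seed (the example uses `--exp_num=0` for seed 0). The names `pipeline` and `run`, and the seeds 0 to 2 in the usage, are hypothetical.

```python
"""Pretrain once, then train one policy per seed with the same exp_id (sketch)."""
import shlex
import subprocess
from typing import List

MODULE = "custom_code.run_imitation.run_d3il"


def pipeline(env_name: str, env_type: str, exp_id: str,
             seeds: List[int], gpu_id: int = 0) -> List[List[str]]:
    """Return the pretraining command followed by one policy-training command per seed."""
    common = [f"--env_name={env_name}", f"--env_type={env_type}", f"--exp_id={exp_id}"]
    # Stage 1: train and save the feature extraction / image generation models.
    commands = [["python", "-m", MODULE, *common, f"--gpu_id={gpu_id}",
                 "--debug_save_pretrained_it_model", "--debug_only_pretrain"]]
    # Stage 2: reuse the saved models; ASSUMPTION: --exp_num selects the seed.
    for seed in seeds:
        commands.append(["python", "-m", MODULE, *common, f"--exp_num={seed}",
                         f"--gpu_id={gpu_id}", "--debug_load_pretrained_it_model"])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute sequentially (stopping on failure) unless dry_run."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(pipeline("InvertedPendulum-v2", "to_colored", "230525_0000", seeds=[0, 1, 2]))
```

Running the stages sequentially with `check=True` ensures no policy run starts if pretraining fails; pass `dry_run=False` to actually execute them.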