# Self Paced Deep Reinforcement Learning

## Installation

It is easiest to set up a virtual or conda environment to isolate the packages installed for this project from your global Python installation. We used Python 3.6.10 on Ubuntu 18.04 LTS for the experiments. You can install the required dependencies by executing
```bash
pip install -r requirements.txt
```
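A minimal sketch of the virtual-environment route, assuming a `python3` executable with the standard-library `venv` module. The helper name `setup_spdl_env` and the default directory `spdl-env` are hypothetical choices, not part of this repository.

```bash
# Hypothetical helper: create an isolated environment and install the
# point mass dependencies into it (run from the repository root).
setup_spdl_env() {
  local env_dir="${1:-spdl-env}"
  # venv reuses the interpreter that invokes it, so python3 should be the
  # Python version you want inside the environment.
  python3 -m venv "$env_dir" || return 1
  # Call the environment's own pip so nothing is installed globally.
  "$env_dir/bin/pip" install -r requirements.txt
}
```

After `setup_spdl_env`, run `source spdl-env/bin/activate` before launching any experiment. With conda, `conda create -n spdl python=3.6.10` followed by `conda activate spdl` serves the same purpose and also pins the Python version used for the reported experiments.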

This installs all packages required to run the point mass experiments. If you also want to run the ball catching experiment, execute
```bash
pip install -r requirements_ext.txt
```
This will install a wrapper for the MuJoCo simulation library. For this to work, you need to have set up MuJoCo according to [this guide](https://github.com/openai/mujoco-py).
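A minimal sketch that installs the extra requirements and then checks that the wrapper can be imported. It assumes the wrapper is `mujoco-py` (the package behind the linked guide), whose import name is `mujoco_py`; the helper name `install_ball_catching_deps` is hypothetical.

```bash
# Hypothetical helper: install the ball catching extras, then verify that the
# MuJoCo wrapper imports. The import fails if MuJoCo itself is not set up.
install_ball_catching_deps() {
  pip install -r requirements_ext.txt || return 1
  if python -c "import mujoco_py" >/dev/null 2>&1; then
    echo "mujoco_py imported successfully"
  else
    echo "mujoco_py could not be imported; check the MuJoCo setup" >&2
    return 1
  fi
}
```

Checking the import right away surfaces a missing or misconfigured MuJoCo installation before a long experiment run fails on it.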

There is a convenience script for running the experiments, **run_experiments.sh**. It takes a single argument that specifies the seed used for the experiments, so to run all experiments with seed 1, execute
```bash
./run_experiments.sh 1
```
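The results below were obtained with the first 20 seeds. A minimal sketch that calls the script once per seed, assuming those seeds are 1 through 20 and that the runs are executed one after another; the helper name `run_all_seeds` is hypothetical.

```bash
# Hypothetical helper: run the full experiment suite for seeds 1..N
# (default N=20), sequentially, stopping at the first failing run.
run_all_seeds() {
  local num_seeds="${1:-20}"
  local seed
  for seed in $(seq 1 "$num_seeds"); do
    ./run_experiments.sh "$seed" || return 1
  done
}
```

Calling `run_all_seeds` from the repository root covers seeds 1 to 20. Since each seed is independent, the runs could also be launched in parallel if enough cores are available; the sequential version simply keeps resource use predictable.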

After running the experiments for the desired number of seeds, the results can be visualized with the following commands:
```bash
python visualize_results.py --env point_mass point_mass_2d --learner ppo ppo --dist_vis point_mass_2d
python visualize_results.py --env point_mass point_mass_2d --learner trpo trpo --dist_vis point_mass_2d
python visualize_results.py --env point_mass point_mass_2d --learner sac sac --dist_vis point_mass_2d
python visualize_results.py --env ball_catching --learner ppo --dist_vis ball_catching
python visualize_results.py --env ball_catching --learner trpo --dist_vis ball_catching
python visualize_results.py --env ball_catching --learner sac --dist_vis ball_catching
```
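The six commands above differ only in the learner, so they can be generated by a loop. A minimal sketch; the function name `visualize_all` is hypothetical, and it runs the commands in a different order than listed above.

```bash
# Hypothetical helper: produce all six visualizations by looping over learners.
visualize_all() {
  local learner
  for learner in ppo trpo sac; do
    # Both point mass environments are passed together, with the learner
    # repeated once per environment, as in the commands above.
    python visualize_results.py --env point_mass point_mass_2d \
      --learner "$learner" "$learner" --dist_vis point_mass_2d
    python visualize_results.py --env ball_catching \
      --learner "$learner" --dist_vis ball_catching
  done
}
```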

The folder **pretrained_agents** contains the final policies learned in the reported experiments. You can generate the plots from the appendix with the following commands, replacing each bracketed list such as `[sac|trpo|ppo]` with one of its options:

```bash
python misc/compute_ball_catching_success_rates.py --base_log_dir pretrained_agents --learner [sac|trpo|ppo]
python misc/visualize_point_mass_trajectories.py --base_log_dir pretrained_agents --learner [sac|trpo|ppo]
python misc/compute_t_tests.py --base_log_dir pretrained_agents --env [point_mass|point_mass_2d|ball_catching]
python misc/visualize_ball_catching_policies.py --base_log_dir pretrained_agents --learner [sac|trpo|ppo] --type [self_paced|default|goal_gan|alp_gmm]
``` 
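A minimal sketch that expands the bracketed options of three of the commands above into one call per option, assuming you want every combination (a single option is enough for an individual plot). The function name `generate_appendix_plots` is hypothetical; the success-rate script follows the same per-learner pattern and is left out for brevity.

```bash
# Hypothetical helper: run the appendix scripts for every listed option.
generate_appendix_plots() {
  local base=pretrained_agents learner env type
  for learner in sac trpo ppo; do
    python misc/visualize_point_mass_trajectories.py --base_log_dir "$base" --learner "$learner"
    for type in self_paced default goal_gan alp_gmm; do
      python misc/visualize_ball_catching_policies.py --base_log_dir "$base" \
        --learner "$learner" --type "$type"
    done
  done
  for env in point_mass point_mass_2d ball_catching; do
    python misc/compute_t_tests.py --base_log_dir "$base" --env "$env"
  done
}
```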

## Results

The results obtained by running the experiments with the first 20 seeds are given in the LaTeX tables below.

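### Point Mass

The columns labelled P3D and P2D refer to the `point_mass` and `point_mass_2d` environments, respectively.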

\begin{tabular}{lcccccr}
\toprule
 & TRPO (P3D) & PPO (P3D) & SAC (P3D) & TRPO (P2D) & PPO (P2D) & SAC (P2D) \\
\midrule
ALP-GMM & $3.92\pm0.5$ & $2.34\pm0.2$ & $0.96\pm0.3$ & $6.38\pm0.3$ & $5.24\pm0.4$ & $1.15\pm0.4$ \\
GoalGAN & $0.87\pm0.3$ & $0.50\pm0.0$ & $1.08\pm0.4$ & $3.40\pm0.7$ & $1.39\pm0.5$ & $0.72\pm0.2$ \\
SPDL & $\mathbf{9.13\pm0.3}$ & $\mathbf{9.35\pm0.1}$ & $\mathbf{4.43\pm0.7}$ & $\mathbf{9.11\pm0.4}$ & $\mathbf{9.02\pm0.4}$ & $\mathbf{4.69\pm0.7}$ \\
Random & $1.05\pm0.2$ & $0.53\pm0.0$ & $0.60\pm0.1$ & $4.47\pm0.4$ & $1.34\pm0.3$ & $0.93\pm0.3$ \\
Default & $2.49\pm0.0$ & $2.46\pm0.0$ & $2.26\pm0.0$ & $2.47\pm0.0$ & $2.47\pm0.0$ & $2.23\pm0.0$ \\
\bottomrule
\end{tabular}

### Ant

\begin{tabular}{lr}
\toprule
 & PPO \\
\midrule
ALP-GMM & $209.22 \pm 44.4$ \\
GoalGAN & $847.20 \pm 97.58$ \\
SPDL & $\mathbf{1278.21 \pm 29.09}$ \\
Default & $335.12 \pm 2.19$ \\
Random & $326.45 \pm 30.60$ \\
\bottomrule
\end{tabular}

### Ball Catching

\begin{tabular}{lccr}
\toprule
 & TRPO & PPO & SAC \\
\midrule
ALP-GMM & $39.8\pm1.1$ & $46.5\pm0.7$ & $33.2\pm1.0$ \\
GoalGAN & $42.5\pm1.6$ & $42.6\pm2.7$ & $27.0\pm1.5$ \\
GoalGAN* & $45.8\pm1.0$ & $45.9\pm1.0$ & $28.1\pm1.2$ \\
SPDL & $47.0\pm2.0$ & $\mathbf{53.9\pm0.4}$ & $31.1\pm2.3$ \\
SPDL* & $43.3\pm2.0$ & $49.3\pm1.4$ & $36.9\pm1.0$ \\
Default & $21.0\pm0.3$ & $22.1\pm0.3$ & $4.67\pm0.6$ \\
Default* & $21.2\pm0.3$ & $23.0\pm0.7$ & $6.18\pm0.6$ \\
\bottomrule
\end{tabular}

