CSP/
The code for contextual symbolic policy (CSP) learning.

Train the contextual symbolic policy with:
python main.py --config configs/xxx.config --spls 0.25 --target_ratio 0.002 --arch_index 0 --hard_epoch 25 --seed 0
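To repeat this run over several seeds, a small launcher can assemble the same command line. The sketch below is illustrative and not part of the repository: the helper names `build_csp_command` and `run_seed_sweep` are hypothetical, it assumes `main.py` lives in `CSP/` and is launched from the repository root, and `configs/xxx.config` remains a placeholder for an actual config file. Only the flags shown in the command above are used.

```python
import shlex
import subprocess
import sys
from typing import Dict, Iterable, List, Optional

# Flags copied from the CSP training command; "configs/xxx.config" is a placeholder.
BASE_ARGS: Dict[str, str] = {
    "--config": "configs/xxx.config",
    "--spls": "0.25",
    "--target_ratio": "0.002",
    "--arch_index": "0",
    "--hard_epoch": "25",
}


def build_csp_command(seed: int, overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the argv list for one CSP training run (hypothetical helper)."""
    args = dict(BASE_ARGS)
    if overrides:
        args.update(overrides)  # e.g. {"--spls": "0.5"} replaces the default value
    cmd = [sys.executable, "main.py"]
    for flag, value in args.items():
        cmd += [flag, str(value)]
    cmd += ["--seed", str(seed)]
    return cmd


def run_seed_sweep(seeds: Iterable[int], dry_run: bool = True) -> None:
    """Print each command; actually launch it only when dry_run is False."""
    for seed in seeds:
        cmd = build_csp_command(seed)
        print(shlex.join(cmd))
        if not dry_run:
            # Assumption: main.py sits in the CSP/ directory.
            subprocess.run(cmd, check=True, cwd="CSP")


if __name__ == "__main__":
    run_seed_sweep(range(3))  # dry run: prints three commands with seeds 0, 1, 2
```

Keeping `dry_run=True` as the default lets you inspect the generated commands before committing compute to them.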

After training, extract the symbolic expression with CSP/sym.py. CSP/mask_matrix.py computes two statistics: the average number of selected paths, and the average number of paths that are selected in at least 90 percent of cases.
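The two statistics can be illustrated on a toy binary mask matrix. This is only a sketch of one plausible reading, not the logic of CSP/mask_matrix.py: it assumes a matrix whose rows are contexts and whose columns are candidate paths (1 = selected), and the function name `mask_statistics` is hypothetical.

```python
from typing import Sequence, Tuple


def mask_statistics(masks: Sequence[Sequence[int]], threshold: float = 0.9) -> Tuple[float, int]:
    """Summarize binary path-selection masks (hypothetical helper).

    masks[i][j] == 1 means path j is selected for context i.
    Returns (average number of selected paths per context,
             number of paths selected in at least `threshold` of the contexts).
    """
    n_contexts = len(masks)
    if n_contexts == 0:
        raise ValueError("need at least one mask row")
    n_paths = len(masks[0])

    # Average count of selected paths, taken over all contexts.
    avg_selected = sum(sum(row) for row in masks) / n_contexts

    # Count paths whose selection frequency across contexts meets the threshold.
    frequent = 0
    for j in range(n_paths):
        freq = sum(row[j] for row in masks) / n_contexts
        if freq >= threshold:
            frequent += 1
    return avg_selected, frequent


if __name__ == "__main__":
    toy = [[1, 1, 0],
           [1, 0, 0],
           [1, 1, 1]]
    print(mask_statistics(toy))  # (2.0, 1): only path 0 is selected in every context
```

Averaging the second value over several trained models (for example, different seeds) would give an "average count" of frequently selected paths; whether the repository script does exactly this is an assumption.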




ESPL/
The code for single-task symbolic policy learning.

Train the symbolic policy with:
python -u sac_symbolic_v1.py --env lunar_lander

Tune the hyperparameters either by editing the defaults in sac_symbolic_v1.py or by passing them as command-line arguments, which the script parses with argparse.
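The two tuning routes relate through argparse defaults: a value given on the command line overrides the default declared in the script. The sketch below shows the pattern in isolation. Only `--env` appears in this README; `--lr` and `--seed` and their defaults are hypothetical stand-ins for whatever options sac_symbolic_v1.py actually defines.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the pattern, not the script's real options."""
    parser = argparse.ArgumentParser(description="Symbolic policy training (sketch)")
    parser.add_argument("--env", type=str, default="lunar_lander")
    # Hypothetical hyperparameters: editing these defaults is the "edit the file" route.
    parser.add_argument("--lr", type=float, default=3e-4)
    parser.add_argument("--seed", type=int, default=0)
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Values passed on the command line take precedence over the defaults above.
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse(["--env", "lunar_lander", "--lr", "1e-3"])
    print(args.env, args.lr, args.seed)  # lr is overridden, seed keeps its default
```

Because argparse adds a `--help` flag automatically, running `python sac_symbolic_v1.py --help` should list the options the script really accepts.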