The code for PPO, MVO, Tamar, MVP, and MG is the same as the code in HalfCheetah.

The code for MVPI and TD3 is the same as the code in InvertedPendulum.