The MVPI and TD3 implementations are the same as the code in InvertedPendulum.