This code accompanies paper #2308: Off-Policy Selection for Initiating Human-Centric Experimental Design


This folder contains the code to (1) prepare the data in a sharable format for use in augmentation and evaluation; (2) train the VAE-MDP using the training data as input; and (3) evaluate policies.

Folders:
augmented_dataset -- contains the augmented trajectories
processed_data -- contains the prepared data in sharable format
raw_data -- contains original trajectories
saved_augmented_data -- contains augmented trajectories
saved_dist -- stores (pre-trained) behavior policy checkpoints that can be loaded as behavior policy
saved_models -- stores (pre-trained) checkpoints that can be loaded as trajectory augmentation models
model -- stores target (evaluation) policies
cluster_data -- stores subgroup-specific data
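
The folder layout above can be checked before running any script. The following is a minimal sketch, not part of the released code; the helper name missing_folders and the assumption that all folders sit directly under one root directory are illustrative only.

```python
import os

# Folder names copied from the list above.
EXPECTED_FOLDERS = [
    "augmented_dataset", "processed_data", "raw_data", "saved_augmented_data",
    "saved_dist", "saved_models", "model", "cluster_data",
]


def missing_folders(root, names=EXPECTED_FOLDERS):
    """Return the names in `names` that are not directories under `root`.

    Hypothetical helper: assumes every folder lives directly under `root`.
    """
    return [n for n in names if not os.path.isdir(os.path.join(root, n))]
```

Calling missing_folders(".") from the repository root returns an empty list when every folder is present.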

############################################
(1) Data preparation 
############################################

Execute data_preparation.py
Dependencies:
Python 3
numpy 1.21.2
pandas 1.3.5
csv 1.0 (Python standard library)
scikit-learn 1.0.2 (imported as sklearn)
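
The pinned versions can be compared against the active environment before running the script. This is a minimal sketch under two assumptions: each module exposes a __version__ attribute, and the keys are import names rather than pip package names. The helper check_versions is hypothetical and not part of the released code.

```python
import importlib

# Versions listed for the data-preparation step, keyed by import name.
DATA_PREP_DEPS = {
    "numpy": "1.21.2",
    "pandas": "1.3.5",
    "csv": "1.0",
    "sklearn": "1.0.2",
}


def check_versions(expected):
    """Return {module: (expected, found)} for each mismatch or missing module.

    `found` is None when the module cannot be imported or has no __version__.
    """
    problems = {}
    for name, want in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems[name] = (want, None)
            continue
        found = getattr(module, "__version__", None)
        if found != want:
            problems[name] = (want, found)
    return problems
```

An empty result from check_versions(DATA_PREP_DEPS) means the environment matches the list; the same function can be reused with the dependency lists of the later steps.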

############################################
(2) Train/Evaluate the VAE-MDP Model 
############################################

Step 1. Execute learn_behavior.py
Step 2. Execute LSTM_VAE_train.py
Step 3. Execute LSTM_VAE_eval.py
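
The three steps must run in order, and a later step should not run if an earlier one fails. The sketch below shows one way to chain them. It assumes the scripts take no command-line arguments and are run from the folder that contains them, neither of which is stated here; run_steps is a hypothetical helper.

```python
import subprocess
import sys

# Script names are taken from the steps above.
TRAIN_EVAL_STEPS = ["learn_behavior.py", "LSTM_VAE_train.py", "LSTM_VAE_eval.py"]


def run_steps(scripts, cwd=None):
    """Run each script in order with the current interpreter.

    Assumption: scripts need no arguments. Stops at the first failure.
    """
    for script in scripts:
        print("Running", script)
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run after a failed one.
        subprocess.run([sys.executable, script], check=True, cwd=cwd)
```

Calling run_steps(TRAIN_EVAL_STEPS) from the code folder executes Steps 1 to 3 in sequence.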

Dependencies:
Python 3
tensorflow 1.15.0
gym 0.21.0
numpy 1.21.2
pandas 1.3.5
csv 1.0 (Python standard library)
scikit-learn 1.0.2 (imported as sklearn)

############################################
(3) Off-policy evaluation 
############################################

Execute policy_evaluation.py

Dependencies:
Python 3
tensorflow 1.15.0
numpy 1.21.2
pandas 1.3.5
csv 1.0 (Python standard library)
gym 0.21.0

****************************************************************************************************
----------------------------------------------------------------------------------------------------
Due to the IRB protocol, neither the students' original data nor anonymized data derived from it is included in this folder.
----------------------------------------------------------------------------------------------------
****************************************************************************************************
