# GPT2 workbench

The AdamCPR optimizer is implemented in `workbench/train/optim/adam_cpr.py`.

## Install environment
```bash
python3 -m venv env
source env/bin/activate
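pip install -r requirements.txt

# Build the fused CUDA extensions from flash-attention v2.5.8, then delete the checkout.
# Note: the printf line overwrites ~/.gitconfig and the safe.directory paths are
# machine-specific; adjust or drop those two lines outside a fresh container.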
git clone https://github.com/HazyResearch/flash-attention \
    && printf "[safe]\n\tdirectory = /flash-attention" > ~/.gitconfig \
    && git config --global --add safe.directory /home/user/flash-attention \
    && cd flash-attention && git checkout v2.5.8 \
    && cd csrc/fused_softmax && pip install . && cd ../../ \
    && cd csrc/rotary && pip install . && cd ../../ \
    && cd csrc/xentropy && pip install . && cd ../../ \
    && cd csrc/layer_norm && pip install . && cd ../../ \
    && cd csrc/fused_dense_lib && pip install . && cd ../../ \
    && cd csrc/ft_attention && pip install . && cd ../../ \
    && cd .. && rm -rf flash-attention
# Install this repository in editable mode.
pip install -e .
```

## Run a test experiment for AdamW with a small model on TinyStories
```bash
python train_gpt_model.py -c default_config.yaml optimizer=AdamW weight_decay=0.1 model.model_dim=128 model.num_head=2 model.n_layers=2 data.dataset_name=tinystories data.max_sample_len=256 data.batch_size=8
```

## Run GPT2s experiments for AdamW
```bash
python train_gpt_model.py -c default_config.yaml optimizer=AdamW weight_decay=0.1
```

## Run a test experiment for AdamCPR with a small model on TinyStories
```bash
python train_gpt_model.py -c default_config.yaml optimizer=AdamCPR optim.cpr_config.kappa_init_method=inflection_point model.model_dim=128 model.num_head=2 model.n_layers=2 data.dataset_name=tinystories data.max_sample_len=256 data.batch_size=8
```
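
The AdamW and AdamCPR test runs use the same small-model and TinyStories overrides, so it may be convenient to keep them in one place. The following is only a minimal sketch: `SMALL_OVERRIDES` and `small_test_cmd` are hypothetical names that are not part of this repository, and the helper only prints each command (a dry run) rather than starting training.

```bash
# Shared overrides for the small test model, copied from the test commands in this README.
SMALL_OVERRIDES="model.model_dim=128 model.num_head=2 model.n_layers=2 data.dataset_name=tinystories data.max_sample_len=256 data.batch_size=8"

# Hypothetical helper: print the training command for the given optimizer overrides.
small_test_cmd() {
    echo "python train_gpt_model.py -c default_config.yaml $* $SMALL_OVERRIDES"
}

# Dry run: print both test commands without executing them.
small_test_cmd optimizer=AdamW weight_decay=0.1
small_test_cmd optimizer=AdamCPR optim.cpr_config.kappa_init_method=inflection_point
```

To launch a run instead of printing it, pipe the output to the shell, for example `small_test_cmd optimizer=AdamW weight_decay=0.1 | bash`.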

## Run GPT2s experiments for AdamCPR with Kappa-IP
```bash
python train_gpt_model.py -c default_config.yaml optimizer=AdamCPR optim.cpr_config.kappa_init_method=inflection_point
```

## Run GPT2s experiments for AdamCPR with Kappa-WS
```bash
python train_gpt_model.py -c default_config.yaml optimizer=AdamCPR optim.cpr_config.kappa_init_method=warm_start optim.cpr_config.kappa_init_value=5000
```
