## CAMS: Cost-Effective Online Contextual Model Selection

### Requirements:
Install the dependencies with `pip install -r requirements.txt`.

### Datasets/Benchmarks:

We ran experiments on four datasets: {CIFAR10, DRIFT, VERTEBRAL, HIV}.
The current package contains {DRIFT, VERTEBRAL, HIV}.
Please download the CIFAR10 structured data stream file from https://gitlab.com/csconf1000paper/confdata
and unzip it into `resources/contextual_data/`, the same directory as the other datasets.

### Reproducibility:
There are three steps to reproduce the results in the paper. Please run the steps in the following order.

#### 1. Update the task name in the config.py file
(To avoid resource contention, please do not run tasks in parallel.)

For example, if you want to run task1 to reproduce the results in the main paper, please update the task in the config.py file as follows.

`task="task1"`

#### 2. Run bash file 
(The bash script must match the task set in the config.py file in step 1.)

```
# run task1
  . task1_whole_test.sh 
```

When the task finishes, it generates a log file for each benchmark or policy
and saves all result data under `resources/results/`.

#### 3. Run the plot script to visualize and reproduce the results

All the log files for the same task share the same timestamp at the end of the file name, such as `_Date-2022-05-25_Time-20-50`.

Please check the log file in your folder and run the plot command for the task.
```
python plots_task1_main_plot.py -t _Date-2022-05-25_Time-20-50
```
Then you will get the figures and associated task files in the task folder.
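
If several runs have accumulated in one folder, it can help to list the distinct timestamp suffixes before choosing one for `-t`. The following is a minimal sketch, not part of the package: the regular expression is inferred from the example suffix above, and the helper names `find_timestamps` and `timestamps_in_folder` are hypothetical.

```python
import re
from pathlib import Path

# Pattern inferred from the example suffix `_Date-2022-05-25_Time-20-50`.
# ASSUMPTION: every log file name contains a suffix of exactly this shape.
TIMESTAMP_RE = re.compile(r"_Date-\d{4}-\d{2}-\d{2}_Time-\d{2}-\d{2}")


def find_timestamps(names):
    """Return the sorted, de-duplicated timestamp suffixes found in file names."""
    found = set()
    for name in names:
        match = TIMESTAMP_RE.search(name)
        if match:
            found.add(match.group(0))
    return sorted(found)


def timestamps_in_folder(folder):
    """Scan one folder (non-recursively) for timestamped file names."""
    return find_timestamps(p.name for p in Path(folder).iterdir() if p.is_file())
```

For example, calling `timestamps_in_folder(...)` on the folder that holds your logs returns the suffixes you can pass to the plot script; the folder path depends on the task you ran.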

(We tested on a Linux server with 80 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz and 528 GB of memory in total. If your computational resources are very limited, you can run the commands in the .sh bash file one at a time and copy each timestamp (result folder path) into the plots file. You will get the same results.)

#### task9: scaling parameter
For task9, we need to run and plot each dataset experiment sequentially.

  (1) In the config file, we need to set 
```
    task="task9"
    dataset="cifar10" # one of { cifar10, drift, vetebral, hiv}
```
  (2) Run task: `task9_apply_scaling_parameter.sh`
      (comment out the tasks for the other datasets)
  (3) Plot: `plots_task9_apply_scaling_parameter.py`
      (comment out the plotting tasks for the other datasets)

```
Usage:
    -dataset {drift_contextual,cifar_contextual,HIV_contextual,VERTEBRAL_contextual}
    -stream_size Size of streaming instances in a single realization
    -budgets  Query Cost limit
    -num_reals  Number of realizations over which experiment results are averaged
    -p run single policy
    -which_methods WHICH_METHODS    List of integers of size 11 that account for the model selection methods in order: 
    ['Model Picker', 'Query by Committee', 'Structural Query by Committee', 'Random Sampling',
    'Importance Weighted Active Learning', 'Efficient Active Learning',
    "Oracle","CAMS","CAMS-MAX","contextual qbc","contextual iwal"]  # the algorithms may differ between tasks
```
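
Because `-which_methods` takes 11 integers in a fixed order, building the list by hand is error-prone. The sketch below derives it from method names. It assumes that `1` enables a method and `0` disables it, which the usage text does not state, so please confirm against the script before relying on it. `build_method_mask` is a hypothetical helper, not part of the package.

```python
# Method order copied from the usage text above; it may differ between tasks.
METHODS = [
    "Model Picker", "Query by Committee", "Structural Query by Committee",
    "Random Sampling", "Importance Weighted Active Learning",
    "Efficient Active Learning", "Oracle", "CAMS", "CAMS-MAX",
    "contextual qbc", "contextual iwal",
]


def build_method_mask(selected):
    """Return 11 integers: 1 for each selected method, 0 otherwise.

    ASSUMPTION: 1 means "run this method" and 0 means "skip it".
    """
    unknown = set(selected) - set(METHODS)
    if unknown:
        raise ValueError(f"Unknown method(s): {sorted(unknown)}")
    return [1 if name in selected else 0 for name in METHODS]


mask = build_method_mask({"CAMS", "Oracle", "Random Sampling"})
print(" ".join(str(flag) for flag in mask))
```

Looking names up in a single ordered list keeps the positions aligned with the usage text, and a misspelled method name raises an error instead of being silently dropped.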

Folder Structure:
```
    algorithms:      src/methods/
    results:         resources/results/
    datasets:        resources/contextual_data/
    figures and log: task*/
```

