README

This repository contains the code for the submission entitled "Affinity-Aware Graph Networks." It includes code for data generation (along with the graph utilities used to compute affinity measures), the models, and the training script, each described below. To run the code you may need to install certain Python libraries, including but not limited to the following (all of which can be installed via pip):

-cloudpickle
-jax
-jaxlib
-jraph
-ml_collections
-scipy
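
As a quick check before running anything, the following minimal sketch reports which of the libraries listed above cannot be found in the current Python environment. It is not part of the repository; the names REQUIRED and missing_packages are hypothetical, and the sketch assumes each library's import name matches the name in the list.

```python
import importlib.util

# Import names for the libraries listed above (assumed to match the
# names used when installing them via pip).
REQUIRED = ["cloudpickle", "jax", "jaxlib", "jraph", "ml_collections", "scipy"]


def missing_packages(names=REQUIRED):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```

Because the list above is not exhaustive, a clean result here does not guarantee that every import in the code will succeed.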



*Data Generation*

The main script is pna_to_jraph.py, which requires the input data file 'multitask_dataset.pkl' in the data/raw/ subdirectory. It converts this raw data file into the output file 'pna_jraph.pkl' in the data/ subdirectory. The generated file contains the PNA dataset in Jraph format and is used by the training script.

This step may be skipped if you already have a pregenerated 'pna_jraph.pkl' file, which can be copied into the data/ subdirectory. We have provided both the raw data file and the processed data file below, as part of anonymized repositories:

Raw data file: https://drive.google.com/file/d/1G2vL6BgVmiqTSIt1z2FQRui9IxSRG3Bk/view?usp=sharing
Processed data file: https://drive.google.com/file/d/14lNhjpym2x8WGYx9lLOQ2pMzCz748GkU/view?usp=sharing
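
To decide whether the data generation step is still needed, one can check which of the two files described above is already in place. The sketch below is an illustration only and is not part of the repository: the function data_status and its return values are hypothetical, and the paths are taken from the description above, relative to the repository root.

```python
from pathlib import Path

# Locations described above, relative to the repository root.
RAW_FILE = Path("data") / "raw" / "multitask_dataset.pkl"
PROCESSED_FILE = Path("data") / "pna_jraph.pkl"


def data_status(repo_root="."):
    """Report which data step, if any, still has to be done.

    Returns one of:
      "ready"          - the processed file exists and can be used for training.
      "needs_convert"  - only the raw file exists; run pna_to_jraph.py.
      "needs_download" - neither file exists; obtain one of them first.
    """
    root = Path(repo_root)
    if (root / PROCESSED_FILE).is_file():
        return "ready"
    if (root / RAW_FILE).is_file():
        return "needs_convert"
    return "needs_download"


if __name__ == "__main__":
    print(data_status())
```

The check only tests for the presence of the files; it does not verify their contents.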



*Training*

The main script is train.py, which takes a configuration file 'config.py' that specifies the model configuration (e.g., which affinity features to use, the model architecture, and the number of layers). We have provided a sample 'config.py' file that uses all affinity features. The script also requires the 'pna_jraph.pkl' file described above.
