# NeurIPS datasets & benchmarks track 2024 supplementary materials

We were asked to prepare the following:

- Submissions introducing new datasets must include the following in the supplementary materials (as a separate PDF):
    - Dataset documentation and intended uses. Recommended documentation frameworks include datasheets for datasets, dataset nutrition labels, data statements for NLP, data cards, and accountability frameworks.
      - **Our repository is documented with READMEs and examples. In addition, we filled in one of the suggested templates; the result is in `data_card.md`.**
    - URL to website/platform where the dataset/benchmark can be viewed and downloaded by the reviewers. 
      - **[Repository link](https://anonymous.4open.science/r/sadcode-for-review-7B75/README.md)**
    - URL to Croissant metadata record documenting the dataset/benchmark available for viewing and downloading by the reviewers. You can create your Croissant metadata using e.g. the Python library available here: https://github.com/mlcommons/croissant
      - **We do not upload a Croissant record. To keep the dataset out of model training data, we have taken great pains to keep it off the public internet: all data files in the repository are stored as encrypted zip files, with shell scripts provided for unzipping and decrypting, and for encrypting and zipping. However, we have uploaded `static_sad_data.csv`, a CSV containing those SAD tasks that are both static and whose answers do not vary from model to model.**
    - Author statement that they bear all responsibility in case of violation of rights, etc., and confirmation of the data license.
      - **See `author_statement.md`**
    - Hosting, licensing, and maintenance plan. The choice of hosting platform is yours, as long as you ensure access to the data (possibly through a curated interface) and will provide the necessary maintenance.
      - **This is discussed in `data_card.md`. In short, the dataset will be hosted in a GitHub repository, and the de-anonymized version will become available after the review period.**
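
Reviewers who want a quick look at `static_sad_data.csv` can inspect it without knowing its schema in advance. The following is a minimal sketch, assuming the file has a header row and sits in the current working directory; `summarize_csv` is a hypothetical helper written for illustration and is not part of the SAD repository.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)  # counts records after the header
    return header, row_count


if __name__ == "__main__":
    # Assumption: the CSV was downloaded into the current working directory.
    csv_path = Path("static_sad_data.csv")
    if csv_path.exists():
        columns, n_rows = summarize_csv(csv_path)
        print(f"{n_rows} rows; columns: {columns}")
    else:
        print(f"{csv_path} not found")
```

The sketch reads column names from the file itself rather than hard-coding them, since the column layout is not described here.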

Note that SAD is not a dataset but a benchmark or an evaluation.

SAD is licensed under the MIT License (rather than a Creative Commons license, since it also includes code).