# All Politics is Local: Redistricting via Local Fairness

## Overview

There are two major components in our experiments. The first part uses the Julia language and covers most of the computationally-intensive parts, e.g., generating ensembles, generating random spanning trees, and auditing plans using ensembles and DPs. The second part uses Python (in a Jupyter Notebook) and covers computation of compactness (Polsby-Popper scores) and all visualization and data analysis.

## Set-up

#### Required Files and Packages
To generate the ensemble for a state, we use the [GerryChain Julia package](https://docs.juliahub.com/GerryChain/UHUIf/0.1.2/).
GerryChain requires a shapefile containing geographic information for the state; we use the shapefiles from [MGGG States](https://github.com/mggg-states).
To audit a districting plan, the shapefile must additionally contain election voter data.

In addition to ```GerryChain```, this project also requires the following Julia Packages: ```TickTock```, ```Plots```, ```Shapefile```, ```Printf```, ```JLD```, ```SparseArrays```, ```Random```, ```DataFrames```, ```CSV```, ```Graphs```, ```ArgParse```,
and the following Python Packages: ```geopandas```, ```pandas```, ```numpy```, ```matplotlib```, ```seaborn```, ```sys```.

Throughout, we use North Carolina as an example state.

#### Metadata Required
To run each piece of code in this project, the shapefile path must be specified. Below is a table that lists all fields required in the metadata and their meanings. The metadata is specified at the top of each file. All paths are relative.

| Field           | Description | Required in |
| --------------- | ----------- | ----------- |
| SHAPEFILE_PATH  | path of shapefile of the precinct graph (usually .shp file but sometimes can be .json), including the file extension. | ```generate_maps.jl```, ```generate_random_spanning_tree.jl```, ```audit_all_maps.jl```, ```audit_on_trees.jl```, ```districts_to_csvs.jl```|
| DISTRICTING_PATH  | name of the common prefix of the .csv files of plans to be visualized. See examples. | ```districts_to_csvs.jl```|
| POPULATION_COL  | column name for total population. See metadata of the [MGGG State](https://github.com/mggg-states) used. Required but has no effect in ```generate_random_spanning_tree.jl```. | ```generate_maps.jl```, ```generate_random_spanning_tree.jl```, ```audit_all_maps.jl```, ```audit_on_trees.jl```, ```districts_to_csvs.jl```|
| SEED_MAP   | seed districting plan for ReCom chain. This will implicitly decide the number of districts. See metadata of the [MGGG State](https://github.com/mggg-states) used. | ```generate_maps.jl``` |
| ELECTION  | required by the ReCom API but has no effect in ```generate_maps.jl```. Use any valid election in the [MGGG State](https://github.com/mggg-states) used. | ```generate_maps.jl``` |
| CHAIN_LENGTH  | # of steps in the ReCom chain for every sampled plan.  | ```generate_maps.jl``` |
| BLUE_VOTES  | column name for the number of votes for the blue party in the election for audit. Required but has no effect in ```generate_maps.jl```.  | ```generate_maps.jl```, ```audit_all_maps.jl```, ```audit_on_trees.jl```, ```districts_to_csvs.jl```|
| RED_VOTES  | column name for the number of votes for the red party in the election for audit. Required but has no effect in ```generate_maps.jl```. | ```generate_maps.jl```, ```audit_all_maps.jl```, ```audit_on_trees.jl```, ```districts_to_csvs.jl```|
| ENSEMBLE_FILENAME  | name of file that stores the ensemble being audited, without the ```.jld``` extension.  | ```audit_all_maps.jl```, ```audit_on_trees.jl```, ```districts_to_csvs.jl``` |
| TREE_FILENAME  | name of file that stores the spanning trees, without the ```.jld``` extension.  | ```audit_on_trees.jl``` |
| OUTPUT_FILENAME  | name of the output file / common prefix of all output files, if multiple, without the ```.jld``` extension.  | ```generate_maps.jl```, ```generate_random_spanning_tree.jl```|
| NUM_MAPS  | # of plans in the current context.  | ```generate_maps.jl```, ```audit_all_maps.jl```, ```audit_on_trees.jl```, ```districts_to_csvs.jl``` |
| NUM_TREES  | # of spanning trees in the current context.  | ```generate_random_spanning_tree.jl```, ```audit_on_trees.jl``` |
| NUM_DISTRICTS  | # of districts (k) in the state  | ```audit_all_maps.jl```, ```audit_on_trees.jl```, ```districts_to_csvs.jl``` |
| EPSILON  | error parameter in population balance (default to 0.02)  | ```generate_maps.jl```, ```audit_on_trees.jl``` |
| DG_ROBUST_THRESHOLD  | minimum real strength for a deviating group found by the DP to be considered robust (default to 0.5) | ```audit_on_trees.jl``` |
| VERIFICATION_ON  | whether or not to examine each candidate deviating group to verify true population and strength (default to ```TRUE```). If set to ```FALSE```, the DP terminates once any deviating group is found. | ```audit_on_trees.jl``` |

In addition to the metadata, several files require command-line arguments, which we describe in the examples. We give examples below for ```generate_maps.jl``` (the code for generating an ensemble of plans) and ```audit_on_trees.jl``` (the code for auditing an ensemble of plans using the dynamic program).
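To make the role of ```EPSILON``` in the table above concrete, here is a minimal Python sketch of a population-balance check. It assumes the common convention that every district's population must lie within a factor of (1 ± ε) of the ideal population (total population divided by the number of districts). The function name and the sample populations are hypothetical and not part of this repository.

```python
def is_population_balanced(district_pops, epsilon=0.02):
    """Return True if every district is within (1 ± epsilon) of the ideal population.

    Assumption: "ideal" means total population / number of districts.
    """
    ideal = sum(district_pops) / len(district_pops)
    lower, upper = (1 - epsilon) * ideal, (1 + epsilon) * ideal
    return all(lower <= pop <= upper for pop in district_pops)


# Hypothetical populations for a 4-district plan (ideal population = 1000).
balanced = is_population_balanced([990, 1010, 1005, 995])
unbalanced = is_population_balanced([900, 1100, 1000, 1000])
```

With the default ε = 0.02, the first plan passes because all districts fall between 980 and 1020, while the second fails because a district of 900 is too small.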

## Running the code
Run the scripts in the following order to replicate our experiments.

#### Generating Ensemble

To generate the ensemble, run ```generate_maps.jl``` without command-line arguments. This will output a Julia dataframe (a ```.jld``` file named after ```OUTPUT_FILENAME```) **in the root directory**.
For example, assume the downloaded NC shapefile is at "Shapefiles/NC/NC_VTD.shp". Then setting the metadata as

- SHAPEFILE_PATH = "Shapefiles/NC/NC_VTD.shp"
- POPULATION_COL = "TOTPOP"
- SEED_MAP       = "CD"
- ELECTION       = "PR16"
- BLUE_VOTES     = "EL16G_PR_D"
- RED_VOTES      = "EL16G_PR_R"
- CHAIN_LENGTH   = 10
- NUM_MAPS       = 10
- OUTPUT_FILENAME= "NC_ensemble"
- EPSILON        = 0.02

will use the MGGG ReCom algorithm to generate an ensemble of 10 plans of North Carolina, each plan being the result of a 10-step ReCom chain seeded by the congressional districting plan. (We used 1,000 and 10,000 in our actual experiments.) Note that ```ELECTION```, ```BLUE_VOTES```, and ```RED_VOTES``` are required here to satisfy the ReCom API, but have no effect in our pipeline. They support extra functionality in [GerryChain](https://docs.juliahub.com/GerryChain/UHUIf/0.1.2/) that we do not use.

Note: when downloading the shapefiles from MGGG, always keep all files (there are usually about five) together in the same directory. In addition to the main **.shp** file, the ReCom package needs the other files as well.
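A quick way to catch a missing companion file before starting a long run is sketched below in Python. It assumes the standard ESRI shapefile layout, in which ```.shp```, ```.shx```, and ```.dbf``` are mandatory and ```.prj``` and ```.cpg``` are common extras; the exact set shipped by MGGG may differ per state, and the helper name is hypothetical.

```python
from pathlib import Path

# Assumption: standard ESRI shapefile companions. The first three are mandatory;
# .prj and .cpg are common optional extras.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")
OPTIONAL_EXTENSIONS = (".prj", ".cpg")


def missing_sidecar_files(shapefile_path, extensions=REQUIRED_EXTENSIONS):
    """Return the sibling files of `shapefile_path` that are not on disk."""
    base = Path(shapefile_path)
    return [base.with_suffix(ext) for ext in extensions
            if not base.with_suffix(ext).exists()]


# Example with the path used in this README; prints an empty list if all is well.
print(missing_sidecar_files("Shapefiles/NC/NC_VTD.shp"))
```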

Note: there are more useful examples of metadata in ```metadata_reference.jl```.

#### Auditing via Ensemble

To audit plans via the ensemble-based method, run ```audit_all_maps.jl``` without command-line arguments. The ensemble produced in the previous step (stored under ```OUTPUT_FILENAME```) is used as input, so specify that name as ```ENSEMBLE_FILENAME``` in the metadata of ```audit_all_maps.jl```.
This will output four files: the auditing summary and the deviating group information, each saved as both a Julia dataframe and a csv.

#### Auditing via DP

##### Generating Spanning Trees

To audit plans via the dynamic program, we need to first generate random spanning trees by running ```generate_random_spanning_tree.jl``` without command-line arguments. The number of trees to generate can be specified in the metadata; by default 100 trees are created.
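For readers unfamiliar with the idea, the following Python sketch draws one random spanning tree of a small graph. It is only an illustration under stated assumptions: this README does not say how ```generate_random_spanning_tree.jl``` samples its trees, and the approach shown here (Kruskal's algorithm on a randomly shuffled edge list) does not sample uniformly over all spanning trees. The function name and the toy graph are hypothetical.

```python
import random


def random_spanning_tree(num_nodes, edges, rng=None):
    """Return a spanning tree (list of edges) of a connected graph.

    Runs Kruskal's algorithm on a random edge order, which is equivalent to
    using i.i.d. random edge weights. Not a uniform sampler.
    """
    rng = rng or random.Random()
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    shuffled = list(edges)
    rng.shuffle(shuffled)
    tree = []
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it creates no cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree


# Toy "precinct graph": a 2x2 grid with nodes 0..3.
grid_edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
tree = random_spanning_tree(4, grid_edges, rng=random.Random(0))
```

A spanning tree of a connected graph with n nodes always has n - 1 edges, so the toy example returns three of the four grid edges.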

##### Auditing on Spanning Trees

Next, run ```audit_on_trees.jl```. Note that **this file requires arguments**. For a concrete example, setting the metadata as

- SHAPEFILE_PATH = "Shapefiles/NC/NC_VTD.shp"
- POPULATION_COL = "TOTPOP"
- NUM_MAPS       = 10
- NUM_TREES      = 100
- ENSEMBLE_FILENAME = "NC_ensemble"
- TREE_FILENAME  = "random_spanning_trees_NC"
- NUM_DISTRICTS  = 13
- BLUE_VOTES     = "EL16G_PR_D"
- RED_VOTES      = "EL16G_PR_R"
- EPSILON        = 0.02
- DG_ROBUST_THRESHOLD = 0.5
- VERIFICATION_ON     = true

will load an ensemble of 10 plans and a set of 100 spanning trees (generated by ```generate_random_spanning_tree.jl```) and set the election in question to be the 2016 presidential election. However, we do not always want to audit every plan in the ensemble against each tree. The command-line usage of ```audit_on_trees.jl``` controls (1) the actual ranges of plans and trees in use, (2) the rounding parameter p in the DP, and (3) the output filename. This allows convenient parallelization of the tasks. Run

```julia audit_on_trees.jl --sm [a] --em [b] --st [x] --et [y] -p [p] -o [output_filename]```

to audit every plan in the [a,b] range in the ensemble against every tree in the [x,y] range in the input file, using a rounding parameter of p.

Concretely,
```julia audit_on_trees.jl --sm 6 --em 10 --st 21 --et 30 -p 2000 -o "DPaudit"```
will audit 5 plans (plans 6 through 10) against 10 trees (trees 21 through 30) with p=2000, and all output files will share the common prefix "DPaudit".

```audit_on_trees.jl``` produces two files per plan audited; specifically, it outputs a table of all deviating groups found for that plan as both a Julia dataframe and a csv. In the example above, all deviating groups found for plan 10 will be stored in ```DPaudit_dgs_map_10.jld``` as well as ```DPaudit_dgs_map_10.csv```.
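Because the range flags make each invocation independent, the audit can be split into jobs mechanically. The Python sketch below only builds the command lines, using the flags documented above. It assumes that plan and tree indices are 1-based and inclusive, and it gives each tree range its own output prefix so that jobs auditing the same plan do not write to the same file; that prefix scheme and the function name are hypothetical choices, not something the scripts require.

```python
def audit_commands(num_maps, num_trees, maps_per_job, trees_per_job, p, prefix):
    """Yield one audit_on_trees.jl command (as an argument list) per job.

    Assumption: ranges are 1-indexed and inclusive, matching [a,b] and [x,y].
    """
    for sm in range(1, num_maps + 1, maps_per_job):
        em = min(sm + maps_per_job - 1, num_maps)
        for st in range(1, num_trees + 1, trees_per_job):
            et = min(st + trees_per_job - 1, num_trees)
            yield ["julia", "audit_on_trees.jl",
                   "--sm", str(sm), "--em", str(em),
                   "--st", str(st), "--et", str(et),
                   "-p", str(p),
                   # Hypothetical naming: one prefix per tree range.
                   "-o", f"{prefix}_trees_{st}_{et}"]


# Running example: 10 plans and 100 trees, split into 2 x 2 jobs.
commands = list(audit_commands(num_maps=10, num_trees=100,
                               maps_per_job=5, trees_per_job=50,
                               p=2000, prefix="DPaudit"))
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list can be passed to a job scheduler or to ```subprocess.run``` to launch the jobs in parallel.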

#### Data Analysis and Visualization
We use a Jupyter Notebook (```data-analysis.ipynb```) to integrate our data analysis and visualization process. See the notebook (and the comments therein) for further instructions. Computationally intensive steps have been commented out by default. The notebook takes the output files generated by the Julia scripts above as input, in addition to the following:

##### Converting districting plans to .csv files

To visualize the plans (and deviating groups) in our experiments, we require the districting plans to be stored as **.csv** files. These can be generated by running ```districts_to_csvs.jl```, which requires the same metadata as ```audit_all_maps.jl``` (see above) as well as ```DISTRICTING_PATH``` as the common prefix of the path to store districting information. For example, setting

- DISTRICTING_PATH = "NC_districts/districting_map_"

in ```districts_to_csvs.jl``` means the districting info of the first plan will be stored as ```NC_districts/districting_map_1.csv```; of course, this needs the folder ```NC_districts``` to exist. Note that ```districts_to_csvs.jl``` converts **all** plans in the specified ensemble into .csv files, so for the running example, it generates 10 csv files.
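Since the output folder must exist beforehand, a small Python helper like the one below can create it and list the files to expect. This is a sketch: the helper names are hypothetical, and the numbering from 1 to ```NUM_MAPS``` is inferred from the example above rather than taken from the script itself.

```python
import os


def ensure_output_folder(districting_path):
    """Create the folder part of the DISTRICTING_PATH prefix if it is missing."""
    folder = os.path.dirname(districting_path)
    if folder:
        os.makedirs(folder, exist_ok=True)


def expected_csv_paths(districting_path, num_maps):
    """List the per-plan csv paths, assuming plans are numbered 1..num_maps."""
    return [f"{districting_path}{i}.csv" for i in range(1, num_maps + 1)]


paths = expected_csv_paths("NC_districts/districting_map_", 10)
```

Call ```ensure_output_folder("NC_districts/districting_map_")``` once before running ```districts_to_csvs.jl```.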

##### Computing compactness of districts
The function ```compute_compactness``` in ```data-analysis.ipynb``` provides a convenient interface to calculate the Polsby-Popper scores of districts using ```geopandas```. We show the code to compute the minimum Polsby-Popper score among all districts in each plan in an ensemble; other usages (e.g., computing the average score, or computing the score for standalone deviating groups) should be straightforward.
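For reference, the Polsby-Popper score of a region with area A and perimeter P is 4πA / P², which equals 1 for a circle and decreases as the shape becomes less compact. The Python sketch below implements just this formula on plain numbers; it is not the notebook's ```compute_compactness```, whose implementation may differ, and the function names are hypothetical. With ```geopandas```, the areas and perimeters could be taken from ```GeoSeries.area``` and ```GeoSeries.length``` in a projected coordinate system.

```python
import math


def polsby_popper(area, perimeter):
    """Polsby-Popper compactness: 4 * pi * area / perimeter**2."""
    return 4 * math.pi * area / perimeter ** 2


def min_polsby_popper(districts):
    """Minimum score over an iterable of (area, perimeter) pairs, one per district."""
    return min(polsby_popper(a, p) for a, p in districts)


# Sanity checks on shapes with known scores.
circle = polsby_popper(math.pi * 1.0 ** 2, 2 * math.pi * 1.0)  # unit circle: 1
square = polsby_popper(1.0, 4.0)                                # unit square: pi / 4
worst = min_polsby_popper([(math.pi, 2 * math.pi), (1.0, 4.0)])
```

Taking the minimum over districts, as in ```min_polsby_popper```, reports the least compact district of a plan.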

##### Visualizing the plans and deviating groups

We give two visualization functions in ```data-analysis.ipynb```: ```plot_districting``` plots a districting plan in the ensemble, and ```plot_all_dgs``` plots the districting plan and a deviating group of that plan (found by the ensemble-based audit).

Note that the ensemble generation process contains randomness, so your results may vary.
