# Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes

## Setup
Install the necessary Python modules:

```
pip3 install -r requirements.txt
```

There's an error in the `PoissonBinomial` class from the `poisson_binomial` package that needs to be fixed locally before running our code. The source code for the `PoissonBinomial` class reads:
```
from scipy import fft
```
but it should read:
```
from scipy.fft import fft
```
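
To find the installed file that needs this edit, a small helper like the one below may be useful. It is only a sketch: it assumes the package is importable under the name `poisson_binomial`, and `locate_module_file` is a hypothetical helper, not part of this repository. The class itself may live in a sibling file in the same directory as the path that is printed.

```python
import importlib.util
from typing import Optional


def locate_module_file(module_name: str) -> Optional[str]:
    """Return the path of the file that defines module_name, or None if it is not installed."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Assumption: the import name matches the package name given above.
    path = locate_module_file("poisson_binomial")
    if path is None:
        print("poisson_binomial is not installed in this environment")
    else:
        print(f"Look for the import to edit starting from: {path}")
```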

## Overall summary of code to run:
The following commands, executed from the src directory, will generate all the results included in our paper "Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes":
- `python3 hapi_analysis.py {dir_name} --by_year --by_hardness --polarization_on_failures`
- `python3 hapi_improvements.py {dir_name}`
- `python3 derm_analysis.py {dir_name}`
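
If you prefer to launch the three commands in one go, they could be wrapped in a short driver such as the sketch below. `build_commands` and `run_all` are hypothetical names introduced for illustration; the script assumes it is started from the `src` directory and that `python3` is on your `PATH`.

```python
import subprocess
from typing import List


def build_commands(dir_name: str) -> List[List[str]]:
    """Build the three command lines listed above for a given results directory name."""
    return [
        ["python3", "hapi_analysis.py", dir_name,
         "--by_year", "--by_hardness", "--polarization_on_failures"],
        ["python3", "hapi_improvements.py", dir_name],
        ["python3", "derm_analysis.py", dir_name],
    ]


def run_all(dir_name: str) -> None:
    """Run each command in order; check=True stops at the first failing script."""
    for cmd in build_commands(dir_name):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all("my_run")` would then execute the scripts in sequence. Note that `derm_analysis.py` needs the DDI data, so that last step will fail if you do not have access to it.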

Additionally, the `Paper Results Graphs.ipynb` notebook generates many of the graphs used in our paper; it requires that you run the above scripts first. The `ferplus_ambiguity_analysis.ipynb` and `hapi_outcome_examples.ipynb` notebooks also generate figures that we use in the paper.

Further description of what each script does is available in the following sections.

## HAPI profile polarization (Section 3.2: Ecosystem-level Behavior, Figure 2)
The `hapi_analysis.py` script generates profile polarization results. Running `python3 hapi_analysis.py {dir_name} --by_year` generates the rate of homogenization (defined in [Bommasani et al. (2022)](https://openreview.net/forum?id=-H6kKm4DVo)), the error rates for all models for all years, and the histogram of observed vs. expected outcomes. These results are saved in 3 separate pickle files named `homogenization_results.pickle`, `error_rates.pickle`, and `histograms.pickle`, respectively. The pickle files are written to the directory `results/{dir_name}/by_year`.
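
To take a quick look at these files outside the notebook, something along the following lines should work. This is a minimal sketch: `load_by_year_results` and the `results_root` parameter are hypothetical, and the structure of the pickled objects is not documented here, so the example only loads them. Unpickling may also require the repository's modules to be importable, and you should only unpickle files you generated yourself.

```python
import pickle
from pathlib import Path
from typing import Any, Dict

# File names as listed in the paragraph above.
RESULT_FILES = (
    "homogenization_results.pickle",
    "error_rates.pickle",
    "histograms.pickle",
)


def load_by_year_results(dir_name: str, results_root: str = "results") -> Dict[str, Any]:
    """Load the three by-year pickle files into a dict keyed by file name."""
    base = Path(results_root) / dir_name / "by_year"
    loaded: Dict[str, Any] = {}
    for file_name in RESULT_FILES:
        with open(base / file_name, "rb") as handle:
            loaded[file_name] = pickle.load(handle)
    return loaded
```

For example, `for name, obj in load_by_year_results("my_run").items(): print(name, type(obj))` prints the type of each loaded object, which is a reasonable first step before exploring its contents.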

Once these results have been generated, you can use `Paper Results Graphs.ipynb` to replicate the figure in the paper.

## Examples of systemic failures (Section 3.2: Ecosystem-level Behavior, Figure 3)
In `hapi_outcome_examples.ipynb` we include examples of each ecosystem-level outcome from one dataset for each modality in the HAPI dataset. This notebook requires that you download the underlying datasets. We've included code to programmatically download the datasets from Kaggle where possible, and left links to the datasets where it is not.


## HAPI improvements (Section 4: Do Model Improvements Improve Systemic Failures, Figure 4)
The `hapi_improvements.py` script generates histograms and results for model-user ecosystems in which one of the models changes in accuracy over time. When you run `python3 hapi_improvements.py {dir_name}`, the script generates 4 plots for each model that experiences a change in accuracy (in single-label datasets):
- A plot of the error rates (`{dataset name} error rates.png`).
- Heatmap- and histogram-style graphs of the user-model ecosystem in the year before and after the improvement (`Task / {task}, Dataset / {dataset}, Years / {year before, year after}.png`).
- Histograms of the distribution of other-model outcomes among the instances that the model improved on or declined at, depending on whether the model net improved or declined (`{'improvements'/'declines'}_{year before} instances of {improved model} on {task} task, {dataset} dataset.png`).
- Histograms of the distribution of other-model outcomes on the instances that the improved model failed at in the year before and after the improvement (`failures_{year before} instances of {improved model} on {task} task, {dataset} dataset.png`).

Figure 4a contains the following visualization: `results/{dir_name}/sa/waimai/improvements_20 instances of amazon_sa on sa task, waimai dataset.png`.

Figure 4b is generated within `Paper Results Graphs.ipynb` in the 'Model improvements don't improve systems' section of the notebook.

## Dermatology polarization (Section 5: Ecosystem-level Analysis in Dermatology (DDI), Figure 5)
The `derm_analysis.py` script generates results on the DDI dataset; predictions from models and dermatologists on the DDI dataset were given to us by [Daneshjou et al. (2022)](https://ddi-dataset.github.io/index.html). Pursuant to a Research Use Agreement, we cannot share the data they gave us; other researchers interested in using the dataset should contact the original authors.

If you have the data, you can run `python3 derm_analysis.py {dir_name}` (change the file name in the data_loader(...) method and the data folder path in the init method first) to generate results. The figures we use in Figure 5 are located at `{dir_name}/dermatology/all/no_split/rows=all,by=no_split, cols=models, group=all_rows.png` and `{dir_name}/dermatology/all/no_split/rows=all,by=no_split, cols=humans, group=all_rows.png`.

## Dermatology race results (Section 5: Ecosystem-level Analysis in Dermatology (DDI), Figure 6)
Having run `python3 derm_analysis.py {dir_name}`, you can use `Paper Results Graphs.ipynb` to generate the racial disparities figures (Figure 6) in the section titled "Dermatology: Racial Disparities in Profile Polarization"; just ensure that you change the name of the results folder to the `{dir_name}` that you use.

## Appendix: Ferplus Ambiguity Analysis (Appendix A.1: Data-centric Explanations for Homogeneous Outcomes, Figure 7 & 8)
Figures 7 and 8 can be generated by running `ferplus_ambiguity_analysis.ipynb`. This requires downloading the FERPlus crowdsourcer annotations from https://raw.githubusercontent.com/microsoft/FERPlus/master/fer2013new.csv.
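
The annotations file can also be fetched from Python, for instance with the sketch below. `download_file` is a hypothetical helper, and the destination path is an assumption: this README does not state where the notebook expects the CSV, so check the notebook's loading cell and adjust accordingly.

```python
import urllib.request
from pathlib import Path

# URL as given in the paragraph above.
FERPLUS_URL = "https://raw.githubusercontent.com/microsoft/FERPlus/master/fer2013new.csv"


def download_file(url: str, dest: str) -> Path:
    """Download url to dest, creating parent directories if needed, and return the path."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        dest_path.write_bytes(response.read())
    return dest_path
```

Calling `download_file(FERPLUS_URL, "fer2013new.csv")` saves the file in the current working directory.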

## Appendix: Dermatologist Accuracy Analysis (Appendix A.2: Data-centric Explanations for Homogeneous Outcomes, Figure 9)
Figure 9 can be generated after running `python3 derm_analysis.py {dir_name}`. The results will be stored in `{dir_name}/dermatology/all/by_human_accuracy/`.

## Appendix: Expressive Models (Appendix A.3: Data-centric Explanations for Homogeneous Outcomes, Figure 10 & 11)
You can run `python3 hapi_analysis.py {dir_name} --by_hardness` to execute the grid search of alpha and Delta values that we describe in section A.3. You can replicate the graphs by running code in the `Alpha Delta Simulations` section of `Paper Results Graphs.ipynb`.

## Appendix: Leader-Following Effects in Profile Polarization (Appendix B.2: Leader Following Effects in Profile Polarization, Figure 12)
Results displayed in Figure 12 can be generated by running `python3 hapi_analysis.py --polarization_on_failures {dir_name}`. The figure is created in `Paper Results Graphs.ipynb` in the section "Polarization After Observing a Single Failure".

## Appendix: Hapi Improvements Threshold (Appendix B.3: Model improvement analysis is insensitive to threshold, Figure 13)
Results displayed in Figure 13 are generated by `hapi_improvements.py` (see the HAPI improvements section). The figure is generated in `Paper Results Graphs.ipynb`.

## Appendix: Hapi Improvements Net Improvements (Appendix B.4: Net Improvements, Figure 14)
The results in Figure 14 are generated when running `hapi_improvements.py` (see the HAPI improvements section); the graph is created in `Paper Results Graphs.ipynb`.

## Appendix: Dermatology results with HAM10k (Figure 15)
You can generate dermatology results with HAM10K included by running `python3 derm_analysis.py {dir_name} --ham10k`. Figure 15a is located at `{dir_name}/dermatology/all/no_split/rows=all,by=no_split, cols=models, group=all_rows.png`. Figure 15b is created within `Paper Results Graphs.ipynb`. Use a different `{dir_name}` than the one used for the dermatology results without HAM10K to avoid overwriting those results.

