# To be filled by the author(s) at the time of submission
# -------------------------------------------------------

# Title of the article:
#  - For a successful replication, it shoudl be prefixed with "[Re]"
#  - For a failed replication, it should be prefixed with "[¬Re]"
#  - For other article types, no instruction (but please, not too long)
title: "[Re] Exploring the Explainability of Bias in Image Captioning Models"

authors:
  - name: Marten Türk
    orcid: 0009-0002-9142-2753
    email: martentyrk@gmail.com
    affiliations: 1

  - name: Luyang Busser
    orcid: 0009-0004-4565-390X
    email: luyang.busser@student.uva.nl
    affiliations: 1,*

  - name: Daniël van Dijk
    orcid: 0009-0004-2571-842X
    email: danielvandijk65@gmail.com
    affiliations: 1
    
  - name: Max J.A. Bosch
    orcid: 0009-0005-6736-1390
    email: max.bosch2002@gmail.com
    affiliations: 1

# List of affiliations with code (corresponding to author affiliations), name
# and address. You can also use these affiliations to add text such as "Equal
# contributions" as name (with no address).
affiliations:
  - code:    1
    name:  University of Amsterdam
    address:  Amsterdam, The Netherlands
  - code:    1
    name:  Equal contribution
    address:  ""

# List of keywords (adding the programming language might be a good idea)
keywords: rescience c, machine learning, deep learning, python, pytorch, reproducibility, fairness, gender, race, LIC, LIC score, integrated gradients, MSCOCO, Captum, bias amplification, encoders, captioning models, LSTM, BERT

# Code URL and DOI (url is mandatory for replication, doi after acceptance)
code:
  - url: https://github.com/martentyrk/mlrc2022hirota
  - doi:
  - swh: swh:1:dir:2b9535456facf442ee411ab4cae5d3c4ce1cc29e
# Date URL and DOI (optional if no data)
data:
  - url: 
  - doi: 

# Information about the original article that has been replicated
replication:
 - cite: "Hirota, Yusuke and Nakashima, Yuta and Garcia, Noa. “Quantifying Societal Bias Amplification in Image Captioning”, CVPR, 2022."
 - bib: hirotapaper
 - url: https://arxiv.org/abs/2203.15395
 - doi: 10.48550/arXiv.2203.15395

# Don't forget to surround abstract with double quotes
abstract: "Scope of Reproducibility
The main objective of this paper is to reproduce and verify the following claims made in the original paper: (1) According to the LIC metric, all evaluated image captioning models amplify gender and racial bias, (2) the proposed LIC metric is robust against encoders, and (3) captioning model NIC+Equalizer amplifies gender bias beyond baseline.

Methodology
We reproduced the results of the original authors with only minor modifications to the code they made available. We contribute to their research by highlighting a noteworthy limitation in the used data split and propose an integrated gradients
method to increase explainability, allowing users to understand predictions better using
the Captum library for Pytorch. As for the computational requirements, all experiments
were run on a cluster with a NVIDIA Titan RTX GPU and the time required to run a total
of 720 models was ∼98 hours.

Results
The results we obtained showed the same patterns as in the original authors work. All our results were in the range of ±1 LIC score units compared to the original
work, which supports the claims on the gender and racial bias amplification, robustness
against encoders, and amplification by NIC+Equalizer beyond baseline. As for our contributions, we show that the attribution scores obtained by using integrated gradients follow similar patterns in terms of gender amplification for all evaluated language models, providing additional support for the proposed LIC metric.

During data set analysis we observed a leakage in the original data split being used, resulting in identical captions occurring multiple times in both the training and test set.
The removal of already seen captions during training from the test set reduced its size by 62.4% on average and caused a decline in LIC_M scores of approximately 5 units.

What was easy
Reproducing the results using the original provided code offered no difficulties.

What was difficult
Finding a useful angle of contribution to the paper proved to be challenging. After we had decided upon using our selected explainability method, implementing and modifying existing code was more work than expected."

# Bibliography file (yours)
bibliography: bibliography.bib
  
# Type of the article
# Type can be:
#  * Editorial
#  * Letter
#  * Replication
type: Replication

# Scientific domain of the article (e.g. Computational Neuroscience)
#  (one domain only & try to be not overly specific)
domain: ML Reproducibility Challenge 2022

# Coding language (main one only if several)
language: Python

  
# To be filled by the author(s) after acceptance
# -----------------------------------------------------------------------------

# For example, the URL of the GitHub issue where review actually occured
review: 
  - url: https://openreview.net/forum?id=N9Wn91tE7D0

contributors:
  - name:
    orcid: 
    role: editor
  - name: Anonymous Reviewers
    orcid:
    role: reviewer
  - name:
    orcid:
    role: reviewer

# This information will be provided by the editor
dates:
  - received:
  - accepted:
  - published: 

# This information will be provided by the editor
article:
  - number: 1 # Article number will be automatically assigned during publication
  - doi:    # DOI from Zenodo
  - url:    # Final PDF URL (Zenodo or rescience website?)

# This information will be provided by the editor
journal:
  - name: "ReScience C"
  - issn: 2430-3658
  - volume: 9
  - issue: 2