Abstract: Greenwashing, a form of deceptive marketing in which organizations attempt to convince consumers that their offerings and operations are environmentally sound, can cause lasting damage to sustainability efforts by confusing consumers and eroding trust in genuine pro-sustainability actions. Capturing greenwashing “in the wild” nonetheless remains challenging, because greenwashed content frequently employs subliminal messaging through abstract semantic concepts that require subjective interpretation and must be judged against the parent company’s actual environmental performance. Moreover, the task typically presents itself as a weakly supervised set-relevance problem, in which detecting greenwashing in individual media items relies on supervisory signals available at the company level. To open up the task of detecting greenwashing in the wild to the wider multimedia retrieval community, we present a dataset that combines large-scale text and image collections, obtained from the Twitter accounts of Fortune-1000 companies, with authoritative environmental risk scores on fine-grained issue categories such as emissions, effluent discharge, resource usage, and greenhouse gas emissions. We also offer a simple baseline method that uses state-of-the-art content encoding techniques to represent social media content and to understand the connection between content and its tendency toward greenwashing.