Directing the violence or admonishing it? A survey of contronymy and androcentrism in Google Translate and some recommendations

Anonymous

18 Aug 2021 | OpenReview Anonymous Preprint Blind Submission | Readers: Everyone
Keywords: Google Translate, NLP, Machine translation, bias, ethics
TL;DR: Translation biases in Google Translate
Abstract: The recent raft of high-profile gaffes involving neural machine translation has brought to light the unreliability of this evolving technology. A worrisome facet of its ubiquity is that it largely operates in a use-it-at-your-own-peril mode, where the user is often unaware of either the idiosyncratic brittleness of the underlying neural translation model or of when its translations can be deemed trustworthy and when they cannot. These revelations have worryingly coincided with other developments: the emergence of large language models that produce biased and erroneous results, albeit with human-like fluency; the use of back-translation as a data-augmentation strategy in so-called 'low-resource' settings; and the emergence of 'AI-enhanced legal-tech' as a panacea that promises 'disruptive democratization' of access to legal services. Against the backdrop of these quandaries, we present this cautionary tale, in which we shed light on the risks surrounding cavalier deployment of this technology by exploring two specific failings: androcentrism and enantiosemy. To this end, we empirically investigate the fate of pronouns and of a list of contronyms when subjected to back-translation using Google Translate. Through this, we seek to highlight the prevalence of the 'defaulting-to-the-masculine' phenomenon in gendered, profession-related translations, and to empirically demonstrate the scale and nature of the threats posed by contronymous phrases covering both current affairs and legal issues. Based on these observations, we have collected a series of recommendations that constitute the latter half of this paper. All of the code and datasets generated in this paper have been open-sourced for the community to build on here: https://github.com/rteehas/GT_study_recommendations.