Abstract: With the advent of large language models (LLMs), concerns about knowledge bias have increased. Prior research has focused on detecting bias in model knowledge by inserting explicit social terms, such as race, gender, and age, into inputs. However, revealing the subtle and implicit bias of model knowledge requires verification using language expressed in a more implied form, such as literary works, because literature implicitly contains the subjective filters of individuals and the regional culture they live in. Accordingly, this study probes the research question of whether LLMs exhibit a knowledge under-representation problem between two regions that share the same language: Spain and the Spanish-speaking countries of Latin America. To this end, we design an under-representation verification task, REGion and Literary Author prediction (REGLA), and a dataset based on literary works written in Spanish. Inspired by the knowledge shortcut concept from a previous study, REGLA consists of two tasks that predict meta-information of poems, i.e., region and author. Moreover, we explore various prompting methods that can unleash the knowledge observed to be under-represented during verification. According to the verification and prompt-engineering results, knowledge about the literary works of Latin American countries appears to be more under-represented in LLMs than knowledge about those of Spain. We also observe that the task decomposition prompting method effectively elicits under-represented knowledge.