Abstract: Most low-resource language technology development is premised on the need to collect data
for training statistical models. When we follow
the typical process of recording and transcribing text for small Indigenous languages, we
hit up against the so-called “transcription bottleneck.” It is therefore worth exploring new
ways of engaging with speakers which generate
data while avoiding the transcription bottleneck.
We have deployed a prototype app for speakers to use for confirming system guesses in an
approach to transcription based on word spotting. However, in testing the
app, we encountered many new problems in
engaging with speakers. This paper presents
a close-up study of the process of deploying
data capture technology on the ground in an
Australian Aboriginal community. We reflect
on our interactions with participants and draw
lessons that apply to anyone seeking to develop
methods for language data collection in an Indigenous community.