Editorial: Towards Omnipresent and Smart Speech Assistants

Ingo Siegert, Stefan Hillmann, Benjamin Weiss, Jessica M. Szczuka, Alexey Karpov

Published: 2022, Last Modified: 28 Mar 2025. Frontiers Comput. Sci. 2022. CC BY-SA 4.0

Abstract:

1 INTRODUCTION

The functionality of digital voice assistant systems has been constantly increasing during the last decade, and many commercial systems are available. Driven by their ease of use, the attractiveness of such devices is constantly growing, and they allow users to conduct online searches and orders as well as control smart home services by simply calling up the device (de Barcelos Silva et al., 2020; Dutsinma et al., 2022).

However, the implications of voice-based interaction are not always clear to the user, ranging from its functionality to the impact of speech as a social cue for resulting psychological effects. In the future, however, voice assistants should not only process simple commands, but also enable a natural and smooth interaction and be omnipresent. In addition to improved speech recognition, this will require enhanced speech understanding and more intelligent dialog guidance.

While state-of-the-art systems are mainly conceptualized for young adults and middle-aged people, future systems should adapt to the user in order to meet the needs of different (vulnerable) user groups, ranging from young children to the elderly. This will be accompanied by efforts to make systems more understandable and users more sophisticated. Consequently, legal aspects resulting from the spread of voice assistants and the stricter data protection regulations are important.

The goal of this Research Topic was to present the latest advances - both from academia and industry - in the area of voice assistants. It aimed at collecting research contributions from the disciplines of human-computer interaction, artificial intelligence, and human factors in order to promote interdisciplinary collaborations and cross-fertilization of ideas. More specifically, we were interested in exploring the current landscape and future directions for the emerging topic of voice assistants.
The Research Topic covers 11 articles from 34 different authors from different research fields, including linguistics, psychology, usability/user experience studies as well as the technical perspective. One apparent focus of this Research Topic was on analyzing and assessing user experience. Both different user groups and different situations are taken into account. However, we hope to see the aforementioned perspective on more sophisticated dialogs represented in the near future.

2 CONTRIBUTIONS

Cao et al. investigate how mind-based anthropomorphism influences users’ exploratory usage of intelligent personal assistants (IPA). The article describes a study that collected more than 500 valid questionnaires and reports results on the influence of cognitive and affective anthropomorphism on IPA self-efficacy and the user’s social connection to the IPA.

Carolus et al. show in an online laboratory experiment that participants feel empathy with a smart speaker when watching videos of a user interacting with such a device. This suggests a rather universal effect, as the results are independent of the participants’ gender or usage experience, and thus expands the current body of empirical results around the Media Equation (Reeves and Nass, 1996).

Cohn et al. investigate users’ speech rate adjustments during conversations with an Amazon Alexa socialbot in at-home and in-lab settings, considering automatic speech recognition (ASR) comprehension errors. They find that users speak at a slower rate when talking to the bot, and slow down even further in the in-lab setting (relative to at-home).

Cohn and Zellou present the results of a study on differences in speech adaptations (e.g., speech rate, f0 mean, and f0 variation) during pre-scripted spoken interactions with a voice-AI assistant and a human interlocutor.
The authors measured a decreased speech rate, higher average fundamental frequency (f0), and greater f0 variation for the device-directed speech.

Frommherz and Zarcone collected ecologically valid German dialog data via a crowdsourcing approach in the Wizard-of-Oz (WOZ) setting. Compared with the MultiWOZ dataset, their method for data collection led to considerably less scripting and priming in the collected dialog data.

Hirsch presents a local, low-cost, low-energy voice assistant solution including a keyword recognition algorithm and a further client system without the need of an external power supply. Among all the articles, this is the most applied work, targeting a privacy-ensuring home speech assistant.

Mavrina et al. describe a longitudinal field study on communication breakdowns between family members and a voice assistant. Their article provides a qualitative analysis of particularly interesting breakdown cases, as well as a statistical analysis combining empirical and conversational data collected with children and adults during five weeks of free interaction with a voice assistant device.

Schlomann et al. present their opinion on speech assistants for elderly people with and without cognitive disabilities. Their main argument is that the potential of speech assistants for elderly users should be raised through participatory design methods and that the resulting approaches should be verified in field studies.

Schreibelmayr and Mara conducted a randomized laboratory experiment on synthetic voices with 165 participants to explore what level of human-like realism human interactors prefer, whether the participants’ evaluations vary across different domains of application, and whether the listener’s personality has an impact on the ratings.

Wienrich and Carolus have developed an instrument called “conversational agent literacy scale” (CALS) to measure conceptualizations and competencies about conversational agents in human users. This scale consists of five sub-scales and is based on a study with 170 participants.

Wienrich et al.
found in a laboratory study that a voice assistant designed as a “specialist” is rated as more trustworthy by the users than a “generalist” in the health domain. By providing both a theoretical line of reasoning and empirical data, the study paves the way for further studies on the users’ perspective on trustworthiness in voice-based systems.

3 CONCLUSION

In conclusion, this Research Topic comprises interdisciplinary contributions and gives some examples of both theoretical and practical implications for smart voice/speech assistants. Topics range from laboratory studies on empathy or speaking behavior adjustments, through field studies on communication breakdowns, to the description of a local client voice assistant system. It therefore reflects the diversity of this rapidly developing field of research. However, the contributions also highlight unresolved questions in current research, e.g., pitfalls due to design and field study issues or a lack of studies regarding trust or acceptance.

We are aware that there is a plethora of further aspects that need to be addressed to fully achieve the aim of human-like interaction with voice assistants for all kinds of users. The articles of this Research Topic pave the way to an understanding of the role of voice assistants, so that in the future voice assistants can become an integral part of our daily life as truly intelligent assistants.