Abstract: Scarcity of data and technological limitations
for resource-poor languages in developing
countries like India pose a threat to the development of sophisticated NLU systems for
healthcare. To assess the current status of various state-of-the-art language models in healthcare, this paper studies the problem by initially
proposing two healthcare datasets,
Indian Healthcare Query Intent-WebMD and
1mg (IHQID-WebMD and IHQID-1mg), and
one real-world Indian hospital query dataset, in
English and multiple Indic languages (Hindi,
Bengali, Tamil, Telugu, Marathi and Gujarati),
which are annotated with query intents as
well as entities. Our aim is to detect query intents and extract corresponding entities. We
perform extensive experiments on a set of models in various realistic settings and explore two
scenarios based on access to English data
only (less costly) and access to target-language
data (more expensive). We analyze context-specific practical relevance through empirical analysis. The results, expressed in terms of overall
F1 score, show that our approach is practically
useful for identifying intents and entities.