Is natural language an inconvenience or an opportunity for IR?

2002 (modified: 11 Nov 2022), SIGIR 2002
Abstract: Natural language (NL) has evolved to facilitate human communication. It enables the speaker to make the listener's mind wander among her experiences and mental associations roughly according to the intentions of the speaker. The speaker and the listener usually share experiences and expectations, and they use mostly the same units and rules of a shared NL. Written language functions similarly, but in a less interactive way, with fewer possibilities for feedback.

Both the symbols of NL (i.e. words or morphemes) and their arrangements are meaningful. Their meanings are not universal or precise, but they are similar enough among different speakers and accurate enough for communication mostly to succeed.

NLs are mostly very large systems, with hundreds of thousands of words and infinitely many possible utterances. Even inflection alone may produce huge numbers of forms, e.g. more than ten thousand distinct forms from every Finnish verb entry.

NL processing (for IR or any other purpose) must cope with phenomena like (1) inflection and compounding, (2) synonymy, (3) polysemy, (4) ambiguity, (5) anaphora and (6) head-modifier relations among words and phrases.

Language technology can neutralize much of the effect of these 'inconveniences' inherent in NL, but what kinds of advantages could NL have? Redundant use of synonymous expressions can effectively identify new concepts. Multilingual parallel documents may help in identifying their exact content. NLs typically carry connotations, i.e. what is implied but not explicitly said (e.g. attitudes, politeness). Vague associations are easy to express in NL, but not always in formal systems (e.g. "a few years ago there was an article about the rival of Yeltsin - I don't remember his name but - he then went over to some region in Siberia - but what did the guy promise?"). Jokes and humor belong to NLs, not to formal systems.

Are there any alternatives to NL? Not really, because artificial and more precise formalisms fail to adapt to new concepts, and they do not easily allow restructuring of previous ideas.

One challenge for language technology is to find better solutions for the above 'inconveniences' in order to provide various IR, document classification, indexing and summarizing methods with more accurate and adequate input data. With more accurate input, some of the more demanding tasks of IR can perhaps be solved.