Abstract: In languages that use compound words such as Swedish, it is often neccessary to split compound words when indexing documents or queries. One of the problems is that it is difficult to find constituents that express a concept similar to that expressed by the compound. The approach taken here is to expand a query with the leading constituents of the compound words. Every query term is truncated so as to increase recall by hopefully finding other compounds with the leading constituent as prefix. This approach increases recall in a rather uncontrolled way, so we use a Boolean quorum-level search method to rank documents both according to a tf-idf factor but also to the number of matching Boolean combinations.
Loading