SODA

Weighting Query Terms Based on Distributional Statistics

Karlgren, Jussi and Sahlgren, Magnus and Cöster, Rickard (2006) Weighting Query Terms Based on Distributional Statistics. In: Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005, Revised Papers, September, 2005, Wien, Austria. (In Press)

Full text not available from this repository.

Official URL: http://www.springeronline.com/lncs

Abstract

This year, the SICS team has concentrated on query processing and on the internal topical structure of the query, specifically compound translation. Compound translation is non-trivial due to dependencies between compound elements. We have investigated topical dependencies between query terms: if a query term happens to be non-topical or noise, it should be discarded or given a low weight when ranking retrieved documents; if a query term shows high topicality, its weight should be boosted. The two experiments described here are based on an analysis of the distributional character of query terms: one uses the similarity of occurrence contexts between query terms globally across the entire collection; the other uses the likelihood that an individual term appears topically in individual texts. The two boosting schemes are complementary, and both delivered improved results.

Item Type: Conference or Workshop Item (Paper)
ID Code: 151
Deposited By: Userware Researcher
Deposited On: 20 Nov 2005
Last Modified: 18 Nov 2009 15:54
