SODA

Compound terms and their constituent elements in information retrieval

Karlgren, Jussi (2005) Compound terms and their constituent elements in information retrieval. In: 15th Nordic Conference of Computational Linguistics, 20-21 May 2005, Joensuu. (In Press)

Full text not available from this repository.

Abstract

Compounds, especially in languages where compounds are formed by concatenation without intervening whitespace between elements, pose challenges to simple text retrieval algorithms. Search queries that include compounds may not retrieve texts where elements of those compounds occur in uncompounded form; search queries that lack compounds will not retrieve texts where the salient elements are buried inside compounds. This study explores the distributional characteristics of compounds and their constituent elements using Swedish, a compounding language, as a test case. The compounds studied are taken from experimental search topics given for CLEF, the Cross-Language Evaluation Forum. Their distributions are related to relevance assessments made on the collection under study and are evaluated in terms of divergence from an expected random distribution over documents. The observations made have direct ramifications for, e.g., query analysis and term weighting approaches in information retrieval system design.
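The abstract does not say which divergence measure the study uses. As an illustration only, one common way to quantify how far a term departs from a random spread over documents is to compare its observed document frequency with the value a Poisson model would predict from its total number of occurrences. The function names and the counts below are hypothetical and are not taken from the paper.

```python
import math


def expected_doc_freq(collection_freq, num_docs):
    """Expected number of documents containing the term at least once,
    assuming its occurrences are scattered at random (Poisson model)."""
    return num_docs * (1.0 - math.exp(-collection_freq / num_docs))


def clustering_ratio(doc_freq, collection_freq, num_docs):
    """Ratio of expected to observed document frequency.
    Values above 1 mean the term is concentrated in fewer documents
    than a random spread would predict."""
    return expected_doc_freq(collection_freq, num_docs) / doc_freq


# Hypothetical counts: a term occurring 50 times in a 1000-document
# collection, but found in only 10 distinct documents.
ratio = clustering_ratio(doc_freq=10, collection_freq=50, num_docs=1000)
print(f"clustering ratio: {ratio:.2f}")
```

Under this sketch, a compound and each of its constituent elements could be scored separately and the resulting ratios compared; whether that matches the study's actual procedure is an assumption.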

Item Type: Conference or Workshop Item (Poster)
Subjects: I. Computing Methodologies > I.7 DOCUMENT AND TEXT PROCESSING (H.4, H.5)
ID Code: 13
Deposited By: Userware Researcher
Deposited On: 20 Oct 2005
Last Modified: 18 Nov 2009 15:51
