SODA

Counting Lumps in Word Space: Density as a Measure of Corpus Homogeneity

Sahlgren, Magnus and Karlgren, Jussi (2005) Counting Lumps in Word Space: Density as a Measure of Corpus Homogeneity. In: String Processing and Information Retrieval: 12th International Conference, SPIRE 2005, 2-4 Nov 2005, Buenos Aires, Argentina.

Full text not available from this repository.

Abstract

This paper introduces a measure of corpus homogeneity that indicates the amount of topical dispersion in a corpus. The measure is based on the density of neighborhoods in semantic word spaces. We evaluate the measure by comparing the results for five different corpora. Our initial results indicate that the proposed density measure can indeed identify differences in topical dispersion.

Item Type: Conference or Workshop Item (Paper)
Subjects: H. Information Systems > H.3 INFORMATION STORAGE AND RETRIEVAL
ID Code: 29
Deposited By: Userware Researcher
Deposited On: 24 Oct 2005
Last Modified: 18 Nov 2009 15:51
