Sahlgren, Magnus and Karlgren, Jussi (2005) Counting Lumps in Word Space: Density as a Measure of Corpus Homogeneity. In: String Processing and Information Retrieval: 12th International Conference, SPIRE 2005, 2-4 Nov, 2005, Buenos Aires, Argentina.
Full text not available from this repository.
This paper introduces a measure of corpus homogeneity that indicates the amount of topical dispersion in a corpus. The measure is based on the density of neighborhoods in semantic word spaces. We evaluate the measure by comparing the results for five different corpora. Our initial results indicate that the proposed density measure can indeed identify differences in topical dispersion.
Item Type: Conference or Workshop Item (Paper)
Subjects: H. Information Systems > H.3 INFORMATION STORAGE AND RETRIEVAL
Deposited By: Userware Researcher
Deposited On: 24 Oct 2005
Last Modified: 18 Nov 2009 15:51