SODA

An integration of vector-based semantic analysis and simple recurrent networks for the automatic acquisition of lexical representations from unlabeled corpora

Moscoso del Prado Martin, Fermin and Sahlgren, Magnus (2002) An integration of vector-based semantic analysis and simple recurrent networks for the automatic acquisition of lexical representations from unlabeled corpora. In: Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data Workshop at LREC 2002, 1 June 2002, Las Palmas, Spain.

Full text not available from this repository.

Abstract

This study presents an integration of Simple Recurrent Networks to extract grammatical knowledge and Vector-Based Semantic Analysis to acquire semantic information from large corpora. Starting from a large, untagged sample of English text, we use Simple Recurrent Networks to extract morpho-syntactic vectors in an unsupervised way. These vectors are then used in place of random vectors to perform Vector-Based Semantic Analysis. In this way, we obtain rich lexical representations in the form of high-dimensional vectors that integrate morpho-syntactic and semantic information about words. Apart from incorporating data from these different levels, we argue that these vectors can be used to account for the particularities of each individual word token of a given word type. The amount of lexical knowledge acquired by the technique is evaluated both by statistical analyses comparing the information contained in the vectors with existing 'hand-crafted' lexical resources such as CELEX and WordNet, and by performance in language proficiency tests. We conclude by outlining the cognitive implications of this model and its potential use in the bootstrapping of lexical resources.
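
The second step of the abstract, where per-word vectors stand in for random vectors during Vector-Based Semantic Analysis, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window size, the plain summation scheme, and the toy corpus are all assumptions made for illustration, and the index vectors here are random placeholders where the paper would supply vectors extracted by a Simple Recurrent Network.

```python
import numpy as np

def context_vectors(tokens, index_vectors, window=1):
    """Accumulate, for each word type, the index vectors of its neighbours.

    `index_vectors` maps each word type to a fixed-length vector. In the
    approach described in the abstract these would be SRN-derived
    morpho-syntactic vectors; any vectors of equal length work here.
    """
    dim = len(next(iter(index_vectors.values())))
    context = {w: np.zeros(dim) for w in set(tokens)}
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the focus word itself
                context[word] += index_vectors[tokens[j]]
    return context

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Hypothetical toy corpus; the study itself uses a large untagged English sample.
tokens = "the cat sat on the mat the dog sat on the rug".split()

# Placeholder index vectors (assumption): random here, SRN-derived in the paper.
rng = np.random.default_rng(0)
index_vectors = {w: rng.standard_normal(8) for w in sorted(set(tokens))}

ctx = context_vectors(tokens, index_vectors, window=1)
similarity = cosine(ctx["cat"], ctx["dog"])
```

In this toy corpus "cat" and "dog" occur between the same neighbours, so their accumulated context vectors coincide. Replacing the placeholder entries of `index_vectors` with morpho-syntactic vectors would let the resulting context vectors carry both kinds of information, which is the integration the abstract describes.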

Item Type: Conference or Workshop Item (Paper)
ID Code: 2999
Deposited By: INVALID USER
Deposited On: 14 Jul 2008
Last Modified: 18 Nov 2009 16:16
