SODA

Nudging the Envelope of Direct Transfer Methods for Multilingual Named Entity Recognition

Täckström, Oscar (2012) Nudging the Envelope of Direct Transfer Methods for Multilingual Named Entity Recognition. In: NAACL-HLT 2012 Workshop on Inducing Linguistic Structure, 7 June 2012, Montreal, Canada. (In Press)

PDF - Accepted Version (270Kb)

Abstract

In this paper, we study direct transfer methods for multilingual named entity recognition. Specifically, we extend the method recently proposed by Täckström et al. (2012), which is based on cross-lingual word cluster features. First, we show that by using multiple source languages, combined with self-training for target language adaptation, we can achieve significant improvements compared to using only single-source direct transfer. Second, we investigate how the direct transfer system fares against a supervised target language system and conclude that between 8,000 and 16,000 word tokens need to be annotated in each target language to match the best direct transfer system. Finally, we show that we can significantly improve target language performance, even after annotating up to 64,000 tokens in the target language, by simply concatenating source and target language annotations.

Item Type: Conference or Workshop Item (Paper)
ID Code: 5257
Deposited By: Oscar Täckström
Deposited On: 18 May 2012 09:10
Last Modified: 18 May 2012 09:10
