The Proceedings of International Conferences of Information, Communication, Technology, and Systems 2008

43
Title: INFORMATION EXTRACTION USING SVM UNEVEN MARGIN FOR MULTI-LANGUAGE DOCUMENT
Author: Dwi Hendratmo Widyantoro*, Ayu Purwarianti*, Paramita
Pages: 249-253
DOI: -
Abstract:
We applied SVM Uneven Margin (SVMUM) to a cross-lingual Information Extraction (IE) task. SVM has been shown to perform well on IE tasks for English documents. So far, no IE research has been conducted on multi-language documents in which the majority of the text is written in a language with few language-processing tools, such as Indonesian. In our experiments, we compared several training data compositions to see their impact on token classification in the IE task for multi-language documents. The results show that, even though the linguistic information available for Indonesian is not satisfactory, IE performance can still be improved by adding training data. We also compared SVMUM with k-nearest neighbors (kNN) and naive Bayes (NB) as the learning algorithm for token classification in the IE task. The experimental results show that SVMUM is suitable for multi-language documents: it achieved better IE accuracy than the other two algorithms.