Automatically generated NE tagged corpora for English and Hungarian

Simon, Eszter and Nemeskey, Dávid Márk (2012) Automatically generated NE tagged corpora for English and Hungarian. In: Proceedings of the 4th Named Entity Workshop (NEWS), 2012-07-08 to 2012-07-14, Jeju, South Korea.




Supervised named entity recognizers require large amounts of annotated text. Since manual annotation is very costly, reducing the annotation cost is essential. We present a fully automatic method for building NE-annotated corpora from Wikipedia. Unlike recent work, we apply a new method that maps DBpedia classes to CoNLL NE types. Since our method is largely language-independent, we used it to generate corpora for English and Hungarian. The corpora are freely available.
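The abstract does not spell out the class mapping itself, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a small, hand-written excerpt of the DBpedia ontology hierarchy (`PARENT`), a hypothetical table from a few top-level DBpedia classes to the four CoNLL types (`CONLL_TYPE`), and BIO-style tags for the linked spans; all names and the example sentence are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical excerpt of the DBpedia ontology: child class -> parent class.
PARENT = {
    "City": "Settlement",
    "Settlement": "PopulatedPlace",
    "Country": "PopulatedPlace",
    "PopulatedPlace": "Place",
    "Place": "owl:Thing",
    "Athlete": "Person",
    "Person": "Agent",
    "Company": "Organisation",
    "SoccerClub": "SportsTeam",
    "SportsTeam": "Organisation",
    "Organisation": "Agent",
    "Agent": "owl:Thing",
    "Film": "Work",
    "Work": "owl:Thing",
    "Event": "owl:Thing",
}

# Assumed mapping from a few DBpedia classes to CoNLL NE types (illustrative only).
CONLL_TYPE = {
    "Person": "PER",
    "Place": "LOC",
    "Organisation": "ORG",
    "Work": "MISC",
    "Event": "MISC",
}


def conll_type(dbpedia_class: str) -> Optional[str]:
    """Walk up the class hierarchy until a mapped ancestor is found; None if none is."""
    cls: Optional[str] = dbpedia_class
    while cls is not None:
        if cls in CONLL_TYPE:
            return CONLL_TYPE[cls]
        cls = PARENT.get(cls)  # None once we pass the root or hit an unknown class
    return None


def bio_tags(tokens: List[str], links: List[Tuple[int, int, str]]) -> List[str]:
    """Tag linked spans (start, end-exclusive, DBpedia class) with BIO labels."""
    tags = ["O"] * len(tokens)
    for start, end, cls in links:
        ne = conll_type(cls)
        if ne is None:
            continue  # class has no CoNLL counterpart in the assumed table
        tags[start] = "B-" + ne
        for i in range(start + 1, end):
            tags[i] = "I-" + ne
    return tags


tokens = ["Lionel", "Messi", "played", "for", "FC", "Barcelona"]
links = [(0, 2, "Athlete"), (4, 6, "SoccerClub")]
print(bio_tags(tokens, links))  # ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'I-ORG']
```

Walking up the hierarchy means only a handful of high-level classes need an explicit entry, while more specific classes such as `City` or `SoccerClub` inherit a type from their ancestors; a class with no mapped ancestor is left untagged.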

Item Type: Conference or Workshop Item (Paper)
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science / computing, computer science
Depositing User: EPrints Admin
Date Deposited: 18 Feb 2013 14:01
Last Modified: 05 Feb 2014 12:28
