Digitization errors in Hungarian documents

Pataki, Máté and Füzessy, Tamás (2007) Digitization errors in Hungarian documents. In: AACS '07. Proceedings of the automation and applied computer science workshop. Budapest, 2007.

Full text: 200706_AACS_DigitizationErrors.pdf (Published Version, 510kB)

Abstract

Our task was to analyze a particular digitization system, to determine what types of errors emerge during the process, and to assess how these errors affect the searchability of the digitized documents. We have set up a testbed suitable for the automatic processing of digitized texts on a large scale. In this paper we briefly introduce the methodology of document digitization, emphasizing the sources of error in the process, and outline the results obtained from our test system, especially the Hungarian-language-dependent characteristics of the emerging errors.
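
The kind of measurement the abstract describes, comparing recognized text against a ground truth and checking whether search terms can still be found, can be illustrated with a minimal sketch. This is not the authors' testbed; the function names (`levenshtein`, `char_error_rate`, `search_hits`) and the example strings are hypothetical. The example assumes, for illustration only, a recognizer that outputs `û` and `õ` in place of the Hungarian double-acute letters `ű` and `ő`. It computes a character error rate from the edit distance and shows that exact-match search misses both words, while accent-folded search recovers them.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_error_rate(ocr: str, truth: str) -> float:
    """Edit distance normalised by the length of the ground-truth text."""
    return levenshtein(ocr, truth) / max(len(truth), 1)


def strip_accents(text: str) -> str:
    """Drop combining diacritics after NFD decomposition, e.g. 'ő' -> 'o'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_hits(query_terms, ocr_text, fold_accents=False):
    """Return the query terms found as whole words in the OCR output."""
    words = ocr_text.lower().split()
    if fold_accents:
        words = [strip_accents(w) for w in words]
    found = []
    for term in query_terms:
        t = strip_accents(term.lower()) if fold_accents else term.lower()
        if t in words:
            found.append(term)
    return found


if __name__ == "__main__":
    truth = "tűzoltó őrség"
    ocr = "tûzoltó õrség"  # hypothetical misrecognition of ű and ő
    print(char_error_rate(ocr, truth))                           # 2 substitutions / 13 chars
    print(search_hits(["tűzoltó", "őrség"], ocr))                # exact match finds nothing
    print(search_hits(["tűzoltó", "őrség"], ocr, fold_accents=True))
```

Accent folding trades precision for recall: in Hungarian it also merges distinct letters such as `o`, `ö` and `ő`, so it can return false matches that an exact search would not.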

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: character recognition, text processing, search, error, OCR
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science
Divisions: Department of Distributed Systems
Depositing User: Eszter Nagy
Date Deposited: 11 Dec 2012 15:26
Last Modified: 11 Dec 2012 15:26
URI: https://eprints.sztaki.hu/id/eprint/4402
