Digitization errors in Hungarian documents
Pataki, Máté and Füzessy, Tamás (2007) Digitization errors in Hungarian documents. In: AACS '07. Proceedings of the automation and applied computer science workshop. Budapest, 2007.
Abstract
Our task was to analyze a particular digitizing system, determine what types of errors emerge during the process, and assess how these errors affect the searchability of the digitized documents. We set up a testbed suitable for automatically processing digitized texts on a large scale. In this paper we briefly introduce the methodology of document digitization, emphasizing the error sources in the process, and sketch the results obtained from our test system, especially the Hungarian-language-dependent characteristics of the emerging errors.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Uncontrolled Keywords: | character recognition, text processing, search, error, OCR |
Subjects: | Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány |
Divisions: | Department of Distributed Systems |
Depositing User: | Eszter Nagy |
Date Deposited: | 11 Dec 2012 15:26 |
Last Modified: | 11 Dec 2012 15:26 |
URI: | https://eprints.sztaki.hu/id/eprint/4402 |