Digitization errors in Hungarian documents
Pataki, Máté and Füzessy, Tamás (2007) Digitization errors in Hungarian documents. In: AACS '07. Proceedings of the automation and applied computer science workshop. Budapest, 2007.
200706_AACS_DigitizationErrors.pdf - Published Version
Our task was to analyze a particular digitizing system, to determine what types of errors emerge during the process, and to assess how these errors affect the searchability of the digitized documents. We set up a testbed suitable for the automatic processing of digitized texts on a large scale. In this paper we briefly introduce the methodology of document digitization, emphasizing the sources of error in the process, and sketch the results obtained from our test system, in particular the characteristics of the emerging errors that are specific to the Hungarian language.
|Item Type:|Conference or Workshop Item (Paper)|
|Uncontrolled Keywords:|character recognition, text processing, search, error, OCR|
|Subjects:|Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány|
|Divisions:|Department of Distributed Systems|
|Depositing User:|Eszter Nagy|
|Date Deposited:|11 Dec 2012 15:26|
|Last Modified:|11 Dec 2012 15:26|