Towards the creation of a robust search index for digitalized documents
Kovács, László and Pataki, Máté and Füzessy, Tamás and Tóth, Zoltán (2008) Towards the creation of a robust search index for digitalized documents. ERCIM News (73). pp. 49-50.
Abstract
Current filing and document management systems are naturally expected to support electronic and paper-based document handling side by side. To improve search and retrieval and to reduce the high costs of digitizing, the Department of Distributed Systems of SZTAKI analysed the kinds of error that arise when Hungarian documents are digitized, and examined how these errors affect the searchability of the digitized items. To this end, a testbed was set up for the automatic analysis of digitized texts in a large corpus, and the conclusions and statistics obtained from the analysis were used in the development of new content management products. The primary beneficiaries of these products are civil service and higher-education bodies.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | OCR, error, character recognition, search |
Subjects: | Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science |
Depositing User: | Eszter Nagy |
Date Deposited: | 11 Dec 2012 15:29 |
Last Modified: | 11 Dec 2012 15:29 |
URI: | https://eprints.sztaki.hu/id/eprint/4870 |