Towards the creation of a robust search index for digitalized documents

Kovács, László and Pataki, Máté and Füzessy, Tamás and Tóth, Zoltán (2008) Towards the creation of a robust search index for digitalized documents. ERCIM News (73). pp. 49-50.

The simultaneous support of electronic and paper-based document handling is a natural requirement for current filing and document management systems. To improve search and retrieval and to reduce the high cost of digitization, the Department of Distributed Systems of SZTAKI analysed the kinds of error that arise when Hungarian documents are digitized, and examined how these errors affect the searchability of the digitized items. To this end, a testbed was set up for the automatic analysis of digitized texts in a large corpus, and the conclusions and statistics obtained from the analysis were used in the development of new content management products. The primary beneficiaries of these products are civil service and higher-education bodies.
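
The abstract does not say which measures the testbed computed, so the following is only a minimal sketch under stated assumptions. It compares an OCR output with a hand-corrected ground truth using two hypothetical measures: a character error rate (Levenshtein distance divided by ground-truth length) and a "searchable term ratio", the share of distinct ground-truth words that an exact-match index built from the OCR text would still find. The optional accent folding is an assumed mitigation for Hungarian diacritic errors (for example ö read as o), not a technique confirmed by the article. All function names are illustrative.

```python
import unicodedata

PUNCT = ".,;:!?()\"'"


def normalize(text: str) -> str:
    # NFC so that precomposed and decomposed accented letters compare equal.
    return unicodedata.normalize("NFC", text)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth: str, ocr_text: str) -> float:
    """Hypothetical metric: edit distance normalised by ground-truth length."""
    truth, ocr = normalize(ground_truth), normalize(ocr_text)
    if not truth:
        return 0.0 if not ocr else 1.0
    return levenshtein(truth, ocr) / len(truth)


def fold(term: str) -> str:
    # Assumed mitigation: drop combining marks, so "önkormányzat" -> "onkormanyzat".
    return "".join(c for c in unicodedata.normalize("NFD", term)
                   if not unicodedata.combining(c))


def tokenize(text: str, fold_accents: bool = False) -> list[str]:
    # Deliberately simple tokenizer: lowercase, split on whitespace,
    # strip surrounding punctuation, optionally fold accents.
    words = (w.strip(PUNCT) for w in normalize(text).lower().split())
    return [fold(w) if fold_accents else w for w in words if w]


def searchable_term_ratio(ground_truth: str, ocr_text: str,
                          fold_accents: bool = False) -> float:
    """Hypothetical metric: share of distinct ground-truth terms that an
    exact-match index over the OCR text would still retrieve."""
    truth_terms = set(tokenize(ground_truth, fold_accents))
    if not truth_terms:
        return 1.0
    ocr_terms = set(tokenize(ocr_text, fold_accents))
    return len(truth_terms & ocr_terms) / len(truth_terms)


if __name__ == "__main__":
    truth = "Az önkormányzat határozata"
    ocr = "Az onkormanyzat hatarozata"  # three lost accents
    print(character_error_rate(truth, ocr))
    print(searchable_term_ratio(truth, ocr))
    print(searchable_term_ratio(truth, ocr, fold_accents=True))
```

In this made-up example the three lost accents give a character error rate of 3/26, about 0.12, yet two of the three words become unfindable by exact match (ratio 1/3), while accent folding recovers all of them (ratio 1.0). It illustrates why a low character error rate alone may not reflect how searchable a digitized document is.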

Item Type: Article
Uncontrolled Keywords: OCR, error, character recognition, search
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science
Depositing User: Eszter Nagy
Date Deposited: 11 Dec 2012 15:29
Last Modified: 11 Dec 2012 15:29