Plagiarism detection and document chunking methods

Pataki, Máté (2003) Plagiarism detection and document chunking methods. NIIF, Budapest.

Abstract

This paper describes tests of chunking methods used for plagiarism detection. The test results make it possible to choose the best-fitting chunking method for a given application. For example, overlapping word chunking works well for a grammar analyzer or for small databases; sentence chunking is best suited to finding quoted text; hashed breakpoint chunking is the fastest method and is therefore advisable for searching large sets of documents; and where more reliability is needed, overlapping hashed breakpoint chunking can be used.
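
The abstract names these chunking methods without defining them, so the following Python sketch shows only one plausible reading of three of them. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the window size of three words, the naive sentence-splitting regular expression, and the use of CRC32 with a small modulus to pick breakpoints. The overlapping variant of hashed breakpoint chunking is not sketched, because the abstract gives too little detail to guess at it.

```python
import re
import zlib


def overlapping_word_chunks(text, size=3):
    """Sliding window of `size` consecutive words, advancing one word at a time.

    Assumed interpretation of "overlapping word chunking".
    """
    words = text.split()
    if len(words) <= size:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def sentence_chunks(text):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def hashed_breakpoint_chunks(text, modulus=4):
    """Close a chunk after every word whose hash is divisible by `modulus`.

    Assumed interpretation of "hashed breakpoint chunking"; CRC32 is a
    stand-in hash chosen only because it is deterministic across runs.
    """
    chunks, current = [], []
    for word in text.split():
        current.append(word)
        if zlib.crc32(word.lower().encode("utf-8")) % modulus == 0:
            chunks.append(" ".join(current))
            current = []
    if current:
        # Trailing words after the last breakpoint form a final chunk.
        chunks.append(" ".join(current))
    return chunks
```

Under these assumptions the trade-off mentioned in the abstract is visible in the output sizes: the sliding window produces roughly one chunk per word, whereas the breakpoint method produces about one chunk per `modulus` words on average, which would mean far fewer chunks to store and compare for a large collection.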

Item Type: Other
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science / computing, computer science
Divisions: Department of Distributed Systems
Depositing User: Eszter Nagy
Date Deposited: 11 Dec 2012 15:10
Last Modified: 11 Dec 2012 15:10
URI: https://eprints.sztaki.hu/id/eprint/3093
