Plagiarism detection and document chunking methods
Pataki, Máté (2003) Plagiarism detection and document chunking methods. NIIF, Budapest.
Official URL: http://www2003.org/cdrom/papers/poster/p186/p186-P...
Abstract
This paper describes tests of chunking methods used for plagiarism detection. The results make it possible to choose the best-fitting chunking method for a given application. For example, overlapping word chunking is good for a grammar analyzer or for small databases; sentence chunking is best suited to finding quoted text; hashed breakpoint chunking is the fastest method and is therefore advisable for searching large sets of documents; and if more reliability is needed, overlapping hashed breakpoint chunking can be used as well.
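To make the chunking methods named in the abstract concrete, the following is a minimal Python sketch of three of them. It is an illustration under stated assumptions, not the implementation tested in the paper: the function names, the chunk size of 3 words, the MD5-based word hash, the modulus of 4, and the naive sentence splitter are all choices made here for the example. Overlapping hashed breakpoint chunking is left out because the abstract does not say how the overlap is formed.

```python
import hashlib
import re


def overlapping_word_chunks(text, size=3):
    """Every run of `size` consecutive words, shifted one word at a time.

    `size` is an assumed parameter; the abstract does not give a value.
    """
    words = text.split()
    if len(words) < size:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def sentence_chunks(text):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def _word_hash(word):
    # Stable hash of a lower-cased word (assumed choice: first 4 bytes of MD5).
    # The built-in hash() is salted per process, so it is not used here.
    digest = hashlib.md5(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hashed_breakpoint_chunks(text, modulus=4):
    """Close a chunk after every word whose hash is divisible by `modulus`.

    With a well-mixed hash, chunks average about `modulus` words, and the
    chunks do not overlap, so far fewer chunks are stored than with
    overlapping word chunking.
    """
    chunks, current = [], []
    for word in text.split():
        current.append(word)
        if _word_hash(word) % modulus == 0:
            chunks.append(" ".join(current))
            current = []
    if current:  # trailing words after the last breakpoint
        chunks.append(" ".join(current))
    return chunks
```

Because the breakpoints depend only on the words themselves, the same passage tends to produce the same chunks in different documents once a breakpoint word has been passed, which is what lets chunk fingerprints from two documents be compared.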
Item Type: | Other |
---|---|
Subjects: | Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány |
Divisions: | Department of Distributed Systems |
Depositing User: | Eszter Nagy |
Date Deposited: | 11 Dec 2012 15:10 |
Last Modified: | 11 Dec 2012 15:10 |
URI: | https://eprints.sztaki.hu/id/eprint/3093 |