A new approach for searching translated plagiarism

Pataki, Máté (2012) A new approach for searching translated plagiarism. In: 5th International Plagiarism Conference.

Text: 20120712_PlagiarismConference_NewApproachForSearchingTranslatedPlagiarism.pdf - Published Version (308kB)
Text: 20120712_PlagiarismConference_NewApproachForSearchingTranslatedPlagiarism_Slides.pdf - Presentation (1MB)

Abstract

In 2010 we started a one-year research project on searching for cases of translated plagiarism. Most current approaches use machine translation to detect similarity between texts written in different languages, but this was not feasible here, because the research goal was an algorithm that also works effectively between Hungarian and English documents. Hungarian poses three main obstacles when compared with other (European) languages: a) loose word order, b) conjugation, c) a significantly different grammar. These are also the reasons, along with the small size of the available parallel corpora, that machine translation to and from Hungarian is of little use for serious applications and is often not even understandable by humans. The new algorithm defines a dictionary-based distance function between sentences, which is evaluated in multiple steps to enable both a fast candidate search and a precise comparison between possible translations. In essence, it searches for all possible translations instead of relying on the single one given by an automatic translator. This approach has proved effective and has eliminated the need to use word-sense disambiguation first (at the machine translation stage) and then synonyms in the next step of the system.
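
The idea of a dictionary-based distance between sentences can be illustrated with a minimal sketch. This is not the algorithm from the paper: the toy dictionary, the function names, the scoring formula and the threshold are all assumptions made for illustration, and the words are assumed to be already reduced to their stems. The sketch only shows how checking every dictionary translation of a word, rather than one machine-translated output, makes the comparison independent of word order and of which synonym the translator picked.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionary (illustrative entries only): each Hungarian
# stem maps to ALL of its possible English translations.
TOY_DICT: Dict[str, Set[str]] = {
    "kutya": {"dog", "hound"},
    "ugat": {"bark", "bay"},
    "hangos": {"loud", "noisy"},
    "macska": {"cat"},
}


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split on whitespace and strip basic punctuation.

    Assumption: the input is already stemmed; no stemmer is applied here.
    """
    tokens = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [t for t in tokens if t]


def sentence_distance(source: str, target: str,
                      dictionary: Dict[str, Set[str]]) -> float:
    """Share of source words with no possible translation in the target.

    0.0 means every source word has a translation in the target sentence,
    1.0 means none has. The target is treated as a set of words, so word
    order is ignored.
    """
    src = tokenize(source)
    tgt = set(tokenize(target))
    if not src:
        return 1.0
    matched = sum(1 for w in src if dictionary.get(w, set()) & tgt)
    return 1.0 - matched / len(src)


def find_translation_candidates(source: str, targets: List[str],
                                dictionary: Dict[str, Set[str]],
                                max_distance: float = 0.5
                                ) -> List[Tuple[float, str]]:
    """Return (distance, sentence) pairs within max_distance, closest first."""
    scored = [(sentence_distance(source, t, dictionary), t) for t in targets]
    return sorted((d, t) for d, t in scored if d <= max_distance)
```

For example, `sentence_distance("kutya ugat hangos", "the noisy hound bay", TOY_DICT)` counts all three source words as matched even though the target uses the less common synonyms and a different word order. The multi-step evaluation mentioned in the abstract (a fast candidate search followed by a precise comparison) is not reproduced in this sketch.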

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: external, translational, plagiarism detection, algorithm
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science
Divisions: Department of Distributed Systems
Depositing User: Máté Pataki
Date Deposited: 12 Dec 2012 08:40
Last Modified: 06 Feb 2014 14:48
URI: http://eprints.sztaki.hu/id/eprint/6539
