Please use this identifier to cite or link to this item: http://cris.utm.md/handle/5014/466
Title: Automate plagiarism detection
Authors: STRATULAT, Eugeniu 
STROIANEȚKI, Stanislav 
BOBICEV, Victoria 
Keywords: plagiarism;automate plagiarism detection;text classification;substring search
Issue Date: 2019
Source: STRATULAT, Eugeniu; STROIANEȚKI, Stanislav; BOBICEV, Victoria. Automate plagiarism detection. In: Electronics, Communications and Computing. 10th edition, 23-26 October 2019, Chișinău. Chișinău, Republic of Moldova: Universitatea Tehnică a Moldovei, 2019, p. 29. ISBN 978-9975-108-84-3.
Conference: Electronics, Communications and Computing 
Abstract: 
The paper presents a study in which an application for plagiarism detection was created and evaluated on the set of documents provided by the PAN 2009 task on external plagiarism detection [1]. The task is formulated as follows: given a set of suspicious documents and a set of source documents, find all text passages in the suspicious documents that have been plagiarized, together with the corresponding passages in the source documents. The organizers provided a training corpus comprising a set of suspicious documents and a set of source documents. A suspicious document may contain plagiarized passages from one or more source documents. The main metric used for document comparison was NCD (Normalized Compression Distance), a family of functions that take two objects (here, texts) as arguments and evaluate a fixed formula expressed in terms of the compressed versions of these objects, taken separately and combined [3]. The method is the outcome of theoretical developments based on Kolmogorov complexity [4]. The smaller the result, the more similar the objects. The application for plagiarism detection was written in PHP. The similarity of two lines is calculated using the algorithm described in [2]. The threshold value was estimated on the basis of the training data; it is the value that gives the best plagiarism detection accuracy on the given texts. To evaluate the application, we used 400 documents from the set provided by the task organizers. We calculated Precision and Recall on one tenth of this set, namely on 40 documents. Information about plagiarism in these 40 documents was provided by the task organizers, so we knew that exactly 5 of the 40 documents contained plagiarized fragments. The application returned exactly 5 files in which plagiarism was found. This result indicates that the application is suitable for the task.
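The NCD comparison described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the authors' PHP and is not their implementation. It assumes the commonly used form NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes, and it assumes zlib as the compressor. The function names and the default threshold of 0.5 are hypothetical; the abstract states only that the threshold was estimated from training data and does not give its value.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts.

    Smaller values mean the texts are more similar.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # the two objects compressed together
    return (cxy - min(cx, cy)) / max(cx, cy)


def is_suspicious(passage: str, source: str, threshold: float = 0.5) -> bool:
    """Flag a passage whose distance to the source falls below the threshold.

    The default threshold is a placeholder, not the value used in the paper.
    """
    return ncd(passage, source) < threshold
```

In practice the threshold would be tuned on labeled training documents, as the abstract describes, and the texts would be compared in passages short enough for the compressor's window to see both of them.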
URI: http://cris.utm.md/handle/5014/466
ISBN: 978-9975-108-84-3
Appears in Collections: Conference Abstracts

Files in This Item:
File: 29-29_11.pdf
Size: 287.49 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.