Please use this identifier to cite or link to this item: http://cris.utm.md/handle/5014/466
DC Field | Value | Language
dc.contributor.author | STRATULAT, Eugeniu | en_US
dc.contributor.author | STROIANEȚKI, Stanislav | en_US
dc.contributor.author | BOBICEV, Victoria | en_US
dc.date.accessioned | 2020-04-28T18:31:20Z | -
dc.date.available | 2020-04-28T18:31:20Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | STRATULAT, Eugeniu; STROIANEȚKI, Stanislav; BOBICEV, Victoria. Automate plagiarism detection. In: Electronics, Communications and Computing. Editia a 10-a, 23-26 octombrie 2019, Chişinău. Chișinău, Republica Moldova: Universitatea Tehnică a Moldovei, 2019, p. 29. ISBN 978-9975-108-84-3. | en_US
dc.identifier.isbn | 978-9975-108-84-3 | -
dc.identifier.uri | http://cris.utm.md/handle/5014/466 | -
dc.description.abstract | The paper presents a study in which an application for plagiarism detection was created. It was evaluated on the set of documents provided by the PAN 2009 task on external plagiarism detection [1]. The task is formulated as follows: given a set of suspicious documents and a set of source documents, find all text passages in the suspicious documents that have been plagiarized, together with the corresponding text passages in the source documents. The organizers provided a training corpus comprising a set of suspicious documents and a set of source documents. A suspicious document may contain plagiarized passages from one or more source documents. The main metric used for document comparison was NCD (Normalized Compression Distance), which is a family of functions that take two objects (here, texts) as arguments and evaluate a fixed formula expressed in terms of the compressed versions of these objects, separately and combined [3]. The method is the outcome of theoretical developments based on Kolmogorov complexity [4]. The smaller the result, the more similar the objects. The application for plagiarism detection was written in PHP. The similarity of two lines is calculated using the algorithm described in [2]. The threshold value was estimated on the basis of the training data; the selected value provides the best plagiarism detection accuracy on the given texts. To evaluate the application, we used 400 documents from the set provided by the task organizers. We calculated Precision and Recall on one tenth of this set, namely on 40 documents. Information about the plagiarism in these 40 documents was provided by the task organizers, so we knew that exactly 5 of the 40 documents contained plagiarized fragments. The application returned exactly 5 files in which plagiarism was found. This result indicates that the application is suitable for the task. | en_US
dc.language.iso | en | en_US
dc.subject | plagiarism | en_US
dc.subject | automate plagiarism detection | en_US
dc.subject | text classification | en_US
dc.subject | substring search | en_US
dc.title | Automate plagiarism detection | en_US
dc.type | Article | en_US
dc.relation.conference | Electronics, Communications and Computing | en_US
item.fulltext | With Fulltext | -
item.languageiso639-1 | other | -
item.grantfulltext | open | -
crisitem.author.dept | Department of Computer Science and Systems Engineering | -
crisitem.author.parentorg | Faculty of Computers, Informatics and Microelectronics | -
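
The abstract describes NCD as a fixed formula over the compressed sizes of two texts, taken separately and combined. A minimal sketch of one commonly used member of that family is given here in Python, not in the authors' PHP, and it is not the authors' implementation. It assumes the standard form NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib as the compressor, the function names, and the threshold value 0.5 are all illustrative assumptions; the record does not state which compressor or threshold the application used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the zlib-compressed input.

    zlib at level 9 is an assumed compressor; any real compressor
    could be substituted to obtain another member of the NCD family.
    """
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts.

    Smaller values mean the texts are more similar.
    """
    bx = x.encode("utf-8")
    by = y.encode("utf-8")
    cx = compressed_size(bx)
    cy = compressed_size(by)
    cxy = compressed_size(bx + by)  # size of the two texts compressed together
    return (cxy - min(cx, cy)) / max(cx, cy)


def looks_plagiarized(suspicious: str, source: str, threshold: float = 0.5) -> bool:
    """Flag a pair of texts whose NCD falls below a threshold.

    The default threshold is a hypothetical placeholder; the abstract
    says the real value was estimated from the training data.
    """
    return ncd(suspicious, source) < threshold
```

With a sketch like this, a suspicious passage would be compared against each candidate source passage and the pairs whose distance falls below the tuned threshold would be reported. Real compressors add fixed overhead, so very short texts tend to give unreliable distances.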
Appears in Collections: Conference Abstracts
Files in This Item:
File | Description | Size | Format
29-29_11.pdf | | 287.49 kB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.