TF-IDF Implementation for Similarity Checker on The Final Project Title

Dwiny Meidelfi, Yulherniwati, Indri Rahmayuni, Taufik Hidayat, Dikky Chandra

Abstract


Students of the Software Engineering Technology Study Program, Department of Information Technology, Politeknik Negeri Padang, are required to complete a final project to finish their studies. Several parties are involved in the final project process, including the KBK team, whose job is to check whether a proposed title is appropriate. The main question for the KBK is whether a submitted title has already been used. Previously, the KBK checked title availability by looking through the titles of final projects submitted by earlier students, a manual examination that took a long time. Using TF-IDF and the Cosine Similarity algorithm is expected to make it easier for the KBK to check the availability of final project titles. Cosine similarity is a method for calculating the degree of similarity between two or more documents, while TF-IDF is a method for weighting a word in a document. The test objects in this study were the titles of students' final projects. The similarity calculation starts with a preprocessing stage, proceeds with TF-IDF weighting, and then computes the level of similarity using the Cosine Similarity algorithm. The results show that the system can calculate the degree of similarity between students' final project titles. Testing also shows that the similarity calculation using the cosine similarity algorithm can be carried out quickly.
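The pipeline described above (preprocessing, TF-IDF weighting, then cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the smoothed IDF formula log((1 + N) / (1 + df)) + 1, the function names, and the sample titles are all assumptions chosen for illustration. The abstract does not say which preprocessing steps the system applies, so stemming and stop-word removal are omitted here.

```python
import math
import re
from collections import Counter


def preprocess(title):
    """Lowercase a title and split it into alphanumeric tokens.

    Assumption: case folding and tokenization only; the paper's actual
    preprocessing steps are not specified in the abstract.
    """
    return re.findall(r"[a-z0-9]+", title.lower())


def tfidf_vectors(titles):
    """Return one sparse TF-IDF vector (term -> weight) per title."""
    docs = [preprocess(t) for t in titles]
    n_docs = len(docs)
    # Document frequency: number of titles that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant) so no term gets a zero weight.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0
           for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this title
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar_titles(new_title, existing_titles):
    """Score a proposed title against existing ones, most similar first."""
    vectors = tfidf_vectors(list(existing_titles) + [new_title])
    query = vectors[-1]
    scored = [(cosine_similarity(query, vec), title)
              for vec, title in zip(vectors, existing_titles)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical titles, not taken from the study's data.
    archive = [
        "Sistem Informasi Perpustakaan Berbasis Web",
        "Aplikasi Absensi Mahasiswa Berbasis Android",
    ]
    for score, title in rank_similar_titles(
            "Sistem Informasi Perpustakaan Sekolah Berbasis Web", archive):
        print(f"{score:.3f}  {title}")
```

A reviewer could then flag any proposed title whose top score exceeds some threshold; the abstract does not state a threshold, so that value would have to be chosen separately.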

Keywords


Cosine similarity; TF-IDF; final project; software engineering technology; Politeknik Negeri Padang


DOI: https://doi.org/10.30630/ijasce.3.1.3



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

