Optimizing Phishing Email Classification Through Scalable Feature Extraction Using MapReduce
- Title
- Optimizing Phishing Email Classification Through Scalable Feature Extraction Using MapReduce
- Creator
- Uddin, Syed Hameed; Maaz, Mugaerah Ahmed Shareef; Gupta, Himanshu; Panicker, Arya Jinan; Upadhyay, Darshan; Khunti, Shravan; Upreti, Kamal
- Description
- A bag of features (BOF) can be built either with MapReduce techniques or by combining a thesaurus with domain knowledge. This research presents two algorithms, BOFMR (Bag of Features using MapReduce) and BOFWT (Bag of Features with Weighted Terms), as a scalable and efficient approach to processing large email datasets and generating feature vectors based on pre-defined characteristics. Both BOFs are applied to the same datasets and their outcomes are compared. BOFMR leverages the parallel processing capabilities of the MapReduce framework to handle large volumes of data scalably, which makes it useful for building a bag of words from a training dataset. MapReduce is expected to speed up BOF construction as the data volume grows; however, because the dataset in this experiment was small, its performance was not measured. For BOFWT, building the BOF from domain knowledge with a word thesaurus proved challenging. The experimental results show that BOFWT's output is close to that of BOFMR, and both algorithms achieve the highest accuracy among the methods compared. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
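A minimal sketch of the map/shuffle/reduce pattern the description refers to, assuming a bag of features is a per-email count vector over a fixed list of terms. The term list `FEATURES`, the regex tokenizer, and all function names are hypothetical illustrations, not the authors' BOFMR implementation; the three phases run in a single process here, whereas a real deployment would distribute them across a cluster (for example, on Hadoop).

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical stand-in for the "pre-defined characteristics":
# a tiny list of phishing-indicative terms (illustrative only).
FEATURES = ["verify", "account", "password", "urgent", "click", "bank"]
FEATURE_SET = set(FEATURES)
TOKEN_RE = re.compile(r"[a-z]+")


def map_email(email_id, text):
    """Map phase: emit ((email_id, term), 1) for every feature term found."""
    for token in TOKEN_RE.findall(text.lower()):
        if token in FEATURE_SET:
            yield (email_id, token), 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the partial counts for one (email_id, term) key."""
    return key, sum(values)


def bag_of_features(emails):
    """Build one fixed-length count vector per email, ordered like FEATURES."""
    mapped = chain.from_iterable(
        map_email(eid, text) for eid, text in emails.items()
    )
    counts = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
    return {
        eid: [counts.get((eid, term), 0) for term in FEATURES]
        for eid in emails
    }


if __name__ == "__main__":
    emails = {
        "e1": "URGENT: verify your bank account now. Click here to verify.",
        "e2": "Lunch meeting moved to Thursday.",
    }
    for eid, vec in bag_of_features(emails).items():
        print(eid, vec)
```

In a distributed setting, each mapper would process one shard of the email corpus, and the shuffle would route every count for a given key to the same reducer; this independence between shards is what lets the approach scale as the dataset grows.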
- Source
- Lecture Notes in Networks and Systems, Vol. 1233 LNNS, pp. 101-116
- Date
- 01-01-2025
- Publisher
- Springer Science and Business Media Deutschland GmbH
- Subject
- Email classification; MapReduce; Machine learning; NLP; Phishing detection
- Coverage
- Uddin S.H., Department of Information Technology, LIET, Osmania University, Hyderabad, India; Maaz M.A.S., Department of Computer Science and Engineering (AI & ML), LIET, Osmania University, Hyderabad, India; Gupta H., Software Engineer, Meta, NJ, United States; Panicker A.J., Department of Computer Science and Engineering, APJ Abdul Kalam Technological University, Thiruvananthapuram, Kerala, India; Upadhyay D., MS Business Analytics and Information Management, Purdue University, West Lafayette, IN, United States; Khunti S., Center for Data Science, New York University, New York, United States; Upreti K., Department of Computer Science, CHRIST (Deemed to Be University), Ghaziabad, Uttar Pradesh, India
- Rights
- Restricted Access; Hardcopy may be available in the library
- Relation
- ISSN: 2367-3370; ISBN: 978-981-96-3286-2
- Format
- online
- Language
- English
- Type
- Conference paper
Citation
Uddin, Syed Hameed; Maaz, Mugaerah Ahmed Shareef; Gupta, Himanshu; Panicker, Arya Jinan; Upadhyay, Darshan; Khunti, Shravan; Upreti, Kamal, “Optimizing Phishing Email Classification Through Scalable Feature Extraction Using MapReduce,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 18, 2026, https://archives.christuniversity.in/items/show/25509.
