Research Areas
Cybersecurity · Privacy-Preserving AI · Digital Forensics
My research sits at the intersection of cybersecurity, artificial intelligence, and privacy-preserving systems. I am interested in building practical, research-driven technologies that combine machine learning, secure system design, and human-centered computing for real-world environments. My recent work spans propaganda detection, privacy-aware intelligence systems, language-related AI assessment, and broader questions of equity and infrastructure in underrepresented-language computing.
My work emphasizes practical AI systems, reproducible experimentation, and infrastructure-aware computing for real-world deployment.
Machine Learning · Secure Systems Design · Experimental Evaluation
Threat Intelligence · Language Computing · Public-Interest Technology
Sonal Sagar Boda, Siddhesh Pimpale, Mohammad Somon Sikder, Hemayet Uddin Himel, Zeeshan Akbar, Avijit Roy, Deepak Gupta
Accepted · Applied Research · Wiley · 2026
Status: Accepted, awaiting publication · Scopus-indexed, Q2 journal
This work proposes a unified AI-driven and cyber-secure inverter-control framework for permanent magnet synchronous motor (PMSM)-based electric vehicle systems. The methodology integrates a Deep Q-Network (DQN)-based deep reinforcement learning adaptive switching controller with encrypted command validation and an embedded intrusion detection system (IDS), implemented on an ARM Cortex-M7 (STM32H7) platform and validated against WLTP and UDDS driving-cycle datasets in MATLAB/Simulink and hardware-in-the-loop environments.
Key finding: The framework achieves an average peak efficiency of 95.4% (a 4.2% improvement over conventional field-oriented control), reduces switching losses by 34.5% and torque ripple by 67%, and lowers total harmonic distortion (THD) to 3.2%. The embedded IDS reaches 98.6% attack-detection accuracy at 3.2 ms average detection latency with only 1.6% computational overhead.
Avijit Roy, Proma Roy, Hrishitva Patel
Accepted Poster · Language Models for Underserved Communities (LM4UC) Workshop · IJCAI 2026
Bremen, Germany · August 16, 2026
This work introduces the Tokenization Equity Audit (TEA), a reproducible benchmark for measuring tokenization premiums in multilingual technical tutoring content. TEA evaluates GPT-4o, Qwen2.5-7B, and Mistral-7B tokenizers across Bengali, Hindi, Arabic, Tamil, and Yoruba using a 120-item Python debugging corpus. The study shows that semantically equivalent content can require substantially different token counts across languages, creating unequal API cost, effective context-window capacity, and local deployment overhead for underserved-language communities.
Key finding: Bengali requires 1.56× as many GPT-4o tokens as English, while Yoruba reaches 2.37× despite using Latin script, showing that tokenization inequity is not reducible to script family alone.
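The premium reported above is a ratio of token counts for semantically equivalent text. A minimal sketch of that arithmetic follows; the function names, the token counts, and the 128,000-token window are illustrative assumptions, not TEA code or data.

```python
def tokenization_premium(target_tokens: int, english_tokens: int) -> float:
    """Tokens needed in the target language per token needed in English."""
    if english_tokens <= 0:
        raise ValueError("english_tokens must be positive")
    return target_tokens / english_tokens


def effective_context(window_tokens: int, premium: float) -> int:
    """English-equivalent capacity of a context window under a given premium."""
    if premium <= 0:
        raise ValueError("premium must be positive")
    return int(window_tokens / premium)


# Hypothetical counts for one parallel item (not taken from the TEA corpus).
premium = tokenization_premium(target_tokens=156, english_tokens=100)
capacity = effective_context(window_tokens=128_000, premium=premium)

print(f"premium: {premium:.2f}x")
print(f"English-equivalent capacity: {capacity} tokens")
```

Under these assumed numbers, a premium above 1 raises per-request cost and shrinks the amount of content that fits in a fixed context window by the same factor.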
Thomas Kimmeth, Avijit Roy, Vivek Sharma
Full Paper · FLAIRS-39 Special Track: Applied Natural Language Processing · Florida Artificial Intelligence Research Society Conference · 2026
Hilton Marco Island Beach Resort and Spa · Marco Island, Florida, USA · May 17–20, 2026
This research introduces Propasafe-Hybrid, a hybrid propaganda detection framework for sentence-level analysis of news text. The system combines transformer-based classification with a cost-aware explanatory pipeline to identify propagandistic content, label the technique involved, and generate concise rationales for why a sentence was flagged. The work emphasizes interpretability, practical deployment, and reduced computational overhead in applied propaganda analysis.
Published in: Vol. 39 No. 1 (2026): Proceedings of the 39th Florida Artificial Intelligence Research Society Conference (FLAIRS-39), Special Track on Applied Natural Language Processing, Marco Island, FL
Avijit Roy, Proma Roy
SSRN Preprint · 2026
Poster presented at the 69th Annual ILA Conference (International Linguistic Association), New York City.
John Jay College, City University of New York · New York City, USA · April 30–May 2, 2026
This research examines structural barriers affecting AI performance in underrepresented languages, focusing on Bengali as a case study. The work analyzes disparities in web representation, training token allocation, tokenization efficiency, and connectivity access, and introduces an infrastructure-aware perspective on equitable AI development.
Avijit Roy, Emran Hossain, Hemayet Uddin Himel, Hasan Imam, Ruhul Amin Md Rashed, Harleen Kaur
ICICC 2026 · Best Paper Award · Full Paper
Shaheed Sukhdev College of Business Studies · New Delhi, India · February 6–7, 2026
Organized in association with National Institute of Technology Patna and University of Valladolid
This research proposes a privacy-preserving Management Information System architecture that integrates blockchain infrastructure, federated learning, and zero-trust security models. The framework enables secure data collaboration across distributed systems while maintaining strict privacy boundaries and minimizing centralized data exposure. Experimental evaluation showed improvements in decision accuracy, privacy protection, and operational efficiency over conventional and partially distributed MIS architectures.
To appear in Springer Lecture Notes in Networks and Systems (LNNS), ICICC 2026 proceedings.
Avijit Roy
Advisor: Professor Shweta Jain, Ph.D.
Master's Thesis · Digital Forensics & Cybersecurity · John Jay College of Criminal Justice, CUNY · 2026
This thesis evaluates the adversarial robustness of six perceptual hashing configurations used in image matching, content moderation, provenance, and digital forensics workflows. Using a matched experimental protocol across evasion and near-collision attacks, the work shows that threshold-based perceptual hash systems can fail in asymmetric ways: adversarial changes may cause known content to evade detection, while unrelated images may be moved close enough to trigger false-positive matches.
The findings frame perceptual hashing as an operationally valuable but incomplete security mechanism. Rather than treating hash length alone as a measure of robustness, the thesis shows that pipeline structure, optimization behavior, quantization design, and threshold policy strongly shape real-world risk.
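The evasion side of threshold-based matching can be illustrated with a toy example. The sketch below uses a simple average hash on a 4×4 grayscale grid and a hypothetical Hamming-distance threshold of 2; it is not one of the six configurations evaluated in the thesis, and all names and pixel values are assumptions chosen for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel values, 0-255


def average_hash(img: Image) -> int:
    """Toy average hash: one bit per pixel, set when the pixel exceeds the mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def matches(a: int, b: int, threshold: int) -> bool:
    """Threshold policy: declare a match when the hashes differ in few bits."""
    return hamming(a, b) <= threshold


THRESHOLD = 2  # hypothetical operating point

original = [[100, 110, 100, 110] for _ in range(4)]

# Benign change: one pixel shifts by 1, no bit crosses the mean.
benign = [row[:] for row in original]
benign[3][3] = 111

# Adversarial change: first-row pixels move by at most 6 but cross the mean.
adversarial = [row[:] for row in original]
adversarial[0] = [106, 104, 106, 104]

h_orig = average_hash(original)
h_benign = average_hash(benign)
h_adv = average_hash(adversarial)

print(hamming(h_orig, h_benign), matches(h_orig, h_benign, THRESHOLD))
print(hamming(h_orig, h_adv), matches(h_orig, h_adv, THRESHOLD))
```

In this toy case a change of at most 6 intensity levels per pixel flips four hash bits and pushes known content past the threshold. The outcome depends on how pixels sit relative to the quantization boundary and on the threshold policy, not on hash length alone.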
Awarded the Claude Hawley Medal during the 2026 graduation hooding ceremony — presented for the best master's thesis and/or scholastic distinction in the graduate program. Also presented at the Graduate Student Research Symposium — John Jay Research & Creativity Expo, May 4, 2026.
Authors: Thomas Kimmeth, Avijit Roy, Vivek Sharma
May 18, 2026 · Session 6B: ANLP, Applied Natural Language Processing — Marco Island, Florida
Oral presentation of Propasafe-Hybrid: A Text-Based Hybrid Propaganda Detection Tool, presented at the 39th Florida Artificial Intelligence Research Society Conference (FLAIRS-39) under the Special Track on Applied Natural Language Processing.
The presentation discussed a hybrid propaganda detection framework combining transformer-based classification, selective LLM explainability, and cost-aware filtering for sentence-level propaganda analysis in news text.
Authors: Mustafa Eren, Aneeza Shakeel, Vivek Sharma
Presented by: Avijit Roy
May 18, 2026 · Session 6A: ANLP, Applied Natural Language Processing — Marco Island, Florida
Presented Fine-Grained Sentence-Level Propaganda Detection in News Articles on behalf of the authors at the 39th Florida Artificial Intelligence Research Society Conference (FLAIRS-39) under the Special Track on Applied Natural Language Processing.
The presentation explored sentence-level propaganda detection using transformer-based models including BERT and RoBERTa, with experiments focused on class imbalance, focal loss, and fine-grained propaganda technique classification in news media.
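For readers unfamiliar with focal loss, the sketch below shows the standard per-example form, which scales cross-entropy by (1 − p)^γ so that well-classified examples contribute less. The hyperparameter values are common defaults used here as assumptions; the exact formulation and settings used in the presented paper are not stated in this summary.

```python
import math


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one example, given the predicted probability of its true class.

    With gamma = 0 and alpha = 1 this reduces to ordinary cross-entropy.
    """
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


# Illustrative probabilities only (not results from the paper).
easy = focal_loss(0.9)   # confidently correct example
hard = focal_loss(0.3)   # poorly classified example
cross_entropy_easy = focal_loss(0.9, gamma=0.0)

print(f"easy: {easy:.5f}  hard: {hard:.5f}")
print(f"down-weighting of the easy example: {easy / cross_entropy_easy:.3f}")
```

Because easy examples are down-weighted far more than hard ones, the loss shifts training emphasis toward rare or difficult classes, which is the usual motivation for using it under class imbalance.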
May 4, 2026 — John Jay College of Criminal Justice, New York, NY
Adversarial Robustness of Perceptual Hashing Systems: A Unified Security Evaluation Framework
Presented as part of the John Jay Research & Creativity Expo. This work examines adversarial vulnerabilities in perceptual hashing systems, highlighting how threshold-based matching can be exploited to produce both false negatives (evasion) and false positives (near-collision) across real-world detection and verification pipelines.
Authors: Thomas Kimmeth, Avijit Roy, Vivek Sharma
April 30–May 2, 2026 — New York City, New York
Presentation of the Propasafe-Hybrid system, a hybrid propaganda detection pipeline combining transformer-based classification, selective LLM explainability, and cost-aware filtering for sentence-level propaganda analysis.
Authors: Avijit Roy, Proma Roy
April 30–May 2, 2026 — New York City, New York
Poster presentation on underrepresented-language AI infrastructure, focused on structural inequities in language technology, data availability, and the hidden costs of English-centric design.
USPTO Provisional Patent Application · Patent Pending
Application No.: 63/986,762
Status: Filed & Recorded (Priority Date Secured)
Filed: February 2026
This invention introduces a privacy-preserving phishing detection architecture that performs semantic analysis directly inside an email client runtime.
The system implements a staged machine learning pipeline in which all inference is executed locally using ONNX transformer models, ensuring that email content never leaves the client device.
The architecture is designed for privacy-preserving phishing detection in constrained email client environments, reducing reliance on server-side inspection while improving explainability and local control.
Principal Investigator · April 2026–April 2027
Competitive allocation awarded through the NSF ACCESS program (Advanced Cyberinfrastructure Coordination Ecosystem), supporting GPU-enabled infrastructure for reproducible machine learning research in low-resource Bengali NLP.
Infrastructure: 120,000 SUs on Indiana Jetstream2 GPU · 50,000 SUs on Jetstream2 Large Memory (est. value: $31,305) and 30,000 ACCESS Credits
Project:
Dataset Development and Fine-Tuning of Language Models for Low-Resource Bengali Programming Assistance
Research involves iterative dataset refinement, parameter-efficient fine-tuning (LoRA/QLoRA), and structured multi-stage evaluation targeting beginner-friendly Bengali programming education in low-connectivity environments.
June 2025–August 2025
Graduate research support for applied work spanning intelligence systems, language-related AI evaluation, and public-interest technology development.
Developed an applied NLP research system for evaluating student writing using machine learning models aligned with ACTFL proficiency guidelines. The project focuses on automated writing assessment, explainable scoring workflows, and practical support for language-learning evaluation in educational settings.
Ongoing research examining how data scarcity, benchmarking gaps, and infrastructure asymmetries affect AI development for underrepresented languages. This work focuses on equitable evaluation, dataset realities, and the broader technical consequences of English-centric language technology ecosystems.
Designed a secure intelligence platform for collecting and analyzing human trafficking reports. The system integrates geospatial visualization, secure data storage, and investigative dashboards to support law enforcement and research efforts.
Developed a browser-based cryptographic hashing utility implementing MD5, SHA-1, and SHA-256 for local file integrity verification. The tool supports digital forensics and evidence validation workflows by enabling client-side hash computation without file upload.
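The same local-only integrity check can be sketched in Python with the standard `hashlib` module. This is a stand-in for illustration, not the browser tool's code; the function name, chunk size, and sample file are assumptions.

```python
import hashlib
import os
import tempfile


def file_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=65536):
    """Compute several digests in one pass, reading the file in chunks."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


# Demo on a throwaway file containing the bytes b"abc".
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "evidence.bin")
    with open(sample, "wb") as f:
        f.write(b"abc")
    digests = file_digests(sample)

for name, value in digests.items():
    print(f"{name}: {value}")
```

Reading in chunks keeps memory use flat for large evidence files. MD5 and SHA-1 are no longer collision-resistant, so SHA-256 is the digest to rely on when integrity must hold against a deliberate adversary.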
The following report was completed during my undergraduate studies at John Jay College of Criminal Justice (CUNY) and has since been publicly archived on Zenodo with a citable DOI. It is included here for transparency and research continuity, not as a peer-reviewed publication.
Avijit Roy
Archival Undergraduate Research Report · John Jay College of Criminal Justice (CUNY) · 2021
Archived on Zenodo (March 2026) with a citable DOI
This report presents a Python-based machine learning workflow motivated by cyber threat intelligence (CTI) analysis. Seven classifiers were evaluated—k-Nearest Neighbors, Logistic Regression, Stochastic Gradient Descent, Naive Bayes, Decision Tree, Random Forest, and Gradient Boosting—using AUC, accuracy, recall, precision, specificity, and learning-curve analysis. Tree-based ensemble methods produced competitive validation performance. The workflow demonstrates a reproducible methodological foundation for applying machine learning to structured threat intelligence data.
Note: Due to limited access to structured CTI datasets during the original project period, a publicly available surrogate dataset was used to demonstrate the analytical pipeline. This limitation is documented transparently in the archived record.
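Several of the metrics named above follow directly from a binary confusion matrix. The sketch below computes them on hypothetical labels; it is not the report's code and the numbers are not its results.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, recall, precision, and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": ratio(tp, tp + fn),        # true positive rate
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),   # true negative rate
    }


# Hypothetical labels: 1 = malicious, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting specificity alongside recall matters in threat data because a classifier can score high accuracy on an imbalanced set while still raising many false alarms or missing most attacks.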
This work represents an early exploration of machine learning for cybersecurity, which informs my current research in AI-driven threat analysis and digital forensics.
Awarded for the best master’s thesis at John Jay College of Criminal Justice (CUNY), titled “Adversarial Robustness of Perceptual Hashing Systems: A Unified Security Evaluation Framework.”
Established in memory of Claude Hawley, the first Vice President of John Jay College of Criminal Justice. The award is presented for the best master’s thesis and/or scholastic distinction in the graduate program.
Recognized during the 2026 Commencement Awards Ceremony for academic achievement, research contributions, leadership, and impact within the graduate community.
Presented by the Office of the Dean of Students and college leadership at John Jay College of Criminal Justice, The City University of New York (CUNY).
Awarded at the 9th International Conference on Innovative Computing and Communication (ICICC 2026) for the paper: “Privacy-Preserving MIS Using Blockchain, Federated Learning and Zero-Trust Models.”
Authors: Avijit Roy, Emran Hossain, Hemayet Uddin Himel, Hasan Imam, Ruhul Amin Md Rashed, and Harleen Kaur.
Selected as a recipient of the John Jay College Student Council Scholarship in recognition of academic dedication, leadership, resilience, and impact within the John Jay community.
Awarded by the Student Council Scholarship Committee, John Jay College of Criminal Justice (CUNY).
Awarded to an outstanding member of the senior class for distinguished scholarship and exceptional service to the College.
Established in honor of Leonard E. Reisman, the founding president of John Jay College of Criminal Justice.