Research

My research sits at the intersection of cybersecurity, artificial intelligence, and privacy-preserving systems. I am interested in building practical, research-driven technologies that combine machine learning, secure system design, and human-centered computing for real-world environments. My recent work spans propaganda detection, privacy-aware intelligence systems, language-related AI assessment, and broader questions of equity and infrastructure in underrepresented-language computing.

Across these areas, I emphasize reproducible experimentation and infrastructure-aware design for systems meant to be deployed outside the lab.


Research Areas

Cybersecurity · Privacy-Preserving AI · Digital Forensics

Methods

Machine Learning · Secure Systems Design · Experimental Evaluation

Applications

Threat Intelligence · Language Computing · Public-Interest Technology

Research Overview

AI-driven cybersecurity and trustworthy machine learning systems
Privacy-preserving and secure-by-design computing architectures
Digital forensics, threat intelligence, and applied security tooling
Language-related AI evaluation, assessment, and equitable computing
Public-interest intelligence systems for social harm detection and analysis

Publications & Accepted Papers

ACCEPTED · PEER-REVIEWED JOURNAL ARTICLE · WILEY APPLIED RESEARCH

AI-Driven and Cyber-Secure Embedded Inverter Control for Energy Optimization in Electric Vehicles

Sonal Sagar Boda, Siddhesh Pimpale, Mohammad Somon Sikder, Hemayet Uddin Himel, Zeeshan Akbar, Avijit Roy, Deepak Gupta

Accepted · Applied Research · Wiley · 2026
Status: Accepted, awaiting publication · Scopus-indexed, Q2 journal

This work proposes a unified AI-driven, cyber-secure inverter-control framework for electric vehicle systems built on permanent magnet synchronous motors (PMSMs). The methodology integrates an adaptive switching controller based on deep reinforcement learning with a Deep Q-Network (DQN), encrypted command validation, and an embedded intrusion detection system (IDS). The framework is implemented on an ARM Cortex-M7 (STM32H7) platform and validated against WLTP and UDDS driving-cycle datasets in MATLAB/Simulink and hardware-in-the-loop environments.

Key finding: The framework achieves an average peak efficiency of 95.4% (a 4.2% improvement over conventional field-oriented control), a 34.5% reduction in switching losses, a 67% reduction in torque ripple, and total harmonic distortion (THD) reduced to 3.2%. The embedded IDS reaches 98.6% attack-detection accuracy with an average detection latency of 3.2 ms and only 1.6% computational overhead.

ACCEPTED POSTER · PEER-REVIEWED WORKSHOP PAPER · IJCAI 2026 WORKSHOP

Measuring the Tokenization Premium: A Cost Audit for Underserved Language Communities

Avijit Roy, Proma Roy, Hrishitva Patel

Accepted Poster · Language Models for Underserved Communities (LM4UC) Workshop · IJCAI 2026
Bremen, Germany · August 16, 2026

This work introduces the Tokenization Equity Audit (TEA), a reproducible benchmark for measuring tokenization premiums in multilingual technical tutoring content. TEA evaluates GPT-4o, Qwen2.5-7B, and Mistral-7B tokenizers across Bengali, Hindi, Arabic, Tamil, and Yoruba using a 120-item Python debugging corpus. The study shows that semantically equivalent content can require substantially different token counts across languages, creating unequal API cost, effective context-window capacity, and local deployment overhead for underserved-language communities.

Key finding: Bengali requires 1.56× as many GPT-4o tokens as English, while Yoruba reaches 2.37× despite using Latin script, showing that tokenization inequity is not reducible to script family alone.
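A minimal sketch of the premium metric, assuming it is defined as each language's token count divided by the English count for the same content (the 1.56× and 2.37× figures above read this way). The helper name and the counts are hypothetical toy values, not TEA data; a real audit would take the counts from each model's tokenizer.

```python
from typing import Dict


def tokenization_premium(token_counts: Dict[str, int], baseline: str = "en") -> Dict[str, float]:
    """Ratio of each language's token count to the baseline language's count.

    Assumes every count was measured on semantically equivalent content.
    """
    base = token_counts[baseline]
    if base <= 0:
        raise ValueError("baseline token count must be positive")
    return {lang: count / base for lang, count in token_counts.items()}


# Hypothetical counts for one translated tutoring item (illustrative only).
counts = {"en": 200, "bn": 300, "yo": 450}
premiums = tokenization_premium(counts)
print(premiums)  # {'en': 1.0, 'bn': 1.5, 'yo': 2.25}
```

Under a fixed per-token price, API cost scales by the same factor, and the amount of content that fits in a fixed context window shrinks by it.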

FULL PAPER · PEER-REVIEWED · FLAIRS-39

Propasafe-Hybrid: A Text-Based Hybrid Propaganda Detection Tool

Thomas Kimmeth, Avijit Roy, Vivek Sharma

Full Paper · FLAIRS-39 Special Track: Applied Natural Language Processing · Florida Artificial Intelligence Research Society Conference · 2026
Hilton Marco Island Beach Resort and Spa · Marco Island, Florida, USA · May 17–20, 2026

This research introduces Propasafe-Hybrid, a hybrid propaganda detection framework for sentence-level analysis of news text. The system combines transformer-based classification with a cost-aware explanatory pipeline to identify propagandistic content, label the technique involved, and generate concise rationales for why a sentence was flagged. The work emphasizes interpretability, practical deployment, and reduced computational overhead in applied propaganda analysis.

Published in: Vol. 39 No. 1 (2026): Proceedings of the 39th Florida Artificial Intelligence Research Society Conference (FLAIRS-39), Special Track on Applied Natural Language Processing, Marco Island, FL

Conference Website: flairs-39.info

PREPRINT · CONFERENCE POSTER · ILA 2026

Structural Silence: When AI Infrastructure Fails Speakers of Underrepresented Languages — A Case Study in Bengali, the Digital Divide, and the Hidden Cost of English-Centric Design

Avijit Roy, Proma Roy

SSRN Preprint · 2026
Poster presented at the 69th Annual ILA Conference (International Linguistic Association), New York City.
John Jay College, City University of New York · New York City, USA · April 30–May 2, 2026

This research examines structural barriers affecting AI performance in underrepresented languages, focusing on Bengali as a case study. The work analyzes disparities in web representation, training token allocation, tokenization efficiency, and connectivity access, and introduces an infrastructure-aware perspective on equitable AI development.

Preprint DOI: 10.2139/ssrn.6522858
Conference Website: ilaword.org/conference/2026

FULL PAPER · BEST PAPER AWARD · SPRINGER LNNS · ICICC 2026

Privacy-Preserving MIS Using Blockchain, Federated Learning and Zero-Trust Models

Avijit Roy, Emran Hossain, Hemayet Uddin Himel, Hasan Imam, Ruhul Amin Md Rashed, Harleen Kaur

ICICC 2026 · Best Paper Award · Full Paper
Shaheed Sukhdev College of Business Studies · New Delhi, India · February 6–7, 2026
Organized in association with National Institute of Technology Patna and University of Valladolid

This research proposes a privacy-preserving Management Information System (MIS) architecture that integrates blockchain infrastructure, federated learning, and zero-trust security models. The framework enables secure data collaboration across distributed systems while maintaining strict privacy boundaries and minimizing centralized data exposure. Experimental evaluation showed improvements in decision accuracy, privacy protection, and operational efficiency over conventional and partially distributed MIS architectures.

To appear in Springer Lecture Notes in Networks and Systems (LNNS), ICICC 2026 proceedings.

Conference Website: icicc-conf.com/icicc26
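The aggregation rule used in the paper is not described here. As a generic illustration of how federated learning keeps raw records local, the sketch below uses FedAvg-style weighted averaging of client parameters. This is a standard technique swapped in for illustration, not the paper's method; the function name and the three-client example are hypothetical.

```python
from typing import List, Tuple


def federated_average(client_updates: List[Tuple[List[float], int]]) -> List[float]:
    """Weighted average of client parameter vectors (FedAvg-style).

    Each entry is (parameters, number_of_local_samples). Only parameters
    reach the aggregator; raw records stay on each client.
    """
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training samples reported")
    dim = len(client_updates[0][0])
    avg = [0.0] * dim
    for params, n in client_updates:
        if len(params) != dim:
            raise ValueError("parameter vectors must have equal length")
        weight = n / total  # clients with more data pull the average harder
        for i, p in enumerate(params):
            avg[i] += weight * p
    return avg


# Three hypothetical departments with different data volumes.
updates = [([1.0, 2.0], 100), ([3.0, 0.0], 200), ([0.0, 4.0], 100)]
print(federated_average(updates))  # [1.75, 1.5]
```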

Master's Thesis

CLAUDE HAWLEY MEDAL · BEST MASTER'S THESIS · 2026

Adversarial Robustness of Perceptual Hashing Systems: A Unified Security Evaluation Framework

Avijit Roy

Advisor: Professor Shweta Jain, Ph.D.

Master's Thesis · Digital Forensics & Cybersecurity · John Jay College of Criminal Justice, CUNY · 2026

This thesis evaluates the adversarial robustness of six perceptual hashing configurations used in image matching, content moderation, provenance, and digital forensics workflows. Using a matched experimental protocol across evasion and near-collision attacks, the work shows that threshold-based perceptual hash systems can fail in asymmetric ways: adversarial changes may cause known content to evade detection, while unrelated images may be moved close enough to trigger false-positive matches.

The findings frame perceptual hashing as an operationally valuable but incomplete security mechanism. Rather than treating hash length alone as a measure of robustness, the thesis shows that pipeline structure, optimization behavior, quantization design, and threshold policy strongly shape real-world risk.
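The asymmetric failure modes can be seen in a minimal sketch of threshold-based matching, assuming 64-bit hashes compared by Hamming distance (a common convention; the six configurations in the thesis may differ). The hash values and the 10-bit threshold are hypothetical stand-ins for the output of a real hashing pipeline.

```python
def hamming_distance(h1: int, h2: int) -> int:
    """Number of differing bits between two hash values."""
    return bin(h1 ^ h2).count("1")


def is_match(h1: int, h2: int, threshold: int) -> bool:
    """Threshold policy: hashes within `threshold` bits count as the same image."""
    return hamming_distance(h1, h2) <= threshold


THRESHOLD = 10  # hypothetical policy for 64-bit hashes

known = 0xF0F0_F0F0_F0F0_F0F0  # hash of a known image (illustrative)
evasive = known ^ 0xFFF        # adversarial edit flips 12 bits of the known image's hash
unrelated = known ^ 0xFF       # unrelated image optimized to land 8 bits away

print(is_match(known, evasive, THRESHOLD))    # False: evasion (false negative)
print(is_match(known, unrelated, THRESHOLD))  # True: near-collision (false positive)
```

Lowering the threshold blocks the near-collision but lets more edited copies of known content slip through, and raising it does the reverse; this is the sense in which threshold policy, not only hash length, shapes risk.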

Awarded the Claude Hawley Medal at the 2026 graduation hooding ceremony; the medal is presented for the best master's thesis and/or scholastic distinction in the graduate program. The thesis was also presented at the Graduate Student Research Symposium, part of the John Jay Research & Creativity Expo, on May 4, 2026.

Presentations & Scholarly Visibility

ORAL PRESENTATION · FLAIRS-39

FLAIRS-39 — Propasafe-Hybrid

Authors: Thomas Kimmeth, Avijit Roy, Vivek Sharma

May 18, 2026 · Session 6B: ANLP, Applied Natural Language Processing — Marco Island, Florida

Oral presentation of Propasafe-Hybrid: A Text-Based Hybrid Propaganda Detection Tool at the 39th Florida Artificial Intelligence Research Society Conference (FLAIRS-39), in the Special Track on Applied Natural Language Processing.

The presentation discussed a hybrid propaganda detection framework combining transformer-based classification, selective LLM explainability, and cost-aware filtering for sentence-level propaganda analysis in news text.

ORAL PRESENTATION · FLAIRS-39

FLAIRS-39 — Fine-Grained Sentence-Level Propaganda Detection

Authors: Mustafa Eren, Aneeza Shakeel, Vivek Sharma

Presented by: Avijit Roy

May 18, 2026 · Session 6A: ANLP, Applied Natural Language Processing — Marco Island, Florida

Presented Fine-Grained Sentence-Level Propaganda Detection in News Articles on behalf of the authors at the 39th Florida Artificial Intelligence Research Society Conference (FLAIRS-39) under the Special Track on Applied Natural Language Processing.

The presentation explored sentence-level propaganda detection using transformer-based models including BERT and RoBERTa, with experiments focused on class imbalance, focal loss, and fine-grained propaganda technique classification in news media.
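For reference, the standard binary form of focal loss is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), which shrinks the loss on easy, confidently classified examples so that rare classes carry more weight during training. The sketch below assumes the common defaults γ = 2 and α = 0.25; these are not settings reported for the presented work, which classifies multiple propaganda techniques and could apply the loss per class or over a softmax.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one binary prediction.

    p is the predicted probability of the positive class; y is 0 or 1.
    The (1 - p_t) ** gamma factor down-weights examples the model already
    classifies confidently, so hard or rare examples contribute more.
    With gamma = 0 it reduces to alpha-weighted cross-entropy.
    """
    eps = 1e-12  # guard against log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


easy = binary_focal_loss(0.95, 1)  # confident and correct
hard = binary_focal_loss(0.30, 1)  # uncertain positive
print(easy < hard)  # True
```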

THESIS PRESENTATION · GSRS 2026

Graduate Student Research Symposium

May 4, 2026 — John Jay College of Criminal Justice, New York, NY

Adversarial Robustness of Perceptual Hashing Systems: A Unified Security Evaluation Framework

Presented as part of the John Jay Research & Creativity Expo. This work examines adversarial vulnerabilities in perceptual hashing systems, highlighting how threshold-based matching can be exploited to produce both false negatives (evasion) and false positives (near-collision) across real-world detection and verification pipelines.

ORAL PRESENTATION · ILA 2026

ILA 2026 — Propasafe-Hybrid

Authors: Thomas Kimmeth, Avijit Roy, Vivek Sharma

April 30–May 2, 2026 — New York City, New York

Presentation of the Propasafe-Hybrid system, a hybrid propaganda detection pipeline combining transformer-based classification, selective LLM explainability, and cost-aware filtering for sentence-level propaganda analysis.

POSTER PRESENTATION · ILA 2026

ILA 2026 — Structural Silence

Authors: Avijit Roy, Proma Roy

April 30–May 2, 2026 — New York City, New York

Poster presentation on underrepresented-language AI infrastructure, focused on structural inequities in language technology, data availability, and the hidden costs of English-centric design.

Preprint DOI: 10.2139/ssrn.6522858

Patents

PATENT PENDING · USPTO 2026

Staged Client-Side Phishing Risk Assessment Architecture Using Locally Executed ONNX Transformer Inference Within an Email Client Runtime

USPTO Provisional Patent Application · Patent Pending
Application No.: 63/986,762
Status: Filed & Recorded (Priority Date Secured)
Filed: February 2026

This invention introduces a privacy-preserving phishing detection architecture that performs semantic analysis directly inside an email client runtime.

The system implements a staged machine learning pipeline where:

  1. A lightweight binary classifier evaluates phishing likelihood.
  2. A conditional execution controller determines whether deeper semantic analysis is required.
  3. A multi-label intent model identifies phishing behaviors such as credential harvesting, impersonation, payment fraud, and urgency-based manipulation.

All inference is executed locally using ONNX transformer models, ensuring that email content never leaves the client device.

The architecture is designed for privacy-preserving phishing detection in constrained email client environments, reducing reliance on server-side inspection while improving explainability and local control.
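The staged control flow can be sketched as follows. This is a hypothetical illustration, not the patented implementation: the two models are plain Python callables standing in for the locally executed ONNX transformer models, and the names `assess_email`, `escalate_at`, and `intent_threshold`, along with the 0.5 thresholds, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model signatures. In the described architecture these would be
# ONNX transformer models executed inside the email client runtime.
BinaryModel = Callable[[str], float]             # phishing probability in [0, 1]
IntentModel = Callable[[str], Dict[str, float]]  # score per phishing intent


@dataclass
class Assessment:
    phishing_score: float
    intents: List[str] = field(default_factory=list)
    deep_analysis_run: bool = False


def assess_email(text: str, binary_model: BinaryModel, intent_model: IntentModel,
                 escalate_at: float = 0.5, intent_threshold: float = 0.5) -> Assessment:
    # Stage 1: the lightweight binary classifier scores every message.
    score = binary_model(text)
    # Stage 2: the conditional execution controller skips deeper analysis
    # for low-risk messages, saving compute on the client.
    if score < escalate_at:
        return Assessment(phishing_score=score)
    # Stage 3: the multi-label intent model runs only on escalated messages.
    scores = intent_model(text)
    intents = sorted(label for label, s in scores.items() if s >= intent_threshold)
    return Assessment(phishing_score=score, intents=intents, deep_analysis_run=True)


# Toy stand-in models for illustration only.
def toy_binary(text: str) -> float:
    return 0.9 if "verify your password" in text.lower() else 0.1


def toy_intents(text: str) -> Dict[str, float]:
    return {"credential_harvesting": 0.8, "urgency": 0.6, "payment_fraud": 0.1}


result = assess_email("Urgent: verify your password today", toy_binary, toy_intents)
print(result.intents)  # ['credential_harvesting', 'urgency']
```

Because the intent model runs only when the binary score crosses the escalation threshold, most benign mail costs one small inference, which matters in constrained client runtimes.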

Research Support

NSF ACCESS Computational Allocation (CIS260616)

Principal Investigator · April 2026–April 2027

Competitive allocation awarded through the NSF ACCESS program (Advanced Cyberinfrastructure Coordination Ecosystem), supporting GPU-enabled infrastructure for reproducible machine learning research in low-resource Bengali NLP.

Infrastructure: 120,000 SUs on Indiana Jetstream2 GPU · 50,000 SUs on Jetstream2 Large Memory (est. value: $31,305) and 30,000 Access Credits

Project: Dataset Development and Fine-Tuning of Language Models for Low-Resource Bengali Programming Assistance

Research involves iterative dataset refinement, parameter-efficient fine-tuning (LoRA/QLoRA), and structured multi-stage evaluation targeting beginner-friendly Bengali programming education in low-connectivity environments.

RF CUNY Graduate Research Support

June 2025–August 2025

Graduate research support for applied work spanning intelligence systems, language-related AI evaluation, and public-interest technology development.

Research Projects

Natural Language Processing for Language Proficiency Assessment

Developed an applied NLP research system for evaluating student writing using machine learning models aligned with ACTFL proficiency guidelines. The project focuses on automated writing assessment, explainable scoring workflows, and practical support for language-learning evaluation in educational settings.

AI Infrastructure and Underrepresented-Language Computing

Ongoing research examining how data scarcity, benchmarking gaps, and infrastructure asymmetries affect AI development for underrepresented languages. This work focuses on equitable evaluation, dataset realities, and the broader technical consequences of English-centric language technology ecosystems.

Human Trafficking Intelligence Platform

Designed a secure intelligence platform for collecting and analyzing human trafficking reports. The system integrates geospatial visualization, secure data storage, and investigative dashboards to support law enforcement and research efforts.

Client-Side Cryptographic Hash Verification Tool (HashNow)

Developed a browser-based cryptographic hashing utility implementing MD5, SHA-1, and SHA-256 for local file integrity verification. The tool supports digital forensics and evidence validation workflows by enabling client-side hash computation without file upload.
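HashNow itself runs in the browser; the sketch below shows the equivalent client-side computation in Python using the standard `hashlib` module and is not the tool's own code. The function names are hypothetical. Each chunk is fed to all three hash objects, so the file is read only once.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

ALGORITHMS = ("md5", "sha1", "sha256")


def file_digests(path: Union[str, Path], chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Return MD5, SHA-1, and SHA-256 hex digests of a local file in one pass."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as f:
        # Read fixed-size chunks so large evidence files never load fully into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def matches_reference(path: Union[str, Path], expected_sha256: str) -> bool:
    """Compare a file's SHA-256 digest against a reference value (case-insensitive)."""
    return file_digests(path)["sha256"] == expected_sha256.strip().lower()
```

MD5 and SHA-1 are no longer collision-resistant, so when a reference value is available in several forms, SHA-256 is the one to compare against, which is why `matches_reference` checks it.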

Archival Undergraduate Reports

The following report was completed during my undergraduate studies at John Jay College of Criminal Justice (CUNY) and has since been publicly archived on Zenodo with a citable DOI. It is included here for transparency and research continuity, not as a peer-reviewed publication.

Exploring Machine Learning Techniques for Cyber Threat Intelligence Analysis (Archival Report)

Avijit Roy

Archival Undergraduate Research Report · John Jay College of Criminal Justice (CUNY) · 2021
Archived on Zenodo (March 2026) with a citable DOI

This report presents a Python-based machine learning workflow motivated by cyber threat intelligence (CTI) analysis. Seven classifiers (k-Nearest Neighbors, Logistic Regression, Stochastic Gradient Descent, Naive Bayes, Decision Tree, Random Forest, and Gradient Boosting) were evaluated using AUC, accuracy, recall, precision, specificity, and learning-curve analysis. Tree-based ensemble methods produced competitive validation performance. The workflow provides a reproducible methodological foundation for applying machine learning to structured threat intelligence data.

Note: Due to limited access to structured CTI datasets during the original project period, a publicly available surrogate dataset was used to demonstrate the analytical pipeline. This limitation is documented transparently in the archived record.
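Most of the metrics listed above come straight from a binary confusion matrix; specificity is simply recall on the negative class. The sketch below computes the four threshold-based metrics from hypothetical 0/1 labels (not the report's data). AUC and learning curves need predicted scores and repeated training runs, so they are left out.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall (sensitivity), and specificity from 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def safe_div(a: int, b: int) -> float:
        # Report 0.0 instead of dividing by zero when a class is absent.
        return a / b if b else 0.0

    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),  # true negative rate
    }


# Hypothetical labels: 1 = malicious, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```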

This work represents an early exploration of machine learning for cybersecurity, which informs my current research in AI-driven threat analysis and digital forensics.

Honors & Awards

COMMENCEMENT ACADEMIC AWARD · GRADUATE THESIS DISTINCTION

Claude Hawley Medal (2026) — John Jay College of Criminal Justice

Awarded for the best master’s thesis at John Jay College of Criminal Justice (CUNY), titled “Adversarial Robustness of Perceptual Hashing Systems: A Unified Security Evaluation Framework.”

Established in memory of Claude Hawley, the first Vice President of John Jay College of Criminal Justice. The award is presented for the best master’s thesis and/or scholastic distinction in the graduate program.

COMMENCEMENT SERVICE AWARD

Graduate Achievement Award (2026) — John Jay College of Criminal Justice

Recognized during the 2026 Commencement Awards Ceremony for academic achievement, research contributions, leadership, and impact within the graduate community.

Presented by the Office of the Dean of Students and college leadership at John Jay College of Criminal Justice, The City University of New York (CUNY).

INTERNATIONAL BEST PAPER AWARD · ICICC 2026

Best Paper Award (ICICC 2026)

Awarded at the 9th International Conference on Innovative Computing and Communication (ICICC 2026) for the paper: “Privacy-Preserving MIS Using Blockchain, Federated Learning and Zero-Trust Models.”

Authors: Avijit Roy, Emran Hossain, Hemayet Uddin Himel, Hasan Imam, Ruhul Amin Md Rashed, and Harleen Kaur.

ACADEMIC SCHOLARSHIP & LEADERSHIP

Student Council Scholarship Recipient (2025–2026)

Selected as a recipient of the John Jay College Student Council Scholarship in recognition of academic dedication, leadership, resilience, and impact within the John Jay community.

Awarded by the Student Council Scholarship Committee, John Jay College of Criminal Justice (CUNY).

ACADEMIC & SERVICE DISTINCTION

Leonard E. Reisman Medal (2021) — John Jay College of Criminal Justice

Awarded to an outstanding member of the senior class for distinguished scholarship and exceptional service to the College.

Established in honor of Leonard E. Reisman, the founding president of John Jay College of Criminal Justice.