I am currently a Research Fellow at City University of Hong Kong, working with Prof. Cong WANG. I received my Ph.D. degree from Zhejiang University in 2024, advised by Prof. Zhan QIN, and was named an “Outstanding Graduate of Zhejiang University”. My research interests include Trustworthy AI and LLM security and safety. My recent work focuses on LLM watermarking and LLM privacy protection.

🔥 News

2025.04: 🎉 Our paper “Artificial Intelligence Security and Privacy: a Survey” is accepted by SCIENCE CHINA Information Sciences
2024.12: 🎉 Our paper FDINet is accepted by IEEE TDSC 2024
2024.08: 🎉 Our paper Explanation as a Watermark is accepted by NDSS 2025
2024.03: 🔥 We release the AIcert Platform, Media
2023.12: 🎉 Our paper PoisonPrompt is accepted by IEEE ICASSP 2024
2023.10: 🎉 Our paper PromptCARE is accepted by IEEE S&P 2024
2023.09: 🔥 Our work is promoted by New Scientist
2023.09: 🎉 Our paper RemovalNet is accepted by IEEE TDSC 2023

📝 Publications

🎙 LLM

arxiv 2024

TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs
Yuchen Yang, Hongwei Yao, Zhan Qin, Kui Ren

This paper proposes a new attack paradigm against Code LLMs: target-specific and adversarial prompt injection (TAPI). TAPI generates unreadable comments that encode malicious instructions and hides them as triggers in external source code.
The attack succeeds against widely deployed code completion applications, including CodeGeeX and GitHub Copilot.

IEEE S&P 2024

PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification
Hongwei Yao, Jian Lou, Zhan Qin, Kui Ren, [Code]

In this paper, we propose PromptCARE, the first framework for prompt copyright protection through watermark injection and verification.
Academic Impact: Our work has been promoted by media and forums such as New Scientist, GOSSIP, 隐者联盟, and 安全内参.

IEEE ICASSP 2024

PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models
Hongwei Yao, Jian Lou, Zhan Qin, [Code]

In this paper, we present PoisonPrompt, a novel backdoor attack capable of successfully compromising both hard and soft prompt-based LLMs. We evaluate the effectiveness, fidelity, and robustness of PoisonPrompt through extensive experiments on three popular prompt methods, using six datasets and three widely used LLMs.

arxiv 2025 BadReward: Clean-Label Poisoning of Reward Models in Text-to-Image RLHF, Kaiwen Duan, Hongwei Yao, Yufei Chen, Ziyun Li, Tong Qiao, Zhan Qin, Cong Wang
arxiv 2025 Quantifying Conversation Drift in MCP via Latent Polytope, Haoran Shi, Hongwei Yao, Tong Qiao, Zhan Qin, Cong Wang
arxiv 2025 ControlNET: A Firewall for RAG-based LLM System, Hongwei Yao, Haoran Shi, Yidou Chen, Zhan Qin, Cong Wang
arxiv 2024 TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs, Yuchen Yang, Hongwei Yao, Zhan Qin, Kui Ren
arxiv 2024 Eguard: Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion, Tiantian Liu, Hongwei Yao, Zhan Qin, Feng Lin, Kui Ren
IEEE TDSC 2024 FDINet: Protecting against DNN Model Extraction via Feature Distortion Index, Hongwei Yao, Zheng Li, Haiqin Weng, Feng Xue, Zhan Qin, Kui Ren
NDSS 2025 Explanation as a Watermark: Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution, Shuo Shao, Yiming Li, Hongwei Yao, Yiling He, Zhan Qin, Kui Ren
IEEE S&P 2024 PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification, Hongwei Yao, Jian Lou, Zhan Qin, Kui Ren
IEEE ICASSP 2024 PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, Hongwei Yao, Jian Lou, Zhan Qin, Kui Ren
IEEE TDSC 2023 RemovalNet: DNN Fingerprint Removal Attacks, Hongwei Yao, Zheng Li, Kunzhe Huang, Jian Lou, Zhan Qin, Kui Ren, IEEE Transactions on Dependable and Secure Computing, 2023
Elsevier 2022 Classifying Between Computer Generated and Natural Images: An Empirical Study from RAW to JPEG Format, Tong Qiao, Xiangyang Luo, Hongwei Yao, Elsevier Journal of Visual Communication and Image Representation, 2022
Sensors 2020 Image Forgery Detection and Localization via a Reliability Fusion Map, Hongwei Yao, Ming Xu, Tong Qiao, Ling Zheng, MDPI Sensors, 2020
IEEE Access 2018 Robust Multi-classifier for Camera Model Identification based on Convolution Neural Network, Hongwei Yao, Tong Qiao, Ming Xu, Ling Zheng, IEEE Access, 2018

🔏 Patents

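Hongwei Yao, Jian Lou, Zhan Qin, Kui Ren. A Method and Apparatus for Copyright Verification of Large Model Prompts (invention patent, under substantive examination, CN202311744252.0)
Hongwei Yao, Zhan Qin, Kui Ren. A Method for Evaluating the Robustness of Deep Neural Network Model Fingerprints (invention patent, under substantive examination, CN202311144816.7)
Hongwei Yao, Kui Ren, Zhan Qin, Zhibo Wang, Chunlai Tu, Wenjie Niu. A Method and Apparatus for Defending against Model Extraction Based on the Feature Distortion Index (invention patent, under substantive examination, CN202211524887.5)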

📚 Books and Technical Reports

White Paper on Artificial Intelligence Security (2020) Media1/Media2/Media3
Artificial Intelligence Security

🎖 Honors and Awards

[2024] Outstanding Graduate of Zhejiang University, by Zhejiang University
[2023] Award of Honor for Graduate, by Zhejiang University
[2022] Outstanding Graduate Student, by Zhejiang University
[2021] Award of Honor for Graduate, by Zhejiang University
[2021] Graduate of Merit, by Zhejiang University
[2020] Ph.D. Freshman Scholarship, by Zhejiang University
[2020] Outstanding Graduate of Hangzhou Dianzi University, by Hangzhou Dianzi University
[2019] Zhejiang Province 16th “The Challenge Cup” College Students Science and Technology Competition, (First Prize), by Zhejiang Province
[2018] China Internet Development Foundation Cyberspace Security Scholarship, by China Internet Development Foundation
[2018] Huawei Scholarship, by Huawei
[2018] Hack {China} Hackathon Competition, (First Prize), by Hangzhou Dianzi University
[2018] Unique Hackathon Competition, (First Prize), by Huazhong University

💬 Invited Talks

🧑‍🎨 Services

Reviewer of ICLR, AAAI 2026
Reviewer of ICLR, NeurIPS, ICML, IEEE ICASSP, AISTATS 2025
Reviewer of IEEE ICASSP 2025
Reviewer of IEEE Transactions on Dependable and Secure Computing (TDSC)
Reviewer of IEEE Access
Reviewer of ACM Multimedia Systems
Reviewer of The Journal of Supercomputing