👤 Bio
I am currently a PhD candidate in Computer Science at the University of Science and Technology of China (USTC), under the supervision of Professor Xiangyang Li (ACM Fellow, IEEE Fellow). My research interests focus on optimization in complex networks, including optimization of deep learning model inference (primary research focus), intelligent sensing in the Internet of Things, and security of intelligent models. Feel free to contact me via email.
🎓 Education
- 2022.09 - Present University of Science and Technology of China, School of Computer Science, pursuing a PhD in Computer Science.
- 2020.09 - 2022.06 University of Science and Technology of China, School of Computer Science, Master’s student in Computer Science.
- 2016.09 - 2020.06 Chongqing University, Hongshen School/Computer Science School, obtained a Bachelor’s degree in Computer Science.
📰 News
- [Apr 2025] 🎉 I published a paper as first author at the CCF B-level conference IWQoS, thanks to the hard work of all the co-authors!
- [Dec 2024] 🎉 Our collaboration project on intelligent cockpit model inference optimization with NIO Inc. has been successfully completed, and the paper, where I am the first author, has been accepted by the CCF A-level conference AAAI!
- [Dec 2024] 🎉 I published a paper as co-first author at the CCF A-level conference INFOCOM, thanks to the hard work of Bowen Zhang, Jiahui Hou, and all the co-authors!
- [Nov 2024] 🎉 Awarded third place in the Ubiquitous Intelligent Sensing Technology Innovation and Application Competition (泛在智能感知技术创新应用大赛).
- [Oct 2024] 🎉 As a co-author, I contributed to a paper published in the CCF A-level Chinese journal Chinese Journal of Computers (计算机学报). Many thanks to my mentors and every colleague for their effort!
- [Aug 2024] 🎉 As the first author, I published a paper in the CCF A-level journal TMC, and I am grateful for the guidance from my mentors and the support from my colleagues!
📚 Publications
[AAAI’25 Oral] A-VL: Adaptive Attention for Large Vision-Language Models.
Junyang Zhang, Mu Yuan, Ruiguang Zhong, Puhan Luo, Huiyou Zhan, Ningkang Zhang, Chengchen Hu, Xiangyang Li.
The 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025, CCF-A)
[TMC’24] WordWhisper: Exploiting Real-Time, Hardware-Dependent IoT Communication Against Eavesdropping.
Junyang Zhang, Jiahui Hou, Ye Tian, Xiang-Yang Li.
IEEE Transactions on Mobile Computing (IEEE TMC, CCF-A, JCR Q1)
[INFOCOM’25] TensAllo: Adaptive Deployment of LLMs on Resource-Constrained Heterogeneous Edge Devices.
Bowen Zhang, Junyang Zhang (co-first author), Jiahui Hou, Yixin Wang.
IEEE Conference on Computer Communications (IEEE INFOCOM, CCF-A)
[IWQoS’25] Deploy Efficient Large Language Model Distributed Inference Pipeline for Heterogeneous GPUs.
Junyang Zhang, Jiahui Hou, Bowen Zhang, Xiang-Yang Li.
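IEEE/ACM International Symposium on Quality of Service (IEEE/ACM IWQoS, CCF-B)
[Chinese Journal of Computers] A Survey of Resource-Efficient Model Inference for the Intelligent Internet of Things (面向智能物联网的资源高效模型推理综述).
Mu Yuan, Lan Zhang, Yunhao Yao, Junyang Zhang, Puhan Luo, Xiangyang Li.
Chinese Journal of Computers (计算机学报, Chinese CCF-A)
[PrePrint] PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks.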
Huiyou Zhan, Xuan Zhang, Haisheng Tan, Han Tian, Dongping Yong, Junyang Zhang, Xiang-Yang Li.
arXiv
[PrePrint] DERMARK: A Dynamic, Efficient and Robust Multi-bit Watermark for Large Language Models.
Qihao Lin, Chen Tang, Lan Zhang, Junyang Zhang, Xiangyang Li.
arXiv
📝 Research
Model Inference Optimization
Model inference optimization (primary research focus): This area addresses the performance bottlenecks that arise when deep learning models are deployed in real-world applications. The goal is to improve inference efficiency by reducing computational cost, lowering latency, and making better use of available resources, while maintaining or improving accuracy.
- Application Project: Independently responsible for research on multimodal large model computational optimization in the NIO Industry-University Collaboration Project “Intelligent Cockpit Inference Optimization Based on Large Models”, aiming to reduce computational load, decrease inference latency, and minimize KV Cache memory usage.
- Application Project: Independently responsible for research on distributed inference optimization for large models under heterogeneous computing power in the Huawei Industry-University Collaboration Project “AI Efficiency Improvement on the Edge”, aiming to enhance inference throughput and improve resource utilization under heterogeneous computing environments.
- Application Project: Participated in the development of an industry knowledge base and large model Q&A system in the Baidu Industry-University Collaboration Project “ESG Domain-Specific Question Answering System”, focusing on real system optimization for vertical domain knowledge Q&A.
- First Author Paper: [CCF A] A-VL: Adaptive Attention for Large Vision-Language Models.
- First Author Paper: [CCF B] Deploy Efficient Large Language Model Distributed Inference Pipeline for Heterogeneous GPUs.
- Co-first Author Paper: [CCF A] TensAllo: Adaptive Deployment of LLMs on Resource-Constrained Heterogeneous Edge Devices.
- Collaborative Paper: PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks.
- Collaborative Paper: [CCF A] A Survey of Resource-Efficient Model Inference for the Intelligent Internet of Things (面向智能物联网的资源高效模型推理综述).
Intelligent Sensing
Intelligent sensing: This area is the deep integration of artificial intelligence and information sensing technology, leveraging wireless signals for environmental perception, object detection, and behavior recognition. It overcomes limitations of traditional sensors, such as dependence on lighting and viewing angle, and offers low power consumption, non-contact sensing, and high precision. Work in this area often yields unexpected results from commonplace wireless signals, which makes it a space that rewards inspiration and creativity.
- First Author Paper: [CCF A] WordWhisper: Exploiting Real-Time, Hardware-Dependent IoT Communication Against Eavesdropping.
Model Security
Model inference security: Model inference security can be divided into three stages: pre-inference (input security), during inference (weight security), and post-inference (output security). From a goal-oriented perspective, it can further be categorized into end-to-end data protection (privacy) during the computation process and authentication and traceability (public) after computation. My collaborators and I primarily focus on authentication and traceability of intelligent model computations post-inference, aiming to protect the legitimate interests of all parties involved.
- Collaborative Paper: DERMARK: A Dynamic, Efficient and Robust Multi-bit Watermark for Large Language Models.
🌏 Service
- Teaching Assistant: Served as a Teaching Assistant for the graduate-level course “Computer Applied Mathematics” at USTC in 2023.
- Volunteer Activities: Volunteered for the ACM TURC and for various other events, such as the college’s evening parties.
💫 Hobbies
Beyond my research work, I also have a passion for photography (Nikon fan), music (singing and guitar), and gaming (PC & Switch). Feel free to reach out if you’d like to chat about any of these! I’m also fond of hosting and have had the pleasure of hosting two university-level events, one college-level event, and two lab events. It’s always a joy to meet new people! I love experimenting with various quirky coding projects: coding things I enjoy is a true pleasure, and the sense of accomplishment constantly drives me to explore what comes next. Work hard, but also enjoy life. ✨


