I am a PhD student in Computer Science and Engineering at Shanghai Jiao Tong University, supervised by Prof. Yanmin Qian. I received my Master's and Bachelor's degrees from SJTU as well. I was also an exchange student at Télécom Paris (Institut polytechnique de Paris).
My research interests include multi-modality, text-to-speech, audio generation, speaker verification, and speech enhancement. I have published papers at top venues such as NeurIPS, ICLR, ICASSP, SLT, and InterSpeech.
I have interned at Meta and Microsoft. I work closely with Yao Qian and Bowen Shi. I am expected to graduate in 2027 and am actively seeking full-time research positions in industry or academia. Please feel free to reach out at zhangleying@sjtu.edu.cn.
I speak three languages fluently: Chinese, English, and French. Outside of research, I enjoy traveling, music, and movies. I love meeting new people and am happy to grab a coffee and chat!
🔥 News
- 2026.01: 🎉 One paper accepted by ICLR 2026
- 2025.10: 🚀 Joined Meta Superintelligence Labs as a Research Intern
- 2025.09: 🎉 One paper accepted by NeurIPS 2025
- 2024.12: 🎉 Two papers accepted by ICASSP 2025, received ICASSP 2025 Travel Grant
- 2024.10: 🚀 Joined Microsoft Core AI as a Research Intern
- 2024.09: 🎉 One paper accepted by NeurIPS 2024, received NeurIPS 2024 Scholar Award
📝 Publications
🗣️ Dialogue Generation
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng
- First zero-shot multi-talker conversational speech generation system
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
Leying Zhang, Yao Qian, Xiaofei Wang, Manthan Thakker, Dongmei Wang, Jianwei Yu, Haibin Wu, Yuxuan Hu, Jinyu Li, Yanmin Qian, Sheng Zhao
- Fully non-autoregressive dialogue generation with flow matching
- Supports zero-shot multi-speaker, multi-turn and fine-grained temporal control
- Incorporated into Azure TTS product
- ICASSP 2025, Slide: Integrating speech language model with LLM for spontaneous spoken dialogue generation, Haitian Lu, et al., Leying Zhang, et al.
🎤 Speech Generation (TTS)
DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice
Leying Zhang, Tingxiao Zhou, Haiyang Sun, Mengxiao Bi, Yanmin Qian
- LLM-based zero-shot ASMR speech generation
Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
Leying Zhang, Wangyou Zhang, Zhengyang Chen and Yanmin Qian
- Controllable background removal and preservation in zero-shot TTS
- InterSpeech 2025, E2E-BPVC: End-to-End Background-Preserving Voice Conversion via In-Context Learning, Yihan Liu, Zhengyang Chen, Leying Zhang, Yanmin Qian
- ICLR 2024, Prompttts 2: Describing and generating voices with text prompt, Yichong Leng, et al., Leying Zhang, et al.
- NCMMSC 2025, Training Text-to-Speech Model with Purely Synthetic Data: Feasibility, Sensitivity, and Generalization Capability, Tingxiao Zhou, Leying Zhang, Zhengyang Chen, Yanmin Qian
🔊 Speech Extraction & Enhancement
DDTSE: Discriminative Diffusion Model for Target Speech Extraction
Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Hemin Yang, Shujie Liu, Long Zhou, Yanmin Qian
- Combined diffusion and discriminative methods for target speech extraction
- Handles multi- and single-speaker scenarios in both noisy and clean conditions
- Preprint, Scale this, not that: Investigating key dataset attributes for efficient speech enhancement scaling, Leying Zhang, Wangyou Zhang, Chenda Li, Yanmin Qian
- ICASSP 2024, Generation-Based Target Speech Extraction with Speech Discretization and Vocoder, Linfeng Yu, Wangyou Zhang, Chenpeng Du, Leying Zhang, Zheng Liang, Yanmin Qian
- ISCSLP 2024, Knowledge Distillation from Discriminative Model to Generative Model with Parallel Architecture for Speech Enhancement, Tingxiao Zhou, Leying Zhang, Yanmin Qian
🔏 Speaker Verification
- ICASSP 2023, Adaptive Large Margin Fine-tuning for Speaker Verification, Leying Zhang, Zhengyang Chen and Yanmin Qian
- InterSpeech 2022, Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification, Leying Zhang*, Zhengyang Chen* and Yanmin Qian
- InterSpeech 2021, Knowledge Distillation from Multi-Modality to Single-Modality for Person Verification, Leying Zhang, Zhengyang Chen and Yanmin Qian
📦 Other
- ICLR 2026, FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates, Jiaqi Li, Yao Qian, Yuxuan Hu, Leying Zhang, et al.
🎖 Honors and Awards
- 2025 ICASSP 2025 Travel Grant
- 2024 NeurIPS 2024 Scholar Award
- 2022 National Scholarship
- 2022 First place in CN-Celeb Speaker Recognition Challenge 2022
- 2021 ISCA and Interspeech Travel Grant
- 2021 Outstanding Graduates of Shanghai
- 2021 Outstanding Student Leader of SJTU
- 2020 Guanghua Scholarship
- 2019 SJTU Class B Scholarship
📖 Education
- 2023.09 - Present, PhD in Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai
- 2021.09 - 2023.06, Master's in Electronic Information, Shanghai Jiao Tong University, Shanghai
- 2021.09 - 2022.02, Exchange Student in Data Science and Image Processing, Télécom Paris (Institut polytechnique de Paris), France
- 2017.09 - 2021.06, Bachelor of Information Engineering and French (double degree), Shanghai Jiao Tong University, Shanghai
💻 Industry Experience
- 2025.10 - 2026.03, Research Intern, Meta Superintelligence Labs, New York, USA
- 2024.10 - 2025.08, Research Intern, Microsoft Core AI (Remote)
- 2023.03 - 2024.03, Research Intern, Microsoft Azure Research (Remote)
- 2022.11 - 2023.03, Research Intern, Microsoft Research Asia, Beijing, China
📚 Teaching
- Spring 2025, Teaching Assistant - Intelligent Speech Technology
- Fall 2022, Teaching Assistant - Machine Learning