I am a PhD student in Computer Science and Engineering at Shanghai Jiao Tong University, supervised by Prof. Yanmin Qian. I received my Master's and Bachelor's degrees from SJTU as well. I was also an exchange student at Télécom Paris (Institut polytechnique de Paris).

My research interests include multi-modality, text-to-speech, audio generation, speaker verification, and speech enhancement. I have published papers at top venues such as NeurIPS, ICLR, ICASSP, SLT, and InterSpeech.

I have interned at Meta and Microsoft. I work closely with Yao Qian and Bowen Shi. I expect to graduate in 2027 and am actively seeking full-time research positions in industry or academia. Please feel free to reach out at zhangleying@sjtu.edu.cn.

I speak three languages fluently: Chinese, English, and French. Outside of research, I enjoy traveling, music, and movies. I love meeting new people and am happy to grab a coffee and chat!

🔥 News

2026.01: 🎉 One paper accepted by ICLR 2026
2025.10: 🚀 Joined Meta Superintelligence Labs as a Research Intern
2025.09: 🎉 One paper accepted by NeurIPS 2025
2024.12: 🎉 Two papers accepted by ICASSP 2025, received ICASSP 2025 Travel Grant
2024.10: 🚀 Joined Microsoft Core AI as a Research Intern
2024.09: 🎉 One paper accepted by NeurIPS 2024, received NeurIPS 2024 Scholar Award

📝 Publications

🗣️ Dialogue Generation

NeurIPS 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

First zero-shot multi-talker conversational speech generation system

NeurIPS 2025

CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
Leying Zhang, Yao Qian, Xiaofei Wang, Manthan Thakker, Dongmei Wang, Jianwei Yu, Haibin Wu, Yuxuan Hu, Jinyu Li, Yanmin Qian, Sheng Zhao

Fully non-autoregressive dialogue generation with flow matching
Supports zero-shot multi-speaker, multi-turn dialogue generation with fine-grained temporal control
Incorporated into Azure TTS product

ICASSP 2025 SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation, Haitian Lu, et al., Leying Zhang, et al.

🎤 Speech Generation (TTS)

Preprint

DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice
Leying Zhang, Tingxiao Zhou, Haiyang Sun, Mengxiao Bi, Yanmin Qian

LLM-based zero-shot ASMR speech generation

ICASSP 2025

Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
Leying Zhang, Wangyou Zhang, Zhengyang Chen and Yanmin Qian

Controllable background removal and preservation in zero-shot TTS

InterSpeech 2025 E2E-BPVC: End-to-End Background-Preserving Voice Conversion via In-Context Learning, Yihan Liu, Zhengyang Chen, Leying Zhang, Yanmin Qian
ICLR 2024 PromptTTS 2: Describing and Generating Voices with Text Prompt, Yichong Leng, et al., Leying Zhang, et al.
NCMMSC 2025 Training Text-to-Speech Model with Purely Synthetic Data: Feasibility, Sensitivity, and Generalization Capability, Tingxiao Zhou, Leying Zhang, Zhengyang Chen, Yanmin Qian

🔊 Speech Extraction & Enhancement

SLT 2024

DDTSE: Discriminative Diffusion Model for Target Speech Extraction
Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Hemin Yang, Shujie Liu, Long Zhou, Yanmin Qian

Combined diffusion and discriminative methods for target speech extraction
Handles multi- and single-speaker scenarios in both noisy and clean conditions

Preprint Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling, Leying Zhang, Wangyou Zhang, Chenda Li, Yanmin Qian
ICASSP 2024 Generation-Based Target Speech Extraction with Speech Discretization and Vocoder, Linfeng Yu, Wangyou Zhang, Chenpeng Du, Leying Zhang, Zheng Liang, Yanmin Qian
ISCSLP 2024 Knowledge Distillation from Discriminative Model to Generative Model with Parallel Architecture for Speech Enhancement, Tingxiao Zhou, Leying Zhang, Yanmin Qian

🔏 Speaker Verification

ICASSP 2023 Adaptive Large Margin Fine-tuning for Speaker Verification, Leying Zhang, Zhengyang Chen and Yanmin Qian
InterSpeech 2022 Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification, Leying Zhang*, Zhengyang Chen* and Yanmin Qian
InterSpeech 2021 Knowledge Distillation from Multi-Modality to Single-Modality for Person Verification, Leying Zhang, Zhengyang Chen and Yanmin Qian

📦 Other

ICLR 2026 FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates, Jiaqi Li, Yao Qian, Yuxuan Hu, Leying Zhang, et al.

🎖 Honors and Awards

2025 ICASSP 2025 Travel Grant
2024 NeurIPS 2024 Scholar Award
2022 National Scholarship
2022 First place in the CN-Celeb Speaker Recognition Challenge 2022
2021 ISCA and Interspeech Travel Grant
2021 Outstanding Graduate of Shanghai
2021 Outstanding Student Leader of SJTU
2020 Guanghua Scholarship
2019 SJTU Class B Scholarship

📖 Education

2023.09 - Present, PhD in Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai
2021.09 - 2023.06, Master's in Electronic Information, Shanghai Jiao Tong University, Shanghai
2021.09 - 2022.02, Exchange Student in Data Science and Image Processing, Télécom Paris (Institut polytechnique de Paris), France
2017.09 - 2021.06, Bachelor of Information Engineering and French (double degree), Shanghai Jiao Tong University, Shanghai

💻 Industry Experience

2025.10 - 2026.03, Research Intern, Meta Superintelligence Labs, New York, USA
2024.10 - 2025.08, Research Intern, Microsoft Core AI (Remote)
2023.03 - 2024.03, Research Intern, Microsoft Azure Research (Remote)
2022.11 - 2023.03, Research Intern, Microsoft Research Asia, Beijing, China

📚 Teaching

Spring 2025, Teaching Assistant - Intelligent Speech Technology
Fall 2022, Teaching Assistant - Machine Learning