About Me
I am a third-year PhD student in Computer Science and Engineering at Shanghai Jiao Tong University, supervised by Prof. Yanmin Qian. My research focuses on audio and speech technologies, in particular zero-shot speech and dialogue generation and speaker verification.
Research Interests: Text-to-Speech • Multi-modality • Audio Generation • Speaker Verification
Education
PhD, Computer Science and Engineering
Sep 2023 - Present • Shanghai Jiao Tong University
Supervisor: Prof. Yanmin Qian
Master, Electronic Information
Sep 2021 - Jun 2023 • Shanghai Jiao Tong University
Supervisor: Prof. Yanmin Qian
Exchange Student, Data Science and Image Processing
Sep 2021 - Feb 2022 • Télécom Paris (Institut polytechnique de Paris)
Bachelor of Information Engineering and French (Double Degree)
Sep 2017 - Jun 2021 • Shanghai Jiao Tong University
Selected Publications
"CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching"
NeurIPS, Dec. 2025 • PDF
"Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction"
ICASSP, April 2025 • PDF
"CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations"
NeurIPS, Dec. 2024 • PDF
"DDTSE: Discriminative Diffusion Model for Target Speech Extraction"
SLT, Dec. 2024 • PDF
"Adaptive Large Margin Fine-tuning for Speaker Verification"
ICASSP, June 2023
"Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification"
InterSpeech, Sep. 2022
"Knowledge Distillation from Multi-Modality to Single-Modality for Person Verification"
InterSpeech, Sep. 2021
See the CV for the complete list of publications, including collaborative works.
Industry Experience
Research Intern - Meta Superintelligence Labs
Oct 2025 - Present • Location: New York, USA
Supervisor: Bowen Shi
Research Intern - Microsoft Core AI
Oct 2024 - Aug 2025 • Location: Remote
Supervisor: Yao Qian
Project: Text-to-Dialogue Generation - Designed and implemented a fully non-autoregressive dialogue generation framework that supports zero-shot, multi-speaker, multi-turn generation with fine-grained temporal control. The system has been incorporated into the Azure TTS product.
Research Intern - Microsoft Azure Research
Mar 2023 - Mar 2024 • Location: Remote
Supervisor: Yao Qian
Projects:
- Target speech extraction: Investigated a diffusion-based model for target speech extraction. Proposed an efficient approach that combines diffusion and discriminative methods to handle both multi-speaker and single-speaker scenarios in noisy and clean conditions.
- Text-to-Dialogue Generation: Investigated Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-turn dialogue speech generation.
Research Intern - Microsoft Research Asia
Nov 2022 - Mar 2023 • Location: Beijing, China
Supervisor: Xu Tan
Projects:
- Audio generation: Implemented a vector-quantized diffusion model with classifier-free guidance, achieving a 10% improvement over the baseline. Investigated the effects of latent diffusion models by fine-tuning Stable Diffusion.
- Text-to-speech: Applied a vector-quantized diffusion model to text-to-speech on a large-scale dataset with different neural audio codecs. Generated high-quality speech and achieved improvements in zero-shot text-to-speech.
Teaching Experience
Teaching Assistant - Intelligent Speech Technology
Spring 2025 • Shanghai Jiao Tong University
Teaching Assistant - Machine Learning
Fall 2022 • Shanghai Jiao Tong University
Honors and Awards
Skills
Languages
- Chinese - Native
- English - Professional
- French - Professional
Interests
- Badminton
- Yoga