Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2 
Published in arXiv preprint arXiv:2601.13802, 2026
Habibi is a unified-dialectal Arabic TTS framework covering 12+ regional dialects. Our unified model matches or surpasses per-dialect specialized models and is highly competitive with ElevenLabs Eleven v3 (alpha).
Recommended citation: Y. Chen, J. Liu, Y. Tu, Z. Niu, Y. Liang, C. Qiang, C. Zhang, K. Yu, X. Chen. (2026). "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis." arXiv preprint arXiv:2601.13802.
Download Paper
Published in arXiv preprint arXiv:2601.18184, 2026
VibeVoice-ASR is a general-purpose speech understanding framework that processes up to 60 minutes of audio in a single pass, unifying ASR, speaker diarization, and timestamping into one end-to-end generation task. It supports over 50 languages and natively handles code-switching.
Recommended citation: Z. Peng, J. Yu, Y. Chang, Z. Wang, L. Dong, Y. Hao, Y. Tu, C. Yang, W. Wang, et al. (2026). "VIBEVOICE-ASR Technical Report." arXiv preprint arXiv:2601.18184.
Download Paper