Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis
Published in arXiv preprint arXiv:2601.13802, 2026
Arabic spans over 30 spoken varieties, yet no open-source text-to-speech system unifies them. We present Habibi, a unified-dialectal Arabic TTS framework that addresses key barriers including substantial cross-dialect lexical and phonological divergence, scarce synthesis-grade data, and the absence of a standardized multi-dialect evaluation benchmark. Both automatic metrics and human evaluations confirm that Habibi is highly competitive with ElevenLabs’ Eleven v3 (alpha) in intelligibility, speaker similarity, and naturalness.
Recommended citation: Y. Chen, J. Liu, Y. Tu, Z. Niu, Y. Liang, C. Qiang, C. Zhang, K. Yu, X. Chen. (2026). "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis." arXiv preprint arXiv:2601.13802.
