Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

Less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

Less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

Less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

Less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

Less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Publications

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

Published in arXiv preprint arXiv:2601.13802, 2026

Habibi is a unified-dialectal Arabic TTS framework covering 12+ regional dialects. Our unified model matches or surpasses per-dialect specialized models and is highly competitive with ElevenLabs Eleven v3 (alpha).

Recommended citation: Y. Chen, J. Liu, Y. Tu, Z. Niu, Y. Liang, C. Qiang, C. Zhang, K. Yu, X. Chen. (2026). "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis." arXiv preprint arXiv:2601.13802.
Download Paper

VIBEVOICE-ASR Technical Report

Published in arXiv preprint arXiv:2601.18184, 2026

VibeVoice-ASR is a general-purpose speech understanding framework that supports single-pass processing for up to 60 minutes of audio, unifying ASR, Speaker Diarization, and Timestamping into a single end-to-end generation task. It supports over 50 languages and natively handles code-switching.

Recommended citation: Z. Peng, J. Yu, Y. Chang, Z. Wang, L. Dong, Y. Hao, Y. Tu, C. Yang, W. Wang, et al. (2026). "VIBEVOICE-ASR Technical Report." arXiv preprint arXiv:2601.18184.
Download Paper

Talks

Teaching