Alibaba’s Qwen3.5-Omni challenges Google with extended audio processing
Source: Qwen
Alibaba is narrowing the capability gap in multimodal AI with a model that processes more than 10 hours of continuous audio, a substantial engineering feat that addresses a real friction point in voice-heavy applications such as transcription, lecture analysis, and conversational AI. The direct comparison with Google's Gemini 3.1 Pro signals that Chinese AI labs are matching or exceeding Western rivals on specific modalities, which matters because audio processing at scale is becoming table stakes for enterprise AI adoption. Omnimodal models, which handle text, audio, image, and video in a single architecture, are positioned to outperform single-modality specialists, putting pressure on OpenAI and Google to justify their narrower, more specialized model releases.