Cheaper, faster, and culturally aware, Avataar’s video AI is built for India’s scale

India’s output of AI models has lagged behind that of the U.S., Europe, and China. Only a few startups are releasing models, and most of those are large language models or voice models. To encourage more development, the government launched the India AI Mission, a roughly $1.2 billion initiative that, among other things, gives selected startups access to subsidized GPU compute in exchange for releasing their models publicly.

One of the 12 startups selected for the program, Avataar AI, has launched a new video model called Varya that is built to understand local context, such as identifying different festivals, food, and clothing. The Peak XV-backed startup, which focuses on creating video tools for e-commerce, didn’t build Varya from scratch. It started with Wan 2.2, a publicly available video generation model released by Alibaba, and used a technique called distillation, which essentially compresses the model’s capabilities into a leaner, faster version optimized for Avataar’s specific use cases.

The result is a model that runs in four steps rather than Wan 2.2’s 50, producing video more than 10 times faster and at a fraction of the cost. To put that in concrete terms: Using an Nvidia H200 GPU, Varya can generate a five-second 720p clip in 45 seconds, compared with 1,230 seconds for Wan 2.2, a roughly 27-fold difference on those figures. The most striking aspect of Varya may be its price. The company plans to charge ₹0.48 (about $0.005) per second of video on its hosted service, far less than models like Veo, Kling, Luma, and Runway, which typically charge $0.10 or more per second. That’s a roughly 20x price difference.
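For readers who want to check the arithmetic, the short Python sketch below recomputes the ratios from the figures quoted above. The constants are copied from the article, the variable names are illustrative, and the clip costs assume simple per-second billing with no minimum charge, which the company has not confirmed. Nothing here is an independent benchmark.

```python
# Figures as quoted in the article; none of them are independently measured.
WAN_STEPS = 50              # sampling steps for Wan 2.2
VARYA_STEPS = 4             # sampling steps for Varya
WAN_SECONDS = 1230          # five-second 720p clip on an Nvidia H200
VARYA_SECONDS = 45          # same clip, same GPU
VARYA_USD_PER_SEC = 0.005   # about ₹0.48 per second of video
RIVAL_USD_PER_SEC = 0.10    # "typically $0.10 or more", so treat as a floor
CLIP_SECONDS = 5


def ratio(baseline: float, improved: float) -> float:
    """Return how many times larger the baseline is than the improved value."""
    return baseline / improved


step_reduction = ratio(WAN_STEPS, VARYA_STEPS)
wall_clock_speedup = ratio(WAN_SECONDS, VARYA_SECONDS)
price_gap = ratio(RIVAL_USD_PER_SEC, VARYA_USD_PER_SEC)

# Assumption: billing is linear in clip length, with no minimum charge.
varya_clip_cost = CLIP_SECONDS * VARYA_USD_PER_SEC
rival_clip_cost = CLIP_SECONDS * RIVAL_USD_PER_SEC

print(f"Step reduction:     {step_reduction:.1f}x")
print(f"Wall-clock speedup: {wall_clock_speedup:.1f}x")
print(f"Price gap:          {price_gap:.0f}x")
print(f"Five-second clip:   ${varya_clip_cost:.3f} vs ${rival_clip_cost:.2f} or more")
```

On these numbers the step count falls 12.5-fold, while the quoted H200 timings imply a speedup closer to 27-fold. The price gap is a lower bound, because the rival price is quoted as a minimum.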

“India is a video-first market. We see this across every large consumer internet product in India: video wins over text. Current AI video models are too expensive for population-scale use in India. If video AI is going to reach students, teachers, MSMEs, creators, enterprises, and public services, costs have to come down dramatically. Cost is the biggest unlock for AI adoption in India,” Peak XV’s managing director Rajan Anandan told TechCrunch.

Image and video generation models often miss cultural nuances and produce stereotyped or generic outputs, a problem TechCrunch has reported on before. Avataar AI says it has used curated data to train Varya to recognize cultural nuances including food, clothing, architecture, and festivals. Varya will be released as an open-weight model on India’s AIKosh portal, the Indian government’s centralized repository for publicly available AI models and datasets, along with its training data, meaning developers can self-host or modify it for their own needs.

Avataar also plans to make the model available to its enterprise customers and says it is open to partnerships with video tools, including Higgsfield and Adobe Firefly. Anyone can try the model now on the company’s website using text prompts or reference images.

Varya’s launch reflects a fundamental tradeoff in India’s AI ambitions. Industry veterans have noted that India can make its mark in AI by creating applications and a robust developer ecosystem rather than competing on foundation models. And there’s a reason for that pragmatism: Model development has been slower in India than among its global rivals because of a lack of compute and limited availability of quality data. The India AI Mission is part of a broader government push to close that gap. Last year, the government selected 12 startups, Avataar AI among them, to develop AI models and provided them with cost-efficient compute. Earlier this year, IT minister Ashwini Vaishnaw said India aims to attract $200 billion in AI investment by 2028 and more than double its GPU capacity within six months.