RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment

Abstract: Open Arabic large language models split into two classes: sub-1B multilingual models that treat Arabic as an afterthought (Qwen2.5-0.5B, Falcon-H1-0.5B), and 7B-70B Arabic-specialized models that require a server to run (Jais, AceGPT, ALLaM, SILMA). The one published attempt at a sub-2B Arabic-specialized model, Kuwain-1.5B, never released its weights.

We present RightNow-Arabic-0.5B-Turbo, a 518M-parameter Arabic-specialized decoder LLM built on Qwen2.5-0.5B. The pipeline adds 27,032 Arabic tokens via mean-subtoken initialization, continues pretraining on 504M Arabic tokens on 8xH100 with FSDP, FlashAttention varlen packing, and Liger fused kernels, then applies supervised fine-tuning on 129,116 Arabic instruction pairs with response-only loss masking, direct preference optimization on 6,750 Arabic preference pairs, and weight soup merging across three checkpoints.
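
As a rough illustration of two steps named above, the sketch below shows mean-subtoken initialization (each added token's embedding starts as the average of the embeddings of the pieces the original tokenizer would split it into) and a uniform weight soup (an element-wise average of checkpoint parameters). It is a toy in plain Python, not the released pipeline: the function names, the character-level tokenizer, and the two-dimensional embeddings are hypothetical, and the choice of a uniform rather than weighted soup is an assumption, since the abstract does not say how the three checkpoints are combined.

```python
from typing import Callable, Dict, List

Vector = List[float]


def mean_subtoken_init(
    new_tokens: List[str],
    old_tokenize: Callable[[str], List[int]],
    old_embeddings: List[Vector],
) -> Dict[str, Vector]:
    """Give each new token the mean embedding of its old-tokenizer pieces."""
    dim = len(old_embeddings[0])
    rows: Dict[str, Vector] = {}
    for token in new_tokens:
        ids = old_tokenize(token)
        if not ids:
            raise ValueError(f"old tokenizer produced no pieces for {token!r}")
        summed = [0.0] * dim
        for i in ids:
            for d in range(dim):
                summed[d] += old_embeddings[i][d]
        rows[token] = [s / len(ids) for s in summed]
    return rows


def uniform_soup(checkpoints: List[Dict[str, Vector]]) -> Dict[str, Vector]:
    """Average parameters element-wise across checkpoints (assumed uniform)."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    merged: Dict[str, Vector] = {}
    for name, first in checkpoints[0].items():
        merged[name] = [
            sum(ckpt[name][k] for ckpt in checkpoints) / n
            for k in range(len(first))
        ]
    return merged


# Toy setup (hypothetical): a character-level "old" tokenizer over four letters.
toy_vocab = {"ك": 0, "ت": 1, "ا": 2, "ب": 3}
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 3.0]]


def toy_tokenize(text: str) -> List[int]:
    return [toy_vocab[ch] for ch in text]


# The new whole-word token is the four letters joined in vocabulary order.
new_word = "".join(toy_vocab)
new_rows = mean_subtoken_init([new_word], toy_tokenize, toy_embeddings)

# Three toy "checkpoints", each holding a single parameter vector.
souped = uniform_soup([{"w": [1.0, 2.0]}, {"w": [2.0, 4.0]}, {"w": [3.0, 6.0]}])

print(new_rows[new_word])
print(souped["w"])
```

In a real model the same averaging would apply to the rows appended to the embedding matrix (and to the output head if it is untied), and the soup would run over full state dictionaries of identically shaped tensors.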

On three lm-evaluation-harness Arabic benchmarks (COPA-ar, Arabic HellaSwag, ArabicMMLU) the merged model reaches 35.9% mean accuracy, beats every same-class open model, ties Falcon-H1-1.5B on COPA-ar (58.4%) at one-third the size, and recovers 67% of SILMA-9B’s mean at 1/18 the parameters. The edge build quantizes to 398 MB (q4_k_m) and delivers 635 tokens/s at batch size 1 on a single H100 via this http URL. All code (5,555 lines across 25 scripts), weights (bf16, int8, and four GGUF quantizations), and benchmark scripts are released at this https URL.
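
The headline figures above can be cross-checked with a little arithmetic. The snippet below is a back-of-envelope sketch that uses only numbers quoted in the abstract. It assumes that "MB" means 10^6 bytes and that the 67% recovery figure is a simple ratio of mean accuracies, so the derived values are approximations and not reported results.

```python
# Figures quoted in the abstract.
PARAMS = 518e6            # model parameters
Q4_SIZE_BYTES = 398e6     # q4_k_m file size; assumes MB = 10**6 bytes
MEAN_ACC = 35.9           # mean accuracy (%) over the three benchmarks
RECOVERY = 0.67           # stated fraction of SILMA-9B's mean
TOKENS_PER_S = 635        # batch size 1 on a single H100

# Effective storage cost per parameter in the quantized file.
bits_per_weight = Q4_SIZE_BYTES * 8 / PARAMS

# Reference mean implied by the recovery ratio (derived, not reported).
implied_reference_mean = MEAN_ACC / RECOVERY

# Per-token latency implied by the throughput figure.
ms_per_token = 1000 / TOKENS_PER_S

# Parameter count implied by "1/18 the parameters".
implied_reference_params = PARAMS * 18

print(f"{bits_per_weight:.2f} bits/weight")
print(f"{implied_reference_mean:.1f}% implied reference mean")
print(f"{ms_per_token:.2f} ms/token")
print(f"{implied_reference_params / 1e9:.2f}B implied reference parameters")
```

The effective bits per weight comes out above the nominal 4 bits, which is typical of mixed-precision GGUF formats because some tensors are stored at higher precision.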
