Compressing Image Style Training into a Single Model Forward

Abstract: Diffusion-based style transfer must balance inference efficiency with stylization fidelity. Adapter-based methods are efficient, but they inject style as an external condition and can either weaken reference-specific appearance or copy reference semantics into the generated image. Optimization-based personalization methods such as LoRA internalize style more effectively, but require a separate training process for every new style.

We introduce i2L (image-to-LoRA), a framework that amortizes style LoRA training into a single forward pass. Given one or more reference images, i2L predicts LoRA weights for a text-to-image model, enabling immediate style instantiation without per-style optimization.
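
To make the amortization idea concrete, the relationship between per-style LoRA training and a single-pass predictor can be written roughly as follows. The first line is the standard LoRA parameterization. The symbols f_\phi, x_ref, L_diff, and D_style are notation assumed here for illustration and do not come from the abstract.

```latex
% Standard LoRA: a frozen weight W receives a low-rank update.
W' = W + \Delta W, \qquad \Delta W = B A, \qquad
B \in \mathbb{R}^{d_{\text{out}} \times r},\;
A \in \mathbb{R}^{r \times d_{\text{in}}},\;
r \ll \min(d_{\text{in}}, d_{\text{out}})

% Per-style optimization (assumed notation): one training run per style dataset.
(A^{\ast}, B^{\ast}) = \arg\min_{A, B}\;
\mathcal{L}_{\text{diff}}\bigl(W + B A;\ \mathcal{D}_{\text{style}}\bigr)

% Amortized prediction (assumed notation): one forward pass of a predictor f_\phi.
(A, B) = f_{\phi}(x_{\text{ref}})
```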

The architecture combines an image encoder, learnable LoRA queries, and compressed decoding heads that generate the low-rank adaptation matrices. Training on semantically diverse style pairs encourages the predictor to preserve appearance cues while suppressing reference-content copying.
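
The data flow described above can be illustrated with a toy NumPy sketch. It is not the i2L implementation: the class name ToyLoRAPredictor, all dimensions, the single cross-attention step, and the use of two shared linear heads as a stand-in for the "compressed decoding heads" are assumptions made for illustration. Random arrays stand in for image-encoder features and for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToyLoRAPredictor:
    """Hypothetical image-to-LoRA predictor (illustrative only).

    One learnable query per adapted layer attends over image features;
    two shared linear heads decode each pooled query into a LoRA pair.
    """

    def __init__(self, feat_dim=32, num_layers=4, d_in=64, d_out=64, rank=4):
        self.rank, self.d_in, self.d_out = rank, d_in, d_out
        # Learnable LoRA queries (random here, trained in a real system).
        self.queries = rng.normal(size=(num_layers, feat_dim)) * 0.1
        # Shared decoding heads: an assumed stand-in for the compressed heads.
        self.head_a = rng.normal(size=(feat_dim, rank * d_in)) * 0.02
        self.head_b = rng.normal(size=(feat_dim, d_out * rank)) * 0.02

    def __call__(self, image_feats):
        # image_feats: (num_tokens, feat_dim), e.g. patch features of a reference.
        scores = self.queries @ image_feats.T / np.sqrt(image_feats.shape[1])
        attn = softmax(scores, axis=-1)          # (num_layers, num_tokens)
        pooled = attn @ image_feats              # (num_layers, feat_dim)
        A = (pooled @ self.head_a).reshape(-1, self.rank, self.d_in)
        B = (pooled @ self.head_b).reshape(-1, self.d_out, self.rank)
        return A, B


predictor = ToyLoRAPredictor()
image_feats = rng.normal(size=(16, 32))  # placeholder for encoder output
A, B = predictor(image_feats)            # single forward pass, no optimization
delta_W = B @ A                          # per-layer low-rank weight updates
print(A.shape, B.shape, delta_W.shape)
```

Each predicted update has rank at most `rank`, so it can be stored and applied like a conventionally trained LoRA.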

Experiments on Z-Image, FLUX.2, and Hidream-O1 show that i2L improves style fidelity, prompt alignment, and perceptual quality over existing baselines. Because i2L produces explicit LoRA weights, it also supports asymmetric classifier-free guidance, multi-reference style fusion, and composition with controllable-generation modules.
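
Having explicit LoRA weights makes weight-space operations straightforward, which the following sketch illustrates under stated assumptions. The abstract does not define how fusion or asymmetric guidance is performed. Here fusion is assumed to be a weighted sum of low-rank updates (implemented exactly by concatenating along the rank dimension), and asymmetric classifier-free guidance is assumed to mean that the LoRA is applied to the conditional branch only. The functions fuse_loras and asymmetric_cfg and the linear toy "denoiser" are hypothetical.

```python
import numpy as np


def fuse_loras(loras, weights):
    """Fuse LoRA pairs (A_i, B_i) by rank concatenation.

    The result satisfies B_f @ A_f == sum_i w_i * (B_i @ A_i).
    A_i has shape (r_i, d_in) and B_i has shape (d_out, r_i).
    """
    A_f = np.concatenate([A for A, _ in loras], axis=0)
    B_f = np.concatenate([w * B for (_, B), w in zip(loras, weights)], axis=1)
    return A_f, B_f


def asymmetric_cfg(x, cond, W_base, lora, scale=3.0):
    """Toy guidance step with a single linear layer as the 'denoiser'.

    Assumption: the style LoRA is applied to the conditional branch only,
    while the unconditional branch uses the base weights.
    """
    A, B = lora
    W_styled = W_base + B @ A
    eps_cond = W_styled @ (x + cond)   # conditional branch, with LoRA
    eps_uncond = W_base @ x            # unconditional branch, base weights
    return eps_uncond + scale * (eps_cond - eps_uncond)


rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2
lora_1 = (rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)))
lora_2 = (rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)))

A_f, B_f = fuse_loras([lora_1, lora_2], weights=[0.7, 0.3])
expected = 0.7 * lora_1[1] @ lora_1[0] + 0.3 * lora_2[1] @ lora_2[0]
print(np.allclose(B_f @ A_f, expected))

W_base = rng.normal(size=(d_out, d_in))
x, cond = rng.normal(size=d_in), rng.normal(size=d_in)
guided = asymmetric_cfg(x, cond, W_base, (A_f, B_f), scale=3.0)
```

Rank concatenation keeps the fused result in LoRA form, at the cost of a rank equal to the sum of the input ranks.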
