ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment

Abstract: Recent advances in multimodal large language models (MLLMs) and diffusion models (DMs) have opened new possibilities for AI-generated content. Yet, personalized cover image generation remains underexplored, despite its critical role in boosting user engagement on digital platforms.

We propose ICG, a novel framework that integrates MLLM-based prompting with personalized preference alignment to generate high-quality, contextually relevant covers. ICG extracts semantic features from item titles and reference images via meta tokens, refines them with user embeddings, and injects the resulting personalized context into the diffusion model.
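
The data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how meta tokens read the inputs or how user embeddings refine them, so the scaled dot-product pooling, the additive user conditioning, and every function name below (`attend`, `personalize`, `build_context`) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, features):
    """Each query (a meta token) pools the feature vectors with
    scaled dot-product attention. Assumed mechanism, not from the paper."""
    dim = len(features[0])
    pooled = []
    for q in queries:
        weights = softmax([dot(q, f) / math.sqrt(dim) for f in features])
        pooled.append(
            [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
        )
    return pooled


def personalize(meta_states, user_embedding):
    """Hypothetical refinement step: add the user embedding to every
    meta-token state."""
    return [[h + u for h, u in zip(state, user_embedding)] for state in meta_states]


def build_context(meta_tokens, title_feats, image_feats, user_embedding):
    """Return the personalized context that would condition the diffusion model."""
    features = title_feats + image_feats  # concatenate the two token sequences
    meta_states = attend(meta_tokens, features)
    return personalize(meta_states, user_embedding)


# Toy 2-dimensional example: two meta tokens, one title token, two image tokens.
context = build_context(
    meta_tokens=[[1.0, 0.0], [0.0, 1.0]],
    title_feats=[[0.5, 0.5]],
    image_feats=[[1.0, 0.0], [0.0, 1.0]],
    user_embedding=[0.1, -0.1],
)
```

In a real system the resulting `context` would be passed to the diffusion model as conditioning, for example through its cross-attention layers, and the meta tokens and adapter weights would be learned end to end rather than fixed as they are here.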

To address the lack of labeled supervision, we adopt a multi-reward learning strategy that combines public aesthetic and relevance rewards with a personalized preference model trained from user behavior. Unlike prior pipelines relying on handcrafted prompts and disjointed modules, ICG employs an adapter to bridge MLLMs and diffusion models for end-to-end training.
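
One simple way to picture the multi-reward strategy is a weighted sum of the three signals. The sketch below assumes that form; the abstract does not state how the rewards are combined, and the weights, the scorer signatures, and the names `RewardWeights`, `combined_reward`, and `reward_loss` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class RewardWeights:
    """Hypothetical mixing weights; the paper's actual values are not given here."""
    aesthetic: float = 1.0
    relevance: float = 1.0
    preference: float = 1.0


def combined_reward(
    image: Any,
    title: str,
    user_id: str,
    aesthetic_fn: Callable[[Any], float],
    relevance_fn: Callable[[Any, str], float],
    preference_fn: Callable[[Any, str], float],
    weights: RewardWeights,
) -> float:
    """Weighted sum of public aesthetic and relevance rewards and a
    personalized preference reward. No term needs a ground-truth cover."""
    return (
        weights.aesthetic * aesthetic_fn(image)
        + weights.relevance * relevance_fn(image, title)
        + weights.preference * preference_fn(image, user_id)
    )


def reward_loss(rewards: Sequence[float]) -> float:
    """Negative mean reward: minimizing it raises the average reward
    of the generated covers in a batch."""
    return -sum(rewards) / len(rewards)


# Stub scorers standing in for real reward models.
score = combined_reward(
    image="generated-cover",
    title="example title",
    user_id="user-1",
    aesthetic_fn=lambda img: 0.5,
    relevance_fn=lambda img, title: 0.25,
    preference_fn=lambda img, user: 1.0,
    weights=RewardWeights(aesthetic=1.0, relevance=2.0, preference=0.5),
)
```

Because each scorer takes only the generated image plus the title or user identifier, the objective can be evaluated without labeled target covers, which matches the label-free setting described above.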

Experiments demonstrate that ICG significantly improves image quality, semantic fidelity, and personalization, leading to stronger user appeal and higher offline recommendation accuracy in downstream tasks. As a plug-and-play adapter, ICG is compatible with common checkpoints and requires no ground-truth labels during optimization.
