AUTOMATIC1111 / stable-diffusion-webui
Stable Diffusion web UI: a web interface for Stable Diffusion, implemented using the Gradio library.
Features
Detailed feature showcase with images:
- Original txt2img and img2img modes
- One click install and run script (but you still must install python and git)
- Outpainting
- Inpainting
- Color Sketch
- Prompt Matrix
- Stable Diffusion Upscale
- Attention, specify parts of text that the model should pay more attention to:
- a man in a ((tuxedo)) - will pay more attention to tuxedo
- a man in a (tuxedo:1.21) - alternative syntax
- select text and press Ctrl+Up or Ctrl+Down (or Command+Up or Command+Down if you're on macOS) to automatically adjust attention to selected text (code contributed by anonymous user)
- Loopback, run img2img processing multiple times
- X/Y/Z plot, a way to draw a 3-dimensional plot of images with different parameters
- Textual Inversion:
- have as many embeddings as you want and use any names you like for them
- use multiple embeddings with different numbers of vectors per token
- works with half precision floating point numbers
- train embeddings on 8GB (also reports of 6GB working)
- Extras tab with:
- GFPGAN, neural network that fixes faces
- CodeFormer, face restoration tool as an alternative to GFPGAN
- RealESRGAN, neural network upscaler
- ESRGAN, neural network upscaler with a lot of third party models
- SwinIR and Swin2SR (see here), neural network upscalers
- LDSR, Latent diffusion super resolution upscaling
- Resizing aspect ratio options
- Sampling method selection
- Adjust sampler eta values (noise multiplier)
- More advanced noise setting options
- Interrupt processing at any time
- 4GB video card support (also reports of 2GB working)
- Correct seeds for batches
- Live prompt token length validation
- Generation parameters:
- parameters you used to generate images are saved with that image in PNG chunks for PNG, in EXIF for JPEG
- can drag the image to PNG info tab to restore generation parameters and automatically copy them into UI
- can be disabled in settings
- drag and drop an image/text-parameters to promptbox
- Read Generation Parameters Button, loads parameters in promptbox to UI
- Settings page
- Running arbitrary python code from UI (must run with --allow-code to enable)
- Mouseover hints for most UI elements
- Possible to change defaults/min/max/step values for UI elements via text config
- Tiling support, a checkbox to create images that can be tiled like textures
- Progress bar and live image generation preview
- Can use a separate neural network to produce previews with almost no VRAM or compute requirements
- Negative prompt, an extra text field that allows you to list what you don't want to see in the generated image
- Styles, a way to save part of prompt and easily apply them via dropdown later
- Variations, a way to generate same image but with tiny differences
- Seed resizing, a way to generate same image but at slightly different resolution
- CLIP interrogator, a button that tries to guess prompt from an image
- Prompt Editing, a way to change prompt mid-generation, say to start making a watermelon and switch to anime girl midway
- Batch Processing, process a group of files using img2img
- Img2img Alternative, reverse Euler method of cross attention control
- Highres Fix, a convenience option to produce high resolution pictures in one click without usual distortions
- Reloading checkpoints on the fly
- Checkpoint Merger, a tab that allows you to merge up to 3 checkpoints into one
- Custom scripts with many extensions from community
- Composable-Diffusion, a way to use multiple prompts at once:
- separate prompts using uppercase AND
- also supports weights for prompts: a cat :1.2 AND a dog AND a penguin :2.2
- No token limit for prompts (original stable diffusion lets you use up to 75 tokens)
- DeepDanbooru integration, creates danbooru style tags for anime prompts
- xformers, major speed increase for select cards (add --xformers to commandline args)
- via extension:
- History tab: view, direct and delete images conveniently within the UI
- Generate forever option
- Training tab: hypernetworks and embeddings options
- Preprocessing images: cropping, mirroring, autotagging using BLIP or deepdanbooru (for anime)
- Clip skip
- Hypernetworks
- Loras (same as Hypernetworks but prettier)
- A separate UI where you can choose, with preview, which embeddings, hypernetworks or Loras to add to your prompt
- Can select to load a different VAE from settings screen
- Estimated completion time in progress bar
- API
- Support for dedicated inpainting model by RunwayML
- via extension: Aesthetic Gradients, a way to generate images with a specific aesthetic by using CLIP image embeds
- Stable Diffusion 2.0 support
- Alt-Diffusion support
- Load checkpoints in safetensors format
- Eased resolution restriction: generated image’s dimensions must be a multiple of 8 rather than 64
- Reorder elements in the UI from settings screen
- Segmind Stable Diffusion support
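The attention syntax above (nested parentheses multiply emphasis, `(text:weight)` sets it explicitly) can be sketched with a toy parser. This is a simplified illustration, not the webui's actual `parse_prompt_attention` implementation; the 1.1-per-parenthesis multiplier matches the behavior described above.

```python
def parse_attention(prompt, base=1.1):
    """Toy sketch of A1111-style attention parsing.
    Returns a list of (text, weight) pairs: each enclosing pair of
    parentheses multiplies the weight by `base`; `(text:1.21)` sets
    the innermost multiplier explicitly."""
    out, stack, buf, i = [], [], "", 0

    def flush():
        nonlocal buf
        if buf.strip():
            w = 1.0
            for m in stack:
                w *= m
            out.append((buf.strip(), round(w, 4)))
        buf = ""

    while i < len(prompt):
        c = prompt[i]
        if c == "(":
            flush()
            stack.append(base)          # one more level of emphasis
        elif c == ":" and stack:
            j = prompt.index(")", i)
            stack[-1] = float(prompt[i + 1:j])  # explicit weight
            i = j - 1                   # next iteration handles ")"
        elif c == ")":
            flush()
            if stack:
                stack.pop()
        else:
            buf += c
        i += 1
    flush()
    return out
```

Both example prompts from the list resolve to the same emphasis: `((tuxedo))` gives 1.1 * 1.1 = 1.21, the same as the explicit `(tuxedo:1.21)`.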
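The generation parameters saved into PNG outputs live in the image's text chunks (the webui stores them under a `parameters` keyword). A stdlib-only sketch of reading such a chunk, using a minimal hand-built PNG for demonstration (the demo blob omits pixel data, so it illustrates chunk parsing only):

```python
import struct
import zlib

def png_chunk(ctype, data):
    """Assemble one PNG chunk: 4-byte length, type, data, CRC."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def read_text_chunks(png_bytes):
    """Return {keyword: value} for every tEXt chunk in a PNG byte string."""
    assert png_bytes[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    pos, out = 8, {}
    while pos < len(png_bytes):
        length = struct.unpack(">I", png_bytes[pos:pos + 4])[0]
        ctype = png_bytes[pos + 4:pos + 8]
        if ctype == b"tEXt":
            # tEXt payload is keyword, NUL separator, then the text value.
            key, _, val = png_bytes[pos + 8:pos + 8 + length].partition(b"\x00")
            out[key.decode("latin-1")] = val.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return out

# Minimal demo: signature + 1x1 grayscale IHDR + parameters tEXt + IEND.
params = "a man in a (tuxedo:1.21)\nSteps: 20, Sampler: Euler a, Seed: 42"
demo = (b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
        + png_chunk(b"tEXt", b"parameters\x00" + params.encode("latin-1"))
        + png_chunk(b"IEND", b""))
```

This is the same data the PNG info tab reads back when you drag an image onto it.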
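The Checkpoint Merger's basic mode is a weighted sum, out = A * (1 - M) + B * M, applied to every weight in the two checkpoints. A toy sketch of that interpolation, with flat Python lists standing in for the real weight tensors:

```python
def weighted_sum_merge(ckpt_a, ckpt_b, m):
    """Toy weighted-sum merge of two checkpoints:
    out = A * (1 - m) + B * m, applied per named weight.
    Real checkpoints hold tensors; flat lists stand in here."""
    return {
        name: [(1 - m) * a + m * b for a, b in zip(ckpt_a[name], ckpt_b[name])]
        for name in ckpt_a
    }
```

At m = 0 the result is checkpoint A unchanged; at m = 1 it is checkpoint B; values in between blend the two.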
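The Composable-Diffusion syntax above splits a prompt on uppercase AND, with an optional trailing `:number` weight per subprompt. A toy splitter illustrating the syntax (not the webui's actual parser; per the feature list, the weights then scale each subprompt's contribution during sampling):

```python
import re

def split_and_prompts(prompt):
    """Toy sketch: split an AND-combined prompt into (subprompt, weight)
    pairs. A trailing `:number` sets the weight; the default is 1.0."""
    pairs = []
    for part in prompt.split(" AND "):  # uppercase AND separates subprompts
        m = re.search(r":\s*([\d.]+)\s*$", part)
        if m:
            pairs.append((part[:m.start()].strip(), float(m.group(1))))
        else:
            pairs.append((part.strip(), 1.0))
    return pairs
```

The example from the list, `a cat :1.2 AND a dog AND a penguin :2.2`, yields three subprompts with weights 1.2, 1.0, and 2.2.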
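When started with the `--api` flag, the webui exposes HTTP endpoints such as `/sdapi/v1/txt2img`. A sketch of building a request (the field names here are assumed from the API's txt2img schema; the URL is a hypothetical local instance, and actually sending requires a running server):

```python
import json
import urllib.request

# Hypothetical local instance; start the webui with --api to enable these routes.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

payload = {
    "prompt": "a man in a (tuxedo:1.21)",
    "negative_prompt": "blurry",
    "steps": 20,
    "width": 512,
    "height": 512,
    "seed": 42,
}
body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    URL, data=body, headers={"Content-Type": "application/json"})
# With a server running: urllib.request.urlopen(req) returns a JSON
# response with base64-encoded images under the "images" key.
```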
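The safetensors checkpoint format mentioned above is simple enough to inspect with the standard library: an 8-byte little-endian header size, then that many bytes of JSON describing each tensor, then the raw tensor data. A sketch, building a minimal blob in memory for demonstration:

```python
import json
import struct

def read_safetensors_header(blob):
    """Parse the JSON header of a .safetensors blob (stdlib only).
    Layout: 8-byte little-endian header size, then that many bytes of
    JSON mapping tensor names to dtype/shape/data_offsets."""
    (size,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + size].decode("utf-8"))

# Minimal in-memory blob: one fp32 tensor "w" of shape [2] (8 bytes of data).
header = json.dumps(
    {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
).encode("utf-8")
blob = struct.pack("<Q", len(header)) + header + struct.pack("<2f", 1.0, 2.0)
```

Unlike pickle-based `.ckpt` files, reading this header executes no code, which is the format's main safety advantage.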
Installation and Running
Make sure the required dependencies are met and follow the instructions available for:
- NVidia (recommended)
- AMD GPUs.
- Intel CPUs, Intel GPUs (both integrated and discrete) (external wiki page)
- Ascend NPUs (external wiki page)
Alternatively, use online services (like Google Colab): List of Online Services