AUTOMATIC1111 / stable-diffusion-webui
Stable Diffusion web UI: a web interface for Stable Diffusion, implemented using the Gradio library.
Features
Detailed feature showcase with images:
- Original txt2img and img2img modes
- One click install and run script (but you still must install python and git)
- Outpainting
- Inpainting
- Color Sketch
- Prompt Matrix
- Stable Diffusion Upscale
- Attention, specify parts of text that the model should pay more attention to:
- a man in a ((tuxedo)) - will pay more attention to tuxedo
- a man in a (tuxedo:1.21) - alternative syntax
- select text and press Ctrl+Up or Ctrl+Down (or Command+Up or Command+Down if you're on macOS) to automatically adjust attention to selected text (code contributed by anonymous user)
- Loopback, run img2img processing multiple times
- X/Y/Z plot, a way to draw a 3-dimensional plot of images with different parameters
- Textual Inversion:
- have as many embeddings as you want and use any names you like for them
- use multiple embeddings with different numbers of vectors per token
- works with half precision floating point numbers
- train embeddings on 8GB (also reports of 6GB working)
- Extras tab with:
- GFPGAN, neural network that fixes faces
- CodeFormer, face restoration tool as an alternative to GFPGAN
- RealESRGAN, neural network upscaler
- ESRGAN, neural network upscaler with a lot of third party models
- SwinIR and Swin2SR (see here), neural network upscalers
- LDSR, Latent diffusion super resolution upscaling
- Resizing aspect ratio options
- Sampling method selection
- Adjust sampler eta values (noise multiplier)
- More advanced noise setting options
- Interrupt processing at any time
- 4GB video card support (also reports of 2GB working)
- Correct seeds for batches
- Live prompt token length validation
- Generation parameters:
- parameters you used to generate images are saved with that image in PNG chunks for PNG, in EXIF for JPEG
- can drag the image to PNG info tab to restore generation parameters and automatically copy them into UI
- can be disabled in settings
- drag and drop an image/text-parameters to promptbox
- Read Generation Parameters Button, loads parameters in promptbox to UI
- Settings page
- Running arbitrary python code from UI (must run with --allow-code to enable)
- Mouseover hints for most UI elements
- Possible to change defaults/min/max/step values for UI elements via text config
- Tiling support, a checkbox to create images that can be tiled like textures
- Progress bar and live image generation preview
- Can use a separate neural network to produce previews with almost no VRAM or compute requirements
- Negative prompt, an extra text field that allows you to list what you don't want to see in the generated image
- Styles, a way to save part of prompt and easily apply them via dropdown later
- Variations, a way to generate same image but with tiny differences
- Seed resizing, a way to generate same image but at slightly different resolution
- CLIP interrogator, a button that tries to guess prompt from an image
- Prompt Editing, a way to change prompt mid-generation, say to start making a watermelon and switch to anime girl midway
- Batch Processing, process a group of files using img2img
- Img2img Alternative, reverse Euler method of cross attention control
- Highres Fix, a convenience option to produce high resolution pictures in one click without usual distortions
- Reloading checkpoints on the fly
- Checkpoint Merger, a tab that allows you to merge up to 3 checkpoints into one
- Custom scripts with many extensions from community
- Composable-Diffusion, a way to use multiple prompts at once:
- separate prompts using uppercase AND
- also supports weights for prompts: a cat :1.2 AND a dog AND a penguin :2.2
- No token limit for prompts (original stable diffusion lets you use up to 75 tokens)
- DeepDanbooru integration, creates danbooru style tags for anime prompts
- xformers, major speed increase for select cards (add --xformers to commandline args)
- via extension:
- History tab: view, direct and delete images conveniently within the UI
- Generate forever option
- Training tab: hypernetworks and embeddings options
- Preprocessing images: cropping, mirroring, autotagging using BLIP or deepdanbooru (for anime)
- Clip skip
- Hypernetworks
- Loras (same as Hypernetworks but prettier)
- A separate UI where you can choose, with preview, which embeddings, hypernetworks or Loras to add to your prompt
- Can select to load a different VAE from settings screen
- Estimated completion time in progress bar
- API
- Support for dedicated inpainting model by RunwayML
- via extension: Aesthetic Gradients, a way to generate images with a specific aesthetic by using CLIP image embeds
- Stable Diffusion 2.0 support
- Alt-Diffusion support
- Load checkpoints in safetensors format
- Eased resolution restriction: generated image’s dimensions must be a multiple of 8 rather than 64
- Reorder elements in the UI from settings screen
- Segmind Stable Diffusion support
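The attention syntax above (nested parentheses multiply emphasis, `(text:weight)` sets it explicitly) can be sketched with a toy parser. This is a simplified illustration, not the webui's actual `parse_prompt_attention` implementation; the 1.1-per-parenthesis multiplier matches the behavior described above.

```python
def parse_attention(prompt, base=1.1):
    """Toy sketch of A1111-style attention parsing.
    Returns a list of (text, weight) pairs: each enclosing pair of
    parentheses multiplies the weight by `base`; `(text:1.21)` sets
    the innermost multiplier explicitly."""
    out, stack, buf, i = [], [], "", 0

    def flush():
        nonlocal buf
        if buf.strip():
            w = 1.0
            for m in stack:
                w *= m
            out.append((buf.strip(), round(w, 4)))
        buf = ""

    while i < len(prompt):
        c = prompt[i]
        if c == "(":
            flush()
            stack.append(base)          # one more level of emphasis
        elif c == ":" and stack:
            j = prompt.index(")", i)
            stack[-1] = float(prompt[i + 1:j])  # explicit weight
            i = j - 1                   # next iteration handles ")"
        elif c == ")":
            flush()
            if stack:
                stack.pop()
        else:
            buf += c
        i += 1
    flush()
    return out
```

Both example prompts from the list resolve to the same emphasis: `((tuxedo))` gives 1.1 * 1.1 = 1.21, the same as the explicit `(tuxedo:1.21)`.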
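The generation parameters saved into PNG outputs live in the image's text chunks (the webui stores them under a `parameters` keyword). A stdlib-only sketch of reading such a chunk, using a minimal hand-built PNG for demonstration (the demo blob omits pixel data, so it illustrates chunk parsing only):

```python
import struct
import zlib

def png_chunk(ctype, data):
    """Assemble one PNG chunk: 4-byte length, type, data, CRC."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def read_text_chunks(png_bytes):
    """Return {keyword: value} for every tEXt chunk in a PNG byte string."""
    assert png_bytes[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    pos, out = 8, {}
    while pos < len(png_bytes):
        length = struct.unpack(">I", png_bytes[pos:pos + 4])[0]
        ctype = png_bytes[pos + 4:pos + 8]
        if ctype == b"tEXt":
            # tEXt payload is keyword, NUL separator, then the text value.
            key, _, val = png_bytes[pos + 8:pos + 8 + length].partition(b"\x00")
            out[key.decode("latin-1")] = val.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return out

# Minimal demo: signature + 1x1 grayscale IHDR + parameters tEXt + IEND.
params = "a man in a (tuxedo:1.21)\nSteps: 20, Sampler: Euler a, Seed: 42"
demo = (b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
        + png_chunk(b"tEXt", b"parameters\x00" + params.encode("latin-1"))
        + png_chunk(b"IEND", b""))
```

This is the same data the PNG info tab reads back when you drag an image onto it.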
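The Checkpoint Merger's basic mode is a weighted sum, out = A * (1 - M) + B * M, applied to every weight in the two checkpoints. A toy sketch of that interpolation, with flat Python lists standing in for the real weight tensors:

```python
def weighted_sum_merge(ckpt_a, ckpt_b, m):
    """Toy weighted-sum merge of two checkpoints:
    out = A * (1 - m) + B * m, applied per named weight.
    Real checkpoints hold tensors; flat lists stand in here."""
    return {
        name: [(1 - m) * a + m * b for a, b in zip(ckpt_a[name], ckpt_b[name])]
        for name in ckpt_a
    }
```

At m = 0 the result is checkpoint A unchanged; at m = 1 it is checkpoint B; values in between blend the two.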
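The Composable-Diffusion syntax above splits a prompt on uppercase AND, with an optional trailing `:number` weight per subprompt. A toy splitter illustrating the syntax (not the webui's actual parser; per the feature list, the weights then scale each subprompt's contribution during sampling):

```python
import re

def split_and_prompts(prompt):
    """Toy sketch: split an AND-combined prompt into (subprompt, weight)
    pairs. A trailing `:number` sets the weight; the default is 1.0."""
    pairs = []
    for part in prompt.split(" AND "):  # uppercase AND separates subprompts
        m = re.search(r":\s*([\d.]+)\s*$", part)
        if m:
            pairs.append((part[:m.start()].strip(), float(m.group(1))))
        else:
            pairs.append((part.strip(), 1.0))
    return pairs
```

The example from the list, `a cat :1.2 AND a dog AND a penguin :2.2`, yields three subprompts with weights 1.2, 1.0, and 2.2.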
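When started with the `--api` flag, the webui exposes HTTP endpoints such as `/sdapi/v1/txt2img`. A sketch of building a request (the field names here are assumed from the API's txt2img schema; the URL is a hypothetical local instance, and actually sending requires a running server):

```python
import json
import urllib.request

# Hypothetical local instance; start the webui with --api to enable these routes.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

payload = {
    "prompt": "a man in a (tuxedo:1.21)",
    "negative_prompt": "blurry",
    "steps": 20,
    "width": 512,
    "height": 512,
    "seed": 42,
}
body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    URL, data=body, headers={"Content-Type": "application/json"})
# With a server running: urllib.request.urlopen(req) returns a JSON
# response with base64-encoded images under the "images" key.
```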
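The safetensors checkpoint format mentioned above is simple enough to inspect with the standard library: an 8-byte little-endian header size, then that many bytes of JSON describing each tensor, then the raw tensor data. A sketch, building a minimal blob in memory for demonstration:

```python
import json
import struct

def read_safetensors_header(blob):
    """Parse the JSON header of a .safetensors blob (stdlib only).
    Layout: 8-byte little-endian header size, then that many bytes of
    JSON mapping tensor names to dtype/shape/data_offsets."""
    (size,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + size].decode("utf-8"))

# Minimal in-memory blob: one fp32 tensor "w" of shape [2] (8 bytes of data).
header = json.dumps(
    {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
).encode("utf-8")
blob = struct.pack("<Q", len(header)) + header + struct.pack("<2f", 1.0, 2.0)
```

Unlike pickle-based `.ckpt` files, reading this header executes no code, which is the format's main safety advantage.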
Installation and Running
Make sure the required dependencies are met and follow the instructions available for:
- NVidia (recommended)
- AMD GPUs.
- Intel CPUs, Intel GPUs (both integrated and discrete) (external wiki page)
- Ascend NPUs (external wiki page)
Alternatively, use online services (like Google Colab): List of Online Services