perfectviewer_perfectviewer官方

2024-02-29 17:39:47

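Stable Diffusion is a text-to-image model based on Latent Diffusion Models (LDMs). It was developed jointly by CompVis, Stability AI, and LAION, and turns a text description into an image.

Installing Stable Diffusion and a model checkpoint

Here I picked a checkpoint that produces fairly realistic-looking people.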

Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into /content/stable-diffusion-webui/repositories/taming-transformers...
Cloning K-diffusion into /content/stable-diffusion-webui/repositories/k-diffusion...
Cloning CodeFormer into /content/stable-diffusion-webui/repositories/CodeFormer...
Cloning BLIP into /content/stable-diffusion-webui/repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements for Web UI
Installing sd-webui-controlnet requirement: svglib
Installing Deforum requirement: av
Installing Deforum requirement: pims
Installing rembg
Installing onnxruntime for REMBG extension
Installing pymatting for REMBG extension
Installing pycloudflared
Launching Web UI with arguments: --listen --xformers --enable-insecure-extension-access --theme dark --gradio-queue --multiple
2023-04-06 02:26:14.767376: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-06 02:26:16.773685: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Additional Network extension not installed, Only hijack built-in lora
LoCon Extension hijack built-in lora successfully
[AddNet] Updating model hashes...
0it [00:00, ?it/s]
[AddNet] Updating model hashes...
0it [00:00, ?it/s]
all detected, remote.moe trying to connect...
Warning: Permanently added 'localhost.run,54.82.85.249' (RSA) to the list of known hosts.
Warning: Permanently added 'remote.moe,159.69.126.209' (ECDSA) to the list of known hosts.
all detected, cloudflared trying to connect...
Download cloudflared...: 100% 34.0M/34.0M [00:00<00:00, 445MB/s]
Calculating sha256 for /content/stable-diffusion-webui/models/Stable-diffusion/chilloutmix_NiPrunedFp32Fix.safetensors: fc2511737a54c5e80b89ab03e0ab4b98d051ab187f92860f3cd664dc9d08b271
Loading weights [fc2511737a] from /content/stable-diffusion-webui/models/Stable-diffusion/chilloutmix_NiPrunedFp32Fix.safetensors
Creating model from config: /content/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Downloading (…)olve/main/vocab.json: 961kB [00:00, 1.49MB/s]
Downloading (…)olve/main/merges.txt: 525kB [00:00, 1.21MB/s]
Downloading (…)cial_tokens_map.json: 100% 389/389 [00:00<00:00, 41.9kB/s]
Downloading (…)okenizer_config.json: 100% 905/905 [00:00<00:00, 285kB/s]
Downloading (…)lve/main/config.json: 4.52kB [00:00, 3.55MB/s]
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(7): bad_prompt_version2, bad-artist, bad-artist-anime, ng_deepnegative_v1_75t, bad-image-v2-39000, EasyNegative, bad-hands-5
Model loaded in 54.0s (calculate hash: 23.7s, load weights from disk: 0.6s, create model: 14.8s, apply weights to model: 14.7s).
*Deforum ControlNet support: enabled*

Once installation finishes, open the web UI:

Prompts (the positive prompt first, then the negative prompt):

(8k, RAW photo, best quality, masterpiece:1.2), (realistic, photo-realistic:1.4), ultra-detailed, (Kpop idol),perfect detail , looking at viewer,make up,

paintings,sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, bad anatomy,(long hair:1.4),DeepNegative,(fat:1.2),facing away, looking away,tilted head, {Multiple people}, lowres,bad anatomy,bad hands, text, error, missing fingers,extra digit, fewer digits, cropped, worstquality, low quality, normal quality,jpegartifacts,signature, watermark, username,blurry,bad feet,cropped,poorly drawn hands,poorly drawn face,mutation,deformed,worst quality,low quality,normal quality,jpeg artifacts,signature,watermark,extra fingers,fewer digits,extra limbs,extra arms,extra legs,malformed limbs,fused fingers,too many fingers,long neck,cross-eyed,mutated hands,polar lowres,bad body,bad proportions,gross proportions,text,error,missing fingers,missing arms,missing legs,extra digit, extra arms, extra leg, extra foot,
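The parentheses in the prompts above are the web UI's emphasis syntax. As far as I understand the AUTOMATIC1111 web UI, each pair of plain parentheses multiplies a term's attention weight by 1.1, while an explicit form such as (worst quality:2) sets the weight directly. The helper below is only a sketch of that arithmetic; the function name is made up for illustration and is not part of the web UI.

```python
def implicit_weight(depth: int, base: float = 1.1) -> float:
    """Weight implied by wrapping a term in `depth` pairs of plain parentheses.

    Assumes the commonly documented rule that each pair multiplies by 1.1.
    """
    return round(base ** depth, 4)


if __name__ == "__main__":
    # ((monochrome)) uses two pairs of parentheses.
    print("((monochrome)) ->", implicit_weight(2))
    # (worst quality:2) carries an explicit weight, so no multiplication applies.
    print("(worst quality:2) ->", 2.0)
```

Under this rule, ((monochrome)) and (monochrome:1.21) would ask for the same emphasis.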

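Below are a few images generated with Stable Diffusion:

It feels like a mold: fill in a person's features and you can generate all kinds of people. The results look somewhat real but are still noticeably short of an actual photo. Even so, the technique is fun.

Never mind, let's just look at real people!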

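Welcome to part 39 of the 觉悟之坡 AI painting series.

Author: 蚂蚁

I. Introduction

Last week the well-known ControlNet extension shipped a feature update that its author flagged as a [Major Update]: Reference-only. How powerful is it?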

In the ControlNet author's words:

Now we have a reference-only preprocessor that does not require any control models. It can guide the diffusion directly using images as references.

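In the author's example, only a single picture of a dog is uploaded to ControlNet, and the prompt is very short.

It is just "a dog running on grassland, best quality, ...", yet the result keeps a similar subject and a similar style while the action follows the prompt. The effect is impressive.

In the GitHub comments, users showed results that used anime images and Midjourney-generated images as references, which prompted the reaction: who needs LoRA any more? (Not quite!)

Skipping the cost of LoRA training is certainly tempting, since training a LoRA needs a good GPU and a lot of time spent preparing images.

So let's try Reference-only ourselves and see how it performs.

(For convenience, I will call reference-only "reference mode" from here on.)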

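II. Installing or updating ControlNet

1. Version requirement

According to the ControlNet author, reference mode requires the extension to be updated to version 1.1.153 or later.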
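When checking the installed version against 1.1.153, compare the numbers component by component rather than as plain strings, because a string comparison would rank "1.1.99" above "1.1.153". A minimal sketch, with helper names that are purely illustrative:

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as 'v1.1.153' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def supports_reference_only(installed: str, required: str = "1.1.153") -> bool:
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    for version in ("1.1.99", "1.1.153", "1.1.200"):
        print(version, supports_reference_only(version))
```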

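If you haven't updated yet, update your ControlNet extension. If you are upgrading from a version below 1.1, you also need to download the ControlNet model files separately.

If your network connection is good, you can let the web UI update the extension itself; if it is not, update manually.

If you need to install or update, read on. If your ControlNet already meets the version requirement, skip ahead.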

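2. Installing, updating, and downloading the models

I doubt any Stable Diffusion user today hasn't installed ControlNet, but if you still need to, see the beginner's ControlNet installation tutorial.

As a Mac user, I never update blindly, because some extensions have compatibility problems. I usually download the code as a zip file from GitHub, unzip it, and replace the extension's files under the extensions folder.

After updating the ControlNet extension, remember to download the model files; otherwise the extension will not work properly:
https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main

There are already dozens of models. If you don't know which ones you need, download them all; you will use them eventually.

Put the downloaded files in the stable-diffusion-webui\extensions\sd-webui-controlnet\models folder.

After updating, be sure to restart the web UI. If no models show up in ControlNet, click the refresh button next to the model dropdown; if they still don't appear, check that the files are in the right place.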
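To check that the files landed in the right place, a short script can list what is in the models folder. This is only a sketch: the file suffixes are an assumption about how the downloaded models are packaged, and the path in the usage section should be adjusted to your own installation.

```python
from pathlib import Path

# Assumption: model weights are shipped as .pth or .safetensors files.
MODEL_SUFFIXES = {".pth", ".safetensors"}


def list_model_files(models_dir: Path) -> list[str]:
    """Return the sorted names of model files found directly in models_dir."""
    if not models_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in models_dir.iterdir()
        if entry.is_file() and entry.suffix.lower() in MODEL_SUFFIXES
    )


if __name__ == "__main__":
    # Path taken from the article; change it if your install lives elsewhere.
    folder = Path("stable-diffusion-webui") / "extensions" / "sd-webui-controlnet" / "models"
    found = list_model_files(folder)
    print(f"{len(found)} model file(s) in {folder}")
    for name in found:
        print(" ", name)
```

An empty list means the folder is missing or holds no files with those suffixes, which matches the "models not showing up" symptom described above.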

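III. Testing and usage

If the web UI looks like the screenshot above and Reference only appears in the preprocessor list, the preparation is done.

First, I will reproduce the ControlNet demo to see whether the author was exaggerating.

(1) Reproducing the simple example

To push reference mode as far as possible, I found a real photo of a pet dog through a search engine. To avoid confusing the model, the subject is clear and unobstructed, and to keep the test simple there is only one subject in the frame. I cropped the photo to a square, as shown below: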
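Cropping to a square can be done in any image editor. For anyone scripting it, the sketch below computes a centered square crop box; centering is my assumption, since the article does not say how the photo was cropped. The resulting (left, upper, right, lower) tuple has the shape that Pillow's Image.crop accepts.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, upper, right, lower) for the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


if __name__ == "__main__":
    # Example dimensions are made up for illustration.
    print(center_square_box(800, 600))
```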

To begin, play it safe and choose a general-purpose base model, for example SD 1.5 or 2.1:

1. On the txt2img tab, enter the positive prompt: a dog running on grassland, best quality. Leave the negative prompt empty for now;

2. Set the sampling method to Euler a, sampling steps to 20, width and height to 512, and CFG Scale to 7;

3. Drag the source image into ControlNet's Single Image area, check Enable and Pixel Perfect, and turn on Low VRAM if you need it;

4. Set the preprocessor to reference_only; the model dropdown next to it disappears. Set Control Mode to Balanced. If your ControlNet is newer than version 1.1.17, also set Style Fidelity to 1;

5. Click Generate.
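The same settings can also be written down as a request body for the web UI's HTTP API, which is handy for repeating the test. The sketch below only builds the payload and sends nothing. It assumes the web UI was started with its API enabled and that the ControlNet extension reads its units from alwayson_scripts. The unit field names, and the guess that threshold_a carries Style Fidelity, are assumptions to check against your installed version.

```python
import base64
import json


def build_reference_only_payload(image_bytes: bytes) -> dict:
    """Build a txt2img request body mirroring the five steps above."""
    unit = {
        "enabled": True,
        "input_image": base64.b64encode(image_bytes).decode("ascii"),
        "module": "reference_only",   # preprocessor
        "model": "None",              # reference-only needs no control model
        "control_mode": "Balanced",
        "pixel_perfect": True,
        "threshold_a": 1.0,           # assumed to map to Style Fidelity
    }
    return {
        "prompt": "a dog running on grassland, best quality",
        "negative_prompt": "",
        "sampler_name": "Euler a",
        "steps": 20,
        "width": 512,
        "height": 512,
        "cfg_scale": 7,
        "alwayson_scripts": {"controlnet": {"args": [unit]}},
    }


if __name__ == "__main__":
    payload = build_reference_only_payload(b"placeholder image bytes")
    # Print everything except the long base64 string.
    preview = json.loads(json.dumps(payload))
    preview["alwayson_scripts"]["controlnet"]["args"][0]["input_image"] = "<base64>"
    print(json.dumps(preview, indent=2))
```

The payload would be posted to the txt2img endpoint of a running web UI; that step is left out here so the sketch runs without a server.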

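The generated images do match the source fairly well, though there is an outlier (the one in the top right). Keep in mind that the prompt never mentions the dog's breed or color, and no LoRA or specially trained base model was used, so this result meets expectations. There is still some randomness, but generating several batches and picking the best, or iterating on the prompt, should improve the results without much trouble.

(2) Testing an anime image

The source is this image, again found through a search engine:

Only a few steps need to change:

1. Choose an anime base model such as anything, counterfeit, or ghostmix. I used ghostmix;

2. On the txt2img tab, enter the positive prompt: 1girl standing nearby street,city landscape, best quality

For the negative prompt, use: ng_deepnegative_v1_75t ,easynegative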

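* If you don't have these two negative textual inversion embeddings installed, paste in any commonly used negative prompt for portraits instead (many similar embeddings can be found on Hugging Face or Civitai);

3. Check Restore faces and raise the sampling steps somewhat. I set it to 50 (normally that is more than necessary; I just didn't want to redo the run if it turned out too low);

4. Drag the source image into ControlNet's Single Image area and leave the other settings unchanged;

5. Click Generate.

After a short wait the result came out. The hairstyle is a bit off, but it is close enough (my standards are clearly low).

The background, though, was replaced with a street scene as the prompt asked, and it fits the composition and viewing angle well. Just as the author suggested, this can stand in for inpainting.

But remember, the title of this tutorial promises to replace LoRA. Having dug that hole, I have to fill it, so let's start by improving the prompt.

This time I changed the positive prompt to:

1girl standing nearby street,(short hair),city landscape, best quality

That mostly fixed the problem. Three runs showed the character being reproduced at roughly 80%. For experienced users this prompt is still very bare, and there is a lot of room to improve it, for example by describing the clothing or adding detail cues.

If you still think the title is clickbait, keep testing with me.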

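(3) Building on an SD-generated realistic portrait

Everything so far was the appetizer. Now we really test whether reference mode can replace LoRA, and whether it works as expected on realistic human images.

1. Use the chilloutmix_ni model with the positive and negative prompts listed below;

2. Sampler: DPM++ 2M Karras, 30 steps, CFG Scale 7, Restore faces enabled;

3. Do not use ControlNet; using a LoRA is optional.

Positive prompt: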

masterpiece, best quality, 1girl, aqua eyes, black hair, closed mouth, multicolored background, looking at viewer, outdoors, solo, upper body, alluring, clean, beautiful face, pure face, pale skin, sexy pose,((tube top, navel, shorts)),short hair, ((perfect female figure)), mature female, narrow waist, chinese deity, seductive, highly detailed,best quality, masterpiece, highres, original, extremely detailed 8K, wallpaper, masterpiece, best quality, illustration, beautifully detailed eyes, cinematic lighting, earrings, jewelry,

Negative prompt:

sketches, (worst quality:2), (low quality:2), (normal quality:2), multiple breasts, (mutated hands and fingers:1.5 ), (long body :1.3), (mutation, poorly drawn :1.2) , black-white, bad anatomy, liquid body, liquid tongue, disfigured, malformed, mutated, anatomical nonsense, text font ui, error, malformed hands, long neck, blurred, lowers, lowres, bad anatomy, bad proportions, bad shadow, uncoordinated body, unnatural body, fused breasts, bad breasts, huge breasts, poorly drawn breasts, extra breasts, liquid breasts, heavy breasts, missing breasts, huge haunch, huge thighs, huge calf, bad hands, fused hand, missing hand,

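This gives us the following image, which we will now use to test ControlNet's reference mode.

Here are the experiments I have thought of so far; we will test them one by one:

Replace the background and clothing

Keep the character, change the pose

Switch the model to change the art style

And so on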

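(4) Replacing the background and clothing

If you read this image's prompt closely, you will see that almost all of it describes the person. The background is described only as a multicolored outdoor background (multicolored background, outdoors), so it is highly random. Here we try to keep the character's appearance and replace the background with a cyberpunk scene.

1. Drag the image into ControlNet's image upload area and check Enable and Pixel Perfect;

2. Set the preprocessor to reference_only, Control Mode to Balanced, and Style Fidelity to 1;

3. Change the sampling method to Euler a (in my tests, DPM++ 2M Karras did not fit the original image as well as Euler a);

4. Rewrite the positive prompt to focus on the background:

masterpiece, best quality, 1girl, indoor, (scifi style background), ((in a local bar)),cyberpunker lighting, ((neon lamp)), sci-fi details, insane level of details, hyper realistic, cinematic, composition

5. The original image already has some background content, so to override it, raise the CFG Scale, which controls how strongly the output follows the prompt. I went from 7 to 9;

6. If you used a LoRA earlier, remove it this time, so that you can verify that the consistency of the subject really comes from reference mode.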
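Why raising CFG Scale strengthens the prompt can be seen from the standard classifier-free guidance formula: the sampler takes the prediction made without the prompt and pushes it toward the prediction made with the prompt, scaled by the CFG value. The sketch below applies that formula to made-up numbers; real predictions are large tensors, and how ControlNet's reference mode interacts with this step is not modeled here.

```python
def apply_cfg(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), per element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]   # toy prediction without the prompt
    cond = [1.0, 1.5]     # toy prediction with the prompt
    for scale in (7.0, 9.0):
        print(scale, apply_cfg(uncond, cond, scale))
```

With these toy values, moving from 7 to 9 moves every element further along the direction the prompt suggests, which is the effect step 5 relies on.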

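I am showing more images this time so the common traits are visible. The style and color of the clothing sometimes drift, but the face is very stable, even more stable than when mixing several LoRAs.

And all of this comes from one image plus a single 1girl in the prompt.

Note that this example changes the background, while changes to the clothing are fairly random. To change the clothing, describe it explicitly in the prompt for the second generation.

(5) Keeping the character, changing the pose

With the groundwork above, I won't walk through every step; here is just the idea. It is simple: leave the prompt unchanged, and in the second generation use two ControlNet units, one in reference mode and one with OpenPose.

If you don't know how to use several ControlNet units at once, change it under Settings, as shown below.

We keep the same image as the reference. In the new ControlNet unit, pick any image from the previous batch; I chose one whose pose suits the scene but whose clothing did not follow the reference image.

That also lets us see whether the clothing can be brought back to what we hoped for in the previous experiment.

The result is close to perfect, isn't it? We fixed the previous experiment's failure to fully copy the original image's style, and we also got the specified pose.

There is an important trick here. If you have watched the Stable Diffusion generation (diffusion) process, you will have noticed that colors have not settled in the early steps, but the composition has.

So I usually let OpenPose act early and bring reference_only in relatively later. This trick works even better when combining reference_only with canny.

The goal is to keep the reference image's composition and picture elements from affecting the final composition. The settings are as follows: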
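To make the timing idea concrete, the sketch below treats each ControlNet unit as active over a fraction of the sampling run and lists the steps that fall inside that window. It is a simplified model, not the extension's exact step accounting. The start and end fractions are hypothetical values chosen for illustration, because the article's own settings were shown only in a screenshot.

```python
def active_steps(total_steps: int, start: float, end: float) -> list[int]:
    """Steps whose progress (step / total_steps) lies within [start, end]."""
    return [
        step
        for step in range(total_steps)
        if start <= step / total_steps <= end
    ]


if __name__ == "__main__":
    total = 20
    # Hypothetical windows: the pose unit acts early, reference_only joins later.
    windows = {
        "openpose": (0.0, 0.5),
        "reference_only": (0.25, 1.0),
    }
    for name, (start, end) in windows.items():
        steps = active_steps(total, start, end)
        print(f"{name}: steps {steps[0]} to {steps[-1]}")
```

In the web UI these windows would be set per unit with the start and end control step sliders; which values work best has to be found by watching the diffusion preview, as the text says.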

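(6) Switching the model to change the art style

This one is simple: just swap the base model. You can turn portraits into comics and convert between 2D, 2.5D, and 3D styles. Compared with the earlier approach of using ControlNet canny as a sketch guide, this has two advantages:

1. Style transfer is stable; as the earlier tests showed, it only needs light constraints (from the prompt or from ControlNet);

2. The converted images vary in composition instead of all looking the same, so one image gives you a whole set. What's not to like?

It is very simple to do, so here are the results:

First, a reminder of the reference image:

Converted to 2.5D:

Converted to 2D (anime style):

There is a trick here as well. I only swapped the base model to anything v5, which you would expect to output 2D-style images. If you try it yourself, though, you will find that reference mode interferes with the result, and without further changes the best you get is 2.5D.

As mentioned earlier, adjusting when reference mode enters and exits the diffusion process removes this strong interference. In my tests, that setting made the step down from 2.5D to 2D succeed.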

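IV. More fun things to try

I tried everything in this part on a whim, so take it only as a starting point for ideas.

If you're interested, try things out for yourself.

V. Summary

These tests led to the following experience and conclusions:

Reference mode uses a single given image as the reference for generation; with a relatively simple prompt, the content of the reference image carries over into the new image

Compared with using ControlNet as a sketch guide, reference mode leaves much more room for random variation

Reference mode can greatly reduce the work of writing prompts

Combined with OpenPose or repeated iterative correction, it can achieve LoRA-like results

Reference mode converts between 2D, 2.5D, and 3D more easily than redrawing

Setting when each ControlNet unit enters and exits is very important, and takes experience gained from watching the diffusion process

Reference mode seems to work better with the Euler a sampler than with other settings (this may be my imagination)

When you want the prompt to carry more weight without lowering the strength of reference mode, raising CFG Scale is a good option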

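That's it. Did you get all that? If so, remember to like, follow, and share.

That encourages us to work faster on the next article.

If not, you can also send us a private message with your questions.

For earlier articles, see the AI painting article collection.

Follow us for more useful and fun AI image generation tips.

Posted by: piikee | Category: Game Guides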
