2022-09-23 Deploying the stable_diffusion model (text2image) on AWS EC2 with PaddleHub

1. The stable_diffusion model

It generates images from text, which works nicely for illustrating classical Chinese poetry, for example "withered vines, an old tree, crows at dusk; a small bridge, flowing water, people's homes" (枯藤老树昏鸦,小桥流水人家).


[Image: sample illustration generated from the poem prompt]

2. Choosing an AWS EC2 configuration

2.1 AMI selection

2.1.1 Choose a Deep Learning AMI with CUDA preinstalled (recommended)
[Screenshot: Deep Learning AMI selection in the EC2 console]
2.1.2 Install CUDA yourself

If you install CUDA yourself, refer to the NVIDIA CUDA Toolkit documentation for the list of supported operating systems.


[Screenshot: operating systems supported by the CUDA Toolkit installer]

Installing CUDA manually tends to run into problems. For example, here is what I did on Ubuntu:

# Update package lists
apt-get update
# Install build tools and the CUDA toolkit
apt install build-essential
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run
# Configure environment variables: add the following three lines to ~/.bashrc
vim ~/.bashrc
export PATH="/usr/local/cuda-10.2/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH"
export CUDA_VISIBLE_DEVICES=0
source ~/.bashrc
# Verify the installation
nvcc -V
# Replace the default Python; the one shipped with Ubuntu 18.04 is too old
apt install python3.7
ln -sf /usr/bin/python3.7 /usr/bin/python
# Install pip and upgrade it
apt install python3-pip
pip install --upgrade pip

2.2 Instance type

Any x86 instance in the g4dn family works; I picked the cheapest one, g4dn.xlarge, for testing.


[Screenshot: g4dn instance types in the EC2 console]

2.3 Disk size

For a single deployment I chose 45 GB; the stable_diffusion model alone needs about 10 GB of disk space.

3. Starting the service

3.1 Installing PaddleHub

PaddlePaddle installation page

# On EC2, switch to root first; otherwise the model may fail to install due to permission issues
sudo passwd root
su root
# Check the CUDA version
nvcc -V
# The EC2 deep learning AMI ships CUDA 11.2, so install the matching paddlepaddle-gpu build
pip install paddlepaddle-gpu==2.3.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
# Install paddlehub
pip install paddlehub
# Download the model
hub install stable_diffusion
# Start the service (default port 8866)
export CUDA_VISIBLE_DEVICES=0
hub serving start -m stable_diffusion
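
Once the service is up, it can be tested over HTTP. Below is a minimal client sketch; it assumes the service listens on the default port 8866 and that, as is usual for hub serving, the JSON response carries the base64-encoded DocumentArray produced by serving_method under the "results" key. The prompt and output file names are only examples.

# client.py - send a text2image request to the running stable_diffusion service
import requests
from docarray import DocumentArray

data = {
    "text_prompts": "an ink painting of withered vines, an old tree and crows at dusk",
    "width_height": [512, 512],
    "batch_size": 1,
}
resp = requests.post("http://127.0.0.1:8866/predict/stable_diffusion", json=data)
resp.raise_for_status()

# Rebuild the DocumentArray from the base64 payload and save each merged image
da = DocumentArray.from_base64(resp.json()["results"])
for idx, doc in enumerate(da):
    doc.save_uri_to_file("stable_diffusion_result_{}.png".format(idx))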

3.2 Test results:

Using the AWS Linux Deep Learning AMI (CUDA 11.2):
Throughput is about 1 QPS; while one request is being processed, a second request cannot be handled;
With CPU deployment, a single generation takes 20+ minutes, so I never waited long enough to see a complete result;
With GPU deployment, a single 512x512 image takes a little over a minute to generate; the whole request takes about 4 minutes (accessing the overseas server from China), and the response is roughly 25 MB, mainly because it contains a GIF of the generation process;
At 512x512, g4dn.xlarge can generate at most 3 images per request; any more and the GPU runs out of memory;
768x768 works in my tests, but 1024x1024 runs out of GPU memory.

Notes:

  1. stable_diffusion module usage documentation
  2. hub serving usage documentation

4. Simple optimizations

4.1 Comment out unneeded functionality

Since the generation-process frames are not needed in the response, we can modify the module's processing logic directly to shrink the returned result and speed up processing.
Edit the module code: vim ~/.paddlehub/modules/stable_diffusion/module.py (the full edited file is listed in the Code section below).
After the edit, a single image takes about 30 s, the whole request finishes in roughly 1 minute (accessing the overseas server), and the response shrinks to about 800 KB.
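
After saving the edit, restart the serving process so the modified module is reloaded (assuming it was started on the default port 8866):

# Stop the service listening on port 8866, then start it again with the edited module
hub serving stop -p 8866
export CUDA_VISIBLE_DEVICES=0
hub serving start -m stable_diffusion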

4.2 Increasing QPS

I wanted to raise QPS by running multiple service instances. Adding machines is the stable, proven approach; running several services on a single machine could not be tested here, because one service occupies about 10 GB of GPU memory while the single GPU on g4dn.xlarge has roughly 15 GB, so starting a second service fails with an out-of-memory error. On a larger multi-GPU instance the setup sketched below would apply.
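
The following is only an illustrative sketch of a single-machine multi-instance setup on an instance with more than one GPU (untested here; port numbers and GPU indices are placeholders, and each command runs in its own shell or is backgrounded):

# Instance 1 on GPU 0, default port 8866
CUDA_VISIBLE_DEVICES=0 hub serving start -m stable_diffusion -p 8866
# Instance 2 on GPU 1, a different port; a reverse proxy or load balancer
# would then spread requests across the two ports
CUDA_VISIBLE_DEVICES=1 hub serving start -m stable_diffusion -p 8867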

Code (the edited module.py):

# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import ast
import os
import sys
from functools import partial
from typing import List
from typing import Optional
import random

import numpy as np
from PIL import Image
from tqdm.auto import tqdm
from docarray import Document
from docarray import DocumentArray
from IPython import display
import paddlehub as hub
import paddle
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving

from stable_diffusion.diffusers import AutoencoderKL, UNet2DConditionModel
from stable_diffusion.clip.clip.utils import build_model, tokenize
from stable_diffusion.diffusers import PNDMScheduler, LMSDiscreteScheduler, DDIMScheduler


@moduleinfo(name="stable_diffusion",
            version="1.0.0",
            type="image/text_to_image",
            summary="",
            author="paddlepaddle",
            author_email="paddle-dev@baidu.com")
class StableDiffusion:
    def __init__(self):
        self.vae = AutoencoderKL(
            in_channels=3,
            out_channels=3,
            down_block_types=("DownEncoderBlock2D", "DownEncoderBlock2D", "DownEncoderBlock2D", "DownEncoderBlock2D"),
            up_block_types=("UpDecoderBlock2D", "UpDecoderBlock2D", "UpDecoderBlock2D", "UpDecoderBlock2D"),
            block_out_channels=(128, 256, 512, 512),
            layers_per_block=2,
            act_fn="silu",
            latent_channels=4,
            sample_size=512)

        self.unet = UNet2DConditionModel(
            sample_size=64,
            in_channels=4,
            out_channels=4,
            center_input_sample=False,
            flip_sin_to_cos=True,
            freq_shift=0,
            down_block_types=("CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "DownBlock2D"),
            up_block_types=("UpBlock2D", "CrossAttnUpBlock2D", "CrossAttnUpBlock2D", "CrossAttnUpBlock2D"),
            block_out_channels=(320, 640, 1280, 1280),
            layers_per_block=2,
            downsample_padding=1,
            mid_block_scale_factor=1,
            act_fn="silu",
            norm_num_groups=32,
            norm_eps=1e-5,
            cross_attention_dim=768,
            attention_head_dim=8)

        vae_path = os.path.join(self.directory, 'pre_trained', 'stable-diffusion-v1-4-vae.pdparams')
        unet_path = os.path.join(self.directory, 'pre_trained', 'stable-diffusion-v1-4-unet.pdparams')
        self.unet.set_dict(paddle.load(unet_path))
        self.vae.set_dict(paddle.load(vae_path))
        for parameter in self.unet.parameters():
            parameter.stop_gradient = True
        self.vae.eval()
        for parameter in self.vae.parameters():
            parameter.stop_gradient = True
        self.unet.eval()

        self.text_encoder = build_model()
        for parameter in self.text_encoder.parameters():
            parameter.stop_gradient = True
        self.scheduler = PNDMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
                                       num_train_timesteps=1000, skip_prk_steps=True)

    def generate_image(
            self,
            text_prompts,
            style: Optional[str] = None,
            artist: Optional[str] = None,
            width_height: Optional[List[int]] = [512, 512],
            batch_size: Optional[int] = 1,
            num_inference_steps=50,
            guidance_scale=7.5,
            enable_fp16=False,
            seed=None,
            display_rate=5,
            use_gpu=True,
            output_dir: Optional[str] = 'stable_diffusion_out'):
        """
        Create Disco Diffusion artworks and save the result into a DocumentArray.

        :param text_prompts: Phrase, sentence, or string of words and phrases describing what the image should look like.  The words will be analyzed by the AI and will guide the diffusion process toward the image(s) you describe. These can include commas and weights to adjust the relative importance of each element.  E.g. "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."Notice that this prompt loosely follows a structure: [subject], [prepositional details], [setting], [meta modifiers and artist]; this is a good starting point for your experiments. Developing text prompts takes practice and experience, and is not the subject of this guide.  If you are a beginner to writing text prompts, a good place to start is on a simple AI art app like Night Cafe, starry ai or WOMBO prior to using DD, to get a feel for how text gets translated into images by GAN tools.  These other apps use different technologies, but many of the same principles apply.
        :param style: Image style, such as oil paintings, if specified, style will be used to construct prompts.
        :param artist: Artist style, if specified, style will be used to construct prompts.
        :param width_height: Desired final image size, in pixels. You can have a square, wide, or tall image, but each edge length should be set to a multiple of 64px, and a minimum of 512px on the default CLIP model setting.  If you forget to use multiples of 64px in your dimensions, DD will adjust the dimensions of your image to make it so.
        :param batch_size: This variable sets the number of still images you want SD to create for each prompt.
        :param num_inference_steps: The number of inference steps.
        :param guidance_scale: Increase the adherence to the conditional signal which in this case is text as well as overall sample quality.
        :param enable_fp16: Whether to use float16.
        :param use_gpu: whether to use gpu or not.
        :param output_dir: Output directory.
        :return: a DocumentArray object that has `n_batches` Documents
        """
        if seed:
            np.random.seed(seed)
            random.seed(seed)
            paddle.seed(seed)

        if use_gpu:
            try:
                _places = os.environ.get("CUDA_VISIBLE_DEVICES", None)
                if _places:
                    paddle.device.set_device("gpu:{}".format(0))
            except:
                raise RuntimeError(
                    "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
                )
        else:
            paddle.device.set_device("cpu")
        paddle.disable_static()

        if not os.path.exists(output_dir):
            os.makedirs(output_dir, exist_ok=True)

        if isinstance(text_prompts, str):
            text_prompts = text_prompts.rstrip(',.,。')
            if style is not None:
                text_prompts += ",{}".format(style)
            if artist is not None:
                text_prompts += ",{},trending on artstation".format(artist)
            text_prompts = [text_prompts]
        elif isinstance(text_prompts, list):
            for i, prompt in enumerate(
                    text_prompts):  # different from dd here, dd can have multiple prompts for one image with weight.
                text_prompts[i] = prompt.rstrip(',.,。')
                if style is not None:
                    text_prompts[i] += ",{}".format(style)
                if artist is not None:
                    text_prompts[i] += ",{},trending on artstation".format(artist)

        width, height = width_height
        da_batches = DocumentArray()

        for prompt in text_prompts:
            d = Document(tags={'prompt': prompt})
            da_batches.append(d)
            # for i in range(batch_size):
            #    d.chunks.append(Document(tags={'prompt':prompt, 'image idx': i}))
            # d.chunks.append(Document(tags={'prompt':prompt, 'image idx': 'merged'}))
            with paddle.amp.auto_cast(enable=enable_fp16, level='O2'):
                prompts = [prompt] * batch_size
                text_input = tokenize(prompts)
                text_embeddings = self.text_encoder(text_input)
                uncond_input = tokenize([""] * batch_size)
                uncond_embeddings = self.text_encoder(uncond_input)
                text_embeddings = paddle.concat([uncond_embeddings, text_embeddings])

                latents = paddle.randn(
                    (batch_size, self.unet.in_channels, height // 8, width // 8),
                )
                if isinstance(self.scheduler, LMSDiscreteScheduler):
                    latents = latents * self.scheduler.sigmas[0]

                self.scheduler.set_timesteps(num_inference_steps)
                for i, t in tqdm(enumerate(self.scheduler.timesteps)):
                    # expand the latents if we are doing classifier-free guidance to avoid doing two forward passes.
                    latent_model_input = paddle.concat([latents] * 2)

                    if isinstance(self.scheduler, LMSDiscreteScheduler):
                        sigma = self.scheduler.sigmas[i]
                        latent_model_input = latent_model_input / ((sigma ** 2 + 1) ** 0.5)

                    # predict the noise residual
                    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings)["sample"]

                    # perform guidance
                    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
                    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

                    # compute the previous noisy sample x_t -> x_t-1
                    if isinstance(self.scheduler, LMSDiscreteScheduler):
                        latents = self.scheduler.step(noise_pred, i, latents)["prev_sample"]
                    else:
                        latents = self.scheduler.step(noise_pred, t, latents)["prev_sample"]
                    if i % display_rate == 0:
                        # vae decode
                        images = self.vae.decode(1 / 0.18215 * latents)
                        images = (images / 2 + 0.5).clip(0, 1)
                        merge_image = images.cpu().transpose([2, 0, 3, 1]).flatten(1, 2).numpy()
                        merge_image = (merge_image * 255).round().astype(np.uint8)
                        merge_image = Image.fromarray(merge_image)
                        # merge_image.save(os.path.join(output_dir,
                        #                            f'{prompt}-progress.png'))
                        # c = Document(tags={'step': i, 'prompt': prompt})
                        # c.load_pil_image_to_datauri(merge_image)
                        # d.chunks[-1].chunks.append(c)
                        display.clear_output(wait=True)
                        display.display(merge_image)
                        # images = images.cpu().transpose([0, 2, 3, 1]).numpy()
                        # images = (images * 255).round().astype(np.uint8)
                        # for j in range(images.shape[0]):
                        # image = Image.fromarray(images[j])
                        # c = Document(tags={'step': i, 'prompt': prompt})
                        # c.load_pil_image_to_datauri(image)
                        # d.chunks[j].chunks.append(c)

                # vae decode
                images = self.vae.decode(1 / 0.18215 * latents)
                images = (images / 2 + 0.5).clip(0, 1)
                merge_image = images.cpu().transpose([2, 0, 3, 1]).flatten(1, 2).numpy()
                merge_image = (merge_image * 255).round().astype(np.uint8)
                merge_image = Image.fromarray(merge_image)
                # merge_image.save(os.path.join(output_dir,
                #                                    f'{prompt}-merge.png'))
                display.clear_output(wait=True)
                display.display(merge_image)
                d.load_pil_image_to_datauri(merge_image)
                # d.chunks[-1].load_pil_image_to_datauri(merge_image)
                # images = images.cpu().transpose([0, 2, 3, 1]).numpy()
                # images = (images * 255).round().astype(np.uint8)
                # for j in range(images.shape[0]):
                # image = Image.fromarray(images[j])
                # image.save(os.path.join(output_dir,
                #                                f'{prompt}-image-{j}.png'))
                # d.chunks[j].load_pil_image_to_datauri(image)
        return da_batches

    @serving
    def serving_method(self, text_prompts, **kwargs):
        """
        Run as a service.
        """
        results = self.generate_image(text_prompts=text_prompts, **kwargs).to_base64()
        return results

    @runnable
    def run_cmd(self, argvs):
        """
        Run as a command.
        """
        self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
                                              prog='hub run {}'.format(self.name),
                                              usage='%(prog)s',
                                              add_help=True)
        self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
        self.arg_config_group = self.parser.add_argument_group(
            title="Config options", description="Run configuration for controlling module behavior, not required.")
        self.add_module_config_arg()
        self.add_module_input_arg()
        args = self.parser.parse_args(argvs)
        results = self.generate_image(text_prompts=args.text_prompts,
                                      style=args.style,
                                      artist=args.artist,
                                      width_height=args.width_height,
                                      batch_size=args.batch_size,
                                      num_inference_steps=args.num_inference_steps,
                                      guidance_scale=args.guidance_scale,
                                      enable_fp16=args.enable_fp16,
                                      seed=args.seed,
                                      display_rate=args.display_rate,
                                      use_gpu=args.use_gpu,
                                      output_dir=args.output_dir)
        return results

    def add_module_config_arg(self):
        """
        Add the command config options.
        """

        self.arg_input_group.add_argument(
            '--num_inference_steps',
            type=int,
            default=50,
            help=
            "The number of inference steps."
        )

        self.arg_input_group.add_argument(
            '--guidance_scale',
            type=float,
            default=7.5,
            help=
            "Increase the adherence to the conditional signal which in this case is text as well as overall sample quality."
        )

        self.arg_input_group.add_argument(
            '--seed',
            type=int,
            default=None,
            help=
            "Deep in the diffusion code, there is a random number ‘seed’ which is used as the basis for determining the initial state of the diffusion.  By default, this is random, but you can also specify your own seed."
        )

        self.arg_input_group.add_argument(
            '--display_rate',
            type=int,
            default=10,
            help=
            "During a diffusion run, you can monitor the progress of each image being created with this variable."
        )

        self.arg_config_group.add_argument('--use_gpu',
                                           type=ast.literal_eval,
                                           default=True,
                                           help="whether use GPU or not")

        self.arg_config_group.add_argument('--enable_fp16',
                                           type=ast.literal_eval,
                                           default=False,
                                           help="whether use float16 or not")

        self.arg_config_group.add_argument('--output_dir',
                                           type=str,
                                           default='stable_diffusion_out',
                                           help='Output directory.')

    def add_module_input_arg(self):
        """
        Add the command input options.
        """
        self.arg_input_group.add_argument(
            '--text_prompts',
            type=str,
            help=
            'Phrase, sentence, or string of words and phrases describing what the image should look like.  The words will be analyzed by the AI and will guide the diffusion process toward the image(s) you describe. These can include commas and weights to adjust the relative importance of each element.  E.g. "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."Notice that this prompt loosely follows a structure: [subject], [prepositional details], [setting], [meta modifiers and artist]; this is a good starting point for your experiments. Developing text prompts takes practice and experience, and is not the subject of this guide.  If you are a beginner to writing text prompts, a good place to start is on a simple AI art app like Night Cafe, starry ai or WOMBO prior to using DD, to get a feel for how text gets translated into images by GAN tools.  These other apps use different technologies, but many of the same principles apply.'
        )
        self.arg_input_group.add_argument(
            '--style',
            type=str,
            default=None,
            help='Image style, such as oil paintings, if specified, style will be used to construct prompts.'
        )
        self.arg_input_group.add_argument(
            '--artist',
            type=str,
            default=None,
            help='Artist style, if specified, style will be used to construct prompts.'
        )

        self.arg_input_group.add_argument(
            '--width_height',
            type=ast.literal_eval,
            default=[512, 512],
            help=
            "Desired final image size, in pixels. You can have a square, wide, or tall image, but each edge length should be set to a multiple of 64px, and a minimum of 512px on the default CLIP model setting.  If you forget to use multiples of 64px in your dimensions, DD will adjust the dimensions of your image to make it so."
        )
        self.arg_input_group.add_argument(
            '--batch_size',
            type=int,
            default=1,
            help=
            "This variable sets the number of still images you want SD to create for each prompt."
        )

Problem encountered when installing PaddleHub on Ubuntu:
ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Solution:
pip install PyYAML --ignore-installed
