Summary: Flask, Celery
Introduction to Celery
Celery is a distributed task queue focused on real-time processing and task scheduling. Common scenarios for using Celery:
- Web applications: when a user-triggered operation takes a long time to complete, it can be handed to Celery to run asynchronously, so the user does not have to wait; this improves throughput and lowers response time
- Scheduled tasks: Celery makes it easy to set up different periodic tasks on different machines
- Other scenarios that need asynchronous execution: any extra work that does not have to finish synchronously can be done asynchronously, such as sending SMS/email, pushing notifications, clearing caches, and so on
Celery Components
(1) Celery consists of the following components:
- Celery Beat: the task scheduler; periodically sends tasks that need to run to the task queue
- Celery Worker: the consumer; executes tasks, usually deployed on multiple servers with several workers to increase throughput
- Broker: the message broker (message middleware); receives task messages from producers and delivers them to consumers
- Producer: the producer; calls Celery's API, functions, or decorators to create tasks and send them to the message broker
- Result Backend: stores the state and result of a task after it finishes; supports Redis, MongoDB, RabbitMQ, and other storage backends
The components work together as follows: the producer (or Celery Beat, for periodic tasks) sends a task message to the broker, a worker picks the message up from the broker and executes it, and the task's state and result are written to the result backend.
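As a minimal, self-contained sketch of how these pieces map to code (the module name tasks and the 30-second schedule are made up purely for illustration, not part of the example project below):

# tasks.py -- minimal illustrative sketch
from celery import Celery

# Broker and Result Backend: Redis here; RabbitMQ would work the same way
app = Celery("tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

# A task: any function decorated with @app.task can be produced and consumed
@app.task
def add(x, y):
    return x + y

# Celery Beat: a periodic schedule that enqueues add(2, 2) every 30 seconds
app.conf.beat_schedule = {
    "add-every-30s": {"task": "tasks.add", "schedule": 30.0, "args": (2, 2)},
}

# Producer: calling add.delay(2, 3) anywhere sends a message to the broker
# Worker: "celery -A tasks worker" consumes and executes it
# Beat: "celery -A tasks beat" pushes the scheduled tasks to the broker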
(2) Choosing a message broker
Celery recommends RabbitMQ or Redis. Note that with Redis as the broker, queued messages can be lost if the machine loses power or shuts down.
(3) Data serialization
Data passed through the message queue must be serialized and deserialized. Celery supports json, yaml, and msgpack, with json as the default; msgpack is a binary, JSON-like serialization format that is smaller and faster than json.
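For instance, switching the serializer is just a configuration change. A sketch using the lowercase setting names of Celery 4+/5 (the script below uses the older uppercase CELERY_* names instead):

from celery import Celery

app = Celery("demo", broker="redis://localhost:6379/0")
app.conf.task_serializer = "msgpack"           # serialize task messages with msgpack
app.conf.result_serializer = "json"            # results may use a different format
app.conf.accept_content = ["msgpack", "json"]  # content types the worker will accept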
Quick Start: Sending Email Asynchronously with Flask + Celery
(1) Installation
Use Redis as both the message broker and the result store, then install celery and flask (the example below also configures the msgpack serializer, so the msgpack package needs to be installed as well):
pip install celery
pip install redis
The versions used:
celery --version
5.1.2 (sun-harmonics)
flask --version
Python 3.7.6
Flask 1.1.1
Werkzeug 1.0.0
(2) The Flask application script
Build a web app that sends an email to the address the user enters. The mail is sent with Python's built-in smtplib and email modules, the front end is an HTML page, and the Flask backend reads the email address from the POST form and sends the mail. We then compare the response time of synchronous execution against asynchronous execution with Celery.
import os
import sys
import traceback
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.header import Header
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from flask import Flask, render_template, request, redirect, url_for, session
from celery import Celery
from settings import Config
app = Flask(__name__)
app.config.from_object(Config)
app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'  # message broker
app.config['CELERY_RESULT_BACKEND'] = 'redis://localhost:6379/0'  # task results are written to redis
app.config['CELERY_TASK_SERIALIZER'] = 'msgpack'  # serialization format
app.config['CELERY_ACCEPT_CONTENT'] = ['msgpack', 'json']  # msgpack must also be listed as accepted content when it is used as the serializer
celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)  # reuse the Flask app's config
@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "GET":
        return render_template("index.html")
    email = request.form.get("email")
    if request.form.get("submit") == "Send":
        import time
        t = time.time()
        # send_async_email("直接发送", email)      # synchronous call, for timing comparison
        send_async_email.delay("直接发送", email)  # asynchronous call via Celery
        # TODO: other work that would run after the email
        print(time.time() - t)
    elif request.form.get("submit") == "Send in 1 minute":
        send_async_email.apply_async(["延迟1分钟发送", email], countdown=60)
    return redirect(url_for('index'))


@celery.task(name='main.send_async_email')
def send_async_email(message, to_email):
    send_mail(message, to_email)


def send_mail(message, to_email):
    conn = None
    try:
        conn = smtplib.SMTP_SSL(app.config["SMTP_HOST"], app.config["SMTP_PORT"])
        conn.login(app.config["FROM_EMAIL_ACCOUNT"], app.config["FROM_EMAIL_PASSWORD"])
        msg = MIMEMultipart()
        subject = Header('测试邮件', 'utf-8').encode()
        msg['Subject'] = subject
        msg['From'] = app.config["FROM_EMAIL_ACCOUNT"]
        msg['To'] = to_email
        text = MIMEText(message, 'plain', 'utf-8')
        msg.attach(text)
        conn.sendmail(app.config["FROM_EMAIL_ACCOUNT"], to_email, msg.as_string())
    except Exception:
        traceback.print_exc()
    finally:
        if conn:
            conn.quit()


if __name__ == '__main__':
    app.run("0.0.0.0", 5010)
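The settings module imported at the top (from settings import Config) is not shown above; a minimal sketch of what it needs to provide for this script, with placeholder values, could be:

# settings.py -- illustrative placeholders only; use your own SMTP provider and credentials
class Config:
    SECRET_KEY = "change-me"               # only needed if Flask sessions are used
    SMTP_HOST = "smtp.example.com"         # SMTP server address (placeholder)
    SMTP_PORT = 465                        # SMTP over SSL commonly uses port 465
    FROM_EMAIL_ACCOUNT = "me@example.com"  # sender account (placeholder)
    FROM_EMAIL_PASSWORD = "app-password"   # sender password or auth code (placeholder)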
The script touches Celery in two main places:
- Creating the Celery async task: the @celery.task(name='main.send_async_email') decorator wraps an ordinary Python function as a Celery task that is pushed to the queue and executed asynchronously. name must be set to "module name.function name", otherwise Celery cannot find the task (it is not registered successfully).
- Invoking the Celery async task: the view calls send_async_email.delay(); without .delay() it is just an ordinary synchronous Python function call (see the sketch below).
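For reference, delay() is shorthand for apply_async(); the calls in the view are equivalent to the following sketch:

# These two lines enqueue the same task:
send_async_email.delay("直接发送", email)               # shorthand
send_async_email.apply_async(args=["直接发送", email])  # explicit form

# apply_async exposes extra options, such as the 60-second countdown used above:
send_async_email.apply_async(args=["延迟1分钟发送", email], countdown=60)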
Next, start Flask. The rendered page is a simple form with an email input box and two buttons, Send and Send in 1 minute.
(3) Start the Celery worker
Tasks pushed to the broker are only executed once a worker (consumer) is running. Start the worker from the project root directory:
root@ubuntu:~/myproject/CELERY_TEST# celery -A celery_test.main:celery worker -l info
/opt/anaconda3/lib/python3.7/site-packages/celery/platforms.py:835: SecurityWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!
Please specify a different user using the --uid option.
User information: uid=0 euid=0 gid=0 egid=0
uid=uid, euid=euid, gid=gid, egid=egid,
-------------- celery@ubuntu (sun-harmonics)
--- ***** -----
-- ******* ---- Linux-4.15.0-54-generic-x86_64-with-debian-buster-sid 2021-07-22 19:57:25
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: celery_test.main:0x7f6430716690
- ** ---------- .> transport: redis://localhost:6379/0
- ** ---------- .> results: redis://localhost:6379/0
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. main.send_async_email
[2021-07-22 19:57:25,863: WARNING/MainProcess] Please run `celery upgrade settings path/to/settings.py` to avoid these warnings and to allow a smoother upgrade to Celery 6.0.
[2021-07-22 19:57:26,144: INFO/MainProcess] Connected to redis://localhost:6379/0
[2021-07-22 19:57:26,169: INFO/MainProcess] mingle: searching for neighbors
[2021-07-22 19:57:27,192: INFO/MainProcess] mingle: all alone
[2021-07-22 19:57:27,231: INFO/MainProcess] celery@ubuntu ready.
The final ready line means the worker started successfully. The startup banner also shows the relevant Celery configuration:
- app: the application lives in the celery_test.main module
- transport: the message queue uses Redis at localhost:6379/0
- results: task results are stored in Redis at localhost:6379/0
- concurrency: 4 prefork worker processes
- [tasks]: the registered task is the send_async_email function in the main module
Celery can also run in the background and write its logs to a chosen directory. Use celery multi to manage starting and stopping workers: pass --logfile and --pidfile for the log and pid locations (%n expands to the node name) and give the node a custom name, for example web:
root@ubuntu# celery multi start web -A celery_test.main:celery -l info --logfile=logs/celery_%n.log --pidfile=celery_%n.pid
celery multi v5.1.2 (sun-harmonics)
> Starting nodes...
> web@ubuntu: OK
root@ubuntu:~/myproject/CELERY_TEST# ls
celery_test celery_web.pid logs __pycache__ settings.py
root@ubuntu:~/myproject/CELERY_TEST/logs# ls
celery_web.log
The node starts in the background, and a pid file and the log directory are created under the project directory.
(4) Testing the asynchronous task
Enter an email address on the page and click Send. The view runs the function via delay(), i.e. the mail is sent asynchronously. The view also times the request with time, assuming there is further work to do after sending the email, and records the total time spent.
With the asynchronous version, sending the email plus the remaining work takes about 8 ms in total:
127.0.0.1 - - [23/Jul/2021 14:32:44] "POST / HTTP/1.1" 302 -
127.0.0.1 - - [23/Jul/2021 14:32:44] "GET / HTTP/1.1" 200 -
0.009320974349975586
127.0.0.1 - - [23/Jul/2021 14:32:46] "POST / HTTP/1.1" 302 -
127.0.0.1 - - [23/Jul/2021 14:32:46] "GET / HTTP/1.1" 200 -
0.008226156234741211
127.0.0.1 - - [23/Jul/2021 14:32:48] "POST / HTTP/1.1" 302 -
127.0.0.1 - - [23/Jul/2021 14:32:48] "GET / HTTP/1.1" 200 -
0.007288694381713867
Calling the plain Python function without delay (send_async_email("直接发送", email)) takes around 800 ms, and the page noticeably stalls:
0.8022255897521973
127.0.0.1 - - [23/Jul/2021 14:35:00] "POST / HTTP/1.1" 302 -
127.0.0.1 - - [23/Jul/2021 14:35:00] "GET / HTTP/1.1" 200 -
0.7504754066467285
127.0.0.1 - - [23/Jul/2021 14:35:02] "POST / HTTP/1.1" 302 -
127.0.0.1 - - [23/Jul/2021 14:35:02] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Jul/2021 14:35:04] "POST / HTTP/1.1" 302 -
127.0.0.1 - - [23/Jul/2021 14:35:04] "GET / HTTP/1.1" 200 -
0.8088760375976562
0.8098959922790527
Going one step further, modify the HTML to add a small calculator and change the task body to simply sleep for 10 seconds. Because the function is called directly (synchronously), the page waits 10 seconds after submitting before the calculation result appears:
@celery.task(name='main.send_async_email')
def send_async_email(message, to_email):
    import time
    time.sleep(10)  # simulate a slow task


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "GET":
        return render_template("index.html")
    email = request.form.get("email")
    val1 = request.form.get("value1")
    val2 = request.form.get("value2")
    if request.form.get("submit") == "Send":
        import time
        t = time.time()
        send_async_email("直接发送", email)  # direct (synchronous) call blocks for 10 seconds
        # send_async_email.delay("异步发送", email)  # the asynchronous alternative
        value3 = float(val1) + float(val2)
    return render_template("index.html", **locals())
(5) Inspecting the queue and results stored in Redis
Change the broker URL to Redis database 1 and the result backend to Redis database 2, then look at the data in Redis:
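Concretely, only the two URLs in the Flask config change; a sketch of the modified lines:

app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/1'      # broker messages now live in Redis DB 1
app.config['CELERY_RESULT_BACKEND'] = 'redis://localhost:6379/2'  # task results now go to Redis DB 2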
root@ubuntu:~# redis-cli
127.0.0.1:6379> select 1;
(error) ERR invalid DB index
127.0.0.1:6379> select 1
OK
127.0.0.1:6379[1]> keys *
1) "_kombu.binding.celery.pidbox"
2) "_kombu.binding.celeryev"
3) "_kombu.binding.celery"
127.0.0.1:6379[1]> select 2
OK
127.0.0.1:6379[2]> keys *
1) "celery-task-meta-4b64592e-bcc5-4dcf-b838-e0960b7c8e4b"
2) "celery-task-meta-80cb2a66-23f0-4e22-80af-7faa39f12094"
127.0.0.1:6379[2]> get celery-task-meta-80cb2a66-23f0-4e22-80af-7faa39f12094
"{\"status\": \"SUCCESS\", \"result\": null, \"traceback\": null, \"children\": [], \"date_done\": \"2021-07-23T07:19:06.224197\", \"task_id\": \"80cb2a66-23f0-4e22-80af-7faa39f12094\"}"
The _kombu.binding.* keys are created by kombu, the messaging library Celery uses to maintain queue bindings in the broker; the celery-task-meta-* keys hold the stored results as JSON, including the task status, completion time, return value, and task id.
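The same stored results can be read back through Celery instead of raw redis-cli; a small sketch using the task id seen above:

from celery.result import AsyncResult

# task_id is whatever delay()/apply_async() returned, e.g. the id listed in Redis above
res = AsyncResult("80cb2a66-23f0-4e22-80af-7faa39f12094", app=celery)
print(res.status)  # e.g. "SUCCESS"
print(res.result)  # the task's return value (None for send_async_email)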