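How should performance testing be done, and which performance testing tool is the handiest? Opinions differ.
The purpose of performance testing is to understand thoroughly, for a system and its services:
- the maximum capacity it can sustain,
- whether it has performance bottlenecks,
- and, if it does, where they are.
Three metrics matter most:
1) Response time
2) Throughput
3) Success rate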
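As a rough illustration of these three metrics, the sketch below summarizes a batch of request samples. The `summarize` and `percentile` helpers, the sample data, and the choice to count 2xx/3xx responses as successes are all assumptions made for this example, not part of any tool discussed here.

```python
import math
from statistics import mean


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(samples, duration_seconds):
    """samples: list of (response_time_ms, http_status) tuples."""
    latencies = sorted(ms for ms, _ in samples)
    # Assumption: any 2xx or 3xx status counts as a success.
    succeeded = sum(1 for _, status in samples if 200 <= status < 400)
    return {
        "avg_ms": mean(latencies),
        "max_ms": latencies[-1],
        "p99_ms": percentile(latencies, 99),
        "throughput_rps": len(samples) / duration_seconds,
        "success_rate": succeeded / len(samples),
    }


# Hypothetical data: 99 fast successes and one slow server error in 10 seconds.
samples = [(20, 200)] * 99 + [(900, 500)]
report = summarize(samples, duration_seconds=10)
print(report)
```

With this data the maximum is far above P99, which is one reason to look at both rather than at the average alone.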
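1. Protocol-level considerations
Taking HTTP, the most commonly used protocol, as an example, we need to look at the following:
- Response code
The response code indicates whether a call to an HTTP REST API succeeded. 5xx codes deserve particular attention, so watch closely for these errors in production.
- Response time
Response time is a key measure of REST API performance; it tells us how quickly the microservice responds. We usually want the maximum (max), the average, and P99 (the time within which 99% of requests complete).
- Request volume
This is the number of requests. The absolute count means little; the number of requests per unit of time is more informative.
- Request frequency
Common measures are QPS (Queries Per Second) and TPS (Transactions Per Second).
- Application Performance Index (Apdex)
Assume a response time within T seconds is satisfactory and one beyond F seconds is frustrating:
- 1) Satisfied
The response time is below the threshold (T seconds), and the user is satisfied.
- 2) Tolerating
The response time is above T seconds but below F seconds; performance is poor but the service is still usable, and the user can tolerate it.
- 3) Frustrated
The response time exceeds F seconds; the user finds it unacceptable, gives up, and is left frustrated.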
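The three Apdex buckets above are usually folded into a single score, Apdex = (satisfied + tolerating / 2) / total. The sketch below is a minimal version of that calculation; the `apdex` function name and the sample timings are made up for illustration, and defaulting F to 4T follows the usual Apdex convention rather than anything stated in this article.

```python
def apdex(response_times, t, f=None):
    """Apdex score in [0, 1] for response times given in seconds."""
    if f is None:
        f = 4 * t  # conventional default; pass f explicitly to override
    satisfied = sum(1 for rt in response_times if rt <= t)
    tolerating = sum(1 for rt in response_times if t < rt <= f)
    # Frustrated requests (rt > f) contribute nothing to the numerator.
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical timings: 2 satisfied, 2 tolerating, 1 frustrated for T = 0.5 s.
times = [0.2, 0.4, 0.9, 1.5, 3.0]
score = apdex(times, t=0.5)
print(score)
```

A score of 1.0 means every user was satisfied, and 0.0 means every user was frustrated.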
Other protocols have their own analogous metrics. SIP, for example, has:
- SRD (Session Request Delay)
- SDD (Session Disconnect Delay)
- SDT (Session Duration Time)
- SER (Session Establishment Ratio)
- SEER (Session Establishment Effectiveness Ratio)
- ISA (Ineffective Session Attempts)
- SCR (Session Completion Ratio)
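2. System-level considerations
As the load on the system keeps growing, can the underlying resources keep up?
We need to examine the following metrics:
- CPU utilization
- Memory utilization
- Free disk space
- Disk I/O rate
- Network I/O rate
- JVM statistics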
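As a small illustration of collecting such figures during a test run, the sketch below takes a snapshot using only the Python standard library. The `system_snapshot` function and its field names are hypothetical; it covers just free disk space and, on Unix-like systems, the load average. A real monitoring agent would also record CPU, memory, and I/O rates, typically through a library such as psutil.

```python
import os
import shutil
import time


def system_snapshot(path="/"):
    """Collect a few coarse resource figures for the host running the service."""
    usage = shutil.disk_usage(path)
    snapshot = {
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_free_percent": round(usage.free / usage.total * 100, 1),
    }
    # os.getloadavg() exists only on Unix-like platforms and may raise OSError.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return snapshot


snapshot = system_snapshot()
print(snapshot)
```

Calling this at a fixed interval while the load test runs gives a time series that can be lined up against the response-time data.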
3. How to run the test
Take a simple microservice (account.py) as an example:
import os
import json
import requests
import redis
from flask_httpauth import HTTPBasicAuth
from flask import make_response
from flask import Flask
from flask import request
from werkzeug.exceptions import NotFound, ServiceUnavailable
from flask import render_template
ACCOUNTS_API_PATH = "/api/v1/accounts"
REDIS_KEY = "walter_accounts"
app = Flask(__name__)
current_path = os.path.dirname(os.path.realpath(__file__))
auth = HTTPBasicAuth()
users = {
    "walter": "pass1234"
}
json_file = "{}/account.json".format(current_path)
redis_enabled = True
# docker run --restart always -p 6379:6379 -d --name local-redis redis

class RedisClient:
    def __init__(self):
        self.redis_host = "localhost"
        self.redis_port = 6379
        self.redis_password = ''
        self.redis_conn = None

    def connect(self):
        pool = redis.ConnectionPool(host=self.redis_host, port=self.redis_port)
        self.redis_conn = redis.Redis(connection_pool=pool)

    def set(self, key, value):
        self.redis_conn.set(key, value)

    def get(self, key):
        return self.redis_conn.get(key)


redis_client = RedisClient()
if redis_enabled:
    redis_client.connect()

def read_data():
    # Load all accounts from Redis, or from the local JSON file as a fallback.
    if redis_enabled:
        jsonStr = redis_client.get(REDIS_KEY)
        if not jsonStr:
            jsonStr = "{}"
        return json.loads(jsonStr)
    else:
        with open(json_file, "r") as json_fp:
            return json.load(json_fp)


def save_data(accounts):
    # Persist all accounts to Redis, or to the local JSON file as a fallback.
    if redis_enabled:
        redis_client.set(REDIS_KEY, json.dumps(accounts))
    else:
        with open(json_file, "w") as json_fp:
            json.dump(accounts, json_fp, sort_keys=True, indent=4)

@auth.get_password
def get_pw(username):
    if username in users:
        return users.get(username)
    return None


def generate_response(arg, response_code=200):
    response = make_response(json.dumps(arg, sort_keys=True, indent=4))
    response.headers['Content-type'] = "application/json"
    response.status_code = response_code
    return response


@app.route('/')
def index():
    return render_template('index.html')

# Note: @auth.login_required sits above @app.route here, so Flask registers the
# unwrapped view and the routes below are not actually protected; swap the two
# decorators to enforce authentication.
@auth.login_required
@app.route(ACCOUNTS_API_PATH, methods=['GET'])
def list_account():
    accounts = read_data()
    return generate_response(accounts)


# Create account
@auth.login_required
@app.route(ACCOUNTS_API_PATH, methods=['POST'])
def create_account():
    account = request.json
    sitename = account["siteName"]
    accounts = read_data()
    if sitename in accounts:
        return generate_response({"error": "conflict"}, 409)
    accounts[sitename] = account
    save_data(accounts)
    return generate_response(account)


# Retrieve account
@auth.login_required
@app.route(ACCOUNTS_API_PATH + '/<sitename>', methods=['GET'])
def retrieve_account(sitename):
    accounts = read_data()
    if sitename not in accounts:
        return generate_response({"error": "not found"}, 404)
    return generate_response(accounts[sitename])


# Update account
@auth.login_required
@app.route(ACCOUNTS_API_PATH + '/<sitename>', methods=['PUT'])
def update_account(sitename):
    accounts = read_data()
    if sitename not in accounts:
        return generate_response({"error": "not found"}, 404)
    account = request.json
    print(account)
    accounts[sitename] = account
    save_data(accounts)
    return generate_response(account)


# Delete account
@auth.login_required
@app.route(ACCOUNTS_API_PATH + '/<sitename>', methods=['DELETE'])
def delete_account(sitename):
    accounts = read_data()
    if sitename not in accounts:
        return generate_response({"error": "not found"}, 404)
    del accounts[sitename]
    save_data(accounts)
    return generate_response("", 204)


if __name__ == "__main__":
    app.run(port=5000, debug=True)
'''
http --auth walter:pass --json POST http://localhost:5000/api/v1/accounts \
userName=walter password=pass siteName=163 siteUrl=http://163.com
'''
Preparing the environment
First install libev and python3:
brew install libev
brew install python3
The required libraries are listed in requirements.txt:
flask
flask-httpauth
requests
httpie
redis
locust
Next install virtualenv and the required libraries:
pip3 install virtualenv
virtualenv -p python3 venv
source venv/bin/activate
# then install the required libraries
pip install -r requirements.txt
Starting the service
python account.py
You can use httpie (see https://httpie.org/) for a quick test:
# Add a NetEase (163) account
http --auth walter:pass --json POST http://localhost:5000/api/v1/accounts userName=walter password=pass siteName=163 siteUrl=http://163.com
HTTP/1.0 200 OK
Content-Length: 108
Content-type: application/json
Date: Thu, 24 Oct 2019 14:08:05 GMT
Server: Werkzeug/0.12.2 Python/3.7.3
{
"password": "pass",
"siteName": "163",
"siteUrl": "http://163.com",
"userName": "walter"
}
# Add a Weibo account
http --auth walter:pass --json POST http://localhost:5000/api/v1/accounts userName=walter password=pass siteName=weibo siteUrl=http://weibo.com
HTTP/1.0 200 OK
Content-Length: 108
Content-type: application/json
Date: Thu, 24 Oct 2019 14:08:05 GMT
Server: Werkzeug/0.12.2 Python/3.7.3
{
"password": "pass",
"siteName": "weibo",
"siteUrl": "http://weibo.com",
"userName": "walter"
}
# List all accounts
http --auth walter:pass --json GET http://localhost:5000/api/v1/accounts
HTTP/1.0 200 OK
Content-Length: 290
Content-type: application/json
Date: Thu, 24 Oct 2019 14:20:54 GMT
Server: Werkzeug/0.12.2 Python/3.7.3
{
"163": {
"password": "pass",
"siteName": "163",
"siteUrl": "http://163.com",
"userName": "walter"
},
"weibo": {
"password": "pass",
"siteName": "weibo",
"siteUrl": "http://weibo.com",
"userName": "walter"
}
}
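Performance testing
There are many traditional performance testing tools, such as ab, JMeter, and LoadRunner, which are used to apply steadily increasing load.
Taking ab (ApacheBench) as an example, we test with a concurrency of 100 and 10,000 requests in total:
$ ab -c 100 -n 10000 http://127.0.0.1:5000/api/v1/accounts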
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: Werkzeug/0.12.2
Server Hostname: 127.0.0.1
Server Port: 5000
Document Path: /api/v1/accounts
Document Length: 290 bytes
Concurrency Level: 100
Time taken for tests: 30.550 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 4370000 bytes
HTML transferred: 2900000 bytes
Requests per second: 327.33 [#/sec] (mean)
Time per request: 305.504 [ms] (mean)
Time per request: 3.055 [ms] (mean, across all concurrent requests)
Transfer rate: 139.69 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 2.5 1 144
Processing: 5 303 78.7 288 587
Waiting: 3 303 78.7 288 587
Total: 15 304 78.8 289 588
Percentage of the requests served within a certain time (ms)
50% 289
66% 347
75% 362
80% 372
90% 405
95% 435
98% 458
99% 571
100% 588 (longest request)
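The headline figures in the ab report above follow from a few raw counters, which is worth checking once by hand. The sketch below redoes the arithmetic from the request count, concurrency, elapsed time, and bytes transferred shown in the report; the variable names are only illustrative.

```python
# Raw counters copied from the ab report.
requests_total = 10000
concurrency = 100
elapsed_s = 30.550
bytes_total = 4370000

# Requests per second (mean): completed requests divided by wall-clock time.
rps = requests_total / elapsed_s

# Time per request (mean): time one of the 100 concurrent clients waits per request.
per_request_ms = concurrency * elapsed_s / requests_total * 1000

# Time per request across all concurrent requests: simply the inverse of rps.
per_request_all_ms = elapsed_s / requests_total * 1000

# Transfer rate in KB/s (1 KB = 1024 bytes).
transfer_kb_s = bytes_total / 1024 / elapsed_s

print(round(rps, 2), round(per_request_ms, 1),
      round(per_request_all_ms, 3), round(transfer_kb_s, 2))
```

The results agree with the report's 327.33 requests per second, roughly 305.5 ms and 3.055 ms per request, and 139.69 KB/s; the small difference from the reported 305.504 ms comes from the elapsed time being rounded to three decimals.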
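ab is adequate for simple HTTP endpoint tests, but it falls short when a test has to chain several business API calls together.
JMeter is certainly powerful and reasonably extensible, but we do not want to use it here, for two reasons:
1) JMeter is a resource hog: every task/user needs its own thread.
2) JMeter is configuration-driven, whereas Locust is a performance testing tool driven by code, which allows more flexible control.
Locust, named after the insect, is an open-source performance testing tool. Let's try it out.
It needs a script file first, here account_load_test.py: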
from locust import HttpLocust, TaskSet, task, seq_task
import load_test_util
import json
import yaml
from queue import Queue
from threading import Timer

logger = load_test_util.init_logger("account-load-test")
token_refresh_time = 300


class UserBehavior(TaskSet):
    # Note: in Locust 0.x, @seq_task ordering takes effect only in a
    # TaskSequence subclass; in a plain TaskSet these tasks are picked at random.

    def on_start(self):
        logger.info("on_start")
        self.auth_headers = load_test_util.getAuthHeaders()
        self.account_queue = Queue()

    def on_stop(self):
        logger.info("on_stop, clear queue")

    def list_account(self):
        self.client.get("/api/v1/accounts")

    @seq_task(1)
    def create_account(self):
        post_dict = load_test_util.create_account_request()
        post_data = json.dumps(post_dict)
        logger.info("auth_headers: %s", json.dumps(self.auth_headers))
        logger.info("post_data: %s", post_data)
        response = self.client.post("/api/v1/accounts", headers=self.auth_headers, data=post_data)
        logger.info("response: %d, %s", response.status_code, response.text)
        if 200 <= response.status_code < 300:
            siteName = post_dict['siteName']
            logger.info("siteName: %s" % siteName)
            # Remember the created account so that later tasks can use it.
            self.account_queue.put(siteName)
        return response

    @seq_task(2)
    def retrieve_account(self):
        if not self.account_queue.empty():
            siteName = self.account_queue.get(True, 1)
            logger.info("retrieve_account by siteName %s", siteName)
            response = self.client.get("/api/v1/accounts/" + siteName, headers=self.auth_headers, name="/api/v1/accounts/siteName")
            logger.info("retrieve_account's response: %d, %s", response.status_code, response.text)
            self.account_queue.put(siteName)

    @seq_task(3)
    def update_account(self):
        if not self.account_queue.empty():
            siteName = self.account_queue.get(True, 1)
            post_dict = load_test_util.create_account_request()
            put_data = json.dumps(post_dict)
            response = self.client.put("/api/v1/accounts/" + siteName, headers=self.auth_headers, data=put_data, name="/api/v1/accounts/siteName")
            logger.info("response: %d, %s", response.status_code, response.text)
            self.account_queue.put(siteName)

    @seq_task(4)
    def delete_account(self):
        if not self.account_queue.empty():
            siteName = self.account_queue.get(True, 1)
            response = self.client.delete("/api/v1/accounts/" + siteName, headers=self.auth_headers, name="/api/v1/accounts/siteName")
            logger.info("response: %d, %s", response.status_code, response.text)


class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    # Each simulated user waits 500 to 3000 ms between tasks.
    min_wait = 500
    max_wait = 3000
Start locust to run the performance test:
$ locust -f account_load_test.py --host=http://localhost:5000
Starting web monitor at *:8089
Starting Locust 0.11.0
See the figure below.
Stepped load can be applied from the command line as follows:
locust -f account_load_test.py --host=http://localhost:5000 --no-web -c 100 -r 100 -t 30m --csv=100.csv
locust -f account_load_test.py --host=http://localhost:5000 --no-web -c 200 -r 200 -t 30m --csv=200.csv
locust -f account_load_test.py --host=http://localhost:5000 --no-web -c 400 -r 400 -t 30m --csv=400.csv
locust -f account_load_test.py --host=http://localhost:5000 --no-web -c 800 -r 800 -t 30m --csv=800.csv
When a single machine cannot generate enough load, several servers can be used, as shown in the figure below: one master server and a number of slave servers.
- Start the master locust first
locust -f account_load_test.py --master --host=http://localhost:5000
- Then start the slave locusts
locust -f account_load_test.py --slave --master-host=10.224.77.11
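The test report generated by locust can be exported as CSV files for detailed analysis, combined with the microservice's own metrics:
- System-level metrics: CPU, memory, disk I/O, network I/O, etc.
- Application-level metrics: response time, response codes, throughput, and so on.
- Business-level metrics: measurements tied to specific business flows.
After that comes profiling with the tools available for C/C++, Python, Java, and other languages, such as gprof, cProfile, hprof, VisualVM, and JMC.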
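For the Python side, a profiling session with the standard-library cProfile module might look like the sketch below. The `serialize_accounts` function is a made-up stand-in for a hot code path (it loosely mimics the JSON serialization done by the account service) and is not taken from the service itself.

```python
import cProfile
import io
import json
import pstats


def serialize_accounts(count):
    """Hypothetical workload: build and serialize a dictionary of accounts."""
    accounts = {
        "site{}".format(i): {"siteName": "site{}".format(i), "userName": "walter"}
        for i in range(count)
    }
    return json.dumps(accounts, sort_keys=True, indent=4)


# Profile only the call of interest.
profiler = cProfile.Profile()
profiler.enable()
payload = serialize_accounts(2000)
profiler.disable()

# Print the five entries with the largest cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
profile_report = buffer.getvalue()
print(profile_report)
```

Sorting by cumulative time puts the entry points that dominate the run at the top, which is usually the quickest way to see where to look next.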
Taking Java, which I know best, as an example: when starting the Java service on the server, add the following options to the command line:
java -Dcom.sun.management.jmxremote.host=10.224.112.73 \
-Dcom.sun.management.jmxremote.port=9091 \
-Dcom.sun.management.jmxremote.rmi.port=9012 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-jar potato-server.jar
Then start JVisualVM, which ships with the JDK, on the local machine, add the remote host 10.224.112.73, and create a JMX connection to 10.224.112.73:9091.
Observe CPU, memory, classes, and threads.
Threads
CPU or memory can also be sampled over a period of time.