why
Make sure the service does not fail under heavy load.
what
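Call the service's interfaces repeatedly from multiple threads and observe how the service behaves:
- TPS (transactions per second): the number of requests the service can handle per second
- TPR (time per request): the processing time of each request, reported as the average, the 99.9th percentile and the 99th percentile
- error rate: the share of requests that fail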
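As a rough illustration of how these three metrics relate to the raw samples, here is a minimal sketch. It assumes each request is recorded as a `(latency_ms, succeeded)` pair and uses the nearest-rank percentile; the function names and the sample layout are hypothetical, not part of the test tool.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100.0)
    return sorted_values[max(rank, 1) - 1]


def summarize(samples, duration_seconds):
    """samples: list of (latency_ms, succeeded) pairs collected during the run."""
    latencies = sorted(latency for latency, _ in samples)
    failures = sum(1 for _, succeeded in samples if not succeeded)
    return {
        "tps": len(samples) / duration_seconds,
        "avg_ms": sum(latencies) / len(latencies),
        "p99_ms": percentile(latencies, 99),
        "p999_ms": percentile(latencies, 99.9),
        "error_rate": failures / len(samples),
    }
```

The percentiles matter because an acceptable average can hide a slow tail: a handful of multi-second requests barely moves the mean but shows up clearly at the 99.9th percentile.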
how
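Before the test
First, we need a concurrency testing tool that supports:
- scheduled execution
- gradually increasing load (ramp-up)
Both are configured by editing config.xml: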
<pre><code><?xml version="1.0" encoding="UTF-8"?>
<tasks>
  <task>
    <suitename>ApiPayTest</suitename>
    <methodname>queryUserByOutId</methodname>
    <israndom>true</israndom>
    <isrampup>true</isrampup>
    <interval>100</interval>
    <poolcount>5</poolcount>
    <threadcount>10</threadcount>
    <sampletype>bouncetime</sampletype>
    <client_timeout>3000</client_timeout>
    <sleeptime>100</sleeptime>
    <value>false</value>
    <startTime>2014/10/15/10/35</startTime>
    <stopTime>2014/10/15/10/50</stopTime>
  </task>
</tasks>
</code></pre>
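The tool's internals are not shown here, so the following is only a sketch of what scheduled execution and ramp-up could look like. It assumes `startTime`/`stopTime` use the `year/month/day/hour/minute` layout seen in the config, and that ramp-up means launching one worker every `interval` milliseconds until `threadcount` workers are running. Both interpretations, and all function names, are assumptions.

```python
import threading
import time
from datetime import datetime

TIME_FORMAT = "%Y/%m/%d/%H/%M"  # matches values such as 2014/10/15/10/35


def parse_schedule(start_text, stop_text):
    """Parse the start/stop strings and check that they describe a valid window."""
    start = datetime.strptime(start_text, TIME_FORMAT)
    stop = datetime.strptime(stop_text, TIME_FORMAT)
    if stop <= start:
        raise ValueError("stopTime must be later than startTime")
    return start, stop


def ramp_offsets(thread_count, interval_ms):
    """Seconds after the start at which each worker thread is launched."""
    return [i * interval_ms / 1000.0 for i in range(thread_count)]


def run_ramped(work, thread_count, interval_ms, duration_s):
    """Launch workers one by one; each calls work() in a loop until the deadline."""
    deadline = time.monotonic() + duration_s
    results = []
    lock = threading.Lock()

    def loop():
        while time.monotonic() < deadline:
            result = work()
            with lock:
                results.append(result)

    threads = []
    begin = time.monotonic()
    for offset in ramp_offsets(thread_count, interval_ms):
        delay = begin + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        thread = threading.Thread(target=loop)
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return results
```

Ramping up gradually makes it easier to see at which concurrency level latency or the error rate starts to degrade, instead of hitting the service with full load from the first second.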
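Next, we write the test case:
- Extend Worktest and override the working method
- Put the setup work in the constructor
- Return the extra time, that is, the time spent inside working on preparation rather than on the call under test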
<pre><code>package com.xiaomi.cashpay.works;

import java.util.UUID;

public class Notifytest extends Worktest {
    private static IdManager.Iface idManagerService;
    // notifyService, receiver, notifyid and req are fields whose
    // declarations are omitted from this listing.

    public Notifytest() {
        init();
    }

    // Setup: create the Thrift clients once, outside the timed path.
    private void init() {
        notifyService = ThriftClientFactory.createClient(Notify.Iface.class, 1000);
        idManagerService = ThriftClientFactory.createClient(IdManager.Iface.class, 1000);
        receiver = "http://10.101.30.185/cashpay/notify1.php";
    }

    @Override
    public String working() throws TException {
        // Time the ID generation separately so it can be reported as extra time.
        long s1 = System.currentTimeMillis();
        try {
            notifyid = this.logic_Merchant_UniqueId();
        } catch (TException e) {
            e.printStackTrace();
            System.out.println("idmanager TException" + e.getMessage());
        }
        long s2 = System.currentTimeMillis();
        long extratime = s2 - s1;

        // The call under test.
        SendResponse res = notifyService.send(req);
        // Result format: "<result enum> <notifyid>~<extra time in ms>"
        return res.getResEnum().toString() + " " + notifyid + "~" + extratime;
    }

    // Run a single call by hand to check the test case before a full run.
    public static void main(String[] argv) {
        Notifytest test = new Notifytest();
        try {
            System.out.println(test.working());
        } catch (TException e) {
            e.printStackTrace();
        }
    }
}
</code></pre>
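The string returned by `working()` ends with `~` followed by the extra time in milliseconds. How the tool consumes that value is not shown here; a plausible reading is that it subtracts the extra time from the measured elapsed time so that only the call under test is counted. The sketch below illustrates that interpretation. The function names and the `SUCCESS` result value in the usage note are hypothetical.

```python
def split_result(result):
    """Split 'payload~extra_ms' into (payload, extra_ms); extra_ms is 0 if absent."""
    if "~" not in result:
        return result, 0
    payload, _, extra = result.rpartition("~")
    return payload, int(extra)


def net_latency_ms(elapsed_ms, result):
    """Elapsed time of working() minus the preparation time it reported."""
    _, extra_ms = split_result(result)
    return max(elapsed_ms - extra_ms, 0)
```

For example, if a call to `working()` took 50 ms in total and returned `"SUCCESS 12345~7"`, the latency attributed to the service would be 43 ms.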
We also need a dedicated stress-test environment:
- zookeeper
- resin
- db
Running the test
Watch the service's performance:
- top
- vmstat
- collectl
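To keep these readings alongside the test results rather than only watching them on screen, the `vmstat` output can be captured and parsed. This is a small sketch assuming the default Linux `vmstat` column layout (a header row beginning with `r b ...` followed by integer rows); the function names are hypothetical.

```python
import subprocess


def parse_vmstat(text):
    """Turn default vmstat output into a list of {column: value} dicts."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r" and "id" in fields:
            header = fields  # column names: r b swpd free ... us sy id wa st
        elif header and len(fields) == len(header) and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def sample_vmstat(interval_s=1, count=5):
    """Run 'vmstat <interval> <count>' and return the parsed rows."""
    output = subprocess.run(
        ["vmstat", str(interval_s), str(count)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vmstat(output)
```

During a run, columns such as `r` (runnable processes), `id` (idle CPU) and `wa` (I/O wait) are the first places to look when TPS stops growing with the load.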
Watch the database's performance
After the test
Analyze our log with the script static.py:
<pre><code># Summarize Pthread-info.log per minute: TPS, average latency and the
# 80th/90th/99th percentile latencies. One line per minute is appended to b.txt.


def write_summary(out, minute, count, processtime):
    processtime.sort()
    pos_eighty = int(len(processtime) * 0.8)
    pos_ninety = int(len(processtime) * 0.9)
    pos_doublenine = int(len(processtime) * 0.99)
    average = sum(processtime) // len(processtime)
    out.write(" ".join([
        minute,
        str(int(count / 60.0)),  # requests in this minute / 60 = TPS
        str(average),
        str(processtime[pos_eighty]),
        str(processtime[pos_ninety]),
        str(processtime[pos_doublenine]),
    ]) + '\n')


linenumber = 0
count = 0
minutetemp = None
processtime = []
with open('Pthread-info.log', 'r') as fp, open('b.txt', 'a+') as fpnew:
    for line in fp:
        linenumber += 1
        s = line.split()
        minute = s[1][0:16]  # second field cut to 16 characters, used as the minute key
        if minutetemp is not None and minute != minutetemp:
            # The minute changed: flush the statistics of the previous one.
            write_summary(fpnew, minutetemp, count, processtime)
            count = 0
            processtime = []
        minutetemp = minute
        count += 1
        processtime.append(int(float(s[4])))  # fifth field: processing time
    if processtime:
        # Flush the last, possibly partial, minute.
        write_summary(fpnew, minutetemp, count, processtime)
print(linenumber)
</code></pre>
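Inspect the service's own log
- awk
  - field separator: whitespace by default; use -F '' to change it to the separator you need
  - custom variables: awk 'BEGIN{count=0} ...'
- grep
  - -E: extended regular expressions, useful for "or" patterns
- sort -r
- uniq
A few general-purpose scripts:
Average, maximum and minimum time the service takes to handle a request: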
<pre><code>ls cashpay-core.log.2014121522* cashpay-core.log.2014121523* cashpay-core.log.2014121600* cashpay-core.log.2014121601* cashpay-core.log.2014121602* | xargs zgrep "INFO" | grep queryAccountByUserId | awk 'BEGIN {max=-1;print ""; print "count,total time, average, max, min"}; {gsub(":","",$3); gsub("\.","",$3); gsub("]","",$3); if (!s[$5]) {s[$5]=$3; e[$5]=$3} else {e[$5]=$3; t=(e[$5]-s[$5]); if (t < 40000 && t >0) { count++; sum+=t; if (max==-1) {max = t; min =t; } if (t>max) {max =t;} if ( t<min) min=t } }}; END {if (count >0) {print count"\t"sum"\t"sum/count"\t"max"\t"min}}'
</code></pre>
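The one-liner above is dense, so here is the same idea as a Python sketch: remember the first timestamp seen for each request key (the fifth field), and for every later line with the same key record the time elapsed since that first timestamp. The log layout is an assumption: whitespace-separated fields, with the third field holding a time such as `22:15:03.120]`. Unlike the awk version, which subtracts the digit strings directly, this sketch converts the timestamps to milliseconds first, so differences that cross a minute boundary are measured correctly. A rollover past midnight is not handled; such pairs yield a negative value and are dropped by the same filter.

```python
import re


def to_millis(stamp):
    """Convert a field such as '22:15:03.120]' to milliseconds since midnight."""
    parts = re.findall(r"\d+", stamp)
    if len(parts) < 4:
        raise ValueError("unexpected timestamp: " + stamp)
    hours, minutes, seconds, millis = (int(p) for p in parts[:4])
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def request_stats(lines, max_ms=40000):
    """Return count/total/average/max/min of per-request elapsed times, or None."""
    first_seen = {}
    durations = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        try:
            stamp = to_millis(fields[2])
        except ValueError:
            continue  # skip lines that do not follow the assumed layout
        key = fields[4]
        if key not in first_seen:
            first_seen[key] = stamp
            continue
        elapsed = stamp - first_seen[key]
        if 0 < elapsed < max_ms:  # same sanity filter as the awk script
            durations.append(elapsed)
    if not durations:
        return None
    return {
        "count": len(durations),
        "total": sum(durations),
        "average": sum(durations) / len(durations),
        "max": max(durations),
        "min": min(durations),
    }
```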
Find the service's actual processing time and the logid of the corresponding call:
<pre><code>zgrep "100%] - createCharge" cashpay-core.log.2014121720 | awk -F'[' '{print $2}' | awk -F 'ms,' '{print$1 $2}'|awk '{print$1" " $4}'|awk -F':' '{print $1" "$2}'|awk '{print$1" " $3}'|awk -F',' '{if(NF>1)print $0}'|sort</code>
</pre>
Analyze the connection-refused errors in Pthread-error.log:
<pre><code>
grep -B 12 "HttpHostConnectException" Pthread-error.log.2014121615|awk '{if (NR%14==1){ printf$0} else if(NR%14==13){print" "$0}}'|awk '{print$2" "$3" " $10}'|awk -F ']' '{print$1$2}'|awk -F'.' '{print $1}'|awk -F ':' '{print $1":"$2}'|awk '{print$2}'|awk 'BEGIN{count=0;minute="start"} {if($1!=minute){minute=$1; print $0" " count;count=0}else{ count++;}}'
</code></pre>