Proactive node reboot maintenance
- Preparation: the cluster must be in health: HEALTH_OK. Check the status, then set the maintenance flags:
sudo ceph -s
sudo ceph osd set noout
sudo ceph osd set norebalance
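Optionally confirm that both flags took effect; ceph osd dump prints a flags line (its exact contents vary slightly by release):
sudo ceph osd dump | grep flags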
- Reboot one node:
sudo reboot
- After the reboot finishes, check the cluster status; pgs: active+clean is the normal state:
sudo ceph -s
Once the status is back to normal, continue and reboot the next node.
After all nodes have been rebooted in turn and the status is back to active+clean, unset the flags (a scripted sketch of the whole procedure follows below):
sudo ceph -s
sudo ceph osd unset noout
sudo ceph osd unset norebalance
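The whole procedure can be strung together as a rough script. Everything here is a sketch: the host names are hypothetical, it assumes passwordless ssh and sudo from an admin node that is not itself being rebooted, and the wait heuristic simply greps ceph -s for transitional PG states:
NODES="host1 host2 host3"             # hypothetical host names

sudo ceph osd set noout
sudo ceph osd set norebalance

wait_for_clean() {
    # crude heuristic: wait while any PG is still in a transitional state
    while sudo ceph -s | grep -qE 'degraded|peering|recovering|undersized|stale'; do
        sleep 30
    done
}

for node in $NODES; do
    ssh "$node" sudo reboot || true   # the ssh session drops as the node goes down
    sleep 120                         # give the node time to come back up
    wait_for_clean
done

sudo ceph osd unset noout
sudo ceph osd unset norebalance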
Adjusting pg_num and pgp_num
- Run ceph -s to make sure the cluster is healthy
- pg_num can only be increased, never decreased
- Increase it in powers of 2 each time
- On a live cluster with data, adjust gradually; do not make one big jump (see the sketch after this list)
- Adjust pg_num first; once that completes without problems, adjust pgp_num
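For example, a gradual ramp-up of a single pool might look like the following sketch; the pool name and the target sizes are only illustrative:
pool=mypool                          # hypothetical pool name
for n in 64 128 256; do              # illustrative power-of-two steps
    ceph osd pool set $pool pg_num $n
    ceph osd pool set $pool pgp_num $n
    # let the cluster return to active+clean (watch ceph -s / ceph -w) before the next step
    read -p "pgs active+clean again? press Enter to continue "
done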
Bulk-adjust pg_num for all pools
n=32
for poolname in $(rados lspools); do
ceph osd pool set $poolname pg_num $n ;
done
After the adjustment, check the status
ceph -w
Bulk-adjust pgp_num for all pools
n=32
for poolname in $(rados lspools); do
ceph osd pool set $poolname pgp_num $n ;
done
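To verify the per-pool values after both loops, list the pool details; pg_num and pgp_num appear on each pool's line (on older releases ceph osd dump | grep pool shows the same information):
ceph osd pool ls detail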
Delete the default pools listed below and create pools with other names; a deletion sketch follows the list
- data
- metadata
- rbd
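A sketch of removing those default pools, assuming they hold nothing you need; on releases that guard pool deletion, it has to be allowed on the monitors first:
ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'    # only needed where deletion is guarded
for p in data metadata rbd; do
ceph osd pool delete $p $p --yes-i-really-really-mean-it
done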
ceph osd pool create rbdpool 8
ceph osd pool set rbdpool pg_num 32
ceph osd pool set rbdpool pgp_num 32
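If rbdpool is meant for RBD images, it can also be initialized for that use on Luminous and later:
rbd pool init rbdpool
# or, equivalently, tag the application by hand:
# ceph osd pool application enable rbdpool rbd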
Pull an existing cluster's configuration into ceph-deploy
mkdir -p cluster1
cd cluster1
ceph-deploy config pull HOST
ceph-deploy gatherkeys HOST
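The working directory should now contain the pulled ceph.conf plus the gathered keyrings; a quick check:
ls -l ceph.conf *.keyring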
Add a new disk /dev/xvde on every node as an OSD
ceph osd status
node1=host1
node2=host2
node3=host3
disk="/dev/xvde"
ceph-deploy --overwrite-conf osd create --data $disk $node1
ceph-deploy --overwrite-conf osd create --data $disk $node2
ceph-deploy --overwrite-conf osd create --data $disk $node3
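Once the three OSDs are created, confirm they show up and in, and watch the rebalance onto the new disks:
ceph osd tree
ceph -s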
Warning: too many objects per PG (pg_num too small for the pool)
ceph -s
health: HEALTH_WARN
1 pools have many more objects per pg than average
ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
pool cn-south-1.rgw.buckets.data objects per pg (2386) is more than 21.115 times cluster average (113)
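The fix is to raise pg_num (and pgp_num) for that pool. A common rule of thumb, not part of the warning output itself, budgets roughly 100 PGs per OSD divided by the replica size for the whole cluster, rounded to powers of two and split across pools by their expected share of data, for example:
osds=9; size=3                      # hypothetical values, adjust to the real cluster
echo $(( osds * 100 / size ))       # rough total PG budget to split across pools
Here the pool is simply stepped up to 32: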
pool=cn-south-1.rgw.buckets.data
ceph osd pool get $pool pg_num
ceph osd pool set $pool pg_num 32
ceph osd pool set $pool pgp_num 32
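After the data reshuffles, the warning should clear; re-check:
ceph health detail
ceph osd pool get $pool pg_num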