The problem looked simple at first glance, so I went straight to the source code.
In PGMap.cc:
// TOO_MANY_PGS
auto max_pg_per_osd = cct->_conf.get_val<uint64_t>("mon_max_pg_per_osd");
if (num_in && max_pg_per_osd > 0) {
  auto per = sum_pg_up / num_in;
  if (per > max_pg_per_osd) {
    ostringstream ss;
    ss << "too many PGs per OSD (" << per
       << " > max " << max_pg_per_osd << ")";
    checks->add("TOO_MANY_PGS", HEALTH_WARN, ss.str(),
                per - max_pg_per_osd);
  }
}
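To make the arithmetic behind the warning concrete, here is a minimal standalone sketch of the same check. The variable names mirror the snippet above, but the numbers (16 "in" OSDs, 4096 up PG placements, the default limit of 250) are hypothetical and nothing else is taken from the Ceph tree.

#include <cstdint>
#include <iostream>
#include <sstream>

int main() {
  // Made-up example numbers: 16 OSDs marked "in" and 4096 up PG placements
  // spread across them; 250 is the default from options.cc.
  uint64_t num_in = 16;
  uint64_t sum_pg_up = 4096;
  uint64_t max_pg_per_osd = 250;

  if (num_in && max_pg_per_osd > 0) {
    uint64_t per = sum_pg_up / num_in;    // 4096 / 16 = 256 PGs per OSD
    if (per > max_pg_per_osd) {
      std::ostringstream ss;
      ss << "too many PGs per OSD (" << per
         << " > max " << max_pg_per_osd << ")";
      std::cout << ss.str() << std::endl; // fires, because 256 > 250
    }
  }
  return 0;
}

With these numbers per comes out to 256, so the warning fires; raising the limit (or reducing the PG count per OSD) makes the condition false.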
Naturally the mon_max_pg_per_osd option jumped out at me, so I changed it in ceph.conf, raising it to 1000:
[mon]
mon_max_pg_per_osd = 1000
Strangely enough, it did not take effect. Checking the running configuration:
# ceph --show-config |grep mon_max_pg
mon_max_pg_per_osd = 250
Still 250.
Continuing through the source, I found this in options.cc:
Option("mon_max_pg_per_osd", Option::TYPE_UINT, Option::LEVEL_ADVANCED)
  .set_min(1)
  .set_default(250)
  .add_service("mgr")
  .set_description("Max number of PGs per OSD the cluster will allow")
  .set_long_description("If the number of PGs per OSD exceeds this, a "
      "health warning will be visible in `ceph status`. This is also used "
      "in automated PG management, as the threshold at which some pools' "
      "pg_num may be shrunk in order to enable increasing the pg_num of "
      "others."),
The .add_service("mgr") line shows that this option is now consumed by the mgr rather than the mon (why it still carries the mon_ prefix is a bit puzzling to me), so the setting has to go into the [global] section, where the mgr daemon will also read it, instead of under [mon].
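For reference, the corrected placement in ceph.conf looks like this, with the same value simply moved out of the [mon] section into [global] so the mgr picks it up as well:

[global]
mon_max_pg_per_osd = 1000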
After restarting the mgr service, the warning cleared and the problem was solved. The new value now shows up:
# ceph --show-config |grep mon_max_pg
mon_max_pg_per_osd = 1000