mon: warn if crush has non-optimal tunables
Allow warning to be disabled via ceph.conf.  Link to the docs from the
warning detail.  Add a section to the docs specifically about what to do
about the warning.

Signed-off-by: Sage Weil <[email protected]>
Sage Weil committed Dec 18, 2013
1 parent d129e09 commit d0f14df
Showing 4 changed files with 55 additions and 0 deletions.
33 changes: 33 additions & 0 deletions doc/rados/operations/crush-map.rst
@@ -922,6 +922,39 @@ Which client versions support CRUSH_TUNABLES2
* v0.55 or later, including bobtail series (v0.56.x)
* Linux kernel version v3.9 or later (for the file system and RBD kernel clients)

Warning when tunables are non-optimal
-------------------------------------

Starting with version v0.74, Ceph will issue a health warning if the
CRUSH tunables are not set to their optimal values (the optimal values are
the default as of v0.73). To make this warning go away, you have two options:

1. Adjust the tunables on the existing cluster. Note that this will
   result in some data movement (possibly as much as 10%). This is the
   preferred route, but should be taken with care on a production cluster
   where the data movement may affect performance. You can enable optimal
   tunables with::

     ceph osd crush tunables optimal

   If things go poorly (e.g., too much load) and little progress has
   been made, or there is a client compatibility problem (old kernel
   cephfs or rbd clients, or pre-bobtail librados clients), you can
   switch back with::

     ceph osd crush tunables legacy

2. You can make the warning go away without making any changes to CRUSH by
   adding the following option to your ceph.conf ``[mon]`` section::

     mon warn on legacy crush tunables = false

   For the change to take effect, you will need to restart the monitors, or
   apply the option to running monitors with::

     ceph tell mon.\* injectargs --no-mon-warn-on-legacy-crush-tunables


A few important points
----------------------

1 change: 1 addition & 0 deletions src/common/config_opts.h
@@ -169,6 +169,7 @@ OPTION(mon_globalid_prealloc, OPT_INT, 100) // how many globalids to prealloc
OPTION(mon_osd_report_timeout, OPT_INT, 900) // grace period before declaring unresponsive OSDs dead
OPTION(mon_force_standby_active, OPT_BOOL, true) // should mons force standby-replay mds to be active
OPTION(mon_warn_on_old_mons, OPT_BOOL, true) // should mons set health to WARN if part of quorum is old?
OPTION(mon_warn_on_legacy_crush_tunables, OPT_BOOL, true) // warn if crush tunables are not optimal
OPTION(mon_min_osdmap_epochs, OPT_INT, 500)
OPTION(mon_max_pgmap_epochs, OPT_INT, 500)
OPTION(mon_max_log_epochs, OPT_INT, 500)
8 changes: 8 additions & 0 deletions src/crush/CrushWrapper.h
@@ -163,6 +163,14 @@ class CrushWrapper {
  }
  bool has_v2_rules() const;

  bool has_optimal_tunables() const {
    return
      crush->choose_local_tries == 0 &&
      crush->choose_local_fallback_tries == 0 &&
      crush->choose_total_tries == 50 &&
      crush->chooseleaf_descend_once == 1;
  }

  // bucket types
  int get_num_type_names() const {
    return type_map.size();
13 changes: 13 additions & 0 deletions src/mon/OSDMonitor.cc
@@ -1990,6 +1990,19 @@ void OSDMonitor::get_health(list<pair<health_status_t,string> >& summary,
      detail->push_back(make_pair(HEALTH_WARN, ss.str()));
    }

    // old crush tunables?
    if (g_conf->mon_warn_on_legacy_crush_tunables) {
      if (!osdmap.crush->has_optimal_tunables()) {
        ostringstream ss;
        ss << "crush map has non-optimal tunables";
        summary.push_back(make_pair(HEALTH_WARN, ss.str()));
        if (detail) {
          ss << "; see http://ceph.com/docs/master/rados/operations/crush-map/#tunables";
          detail->push_back(make_pair(HEALTH_WARN, ss.str()));
        }
      }
    }

    get_pools_health(summary, detail);
  }
}
