Merge pull request ceph#55331 from ceph/revert-55096-sjust/for-review/wip-crush-msr

Revert "crush: add multistep retry rules"
ljflores authored Jan 26, 2024
2 parents 37d5d93 + a5ce9c3 commit 702cb64
Showing 20 changed files with 192 additions and 2,523 deletions.
30 changes: 3 additions & 27 deletions doc/rados/operations/crush-map-edits.rst
@@ -419,7 +419,7 @@ centers for three-way replication, and yet another rule for erasure coding acros
six storage devices. For a detailed discussion of CRUSH rules, see **Section 3.2**
of `CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_.

A normal CRUSH rule takes the following form::
A rule takes the following form::

rule <rulename> {

@@ -430,18 +430,6 @@ A normal CRUSH rule takes the following form::
step emit
}

CRUSH MSR rules are a distinct type of CRUSH rule which supports retrying steps
and provides better support for configurations that require multiple OSDs within
each failure domain. MSR rules take the following form::

rule <rulename> {

id [a unique integer ID]
type [msr_indep|msr_firsn]
step take <bucket-name> [class <device-class>]
step choosemsr <N> type <bucket-type>
step emit
}

``id``
:Description: A unique integer that identifies the rule.
@@ -453,14 +441,12 @@ each failure domain. MSR rules take the following form::

``type``
:Description: Denotes the type of replication strategy to be enforced by the
rule. msr_firstn and msr_indep are a distinct descent algorithm
which supports retrying steps within the rule and therefore
multiple OSDs per failure domain.
rule.
:Purpose: A component of the rule mask.
:Type: String
:Required: Yes
:Default: ``replicated``
:Valid Values: ``replicated``, ``erasure``, ``msr_firstn``, ``msr_indep``
:Valid Values: ``replicated`` or ``erasure``


``step take <bucket-name> [class <device-class>]``
@@ -539,16 +525,6 @@ each failure domain. MSR rules take the following form::
final CRUSH mapping transformation is therefore 1, 2, 3, 4, 5
→ 1, 2, 6, 4, 5.

``step choosemsr {num} type {bucket-type}``
:Description: Selects a num buckets of type bucket-type. msr_firstn and msr_indep
must use choosemsr rather than choose or chooseleaf.

- If ``{num} == 0``, choose ``pool-num-replicas`` buckets (as many buckets as are available).
- If ``pool-num-replicas > {num} > 0``, choose that many buckets.
:Purpose: Choose step required for msr_firstn and msr_indep rules.
:Prerequisite: Follows ``step take`` and precedes ``step emit``
:Example: ``step choosemsr 3 type host``

.. _crush-reclassify:

Migrating from a legacy SSD rule to device classes
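The removed `step choosemsr` entry defines its count argument only for `{num} == 0` (choose `pool-num-replicas` buckets) and `pool-num-replicas > {num} > 0` (choose that many). A minimal C++ sketch of that count calculation follows. The function name `effective_num_buckets` is hypothetical, and the handling of negative and oversized values is an assumption borrowed from the usual semantics of regular `choose` steps (`pool-num-replicas` minus `|num|`, capped at the replica count); this diff does not state either case.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: how many buckets a choose-style step such as
// "step choosemsr {num} type {bucket-type}" asks for, given the pool's
// replica count (pool-num-replicas).
//   num == 0                 -> pool-num-replicas   (from the removed docs)
//   0 < num < replicas       -> num                 (from the removed docs)
//   num < 0                  -> replicas + num      (assumption)
//   num >= replicas          -> capped at replicas  (assumption)
int effective_num_buckets(int num, int pool_num_replicas) {
  assert(pool_num_replicas >= 0);
  int want = num;
  if (want <= 0) {
    // 0 -> replicas, -1 -> replicas - 1, and so on.
    want += pool_num_replicas;
    if (want <= 0) {
      return 0;  // the step would select nothing
    }
  }
  return std::min(want, pool_num_replicas);
}
```

Under this reading, `step choosemsr 3 type host` and `step choosemsr 0 type host` would both ask for three hosts in a three-replica pool.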
22 changes: 0 additions & 22 deletions doc/rados/operations/crush-map.rst
@@ -709,13 +709,6 @@ The relevant erasure-code profile properties are as follows:
[default: ``default``].
* **crush-failure-domain**: the CRUSH bucket type used in the distribution of
erasure-coded shards [default: ``host``].
* **crush-osds-per-failure-domain**: Maximum number of OSDs to place in each
failure domain -- defaults to 1. Using a value greater than one will
cause a CRUSH MSR rule to be created, see below. Must be specified if
crush-num-failure-domains is specified.
* **crush-num-failure-domains**: Number of failure domains to map. Must be
specified if crush-osds-per-failure-domain is specified. Results in
a CRUSH MSR rule being created.
* **crush-device-class**: the device class on which to place data [default:
none, which means that all devices are used].
* **k** and **m** (and, for the ``lrc`` plugin, **l**): these determine the
@@ -733,21 +726,6 @@ The relevant erasure-code profile properties are as follows:
argument is omitted, then Ceph will create the CRUSH rule automatically.
CRUSH MSR Rules
---------------

Creating an erasure-code profile with a crush-osds-per-failure-domain
value greater than one will cause a CRUSH MSR rule type to be created
instead of a normal CRUSH rule. Normal crush rules cannot retry prior
steps when an out OSD is encountered and rely on CHOOSELEAF steps to
permit moving OSDs to new hosts. However, CHOOSELEAF rules don't
support more than a single OSD per failure domain. MSR rules, new in
squid, support multiple OSDs per failure domain by retrying all prior
steps when an out OSD is encountered. Using MSR rules requires that
OSDs and clients be required to support the CRUSH_MSR feature bit
(squid or newer).


Deleting rules
--------------

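The removed profile documentation ties `crush-osds-per-failure-domain` and `crush-num-failure-domains` together: each must be given if the other is, the per-domain value defaults to 1, and setting them produces a CRUSH MSR rule. The C++ sketch below encodes only those stated rules. The struct, enum, and function names are hypothetical, and rejecting non-positive counts is an assumption. The `max_osds_mapped` helper shows the arithmetic behind the pre-revert QA profile in this commit: 3 hosts × 2 OSDs per host gives 6 slots, which matches `k + m = 4 + 2`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical view of the two profile keys described in the removed text;
// std::nullopt means the key was not given in the profile.
struct ProfileFailureDomainSettings {
  std::optional<int> osds_per_failure_domain;  // crush-osds-per-failure-domain
  std::optional<int> num_failure_domains;      // crush-num-failure-domains
};

enum class RuleKind { Normal, Msr, Invalid };

// Which kind of rule the removed documentation says a profile would get.
RuleKind classify_rule(const ProfileFailureDomainSettings& p) {
  const bool has_per_fd = p.osds_per_failure_domain.has_value();
  const bool has_num_fd = p.num_failure_domains.has_value();
  if (has_per_fd != has_num_fd) {
    return RuleKind::Invalid;  // each key must be specified with the other
  }
  if (!has_per_fd) {
    return RuleKind::Normal;  // neither given: one OSD per failure domain
  }
  if (*p.osds_per_failure_domain < 1 || *p.num_failure_domains < 1) {
    return RuleKind::Invalid;  // assumption: non-positive counts are rejected
  }
  return RuleKind::Msr;  // docs: crush-num-failure-domains yields an MSR rule
}

// Upper bound on OSDs the rule can map: failure domains x OSDs per domain.
int max_osds_mapped(int num_failure_domains, int osds_per_failure_domain) {
  assert(num_failure_domains >= 1 && osds_per_failure_domain >= 1);
  return num_failure_domains * osds_per_failure_domain;
}
```

The removed text says both that a per-domain value greater than one triggers an MSR rule and that specifying `crush-num-failure-domains` does; because the two keys must appear together, the sketch treats any valid pair as MSR.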
4 changes: 1 addition & 3 deletions qa/erasure-code/ec-rados-plugin=jerasure-k=4-m=2.yaml
@@ -11,9 +11,7 @@ tasks:
k: 4
m: 2
technique: reed_sol_van
crush-failure-domain: host
crush-osds-per-failure-domain: 2
crush-num-failure-domains: 3
crush-failure-domain: osd
op_weights:
read: 100
write: 0
2 changes: 1 addition & 1 deletion qa/tasks/mgr/dashboard/test_erasure_code_profile.py
@@ -79,7 +79,7 @@ def test_create_plugin(self):
self.assertStatus(201)

self._get('/api/erasure_code_profile/lrc')
self.assertJsonSubset({
self.assertJsonBody({
'crush-device-class': '',
'crush-failure-domain': 'host',
'crush-root': 'default',
68 changes: 4 additions & 64 deletions src/crush/CrushCompiler.cc
@@ -321,13 +321,6 @@ int CrushCompiler::decompile(ostream &out)
if (crush.get_allowed_bucket_algs() != CRUSH_LEGACY_ALLOWED_BUCKET_ALGS)
out << "tunable allowed_bucket_algs " << crush.get_allowed_bucket_algs()
<< "\n";
if (crush.has_nondefault_tunables_msr()) {
out << "tunable msr_descents " << crush.get_msr_descents()
<< "\n";
out << "tunable msr_collision_tries "
<< crush.get_msr_collision_tries()
<< "\n";
}

out << "\n# devices\n";
for (int i=0; i<crush.get_max_devices(); i++) {
@@ -370,18 +363,12 @@ int CrushCompiler::decompile(ostream &out)
out << "\tid " << i << "\n";

switch (crush.get_rule_type(i)) {
case CRUSH_RULE_TYPE_REPLICATED:
case CEPH_PG_TYPE_REPLICATED:
out << "\ttype replicated\n";
break;
case CRUSH_RULE_TYPE_ERASURE:
case CEPH_PG_TYPE_ERASURE:
out << "\ttype erasure\n";
break;
case CRUSH_RULE_TYPE_MSR_FIRSTN:
out << "\ttype msr_firstn\n";
break;
case CRUSH_RULE_TYPE_MSR_INDEP:
out << "\ttype msr_indep\n";
break;
default:
out << "\ttype " << crush.get_rule_type(i) << "\n";
}
@@ -435,15 +422,6 @@ int CrushCompiler::decompile(ostream &out)
out << "\tstep set_chooseleaf_stable " << crush.get_rule_arg1(i, j)
<< "\n";
break;
case CRUSH_RULE_SET_MSR_DESCENTS:
out << "\tstep set_msr_descents " << crush.get_rule_arg1(i, j)
<< "\n";
break;
case CRUSH_RULE_SET_MSR_COLLISION_TRIES:
out << "\tstep set_msr_collision_tries "
<< crush.get_rule_arg1(i, j)
<< "\n";
break;
case CRUSH_RULE_CHOOSE_FIRSTN:
out << "\tstep choose firstn "
<< crush.get_rule_arg1(i, j)
@@ -472,13 +450,6 @@ int CrushCompiler::decompile(ostream &out)
print_type_name(out, crush.get_rule_arg2(i, j), crush);
out << "\n";
break;
case CRUSH_RULE_CHOOSE_MSR:
out << "\tstep choosemsr "
<< crush.get_rule_arg1(i, j)
<< " type ";
print_type_name(out, crush.get_rule_arg2(i, j), crush);
out << "\n";
break;
}
}
out << "}\n";
@@ -561,10 +532,6 @@ int CrushCompiler::parse_tunable(iter_t const& i)
crush.set_straw_calc_version(val);
else if (name == "allowed_bucket_algs")
crush.set_allowed_bucket_algs(val);
else if (name == "msr_descents")
crush.set_msr_descents(val);
else if (name == "msr_collision_tries")
crush.set_msr_collision_tries(val);
else {
err << "tunable " << name << " not recognized" << std::endl;
return -1;
@@ -814,13 +781,9 @@ int CrushCompiler::parse_rule(iter_t const& i)
string tname = string_node(i->children[start+2]);
int type;
if (tname == "replicated")
type = CRUSH_RULE_TYPE_REPLICATED;
type = CEPH_PG_TYPE_REPLICATED;
else if (tname == "erasure")
type = CRUSH_RULE_TYPE_ERASURE;
else if (tname == "msr_firstn")
type = CRUSH_RULE_TYPE_MSR_FIRSTN;
else if (tname == "msr_indep")
type = CRUSH_RULE_TYPE_MSR_INDEP;
type = CEPH_PG_TYPE_ERASURE;
else
ceph_abort();

@@ -942,18 +905,6 @@ int CrushCompiler::parse_rule(iter_t const& i)
crush.set_rule_step_set_chooseleaf_stable(ruleno, step++, val);
}
break;
case crush_grammar::_step_set_msr_descents:
{
int val = int_node(s->children[1]);
crush.set_rule_step_set_msr_descents(ruleno, step++, val);
}
break;
case crush_grammar::_step_set_msr_collision_tries:
{
int val = int_node(s->children[1]);
crush.set_rule_step_set_msr_collision_tries(ruleno, step++, val);
}
break;

case crush_grammar::_step_choose:
case crush_grammar::_step_chooseleaf:
@@ -981,17 +932,6 @@ int CrushCompiler::parse_rule(iter_t const& i)
}
break;

case crush_grammar::_step_choose_msr:
{
string type = string_node(s->children[3]);
if (!type_id.count(type)) {
err << "in rule '" << rname << "' type '" << type << "' not defined" << std::endl;
return -1;
}
crush.set_rule_step_choose_msr(ruleno, step++, int_node(s->children[1]), type_id[type]);
}
break;

case crush_grammar::_step_emit:
crush.set_rule_step_emit(ruleno, step++);
break;
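After this revert, `parse_rule()` again accepts only `replicated` and `erasure` as rule types, and `decompile()` prints any other stored type as a bare number. The sketch below restates that mapping as standalone C++. `parse_rule_type` and `rule_type_line` are hypothetical names, the values 1 and 3 are assumed to match `CEPH_PG_TYPE_REPLICATED` and `CEPH_PG_TYPE_ERASURE`, and the sketch returns `std::nullopt` where the real compiler calls `ceph_abort()`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Assumed values of Ceph's CEPH_PG_TYPE_* constants, defined locally so the
// sketch compiles on its own.
constexpr int kPgTypeReplicated = 1;
constexpr int kPgTypeErasure = 3;

// Mirrors parse_rule() after the revert: only "replicated" and "erasure"
// are accepted; names such as "msr_firstn" or "msr_indep" are rejected.
std::optional<int> parse_rule_type(const std::string& name) {
  if (name == "replicated") return kPgTypeReplicated;
  if (name == "erasure") return kPgTypeErasure;
  return std::nullopt;  // the real compiler aborts here instead
}

// Mirrors decompile(): known types print by name, anything else by number.
std::string rule_type_line(int type) {
  switch (type) {
    case kPgTypeReplicated:
      return "\ttype replicated\n";
    case kPgTypeErasure:
      return "\ttype erasure\n";
    default:
      return "\ttype " + std::to_string(type) + "\n";
  }
}
```

In practice, a hand-edited CRUSH map that still declares `type msr_firstn` or `type msr_indep` would be rejected rather than compiled once this revert is in place.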
