doc: beginnings of documentation of stuck pgs and pg states
Signed-off-by: Josh Durgin <[email protected]>
Reviewed-by: Sage Weil <[email protected]>
liewegas committed Feb 27, 2012
1 parent f02195b commit f317028
Showing 2 changed files with 92 additions and 10 deletions.
38 changes: 33 additions & 5 deletions doc/control.rst
@@ -54,6 +54,39 @@ Add auth keyring for an osd. ::

Show auth key OSD subsystem.

PG subsystem
------------
::

$ ceph -- pg dump [--format <format>]

Output the stats of all PGs. Valid formats are ``plain`` (the default)
and ``json``. ::

$ ceph -- pg dump_stuck inactive|unclean|stale [--format <format>] [-t|--threshold <seconds>]

Output the stats of all PGs stuck in the specified state.

``--format`` may be ``plain`` (default) or ``json``

``--threshold`` defines how many seconds "stuck" is (default: 300)
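
As a rough illustration of the synopsis above, the following Python sketch assembles the argument list for a ``dump_stuck`` call. The helper name ``dump_stuck_argv`` and its validation are assumptions made for this example and are not part of Ceph; only the states, flags, and defaults listed above are used.

```python
# Hypothetical helper: builds the argv for `ceph -- pg dump_stuck ...`
# using only the states, flags, and defaults named in the text above.
VALID_STATES = ("inactive", "unclean", "stale")
VALID_FORMATS = ("plain", "json")


def dump_stuck_argv(state, fmt="plain", threshold=300):
    """Return the command line as a list of strings, without running it."""
    if state not in VALID_STATES:
        raise ValueError("state must be one of %s" % (VALID_STATES,))
    if fmt not in VALID_FORMATS:
        raise ValueError("format must be one of %s" % (VALID_FORMATS,))
    if threshold < 0:
        raise ValueError("threshold must be a non-negative number of seconds")
    return [
        "ceph", "--", "pg", "dump_stuck", state,
        "--format", fmt,
        "--threshold", str(threshold),
    ]
```

On a host where the ``ceph`` tool is installed, the returned list could be handed to ``subprocess.run``. Building a list rather than a shell string avoids quoting problems.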

**Inactive** PGs cannot process reads or writes because they are waiting for an OSD
with the most up-to-date data to come back.

**Unclean** PGs contain objects that are not replicated the desired number
of times. They should be recovering.

**Stale** PGs are in an unknown state - the OSDs that host them have not
reported to the monitor cluster within the interval configured by
``mon_osd_report_timeout``. ::

$ ceph pg <pgid> mark_unfound_lost revert

Revert "lost" objects to their prior state: either roll them back to a
previous version or, if they were just created, delete them.
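
The three ``dump_stuck`` categories described earlier in this section can be sketched in Python as follows. This is a simplified model, not Ceph's implementation: the ``PgSample`` record, its field names, and the rule that a PG counts as stuck once it has stayed in its current state for at least the threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

STUCK_THRESHOLD_SECONDS = 300  # default threshold given in the text


@dataclass(frozen=True)
class PgSample:
    """Hypothetical record; the field names are not Ceph's schema."""
    pgid: str
    states: frozenset          # e.g. frozenset({"active", "clean"})
    seconds_in_state: float    # assumed: time spent in the current state


def stuck_kinds(pg, threshold=STUCK_THRESHOLD_SECONDS):
    """Return which of inactive/unclean/stale apply to a stuck PG."""
    if pg.seconds_in_state < threshold:
        return set()  # not in this state long enough to count as stuck
    kinds = set()
    if "active" not in pg.states:
        kinds.add("inactive")  # cannot process reads or writes
    if "clean" not in pg.states:
        kinds.add("unclean")   # some objects are under-replicated
    if "stale" in pg.states:
        kinds.add("stale")     # hosting OSDs have not reported recently
    return kinds
```

A PG can fall into more than one category at once, which is why the function returns a set rather than a single label.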


OSD subsystem
-------------
::
@@ -108,11 +141,6 @@ Create a cluster snapshot. ::

Mark an OSD as lost. This may result in permanent data loss. Use with caution. ::

$ ceph pg <pgid> mark_unfound_lost revert

Revert "lost" objects to their prior state, either a previous version
or delete them if they were just created. ::

$ ceph osd create [<id>]

Create a new OSD. If no ID is given, a new ID is automatically selected
64 changes: 59 additions & 5 deletions doc/dev/placement-group.rst
@@ -81,10 +81,64 @@ consistent hashing; you can think of it as::
result.append(chosen)
return result

User-visible PG States
======================

PG status refreshes only when pg mapping changes
================================================
.. todo:: diagram of states and how they can overlap

*creating*
the PG is still being created

*active*
requests to the PG will be processed

*clean*
all objects in the PG are replicated the correct number of times

*down*
a replica with necessary data is down, so the PG is offline

*replay*
the PG is waiting for clients to replay operations after an OSD crashed

*splitting*
the PG is being split into multiple PGs (not functional as of 2012-02)

*scrubbing*
the PG is being checked for inconsistencies

*degraded*
some objects in the PG are not replicated enough times yet

*inconsistent*
replicas of the PG are not consistent (e.g. objects are
the wrong size, objects are missing from one replica *after* recovery
finished, etc.)

*peering*
the PG is undergoing the :doc:`/dev/peering` process

*repair*
the PG is being checked and any inconsistencies found will be repaired (if possible)

*recovering*
objects are being migrated/synchronized with replicas

*backfill*
a special case of recovery, in which the entire contents of
the PG are scanned and synchronized, instead of inferring what
needs to be transferred from the PG logs of recent operations

*incomplete*
the PG is missing a necessary period of history from its
log. If you see this state, report a bug, and try to start any
failed OSDs that may contain the needed information.

*stale*
the PG is in an unknown state - the monitors have not received
an update for it since the PG mapping changed.

*remapped*
the PG is temporarily mapped to a different set of OSDs from what
CRUSH specified
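
Because several of the states listed above can hold at the same time, a PG's status is a combination of names. The Python sketch below splits such a combined status into its parts and checks each part against the list above. It assumes the names are joined with ``+`` (as in ``active+clean``), which this section does not state; ``parse_pg_state`` is a hypothetical helper, not a Ceph API.

```python
# State names copied from the list above.
KNOWN_STATES = frozenset({
    "creating", "active", "clean", "down", "replay", "splitting",
    "scrubbing", "degraded", "inconsistent", "peering", "repair",
    "recovering", "backfill", "incomplete", "stale", "remapped",
})


def parse_pg_state(status):
    """Split a combined status (assumed '+'-joined) into a set of states."""
    names = [name for name in status.split("+") if name]
    unknown = sorted(set(names) - KNOWN_STATES)
    if unknown:
        raise ValueError("unknown PG state(s): %s" % ", ".join(unknown))
    return frozenset(names)


def is_healthy(status):
    """True only when the PG is exactly active and clean, nothing else."""
    return parse_pg_state(status) == frozenset({"active", "clean"})
```

Returning a ``frozenset`` makes the order of the names irrelevant, so ``clean+active`` and ``active+clean`` compare equal.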

The PG status is currently refreshed only when the actual PG mapping
changes. A pool size change from 2 to 1, for example, will not trigger a
refresh. Restarting the OSDs will refresh it, though.
