From f317028f42c658b2fafea80a332196e86ca87c84 Mon Sep 17 00:00:00 2001
From: Sage Weil
Date: Mon, 27 Feb 2012 15:41:57 -0800
Subject: [PATCH] doc: beginnings of documentation of stuck pgs and pg states

Signed-off-by: Josh Durgin
Reviewed-by: Sage Weil
---
 doc/control.rst             | 38 +++++++++++++++++++---
 doc/dev/placement-group.rst | 64 ++++++++++++++++++++++++++++++++++---
 2 files changed, 92 insertions(+), 10 deletions(-)

diff --git a/doc/control.rst b/doc/control.rst
index 8020333a806d0..a0863fd11f151 100644
--- a/doc/control.rst
+++ b/doc/control.rst
@@ -54,6 +54,39 @@ Add auth keyring for an osd. ::
 
 Show auth key OSD subsystem.
 
+PG subsystem
+------------
+::
+
+	$ ceph -- pg dump [--format <format>]
+
+Output the stats of all PGs. Valid formats are "plain" and "json",
+and plain is the default. ::
+
+	$ ceph -- pg dump_stuck inactive|unclean|stale [--format <format>] [-t|--threshold <seconds>]
+
+Output the stats of all PGs stuck in the specified state.
+
+``--format`` may be ``plain`` (default) or ``json``.
+
+``--threshold`` defines how many seconds "stuck" is (default: 300).
+
+**Inactive** PGs cannot process reads or writes because they are waiting
+for an OSD with the most up-to-date data to come back.
+
+**Unclean** PGs contain objects that are not replicated the desired number
+of times. They should be recovering.
+
+**Stale** PGs are in an unknown state - the OSDs that host them have not
+reported to the monitor cluster in a while (configured by
+``mon_osd_report_timeout``). ::
+
+	$ ceph pg <pgid> mark_unfound_lost revert
+
+Revert "lost" objects to their prior state: either roll back to a
+previous version or delete them if they were just created.
+
+
 OSD subsystem
 -------------
 ::
@@ -108,11 +141,6 @@ Create a cluster snapshot. ::
 Mark an OSD as lost. This may result in permanent data loss. Use with
 caution. ::
 
-	$ ceph pg <pgid> mark_unfound_lost revert
-
-Revert "lost" objects to their prior state, either a previous version
-or delete them if they were just created. ::
-
 	$ ceph osd create [<id>]
 
 Create a new OSD. If no ID is given, a new ID is automatically selected
diff --git a/doc/dev/placement-group.rst b/doc/dev/placement-group.rst
index 5755277bcc7a3..a5abb2b5755c9 100644
--- a/doc/dev/placement-group.rst
+++ b/doc/dev/placement-group.rst
@@ -81,10 +81,64 @@ consistent hashing; you can think of it as::
       result.append(chosen)
   return result
 
+User-visible PG States
+======================
 
-PG status refreshes only when pg mapping changes
-================================================
+.. todo:: diagram of states and how they can overlap
+
+*creating*
+  the PG is still being created
+
+*active*
+  requests to the PG will be processed
+
+*clean*
+  all objects in the PG are replicated the correct number of times
+
+*down*
+  a replica with necessary data is down, so the PG is offline
+
+*replay*
+  the PG is waiting for clients to replay operations after an OSD crashed
+
+*splitting*
+  the PG is being split into multiple PGs (not functional as of 2012-02)
+
+*scrubbing*
+  the PG is being checked for inconsistencies
+
+*degraded*
+  some objects in the PG are not replicated enough times yet
+
+*inconsistent*
+  replicas of the PG are not consistent (e.g. objects are
+  the wrong size, objects are missing from one replica *after* recovery
+  finished, etc.)
+
+*peering*
+  the PG is undergoing the :doc:`/dev/peering` process
+
+*repair*
+  the PG is being checked and any inconsistencies found will be repaired (if possible)
+
+*recovering*
+  objects are being migrated/synchronized with replicas
+
+*backfill*
+  a special case of recovery, in which the entire contents of
+  the PG are scanned and synchronized, instead of inferring what
+  needs to be transferred from the PG logs of recent operations
+
+*incomplete*
+  a PG is missing a necessary period of history from its
+  log. If you see this state, report a bug, and try to start any
+  failed OSDs that may contain the needed information.
+
+*stale*
+  the PG is in an unknown state - the monitors have not received
+  an update for it since the PG mapping changed.
+
+*remapped*
+  the PG is temporarily mapped to a different set of OSDs from what
+  CRUSH specified
 
-The pg status currently doesn't get refreshed when the actual pg
-mapping doesn't change, and e.g. a pool size change of 2->1 won't do
-that. It will refresh if you restart the OSDs, though.
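
The three ``dump_stuck`` categories documented above follow directly from
the state names: a PG counts as *inactive* when its reported state lacks
``active``, *unclean* when it lacks ``clean``, and *stale* when ``stale``
is present. Ceph reports a PG's state as a single ``+``-joined string such
as ``active+clean``. A minimal Python sketch of that classification rule
(illustrative only, not part of the patch; the helper name is made up, and
the real command also applies the ``--threshold`` timing described above)::

    # Classify a PG state string into the dump_stuck categories.
    def stuck_categories(state_string):
        states = set(state_string.split('+'))   # e.g. "active+clean"
        matches = []
        if 'active' not in states:   # cannot serve reads or writes
            matches.append('inactive')
        if 'clean' not in states:    # objects not replicated enough times
            matches.append('unclean')
        if 'stale' in states:        # its OSDs have stopped reporting
            matches.append('stale')
        return matches

    assert stuck_categories('active+clean') == []
    assert stuck_categories('peering') == ['inactive', 'unclean']
    assert stuck_categories('stale+active+clean') == ['stale']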
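
The ``--threshold`` check can be pictured the same way: a PG is only
reported once it has been continuously stuck for longer than the threshold
(300 seconds by default), judged from per-PG timestamps in the stats (the
real stats carry fields like ``last_active`` and ``last_clean``). A sketch,
with a hypothetical ``last_ok`` argument standing in for whichever
timestamp matches the category::

    import time

    THRESHOLD = 300  # seconds; the documented default

    def is_stuck(last_ok, now=None):
        """True once a PG left its healthy condition more than
        THRESHOLD seconds ago (last_ok is epoch seconds)."""
        now = time.time() if now is None else now
        return now - last_ok > THRESHOLD

    assert is_stuck(0, now=301)          # stuck for 301s > 300s
    assert not is_stuck(100, now=350)    # stuck for only 250s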
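
On the developer-doc side, the todo about how states can overlap is worth
spelling out: a PG is usually in several of these states at once (e.g.
``active+clean+scrubbing``), because the states are tracked internally as
bit flags (the ``PG_STATE_*`` constants in ``src/osd/osd_types.h``) and
rendered as a ``+``-joined string. A sketch of that representation, with
made-up flag values rather than Ceph's actual constants::

    # One bit per state; a PG's state is an OR of several flags.
    ACTIVE    = 1 << 1
    CLEAN     = 1 << 2
    SCRUBBING = 1 << 4
    DEGRADED  = 1 << 5
    STALE     = 1 << 7

    NAMES = [(ACTIVE, 'active'), (CLEAN, 'clean'),
             (SCRUBBING, 'scrubbing'), (DEGRADED, 'degraded'),
             (STALE, 'stale')]

    def state_string(state):
        # Render a bitmask the way the CLI shows it.
        return '+'.join(name for bit, name in NAMES if state & bit)

    assert state_string(ACTIVE | CLEAN | SCRUBBING) == 'active+clean+scrubbing'
    assert state_string(ACTIVE | DEGRADED) == 'active+degraded'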