diff --git a/doc/rados/operations/cache-tiering.rst b/doc/rados/operations/cache-tiering.rst
new file mode 100644
index 0000000000000..e733c7e29a4e1
--- /dev/null
+++ b/doc/rados/operations/cache-tiering.rst
@@ -0,0 +1,363 @@
+===============
+ Cache Tiering
+===============
+
+A cache tier provides Ceph Clients with better I/O performance for a subset of
+the data stored in a backing storage tier. Cache tiering involves creating a
+pool of relatively fast/expensive storage devices (e.g., solid state drives)
+configured to act as a cache tier, and a backing pool of either erasure-coded
+or relatively slower/cheaper devices configured to act as an economical storage
+tier. The Ceph objecter handles where to place the objects and the tiering
+agent determines when to flush objects from the cache to the backing storage
+tier. The cache tier and the backing storage tier are therefore completely
+transparent to Ceph clients.
+
+
+.. ditaa::
+           +-------------+
+           | Ceph Client |
+           +------+------+
+                  ^
+     Tiering is   |
+    Transparent   |              Faster I/O
+        to Ceph   |           +---------------+
+     Client Ops   |           |               |
+                  |    +----->+   Cache Tier  |
+                  |    |      |               |
+                  |    |      +-----+---+-----+
+                  |    |            |   ^
+                  v    v            |   |   Active Data in Cache Tier
+           +------+----+--+         |   |
+           |   Objecter   |         |   |
+           +-----------+--+         |   |
+                       ^            |   |   Inactive Data in Storage Tier
+                       |            v   |
+                       |      +-----+---+-----+
+                       |      |               |
+                       +----->|  Storage Tier |
+                              |               |
+                              +---------------+
+                                  Slower I/O
+
+
+The cache tiering agent handles the migration of data between the cache tier
+and the backing storage tier automatically. However, administrators can
+configure how this migration takes place. There are two main scenarios:
+
+- **Writeback Mode:** When administrators configure tiers with ``writeback``
+  mode, Ceph clients write data to the cache tier and receive an ACK from the
+  cache tier. In time, the data written to the cache tier migrates to the
+  storage tier and gets flushed from the cache tier. Conceptually, the cache
+  tier is overlaid "in front" of the backing storage tier. When a Ceph client
+  needs data that resides in the storage tier, the cache tiering agent migrates
+  the data to the cache tier on read and then sends it to the Ceph client.
+  Thereafter, the Ceph client can perform I/O using the cache tier, until the
+  data becomes inactive. This is ideal for mutable data (e.g., photo/video
+  editing, transactional data, etc.).
+
+- **Read-only Mode:** When administrators configure tiers with ``readonly``
+  mode, Ceph clients write data to the backing tier. On read, Ceph copies the
+  requested object(s) from the backing tier to the cache tier. Stale objects
+  are removed from the cache tier based on the defined policy. This approach
+  is ideal for immutable data (e.g., presenting pictures/videos on a social
+  network, DNA data, X-ray imaging, etc.), because reading from a cache pool
+  that might contain out-of-date data provides only weak consistency. Do not
+  use ``readonly`` mode for mutable data.
+
+Since all Ceph clients can use cache tiering, it has the potential to
+improve I/O performance for Ceph Block Devices, Ceph Object Storage,
+the Ceph Filesystem, and native bindings.
+
+
+Setting Up Pools
+================
+
+To set up cache tiering, you must have two pools. One will act as the
+backing storage and the other will act as the cache.
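+
+For example, here is a minimal sketch of creating the two pools before tiering
+them. The pool names match the examples used throughout this document, but the
+placement group counts are illustrative; choose values appropriate to your
+cluster::
+
+    ceph osd pool create cold-storage 128
+    ceph osd pool create hot-storage 128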
+
+
+Setting Up a Backing Storage Pool
+---------------------------------
+
+Setting up a backing storage pool typically involves one of two scenarios:
+
+- **Standard Storage:** In this scenario, the pool stores multiple copies
+  of an object in the Ceph Storage Cluster.
+
+- **Erasure Coding:** In this scenario, the pool uses erasure coding to
+  store data much more efficiently with a small performance tradeoff.
+
+In the standard storage scenario, you can set up a CRUSH ruleset to establish
+the failure domain (e.g., osd, host, chassis, rack, row, etc.). Ceph OSD
+Daemons perform optimally when all storage drives in the ruleset are of the
+same size, speed (both RPMs and throughput), and type. See `CRUSH Maps`_
+for details on creating a ruleset. Once you have created a ruleset, create
+a backing storage pool.
+
+In the erasure coding scenario, the pool creation arguments will generate the
+appropriate ruleset automatically. See `Create a Pool`_ for details.
+
+In subsequent examples, we will refer to the backing storage pool
+as ``cold-storage``.
+
+
+Setting Up a Cache Pool
+-----------------------
+
+Setting up a cache pool follows the same procedure as the standard storage
+scenario, but with this difference: the drives for the cache tier are typically
+high-performance drives that reside in their own servers and have their own
+ruleset. The ruleset should take into account the hosts that have the
+high-performance drives and omit the hosts that don't. See
+`Placing Different Pools on Different OSDs`_ for details.
+
+
+In subsequent examples, we will refer to the cache pool as ``hot-storage``.
+
+
+Creating a Cache Tier
+=====================
+
+Setting up a cache tier involves associating a backing storage pool with
+a cache pool. ::
+
+    ceph osd tier add {storagepool} {cachepool}
+
+For example, to associate the cache pool ``hot-storage`` with the pool
+``cold-storage``, execute the following::
+
+    ceph osd tier add cold-storage hot-storage
+
+To set the cache mode, execute the following::
+
+    ceph osd tier cache-mode {cachepool} {cache-mode}
+
+For example::
+
+    ceph osd tier cache-mode hot-storage writeback
+
+Writeback cache tiers overlay the backing storage tier, so they require one
+additional step: you must direct all client traffic from the storage pool to
+the cache pool. To direct client traffic to the cache pool, execute the
+following::
+
+    ceph osd tier set-overlay {storagepool} {cachepool}
+
+For example::
+
+    ceph osd tier set-overlay cold-storage hot-storage
+
+
+Configuring a Cache Tier
+========================
+
+Cache tiers have several configuration options, which you may set with the
+following usage::
+
+    ceph osd pool set {cachepool} {key} {value}
+
+See `Pools - Set Pool Values`_ for details.
+
+
+Target Size and Type
+--------------------
+
+Ceph's production cache tiers use a `Bloom Filter`_ for the ``hit_set_type``::
+
+    ceph osd pool set {cachepool} hit_set_type bloom
+
+For example::
+
+    ceph osd pool set hot-storage hit_set_type bloom
+
+The ``hit_set_count`` and ``hit_set_period`` define how many HitSets to store,
+and how much time each HitSet should cover. Currently there is minimal benefit
+for ``hit_set_count`` > 1, since the agent does not yet act intelligently on
+that information. ::
+
+    ceph osd pool set {cachepool} hit_set_count 1
+    ceph osd pool set {cachepool} hit_set_period 3600
+    ceph osd pool set {cachepool} target_max_bytes 1000000000000
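+
+One way to verify the resulting pool settings is to inspect the pool's entry
+in the OSD map (a quick spot check; the exact output format varies by
+release)::
+
+    ceph osd dump | grep hot-storage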
+
+Binning accesses over time allows Ceph to determine whether a Ceph client
+accessed an object at least once, or more than once over a time period
+("age" vs "temperature").
+
+.. note:: The longer the period and the higher the count, the more RAM the
+   ``ceph-osd`` daemon consumes. In particular, when the agent is actively
+   flushing or evicting cache objects, all ``hit_set_count`` HitSets are
+   loaded into RAM.
+
+
+Cache Sizing
+------------
+
+The cache tiering agent performs two main functions:
+
+- **Flushing:** The agent identifies modified (or dirty) objects and forwards
+  them to the storage pool for long-term storage.
+
+- **Evicting:** The agent identifies objects that haven't been modified
+  (i.e., clean objects) and evicts the least recently used among them from
+  the cache.
+
+
+Relative Sizing
+~~~~~~~~~~~~~~~
+
+The cache tiering agent can flush or evict objects relative to the size of the
+cache pool. When the cache pool contains a certain percentage of
+modified (or dirty) objects, the cache tiering agent will flush them to the
+storage pool. To set the ``cache_target_dirty_ratio``, execute the following::
+
+    ceph osd pool set {cachepool} cache_target_dirty_ratio {0.0..1.0}
+
+For example, setting the value to ``0.4`` will begin flushing modified
+(dirty) objects when they reach 40% of the cache pool's capacity::
+
+    ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
+
+When the cache pool reaches a certain percentage of its capacity, the cache
+tiering agent will evict objects to maintain free capacity. To set the
+``cache_target_full_ratio``, execute the following::
+
+    ceph osd pool set {cachepool} cache_target_full_ratio {0.0..1.0}
+
+For example, setting the value to ``0.8`` will begin evicting unmodified
+(clean) objects when they reach 80% of the cache pool's capacity::
+
+    ceph osd pool set hot-storage cache_target_full_ratio 0.8
+
+
+Absolute Sizing
+~~~~~~~~~~~~~~~
+
+The cache tiering agent can flush or evict objects based upon the total number
+of bytes or the total number of objects. To specify a maximum number of bytes,
+execute the following::
+
+    ceph osd pool set {cachepool} target_max_bytes {#bytes}
+
+For example, to flush or evict at 1 TB, execute the following::
+
+    ceph osd pool set hot-storage target_max_bytes 1000000000000
+
+To specify the maximum number of objects, execute the following::
+
+    ceph osd pool set {cachepool} target_max_objects {#objects}
+
+For example, to flush or evict at 1M objects, execute the following::
+
+    ceph osd pool set hot-storage target_max_objects 1000000
+
+.. note:: If you specify both limits, the cache tiering agent will
+   begin flushing or evicting when either threshold is triggered.
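+
+As an illustration, a writeback cache might combine the relative ratios with
+an absolute byte limit. The values below simply restate the examples above in
+one place; they are not tuning recommendations::
+
+    ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
+    ceph osd pool set hot-storage cache_target_full_ratio 0.8
+    ceph osd pool set hot-storage target_max_bytes 1000000000000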
+
+
+Cache Age
+---------
+
+You can specify the minimum age of an object before the cache tiering agent
+flushes a recently modified (or dirty) object to the backing storage pool::
+
+    ceph osd pool set {cachepool} cache_min_flush_age {#seconds}
+
+For example, to flush modified (or dirty) objects after 10 minutes, execute
+the following::
+
+    ceph osd pool set hot-storage cache_min_flush_age 600
+
+You can specify the minimum age of an object before it will be evicted from
+the cache tier::
+
+    ceph osd pool set {cachepool} cache_min_evict_age {#seconds}
+
+For example, to evict objects after 30 minutes, execute the following::
+
+    ceph osd pool set hot-storage cache_min_evict_age 1800
+
+
+Removing a Cache Tier
+=====================
+
+Removing a cache tier differs depending on whether it is a writeback
+cache or a read-only cache.
+
+
+Removing a Read-Only Cache
+--------------------------
+
+Since a read-only cache does not have modified data, you can disable
+and remove it without losing any recent changes to objects in the cache.
+
+#. Change the cache mode to ``none`` to disable it. ::
+
+     ceph osd tier cache-mode {cachepool} none
+
+   For example::
+
+     ceph osd tier cache-mode hot-storage none
+
+#. Remove the cache pool from the backing pool. ::
+
+     ceph osd tier remove {storagepool} {cachepool}
+
+   For example::
+
+     ceph osd tier remove cold-storage hot-storage
+
+
+Removing a Writeback Cache
+--------------------------
+
+Since a writeback cache may have modified data, you must take steps to ensure
+that you do not lose any recent changes to objects in the cache before you
+disable and remove it.
+
+#. Change the cache mode to ``forward`` so that new and modified objects will
+   flush to the backing storage pool. ::
+
+     ceph osd tier cache-mode {cachepool} forward
+
+   For example::
+
+     ceph osd tier cache-mode hot-storage forward
+
+#. Ensure that the cache pool has been flushed by listing its remaining
+   objects. This may take a few minutes::
+
+     rados -p {cachepool} ls
+
+   If the cache pool still has objects, you can flush them manually.
+   For example::
+
+     rados -p {cachepool} cache-flush-evict-all
+
+#. Remove the overlay so that clients will not direct traffic to the cache. ::
+
+     ceph osd tier remove-overlay {storagepool}
+
+   For example::
+
+     ceph osd tier remove-overlay cold-storage
+
+#. Finally, remove the cache tier pool from the backing storage pool. ::
+
+     ceph osd tier remove {storagepool} {cachepool}
+
+   For example::
+
+     ceph osd tier remove cold-storage hot-storage
+
+
+.. _Create a Pool: ../pools#create-a-pool
+.. _Pools - Set Pool Values: ../pools#set-pool-values
+.. _Placing Different Pools on Different OSDs: ../crush-map/#placing-different-pools-on-different-osds
+.. _Bloom Filter: http://en.wikipedia.org/wiki/Bloom_filter
+.. _CRUSH Maps: ../crush-map
\ No newline at end of file
diff --git a/doc/rados/operations/index.rst b/doc/rados/operations/index.rst
index cd08625b8c3e8..508e88f420aed 100644
--- a/doc/rados/operations/index.rst
+++ b/doc/rados/operations/index.rst
@@ -33,6 +33,7 @@ CRUSH algorithm.
 
    data-placement
    pools
+   cache-tiering
   placement-groups
    crush-map
 
diff --git a/doc/rados/operations/pools.rst b/doc/rados/operations/pools.rst
index b5e1e646f8792..43e811a37b54f 100644
--- a/doc/rados/operations/pools.rst
+++ b/doc/rados/operations/pools.rst
@@ -240,6 +240,7 @@ To remove a snapshot of a pool, execute::
 
 .. _setpoolvalues:
+
 Set Pool Values
 ===============
 
@@ -251,25 +252,34 @@ You may set values for the following keys:
 
 ``size``
 
-:Description: Sets the number of replicas for objects in the pool. See `Set the Number of Object Replicas`_ for further details. Replicated pools only.
+:Description: Sets the number of replicas for objects in the pool.
+              See `Set the Number of Object Replicas`_ for further details.
+              Replicated pools only.
+
 :Type: Integer
 
 ``min_size``
 
-:Description: Sets the minimum number of replicas required for io. See `Set the Number of Object Replicas`_ for further details. Replicated pools only.
-:Type: Integer
+:Description: Sets the minimum number of replicas required for I/O.
+              See `Set the Number of Object Replicas`_ for further details.
+              Replicated pools only.
 
-.. note:: Version ``0.54`` and above
+:Type: Integer
+:Version: ``0.54`` and above
 
 ``crash_replay_interval``
 
-:Description: The number of seconds to allow clients to replay acknowledged, but uncommitted requests.
+:Description: The number of seconds to allow clients to replay acknowledged,
+              but uncommitted requests.
+
 :Type: Integer
 
 ``pgp_num``
 
-:Description: The effective number of placement groups to use when calculating data placement.
+:Description: The effective number of placement groups to use when calculating
+              data placement.
+
 :Type: Integer
 :Valid Range: Equal to or less than ``pg_num``.
 
@@ -279,14 +289,108 @@ You may set values for the following keys:
 
 :Description: The ruleset to use for mapping object placement in the cluster.
 :Type: Integer
+
 ``hashpspool``
 
 :Description: Set/Unset HASHPSPOOL flag on a given pool.
 :Type: Integer
 :Valid Range: 1 sets flag, 0 unsets flag
+:Version: ``0.48`` Argonaut and above.
 
-.. note:: Version ``0.48`` Argonaut and above.
+``hit_set_type``
+
+:Description: Enables hit set tracking for cache pools.
+              See `Bloom Filter`_ for additional information.
+
+:Type: String
+:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
+:Default: ``bloom``. Other values are for testing.
+
+
+``hit_set_count``
+
+:Description: The number of hit sets to store for cache pools. The higher
+              the number, the more RAM consumed by the ``ceph-osd`` daemon.
+
+:Type: Integer
+:Valid Range: ``1``. The agent doesn't handle > 1 yet.
+
+
+``hit_set_period``
+
+:Description: The duration of a hit set period in seconds for cache pools.
+              The higher the number, the more RAM consumed by the
+              ``ceph-osd`` daemon.
+
+:Type: Integer
+:Example: ``3600`` (1 hour)
+
+
+``hit_set_fpp``
+
+:Description: The false positive probability for the ``bloom`` hit set type.
+              See `Bloom Filter`_ for additional information.
+
+:Type: Double
+:Valid Range: 0.0 - 1.0
+:Default: ``0.05``
+
+
+``cache_target_dirty_ratio``
+
+:Description: The percentage of the cache pool containing modified (dirty)
+              objects before the cache tiering agent will flush them to the
+              backing storage pool.
+
+:Type: Double
+:Default: ``.4``
+
+
+``cache_target_full_ratio``
+
+:Description: The percentage of the cache pool containing unmodified (clean)
+              objects before the cache tiering agent will evict them from the
+              cache pool.
+
+:Type: Double
+:Default: ``.8``
+
+
+``target_max_bytes``
+
+:Description: Ceph will begin flushing or evicting objects when the
+              ``max_bytes`` threshold is triggered.
+
+:Type: Integer
+:Example: ``1000000000000`` (1 TB)
+
+
+``target_max_objects``
+
+:Description: Ceph will begin flushing or evicting objects when the
+              ``max_objects`` threshold is triggered.
+
+:Type: Integer
+:Example: ``1000000`` (1 million objects)
+
+
+``cache_min_flush_age``
+
+:Description: The time (in seconds) before the cache tiering agent will flush
+              an object from the cache pool to the storage pool.
+
+:Type: Integer
+:Example: ``600`` (10 minutes)
+
+
+``cache_min_evict_age``
+
+:Description: The time (in seconds) before the cache tiering agent will evict
+              an object from the cache pool.
+
+:Type: Integer
+:Example: ``1800`` (30 minutes)
+
 
 Get Pool Values
@@ -350,3 +454,4 @@ a size of 2).
 
 .. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
+.. _Bloom Filter: http://en.wikipedia.org/wiki/Bloom_filter
\ No newline at end of file