Update and refactor warming documentation
whitfin committed Sep 30, 2024
1 parent bc3e42d commit 7be46a3
Showing 2 changed files with 128 additions and 83 deletions.
docs/warming/proactive-warming.md: 94 changes (34 additions, 60 deletions)
````diff
@@ -1,34 +1,10 @@
 # Proactive Warming
 
-Introduced alongside Cachex v3, cache warmers act as an eager way to populate a cache. Rather than waiting for a cache miss to retrieve a value, values will be pulled up front to ensure that there is never a miss. This can be viewed as being proactive, whereas `Cachex.fetch/4` can be seen as reactive. As such, this is a better tool for those who know what data will be requested, rather than those dealing with arbitrary data.
-
-Warmers are deliberately easy to create, as anything complicated belongs outside of Cachex itself. A warmer is simply a module which implements the `Cachex.Warmer` behaviour, consisting of just a single callback at the time of writing (please see the `Cachex.Warmer` documentation to verify). A warmer should expose `execute/1` which actually implements the cache warming. The easiest way to explain a warmer is to implement one, so let's implement a warmer which reads from a database via the module `DatabaseWarmer`.
+Introduced alongside Cachex v3, cache warmers act as an eager way to populate a cache. Rather than waiting for a cache miss to retrieve a value, values will be pulled up front to ensure that there is never a miss. This can be viewed as being _proactive_, whereas `Cachex.fetch/4` can be seen as _reactive_. As such, this is a better tool for those who know what data will be requested, rather than those dealing with arbitrary data.
 
 ## Defining a Warmer
 
-First of all, let's define our warmer on a cache at startup. This is done by passing a list of `warmer()` records inside the `:warmers` option of `Cachex.start_link/2`:
-
-```elixir
-# for warmer()
-import Cachex.Spec
-
-# define the cache with our warmer
-Cachex.start_link(:my_cache, [
-  warmers: [
-    warmer(
-      interval: :timer.seconds(30),
-      module: MyProject.DatabaseWarmer,
-      state: connection
-    )
-  ]
-])
-```
-
-The fields above are generally the only three fields you'll have to set in a `warmer()` record. The `:module` tag defines the module implementing the `Cachex.Warmer` behaviour, the `:state` field defines the state to be provided to the warmer (used later), and the `:interval` controls the frequency with which the warmer executes (in milliseconds). In previous versions of Cachex the `:interval` option was part of the module behaviour, but this was changed to be more flexible as of Cachex v4.x.
-
-In terms of some of the other options, you may pass a `:name` to use as the warmer's process name (which defaults to the warmer's PID). You can also use the `:required` flag to signal whether it is necessary for a warmer to fully execute before your cache is deemed "available". This defaults to `true` but can easily be set to `false` if you're happy for your data to load asynchronously. The `:required` flag in Cachex v4.x is the same as `async: false` in Cachex v3.x.
-
-With our cache created all that remains is to create the `MyProject.DatabaseWarmer` module, which will implement our warmer behaviour:
+To implement this type of warming, Cachex introduced the [Cachex.Warmer](https://hexdocs.pm/cachex/Cachex.Warmer.html) behaviour. This behaviour can be implemented on a module to define the logic you want to run periodically in order to refresh your data from a source. Let's look at defining a very typical proactive warmer, which fetches rows from a database and maps them into a cache table using the `id` field as the cache key:
 
 ```elixir
 defmodule MyProject.DatabaseWarmer do
````
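The `warmer()` options described above (`:module`, `:state` and `:interval`, plus the optional `:name` and `:required`) can be combined in a single record. This is a configuration sketch, not code from the commit: `connection` is a placeholder for whatever state your warmer needs, and `:my_database_warmer` is a hypothetical process name.

```elixir
# provides the warmer() record macro
import Cachex.Spec

Cachex.start_link(:my_cache,
  warmers: [
    warmer(
      # module implementing the Cachex.Warmer behaviour
      module: MyProject.DatabaseWarmer,
      # placeholder: handed to execute/1 on every run
      state: connection,
      # run every 30 seconds (the value is in milliseconds)
      interval: :timer.seconds(30),
      # hypothetical process name for the warmer
      name: :my_database_warmer,
      # don't block cache startup on the first run; data loads asynchronously
      required: false
    )
  ]
)
```

With `required: false`, lookups made before the first run completes can still miss, so this setting suits data that can tolerate loading in the background.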
````diff
@@ -59,33 +35,34 @@ defmodule MyProject.DatabaseWarmer do
 end
 ```
 
-There are a couple of things going on here. When the `execute/1` callback is fired, we use the stored connection to query the database and map all rows back into the cache table. In case of an error, we use the `:ignore` value to signal that the warmer won't be writing anything to the table.
+This simple warmer will ensure that if you look for a row identifier in your cache, it's always going to be readily available (assuming it exists in the database). The format of the result value must be provided as either `{ :ok, pairs }` or `{ :ok, pairs, options }`. These pairs and options should match the same format you'd use when calling `Cachex.put_many/3`.
 
-When formatting results to place into the cache table, you must provide your results in the form of either `{ :ok, pairs }` or `{ :ok, pairs, options }`. These pairs and options should match the same formats that you'd use when calling `Cachex.put_many/3`, so check out the documentation if you need to. In our example above these pairs are simply storing `row.id -> row` in our cache. Not particularly useful, but it'll do for now!
-
-Although simple, this example demonstrates that a single warmer can populate many records in a single pass. This is particularly useful when fetching remote data, instead of using a warmer for every row in a database. This would be sufficiently complicated that you'd likely just roll your own warming instead, and so Cachex tries to negate this aspect by the addition of `put_many/3` in v3.x.
-
-## Example Use Cases
-
-To demonstrate this, we'll use the same examples from the [Reactive Warming](reactive-warming.md) documentation, which is acting as a cache of an API call to `/api/v1/packages` which returns a list of packages. In case of a cache miss, reactive warming will call the API and put it in the cache for future calls. With a warmer we can actually go a lot further for this use case:
+To make use of a warmer, a developer needs to assign it within the `:warmers` option during cache startup. This is where we can also control the frequency with which the warmer is run by setting the `:interval` option (which can also be `nil`):
 
 ```elixir
-# need our records
+# for warmer()
 import Cachex.Spec
 
-# initialize our cache with a database connection
-Cachex.start_link(:my_cache, [
+# define the cache with our warmer
+Cachex.start_link(:cache, [
   warmers: [
     warmer(
-      interval: :timer.minutes(5),
-      module: MyProject.PackageWarmer,
-      state: connection
+      state: connection,
+      module: MyProject.DatabaseWarmer,
+      interval: :timer.seconds(30),
+      required: true
     )
   ]
 ])
 ```
 
-And then we define our warmer to do the same thing; pull the packages from the database every 5 minutes. It should be noted that reactive warming runs **at most** every 5 minutes, whereas a proactive warmer will **always** run every 5 minutes with a provided interval.
+The `:warmers` option accepts a list of `:warmer` records, which include information about the module, the warmer's state, and various other options. If your cache warmer is necessary for your application, you can flag it as `:required`. This will ensure that your cache supervision tree is not considered "started" until your warmer has run successfully at least once.
+
+## Example Use Cases
+
+To demonstrate this in an application, we'll use the same examples from the [Reactive Warming](reactive-warming.md) documentation, which is acting as a cache of an API call to retrieve a list of packages from a database. In the case of a cache miss, reactive warming would call the database and place the result in the cache for future calls.
+
+With proactive warming, we can go a lot further. As creation of a package is infrequent, we can load the entire list into memory to guarantee we have everything accessible in our cache right from application startup:
 
 ```elixir
 defmodule MyProject.PackageWarmer do
````
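The body of `MyProject.DatabaseWarmer` is collapsed in this view, so the following is only a sketch of what the surrounding prose describes: `execute/1` receives the connection stored as warmer state, returns `{:ok, pairs}` keyed by `row.id`, and returns `:ignore` when the query fails. `MyProject.Database.fetch_rows/1` is a hypothetical query function, not part of Cachex or of the original module.

```elixir
defmodule MyProject.DatabaseWarmer do
  @moduledoc """
  Sketch of a proactive warmer which caches database rows by `id`.
  """
  use Cachex.Warmer

  @doc """
  Loads all rows through the connection provided as warmer state.
  """
  def execute(connection) do
    # hypothetical query function; replace with your own data access
    case MyProject.Database.fetch_rows(connection) do
      {:ok, rows} ->
        # one {key, value} pair per row, in the shape Cachex.put_many/3 accepts
        {:ok, Enum.map(rows, fn row -> {row.id, row} end)}

      {:error, _reason} ->
        # signal that this run writes nothing to the cache
        :ignore
    end
  end
end
```

In this sketch a failed query writes nothing, so entries cached by an earlier successful run stay in place until the next run.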
````diff
@@ -95,11 +72,11 @@ defmodule MyProject.PackageWarmer do
   use Cachex.Warmer
 
   @doc """
-  Executes this cache warmer with a connection.
+  Executes this cache warmer.
   """
-  def execute(connection) do
+  def execute(_) do
     # load all of the packages from the database
-    packages = Database.load_packages(db_conn)
+    packages = Repo.all(from p in Package)
 
     # create pairs from the API path and the package
     package_pairs = Enum.map(packages, fn(package) ->
````
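The rest of `execute/1` is collapsed here. Going by the prose later in the file (the full listing cached under `"/api/v1/packages"` and each package under `"/api/v1/packages/{id}"`), the pair-building step can be sketched as a pure function. `MyProject.PackagePairs` is a hypothetical helper, and packages are assumed to be maps or structs with an `id` field.

```elixir
defmodule MyProject.PackagePairs do
  @moduledoc """
  Hypothetical helper which expands a package list into cache pairs.
  """

  # key under which the full listing is cached
  @list_key "/api/v1/packages"

  @doc """
  Returns `{key, value}` pairs: the full list under the listing path,
  followed by each package under its own `/api/v1/packages/{id}` path.
  """
  def build(packages) when is_list(packages) do
    per_package =
      Enum.map(packages, fn package ->
        {"#{@list_key}/#{package.id}", package}
      end)

    [{@list_key, packages} | per_package]
  end
end
```

Inside the warmer, `execute/1` would then return `{:ok, MyProject.PackagePairs.build(packages)}`, so one database query fills every key. Keeping the expansion in a pure function also makes it testable without a running cache.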
````diff
@@ -112,32 +89,31 @@ defmodule MyProject.PackageWarmer do
 end
 ```
 
-Using the same amount of database calls, on the same frequency, we have not only populated `"/api/v1/packages"` to return the list of packages, but we have also populated the entire API `"/api/v1/packages/{id}"` to return the single package referenced in the path. This is a much more optimized solution for this type of caching, as you can explode out your key writes with a single cache action, while requiring no extra database requests.
-
-Somewhat obviously these warmers can only be used if you know what types of data you're expecting to be cached. If you're dealing with seeded data (i.e. from a user) you probably can't use warmers, and should be looking at reactive warming instead. You must also consider how relevant the data is that you're caching; if you only care about it for a short period of time, you likely don't want a warmer as they run for the lifetime of the cache.
-
-## Triggered Warming
-
-In some cases you may not wish to use automated interval warming, such as if your data is static and changes rarely or maybe doesn't change at all. For this case Cachex v4.x allows the `:interval` to be set to `nil`, which will only run your warmer a single time on cache startup. It also introduces `Cachex.warm/2` to allow the developer to manually warm a cache and implement their own warming schedules.
-
-When using manual warming your cache definition is much the same as before, with the only change being dropping the `:interval` option from the `warmer()` record:
+We then just provide our warmer during initialization of our cache, and define that it needs to be completed prior to startup via the `:required` flag. The `:interval` option is used to specify that it will refresh every 5 minutes:
 
 ```elixir
 # need our records
 import Cachex.Spec
 
-# initialize our cache with a database connection
-Cachex.start_link(:my_cache, [
+# initialize our cache
+Cachex.start_link(:cache, [
   warmers: [
     warmer(
       module: MyProject.PackageWarmer,
-      state: connection
+      interval: :timer.minutes(5),
+      required: true
     )
   ]
 ])
 ```
 
-Cachex will run this warmer a single time on cache startup, and will then never run this warmer again without it being explicitly requested. In this case the developer will have to manually trigger the warmer via `Cachex.warm/2`:
+As a result of being able to populate many keys at once we have not only populated `"/api/v1/packages"` to return the list of packages, but we have also populated the entire API `"/api/v1/packages/{id}"`. This is a much more optimized solution for this type of caching, as you can explode out your key writes with a single cache action, while requiring no extra database requests.
+
+Somewhat obviously these warmers can only be used if you know what types of data you're expecting to be cached. If you're dealing with seeded data (i.e. from a user) you probably can't use proactive warming, and should be looking at reactive warming instead. You must also consider how relevant the data is that you're caching; if you only care about it for a short period of time, you likely don't want a warmer as they run for the lifetime of the cache.
+
+## Triggered Warming
+
+In addition to having your warmers managed by Cachex, it's now also possible to manually warm a cache. As of Cachex v4.x, the interface now includes `Cachex.warm/2` for this purpose. Calling this function will execute all warmers attached to a cache, or a subset of warmers you select at call time:
 
 ```elixir
 # warm the cache manually
````
````diff
@@ -150,6 +126,4 @@ Cachex.warm(:my_cache, wait: true)
 Cachex.warm(:my_cache, only: [MyProject.PackageWarmer])
 ```
 
-To extend the previous example to benefit from this type of warming, imagine that our previous package listing is part of a CRUD API which also includes package creation and deletion. In this scenario you could manually warm your cache after a package is either created or removed, rather than run it every 5 minutes (even if nothing has changed in the meantime!).
-
-It should also be noted that `Cachex.warm/2` is still available even if you _have_ specified the `:interval` option. If you have a high cache interval of something like `:timer.hours(24)` and you want to trigger an earlier warming, you can always `iex` into your node and run a cache warming manually.
+This is extremely helpful for things like evented cache invalidation and debugging. The Cachex internal management actually delegates through to this under the hood, meaning that there should be no surprising inconsistencies between managed vs. manual warming. It should be noted that `Cachex.warm/2` can be run either with or without an `:interval` set in your warmer record.
````
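The CRUD scenario described above re-warms the cache after a package is created or deleted rather than on a timer. It can be sketched as a small script, assuming Cachex v4 is available as a dependency. An `Agent` stands in for the database, and `MyProject.Packages` with its `create_package/2` and `delete_package/2` functions is hypothetical. The warmer record leaves out `:interval`, so the warmer runs once at startup and afterwards only through `Cachex.warm/2`. The sketch also assumes `wait: true` blocks until the selected warmer has finished.

```elixir
# provides the warmer() record macro
import Cachex.Spec

defmodule MyProject.PackageWarmer do
  use Cachex.Warmer

  # the state is an Agent pid standing in for the packages table (hypothetical)
  def execute(store) do
    packages = Agent.get(store, & &1)
    {:ok, [{"/api/v1/packages", packages}]}
  end
end

defmodule MyProject.Packages do
  @moduledoc """
  Hypothetical write path which re-warms the cache after each change.
  """

  def create_package(store, package) do
    :ok = Agent.update(store, &[package | &1])
    refresh()
    {:ok, package}
  end

  def delete_package(store, id) do
    :ok = Agent.update(store, fn packages -> Enum.reject(packages, &(&1.id == id)) end)
    refresh()
    :ok
  end

  # run only the package warmer and wait for it to finish
  defp refresh, do: Cachex.warm(:cache, only: [MyProject.PackageWarmer], wait: true)
end

{:ok, store} = Agent.start_link(fn -> [] end)

# no :interval: the warmer runs once at startup, then only when triggered
{:ok, _pid} =
  Cachex.start_link(:cache,
    warmers: [warmer(module: MyProject.PackageWarmer, state: store)]
  )

{:ok, _} = MyProject.Packages.create_package(store, %{id: 1, name: "cachex"})
```

Warming only on writes means the listing is refreshed exactly when it changes, instead of re-querying every 5 minutes even when nothing has changed.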