| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: Elixir's new continuable enumerators |
| 4 | +author: Peter Minten |
| 5 | +category: "What's New in Elixir" |
| 6 | +excerpt: In 0.12.0 Elixir's enumerators have gained the ability to suspend value |
| 7 | + production and to terminate early. |
| 8 | +--- |
| 9 | + |
| 10 | +As you may have heard, Elixir's enumerators gain some new features in the |
| 11 | +upcoming 0.12.0 release. In this blog post I'll explain what's new, what it |
| 12 | +enables and how it works. |
| 13 | + |
| 14 | +For those of you who use the development version of Elixir these changes are |
| 15 | +already available. For the exact differences in code you can look at the |
| 16 | +[relevant pull request](https://github.com/elixir-lang/elixir/pull/1922). |
| 17 | + |
| 18 | +## A recap of enumerators, and some terminology |
| 19 | + |
| 20 | +The basic idea of enumerators is that you traverse some data structure or |
| 21 | +resource (say, the lines of a file) by putting the thing being traversed in |
| 22 | +control. That is, if you're reading from a file, a loop reads the lines and |
| 23 | +calls a function for each one. Just calling a function isn't all that useful |
| 24 | +for most tasks, as there'd be no way to remember previous lines (ugly hacks |
| 25 | +aside), so an accumulator value is passed to the function and a new |
| 26 | +accumulator is returned by it. |
| 27 | + |
| 28 | +For example here's how you can count the total length of strings in a list. |
| 29 | + |
| 30 | +```elixir |
| 31 | +Enumerable.reduce(l, 0, fn x, acc -> String.length(x) + acc end) |
| 32 | +``` |
| 33 | + |
| 34 | +Often the actual call to `Enumerable.reduce/3` is hidden inside another |
| 35 | +function. Say that we want to define a `sum` function. The usual way is to |
| 36 | +write it like this: |
| 37 | + |
| 38 | +```elixir |
| 39 | +def sum(coll) do |
| 40 | +  Enumerable.reduce(coll, 0, fn x, acc -> x + acc end) |
| 41 | +end |
| 42 | +``` |
| 43 | + |
| 44 | +This could get called as `Enum.map(1..10, &(&1 * &1)) |> sum()` to get the sum of |
| 45 | +squares. Desugaring this means `sum(Enum.map(1..10, &(&1 * &1)))`. |
| 46 | + |
| 47 | +The general pattern is this: |
| 48 | + |
| 49 | +```elixir |
| 50 | +def outer_function(coll, ...) do |
| 51 | +  ... |
| 52 | +  Enumerable.reduce(coll, initial_consumer_acc, consumer) |
| 53 | +  ... |
| 54 | +end |
| 55 | + |
| 56 | +something_that_returns_an_enumerable(...) |> outer_function(...) |
| 57 | +``` |
| 58 | + |
| 59 | +You'll notice the slightly uncommon terminology of "outer function" and |
| 60 | +"consumer" (normally called an "iteratee"). That's intentional: naming an |
| 61 | +iteratee a consumer better reflects that it consumes values. |
| 62 | + |
| 63 | +Along the same lines I call the reduce function for a specific enumerable a |
| 64 | +producer: it produces values which are given to a consumer. |
| 65 | + |
| 66 | +The outer function is the function to which the enumerable is passed. |
| 67 | +Syntactically it looks like this is the consumer, but it's really a function |
| 68 | +that combines the producer and the consumer. For simple consumers (say `fn x, |
| 69 | +acc -> length(x) + acc end`) the consumer will often be written directly in the |
| 70 | +source text of the outer function, but let's try to keep those concepts |
| 71 | +distinguished. |
| 72 | + |
| 73 | +## Two issues with classic Elixir enumerators |
| 74 | + |
| 75 | +Enumerators are great, but they have their limitations. One issue is that it's |
| 76 | +not possible to define a function that returns at most 3 elements without |
| 77 | +traversing all elements or using ugly tricks such as `throw` (with a |
| 78 | +`try...catch` construct in the outer function). The `throw` trick is used in |
| 79 | +`Enum` and `Stream` to implement functions such as `Enum.take/2` and |
| 80 | +`Stream.take_while/2`. It works, but it's not what I'd call stylish. |
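
To make the `throw` trick concrete, here is a minimal sketch of a take-like function built on it. The module name `ThrowTake` and the `:take_done` tag are made up for illustration; this is not the actual source of `Enum.take/2`, only the general shape of the workaround.

```elixir
defmodule ThrowTake do
  # Hypothetical helper for illustration; not the real Enum.take/2.
  def take(coll, n) when n > 0 do
    try do
      { acc, _left } =
        Enum.reduce(coll, { [], n }, fn x, { acc, left } ->
          acc = [x|acc]
          # Escape from the traversal once enough elements are collected.
          if left == 1, do: throw({ :take_done, acc })
          { acc, left - 1 }
        end)

      # The enumerable ran out before n elements were seen.
      :lists.reverse(acc)
    catch
      { :take_done, acc } -> :lists.reverse(acc)
    end
  end
end
```

The consumer has no polite way to say "enough", so it jumps out of the producer's loop and the outer function catches the jump.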
| 81 | + |
| 82 | +A bigger problem, one that has no workaround, is that there's no way to |
| 83 | +interleave two enumerables. That is, it's not possible to define a function that |
| 84 | +for two enumerables `A` and `B` returns a list `[A1, B1, A2, B2, A3, ...]` |
| 85 | +(where `A1` is the first element of `A`) without first traversing both |
| 86 | +enumerables and then interleaving the collected values. Interleaving is |
| 87 | +important because it's the basis of a zip function. Without interleaving you |
| 88 | +cannot implement `Stream.zip/2`. |
| 89 | + |
| 90 | +The underlying problem, in both cases, is that the producer is fully in control. |
| 91 | +The producer simply pushes out as many elements to the consumer as it wants and |
| 92 | +then says "I'm done". There's no way aside from `throw`/`raise` for a consumer |
| 93 | +to tell a producer "stop producing". There is definitely no way to tell a |
| 94 | +producer "stop for now but be prepared to continue where you left off later". |
| 95 | + |
| 96 | +## Power to the consumer! |
| 97 | + |
| 98 | +At CodeMeshIO José Valim and Jessica Kerr sat down and discussed this problem. |
| 99 | +They came up with a solution inspired by a [Monad.Reader |
| 100 | +article](http://themonadreader.files.wordpress.com/2010/05/issue16.pdf) (third |
| 101 | +article). It's an elegant extension of the old system, based on a simple idea. |
| 102 | +Instead of returning only an accumulator at every step (for every produced |
| 103 | +value) the consumer returns a combination of an accumulator and an instruction |
| 104 | +to the producer. Three instructions are available: |
| 105 | + |
| 106 | +* `:cont` - Keep producing. |
| 107 | +* `:halt` - Stop producing. |
| 108 | +* `:suspend` - Temporarily stop producing. |
| 109 | + |
| 110 | +A consumer that always returns `:cont` makes the producer behave exactly the |
| 111 | +same as in the old system. A consumer may return `:halt` to have the producer |
| 112 | +terminate earlier than it normally would. |
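
As a rough sketch of what `:halt` buys us, a take-like function can now stop the producer without `throw`. The module name `HaltTake` is hypothetical, and the code assumes the `{ instruction, acc }` protocol described above.

```elixir
defmodule HaltTake do
  # Hypothetical helper: collects at most n elements, then halts the producer.
  def take(coll, n) when n > 0 do
    { _status, { acc, _left } } =
      Enumerable.reduce(coll, { :cont, { [], n } }, fn
        # Last wanted element: keep it and tell the producer to stop.
        x, { acc, 1 } -> { :halt, { [x|acc], 0 } }
        # Otherwise keep it and ask for more.
        x, { acc, left } -> { :cont, { [x|acc], left - 1 } }
      end)

    :lists.reverse(acc)
  end
end
```

The status is ignored here because both an early stop and a normal completion leave the collected elements in the accumulator.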
| 113 | + |
| 114 | +The real magic is in `:suspend` though. It tells a producer to return the |
| 115 | +accumulator and a continuation function. |
| 116 | + |
| 117 | +```elixir |
| 118 | +{ :suspended, n_, cont } = Enumerable.reduce(1..5, { :cont, 0 }, fn x, n -> |
| 119 | +  if x == 3 do |
| 120 | +    { :suspend, n } |
| 121 | +  else |
| 122 | +    { :cont, n + x } |
| 123 | +  end |
| 124 | +end) |
| 125 | +``` |
| 126 | + |
| 127 | +After running this code `n_` will be `3` (1 + 2) and `cont` will be a |
| 128 | +function. We'll get back to `cont` in a minute but first take a look at some of |
| 129 | +the new elements here. The initial accumulator has an instruction as well, so |
| 130 | +you could suspend or halt a producer immediately, if you really want to. The |
| 131 | +value passed to the consumer (`n`) does not contain the instruction. The return |
| 132 | +value of the producer also has a symbol in it. Like with the instructions of |
| 133 | +consumers there are three possible values: |
| 134 | + |
| 135 | +* `:done` - Completed normally. |
| 136 | +* `:halted` - Consumer returned a `:halt` instruction. |
| 137 | +* `:suspended` - Consumer returned a `:suspend` instruction. |
| 138 | + |
| 139 | +Together with the other values returned, the possible return values from a |
| 140 | +producer are `{ :done, acc } | { :halted, acc } | { :suspended, acc, |
| 141 | +continuation }`. |
| 142 | + |
| 143 | +Back to the continuation. A continuation is a function that given an accumulator |
| 144 | +returns a new producer result. In other words it's a way to swap out the |
| 145 | +accumulator but keep the same producer in the same state. |
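
A small sketch of what can be done with a continuation, reusing the suspending consumer from the example above. It assumes a pure producer such as a range, where calling the same continuation more than once is harmless; for a producer backed by a resource (a file, say) that may not hold.

```elixir
consumer = fn x, n ->
  if x == 3 do
    { :suspend, n }
  else
    { :cont, n + x }
  end
end

{ :suspended, 3, cont } = Enumerable.reduce(1..5, { :cont, 0 }, consumer)

# Resume with the accumulator we were handed: 4 and 5 are still to come,
# so the total becomes 3 + 4 + 5 = 12.
{ :done, total } = cont.({ :cont, 3 })

# Swap in a different accumulator, the producer state stays the same: 109.
{ :done, shifted } = cont.({ :cont, 100 })

# Or tell the producer that we have lost interest.
{ :halted, :stopped } = cont.({ :halt, :stopped })
```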
| 146 | + |
| 147 | +## Implementing `interleave` |
| 148 | + |
| 149 | +Using the power of suspension it is now possible to create an interleave |
| 150 | +function. |
| 151 | + |
| 152 | +```elixir |
| 153 | +def interleave(a, b) do |
| 154 | +  step = fn x, acc -> { :suspend, [x|acc] } end |
| 155 | +  af = &Enumerable.reduce(a, &1, step) |
| 156 | +  bf = &Enumerable.reduce(b, &1, step) |
| 157 | +  do_interleave(af, bf, []) |> :lists.reverse() |
| 158 | +end |
| 159 | + |
| 160 | +defp do_interleave(a, b, acc) do |
| 161 | +  case a.({ :cont, acc }) do |
| 162 | +    { :suspended, acc, a } -> |
| 163 | +      case b.({ :cont, acc }) do |
| 164 | +        { :suspended, acc, b } -> do_interleave(a, b, acc) |
| 165 | +        { :done, acc } -> finish_interleave(a, acc) # get remainder of a |
| 166 | +      end |
| 167 | +    { :done, acc } -> finish_interleave(b, acc) # get remainder of b |
| 168 | +  end |
| 169 | +end |
| 170 | +defp finish_interleave(f, acc) do |
| 171 | +  case f.({ :cont, acc }) do |
| 172 | +    { :suspended, acc, f } -> finish_interleave(f, acc) |
| 173 | +    { :done, acc } -> acc |
| 174 | +  end |
| 175 | +end |
| 176 | +``` |
| 177 | + |
| 178 | +Let's go through this step by step. The main `interleave` function first |
| 179 | +partially applies `Enumerable.reduce/3` to get function values that work just |
| 180 | +like the continuations. This makes things easier for `do_interleave`. |
| 181 | + |
| 182 | +The `do_interleave` function first calls `a` (`af` from `interleave`) with the |
| 183 | +`step` function so that the available element of `a` gets added to the |
| 184 | +accumulator and `a` immediately suspends afterwards. Then the same is done for |
| 185 | +`b`. If either producer is done all the remaining elements of the other get |
| 186 | +added to the accumulator list. |
| 187 | + |
| 188 | +Note that `acc` is sometimes used to mean a tuple like `{ :cont, x }` and |
| 189 | +sometimes the accumulator value proper. It's a bit confusing, yes. |
| 190 | + |
| 191 | +This example shows that through clever combination of an outer function |
| 192 | +(`do_interleave`) and an inner function (`step`) two producers can be interleaved. |
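
Since interleaving was motivated as the basis of a zip function, here is a sketch of how the same suspend-after-every-element idea could yield a zip. The module name `SuspendZip` is hypothetical and this is not how `Stream.zip/2` is implemented; it only illustrates the technique.

```elixir
defmodule SuspendZip do
  # Hypothetical sketch: pairs up elements until either side runs out.
  def zip(a, b) do
    # The consumer hands every element straight back and suspends.
    step = fn x, _acc -> { :suspend, x } end
    af = &Enumerable.reduce(a, &1, step)
    bf = &Enumerable.reduce(b, &1, step)
    do_zip(af, bf, [])
  end

  defp do_zip(af, bf, pairs) do
    case af.({ :cont, nil }) do
      { :suspended, x, af } ->
        case bf.({ :cont, nil }) do
          { :suspended, y, bf } ->
            do_zip(af, bf, [{ x, y }|pairs])
          { :done, _ } ->
            # b ran out: tell a to stop so it can clean up.
            af.({ :halt, nil })
            :lists.reverse(pairs)
        end
      { :done, _ } ->
        # a ran out: tell b to stop as well.
        bf.({ :halt, nil })
        :lists.reverse(pairs)
    end
  end
end
```

Calling `SuspendZip.zip([1, 2, 3], [:a, :b])` gives `[{ 1, :a }, { 2, :b }]`; the unmatched `3` is dropped.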
| 193 | + |
| 194 | +## `Enum.reduce` and `Enumerable.reduce` are now slightly different |
| 195 | + |
| 196 | +In the old system `Enum.reduce/3` simply called `Enumerable.reduce/3` with the |
| 197 | +same arguments. In order to keep the `Enum.reduce/3` function simple to use this |
| 198 | +is no longer the case in the new system. `Enum.reduce/3` now calls |
| 199 | +`Enumerable.reduce/3` with a very simple wrapper around the passed |
| 200 | +function: `fn x, acc -> { :cont, f.(x, acc) } end`. Thus `Enum.reduce/3` |
| 201 | +works just as it has always done: it calls the supplied function for |
| 202 | +every produced element, without the possibility of halting or suspending |
| 203 | +the producer. |
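
In code, the wrapping could look roughly like the sketch below. `MyEnum` is a hypothetical stand-in, and unwrapping the `{ :done, acc }` result is my assumption about the obvious way to finish; the real `Enum.reduce/3` may differ in its details.

```elixir
defmodule MyEnum do
  # Hypothetical stand-in for Enum.reduce/3, for illustration only.
  def reduce(coll, acc, f) do
    # Always answer :cont, so the producer runs to completion.
    wrapper = fn x, acc -> { :cont, f.(x, acc) } end
    { :done, result } = Enumerable.reduce(coll, { :cont, acc }, wrapper)
    result
  end
end
```

With this in place `MyEnum.reduce(1..10, 0, &(&1 + &2))` returns `55`, just as the old-style call would.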
| 204 | + |
| 205 | +## Conclusion |
| 206 | + |
| 207 | +The new system of enumerators certainly makes things a bit more complicated but |
| 208 | +also adds power. I suspect many interesting and "interesting" functions can be |
| 209 | +built on top of it. |