Commit 0cef704

author
José Valim
committed
Merge pull request elixir-lang#179 from pminten/cont-enumerators-post
Blog post about the new enumerators
2 parents 44e474c + 6344c2f commit 0cef704

1 file changed

+209
-0
lines changed

@@ -0,0 +1,209 @@
---
layout: post
title: Elixir's new continuable enumerators
author: Peter Minten
category: "What's New in Elixir"
excerpt: In 0.12.0 Elixir's enumerators have gained the ability to suspend value production and to terminate early.
---

As you may have heard, Elixir's enumerators gain some new features in the
upcoming 0.12.0 release. In this blog post I'll explain what's new, what it
enables and how it works.

For those of you who use the development version of Elixir these changes are
already available. For the exact differences in code you can look at the
[relevant pull request](https://github.com/elixir-lang/elixir/pull/1922).

## A recap of enumerators, and some terminology

The basic idea of enumerators is that you traverse some data structure or
resource (such as the lines of a file) by putting the thing that is traversed in
control. That is, if you're reading from a file, the loop that reads the lines
belongs to the file, and for each line it calls a function you supply. Just
calling a function isn't all that useful for most tasks as there'd be no way to
remember previous lines (ugly hacks aside), so some accumulator value is passed
to the function and a new accumulator is returned by it.

For example here's how you can count the total length of strings in a list.

```elixir
# Classic (pre-0.12) call: `l` is a list of strings and the accumulator is passed bare.
Enumerable.reduce(l, 0, fn x, acc -> String.length(x) + acc end)
```

Often the actual call to `Enumerable.reduce/3` is hidden inside another
function. Say that we want to define a `sum` function. The usual way is to
write it like this:

```elixir
def sum(coll) do
  Enumerable.reduce(coll, 0, fn x, acc -> x + acc end)
end
```

This could get called as `Enum.map(1..10, &(&1 * &1)) |> sum()` to get the sum of
squares. Desugared, this is `sum(Enum.map(1..10, &(&1 * &1)))`.

The general pattern is this:

```elixir
def outer_function(coll, ...) do
  ...
  Enumerable.reduce(coll, initial_consumer_acc, consumer)
  ...
end

something_that_returns_an_enumerable(...) |> outer_function(...)
```

You'll notice the slightly uncommon terminology of "outer function" and
"consumer" (normally called an "iteratee"). That's intentional: naming an
iteratee a consumer better reflects that it consumes values.

Along the same lines I call the reduce function for a specific enumerable a
producer: it produces values which are given to a consumer.

The outer function is the function to which the enumerable is passed.
Syntactically it looks like this is the consumer, but it's really a function
that combines the producer and the consumer. For simple consumers (say
`fn x, acc -> length(x) + acc end`) the consumer will often be written directly
in the source text of the outer function, but let's try to keep those concepts
distinct.

## Two issues with classic Elixir enumerators

Enumerators are great, but they have their limitations. One issue is that it's
not possible to define a function that returns at most 3 elements without
traversing all elements or using ugly tricks such as `throw` (with a
`try...catch` construct in the outer function). The `throw` trick is used in
`Enum` and `Stream` to implement functions such as `Enum.take/2` and
`Stream.take_while/2`. It works, but it's not what I'd call stylish.

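To make the `throw` trick concrete, here is a minimal sketch of a take-like
function written in that style. The module name `ThrowTake` and the
`:take_done` tag are made up for illustration, and this is not the actual code
of `Enum.take/2`. It only relies on `Enum.reduce/3`.

```elixir
defmodule ThrowTake do
  # Hypothetical helper, not the real Enum.take/2 implementation.
  def take(_coll, 0), do: []

  def take(coll, n) when n > 0 do
    try do
      coll
      |> Enum.reduce({ [], n }, fn x, { acc, left } ->
        acc = [x|acc]
        # Once enough elements are collected, escape from the traversal.
        if left == 1, do: throw({ :take_done, acc }), else: { acc, left - 1 }
      end)
      |> elem(0)
      |> Enum.reverse()
    catch
      # The outer function catches the throw and returns what was collected.
      { :take_done, acc } -> Enum.reverse(acc)
    end
  end
end

ThrowTake.take(1..1_000_000, 3)
```

The consumer has no polite way to say "enough", so it has to unwind the stack
to stop the producer.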
A bigger problem, that doesn't have a workaround, is that there's no way to
interleave two enumerables. That is, it's not possible to define a function that
for two enumerables `A` and `B` returns a list `[A1, B1, A2, B2, A3, ...]`
(where `A1` is the first element of `A`) without first traversing both
enumerables and then interleaving the collected values. Interleaving is
important because it's the basis of a zip function. Without interleaving you
cannot implement `Stream.zip/2`.

The underlying problem, in both cases, is that the producer is fully in control.
The producer simply pushes out as many elements to the consumer as it wants and
then says "I'm done". There's no way aside from `throw`/`raise` for a consumer
to tell a producer "stop producing". There is definitely no way to tell a
producer "stop for now but be prepared to continue where you left off later".

## Power to the consumer!

At CodeMeshIO José Valim and Jessica Kerr sat down and discussed this problem.
They came up with a solution inspired by a [Monad.Reader
article](http://themonadreader.files.wordpress.com/2010/05/issue16.pdf) (third
article). It's an elegant extension of the old system, based on a simple idea.
Instead of returning only an accumulator at every step (for every produced
value) the consumer returns a combination of an accumulator and an instruction
to the producer. Three instructions are available:

* `:cont` - Keep producing.
* `:halt` - Stop producing.
* `:suspend` - Temporarily stop producing.

A consumer that always returns `:cont` makes the producer behave exactly the
same as in the old system. A consumer may return `:halt` to have the producer
terminate earlier than it normally would.

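As a small illustration of `:halt`, a take-like function can be sketched
without any `throw`. The module name `HaltTake` is hypothetical and this is not
the real `Enum.take/2`. The producer's return value is a tagged tuple, which
the following paragraphs describe in more detail.

```elixir
defmodule HaltTake do
  # Hypothetical helper: a sketch, not the real Enum.take/2.
  def take(_coll, 0), do: []

  def take(coll, n) when n > 0 do
    consumer = fn x, { acc, left } ->
      acc = [x|acc]
      # Tell the producer to stop as soon as n elements have been collected.
      if left == 1, do: { :halt, { acc, 0 } }, else: { :cont, { acc, left - 1 } }
    end

    # The tag is :halted when we stopped early and :done when the producer ran out first.
    { _tag, { acc, _left } } = Enumerable.reduce(coll, { :cont, { [], n } }, consumer)
    Enum.reverse(acc)
  end
end

HaltTake.take(1..1_000_000, 3)
```

The range is not traversed past its third element, and no exception machinery
is involved.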
The real magic is in `:suspend` though. It tells a producer to return the
accumulator and a continuation function.

```elixir
{ :suspended, n_, cont } = Enumerable.reduce(1..5, { :cont, 0 }, fn x, n ->
  if x == 3 do
    { :suspend, n }
  else
    { :cont, n + x }
  end
end)
```

After running this code `n_` will be `3` (1 + 2) and `cont` will be a
function. We'll get back to `cont` in a minute but first take a look at some of
the new elements here. The initial accumulator has an instruction as well, so
you could suspend or halt a producer immediately, if you really want to. The
value passed to the consumer (`n`) does not contain the instruction. The return
value of the producer also has an atom in it. Like with the instructions of
consumers there are three possible values:

* `:done` - Completed normally.
* `:halted` - Consumer returned a `:halt` instruction.
* `:suspended` - Consumer returned a `:suspend` instruction.

Together with the other values returned, the possible return values from a
producer are `{ :done, acc } | { :halted, acc } | { :suspended, acc,
continuation }`.

Back to the continuation. A continuation is a function that, given an
accumulator tagged with an instruction (such as `{ :cont, acc }`), returns a new
producer result. In other words it's a way to swap out the accumulator but keep
the same producer in the same state.

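Here is a short sketch of what resuming looks like, reusing the consumer from
the `1..5` example above. The variable names are only for illustration.

```elixir
consumer = fn x, n ->
  if x == 3, do: { :suspend, n }, else: { :cont, n + x }
end

# The producer stops at 3 (which is not added) and hands back a continuation.
{ :suspended, 3, cont } = Enumerable.reduce(1..5, { :cont, 0 }, consumer)

# Resume with the accumulator we got back: 4 and 5 are still to come.
{ :done, 12 } = cont.({ :cont, 3 })

# Or swap in a different accumulator while keeping the producer's position.
{ :done, 109 } = cont.({ :cont, 100 })

# A continuation accepts any instruction, so it can also be told to stop.
{ :halted, 3 } = cont.({ :halt, 3 })
```

Each match succeeds only if the producer returns exactly that tuple, so the
snippet doubles as a check of the behaviour described above.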
## Implementing `interleave`

Using the power of suspension it is now possible to create an interleave
function.

```elixir
defmodule Interleave do
  def interleave(a, b) do
    # Suspend after every element so the two producers can take turns.
    step = fn x, acc -> { :suspend, [x|acc] } end
    af = &Enumerable.reduce(a, &1, step)
    bf = &Enumerable.reduce(b, &1, step)
    do_interleave(af, bf, []) |> :lists.reverse()
  end

  defp do_interleave(a, b, acc) do
    case a.({ :cont, acc }) do
      { :suspended, acc, a } ->
        case b.({ :cont, acc }) do
          { :suspended, acc, b } ->
            do_interleave(a, b, acc)
          { :done, acc } ->
            # Get remainder of a's entries
            finish_interleave(a, acc)
        end
      { :done, acc } ->
        # Get remainder of b's entries
        finish_interleave(b, acc)
    end
  end

  # Keep resuming the remaining producer until it reports that it is done.
  defp finish_interleave(f, acc) do
    case f.({ :cont, acc }) do
      { :suspended, acc, f } -> finish_interleave(f, acc)
      { :done, acc } -> acc
    end
  end
end
```

Let's go through this step by step. The main `interleave` function first
partially applies `Enumerable.reduce/3` to get function values that work just
like the continuations. This makes things easier for `do_interleave`.

The `do_interleave` function first calls `a` (`af` from `interleave`), which
runs the producer with the `step` function so that the next available element of
`a` gets added to the accumulator and `a` immediately suspends afterwards. Then
the same is done for `b`. If either producer is done, all the remaining elements
of the other get added to the accumulator list.

Note that `acc` is sometimes used to mean a tuple like `{ :cont, x }` and
sometimes the accumulator value proper. It's a bit confusing, yes.

This example shows that through clever combination of an outer function
(`do_interleave`) and an inner function (`step`) two producers can be
interleaved.

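Since interleaving was described earlier as the basis of a zip function, here
is a minimal sketch of a zip built on the same suspension idea. The module name
`ZipSketch` and the `{ :value, x }` accumulator shape are assumptions made for
this example. It is not how `Stream.zip/2` is actually implemented.

```elixir
defmodule ZipSketch do
  # Hypothetical module: a sketch of zip on top of suspension.
  def zip(a, b) do
    # Each step stores the produced element as the accumulator and suspends at once.
    step = fn x, _acc -> { :suspend, { :value, x } } end
    af = &Enumerable.reduce(a, &1, step)
    bf = &Enumerable.reduce(b, &1, step)
    do_zip(af, bf, [])
  end

  defp do_zip(a, b, pairs) do
    with { :suspended, { :value, x }, a } <- a.({ :cont, :none }),
         { :suspended, { :value, y }, b } <- b.({ :cont, :none }) do
      do_zip(a, b, [{ x, y }|pairs])
    else
      # One producer returned { :done, _ }, so the shorter input decides the length.
      _ -> Enum.reverse(pairs)
    end
  end
end

ZipSketch.zip(1..3, [:a, :b])
```

This sketch simply drops the producer that still has elements left. A more
careful version would probably pass `{ :halt, acc }` to the remaining
continuation so that a producer backed by a resource gets the chance to clean
up.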
## `Enum.reduce` and `Enumerable.reduce` are now slightly different

In the old system `Enum.reduce/3` simply called `Enumerable.reduce/3` with the
same arguments. In order to keep the `Enum.reduce/3` function simple to use this
is no longer the case in the new system. `Enum.reduce/3` now calls
`Enumerable.reduce/3` with a very simple wrapper around the passed function:
`fn x, acc -> { :cont, f.(x, acc) } end`. Thus `Enum.reduce/3` works just as it
has always done: it calls the supplied function for every produced element
without the possibility of halting or suspending the producer.

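The wrapper can be sketched as follows. `ReduceSketch` is a hypothetical
stand-in that follows the description above. It is not the source of
`Enum.reduce/3` itself.

```elixir
defmodule ReduceSketch do
  # Hypothetical stand-in for Enum.reduce/3.
  def reduce(coll, acc, f) do
    # The wrapper always answers :cont, so the producer runs to completion
    # and the result is always tagged :done.
    { :done, result } =
      Enumerable.reduce(coll, { :cont, acc }, fn x, acc -> { :cont, f.(x, acc) } end)

    result
  end
end

ReduceSketch.reduce(1..10, 0, fn x, acc -> x + acc end)
# => 55
```

The caller's function never sees an instruction and never returns one, which is
what keeps the simple interface unchanged.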
## Conclusion

The new system of enumerators certainly makes things a bit more complicated but
also adds power. I suspect many interesting and "interesting" functions can be
built on top of it.
