In Refactoring Methods with Recursive Combinators, we saw how recursive combinators like #divide_and_conquer
and #linear_recursion
are abstraction wins. They make recursive code much easier to read, because you know the general form of the algorithm and don't need to pick through it to discover the individual steps.
We also saw that by separating the recursion implementation from the declaration of how to perform the steps of an algorithm like #rotate
, we leave ourselves the opportunity to improve the performance of our implementation without the risk of adding bugs to our declaration. And today we're going to do just that, along with a few tweaks for usability.
In this post, we're going to optimize our combinators' performance and make them a little easier to use with goodies like string_to_proc
. To do that, we're going to work with closures, defining methods with define_method
, and implement functional programming's partial application. We'll wrap up by converting linrec
from a recursive to an iterative implementation.
First, a little organization. Here are the original examples. I've placed them in a module and named the combinators multirec
and linrec
in conformance with common practice:
module RecursiveCombinators
def multirec(value, steps)
if steps[:divisible?].call(value)
steps[:recombine].call(
steps[:divide].call(value).map { |sub_value| multirec(sub_value, steps) }
)
else
steps[:conquer].call(value)
end
end
def linrec(value, steps)
if steps[:divisible?].call(value)
trivial_part, sub_problem = steps[:divide].call(value)
steps[:recombine].call(
trivial_part, linrec(sub_problem, steps)
)
else
steps[:conquer].call(value)
end
end
module_function :multirec, :linrec
end
Since they are also module functions, call them by sending a message to the module:
def merge_sort(list)
RecursiveCombinators.multirec(
list,
:divisible? => lambda { |list| list.length > 1 },
:conquer => lambda { |list| list },
:divide => lambda do |list|
half_index = (list.length / 2) - 1
[ list[0..half_index], list[(half_index + 1)..-1] ]
end,
:recombine => lambda { |pair| merge_two_sorted_lists(pair.first, pair.last) }
)
end
Or you can include the RecursiveCombinators
module and call either method directly:
include RecursiveCombinators
def merge_two_sorted_lists(*pair)
linrec(
pair,
:divisible? => lambda { |pair| !pair.first.empty? && !pair.last.empty? },
:conquer => lambda do |pair|
if pair.first.empty? && pair.last.empty?
[]
elsif pair.first.empty?
pair.last
else
pair.first
end
end,
:divide => lambda do |pair|
preceding, following = case pair.first.first <=> pair.last.first
when -1: [pair.first, pair.last]
when 0: [pair.first, pair.last]
when 1: [pair.last, pair.first]
end
[ preceding.first, [preceding[1..-1], following] ]
end,
:recombine => lambda { |trivial_bit, divisible_bit| [trivial_bit] + divisible_bit }
)
end
merge_sort([8, 3, 10, 1, 9, 5, 7, 4, 6, 2])
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Ok, we're ready for some slightly more substantial work. These methods were fine for illustration, but I have a few questions for the author(!)
First, note that every single time we call a method like merge_sort
, we create four new lambdas from scratch. This seems wasteful, especially since the lambdas never change. Why create some objects just to throw them away?
On the other hand, it's nice to be able to use create algorithms without having to define a method by name. Although I probably wouldn't do a merge sort anonymously, when I need a one-off quickie, I might like to write something like:
RecursiveCombinators.multirec(
[1, 2, 3, [[4,5], 6], [[[7]]]],
:divisible? => lambda { |value| value.kind_of?(Enumerable) },
:conquer => lambda { |value| value ** 2 },
:divide => lambda { |value| value },
:recombine => lambda { |list| list.inject() { |x,y| x + y } }
)
=> 140
But when I want a permanent sum of the squares method, I don't want to write:
def sum_squares(list)
RecursiveCombinators.multirec(
list,
:divisible? => lambda { |value| value.kind_of?(Enumerable) },
:conquer => lambda { |value| value ** 2 },
:divide => lambda { |value| value },
:recombine => lambda { |list| list.inject() { |x,y| x + y } }
)
end
...because that would create four lambdas every time I call the function. There are a couple of ways around this problem. First, our "recipe" for summing squares is a simple hash. We could extract that from the method into a constant:
SUM_SQUARES_RECIPE = {
:divisible? => lambda { |value| value.kind_of?(Enumerable) },
:conquer => lambda { |value| value ** 2 },
:divide => lambda { |value| value },
:recombine => lambda { |list| list.inject() { |x,y| x + y } }
}
def sum_squares(list)
RecursiveCombinators.multirec(list, SUM_SQUARES_RECIPE)
end
That (and the isomorphic solution where the constant SUM_SQUARES_RECIPE
is instead a private helper method #sum_squares_recipe
) is nice if you have some reason you wish to re-use the recipe elsewhere. But we don't, so this merely clutters our class up and separates the method definition from its logic.
I have something in mind. To see what it is, let's start by transforming our method definition from using the def
keyword to using the define_method
private class method. This obviously needs a module or class to work:
class Practicum
include RecursiveCombinators
define_method :sum_squares do |list|
multirec(
list,
:divisible? => lambda { |value| value.kind_of?(Enumerable) },
:conquer => lambda { |value| value ** 2 },
:divide => lambda { |value| value },
:recombine => lambda { |list| list.inject() { |x,y| x + y } }
)
end
end
Practicum.new.sum_squares([1, 2, 3, [[4,5], 6], [[[7]]]])
As you probably know, any method taking a block can take a lambda using the &
operator, so:
define_method :sum_squares, &(lambda do |list|
multirec(
list,
:divisible? => lambda { |value| value.kind_of?(Enumerable) },
:conquer => lambda { |value| value ** 2 },
:divide => lambda { |value| value },
:recombine => lambda { |list| list.inject() { |x,y| x + y } }
)
end)
This is useful, because now we can express what we want: a lambda taking one argument that in turn calls multirec
with the other arguments filled in. Functional programmers call this Partial Application. The idea is that if you have a function or method taking two arguments, if you only give it one argument you get a function back that takes the other. So:
multirec(x).call(y)
=> multirec(x,y)
Now the drawback with this "standard" implementation of partial application is that we would pass a list to multirec
and get back a function taking a hash of declarations. That isn't what we want. We could partially apply things backwards so that multirec(x).call(y) => multirec(y,x)
(if Ruby was a concatenative language, we would be concatenating the multirec combinator with a thrush). The trouble with that is it is the reverse of how partial application works in every other programming language and functional programming library.
Instead, we will switch the arguments to multirec
ourselves, so it now works like this:
multirec(
{
:divisible? => lambda { |value| value.kind_of?(Enumerable) },
:conquer => lambda { |value| value ** 2 },
:divide => lambda { |value| value },
:recombine => lambda { |list| list.inject() { |x,y| x + y } }
},
list
)
The drawback with this approach is that we lose a little of Ruby's syntactic sugar, the ability to fake named parameters by passing hash arguments without {}
if they are the last parameter. And now, let's give it the ability to partially apply itself. You can do some stuff with allowing multiple arguments and counting the number of arguments, but we're going to make the wild assumption that you never attempt a recursive combinator on nil
. Here's multirec
, you can infer the implementation for linrec
:
def multirec(steps, optional_value = nil)
worker_proc = lambda do |value|
if steps[:divisible?].call(value)
steps[:recombine].call(
steps[:divide].call(value).map { |sub_value| worker_proc.call(sub_value) }
)
else
steps[:conquer].call(value)
end
end
if optional_value.nil?
worker_proc
else
worker_proc.call(optional_value)
end
end
Notice that you get the same correct result whether you write:
RecursiveCombinators.multirec(
:divisible? => lambda { |value| value.kind_of?(Enumerable) },
:conquer => lambda { |value| value ** 2 },
:divide => lambda { |value| value },
:recombine => lambda { |list| list.inject() { |x,y| x + y } }
).call([1, 2, 3, [[4,5], 6], [[[7]]]])
=> 140
Or:
RecursiveCombinators.multirec(
{
:divisible? => lambda { |value| value.kind_of?(Enumerable) },
:conquer => lambda { |value| value ** 2 },
:divide => lambda { |value| value },
:recombine => lambda { |list| list.inject() { |x,y| x + y } }
},
[1, 2, 3, [[4,5], 6], [[[7]]]]
)
=> 140
Let's go back to what we were trying to do with &
:
define_method :sum_squares, &(lambda do |list|
multirec(
list,
:divisible? => lambda { |value| value.kind_of?(Enumerable) },
:conquer => lambda { |value| value ** 2 },
:divide => lambda { |value| value },
:recombine => lambda { |list| list.inject() { |x,y| x + y } }
)
end)
Now we know how to build our lambda:
require 'partial_application_recursive_combinators'
class Practicum
extend PartialApplicationRecursiveCombinators # so we can call multirec in class scope
define_method :sum_squares, &multirec(
:divisible? => lambda { |value| value.kind_of?(Enumerable) },
:conquer => lambda { |value| value ** 2 },
:divide => lambda { |value| value },
:recombine => lambda { |list| list.inject() { |x,y| x + y } }
)
end
Practicum.new.sum_squares([1, 2, 3, [[4,5], 6], [[[7]]]])
=> 140
You can verify for yourself that no matter how many times you call sum_squares
, you do not build those lambdas again. What we have just done is added partial application to multirec
and linrec
, which in turn allows us to ensure that he cost of constructing lambdas for our methods is only done when the method is defined, not every time it is called.
We have already renamed divide_and_conquer
and linear_recursion
to bring them into line with standard practice and other programming languages. Now it's time for us to bring the parameters--the declarative lambdas--into line with standard practice.
The four arguments to both methods are normally called cond
, then
, before
, and after
:
cond
is the logical inverse ofdivisible?
So ifcond(value)
evaluates to true, then we do not need to subdivide the problem.then
is exactly the same asconquer
, if cond then then. That's the way I think of it.before
is the same asdivide
.after
is the same asrecombine
.
Things look very similar with the new scheme for now:
require 'legacy_recursive_combinators'
class Practicum
extend LegacyRecursiveCombinators # so we can call multirec in class scope
define_method :sum_squares, &multirec(
:cond => lambda { |value| value.kind_of?(Numeric) }, # the only change right now
:then => lambda { |value| value ** 2 },
:before => lambda { |value| value },
:after => lambda { |list| list.inject() { |x,y| x + y } }
)
end
All right, now our combinators will look familiar to functional programmers, and even better when we look at functional programs using recursive combinators we will understand them at a glance. Okay, let's get serious and work on making our combinators easy to use and our code easy to read.
As long as you're writing these lambdas out, writing :cond =>
isn't a hardship. And in an explanatory article like this, it can help at first. However, what if you find a way to abbreviate things? For example, you might alias lambda
to L
. Or you might want to use string_to_proc.
So we should support passing the declarative arguments by position as well as by 'name.' And with a final twist, if any of the declarative arguments aren't already lambdas, we'll try to create lambdas by sending them the message to_proc
. So our goal is to write what we wrote above or either of the following and have it "just work:"
define_method :sum_squares, &multirec(
lambda { |value| value.kind_of?(Numeric) }, # the only change right now
lambda { |value| value ** 2 },
lambda { |value| value },
lambda { |list| list.inject() { |x,y| x + y } }
)
include 'string-to_proc'
define_method :sum_squares, &multirec("value.kind_of?(Numeric)", "value ** 2","value","value.inject(&'+')")
And here is the code that makes it work:
module RecursiveCombinators
separate_args = lambda do |args|
if ![1,2,4,5].include?(args.length)
raise ArgumentError
elsif args.length <= 2
steps = [:cond, :then, :before, :after].map { |k| args.first[k].to_proc }
steps.push(args[1]) unless args[1].nil?
steps
else
steps = args[0..3].map { |arg| arg.to_proc }
steps.push(args[4]) unless args[4].nil?
steps
end
end
define_method :multirec do |*args|
cond_proc, then_proc, before_proc, after_proc, optional_value = separate_args.call(args)
worker_proc = lambda do |value|
if cond_proc.call(value)
then_proc.call(value)
else
after_proc.call(
before_proc.call(value).map { |sub_value| worker_proc.call(sub_value) }
)
end
end
if optional_value.nil?
worker_proc
else
worker_proc.call(optional_value)
end
end
define_method :linrec do |*args|
cond_proc, then_proc, before_proc, after_proc, optional_value = separate_args.call(args)
worker_proc = lambda do |value|
if cond_proc.call(value)
then_proc.call(value)
else
trivial_part, sub_problem = before_proc.call(value)
after_proc.call(
trivial_part, worker_proc.call(sub_problem)
)
end
end
if optional_value.nil?
worker_proc
else
worker_proc.call(optional_value)
end
end
module_function :multirec, :linrec
end
Now when we have trivial lambdas, we can use nice syntactic sugar to express them. string_to_proc
is not part of our recursive combinators, but making recursive combinators flexible, we make it "play well with others," which is a win for our code.
In Refactoring Methods with Recursive Combinators, we read the claim that by separating the recursion implementation from the declaration of how to perform the steps of an algorithm like #rotate
, we leave ourselves the opportunity to improve the performance of our implementation without the risk of adding bugs to our declaration.
In other words, we can optimize linrec
if we want to. Well, we want to. So what we're going to do is optimize its performance by trading time for space. Let's have a quick look at the worker_proc
lambda inside of linrec
:
worker_proc = lambda do |value|
if cond_proc.call(value)
then_proc.call(value)
else
trivial_part, sub_problem = before_proc.call(value)
after_proc.call(
trivial_part, worker_proc.call(sub_problem)
)
end
end
As you can see, it is recursive, it calls itself to solve each sub-problem. And here is an iterative replacement:
worker_proc = lambda do |value|
trivial_parts, sub_problem = [], value
while !cond_proc.call(sub_problem)
trivial_part, sub_problem = before_proc.call(sub_problem)
trivial_parts.unshift(trivial_part)
end
trivial_parts.unshift(then_proc.call(sub_problem))
trivial_parts.inject do |recombined, trivial_part|
after_proc.call(trivial_part, recombined)
end
end
This version doesn't call itself. Instead, it uses an old-fashioned loop, accumulating the results in an array. In a certain sense, this uses more explicit memory than the recursive implementation. However, we both know that the recursive version uses memory for its stack, so that's a bit of a wash. However, the Ruby stack is limited while arrays can be much larger, so this version can handle much larger data sets.
If you drop the new version of worker_proc
into the linrec
definition, each and every method you define using linrec
gets the new implementation, for free. This works because we separated the implementation of recursive divide and conquer algorithms from the declaration of the steps each particular algorithm. Here's our new version of linrec
:
define_method :linrec do |*args|
cond_proc, then_proc, before_proc, after_proc, optional_value = separate_args.call(args)
worker_proc = lambda do |value|
trivial_parts, sub_problem = [], value
while !cond_proc.call(sub_problem)
trivial_part, sub_problem = before_proc.call(sub_problem)
trivial_parts.unshift(trivial_part)
end
trivial_parts.unshift(then_proc.call(sub_problem))
trivial_parts.inject do |recombined, trivial_part|
after_proc.call(trivial_part, recombined)
end
end
if optional_value.nil?
worker_proc
else
worker_proc.call(optional_value)
end
end
recursive_combinators.rb contains the final, practical implementation of multirec
and linrec
. It's leaner and faster than the naive implementations shown in Refactoring Methods with Recursive Combinators. Rails users can drop it into config/initializers
and use it in their projects.
p.s. In an upcoming post, we'll talk about why multirec
and linrec
are implemented using define_method
instead of the def
keyword.
More on combinators: Kestrels, The Thrush, Songs of the Cardinal, Quirky Birds and Meta-Syntactic Programming, Aspect-Oriented Programming in Ruby using Combinator Birds, The Enchaining and Obdurate Kestrels, Finding Joy in Combinators, Refactoring Methods with Recursive Combinators, Practical Recursive Combinators, The Hopelessly Egocentric Blog Post, Wrapping Combinators, and Mockingbirds and Simple Recursive Combinators in Ruby.
My recent work:
- JavaScript Allongé, CoffeeScript Ristretto, and my other books.
- allong.es, practical function combinators and decorators for JavaScript.
- Method Combinators, a CoffeeScript/JavaScript library for writing method decorators, simply and easily.
- jQuery Combinators, what else? A jQuery plugin for writing your own fluent, jQuery-like code.
(Spot a bug or a spelling mistake? This is a Github repo, fork it and send me a pull request!)