Fast loops for Haskell (for when GHC can't optimize forM_
), with benchmarks.
Summary:
forM_ [0..n]
is slow. This gives you:
import Control.Loop (forLoop)
forLoop 0 (< n) (+1) $ \i -> do
-- high performance loop body here
If you want an overflow check, use succ
instead of (+1)
; it's a bit slower.
On Hackage: cabal install loop
You (and everybody else) would like to write
forM_ [0..n] $ \i -> ...
Clearly GHC should optimize this list away, yielding a nice loop in the generated assembly.
Unfortunately, this isn't always true, and at the moment, it really doesn't happen reliably.
Unfortunately, this seems to be the most robustly fast (across all types I have tested it with) loop:
forLoop :: (Monad m) => a -> (a -> Bool) -> (a -> a) -> (a -> m ()) -> m ()
forLoop start cond inc f = go start
where
go !x | cond x = f x >> go (inc x)
| otherwise = return ()
{-# INLINE forLoop #-}
And you can use it as
forLoop 0 (< n) (+1) $ \i -> ...
Now, you probably don't like this, and neither do I, but it really looks like this is what you should use if you want to write high performance code.
It looks like C, but at least it's just as fast.
One might think that in many cases the loop incrementing performance doesn't matter too much, since the loop body should dominate the time spent.
That is true, but it is easy to get into cases where it doesn't. Some examples:
-
You want to write a test to make sure that reinterpret-casting a Word32 to Float and back gives you the same Word32. Using
forLoop
can make the difference if you have to wait 40 seconds for your test to pass or 3 seconds. (This is how I got into this topic.) -
You want to implement something that does trivial operations on unboxed vectors, say dot product or matrix multiplication.
forLoop
can make a 5x performance difference.
-
Maintaining the fastest way to loop in one place frees one from thinking about it, and I plan to keep this updated with the fastest implementation known (contributions welcome).
-
It gives us a place to discuss alternatives and collect benchmarks for high-performance looping.
This originates from the thread How to write fast for loops on Haskell-Cafe.
- bench/Bench.hs results
- bench/TraverseW32.hs results
- bench/FoldlAndIORefAreSlow.hs results - see also llvm, ghc7.8, ghc7.8/llvm
Some results:
- It is tricky to write an Enum-based general loop function that is fast for all data types. The way overflow checks are implemented in different types makes e.g. Int and Word32 behave differently; Word32 is slower.
[a..b]
is nice and can be just as fast as a manual loop - if you are lucky. Unfortunately, if you don't want to rely on luck for your program's performance, you can't have your[a..b]
(at the time being).- Vector's equivalent of
[a..b]
might not be as luck dependent, but suffers from a factor 5 penalty, which is hopefully a but (see John's post).