[This post is literate Haskell, get the source here]
As most Haskell programmers know, there are two ways to fold a list: from the right with foldr and from the left with foldl. foldr is corecursive (productive), which is great when the output can be produced lazily. foldl (or better, its strict cousin foldl') is tail recursive, preventing stack overflows.
We can define analogous operations for other data structures like 1-dimensional arrays. Libraries like 'Data.ByteString' and 'Data.Vector' provide these. But as I will show in this post there are more fold operations than the common two.
The data type I will use in this post is simply
type Array1D a = Array Int a -- and two utility functions for getting the lower and upper bounds lo,hi :: Array1D a -> Int lo = fst . bounds hi = snd . bounds
The right fold applies a function f to the current value and the folded result of the rest of the array:
foldra :: (a -> b -> b) -> b -> Array1D a -> b foldra f z0 ar = go (lo ar) where go i | i > hi ar = z0 | otherwise = f (ar ! i) (go (i + 1))
The (strict) left fold uses an accumulator parameter:
foldl'a :: (b -> a -> b) -> b -> Array1D a -> b foldl'a f z0 ar = go z0 (lo ar) where go !z i | i > hi ar = z | otherwise = go (f z (ar ! i)) (i + 1)
In each case, the recursive go function is very similar in structure to the list version; only instead of recursing for the tail of the list we recurse for index i+1. The time and space behavior is also similar. For example, if you have a large array
testArray :: Array1D Integer testArray = listArray (1,10^6) [1..]
Then for computing something like the sum of all elements, you should use a strict left fold:
*Main> foldl'a (+) 0 testArray 50000005000000 *Main> foldra (+) 0 testArray *** Exception: stack overflow
On the other hand, a right fold is the way to go when you are only interested in a part of a lazily produced result. For example when converting an array to a list:
*Main> take 10 . foldra (:) [] $ testArray [1,2,3,4,5,6,7,8,9,10] (0.02 secs, 520824 bytes) *Main> take 10 . foldl'a (flip (:)) [] $ testArray [1000000,999999,999998,999997,999996,999995,999994,999993,999992,999991] (5.89 secs, 263122464 bytes)
All of this is exactly the same as with lists.
But, if you look at foldra and foldl'a, you will see that they both contain a loop doing (i + 1).
So in a sense, both of these functions work from left to right!
Because arrays allow for random access, it is possible to make true right to left folds, just start at the end and do (i - 1) in each iteration.
foldlb :: (b -> a -> b) -> b -> Array1D a -> b foldlb f z ar = go (hi ar) where go i | i < lo ar = z | otherwise = f (go (i - 1)) (ar ! i)
foldr'b :: (a -> b -> b) -> b -> Array1D a -> b foldr'b f z0 ar = go z0 (hi ar) where go !z i | i < lo ar = z | otherwise = go (f (ar ! i) z) (i - 1)
Just look at the pretty duality there! We now have a lazy left fold and a strict right fold.
The behavior is exactly the opposite of that of the folda functions above:
*Main> foldlb (+) 0 testArray *** Exception: stack overflow *Main> foldr'b (+) 0 testArray 50000005000000
*Main> take 10 . foldr'b (:) [] $ testArray [1,2,3,4,5,6,7,8,9,10] (6.19 secs, 263055372 bytes) *Main> take 10 . foldlb (flip (:)) [] $ testArray [1000000,999999,999998,999997,999996,999995,999994,999993,999992,999991] (0.00 secs, 524836 bytes)
To summarize, four ways to fold an array are:
| lo to hi, i+1 | hi to lo, i-1 | |
| corecursion, productive, lazy | foldra | foldlb |
| accumulator, tail recursive, strict | foldl'a | foldr'b |
Exercise: can you think of other ways to fold an array?
Comments
Here's a binary tree fold over arrays:
-- store bounds along for convenience data BoundedArray1D a = A (Array1D a) Int Int deriving Show fromArray1D a = A a (lo a) (hi a)
-- treatment of an array as a binary node. The node's -- tag is at the center. asNode :: BoundedArray1D a -> Maybe (BoundedArray1D a, a, BoundedArray1D a) asNode (A a l h) | l > h = Nothing | otherwise = let center = (l + h) `div` 2 lower = A a l (center-1) upper = A a (center+1) h in Just (lower, a ! center, upper)
-- fold upwards in the tree bfold :: (b -> a -> b -> b) -> b -> BoundedArray1D a -> b bfold f b arr = case asNode arr of Nothing -> b Just (l, a, h) -> f (bfold f b l) a (bfold f b h)
-- example bsum arr = bfold addThree 0 arr where addThree a b c = a+b+cThis should allow you to fold over an array in logarithmic time. Well, I think...
mapReduce
googles mapReduce is a fold, too. first fmap, to a Monoid, then mappend the results.mapReduce :: (a->b) -> (b->b->b) -> Array1D a -> bthe reduce is an associative fold.i'd prefer sth like a "square-fold":
squareFoldl' :: b -> (b->a->b) -> (b->b->b) -> Array1D a -> bthe first is a sequential foldl' over a part of the array, running in an own thread, the second combines the results of the threads (in an associative way).
It's worth noting that you can use both foldr and foldl' to sum up a list because + happens to be associative. If you have a non-associative binary operator you want to fold to the right in constant space, or if you want to fold to the left while lazily producing results, it's a good chance to use foldr'_b or foldl_b.Can you explain the definition of corecursion/productivity? Is corecursion another word for coinduction, which means that a function is defined to be a greatest fixed-point?