blog 21

Tue 2010-07-13 21:47

More composition operators

Here is an idea for some more function composition operators, beyond just (.):

(f .$ g) x = (f) . (g $ x)
(f $. g) x = (f $ x) . (g)
(f .$$ g) x y = (f) . (g $ x $ y)
(f $.$ g) x y = (f $ x) . (g $ y)
(f $$. g) x y = (f $ x $ y) . (g)
-- etc.
infixl 8 .$, $., .$$, $.$, $$. -- slightly less tight than (.)

The .$ name is supposed suggests that an extra argument is applied on the right before the functions are composed. Notice also that the dollars and dot on the left hand site match those on the right hand side. These combinators make writing point free code easier:

concatMap = concat .$ map
sum23 = (+) . (2*) $. (3*)  -- \x y -> 2*x + 3*y


Here is another family of composition operators:

(f $. g) x = (f) (g x)    -- a.k.a. (.)
(f .$ g) x = (f x) (g)    -- a.k.a. flip
(f $.. g) x y = (f) (g x y)
(f .$. g) x y = (f x) (g y)
(f ..$ g) x y = (f x y) (g)
(f $... g) x y z = (f) (g x y z)
(f .$.. g) x y z = (f x) (g y z)
(f ..$. g) x y z = (f x y) (g z)
(f ...$ g) x y z = (f x y z) (g)
-- etc.
infixl 8 $., .$, $..,.$.,..$, $...,.$..,..$.,...$

Think of the . as the placeholder for an argument. It would be better if I could use _, but Haskell doesn't allow that. You can also think of the dots as the points from point-free style, so these operators allow for the preservation of the number of points :). With these operators the previous example becomes:

concatMap = concat $.. map
sum23 = (+) $. (2*) .$. (3*)  -- \x y -> 2*x + 3*y

I like the second family better, because they do not use (.), which makes the first family more confusing. What do you think? Would these operators be useful in practice?

posted by Twan | 10 comments | tags = /haskell | permalink | rss
Mon 2009-11-09 02:11

Four ways to fold an array

[This post is literate Haskell, get the source here]

As most Haskell programmers know, there are two ways to fold a list: from the right with foldr and from the left with foldl. foldr is corecursive (productive), which is great when the output can be produced lazily. foldl (or better, its strict cousin foldl') is tail recursive, preventing stack overflows.

We can define analogous operations for other data structures like 1-dimensional arrays. Libraries like 'Data.ByteString' and 'Data.Vector' provide these. But as I will show in this post there are more fold operations than the common two.

The data type I will use in this post is simply

type Array1D a = Array Int a
-- and two utility functions for getting the lower and upper bounds
lo,hi :: Array1D a -> Int
lo = fst . bounds
hi = snd . bounds

The right fold applies a function f to the current value and the folded result of the rest of the array:

foldra :: (a -> b -> b) -> b -> Array1D a -> b
foldra f z0 ar = go (lo ar)
  where go i
         | i > hi ar = z0
         | otherwise = f (ar ! i) (go (i + 1))

The (strict) left fold uses an accumulator parameter:

foldl'a :: (b -> a -> b) -> b -> Array1D a -> b
foldl'a f z0 ar = go z0 (lo ar)
  where go !z i
         | i > hi ar = z
         | otherwise = go (f z (ar ! i)) (i + 1)

In each case, the recursive go function is very similar in structure to the list version; only instead of recursing for the tail of the list we recurse for index i+1. The time and space behavior is also similar. For example, if you have a large array

testArray :: Array1D Integer
testArray = listArray (1,10^6) [1..]

Then for computing something like the sum of all elements, you should use a strict left fold:

*Main> foldl'a (+) 0 testArray
50000005000000
*Main> foldra (+) 0 testArray
*** Exception: stack overflow

On the other hand, a right fold is the way to go when you are only interested in a part of a lazily produced result. For example when converting an array to a list:

*Main> take 10 . foldra (:) [] $ testArray
[1,2,3,4,5,6,7,8,9,10]
(0.02 secs, 520824 bytes)
*Main> take 10 . foldl'a (flip (:)) [] $ testArray
[1000000,999999,999998,999997,999996,999995,999994,999993,999992,999991]
(5.89 secs, 263122464 bytes)

All of this is exactly the same as with lists.


But, if you look at foldra and foldl'a, you will see that they both contain a loop doing (i + 1). So in a sense, both of these functions work from left to right!

Because arrays allow for random access, it is possible to make true right to left folds, just start at the end and do (i - 1) in each iteration.

foldlb :: (b -> a -> b) -> b -> Array1D a -> b
foldlb f z ar = go (hi ar)
  where go i
         | i < lo ar = z
         | otherwise = f (go (i - 1)) (ar ! i)
foldr'b :: (a -> b -> b) -> b -> Array1D a -> b
foldr'b f z0 ar = go z0 (hi ar)
  where go !z i
         | i < lo ar = z
         | otherwise = go (f (ar ! i) z) (i - 1)

Just look at the pretty duality there! We now have a lazy left fold and a strict right fold.

The behavior is exactly the opposite of that of the folda functions above:

*Main> foldlb (+) 0 testArray
*** Exception: stack overflow
*Main> foldr'b (+) 0 testArray
50000005000000
*Main> take 10 . foldr'b (:) [] $ testArray
[1,2,3,4,5,6,7,8,9,10]
(6.19 secs, 263055372 bytes)
*Main> take 10 . foldlb (flip (:)) [] $ testArray
[1000000,999999,999998,999997,999996,999995,999994,999993,999992,999991]
(0.00 secs, 524836 bytes)

To summarize, four ways to fold an array are:

lo to hi, i+1hi to lo, i-1
corecursion, productive, lazyfoldrafoldlb
accumulator, tail recursive, strictfoldl'afoldr'b

Exercise: can you think of other ways to fold an array?

posted by Twan | 8 comments | tags = /haskell | permalink | rss
Mon 2009-07-20 03:24

CPS based functional references

[This post is literate Haskell, download source code here]

I have recently come up with a new way of representing functional references.

As you might recall, functional references (also called lenses) are like a pointer into a field of some data structure. The value of this field can be extracted and modified. For example:

GHCi> get fstF (123,"hey")
123
GHCi> set fstF 456 (123,"hey")
(456,"hey")
GHCi> modify fstF (*2) (123,"hey")
(246,"hey")

where fstF is a functional reference to the first element of a pair. It has the type RefF (a,b) a, i.e. in a 'record' of type (a,b) it points to an a.

Previous representations relied on a record that contained the get and set or the get an modify functions. But there is a much nicer looking representation possible using Functors.


First of all we will need a language extension and some modules:

{-# LANGUAGE Rank2Types #-}
import Control.Applicative
import Control.Monad.Identity

Now the representation for functional references I came up with is:

type RefF a b = forall f. Functor f => (b -> f b) -> (a -> f a)

This type looks a lot like a continuation passing style function, which would be simply (b -> r) -> (a -> r), but where the result is f a instead of any r. With different functors you get different behaviors. With the constant functor we can get the field pointed to:

get :: RefF a b -> a -> b
get r = getConst . r Const

While the identity functor allows a function us to modify the field:

modify :: RefF a b -> (b -> b) -> a -> a
modify r m = runIdentity . r (Identity . m)

set :: RefF a b -> b -> a -> a
set r b = modify r (const b)

As an example of an 'instance', here is the fstF function I used in the introduction:

fstF :: RefF (a,b) a
fstF a_to_fa (a,b) = (\a' -> (a',b)) <$> a_to_fa a

If we had tuple sections it could be written as simply

fstF x (a,b) = (,b) <$> x a


To get access to inner fields, functional references can be composed. So compose fstF fstF points to the first element inner inside the first outer element of a nested pair. One of the things that I like about the cps/functor based representation is that composition is quite beautiful and symmetric:

compose :: RefF b c -> RefF a b -> RefF a c
compose r s = s . r

idF :: RefF a a
idF = id

Let me conclude with the pair operator, called (***) in Control.Arrow. Unfortunately this operator is not as easy to define.

pair :: RefF a c -> RefF b d -> RefF (a,b) (c,d)
pair r s cd_to_fcd (a,b) = some_ugly_code

In fact, the only way I know of implementing pair is by moving back and forth to a get/set representation

 where some_ugly_code =
         let fcd = cd_to_fcd (get r a, get s b)      -- :: f (c,d)
             cd_to_ab (c,d) = (set r c a, set s d b) -- :: (c,d) -> (a,b)
         in fmap cd_to_ab fcd                        -- :: f (a,b)

The problem is that we need to split one function of type (c,d) -> f (c,d) into two, c -> f c and d -> f d, because that is what the left and right arguments expect. Then later, we would need to do the reverse and combine two of these functions again.

Does anyone have a better suggestion for implementing pair?

posted by Twan | 17 comments | tags = /haskell | permalink | rss
Sat 2009-04-25 16:03

Where do I get my non-regular types?

[This post is literate Haskell, view the source code here]

Friday I wrote about the type

data FunList a b
    = Done b
    | More a (FunList a (a -> b))

Where did this type come from? What can you use it for?

The story starts with another way of constructing FunLists, besides pure. For contrast I will call it 'impure'.

impure :: a -> FunList a a
impure a = More a (Done id)

I claim that any FunList can be written in the form

pure b <*> impure a1 <*> impure a2 <*> ...

for some b and a1, a2, etc. In other words, impure and Applicative are all that you need. The following function converts a FunList to the above form, where impure and the Applicative instance are left as parameters:

withImpure :: Applicative f => (a -> f a) -> FunList a b -> f b
withImpure imp (Done b)   = pure b
withImpure imp (More a f) = withImpure imp f <*> imp a

If you use this with the Applicative instance from last time you will find that getAs . withImpure impure = reverse . getAs!, I have written a reverse function without realizing it. Since this time we don't want to reverse the list, I am going to turn the Applicative instance around for this post:

instance Applicative (FunList a) where
    pure = Done
    c <*> Done b   = fmap ($b) c
    c <*> More a z = More a ((.) <$> c <*> z)

To support my claim above I need to prove that withImpure impure = id. This is a simple exercise in proof by induction. First of, we have that

withImpure impure (Done b) = pure b = Done b

Now assume that the theorem holds for z, i.e. withImpure impure z = z. Then

  withImpure impure (More a z)
= withImpure impure z <*> impure a
= z <*> impure a -- by induction hypotheis
= z <*> More a (Done id)
= More a ((.) <$> z <*> Done id)
= More a (fmap ($id) (fmap (.) z))
= More a (fmap (.id) z)
= More a z

By induction withImpure impure z = z for all z.


I actually came upon FunList from the other direction. I started with the higher order type

type ApplicativeFunList a b = forall f. Applicative f => (a -> f a) -> f b

An ApplicativeFunList is a function of the form \imp -> applicativeStuff. Since the applicativeStuff has to work for any applicative functor it can only use operations from that class in addition to the imp argument. Because of the Applicative laws, things like anything <*> pure x are the same as ($x) <$> anything, so the only interesting functions of this form are

\imp -> pure b
\imp -> pure b <*> imp a1
\imp -> pure b <*> imp a1 <*> imp a2
-- etc.

Which is precisely what a FunList can represent! Indeed, we can convert any FunList to an ApplicativeFunList, and back again:

toAFL :: FunList a b -> ApplicativeFunList a b
toAFL fl imp = withImpure imp fl

fromAFL :: ApplicativeFunList a b -> FunList a b
fromAFL afl = afl impure

We already know that fromAFL . toAFL = withImpure impure = id. The other way around, I claim (but do not prove yet) that toAFL . fromAFL = id. Hence, FunList and ApplicativeFunList are isomorphic!

posted by Twan | 2 comments | tags = /haskell | permalink | rss
Thu 2009-04-23 21:50

A non-regular data type challenge

[This post is literate Haskell, download source code here]

While playing around with generalized functional references I encountered the following list-like data type:

data FunList a b
    = Done b
    | More a (FunList a (a -> b))

This is a non-regular data type, meaning that inside the FunList a b there is a FunList a not-b. So, what does a value of this type look like? Well, it can be

We either have just b, or an a and a function a->b, or two as (i.e. a2) and a function a2->b, or a3 and a3->b, etc.

A FunList a b is therefore a list of as together with a function that takes exactly that number of as to give you a b. Extracting the single represented b value is easy:

getB :: FunList a b -> b
getB (Done b)   = b
getB (More a z) = getB z a

As is getting to the list of as:

getAs :: FunList a b -> [a] 
getAs (Done _)   = []
getAs (More a z) = a : getAs z

But then things quickly get much trickier. Since a FunList a b holds exactly one b, we might ask how much access we have to it. First of, FunList a is a Functor, so the b value can be changed:

instance Functor (FunList a) where
    fmap f (Done b)   = Done (f b)
    fmap f (More a z) = More a (fmap (f .) z)

The above case for More looks a bit strange, but remember that the data type is non-regular, so we recurse with a different function f. In this case instead of having type b -> c as the outer f does, we need something with type (a -> b) -> (a -> c).

The Applicative instance is even stranger. There is a flip there, where the heck did that come from?

instance Applicative (FunList a) where
    pure = Done
    Done b   <*> c = fmap b c                    -- follows from Applicative laws
    More a z <*> c = More a (flip <$> z <*> c)   -- flip??

Aside from manipulating the b value we can also do more list like things to the list of as, such as zipping:

zipFun :: FunList a b -> FunList c d -> FunList (a,c) (b,d)
zipFun (Done b)   d          = Done (b,getB d)
zipFun b          (Done d)   = Done (getB b,d)
zipFun (More a b) (More c d) = More (a,c) (applyPair <$> zipFun b d)
    where applyPair (f,g) (x,y) = (f x,g y)

Surprisingly, the applicative operator defined above can be used as a kind of append, just look at the type:

(<*>) :: FunList a (b -> c) -> FunList a b -> FunList a c

it takes two 'lists' and combines them into one. It is indeed true that getAs a ++ getAs b == getAs (a <*> b).

This is as far as I got, so I will end this post with a couple of challenges:

posted by Twan | 16 comments | tags = /haskell | permalink | rss
Wed 2008-12-10 22:06

Knight in n, part 4: tensors

[This post is literate Haskell, get the source here]

Previously in this series:

Welcome to the fourth installement of the Knight in n series. In part 3 we talked about the direct product of rings, and how they helped us solve the knight moves problem. This time yet another type of product is going to help in decomposing the algorithm to allow faster parts to be put in.

The tensor product of rings

In part three I introduced the direct product on rings, which is nothing more than a pair of numbers. Confusingly this operation is also called direct sum. To illustrate this name, take the direct sum/product of Array i a with Array j b. For every index i (within the bounds of the first array) there is a value of type a, and for every index j there is a value of type b. Instead of a pair of arrays, this could also be implemented as a single array with the type Array (Either i j) (Either a b). "Either" is just Haskell's way of saying "disjoint union" or "sum type", hence "direct sum".

There is another product operation that we can perform on two rings: the tensor product. Dually to the direct sum, the tensor product of Array i a and Array j b has type Array (i,j) (a,b). The definition is very simple: the array contains all pairs where the first part comes from the first array, and the second part comes from the second array.

Slightly more generally, we can use any combining function. The general tensor product of two arrays can be implemented as:

tensorWith :: (Ix i, Ix j) => (a -> b -> c) -> Array i a -> Array j b -> Array (i,j) c
tensorWith f a b
    = array ((alo,blo),(ahi,bhi))
      [ ((i,j), f x y) | (i,x) <- assocs a, (j,y) <- assocs b ]
  where (alo,ahi) = bounds a
        (blo,bhi) = bounds b

Usually elements are multiplied:

(><) :: (Ix i, Ix j, Num a) => Array i a -> Array j a -> Array (i,j) a
(><) = tensorWith (*)

The mathematical notation for this (><) operator is ⊗. Now an example: Here we take two 4-element vectors, their tensor product has 4*4=16 elements. The two vectors are "one dimensional*" objects, their tensor product is a "two dimensional" matrix.


	\begin{pmatrix}
	 1\\2\\3\\4
	\end{pmatrix}
	 \otimes
	\begin{pmatrix}
	 a\\c\\d\\e
	\end{pmatrix}
	 = 
	\begin{pmatrix}
	 1a & 1b & 1c & 1d \\
	 2a & 2b & 2c & 2d \\
	 3a & 3b & 3c & 3d \\
	 4a & 4b & 4c & 4d \\
	\end{pmatrix}

A special case we will use often is the tensor product of an array with itself:

square x = x >< x

For example (using simple reflection of expressions which is now on hackage as Debug.SimpleReflect):

Knight4> square (listArray (0,2) [u,v,w])
listArray ((0,0),(2,2)) [u*u, u*v, u*w
                        ,v*u, v*v, v*w
                        ,w*u, w*v, w*w]

Interchange law

The tensor product and convolution operations satisfy the very useful interchange law:


	(a \otimes b) * (c \otimes d) = (a * c) \otimes (b * d)

And since exponentiation is repeated convolution, also


	(a \otimes b)^n = a^n \otimes b^n

For a proof sketch of this equation, compare the definitions of (><) and mulArray. Ignoring array bounds stuff, we have:

convolution:     [ ( i+j,  x*y) | (i,x) <- assocs a, (j,y) <- assocs b ]
tensor product:  [ ((i,j), x*y) | (i,x) <- assocs a, (j,y) <- assocs b ]

The only difference is in what happens to indices, with convolution the indices are added, with the tensor product a pair is formed. Now consider the interchange law. Informally, the indices of the left hand side are of the form (ia,ib)+(ic,id), and on the right hand side (ia+ic,ib+id). This corresponds exactly to the piecewise addition for Num (α,β).

The interchange law is often exploited to perform faster convolutions. For example, consider blurring an image by taking the convolution with a Gaussian blur kernel:
image*blur=blurred_image
Performing this convolution requires O(n4) operations for an n by n image.

The two dimensional Gaussian blur kernel can be written as the tensor product of two one dimensional kernels, with a bit algebra this gives:

So now to blur an image we can perform two convolution, first with the horizontal kernel, and then with the vertical one:
image*blurH*blurV=blurred_image
This procedure needs only O(n3) operations.

Back to business

Blurring images is not what we are trying to do. Instead of convolution with the Gaussian blur kernel, we are interested in convolution with moveMatrix. We could try the same trick, finding an a such that moveMatrix == a >< a. Unfortunately, this is impossible.

But we can still get close, there is a way to write moveMatrix == square a + square b, well, almost. Actually, what we have is:

2 * moveMatrix
      0 2 0 2 0        1 1 0 1 1      1 -1  0 -1  1 
      2 0 0 0 2        1 1 0 1 1     -1  1  0  1 -1 
 ==   0 0 0 0 0   ==   0 0 0 0 0  -   0  0  0  0  0   ==   square a - square b
      2 0 0 0 2        1 1 0 1 1     -1  1  0  1 -1 
      0 2 0 2 0        1 1 0 1 1      1 -1  0 -1  1 

where

a,b :: Array Int Integer
a = listArray (-2,2) [1,1,0,1,1]
b = listArray (-2,2) [1,-1,0,-1,1]

Now we can start with pathsconv from last time:

pathsconv n ij = (moveMatrix ^ n) `safeAt` ij

Where safeAt is a safe array indexing operator, that returns 0 for indices that are out of bounds:

safeAt ar i
    | inRange (bounds ar) i = ar ! i
    | otherwise             = 0

Now let's do some algebraic manipulation:

    pathsconv n ij
= {- by definition of pathsconv -}
    (moveMatrix ^ n) `safeAt` ij
= {- by defintion of a and b -}
    ((square a - square b) `div` 2)^n `safeAt` ij -- division by 2 is pseudocode
= {- division does not depend on the index -}
    (square a - square b)^n `safeAt` ij `div` 2^n

We still cannot apply the interchange law, because the exponentiation (^n) is applied to the difference of two tensor products and not a single one. We can, however, expand this exponentation by the formula:

(a + b)^n = sum [ multinomial [na,nb] * a^na * b^nb | (na,nb) <- split n ]

This is just the usual binomial expansion, as in


	(x+y)^2 &= x^2 + 2xy + y^2\\
	(x+y)^3 &= x^3 + 3x^2y + 3xy^2 + y^3\\
	\text{etc.}

Applying binomial expansion to our work-in-progress gives:

    (square a - square b)^n `safeAt` ij `div` 2^n
= {- binomial expansion -}
    sum [ multinomial [na,nb] * square a^na * (-square b)^nb
        | (na,nb) <- split n ]
    `safeAt` ij `div` 2^n
= {- (-square b)^nb == (-1)^nb * square b^nb -}
    sum [ multinomial [na,nb] * (-1)^nb
        * square a^na * square b^nb
        | (na,nb) <- split n ]
    `safeAt` ij `div` 2^n
= {- interchange law -}
    sum [ multinomial [na,nb] * (-1)^nb
        * square (a^na * b^nb)
        | (na,nb) <- split n ]
    `safeAt` ij `div` 2^n
= {- move `safeAt` inwards, since addition is pointwise -}
    sum [ multinomial [na,nb] * (-1)^nb
        * square (a^na * b^nb) `safeAt` ij
        | (na,nb) <- split n ]
    `div` 2^n

Fast indexing

Since square something already has n2 elements and the loop is performed n+1 times, this algorithm still requires O(n3) operations.

The only reason for calculating square (a^na * b^nb) is because we need the element at index ij. So instead of constructing a whole array, let's just calculate that single element:

-- square x `safeAt` ij  ==  x `squareAt` ij
x `squareAt` (i,j) = x `safeAt` i * x `safeAt` j

So the inner part of the algorithm becomes:

    square (a^na * b^nb) `safeAt` ij
= {- property of squareAt -}
    (a^na * b^nb) `squareAt` ij

We are still not there yet. Both a^na and b^nb have O(n) elements, so just calculating their convolution takes O(n2) work. But again, we need only two elements of the convolution, so we can define:

-- a * b `safeAt` i  ==  mulArrayAt a b i
mulArrayAt a b n = sum [ x * b `safeAt` (n-i) | (i,x) <- assocs a ]

And update squareAt accordingly:

mulSquareAt a b (i,j) = mulArrayAt a b i * mulArrayAt a b j

Finally we need a more efficient way to calculate all the powers of a and b. The iterate function can help us with that:

Knight4> iterate (*u) 1
[1, 1*u, 1*u*u, 1*u*u*u, 1*u*u*u*u, ...

Putting the pieces together gives a O(n2) algorithm for the knight moves problem:

pathstensor n ij
      = sum [ multinomial [na,nb] * (-1)^nb
            * mulSquareAt (powers_of_a !! na) (powers_of_b !! nb) ij
            | (na,nb) <- split n
            ]
       `div` 2^n
  where powers_of_a = iterate (*a) 1
        powers_of_b = iterate (*b) 1

Note that the savings we have made do not come directly from decomposing the moveMatrix. It is just that this decomposition allows us to see that we are computing all elements of am expensive product where a single one would do.

This post brings another order of improvement. Do you think you can do better than O(n2) time and O(n2) space complexity? If so I would like to hear.


*: The number of elements is often called the dimension of a vector. Here we use the term dimension to refer to the number of indices used, also known as the (tensor) order. So a 100*100 pixel image has dimension 10000 according to the first interpretation (the number of elements), but dimension two in the second interpretation (the number of indices).

posted by Twan | 2 comments | tags = /haskell | permalink | rss
Thu 2008-12-04 19:00

Knight in n, part 3: rings

[This post is literate Haskell, get the source here]

Previously in this series:

In this third installment, we will look at how to use various types as numbers, i.e. how to make them an instance of the Num type class. The solution the Knight-moves-problem will emerge at the end, almost as if by magic. :)

Tangent: Things as numbers

Many types can be used as if they are numbers. Haskell-wise this means they can be an instance of the Num type class. Mathematically it means that these types are rings.

Pairs as numbers

Let's start with a Num instance for pairs (α,β). In general, our only choice is to do everything pointwise. So for all operations ⊗ (i.e. (+), (-) and (*):


	\begin{pmatrix}a\\b\end{pmatrix}
	 \otimes
	\begin{pmatrix}c\\d\end{pmatrix}
	 =
	\begin{pmatrix}a\otimes c\\b \otimes d\end{pmatrix}

In ring theory this is called the direct product. In Haskell we can write it as:

instance (Num α, Num β) => Num (α,β) where
    (a,b) + (c,d) = (a+c,b+d)
    (a,b) - (c,d) = (a-c,b-d)
    (a,b) * (c,d) = (a*c,b*d)
    fromInteger i = (fromInteger i, fromInteger i)
    abs     (a,b) = (abs    a, abs    b)
    signum  (a,b) = (signum a, signum b)

We could also make instances for triples, quadruples and other tuples this way, but those are not needed for the rest of the story.

Arrays as numbers

A more general kind of tuple is an array; which is somewhat like a tuple of arbitrary size. Of course, that is not quite true, since two arrays with the same type can have a different size. One way around this problem is to treat all arrays as if they are infinite, by taking values outside the bounds to be equal to 0. So

listArray (0,0) [1] == listArray (-∞,∞) [..,0,0,1,0,0,..] -- pseudocode

That way we can still do addition pointwise,


	\begin{pmatrix}a\\b\\\vdots\end{pmatrix}
	 +
	\begin{pmatrix}c\\d\\\vdots\end{pmatrix}
	 =
	\begin{pmatrix}a+c\\b+d\\\vdots\end{pmatrix}

The accumArray function can help with the missing elements by setting them to 0 by default:

addArray a b = accumArray (+) 0 (min alo blo, max ahi bhi) (assocs a ++ assocs b)
  where (alo,ahi) = bounds a
        (blo,bhi) = bounds b

Next up is the fromInteger function. fromInteger 0 is easy; there are two options for other values

  1. fromInteger i is an infinite array of values i.
  2. fromInteger i is an array with values i at some single point.

The first choice mimics the definition for tuples, fromInteger i = (fromInteger i, fromInteger i). But for arrays this has the slight problem of requiring an infinite array. For the second alternative we need to pick the index where to put the number i. The obvious choice is to put i at 'the origin', index 0:

intArray i = listArray (0,0) [fromInteger i]

Finally, multiplication. As you have learned in school, multiplication can be seen as repeated addition, In our Haskell world that means that we expect the law a + a = fromInteger 2 * a to hold.

If we had used the first choice for fromInteger then multiplication could be done pointwise as it was for tuples. But we have made a different choice, so now fromInteger 2 is an array that contains the value 2 at index 0 (and is implicitly zero everywhere else). When calculating fromInteger 2 * a, this 2 should by multiplied with all elements of the array a.

The operation that does the right thing is convolution. It looks like this:


	\begin{pmatrix}a\\b\\c\end{pmatrix}
	 *
	\begin{pmatrix}d\\e\\f\end{pmatrix}
	 \; = \;
	\raisebox{5mm }{$a\begin{pmatrix}d\\e\\f\end{pmatrix}$} +
	\raisebox{0mm }{$b\begin{pmatrix}d\\e\\f\end{pmatrix}$} +
	\raisebox{-5mm}{$c\begin{pmatrix}d\\e\\f\end{pmatrix}$}
	 \; = \;
	\left(\begin{array}{@{}c@{}c@{}c@{}c@{}c@{}}
	     ad&&&&\\ae&+&bd&&\\af&+&be&+&cd\\&&bf&+&ce\\&&&&cf
	\end{array}\right)

So for each element v at index i in the first array, we shift a copy of the second array so that its origin becomes i. This copy is multiplied by v and all these copies are added. If one of the arrays is fromInteger v (i.e. a scalar), then this corresponds to multiplying all elements in the other array by v; exactly what we wanted.

Convolution can be implemented with accumArray as:

mulArray a b
    = accumArray (+) 0 (bounds a + bounds b)
      [ (i+j, x*y) | (i,x) <- assocs a, (j,y) <- assocs b ]

Notice that we use the Num (α,β) instance for the bounds, and that this definition is a nicely symmetrical.

Putting it all together, we get the following instance:

instance (Ix i, Num i, Num a) => Num (Array i a) where
    fromInteger = intArray
    (+)         = addArray
    (*)         = mulArray
    negate      = fmap negate
    abs         = fmap abs
    signum      = fmap signum

In mathematical terms, what we constructed here is called a group ring. There is a group ring G[R] for any group G and ring R, which corresponds to an instance Num (Array g r) when g is a group (i.e. an instance of Num) and r is a ring (also an instance of Num).

Arrays as polynomials

Another way to interpret the above instance, is by treating arrays as polynomials over some variable x. The array array [(i,a),(j,b),(k,c),..] then represents the polynomial axi+bxj+cxk+.... The addition and multiplication defined above now have the expected meaning, for example:

> let a = listArray (0,2) [2,3,4]  --  2 + 3x + 4x2
> let b = listArray (1,2) [5,6]    --  5x + 6x2
> a + b
array (0,2) [(0,2),(1,8),(2,10)]   --  2 + 8x + 10x2
> a * b
array (1,4) [(1,10),(2,27),(3,38),(4,24)]  --  10x + 27x2 + 38x3 + 24x4

We can make this even more suggestive by defining:

x = listArray (1,1) [1]
> (2 + 3*x + 4*x^2) * (5*x + 6*x^2) == 10*x + 27*x^2 + 38*x^3 + 24*x^4
True

If you are interested in this interpretation, sigfpe wrote an interesting blog post about convolutions, polynomials and power series.

It's magic!

Now, let's go back to our original problem, the moves of a chess knight.

The positions reachable in a single move can be put into a two dimensional array (i.e. a matrix).

moveMatrix :: Array (Int,Int) Integer
moveMatrix = accumArray (+) 0 ((-2,-2),(2,2)) [ (m,1) | m <- moves ]

This is the familiar move matrix, which we already saw in part 1.

Knight3> printMatrix moveMatrix
    0 1 0 1 0
    1 0 0 0 1
    0 0 0 0 0
    1 0 0 0 1
    0 1 0 1 0

Now the magic. We defined multiplication of two arrays a and b as adding copies of b for each value in a. If we use the move matrix as b, then this means we add all possible destinations of a knight making one move from each place it can reach. Repeating this n times gives us our answer. Since repeated multiplication is exponentiation:

allPathsconv n = moveMatrix ^ n

For example, for n=2:
moveMatrix * moveMatrix

If we are interested in just a single point there is the array indexing operator (!!) to help us,

pathsconv n ij
    | inRange (bounds m) ij = m ! ij
    | otherwise             = 0
  where m = allPathsconv n

This convolutional algorithm can count the number of paths in O(n3), but not just for a single end point, but for all end points at once! The program is also a lot simpler than the

The pathsconv algorithm is pretty good, but we can still do better. Next time I will show how the algorithm from part 3 can be improved further, and curiously, how it will start to look more like the algorithm from part 2.

posted by Twan | 4 comments | tags = /haskell | permalink | rss
Mon 2008-12-01 23:22

Knight in n, part 2: combinatorics

[This post is literate Haskell, get the source here]

Previously in this series:

In my previous post I introduced the 'knight moves problem': How many ways are there for a chess knight to reach cell (i,j) in exactly n moves? The recursive solution from last time is horribly inefficient for larger values of n. Today I will show some more efficient solutions.

Ignoring the order of moves

If the knight first makes a move (-1,2) and then a move (2,1) it will end up at (1,3). If it first moves (2,1) and then (-1,2) it will also end up at (1,3). So, the order in which the moves happen does not matter for the final position! We can exploit this fact to make a faster program. Instead of determining what move to make at each step, we can count how many moves we make of each type and then determine in how many different orders these moves can be performed.

Denote by n1 the number of moves of the first type, n123 the number of moves of type 1, 2 or 3, etc. So n1234 = n1+n2+n3+n4, and since there are eight different moves, n12345678 = n. A count nab can be split into na+nb in several ways, for now we will consider all possibilities:

split n = [ (i,n-i) | i <- [0..n] ]

So for example, split 3 = [(0,3),(1,2),(2,1),(3,0)].

By repeatedly splitting n we arrive at:

pathssplit n (i,j) = sum $ do
    let n12345678 = n
    (n1,n2345678) <- split n12345678
    (n2,n345678) <- split n2345678
    (n3,n45678) <- split n345678
    (n4,n5678) <- split n45678
    (n5,n678) <- split n5678
    (n6,n78) <- split n678
    (n7,n8) <- split n78
    let counts = [n1,n2,n3,n4,n5,n6,n7,n8]
    guard $ (i,j) == destination counts
    return $ multinomial counts

Here we only keep sequences of moves that end up in (i,j), as determined by the destination function:

destination counts = (sum hs, sum vs)
    where (hs,vs) = unzip [ (n*δi,n*δj) | (n,(δi,δj)) <- zip counts moves ]

Next, we need to know how many different paths can be formed with a particular set of moves. You might remember binomial coefficients from high school, which give the number of ways to pick k items from a set of size n: multinomial [2,1,1]


	\binom{n}{k} = \frac{n!}{k!(n-k)!}

If we take n equal to m+k we get the number of different lists containing exactly k red balls and m green balls. Or put differently, the number of different paths containing k moves of the first type and m moves of the second type. This interpretation of binomial coefficients can be generalized two more than two types, giving multinomial coefficients. These are exactly what we need to determine the number of paths given the counts of each type of move:

multinomial xs | any (< 0) xs = 0
multinomial xs = factorial (sum xs)
               `div` product (map factorial xs)

This multinomial function requires calculating a lot of factorials, to make this as fast as possible they should be stored in an 'array':

factorial :: Int -> Integer
factorial = unboundedArray $ scanl (*) 1 [1..]

Calculating pathssplit only takes O(n7) integer operations, since each split effectively costs a factor n. While this is better than the previous result, it is still not satisfactory.

Solving the guard condition

The above function uses a "generate and test" approach: Generate all possibilities and test which ones reach the destination. It would be more efficient to generate only those possibilities.

Algebraic reasoning can help us here. Let's start by expanding the condition in the guard statement:

   (i,j) == destination counts
= {- by definition of destination -}
   (i,j) == (sum hs, sum vs)
     where (hs,vs) = unzip [ (n*δi,n*δj) | (n,(δi,δj)) <- zip counts moves ]
= {- expand unzip and simplify -}
   i == sum (zipWith (*) counts (map fst moves) &&
   j == sum (zipWith (*) counts (map snd moves)
= {- by definition of moves (see previous post) -}
   i == sum (zipWith (*) counts [2,2,-2,-2,1,-1,1,-1] &&
   j == sum (zipWith (*) counts [1,-1,1,-1,2,2,-2,-2]
= {- expanding the sum and product, remember n12 = n_1+2, etc. -}
   i == 2*n12*2 - 2*n34 + n57 - n68 &&
   j == 2*n56*2 - 2*n78 + n13 - n24
= {- reordering -}
   n57 - n68 == i - 2*n12*2 + 2*n34 &&
   n13 - n24 == j - 2*n56*2 + 2*n78

These are equations we can work with. Take the equation involving i. We know that n57 + n68 = n5678, and that n57 - n68 == i - 2*n12*2 + 2*n34. From these two equations, we can solve for n57 and n68, without needing an expensive split:

-- | find a and b such that a+b == c, a-b == d, a,b >= 0
solvepm c d
   | ok == 0 && a >= 0 && a <= c = return (a,c-a)
   | otherwise                   = mzero
 where (a,ok) = (c + d) `divMod` 2

This gives an O(n5) algorithm:

pathspm n (i,j) = sum $ do
    let n12345678 = n
    (n1234,n5678) <- split n12345678
    (n12,n34) <- split n1234
    (n56,n78) <- split n5678
    (n57,n68) <- solvepm n5678 (i - 2*n12 + 2*n34)
    (n13,n24) <- solvepm n1234 (j - 2*n56 + 2*n78)
    (n1,n2) <- split n12
    let n3 = n13 - n1
    let n4 = n24 - n2
    (n5,n6) <- split n56
    let n7 = n57 - n5
    let n8 = n68 - n6
    return $ multinomial [n1,n2,n3,n4,n5,n6,n7,n8]

Multinomial laws

It turns out that we don't actually need to know n1, n2, etc. If you think about it, the multinomial coefficient [n1,n2,n3,n4,n5,n6,n7,n8] means: "The number of different lists with n1 red balls, n2 of green balls, etc.". To make such a list we can first pick where to put the red balls, then where to put the blue balls, then the green balls and so on.

But we could also first decide where the brightly colored balls (red and green) go and where the dark collored ones (blue) go. Now there are only two types of balls, so this is a binomial coefficient, or in terms of a multinomial, multinomial [nrg,nb]. Then for the positions with brightly colored balls, we need to determine which ones are which color, which can be done in multinomial [nr,ng] ways. In a picture:

multinomial [2,1,1] = multinomial [3,1] * multinomial [2,1]

This same arguments also holds when there are eight types of balls (or moves), so

multinomial [n1,n2,n3,n4,n5,n6,n7,n8]
 == multinomial [n12,n34,n56,n78]
  * multinomial [n1,n2] * multinomial[n3,n4]
  * multinomial [n5,n6] * multinomial[n7,n8]

If you plug this into the pathspm function, you might notice that the last part of the function is calculating the product of two independent things. One part is about n1..n4 and the other about n5..n8. Now remember that the function paths takes the sum of all possibilities, and that products distributes over sums. This means that the two loops for n1234 and n5678 can be performed independently, giving us an O(n4) algorithm:

pathsO4 n (i,j) = sum $ do
    let n12345678 = n
    (n1234,n5678) <- split n12345678
    (n12,n34) <- split n1234
    (n56,n78) <- split n5678
    (n57,n68) <- solvepm n5678 (i - 2*n12 + 2*n34)
    (n13,n24) <- solvepm n1234 (j - 2*n56 + 2*n78)
    let result1234 = sum $ do
         (n1,n2) <- split n12
         let n3 = n13 - n1
         let n4 = n24 - n2
         return $ multinomial [n1,n2] * multinomial[n3,n4]
    let result5678 = sum $ do
         (n5,n6) <- split n56
         let n7 = n57 - n5
         let n8 = n68 - n6
         return $ multinomial [n5,n6] * multinomial[n7,n8]
    return $ multinomial [n12,n34,n56,n78] * result1234 * result5678

Here both of the result parts are of the form

sum [ multinomial [a,b] * multinomial[x-a,y-b] | (a,b) <- split n ]

which just so happens to be equivalent to just multinomial [x,y] (a proof of this statement is left as an exercise, i.e. I am too lazy to write it out). This equation immediately leads to a (much simpler) O(n3) algorithm:

pathsO3 n (i,j) = sum $ do
    let n12345678 = n
    (n1234,n5678) <- split n12345678
    (n12,n34) <- split n1234
    (n56,n78) <- split n5678
    (n57,n68) <- solvepm n5678 (i - 2*n12 + 2*n34)
    (n13,n24) <- solvepm n1234 (j - 2*n56 + 2*n78)
    return $ multinomial [n12,n34,n56,n78]
           * multinomial [n57,n68]
           * multinomial [n13,n24]

Verifying the results

After all this manipulation it is a good idea to check whether the program still does the right thing. We can either manually compare the path matrices:

check paths = and [ pathMatrix pathsrec n == pathMatrix paths n | n <- [0..3] ]

Or use QuickCheck or SmallCheck:

Knight2> smallCheck 5 (\(N n) ij -> pathsO3 n ij == pathsrec n ij)
...
Depth 5:
  Completed 726 test(s) without failure.

Finally, to contrast with the first part of this series, here is the time it takes to calculate the number of paths in 100 steps:

Knight2> pathsO3 100 (4,4)
2422219241769802380469882122062019059350760968380804461263234408581143863208781993964800
(4.75 secs, 270708940 bytes)

The recursive algorithm would need in the order of 1077 years to arrive at this answer.

Still, pathsO3 is not the fastest possible algorithm. Next time I will look at a completely different approach, but further improvements to the solution in this post are possible as well. As an exercise for the reader, you should try transforming pathsO3 into an O(n2) solution. Hint: there are more sums-of-products of independent values.

posted by Twan | 5 comments | tags = /haskell | permalink | rss
Wed 2008-11-26 02:04

Knight in n, part 1: moves

[This post is literate Haskell, get the source here]

Consider the following problem:

A knight is placed at the origin of a chessboard that is infinite in all directions. How many ways are there for that knight to reach cell (i,j) in exactly n moves?

This knight moves problem is not hard, nor does it have any real life applications. The problem is still interesting because there are many different ways to solve it, ranging from very simple to quite complex. In this series of articles I will describe some of these solutions.

Knight's moves

The possible moves for a chess knight

In chess, a knight can move two squares horizontally and one square vertically, or two squares vertically and one square horizontally. One complete move therefore looks like the letter 'L'. The picture on the right shows all possible moves for the black knight in the center.

We can summarize all these moves in an array:

moves :: [(Int,Int)]
moves = [(2,1),(2,-1),(-2,1),(-2,-1)
        ,(1,2),(-1,2),(1,-2),(-1,-2)]

Counting the number of paths to (i,j) in n steps can now be done with a simple recursive function. The base case is that in 0 moves only cell (0,0) is reachable. In the recursion step we simply try all moves:

pathsrec :: Int -> (Int,Int) -> Integer
pathsrec 0 (0,0) = 1
pathsrec 0 (_,_) = 0
pathsrec n (i,j) = sum [ pathsrec (n-1) (i+δi,j+δj) | (δi,δj) <- moves ]

So for example

Knight1> paths_rec 4 (2,2)
54

I.e. there are 54 ways to reach cell (2,2) in 4 moves.

Unfortunately the function pathsrec is not very efficient. In fact, it is very much not efficient. At each step all 8 possible moves are considered, so the total time complexity of this function is O(8n).

Tables

Besides calculating the number of paths to a single point it can also be interesting to display the number of pats for each possible end point. We can make a list of lists containing all the path counts,

pathMatrix paths n
    = [ [ paths n (i,j) | j <- [-2*n .. 2*n] ] | i <- [-2*n .. 2*n] ]

and then display this list in a tabular format

showMatrix :: Show α => [[α]] -> String
showMatrix xss = unlines [ unwords [ show x | x <- xs ] | xs <- xss ]

printPathMatrix paths = putStr . showMatrix . pathMatrix paths

The path matrix for n=1 should be familiar, it is the same as the image of possible moves of a knight.

Knight1> printPathMatrix pathsrec 1
    0 1 0 1 0
    1 0 0 0 1
    0 0 0 0 0
    1 0 0 0 1
    0 1 0 1 0

But now we can also make larger tables:

Knight1> printPathMatrix pathsrec 2
    0 0 1 0 2 0 1 0 0
    0 2 0 2 0 2 0 2 0
    1 0 0 0 2 0 0 0 1
    0 2 0 2 0 2 0 2 0
    2 0 2 0 8 0 2 0 2
    0 2 0 2 0 2 0 2 0
    1 0 0 0 2 0 0 0 1
    0 2 0 2 0 2 0 2 0
    0 0 1 0 2 0 1 0 0

If you were to continue increasing n, the table and the numbers in it become ever larger. It is a good idea to make a 'density plot', i.e. to use colors to visualize larger numbers. For example for n=4, the path matrix can be rendered as:

Special cases

Looking at the above matrices, you might start to see some patterns emerge:

These observations can be used as additional cases in the recursive function to quickly eliminate large parts of the input space:

pathscase :: Int -> (Int,Int) -> Integer
pathscase 0 (0,0) = 1
pathscase 0 (_,_) = 0
pathscase n (i,j) | (n+i+j) `mod` 2 /= 0 = 0
pathscase n (i,j) | abs i + abs j > 3*n  = 0
pathscase n (i,j) | abs i > 2*n          = 0
pathscase n (i,j) | abs j > 2*n          = 0
pathscase n (i,j) = sum [ pathscase (n-1) (i+δi,j+δj) | (δi,δj) <- moves ]

A quick test shows that this can be a big improvement for the run time:

Knight1> pathsrec 8 (4,4)
124166
(92.88 secs, 4605991724 bytes)
Knight1> pathscase 8 (4,4)
124166
(17.69 secs, 807191624 bytes)

The asymptotic time complexity of pathscase is harder to analyze. It is still O(8n) in the worst case, but the complexity is now also output dependant.


That is all for now, next time we will look at smarter algorithms. For the interested reader I would suggest that you try to come up with some ideas of your own. I would love to hear how other people approach this problem.

posted by Twan | 2 comments | tags = /haskell | permalink | rss
Fri 2008-11-14 22:10

Arrays without bounds

[This post is literate Haskell, get the source here]

Regular old arrays have a size; you can't just have an infinite array. On the other hand, a lazy language such as Haskell does allow infinite lists. The idea behind the UnboundedArray module is to combine the O(1) access of arrays with the unbounded size of lazy lists.

module UnboundedArray where

This data type is built on top of ordinary arrays and unsafe IO operations:

import Data.Array
import Data.IORef
import System.IO.Unsafe

To keep things simple, an unbounded array is just a function from the natural numbers to array elements:

type UnboundedArray a = Int -> a

I am just going to dump the code here instead of explaining it. The idea is to make an array and resize it when it becomes too small. If the size increases geometrically with each resize, then the amortized cost of a single access will be O(1).

-- | Create an unbounded array from an infinite list
--   Accessing element /n/ takes /O(n)/ time, but only /O(1)/ amortized time.
unboundedArray :: [a] -> UnboundedArray a
unboundedArray xs = unsafePerformIO . unsafePerformIO (unboundedArrayIO xs)

unboundedArrayIO :: [a] -> IO (Int -> IO a)
unboundedArrayIO xs = do
    theArray <- newIORef (listArray (0,0) xs)
    return $ \n -> do
        ar <- readIORef theArray
        let (0,size) = bounds ar
        if n <= size
          then return $ ar ! n
          else do let size' = max n (size * 3 `div` 2)
                  let ar' = listArray (0,size') xs
                  writeIORef theArray ar'
                  return $ ar' ! n

So, what are UnboundedArrays good for? A simple application is memoization, for example:

memoInt f = unboundedArray (map f [0..])

fib = memoInt realFib
  where realFib 0 = 1
        realFib 1 = 1
        realFib n = fib (n - 1) + fib (n - 2)
> map fib [1..20]
[1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946]

But since we can use an arbitrary list for initialization the unboundedArray function can sometimes be more flexible/convenient than memoInt.

posted by Twan | 2 comments | tags = /haskell | permalink | rss
Fri 2008-08-15 02:49

A generic merge function

[This post is literate Haskell, get the source here]

When working with sorted lists you often come to the point where you want to combine two or more of them. This merge procedure forms the heart of merge sort it works something like:

merge [1,3,4,5] [2,3,4] = [1,2,3,3,4,4,5]

This merge function is not in the Haskell standard library, and even if there were, it might not be very useful.

The problem is that when you need merge you often need a slight variation. For example, you might want to remove duplicates,

merge_union [1,3,4,5] [2,3,4] = [1,2,3,4,5]

Or find the elements common to both lists,

merge_intersection [1,3,4,5] [2,3,4] = [3,4]

Or you want the difference, the symmetric difference, or...


The solution for all these problems is to make a more general merge function. To do that we take a note from the most generic function over a single list, foldr. The generic merge function is also a right fold, but over two lists. Behold the type signature:

mergeByR :: (a -> b -> Ordering)  -- ^ cmp: Comparison function
         -> (a -> b -> c -> c)    -- ^ fxy: Combine when a and b are equal
         -> (a -> c -> c)         -- ^ fx:  Combine when a is less
         -> (b -> c -> c)         -- ^ fy:  Combine when b is less
         -> c                     -- ^ z:   Base case
         -> [a] -> [b] -> c       -- ^ Argument lists and result list

Don't be scared by the size. The reason there are a lot of arguments is that for each case we use a different combining function: If the smallest element comes from the first list we use fx, if it comes from the second list we use fy, and when the two elements are equal, we combine them both with fxy. As in foldr these calls to fx/fy/fxy are then chained like fx x1 (fx x2 (.. z)).

The lists from the example above can be aligned as follows:

xs               =  [1,      3,   4,   5 ]
ys               =  [    2,  3,   4      ]
function to use  =  [fx, fy, fxy, fxy, fx]
mergeByR ....    =  fx 1 . fy 2 . fxy 3 3 . fxy 4 4 . fx 5 $ z

The function implementation is straightforward:

mergeByR cmp fxy fx fy z = go
    where go []     ys     = foldr fy z ys
          go xs     []     = foldr fx z xs
          go (x:xs) (y:ys) = case cmp x y of
              LT -> fx  x   (go xs (y:ys))
              EQ -> fxy x y (go xs ys)
              GT -> fy    y (go (x:xs) ys)


Now, let's look at some uses of this function. First of all, the usual merge sort merge function:

mergeBy cmp = mergeByR cmp (\a b c -> a:b:c) (:) (:) []
merge = mergeBy compare

Instead of adding both a and b to the resulting list when they are equal, we can instead add only one of them, or even the result of some function on them. This gives the set union operation:

unionByWith cmp f = mergeByR cmp (\a b c -> f a b:c) (:) (:) []
unionWith = unionByWith compare

If we ignore elements that occur in only one of the lists by setting fx and fy to const id, we get the intersection instead:

intersectionByWith cmp f = mergeByR cmp (\a b c -> f a b:c) (const id) (const id) []
intersectionWith = intersectionByWith compare


With these merge functions, implementing merge sort becomes simple. All that is left to do is split a list in two, and recursively sort and merge.

split :: [a] -> ([a],[a])
split (x:y:zs) = let (xs,ys) = split zs in (x:xs,y:ys)
split xs       = (xs,[])

sort []  = []
sort [x] = [x]
sort xs  = let (ys,zs) = split xs in merge (sort ys) (sort zs)

If we replace merge by unionWith we instead get a sort that combines duplicate elements.


Besides set operations, mergeByR can also be (ab)used for other things, such as

zipWith = intersectionByWith (const $ const EQ)

Or a variant of zipWith, that keeps the tail of the longer list:

zipWith' = unionByWith (const $ const EQ)

We can even implement concatenation:

(++) = mergeByR (const $ const LT) undefined (:) (:) []
posted by Twan | 8 comments | tags = /haskell | permalink | rss
Sat 2008-07-26 17:19

Solving nonograms

In this post I will show how to solve nonograms automatically using a computer. The code has been on the Haskell wiki for over year, but I have never taken the time to explain how it works.

This post is literate haskell (download the source here), so we need to start with some imports:

import qualified Data.Set as Set
import qualified Data.Map as Map
import Data.Set (Set)
import Data.List
import Control.Applicative

Since we will be working with sets a lot, here are some additional utility functions:

setAll :: (a -> Bool) -> Set a -> Bool
setAll pred = all pred . Set.toList
unionMap :: (Ord a, Ord b) => (a -> Set b) -> Set a -> Set b
unionMap f = Set.unions . map f . Set.toList

The puzzle

So, what is a nonogram anyway? Quoting Wikipedia:

Nonograms are picture logic puzzles in which cells in a grid have to be colored or left blank according to numbers given at the side of the grid to reveal a hidden picture. In this puzzle type, the numbers measure how many unbroken lines of filled-in squares there are in any given row or column. For example, a clue of "4 8 3" would mean there are sets of four, eight, and three filled squares, in that order, with at least one blank square between successive groups.

A solved nonogram might look like the following image:

A Haskell function to solve nonograms for us could have the following type, taking the clues for the rows and columns, and returning a grid indicating which squares are filled,

solvePuzzle :: [[Int]] -> [[Int]] -> [[Bool]]

Values and cells

For simplicity we will start with a single row. A first idea is to represent the cells in a row as booleans, type Row = [Bool]. This works fine for a finished puzzle like:
[3,4][.####.###];
but consider a partially solved row:
[3,4][.#???.??#].

First of all we will need a way to distinguish between blank cells (indicated by a cross) and unknown cells. Secondly, we throw away a lot of information. For instance, we know that the last filled cell will be the last cell of a group of three.

To solve the second problem we can give each position an unique label, so the first filled cell will always be, for instance 1, the second one will be 2, etc. For blank cells we can use negative numbers; the first group of blanks will be labeled -1, the second group will be -2, etc. Since the groups of blanks are of variable size, we give each one the same value. Our solved row now looks like:
[3 4][-1,2,3,4,5,-6,-6,7,8,9].

In Haskell we can define the type of cell values as simply

newtype Value = Value Int
    deriving (Eq, Ord, Show)

Since negative values encode empty cells, and positive values are filled cells, we can add some utility functions:

blank (Value n) = n < 0
filled = not . blank

This still leaves the first issue, dealing with partially solved puzzles.

Partial information

When we don't know the exact value of a cell it is still possible that there is some information. For instance, we might know that the first cell will not contain the value 9, since that value is already somewhere else. One way of representing this is to keep a set of possible values:

type Cell = Set Value

An unknown cell is simply a cell containing all possible values, and the more we know about a cell, the less the set will contain.

At a higher level we can still divide cells into four categories:

data CellState = Blank | Filled | Indeterminate | Error
    deriving Eq

cellState :: Cell -> CellState
cellState x
    | Set.null      x = Error         -- Something went wrong, no options remain
    | setAll blank  x = Blank         -- The cell is guaranteed to be blank
    | setAll filled x = Filled        -- The cell is guaranteed to be filled
    | otherwise       = Indeterminate

CellStates are convenient for displaying (partial) solution grids,

instance Show CellState where
    show Blank         = "."
    show Filled        = "#"
    show Indeterminate = "?"
    show Error         = "E"

For example, here is our running example again, this time rotated 90°. The CellStates are shown on the left as before; while the actual Cell set is on the right:
[3 4][-1,2,3,4,5,-6,-6,7,8,9]

Solving a single row

Now it is time to solve a row.

As stated before, each filled cell gets a unique value. From a clue of the group lengths we need to construct such a unique labeling, such that labeling [4,3] == [-1,-1,2,3,4,5,-6,-6,7,8,9,-10,-10]. The exact values don't matter, as long as they are unique and have the right sign.

Constructing this labeling is simply a matter of iterating over the clues,

labeling :: [Int] -> [Value]
labeling = map Value . labeling' 1
    where labeling' n []     = [-n,-n]
          labeling' n (x:xs) = [-n,-n] ++ [n+1 .. n+x] ++ labeling' (n+x+1) xs

This labeling gives us important local information: we know what values can occur before and after a particular value. This is also the reason for including the negative (blank) values twice, since after a -1 another -1 can occur.

We can determine what comes after a value by zipping the labeling with its tail. In our example:

after    [-1,-1, 2, 3, 4, 5,-6,-6, 7, 8,  9, -10, -10]
comes [-1,-1, 2, 3, 4, 5,-6,-6, 7, 8, 9,-10, -10]

Collecting all pairs gives the mapping:

{ -1 -> {-1,2}, 2 -> {3}, 3 -> {4}, 4 -> {5}, 5 -> {-6}, -6 -> {-6,7}, ...}

Instead of carrying a Map around we can use a function that does the lookup in that map. Of course we don't want to recalculate the map every time the function is called, so we need to be careful about sharing:

bad1 a    x =  Map.lookup x (expensiveThing a)
bad2 a    x =  Map.lookup x theMap  where theMap = expensiveThing a
good a = \x -> Map.lookup x theMap  where theMap = expensiveThing a

So for determining what comes after a value in the labeling:

mkAfter :: [Value] -> (Value -> Cell)
mkAfter vs = \v -> Map.findWithDefault Set.empty v afters
    where afters = Map.fromListWith Set.union
                 $ zip vs (map Set.singleton $ tail vs)

Row data type

In the Row datatype we put all the information we have:

data Row = Row
    { cells         :: [Cell]
    , before, after :: Value -> Cell
    , start,  end   :: Cell
    }

Some simple Show and Eq instances:

instance Show Row where
    show row = "[" ++ concatMap show (rowStates row) ++ "]"

instance Eq Row where
    a == b  =  cells a == cells b

To construct a row we first make a labeling for the clues. Then we can determine what comes after each value, and what comes after each value in the reversed labeling (and hence comes before it in the normal order).

mkRow :: Int -> [Int] -> Row
mkRow width clue = Row
        { cells  = replicate width (Set.fromList l)
        , before = mkAfter (reverse l)
        , after  = mkAfter l
        , start  = Set.singleton $ head l
        , end    = Set.singleton $ last l
        }
    where l = labeling clue

Actually solving something

Now all the things are in place to solve our row: For each cell we can determine what values can come after it, so we can filter the next cell using this information. To be more precise, we can take the intersection of the set of values in a cell with the set of values that can occur after the previous cell. In this way we can make a forward pass through the row:

solveForward, solveBackward :: Row -> Row
solveForward row = row { cells = newCells (start row) (cells row) }
    where newCells _    []     = []
          newCells prev (x:xs) = x' : newCells x' xs
              where x' = x `Set.intersection` afterPrev
                    afterPrev = unionMap (after row) prev

Applying solveForward to the example row above, we get

solveForward

In much the same way we can do a backwards pass. Instead of duplicating the code from solveForward it is easier to reverse the row, do a forward pass and then reverse the row again:

solveBackward = reverseRow . solveForward . reverseRow

Where reverseRow reverses the cells and swaps before/after and start/end:

reverseRow :: Row -> Row
reverseRow row = Row
    { cells  = reverse (cells row)
    , before = after row,   after = before row
    , start  = end   row,   end   = start  row }

In the running example even more cells will be known after doing a backwards pass,

solveBackward

These two steps together are as far as we are going to get with a single row, so let's package them up:

solveRow :: Row -> Row
solveRow = solveBackward . solveForward

In the end we hopefully have a row that is completely solved, or we might h We can determine whether this is the case by looking at the CellStates of the cells:

rowStates :: Row -> [CellState]
rowStates = map cellState . cells

rowDone, rowFailed :: Row -> Bool
rowDone   = not . any (== Indeterminate) . rowStates
rowFailed = any (== Error) . rowStates

Human solution strategies

By using just one single solution strategy we can in fact emulate most of the techniques humans use. The Wikipedia page on nongrams lists several of these techniques. For instance, the simple boxes technique is illustrated with the example:

The Haskell program gives the same result:

Nonograms> solveRow $ mkRow 10 [8]
[??######??]

The reason why humans need many different techniques, while a single technique suffices for the program is that this simple technique requires a huge amount of administration. For each cell there is a while set of values, which would never fit into the small square grid of a puzzle.

The whole puzzle

Just a single row, or even a list of rows is not enough. In a whole nonogram there are clues for both the rows and the columns. So, let's make a data type to hold both:

data Puzzle = Puzzle { rows, columns :: [Row] }
    deriving Eq

And a function for constructing the Puzzle from a list of clues,

mkPuzzle :: [[Int]] -> [[Int]] -> Puzzle
mkPuzzle rowClues colClues = Puzzle 
    { rows    = map (mkRow (length colClues)) rowClues
    , columns = map (mkRow (length rowClues)) colClues
    }

To display a puzzle we show the rows,

instance Show Puzzle where
    show = unlines . map show . rows
    showList = showString . unlines . map show

Initially the puzzle grids are a bit boring, for example entering in GHCi

Nonograms> mkPuzzle [[1],[3],[1]] [[1],[3],[1]]
[???]
[???]
[???]

We already know how to solve a single row, so solving a whole list of rows is not much harder,

stepRows :: Puzzle -> Puzzle
stepRows puzzle = puzzle { rows = map solveRow (rows puzzle) }

Continuing in GHCi:

Nonograms> stepRows previousPuzzle
[???]
[###]
[???]

To also solve the columns we can use the same trick as with reverseRow, this time transposing the puzzle by swapping rows and columns.

transposePuzzle :: Puzzle -> Puzzle
transposePuzzle (Puzzle rows cols) = Puzzle cols rows

But this doesn't actually help anything! We still display only the rows, and what happens there is not affected by the values in the columns. Of course when a certain cell in a row is filled (its cellState is Filled), then we know that the cell in the corresponding column is also filled. We can therefore filter that cell by removing all blank values

filterCell :: CellState -> Cell -> Cell
filterCell Blank  = Set.filter blank
filterCell Filled = Set.filter filled
filterCell _      = id

A whole row can be filtered by filtering each cell,

filterRow :: [CellState] -> Row -> Row
filterRow states row = row { cells = zipWith filterCell states (cells row) }

By transposing the list of states for each row we get a list of states for the columns. With filterRow the column cells are then filtered.

stepCombine :: Puzzle -> Puzzle
stepCombine puzzle = puzzle { columns = zipWith filterRow states (columns puzzle) }
    where states = transpose $ map rowStates $ rows puzzle

To solve the puzzle we apply stepRows and stepCombine alternatingly to the rows and to the columns. When to stop this iteration? We could stop when the puzzle is done, but not all puzzles can be solved this way. A better aproach is to take the fixed point:

solveDirect :: Puzzle -> Puzzle
solveDirect = fixedPoint (step . step)
    where step = transposePuzzle . stepCombine . stepRows

The fixed point of a function f is the value x such that x == f x. Note that there are different fixed points, but the one we are interested in here is found by simply iterating x, f x, f (f x), ...

fixedPoint :: Eq a => (a -> a) -> a -> a
fixedPoint f x
    | x == fx   = x
    | otherwise = fixedPoint f fx
  where fx = f x

The tiny 3*3 example can now be solved:

Nonograms> solveDirect previousPuzzle
[.#.]
[###]
[.#.]

But for other puzzles, such as the letter lambda from the introduction, we have no such luck:

Nonograms> solveDirect lambdaPuzzle
[??????????]
[??????????]
...

Guessing

To solve more difficult puzzles the direct reasoning approach is not enough. To still solve these puzzles we need to make a guess, and backtrack if it is wrong.

Note that there are puzzles with more than one solution, for example
and

To find all solutions, and not just the first one, we can use the list monad.

To make a guess we can pick a cell that has multiple values in its set, and for each of these values see what happens if the cell contains just that value. Since there are many cells in a puzzle there are also many cells to choose from when we need to guess. It is a good idea to pick the best one.

For picking the best alternative a pair of a value and a score can be used:

data Scored m a = Scored { best :: m a, score :: Int }

This data type is an applicative functor if we use 0 as a default score:

instance Functor m => Functor (Scored m) where
    fmap f (Scored a i) = Scored (fmap f a) i
instance Applicative m => Applicative (Scored m) where
    pure a = Scored (pure a) 0
    Scored f n <*> Scored x m = Scored (f <*> x) (n `min` m)

When there are alternatives we want to pick the best one, the one with the highest score:

instance Alternative m => Alternative (Scored m) where
    empty = Scored empty minBound
    a <|> b | score a >= score b  =  a
            | otherwise           =  b

Now given a list we can apply a function to each element, but change only the best one. This way we can find the best cell to guess and immediately restrict it to a single alternative. We can do this by simply enumerating all ways to change a single element in a list.

mapBest :: Alternative m => (a -> m a) -> [a] -> m [a]
mapBest _ []      =  pure []
mapBest f (x:xs)  =  (:xs) <$> f x         -- change x and keep the tail
                 <|> (x:) <$> mapBest f xs -- change the tail and keep x

This can also be generalized to Rows and whole Puzzles:

mapBestRow :: Alternative m => (Cell -> m Cell) -> Row -> m Row
mapBestRow f row = fmap setCells $ mapBest f $ cells row
    where setCells cells' = row { cells = cells' }

mapBestRows :: Alternative m => (Cell -> m Cell) -> Puzzle -> m Puzzle
mapBestRows f puzzle = fmap setRows $ mapBest (mapBestRow f) $ rows puzzle
    where setRows rows' = puzzle { rows = rows' }

What is the best cell to guess? A simple idea is to use the cell with the most alternatives, in the hope of eliminating as many of them as soon as possible. Then the score of a cell is the size of its set. The alternatives are a singleton set for each value in the cell.

guessCell :: Cell -> Scored [] Cell
guessCell cell = Scored
    { best  = map Set.singleton $ Set.toList cell
    , score = Set.size cell }

We can now make a guess by taking the best way to apply guessCell to a single cell:

guess :: Puzzle -> [Puzzle]
guess = best . mapBestRows guessCell

Putting it together

Direct solving is much faster than guess based solving. So the overall strategy is to use solveDirect, and when we get a puzzle that is not done we do a single guess, and then continue with direct solving all alternatives:

solve :: Puzzle -> [Puzzle]
solve puzzle
    | failed puzzle' = []
    | done   puzzle' = [puzzle']
    | otherwise      = concatMap solve (guess puzzle')
  where puzzle' = solveDirect puzzle
done, failed :: Puzzle -> Bool
done   puzzle = all rowDone   (rows puzzle ++ columns puzzle)
failed puzzle = any rowFailed (rows puzzle ++ columns puzzle)

Finally we can solve the lambda puzzle!

lambdaPuzzle = mkPuzzle
    [[2],[1,2],[1,1],[2],[1],[3],[3],[2,2],[2,1],[2,2,1],[2,3],[2,2]]
    [[2,1],[1,3],[2,4],[3,4],[4],[3],[3],[3],[2],[2]]
Nonograms> solve lambdaPuzzle
[.##.......]
[#.##......]
[#..#......]
[...##.....]
[....#.....]
[...###....]
[...###....]
[..##.##...]
[..##..#...]
[.##...##.#]
[.##....###]
[##.....##.]
posted by Twan | 6 comments | tags = /haskell | permalink | rss
Wed 2008-01-30 17:45

Simple reflection of expressions

This blog post is inspired by a message from Cale on #haskell yesterday. He came up with an amazing way to show how foldr and foldl work:

<Cale>      > foldr (\x y -> concat ["(f ",x," ",y,")"]) "z" (map show [1..5])
<lambdabot> "(f 1 (f 2 (f 3 (f 4 (f 5 z)))))"

While the output looks great, the call itself could be clearer, especially for beginners. Through a combination of overloading and small hacks it is possible to get the same result with a much nicer expression,

> foldr f x [1..5]
f 1 (f 2 (f 3 (f 4 (f 5 x))))

Let's get started

I will call this module SimpleReflect, since this is a poor mans form of reflection, converting code back to expressions at run time.

module SimpleReflect where

Our results will be 'expressions'. All we need to do with expressions is show them, convert them to strings.

The Show class has a function showsPrec :: Int -> a -> ShowS for converting a value of type a to a string. The ShowS type improves the performance compared to using strings; the integer is used for putting parentheses in the right places. But none of this matters for now, we will just emulate that behavior for our expression type:

newtype Expr = Expr { showExpr :: Int -> ShowS }
instance Show Expr where
    showsPrec p r = showExpr r p

The things like f and x will be variables these are just strings. Showing strings is easy,

var :: String -> Expr
var s = Expr { showExpr = \_ -> showString s }

In fact, we can show all kinds of values, for instance numbers. So we could make a function that lifts any showable value to an expression:

lift :: Show a => a -> Expr
lift x = Expr { showExpr = \p -> showsPrec p x }

While this is almost identical to var, it is not the same, because the Show instance for String is not the same as showString. Compare:

> var "x"
x
> lift "x"
"x"

From variables to functions

In your average piece of source code multiple expressions are combined with operators. The most common operator is function application, written with just whitespace. Each Haskell operator has a precedence level, indicating how tight that operator binds to its arguments.

In this blog post we only deal with left associative operators, which means that the left sub-expressions is printed with the same precedence level. A simple combinator for operators is then:

op :: Int -> String -> Expr -> Expr -> Expr
op prec op a b = Expr { showExpr = showFun }
 where showFun p = showParen (p > prec) $
                   showExpr a prec . showString op . showExpr b (prec + 1)

We would like to be able to use variables like f as if they were functions, so this f has to have the type f :: a -> Expr, or f :: a -> b -> Expr, etc. This can be done with type classes. The class FromExpr defines what things we can use expressions for:

class FromExpr a where
    fromExpr :: Expr -> a

Obviously expressions are themselves expressions,

instance FromExpr Expr where
    fromExpr = id

Any expression can also be used as a function. As stated above function application is the operator " "; it has precedence level 10, higher than any real operator. To be as generic as possible we can lift any showable argument to an expression.

instance (Show a, FromExpr b) => FromExpr (a -> b) where
    fromExpr f a = fromExpr $ op 10 " " f (lift a)

With FromExpr in place we can make more generic variables that can be used as any function type:

fun :: FromExpr a => String -> a
fun = fromExpr . var

With all this in place Cale's foldr example can now be written as

> foldr (fun "f") (var "x") [1..5]
f 1 (f 2 (f 3 (f 4 (f 5 x))))

Lifting the alphabet

To write even shorter examples a slightly evil idea is to simply define 26 variables,

a,b,c,.. :: FromExpr a => a

There is a minor problem with this idea, which will become apparent once you try it out:

*SimpleReflect> foldr f x [1..5]

<interactive>:1:8:
    Ambiguous type variable `b' in the constraints:
      `FromExpr b' arising from a use of `x' at <interactive>:1:8
      `Show b' arising from a use of `f' at <interactive>:1:6
    Probable fix: add a type signature that fixes these type variable(s)

The compiler doesn't know what the type of x should be. It is only used as an argument to f, but that can be any Showable type. In the future we might be able to write default FromExpr Expr (see the Haskell' wiki), but until then we will have to do something else.

Since usually the names f, g, etc. are used for functions, I chose to only overload those, and make the rest simple variables:

a,b,c,d,e,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z :: Expr
[a,b,c,d,e,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z]
 = [var [x] | x <- ['a'..'e']++['i'..'z']]
f,g,h :: FromExpr a => a
f = fun "f"
g = fun "g"
h = fun "h"

With our 26 new top level names we can finally the original example in a natural way,

> foldr f x [1..5]
f 1 (f 2 (f 3 (f 4 (f 5 x))))

Lifting numbers (a.k.a. lots-of-instances)

All this work for just foldr and foldl seems like a bit of a waste of time. To make things a little bit more interesting we could also add support for numeric operations. Then we can write

> sum [1..5] :: Expr
0 + 1 + 2 + 3 + 4 + 5

To do this we need to define instances of Num and Enum. The first of these is not very hard, but we need Eq and later Ord instances as well.

instance Eq Expr where
    a == b = show a == show b

The Ord class has two functions of type a -> a -> a, which is where we can do something interesting:

instance Ord Expr where
    compare a b = compare (show a) (show b)
    min = fun "min"
    max = fun "max"

Now we get minimum [x,y,z] ==> min (min x y) z for free.

The Num class has some operators. The mechanism for defining does is already in place, so this class should be simple:

instance Num Expr where
    (+)    = op 6 " + "
    (-)    = op 6 " - "
    (*)    = op 7 " * "
    negate = fun "negate"
    abs    = fun "abs"
    signum = fun "signum"
    fromInteger = lift

To write [1..5] :: [Expr], Expr needs to be an instance of Enum. Here we bump into a bit of a problem, how do we enumerate expressions?

Well, I will cheat a bit, and read out the expression as an integer. This operation is the inverse of lift, let's call it unlift:

unlift :: Read a => Expr -> a
unlift expr = read (show expr)

Conversion to Integers is usually done with toInteger from the Integral, so we add an instance for that as well. We need a Real instance first:

instance Real Expr where
    toRational = toRational . toInteger
instance Integral Expr where
    toInteger = unlift
    quot = op 7 " `quot` "
    rem  = op 7 " `rem` "
    div  = op 7 " `div` "
    mod  = op 7 " `mod` "
    -- someone forgot a default :(
    quotRem a b = (quot a b, rem a b)
    divMod  a b = (div  a b, mod a b)

Finally the Enum class. As I already said, the actual enumeration can be handled by going through toInteger and fromInteger.

instance Enum Expr where
    succ   = fun "succ"
    pred   = fun "pred"
    toEnum = fun "toEnum"
    fromEnum = fromEnum . toInteger
    enumFrom       a     = map fromInteger $ enumFrom       (ti a)
    enumFromThen   a b   = map fromInteger $ enumFromThen   (ti a) (ti b)
    enumFromTo     a   c = map fromInteger $ enumFromTo     (ti a)        (ti c)
    enumFromThenTo a b c = map fromInteger $ enumFromThenTo (ti a) (ti b) (ti c)
ti = toInteger -- just to fit in the page layout of the blog

Playing a bit

None of the above was terribly complicated, just a lot of boilerplate code. What can we do with such a well-plated boiler? Here are some examples:

> sum $ map (*x) [1..5]
0 + 1 * x + 2 * x + 3 * x + 4 * x + 5 * x
> iterate (^2) x
[x, x * x, x * x * (x * x), x * x * (x * x) * (x * x * (x * x)), ...
> scanl f x [a,b,c]
[x, f x a, f (f x a) b, f (f (f x a) b) c]
> zipWith3 f [1,2..] [1,3..] [1,4..] :: [Expr]
[f 1 1 1, f 2 3 4, f 3 5 7, f 4 7 10, f 5 9 13, f 6 11 16, ...

Coming soon to a lambdabot near you.

posted by Twan | 7 comments | tags = /haskell | permalink | rss
Wed 2007-11-07 19:26

References, Arrows and Categories

Recap: functional references

Last time (okay, it was over two months ago) I talked about overloading functional references so that they can be used both as regular functions and as references. The data type of references I used was

data FRef s a = FRef
      { get :: s -> a
      , set :: a -> s -> s
      }

While I arrived at the type class,

class Ref r where
      ref :: (a -> b) -> (b -> a -> a) -> r a b
      (.) :: r b c -> r a b -> r a c

This provides 'construction' and 'composition'.

Arrows

The class parameter r has kind * -> * -> *, meaning it takes two types and 'evaluates' to a type. If you are familiar with the Haskell libraries you may know there is a similar class who's parameter also has kind * -> * -> *, called Arrow. There is an instance Arrow (->), just like there I defined an instance Ref (->).

According to the arrows webpage, "Arrows are a new abstract view of computation". This raises the question: Can we combine these ideas? Are functional references arrows?

That Arrow class looks like

class Arrow a where
      arr :: (b -> c) -> a b c
      (>>>) :: a b c -> a c d -> a b d
      -- some more stuff

If you look closely, (>>>) is just (.) with the arguments reversed. What about arr? arr should turn any function into a reference. But references need a way to transform the result back to be able to set the new value.

Clearly this is not going to work. The problem is that Arrows are not general enough!

The easy fix

What we need here is to make Ref a superclass of Arrow. All current Arrows can implement (.), since it is the same as (>>>). To implement ref an arrow just ignores the setter, then ref becomes the same as arr.

Okay, we're done.

Except that we will have the same problem again the next time someone wants to generalize Arrows. It would be much better to fix it once and for all.

Categories

The most basic idea behind arrows comes from category theory . An 'arrow' or 'morphism' is a connection between two objects from a 'category', in this case two Haskell types. Usually the category we are interested in is Hask, the category of Haskell functions. In Hask a 'morphism' between two types a and b is just a function from type a to type b, so a function of type a -> b.

An Arrow is nothing more than a different category. The types are still the same, but instead of a function we get something else for the morphisms.

If you look up the definition of a category you will find that in general only two things are required of morphism:

Unlike the Arrow type class, there is nothing saying that the morphisms have to correspond to functions. For instance using the reverse arrow, (<-) = flip (->), as the morphisms gives is a perfectly valid category. So does the something as strange as newtype I a b = I Integer.

If you look at the above definition of a category, it immediately leads to a Haskell type class

class Category cat where
      id  :: cat a a
      (.) :: cat b c -> cat a b -> cat a c

Which is a generalization of both Ref and Arrow. So it turns out that the only essential component we are left with is the composition operator. Everything else was relating the category to Haskell functions or references.

Category theory comes with a small set of laws as well:

id . f  ==  f  ==  f . id
f . (g . h) == (f . g) . h

In other words, composing with the identity function does nothing, and composition is associative. Great!

Making it useful again

Now that we have kicked arr and friends out of the type class lots of types can become instances. On the other hand, the class itself has become pretty useless.

Before going to functional references there is a more general notion, invertible functions. These are discussed in relation to arrows in "There and back again: arrows for invertible programming". The only way to be sure that a function is invertible is to give its inverse. In a data type that could look like

data Invertible a b = Invertible
      { forward  :: a -> b
      , backward :: b -> a
      }

To put that in the Arrow/Category framework we can add a subclass InvArrow. It is similar to the Ref class, only for invertible functions instead of references.

class Category cat => InvArrow cat where
      arrInv :: (a -> b) -> (b -> a) -> (a ~~> b)
      
      -- We get a default implementation for id.
      -- Note that this is not valid Haskell, we would need something like class aliases.
      id = arrInv (\x -> x) (\x -> x)

What does it mean if a type/category is an instance of InvArrow? It means that that category contains all invertible functions. Read this statement carefully. An InvArrow does not mean that morphisms in the category are invertible, but that invertible functions can be turned into morphisms.

With InvArrow we already get all kinds of interesting morphisms, for example

negate :: (InvArrow cat, Num a) => cat a a
(+) :: (InvArrow cat, Num a) => a -> cat a a
> update negate (+1) 3 == 2 -- increment the negation by 1, so decrement by one
> set (3+) undefined 4 == 1 -- find a value x such that 3+x == 4

This last example is a bit ugly, because we use function references. It will look better if we use the Invertible type. The function similar to set is inverse:

-- Get the inverse of an invertible function
inverse :: InvArrow cat => Invertible a b -> cat b a
inverse i = arrInv (backward i) (forward i)

Now we can write the above example without undefined:

> inverse (+3) 4 == 1

To references and beyond

Inverses are nice, but we haven't got references yet. There is no way to define fst :: InvArrow cat => cat (a,b) a.

For that we really need the Ref class, or in this arrow framework, RefArrow:

class InvArrow cat => RefArrow cat where
      arrRef :: (a -> b) -> (b -> a -> a) -> cat a b
      
      -- A default implementation of @arrInv@
      arrInv f g = arrRef f (\b a -> g b)

Like with InvArrow, if a category type is an instance of RefArrow, it means that that category contains all functional references.

Finally, the least restrictive class is the regular old Arrow,

class RefArrow cat => Arrow cat where
      arr :: (a -> b) -> cat a b
      
      arrRef f _ = arr f

To summarize, we now have a class hierarchy that looks like

Category => InvArrow => RefArrow => Arrow

The rest of the Arrow class

If you look back to the definition of Arrow I gave above, you will see

      -- some more stuff

Besides lifting (arr) and composition (>>>) the standard Arrow class also defines combinators for working with tupled values.

We could put these in the new Arrow class, but they might also be useful for types which are not full arrows. Like, say, functional references.

The most flexible thing to do is to put this functionality in yet another class. For working with pairs we can define

class Category cat => CategoryPair cat where
      first  :: cat a b -> cat (a,c) (b,c)
      second :: cat a b -> cat (c,a) (c,b)
      (***)  :: cat a b -> cat c d -> cat (a,c) (b,d)

There are some tricky issues to work out, but this post is already five pages long.

I am going to stop here. Pairs, sum types, fixed points, monoids and duality all will have to wait until next time.

The code

That was a long story, and I even stopped way before the end and skipped the instances.

The generalized arrow/category framework is growing into a useful library that hopefully someday can become part of the base libraries. I have decided to put the code somewhere. In this case, somewhere is

darcs get http://code.haskell.org/category

As the name suggests, this library is not just for functional references. Rather it contains the whole Category framework. The FRef type is just a bonus.

The library also contains code for deriving RefArrow functions for record fields, courtesy to omnId.

footnotes
: I am in no way a category theory expert; Category theorists feel free to hate me for abuse of terminology and incorrect explanations. In particular, the type constructor cat is not really the category itself, just like the f in Functor f is not really a functor. But it is the closest thing we have got.

posted by Twan | 4 comments | tags = /haskell | permalink | rss
Mon 2007-09-03 17:08

Overloading functional references

Recently there have been some blog post and mailing list messages about "functional references". In this message I will look into ways to improve upon that concept.

The above links should give you an idea of what a functional reference is, but I will explain it here in my own words. You can skip this introduction if you already know what functional references are.

What are functional references?

A functional reference is a data structure that can be used to update parts of another structure, it is a reference into that structure. We need a way to get the part, and a way to replace the part by setting it to something else. This leads to the data type:

data FRef s a = FRef
      { get :: s -> a
      , set :: a -> s -> s
      }

Now FRef s a represents a reference to an a inside an s structure.

One of the simplest possible (non-trivial) references is that to the first part of a pair:

fst :: FRef (x,y) x
fst = FRef
      { get = \(x,y) -> x
      , set = \x (_,y) -> (x,y)
      }

Having defined this, we can use it to access and modify pairs, for example

> get fst (1,2)
1
> set fst 3 (1,2)
(3,2)

You can read this as "get the first part of ..." and "set the first part to 3 in ...".

Another useful function is update,

update :: FRef s a -> (a -> a) -> (s -> s)
update ref f s = set ref (f (get ref s)) s

Update gets the value, applies a function, and sets it again. This allows us to 'map' functions over parts of data structures:

> update fst (+1) (1,2)
(2,2)

The real power of functional references lies in their composability. Like functions, we can compose two references to give a new one

compose :: FRef b c -> FRef a b -> FRef a c
compose bc ab = FRef
      { get = get bc . get ab
      , set = update ab . set bc
      }

We can now modify nested pairs:

> update (fst `compose` fst) (*2) ((3,4),5)
((6,4),5)

Use case: records

The place where these references shine is with records. Say we have the following data type:

data Employee = Employee
      { name   :: String
      , salary :: Int
      }

It would be great if name and salary where references, then we could simply say

giveRaise = update salary (+100)

This shouldn't be too hard to automate with Data.Derive, the functions would look like

name = FRef
      { get = name_
      , set = \n e -> e { name_ = n }
      }

Or better yet, we could specify references as the default behavior in the next Haskell standard!

There is a problem, however, when we have defined the record accessors to be references. Take the normally legal code

johnsSallary = salary john

This is no longer valid, since salary is not a function. Instead we must write

johnsSallary = get salary john

Type classes to the rescue

Fortunately there is a clean solution to this problem using type classes. We can define the type class

class Ref r where
      ref :: (a -> b) -> (b -> a -> a) -> r a b

We can define an instance for functional references,

instance Ref FRef where
      ref = FRef

As well as for functions

instance Ref (->) where
      ref = const

The record accessors can now be defined as

name :: Ref r => r Employee String
name = ref name_ (\n e -> e { name_ = n })

And all is well again:

giveRaise = update salary (+100)
johnsSallary = salary john

While we are at it, we could also add the (.) operator to the class,

class Ref r where
      ref :: (a -> b) -> (b -> a -> a) -> r a b
      (.) :: r b c -> r a b -> r a c
instance Ref FRef where
      ref = FRef
      (.) = compose
instance Ref (->) where
      ref = const
      (.) = (Prelude..) -- the (.) from the prelude

now

giveRaiseToFirst = update (salary . fst) (+100)

Gives a raise to the first employee in a pair.

Concluding remarks

Note that all this is perfectly valid Haskell 98 code, no extensions are needed. This means that it should not be hard to add such references to the language standard.

There are many more neat things you can do with functional references, but I will save that for another time.

posted by Twan | 8 comments | tags = /haskell | permalink | rss
Mon 2007-04-16 00:05

Knuth-Morris-Pratt in Haskell

A request that comes up regularly on the Haskell mailing list is for a function to determine whether one string (the needle) is a substring of another one (the haystack). While there is no such function in the Haskell standard library, it is easy enough to implement:

import Data.List
as `isSubstringOf` bs = any (as `isPrefixOf`) (tails bs)

Unfortunatly, this function has a worst case time complexity of O(length as * length bs). For example if we evaluate

"aaaaaaaaaab" `isSubstringOf` replicate 100 'a'

We will first match 10 characters starting from the first position and fail just before we matched the entire string. Then, starting from the second position, we will match 10 characters again, etc. In total we we will do 11 * 100 = O(length as * length bs) comparisons.

There exists an algorithm called the Knuth-Morris-Pratt string searching algorithm which has a much better, O(length as + length bs), worst case behavior. Unfortunately all descriptions you find of the algorithm rely on building a table, and using random access patterns on it. Not only does this make it impossible to use simple data structures like lists, it also obfuscates the underlying idea.

The idea

The core idea of the algorithm is that we only want to process each character of both strings once. This is done by building a table from the needle, and using that table to determine what should be done after each character of the haystack. Either the entire needle has been matched at that point and we are done, or we get a new position in the table to use for the next character.

So, let's turn the above description into a Haskell datatype!

data KMP a = KMP
      { done :: Bool
      , next :: (a -> KMP a)
      }

Clearly, if we know how to make such a 'table' the matching process is straight forward. We need to apply next to each character and we want to know if any of the intermediate tables are done:

isSubstringOf2 :: Eq a => [a] -> [a] -> Bool
isSubstringOf2 as bs = match (makeTable as) bs
   where  match table []     = done table
          match table (b:bs) = done table || match (next table b) bs

This can be made shorter using functions from the Prelude:

isSubstringOf3 as bs = any done $ scanl next (makeTable as) bs

Making the table

All that is left is to make a table, constructing it using a simple recursive function is not an option

makeTable1 :: Eq a => [a] -> KMP a
makeTable1 []     = KMP True  undefined?
makeTable1 (x:xs) = KMP False (\c -> if c == x then makeTable1 xs else ????)

Because what do we do if we don't have a match? Let's look at an example, the calculation "abc" `isSubstringOf` "aabc" would go something like:

makeTable "abc" = table0
done table0 = False
next table0 'a' = (\c -> if c == 'a' then table1 else ????) 'a' = table1
done table1 = False
next table1 'a' = (\c -> if c == 'b' then table2 else ????) 'a'
                = ???? -- what to do now?

What we should do, is start over, but dropping the first character from the input, in this case that gives

-- start over, now for "abc" `isSubstringOf` "abc"
next table0 'a' = (\c -> if c == 'a' then table1 else ????) 'a' = table1
done table1 = False
next table1 'b' = (\c -> if c == 'b' then table2 else ????) 'b' = table2
done table2 = False
next table2 'b' = (\c -> if c == 'c' then table3 else ????) 'c' = table3
done table3 = True

The trick

At first glance it would seem that we have to reexamine parts of the haystack when we start over. But this is not the case.

If, for example the test of table35 fails, we don't have to move back 35 characters, because we already know what those characters are, namely the characters we matched to get to table35! So the table in case of a failed match is always the same, and we can compute that as well.

Lets look again at the makeTable function. If f is the table we get for a failed match, we call next f the failure function, and pass it along as a second parameter. For the first character, in case of a failed match we simply and start from the beginning for the next character:

makeTable :: Eq a => [a] -> KMP a
makeTable xs = table
   where table = makeTable' xs (const table)

Notice we have tied the knot, table depends on table itself! In Haskell this is not a problem because of lazy evaluation, as long as we don't try to use what is not computed yet.

The makeTable' function is where the real work happens.

makeTable' []     failure = KMP True failure
makeTable' (x:xs) failure = KMP False test
   where  test  c = if c == x then success else failure c
          success = makeTable' xs (next (failure x))

The base case is not very interesting, although we can now use something better than undefined. That becomes useful when looking for multiple matches.

The interesting clause is for (x:xs). The next function compares a character c against x.
Is it the same? Great, move to the table for xs.
Is it different? Then look at the failure function.

Finally, to determine the table for xs, we need a new failure function, describing what would have happened if we started later and ended up at the position after x. We can ask the current failure function what would have happened in that case, next (failure x).

Correctness

It would be nice if we could be sure that what we have constructed is actually a substring matching algorithm. The easiest way to verify that I use a simple QuickCheck property:

prop_isSubstringOf :: [Bool] -> [Bool] -> Bool
prop_isSubstringOf as bs = (as `isSubstringOf` bs) == (as `isSubstringOf2` bs)
> Test.QuickCheck.test prop_isSubstringOf
OK, passed 100 tests.

It seems to work, that's great.

An interesting exercise would be to prove that what I have made here is equivalent to the naïve algorithm using equational reasoning. Also nice would be comparing it to the imperative Knutt-Moris-Pratt algorithm, is this actually KMP? Maybe next time.

footnotes
Actually, this function was recently added under the, in my opnion, wrong name isInfixOf.
It is wrong because while "a is a prexif of b" and "a is a suffix of b" are valid English sentences, there is as far as I know no such thing as "an infix of". Maybe "infix in", but not "of". </rant>

posted by Twan | 9 comments | tags = /haskell | permalink | rss
Sun 2007-04-15 00:56

Formating system

This is just a test post, to test my custom formating system. It is written in Perl (because blosxom is). It is nothing more then a bunch of regular expressions, systematically bashing against a string.

This will be a heading

And this is a paragraph.

Break and start new.

{-# OPTIONS_GHC -fglasgow-exts #-}
module Code where
import Has%%%KEYWORDkell.Something

f = let x = 3
    in  x + 1 -- this is just a test
g = "Something\n\" asdf" ++ ""
This is not a haskell module
if it is then else bla bla

Something with inline code, inline math, emphasis, strong emphasis. Links should work normally..

Escaping: code < code & code. But not &

code < code > code & code
posted by Twan | 21 comments | tags = /test | permalink | rss