Home > OS >  Why is the result of my foldr call on an empty list not in the correct order?
Why is the result of my foldr call on an empty list not in the correct order?

Time:02-06

I'm trying to complete problem 8 of the Haskell Ninety-Nine problems however I'm having issues understanding why the list result of my function is ordered wrong.

The aim of the compress function is to eliminate any duplicate letters from the input list and output another list that contains the unique letters in the order in which they first appear in the input list. Here is my code for the compress function:

compress l = foldr f [] l where f a b = if a `elem` b then b else a : b

It functions correctly when the duplicate letters are adjacent to each other hence "aaaabbb" outputs "ab" which is correct however when a duplicate letter is separated by another letter it changes its order in the output hence "aba" outputs "ba" whereas the expected output is "ab".

Even when writing out the stack trace for foldr I seem to get the expected output however when running the code in the GHCI with an input such as "aba" or "abca" I get the incorrect result. What causes this behaviour? Why is it that when a duplicate letter is separated by a different letter the order of the output is changed?

CodePudding user response:

The foldr function starts with an initial value (of the type b) and a Foldable container t. For 'each' a value in the container, it calls the function (a -> b -> b) with the a value and the 'current' b value.

Prelude> :t foldr
foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b

In compress, the initial value is [], which also enables the compiler to infer that the Foldable instance is [].

Now try out the steps in GHCi. Define (just for the GHCi session) f as a top-level function:

Prelude> f a b = if a `elem` b then b else a : b

If the input is "aba", the first time f is called, the b value will be [] and the a value will be 'a', because foldr folds from the right.

Prelude> f 'a' []
"a"

The return value "a" now becomes the accumulator value b for the next time around:

Prelude> f 'b' "a"
"ba"

This is because f conses 'b' onto "a".

The accumulator value is now "ba". Pass it to f again, using the third and final value in the list:

Prelude> f 'a' "ba"
"ba"

See e.g. this other answer that outlines an interactive way to explore and 'debug' Haskell functions.

CodePudding user response:

compress l = foldr f [] l where f a b = if a `elem` b then b else a : b

... "aba" outputs "ba" whereas the expected output is "ab".

foldr on lists is very simple. It is defined so that

foldr g z [a,b,c,...,n]  =  g a (foldr g z [b,c,...,n])

-- and by generalization,
foldr g z [          n]  =  g n z

-- which means
foldr g z [           ]  =      z

That's really all there is to it. Looks self-explanatory, isn't it? Just re-write your foldr call, syntactically, to immediately see exactly what's going on. In particular, re-writing your definition a little bit, we have

compress xs  =  foldr f [] xs
    where
    f a r  =  if  elem a r  then      r 
                            else  a : r

(I do (try to) always call the second argument to the combining function "r", for the "Recursively calculated Result for the Rest of the input list".)

Thus we have

compress             [a,b,c,...,n]
   =  foldr f []     [a,b,c,...,n]  
   =  f a ( foldr f [] [b,c,...,n] )
   =  if  elem a r  then      r 
                    else  a : r
        where 
        r = foldr f [] [b,c,...,n]
   =  if  elem a r  then      r 
                    else  a : r
        where 
        r = compress   [b,c,...,n]

(using where as if it where a part of the expression, like a "flipped let"). In other words,

compress (x:xs) = if  elem x   r  
                    then       r
                    else   x : r
                  where        r = compress xs
compress []     = []

This reads: "if x is present in the compressed rest of input, skip it; else, keep it; and continue with the compressed rest". (so, calling it b is misleading; suggests a and b are similar entities; they are not -- a (or x) is an element of the list, and r is its transformed tail).

So you see where the problem is: if there are more then one as in the list, this definition keeps the last one, whereas you want to keep the first.

So if there are several of them in a row this difference is unobservable:

       -- 1.                    -- 2.
     a a a a b b b            a a a a b b b
           a     b            a       b

but if there's an intervening character then of course we can see the difference:

       -- 1.                    -- 2.
     a a a a b b b a          a a a a b b b a
                 b a          a       b
  •  Tags:  
  • Related