I've been slowly making my way through Real World Haskell. In Chapter 24, the authors detail a program for reading a file in chunks and then processing it MapReduce-style. However, it fails with `hGetBufSome: illegal operation (handle is closed)`. The program, pared down to an MRE and updated to modern Haskell, is as follows:
```haskell
import Control.Exception (finally)
import Control.Parallel.Strategies (NFData, rdeepseq)
import qualified Data.ByteString.Lazy.Char8 as LB
import GHC.Conc (pseq)
import System.Environment (getArgs)
import System.IO

main :: IO ()
main = do
  args <- getArgs
  res <- chunkedReadWith id (head args)
  print res

chunkedReadWith ::
  (NFData a) =>
  (LB.ByteString -> a) ->
  FilePath ->
  IO a
chunkedReadWith process path = do
  (chunk, handle) <- chunkedRead path
  let r = process chunk
  -- the RHS of finally is for some reason being run before the handle is
  -- finished being used. removing it allows the program to run, with the obvious
  -- disadvantage of leaving closing the handle to the garbage collector
  (rdeepseq r `seq` return r) `finally` hClose handle

chunkedRead ::
  FilePath ->
  IO (LB.ByteString, Handle)
chunkedRead path = do
  h <- openFile path ReadMode
  chunk <- LB.take 64 <$> LB.hGetContents h
  rdeepseq chunk `pseq` return (chunk, h)
```
I suspect the problem is that evaluation is not being forced strictly enough, but my current understanding of `seq`/`pseq` and Strategies tells me that the program as written should work: reducing `r` to normal form should mean that the handle has already been read from by the time `hClose` runs. What have I missed?
On a small side note, it's unclear why the authors chose to use `seq` in one place and `pseq` in the other, but since my example has removed any parallel operation, it shouldn't (and indeed doesn't) make a difference.
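Since the question turns on how deeply `seq` and `pseq` evaluate, here is a minimal sketch showing that both stop at weak head normal form (WHNF), while `deepseq` reaches normal form. It assumes only `base` and the `deepseq` package; `partial` and `threw` are hypothetical helpers introduced for illustration.

```haskell
import Control.DeepSeq (deepseq)
import Control.Exception (SomeException, evaluate, try)
import GHC.Conc (pseq)

-- Hypothetical value: the outer (:) is defined, but the tail is bottom.
partial :: [Int]
partial = 1 : undefined

-- Hypothetical helper: True if forcing the argument to WHNF raised an exception.
threw :: a -> IO Bool
threw x = do
  r <- try (evaluate x >> pure ()) :: IO (Either SomeException ())
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  -- seq only needs the outer (:) constructor, so the undefined tail is untouched.
  print =<< threw (partial `seq` ())     -- prints False
  -- pseq differs from seq only in its ordering guarantee, not in depth.
  print =<< threw (partial `pseq` ())    -- prints False
  -- deepseq walks the whole list and reaches the undefined tail.
  print =<< threw (partial `deepseq` ()) -- prints True
```

So neither `seq` nor `pseq` by itself gets anywhere near normal form; whatever sits on the left of them has to do the deep forcing itself.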
Answer:
Quoting from a comment on the bug I filed:

> The `NFData` instance for `LazyByteString` is correct, albeit perhaps written obtusely. Note that the `Chunk` constructor's `S.ByteString` field is a strict field, and the `NFData` instance for `StrictByteString` evaluates only to WHNF.
>
> The problem is elsewhere: It's that `rdeepseq chunk` is an `Eval LazyByteString` object that can reach WHNF (as witnessed by `seq` or `pseq`) before `chunk` has actually been `deepseq`'ed. Try `withStrategy rdeepseq chunk` instead.
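In other words, merely applying `rdeepseq` is not enough: `rdeepseq r` only builds an `Eval` computation, and forcing that computation to WHNF need not run it. Instead, we must use `withStrategy` (or, equivalently, `using`) to actually run the strategy. The book was written against the 1.x API of `parallel`, in which `rnf` was itself a strategy of type `a -> ()`, so `seq`ing its result really did force the value to normal form. The `rnf :: a -> ()` in `Control.DeepSeq` behaves the same way.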
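The difference between building a strategy and running it can be seen with a list that is defined at WHNF but has `undefined` deeper down. This is a sketch assuming the `parallel` package (3.x API); `poisoned` and `threw` are hypothetical helpers. The three ways of running the strategy should all hit the `undefined` and print `True`. Whether forcing the bare `rdeepseq poisoned` does so depends on how `Eval` is represented internally, so that line is only there for comparison.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Control.Parallel.Strategies (rdeepseq, runEval, using, withStrategy)

-- Hypothetical value: fine at WHNF, but bottom in its second element.
poisoned :: [Int]
poisoned = [1, undefined, 3]

-- Hypothetical helper: True if forcing the argument to WHNF raised an exception.
threw :: a -> IO Bool
threw x = do
  r <- try (evaluate x >> pure ()) :: IO (Either SomeException ())
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  -- Only builds an Eval action; WHNF of that action is not the same
  -- as running it, so this need not touch the list at all.
  print =<< threw (rdeepseq poisoned)
  -- Each of these actually runs the strategy, forcing the list deeply.
  print =<< threw (withStrategy rdeepseq poisoned)
  print =<< threw (poisoned `using` rdeepseq)
  print =<< threw (runEval (rdeepseq poisoned))
```

`withStrategy` is just `using` with its arguments flipped, and both amount to `runEval` applied to the strategy's result, which is why they are interchangeable here.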
Concretely, replacing the offending line with the following fixes the problem:
```haskell
-- requires: import Control.Parallel.Strategies (withStrategy)
(withStrategy rdeepseq r `seq` return r) `finally` hClose handle
```
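Alternatively, using `Control.DeepSeq`, we could say more concisely:
```haskell
-- requires: import Control.DeepSeq (rnf)
(rnf r `seq` return r) `finally` hClose handle
```
or even
```haskell
-- requires: import Control.DeepSeq (deepseq)
(r `deepseq` return r) `finally` hClose handle
```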
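Putting the pieces together, here is a sketch of the whole example with the `deepseq` fix applied. It assumes only the `bytestring` and `deepseq` packages, so the `parallel` dependency is gone. It also replaces `rdeepseq chunk `pseq`` in `chunkedRead` with `deepseq`, since that line can be a no-op for the same reason; this extra change is not required by the fix above.

```haskell
import Control.DeepSeq (NFData, deepseq)
import Control.Exception (finally)
import qualified Data.ByteString.Lazy.Char8 as LB
import System.Environment (getArgs)
import System.IO

main :: IO ()
main = do
  args <- getArgs
  res <- chunkedReadWith id (head args)
  print res

chunkedReadWith ::
  (NFData a) =>
  (LB.ByteString -> a) ->
  FilePath ->
  IO a
chunkedReadWith process path = do
  (chunk, handle) <- chunkedRead path
  let r = process chunk
  -- deepseq forces r to normal form, so every lazy read from the handle
  -- happens before finally runs hClose.
  (r `deepseq` return r) `finally` hClose handle

chunkedRead ::
  FilePath ->
  IO (LB.ByteString, Handle)
chunkedRead path = do
  h <- openFile path ReadMode
  chunk <- LB.take 64 <$> LB.hGetContents h
  -- Unlike forcing `rdeepseq chunk`, this really reads the first 64 bytes now.
  chunk `deepseq` return (chunk, h)
```

Run with a file path as the only argument, it prints the first 64 bytes of that file (or the whole file, if it is shorter) instead of failing on the closed handle.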
