A tale of two Monads: Free vs MTL
by @eborden on September 28, 2017
Much has been said about the benefits of Free
monads. The idea of expressing
computation as a data type is very powerful. Separating interpretation from
representation gives us a lot of freedom, but what are its costs?
A recent blog post did a fantastic job of summarizing many of the issues with
Free
, Free Monads Considered Harmful.
I don’t want to rehash those details. Instead I want to speak concretely. Why
did Front Row build a tool with Free
? Why did we subsequently refactor to
mtl
and how did we make that decision?
Consider your use case
Free
is a powerful abstraction, but you really must consider your use case
before using it. Like the sage says, “use the right tool for the right job.” At
Front Row we utilized Free
to design a library for generating arbitrary
fixtures, which makes it easy to write
integration tests. Free
worked perfectly! Until it didn’t.
Why did it work?
Prototyping
Free
allowed us to quickly iterate on an implementation without thinking about
transformer stacks, associativity of monads in the stack, how best to decompose
our type classes, etc. These are noise when you are prototyping. You likely
don’t even understand your semantics at first, you just need to get something
built and prove your use case.
Small graphs of data
Our initial use case, building graphs of arbitrary relational data, did not
require high performance. These graphs were shallow, building fewer than 20
artifacts, so the diminishing returns of Free
did not manifest.
Inspecting the graph
We thought we might need to inspect the graph to produce finalizers for unwinding idempotent graphs. We didn’t.
Why didn’t it work?
One simple answer; unintended use cases. An enterprising engineer took this tool
and said, “Hey, can I use this to build data for benchmarking?” The answer was
obviously YES! Great, our tool has found a new use case that we didn’t
anticipate, cheers everyone. What happened next lead us to refactor away from
Free
.
Generating 300,000 arbitrary entities
Our enterprising engineer quickly spun up a tool for generating 300,000 database entities to test a few new API endpoints. Writing this tool took no time, but running it did. The runtime was 30 minutes! This is unacceptable. Why did it take that long?
Probing the problem
This library is essentially a DSL around Arbitrary
and database insertion, so our first stop was the database. Were any queries
being run?
postgres=# SELECT datname, query, usename FROM pg_stat_activity ORDER BY query_start DESC LIMIT 1; \watch 1
The above query allowed us to watch for insertions, and we quickly realized that
NOTHING was being done. A quick hop over to top
showed that memory was
climbing. What was going on? Why was nothing happening except generating heat?
Profiling to the rescue
GHC
has fantastic profiling tools. Sadly for a long time they were locked away
because of terrible work flows. stack
(and yes nix
) has greatly simplified
utilizing these tools. Passing the --profile
flag to a stack
command will
build all your libraries with profiling instrumentation and produce .prof
files.
Building a benchmark
First we built a benchmark to get some hard and consistent numbers on speed. Our
simple benchmark was generating n
nodes without any dependencies. We used
criterion to accomplish this.
import Data.Functor.Identity
import Control.Monad (void, replicateM)
import Criterion.Main (defaultMain)
import GHC.Generics
import qualified Graphula.Free as Free
import Test.QuickCheck.Arbitrary
main :: IO ()
main = defaultMain
[ bgroup "initial algebra"
[ bench "1" . nfIO $ replicateNodeInitial 1
, bench "100" . nfIO $ replicateNodeInitial 100
, bench "1000" . nfIO $ replicateNodeInitial 1000
]
]
data A
= A
{ aa :: String
, ab :: Int
} deriving (Generic)
instance Arbitrary A where
arbitrary = A <$> arbitrary <*> arbitrary
instance HasDependencies A
graphIdentity :: Free.Frontend NoConstraint Identity (IO r) -> IO r
graphIdentity f = case f of
Free.Insert n next ->
next $ Just $ Identity n
Free.Remove _ next ->
next
replicateNodeInitial :: Int -> IO ()
replicateNodeInitial i = void . Free.runGraphula graphIdentity . replicateM i $ node @A
We then ran this script with profiling enabled, stack bench --profile
. This
produced a little noise from criterion
, but we got the information we needed.
Identifying cost centers
GHC
*.prof
files contain lots of useful information, but we were really only
concerned with cost. Where was our program spending most of its time? Cost center
reports show a breakdown of functions with their module/source lines and
percentage of time/allocation spent in them.
COST CENTRE MODULE SRC %time %alloc
>>=.\ Control.Monad.Trans.Free src/Control/Monad/Trans/Free.hs:(229,39)-(231,44) 20.5 27.1
>>= Control.Monad.Trans.Free src/Control/Monad/Trans/Free.hs:(229,3)-(231,44) 18.1 16.6
fmap Graphula.Free src/Graphula/Free.hs:124:1-48 11.8 20.0
getOverhead Criterion.Monad Criterion/Monad.hs:(47,1)-(56,12) 7.1 0.0
getGCStats Criterion.Measurement Criterion/Measurement.hs:(46,1)-(48,16) 4.7 0.0
>>= Data.Vector.Fusion.Util Data/Vector/Fusion/Util.hs:33:3-18 3.4 3.6
transFreeT Control.Monad.Trans.Free src/Control/Monad/Trans/Free.hs:337:1-79 3.1 4.0
fmap Graphula.Free src/Graphula/Free.hs:118:1-58 2.8 6.5
fmap Control.Monad.Trans.Free src/Control/Monad/Trans/Free.hs:(209,3)-(211,42) 2.2 2.5
fmap Control.Monad.Trans.Free src/Control/Monad/Trans/Free.hs:(132,3)-(133,37) 2.2 1.8
iterT Control.Monad.Trans.Free src/Control/Monad/Trans/Free.hs:(309,1)-(313,21) 1.5 1.3
basicUnsafeWrite Data.Vector.Primitive.Mutable Data/Vector/Primitive/Mutable.hs:114:3-69 1.4 0.7
fmap.f' Control.Monad.Trans.Free src/Control/Monad/Trans/Free.hs:(210,5)-(211,42) 1.1 1.6
rSquare.p Statistics.Regression Statistics/Regression.hs:109:5-66 1.0 0.7
nodeEditWith Graphula src/Graphula.hs:(385,1)-(389,10) 1.0 1.0
innerProduct.\ Statistics.Matrix.Algorithms Statistics/Matrix/Algorithms.hs:42:3-51 1.0 0.7
backendArbitrary.\ Graphula.Free src/Graphula/Free.hs:(253,24)-(255,10) 1.0 0.5
fmap Data.Vector.Fusion.Stream.Monadic Data/Vector/Fusion/Stream/Monadic.hs:(125,3)-(127,20) 1.0 1.0
This report is a dead giveaway. Collapsing stats for >>=.\
and >>=
show 40%
of time and 43% of allocations are spent in binding Free
. Our next top
offender is fmap
from our data type’s Functor
instance. We are spending
half of the time in this program just glueing computation together!
Refactoring
Obviously Free
was a problem. The primary use case of our library is
generating arbitrary nodes, not monadic bind. We need to be spending our
resources more wisely. So we refactored to mtl
. This was mostly mechanical
since we’d already figured out the details in our prototype.
Next step was creating a benchmark to prove our time wasn’t wasted.
import Control.Monad (replicateM, void)
import Control.Monad.Catch
import Control.Monad.Trans
import Criterion.Main
import Data.Functor.Identity
import GHC.Generics
import Graphula
import Test.QuickCheck.Arbitrary
main :: IO ()
main = defaultMain
[ bgroup "final algebra"
[ bench "1" . nfIO $ replicateNodeFinal 1
, bench "100" . nfIO $ replicateNodeFinal 100
, bench "1000" . nfIO $ replicateNodeFinal 1000
]
]
newtype GraphulaIdentity a = GraphulaIdentity { runGraphulaIdentity :: IO a }
deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadCatch, MonadMask)
instance MonadGraphulaFrontend GraphulaIdentity where
type NodeConstraint GraphulaIdentity = NoConstraint
type Node GraphulaIdentity = Identity
insert = pure . Just . Identity
remove = const (pure ())
replicateNodeFinal :: Int -> IO ()
replicateNodeFinal i = void . runGraphulaIdentity . runGraphulaT . replicateM i $ node @A
New cost centers
Again we ran this benchmark with stack bench --profile
and inspected our cost
centers.
COST CENTRE MODULE SRC %time %alloc
nodeEditWith Graphula src/Graphula.hs:(385,1)-(389,10) 18.0 19.3
attemptsToInsertWith Graphula src/Graphula.hs:(419,1)-(426,60) 18.0 16.6
replicateNodeFinal Main bench/Main.hs:77:1-89 13.0 17.6
getOverhead Criterion.Monad Criterion/Monad.hs:(47,1)-(56,12) 6.2 0.0
>>= Data.Vector.Fusion.Util Data/Vector/Fusion/Util.hs:33:3-18 6.0 8.6
getGCStats Criterion.Measurement Criterion/Measurement.hs:(46,1)-(48,16) 4.1 0.0
basicUnsafeWrite Data.Vector.Primitive.Mutable Data/Vector/Primitive/Mutable.hs:114:3-69 2.3 1.5
attemptsToInsertWith.\ Graphula src/Graphula.hs:425:17-22 2.3 3.5
newTFGen System.Random.TF.Init src/System/Random/TF/Init.hs:82:1-43 2.1 8.8
logNode Graphula src/Graphula.hs:139:3-21 2.1 0.0
rSquare.p Statistics.Regression Statistics/Regression.hs:109:5-66 1.9 1.7
generate Test.QuickCheck.Gen Test/QuickCheck/Gen.hs:(101,1)-(103,20) 1.8 3.5
innerProduct.\ Statistics.Matrix.Algorithms Statistics/Matrix/Algorithms.hs:42:3-51 1.8 1.8
qr.\.\ Statistics.Matrix.Algorithms Statistics/Matrix/Algorithms.hs:29:34-51 1.5 1.4
kbnAdd Numeric.Sum Numeric/Sum.hs:(138,1)-(141,39) 1.4 0.9
basicUnsafeIndexM Data.Vector.Primitive Data/Vector/Primitive.hs:215:3-75 1.4 2.2
fmap Data.Vector.Fusion.Stream.Monadic Data/Vector/Fusion/Stream/Monadic.hs:(125,3)-(127,20) 1.3 2.4
newQCGen Test.QuickCheck.Random Test/QuickCheck/Random.hs:80:1-31 1.1 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base Data/Vector/Unboxed/Base.hs:283:841-899 1.1 0.0
generateNode Graphula src/Graphula.hs:138:3-44 1.0 0.0
transpose Statistics.Matrix Statistics/Matrix.hs:(268,1)-(270,22) 0.8 1.4
unsafeIndex Statistics.Matrix Statistics/Matrix.hs:242:1-40 0.7 1.0
basicUnsafeSlice Data.Vector.Primitive.Mutable Data/Vector/Primitive/Mutable.hs:(84,3)-(85,25) 0.6 1.3
Now these look a lot better! Our most expensive functions are all associated with generating nodes. Our costs are aligned. We are spending time and allocation where our primary use case lies.
What about timing?
Turns out this implementation was belligerently faster.
The fun part is that we barely changed anything. Some constraints needed to be
changed on our functions, but their implementations didn’t need to change
because Free
allowed us to think about the core semantic abstractions first. If you
are curious you can see both implementations on github,
free
vs mtl.
The actual benchmarks live here.
Even better the runtime of our enterprising engineer’s data generation went from 30 minutes to 3 minutes. An order of magnitude is always a nice performance validation.
What did we learn?
What we learned is utilizing Free
vs mtl
is not universally good or
bad. They are each fantastic abstractions and it really comes down to your use
case. If your use case is going to end up building deeply nested graphs of
Free
data types then it probably isn’t for you. In the end we gained a lot
clarity from prototyping with Free
and a lot of performance from refactoring
to mtl
.