Graphula
by @pbrisbin on November 01, 2021
Freckle’s Graphula library helps you test Haskell code that interacts with a database. Its responsibility is to define and insert arbitrary values and their relations into the database ahead of running a test, while giving you control to tailor the data for that test’s needs, and to keep it all succinct and readable.
For a peek at what it can do, take this test:
describe "updateCourse" $ do
it "sets values given" $ runDB $ do
now <- liftIO getCurrentTime
teacherId <- insert $ Teacher -- <
{ teacherName = "foo" -- <
, teacherEmail = "x@y.com" -- <
, teacherCreatedAt = now -- <
, teacherUpdatedAt = now -- <
, teacherDeletedAt = Nothing -- <
} -- <
-- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- Purely noise, irrelevant to the test
courseId <- insert $ Course
{ courseTeacher = teacherId
-- ^^^^^^^^^^^^^^^^^^^^^^^^^
-- Hope you remember to get this bit right
, courseName = "bar"
, courseCreatedAt = now
, courseUpdatedAt = now
, courseArchivedAt = Nothing -- <
, courseArchivedReason = Nothing -- <
-- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- The only actual setup for this test
}
void
$ updateCourse courseId
$ UpdateCourse
{ archivedAt = now
, archivedReason = "reason"
}
Just Course {..} <- P.get courseId
courseArchivedAt `shouldSatisfy` isJust
courseArchivedReason `shouldBe` Just "reason"
And see how Graphula resolves the test setup issues noted:
describe "updateCourse" $ do
it "sets values given" $ graph $ do
now <- liftIO getCurrentTime
teacher <- node @Teacher () mempty
-- ^^^^^^^^^^^^^^^^^^^^^^^
-- Minimal noise
course <- node @Course (onlyKey teacher) $ edit setCourseUnarchived
-- ^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
-- Clearer requirements intention-revealing
-- naming for the stuff
-- that matters
void
$ lift
$ runDB
$ updateCourse (entityKey course)
$ UpdateCourse
{ archivedAt = now
, archivedReason = "reason"
}
Just Course {..} <- lift $ runDB $ P.get $ entityKey course
courseArchivedAt `shouldSatisfy` isJust
courseArchivedReason `shouldBe` Just "reason"
This post is a tutorial on Graphula; how to convert a project to it and make the most effective use of it. The examples are derived from an actual Pull Request in a personal, non-Freckle project that recently underwent this conversion, but they’ve been re-written to use Freckle’s domain models.
Example Domain
Freckle is in the Education space and has a rich data model from Districts to
Students to Practice Sessions and the Content being practiced. Though we may
mention (or make up) other models to convey particular points, we will center
the main examples on School
has many Teacher
s has many Course
s.
Pre-requisites
We’ll assume you have a runDB
function that you currently use in your tests,
and that it accepts a SqlPersistT m a
argument only. In other words, the
connection pool it needs is coming from a Reader context and not passed
explicitly. You can absolutely use Graphula if this is not the case, but the
examples in this post won’t be showing it that way.
At Freckle, our runDB
looks like:
runDB
:: (MonadUnliftIO m, MonadReader env m, HasSqlPool env)
=> SqlPersistT m a
-> m a
We’ll also assume you are using
hspec-expectations-lifted
, which puts all
assertion functions into MonadIO m
. This means we can call them without
liftIO
from within our Graphula expressions, which is how the examples here
will show it.
Getting Started
As of this writing, Graphula version 2.0.0.5 is available in the latest Stackage LTS, and version 2.0.1.0 is on Hackage. That latter is the same code, only with improved documentation.
We recommend making a support module for use in any Graphula-enabled tests:
{-# OPTIONS_GHC -fno-warn-orphans #-}
module Freckle.Test.Graphula
( graph
, onlyKey
, generate
, arbitrary
, module Graphula
) where
import Freckle.Prelude
import Graphula
import Graphula.Arbitrary
import Test.QuickCheck.Arbitrary
instance MonadFail m => MonadFail (GraphulaT n m) where
fail = lift . fail
graph
:: HasSqlPool env
=> GraphulaT (YesodExample env) (YesodExample env) a
-> YesodExample env ()
graph = void . runGraphulaT Nothing runDB
onlyKey :: Entity e -> Only (Key e)
onlyKey = only . entityKey
The Approach
Target a small test with one or two entities and convert it like so:
+import Freckle.Test.Graphula
spec :: Spec
spec = withApp $ do
describe "rosteringAllows" $ do
- it "always allows unarchived courses" $ runDB $ do
- teacher <- insertTeacher "name" "x@y.com"
- course <- insertCourse teacher "name" False
- shouldAllow course
+ it "always allows unarchived courses" $ graph $ do
+ teacher <- node @Teacher () mempty
+ course <- node @Course (onlyKey teacher) $ ensure $ not . courseIsArchived
+ lift $ runDB $ shouldAllow course
You can do this one it
at a time, and we recommend you do. Attempting node
will force you into Arbitrary
and HasDependencies
instances for the model,
which we will dive into in the coming sections. The basic approach is to do this
repeatedly for every it
and every model in the system. This can take some
time, but your suite should remain green and no concurrent work should be
blocked while it is ongoing.
Arbitrary
At Freckle, we’re huge fans of Generic
-based deriving of instances and not
hand-writing boilerplate, so let’s start there. First, make sure you’re deriving
Generic
for any models you want to do this for in config/models
:
Course
name CourseName
...
UniqueCourse name
- deriving Eq Show
+ deriving Eq Show Generic
Then, install generic-arbitrary
and:
import Test.QuickCheck.Arbitrary
import Test.QuickCheck.Arbitrary.Generic
-- ...
mkPersist sqlSettings $(persistFileWith lowerCaseSettings "config/models")
instance Arbitrary Course where
arbitrary = genericArbitrary
If this works for you (or you otherwise have Arbitrary
on your models), you’re
pretty much done with this section. However, it’s likely you’ll hit missing
instances for primitive types, such as Text
or UTCTime
.
Orphans vs newtype
s
Orphan instances are to be avoided, full stop. While an application (vs a library) may be unlikely to hit the coherency issues that are the Actual Problem with orphan instances, they will drastically slow your team down through unnecessary recompilation. How this happens is complicated and nuanced and not worth discussing here, but because it is complicated and nuanced, all teams will eventually hit a size where they’re impacted, so a policy of avoidance is best.
Freckle aspires to have newtype
s on basically all primitives. userEmail
is
not a Text
, it’s an EmailAddress
. courseCreatedAt
is not a UTCTime
it’s
a CreatedAt
(or a CreatedAt 'Course
if you want to get fancy).
answerAttempts
is not a Natural
, it’s an AttemptsCount
. Not only will this
avoid orphan instances, but it will bring a lot of clarity, safety, and
flexibility throughout your project generally. We’ve never regretted a
newtype
.
That said, we live in the real world and sometimes you just can’t take on that effort all at once. So let’s assume you’ve thought long and hard and made a reasoned decision to go down the path of orphans. What then?
Here are some tips:
- Give them a home
-
Try not to write them yourself
quickcheck-instances
is a package of orphanArbitrary
instances for common types likeText
,UTCTime
, andNatural
. We recommend using this, rather than rolling our own. -
Beware the universe
QuickCheck is really good at pushing edge-cases. If you are working with a list type, it is going to make sure your tests run with really small lists and really big lists. This is a good thing, but we’ve experienced some problematic outcomes.
Once, we had a type with multiple levels of
NonEmpty
. Attempting good bounds checking, QuickCheck made some very big values (of very big values) for us, causing the test to hang or crash. And of course anyArbitrary
-based failures are going to be intermittent, so it wasn’t trivial to track this down.A similar situation happens with
Text
: the instance, as it should, comes up with really strange and gobbledygookText
. Something that would never be accepted by us as a Course name, for example. This can again result in tests that fail or hang intermittently. If you are testing “for allText
s” on code that would never be asked to handle suchText
s, you’re going to have a bad time.This is just another argument for
newtype
s where, for example,CourseName
can have anArbitrary
instance that draws from a restricted character set and meets realistic length requirements. This leads to more realistic and reliable tests. And refining your types like this works in both directions, it allows your tests to be more accurate, but it also allows your business domain to be more effectively defined, pushing constraints further to the boundaries
HasDependencies
When you generate some test data using node
, its usage is:
{model} <- node @{ModelType} ({dependencies}) {modifier}
GHC can usually infer the type of a node
expression, but we like to use
TypeApplications
to make it visible in the test. It’ll also make any mistakes
show up as type errors at the point where your expectation is wrong, rather than
multiple function-calls later, wherever inference ultimately got blocked.
It also means that if you’re not sure of a model’s dependencies, you can have GHC tell you:
session <- node @PracticeSession () mempty
Neglecting PracticeSession
s dependencies here results in an informative error.
Thanks compiler!
• Couldn't match type ‘()’
with ‘(Subject, CourseName, Key Student)’
Expected type: Dependencies PracticeSession
Actual type: ()
How does GHC know this? Because we’ve defined a HasDependencies
instance.
Imagine this (made-up) PracticeSession
model looked like this:
PracticeSession
subject Subject
course CourseName
studentId StudentId
completedAt UTCTime Maybe1
duration Seconds Maybe
accuracy Percentage Maybe
deriving Eq Show Generic
Here would be the instance:
instance HasDependencies PracticeSession where
type Dependencies PracticeSession = (Subject, CourseName, StudentId)
This is showing a few things:
-
Dependencies that are not foreign keys
The
CourseName
dependency is used instead of aCourseId
. Hypothetically, we’ve chosen to loosely couple sessions to Courses by name – and Graphula works perfectly well with this. -
Dependencies that are not foreign anything
We’ve also chosen to say the Subject (e.g. “math”) is a dependency, even though that’s a value that doesn’t point to any related table. We can still make that attribute required in
node
by calling it a dependency like this. -
Inferred order
The order of the types in the
Dependencies
“tuple” must match the order they are defined in the model for the generic-based implementation to work.Under the hood, this is defining a
dependsOn
method for you. If necessary, you can drop down and define it yourself to do whatever you need.
For completeness, here are some other styles of instance:
-- A model with no dependencies
instance HasDependencies School where
type Dependencies School = ()
-- A model with only one dependency, which is an actual DB foreign-key
instance HasDependencies Teacher where
type Dependencies Teacher = Only SchoolId
NodeOptions
Revisiting the node
usage, let’s talk about {modifier}
now.
{model} <- node @{ModelType} ({dependencies}) {modifier}
This argument is how you, well, modify the arbitrary data before it is inserted
into the database. Its type is NodeOptions
and it’s a Semigroup
, which means
different modifiers are combined with (<>)
to apply one after the other. There
are two main uses at the time of this writing:
-
edit
: apply somemodel -> model
transformationFor example, to set some field or fields to values required for the test. You can of course use whatever functions you like, but we find this is a good use-case for Lens setters with
fieldLens
1.teacher <- node @Teacher () $ edit $ fieldLens TeacherName .~ x . fieldLens TeacherEmail .~ y . fieldLens TeacherDeletedAt ?~ z
-
ensure
: ensure the value passes somemodel -> Bool
predicateThis is Graphula’s version of QuickCheck’s
suchThat
and can be used to make really readable specs2:describe "/courses/archived" $ do it "shows only archived Courses" $ graph $ do school <- node @School () mempty teacher <- node @Teacher (onlyKey school) mempty course <- node @Course (onlyKey teacher) mempty $ ensure courseIsArchived otherCourse <- node @Course (onlyKey teacher) mempty $ ensure $ not . courseIsArchived -- log in as the Teacher, view the page, and assert we see the course but -- not otherCourse
Using Graphula Effectively
In Freckle, we’ve developed some patterns for using Graphula within our test suites. Graphula is a dense library with a lot of features and flexibility, and following standard patterns helps new Engineers get acclimated.
Factories over Fixtures
A Shared Fixture is a single value that is built at the start
of a test suite and used repeatedly by different tests. A Factory, on the other
hand, is a style of test data setup popularized by thoughtbot’s factory_bot
gem in which recipes exist for various, well-named scenarios that
can be called to produce fresh values for every test. The recipes also support
arguments to further tailor the produced data for a given test.
We had a lot of Fixture-style tests before we had Graphula, and some have been converted mechanically to be Graphula-based but are still Fixture-like by always generating the same kind of value that is used from many tests. We try to avoid this in new code as we find it leads to coupling, test pollution, and Mystery Guest or Obscure Test smells.
Graphula very much favors a Factory-like approach if you just get out of its
way, since node
is a generalized data producer and its {modifier}
is a
generalized way to further tailor the produced data. We try to lean into that as
much as possible.
Centralized Factories
Another thing we like to do is have a single {App}.Test.Factories
module to
hold all shared functions related to Graphula generation, or edit functions for
the purposes of test setup. Otherwise, we find it difficult for Engineers to
know if something exists to generate the data they need for their tests, and
duplication often results.
Generators
We call abstractions over node
“generators” because un-specified values are
“generated” by Arbitrary
. For this reason, we prefix any shared abstractions
over node
with gen
.
These should be used sparingly. One anti-pattern we’ve found in our tests is a
plethora of genThisWithThat
and genThatWithoutThis
functions. In these
cases, in-lining the node
calls and making use of the inherent readability of
ensure
or set
-functions (see below) results in simpler code with less
duplication.
We suggest reaching for a gen
-function when:
-
Creating a child model related to a parent is not trivial
In converting that personal project, there was a good example of this: a
Job
is related to aRepo
by multiple columns and a Pull Request value as dependency. That requires some ceremony for something that would be simple if it were a single foreign key. This means it’s a good candidate to abstract over:genJob :: (MonadIO m, GraphulaContext m '[Job]) => Entity Repo -> NodeOptions Job -> m (Entity Job) genJob (Entity _ Repo {..}) options = do pullRequest <- generate arbitrary node @Job (repoOwner, repoName, pullRequest) options
-
Creating a model that has an important value-level meaning beyond its type
One example of this would a
genAdmin
as an abstraction overnode @User
.
One example where you could reach for a generator, but we advise caution is when you are generating a collection of models together. It is very attractive to have:
genTeacherWithCourses
:: GraphulaContext m '[School, Teacher, Course]
=> NodeOptions Teacher
-> m (Entity Teacher)
genTeacherWithCourses options = do
school <- node @School () mempty
teacher <- node @Teacher (onlyKey school) options
teacher <$ replicateM_ 3 (node @Course (onlyKey teacher) mempty)
This is all well and good, as long as:
- You only ever need
NodeOptions
to modify theTeacher
created - You only ever need access to the created
Teacher
afterward - You never care how many
Course
s were created
If any of these are not true (and we find they usually are not), you are well on your way to some or all of the following pain:
-
A slew of varying
genTeacherWith{SomeNumberOf}Courses{LikeThisOrThat}
And they all duplicate 80% of each other, differing only slightly in the named ways, requiring annoying mass-updates when you decide
School
should really depend onDistrict
. -
A growing list of
NodeOptions
argumentsSince you’ll want separate ability to modify the
School
,Teacher
, andCourse
s that are being created. And if you need to modify one of theCourse
s that is being created, which is very common in testing, you simply can’t do it through this interface. -
A complex return value
Because we all love getting
(Entity School, Entity Teacher, [Entity Course])
back from an Esqueleto query so much.
There can be a time and a place to encapsulate generation of a “tree” of models together like this. And we absolutely have such functions in our code base (with and without said pain). All we’re saying is, be careful.
Setters
We recommend your most common Graphula-related abstraction not even be directly
Graphula-related: well-named set
-functions. This avoids a mess of
gen
-functions that make lots of internal assumptions or set up data in similar
but subtly different ways, which leads to duplication or bugs.
It works because set
-functions can be mixed and matched into any edit
of any
node
call easily:
-- Bad
course1 <- genArchivedAndDisabledCourse
course2 <- genArchivedCourse
-- Almost...
course1 <- genArchivedCourse $ edit setCourseDisabled
course2 <- genArchivedCourse id
-- Good
course1 <- node @Course () $ edit $ setCourseArchived <> setCourseDisabled
course2 <- node @Course () $ edit $ setCourseArchived
Note that in this specific instance, ensure
may be better anyway:
-- Best?
course1 <- node @Course () $ ensure courseIsArchived <> ensure (not . courseIsEnabled)
course2 <- node @Course () $ ensure courseIsArchived
set
-functions and predicates can also be used in your implementation code and
re-used in test setup. This further cuts down on refactor costs when (e.g.) the
implementation of “archived” for a Course
changes. It’s ideal if you only have
to modify the test that is doing or checking archived-ness, and not also every
other test that relies on a collaborating Course
being archived or not.
Haskell’s types alone can’t help you avoid Shotgun Surgery,
all it can do is make you aware of every place you need to hit.
Wrapping Up
In this post, you learned a bit about Freckle’s Graphula library, walked through migrating a non-trivial project, and hopefully got exposed to some good patterns for testing database interactions in Haskell.
We’re proud of Graphula and it is working well for us, but we are just one use-case. If you’d like contribute, please don’t hesitate to get involved.
-
Unfortunately, we have to define a different
fieldLens
for this to work. For more details, see this Issue. ↩ -
While
ensure
can make for very readable examples, it can also be inefficient if it would take many generation attempts to hit the condition you are ensuring. In such cases, usingedit
to explicitly set up what you need is better. ↩