Skip to the content.

Past Does Not Require Premium

by @pbrisbin on April 27, 2022

It’s not uncommon for Freckle to experience a period of high database load in September, as School traffic returns and code changes made over the summer start to see real scale for the first time. We typically spend a lot of that month finding small changes causing big performance problems and fixing them.

One “summer regression” from this past September was such an excellent example of a bug correctable through types, that I’d like to describe it here.

Premium Timespans

Freckle considers viewing certain historical data a premium feature and we have a requireTimespan helper that enforces that in the APIs for such data.

One such usage looks like this:

premiumLicenses <- fetchPremiumLicensesForSchool now schoolId
timeSpan <- requireTimeSpan MaxSpanAllTime PastDoesNotRequirePremium premiumLicenses

The function is given the current user’s licenses and two other arguments to control how to enforce things. Shortly after students returned to school, our instrumentation alerted us to unusually high database load. It turns out, this was caused by the addition of exactly this code to a very hot path.

Have you spotted the landmine yet?

requireTimeSpan MaxSpanAllTime PastDoesNotRequirePremium premiumLicenses
--                             ^^^^^^^^^^^^^^^^^^^^^^^^^

If “Past Does Not Require Premium”, then all these queries are for naught. We’re fetching licenses that the underlying code doesn’t even consider!

The following would function exactly the same:

requireTimeSpan MaxSpanAllTime PastDoesNotRequirePremium mempty

And that’s exactly how we… considered resolving things at the time.

Instead, we wondered: how many other places might we be doing this? Should we grep for it? No, that’s a dynamic-typist talking. We should make it invalid through the types!

Smarter Data

Here’s a small change with a large effect:

  data PastRequiresPremium
-   = PastRequiresMathPremium
-   | PastRequiresElaPremium
-   | PastRequiresAnyPremium
+   = PastRequiresMathPremium PremiumLicenses
+   | PastRequiresElaPremium PremiumLicenses
+   | PastRequiresAnyPremium PremiumLicenses
    | PastDoesNotRequirePremium
    deriving stock (Eq, Show)

This minor re-arrangement gets the compiler forcing us to fix this bug in all code, present and future.

Rather than requireTimeSpan (et al) accepting PastRequiresPremium and PremiumLicenses separately, we made the PastDoesNotRequirePremium type smarter. By holding the PremiumLicenses only in the constructors that need it, we can ensure we don’t demand them for the constructor that doesn’t.

This felt a bit like solving Data Clump, it’s following Smart data structures and dumb code, and has similar effectiveness to Make Invalid States Unrepresentable.

Chasing Compiler Errors

From here, we can’t get things wrong. The compiler forces us to never need or use PremiumLicenses for the PastDoesNotRequirePremium case:

    -> PastRequiresPremium
    -> UTCTime
    -> TimeSpan
-   -> PremiumLicenses
    -> Validation ValidationErrors ()
- validateTimeSpan maxTimeSpan pastRequiresPremium now timeSpan premiumLicenses
+ validateTimeSpan maxTimeSpan pastRequiresPremium now timeSpan
    | needToCheckPremium && not hasValidPremium = legacyFailure Errors.NoPremium
    | isValidTimeSpan = success ()
    | otherwise = legacyFailure Errors.InvalidDateRange
   where
    needToCheckPremium =
      deltaTimeSpan > week && TimeSpan.elem oneWeekAgo timeSpan
    hasValidPremium = case pastRequiresPremium of
-     PastRequiresAnyPremium -> premiumLicensesHasAny premiumLicenses
-     PastRequiresMathPremium -> premiumLicensesHasMath premiumLicenses
-     PastRequiresElaPremium -> premiumLicensesHasEla premiumLicenses
+     PastRequiresAnyPremium premiumLicenses ->
+       premiumLicensesHasAny premiumLicenses
+     PastRequiresMathPremium premiumLicenses ->
+       premiumLicensesHasMath premiumLicenses
+     PastRequiresElaPremium premiumLicenses ->
+       premiumLicensesHasEla premiumLicenses
      PastDoesNotRequirePremium -> True

Cases that do require licenses take a minor hit to calling syntax:

- timeSpan <- requireTimeSpan MaxSpanAllTime PastRequiresElaPremium premiumLicenses
+ timeSpan <- requireTimeSpan MaxSpanAllTime $ PastRequiresElaPremium premiumLicenses

Cases that don’t get simpler:

- getFetchAssignmentsAndSessionsArgs premiumLicenses = do
+ getFetchAssignmentsAndSessionsArgs = do
    mCompletedAt <- sendValidateT $ optionalUTCTimeFilter
      MaxSpanAllTime
      PastDoesNotRequirePremium
-     premiumLicenses
      "completed-at"
    mAssignmentIds <- optionalGetParam "assignmentIds"

And eventually we are led directly to our original hot path:

- now <- liftIO getCurrentTime
- premiumLicenses <- runDB $ fetchPremiumLicensesForSchool now schoolId
- args <- getFetchAssignmentsAndSessionsArgs premiumLicenses
+ args <- getFetchAssignmentsAndSessionsArgs

This graph is DB load the day the fix went out (~10AM) with a dotted-line representing the day before:

Prologue

Apparently we hit this issue at some point in the past already. That time, we chose to solve it in a more explicit way:

unless (requiresPremium == PastDoesNotRequirePremium) $ do
  teacherId <- requireTeacherId
  premiumLicenses <- runDB $ fetchPremiumLicenses now teacherId
  sendValidation $ validateTimeSpan
    maxTimeSpan
    requiresPremium
    now
    timeSpan
    premiumLicenses

This is exactly the sort of thing folks mean when they talk about conditionals as a smell and the benefits of moving business logic into the types. This original fix, in addition to being localized and unable to prevent the same regression elsewhere, also introduced another bug: we were nope-ing out of all validation of the timespan parameters in order to avoid fetching licenses unnecessarily for the premium portion of it.

Now that the types are better, the obvious code is also the correct code:

sendValidation $ validateTimeSpan maxTimeSpan requiresPremium now timeSpan