Knock helps teams deliver great notification experiences to end users. One of our most popular features is batching: Instead of sending many notifications to a user in a short time frame, Knock can batch multiple notifications into a single message, delivered to the user at the right time.
Sometimes a batch accumulates far more notifications than a single message can handle. If a tweet goes viral and gets thousands of likes, it wouldn’t make sense for a single notification to list the name of every user that liked your tweet.
When we batch notifications at Knock, we produce a number of variables you can use to give your users helpful information about the batch, without listing every item contained within it.
These variables include the count of items in the batch and the count of unique users in the batch. We also provide an array of the first ten items in the batch, which you can use to include details like a user's name or information about the subject of the notification. This is similar to how Twitter delivers its own notifications.
Recently, a customer asked us if, instead of the first ten activities, we could provide the last ten (that is, the most recent activities). Naturally, we took up the challenge. How hard could it be?
It turned out that it was hard to design and then easy to implement, all thanks to the power of Postgres.
tl;dr: Take a good look at this chunk of SQL and then tell us: why would shoving two CASE statements with mixed ordering and fixed dates into an ORDER BY ever be a good idea? What if we told you that this is part of a WINDOW function?
ORDER BY
  CASE
    WHEN sort_order = 'asc' THEN inserted_at
    ELSE '1900-01-01'
  END ASC,
  CASE
    WHEN sort_order = 'desc' THEN inserted_at
    ELSE '1900-01-01'
  END DESC
Intrigued? Good - let's talk about how this bit of SQL enabled a cool new feature at Knock.
A short aside on window functions
SQL window functions are the beating heart of our batch functions. A window function makes it possible to fetch rows from the database and perform operations on them in groups. When resolving a group of batches, we partition over each batch and, within each partition, take the first ten activities. Instead of doing all this work in our backend code, SQL will load, process, and return each batch in just the right way.
Instead of performing an N+1 query for each batch returned and then processing the results in our backend to build our batches, we can send just one query, and the batches come back loaded with at most ten items.
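As a quick illustration, here is a minimal sketch of that idea (using simplified table and column names of our own, not our exact production schema): number each activity within its batch, then keep only the first ten.

SELECT ranked.activity_id, ranked.batch_id
FROM (
  SELECT a.id AS activity_id, a.batch_id,
         -- Number activities within each batch, oldest first
         row_number() OVER (PARTITION BY a.batch_id ORDER BY a.inserted_at ASC) AS rn
  FROM activity a
) ranked
WHERE ranked.rn <= 10;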
When each batch is ordered differently
The hard part is that we need to load multiple batches in one query, where some batches would want the first ten items (aka ORDER BY ASC) and some batches would want the last ten items (aka ORDER BY DESC). Our customers set this using our dashboard editor, which saves the setting on the batch step itself.
So the question is: How do we selectively order each set of items in a batch - some ascending, some descending - using a single query?
Other approaches we tried
Our initial research concluded without finding a good way to solve this problem using window functions. Instead, we thought about shifting from a “resolve on read” approach to a “resolve on write” approach:
- Resolve on read: As activities arrive, store them quickly. When it’s time to load the batch, figure out what is in the batch (more work done in one shot).
- Resolve on write: As activities arrive, build the batch (spread the work out over each activity). When it’s time to load the batch, no further work is required.
Using a “resolve on write” approach, we looked at building a list of each activity that belonged in the finished batch. If a batch needed just the first ten items, we would add items to the batch until we hit ten, and any remaining items would not be delivered when the notification is built. If a batch needed the last ten items, we would add items to the end of the batch and trim the start of the batch to ensure only ten items are delivered when the notification is built.
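As a rough sketch of what the "last ten" trimming step might have looked like on the write path (hypothetical SQL against the simplified tables used later in this post; $1 stands for the batch in question):

-- Keep only the ten newest activities for a batch; delete the rest
DELETE FROM activity
WHERE batch_id = $1
  AND id NOT IN (
    SELECT id
    FROM activity
    WHERE batch_id = $1
    ORDER BY inserted_at DESC -- newest first
    LIMIT 10
  );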
Although we started down this road, we revisited the window query to see if there was any way to make it work…and there was!
ORDER BY 🤝 CASE statements
Postgres ORDER BY clauses have a few interesting properties:
- Typically, an ORDER BY clause has a list of columns by which to sort, and optionally the direction to sort (ASC by default, or DESC): ORDER BY inserted_at ASC
- Sometimes an ORDER BY clause will have more than one column: ORDER BY updated_at DESC, inserted_at ASC
- You can also use CASE statements in an ORDER BY clause: ORDER BY CASE WHEN sort_by = 'inserted_at' THEN inserted_at ELSE updated_at END DESC. This makes it possible to selectively choose which column to sort by! It gets us part of the way to our desired outcome, but we still need a way to choose whether we are sorting ASC or DESC.
- A quirk of ORDER BY is that you can order by columns or by a fixed value in the query. Take this example: ORDER BY '1900-01-01' DESC, inserted_at ASC. Here there are two sort keys. The first is effectively ignored because every row has the same value, so sorting by it does nothing; instead, all rows are sorted by inserted_at ASC. Although this would be a waste in most code, it enables us to selectively sort ASC or DESC per batch in a single SQL query.
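You can see these pieces working together in a self-contained query (a toy example of ours, not from our production code):

WITH items (grp, sort_order, n) AS (
  VALUES
    ('x', 'asc', 1), ('x', 'asc', 2), ('x', 'asc', 3),
    ('y', 'desc', 1), ('y', 'desc', 2), ('y', 'desc', 3)
)
SELECT grp, n
FROM items
ORDER BY
  grp,
  CASE WHEN sort_order = 'asc' THEN n ELSE 0 END ASC,
  CASE WHEN sort_order = 'desc' THEN n ELSE 0 END DESC;
-- Group 'x' comes back as 1, 2, 3; group 'y' comes back as 3, 2, 1.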
Using these building blocks, we can actually solve our selective-sort batching problem with just a few lines of SQL!
Bringing it all together
If you’ve been following along this far, congratulations: you spend too much time writing SQL. Even so, we hope the following snippet has been worth the journey:
ORDER BY
  CASE
    WHEN sort_order = 'asc' THEN inserted_at
    ELSE '1900-01-01'
  END ASC,
  CASE
    WHEN sort_order = 'desc' THEN inserted_at
    ELSE '1900-01-01'
  END DESC
This ORDER BY clause will do the following:
- For each row, if the sort_order is asc, it will use the inserted_at column when sorting. If sort_order is not asc, it will use the fixed value 1900-01-01, which effectively does nothing.
- Then, if the sort_order is desc, it will use the inserted_at column when sorting. If sort_order is not desc, it will use the fixed value 1900-01-01, which effectively does nothing.
Here is a simple table structure and corresponding SQL query that shows it all in action:
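The original table definitions aren't reproduced here, but a minimal sketch consistent with the query below might look like this (column types and constraints are our assumptions):

CREATE TABLE batch (
  id         bigint PRIMARY KEY,
  sort_order text NOT NULL DEFAULT 'asc' -- 'asc' or 'desc'
);

CREATE TABLE activity (
  id          bigint PRIMARY KEY,
  batch_id    bigint NOT NULL REFERENCES batch (id),
  inserted_at timestamp NOT NULL
);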
SELECT t.activity_id
FROM (
  SELECT a.id AS activity_id, a.batch_id, a.inserted_at,
         row_number() OVER activities_partition AS row_number -- See window definition below
  FROM activity a
  INNER JOIN batch b ON b.id = a.batch_id
  WINDOW activities_partition AS (
    PARTITION BY b.id
    ORDER BY
      -- This is the good part
      CASE
        WHEN b.sort_order = 'asc' THEN a.inserted_at
        ELSE '1900-01-01'
      END ASC,
      CASE
        WHEN b.sort_order = 'desc' THEN a.inserted_at
        ELSE '1900-01-01'
      END DESC
  )
) t
WHERE t.row_number <= 10 -- For 'asc' batches, these are the first 10 items; for 'desc' batches, the last 10
ORDER BY t.batch_id, t.inserted_at ASC -- Restores the natural sort order within each batch
Conclusion
In the end, the changes to our code involved adding a join (to get the sort_order setting) and changing our WINDOW ORDER BY from just inserted_at to a couple of weird but very readable CASE statements.
Performance-wise, our query is a little more expensive than it used to be, but still within tolerances for our biggest, most complex batches (and there may be further improvements to our table and index structure to speed things up even more). A little more query complexity in exchange for more maintainable, understandable code is a tradeoff well worth making in this case.
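For instance (our speculation, not a change described above), a composite index covering the join, partition, and final sort columns would be a natural starting point for that tuning:

CREATE INDEX activity_batch_id_inserted_at_idx
  ON activity (batch_id, inserted_at);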