Automatic redis, part two: sorting and data structures
This post is part of a sequence I am calling automatic redis, which is my attempt to solve the cache invalidation problem.
In my previous post, I demonstrated that a
library could infer cache update operations from database insert operations by performing
algebraic manipulations on the queries that define the cache keys. The algebraic
laws needed were the distribution laws between monoids. e.g. count
distributes
over the Set
monoid to produce the Sum
monoid. A library could also
infer the arguments of the cache keys (e.g. taskIds.{userId} -> taskIds.65495
) by
performing functional logical evaluation on the cache key’s query. If the library’s goal
became suspended during evaluation, it could proceed by unifying expressions
of low multiplicity with all possible values. For instance, if the goal for a filter
query became suspended, the library could proceed by considering the true
and
false
cases of the filter separately.
In this post I would like to talk about sorting and limiting, as well as flesh out some of the data structures that might be used in an automatic redis library.
Set
Set
is the simplest data structure,
and forms the foundation for two of our other collection types.
type Set a = Data.Set.Set
The monoidal operation for Set
is simply set union.
List
List
is a Set
with an embedded sorting function. Tracking the sorting function
enables us to compute redis sorted set keys if necessary.
data List a b = (Ord b) => List (a -> b) (Set a)
A commonly used sorting function would be x => x.modifiedDate
.
The monoidal operation for List
is the merge operation from merge-sort, with
one restriction: the sorting functions of both lists must be the same
sorting function.
LimitedList
LimitedList
is a List
with an upper bound on its size.
data LimitedList a b = (Ord b) => LimitedList Integer (List a b)
The length of the contained List
must be less than or equal to the upper bound.
Tracking the length enables us to know how to trim cache entries, e.g.
when using the ZREMRANGEBYRANK command.
The monoidal operation for LimitedList
is to merge-sort the two lists and truncate
the result to the limit. Similarly to List
, the library expects both lists to have
the same
upper limit.
First and Last
First
and Last
are essentially LimitedList
s whose upper bound is 1
. Making
specialized types for singleton LimitedLists makes working with non-collection redis
data structures easier.
data First a b = (Ord b) => First (a -> b) (Maybe a)
data Last a b = (Ord b) => Last (a -> b) (Maybe a)
Although First
and Last
have the same representation, they have different monoidal
operations, namely (x,y) => x
and (x,y) => y
.
Maybe
The Maybe
type is useful for queries that always generate a unique result (such
as lookup by primary key), and as such the Maybe
type
does not need to contain a sorting function.
data Maybe a = Nothing | Just a
The monoidal operation is to pick Just
over Nothing
, but with the restriction
that both arguments cannot be Just
s.
instance Monoid Maybe where
Nothing `mappend` Nothing = Nothing
Nothing `mappend` (Just x) = Just x
(Just x) `mappend` Nothing = Just x
(Just x) `mappend` (Just y) = error "This should never happen."
Collision of Just
s can happen if the application developer misuses the The
operation
(defined below). Unfortunately this error cannot be caught by an automatic redis
library, because
the library never actually computes the value of mappend
. The library only
tracks monoidal types so that it can know what the final redis commands will
be.
Speaking of query operations, it’s about time I defined them. But first… one more monoid.
data Sum = Sum Integer
instance Monoid Sum where
mappend = (+)
Query operations
Query operations are parameterized over an input type and an output type.
-- QO = Query Operation
data QO input output where
-- The operations Where, Count, Sum, The, and SortBy are not concerned with the ordering
-- of their input, so they can work on Sets, Lists, LimitedLists, Firsts, Lasts,
-- and Maybes. In these constructor definitions, 'coll' can mean any of those types.
-- A real implementation might have multiples versions of these query operations,
-- e.g. WhereSet, WhereList, WhereLimitedList, ..., CountSet, CountList, etc.
Where :: Expr (a -> Boolean) -> QO (coll a) (coll a)
Count :: QO (coll a) Sum
Sum :: QO (coll Integer) Sum
-- 'The' takes a collection which is expected to have no more than one element
-- and extracts the element.
The :: QO (coll a) (Maybe a)
-- SortBy converts any kind of collection into a List.
SortBy :: (Ord b) => Expr (a -> b) -> QO (coll a) (List a)
-- Limit, First, and Last, are defined for any (seq)uence:
-- Lists, LimitedLists, Firsts, and Lasts.
Limit :: Integer -> QO (seq a) (LimitedList a)
First :: QO (seq a) (First a)
Last :: QO (seq a) (Last a)
-- Mapping only works on Set!
Select :: Expr (a -> b) -> QO (Set a) (Set b)
-- Well technically Select also works on Maybe, but we'll make a separate
-- query operation for Maybes.
Apply :: Expr (a -> b) -> QO (Maybe a) (Maybe b)
-- Lists contain their sorting function, so we cannot allow arbitrary
-- mapping on lists. We can, however, support monotonic mappings.
SelectMonotonic :: Expr (a -> b) -> QO (seq a) (seq b)
-- Mappings which scramble the order are also allowed, as long as we
-- have a way to recover the order. i.e. 'a -> c' has to be monotonic,
-- even though 'a -> b' and 'b -> c' do not.
SelectReversible :: Expr (a -> b) -> Expr (b -> c) -> QO (seq a) (seq b)
A few more data structures and we will have all the pieces necessary for an application developer to define a cache schema.
data Table t = Table String
-- A Query is a sequence of query operations that begins with a table
data Query output where
From :: Table t -> Query (Set t)
Compose :: Query input -> QO input output -> Query output
-- convenience constructor
(+>) = Compose
data CacheKeyDefinition = CacheKeyDefinition {
keyTemplate :: String, -- e.g. "taskIds.{userId}"
query :: Query -- e.g. from tasks where task.userId = userId select task.id
}
Putting it all together, we can showcase the cache schema for a simple task management website.
type TaskId = String
type UserId = String
data Task = {
taskId :: TaskId,
ownerId :: UserId,
title :: String,
completed :: Boolean,
dueDate :: Integer }
deriving (Eq, Ord, Read, Show)
taskTable = Table "tasks" :: Table Task
schema = do
-- The task objects.
-- type: String
-- expected redis commands on insert:
-- SET
"task.{taskId}" $= \tid ->
From taskTable +>
Where (\t -> taskId t == tid) +>
The +>
Apply show
-- For each user, the ids of her most urgent tasks.
-- type: Sorted Set, where the keys are the dueDate and the values are the taskIds.
-- expected redis commands on insert:
-- ZADD
-- ZREMRANGEBYRANK
"activeTaskIds.{userId}" $= \uid ->
From taskTable +>
Where (\t -> ownerId t == uid && not (completed t)) +>
SortBy dueDate +>
Limit 100 +>
SelectReversible (\t -> (dueDate t, taskId t)) fst
-- The number of tasks a user has successfully completed.
-- type: integer
-- expected redis commands on insert:
-- INCR
"numCompleted.{userId}" $= \uid ->
From taskTable +>
Where (\t -> ownerId t == uid && completed t) +>
Count
It’s important to keep in mind that although I have made the above code look
like haskell, no library in haskell could actually use the above code. The variables
occuring after the $=
sign are logic variables, not function parameters. An
EDSL could get close to something like the above, but the normal types for
==
and &&
are unusable, and the lambdas inside the Where
clauses
would need to be reified anyway.
Still to come: deletes, updates, uniqueness constraints (maybe?), and psuedo-code for the generation of redis commands.