Automatic redis, part one: inserts and cache key extraction
This post is part of a sequence I am calling automatic redis, which is my attempt to solve the cache invalidation problem.
These are some initial thoughts on how to automate cache updates. The question I want to answer is this: given a mapping from redis keys to the queries that produce their values, how can I infer which redis commands should be run when I add, remove, and update items in the collections which are my source of truth?
The code in this post is pseudo-Haskell. What appears to the left of an = sign is not always a function, and the . is used for record field lookup as well as function composition.
I’ll start with a simple example. Suppose I run a task manager website, and I want to display the number of users who have signed up for an account, i.e. I want to display count users. I don’t want to count the entire collection every time I add an item to it, so instead I keep the count in redis and increment it whenever a new account is created. Proving that INCR is the right command to send to redis is straightforward:
numUsers = count users
numUsers_new = count (users ++ [user])
numUsers_new = count users + count [user]
numUsers_new = numUsers + 1
-- INCR numUsers
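As an aside, the derived command still has to be sent somewhere. Here is a minimal sketch of the hook, assuming the hedis client library (any client would do; nothing below depends on it):
{-# LANGUAGE OverloadedStrings #-}
import Database.Redis

-- Hypothetical hook, invoked whenever a new account is created.
-- The key name "numUsers" mirrors the cache entry in the derivation above.
onUserCreated :: Connection -> IO ()
onUserCreated conn = runRedis conn $ do
  _ <- incr "numUsers"   -- the INCR we just proved correct
  return ()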
Notice that when count distributes, it changes the combining operation from union (++) to addition (+).
Here is a similar example, this time storing the ids instead of a count.
userIds = map userId users
userIds_new = map userId (users ++ [user])
userIds_new = map userId users ++ map userId [user]
userIds_new = userIds ++ map userId [user]
userIds_new = userIds ++ [user.userId]
-- SADD userIds 65495
Obviously the appropriate redis command to use in this case is SADD.
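The corresponding hook has the same shape, assuming hedis again, with SADD in place of INCR and the new id pulled from the object being inserted:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS
import Database.Redis

-- Hypothetical hook for the userIds set; the id comes from the new user object.
addToUserIdsSet :: Connection -> Int -> IO ()
addToUserIdsSet conn newUserId = runRedis conn $ do
  _ <- sadd "userIds" [BS.pack (show newUserId)]
  return ()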
Filtering is also straightforward.
activeUserIds = map userId (filter (\x -> x.status == ACTIVE) users)
activeUserIds_new = map userId (filter (\x -> x.status == ACTIVE) (users ++ [user]))
activeUserIds_new = map userId (
  filter (\x -> x.status == ACTIVE) users ++
  filter (\x -> x.status == ACTIVE) [user])
activeUserIds_new =
  map userId (filter (\x -> x.status == ACTIVE) users) ++
  map userId (filter (\x -> x.status == ACTIVE) [user])
activeUserIds_new = activeUserIds ++ map userId (filter (\x -> x.status == ACTIVE) [user])
-- SADD activeUserIds 65495
Obviously a pipeline of SADDs will be correct, and the expression to the right of the ++ gives my automatic cache system a procedure for determining which SADD operations to perform. When the cache system gets the user object to be added, it will learn that the number of SADD operations is either zero or one, but it doesn’t have to know that ahead of time.
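Written out directly, that procedure is small. Here is a pure sketch, with hypothetical User and RedisCmd types standing in for whatever the real system uses:
data Status = ACTIVE | INACTIVE deriving (Eq, Show)
data User = User { userId :: Int, status :: Status } deriving Show
data RedisCmd = SAdd String [String] deriving Show

-- The expression to the right of the ++, specialised to one incoming user:
-- it evaluates to zero or one SADDs, and which one isn't known until the user arrives.
saddsForNewUser :: User -> [RedisCmd]
saddsForNewUser user =
  [ SAdd "activeUserIds" [show (userId u)]
  | u <- filter (\x -> status x == ACTIVE) [user] ]
For example, saddsForNewUser (User 65495 ACTIVE) gives [SAdd "activeUserIds" ["65495"]], while an inactive user gives [].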
A computer can easily verify the above three proofs, as long as they are properly annotated. But can I get the computer to create the proof in the first place?
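The verification half, at least, is easy to mechanize: the distributivity facts the proofs rely on can be stated as properties. A sketch using QuickCheck (my choice of tool, not anything essential):
import Test.QuickCheck

-- count distributes over ++, turning union into addition (count is length here)
prop_countDistributes :: [Int] -> [Int] -> Bool
prop_countDistributes xs ys = length (xs ++ ys) == length xs + length ys

-- map distributes over ++
prop_mapDistributes :: [Int] -> [Int] -> Bool
prop_mapDistributes xs ys = map negate (xs ++ ys) == map negate xs ++ map negate ys

-- filter distributes over ++
prop_filterDistributes :: [Int] -> [Int] -> Bool
prop_filterDistributes xs ys = filter even (xs ++ ys) == filter even xs ++ filter even ys

main :: IO ()
main = mapM_ quickCheck [prop_countDistributes, prop_mapDistributes, prop_filterDistributes]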
Rewriting the activeUserIds example to use function composition suggests one approach.
activeUserIds = (map userId . filter (\x -> x.status == ACTIVE)) users
activeUserIds_new = activeUserIds ++ (map userId . filter (\x -> x.status == ACTIVE)) [user]
In general, it seems that queries of the form
values = (f . g . h {- ... -}) entities
become
values_new = values `mappend` (f . g . h {- ... -}) [entity]
provided f, g, h, etc. all distribute over mappend. The actual value of mappend will determine which redis operation to perform. Integer addition becomes INCR, set union becomes SADD, sorted set union becomes ZADD, list concatenation becomes LPUSH or RPUSH, etc. An important monoid which may not be obvious is the Last monoid (mappend x y = y), which becomes SET.
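To make that correspondence concrete, here is a rough sketch of a representation the system might use; the types are mine and purely illustrative:
-- Redis write commands corresponding to the monoids listed above (hypothetical names).
data RedisCmd
  = Incr  String                     -- integer addition (+)
  | SAdd  String [String]            -- set union
  | ZAdd  String [(Double, String)]  -- sorted set union
  | RPush String [String]            -- list concatenation (LPUSH also works)
  | Set   String String              -- the Last monoid (mappend x y = y)
  deriving Show

-- The monoidal delta produced by evaluating (f . g . h ...) [entity].
data Delta
  = CountDelta Int
  | SetDelta   [String]
  | ZSetDelta  [(Double, String)]
  | ListDelta  [String]
  | LastDelta  (Maybe String)

-- Translate a delta for a given cache key into zero or more commands.
commandsFor :: String -> Delta -> [RedisCmd]
commandsFor key (CountDelta n)       = replicate n (Incr key)
commandsFor key (SetDelta vs)        = [SAdd key vs  | not (null vs)]
commandsFor key (ZSetDelta vs)       = [ZAdd key vs  | not (null vs)]
commandsFor key (ListDelta vs)       = [RPush key vs | not (null vs)]
commandsFor key (LastDelta (Just v)) = [Set key v]
commandsFor _   (LastDelta Nothing)  = []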
So much for updates on constant cache keys. Parameterized cache keys are much more interesting.
On my task manager website, I want to have one cache entry per user. The user’s id will determine the cache key that I use.
taskIds_'userId' = (map taskId . filter (\t -> t.owner == userId)) tasks
It’s tempting to think of this definition as a function:
taskIds :: UserId -> [TaskId]
But an automatic caching system will not benefit from this perspective. From its perspective, the input is a task object, and the output is any number of redis commands. The system has to implicitly discover the userId from the task object it receives. The userId parameter of taskIds_'userId' is therefore more like a logic variable (e.g. from prolog) than a variable in imperative or functional languages.
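Here is the shape of the two readings, as a sketch with hypothetical types; the body of the cache system's version is exactly what the rest of this post tries to derive:
type UserId = Int
type TaskId = Int
data Task = Task { taskId :: TaskId, owner :: UserId }
data RedisCmd = SAdd String [String] deriving Show

-- the tempting reading: a function of the key parameter
taskIdsFor :: UserId -> [Task] -> [TaskId]
taskIdsFor uid tasks = map taskId (filter (\t -> owner t == uid) tasks)

-- the cache system's reading: the input is the inserted task, the output is
-- some number of redis commands, and userId has to be recovered from the task
onTaskInsert :: Task -> [RedisCmd]
onTaskInsert = undefined  -- derived below by narrowing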
The monoidal shortcut rule is still valid for parameterized redis keys.
taskIds_'userId'_new = taskIds_'userId' ++
  (map taskId . filter (\t -> t.owner == userId)) [task]
The caching system does not need to reduce this expression further until it receives the task object. When it does, it can evaluate the addend as an expression in a functional-logic language (similar to Curry).
taskIds_'userId'_new = taskIds_'userId' ++
  (map taskId . filter (\t -> t.owner == userId)) [task]
taskIds_'userId'_new = taskIds_'userId' ++
  (map taskId (filter (\t -> t.owner == userId) [task]))
taskIds_'userId'_new = taskIds_'userId' ++
  (map taskId (if (\t -> t.owner == userId) task then
    task : filter (\t -> t.owner == userId) [] else
    filter (\t -> t.owner == userId) []))
taskIds_'userId'_new = taskIds_'userId' ++
  (map taskId (if task.owner == userId then
    task : filter (\t -> t.owner == userId) [] else
    filter (\t -> t.owner == userId) []))
Unfortunately at this point the goal becomes suspended. The cache system can cheat a little by unifying task.owner == userId with True and False. In the true case, userId unifies with task.owner, which I’ll say is 65495:
taskIds_65495_new = taskIds_65495 ++ (map taskId $
  if 65495 == 65495 then
    task : filter (\t -> t.owner == 65495) [] else
    filter (\t -> t.owner == 65495) [])
taskIds_65495_new = taskIds_65495 ++ (map taskId $
  if True then
    task : filter (\t -> t.owner == 65495) [] else
    filter (\t -> t.owner == 65495) [])
taskIds_65495_new = taskIds_65495 ++ (map taskId $
  task : filter (\t -> t.owner == 65495) [])
taskIds_65495_new = taskIds_65495 ++ (map taskId [task])
taskIds_65495_new = taskIds_65495 ++ [task.taskId]
-- SADD taskIds_65495 ${task.taskId}
In the false case, userId remains unbound, but that’s ok, because the expression reduces to a no-op:
taskIds_'userId'_new = taskIds_'userId' ++
  (map taskId (if False then
    task : filter (\t -> t.owner == userId) [] else
    filter (\t -> t.owner == userId) []))
taskIds_'userId'_new = taskIds_'userId' ++
  (map taskId (filter (\t -> t.owner == userId) []))
taskIds_'userId'_new = taskIds_'userId' ++
  (map taskId [])
taskIds_'userId'_new = taskIds_'userId' ++ []
-- nothing to do
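Putting the two branches together, the procedure the system ends up with for a task insert can be written out directly; a pure sketch with the same hypothetical types as before:
type UserId = Int
type TaskId = Int
data Task = Task { taskId :: TaskId, owner :: UserId }
data RedisCmd = SAdd String [String] deriving Show

-- One branch per candidate value of the suspended condition t.owner == userId:
-- True binds userId to task.owner and yields one SADD, False reduces to a no-op.
onTaskInsert :: Task -> [RedisCmd]
onTaskInsert task =
  [ SAdd ("taskIds_" ++ show (owner task)) [show (taskId task)] ]
For a task owned by user 65495 this produces exactly the SADD taskIds_65495 command derived above.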
In general, whenever the cache system’s goals become suspended, it can resume narrowing/residuation by picking a subexpression with low multiplicity (e.g. booleans, enums) and nondeterministically unifying it with all possible values.
Most of the time, each unification will result in either a no-op, or a redis command with all parameters bound. An exception (are there others?) is queries which affect an infinite number of redis keys, e.g. caching all tasks that do NOT belong to a user.
taskIds_'userId' = (map taskId . filter (\t -> t.owner != userId)) tasks
This is clearly a bug, so the caching system can just log an error and perform no cache updates. It may even be possible for the caching system to catch the bug at compile time by letting the inserted entity (e.g. a task) be an unbound variable, and seeing if a non-degenerate redis command with unbound redis key parameters can be produced.
This post has focused mostly on inserts and queries that fit the monoidal pattern. In another post I’ll take a look at deletes and queries which are not so straightforward.