A data representation language
I have an idea for a language, and I want to know if it already exists.
The language is a data representation language. It encodes rules about how data is represented in a store
(e.g. MySQL, HBase, Riak, Neo4J, MongoDB, Redis, flat files).
The language would have four directives: entity, predicate, operation, and realize.
The entity directive gives the “platonic” description of a type.
entity User
    Id id
    String userName
    String displayName
entity Task
    Id id
    String name
    Date dueDate
entity UserTask // encodes many-to-many relationship
    Id id
    User user
    Task task
entity Comment
    Id id
    Task parentTask
    String comment
    Date when
The predicate directive tells how these types are represented. For example, to represent the user object in a relational database:
import RDBMS
predicate forall u: User exists r: RelationalRow where
    r.table = 'User' and
    r['id'] = u.id and
    r['userName'] = u.userName and
    r['displayName'] = u.displayName
or in a key value store:
import KeyValue
import Pickle
predicate forall u: User exists p: KeyValuePair where
    p.key = "User." + u.id.toString() and
    p.value : Pickled<Map<String,String>> and
    p.value['userName'] = u.userName and
    p.value['displayName'] = u.displayName
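To make the two predicates above concrete, here is a minimal Python sketch of the kind of code a compiler might derive from them: one function that writes a User as a relational row and one that writes it as a pickled key-value pair. The names user_to_row and user_to_kv are hypothetical, a plain dict stands in for RelationalRow, and a tuple stands in for KeyValuePair. None of this is an existing library.

```python
import pickle
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class User:
    """The platonic User entity (field names adapted to Python style)."""
    id: int
    user_name: str
    display_name: str


def user_to_row(u: User) -> Dict[str, object]:
    """Relational predicate: one row in table 'User' with matching columns."""
    return {
        "table": "User",
        "id": u.id,
        "userName": u.user_name,
        "displayName": u.display_name,
    }


def user_to_kv(u: User) -> Tuple[str, bytes]:
    """Key-value predicate: key 'User.<id>', value a pickled map of strings."""
    key = "User." + str(u.id)
    value = pickle.dumps({"userName": u.user_name, "displayName": u.display_name})
    return key, value


# Hypothetical sample data, not from any real store.
alice = User(id=7, user_name="alice", display_name="Alice A.")
row = user_to_row(alice)
key, value = user_to_kv(alice)
```

Both functions are derived from the same entity, which is the point: the entity stays fixed and only the predicate changes per store.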
You can also use predicates to specify how types are embodied in classes.
predicate forall u: User exists o: Object where
    o.class = 'org.example.myapp.objects.User' and
    o['id'] = u.id and
    o['displayName'] = u.displayName
Notice that I left out userName; classes do not have to be perfectly aligned with the platonic entities. You can even combine different entities into a single class. For example, imagine a Java class like this:
public class Task {
    public long id;
    public String name;
    public Date dueDate;
    public List<Comment> comments;
}
Even though tasks and comments are separate entities, you can still map between them and the task class:
predicate forall t: Task exists o: Object where
    o.class = 'org.example.myapp.objects.Task' and
    o['id'] = t.id and
    o['name'] = t.name and
    o['dueDate'] = t.dueDate and
    o['comments'] : List and
    o['comments'] `sortedBy` (c => c['when']) and
    forall c: Comment (c.parentTask = t =>
        exists co: Object where
            co `in` o['comments'] and
            co.class = 'org.example.myapp.objects.Comment' and
            co['id'] = c.id and
            co['when'] = c.when and
            co['comment'] = c.comment) and
    forall co: Object (co `in` o['comments'] =>
        exists c: Comment where
            c.parentTask = t and
            c.id = co['id'])
It’s a little crazy, but it could be made simpler with a library function and/or syntactic sugar saying “this embodied list matches this list of entities”. I just wanted to give you some idea of how flexible I want this language to be.
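As a rough illustration of what such a library function might check, here is a Python sketch that reads the comment-related part of the predicate literally: each forall becomes all(...) and each exists becomes any(...) over finite collections. The function name comments_match and the CommentObject class are hypothetical, and I am assuming the full set of Comment entities fits in memory, which a real implementation could not assume.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Comment:
    """Platonic Comment entity; parent_task holds the id of the parent Task."""
    id: int
    parent_task: int
    comment: str
    when: date


@dataclass(frozen=True)
class CommentObject:
    """Embodied comment, as it would sit inside the task object's list."""
    id: int
    comment: str
    when: date


def comments_match(embodied: List[CommentObject],
                   all_comments: List[Comment],
                   task_id: int) -> bool:
    # o['comments'] `sortedBy` (c => c['when'])
    sorted_ok = all(a.when <= b.when for a, b in zip(embodied, embodied[1:]))
    # forall c: Comment (c.parentTask = t => exists co in o['comments'] ...)
    every_entity_embodied = all(
        any(co.id == c.id and co.when == c.when and co.comment == c.comment
            for co in embodied)
        for c in all_comments if c.parent_task == task_id
    )
    # forall co in o['comments'] (exists c: Comment with c.parentTask = t ...)
    every_object_backed = all(
        any(c.parent_task == task_id and c.id == co.id for c in all_comments)
        for co in embodied
    )
    return sorted_ok and every_entity_embodied and every_object_backed


# Hypothetical sample data: two comments on task 1, one on task 2.
entities = [
    Comment(1, 1, "first", date(2020, 1, 1)),
    Comment(2, 1, "second", date(2020, 1, 2)),
    Comment(3, 2, "other", date(2020, 1, 3)),
]
o1, o2, o3 = (CommentObject(c.id, c.comment, c.when) for c in entities)
good = comments_match([o1, o2], entities, task_id=1)
```

This only verifies that an existing object satisfies the predicate. Generating the code that builds such an object is the harder problem.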
The operation directive gives names to operations that might be performed on the entities.
operation createUser(userName: in String, displayName: in String) where
    exists u: User where
        u.userName = userName and
        u.displayName = displayName and
        u.id `notIn` Before.User and
        After.User = Before.User + u
operation getTasksForUser(userId: in Id, tasks: out Set<Task>) where
    exists u: User where u.id = userId => (
        forall t: Task (exists ut: UserTask where
            ut.user = u and
            ut.task = t =>
            ut.task `in` tasks) and
        forall t: Task (t `in` tasks =>
            exists ut: UserTask where ut.user = u and ut.task = t))
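One way to read an operation is as a condition relating the state before and the state after. The Python sketch below checks the createUser condition against two snapshots of the User set. The function name create_user_holds is hypothetical. I am also assuming that "exists u: User" only needs to range over the After set, since the last clause forces u to be in it, and I read the notIn clause as "no existing user has this id".

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    id: int
    user_name: str
    display_name: str


def create_user_holds(before: FrozenSet[User], after: FrozenSet[User],
                      user_name: str, display_name: str) -> bool:
    """True if some u satisfies every clause of the createUser operation."""
    return any(
        u.user_name == user_name
        and u.display_name == display_name
        and all(u.id != old.id for old in before)   # u.id `notIn` Before.User
        and after == before | {u}                   # After.User = Before.User + u
        for u in after
    )


# Hypothetical snapshots of the User set.
before = frozenset({User(1, "alice", "Alice")})
after_ok = before | {User(2, "bob", "Bob")}
after_reused_id = before | {User(1, "bob", "Bob")}
ok = create_user_holds(before, after_ok, "bob", "Bob")
```

A compiler would have to go the other way, from the condition to code that makes it true, but a checker like this could at least serve as a test oracle for generated code.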
The realize directive indicates how operations will be realized using concrete classes.
realize createUser as
    void createUser(userName: java.lang.String, displayName: java.lang.String)
realize getTasksForUser as
    java.util.Set<org.example.myapp.objects.Task> getTasksForUser(u: org.example.myapp.objects.User)
Compiling would generate a code block for each realize directive. It would fail if any of the operations were impossible (e.g. getTasksForUser would be impossible for a key-value store if you had stored only Task => [User] pairs and forgotten the User => [Task] pairs). It would generate a warning if any of the operations were slow (e.g. getCommentsForTask on an ordered key-value store when the comments were indexed by commentId and not by $taskId:$commentId).
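To show why the key layout matters for the "slow" warning, here is a small Python sketch. A sorted list of strings stands in for an ordered key-value store, and the function names comments_by_prefix and comments_by_full_scan are hypothetical. With $taskId:$commentId keys, the lookup can jump to the right spot and stop early. With bare commentId keys, every entry has to be inspected.

```python
import bisect
from typing import Dict, List


def comments_by_prefix(sorted_keys: List[str], task_id: int) -> List[str]:
    """Range scan over keys shaped like '<taskId>:<commentId>'."""
    prefix = f"{task_id}:"
    start = bisect.bisect_left(sorted_keys, prefix)  # binary search for the first candidate
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break                                    # past the last matching key
        result.append(key)
    return result


def comments_by_full_scan(store: Dict[int, dict], task_id: int) -> List[int]:
    """Keys are bare commentIds, so every value must be examined."""
    return sorted(cid for cid, c in store.items() if c["taskId"] == task_id)


# Hypothetical data: the same four comments under both key layouts.
compound_keys = sorted(["1:10", "1:2", "12:1", "2:5"])
by_comment_id = {10: {"taskId": 1}, 2: {"taskId": 1}, 1: {"taskId": 12}, 5: {"taskId": 2}}

fast = comments_by_prefix(compound_keys, 1)
slow = comments_by_full_scan(by_comment_id, 1)
```

Both calls find the same two comments. The difference the compiler would warn about is how much of the store each one has to touch.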
So, does a language like this already exist? I know several things come close, ORMs being the obvious example. But most ORMs require you to build schemas according to THEIR rules, not your own, and the exceptional ones require you to write custom code whenever the representation is anything non-standard, usually four times over: once each for the get, set, update, and delete cases.
I want something that can handle
- multiple stores simultaneously, e.g. memcache and MySQL.
- denormalized data, e.g. if I have a User-to-Task and a Task-to-User representation of UserTask in a sharded database, the code generated for createUserTask should do two inserts.
- other really crazy representations, such as
  - In a column family database, storing the first comment of a task in the comment1 column of the row for that task, the second comment in the comment2 column, etc.
  - Putting a sentinel value in a redis list so I can tell the difference between an empty list and unknown.
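Two of the representations in the list above are easy to sketch by hand, which gives an idea of the code I would want generated. In this Python sketch plain dicts stand in for the sharded database and for Redis, and every name (create_user_task, save_ids, load_ids, SENTINEL) is hypothetical. The sentinel trick is needed because an empty list and a missing key look the same otherwise.

```python
from typing import Dict, List, Optional, Set

SENTINEL = "\x00sentinel"  # hypothetical marker, assumed never to be a real id


def create_user_task(user_to_tasks: Dict[int, Set[int]],
                     task_to_users: Dict[int, Set[int]],
                     user_id: int, task_id: int) -> None:
    """Denormalized UserTask: one logical insert becomes two physical writes."""
    user_to_tasks.setdefault(user_id, set()).add(task_id)
    task_to_users.setdefault(task_id, set()).add(user_id)


def save_ids(store: Dict[str, List[str]], key: str, ids: List[str]) -> None:
    """Always write the sentinel first so a known-empty list still leaves a key behind."""
    store[key] = [SENTINEL, *ids]


def load_ids(store: Dict[str, List[str]], key: str) -> Optional[List[str]]:
    """None means unknown (nothing cached); [] means known to be empty."""
    raw = store.get(key)
    if not raw:
        return None
    return raw[1:]


# Hypothetical usage with in-memory stand-ins.
user_to_tasks: Dict[int, Set[int]] = {}
task_to_users: Dict[int, Set[int]] = {}
create_user_task(user_to_tasks, task_to_users, user_id=1, task_id=42)

cache: Dict[str, List[str]] = {}
save_ids(cache, "task:42:comments", [])
known_empty = load_ids(cache, "task:42:comments")
unknown = load_ids(cache, "task:99:comments")
```

Writing this by hand for every entity and every store is exactly the repetition I would like the language to remove.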
Since it seems really useful, I would love to write this language, but honestly, I don’t even know where to begin.
Conceptually, how do you translate quantified logic into imperative code? What would abstraction look like in this language? (e.g. Can I make a listEqualsList function?) Outside of the entity/predicate/operation/realize directives, what primitives would I need to provide so that other people can write modules for their favorite pet database?