An example of why software composition is hard

Combining two concepts into a program is rarely as simple as import A; import B;. Almost always, when you combine two libraries, additional code must be written to handle their interaction. The following example comes from Wolfgang Scholz. I learned about it from the paper An Overview of Feature-Oriented Software Development, by Apel and Kästner.

Suppose one library provides us with the implementation of a singly linked list:

class ForwardList {
    ForwardNode first { get; set; }
    public virtual void push(ForwardNode n) {
        n.next = first; first = n;
    }
}
class ForwardNode {
    ForwardNode next { get; set; }
}

and another library provides us with the implementation of a reversely linked list:

class ReverseList {
    ReverseNode last { get; set; }
    public virtual void shup(ReverseNode n) {
        n.prev = last; last = n;
    }
}
class ReverseNode {
    ReverseNode prev { get; set; }
}

We’d like to combine them to make a doubly-linked list. If our language had multiple inheritance, a naive combination would look like this:

class List : ForwardList, ReverseList {}

This is problematic in at least three ways.

Our merged List class does not contain a single list with two directions of traversal. It contains two lists:

class List : ForwardList, ReverseList {
    override ForwardNode first { get; set; }
    override ReverseNode last { get; set; }
}

Obviously we want to combine ForwardNode and ReverseNode into a single class Node.

class Node : ForwardNode, ReverseNode {}

Ambiguity remains over when and how occurences of ForwardNode and ReverseNode should be replaced. In the case of a doubly-linked list, replacing all occurences gives us the state we want:

class List : ForwardList, ReverseList {
    Node first { get; set; }
    Node last { get; set; }
}

But, had we been combining circular lists, this would not have been enough.

class CircularList : ForwardCircularList, ReverseCircularList {
    Node first { get; set; }
    Node last { get; set; } // wrong!
}

We would have wanted the fields first and last to be combined into a single field.

Conceivably, all of this ambiguity might be resolvable with a set of annotations or language keywords that the programmer provides to specify exactly the combination for which she is looking. But state is only one part of the problem…

Problem 2: Type incompatibility

Replacing all occurences of ForwardNode and ReverseNode with Node violates the typing rules around inheritance. Our method signatures have become:

class List : ForwardList, ReverseList {
    override virtual void push(Node n) { /* ... */ }
    override virtual void shup(Node n) { /* ... */ }
}

The Nodes in these signatures are in contravariant position, and so these signatures are valid only when Node is a superclass of ForwardNode and ReverseNode, not when it is a subclass! That is, any class that previously operated on a ForwardList would make calls to push(ForwardNode node), and these calls would become invalid in cases where the receiver is a List, because List does not accept ForwardNodes in its push(...) method, unless that ForwardNode is actually a Node!

How might we resolve this problem? Just as in the previous problem, we could attempt replacing all occurences of ForwardList with List. That is, for every class X that uses a ForwardList, the language could generate a class XPrime that uses Lists and Nodes instead of ForwardLists and ForwardNodes. This would work in some, but not all cases. If class X constructed instances of ForwardNode directly, for instance, how should it construct a Node directly? Where would we get a value prev to pass into the constructor of Node?

Problem 3: Emergent logic

Correct implementations of push and shup require additional logic that did not occur in either ForwardList or ReverseList:

class List : ForwardList, ReverseList {
    override virtual void push(Node n) {
        if (first == null) {
            last = n;
        } else {
            first.prev = n;
        }
        super.push(n);
    }
    override virtual void shup(Node n) {
        if (last == null) {
            first = n;
        } else {
            last.next = n;
        }
        super.shup(n);
    }
}

push(...) now incorporates some of the logic from super.shup(...), namely the assignment last = n. Simiarly, shup(...) incorporates some of the logic from super.push(...). How would a programming language know that such mixing needed to occur? Moreover, this borrowed code is conditioned on a null check. Neither the null check, nor the code in the other path of the null check existed in the implementations of ForwardList and ReverseList!

Of our three problems, this one is the most vexing. Programming languages today have a multitude of ways to compose code. We have gone well beyond function application and created inheritance, multiple inheritance, generic functions, mixins, traits, classical prototypes, lieberman-style prototypes, aspects, type classes, open extensible object models, module expressions, and the list goes on and on. Yet none of these composition models avoid the need to write additional logic when you combine a ForwardList and a ReverseList.

Still, it must be possible, because humans do it all the time. A promising approach is to add constraints to our language, a la abstract data types. In such a paradigm, we could omit the implementation code, and instead provide constraints on our methods’ behaviors:

class ForwardList {
    inferred void push(ForwardNode node);
    inferred ForwardNode get(int index);
    // ... extra methods ...
    constraint {
        free vars len : int, list : ForwardList;
        for (i : 0..len) {
            list.push(new ForwardNode());
        }
        for (i : 0..(len - 1)) {
            assert list.get(i).next == list.get(i + 1);
        }
    }
    // ... extra constraints ...
}

class ReverseList {
    inferred void shup(ReverseNode node);
    inferred ReverseNode get(int index);
    // ... extra methods ...
    constraint {
        free vars len : int, list : ForwardList;
        assume list.get(0) == null; // so that we start with an empty list
        for (i : 0..len) {
            list.push(new ReverseNode());
        }
        for (i : len..1) {
            assert list.get(i).prev == list.get(i - 1);
        }
    }
    // ... extra constraints ...
}

The beauty of constraints is that they compose perfectly. The composition of two constraints is simply their conjunction! Combining the two types of list is now straightforward:

class Node combines ForwardNode, ReverseNode {}
class List combines
        ForwardList[ForwardNode => Node],
        ReverseList[ReverseNode => Node] {
    // we receive push() from ForwardList
    // we receive shup() from ReverseList
    
    // we receive get() from both, and they become the same method because
    // their types have been coerced to Node.

    // we receive all of the constraints from ForwardList
    
    // and we receive all of the constraints from ReverseList
}

The constraints from ForwardList and ReverseList both call get(), and so we force our sufficiently-smart compiler to derive new implementations of push() and shup(). The constraints ensure that it will be the implementation we desire.

Such a compiler might seem like magic now, but research is progressing toward it. Current endeavors include Rosette, SynQuid, and the ExCAPE project. And you can get a surprising amount of mileage out of the logic program eval(Program, ...) := ....

Problem 1: Ambiguity around how to share state

Problem 2: Type incompatibility

Problem 3: Emergent logic