A Separation

December 12, 2012


Whenever I create a new module or component, I first try to separate it into two parts:

And I think you should try that, too.

Why? Here is The Long Story

Programming, in the sense of telling a CPU what to do, regardless of the paradigm involved (functional, logical, object-oriented, or actor-based), usually tells us little about our program. Instead, it tells us something about the CPU, the operating system, or the libraries involved.

So if programs tell our CPUs and operating systems what to do, who tells our programs what to do?

We do, but that information is usually gone as soon as we type the first line of code.

So in any reasonably complex program, the original intentions get lost during programming, because a program is focused on the hardware and frameworks that execute it.

So a program always does two things:

No matter how we decorate a program with meaningful variable names, its intent will always be obscured by the program's own structure: by system calls, additional operators, symbols, lists, and lambdas that are unrelated to the original requirement. These are tools optimized to talk to the CPU, the compiler, and APIs, but barely to humans.

So now that we know there are two separate things, we can give the two parts names: the specification and the implementation.

That is nothing new. Classic programming is about taking a specification, implementing it, and then forgetting it.

But do modern (agile?) programmers still use specifications? From what I know, they talk to their customers at regular intervals and then repeatedly prune the code until it converges on what they imagine the client needs. And often it is the programmer's imagination that needs these iterations, not the code. As far as I know, this "process" is the most effective way to develop software. So frankly, here the specification never existed. It cannot get lost in translation. The problem seems to be solved.

But making changes to an existing code base requires a deep understanding of what is going on in the implementation and, more importantly, of what was originally defined by the specification or imagined by the client.

So one might wonder why we don't write specifications anymore. Or why, when we did, they always felt rather useless as soon as the first lines of code were written.

The most basic problem with a specification is that it is never as detailed as the implementation, because a proper implementation must also account for many additional variables that usually cannot be foreseen by anyone other than the programmer.

And because programmers don't like to write or change specifications, all these important decisions and switches just appear in the code and never make their way back into the specification.

... more precisely, programmers don't like to do anything. Programmers are, by their very nature, lazy people. If they weren't, they would not be good at programming, which requires a basic motivation to avoid and automate boring, repetitive labor. That may eventually lead to a world where only programmers and robots are required. The mad realization here is that lazy programmers create a society in which everyone but programmers can be lazy. And although power and wealth are probably good compensation for that, I doubt we can survive in a world run by programmers who just want to be lazy but have to do all the manual work that is left. The only solution to that problem is to replace programmers with artificial intelligence. Fortunately, we only need lazy programmers to do that.

So we need to accept that a written or imagined specification can, by definition, never be as detailed as the code that runs it. Accepting that, we could either throw away the idea of a "living" specification and rebuild the specification (or map of intentions) in our heads by reading a lot of code right before making small changes, or we could finally accept that the code is the specification.

This is not news either, but compared to all the other progress we have made, we have been struggling with this challenge for a long time.

TDD and BDD, for instance, are rudimentary attempts to bring specifications back into our programs by writing code that observes and verifies the behavior of programs. But even though these practices reduce bugs by a fair amount, they introduce yet another liability: a lot more code.

What we want is less code, not more. And should testing really be complected with the specification?

For one, we should not forget that a specification is pure in the sense that it defines what should happen. So whatever we test, it can never be the specification that is under test. That is one reason you never write a test case for a test: the test itself defines what should happen, and so does a specification.

Consequently, we need to become aware that it is important to separate specification and implementation right in our code.

One way to separate the specification from the implementation is to think about the specification as a simple data graph that is static and fixed once it has been built. A specification of a program should be an immutable graph that completely defines the dynamic behavior of a program.
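
A minimal Python sketch of such a graph, assuming a hypothetical order workflow: states and their transitions are frozen dataclasses held in tuples, so the specification cannot be modified once it has been built, and a separate `run` function interprets it. All names (`State`, `Spec`, `ORDER_SPEC`, `run`) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class State:
    name: str
    # (event, target state name) pairs; a tuple keeps the graph immutable
    transitions: Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Spec:
    start: str
    states: Tuple[State, ...]

    def state(self, name: str) -> State:
        for s in self.states:
            if s.name == name:
                return s
        raise KeyError(name)


# The specification: pure data describing a hypothetical order workflow.
ORDER_SPEC = Spec(
    start="new",
    states=(
        State("new", (("pay", "paid"), ("cancel", "cancelled"))),
        State("paid", (("ship", "shipped"),)),
        State("shipped", ()),
        State("cancelled", ()),
    ),
)


# The interpreter: walks the graph for a sequence of events.
def run(spec: Spec, events: Iterable[str]) -> str:
    current = spec.start
    for event in events:
        targets = dict(spec.state(current).transitions)
        if event not in targets:
            raise ValueError(f"event {event!r} not allowed in state {current!r}")
        current = targets[event]
    return current
```

Here `run(ORDER_SPEC, ["pay", "ship"])` returns `"shipped"`, while assigning to `ORDER_SPEC.start` raises `dataclasses.FrozenInstanceError`. Note that `frozen=True` only prevents attribute reassignment; it is the use of tuples instead of lists for the nested collections that keeps the whole graph immutable.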

Compare that to markup or to the source code we compile: they have the same properties. Each is a complete, immutable blueprint that tells an interpreter or the CPU how our program is to be executed.

Most of the software projects I see today mix the specification with the implementation, so that everything looks like a complected mashup of domain-specific terms and executable code.

We need the discipline to separate the specification from the program that runs it. We need to create languages (preferably internal DSLs), together with the appropriate domain-specific data types, that allow us to write a domain-specific specification, which can then be run by an interpreter.

The language builds the data types that form the specification, which is then interpreted.
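
To make that concrete, here is a small Python sketch of an internal DSL for a hypothetical discount domain. The DSL functions (`when`, `min_total`, `customer_is`, `percent_off`) do nothing but build immutable data; `price` is the interpreter and the only code that knows how to evaluate it. The domain, the names, and the rule that the largest matching discount wins are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Data types of the specification
@dataclass(frozen=True)
class MinTotal:
    amount: float


@dataclass(frozen=True)
class CustomerIs:
    kind: str


@dataclass(frozen=True)
class PercentOff:
    percent: float


Condition = Union[MinTotal, CustomerIs]


@dataclass(frozen=True)
class Rule:
    conditions: Tuple[Condition, ...]
    effect: PercentOff


# The internal DSL: plain functions that only build data
def min_total(amount: float) -> MinTotal:
    return MinTotal(amount)


def customer_is(kind: str) -> CustomerIs:
    return CustomerIs(kind)


def percent_off(percent: float) -> PercentOff:
    return PercentOff(percent)


def when(*conditions: Condition, then: PercentOff) -> Rule:
    return Rule(tuple(conditions), then)


# The specification: declarative, no control flow
DISCOUNTS = (
    when(min_total(100), then=percent_off(5)),
    when(customer_is("member"), min_total(50), then=percent_off(10)),
)


# The interpreter: the only place that knows how to execute the rules
def holds(condition: Condition, order: dict) -> bool:
    if isinstance(condition, MinTotal):
        return order["total"] >= condition.amount
    if isinstance(condition, CustomerIs):
        return order["customer"] == condition.kind
    raise TypeError(f"unknown condition {condition!r}")


def price(spec: Tuple[Rule, ...], order: dict) -> float:
    # Apply the largest discount among all rules whose conditions all hold
    best = max(
        (r.effect.percent for r in spec if all(holds(c, order) for c in r.conditions)),
        default=0,
    )
    return round(order["total"] * (1 - best / 100), 2)
```

For example, a member ordering for 60 matches only the second rule and pays 54.0, while a guest ordering for 120 matches only the first and pays 114.0. Adding a new kind of condition means adding a data type and one branch in `holds`; the specification itself stays declarative.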

Now, this concept is not new either. Compilers and browsers all work this way. They take a specification in and interpret or translate it.

But if we know that this concept has produced some of the most sophisticated programs we have (namely compilers and browsers), probably the most complex and stable software besides operating systems, why don't we use this model to create our own programs?

One explanation could be that we are not smart enough. Abstractions like markup or programming languages take a long time to develop, and even then there is no guarantee that they accommodate change or can be extended easily.

There is also another scary element that eventually comes up in any fairly complex system: executable parts, such as Turing-complete languages, that compensate for abstractions we are not yet able to see. JavaScript, originally built to extend HTML, is a prominent example, and it is taking over the whole web right now.

So we need to be aware that sometimes a specification needs complex executable parts, but these should be small and separate from the time and context the interpreter runs in.
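
One way to keep such executable parts small is to allow them only as leaves of the specification. The Python sketch below assumes a hypothetical sign-up form: most rules are plain data (`Required`, `MaxLength`), and the single escape hatch, `Custom`, wraps a small pure function that sees only the field value and never the interpreter's state. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class Required:
    pass


@dataclass(frozen=True)
class MaxLength:
    limit: int


@dataclass(frozen=True)
class Custom:
    # Escape hatch: a small, pure check that receives only the field value.
    description: str
    check: Callable[[str], bool]


RuleType = Union[Required, MaxLength, Custom]


@dataclass(frozen=True)
class Field:
    name: str
    rules: Tuple[RuleType, ...]


# The specification: mostly data, with one small executable leaf.
SIGNUP_SPEC = (
    Field("username", (Required(), MaxLength(12))),
    Field("zip", (Required(), Custom("five digits", lambda v: len(v) == 5 and v.isdigit()))),
)


# The interpreter: owns iteration, form access, and error reporting.
def validate(spec: Tuple[Field, ...], form: dict) -> List[str]:
    errors = []
    for field in spec:
        value = form.get(field.name, "")
        for rule in field.rules:
            if isinstance(rule, Required) and not value:
                errors.append(f"{field.name}: required")
            elif isinstance(rule, MaxLength) and len(value) > rule.limit:
                errors.append(f"{field.name}: at most {rule.limit} characters")
            elif isinstance(rule, Custom) and value and not rule.check(value):
                errors.append(f"{field.name}: {rule.description}")
    return errors
```

Because `Custom` carries a human-readable `description`, even the executable part still states its intent inside the specification, and the lambda has no access to the form, the other fields, or the interpreter's context.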

Instead of creating more powerful computer languages, we may need to craft libraries that enable us to create specifications and interpreters for the programs we want to build.

This separation would have some positive consequences:

Admittedly, this is so far a rather linear view of the relation between a specification and its interpreter. In reality, it would be more like a number of specifications and interpreters working together. But as long as the boundaries are clear and we are aware of them, I can imagine that such a basic separation principle could lead to better, more maintainable programs: programs that don't hide their business logic between layers of functions or classes.
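
A small illustration of several interpreters sharing one specification, sketched in Python under assumed names: a hypothetical shipping-fee specification is executed by `shipping_fee` and rendered for humans by `describe`. Since both read the same data, the numbers in the description cannot drift away from the numbers the program actually uses.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tier:
    max_weight_kg: float  # inclusive upper bound
    fee: float


@dataclass(frozen=True)
class ShippingSpec:
    tiers: Tuple[Tier, ...]  # assumed to be sorted by max_weight_kg
    free_above_order_total: float


SPEC = ShippingSpec(
    tiers=(Tier(1.0, 4.90), Tier(5.0, 9.90), Tier(20.0, 19.90)),
    free_above_order_total=50.0,
)


# Interpreter 1: executes the specification.
def shipping_fee(spec: ShippingSpec, weight_kg: float, order_total: float) -> float:
    if order_total > spec.free_above_order_total:
        return 0.0
    for tier in spec.tiers:
        if weight_kg <= tier.max_weight_kg:
            return tier.fee
    raise ValueError(f"no tier for {weight_kg} kg")


# Interpreter 2: renders the same specification for humans.
def describe(spec: ShippingSpec) -> str:
    lines = [f"Free shipping for orders above {spec.free_above_order_total:.2f}."]
    for tier in spec.tiers:
        lines.append(f"Up to {tier.max_weight_kg:g} kg: {tier.fee:.2f}")
    return "\n".join(lines)
```

The semantics (for example, that tier bounds are inclusive) still live in each interpreter, so the boundaries between them need to be as clear as the specification itself.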

So how do we start? My best guess is to first think about how a specific domain can be modeled, and whether the problem can be clearly separated into a specification and an interpreter. If it can't, the domain needs to be untangled first, or new abstractions need to be found.

This idea is growing on me, and I am thinking a lot about the declarative nature of specifications and how they can stay separate from their execution.

To summarize, I want to share my current ideas about program code that serves as a specification:

So while I cannot really grasp yet how a complex program could be specified instead of programmed, I can try to summarize what it would be like:

That said, I think we should start small by setting up a seed constraint:

And I will try to set up a page listing some of the C#/.NET libraries that are great candidates for building software that does not forget its specification. But I need your help for that.

If you really made it down here and you are a .NET developer, please send me all the libraries and frameworks you would like to see on that list. Comments or Twitter preferred.