Introducing Marks

In my previous two posts (here and here), I described how and why programming languages can’t talk about many issues that affect programmers–important issues like product requirements, design constraints, intellectual property, and more. I also inventoried the mechanisms that extend the semantics of languages today, and explored why those mechanisms have limited value. If you haven’t read those posts, please do; what I say next won’t make a lot of sense without that foundation.

In the intent programming language that I’m creating, the solution to this problem is called “marks” (a name which alludes to linguistic markedness). Marks play a role somewhat analogous to adjectives and adverbs in human language; they are crucial enrichers. They resemble decorators or annotations in other languages, though their power is much, much greater.

Without further ado, let me provide a blueprint for this bridge across the semantic gap that I’ve been lamenting–the design guidelines for “marks.” Then I’m going to show you an example of how easy it could be to use marks, and how much power they give you.

image credit: Curious Expeditions (Flickr)

Blueprint

1. Code and its compiler(s) must have a compile-time API specified by the language.
It’s not okay if Clang generates one type of AST, GCC a second, and MSVC a third; all compilers that support the language must expose a spec-compatible, programmable API for all language constructs. For example, I need to be able to find out what parameters and local variables are declared in a function, and what their data types and other characteristics are. This is similar to what reflection offers, but reflection doesn’t help at all, because I need this before run-time. As I mentioned in my post about making a codebase const-correct, the lack of this feature is really a serious design flaw. Why should code, of all things programmers deal with, be impossible to code against?

2. Any object in code must support decoration (semantic marks).
Existing decorator/annotation/attribute mechanisms are fairly broad already. However, I haven’t seen any solutions that let me decorate arbitrary blocks of code, individual assignments, conditionals… Plus:
3. The scope of code must be expanded to include constructs above the level of a compilation unit.
Today, programmers usually write code for classes, packages, and modules, but all structures “above” that level (applications, libraries, assemblies, product suites) are described in some alternative mechanism (e.g. pom.xml, SConstruct, CMakeLists.txt, Visual Studio solution, Eclipse workspace). Typically these constructs are viewed as optional veneer offered by an IDE–often, they’re not even formally defined in the language’s spec. This means you can’t decorate them (see #2), and they don’t have a programmatic API that’s unifiable with that of code at compile-time (see #1).
4. Code must have a DOM.
This is really a corollary to items 1 through 3; any element of code must be reachable through the code’s interface, which implies something DOM-like. It may be unnecessary to hold the entirety of a DOM in memory, though; perhaps a SAX-style interface would suffice. Interestingly, this requirement also makes code into true hypertext, which has other ramifications that I’m planning to blog about later.
5. Call graphs and other producer/consumer relationships must be part of the code interface.
The utility of this will become clear in examples.
6. Decorator attachment must be richer than binary on/off.
This is a flaw in existing decorator mechanisms. If I put @foo on top of a class in Java, the annotation is present. If I don’t put it there, it’s not. Binary.

Human brains and human languages don’t work that way; they’re more fuzzy. If I tell you that “Fred was an executive at Enron,” you immediately generate theories about Fred. The fact that I imparted that information at all means that I consider it significant in some way–so you imagine reasons why, and associate them weakly/tentatively to Fred in your mind: he may have been fired, he may have been guilty of shady behavior, he may be a whistleblower, etc.

At least the following modes of attachment need to be supported: explicitly affirmed (binary “on”), explicitly denied (binary “off”), implicitly affirmed (true unless I get explicit evidence to the contrary), implicitly denied.
7. Semantic marks must support sophisticated combination and propagation.
For one, marks need to be able to subsume or imply other marks. In the real world, I can tell you that my car is a Lamborghini (in my dreams :-), and in doing so, I’ve already told you that my car is a sports car, that it’s expensive, that it’s hard to find parts, that it’s a favorite target for speed traps… Likewise, if I am writing a class, and I put a “prototype” mark on it, I may want you to also know that the class is “insufficiently tested”, or “not shippable”. Such logic must be under programmer control.

Another aspect of propagation is that marks must cascade across various scopes. If I have a function that is marked as “not thread-safe”, then any caller of that function must acquire the same mark. If I have an application that is marked as “free”, then all modules used by that application must also be “free” by implication. If I mark a package as “internal use only”, then no functions in that package should be used in projects that export symbols from the package.
8. Marks must be evaluated at compile-time.
Evaluation means running code that marks “carry”. Think of this like static asserts on steroids.
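
To make the attachment and propagation guidelines a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Attachment` enum, the `IMPLIES` table, and the `attach()` and `cascade_to_callers()` helpers are hypothetical names, marks are plain strings, and the call graph is a hand-written dict rather than something a compiler would supply. It shows the four attachment modes, a mark implying other marks, and a mark cascading from a function to its callers.

```python
from enum import Enum


class Attachment(Enum):
    """The four attachment modes named in the blueprint."""
    AFFIRMED = "explicitly affirmed"
    DENIED = "explicitly denied"
    IMPLIED = "implicitly affirmed"
    IMPLIED_NOT = "implicitly denied"


EXPLICIT = (Attachment.AFFIRMED, Attachment.DENIED)

# Hypothetical, programmer-controlled table: the key mark implies these marks.
IMPLIES = {"prototype": ["insufficiently tested", "not shippable"]}


def attach(marks, name, mode=Attachment.AFFIRMED):
    """Attach a mark to one code element, then weakly attach what it implies."""
    if marks.get(name) == mode:
        return  # nothing new to record (also stops cycles in IMPLIES)
    if mode not in EXPLICIT and marks.get(name) in EXPLICIT:
        return  # an implicit attachment never overrides an explicit statement
    marks[name] = mode
    if mode in (Attachment.AFFIRMED, Attachment.IMPLIED):
        for implied in IMPLIES.get(name, []):
            attach(marks, implied, Attachment.IMPLIED)


def holds(marks, name):
    """True if the mark is affirmed, whether explicitly or implicitly."""
    return marks.get(name) in (Attachment.AFFIRMED, Attachment.IMPLIED)


def cascade_to_callers(call_graph, marks_by_function, name):
    """Give every direct or indirect caller of a marked function the same mark.

    call_graph maps each caller to the list of functions it calls.
    """
    changed = True
    while changed:  # repeat until no new caller acquires the mark
        changed = False
        for caller, callees in call_graph.items():
            caller_marks = marks_by_function.setdefault(caller, {})
            if name in caller_marks:
                continue  # already decided, possibly by an explicit denial
            if any(holds(marks_by_function.get(c, {}), name) for c in callees):
                attach(caller_marks, name, Attachment.IMPLIED)
                changed = True


# A class marked "prototype" whose author explicitly denies "not shippable".
class_marks = {}
attach(class_marks, "not shippable", Attachment.DENIED)
attach(class_marks, "prototype")

# "not thread-safe" cascades from tokenize() up to parse() and main().
call_graph = {"main": ["parse"], "parse": ["tokenize"], "tokenize": []}
function_marks = {"tokenize": {"not thread-safe": Attachment.AFFIRMED}}
cascade_to_callers(call_graph, function_marks, "not thread-safe")
```

In this sketch the explicit denial of “not shippable” survives the later “prototype” mark, which is one plausible reading of “true unless I get explicit evidence to the contrary”; a real design might resolve such conflicts differently.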

Examples

Let’s take one of the use cases that I’ve mentioned in previous posts: a programming team has a mandate not to use any GPL’ed code. Here’s how simply that rule could be enforced in code, given the “marks” mechanism:

Each component, package, module, or individual function is marked with its license. (Remember, since marks propagate, this is not an onerous task. It requires no more effort than today’s informal convention of checking in a file named LICENSE or COPYING at the top of a folder. License marks would propagate through folder hierarchies unless/until overridden in a sub-folder.)
All team projects receive a “no GPL in the call tree” mark. For now, imagine that this works more or less as follows:
```
@no_gpl
class my_project: project
    // body of project definition
```
During compilation, the compiler evaluates the validity of marks. If the project includes any GPL’ed code, a “semantic error” (not a “syntax error”) is generated, because the project-level mark and the lower-level mark(s) are incompatible.

Does this sound too good to be true? With a reasonable API to access the AST, writing marks that implement this logic is a piece of cake. Here’s python-ish pseudocode for the implementation. (I’m using pseudocode instead of intent code, because I don’t want to bog down this discussion in tons of extraneous details.)

As code is compiled, the compiler executes the can_bind() method of each mark that’s been placed. This causes calculations about semantic compatibility, without the compiler having to understand the semantics itself.
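
As a rough illustration of that flow, here is a small runnable Python sketch. It is not intent code and not a real compiler API: `CodeElement`, `License`, `NoGpl`, `SemanticError`, and `check_marks()` are hypothetical stand-ins, the "code model" is built by hand, and license marks are placed directly on each element instead of propagating through folders. The only name taken from the discussion above is `can_bind()`.

```python
class SemanticError(Exception):
    """Raised when marks on related code elements are incompatible."""


class CodeElement:
    """Stand-in for a node in the compile-time code model (hypothetical)."""

    def __init__(self, name, marks=(), uses=()):
        self.name = name
        self.marks = list(marks)
        self.uses = list(uses)  # elements this one calls or depends on

    def walk_dependencies(self, seen=None):
        """Yield every element reachable through `uses`, each exactly once."""
        seen = set() if seen is None else seen
        for dep in self.uses:
            if id(dep) not in seen:
                seen.add(id(dep))
                yield dep
                yield from dep.walk_dependencies(seen)


class License:
    """Mark that records the license of the element it is placed on."""

    def __init__(self, name):
        self.name = name

    def can_bind(self, element):
        return True  # a license on its own never conflicts with anything


class NoGpl:
    """Mark that forbids GPL-licensed code anywhere below the element."""

    def can_bind(self, element):
        for dep in element.walk_dependencies():
            for mark in dep.marks:
                # Deliberately simplistic license test, for illustration only.
                if isinstance(mark, License) and mark.name.upper().startswith("GPL"):
                    raise SemanticError(
                        f"{element.name} is marked no-GPL but depends on "
                        f"{dep.name} ({mark.name})"
                    )
        return True


def check_marks(root):
    """What the compiler would do: run can_bind() for every placed mark."""
    for element in [root, *root.walk_dependencies()]:
        for mark in element.marks:
            mark.can_bind(element)


zlib = CodeElement("zlib", marks=[License("Zlib")])
readline = CodeElement("readline", marks=[License("GPL-3.0")])
shell = CodeElement("shell", uses=[readline, zlib])
my_project = CodeElement("my_project", marks=[NoGpl()], uses=[shell])

try:
    check_marks(my_project)
except SemanticError as err:
    print("semantic error:", err)
```

Note that `check_marks()` knows nothing about licenses; it only calls `can_bind()`. All of the licensing logic lives in the marks themselves, which is the point of the design.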

I’m glossing over lots of details here. (At what point in the compilation process is the AST for a function known, making the API used by the marks useful? Which mark placement is tested first? How are errors reported? How does the compiler know that mark code is callable at compile time instead of just at runtime?) I have preliminary answers, but this post is not the place. For now, just take it on faith that the compile model is workable.

One more example, just for fun. Suppose you want to guarantee that across a large object model, all object instances have IDs which are strings. These strings must consist of a single line of between 20 and 40 printable characters; they cannot be null. Anywhere that member variables are named “id”, or parameters are used to set a member variable named “id”, you want these semantics enforced by precondition:
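
A minimal Python approximation of such a mark follows, with some loud caveats. Python offers no compile-time hook of the kind described in this post, so the hypothetical `id_semantics` class decorator below enforces the precondition when an `id` attribute is assigned at runtime. The names `check_id`, `id_semantics`, and `Customer` are invented for the example, and "printable" is interpreted as Python's `str.isprintable()`, which also rejects newlines and therefore covers the single-line rule.

```python
def check_id(value):
    """Precondition for IDs: a non-null, single-line string of 20 to 40 printable characters."""
    if not isinstance(value, str):
        raise ValueError("id must be a non-null string")
    if not 20 <= len(value) <= 40:
        raise ValueError("id must be between 20 and 40 characters long")
    if not value.isprintable():
        # Newlines and other control characters are not printable.
        raise ValueError("id must be a single line of printable characters")


def id_semantics(cls):
    """Hypothetical mark: check every assignment to a member named 'id'."""
    original_setattr = cls.__setattr__

    def checked_setattr(self, name, value):
        if name == "id":
            check_id(value)
        original_setattr(self, name, value)

    cls.__setattr__ = checked_setattr
    return cls


@id_semantics
class Customer:
    def __init__(self, id):
        # The parameter sets a member named "id", so the check fires here.
        self.id = id


ok = Customer("customer-0000000000000001")  # 25 printable characters

try:
    Customer("too short")
except ValueError as err:
    print("rejected:", err)
```

In the model this post describes, the same rule would be attached once and applied everywhere the naming pattern matches across the object model, instead of being opted into class by class as it is here.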

In my next post, I’ll explore a bunch of additional examples, and I’ll cover more details about how these marks work their magic.