romain.bardou.fr

Chalumeau, a Build Library Instead of a Build Framework

I've been reading about the differences between frameworks and libraries, and I noticed that some of the issues with frameworks are the same issues that we encounter in some build tools such as Ocamlbuild or even make. So I started experimenting with the design of a build library, as opposed to a build framework. It's just an experiment; I did not make yet another build system. But I find the results interesting.

Frameworks Versus Libraries

Frameworks take control of your program. They are basically programs with holes that you fill with blocks such as configuration settings, contents, callbacks… They are in charge of gluing those blocks together. By contrast, libraries are sets of blocks and it's your job to glue them together.

With a framework, if the particular behavior you want requires a hole which isn't there, you have to patch the framework to carve the hole yourself. Or, more realistically, you'll just find a clumsy workaround. With a library, if a function does not behave exactly as you want, it's not a big deal. Unless the library hides all of its internals from you, you can use the parts which do what you want and write the parts which do not.
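To make the contrast concrete, here is a minimal sketch. The modules Framework and Library and all of their functions are made up for this post; they only illustrate who is in charge of the control flow.

```ocaml
(* Hypothetical framework: it owns the control flow and only calls back
   into user code through the hole it planned for (on_line). *)
module Framework = struct
  let run ~(on_line : string -> string) (lines : string list) : string list =
    (* The framework decides that every line is processed, in order. *)
    List.map on_line lines
end

(* Hypothetical library: small blocks that the user glues together. *)
module Library = struct
  let keep_non_empty lines = List.filter (fun l -> l <> "") lines
  let uppercase lines = List.map String.uppercase_ascii lines
end

let input = [ "a"; ""; "b" ]

(* With the framework we can only fill the on_line hole:
   there is no hole for dropping empty lines. *)
let via_framework = Framework.run ~on_line:String.uppercase_ascii input

(* With the library we choose which blocks to use, and in which order. *)
let via_library = Library.uppercase (Library.keep_non_empty input)
```

Dropping empty lines with the framework would require patching run or working around it; with the library it is just one more function call.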

Basically, frameworks tend to be more opinionated than libraries, which tend to be more flexible. This is just a tendency though. Of course there are badly-designed libraries, and there are flexible frameworks. But I'm starting to believe that the mindset of making a library has a better chance of yielding a good result than the mindset of making a framework.

Abstract Versus Concrete

Abstracting over data structures can prevent library users from breaking invariants. For instance, the Set module of OCaml is implemented using balanced search trees. Breaking the "balanced" property would make the data structure less efficient, and breaking the "search" property would basically break the whole module. But the interface of Set does not allow modifying trees directly. In fact, there is no way, from the interface, to tell that the underlying data structure is a tree. The library is protecting you from yourself.

Abstraction also makes it possible to swap an implementation for an incompatible one without worrying about breaking user code, as long as the interface remains the same. For instance, one could change the Set module implementation to use persistent hash tables instead of trees. The interface would be compatible, so one could immediately benchmark the new implementation on existing programs. If it proves more efficient, one could replace the implementation in the standard library and no one would notice except for their program running faster.
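Here is a small sketch of that idea. The INT_SET signature is a deliberately tiny interface that I assume for the example (the real Set interface is much larger), and List_set, Tree_set and Client are illustrative names.

```ocaml
(* A deliberately tiny interface, assumed for this example only. *)
module type INT_SET = sig
  type t
  val empty : t
  val add : int -> t -> t
  val mem : int -> t -> bool
end

(* First implementation: a list without duplicates. *)
module List_set : INT_SET = struct
  type t = int list
  let empty = []
  let add x s = if List.mem x s then s else x :: s
  let mem = List.mem
end

(* Second implementation: the balanced trees of the standard library. *)
module Tree_set : INT_SET = struct
  module S = Set.Make (Int)
  type t = S.t
  let empty = S.empty
  let add = S.add
  let mem = S.mem
end

(* User code only sees INT_SET, so it works with either implementation. *)
module Client (S : INT_SET) = struct
  let of_list l = List.fold_left (fun s x -> S.add x s) S.empty l
  let has_42 l = S.mem 42 (of_list l)
end

module With_list = Client (List_set)
module With_tree = Client (Tree_set)
```

Because the type t is abstract in INT_SET, Client cannot tell a list from a tree, which is exactly what lets us swap one for the other and compare them.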

The caveat of abstraction is lack of flexibility. That's the whole point: taking control to prevent users from making mistakes. But what if they know what they are doing? Take the Set module of OCaml. Try to write an iterator on an interval of values. Doing that efficiently requires knowledge that the internal data structure is a balanced search tree. But the module is going out of its way to hide this from you. So you're left with rewriting it entirely, or asking a lawyer whether the license allows you to copy-paste it.
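To show the kind of traversal I have in mind, here is a sketch on a search tree whose representation is visible. The tree type below is hypothetical and deliberately simplified: it is not the representation used by the standard library, and balancing is left out to keep it short.

```ocaml
(* A concrete search tree, exposed on purpose (hypothetical type). *)
type tree = Leaf | Node of tree * int * tree

(* Plain insertion, without rebalancing. *)
let rec insert x = function
  | Leaf -> Node (Leaf, x, Leaf)
  | Node (l, v, r) as t ->
      if x < v then Node (insert x l, v, r)
      else if x > v then Node (l, v, insert x r)
      else t

(* Call f on every element of the interval [lo, hi], in increasing order.
   The search property lets us skip the subtrees which cannot contain
   any element of the interval. *)
let rec iter_range lo hi f = function
  | Leaf -> ()
  | Node (l, v, r) ->
      if lo < v then iter_range lo hi f l;
      if lo <= v && v <= hi then f v;
      if v < hi then iter_range lo hi f r

(* Collect the elements of the interval into a list. *)
let elements_in lo hi t =
  let acc = ref [] in
  iter_range lo hi (fun x -> acc := x :: !acc) t;
  List.rev !acc
```

The pruning in iter_range only works because we can pattern-match on Node and Leaf, which is precisely what an abstract type forbids.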

In the previous section, I wrote: "Unless the library hides all of its internals from you, you can use the parts which do what you want and write the parts which do not". Abstraction hides the internals. It has its advantages, but it comes at a price.

Designing Libraries by Factoring Code

It's tempting to try and guess what users will need. But you run the risk of guessing wrong. You may abstract too much because you did not envision all use cases. You may assume that all programs are written in a specific way and end up with the problems that frameworks suffer from.

Instead of guessing, an interesting approach is to write user code without the library, observe common patterns and factor them into functions. Step by step, functions are extracted, code becomes smaller.

Then one can look at extracted functions and decide whether it makes sense to package them into a library. Usually it does: after all, at some point of your refactoring, your code used them.

Extracted functions that were built on top of previously extracted functions are naturally more abstract. Layers of abstraction thus appear naturally and you can now decide whether you want to hide them. Or, you could organize the documentation so that those layers appear clearly. Users would be aware that they are using lower-level layers but could still use them. In fact, you could provide several interfaces, one for each level of abstraction.

This does not magically solve everything. If all functions of your library call a common function "orderapizza", and if the user loves everything about your library except the fact that it orders pizza instead of sushi, the user has to rewrite everything. To solve this you'll have to somehow parameterize your library by the type of food to order. A parameter is a hole in a program. We talked about this already: the problem with holes is that you have to plan for them.
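As a tiny illustration of what such a hole looks like, here is a sketch. All the names (order_a_pizza, order_sushi, finish_build) are hypothetical and only serve the example.

```ocaml
(* Two hypothetical ways of ordering food. *)
let order_a_pizza () = "pizza ordered"
let order_sushi () = "sushi ordered"

(* Without a hole: the food is hard-coded. *)
let finish_build_fixed target = target ^ " built, " ^ order_a_pizza ()

(* With a hole: the caller may decide what to order.
   The default keeps the original behavior. *)
let finish_build ?(order = order_a_pizza) target =
  target ^ " built, " ^ order ()

let default_case = finish_build "main"
let sushi_case = finish_build ~order:order_sushi "main"
```

The optional argument only exists because someone thought of it in advance, which is the whole difficulty.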

Build Systems

I am very glad that Ocamlbuild exists; its goal of being a tool that "just works" has been a success for simple projects. But as soon as Ocamlbuild is not doing what you want, the experience becomes more painful. Part of the reason is the historical lack of documentation.

But I believe that a deeper reason is that it is basically a framework. It is in charge of the control flow. Sometimes it decides that you may have some say in what to do next, and it looks at tags that you gave to files. If there is no tag for your use case though, you have to write a plugin. A plugin is basically a set of callbacks. But if there is no callback for your use case, you have to find a workaround or use an undocumented feature of the plugin API.

It took me a while to realize this. I actually wrote two Ocamlbuild-like tools because I liked it enough to try and make an even better version. Both had their shortcomings which were partly due to them being frameworks. So what if we made a build system which is not a framework, but a library? We could try to do that using the approach of the previous section, for instance.

Designing a Build Library

Let's start by writing user code. We want to make a program which builds another OCaml program by calling the OCaml compiler in the right sequence. Without a library, it could look like this:

let () =
  print_endline "log.mli -> log.cmi";
  Sys.command "ocamlc -c log.mli";
  print_endline "log.ml -> log.cmo";
  Sys.command "ocamlc -c log.ml";
  print_endline "misc.mli -> misc.cmi";
  Sys.command "ocamlc -c misc.mli";
  print_endline "misc.ml -> misc.cmo";
  Sys.command "ocamlc -c misc.ml";
  print_endline "main.ml -> main.cmi, main.cmo";
  Sys.command "ocamlc -c main.ml";

  print_endline "unix.cma, log.cmo, misc.cmo, main.cmo -> main.byte";
  Sys.command "ocamlc unix.cma log.cmo misc.cmo main.cmo -o main.byte";

  print_endline "log.ml -> log.cmx";
  Sys.command "ocamlopt -c log.ml";
  print_endline "misc.ml -> misc.cmx";
  Sys.command "ocamlopt -c misc.ml";
  print_endline "main.ml -> main.cmx";
  Sys.command "ocamlopt -c main.ml";

  print_endline "unix.cmxa, log.cmx, misc.cmx, main.cmx -> main.native";
  Sys.command "ocamlopt unix.cmxa log.cmx misc.cmx main.cmx -o main.native";

  print_endline "Done."

Obviously this lacks many important details, including error handling, incremental compilation and parallel builds, but let's not make this blog post even longer.
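For what it's worth, basic error handling does not have to be long. Here is a minimal sketch which stops at the first failing command; the names run, step and Command_failed are mine and do not come from any existing library.

```ocaml
(* Raised when a command exits with a non-zero status.
   Carries the command line and its exit status. *)
exception Command_failed of string * int

(* Run a shell command and raise if it fails. *)
let run command =
  let status = Sys.command command in
  if status <> 0 then raise (Command_failed (command, status))

(* Log a build step, then run it. *)
let step description command =
  print_endline description;
  run command
```

With this, a line of the program above would become step "log.mli -> log.cmi" "ocamlc -c log.mli", and the build would stop as soon as a compilation fails.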

So what can be factored out? Let's start with calls to ocamlc and ocamlopt and common logs:

let ocamlc_c filename output =
  print_endline (filename ^ " -> " ^ output);
  Sys.command ("ocamlc -c " ^ filename)

let ocamlc filenames output =
  print_endline (String.concat ", " filenames ^ " -> " ^ output);
  Sys.command ("ocamlc " ^ String.concat " " filenames ^ " -o " ^ output)

let ocamlopt_c filename output =
  print_endline (filename ^ " -> " ^ output);
  Sys.command ("ocamlopt -c " ^ filename)

let ocamlopt filenames output =
  print_endline (String.concat ", " filenames ^ " -> " ^ output);
  Sys.command ("ocamlopt " ^ String.concat " " filenames ^ " -o " ^ output)

let () =
  (* Bytecode *)
  ocamlc_c "log.mli" "log.cmi";
  ocamlc_c "log.ml" "log.cmo";
  ocamlc_c "misc.mli" "misc.cmi";
  ocamlc_c "misc.ml" "misc.cmo";
  ocamlc_c "main.ml" "main.cmi, main.cmo";
  ocamlc [ "unix.cma"; "log.cmo"; "misc.cmo"; "main.cmo" ] "main.byte";

  (* Native code *)
  ocamlopt_c "log.ml" "log.cmx";
  ocamlopt_c "misc.ml" "misc.cmx";
  ocamlopt_c "main.ml" "main.cmx";
  ocamlopt [ "unix.cmxa"; "log.cmx"; "misc.cmx"; "main.cmx" ] "main.native";

  print_endline "Done."

Functions ocamlc_c and ocamlopt_c look quite similar, and so do ocamlc and ocamlopt. Let's use that:

let ocaml_compile compiler filename output =
  print_endline (filename ^ " -> " ^ output);
  Sys.command (compiler ^ " -c " ^ filename)

let ocamlc_c = ocaml_compile "ocamlc"
let ocamlopt_c = ocaml_compile "ocamlopt"

let ocaml_link compiler filenames output =
  print_endline (String.concat ", " filenames ^ " -> " ^ output);
  Sys.command (compiler ^ " " ^ String.concat " " filenames ^ " -o " ^ output)

let ocamlc = ocaml_link "ocamlc"
let ocamlopt = ocaml_link "ocamlopt"

We can then continue factoring code by noticing similarities. Bytecode compilation and native code compilation are very similar, so we can share the code by turning the file extensions and the compiler into parameters. Going further, we can compute the order of modules automatically by using ocamldep to produce a dependency graph. And so on.
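Here is a sketch of what sharing code between bytecode and native compilation could look like. The backend record, its field names and the plan function are assumptions made for this example. To keep it short, plan ignores .mli files, expects modules to be given in dependency order, and only returns the commands instead of running them.

```ocaml
(* Everything that differs between bytecode and native compilation.
   Field names are illustrative. *)
type backend = {
  compiler : string; (* "ocamlc" or "ocamlopt" *)
  obj_ext : string; (* extension of compiled modules *)
  lib_ext : string; (* extension of libraries *)
  exe_ext : string; (* extension of the executable *)
}

let bytecode =
  { compiler = "ocamlc"; obj_ext = ".cmo"; lib_ext = ".cma"; exe_ext = ".byte" }

let native =
  { compiler = "ocamlopt"; obj_ext = ".cmx"; lib_ext = ".cmxa"; exe_ext = ".native" }

(* Commands which compile the given modules, then link them with the
   given libraries. Nothing is executed here. *)
let plan backend ~libraries ~modules ~main =
  let compile m = backend.compiler ^ " -c " ^ m ^ ".ml" in
  let objects =
    List.map (fun l -> l ^ backend.lib_ext) libraries
    @ List.map (fun m -> m ^ backend.obj_ext) modules
  in
  let link =
    backend.compiler ^ " " ^ String.concat " " objects
    ^ " -o " ^ main ^ backend.exe_ext
  in
  List.map compile modules @ [ link ]

let native_plan =
  plan native ~libraries:[ "unix" ] ~modules:[ "log"; "misc"; "main" ] ~main:"main"
```

The same plan function serves both targets; only the backend value changes.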

We are adding new possibilities, never removing any. The functions we introduce either generalize code we actually need or make it easy to repeat a specific use case. If those functions are not flexible enough, we know by construction that the user can write them by hand.

In the end, our program can become a simple call to a function like:

build (ls ".") "main"

If that becomes a general use case we can even have the list of files (given by ls ".") and the name of the executable (here main) be optional and simply write:

build ()
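Optional arguments with default values are enough to get this. The sketch below is not the actual implementation: ls is a minimal stand-in built on Sys.readdir, and this build only returns a description of what it would do instead of calling the compiler.

```ocaml
(* List the files of a directory, like a minimal ls. *)
let ls dir = Array.to_list (Sys.readdir dir)

(* Hypothetical build function. The default value of files is only
   computed when the caller does not provide one. *)
let build ?(files = ls ".") ?(main = "main") () =
  let sources = List.filter (fun f -> Filename.check_suffix f ".ml") files in
  (* A real implementation would compile and link here;
     this sketch only reports what it would build. *)
  Printf.sprintf "%s -> %s.byte"
    (String.concat ", " (List.sort compare sources))
    main

let example = build ~files:[ "b.ml"; "a.ml"; "notes.txt" ] ~main:"app" ()
```

The final unit argument is what allows OCaml to know that the optional arguments have been omitted when we write build ().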

We obtain a function which is very simple to use in the standard use case, but is still flexible enough to be useful in other situations.

The Birth of Chalumeau

Now all we have to do is package our functions into a library. Let's call it Chalumeau (a chalumeau can be used to build stuff, and it sounds like chameau, which is the French word for OCaml; also, a chalumeau is a dromaludaire with two humps).

A user could come up with interesting new ways to combine our functions. For instance, if we expose the function which builds the dependency graph, the user could make an image out of this graph. When they're done, they can propose a patch to include their function in the library, or provide their own extension library.
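As an example of such a user-side extension, here is a sketch which turns a dependency graph into Graphviz DOT text. The graph representation (an association list from module names to their dependencies) is an assumption made for the example and not necessarily what Chalumeau uses.

```ocaml
(* A dependency graph: module name -> names of the modules it depends on.
   This representation is assumed for the example. *)
type graph = (string * string list) list

(* Render the graph in the DOT language of Graphviz. *)
let to_dot (graph : graph) : string =
  let buffer = Buffer.create 256 in
  Buffer.add_string buffer "digraph dependencies {\n";
  List.iter
    (fun (m, deps) ->
      (* Declare the node so that modules without edges still appear. *)
      Buffer.add_string buffer (Printf.sprintf "  \"%s\";\n" m);
      List.iter
        (fun d ->
          Buffer.add_string buffer
            (Printf.sprintf "  \"%s\" -> \"%s\";\n" m d))
        deps)
    graph;
  Buffer.add_string buffer "}\n";
  Buffer.contents buffer

let example = to_dot [ ("main", [ "log"; "misc" ]); ("log", []); ("misc", []) ]
```

Saving the result to a file and running it through the dot command of Graphviz (for instance with -Tpng) produces the image.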

I continued working on Chalumeau. It is available on GitHub. It is still quite experimental though.

Future Work

I feel that one problem will be to add specific compilation flags for some modules, such as -pp to apply preprocessors. It's basically an instance of the pizza versus sushi issue that we already discussed.
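One possible way to plan for this hole is to let the user pass a function from file names to extra flags. This is only a sketch of the idea: compile_command and flags_for are hypothetical names, not Chalumeau's API, and the file name and preprocessor in my_flags are just examples.

```ocaml
(* Build the command which compiles one file. The flags_for parameter is
   the hole: it maps a source file to its extra flags (none by default). *)
let compile_command ?(flags_for = (fun _ -> [])) compiler filename =
  String.concat " " ((compiler :: flags_for filename) @ [ "-c"; filename ])

(* Example of user-provided flags: preprocess one specific file. *)
let my_flags = function
  | "parser.ml" -> [ "-pp"; "camlp4o" ]
  | _ -> []

let plain = compile_command "ocamlc" "log.ml"
let preprocessed = compile_command ~flags_for:my_flags "ocamlc" "parser.ml"
```

Since flags_for is an ordinary function, the user can compute flags however they like, without having to learn a set of predefined tags.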

Ocamlbuild uses tags to solve this issue, but it's hard to know which tags are available and what they do exactly. And if you need a tag which does not exist, it gets messy. Basically, tags are the perfect example of what is good and what is bad with a framework.

Conclusion

I'm not saying that frameworks are always a bad idea. But if you are designing a framework, please consider making a library instead. At least try to leave users free to use only the parts they actually want, when they want. And try to give them the tools to adapt your work to their use case.

Similarly, I'm not saying that abstraction is bad. I've advocated the opposite for years now. I'm just saying that it's not without its drawbacks.

And finally, I don't know if a build library like Chalumeau would actually work in the long run. I'm not really planning on making yet another build system. But it was a fun experiment and it looks promising. Maybe I'll even use it for some projects, because after all it's a small and simple library which is easy to extend. I don't have much time to turn it into a full-fledged library, because a new league of Path of Exile has just started, but I'd love to hear your feedback.