A benefit of an integrated programmable build system

By Macoy Madson. Published on .

I recently released File Helper, a file organization application I wrote using Cakelisp.

This application had only two external files that were necessary for it to fully function: a font file and an icon file.

I packaged File Helper as a .zip for Windows and a .tar.gz for Linux. These archives contain the platform executable as well as a license file and the two necessary font and icon files.

However, wouldn't it be nice if instead I shipped a single executable, thereby eliminating the extract step?

It might sound trivial, but eliminating that extra step has many benefits:

Bundling files into executables

An executable is just a file in a format which your operating system understands. It is essentially a header and a whole bunch of sections filled with binary data.

Typically, a linker converts a collection of object files into a single executable. Because executables are containers which can hold various kinds of data, we can package data only our application understands in the same container as the application code.

The operating system is fine with this because it only needs to map the executable into memory and start executing code at a designated entry point. It is then up to the program to decide how to interpret the various executable sections.

Platform differences

There are many different file formats for executables. Usually, an operating system only supports one executable file format. On Windows, it's the Win32 Portable Executable format, typically with extension .exe. On Linux, it's usually ELF.
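To make the difference concrete, here is a minimal sketch of telling the two formats apart by their leading bytes. ELF files start with the bytes 0x7F 'E' 'L' 'F', and PE files start with a DOS header whose first two bytes are 'M' 'Z'. The function and enum names are hypothetical and are not part of Cakelisp or File Helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    EXECUTABLE_FORMAT_UNKNOWN,
    EXECUTABLE_FORMAT_ELF,
    EXECUTABLE_FORMAT_PE
} ExecutableFormat;

/* Guess the container format from the first bytes of a file.
 * Hypothetical helper for illustration only. */
ExecutableFormat detect_executable_format(const unsigned char* bytes, size_t length)
{
    static const unsigned char elf_magic[4] = {0x7F, 'E', 'L', 'F'};
    assert(bytes != NULL || length == 0);

    if (length >= 4 && memcmp(bytes, elf_magic, 4) == 0)
        return EXECUTABLE_FORMAT_ELF;

    /* This only checks the DOS header magic. A stricter check would follow
     * the offset stored at 0x3C and look for the "PE\0\0" signature. */
    if (length >= 2 && bytes[0] == 'M' && bytes[1] == 'Z')
        return EXECUTABLE_FORMAT_PE;

    return EXECUTABLE_FORMAT_UNKNOWN;
}
```

Everything after those first bytes differs between the two formats, which is why the bundling step needs separate handling per platform.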

I am only targeting those two platforms, so I can add code to specifically support those formats when building Cakelisp programs.

On Windows, data is added to executables via Resource Files. I wrote a tutorial on how to do this.

On Linux, data can be added by dumping it into an object file which defines a couple of symbols marking where the data starts and ends. This is a great tutorial on how to do that.
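As a rough sketch of what the application side looks like on Linux: when objcopy reads a file with `-I binary`, it emits symbols named `_binary_<name>_start` and `_binary_<name>_end`, where `<name>` is the input path with non-alphanumeric characters replaced by underscores. The struct and function below are hypothetical helpers, and the symbol names in the comment assume an input file named `MyFont.ttf`.

```c
#include <assert.h>
#include <stddef.h>

/* In a real build these symbols would come from the generated object file.
 * They are left in a comment here so the sketch links on its own:
 *
 *   extern const char _binary_MyFont_ttf_start[];
 *   extern const char _binary_MyFont_ttf_end[];
 */

typedef struct {
    const char* start; /* first byte of the bundled file */
    const char* end;   /* one past the last byte */
} BundledData;

/* The size is simply the distance between the two symbols. */
size_t bundled_data_size(BundledData data)
{
    assert(data.start != NULL && data.end >= data.start);
    return (size_t)(data.end - data.start);
}
```

With the real symbols declared, the font could then be handed to any loader which accepts a pointer and a size, with no file I/O at startup.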

Good and bad ways

Like everything in programming, you'll hear different advice on how to bundle data.

The most common alternative method is to convert your data to a C-style array definition. This has many limitations, and in my opinion should be avoided:

We are going to proceed with the platform dependent but much more robust approach, which is to convert our data to object files without using a C/C++ compiler.

Integrated build system

Whether we are on Windows or Linux, we need to process our data file into some other form in order for the linker to properly understand the data package. This means adding a step to our build to process the data, because we want it to automatically stay up-to-date when linked in the executable.

Cakelisp includes a simple C/C++ build system as well as compile-time code execution. We need to create a new build step to process our binary data into object files. In order to do that, we use a compile-time build hook to execute a function which performs the conversion.

The full code is here.

The end-user interface is simply:

(import "DataBundle.cake")
(bundle-file data-start data-end (const char)
             "../data/MyFont.ttf")

We declare data-start and data-end to represent pointers to the symbols associated with our data.

That bundle-file invocation is a macro that adds the data file to a list. It also generates the variables we can use to refer to the data.

Finally, a compile-time function convert-all-bundle-files invokes objcopy (or the Resource Compiler on Windows; see footnote 1) to generate the actual object file for each bundle-file. It only does this if the data files have changed or the object files don't already exist in the cache.
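The "only if changed" check can be sketched as follows. This is not the actual Cakelisp implementation: it assumes a plain modification-time comparison via POSIX `stat`, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Returns 1 if the object file should be (re)generated from the data file.
 * Assumption: "changed" means the data file is newer than the cached object. */
int needs_conversion(const char* data_path, const char* object_path)
{
    struct stat data_info;
    struct stat object_info;
    assert(data_path != NULL && object_path != NULL);

    /* No cached object yet: we have to run the conversion. */
    if (stat(object_path, &object_info) != 0)
        return 1;

    /* Missing data file: run the conversion anyway so the tool reports the error. */
    if (stat(data_path, &data_info) != 0)
        return 1;

    /* Rebuild only when the data was modified after the object was written. */
    return data_info.st_mtime > object_info.st_mtime;
}
```

A timestamp comparison is cheap but coarse. Hashing the file contents would be a more robust alternative, at the cost of reading every data file on each build.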

The function also adds each generated object file to the linker command line, so the generated objects are linked into the executable alongside our code object files.

This function is integrated into the Cakelisp build sequence like so:

(add-compile-time-hook-module pre-build convert-all-bundle-files)

Conclusion

This is pretty great: we extended our build system to support bundling arbitrary data files, all without touching Cakelisp's internals itself.

Not only that, we extended the system in the same language we write our application code, and within the same invocation—we didn't need to create some other phase. We were also able to provide the user with an extremely simple interface to bundling files.


  1. On Windows, we need to generate a .rc file with a list of all the resources that should be compiled into a single object file. Because Cakelisp allows arbitrary compile-time code execution, we can easily do this by writing the filenames out to the generated .rc file, then invoking the Resource Compiler on that file. This platform-specific step can be completely automated!