It has been a while since I felt the need to add features and make some more significant changes to Cakelisp.
Two came in this past month:
defer and CRC builds.
The defer feature is one I had been wanting for a while, but I was unsure how to implement it cleanly.
Here's an example usage of defer:
    (defun main (&return int)
      (var file (* FILE) (fopen "File.txt" "rb"))
      (unless file (return 1))
      (defer (fclose file))
      ;; Do file operations...
      (return 0))
With defer there, I am guaranteed to have the file closed whenever the function returns. This removes the need to copy-paste (fclose file) before every return, which can be very cumbersome. It also makes the program more reliable, because I might forget to paste the (fclose file) before one of them.
This feature can be found in many other new languages, including Zig and Go. It is a simple way to have some automatic actions without needing to add C++-style constructors and destructors, which can be quite complicated.
In Cakelisp, macros and compile-time code modification make it tricky to know when the code has reached the final state that will be compiled.
To implement defer, I needed to know two things:
- The commands that should be deferred, which can be specified in many separate blocks
- Everywhere a scope exit occurs, so that the commands can be executed before exit
Scopes are sequences of code that will always be executed together. An if clause can have a scope that is executed if the condition is true and, optionally, one that is executed when the condition is false. Loop constructs like C's while enter and exit the loop body scope on each iteration. Finally, functions themselves constitute a function-body scope.
How Cakelisp code generation works
In Cakelisp, code generation happens through either a macro or a generator. Macros output tokens. There are only four kinds of tokens: strings, like "Hello, world!"; symbols, like defun; open parenthesis; and close parenthesis. Cakelisp macros can run arbitrary code, including custom validation, creating and setting compile-time variables, etc. I have written about macros many times here.
This extremely restricted world makes it simple to write the "evaluator". When the evaluator encounters an open parenthesis token, it expects the next token to be a symbol. If it isn't, that is a syntax error. Otherwise, the evaluator looks up the symbol by name in its list of known macros and generators. If one is found, it is evaluated immediately. If none is found, the evaluator creates a "reference", which it will hope to resolve eventually.
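The dispatch rule above can be sketched in a few lines of Python. This is a simplified illustration, not Cakelisp's actual evaluator: the token representation, the `evaluate_invocations` name, and the `known` table are all hypothetical, and the sketch inspects every open parenthesis, whereas a real generator would consume its own arguments.

```python
# Hypothetical token kinds; the four kinds match the ones described above.
OPEN, CLOSE, SYMBOL, STRING = "open", "close", "symbol", "string"


def evaluate_invocations(tokens, known):
    """Dispatch each invocation to a known macro/generator, or record a reference.

    tokens: list of (kind, text) pairs.
    known:  dict mapping a name to "macro" or "generator" (assumed layout).
    """
    evaluated = []   # (name, kind) pairs that were found and "evaluated"
    references = []  # names that are not known yet and must be resolved later
    for index, (kind, _text) in enumerate(tokens):
        if kind != OPEN:
            continue
        # An open parenthesis must be followed by a symbol.
        if index + 1 >= len(tokens) or tokens[index + 1][0] != SYMBOL:
            raise SyntaxError(f"token {index}: expected a symbol after '('")
        name = tokens[index + 1][1]
        if name in known:
            evaluated.append((name, known[name]))
        else:
            references.append(name)
    return evaluated, references


tokens = [
    (OPEN, "("), (SYMBOL, "defun"), (SYMBOL, "main"),
    (OPEN, "("), (SYMBOL, "my-macro"), (STRING, "hi"), (CLOSE, ")"),
    (CLOSE, ")"),
]
evaluated, references = evaluate_invocations(tokens, {"defun": "generator"})
```

Here `defun` is found and handled right away, while `my-macro` ends up in `references` because nothing by that name has been defined yet.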
Generators output C or C++ code in the form of "string operations". These operations have various different flags such as "double quote" or "newline after", which are processed by the writer. The writer simply goes operation by operation, following its flags and outputting text into a file as requested.
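To make the writer's job concrete, here is a minimal Python sketch of flag-driven string operations. The `StringOp` class and its two flags are assumptions modeled on the "double quote" and "newline after" flags mentioned above; Cakelisp's real operations have more flags and a different representation.

```python
from dataclasses import dataclass


@dataclass
class StringOp:
    text: str
    double_quote: bool = False   # wrap the text in double quotes
    newline_after: bool = False  # emit a newline after the text


def write_ops(ops):
    """Go operation by operation, following each one's flags."""
    out = []
    for op in ops:
        out.append(f'"{op.text}"' if op.double_quote else op.text)
        if op.newline_after:
            out.append("\n")
    return "".join(out)


ops = [
    StringOp("printf("),
    StringOp("Hello, world!", double_quote=True),
    StringOp(");", newline_after=True),
]
generated = write_ops(ops)
```

The generator decides what to say and how it should be decorated; the writer only follows the flags, which keeps it simple.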
How defer was implemented
Implementing defer consisted of three major parts.
First, the defer statement itself was implemented as a generator. The generator outputs the body of the defer into a splice.
Splices are special string operations that say, "output the array of string operations at this address". Splices accomplish a few things:
- They create "holes" that can be later filled. This is used by invocations where Cakelisp doesn't yet know whether you are trying to call a C function or a macro/generator that has not been defined yet. Cakelisp will generate everything in that state as if it were a C function call, then if the macro/generator is later defined, it will clear the splice's operations and replace it with the macro/generator output.
- They make it possible to change the output later. This enables code modification, which is when a function has already finished being generated, then a second pass is done at compile-time which rewrites that function with modifications. For example, GameLib has a compile-time function which rewrites every Cakelisp function to add performance profiling instrumentation.
- They create a place to stow code for other operations. This is how defer uses them: the defer generator outputs a single splice string operation with a flag telling the writer that it should output the contents of that splice on every scope exit.
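The "hole" behavior of splices can be sketched as follows. This is an illustrative Python model, not Cakelisp's data structures: the `Splice` class, the `write` function, and the `my_thing` invocation are hypothetical.

```python
class Splice:
    """A placeholder whose operations are read only when the writer reaches it."""

    def __init__(self, ops=None):
        self.ops = list(ops or [])


def write(ops):
    out = []
    for op in ops:
        if isinstance(op, Splice):
            out.append(write(op.ops))  # output whatever the splice holds now
        else:
            out.append(op)
    return "".join(out)


# An unknown invocation is first generated as if it were a C function call.
hole = Splice(["my_thing(", "42", ");"])
function_body = ["int x = 0;\n", hole, "\nreturn x;"]
before = write(function_body)

# Later, my_thing turns out to be a macro: clear the splice and refill it.
hole.ops = ["x += 42;"]
after = write(function_body)
```

Because the surrounding operations only hold a reference to the splice, nothing else has to be regenerated when its contents change.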
Second, I needed to mark all the places where scopes enter and exit. I was worried this would be complex, but it turned out simpler than I expected. I had to audit all existing control flow generators (for, etc.) and mark up their Open and Close operations as scope-entering and scope-exiting operations. Statements that leave a scope early, such as return, break, and continue, needed special markings.
Third, the writer needed to have a stack of scopes as well as discovered defer splices. When a scope-enter operation is encountered, it adds a scope to the stack. When a defer is encountered, it adds a pointer to its splice to the current scope on the stack.
Scope exits are when the defer statements need to be output. The writer has three different ways to handle scope exits:
- If the exit is "natural", e.g. the end of an if true block is reached, the writer simply outputs all defer splices in the current scope before the if block's closing bracket.
- If the exit is from a return, the writer must output the defer splices for all scopes currently on the stack, because return exits all scopes.
- If the exit is from a break, the writer outputs all defer splices on all scopes until it hits a "continue breakable scope", which is the start of a loop.
Finally, the writer pops the most recently entered scope off the stack to finish the exit.
One subtle detail is that the writer always outputs separate defer splices in reverse order within the scope. This ensures that the first defer is always the last to be executed, in case subsequent defers are dependent on it.
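The three exit cases and the reverse ordering can be put together in a short Python sketch. It is a toy model under stated assumptions, not the real writer: operations are hypothetical `(kind, payload)` pairs, deferred code is a plain string standing in for a splice, and a break is assumed to include the defers of the loop body scope it stops at.

```python
def write_with_defers(ops):
    """Return output fragments for a flat list of (kind, payload) operations."""
    out = []
    scopes = []  # innermost scope last; each is {"breakable": bool, "defers": [...]}

    def emit_defers(scope):
        # Reverse order: the first defer registered in a scope is emitted last.
        out.extend(reversed(scope["defers"]))

    for kind, payload in ops:
        if kind == "enter":      # payload: True if a break stops at this scope
            scopes.append({"breakable": payload, "defers": []})
            out.append("{")
        elif kind == "defer":    # payload: the deferred code
            scopes[-1]["defers"].append(payload)
        elif kind == "exit":     # natural exit: only the current scope's defers
            emit_defers(scopes.pop())
            out.append("}")
        elif kind == "return":   # return leaves every scope on the stack
            for scope in reversed(scopes):
                emit_defers(scope)
            out.append(payload)
        elif kind == "break":    # unwind up to and including the loop body scope
            for scope in reversed(scopes):
                emit_defers(scope)
                if scope["breakable"]:
                    break
            out.append("break;")
        else:                    # plain text operation
            out.append(payload)
    return out


ops = [
    ("enter", False),            # function body
    ("defer", "fclose(a);"),
    ("defer", "fclose(b);"),
    ("enter", True),             # loop body
    ("defer", "free(line);"),
    ("break", None),
    ("exit", None),
    ("return", "return 0;"),
    ("exit", None),
]
print(" ".join(write_with_defers(ops)))
```

Note that this sketch also emits the deferred code at the natural exit that follows a break or return, which produces unreachable duplicates in the output. That is harmless in C, and whether the real writer suppresses them is not something this sketch tries to answer.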
defer did make the writer more complex, but not significantly. I implemented it in the writer because I didn't want to add an extra evaluator stage; as implemented, defer is very inexpensive in terms of performance during compile-time.
It is limited in that there is no compile-time place where the user could analyze the final code after defer has been applied, then make changes to it. This is because it happens in the writing stage, which is after any compile-time code generation or modification can occur. I will keep it implemented as is until I find I need to do that, in which case it will need to be moved into an evaluator stage.
My work on distributed-automation was disturbed when I had problems with stale builds. I was trying to create an auto-update build for the distributed-automation worker on Windows, but the executable wasn't being updated.
Cakelisp used file modification times to decide whether an "artifact" (an executable, object file, etc.) needed to be rebuilt. If the source (a .c file, header file, etc.) had a file modification time later than the artifact, the artifact was considered out of date and was rebuilt.
The problem was that my Windows clock wasn't set to the correct time; it had drifted into the future.[1] When I ran a build, all the artifacts were marked as being built at that future time. Once I set the clock to the correct time, no artifacts would be built, because they were already marked as being more recently modified than their source.
This might be obvious to someone who has already written a build system. I knew it was an issue when I wrote the timestamp system, but I figured the clocks were reliable enough that it wouldn't matter.
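A small worked example shows why the timestamp-only rule fails once the clock has been wrong. The numbers are invented for illustration, and `stale_by_mtime` is a hypothetical stand-in for the old check.

```python
def stale_by_mtime(source_mtime, artifact_mtime):
    # Old rule: rebuild only when the source was modified after the artifact.
    return source_mtime > artifact_mtime


real_now = 1_700_000_000   # hypothetical "true" time, in seconds since the epoch
drift = 3 * 60 * 60        # clock running three hours fast (illustrative)

artifact_mtime = real_now + drift   # built while the clock was wrong
source_mtime = real_now + 10 * 60   # edited ten minutes later, clock now correct

# The source really is newer, but the comparison says the artifact is up to date.
skipped = not stale_by_mtime(source_mtime, artifact_mtime)
```

Until real time catches up with the artifact's future timestamp, every edit to the source is ignored.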
Now, Cakelisp takes the CRC of every source and header file and records it in a cache. On next build, Cakelisp checks the source files against the recorded CRCs. If they do not match, the artifact is rebuilt. This is slower and more cumbersome than just checking modified times, but is absolutely necessary if the modification times cannot be trusted completely.
I now invalidate artifacts if the CRC is different or the source has a newer timestamp. This lets the user, for example, touch a file without changing its contents to force a rebuild, for whatever reason. I may remove all modification time code in the future, because it's not really providing value beyond this.
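The combined rule can be sketched in Python with the standard library's `zlib.crc32`. This is only an illustration of the idea: `file_crc`, `needs_rebuild`, and the dictionary used as a cache are hypothetical, and Cakelisp's own cache format and CRC routine are not shown here.

```python
import os
import zlib


def file_crc(path):
    """CRC-32 of the file's entire contents."""
    with open(path, "rb") as source_file:
        return zlib.crc32(source_file.read())


def needs_rebuild(source, artifact, crc_cache):
    """crc_cache maps a source path to the CRC recorded at the last build."""
    if not os.path.exists(artifact):
        return True
    # A missing cache entry yields None, which never equals a CRC, so it rebuilds.
    crc_changed = crc_cache.get(source) != file_crc(source)
    source_newer = os.path.getmtime(source) > os.path.getmtime(artifact)
    return crc_changed or source_newer
```

After a successful build, the caller would store `file_crc(source)` back into the cache. With this rule, an artifact stamped in the future is still rebuilt when its source changes, because the CRC comparison does not depend on the clock.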
Neither of these features is flashy, but defer is a big quality-of-life feature, and the CRC builds are an important fix for what was an untrustworthy build system.
I don't have anything specific planned for Cakelisp in the near future. I am still following the strategy where I only implement things when I have a pressing need for them, so I can't say what I'll do next.
[1] The clock problem consistently happens because I dual-boot Windows and Linux on that machine. The two operating systems don't agree on how the hardware clock should keep time, so I must manually tell Windows to reset the clock to the network time after I've booted. I know I could solve this problem by configuring one or the other, but haven't gotten to it yet. It's good to have solved the problem with timestamps either way, because time in general shouldn't be relied on for this kind of system.