A performance-oriented Lisp-like language where I can have my cake, and eat it (too)

#+TITLE:Cakelisp Language

This document aspires to be one page of everything you need to know to learn the Cakelisp language. I liked how Zig is all on one page, so I'm going to emulate that here.

This doesn't cover the internal workings of the Cakelisp transpiler. See Internals.org for that information.

Cakelisp is not intended to be an easy language for beginners. You should already have a strong grasp of either C or C++, because many functions and idioms used are actually straight from C/C++. Cakelisp is intended for people who already know C/C++ but want more features on top of the language.

Additional notes on Cakelisp as a language, the origins of the name "Cakelisp", etc. are provided after the technical sections.

Running cakelisp

This document assumes you have already built cakelisp. Instructions for building are found in the ReadMe.

Cakelisp is a command-line tool. Usual operation will be either automated or part of your editor's compile commands. Running ./bin/cakelisp will print the help instructions. The output is the most up-to-date documentation on the command-line interface available.

The argument format is simple:

  • Any argument starting with -- is considered an option toggle, e.g. --execute. These are intended to assist in debugging or perform helpful operations. Note that options are not meant to change the generated output - your program's specification on how to build it must be entirely contained in .cake files

  • All other arguments are considered paths to .cake files which will be read in and evaluated

Modules

Cakelisp projects are organized into modules. Modules are usually associated with a single file. For example, I could have a module Math.cake which holds my math functions. When I say "module", I'm not referring to C++20's modules, which Cakelisp does not use.

Modules serve as a powerful organization mechanism. It's easy to see how a project all in one gigantic file will be harder to read and navigate than modules with suitable names. I like creating modules for features, not for concepts like "one class per file", which I think is actually counter to good organization.

Modules automatically provide both the interface and implementation files necessary. This means appropriate .h or .hpp header files are generated for the given generated .c or .cpp file.

The local keyword or suffix is typically relative to module scope. It tells Cakelisp that this function/variable/struct definition/etc. is not intended to be used outside the current module. Declaring module-local variables is a particularly clean way to let modules manage their own memory, without having to pass the module's data around to all its functions and all its callers. See Code Complete, 1st Edition, p. 128 "Module data mistaken for global data".

Importing modules

The import function adds the specified file to the environment:

(import "MyFile.cake" "AnotherFile.cake")

;; Include MyForwardDeclares.cake's generated header in the current module's generated header
;; You might need to do this if you have non-module-local types/signatures which rely on other types
(import &with-decls "MyForwardDeclares.cake")

;; Do not include in any generated code. This is essential for comptime-only modules, which won't
;; even generate headers
(import &comptime-only "ComptimeHelpers.cake")

By default, &with-defs is specified, meaning the generated header will be included in the generated source file only.

Files are evaluated the instant they are imported. If a file has already been imported, it will not be evaluated again.

Circular imports are allowed because C/C++ generated headers will make it possible to build the generated code. Circular references are not allowed in macros or generators, because they cannot be built without having built the other.

C/C++ Imports

Thanks to speculative compilation, any C or C++ header may be included in Cakelisp files, and the header's functions and types may be used freely. This is in stark contrast to many other languages which require bindings, FFIs, etc. in order to call C code. It works just as well as a native C file. This eliminates any additional work needed to integrate C/C++ libraries. It also means there is no need to create a Cakelisp standard library, because you already have easy access to the entire C and C++ standard libraries!

This also means that adding Cakelisp to an existing C/C++ project should be virtually effortless. All of your existing code is ready to be used. Additionally, Cakelisp code compiles down to regular C/C++ code, so calling a Cakelisp function is as easy as calling a native C/C++ function. There's no boxing/unboxing, marshalling, type conversions, etc. necessary.

Here are some example imports:

(c-import "<vector>") ;; now just e.g. (var my-vec (<> std::vector int) (array 1 2 3))
(c-import "<stdio.h>") ;; (printf "Hello %s!\n" "Cakelisp")
(c-import "MyHeader.hpp") ;; (call-on updateState myGlobalVar 0.016)

;; Multiple imports are allowed per call:
(c-import "<vector>" "<map>")

The destination of imports may be specified:

(c-import &with-decls "<vector>") ;; Make sure the header file (declarations) includes vector

;; Go back and forth between destinations freely:
(c-import &with-decls "toHeader.hpp" "anotherHeaderHeader.hpp"
          &with-defs "toSource.cpp")

By default, &with-defs is specified.

You shouldn't expect Cakelisp features to work with external C/C++ code. Features like hot-reloading or introspection aren't available to external code because Cakelisp does not parse any C/C++ headers. This doesn't mean you cannot call C/C++ code from a hot-reloaded Cakelisp function, it just means you cannot magically hot-reload the C/C++ code you're calling.

Types

Types are identical to types in C, but specified in an S-expression notation. Here are some example C++ types and their corresponding Cakelisp:

| C/C++                      | Cakelisp                           |
|----------------------------+------------------------------------|
| int                        | int                                |
| int*                       | (* int)                            |
| const int*                 | (* (const int))                    |
| const int* const           | (const (* (const int)))            |
| int x[]                    | ([] int)                           |
| int x[5]                   | ([] 5 int)                         |
| int x[4][4]                | ([] 4 ([] 4 int))                  |
| int x[][4]                 | ([] ([] 4 int))                    |
| std::vector<int>           | (<> std::vector int)               |
| std::map<std::string, int> | (<> std::map (in std string) int)  |
| int&                       | (& int)                            |
| int&&                      | (&& int)                           |
| int&&                      | (rval-ref-to int)                  |

Note that either the C++ scope resolution operator (::) or in can be used, as the std::map row shows. Using in is preferable.

While this is more verbose than C types, they are much more easily parsed and constructed dynamically in this form.

To read C types properly, you must work backwards from the name and apply several heuristics. The parentheses do add more typing, but they're more clear, machine-parseable, and can be read naturally (e.g. read left to right "pointer to constant character" vs. C's "constant character pointer", which seems worse in my mind).

This form also handles arrays as part of the type: (var my-array ([] 5 int)) rather than int myArray[5];, another way it is more consistent, readable, and parsable.

You can use any C/C++ keywords like volatile, unsigned, struct, etc. in the same way that const is demonstrated above.
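The outside-in reading of the Cakelisp forms can be checked against plain C++ with the standard type traits. The following is a minimal sketch, not anything Cakelisp generates; the alias names are made up for illustration, and each comment repeats the Cakelisp spelling from the table above.

```cpp
#include <type_traits>

// Hand-written C++ aliases for a few rows of the table (alias names are hypothetical)
using PtrToConstInt = const int*;             // (* (const int))
using ConstPtrToConstInt = const int* const;  // (const (* (const int)))
using ArrayOf5Int = int[5];                   // ([] 5 int)
using Grid4x4 = int[4][4];                    // ([] 4 ([] 4 int))

// Peel each type from the outside in, the same order the Cakelisp form is read
static_assert(std::is_pointer<PtrToConstInt>::value, "outermost layer is a pointer");
static_assert(std::is_const<std::remove_pointer<PtrToConstInt>::type>::value,
              "the pointee is const");
static_assert(std::is_const<ConstPtrToConstInt>::value, "outermost layer is const");
static_assert(std::extent<ArrayOf5Int>::value == 5, "array of 5 elements");
static_assert(std::is_same<std::remove_extent<Grid4x4>::type, int[4]>::value,
              "array of 4 arrays of 4 int");
```

If any of these readings were wrong, the file would fail to compile, which makes this a cheap way to sanity-check a type you are unsure about.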

Functions

Functions are defined with defun. defun provides some variants via different invocations:

  • defun: Define a function which is intended to be public, i.e. exported in the header file

  • defun-local: Define a module-local function. This will add the static keyword to the definition in the final C/C++. Local functions are only callable in the same module

Here is an example:

  (defun add-ints (a int b int &return int)
    (return (+ a b)))

This function will become the following C code:

  int add_ints(int a, int b)
  {
    return a + b;
  }

The example function's signature will also be added to the header file so that it can be called by other Cakelisp modules as well as external C/C++ code.

Unlike Lisps, function returns must be explicitly specified via (return), unless the function has no &return (implicit void return).

Notice that argument names come first. I chose to swap the order of name and type because it places more emphasis on the name. A well-written program will convey more useful information in the name than in the type, so it makes sense to me to have it come first for the reader. This also applies to defstruct members, type-cast, var declarations, etc.

Variable arguments

The keyword &variable-arguments can be used to create a function with variadic arguments:

  (c-import "<stdio.h>" "<stdarg.h>")

  (defun varargs (num-args int &variable-arguments)
    (var list va_list)
    (va_start list num-args)
    (each-in-range num-args i
      (printf "%d\n" (va_arg list int)))
    (va_end list))

  (defun main (&return int)
    (varargs 3 1 2 3)
    (return 0))
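For comparison, here is roughly what the same pattern looks like when written by hand in C++. This is a sketch for orientation only, not the transpiler's actual output; the function name sum_ints is hypothetical, and it sums the arguments rather than printing them so the result is easy to check. Note that va_start, va_arg, and va_end are the ordinary <stdarg.h> macros, called directly just as in the Cakelisp version.

```cpp
#include <cstdarg>

// Hypothetical hand-written counterpart to the varargs example above.
// Sums numArgs int arguments instead of printing them.
int sum_ints(int numArgs, ...)
{
    va_list list;
    va_start(list, numArgs);  // the last named parameter marks where the variadic part begins
    int total = 0;
    for (int i = 0; i < numArgs; ++i)
        total += va_arg(list, int);  // the caller must really have passed ints
    va_end(list);
    return total;
}
```

As in C, nothing checks that the count matches the number of arguments passed; that contract is the caller's responsibility.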

Variables

The following invocations will declare variables:

  • var: Module- or body-scope local. This is the most-used variable type

  • global-var: Only valid in module-scope. Defines a variable accessible to any module which imports the module with the definition

  • static-var: Only valid within functions. Defines a static variable, i.e. a variable which holds its value even after the function's stack frame is popped

Use set to modify variables:

(var the-answer int 0)
(set the-answer 42)

Arrays have the same syntactic sugar as C, e.g.:

(var my-numbers ([] int) (array 1 2 3))

…is a better way than

(var my-numbers ([] 3 int) (array 1 2 3))

…because the compiler will automatically determine the size.

Type aliases

Aliases can be created for types. Internally, this uses typedef. For example:

;; This will save us a lot of typing!
(def-type-alias FunctionReferenceArray (<> std::vector (* (* void))))
;; Build on it!
(def-type-alias FunctionReferenceMap (<> std::unordered_map std::string FunctionReferenceArray))
;; Declare a variable using our typedef
(var registered-functions FunctionReferenceMap)

By default, type aliases are module-local. Use def-type-alias-global if you want any module which imports the module with the alias to be able to use it.

Function pointers

The syntax for function pointers is shown in HotLoader.cake:

  ;; Currently you must define the signature so the type is parsed correctly
  ;; In this case, bool (*)(void)
  (def-function-signature reload-entry-point-signature (&return bool))
  (var hot-reload-entry-point-func reload-entry-point-signature null)

  ;; An example of a function which takes any type of function pointer, hence the cast
  (register-function-pointer (type-cast (addr hot-reload-entry-point-func) (* (* void)))
                             "reloadableEntryPoint")

Once set, that variable is called just like a function:

  (hot-reload-entry-point-func)

If you wanted to define a function pointer which could point to int main(int numArguments, char* arguments[]), for example:

  (def-function-signature main-signature (num-arguments int
                                          arguments ([] (* char))
                                          &return int))
  (var main-pointer main-signature (addr main))
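In C++ terms, the two signatures above correspond to function pointer typedefs. The sketch below is an assumption about the equivalent hand-written C++, not generated output; the typedef names, always_ready, count_arguments, and call_entry_point are all hypothetical. It points at a stand-in function rather than main.

```cpp
// bool (*)(void), the type described by reload-entry-point-signature
typedef bool (*ReloadEntryPointSignature)(void);

// int (*)(int, char*[]), the type described by main-signature
typedef int (*MainSignature)(int numArguments, char* arguments[]);

// Hypothetical targets for the pointers
static bool always_ready(void)
{
    return true;
}

static int count_arguments(int numArguments, char* arguments[])
{
    (void)arguments;  // unused in this sketch
    return numArguments;
}

// The variable starts out null in the Cakelisp example, so guard before calling
bool call_entry_point(ReloadEntryPointSignature func)
{
    return func ? func() : false;
}
```

Once the pointer is set, calling through it looks the same as calling a function by name, which is why (hot-reload-entry-point-func) needs no special syntax.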

Expressions and Statements

Use the argument --list-built-ins to see an up-to-date list of all possible expressions and statements.

Special symbols

  • null: Does the language-correct thing for null, e.g. nullptr in C++ and NULL in C. This is the only thing in Cakelisp which does something outside generated code but is not an invocation (i.e. doesn't require parentheses)

  • true and false are processed as regular symbols

Control flow, conditionals

  • while:

  • for-in:

  • continue:

  • break:

  • return:

  • if

  • cond

  • when:

  • unless:

  • array: Used for initializer lists, e.g. (var my-array ([] int) (array 1 2 3)). Without arguments, equals the default initializer, e.g. (array) becomes {} in generated code

  • set: Sets a variable to a value, e.g. (set my-var 5) sets (var my-var int) to 5

  • block: Defines a scope, where variables declared within it are limited to that scope

  • scope: Alias of block, in case you want to be explicit. For example, creating a scope to reduce scope of variables vs. creating a block to have more than one statement in an (if) body

  • ?: Ternary operator. For example, the expression (? true 1 2) will return 1, whereas (? false 1 2) returns 2. Handy for when you don't want to use a full if statement, for example

Pointers, members

  • new: Calls C++ new with the given type, e.g. (new (* char)) will allocate memory for a pointer to a character

  • deref: Return the value at the pointer's address

  • addr: Take the address of a variable/member

  • field: Access a struct/class member. For example, with struct (defstruct my-struct num int), and variable (var my-thing my-struct), access num: (field my-thing num)

  • call-on: Call a member function. For example, if I have a variable my-bar of type Bar with member function do-thing, I can call it like so: (call-on do-thing my-bar arg-1 arg-2)

  • call-on-ptr: Like call-on, only it works on pointers, e.g. (var my-pointer-to-bar (* Bar) (addr a-bar)), call its member: (call-on-ptr do-thing my-pointer-to-bar arg-1 arg-2). These can be nested as necessary

  • call: Call the first argument as a function. This is necessary when you can't type the function's name directly, e.g. it is a function pointer. For example, to call a static member function: (call (in my-class do-static-thing) arg-1 arg-2)

  • in: Scope resolution operator (::). Used for both namespaces and static member access. For example, (in SuperSpace SubSpace Thing) would generate SuperSpace::SubSpace::Thing. in may be used within type signatures

  • type-cast: Cast the variable to given type, e.g. (var actually-int (* void) (get-stored-var-pointer "my-int")) could become an int via (type-cast actually-int (* int))

  • type: Parse the first argument as a type. Types are a domain-specific language, so the evaluator needs to know when it should use that special evaluation mode

Logical expressions

  • not: Inverts the boolean result of the argument. (not true) equals false

The following take any number of arguments:

  • or:

  • and:

  • =:

  • !=:

  • eq: Alias of =

  • neq: Alias of !=

  • <=:

  • >=:

  • <:

  • >:

Arithmetic

The following operators take any number of arguments:

  • +:

  • -:

  • *:

  • /:

  • %: Modulo operator. Returns the remainder of the division, e.g. (% 5 2) returns 1

  • mod: Alias for %

The following modify the argument:

  • ++: Add 1 to the argument and set it

  • incr: Alias for ++

  • --: Subtract 1 from the argument and set it

  • decr: Alias for --

Bitwise

  • bit-or:

  • bit-and:

  • bit-xor:

  • bit-ones-complement:

  • bit-<<: Left-shift. E.g. (bit-<< 1 1) shifts 1 to the left once, which in binary becomes 10, or 2 in decimal

  • bit->>: Right-shift. E.g. (bit->> 2 1) shifts 2 to the right once, which in binary becomes 1, or 1 in decimal

Tokens

Tokens are what Cakelisp becomes after the tokenizer processes the text. The Evaluator then reads Tokens in order to decide what to do. Only generators and macros interact with Tokens.

Unlike Lisp, Tokens are stored in flat, resizable arrays. This helps with CPU cache locality while processing Tokens. It does mean, however, that there is no abstract syntax tree. Functions like getArgument() and FindCloseParenTokenIndex() help with interacting with these arrays.

Once some text has been parsed into Tokens, the Token array should be kept around for the lifetime of the environment, and should not be resized. Other things in the Evaluator will refer to Tokens by pointers, so they must not be moved.
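To make the flat-array idea concrete, here is a minimal sketch of walking such an array to find the end of an expression. It is not Cakelisp's Token type or its FindCloseParenTokenIndex() implementation; SketchToken, SketchTokenType, and find_close_paren_index are hypothetical names, and the real Token presumably carries more information (such as source location).

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified token (assumption: not Cakelisp's actual Token struct)
enum class SketchTokenType { OpenParen, CloseParen, Symbol, String };

struct SketchToken
{
    SketchTokenType type;
    std::string contents;
};

// Given the index of an open paren, return the index of its matching close paren,
// or -1 if startIndex is not an open paren or the expression is never closed
int find_close_paren_index(const std::vector<SketchToken>& tokens, int startIndex)
{
    const int numTokens = static_cast<int>(tokens.size());
    if (startIndex < 0 || startIndex >= numTokens ||
        tokens[startIndex].type != SketchTokenType::OpenParen)
        return -1;

    int depth = 0;
    for (int i = startIndex; i < numTokens; ++i)
    {
        if (tokens[i].type == SketchTokenType::OpenParen)
            ++depth;
        else if (tokens[i].type == SketchTokenType::CloseParen && --depth == 0)
            return i;
    }
    return -1;
}
```

Because nesting is recovered by counting parentheses rather than following child pointers, the whole expression stays in one contiguous block of memory. The function returns an index rather than a pointer, which also keeps the result meaningful if the sketch's vector were ever copied.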

Compile-time code execution

There are four major types of compile-time code execution:

  • Macros: Input is tokens, output is tokens

  • Generators: Input is Cakelisp tokens, output is C/C++ code. Generators output to both header (.hpp) and source files (.cpp). All built-ins are generators, though some generators don't output anything, and instead modify the environment in some way

  • Hooks: Cakelisp provides opportunities to run compile-time functions at various stages in the process. For example, the pre-link hook can be used to add link arguments. The post-references-resolved hook is when code modification and arbitrary code generation can occur.

    Each hook has a required function signature. Cakelisp will helpfully output the signature it expected if you forget/make a mistake

  • Compile-time functions: Functions which can be called by other compile-time functions/generators/macros. Used to break up any of the three types above as desired. Declared via defun-comptime, but otherwise are like defun declaration-wise

Destructuring signatures

Macros and generators use a special syntax for their signatures. For example:

  (defmacro get-or-create-comptime-var (bound-var-name (ref symbol) var-type (ref any)
                                        &optional initializer-index (index any))
    (return true))

Notice that the signature does not look like defun signatures. This is because under the hood, all macros and generators have the same signatures corresponding to their types. defmacro and defgenerator overload the second argument (the first argument after the name of the macro/generator) to "destructure" arguments from the tokens received.

Let's go argument-by-argument for the above signature:

  • bound-var-name (ref symbol): A C++-style reference to a Token (const Token&) of type Symbol is required to run this macro. If the user passed in a String, the macro would fail to be invoked. (ref) denotes a binding to a Token, while symbol determines the type of token we expect.

  • var-type (ref any): Like bound-var-name, only this will take the second argument to the macro invocation, and it will accept any type of token. We use any here because types could start with ( or be a single symbol

  • &optional initializer-index (index any): This time, we need the index into the array of tokens. There are a couple reasons to require an index binding. In this case, we cannot use (ref) because the argument is marked &optional (references cannot be made in C++ if they could be null). If the argument is present, the any type means we don't need to perform token type checking. If the argument is omitted, the variable will be set to -1

There are several different "binding types" which dictate the local variable's type in your macro/generator body:

  • index: Indicate the start of the argument via an index into the tokens array. -1 if not set (allowed only if &optional)

  • ref: Set a reference to the argument's first token in the tokens array

  • arg-index: Set a variable with the index of the argument itself. Note that arguments start at 1 because the token at 0 is always the invocation. Can be -1 if the argument was &optional and unspecified. arg-index is mainly useful for CStatementOutput, which takes argument indices instead of token pointers/indices

  • <unspecified>: Set a pointer to the argument's first token. May be null if the argument is &optional and unspecified

If we do not specify (ref) nor (index), the implicit binding type is a pointer to a Token, which is perfect for (token-splice). For example, we could say (bound-var-name symbol) to get a single argument of type symbol which is bound to a Token pointer.

If you want to get an unlimited number of arguments, specify &rest before the final argument. The final argument will be the first of the rest of the arguments. Also specify &optional if you expect zero or more arguments.

The available types to validate against are as follows:

  • any: Do not validate the type. This is useful when your macro/generator accepts a variety of types, or needs to verify the type based on some condition specific to your use-case

  • string: Accept only strings. Note that the contents of the token does not have " like the invocation does in text, e.g. (my-macro "A string") will set the bound var to a token with contents A string

  • symbol: Accept only symbols. Symbols are anything that isn't one of the other types (open/close parens, strings). This includes constants like 4.f, Symbols which aren't valid names, like *, "special symbols" like 'Thing or :thing, etc.

  • array: Expect a "list" of things, e.g. (1 2 3) or (my dsl-symbol (nested thing)). This is called array because it is stored as a flat array, not a linked list or tree. You can use FindCloseParenTokenIndex() or FindTokenExpressionEnd() to find the final token in the array (the closing paren)

Note that you have unlimited control over how you process the provided tokens array - the destructuring signature is provided only as syntactic sugar/convenience. If you have a macro/generator which has a signature which cannot be defined with destructuring (e.g. morphs types, number of arguments, etc. based on first argument), you can still implement it, but you will need to operate using the implicitly-provided tokens and startTokenIndex directly.

Here's an invocation of that macro:

(get-or-create-comptime-var modified-vars bool false)

The binding would result like so:

  • bound-var-name would hold a validated reference to token of type symbol with contents "modified-vars"

  • var-type would hold a reference to token of type symbol with contents "bool"

  • initializer-index would hold an index to a token equal to "false", accessible via (at initializer-index tokens) (but the code should only perform that lookup if (!= -1 initializer-index))

We could output a variable declaration like so:

  (var initializer (<> std::vector Token))
  (when (!= -1 initializer-index)
    (tokenize-push initializer (token-splice-addr (at initializer-index tokens))))
  (tokenize-push output
                 (var (token-splice-addr bound-var-name) (token-splice-addr var-type)
                      (token-splice-array initializer)))
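The index binding and its -1 sentinel can be pictured with a small C++ sketch that locates the Nth argument of an invocation in a flat token array. This is an illustration under stated assumptions, not the code defmacro expands to: SketchToken, SketchTokenType, and argument_start_index are hypothetical names. It follows the convention described above that argument 0 is the invocation itself.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified token (assumption: not Cakelisp's actual Token struct)
enum class SketchTokenType { OpenParen, CloseParen, Symbol, String };

struct SketchToken
{
    SketchTokenType type;
    std::string contents;
};

// startTokenIndex is the open paren of the invocation. Returns the index of the first
// token of the desired argument, or -1 if the invocation ends before that argument.
int argument_start_index(const std::vector<SketchToken>& tokens, int startTokenIndex,
                         int desiredArgument)
{
    int currentArgument = 0;  // argument 0 is the invocation name
    int depth = 0;            // nesting depth relative to the invocation's own parens
    for (int i = startTokenIndex + 1; i < static_cast<int>(tokens.size()); ++i)
    {
        const SketchToken& token = tokens[i];
        if (token.type == SketchTokenType::CloseParen)
        {
            if (depth == 0)
                return -1;  // reached the end of the invocation: argument was omitted
            --depth;
            continue;
        }
        if (depth == 0)
        {
            if (currentArgument == desiredArgument)
                return i;
            ++currentArgument;
        }
        if (token.type == SketchTokenType::OpenParen)
            ++depth;  // skip over the contents of a nested expression
    }
    return -1;
}
```

For the invocation (get-or-create-comptime-var modified-vars bool false), asking for argument 3 yields the index of the false token; with the initializer left off, the same call yields -1, which is the case the (when (!= -1 initializer-index) ...) guard handles.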

Macros

Macros are defined via defmacro. The macro function signature is implicitly added by defmacro. This means that any arguments passed to MacroFunc are in the scope of defmacro. The signature is as follows:

typedef bool (*MacroFunc)(EvaluatorEnvironment& environment, const EvaluatorContext& context,
                          const std::vector<Token>& tokens, int startTokenIndex,
                          std::vector<Token>& output);

The purpose of macros is to take inputs from tokens starting at startTokenIndex (the open parenthesis of this macro's invocation) and create new tokens in output which will replace the macro's invocation.

Macros must return true or false to denote whether the expansion was successful. The more validation a macro has early on, the fewer compilation errors the user will have to deal with if the macro output is erroneous.

tokenize-push

tokenize-push is the main "quoting" function. The first argument is the output variable. output is passed in to defmacro automatically, but you can define other token arrays like so:

  (var my-tokens (<> std::vector Token))

tokenize-push copies all source tokens directly to the output until it reaches one of the token* functions. These functions tell the tokenizer to unpack and insert the tokens in the variables rather than the symbol which is the variable name. Unless otherwise specified, these take any number of arguments:

  • token-splice: Given a token's address, insert a copy of that token. If the token is an open parenthesis, insert the whole expression (go until the closing parenthesis is found)

  • token-splice-addr: Like token-splice, only it automatically takes the address of the given arguments

  • token-splice-array: Given an array of tokens, insert a copy of all tokens in the array

  • token-splice-rest: Given a token's address and token's source array (usually tokens), output all expressions. It stops once a closing parenthesis is reached that wasn't counted, or the end of the source array is reached. Accepts only one token argument

The following is an example of tokenize-push:

  (tokenize-push output
                 (defstruct (token-splice (addr struct-name))
                   (token-splice-array member-tokens)))

Where struct-name is a Token and member-tokens is an array of tokens.

The output would look like this:

(defstruct my-struct a int b int)

Generators

Generators output C/C++ source code to both header and source files. All Cakelisp code eventually becomes generator invocations, because only C/C++ code can actually perform work. If this were a true machine-code compiler, you could imagine generators as functions which take language statements and turn them into machine code instructions. In Cakelisp's case, it turns them into C/C++ expressions.

Generators are defined via defgenerator. The generator function signature is implicitly added by defgenerator. This means that any arguments passed to GeneratorFunc are in the scope of defgenerator. The signature is as follows:

typedef bool (*GeneratorFunc)(EvaluatorEnvironment& environment, const EvaluatorContext& context,
                              const std::vector<Token>& tokens, int startTokenIndex,
                              GeneratorOutput& output);

Given input starting at tokens[startTokenIndex], output relevant C/C++ code to output.

Generators must return true or false to denote whether the output was successful.

See GeneratorHelpers.hpp. All of these functions are available to Generator definitions. Of particular relevance are the add*Output functions. These allow you to directly output C/C++ source code.

Additionally, the Expect functions are quick ways to validate your inputs. They will write an error if the expectation isn't met.

Generators.cpp serves as a good reference to how generators are written. However, they are rather verbose because they don't use any macros and have extensive validation. Generators written in Cakelisp can be much more compact thanks to macros.

Why use generators instead of macros?

defgenerator opens the door to any C/C++ feature, even non-built-in features like custom code generation annotations or documentation comment strings. If you encounter a feature not in Cakelisp but in C/C++, you can write a generator to gain access to it.

A big advantage of this is that you now get to decide how you want the syntax to work - if you don't like switch implicitly falling through, you can make your generator automatically insert break.

Macros can only output code which eventually calls generators. Generators output arbitrary text directly to C/C++ source and header files. Generators are primarily for gaining access to features missing in Cakelisp's built-ins.

In practice, you should try to write macros when possible in order to leverage Cakelisp maximally. If you wrote a generator which lets you input arbitrary C/C++, you would lose all the power gained by features like code modification, because generator output cannot be trivially parsed like macro output can.

Build system

Cakelisp's build system is powerful enough at this point to serve as a general-purpose C/C++ build system, even if you aren't using Cakelisp for any runtime code.

Basic projects don't need any build customization at all. Cakelisp uses its module system to automatically determine how to link .cake files together and build them.

Example: Bootstrap

For example, Cakelisp itself consists of C++ code. Bootstrap.cake builds Cakelisp, and serves as a good demonstration of the build system. I'll explain it here.

(skip-build)

This indicates the current module should not be built, nor be linked into the final executable. Bootstrap.cake doesn't contain any runtime code, so we omit it. Modules which contain only compile-time functions like macros should also skip-build.

(set-cakelisp-option executable-output "bin/cakelisp")

This changes the location where the final executable is output. Note that if you don't have a (main) function defined, you can change this output to e.g. lib/libCakelisp.so to output a dynamic library (on Linux).

(add-c-search-directory-module "src")

It is good practice to refer to files without any directories in the path. This helps future developers if they need to relocate files. In this case, we add src to the module search paths, which means only this module and its dependencies will have that search path.

If global is specified instead, all modules and build dependencies would include the search path. Generally, you should try to use module only, because it lessens the chances of unnecessary rebuilds due to command signature changes, and is one less directory for the compiler to search.

(add-cpp-build-dependency
 "Tokenizer.cpp"
 "Evaluator.cpp"
 "Utilities.cpp"
 "FileUtilities.cpp"
 "Converters.cpp"
 "Writer.cpp"
 "Generators.cpp"
 "GeneratorHelpers.cpp"
 "RunProcess.cpp"
 "OutputPreambles.cpp"
 "DynamicLoader.cpp"
 "ModuleManager.cpp"
 "Logging.cpp"
 "Build.cpp"
 "Main.cpp")

When the build system reaches this module, it should also build the files in this list. This mechanism allows you to use Cakelisp as a build system for pure C/C++ projects, and makes it easier to integrate Cakelisp in projects which are partially C/C++.

These dependencies will be built with the same compilation command as the module. They will be built in the cache along with the Cakelisp-generated files, and will have all the same cache-validity checks as Cakelisp-generated files.

(add-build-options "-DUNIX")

Add an argument to the compilation command. In this case, we need to specify an operating system so that the correct system calls are used.

You can specify multiple options. For example, we could set a debug build with warnings as errors like so:

(add-build-options "-g" "-Werror")

These options are appended to the default or module-overridden build command.

;; Cakelisp dynamically loads compile-time code
(add-library-dependency "dl")
;; Compile-time code can call much of Cakelisp. This flag exposes Cakelisp to dynamic libraries
(add-linker-options "--export-dynamic")

add-library-dependency adds dynamic libraries to the list of dependencies.

Note that add-library-dependency will attempt to modify the given library names in a platform-independent way. For example, if you pass in "dl", here is how it would change:

| Linker        | Modified |
|---------------+----------|
| link.exe      | dl.dll   |
| cl.exe        | dl.dll   |
| Anything else | -ldl     |

Note that on Linux, dynamic libraries are named e.g. libdl.so, then requested via e.g. -ldl. Windows' MSVC typically names dlls simply dl.dll. Cakelisp takes dl and tries to do the right thing for each platform. If it's not working, use add-compiler-link-options to provide the exact format you need, and it will not be converted.

add-linker-options passes the given options to the linker itself, not the compiler which invokes the linker. For example, g++ -o will not get --export-dynamic, rather, ld will get it due to -Wl automatically being prepended by add-linker-options. If you want to pass arbitrary options to the compiler invoking the linker, use add-compiler-link-options.
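The two conversions described above are simple string transformations, sketched below. This only mirrors the behavior as documented here and is not taken from Cakelisp's source; library_argument and forward_to_linker are hypothetical names, and matching the linker by its exact executable name is an assumption.

```cpp
#include <string>

// Mirrors the table above: link.exe and cl.exe get "<name>.dll", anything else "-l<name>"
// (assumption: the linker is identified by its exact executable name)
std::string library_argument(const std::string& linker, const std::string& library)
{
    if (linker == "link.exe" || linker == "cl.exe")
        return library + ".dll";
    return "-l" + library;
}

// Prefix an option with -Wl, so the compiler driver passes it through to the linker
std::string forward_to_linker(const std::string& option)
{
    return "-Wl," + option;
}
```

Under these assumptions, "dl" becomes -ldl when linking with g++, and --export-dynamic becomes -Wl,--export-dynamic, which is the same string the link hook in the next section pushes by hand.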

The following are also related to linker configuration:

  • add-library-search-directory

  • add-library-runtime-search-directory: Adds given strings to rpath, which tells Unix systems where to look for dynamic libraries. Note that this does not work on Windows, which requires special treatment for DLL loading. Figuring out how to handle this in Cakelisp is TBD

;; Use separate build configuration in case other things build files from src/
(add-build-config-label "Bootstrap")

This configuration label ensures Cakelisp itself doesn't get affected by your runtime programs. It does this by using a separate folder in the cache.

Procedural command modification

There may be cases when you need to do complex logic or modifications of the link command. We use a hook to give us a chance to do so.

(defun-comptime cakelisp-link-hook (manager (& ModuleManager)
                                    linkCommand (& ProcessCommand)
                                    linkTimeInputs (* ProcessCommandInput) numLinkTimeInputs int
                                    &return bool)
  (Log "Cakelisp: Adding link arguments\n")
  ;; Dynamic loading
  (call-on push_back (field linkCommand arguments)
           (array ProcessCommandArgumentType_String
                  "-ldl"))
  ;; Expose Cakelisp symbols for compile-time function symbol resolution
  (call-on push_back (field linkCommand arguments)
           (array ProcessCommandArgumentType_String
                  "-Wl,--export-dynamic"))
  (return true))

(add-compile-time-hook pre-link cakelisp-link-hook)

(add-compile-time-hook pre-link cakelisp-link-hook) adds the hook, then cakelisp-link-hook is invoked before link time.

Hook order of execution can be changed via the optional argument :priority-increase <int> or :priority-decrease <int>. For example:

(add-compile-time-hook-module pre-build second-hook)
(add-compile-time-hook-module pre-build third-hook :priority-decrease 3)
(add-compile-time-hook-module pre-build first-hook :priority-increase 1)

This runs first-hook, second-hook, then third-hook: second-hook starts at the default priority (0), third-hook is decreased in priority so it runs later, and first-hook is increased in priority so it runs earlier.

Build commands

The environment comes with default commands (defined in src/ModuleManager.cpp). Build commands can be overridden to whatever process you choose, with the structure you choose. For example, the linker can be changed like so:

(set-cakelisp-option build-time-linker "g++")
(set-cakelisp-option build-time-link-arguments
                     "-o" 'executable-output 'object-input
                     "-ldl" "-lpthread" "-Wl,-rpath,.,--export-dynamic")

'executable-output and 'object-input determine slots where the build system will insert arguments specified dynamically, or from other Cakelisp invocations.

The compiler command has more of these slots:

  • 'source-input: Created by Cakelisp, e.g. cakelisp_cache/default/Generated.cake.cpp

  • 'object-output: Created by Cakelisp, e.g. cakelisp_cache/default/Generated.cake.cpp.o

  • 'include-search-dirs: Constructed from add-c-search-directory - a combination of global and module search directories. Module search directories are searched first

  • 'additional-options: The list of options from add-build-options
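To illustrate where these slots sit in a command, here is a sketch of overriding the compile command. The specific flags are assumptions chosen for illustration, and the defaults in src/ModuleManager.cpp may well differ.

```lisp
(set-cakelisp-option build-time-compiler "g++")
;; Each quoted symbol is replaced by the build system when the command is run
(set-cakelisp-option build-time-compile-arguments
                     "-c" 'source-input "-o" 'object-output
                     'include-search-dirs 'additional-options)
```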

The following commands can be overridden:

  • compile-time-compiler

  • compile-time-compile-arguments

  • compile-time-linker

  • compile-time-link-arguments

  • build-time-compiler

  • build-time-compile-arguments

  • build-time-linker

  • build-time-link-arguments

You want compile-time-compiler to match the platform of the system which is running Cakelisp. You can set build-time-compiler to match the target platform, e.g. a cross-compiler.

Using set-cakelisp-option overrides the global commands. set-module-option allows commands to be changed on a per-module basis.

The following commands can be overridden per-module:

  • build-time-compiler

  • build-time-compile-arguments
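A per-module override could look like the following sketch, which assumes set-module-option takes the same arguments as set-cakelisp-option and affects only the module in which it appears. The choice of clang++ and -O3 is purely illustrative.

```lisp
;; Only this module is built with clang++ at a higher optimization level;
;; all other modules keep the global build-time commands
(set-module-option build-time-compiler "clang++")
(set-module-option build-time-compile-arguments
                   "-O3" "-c" 'source-input "-o" 'object-output
                   'include-search-dirs 'additional-options)
```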

Build configurations

Build configurations allow you to easily manage multiple different versions of a program or collection of programs while still utilizing the Cakelisp cache. They could be different based on target platform, compilation settings, etc.

Build configurations are constructed "lazily", meaning all you need to do to create a new configuration is make the necessary changes to the environment and add a unique label.

For example, a build configuration Debug-HotReloadable could be constructed via:

  • Overriding the build command via (set-cakelisp-option build-time-compile-arguments ...) to add debug flags, then invoking (add-build-config-label "Debug"). That is all that's needed to create the Debug configuration

  • Importing HotReloadingCodeModifier.cake, which adds (add-build-config-label "HotReloadable"). This is important because hot-reloadable builds are different from regular builds - they expect their variables to be initialized by the loader, and a dynamically linked library is created instead of a standalone executable

This gives the user the ability to make their configurations as complex as they want, without having to face any additional/introductory complexity. For example, we could add processor architecture, operating system, and C standard library selections to our configurations, if necessary. A/B comparisons between runtime performance could also be done easily, just by adding a label to the alternate. If you are just writing a quick one-off script, you need not worry about configurations at all.

Because all options must be provided in Cakelisp files, it encourages composable configurations. For example, we could take the Debug configuration from above and put it in Config_Debug.cake, then import it and build the program via cakelisp Config_Debug.cake MyProgram.cake.
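For instance, the hypothetical Config_Debug.cake might contain little more than the following. The debug flags shown are assumptions, not the flags Cakelisp itself uses.

```lisp
;; Config_Debug.cake (hypothetical contents)
;; Build with debug symbols and no optimization
(set-cakelisp-option build-time-compile-arguments
                     "-g" "-O0" "-c" 'source-input "-o" 'object-output
                     'include-search-dirs 'additional-options)

;; Store artifacts built this way separately in the cache
(add-build-config-label "Debug")
```

Because the label is added by the configuration file rather than by MyProgram.cake, the same program can be built with or without it simply by changing which files are passed on the command line.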

Cache validity

The C/C++ compilation time dominates the total time from .cake to executable. In order to minimize this, Cakelisp maintains a cache of previously built "artifacts" and reuses them when possible.

It is critical that the cache does not become stale. To the developer, a stale cache results in confusion, because the developer might have made a change but does not see the change reflected in the output. Cakelisp's build system errs on the side of caution at the cost of build time performance to ensure this doesn't occur.

If things are being rebuilt unnecessarily, add the option --verbose-build-reasons. This will tell you why Cakelisp thinks it may be holding a stale artifact.

If you are building several different executables/libraries, you may need to separate them into different build configurations via add-build-config-label, because these targets may be building the same artifact differently. Each build configuration is stored separately.

The following things are checked before a cached artifact is used (not all are relevant to all types of artifacts):

Command signature

When a compile command changes from e.g. g++ to clang++, all affected files will be recompiled. The entire command is checked, so adding additional warnings, search directories, etc. will invalidate cache files, because these could change what gets built.

Modification time

If e.g. a .cpp source file is more recently modified than its cached .o, the .o file will be invalid, and the source file will be rebuilt.

If any .o files are newer, the executable/library will be re-linked.

Note that the build system only inspects generated .cake.cpp files, not .cake files themselves. This gives you the freedom to add comments, reformat whitespace, etc. without causing rebuilds. If you do want to force a rebuild of a single file for whatever reason, touch or delete the corresponding generated .cake.cpp file in the cache.

Includes modification times

It is essential to recursively scan the #include files of all source files to determine if any of the headers have been modified, because changing them could require a rebuild. For example, if you change the size or order of a struct declared in a header, all source files which include that header now need updated sizeof calls.

This is somewhat complex and expensive, but must be done every time a build is run, just in case headers change.

If this step were skipped, it would open the door to very frustrating bugs involving stale builds and mismatched headers, which usually result in strange segmentation faults and other crashes.

It does have some nice properties: if you update a 3rd-party library, Cakelisp will automatically determine which files need to be rebuilt based on which headers in that library changed.

Building "clean"

If you want to test a clean build, i.e. one which does not use any existing artifacts, you can do either of the following:

  • Delete the cakelisp_cache directory in the working directory from which you have been executing cakelisp

  • Pass the --ignore-cache argument, which will cause all artifacts to be marked stale and invalid

Runtime

The runtime/ folder offers a variety of .cake modules which may be useful to you.

C and C++ helpers

The intent is to keep the C++ part of the language (i.e., code in src/) small. runtime/ has "missing" C constructs like for implemented in Cakelisp. CHelpers.cake may be used on C++ projects, because all C is compatible. As a result, only things that are C++-specific are implemented in CppHelpers.cake.

The idea is that language features like for or switch can be replaced with while and if to keep the core small. Any of those additional language features can be implemented in new generators using the more minimal core.
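As a rough sketch of that idea, a counting loop that C would write with for can be expressed using only var, while, and set. This is a hand-written illustration, not the actual expansion produced by anything in runtime/, and unlike a C for loop the counter here is scoped to the whole function.

```lisp
(c-import "<stdio.h>")

(defun main (&return int)
  ;; Roughly equivalent to C's: for (int i = 0; i < 3; ++i)
  (var i int 0)
  (while (< i 3)
    (fprintf stderr "%d\n" i)
    (set i (+ i 1)))
  (return 0))
```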

Hot-reloading

Hot-reloading is implemented entirely in user-space, i.e. outside the core of Cakelisp. This shows the power of compile-time code execution - major language features can be added without touching the language itself.

Compile-time functions

Various other files in runtime/ assist in writing macros and generators faster. There are also utilities for accessing Cakelisp's process execution system during compile-time, which is useful for inserting custom pre-build steps (etc.).

External tools

Additional notes on Cakelisp

Lisp users may be disappointed by Cakelisp's relative impurity. I took ideas I liked and left out a lot of core Lisp concepts. For example, functions only return a value with an explicit return statement. Any sort of immutability or non-sequential assumptions are also out (though you can use C/C++ const). You should think about Cakelisp more as "C with parentheses" than "Lisp that outputs to C". Of course, it's more than just added parentheses (macros are the primary addition), but you get the point.

This means that except in the case of macros, reading The C Programming Language is going to help you learn Cakelisp better than reading Practical Common Lisp.

A note on the name

I thought of the name because the idiom "you can't have your cake and eat it too" was in my head while researching Lisp languages. It is about having both the power of Lisp languages (having cake) and high performance (eating it too).

Admittedly, it is a bit of a misnomer, because Cakelisp is in no way compatible with Common Lisp. The name "Cakelisp" brings some baggage that makes what Cakelisp actually is less clear. A more accurate name would be CakeC, because the actual mechanics of the language are much closer to C/C++. I expect programmers would call it Lisp if they saw it, simply because the code uses S-expression syntax and borrows some keywords (defun, defmacro, when, unless, etc.).

Regardless, the name has excellent characteristics for finding it via search engines, so I'm keeping it.

The combination is pronounced the same as the individual words one after another, i.e. "cake lisp", not "cakel isp", "cak e-lisp" or anything like that. Someone who uses Cakelisp is a "Cakelisp user", not a "caker", nor "baker", nor "Cakelisper".

It's my personal preference that puns off of "cake" when naming programming things don't become a thing. Please don't name your thing something cleverly cake- or baking-related. Name it what it is or what it does. Of course, I'm a hypocrite for not naming Cakelisp "Lisp-to-C-Transpiler (with a bunch of other features)".

C or C++?

Cakelisp itself is written in C++. Macros and generators must generate C++ code to interact with the evaluator.

However, you have more options for your project's generated code:

  • Only C: Generate pure C. Error if any generators which require C++ features are invoked

  • Only C++: Assume all code is compiled with a C++ compiler, even if a Cakelisp module does not use any C++ features

  • Mixed C/C++, warn on promotion: Try to generate pure C, but if a C++ feature is used, automatically change the file extension to indicate it requires a C++ compiler (.c to .cpp) and print a warning so the build system can be updated

Note: The ability to output only C is not yet implemented.

I may also add declarations which allow you to constrain generation to a single module, if e.g. you want your project to be only C except for when you must interact with external C++ code.

Generators keep track of when they require C++ support and will add that requirement to the generator output as necessary.

Hot-reloading won't work with features like templates or class member functions. This is partially a constraint imposed by dynamic loading, which has to be able to find the symbol. C++ name mangling makes that much more complicated, and compiler-dependent.

I'm personally fine with this limitation because I would like to move more towards an Only C environment anyway. This might be evident when reading Cakelisp's source code: I don't use class, define new templates, or define struct/class member functions, but I do rely on some C++ standard library containers and & references.

Why S-expressions?

The primary benefit of using a Lisp S-expression-style dialect is its ease of extensibility. The tokenizer is extremely simple, and parsing S-expressions is also simple. This consistent syntax makes it easy to write macros, which generate more S-expressions.

Additionally, S-expressions are good for representing data, which means writing domain-specific languages is easier, because you can have the built-in tokenizer do most of the work.

It's also a reaction to the high difficulty of parsing C and especially C++, which requires something like libclang to sanely parse.