Writing XML with S-expressions

By Macoy Madson. Published on .

Lisp, invented in the late 1950s, introduced the S-expression syntax style. It looks like this (in this case it is Cakelisp, the programming language I made):

(defun main (num-arguments int
             arguments ([] (* char))
             &return int)
  (fprintf stderr "Hello, Cakelisp!\n")
  (return 0))

In 1998 the XML specification was published. XML looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg width="100%" height="100%" xmlns="http://www.w3.org/2000/svg" version="1.1">
  <rect x="0" y="0" width="100%" height="100%" fill="#333333"/>
  <rect x="0in" y="0.000000in" width="5in" height="2in" fill="#aaaaaa"/>
  <rect x="2in" y="3.500000in" width="5in" height="2in" fill="#aaaaaa"/>
  <rect x="4in" y="7.000000in" width="5in" height="2in" fill="#aaaaaa"/>
</svg>

If you squint, you can see they both describe tree data structures. In the S-expression example, defun starts a tree with branches like (num-arguments... and leaves like main. In the XML example, the <svg> element "holds" the <rect> elements.

The big difference for me is that S-expressions are much more tolerable to hand-write. They aren't hard to machine-parse either. One drawback to S-expressions when compared to XML is that the closing ) doesn't have as much validation as a named tag like </svg>, so it can be hard to determine the "end" of a tree node.
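To give a sense of how little machinery machine-parsing takes, here is a minimal sketch of an S-expression reader. It is written in Python purely for illustration and is not how Cakelisp tokenizes its input; the function names tokenize and parse are made up for this example. It also shows the drawback mentioned above: when a ) is missing, the reader can only report how many parentheses are unclosed, not which node was meant to end.

```python
def tokenize(text):
    """Split S-expression text into parentheses, string literals, and atoms."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == ";":
            # Line comment: skip to the end of the line.
            while i < len(text) and text[i] != "\n":
                i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            # String literal: scan to the closing quote, skipping escaped characters.
            end = i + 1
            while end < len(text) and text[end] != '"':
                end += 2 if text[end] == "\\" else 1
            if end >= len(text):
                raise SyntaxError("unterminated string literal")
            tokens.append(text[i:end + 1])
            i = end + 1
        else:
            # Atom: everything up to whitespace, a parenthesis, or a quote.
            end = i
            while end < len(text) and not text[end].isspace() and text[end] not in '()"':
                end += 1
            tokens.append(text[i:end])
            i = end
    return tokens


def parse(text):
    """Return the top-level expressions as nested Python lists of token strings."""
    stack = [[]]
    for token in tokenize(text):
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise SyntaxError("unexpected )")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        # All we know is the count; nothing names the node that was left open.
        raise SyntaxError(f"{len(stack) - 1} unclosed (")
    return stack[0]
```

Calling parse('(rect :x 0 (g "a b"))') yields one nested list per parenthesized form, with atoms and string literals kept as plain strings.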

XML is a rather popular data format, while S-expressions have been used almost exclusively in Lisp dialects. I don't understand why this happened, considering S-expressions have been around for much longer and are easier to write by hand.

Whatever the history, I inevitably need to create XML documents for one reason or another, thanks to its popularity. I have a website which is written in HTML (which could be considered an XML dialect, or even vice versa), and I have SVGs that I want to generate for graphing.

XML in Cakelisp

I wrote a macro which generates XML documents from Cakelisp code. The XML example shown above can be generated with the following Cakelisp code:

(write-xml-in-format
   svg stderr
   (?xml
    :version "1.0" :encoding "UTF-8" :standalone "no"
    (svg
     :width "100%" :height "100%"
     ;; This is required to get the SVG to work in Firefox
     :xmlns "http://www.w3.org/2000/svg" :version "1.1"
     ;; Background
     (rect :x 0 :y 0 :width "100%" :height "100%" :fill "#333333")
     (each-in-range 3 i
       (rect :x (format-quote stderr "%din" (* i 2))
             :y (format-quote stderr "%fin" (* i 3.5f))
             :width "5in" :height "2in" :fill "#aaaaaa")))))

The write-xml-in-format invocation is a macro which takes an XML format (in this case, SVG) and an output (* FILE) handle. It parses tokens in its body, looking for XML tags and generating output functions for them.

If you compare the two, you'll notice the Cakelisp version has an each-in-range invocation which actually generates a <rect XML element on each iteration. This means I can freely mix the XML-specific tags with Cakelisp.
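For readers who don't know Cakelisp, the idea of tags that write straight to an output handle, with ordinary loops running between them, can be approximated in other languages. The following is a rough Python sketch, not the Cakelisp macro: it does its work at runtime rather than at compile time, it does not indent its output, and the helper names element, empty_element, and format_attributes are invented for this example.

```python
import io
from contextlib import contextmanager
from xml.sax.saxutils import quoteattr


def format_attributes(attributes):
    """Render keyword arguments as ' name="value"' pairs, escaping the values."""
    return "".join(f" {name}={quoteattr(str(value))}" for name, value in attributes.items())


@contextmanager
def element(out, name, **attributes):
    """Write the opening tag, let the with-body write children, then close the tag."""
    out.write(f"<{name}{format_attributes(attributes)}>\n")
    yield
    out.write(f"</{name}>\n")


def empty_element(out, name, **attributes):
    """Write a self-closing tag such as <rect .../>."""
    out.write(f"<{name}{format_attributes(attributes)}/>\n")


out = io.StringIO()  # stands in for the (* FILE) output handle
with element(out, "svg", width="100%", height="100%",
             xmlns="http://www.w3.org/2000/svg", version="1.1"):
    # Background
    empty_element(out, "rect", x=0, y=0, width="100%", height="100%", fill="#333333")
    for i in range(3):  # an ordinary host-language loop between the tags
        empty_element(out, "rect", x=f"{i * 2}in", y=f"{i * 3.5:f}in",
                      width="5in", height="2in", fill="#aaaaaa")
svg_text = out.getvalue()
```

The with block plays the role of the nested parentheses: the closing tag is written when the block ends, so it cannot be forgotten or mismatched.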

This is effectively a template engine for arbitrary XML formats. The format is what distinguishes, for example, SVG from HTML, both of which are representable in XML. In Cakelisp, a macro defines a format like so:

(define-xml-format svg
    "svg" "title" "!DOCTYPE" "?xml"
    "defs" "mask" "g"
    "rect" "path" "circle")

I define a list of tags that should be parsed as XML output rather than Cakelisp invocations. That's really all I need for my purposes. I could be more advanced and instead load an XML schema, but that would be significantly more complex to implement.
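The role of the tag list can be sketched as a simple dispatch: while walking the tree, a form whose head is in the list becomes XML output, and anything else is handed to the host language. The Python below is only an approximation under that assumption. The real macro works on Cakelisp tokens at compile time, while this sketch walks nested lists at runtime; emit, repeat, and the host_forms table are hypothetical names, and the special "!DOCTYPE" and "?xml" entries are left out for brevity.

```python
from xml.sax.saxutils import escape, quoteattr

# Mirrors the ordinary element names from the define-xml-format example.
SVG_TAGS = {"svg", "title", "defs", "mask", "g", "rect", "path", "circle"}


def emit(node, xml_tags, host_forms):
    """Render a nested-list tree; heads in xml_tags become elements, others go to host_forms."""
    if not isinstance(node, list):
        return escape(str(node))  # plain text content
    head, *rest = node
    if head not in xml_tags:
        # Not a known tag: treat the form as a host-language invocation.
        return host_forms[head](rest, xml_tags, host_forms)
    attributes, children = "", []
    i = 0
    while i < len(rest):
        item = rest[i]
        if isinstance(item, str) and item.startswith(":"):
            # A :keyword followed by its value becomes an attribute.
            attributes += f" {item[1:]}={quoteattr(str(rest[i + 1]))}"
            i += 2
        else:
            children.append(emit(item, xml_tags, host_forms))
            i += 1
    if not children:
        return f"<{head}{attributes}/>"
    return f"<{head}{attributes}>{''.join(children)}</{head}>"


def repeat(args, xml_tags, host_forms):
    """Hypothetical host form: (repeat count body...) renders the body count times."""
    count, *body = args
    return "".join(emit(child, xml_tags, host_forms) for _ in range(count) for child in body)


document = emit(
    ["svg", ":width", "100%", ["repeat", 2, ["circle", ":r", 1]]],
    SVG_TAGS,
    {"repeat": repeat},
)
```

In this sketch a head that is neither a tag nor a registered host form raises a KeyError, which is roughly where a schema-aware version could give a better diagnostic.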

Ergonomics

This syntax is significantly more intuitive and convenient to write than other XML generation APIs. For example, here's a popular C++ library, RapidXML, being used to create an XML document [1]:

// Write xml file =================================
xml_document<> doc;
xml_node<>* decl = doc.allocate_node(node_declaration);
decl->append_attribute(doc.allocate_attribute("version", "1.0"));
decl->append_attribute(doc.allocate_attribute("encoding", "utf-8"));
doc.append_node(decl);

xml_node<>* root = doc.allocate_node(node_element, "rootnode");
root->append_attribute(doc.allocate_attribute("version", "1.0"));
root->append_attribute(doc.allocate_attribute("type", "example"));
doc.append_node(root);

xml_node<>* child = doc.allocate_node(node_element, "childnode");
root->append_node(child);

// Convert doc to string if needed
std::string xml_as_string;
rapidxml::print(std::back_inserter(xml_as_string), doc);

As you can see, it takes much more boilerplate to generate a document that is even simpler than the example document from the beginning of this article.

The simple fact of the matter is that C++ does not allow you to customize the language to generate this data easily. You can make something horrible in an attempt to make it better, but it isn't going to be as convenient as the full-power macro version Cakelisp allows you to make.

If you must generate XML using C or C++, I would recommend using some form of code generation. Don't hand-write code a computer could write faster!
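As one possible shape for such code generation, here is a small Python sketch that turns a compact tree description into the repetitive RapidXML statements shown earlier. It only uses the RapidXML calls that appear in that snippet (allocate_node, allocate_attribute, append_attribute, append_node) and assumes an xml_document<> named doc already exists in the generated code's scope. The function name generate_rapidxml_calls and the (tag, attributes, children) tuple layout are made up for this example, and json.dumps is used as a shortcut for quoting C++ string literals, which is only adequate for plain ASCII names and values.

```python
import itertools
import json


def generate_rapidxml_calls(root):
    """Return C++ statements that build `root` inside an existing xml_document<> named doc."""
    lines = []
    ids = itertools.count()

    def visit(node, parent_access):
        tag, attributes, children = node
        variable = f"node{next(ids)}"
        lines.append(
            f"xml_node<>* {variable} = doc.allocate_node(node_element, {json.dumps(tag)});"
        )
        for name, value in attributes.items():
            lines.append(
                f"{variable}->append_attribute("
                f"doc.allocate_attribute({json.dumps(name)}, {json.dumps(value)}));"
            )
        lines.append(f"{parent_access}append_node({variable});")
        for child in children:
            visit(child, f"{variable}->")

    visit(root, "doc.")  # the document itself is an object, not a pointer
    return lines


tree = ("rootnode", {"version": "1.0", "type": "example"}, [("childnode", {}, [])])
cpp_source = "\n".join(generate_rapidxml_calls(tree))
```

The resulting text could be written to a file and compiled alongside hand-written code, so the tree description stays the single place to edit.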

Conclusion

The full text of the macro can be found here. It's 146 lines of Cakelisp. While it did require a couple days of up-front investment, I now have the confidence to write XML without dragging my feet, which is pretty cool.


  1. The snippet is from this blog.