Wednesday, November 6, 2013

Saving memory by removing unneeded whitespace

Photo by Brad Montgomery
Orbeon Forms stores form definitions and form data in XML format. When using XML, it is customary to use new lines and indentation to make documents easier for humans to read and write. Consider, for example, the following bit of empty form data:
<book>
    <details>
        <title/>
        <author/>
        <language/>
        <link/>
        <rating/>
        <publication-year/>
        <review/>
        <image
            filename=""
            mediatype=""
            size=""/>
    </details>
    <notes>
        <note/>
    </notes>
</book>
New lines and indentation fall into the category of so-called whitespace. Whitespace consists of anything that looks “blank”, including actual space characters and line breaks.

What we have noticed is that, especially for large forms, whitespace can take a significant amount of memory in compiled form definitions. So for Orbeon Forms 4.4, we looked into how we could improve that situation.
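
To get a rough feel for how much of a document is pure indentation, here is a minimal Python sketch. It is not Orbeon Forms code: the function name `whitespace_only_chars` and the sample string are made up for illustration, and it counts characters in the serialized XML, which is only a proxy for the memory used by a compiled form definition.

```python
import xml.etree.ElementTree as ET

def whitespace_only_chars(xml_text: str) -> int:
    """Count characters in text nodes that contain nothing but whitespace."""
    root = ET.fromstring(xml_text)
    total = 0
    for element in root.iter():
        # element.text is the text before the first child;
        # element.tail is the text after the element's closing tag.
        for chunk in (element.text, element.tail):
            if chunk and chunk.strip() == "":
                total += len(chunk)
    return total

# Hypothetical, shortened version of the form data shown above.
sample = "<book>\n    <details>\n        <title/>\n    </details>\n</book>"
share = whitespace_only_chars(sample) / len(sample)
print(f"{share:.0%} of the characters are indentation and new lines")
```

Whitespace between attributes inside a tag is not a text node, so this sketch does not count it.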

The trick is to remove whitespace only where it is not needed, because in some cases you do want to keep at least some of it. For example, you can remove most indentation and new lines, but consider this HTML fragment:
<p>This is a      <b>great</b>       moment.</p>
Some space after “a” and before “moment” needs to be there, although in most cases it can be collapsed, or normalized, to a single space (there are exceptions). It would certainly be wrong to remove all the spaces.
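
Collapsing can be illustrated with a short Python sketch. The helper name `collapse_whitespace` is hypothetical, and the character class is the XML 1.0 definition of whitespace (space, tab, carriage return, line feed), which is an assumption about what should count as “blank” here.

```python
import re

# XML 1.0 whitespace: space, tab, carriage return, line feed.
XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")

def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return XML_WHITESPACE_RUN.sub(" ", text)

# The two text nodes around <b>great</b> in the fragment above.
print(repr(collapse_whitespace("This is a      ")))
print(repr(collapse_whitespace("       moment.")))
```

Each text node keeps one separating space, so the words on either side of the bold element do not run together.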

A different example is the HTML <pre> tag, within which all the whitespace should be preserved, including indentation and new lines.

To address this, we implemented a configurable whitespace stripper [1] for form definitions, with the intent of removing as much whitespace as we can while keeping it where it is needed.
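
The general idea of a configurable stripper can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Orbeon Forms implementation (which is written in Scala and configured with CSS 3 selectors): the policy names `remove`, `collapse`, and `preserve`, the `POLICIES` table keyed by element name, and both function names are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")

# Hypothetical policy table: element name -> policy for its content.
# Elements not listed inherit the policy of their parent.
POLICIES = {"pre": "preserve", "p": "collapse"}

def apply_policy(text, policy):
    """Return the text as it should be kept under the given policy."""
    if text is None or policy == "preserve":
        return text
    if policy == "collapse":
        return XML_WHITESPACE_RUN.sub(" ", text)
    # "remove": drop whitespace-only text, leave any other text untouched.
    return None if XML_WHITESPACE_RUN.fullmatch(text) else text

def strip_whitespace(element, inherited="remove"):
    """Apply whitespace policies in place to an element and its descendants."""
    policy = POLICIES.get(element.tag, inherited)
    element.text = apply_policy(element.text, policy)
    for child in element:
        strip_whitespace(child, policy)
        # A child's tail is part of this element's content,
        # so this element's policy applies to it.
        child.tail = apply_policy(child.tail, policy)

doc = ET.fromstring(
    "<div>\n  <p>This is a      <b>great</b>       moment.</p>\n"
    "  <pre>  a\n  b</pre>\n</div>"
)
strip_whitespace(doc)
print(ET.tostring(doc, encoding="unicode"))
```

In this sketch, indentation between block-level elements disappears, spaces inside the paragraph are collapsed, and the content of the pre element is left exactly as written.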

The result is that, for very large forms, the compiled form definition can use over 20% less memory than with Orbeon Forms 4.3.1. One very large form definition had over 15 MB of waste due to whitespace!

These savings are especially important if you have a large number of distinct form definitions.

We still hope to improve on this in the future. For example, it is probably possible to save memory within form data as well.

  1. If you are curious as to how it’s done, see Whitespace.scala, CharacterAccumulator.scala, and WhitespaceXMLReceiver.scala. The default configuration is done with CSS 3 selectors in properties-internal.xml.

4 comments:

  1. Hello,

    Is this feature available with the XForms engine?

    Thank You.

    Julien

  2. Julien,

    Yes, this is an XForms engine feature.

    -Erik

  3. Yes! Good to hear!
    Do we need to do anything to apply it?
    (Except install Orbeon 4.4 :)
