Tuesday, February 19, 2013

Faster XML search with the id() function

Photo by Jason Pearce
Consider the following XML document in an XForms instance:
<xf:instance id="my-book">
    <book id="simple_book">
        <chapter id="chapter_1">
            <title>Chapter 1</title>
            <para>Hello world!</para>
        </chapter>
        <chapter id="chapter_2">
            <!-- ... -->
        </chapter>
    </book>
</xf:instance>
Some key elements in this document have id attributes which identify the element they are placed on. This means that if you have the value of an id, you can find the associated element. For example, the following XPath expression finds the second chapter element:
//*[@id = 'chapter_2']
The downside of using plain XPath here is that if you have a very large document, the expression might have to scan large portions of the document, possibly the entire document. This can be costly. In theory, the XPath engine can back the XML document with indexes, but this is typically done only by XML databases such as eXist.
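To make the cost concrete, here is a minimal Python sketch of what a plain scan amounts to. It is not Orbeon Forms code: it uses the standard library's xml.etree.ElementTree on a simplified copy of the book document, and the function name find_by_id_scan and its visit counter are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Simplified version of the book document (chapter_2 left empty for brevity).
BOOK = """
<book id="simple_book">
    <chapter id="chapter_1">
        <title>Chapter 1</title>
        <para>Hello world!</para>
    </chapter>
    <chapter id="chapter_2"/>
</book>
"""

def find_by_id_scan(root, wanted):
    """Walk the tree in document order; return (element, elements visited)."""
    visited = 0
    for elem in root.iter():              # root first, then all descendants
        visited += 1
        if elem.get("id") == wanted:
            return elem, visited
    return None, visited

root = ET.fromstring(BOOK)
elem, visited = find_by_id_scan(root, "chapter_2")
print(elem.tag, visited)                  # chapter 5

# A miss is the worst case: every element is visited before giving up.
missing, visited_all = find_by_id_scan(root, "no_such_id")
print(missing, visited_all)               # None 5
```

The number of elements visited grows with the size of the document, which is the cost an index is meant to avoid.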

However, XPath specifies an id() function for exactly this purpose: finding the element with a given id. It is in fact very similar to the JavaScript getElementById() function, now implemented natively (and efficiently) by web browsers.

The id() function is not strictly required to be faster than a plain document scan, but it can obviously be backed by an index (a hash map), and this is exactly how we just implemented it![1]
The index is enabled only for a specific instance with:
<xf:instance id="my-book" xxf:index="id">
When support is enabled for an instance, all ids are indexed the first time the id() function is called. Then, for each mutation of the document (insertion, deletion, replacement, change of value), the index is updated.[2] At this time, only attributes with a local name of id are indexed. This covers plain id and xml:id attributes.[3]
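The lazy build and the update on mutation can be sketched in a few lines of Python. This is only an illustration of the idea, not the Orbeon Forms implementation: the IdIndex class, its method names, and the use of xml.etree.ElementTree are assumptions, and the sketch assumes ids are unique.

```python
import xml.etree.ElementTree as ET

def id_values(elem):
    """Values of attributes whose local name is 'id' (covers id and xml:id)."""
    return [v for k, v in elem.attrib.items() if k.rsplit("}", 1)[-1] == "id"]

class IdIndex:
    """Hypothetical id -> element hash map, built on the first lookup."""

    def __init__(self, root):
        self.root = root
        self.map = None                      # nothing is indexed up front

    def lookup(self, value):
        if self.map is None:                 # lazy build: one full scan, once
            self.map = {}
            self._add(self.root)
        return self.map.get(value)

    def _add(self, subtree):
        for elem in subtree.iter():          # document order within the subtree
            for value in id_values(elem):
                self.map.setdefault(value, elem)

    def insert_after(self, parent, sibling, new_elem):
        parent.insert(list(parent).index(sibling) + 1, new_elem)
        if self.map is not None:             # keep an existing index in sync
            self._add(new_elem)

    def remove(self, parent, elem):
        parent.remove(elem)
        if self.map is not None:
            for e in elem.iter():
                for value in id_values(e):
                    if self.map.get(value) is e:
                        del self.map[value]

root = ET.fromstring(
    '<book id="simple_book"><chapter id="chapter_1"/><chapter xml:id="chapter_2"/></book>'
)
index = IdIndex(root)
chapter_2 = index.lookup("chapter_2")        # first call builds the index
index.insert_after(root, chapter_2, ET.Element("chapter", {"id": "chapter_3"}))
print(index.lookup("chapter_3") is root[2])  # True
index.remove(root, chapter_2)
print(index.lookup("chapter_2"))             # None
```

After the first lookup, each later lookup is a single hash-map access, and each mutation only touches the ids in the subtree that changed.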

You can use the id() function this way:
<xf:insert
  ref="id('chapter_2', instance('my-book'))"
  origin="xf:element('chapter', xf:attribute('id', 'chapter_3'))"/>
This XForms action inserts a new empty chapter element with id chapter_3 after the chapter element with id chapter_2.

Form Builder benefits from this change a lot, because:
  • The entire form being edited is stored in an XForms instance.
  • All controls and other important elements are assigned ids.
  • Finding elements by id is a very common operation.
This is available in nightly builds and will be part of Orbeon Forms 4.0.1.

  1. Until this change, calling id() on regular (not readonly) instances in Orbeon Forms always returned an empty result. However, calling it on readonly instances already behaved as expected, as we are relying on the Saxon TinyTree implementation for readonly instances.
  2. The implementation is just slightly non-trivial, because the spec says that when more than one element has a given id, the first in document order must be returned. So we must properly track multiple ids for that reason.
  3. In theory, id() must support any attribute or element with type xs:ID but Orbeon Forms doesn’t implement this yet. In the future, we might also support indexing elements by other aspects, such as CSS classes or element names, as HTML does.
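The duplicate-id requirement from footnote 2 can be sketched as follows. This is one possible approach, not necessarily the one Orbeon Forms uses: the MultiIdIndex class and its added/removed hooks are hypothetical, only plain id attributes on single elements are handled, and document order is recomputed with a scan only in the (presumably rare) case where an id is actually duplicated.

```python
import xml.etree.ElementTree as ET

class MultiIdIndex:
    """Hypothetical index mapping each id to all elements that carry it."""

    def __init__(self, root):
        self.root = root
        self.map = {}
        for elem in root.iter():                       # document order
            if "id" in elem.attrib:
                self.map.setdefault(elem.get("id"), []).append(elem)

    def lookup(self, value):
        candidates = self.map.get(value, [])
        return candidates[0] if candidates else None   # first in document order

    def added(self, elem):
        """Call after elem has been inserted into the tree."""
        bucket = self.map.setdefault(elem.get("id"), [])
        bucket.append(elem)
        if len(bucket) > 1:                            # duplicates: restore order
            order = {e: i for i, e in enumerate(self.root.iter())}
            bucket.sort(key=order.__getitem__)

    def removed(self, elem):
        """Call after elem has been removed from the tree."""
        bucket = self.map.get(elem.get("id"), [])
        bucket[:] = [e for e in bucket if e is not elem]

root = ET.fromstring('<book><chapter id="a" n="1"/><chapter id="b" n="2"/></book>')
index = MultiIdIndex(root)

dup = ET.Element("chapter", {"id": "b", "n": "0"})
root.insert(0, dup)                    # the duplicate now precedes the original
index.added(dup)
print(index.lookup("b").get("n"))      # 0

root.remove(dup)
index.removed(dup)
print(index.lookup("b").get("n"))      # 2
```

Keeping every element for an id, rather than only the first, is what lets the lookup fall back to the next element in document order when the first one is removed.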

Monday, February 11, 2013

Orbeon Forms 4.0 RC1

Today we released Orbeon Forms 4.0 Release Candidate 1 (RC1)! Like 4.0 Beta 3 and the previous milestone builds, this is not a final release, but we are really getting there!

Here is the overall list of changes since Beta 3:
  • Form Builder
    • Edit Source: don't allow empty document (#815)
    • IE9: Buttons editors in Builder cause crash (#812)
  • Form Runner
    • Currency/number prefixes don't show in view mode (#783)
    • PDF template: xf:output not supported (#792)
    • Include W-9 sample form (#791)
    • Fix possible XPath error when saving attachments (553c0d145c)
    • Samples: remove unneeded buttons (686b945264)
    • fr:grid doesn't evaluate remove-constraint in context of iteration (#795)
    • Controls form: i18n issue with section template (#738)
    • Controls form: Dynamic Data Dropdown needs some data (#650)
    • IE7: Navbar content misaligned in Noscript mode (#733)
    • Upgrade to TinyMCE 3.5.8 (09b185a35c)
    • Toolbar images of the Formatted Text are not shown, under proxy portlet, on IE9 (#798)
    • View mode: incorrect breaking of text (#804)
    • Bootstrap and Orbeon styles have effect outside the portlet (#805)
    • MySQL: free text search doesn't return any results (#821)
    • Captcha doesn't show (#822)
    • Recaptcha doesn't work anymore (#825)
    • IE9: fr:us-phone shows "undefined" in field (#810)
    • Lots of small CSS fixes
  • XForms engine
    • Update of switch label with for updates wrong label (#793)
    • XPath error when hiding section (#799)
    • Chrome: JS error when going back to Form Runner home (#716)
    • Upload gets stuck on IE10 (#802)
    • Occasional NPE on portlet initialization (#819)
    • Can't focus on captcha (#823)
    • Setfocus on XBL with mode="focus" focuses on nested control (#824)
    • IE7/IE8: JavaScript placeholder with initial focus doesn't remove placeholder (#811)
  • Other
    • Coding error for socket timeout (#801)
More information is available in the in-progress release notes for 4.0.

You can download the builds using these links:
Don't forget to grab a trial license for the PE version.

Please send feedback: