Wednesday, June 29, 2016

New indexes boost performance with relational databases in Orbeon Forms 2016.2


When you create a form with Form Builder and use Orbeon Forms' implementation of the persistence API for relational databases, data collected by the form at runtime is stored in a set of 4 generic predefined tables. The tables are generic in the sense that they are not form-specific. This has a number of benefits:
  • It gives more control to DBAs: they can create the tables when Orbeon Forms is installed, and since Orbeon Forms doesn't need permission to create tables at runtime, DBAs have full knowledge of the database schema and can optimize its storage or create additional indexes, should they wish to do so.
  • It keeps the database schema simple and uniform across deployments. (Contrast this with an approach where you would have one table per form, and per version thereof.) Because of this, no "garbage collection" of unused tables (unused form versions) is necessary, upgrades to new versions of Orbeon Forms that require a change to the database schema are simpler, all the operations performed by Orbeon Forms can be part of a transaction (not all databases allow DDL statements inside a transaction), and more.
For the tables to be generic:
  • Metadata (such as the form data id, the data owners, the time of the last change), which is the same for all forms, is stored in regular columns.
  • Values entered by users in your form are stored as XML, using a single "generic" xml column.
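As a rough illustration of this layout, here is a minimal Python sketch that uses an in-memory SQLite database. The table name `form_data`, its columns, and the `save`/`load` helpers are hypothetical and chosen for this example only; they are not the actual Orbeon Forms schema or API.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical generic table: the same table holds data for every form.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE form_data (
        document_id        TEXT,  -- metadata: regular columns
        app                TEXT,
        form               TEXT,
        owner              TEXT,
        last_modified_time TEXT,
        xml                TEXT   -- user-entered values: one generic XML column
    )
""")

def save(document_id, app, form, owner, modified, values):
    """Serialize the field values to XML and store them with the metadata."""
    root = ET.Element("form")
    for name, value in values.items():
        ET.SubElement(root, name).text = value
    conn.execute(
        "INSERT INTO form_data VALUES (?, ?, ?, ?, ?, ?)",
        (document_id, app, form, owner, modified,
         ET.tostring(root, encoding="unicode")),
    )

def load(document_id):
    """Read back all the values of one form data document."""
    row = conn.execute(
        "SELECT xml FROM form_data WHERE document_id = ?", (document_id,)
    ).fetchone()
    return {child.tag: child.text for child in ET.fromstring(row[0])}

# Two unrelated forms share the same table.
save("doc-1", "acme", "contact", "alice", "2016-06-29T10:00:00",
     {"first-name": "Ann", "last-name": "Smith"})
save("doc-2", "acme", "order", "bob", "2016-06-29T11:00:00",
     {"product": "Widget", "quantity": "3"})
print(load("doc-1"))
```

Note that saving or opening a document touches a single row and never needs to know which fields the form defines.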
This works extremely well for operations where Orbeon Forms needs to either read or store all the values for a given instance of a form, like when users save or open form data. However, things get more complicated when you want to do reporting on the submitted data, like:
  • For a given form, in a table, show the values of 5 significant fields for the last 50 submissions.
  • For a given form, find form data where the field last name has the value Smith.
Those operations are performed by summary pages, and the search API. In Orbeon Forms 2016.1 and earlier, they required SQL to extract data from the XML stored in the relational database. Unfortunately:
  • Some databases only have minimal support for XML.
  • Even for those databases with good support:
    • Searching and extracting values from XML data still isn't as fast as if the data was stored in "regular columns".
    • Optimizing those queries on XML data is highly database-dependent, and requires the creation of indexes or other techniques that DBAs are not necessarily familiar with.
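To make the cost concrete, here is a small Python sketch of a search that has to look inside the XML. SQLite has no built-in XML functions, so the extraction is done in Python as a stand-in for database-specific XML SQL; the table `form_data` and the function `search_by_xml` are hypothetical names used only for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE form_data (document_id TEXT, form TEXT, xml TEXT)")
conn.executemany("INSERT INTO form_data VALUES (?, ?, ?)", [
    ("doc-1", "contact", "<form><last-name>Smith</last-name></form>"),
    ("doc-2", "contact", "<form><last-name>Jones</last-name></form>"),
    ("doc-3", "contact", "<form><last-name>Smith</last-name></form>"),
])

def search_by_xml(form, field, value):
    """Find documents of a form where a field has a given value.

    Every stored document for the form is parsed, so the work grows
    with the number of rows rather than with the number of matches.
    """
    matches = []
    rows = conn.execute(
        "SELECT document_id, xml FROM form_data WHERE form = ? "
        "ORDER BY document_id",
        (form,),
    )
    for document_id, xml in rows:
        if ET.fromstring(xml).findtext(field) == value:
            matches.append(document_id)
    return matches

print(search_by_xml("contact", "last-name", "Smith"))  # ['doc-1', 'doc-3']
```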
Because of this, in Orbeon Forms 2016.1 and earlier, the performance of summary pages and the search API degraded as more data was added to the database. To solve this problem, Orbeon Forms 2016.2 introduces index tables. Data is still stored as XML as it was before, but the subset thereof needed for summary pages and the search API is also stored in the index tables. As a result, summary pages and the search API never need to access data stored in XML, and can thus run much faster.
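The idea can be sketched in Python with SQLite as follows. This is only an illustration under assumed names: the tables `form_data` and `form_index`, the `INDEXED_CONTROLS` mapping, and the `save`/`search` functions are hypothetical and do not reflect the actual Orbeon Forms index tables.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE form_data  (document_id TEXT PRIMARY KEY, form TEXT, xml TEXT);
    CREATE TABLE form_index (document_id TEXT, form TEXT, control TEXT, value TEXT);
    CREATE INDEX form_index_lookup ON form_index (form, control, value);
""")

# Hypothetical: which controls each form exposes to summary pages and search.
INDEXED_CONTROLS = {"contact": ["last-name"]}

def save(document_id, form, xml):
    """Store the XML as before, and refresh the index rows in the same transaction."""
    root = ET.fromstring(xml)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR REPLACE INTO form_data VALUES (?, ?, ?)",
                     (document_id, form, xml))
        conn.execute("DELETE FROM form_index WHERE document_id = ?",
                     (document_id,))
        for control in INDEXED_CONTROLS.get(form, []):
            value = root.findtext(control)
            if value is not None:
                conn.execute("INSERT INTO form_index VALUES (?, ?, ?, ?)",
                             (document_id, form, control, value))

def search(form, control, value):
    """Search using regular columns only; the XML column is never read."""
    rows = conn.execute(
        "SELECT document_id FROM form_index "
        "WHERE form = ? AND control = ? AND value = ? ORDER BY document_id",
        (form, control, value),
    )
    return [row[0] for row in rows]

save("doc-1", "contact", "<form><last-name>Smith</last-name></form>")
save("doc-2", "contact", "<form><last-name>Jones</last-name></form>")
save("doc-3", "contact", "<form><last-name>Smith</last-name></form>")
print(search("contact", "last-name", "Smith"))  # ['doc-1', 'doc-3']
```

The trade-off in this sketch is a little extra work on every save in exchange for searches that an ordinary database index can serve.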

This also means that the values in the index tables need to be kept up-to-date by Orbeon Forms. This happens automatically as data gets saved, or forms deployed. However, if you're upgrading to 2016.2, for the summary pages and the search API to work properly you'll need to first populate those indexes, in an operation referred to as reindexing. For more on this, see how to reindex your Orbeon Forms database when upgrading to 2016.2.
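Conceptually, reindexing amounts to rebuilding the index rows from the XML that is already stored. The Python sketch below illustrates that idea with SQLite; the tables and the `reindex` function are hypothetical and are not the actual Orbeon Forms reindexing procedure, which is described in the documentation referenced above.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE form_data  (document_id TEXT, form TEXT, xml TEXT);
    CREATE TABLE form_index (document_id TEXT, control TEXT, value TEXT);
""")
# Data saved before the index table existed: the index starts out empty.
conn.executemany("INSERT INTO form_data VALUES (?, ?, ?)", [
    ("doc-1", "contact", "<form><last-name>Smith</last-name></form>"),
    ("doc-2", "contact", "<form><last-name>Jones</last-name></form>"),
])

def reindex(controls):
    """Rebuild the whole index from the stored XML; returns the index row count."""
    with conn:  # one transaction: readers see the old index or the new one
        conn.execute("DELETE FROM form_index")
        documents = conn.execute(
            "SELECT document_id, xml FROM form_data").fetchall()
        for document_id, xml in documents:
            root = ET.fromstring(xml)
            for control in controls:
                value = root.findtext(control)
                if value is not None:
                    conn.execute("INSERT INTO form_index VALUES (?, ?, ?)",
                                 (document_id, control, value))
    return conn.execute("SELECT COUNT(*) FROM form_index").fetchone()[0]

print(reindex(["last-name"]))  # 2
```

Because the sketch clears the index before rebuilding it, running it more than once gives the same result.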

Thursday, June 23, 2016

Saying farewell to HTML tables

A very important underlying feature of Orbeon Forms is the grid. A grid groups controls in rows and columns, and can optionally repeat rows.

When we first implemented grids many years ago, we used tables for the underlying HTML markup. In HTML, tables serve two main purposes:
  1. Presenting tabular data (which is their original purpose).
  2. Laying out other content.
The reason tables came to be used for laying out content is that they offered layout features which no other HTML or CSS construct supported. Over time, however, CSS has introduced most of the necessary support, in particular display: table and related values, as well as the more recent Flexible Box Layout (also known as "flexbox").

Because assistive technologies such as screen readers usually describe tables to the user as being tables when reading a page, there is widespread agreement that one should not use tables for purposes other than presenting tabular data, so as not to confuse the user.

So in the upcoming Orbeon Forms 2016.2 we have changed the layout of grids to no longer use HTML tables when possible.

There are two cases where we still use HTML tables:
  • Grids that use row spans.
  • Repeated grids.
The case of row spans is hopefully a temporary limitation: it is just plain harder to emulate row spans with pure CSS, and we hope that this won't be a big issue because the use of row spans in Orbeon Forms is fairly rare. Further, we now place the ARIA presentation role on such tables to help screen readers.

For the case of repeated grids, using HTML tables makes some sense because the data is actually presented in tabular form with headings, rows and columns.

The specific HTML layout is documented here.

As an aside, we have also removed the use of HTML tables in the error summary component.

We hope you like these changes, which will be available in Orbeon Forms 2016.2!