Wednesday, June 29, 2016

New indexes boost performance with relational databases in Orbeon Forms 2016.2


When you create a form with Form Builder and use Orbeon Forms' implementation of the persistence API for relational databases, data collected by the form at runtime is stored in a set of 4 generic predefined tables. The tables are generic in the sense that they are not form-specific. This has a number of benefits:
  • It gives more control to DBAs: they can create the tables when Orbeon Forms is installed, and since at runtime Orbeon Forms doesn't need to have the permission to create tables, this allows DBAs to have a full knowledge of the database schema, and to optimize its storage or create additional indexes, should they wish to do so.
  • It keeps the database schema simple, and uniform across deployments. (Contrast this to an approach where you would have one table per form, and per version thereof.) Because of this, no "garbage collection" of unused tables (unused form versions) is necessary, upgrades to new versions of Orbeon Forms that require a change to the database schema are simpler, all the operations performed by Orbeon Forms can be part of a transaction (not all databases support DDL to be part of a transaction), and more.
For the tables to be generic:
  • Metadata (such as the form data id, the data owners, the time of the last change), which is the same for all forms, is stored in regular columns.
  • Values entered by users in your form are stored as XML, using a single "generic" xml column.
This works extremely well for operations where Orbeon Forms needs to either read or store all the values for a given instance of a form, like when users save or open form data. However, things gets more complicated when you want to do reporting on the submitted data, like:
  • For a given form, in a table, show the values of 5 significant fields for the last 50 submissions.
  • For a given form, find form data where the field last name has the value Smith.
Those operations are performed by summary pages, and the search API. In Orbeon Forms 2016.1 and earlier, they required SQL to extract data from the XML stored in the relational database. Unfortunately:
  • Some databases only have minimal support for XML.
  • Even for those databases with good support:
    • Searching and extracting values from XML data still isn't as fast as if the data was stored in "regular columns".
    • Optimizing those queries on XML data is highly database-dependant, and requires the creation of indexes or other techniques that DBAs are not necessarily familiar with.
Because of this, in Orbeon Forms 2016.1 and earlier, the performance of summary pages and the search API degraded as more data was added to the database. To solve this problem, Orbeon Forms 2016.2 introduces index tables. Data is still stored as XML as it was before, but the subset thereof needed for summary pages and the search API is also stored in the index tables. As a result, summary pages and the search API never need to access data stored in XML, and can thus run much faster.

This also means that the values in the index tables need to be kept up-to-date by Orbeon Forms. This happens automatically as data gets saved, or forms deployed. However, if you're upgrading to 2016.2, for the summary pages and the search API to work properly you'll need to first populate those indexes, in an operation referred to as reindexing. For more on this, see how to reindex your Orbeon Forms database when upgrading to 2016.2.

No comments:

Post a Comment