Wednesday, May 18, 2011

Orbeon Forms 3.9: aggregating CSS and JavaScript resources

Orbeon Forms has been able to automatically combine (aggregate) CSS and JavaScript resources for years. The idea is simple: serving many files to the browser is not optimal because each request from the client adds latency, so you should try to serve fewer files to the client.

In general, this is pretty easy for CSS and JavaScript files: just concatenate them together, create a URL to represent the combined file, and tell your page to load the combined file instead of the individual files.

For CSS, you have to be a bit careful as CSS can contain URLs pointing to other resources, such as images, that are assumed to be relative to the URL of the CSS file itself. So Orbeon Forms rewrites URLs in CSS files to point to the right place. For JavaScript, a simple concatenation works.
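
To make the URL rewriting concrete, here is a minimal Python sketch of concatenating CSS files while resolving relative `url(...)` references against each file's own path. This is not Orbeon Forms code: the function names, the regular expression, and the example paths are all assumptions made for illustration, and a real implementation would have to handle more cases (for example `@import` rules).

```python
import posixpath
import re

# Matches url(...) with optional single or double quotes around the target.
URL_PATTERN = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""")


def rewrite_css_urls(css_text, css_path):
    """Resolve relative url(...) references against the CSS file's own path."""
    base_dir = posixpath.dirname(css_path)

    def replace(match):
        target = match.group(2).strip()
        # Leave absolute paths, full URLs, and data URIs untouched.
        if target.startswith(("/", "http:", "https:", "data:")):
            return match.group(0)
        resolved = posixpath.normpath(posixpath.join(base_dir, target))
        return "url(" + resolved + ")"

    return URL_PATTERN.sub(replace, css_text)


def aggregate_css(files):
    """Concatenate (path, content) pairs after rewriting each file's URLs."""
    return "\n".join(rewrite_css_urls(content, path) for path, content in files)
```

With the hypothetical path `/ops/css/form.css`, a reference to `url(../images/bg.png)` becomes `url(/ops/images/bg.png)`, so it still points to the right image once the rule is served from the combined file's URL.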

This is nice and all, but before Orbeon Forms 3.9 this was limited to built-in XForms engine resources. As more and more CSS and JavaScript comes from nicely packaged XBL components, as well as from custom resources, this feature had to be improved.

So in Orbeon Forms 3.9, we revisited the whole aggregation feature to make it even better!

Now, the aggregator looks at all CSS and JavaScript:

  • built-in XForms engine resources
  • XBL resources
  • custom resources placed by the form author

Here is how it works. When the page is generated, just before it is sent to the browser, the URLs of all the CSS and JavaScript to use are considered. They are split into three categories:
  1. baseline resources: resources that every single page will load no matter what
  2. supplemental resources: resources specific to the current page
  3. non-aggregated resources

For baseline and supplemental resources, a hash is computed based on the paths to the resources and a version number (Orbeon Forms version number plus, if present, your application's version number). This hash is included in the URL sent to the browser. Because the URL changes whenever the list of resources or the version changes, the browser can cache the result aggressively and perpetually. So you will see paths like:

/orbeon/xforms-server/orbeon-8b3d174e93f2d74146c9b2a5356bd5b8b5e196f8.css

The server then maps each hash to the list of actual resources. When the browser requests the URL, the server aggregates those resources on the fly and can subsequently cache the result on the server.
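
The hashing scheme can be sketched as follows in Python. Everything here is an assumption for illustration: the post does not say which hash function is used (SHA-1 is chosen only because it yields 40 hexadecimal digits, like the example path above), nor how the paths and version are combined, and the `register` helper and in-memory table are hypothetical.

```python
import hashlib


def aggregate_url(paths, version, extension, prefix="/orbeon/xforms-server/orbeon-"):
    """Build a URL whose name depends on the resource paths and the version."""
    digest = hashlib.sha1()  # assumption: the actual algorithm is not stated
    digest.update(version.encode("utf-8"))
    for path in paths:
        digest.update(b"|")  # separator so that path boundaries affect the hash
        digest.update(path.encode("utf-8"))
    return prefix + digest.hexdigest() + "." + extension


# Hypothetical server-side table mapping each generated URL back to its resources.
resources_by_url = {}


def register(paths, version, extension):
    """Remember which resources a generated URL stands for, and return the URL."""
    url = aggregate_url(paths, version, extension)
    resources_by_url[url] = list(paths)
    return url
```

The same list of paths and the same version always produce the same URL, so the browser can keep its cached copy; bumping the version or changing the list produces a new URL, which forces a fresh download.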

Baseline and supplemental resources are aggregated separately. The baseline rests on the assumption that, within an application, many pages share lots of resources. Once you have hit a first page and caching has occurred, the baseline resources are picked directly from the browser cache on all subsequent pages.

Only the supplemental resources, which we hope are small (or even empty if the baseline includes everything you need for all pages), will be loaded for each new page. Of course, the supplemental resources too are cached aggressively by the browser.

Finally, there are some resources that can't be aggregated easily: for example, CSS targeting a specific media type. There are usually no such resources, or few of them.
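
As a rough sketch of the three categories, the split could look like this in Python. The function name, its parameters, and the idea of passing the non-aggregatable resources as an explicit list are assumptions for illustration, not the way Orbeon Forms is configured.

```python
def categorize(page_resources, baseline, non_aggregatable=()):
    """Split a page's resources into baseline, supplemental, and non-aggregated."""
    baseline_set = set(baseline)
    excluded = set(non_aggregatable)
    # Supplemental: needed by this page, not in the baseline, and aggregatable.
    supplemental = [
        r for r in page_resources if r not in baseline_set and r not in excluded
    ]
    # Non-aggregated: served individually, in their original order.
    separate = [r for r in page_resources if r in excluded]
    return {
        # The whole baseline is always sent, so its URL is identical on every page.
        "baseline": list(baseline),
        "supplemental": supplemental,
        "non_aggregated": separate,
    }
```

Because the baseline group never depends on the current page, its aggregated URL stays the same across the application, which is what lets the browser reuse it from cache.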

The result: fewer requests to the server when loading and navigating Orbeon Forms pages, and also a cleaner implementation.

Monday, May 16, 2011

Orbeon Forms 3.9.0 final

We are happy to announce Orbeon Forms 3.9.0 final!

Orbeon Forms 3.9.0 features over 300 improvements since Orbeon Forms 3.8.

Major improvements include:
  • Performance and reliability (including the new XPath dependency analysis engine)
  • Liferay support
  • A new implementation of the upload control
  • Updates to Form Runner, custom components, accessibility, the XForms engine, and more.

Don't miss the complete release notes.

Builds for both Orbeon Forms PE and Orbeon Forms CE are available from the downloads page.

Monday, May 2, 2011

Auto-fixing Windows/Unicode Character Encoding Issues


Nowadays, we use Unicode for almost everything, and Unicode supports a (very!) large number of characters. Unicode assigns a code (number) to each character, and in most cases this code is represented in 2 bytes when in memory (e.g. in Java and by Windows since NT), and a variable number of bytes when sent over the wire (UTF-8 encoding).

Before we started using Unicode, a single byte per character was used for western languages, with the ISO-8859-1 encoding. This encoding was fine in most cases but didn't contain some useful characters, such as curved quotes, both “double” and ‘single’, the trademark symbol™, or ligatures like œ.

<rabithole>The Œ and œ are the only French characters not present in ISO-8859-1, and it has been said that this is because the French member of the ISO committee missed a session and that members from other countries simply decided to remove those characters during that session. (A perhaps more credible explanation is that the ISO committee members concluded that Œ and œ are ligatures rather than characters, and thus have no place in ISO-8859-1.)</rabithole>

Microsoft based its own character set on ISO-8859-1, but, in its infinite wisdom, decided to use some reserved codes of ISO-8859-1 for those "useful" characters, creating the Windows-1252 encoding.

The first 256 characters in Unicode come from ISO-8859-1, not Windows-1252, which means that the code for all those "useful" characters is higher than 255 in Unicode. The problem is that documents encoded with Windows-1252 are often incorrectly advertised to be in ISO-8859-1. The mistake is easy to make, as it works "in most cases". The error is so common that the HTML5 spec says that browsers should parse documents advertised as using ISO-8859-1 as Windows-1252 (not trusting the advertised encoding!).
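
A few lines of Python make the difference between the encodings visible. The `describe` helper is a hypothetical name introduced only for this illustration; it relies on Python's standard `cp1252` codec, which is Python's name for Windows-1252.

```python
def describe(ch):
    """Return (Unicode code point, Windows-1252 byte, UTF-8 byte count) for ch."""
    return ord(ch), ch.encode("cp1252")[0], len(ch.encode("utf-8"))


for ch in ["\u00e9", "\u201c", "\u2122", "\u0153"]:  # é, left double quote, ™, œ
    code_point, win_byte, utf8_len = describe(ch)
    print(ch, hex(code_point), hex(win_byte), utf8_len)
```

For é, the Unicode code point and the Windows-1252 byte are the same (0xE9), since that character is also in ISO-8859-1. For the curved quote, ™ and œ, the Windows-1252 byte falls in the 0x80 to 0x9F range while the Unicode code point is above 255.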

HTML5 only saves you as long as a Windows-1252 document is advertised as ISO-8859-1 to the browser. But if your Windows-1252 document is incorrectly opened as ISO-8859-1, and then saved as UTF-8, you end up with invalid Unicode characters, which browsers won't understand. Luckily, the solution to this problem isn't very complicated: it is just a matter of changing the code for characters that exist in Windows-1252 but not in ISO-8859-1 to their valid Unicode code, using a simple conversion table. Since version 3.9, you can set up Orbeon Forms to do this conversion automatically for you.
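
The conversion table idea can be sketched in Python as follows. This is an illustration of the technique only, not the Orbeon Forms implementation, and the names `build_fix_table` and `fix_windows_1252` are hypothetical. The table is derived from Python's `cp1252` codec rather than written out by hand.

```python
def build_fix_table():
    """Map code points 0x80-0x9F to the characters Windows-1252 assigns to them."""
    table = {}
    for code in range(0x80, 0xA0):
        try:
            table[code] = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # Some codes in this range are unassigned in Windows-1252; skip them.
            pass
    return table


FIX_TABLE = build_fix_table()


def fix_windows_1252(text):
    """Replace characters in the 0x80-0x9F range with their intended characters."""
    return text.translate(FIX_TABLE)


# Reproduce the problem: Windows-1252 bytes wrongly decoded as ISO-8859-1.
original = "\u201cquoted\u201d \u0153"
broken = original.encode("cp1252").decode("iso-8859-1")
assert fix_windows_1252(broken) == original
```

Characters outside the 0x80 to 0x9F range are left untouched, so running the fix on text that was decoded correctly in the first place does no harm.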