Monday, June 4, 2007

XPath: Reverse Axis, Evil at Times

XPath is both simple and powerful, and since XForms uses XPath, XForms inherits that simplicity and power. But XPath is also a sophisticated language: most of the time an expression does exactly what you would expect, but there are a few exceptions. In what follows I'll look at one case that still surprises me: the order and context position of the nodes returned by XPath expressions.

An XPath expression can return a sequence. The items in the sequence are in a certain order, and each of them has a context position. For instance, consider this document, with 3 employees, John, Peter, and Carl:
<company>
    <employee firstname="John"/>
    <employee firstname="Peter"/>
    <employee firstname="Carl"/>
</company>
Consider these 2 expressions:
  1. /company/employee[1]/following-sibling::employee
  2. /company/employee[3]/preceding-sibling::employee
The first expression returns the employees that follow the first employee. There is not much to be surprised about here: John is the first employee, so it returns Peter and Carl, in that order. The second expression returns the employees that precede Carl. It returns John and Peter, in that order, because every path expression in XPath returns its nodes in document order. Let's summarize:
  1. The 1st expression returns: Peter, Carl
  2. The 2nd expression returns: John, Peter
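To make the summary above concrete, here is a minimal Python sketch that models the two sibling axes on the sample document. It is not an XPath engine: the helpers `following_siblings` and `preceding_siblings` are hypothetical names introduced for illustration, and they only mimic what the two expressions return.

```python
import xml.etree.ElementTree as ET

DOC = """<company>
    <employee firstname="John"/>
    <employee firstname="Peter"/>
    <employee firstname="Carl"/>
</company>"""


def following_siblings(parent, node):
    """Model of following-sibling::*, returned in document order (hypothetical helper)."""
    children = list(parent)
    return children[children.index(node) + 1:]


def preceding_siblings(parent, node):
    """Model of preceding-sibling::*, also returned in document order (hypothetical helper)."""
    children = list(parent)
    return children[:children.index(node)]


def names(nodes):
    """First names of the given employee elements."""
    return [n.get("firstname") for n in nodes]


company = ET.fromstring(DOC)
employees = company.findall("employee")

# Models /company/employee[1]/following-sibling::employee
after_first = names(following_siblings(company, employees[0]))
# Models /company/employee[3]/preceding-sibling::employee
before_third = names(preceding_siblings(company, employees[2]))

print(after_first)   # ['Peter', 'Carl']
print(before_third)  # ['John', 'Peter']
```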
Now let's add the predicate [1] to both of those expressions:
  1. /company/employee[1]/following-sibling::employee[1]
  2. /company/employee[3]/preceding-sibling::employee[1]
When the value of a predicate is of a numeric type, as is the case here, the predicate is called a numeric predicate. A numeric predicate is true if its value is equal to the context position, and false otherwise. So the question is: in each of the two sequences, which item has a context position equal to 1?
  1. The first sequence is composed of Peter and Carl in that order, and Peter is the employee with context position equal to 1.
  2. The second sequence is composed of John and Peter in that order, and here Peter, the second employee in the sequence, is the employee with context position equal to 1, not John, who is the first employee in the sequence!
The reason for this potentially surprising result is that when you use a reverse axis, such as preceding-sibling, context positions are assigned in reverse document order. So because a reverse axis is used, the last item in the sequence is the one with context position 1. You can think of the engine as assigning context positions starting from the node where you start your search: if you are "going down", as with following-sibling, context positions are assigned in document order, but if you are "going up", as with preceding-sibling, they are assigned in reverse document order. Even though context positions are assigned differently depending on the type of axis you are using, the nodes returned by a path expression are always in document order. Does this make more sense?
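If it helps, the rule can be sketched in a few lines of Python. This is only a model of the behavior described above, not an XPath implementation: the function `axis_step` and its parameters are hypothetical, and it handles just the two sibling axes discussed here. The key idea is that the numeric predicate is checked against positions counted in axis order, while the result is always handed back in document order.

```python
import xml.etree.ElementTree as ET

DOC = """<company>
    <employee firstname="John"/>
    <employee firstname="Peter"/>
    <employee firstname="Carl"/>
</company>"""


def axis_step(parent, node, axis, position=None):
    """Model one sibling-axis step with an optional numeric predicate (hypothetical helper)."""
    children = list(parent)
    i = children.index(node)
    if axis == "following-sibling":
        in_axis_order = children[i + 1:]       # forward axis: document order
    elif axis == "preceding-sibling":
        in_axis_order = children[:i][::-1]     # reverse axis: nearest sibling first
    else:
        raise ValueError("this sketch only models the two sibling axes")
    if position is not None:
        # Numeric predicate: keep the item whose context position equals the value.
        in_axis_order = [n for pos, n in enumerate(in_axis_order, start=1)
                         if pos == position]
    # Whatever the axis, the result is returned in document order.
    return sorted(in_axis_order, key=children.index)


def names(nodes):
    """First names of the given employee elements."""
    return [n.get("firstname") for n in nodes]


company = ET.fromstring(DOC)
john, peter, carl = company.findall("employee")

# Models /company/employee[1]/following-sibling::employee[1]
forward_first = names(axis_step(company, john, "following-sibling", 1))
# Models /company/employee[3]/preceding-sibling::employee[1]
reverse_first = names(axis_step(company, carl, "preceding-sibling", 1))
# Models /company/employee[3]/preceding-sibling::employee[2]
reverse_second = names(axis_step(company, carl, "preceding-sibling", 2))
# Models /company/employee[3]/preceding-sibling::employee (no predicate)
reverse_all = names(axis_step(company, carl, "preceding-sibling"))

print(forward_first)   # ['Peter']
print(reverse_first)   # ['Peter']
print(reverse_second)  # ['John']
print(reverse_all)     # ['John', 'Peter']
```

In this model, Peter has context position 1 on both axes because he is the sibling nearest to the starting node in each case, while the unfiltered result of the reverse axis still comes back as John, Peter.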
