Wednesday, November 8, 2006

XPath: Tuning your XPath Expressions

When you write an XPath expression, you say what information you want to extract from an XML document, but you don't tell the XPath engine how to perform that task. Consider for instance the expression:

    /phonebook/person[starts-with(phone-number, '323') and last-name = 'Lee']

Imagine you are running this query on a hypothetical XML document that represents a whole phone book. The query retrieves all the persons from Hollywood (area code 323) with the last name Lee. How will the XPath engine execute this query? Let's see a few ways in which this could happen:
  • It can go through the list of persons and check the conditions in the order they are written. If the first 3 digits of the phone number are 323, then it checks if the last name is "Lee".
  • A more advanced engine might figure that the first test on the phone number is more expensive than the straight comparison with "Lee". So it might decide that it is more efficient to do the second comparison first, and only check the first 3 digits of the phone number if it already knows that the last name is "Lee".
  • An even more advanced engine might maintain an index of the persons based on their last name. Based on this index it can quickly locate the persons with last name "Lee". A standalone XPath engine typically wouldn't index XML documents, but this can certainly be expected from an engine running in a database.
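To make these three strategies concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is only an illustration of the idea, not how any particular XPath engine is implemented. The sample phone book, the phone number format and the function names (phone_first, name_first, build_index, indexed) are all made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; element names follow the expression in the text.
PHONEBOOK = """
<phonebook>
  <person><last-name>Lee</last-name><phone-number>3235550101</phone-number></person>
  <person><last-name>Lee</last-name><phone-number>2125550102</phone-number></person>
  <person><last-name>Kim</last-name><phone-number>3235550103</phone-number></person>
</phonebook>
"""

root = ET.fromstring(PHONEBOOK)
persons = root.findall("person")  # children of the root <phonebook> element


def phone_first(persons):
    """Strategy 1: test the phone prefix, then the last name."""
    return [p for p in persons
            if p.findtext("phone-number", "").startswith("323")
            and p.findtext("last-name") == "Lee"]


def name_first(persons):
    """Strategy 2: test the last name first, then the phone prefix."""
    return [p for p in persons
            if p.findtext("last-name") == "Lee"
            and p.findtext("phone-number", "").startswith("323")]


def build_index(persons):
    """Strategy 3, part 1: build a last-name index once per document."""
    index = defaultdict(list)
    for p in persons:
        index[p.findtext("last-name")].append(p)
    return index


def indexed(index):
    """Strategy 3, part 2: look up "Lee", then test only those persons."""
    return [p for p in index.get("Lee", [])
            if p.findtext("phone-number", "").startswith("323")]


results = [phone_first(persons), name_first(persons), indexed(build_index(persons))]
phones = [[p.findtext("phone-number") for p in r] for r in results]
print(phones)
```

All three functions select the same persons. Only the amount of work differs, and that is exactly the part the XPath expression leaves up to the engine.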
The XPath engine has a lot of freedom in the way it runs your XPath queries, and unless you know the engine you are using extremely well, you just can't say that a query will run more efficiently because it is written in one way instead of another. So start by writing your queries optimizing for human readability: make them explicit and simple to understand. For instance:
  • Instead of //person use /phonebook/person, because:
    • Using /phonebook/person might be more efficient: With //person some engines will traverse every element of the document, while they would only need to go through child elements of the root element with /phonebook/person.
    • But more importantly, /phonebook/person states your intention more clearly and makes your code more readable.
  • In large XPath expressions, avoid duplicating part of the expression. For instance:
      (count(/company/department[name = 'HR']/employee), avg(/company/department[name = 'HR']/employee/salary))
    This expression returns a sequence with the number of employees in the HR department and their average salary. Instead you can write it:
      for $hr in /company/department[name = 'HR'] return (count($hr/employee), avg($hr/employee/salary))
    The two forms return the same result when exactly one department is named HR. Unlike XQuery, XPath 2.0 doesn't have a let construct for you to declare variables. In some cases however, you can get around this by using the for construct.
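The two suggestions above can be tried out in Python with the standard library's xml.etree.ElementTree, which supports a small subset of XPath. This is a rough sketch under stated assumptions: the sample document is made up, paths are written relative to the root element (so "department/employee" plays the role of /company/department/employee), and a plain Python loop stands in for the XPath for construct.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this example.
COMPANY = """
<company>
  <department>
    <name>HR</name>
    <employee><salary>50000</salary></employee>
    <employee><salary>70000</salary></employee>
  </department>
  <department>
    <name>IT</name>
    <employee><salary>90000</salary></employee>
  </department>
</company>
"""

root = ET.fromstring(COMPANY)  # root is the <company> element

# Explicit path, like /company/department/employee: only two levels are walked.
explicit = root.findall("department/employee")
# Descendant search, like //employee: every element may have to be visited.
anywhere = root.findall(".//employee")

# Duplicated form: the HR department is located twice.
count_dup = len(root.findall("department[name='HR']/employee"))
salaries_dup = [float(s.text)
                for s in root.findall("department[name='HR']/employee/salary")]

# Factored form: locate it once, like "for $hr in ... return (...)".
stats = []
for hr in root.findall("department[name='HR']"):
    salaries = [float(s.text) for s in hr.findall("employee/salary")]
    stats.append((len(hr.findall("employee")), sum(salaries) / len(salaries)))

print(len(explicit), len(anywhere))
print(stats)
```

On this sample both ways of finding employees return the same elements, and the duplicated and factored forms agree on the count and the average. The factored form just names the HR department once, which is the readability gain the list item is about.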
Don't try to optimize your XPath expression prematurely. Or as 37signals puts it in their book Getting Real: "It's a Problem When It's a Problem". Until then just write clean and readable expressions.
