Sunday, March 27, 2005

The OPS Blog Sample Application, Part I

Just before this weekend, I launched a mini-project to create a new "Blog" sample application for Orbeon PresentationServer (OPS). The idea had been suggested independently by two users of OPS, and there was also my own inclination to write another cool example application for OPS that leverages XML.

Where do we Start?

I have been using w.bloggar, a blog client, since the beginning of the year. While limited, it handles some of the basic functionality of a blog. So, thinking in terms of services, the first thing I wanted to do was to better understand the functionality provided by the XML-RPC-based Blogger and MetaWeblog APIs (the latter being an extension of the former). My quick analysis is that those APIs essentially manage the following entities:

  • Blogs. A blog is an entity which defines an individual user's blog hosted by the application. A blog is identified by a blog id. It provides a URL, a name, and has associated categories, identified by name, that also provide associated URLs for the HTML and RSS versions of pages related to each category.

  • Posts. A post is an entity which defines a post within a blog. A post is identified by a post id. Its main features are a title, a link, and a "description" (the actual content of the post). Other RSS 2.0 attributes can be used as well, in particular a publication date and the associated categories.
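To make the two entities a bit more concrete, here is a minimal sketch of them as plain data types. It is written in Python purely for illustration (the sample itself is built with OPS and XML), and the class names, field names and example.org URLs are my own assumptions, not part of the Blogger or MetaWeblog APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    # A category is identified by its name and may expose HTML and RSS URLs.
    name: str
    html_url: str = ""
    rss_url: str = ""


@dataclass
class Blog:
    # A blog is identified by a blog id and provides a URL and a name.
    blog_id: str
    url: str
    name: str
    categories: List[Category] = field(default_factory=list)


@dataclass
class Post:
    # A post is identified by a post id; "description" is the actual content.
    post_id: str
    title: str
    link: str
    description: str
    date_created: Optional[str] = None
    categories: List[str] = field(default_factory=list)


# Hypothetical sample data, for illustration only.
blog = Blog("blog123", "http://example.org/blog123", "My Cool Blog",
            [Category("General Stuff")])
post = Post("post456", "Post du Jour", "http://example.org/blog123/post456",
            "What a day...", categories=["General Stuff"])
```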

First Goal

My first goal was to get something running quickly. "Running" is here defined by the following steps:

  1. Visibility from my blog client. I should be able to configure my blog client to access the OPS blog sample, even if the data exchanged is static and doesn't actually accomplish anything.

  2. Support basic operations. I should be able to create a post with my blog client and persist it. Then, retrieve it, edit it, and update it again.

  3. View blog posts. I should be able to visit a URL and access the last posts with title, date and content in HTML. Let's call this page the recent-posts page.

Storage Format

One of the initial ideas was that we would follow the spirit of OPS as much as we can. This implies sticking to using XML-friendly approaches. One of those is that we would use XML storage. OPS comes with an embedded eXist database, so why not use it?

How do you "design" an XML database schema? This is something not many of us are used to doing. I figured that the simpler the better: I would create two collections: a blogs collection, and a posts collection. Let's say user lambda has a blog: he would simply have an XML document describing that blog under the blogs collection. Then each of his posts will be a separate document under the posts collection.

This is not the only possible solution. Since posts are related to a unique blog anyway, you could embed all of a user's posts within the blog document. The benefit is that a single document would do, but then you work more with document updates rather than creating new documents for posts. This also means that a blog document could become pretty big. For now, we'll go with the first approach. I am waiting for comments on this!

Here is an example of a blog document according to this design:

  <blog>
    <blog-id>blog123</blog-id>
    <username>ebruchez</username>
    <name>My Cool Blog</name>
    <categories>
      <category>
        <name>General Stuff</name>
      </category>
      <category>
        <name>Cool Stuff</name>
      </category>
    </categories>
  </blog>

And here is an example of a post document:

  <post>
    <post-id>post456</post-id>
    <username>ebruchez</username>
    <blog-id>blog123</blog-id>
    <title>Post du Jour</title>
    <description>What a day...</description>
    <published>true</published>
    <date-created>2004-03-28T10:00:00</date-created>
    <categories>
      <category-name>General Stuff</category-name>
    </categories>
    <comments>
      <comment>...</comment>
    </comments>
  </post>

Note that post comments are not part of the MetaWeblog API, but here I decided to store them along with each post. Again, a different strategy would be to create yet another collection for comments.
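As a rough illustration of how a post document in this format could be read back by a consumer, here is a small Python sketch using the standard library's ElementTree. The parse_post helper and the keys of the dictionary it returns are hypothetical. The sample's own processing is done with XML pipelines, not Python.

```python
import xml.etree.ElementTree as ET

# The post document from the example above.
POST_XML = """<post>
  <post-id>post456</post-id>
  <username>ebruchez</username>
  <blog-id>blog123</blog-id>
  <title>Post du Jour</title>
  <description>What a day...</description>
  <published>true</published>
  <date-created>2004-03-28T10:00:00</date-created>
  <categories>
    <category-name>General Stuff</category-name>
  </categories>
  <comments>
    <comment>...</comment>
  </comments>
</post>"""


def parse_post(xml_text):
    """Extract the main fields of a post document into a dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "post_id": root.findtext("post-id"),
        "blog_id": root.findtext("blog-id"),
        "title": root.findtext("title"),
        "published": root.findtext("published") == "true",
        "categories": [c.text for c in root.findall("categories/category-name")],
        # Comments live inside the post document in this design.
        "comment_count": len(root.findall("comments/comment")),
    }


parsed = parse_post(POST_XML)
```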

Once this basic format was established, I created stub Relax NG schemas to validate those two types of documents.

Hooking Up XML-RPC

This is actually quite straightforward with OPS: an XML-RPC call consists of an XML document sent as the body of an HTTP request, and the response is an XML document sent back. Such a model is implemented in just a few lines with the OPS Request generator, XML converter and HTTP serializer.
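To see what actually travels over the wire, here is a short sketch using Python's standard xmlrpc.client module to build and parse the XML bodies. This only illustrates the request/response model and is not how OPS does it. The parameter values are made up, and the parameter order (post id, username, password) is the one metaWeblog.getPost expects.

```python
import xmlrpc.client

# Build the XML document a blog client would send as the HTTP request body.
request_body = xmlrpc.client.dumps(
    ("post456", "ebruchez", "private"),
    methodname="metaWeblog.getPost",
)

# On the server side, the body is parsed back into a method name and parameters.
params, method = xmlrpc.client.loads(request_body)

# The response is again just an XML document, here wrapping a struct.
response_body = xmlrpc.client.dumps(
    ({"title": "Post du Jour", "description": "What a day..."},),
    methodresponse=True,
)
result, _ = xmlrpc.client.loads(response_body)
```

Printing request_body shows the methodCall document, with its methodName and params elements.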

I then created a dispatcher in XPL that calls individual pipelines based on the XML-RPC method requested. So far, in the order in which I introduced them: blogger.getUsersBlogs, metaWeblog.getCategories, metaWeblog.newPost, metaWeblog.getRecentPosts, metaWeblog.getPost, metaWeblog.editPost, blogger.deletePost.
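The dispatching idea can be sketched as follows. The real dispatcher is written in XPL and calls pipelines. This Python version only mirrors the logic with a lookup table, and the handler functions, their return values and the fault code are placeholders I made up.

```python
import xmlrpc.client


def get_users_blogs(params):
    # Stub handler: a real one would look the blogs up in the database.
    return [{"blogid": "blog123", "blogName": "My Cool Blog",
             "url": "http://example.org/blog123"}]


def delete_post(params):
    # Stub handler: pretend the post was deleted.
    return True


# One entry per supported XML-RPC method (only two shown here).
HANDLERS = {
    "blogger.getUsersBlogs": get_users_blogs,
    "blogger.deletePost": delete_post,
}


def dispatch(request_body):
    """Parse an XML-RPC request and return the XML-RPC response document."""
    params, method = xmlrpc.client.loads(request_body)
    handler = HANDLERS.get(method)
    if handler is None:
        # Unknown methods are answered with an XML-RPC fault.
        fault = xmlrpc.client.Fault(1, "Unknown method: %s" % method)
        return xmlrpc.client.dumps(fault)
    return xmlrpc.client.dumps((handler(params),), methodresponse=True)


request = xmlrpc.client.dumps(("appkey", "ebruchez", "private"),
                              methodname="blogger.getUsersBlogs")
response = dispatch(request)
```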

I should note that the XML-RPC format is very verbose. While well suited to mapping back and forth to good old function- or method-based languages, it is far from a document-based approach to services, which would have been much more natural here. It is therefore almost necessary to introduce a conversion layer from the XML-RPC API to a simpler format for internal use. Consider the following short example:

  <params>
    <param>
      <value>
        <string>705B0BBB-8DF7-DB98-1FA1-B416860AA61B</string>
      </value>
    </param>
    <param>
      <value>
        <string>ebruchez</string>
      </value>
    </param>
    <param>
      <value>
        <string>private</string>
      </value>
    </param>
  </params>

This could be represented, in a document-oriented service, as follows:

  <edit-post>
    <post-id>705B0BBB-8DF7-DB98-1FA1-B416860AA61B</post-id>
    <username>ebruchez</username>
    <password>private</password>
  </edit-post>

Which do you prefer? Unfortunately, the conversion task between one format and another cannot be easily automated, because the XML-RPC format's parameters do not have names.
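Because the parameters are positional, a conversion layer has to carry a hand-written table of names for each method. Here is a minimal Python sketch of that idea, under the assumption that all parameters are simple strings. The PARAM_NAMES table and the to_document function are hypothetical, and the table only covers the three parameters from the simplified example above.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

# Hand-maintained mapping: method name -> (root element, parameter names in order).
# This is exactly the information XML-RPC itself does not carry.
PARAM_NAMES = {
    "metaWeblog.editPost": ("edit-post", ["post-id", "username", "password"]),
}


def to_document(request_body):
    """Convert an XML-RPC request into a document-oriented representation."""
    params, method = xmlrpc.client.loads(request_body)
    root_name, names = PARAM_NAMES[method]
    root = ET.Element(root_name)
    # Pair each positional parameter with its name from the table.
    for name, value in zip(names, params):
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


request = xmlrpc.client.dumps(
    ("705B0BBB-8DF7-DB98-1FA1-B416860AA61B", "ebruchez", "private"),
    methodname="metaWeblog.editPost",
)
document = to_document(request)
```

The resulting document is the edit-post element shown above, serialized on a single line.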

Note that I also wrote short Relax NG schemas to validate XML-RPC requests and responses. Those schemas are hooked up in the XML-RPC dispatcher written in XPL and make sure we do not process or generate garbage. Long live Relax NG and XPL!

The bottom line is that the logic implementing the APIs mentioned above is now in place and hooked up to eXist.

Overall Architecture of the Blog Sample

First Page

The recent-posts page was almost trivial to implement. It consists of a page model that calls a data access pipeline that retrieves the posts for a given user's blog. The page view just formats this data in HTML.
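The split between page model and page view can be sketched like this. In the sample both are XML pipelines and the data comes from eXist. The Python functions, the in-memory list of posts, the limit of 10 and the choice to hide unpublished posts are all assumptions made for illustration.

```python
from html import escape


def recent_posts_model(posts, blog_id, limit=10):
    """Page model: select the published posts of one blog, newest first."""
    selected = [p for p in posts if p["blog_id"] == blog_id and p["published"]]
    # ISO 8601 timestamps sort correctly as plain strings.
    selected.sort(key=lambda p: p["date_created"], reverse=True)
    return selected[:limit]


def recent_posts_view(posts):
    """Page view: format title, date and content of each post as HTML."""
    parts = []
    for p in posts:
        parts.append(
            "<div class='post'><h2>%s</h2><p>%s</p><div>%s</div></div>"
            % (escape(p["title"]), escape(p["date_created"]),
               escape(p["description"]))
        )
    return "<html><body>%s</body></html>" % "".join(parts)


# Hypothetical data standing in for the posts collection.
SAMPLE_POSTS = [
    {"blog_id": "blog123", "title": "Older Post", "description": "Earlier...",
     "published": True, "date_created": "2004-03-27T09:00:00"},
    {"blog_id": "blog123", "title": "Post du Jour", "description": "What a day...",
     "published": True, "date_created": "2004-03-28T10:00:00"},
    {"blog_id": "blog123", "title": "Draft", "description": "Not ready",
     "published": False, "date_created": "2004-03-29T08:00:00"},
    {"blog_id": "other", "title": "Elsewhere", "description": "Another blog",
     "published": True, "date_created": "2004-03-28T11:00:00"},
]

recent = recent_posts_model(SAMPLE_POSTS, "blog123")
page = recent_posts_view(recent)
```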

What Next?

I think that the following tasks are required to make the application usable:

  • XML-RPC Authentication. Right now, no authentication is done at all for the XML-RPC calls. I can only reiterate my wish that simple HTTP authentication could be used!

  • Comments Page. A page showing an individual post with simple text comments.

  • Admin Page. This is needed to create a blog and related categories.

The source code is available from CVS, under src/examples/web/examples/blog.

2 comments:

  1. Hi Eric,
    Excellent idea. My thought to make the storage of the Weblog entry easier... Why not use RSS 2.0 tags directly (or Atom)? This way the RSS feed would be very resource-efficient (either an XSLT or a SAX processor that filters out the elements that are not needed). A good intro can be found here: http://blogs.law.harvard.edu/tech/rss
    Of course the format needs to have additional fields, since RSS doesn't contain the full text or comments. What do you think?
    ;-) stw

  2. So far I hadn't looked at Atom, but I believe after a first look that I am more inclined to like what they are doing than all the mess we've had so far with RSS and the Blogger API.

    Converting from one format to another is going to be trivial anyway, but I agree that if we can follow an existing format, we'll be better off.
