Adventures in ad hoc RDFa development for MusicBrainz

This post is intended to describe, in some detail, the implementation of RDFa within the MusicBrainz NGS codebase.  The casual reader will perhaps not be interested in these details.  The intended audience includes future LinkedBrainz/MusicBrainz developers or other developers faced with the task of injecting RDFa into a Model-View-Template web application.

A brief review of MVT

The Model-View-Template (MVT) framework is an approach to web application development underlying popular web development frameworks like Ruby on Rails and Django (note that MVT is often referred to as Model-View-Controller or MVC, but the concept is the same and our opinion is that MVT is a more appropriate name).  Put simply, MVT divides development into three distinct, modular categories, which helps keep code concise and portable.  The model portion relates to the database model - the tables and relations in a relational database and the queries used to access and update them.  The views portion of the code includes arbitrary business logic for manipulating data from the model and preparing it for presentation or storage.  The templates portion is concerned with page layout and formatting information for the end user.  Templates are populated dynamically by views to generate HTML pages.

RDFa breaks from the clear distinctions of MVT.  RDFa injects elements of the RDF model into the templates layer of the MVT framework.  Further complicating matters in our application, the RDF model we are using is not the same model used in the MVT - we are embedding a translation of the MVT relational data model to RDF.  

The RDFa Macros solution

We will now describe our ad-hoc approach to embedding RDFa into an MVT web application.  Our solution is not particularly elegant (note the use of the adjective "ad-hoc") but so far it seems practical and functional.

The MusicBrainz NGS server uses the Catalyst web development framework - a Perl framework conceptually similar to Rails - which adheres to the MVT distinctions we discussed earlier.  Because our model is a translation of the relational data model into an RDF data model, we decided to involve only the templates layer of the MVT in our implementation - leaving the model and view layers essentially untouched.  The Template Toolkit (TT) package is used as the templating language in the MusicBrainz NGS server.  Template Toolkit is a powerful and flexible templating language implemented in Perl.  It allows for the use of template macros - small re-usable template functions.  We make extensive use of TT macros in our implementation.

In the MusicBrainz NGS code base, templates are located in the root directory and organized by entity pages.  For example, the templates that apply to music artist pages are found in root/artist while the templates that apply to release pages are in root/release.  A series of macros for creating href links, formatting dates, and similar tasks exists in the file root/components/common-macros.tt.  Our general approach is to hijack these macros and replace them with variants that produce suitable RDFa.

All RDFa macros are contained in root/components/rdfa-macros.tt.  All the RDB to RDF modeling decisions are made in this file as well.  Each macro name is prefixed with rdfa_ for easy identification since these macros are called throughout the various templates.  The general organizational principle for the code in rdfa-macros.tt is as follows: more general macros and macros that constitute a modeling decision are found in the beginning of the file while more ad-hoc macros are towards the end of the file grouped by the entity pages to which they apply.
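To make the macro-replacement idea concrete, here is a minimal sketch written in Python rather than Template Toolkit so that it can be run standalone.  It is not the code in rdfa-macros.tt.  The function names, the URL layout, and the entity-to-class table are all assumptions made for illustration; only the rdfa_ prefix convention comes from the description above.

```python
from html import escape

# Hypothetical mapping from entity type to an RDF class (as a CURIE).
# The real modeling decisions live in rdfa-macros.tt and may differ.
RDF_CLASS = {
    "artist": "mo:MusicArtist",
    "release": "mo:Release",
}


def link_entity(entity_type, mbid, name):
    """Plain link, standing in for a macro from common-macros.tt."""
    href = "/{}/{}".format(entity_type, escape(mbid, quote=True))
    return '<a href="{}">{}</a>'.format(href, escape(name))


def rdfa_link_entity(entity_type, mbid, name):
    """RDFa variant: the same link, plus a subject, a type and a name property."""
    href = "/{}/{}".format(entity_type, escape(mbid, quote=True))
    return (
        '<a href="{href}" about="{href}" typeof="{cls}">'
        '<span property="foaf:name">{name}</span></a>'
    ).format(href=href, cls=RDF_CLASS[entity_type], name=escape(name))


if __name__ == "__main__":
    print(link_entity("artist", "1234", "Example Artist"))
    print(rdfa_link_entity("artist", "1234", "Example Artist"))
```

The two functions take the same arguments, so a template that called the plain version can switch to the rdfa_ version without any change to the model or view layers, which is the property the approach relies on.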

There was some pressure from the MusicBrainz development team to keep the "bloating" due to RDFa to a minimum - larger pages create a heavier load on the server.  This has led to some inter-dependent macros.  These generally follow a pattern where one macro declares namespaces and another macro uses these namespaces to minimize the size of the resulting HTML.  Comments in the code should indicate where this is the case.
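The inter-dependence between a namespace-declaring macro and the macros that rely on it can be sketched as follows.  This is again a Python illustration, not the Template Toolkit code itself; the prefix table and the function names are assumptions.

```python
# Hypothetical prefix table; the namespaces actually declared in
# rdfa-macros.tt may differ.
NAMESPACES = {
    "mo": "http://purl.org/ontology/mo/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def rdfa_namespaces():
    """Emit the xmlns declarations once, on an enclosing element."""
    return " ".join(
        'xmlns:{}="{}"'.format(prefix, uri)
        for prefix, uri in sorted(NAMESPACES.items())
    )


def rdfa_typeof(curie):
    """Emit a compact typeof attribute.

    Only meaningful inside an element that carries rdfa_namespaces(),
    which is the inter-dependence described above.
    """
    prefix = curie.split(":", 1)[0]
    if prefix not in NAMESPACES:
        raise KeyError("undeclared prefix: " + prefix)
    return 'typeof="{}"'.format(curie)


def expand(curie):
    """Expand a CURIE to the full URI it abbreviates."""
    prefix, local = curie.split(":", 1)
    return NAMESPACES[prefix] + local


if __name__ == "__main__":
    print("<div {}>".format(rdfa_namespaces()))
    print("  <span {}>...</span>".format(rdfa_typeof("mo:Release")))
    print("</div>")
    print("full URI would be:", expand("mo:Release"))
```

The declaration costs a fixed number of bytes per page, while every later attribute saves the length of the namespace URI, so the saving grows with the number of annotated elements on the page.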

There is also a series of tests for the RDFa in the MusicBrainz NGS code base.  These exist in t/rdfa/ and utilize the test routines found in lib/MusicBrainz/Server/Test/RDFa.pm.  The tests load some test data into a test database, spin up a test web server, parse the RDFa of a given page, and verify a few basic things about the resulting RDF model.  These tests should ideally be expanded to cover more detailed aspects of the RDF models for each respective core entity.
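The "parse the page, then check the type" step might look roughly like the sketch below.  It is a Python illustration using only the standard library, not the Perl routines in RDFa.pm, and it is deliberately not a full RDFa parser: it only collects about/typeof pairs that appear on the same element.  The sample page and the class name are assumptions.

```python
from html.parser import HTMLParser


class TypeofCollector(HTMLParser):
    """Collect (subject, type) pairs from elements with both about and typeof."""

    def __init__(self):
        super().__init__()
        self.types = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        about = attr.get("about")
        typeof = attr.get("typeof")
        if about and typeof:
            # typeof may list several whitespace-separated types
            for curie in typeof.split():
                self.types.append((about, curie))


def rdf_types(html_text):
    """Return every (about, typeof) pair found in the given HTML."""
    collector = TypeofCollector()
    collector.feed(html_text)
    collector.close()
    return collector.types


# Hypothetical fragment of a rendered artist page.
PAGE = (
    '<div about="/artist/123" typeof="mo:MusicArtist">'
    '<span property="foaf:name">Example</span></div>'
)

if __name__ == "__main__":
    assert ("/artist/123", "mo:MusicArtist") in rdf_types(PAGE)
    print("artist page carries the expected rdf:type")
```

A real test would fetch the page from the test server and use a complete RDFa parser, so that properties inherited from parent elements are also resolved.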

Future work and possible alternatives

Expanding the RDFa tests we describe above is a good place to begin future work.  Essentially, a given set of test SQL is loaded into the test database from t/sql for each test.  Currently only a few aspects of the RDF model are verified (e.g. verifying the correct rdf:type predicate).  Each entity will require unique tests.  For example, the release test should verify that the release event exists and has an appropriate date.

The most obvious way to improve the elegance of our solution is to link the RDFa implementation directly with the D2RQ mappings that are being developed.  Currently we must maintain mappings in the RDFa macros file and in a D2RQ Turtle file - not very elegant and prone to discrepancies.  We have begun a set of unit tests written in Python to try to help identify discrepancies, but these are in early stages and are only something of a band-aid.  Ideally, all modeling decisions in rdfa-macros.tt would be removed and another template file would be generated algorithmically from a parsing of the D2RQ mapping file.
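The kind of discrepancy check described here can be sketched in a few lines of Python.  This is not the unit test suite mentioned above, and it does not parse Template Toolkit or D2RQ Turtle; it assumes both sources have already been reduced to simple entity-to-class dictionaries.  All of the sample data is hypothetical.

```python
def mapping_discrepancies(macro_map, d2rq_map):
    """Return (entity, class_in_macros, class_in_d2rq) for every disagreement.

    An entity missing from one side is reported with None on that side.
    """
    problems = []
    for entity in sorted(set(macro_map) | set(d2rq_map)):
        in_macros = macro_map.get(entity)
        in_d2rq = d2rq_map.get(entity)
        if in_macros != in_d2rq:
            problems.append((entity, in_macros, in_d2rq))
    return problems


# Hypothetical extracts of the two mappings.
MACRO_MAP = {
    "artist": "mo:MusicArtist",
    "release": "mo:Release",
    "label": "mo:Label",
}
D2RQ_MAP = {
    "artist": "mo:MusicArtist",
    "release": "mo:Record",
    "recording": "mo:Signal",
}

if __name__ == "__main__":
    for entity, in_macros, in_d2rq in mapping_discrepancies(MACRO_MAP, D2RQ_MAP):
        print("{}: macros={} d2rq={}".format(entity, in_macros, in_d2rq))
```

Generating the template file from the D2RQ mapping would make a check like this unnecessary, since there would then be only one place where each modeling decision is written down.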

One alternative to this approach would involve a much deeper alteration of the MusicBrainz codebase.  Instead of just involving the template layer of the MVT, we would involve the model and view layers as well.  When a page is requested, a set of view extensions would be called in addition to the current views.  These new views would provide a translation into the RDF model based on a D2RQ mapping.  That model would then somehow be serialized into RDFa.  One option would be the use of "invisible" elements that exist in the HTML but are simply not displayed (e.g. <div style="display:none"> ... </div>).  However, this would likely violate our minimal RDFa bloating requirement.
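As a rough illustration of the hidden-element option, the Python sketch below serializes a list of triples into a non-displayed block.  It assumes the view layer has already produced the triples with literal objects only; the function name and the sample triple are hypothetical.

```python
from html import escape


def triples_to_hidden_rdfa(triples):
    """Serialize (subject, predicate, literal) triples as non-displayed RDFa."""
    spans = [
        '<span about="{}" property="{}" content="{}"></span>'.format(
            escape(subject, quote=True),
            escape(predicate, quote=True),
            escape(literal, quote=True),
        )
        for subject, predicate, literal in triples
    ]
    return '<div style="display:none">' + "".join(spans) + "</div>"


if __name__ == "__main__":
    print(triples_to_hidden_rdfa([("/artist/123", "foaf:name", "Example Artist")]))
```

Because the visible markup is left untouched, every value that is both displayed and described would appear in the page twice, which is where the extra page weight would come from.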

Hopefully this post has provided some clarity about the implementation of RDFa in MusicBrainz NGS.  Perhaps our story will prove helpful for other projects, providing ideas for what to do (or what not to do).  If you have any thoughts or questions about our approach, we'd love to hear them - comments are open!