The future of MusicBrainz URIs

One of the great strengths of the MusicBrainz project is disambiguation.  Each artist, label, release, release group, and recording gets its own universally unique identifier or, in MusicBrainz parlance, an MBID.  This allows us to easily distinguish between two recordings of the same work or two artists that happen to have the same name.

This is of particular importance in the context of linked data.  To adequately model music-related information on the web of data we must have Uniform Resource Identifiers (URIs) for artists, recordings, works, etc.  In the LinkedBrainz project, we aim to make MusicBrainz-based URIs a hub for music-related linked data.  To this end, we have embedded RDFa into the MusicBrainz NGS release, making the next generation of MusicBrainz URIs dereferenceable in the linked data sense.

Thing vs. Page about Thing

Let us digress for just a moment to discuss the subtle but important distinction in linked data between "the thing" and "the page about the thing".  Consider the MusicBrainz page about James Brown.  This page contains lots of information about James Brown, "The Godfather of Soul".  But this page is not James Brown.  In our linked data, we will want to make statements about James Brown (e.g. "James Brown was born in 1933").  Of course, the page about James Brown was not born in 1933.  Therefore we need to make sure the URI for James Brown is distinct from the URI for the page about James Brown.  In the MusicBrainz NGS server this is achieved using a hash URI scheme.

URIs of the past

In the current MusicBrainz server, the URI for the page about James Brown is of this form:

 http://musicbrainz.org/artist/20ff3303-4fe2-4a47-a1b6-291e26aa3438.html

Note the '.html' suffix.  A linked-data-compatible cool URI scheme could have been implemented by dropping this suffix to form the URI for James Brown himself (rather than the page about James Brown).  However, the 303 redirects and dereferencing that this type of URI scheme requires were never implemented.

URIs of the future

After the MusicBrainz NGS release, the URI for the page about James Brown will become:

 http://musicbrainz.org/artist/20ff3303-4fe2-4a47-a1b6-291e26aa3438

While the URI for James Brown will be:

 http://musicbrainz.org/artist/20ff3303-4fe2-4a47-a1b6-291e26aa3438#_

Note the "#_" suffix that distinguishes the two URIs.  This is a nod to Perl syntax, where "$_" refers to the "default input".  It is a somewhat arbitrary decision.  We could have chosen "#artist", but "artist" already appears in the URI.  We could have used the artist name to make a more human-readable URI, but artist names often contain Unicode characters, and percent-encoding would make them long and unreadable.  So we stuck with short and sweet.
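
As a rough illustration of the hash URI scheme, here is a minimal Python sketch that builds both URIs from an entity type and an MBID.  The helper names page_uri and thing_uri are made up for this example and are not part of any MusicBrainz code.  The last lines show why the scheme works: the fragment is never sent to the server, so stripping it from the URI for James Brown yields the URI for the page about him.

```python
from urllib.parse import urldefrag

BASE = "http://musicbrainz.org"  # host used in the examples above


def page_uri(entity_type: str, mbid: str) -> str:
    """URI of the page about an entity (hypothetical helper)."""
    return f"{BASE}/{entity_type}/{mbid}"


def thing_uri(entity_type: str, mbid: str, fragment: str = "_") -> str:
    """URI of the entity itself: the page URI plus a fragment (hypothetical helper)."""
    return f"{page_uri(entity_type, mbid)}#{fragment}"


mbid = "20ff3303-4fe2-4a47-a1b6-291e26aa3438"
james_brown = thing_uri("artist", mbid)

# An HTTP client drops the fragment before making a request,
# so dereferencing the thing URI fetches the page about the thing.
document, fragment = urldefrag(james_brown)
print(james_brown)
print(document)
```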

There are a few notable exceptions.  For a MusicBrainz release, in addition to the page about the release and the release itself, we have a third URI for the release event.  This is of the form:

 http://musicbrainz.org/release/aaa64bfa-2f93-46ee-b6aa-48baf0775b4b#event

Note we are just changing the suffix to "#event" to represent the release event.  Similar patterns may be adopted to represent, for example, the lyrics of a musical work or the composition event of a work.  A definitive list of these patterns will be collected on the wiki.
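
To make the suffix conventions concrete, the following Python sketch classifies a URI by its fragment.  It covers only the three cases named in this post (no fragment, "#_", and "#event" on a release); the function name classify and the returned labels are invented for illustration, and any further patterns would need to come from the list on the wiki.

```python
from urllib.parse import urlsplit


def classify(uri: str) -> str:
    """Say what a MusicBrainz-style URI denotes (illustrative sketch only)."""
    parts = urlsplit(uri)
    # The first path segment is the entity type, e.g. "artist" or "release".
    entity_type = parts.path.strip("/").split("/")[0]
    if parts.fragment == "":
        return f"the page about the {entity_type}"
    if parts.fragment == "_":
        return f"the {entity_type} itself"
    if entity_type == "release" and parts.fragment == "event":
        return "the release event"
    # Other suffixes are not defined in this post.
    return "unknown"


release = "http://musicbrainz.org/release/aaa64bfa-2f93-46ee-b6aa-48baf0775b4b"
print(classify(release))
print(classify(release + "#_"))
print(classify(release + "#event"))
```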

I have started using #id to

I have started using #id to identify my things in any new RDF I have been creating. I used to be ideologically opposed to using 'id' as the identifier because all URIs are identifiers and this is just an instance of an identifier for a particular thing. However using 'id' is short, convenient and universal and you can quickly understand what it refers to.

Here is an example of the new identifier scheme that I am now using for dbpedia lite:
http://dbpedialite.org/things/26471#id

It is also likely that the BBC will start using #id for future projects.

nick.
