The work of the project is divided into five work packages (WPs): three cover the technical work, one covers dissemination, and one covers management.
WP1: Convert MusicBrainz to linked data (Months 1-6)
- WP1.1 Mapping of MusicBrainz NGS database schema to RDF: The MusicBrainz community is in the final stages of creating a more expressive schema for describing music-related metadata, called the Next Generation Schema (NGS). While NGS provides clear structure and semantics for the MusicBrainz data, it does not directly provide Linked Data. To adhere to the principles of Linked Data, the MusicBrainz NGS must be expressed using RDF, which requires a mapping of the NGS to appropriate OWL/RDFS ontologies. This task will involve feedback from the JISC community through discussions on the Linking Open Data mailing list, as well as from the Music Ontology Specification Group and the MusicBrainz community. Soliciting community feedback on early iterations of the mapping will ensure the use of the most appropriate and widely accepted ontologies and will encourage use of the Linked Data resources produced by the project.
- WP1.2 Implementation of serving of RDF: We will contribute appropriate content negotiation code and/or RDFa generation code to the MusicBrainz server code base. Most of the resources described by the MusicBrainz project are non-information resources, that is, their URIs refer to real-world things such as music artists or albums. Following Linked Data practice, these URIs should provide 303 redirects to appropriate information resources when they are dereferenced via HTTP. Depending on the Accept header in the HTTP request, the redirect will point to either a human-readable HTML page or a machine-readable RDF document. If for some unforeseen reason including redirects and content negotiation in the MusicBrainz server proves to be impossible, RDFa can be embedded directly in the MusicBrainz HTML documents.
- WP1.3 Linking to other data sets: We will provide appropriate links to other datasets in the MusicBrainz RDF data to best meet the Linked Data recommendations. The MusicBrainz NGS contains a wealth of links to external resources including BBC Music, Discogs, IMDb, Wikipedia, and Myspace. These links can be used to create links to the corresponding Linked Data resources (e.g. DBpedia resources). Additional links can be generated automatically using the graph matching approach proposed by Raimond (PhD, 2009), and then checked manually using the tried-and-tested crowd-sourcing framework that powers the MusicBrainz project.
- Deliverable D1.1 (M2): RDF mapping of MusicBrainz NGS schema
- Deliverable D1.2 (M6): Publication of the MusicBrainz metadatabase to the semantic web
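The redirect behaviour described in WP1.2 can be illustrated with a minimal sketch. It is not the MusicBrainz server implementation: the function names, the `.rdf`/`.ttl`/`.html` document suffixes and the example path are assumptions made purely for illustration. The sketch parses the Accept header, ranks the requested media types by their q-values, and returns a 303 redirect to either an RDF document or an HTML page.

```python
# Hypothetical mapping from media type to document suffix; the real
# MusicBrainz URL layout may differ.
EXTENSIONS = {
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
    "text/html": ".html",
    "*/*": ".html",
}


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q-value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def redirect_for(resource_path, accept_header):
    """Return (status, location) for a non-information resource URI."""
    for media_type, q in parse_accept(accept_header or ""):
        if q > 0 and media_type in EXTENSIONS:
            return 303, resource_path + EXTENSIONS[media_type]
    # Fall back to the human-readable page when nothing matches.
    return 303, resource_path + ".html"
```

For example, `redirect_for("/artist/abc", "application/rdf+xml")` would redirect to the hypothetical `/artist/abc.rdf`, while a typical browser Accept header would redirect to `/artist/abc.html`.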
WP2: Creation of prototypes and tutorial material (Months 6-11)
- WP2.1 Creation of SPARQL endpoint: We will create and maintain a SPARQL endpoint that allows users to query the MusicBrainz RDF. In our previous work with DBTune.org we have served RDF from a PostgreSQL database using a D2R server, which acts as a translation layer between the relational database and a SPARQL endpoint. While the D2R software is of great utility, its performance as a SPARQL engine is limited by the underlying database schema. We plan instead to perform an RDF dump from the MusicBrainz database into a purpose-built triple store (e.g. 4store). We will need to address the scalability issues which arise when serialising a database of this size. If the purpose-built triple store provides significant performance gains over a D2R server configuration, we will implement an infrastructure that automates the RDF dump, so that changes to the MusicBrainz dataset propagate to the SPARQL endpoint at regular intervals. This will keep the SPARQL endpoint sustainable and up to date.
- WP2.2 Production of tutorial materials: We will design and produce tutorial materials, including sample SPARQL queries, screencasts and videos, describing how the structured data can be queried and accessed by third party tools and services. These tutorial materials will be released on our web site, as well as being presented at our workshops (see WP4). For the first workshop (M6), the more general tutorial material on music data and the semantic web presented at ISMIR 2009 will be refreshed. The second workshop (M11) will focus specifically on the project outputs.
- Deliverable D2.1 (M10): SPARQL endpoint serving MusicBrainz data
- Deliverable D2.2 (M6): Semantic web and music data tutorial materials
- Deliverable D2.3 (M11): Tutorial materials on using SPARQL to query the MusicBrainz data
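The RDF dump described in WP2.1 can be sketched as a streaming serialisation to N-Triples, a line-based format suited to bulk loading into a triple store. This is only an illustration under stated assumptions: the row layout (`gid`, `name`), the artist URI pattern and the choice of `mo:MusicArtist` and `foaf:name` are assumptions, since the actual mapping is to be agreed in WP1.1. Writing one row at a time keeps memory use constant regardless of the size of the database.

```python
import io

MO = "http://purl.org/ontology/mo/"          # Music Ontology namespace
FOAF = "http://xmlns.com/foaf/0.1/"          # FOAF namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def artist_triples(row):
    """Yield N-Triples lines for one artist row.

    `row` is a dict with hypothetical keys 'gid' and 'name'; the URI
    pattern below is an assumption, not an agreed MusicBrainz scheme.
    """
    subject = "<http://musicbrainz.org/artist/%s#_>" % row["gid"]
    yield "%s <%s> <%sMusicArtist> ." % (subject, RDF_TYPE, MO)
    yield '%s <%sname> "%s" .' % (subject, FOAF, escape_literal(row["name"]))


def dump(rows, out):
    """Write triples row by row to `out`; return the number written."""
    count = 0
    for row in rows:
        for line in artist_triples(row):
            out.write(line + "\n")
            count += 1
    return count


def dump_to_string(rows):
    """Convenience wrapper for small samples, e.g. in tutorials."""
    buffer = io.StringIO()
    dump(rows, buffer)
    return buffer.getvalue()
```

In practice `rows` would be an iterator over a database cursor and `out` a file handle, so that the resulting file can be loaded into the triple store at regular intervals.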
WP3: Evaluation of opportunities and barriers (Months 1-12)
- WP3.1 Trust and provenance: The MusicBrainz database tracks the provenance of the data (complete edit history), so it would be possible to publish this with the MusicBrainz data, in consultation with the MusicBrainz community. We will investigate other trust and/or provenance issues as necessary in conjunction with WP1 and WP2.
- WP3.2 Opportunities and barriers: Issues will be tracked continually throughout the project as they arise (e.g. scalability in WP2.1). They will initially be discussed at weekly project meetings and notable issues will be logged on the project blog. These records will form the basis for a report which will be compiled at the end of the project.
- Deliverable D3.1 (M12): Report on opportunities and barriers in publishing the MusicBrainz database as a semantic web resource.
WP4: Dissemination and engagement (Months 1-12)
Different stakeholders will be engaged at various stages of the project. In the initial stages, the MusicBrainz and Linked Open Data communities will be engaged in the design of the RDF mapping (see WP1.1). Engagement with the JISC community will take place throughout the project via JISC programme events and the networking of the JISC Developer Community. End users in the higher education sector will be involved via two workshops in months 6 and 11.
- WP4.1 Public engagement: A project web site and blog will be established and maintained to report the progress of the project to the general public.
- WP4.2 Workshop 1: The first workshop will be held in conjunction with the annual Digital Music Research Network workshop at Queen Mary, which is widely attended by national and some international academics in the Digital Music field. We will present the results of publishing the MusicBrainz data (WP1), introduce our plans for the remainder of the project, and seek feedback from the end user community concerning the direction of the project. We shall also present an introductory tutorial on music information and the semantic web for those who are new to the field. The workshops will be open to members of the research community and the general public, and will be advertised in relevant mailing lists and web sites.
- WP4.3 Workshop 2: The second workshop will be held at the end of M11 and seek to promote the use of the linked data resources in the Music and Music Informatics communities. Advanced tutorials will be presented, explaining how to build tools and services using the SPARQL endpoint.
- WP4.4 Other dissemination activities: Outcomes of the work will be published in journals (e.g. Computer Music Journal, Journal of New Music Research) and major international conferences (e.g. International Society for Music Information Retrieval Conference), as well as JISC's programme meetings and other events. We will also make the project known through our related projects such as OMRAS-2 (www.omras2.org), Sustainable Software for Audio and Music Research (www.soundsoftware.ac.uk) and NEMA (nema.lis.uiuc.edu). In M12, the RA will visit end user sites to work with users who attended the workshops and require assistance in getting started with building applications.
- Deliverable D4.1 (M1): Establishment of project blog and web site, Core Resources Form and public version of project plan
- Deliverable D4.2 (M6,M11): Delivery of two workshops
- Deliverable D4.3: At least one publication in a refereed journal and at an international conference. (Owing to the short duration of the project, these are likely to appear after the project has ended.)
WP5: Management (Months 1-12)
The project will be managed on a day-to-day basis by the PI, with project meetings held weekly to assess progress and problems. This has been our practice throughout OMRAS-2. At the workshops, and at conferences that we visit, we will hold meetings with key representatives of our user communities, and use their feedback to steer remaining parts of the project and future work.
- Deliverable D5.1: Reports, budgets, plans as required by JISC
Sustainability
Sustainability of this work is ensured by publishing the data on the MusicBrainz server itself. MusicBrainz has been running for about 10 years, and has support from major organisations such as Google and the BBC. In the unlikely event of MusicBrainz ceasing to exist, the database will be set up on the DBTune.org site. Software created in the project will be maintained by our project Sustainable Software for Digital Music and Audio Research, which is funded by EPSRC until 2014, and has plans for supporting long-term sustainability of research software. This strategy allows the project to have a continuing impact after it has ended.