The question was recently posed: how many pages will contain RDFa in the new MusicBrainz release? Here are some "back of the napkin" calculations.
In the most recent NGS database dump we have:
- 539,988 artists
- 710,665 release groups
- 850,853 releases
- 9,138,660 recordings
- 190,103 works
- 42,300 labels
First we can sum the release groups, releases, recordings, works, and labels and multiply by two, because each of these entities has RDFa on both its main page and its /details page.
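As a quick check on this first step, here is a minimal Python sketch of the main-page and /details tally. It assumes, as the paragraph does, exactly two RDFa pages per release group, release, recording, work, and label. The dictionary keys are illustrative names, not MusicBrainz identifiers.

```python
# Entity counts from the NGS database dump quoted above
COUNTS = {
    "release_groups": 710_665,
    "releases": 850_853,
    "recordings": 9_138_660,
    "works": 190_103,
    "labels": 42_300,
}

PAGES_PER_ENTITY = 2  # main page + /details page, each carrying RDFa


def entity_pages(counts: dict[str, int]) -> int:
    """Pages contributed by the entities' own main and /details pages."""
    return PAGES_PER_ENTITY * sum(counts.values())


print(entity_pages(COUNTS))
```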
But then with artist pages, it really starts to explode. Artist listings are paginated at 50 entities per page. Every release group, release, recording and work is credited to an artist, so we count these again. We can estimate the RDFa pages associated with artists as:
(release groups/50) + (releases/50) + (recordings/50) + (works/50)
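The artist-listing formula can be sketched the same way. This assumes, as above, a flat 50 items per listing page and divides each type's total by 50, so it yields a fractional page count. The names are illustrative.

```python
PAGE_SIZE = 50  # entities listed per artist page

# Entity types that appear in artist listings (counts from the dump above)
LISTED = {
    "release_groups": 710_665,
    "releases": 850_853,
    "recordings": 9_138_660,
    "works": 190_103,
}


def artist_listing_pages(listed: dict[str, int], page_size: int = PAGE_SIZE) -> float:
    """Napkin estimate: each type's total divided by the page size, summed over types."""
    return sum(n / page_size for n in listed.values())


print(round(artist_listing_pages(LISTED)))
```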
Adding the two parts gives a whopping 22.1 million pages of RDFa (21,865,162 main and /details pages plus roughly 217,806 artist listing pages)!!! Note, however, that there is a fair amount of overlap in terms of triples. For example, most of the triples in the release pages are also present in the artist /releases pages. Please let me know if I'm calculating this incorrectly; I know it seems really high! But if my math is correct, this is actually an underestimate: an artist with fewer than 50 releases, recordings, works, etc. still needs a whole page for that listing (as does any partly filled last page), so dividing the totals by 50 undercounts the artist pages.
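To see why dividing the totals by 50 undercounts, compare it with paginating each artist separately. The per-artist counts below are hypothetical, chosen only to illustrate the effect. The sketch assumes each artist's listing is split into pages of 50, with a partly filled last page counting as a whole page.

```python
import math

PAGE_SIZE = 50


def napkin_pages(per_artist_counts: list[int]) -> float:
    # Divide the grand total by the page size, as in the estimate above
    return sum(per_artist_counts) / PAGE_SIZE


def actual_pages(per_artist_counts: list[int]) -> int:
    # Each artist's listing is paginated separately, so an artist with n items
    # needs ceil(n / 50) pages (an artist with no items contributes 0 here)
    return sum(math.ceil(n / PAGE_SIZE) for n in per_artist_counts)


# Hypothetical per-artist release counts, for illustration only
sample = [3, 120, 7, 49]
print(napkin_pages(sample), actual_pages(sample))
```

Here the napkin formula gives 3.58 pages while per-artist pagination needs 6, because three of the four hypothetical artists each occupy a mostly empty page.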






Wow!! Any more news on this?
I'm presuming this may well point to scalability and resulting performance issues? Or maybe not?
Are you able to provide any more info on the potential implications of this? Sounds like it's an important issue to track.
Cheers, Adrian
JiscEXPO Synthesis Liaison
Cool to have all that data, but perhaps it suggests a need for a less redundant view of it. Perhaps a sitemap (see http://sindice.com/developers/publishing) would help crawlers be more focussed and not grab the same thing many times over?
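For the sitemap idea, a minimal Python sketch using the generic sitemaps.org XML format (not any Sindice-specific extensions) might list only each entity's main page, leaving /details and artist listing pages out so crawlers are pointed at each entity once. The `https://musicbrainz.org/{type}/{mbid}` URL pattern and the placeholder MBIDs are assumptions for illustration. The 50,000-URL cap per file comes from the sitemaps.org protocol; larger sets are split across files and tied together with a sitemap index.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def entity_url(entity_type: str, mbid: str) -> str:
    # Hypothetical URL pattern: canonical main page only, no /details or listings
    return f"https://musicbrainz.org/{entity_type}/{mbid}"


def build_sitemaps(entities: list[tuple[str, str]]) -> list[str]:
    """Split (entity_type, mbid) pairs into sitemap XML documents of at most 50,000 URLs."""
    urls = [entity_url(t, m) for t, m in entities]
    docs = []
    for start in range(0, len(urls), MAX_URLS_PER_SITEMAP):
        chunk = urls[start:start + MAX_URLS_PER_SITEMAP]
        lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{SITEMAP_NS}">']
        lines += [f"  <url><loc>{escape(u)}</loc></url>" for u in chunk]
        lines.append("</urlset>")
        docs.append("\n".join(lines))
    return docs


def build_index(sitemap_urls: list[str]) -> str:
    """Sitemap index pointing at the individual sitemap files."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<sitemapindex xmlns="{SITEMAP_NS}">']
    lines += [f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls]
    lines.append("</sitemapindex>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder MBIDs, not real MusicBrainz entities
    demo = [
        ("release", "00000000-0000-0000-0000-000000000001"),
        ("work", "00000000-0000-0000-0000-000000000002"),
    ]
    print(build_sitemaps(demo)[0])
```

At that cap, the 10,932,581 release group, release, recording, work, and label main pages from the dump would need 219 sitemap files under one index.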
Very impressive, guys, a real achievement for the first half of the project. The question now is: what are you going to do to top this?! I look forward to reading more :) /dff