The question was recently posed: how many pages will contain RDFa in the new MusicBrainz release? Here are some "back of the napkin" calculations.
In the most recent NGS database dump we have:
- 539,988 artists
- 710,665 release groups
- 850,853 releases
- 9,138,660 recordings
- 190,103 works
- 42,300 labels
First we can sum the release groups, releases, recordings, works, and labels and multiply by two, because each of these entities has RDFa on its main page and its /details page.
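As a quick check of this first step, here is a minimal Python sketch. The dictionary keys and the `PAGES_PER_ENTITY` constant are just names I made up for illustration; the counts are the ones from the dump listed above.

```python
# Entity counts from the NGS database dump quoted above.
counts = {
    "release_groups": 710_665,
    "releases": 850_853,
    "recordings": 9_138_660,
    "works": 190_103,
    "labels": 42_300,
}

# Assumption taken from the text: each entity has RDFa on its main page
# and on its /details page, so two pages per entity.
PAGES_PER_ENTITY = 2

entity_pages = PAGES_PER_ENTITY * sum(counts.values())
print(f"{entity_pages:,}")  # 21,865,162
```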
But then with artist pages, it starts to really explode. Artist listings are paginated at 50 entities per page. Every release group, release, recording and work is credited to an artist, so we count these again. We can estimate RDFa pages associated with artists as:
(release groups/50) + (releases/50) + (recordings/50) + (works/50)
That results in a whopping 22.1 million pages of RDFa!!! (That is about 21.9 million entity pages plus roughly 218,000 artist listing pages.) Note that there is a fair amount of overlap in terms of triples, however. For example, most of the triples in the release pages are also present in the artist /releases pages. Please let me know if I'm calculating this incorrectly; I know it seems really high! But if my mathing is correct, this is actually an underestimate, because some artists will have fewer than 50 releases, recordings, works, etc., and a partly filled listing still counts as a whole page.
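Here is a rough sketch of the artist-listing arithmetic and of why dividing by 50 undercounts. The function names and the three-artist sample are hypothetical, made up for illustration; the real per-artist counts would have to come from the database. Following the steps as stated, the total comes to about 22.1 million.

```python
import math

PAGE_SIZE = 50  # listings show 50 entities per page

# Entities credited to artists (labels are not part of the artist formula).
credited = {
    "release_groups": 710_665,
    "releases": 850_853,
    "recordings": 9_138_660,
    "works": 190_103,
}
LABELS = 42_300


def listing_pages_estimate(total, page_size=PAGE_SIZE):
    """Napkin estimate: pretends every listing page is completely full."""
    return total / page_size


def listing_pages_exact(per_artist_counts, page_size=PAGE_SIZE):
    """Rounds each artist's listing up to whole pages."""
    return sum(math.ceil(n / page_size) for n in per_artist_counts)


artist_pages = sum(listing_pages_estimate(n) for n in credited.values())
entity_pages = 2 * (sum(credited.values()) + LABELS)
total_pages = entity_pages + artist_pages
print(f"{round(artist_pages):,} artist listing pages")
print(f"{total_pages / 1e6:.1f} million pages in total")

# Hypothetical sample: three artists with 3, 60 and 120 releases.
sample = [3, 60, 120]
print(listing_pages_estimate(sum(sample)))  # 3.66
print(listing_pages_exact(sample))          # 1 + 2 + 3 = 6
```

In the sample, the napkin formula gives fewer than 4 pages while the per-artist count gives 6, which is the direction of the underestimate described above.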