Work superclusters
I wanted lots of Harry Potter ISBNs, so I was doing some superduping. For example, I superduped 1551922460, the ISBN of the 1999 hardcover Raincoast Books manifestation of Harry Potter and the Prisoner of Azkaban.
If you combine and dedupe the ISBNs that xISBN and thingISBN each return for it, you get 169 ISBNs. If you superdupe them, you get 1083. That is, you take the first number from thingISBN that xISBN didn’t tell you about, and ask xISBN about it. That gives you a new set of ISBNs. Do that for all the thingISBN-only numbers, then reverse the process and ask thingISBN about the numbers xISBN told you about. Repeat, back and forth, until you’ve exhausted both sides and pulled all of the ISBNs out of their different partitions and put them into one big bucket.
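Here is a rough sketch of that back-and-forth in Ruby. It is not the script I actually ran. The superdupe method, the partition_lookup helper and the one-letter "ISBNs" are all made up for illustration, and the two lookups are in-memory stand-ins rather than real calls to xISBN or thingISBN. It also assumes each service sorts ISBNs into consistent partitions, so an ISBN a service has already returned never needs to be sent back to that same service.

```ruby
require "set"

# Superdupe: grow one bucket of ISBNs by bouncing between lookup services.
# `services` maps a name to a callable that takes an ISBN string and returns
# an enumerable of related ISBNs (hypothetical stand-ins for xISBN/thingISBN).
def superdupe(seed, services)
  bucket = Set[seed]
  # ISBNs each service has already been asked about or has already returned.
  covered = Hash.new { |hash, name| hash[name] = Set.new }

  loop do
    progress = false
    services.each do |name, lookup|
      # Only ask a service about ISBNs it has not told us about yet.
      (bucket - covered[name]).each do |isbn|
        next if covered[name].include?(isbn) # picked up earlier in this pass
        result = Set.new(lookup.call(isbn)) << isbn
        covered[name].merge(result)
        bucket.merge(result)
        progress = true
      end
    end
    break unless progress # both sides exhausted
  end
  bucket
end

# Toy service: returns the whole partition that contains the given ISBN.
def partition_lookup(groups)
  ->(isbn) { groups.find { |group| group.include?(isbn) } || [] }
end

# Made-up partitions; the letters stand in for ISBNs.
XISBN_STUB     = partition_lookup([%w[A B], %w[C D], %w[E]])
THINGISBN_STUB = partition_lookup([%w[B C], %w[D], %w[A], %w[E F]])

SUPERDUPED = superdupe("A", "xisbn" => XISBN_STUB, "thingisbn" => THINGISBN_STUB)
p SUPERDUPED.to_a.sort
```

With this toy data, starting from A pulls in B through one service, C through the other, then D, and stops there. E and F stay out of the bucket because nothing links them to A.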
I did this for all of the Harry Potter books, and after careful examination my keen eyes noticed something:
ISBNs Title
1083 isbns-01-philosophers-stone.txt
1083 isbns-02-chamber-of-secrets.txt
1083 isbns-03-prisoner-of-azkaban.txt
1083 isbns-04-goblet-of-fire.txt
1083 isbns-05-order-of-the-phoenix.txt
1083 isbns-06-half-blood-prince.txt
121 isbns-07-deathly-hallows.txt
3 isbns-0x-beedle.txt
53 isbns-0x-scamander.txt
Superduping the ISBNs of the first six Harry Potter books had given 1083 ISBNs for each! And sure enough they’re the same 1083 ISBNs. What’s going on here is that because of boxed sets and other collections, and possibly incorrect work-groupings by hand and by algorithm, once you start looking at one Harry Potter book through xISBN and thingISBN, you end up looking at all of them. Or almost all. The seventh one stands alone, but I think that will change in a year or two, and it will fall in with the others.
This work supercluster includes all of the Harry Potter books, the movies, some soundtracks, some scores, some derivative works like pop-up books, and more. It also includes books by Carl Sagan, Philip Pullman, and C.S. Forester (!).
This supercluster phenomenon is interesting. In part it’s caused by collected editions and boxed sets and no easy standard way of handling two works in one manifestation. Human and machine error are also involved. xISBN and thingISBN aren’t perfect, and superduping their results compounds errors from one into the other, so you can end up with a bit of a mess.
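To see how a single boxed set can glue two works together, here is a small sketch that merges any work-groupings sharing an ISBN. The merge_overlapping method and the labels are hypothetical placeholders, not real ISBNs or real service output. The only point is that one shared item is enough to collapse two otherwise separate groups into one cluster.

```ruby
require "set"

# Merge groups that share at least one ISBN. Each input group stands for one
# service's idea of "a work"; the result is a list of disjoint clusters.
def merge_overlapping(groups)
  clusters = []
  groups.each do |group|
    merged = Set.new(group)
    # Split existing clusters into those touching the new group and the rest.
    touching, clusters = clusters.partition { |cluster| cluster.intersect?(merged) }
    touching.each { |cluster| merged.merge(cluster) }
    clusters << merged
  end
  clusters
end

# Made-up groupings: one service files the boxed set under the first book,
# the other files it under the second.
WORK_GROUPS = [
  %w[stone-hc stone-pb boxed-set],
  %w[chamber-hc chamber-pb],
  %w[boxed-set chamber-pb],
  %w[azkaban-hc]
]

CLUSTERS = merge_overlapping(WORK_GROUPS)
CLUSTERS.each { |cluster| puts cluster.to_a.sort.join(" ") }
```

Here the first two books end up in one five-item cluster because of the boxed set, while the third stays on its own until something links it in.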
(I tried superduping Pride and Prejudice and stopped when I started getting into the complete works of Shakespeare. I’ll post more about that if I try it again, but perhaps all the great works of English literature are in one giant confused FRBRy supercluster.)
Full FRBRization, where relationships between works and aggregate works (such as boxed sets and omnibus editions) are clearly specified, will mean this isn’t a problem. That’ll be a lot of work, though.
Using isbn2marc I found MARC records for 978 of the 1234 total ISBNs.
978 Harry Potter-related MARC records (1 MB MARC)
I ran them through the LC FRBRization tool and put them into OpenFRBR.
~/src/openfrbr$ ./script/console
Loading development environment (Rails 2.1.0)
>> Work.find(:all).size
=> 171
>> Expression.find(:all).size
=> 471
>> Manifestation.find(:all).size
=> 973
>> Person.find(:all).size
=> 22
>> Creation.find(:all).size
=> 138