A weblog following developments around the world in FRBR: Functional Requirements for Bibliographic Records.

Maintained by William Denton, Web Librarian at York University. Suggestions and comments welcome at wtd@pobox.com.


Confused? Try What Is FRBR? (2.8 MB PDF) by Barbara Tillett, or Jenn Riley's introduction. For more, see the basic reading list.

Books: FRBR: A Guide for the Perplexed by Robert Maxwell (ISBN 9780838909508) and Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools edited by Arlene Taylor (ISBN 9781591585091) (read my chapter FRBR and the History of Cataloging).

Work superclusters

Posted by: William Denton, 10 December 2008 7:46 am
Categories: Aggregates, OpenFRBR

I wanted lots of Harry Potter ISBNs, so I was doing some superduping. For example, I superduped 1551922460, the ISBN of the 1999 hardcover Raincoast Books manifestation of Harry Potter and the Prisoner of Azkaban.

If you combine and dedupe the ISBNs that xISBN and thingISBN return for that number, you get 169 ISBNs. If you superdupe them, you get 1083. To superdupe, you take the first number from thingISBN that xISBN didn’t tell you about, and ask xISBN about it. That gives you a new set of ISBNs. Do that for all the thingISBN-only numbers, then reverse the process and ask thingISBN about the numbers xISBN told you about. Repeat, back and forth, until you’ve exhausted both sides and pulled all of the ISBNs out of their different partitions and put them into one big bucket.
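Here is a minimal sketch of that back-and-forth as a breadth-first expansion. It is not the script used for this post. The lookup functions are hypothetical stand-ins (toy dictionaries, no network calls) for whatever code actually queries xISBN and thingISBN. For simplicity the sketch asks every service about every ISBN it meets, which makes more requests than the procedure above but should end with the same bucket, assuming each service groups its ISBNs consistently.

```python
from collections import deque
from typing import Callable, Iterable, Set

# A lookup takes one ISBN and returns the ISBNs a service groups with it.
Lookup = Callable[[str], Iterable[str]]


def superdupe(seed: str, lookups: Iterable[Lookup]) -> Set[str]:
    """Expand a seed ISBN until no lookup returns anything new."""
    lookups = list(lookups)
    seen = {seed}
    queue = deque([seed])
    while queue:
        isbn = queue.popleft()
        for lookup in lookups:
            for related in lookup(isbn):
                if related not in seen:
                    seen.add(related)
                    queue.append(related)
    return seen


# Toy data standing in for the two services (assumption: not real ISBNs).
XISBN = {"A": ["A", "B"], "B": ["A", "B"], "C": ["C", "D"], "D": ["C", "D"]}
THINGISBN = {"A": ["A"], "B": ["B", "C"], "C": ["B", "C"], "D": ["D"]}


def fake_xisbn(isbn: str) -> Iterable[str]:
    # Hypothetical stand-in for an xISBN query.
    return XISBN.get(isbn, [isbn])


def fake_thingisbn(isbn: str) -> Iterable[str]:
    # Hypothetical stand-in for a thingISBN query.
    return THINGISBN.get(isbn, [isbn])


bucket = superdupe("A", [fake_xisbn, fake_thingisbn])
print(sorted(bucket))
```

In the toy data neither service alone connects A to D, but thingISBN's grouping of B with C bridges the two xISBN groups, so all four end up in one bucket.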

I did this for all of the Harry Potter books, and after careful examination my keen eyes noticed something:

   ISBNs File
    1083 isbns-01-philosophers-stone.txt
    1083 isbns-02-chamber-of-secrets.txt
    1083 isbns-03-prisoner-of-azkaban.txt
    1083 isbns-04-goblet-of-fire.txt
    1083 isbns-05-order-of-the-phoenix.txt
    1083 isbns-06-half-blood-prince.txt
     121 isbns-07-deathly-hallows.txt
       3 isbns-0x-beedle.txt
      53 isbns-0x-scamander.txt

Superduping the ISBNs of the first six Harry Potter books had given 1083 ISBNs for each! And sure enough they’re the same 1083 ISBNs. What’s going on here is that because of boxed sets and other collections, and possibly incorrect work-groupings by hand and by algorithm, once you start looking at one Harry Potter book through xISBN and thingISBN, you end up looking at all of them. Or almost all. The seventh one stands alone, but I think that will change in a year or two, and it will fall in with the others.
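Checking that several lists hold the same ISBNs, not just the same count, comes down to comparing sets. A rough sketch, assuming each file has one ISBN per line (the post doesn't say how the files are laid out); `read_isbns` and `all_identical` are names made up for this example.

```python
from pathlib import Path
from typing import Iterable, Set


def read_isbns(path: str) -> Set[str]:
    """Read a file into a set of ISBNs, assuming one per line."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def all_identical(paths: Iterable[str]) -> bool:
    """Return True if every file holds exactly the same set of ISBNs."""
    sets = [read_isbns(p) for p in paths]
    return all(s == sets[0] for s in sets[1:])
```

Passing the six file names from the table to `all_identical` would confirm (or refute) that the lists match regardless of the order of their lines.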

This work supercluster includes all of the Harry Potter books, the movies, some soundtracks, some scores, some derivative works like pop-up books, and more. It also includes books by Carl Sagan, Philip Pullman, and C.S. Forester (!).

This supercluster phenomenon is interesting. In part it’s caused by collected editions and boxed sets, and by the lack of an easy standard way of handling two works in one manifestation. Human and machine error is also involved. xISBN and thingISBN aren’t perfect, and superduping their results compounds errors from one into the other, so you can end up with a bit of a mess.

(I tried superduping Pride and Prejudice and stopped when I started getting into the complete works of Shakespeare. I’ll post more about that if I try it again, but perhaps all the great works of English literature are in one giant confused FRBRy supercluster.)

Full FRBRization, where relationships between works and aggregate works (such as boxed sets and omnibus editions) are clearly specified, will mean this isn’t a problem. That’ll be a lot of work, though.
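To illustrate why explicit relationships would help, here is a toy model in which each manifestation lists every work it embodies. It is not OpenFRBR's data model; the class, the function, the work identifiers, and the boxed set's placeholder ISBN are all invented for the example. Only 1551922460 comes from this post.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Manifestation:
    isbn: str
    work_ids: Tuple[str, ...]  # more than one entry means an aggregate


def isbns_for_work(work_id: str,
                   manifestations: Iterable[Manifestation],
                   include_aggregates: bool = False) -> Set[str]:
    """Collect ISBNs of manifestations embodying the given work.

    Aggregates (boxed sets, omnibus editions) are left out unless asked
    for, so they never pull the ISBNs of other works into the result.
    """
    result = set()
    for m in manifestations:
        if work_id not in m.work_ids:
            continue
        if include_aggregates or len(m.work_ids) == 1:
            result.add(m.isbn)
    return result


catalogue = [
    Manifestation("1551922460", ("azkaban",)),
    Manifestation("PLACEHOLDER-CHAMBER", ("chamber",)),
    Manifestation("PLACEHOLDER-BOXED-SET", ("chamber", "azkaban")),
]

print(isbns_for_work("azkaban", catalogue))
print(isbns_for_work("azkaban", catalogue, include_aggregates=True))
```

Because the boxed set is recorded as embodying two works, a query for one work can include or skip it, but it can never drag in the single-work manifestations of the other book the way shared-ISBN clustering does.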

Using isbn2marc I found MARC records for 978 of the 1234 total ISBNs.

978 Harry Potter-related MARC records (1 MB MARC)

I ran them through the LC FRBRization tool and put them into OpenFRBR.

~/src/openfrbr$ ./script/console
Loading development environment (Rails 2.1.0)
>> Work.find(:all).size
=> 171
>> Expression.find(:all).size
=> 471
>> Manifestation.find(:all).size
=> 973
>> Person.find(:all).size
=> 22
>> Creation.find(:all).size
=> 138