A weblog following developments around the world in FRBR: Functional Requirements for Bibliographic Records.

Maintained by William Denton, Web Librarian at York University. Suggestions and comments welcome at wtd@pobox.com.


Confused? Try What Is FRBR? (2.8 MB PDF) by Barbara Tillett, or Jenn Riley's introduction. For more, see the basic reading list.

Books: FRBR: A Guide for the Perplexed by Robert Maxwell (ISBN 9780838909508) and Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools edited by Arlene Taylor (ISBN 9781591585091) (read my chapter FRBR and the History of Cataloging).

Work superclusters

Posted by: William Denton, 10 December 2008 7:46 am
Categories: Aggregates,OpenFRBR

I wanted lots of Harry Potter ISBNs, so I was doing some superduping. For example, I superduped 1551922460, the ISBN of the 1999 hardcover Raincoast Books manifestation of Harry Potter and the Prisoner of Azkaban.

If you ask both xISBN and thingISBN about that ISBN, then combine and dedupe the two lists they return, you get 169 ISBNs. If you superdupe them, you get 1083. To superdupe, you take the first number from thingISBN that xISBN didn’t tell you about, and ask xISBN about it. That gives you a new set of ISBNs. Do that for all the thingISBN-only numbers, then reverse the process and ask thingISBN about the numbers xISBN told you about. Repeat, back and forth, until you’ve exhausted both sides and pulled all of the ISBNs out of their different partitions and put them into one big bucket.
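Here is a minimal sketch of that back-and-forth in Ruby. It is not the script used for this post. The two lookups are hypothetical stand-ins for xISBN and thingISBN, written as lambdas over made-up toy data (the "ISBNs" are just letters), and the loop simplifies the procedure by asking both services about every ISBN it has seen, which should settle on the same final bucket.

```ruby
require "set"

# Superdupe: starting from one ISBN, keep asking every lookup service about
# every ISBN seen so far, until no service returns anything new.
# `services` is a list of callables (hypothetical stand-ins for xISBN and
# thingISBN); each takes an ISBN string and returns an array of ISBN strings.
def superdupe(seed, services)
  seen  = Set[seed]
  queue = [seed]
  until queue.empty?
    isbn = queue.shift
    services.each do |lookup|
      lookup.call(isbn).each do |related|
        # Set#add? returns nil when the ISBN was already in the bucket.
        queue << related if seen.add?(related)
      end
    end
  end
  seen
end

# Toy data, not real ISBNs: each service groups ISBNs into its own work sets.
X_SETS     = [%w[A B], %w[C D]]
THING_SETS = [%w[B C], %w[E]]

# Build a lookup that returns the whole set an ISBN belongs to,
# or just the ISBN itself when the service does not know it.
def lookup_in(sets)
  ->(isbn) { sets.find { |s| s.include?(isbn) } || [isbn] }
end

x_isbn     = lookup_in(X_SETS)
thing_isbn = lookup_in(THING_SETS)

combined   = (x_isbn.call("A") + thing_isbn.call("A")).uniq
superduped = superdupe("A", [x_isbn, thing_isbn])

puts "combined:   #{combined.sort.join(' ')}"
puts "superduped: #{superduped.to_a.sort.join(' ')}"
```

With this toy data, combining and deduping the two answers for A gives only A and B, while superduping reaches A, B, C and D, because B links the two services' sets together. E is never reached, since nothing connects it to the others.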

I did this for all of the Harry Potter books, and after careful examination my keen eyes noticed something:

   ISBNs File
    1083 isbns-01-philosophers-stone.txt
    1083 isbns-02-chamber-of-secrets.txt
    1083 isbns-03-prisoner-of-azkaban.txt
    1083 isbns-04-goblet-of-fire.txt
    1083 isbns-05-order-of-the-phoenix.txt
    1083 isbns-06-half-blood-prince.txt
     121 isbns-07-deathly-hallows.txt
       3 isbns-0x-beedle.txt
      53 isbns-0x-scamander.txt

Superduping the ISBNs of the first six Harry Potter books had given 1083 ISBNs for each! And sure enough they’re the same 1083 ISBNs. What’s going on here is that because of boxed sets and other collections, and possibly incorrect work-groupings by hand and by algorithm, once you start looking at one Harry Potter book through xISBN and thingISBN, you end up looking at all of them. Or almost all. The seventh one stands alone, but I think that will change in a year or two, and it will fall in with the others.
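Checking that several lists hold the same ISBNs, and not just the same number of them, can be sketched like this. It is an illustration under assumptions: the names and numbers are toy data, and `group_identical` and `distinct_count` are hypothetical helpers, not part of any tool mentioned here.

```ruby
# Group list names by the exact set of ISBNs they contain.
# `lists` maps a name (for example a filename) to an array of ISBN strings.
# Order and duplicates inside a list are ignored.
def group_identical(lists)
  lists.group_by { |_name, isbns| isbns.uniq.sort }
       .values
       .map { |pairs| pairs.map(&:first) }
end

# Count the distinct ISBNs across all lists.
def distinct_count(lists)
  lists.values.flatten.uniq.size
end

# Toy data, not real ISBNs. Real lists could be read with something like
# File.readlines(path, chomp: true).
LISTS = {
  "book-1.txt" => %w[111 222 333],
  "book-2.txt" => %w[333 111 222],
  "book-7.txt" => %w[777 778],
}

p group_identical(LISTS)
p distinct_count(LISTS)
```

Here the first two lists land in the same group because they contain the same three numbers in a different order, and the third stands alone, much like the seventh book above.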

This work supercluster includes all of the Harry Potter books, the movies, some soundtracks, some scores, some derivative works like pop-up books, and more. It also includes books by Carl Sagan, Philip Pullman, and C.S. Forester (!).

This supercluster phenomenon is interesting. In part it’s caused by collected editions and boxed sets, and by the lack of an easy standard way of handling two works in one manifestation. Human and machine error are also involved. xISBN and thingISBN aren’t perfect, and superduping their results carries the errors of each into the other, so you can end up with a bit of a mess.

(I tried superduping Pride and Prejudice and stopped when I started getting into the complete works of Shakespeare. I’ll post more about that if I try it again, but perhaps all the great works of English literature are in one giant confused FRBRy supercluster.)

Full FRBRization, where relationships between works and aggregate works (such as boxed sets and omnibus editions) are clearly specified, will mean this isn’t a problem. That’ll be a lot of work, though.
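As a rough illustration of why explicit aggregate relationships help, here is a small Ruby sketch. It is a deliberately simplified, hypothetical model, not OpenFRBR's schema: it skips the expression level entirely and lets each manifestation list the works it embodies, with made-up titles and identifiers.

```ruby
# Hypothetical minimal model: a manifestation records every work it embodies.
Manifestation = Struct.new(:isbn, :title, :works)

# Made-up records, not real ISBNs or titles.
CATALOGUE = [
  Manifestation.new("isbn-1", "Book One (hardcover)", ["Book One"]),
  Manifestation.new("isbn-2", "Book Two (paperback)", ["Book Two"]),
  Manifestation.new("isbn-3", "Books One and Two (boxed set)", ["Book One", "Book Two"]),
]

# ISBNs of manifestations that embody exactly this one work.
def editions_of(work, catalogue)
  catalogue.select { |m| m.works == [work] }.map(&:isbn)
end

# ISBNs of aggregates (more than one work) that include this work.
def aggregates_containing(work, catalogue)
  catalogue.select { |m| m.works.size > 1 && m.works.include?(work) }.map(&:isbn)
end

p editions_of("Book One", CATALOGUE)
p aggregates_containing("Book One", CATALOGUE)
```

Because the boxed set is recorded as containing both works, it can be reported alongside either one without dragging the other work's editions into the same cluster.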

Using isbn2marc I found MARC records for 978 of the 1234 total ISBNs.

978 Harry Potter-related MARC records (1 MB MARC)

I ran them through the LC FRBRization tool and put them into OpenFRBR.

~/src/openfrbr$ ./script/console 
Loading development environment (Rails 2.1.0)
>> Work.find(:all).size
=> 171
>> Expression.find(:all).size
=> 471
>> Manifestation.find(:all).size
=> 973
>> Person.find(:all).size
=> 22
>> Creation.find(:all).size
=> 138

7 Comments

  1. Managing bibliographic phenomena of that magnitude was the main motivation for developing the FRBR diagrammatic technique and conceptual data model that was presented at iPRES 2008.

    There are sufficiently many distinguishing attributes and relationships in FRBR to adequately represent the Harry Potter publishing phenomenon. For something *really* challenging, try “Star Wars.”

    Comment by Ron Murray — 10 December 2008 @ 9:11 am
  2. You’re right. Not only is the Star Wars bibliographic universe much larger, George Lucas keeps going back and editing his movies! That’s nasty.

    Comment by William Denton — 10 December 2008 @ 2:10 pm
  3. (I tried to comment with OpenID but my provider, Yahoo, uses OpenID 2.0 and this blog is OpenID 1.0. In the process I lost the comment. Phooey :)

    So I was only asking – is this FRBRization tool generally available? I couldn’t find a link. Perhaps it is pointed to from the openfrbr site, but that’s currently offline. I’m hoping to get some of this data into RDF/SPARQL to play with…

    Comment by Dan Brickley — 21 December 2008 @ 1:03 pm
  4. 3rd attempt to leave a comment.

    1st was an openid version mismatch.

    2nd was: “The requested URL /2008/12/10/work-superclusters/comment-page-1 was not found on this server.”

    3rd: where is this frbrization tool? anything opensource?

    Comment by Dan Brickley — 21 December 2008 @ 1:04 pm
  5. Jodi Schneider’s paper about it gives a link to a fresh version of the Library of Congress’s files, which aren’t on their main site yet:

    Temporary link to new files: http://memory.loc.gov/natlib/cred/marcFRBR.tar.gz

    Permanent link to LC FRBR Display tool:
    http://www.loc.gov/marc/marc-functional-analysis/tool.html

    I’m not sure about the licensing for non-US people, but the source is open.

    Comment by William Denton — 22 December 2008 @ 1:07 pm
  6. I have no idea if anyone looks at comments on month old messages, but, I saw the discussions of collections and Star Wars last week, and it reminded me of some notes that I wrote down last month … it’s a bit disorganized, as I started approaching the issue of aggregated works, looked at the issue of the previous/next relationships not being an attribute of the entity, manifestations that aggregate expressions, and then after reading the comments last week, some thoughts about boxed sets and aggregated manifestations:

    http://www.annoying.org/frbr/frbr_aggregate_works.txt

    And, I should probably put this into context — although the examples I use in there are bibliographic, the stuff I’m cataloging (or trying to) isn’t. See Proc. ASIS&T 2008 for my short paper. Pre-print available at:

    http://vso1.nascom.nasa.gov/vso/misc/jhourcle_ASIST_2008.pdf

    Comment by Joe Hourcle — 5 January 2009 @ 10:23 pm
  7. [...] something Owen mentioned that resonates with some of my thinking on List Intelligence: Superduping/Work Superclusters, in which we take an ISBN, look at its equivalents using ThingISBN or xISBN, and then for each of [...]

    Pingback by Open Data Processes: the Open Metadata Laundry « OUseful.Info, the blog… — 9 August 2011 @ 8:20 am
