A weblog following developments around the world in FRBR: Functional Requirements for Bibliographic Records.

Maintained by William Denton, Web Librarian at York University. Suggestions and comments welcome at wtd@pobox.com.


Confused? Try What Is FRBR? (2.8 MB PDF) by Barbara Tillett, or Jenn Riley's introduction. For more, see the basic reading list.

Books: FRBR: A Guide for the Perplexed by Robert Maxwell (ISBN 9780838909508) and Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools edited by Arlene Taylor (ISBN 9781591585091) (read my chapter FRBR and the History of Cataloging).

20 April 2007

Superduping 2

Filed under: Implementations, LibraryThing, OCLC — William Denton @ 7:52 am

Here’s an example of how superduping and the thingISBN and xISBN services can go slightly wrong. The item in hand is my copy of From the Mixed-Up Files of Mrs. Basil E. Frankweiler, by E.L. Konigsburg. It was first published in 1967, and won the Newbery Medal the next year. My item is an exemplar of, if I read the copyright page correctly, the forty-third impression of the Dell Yearling edition first published in 1977. The ISBN is 0440431808.

If we query thingISBN for 0440431808, we get back a cluster of 29 ISBNs. If we query xISBN for 0440431808, we get back a cluster of 34 ISBNs. I won’t show them all.

Here’s some of the output from my superduping script. I’ll include the title if it’s not what we expect. First, we check the ISBNs that thingISBN gave us.

Super   Source  ISBN            ts      xs
0       T       0140306811
1       T       032111583X      28      33 + 2  Final cut pro 3 for Macintosh

The first ISBN is in both initial result sets, so it’s moved to the superdupe array. The second ISBN isn’t in xISBN’s result set, so it’s checked over there, and found to be part of a cluster of two ISBNs. But the book is Final Cut Pro 3 for Macintosh! What the heck?
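The checking loop described above can be sketched roughly as follows. This is only a guess at the shape of the process, not the actual superduping script: `thing_lookup` and `x_lookup` are hypothetical callables standing in for the thingISBN and xISBN services (each takes an ISBN and returns the ISBNs that service clusters with it), and no real web calls are made.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical stand-in for a web service: ISBN in, cluster of ISBNs out.
Lookup = Callable[[str], Iterable[str]]


def superdupe(seed: str, thing_lookup: Lookup, x_lookup: Lookup) -> List[str]:
    """Merge two services' clusters, cross-checking ISBNs one service lacks."""
    lookups: Dict[str, Lookup] = {"T": thing_lookup, "X": x_lookup}
    # ISBNs each service is already known to place in the seed's cluster.
    known: Dict[str, Set[str]] = {
        name: set(lookup(seed)) for name, lookup in lookups.items()
    }
    seen: Set[str] = known["T"] | known["X"]   # combined and deduped
    pending = deque(sorted(seen))
    superduped: List[str] = []

    while pending:
        isbn = pending.popleft()
        superduped.append(isbn)
        for name, lookup in lookups.items():
            if isbn in known[name]:
                continue  # this service already returned it; nothing to check
            # The service did not return this ISBN, so ask it directly.
            cluster = set(lookup(isbn))
            known[name] |= cluster | {isbn}
            for other in sorted(cluster):
                if other not in seen:
                    # A genuinely new ISBN: queue it for checking too.
                    seen.add(other)
                    pending.append(other)
    return superduped


# Toy data (letters, not real ISBNs) to show the behaviour.
THING = {"A": ["A", "B", "C"]}
X = {"A": ["A", "B", "D"], "C": ["C", "E"]}
result = superdupe("A", lambda i: THING.get(i, []), lambda i: X.get(i, []))
print(result)
```

In the toy data, "C" is known only to the first service, so it is looked up at the second, which pulls in "E". That is the same mechanism by which an unrelated ISBN, once it is in one cluster, can drag its own cluster into the results.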

I’m not sure what’s going on. The LibraryThing page for this work shows 032111583X as the ISBN of a 1974 Atheneum hardcover manifestation. It’s the last in the list on the left-hand side, and if you follow the link there you can look up the book at various book-selling services. Notice the cover LibraryThing shows is for Final Cut Pro 3 for Macintosh, and notice too that if you follow the links to the booksellers you get some strange results. I searched listings at BookFinder and it appears there was a 1974 hardcover edition from Atheneum, but the ISBN was 0689205864.

So somehow the ISBN of a whole other work got into our list. Let’s see what happens. Those ISBNs will be queried at xISBN, so maybe things will spiral out of control.

Super   Source  ISBN            ts      xs
2       T       0333100646
3       T       0333462874
4       T       0395732514      25      32 + 1  Explore

The fifth ISBN checked from thingISBN’s initial result set wasn’t in xISBN’s result set, so we check it there, and find it’s a singleton. This means either that it really is alone in its cluster, or that xISBN knows nothing about it. Why does it say the title is Explore? Look up 0395732514 at WorldCat and you get three different listings! There’s the book we have in hand, a series (I think) called Invitations to Literacy, and whatever Explore is. The cover shown is for a Beverly Cleary book.

“I’m dashed confused,” I hear you say, and so am I. I have no idea what’s going on. Here are some points of interest from the rest of the output. Some other unrelated book comes in, and then we get into some non-English books. The Dutch one, Het Wonderlijke Archief van Mevrouw Fitzalan, must be a translation of our book, but I don’t know if the others are the right book or something else. (Sorry about the character sets not displaying properly.)

24      T       0807275565      5       17 + 1  Discovery packs of learning (eng)
...
37      X       8205104719      0 + 0   6       To rømlinger og en engel  (nor)
38      X       8906501692      0 + 0   5       Kʻŭllodia ŭi pimil = (kor)
39      X       9021471272      0 + 0   4       Het wonderlijke archief van mevrouw Fitzalan  (dut)
40      X       9510008230      0 + 0   3       Vanhan rouvan salaiset paperit  (fin)
41      X       9510043346      0 + 0   2       Vanhan rouvan salaiset paperit  (fin)
42      X       9570803193      0 + 0   1       Base fu ren di dang an chu  (chi)

Final results:

Combining and deduping: 42
Superduping: 43 ISBNs
thingISBN: 29 at start; 15 calls; 0 ISBNs added; 14 unknown
    xISBN: 34 at start; 9 calls; 9 ISBNs added; 0 unknown

So after all of that we only got one new ISBN that we wouldn’t have had by just combining and deduping. It’s 382731755X, incidentally, which xISBN clusters with 032111583X. It’s another edition of Final Cut Pro 3 for Macintosh. At least three of the ISBNs are not for From the Mixed-Up Files of Mrs. Basil E. Frankweiler, and the number may be higher, depending on what the non-English books are. Whether we superdupe or just combine and dedupe, our data sources are mixing in some wonky results.
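The difference between the two approaches is just a set difference. Here is a small sketch using three of the ISBNs mentioned in this post (not the full clusters); the function name `added_by_superduping` is made up for illustration.

```python
def added_by_superduping(thing_cluster, x_cluster, superduped):
    """Return ISBNs that superduping found beyond combining and deduping."""
    combined = set(thing_cluster) | set(x_cluster)  # combine and dedupe
    return sorted(set(superduped) - combined)


# Partial data from the post: 0140306811 was in both initial result sets,
# 032111583X only in thingISBN's, and 382731755X turned up only through
# the follow-up xISBN query.
thing_part = ["0140306811", "032111583X"]
x_part = ["0140306811"]
superduped_part = ["0140306811", "032111583X", "382731755X"]

extra = added_by_superduping(thing_part, x_part, superduped_part)
print(extra)
```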

Conclusion: Sometimes this stuff just doesn’t make sense. Remember what the Stoic sage Epictetus advised two thousand years ago: “Some things are up to us and some are not.”

Next up: Jane Austen, Frodo Baggins, or Horatio Hornblower spiral out of control.