A weblog following developments around the world in FRBR: Functional Requirements for Bibliographic Records.

Maintained by William Denton, Web Librarian at York University. Suggestions and comments welcome at wtd@pobox.com.


Confused? Try What Is FRBR? (2.8 MB PDF) by Barbara Tillett, or Jenn Riley's introduction. For more, see the basic reading list.

Books: FRBR: A Guide for the Perplexed by Robert Maxwell (ISBN 9780838909508) and Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools edited by Arlene Taylor (ISBN 9781591585091) (read my chapter FRBR and the History of Cataloging).


Comparing xISBN and thingISBN

Posted by: William Denton, 23 March 2007 7:51 am
Categories: Implementations, LibraryThing, OCLC

I whipped up a little Ruby script to compare results from LibraryThing’s thingISBN and OCLC’s xISBN. (Tim Spalding of LibraryThing does some comparisons in his announcement of thingISBN, which is where I linked. He’d even added an option to thingISBN so it would return xISBN results as well, but OCLC put the kibosh on that.)

#!/usr/local/bin/ruby

# Use thingISBN and xISBN and put their answers together to get
# the most ISBNs of other manifestations of a work, given the ISBN of
# one manifestation of said work. Eliminate duplicates.

# Change the ISBN to anything you want. 

# Richard Pevear's new translation of THE THREE MUSKETEERS by Alexandre Dumas.
isbn = '0670037796'

# Oxford Classics edition that WorldCat has only one other manifestation for
# isbn = '0192835750'

# Anthony Powell, BOOKS DO FURNISH A ROOM, Fontana pb
# isbn = '0006130879'

# Charles Willeford, THE BURNT ORANGE HERESY, Black Lizard
# isbn = '0887390250'

require 'net/http'

require 'rubygems'
require 'xmlsimple'

puts "Finding manifestations of #{isbn} ..."

# First, get data from thingISBN at LibraryThing

thingURL = "http://www.librarything.com/api/thingISBN/"

url = thingURL + isbn
xml_data = Net::HTTP.get_response(URI.parse(url)).body

data = XmlSimple.xml_in(xml_data)

thingISBNs = []

data['isbn'].each do |i|
  thingISBNs << i
  # puts "thingISBN: #{i}"
end

# Next, get data from xISBN at OCLC

xISBNURL = "http://xisbn.worldcat.org/webservices/xid/isbn/"

url = xISBNURL + isbn + "?method=getEditions&format=xml"
xml_data = Net::HTTP.get_response(URI.parse(url)).body

data = XmlSimple.xml_in(xml_data)

xISBNs = []

data['isbn'].each do |i|
  xISBNs << i
  # puts "xISBN: #{i}"
end

allISBNs = (thingISBNs + xISBNs).uniq

xNotThing = []
thingNotX = []

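# Sort each ISBN into "only xISBN" or "only thingISBN". The block variable
# is named other_isbn so it cannot clobber the isbn set at the top.
allISBNs.each do |other_isbn|
   xNotThing << other_isbn if xISBNs.include?(other_isbn) and not thingISBNs.include?(other_isbn)
   thingNotX << other_isbn if thingISBNs.include?(other_isbn) and not xISBNs.include?(other_isbn)
end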

puts " Known to thingISBN: #{thingISBNs.size} (#{thingNotX.size} of which not known to xISBN)"
puts " Known to     xISBN: #{xISBNs.size} (#{xNotThing.size} of which not known to thingISBN)"

puts "              Total: #{allISBNs.size}"

# Print ISBNs known to LibraryThing but not xISBN.
# thingNotX.sort.each do |isbn|
#   puts isbn
# end

I ran it on that first ISBN, the new and reportedly excellent Pevear translation of Dumas’s The Three Musketeers, and got this:

Finding manifestations of 0670037796 ...
 Known to thingISBN: 109 (52 of which not known to xISBN)
 Known to     xISBN: 226 (169 of which not known to thingISBN)
              Total: 278

“I say, what’s this?” I ejaculated, because I had just finished a P.G. Wodehouse novel. I’d imagined that xISBN, emerging as it does from OCLC’s vast WorldCat, made up of catalogue information from libraries all over (mostly from the United States), would vastly outnumber thingISBN, which draws on the work groupings done by users of LibraryThing, which, granted, is globally popular. But for this manifestation of this work, thingISBN knew of 109 manifestations of the same work (including the one I’d specified, so that’s 108 others), and 52 weren’t known to xISBN. xISBN, in turn, knew of 226 manifestations, 169 of which weren’t known to thingISBN. 278 different manifestations are known between the two.
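As a quick sanity check on those figures, the overlap between the two services can be recovered from the counts alone. This is only arithmetic on the numbers printed above; the variable names are mine and are not part of the script.

```ruby
# Counts printed by the script for ISBN 0670037796.
thing_total = 109   # ISBNs returned by thingISBN
thing_only  = 52    # ... of which xISBN did not return
x_total     = 226   # ISBNs returned by xISBN
x_only      = 169   # ... of which thingISBN did not return

# Each service's total minus its exclusive ISBNs gives the shared ones,
# so both subtractions should give the same number.
shared_via_thing = thing_total - thing_only
shared_via_x     = x_total - x_only

# The union is the two exclusive groups plus the shared group.
union = thing_only + x_only + shared_via_thing

puts "Shared by both: #{shared_via_thing} (cross-check: #{shared_via_x})"
puts "Union: #{union}"
```

Both subtractions give 57, and 52 + 169 + 57 = 278, which matches the total the script reported.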

So xISBN does outnumber thingISBN, but thingISBN knows about 52 manifestations that xISBN doesn’t! Is it because they aren’t in WorldCat, or because the work-grouping algorithm didn’t catch them?

I didn’t check this for all of the 52, but I did try it on my Oxford Classics edition of The Three Musketeers:

Finding manifestations of 0192835750 ...
 Known to thingISBN: 109 (108 of which not known to xISBN)
 Known to     xISBN: 1 (0 of which not known to thingISBN)
              Total: 109

This shows that thingISBN has grouped this manifestation into the work of The Three Musketeers, while xISBN thinks it stands alone. However, when you look it up at WorldCat, you find it’s been grouped with a 1956 edition from Longman’s (look under the Editions tab). I’m not sure what’s going on here, but it seems odd: I’d expect both OCLC sources to agree.

(Conversely, I didn’t check the 169 manifestations that xISBN knows about that thingISBN doesn’t, so I don’t know if they’re not in LibraryThing at all or if they are but haven’t been grouped.)

Anthony Powell’s Books Do Furnish a Room has been published in a number of editions, and thingISBN wins for my Fontana paperback:

Finding manifestations of 0006130879 ...
 Known to thingISBN: 8 (7 of which not known to xISBN)
 Known to     xISBN: 1 (0 of which not known to thingISBN)
              Total: 8

Looking for other manifestations of Charles Willeford’s The Burnt Orange Heresy (an excellent noir crime novel about modern art — Willeford’s one of the great American writers of the twentieth century) gave me the sorts of results I’d expected in the first place, with thingISBN’s result being a proper subset of xISBN’s. This is for my Black Lizard edition:

Finding manifestations of 0887390250 ...
 Known to thingISBN: 3 (0 of which not known to xISBN)
 Known to     xISBN: 7 (4 of which not known to thingISBN)
              Total: 7
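
The “proper subset” relationship can be checked directly with Ruby’s standard Set class. This is a small sketch that uses made-up placeholder strings rather than the real ISBNs the two services returned; only the sizes, 3 and 7, come from the output above.

```ruby
require 'set'

# Placeholder identifiers, NOT real ISBNs: only the sizes (3 and 7)
# mirror the output above.
thing_isbns = Set.new(%w[isbn-a isbn-b isbn-c])
x_isbns     = Set.new(%w[isbn-a isbn-b isbn-c isbn-d isbn-e isbn-f isbn-g])

# proper_subset? is true when every thingISBN result is also an xISBN
# result and xISBN has at least one result more.
thing_inside_x = thing_isbns.proper_subset?(x_isbns)
x_only         = x_isbns - thing_isbns

puts "thingISBN is a proper subset of xISBN: #{thing_inside_x}"
puts "Known only to xISBN: #{x_only.size}"
```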

Upshot: If you have an ISBN in hand and want to find ISBNs of other manifestations of the same work, use both thingISBN and xISBN.
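
One way to fold the two result lists together is with Ruby’s array set operators. What follows is a sketch, not part of the script above: the method name merge_isbn_lists and the sample values are hypothetical, standing in for the arrays built from each service’s response.

```ruby
# Hypothetical helper: takes the two ISBN arrays and reports how they relate.
def merge_isbn_lists(thing_isbns, x_isbns)
  thing = thing_isbns.uniq
  x     = x_isbns.uniq
  {
    :all        => thing | x,   # union, duplicates removed
    :both       => thing & x,   # known to both services
    :thing_only => thing - x,   # known only to thingISBN
    :x_only     => x - thing    # known only to xISBN
  }
end

# Placeholder values for illustration, not real ISBNs.
result = merge_isbn_lists(%w[a b c], %w[b c d e])
result.each { |name, isbns| puts "#{name}: #{isbns.join(' ')}" }
```

Passing in the thingISBNs and xISBNs arrays from the script would give the merged list under :all, with the other keys showing what each service contributed on its own.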


10 Comments

  1. have you seen the xisbn library on rubyforge? http://rubyforge.org/projects/xisbn/

    It’s available as a gem, and has functions to query both xisbn and thingISBN. Both functions return the list of ISBNs as a simple array.

    Comment by James — 23 March 2007 @ 8:13 am
  2. One of the reasons why the results of xISBN may not match what you see in WorldCat.org is that we force the clusters in xISBN to be consistent: any of the ISBNs in a cluster can be used to retrieve exactly that cluster of ISBNs. WorldCat.org doesn’t enforce that, so multiple WorldCat.org works could contain the same ISBN.

    –Th

    Comment by Thom Hickey — 23 March 2007 @ 2:09 pm
  3. LibraryThing has that issue too. At present, the big feed distinguishes between the primary work for an ISBN and the others–that is, whether to allow an ISBN to cross works. The API does not yet.

    Comment by Tim — 23 March 2007 @ 2:58 pm
  4. http://www.librarything.com/thingology/2007/03/xisbn-and-thingisbn-compared.php

    Comment by Tim — 23 March 2007 @ 3:46 pm
  5. Hypothesis: preponderance of paperback editions in librarything (i.e., purchased by individuals); records contributed to OCLC tip the scale in [more-expensive] hardcover editions (purchased by libraries for longer-life circulation), not likely found in librarything;
    Hypothesis: scholarly works (not normally issued in paperback) more likely known in xISBN clusters than thingISBN; scholarly works having also smaller clusters of ISBNs (fewer editions ever published)
    Hypothesis: literary works (examples above) will have much higher degree of ISBN cluster in thingISBN (reflection of many many paperback editions published and purchased by individuals).

    Mia

    Comment by Mia Massicotte — 23 March 2007 @ 6:30 pm
  6. I think those are all true.

    Comment by Tim — 24 March 2007 @ 3:44 pm
  7. Thom’s explanation strikes me as the most plausible (duh :-) ; the xISBN workset algorithm is vulnerable to duplicate worldcat records, where one of the records has sufficient errors to prevent title and author from matching. My favourite example is ISBN 0838934854

    It might be very interesting to take a sample of thingISBN worksets whose manifestations are in worldcat, but which don’t get grouped by xISBN, and see if there are any obvious patterns of errors.

    It’d also be interesting to see if ignoring some worldcat records would yield tighter results. For example, if there are two records with the same ISBN, with one present in 10,000 libraries, and the other present in 3, the latter could be discarded.

    Simon

    Comment by Simon Spero — 25 March 2007 @ 3:53 pm
  8. Rather than one or the other “missing” something, could it reflect different evaluations of whether a manifestation belongs to the work or not?

    Even if we hypothetically imagine two different people (or communities) manually creating work-sets for a small enough corpus that every one could be individually examined and judged–different judgements could be made. But it’s hard to say what is “error” and what is “difference of opinion” in this case. Anyone have any light to shed?

    Comment by Jonathan Rochkind — 27 March 2007 @ 11:43 am
  9. I guess I’m seconding Jonathan’s comment. There’s a big difference in how these two sets of records have been merged. OCLC uses only the metadata record to determine “work-ness,” and the process is purely algorithmic. LibraryThing is allowing humans to make that judgment based on their (presumed) knowledge of the contents of the books. Since the FRBR definition of work is rather fluid, you could call either of these “FRBR’ized.” Or neither. And either of them could have some wrongly grouped items: OCLC, because the metadata records may not have sufficient data to make the analysis, and LibraryThing because humans may have different ideas on what makes something “the same work.”

    I am very interested in seeing what kind of algorithmic de-duping and grouping will be possible when we have a large corpus of books in full text. It will probably be possible to measure “sameness” but what will really be interesting will be the degrees of difference — the fuzzy set of “Romeo and Juliet,” in other words. There is some hint that Google is looking at this, or something similar.

    Comment by Karen Coyle — 30 March 2007 @ 12:13 pm
  10. [...] Comparing xISBN and ThingISBN (FRBR Blog) [...]

    Pingback by Libology Blog » ISBN-UPC-EAN Lookups — 23 June 2009 @ 7:06 pm
