A weblog following developments around the world in FRBR: Functional Requirements for Bibliographic Records.

Maintained by William Denton, Web Librarian at York University. Suggestions and comments welcome at wtd@pobox.com.


Confused? Try What Is FRBR? (2.8 MB PDF) by Barbara Tillett, or Jenn Riley's introduction. For more, see the basic reading list.

Books: FRBR: A Guide for the Perplexed by Robert Maxwell (ISBN 9780838909508) and Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools edited by Arlene Taylor (ISBN 9781591585091) (read my chapter FRBR and the History of Cataloging).

Comparing xISBN and thingISBN (4): Fiction

Posted by: William Denton, 30 March 2007 7:11 am
Categories: Implementations, LibraryThing, OCLC

Today I compare what OCLC’s xISBN and LibraryThing’s thingISBN know about the fiction in my personal library. I have everything catalogued and stored in a MySQL database so it wasn’t hard to whip up a script to query the database, pull the ISBNs, and then run through each one and see what results the two services gave back. Remember, given an ISBN, they’ll give you a list of ISBNs of other manifestations (editions) of the same work. xISBN does this based on algorithms that OCLC people run on the enormous WorldCat database. thingISBN does this based on which books LibraryThing users have decided are two different editions of the same thing.

982 of my fiction books (in which I include plays and poetry) have ISBNs. Of the 982, xISBN’s results were greater for 520 and thingISBN’s were greater for 276. In 398 cases it was necessary to combine and de-dupe the results to get the best answers: that is, both services knew about some ISBNs the other one didn’t, so to get the most complete coverage I combined the sets of ISBNs and tossed out any duplicates. 398 / 982 = 41%.
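
A tally like this could be computed along the following lines. This is only a minimal sketch and not necessarily how the numbers above were produced: the `tally` method, the shape of the `results` hash, and the placeholder ISBNs in it are all assumptions made for illustration.

```ruby
# Hypothetical sketch: count, per book, which service returned more ISBNs
# and how often merging the two lists adds coverage.
# `results` maps each book to the ISBN lists returned by each service.
def tally(results)
  counts = { x_greater: 0, t_greater: 0, needs_merge: 0 }
  results.each_value do |r|
    x = r[:xisbn]
    t = r[:thingisbn]
    counts[:x_greater] += 1 if x.size > t.size
    counts[:t_greater] += 1 if t.size > x.size
    # Merging matters only when each service knows an ISBN the other lacks.
    counts[:needs_merge] += 1 if (x - t).any? && (t - x).any?
  end
  counts
end

# Placeholder data, not real service output.
results = {
  'book-a' => { xisbn: %w[a1 a2 a3], thingisbn: %w[a1 a4] },
  'book-b' => { xisbn: %w[b1],       thingisbn: %w[b1 b2] },
  'book-c' => { xisbn: %w[c1 c2],    thingisbn: %w[c1 c2] }
}

counts = tally(results)
puts counts.inspect
puts format('%.0f%% of books need merging', 100.0 * counts[:needs_merge] / results.size)
```

With real data, `results` would hold one entry per ISBN in the library, filled in from the two services.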

xISBN knew about all of my books, but thingISBN had never seen 94 of them. The most-manifested book unknown to LibraryThing is K.C. Constantine’s The Man Who Liked to Look at Himself, the second in the great series of crime novels about Mario Balzic and Rocksburg, Pennsylvania. xISBN knows ten different manifestations of it.

Herewith, the works for which the combined and de-duped total (the x+t column) is over 200. The t column is thingISBN’s results and x is xISBN’s.

x+t   t   x
715 170 642 0140449094 Don Quixote (Cervantes)
647 203 576 0140350160 Treasure Island (Stevenson)
519 213 442 0486280616 Adventures of Huckleberry Finn (Twain)
451 241 353 0192833553 Pride and Prejudice (Austen)
423 189 333 0670821624 The Odyssey (Homer)
402 195 330 014043237X Frankenstein (Shelley)
386 168 314 0192815989 Dracula (Stoker)
353 126 239 0048231134 The Two Towers (Tolkien)
320 152 263 0140366857 The Wind In the Willows (Grahame)

There are a lot of editions of those books! 715 of Don Quixote! Nothing surprising in the results, except that The Two Towers is alone of the three books in The Lord of the Rings. See below for more on that. One other Stevenson is in this list, and I’m sure The Iliad, Tom Sawyer, and other Austens would be too, if I owned them.

x+t   t   x
314 284  36 0393099776 Alice in Wonderland (Carroll)

So far xISBN has had the higher numbers, but here it seems to be letting the side down. Perhaps it’s not that simple, though: my copy is a Norton Critical Edition that includes essays and commentary. It’s not a manifestation of the work Alice in Wonderland, it’s a manifestation of a newer work that contains Alice in Wonderland and a number of derivative works. Complicated. I can see why LibraryThing users grouped it together with all other editions of the book. However, by the FRBR model, it’s a separate work. Perhaps xISBN gives such a low number because it’s keeping it apart from the others. I didn’t check.

x+t   t   x
304 137 240 0486410250 Anne of Green Gables (Montgomery)
300 136 233 0192828398 Twenty Thousand Leagues Under the Sea (Verne)
294 140 217 0452269695 The Essential Dr. Jekyll and Mr. Hyde (Stevenson)

The last book there is actually a similar thing to Alice in Wonderland: it’s “The Definitive Annotated Edition of Robert Louis Stevenson’s Classic Novel.” Both services group it together with the regular editions. In FRBR terms, it’s a separate work.

x+t   t   x
293 122 225 0330242407 The Jungle Book (Kipling)
280 111 226 0670037796 The Three Musketeers (Dumas)
280 111 226 0192827510 The Three Musketeers (Dumas)
278 136 206 0192830937 Around the World in 80 Days (Verne)
271  81 240 0140444300 Les Miserables (Hugo)
233  91 190 034547242X The Hunchback of Notre Dame (Hugo)

Two editions of The Three Musketeers are grouped together at both places and recognized as being the same work. That’s good. thingISBN shows lower numbers than I’d expect for Les Miserables and The Hunchback of Notre Dame. Perhaps they’re just a bit less popular with its users so there aren’t as many manifestations to group.

x+t   t   x
221 217   5 0048231541 The Hobbit (Tolkien)

Problem at xISBN! This is a 1970s Unwin paperbacks edition, and there’s nothing special about it. By some mistake it’s not getting clustered with the hundreds of other editions WorldCat knows about.

x+t   t   x
201  98 157 0140126708 Animal Farm (Orwell)

That’s the last of the fiction in my collection that has over 200 different manifestations. To go back a bit, what about the other two books in The Lord of the Rings?

x+t   t   x
154 150   5 004823155X The Fellowship of the Ring (Tolkien)
130 126   5 0048231576 The Return of the King (Tolkien)

xISBN is preventing two of my three Unwin LOTRs, and The Hobbit, from clustering with other manifestations of the same works! Why does it handle my The Two Towers properly, but not the others? I have no idea. thingISBN groups them all properly.

A solution to this problem is what I’m going to call superduping. (I mentioned this a couple of days ago, and OCLC’s Xiaoming Liu suggested it too.) If thingISBN’s and xISBN’s results differ by, say, an order of magnitude, or there’s something that leads you to think one of them is missing a lot of ISBNs, or whenever you want the absolute maximum number of related ISBNs, try running through ISBNs from one set of results and query the other service about them until you have merged all possible sub-clusterings.

For example, for The Hobbit, where thingISBN said 217, and xISBN said 5, pick one of the 217 that’s not in xISBN’s 5 and query xISBN with it. You’ll probably get back hundreds of results, most of them duplicating thingISBN’s. If thingISBN still knows about numbers that xISBN hasn’t told you about, pick one of them and query xISBN with it. Continue until done, then go the other way and query thingISBN about any results from xISBN that haven’t shown up at thingISBN so far. When finished, you’ll know that all possible clusterings at both services have been joined by you into one big new set. Thus you use each service to correct any failings or lack of knowledge at the other.
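
The procedure above could be sketched in Ruby something like this. It is a hedged sketch under stated assumptions, not a tested client for either service: `superdupe` is a hypothetical name, the lookups are passed in as callables so that no real HTTP requests are made, and the toy clusters are placeholders. For simplicity it queries both services with every ISBN it finds, which makes more requests than picking only ISBNs the other service hasn't returned yet, so daily request limits may be a concern in practice.

```ruby
require 'set'

# Hypothetical sketch of "superduping": keep querying the services with
# every ISBN seen so far until no new ISBNs turn up.
# `lookups` is a list of callables, each taking an ISBN and returning an
# array of related ISBNs (stand-ins for xISBN and thingISBN requests).
def superdupe(start_isbn, lookups)
  seen = Set.new([start_isbn])
  queue = [start_isbn]
  until queue.empty?
    isbn = queue.shift
    lookups.each do |lookup|
      lookup.call(isbn).each do |found|
        # Set#add? returns nil when the ISBN was already seen.
        queue << found if seen.add?(found)
      end
    end
  end
  seen.to_a.sort
end

# Toy clusters standing in for the two services (placeholders, not real data).
x_clusters = [%w[h1 h2], %w[h3 h4 h5]]
t_clusters = [%w[h1 h3], %w[h5 h6]]

# Builds a lookup that returns the cluster containing the given ISBN.
cluster_lookup = lambda do |clusters|
  ->(isbn) { clusters.find { |c| c.include?(isbn) } || [] }
end

merged = superdupe('h1', [cluster_lookup.call(x_clusters), cluster_lookup.call(t_clusters)])
puts merged.join(' ')
```

In this toy data neither service alone links h1 to h6, but following the clusters back and forth joins all six into one set.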

Monday: my nonfiction. I’ll give you the top 20 or so, only two of which have over 200 manifestations. After that I’ll do some experiments with superduping (what will the new numbers be for my Tolkiens?) and perhaps I’ll pick out some oddities from all these results.


Ruby gem: xisbn

Posted by: William Denton, 29 March 2007 7:03 am
Categories: LibraryThing, OCLC

Tonight I was working on comparing all my be-ISBNed books at thingISBN and xISBN, but while testing and debugging I hit xISBN’s daily limit and had to stop. They very quickly fixed that but I’m going to delay a day in posting the results. The numbers were looking very interesting, both for how xISBN and thingISBN compared and for how many manifestations there are of books like Don Quixote and Treasure Island.

Instead, today I give a pointer to something that may help people who use the programming language Ruby.

A helpful commenter named James left a note last Friday with a pointer to the xisbn Ruby gem written by Ed Summers. Thank you, James! Thank you, Ed! If you’re a Ruby hacker doing anything with xISBN or thingISBN, it’ll be handy. You can say

require 'xisbn'
include XISBN
xs = xisbn('0394821998')
things = thing_isbn('0812548345')

You’ll get back arrays of ISBNs from both services. Run gem install xisbn to install it.

If you know of anything similar for other languages, please leave a comment or drop me a note.


Comparing xISBN and thingISBN (3)

Posted by: William Denton, 28 March 2007 7:31 am
Categories: Implementations, LibraryThing, OCLC

Today I’m comparing how some of my mathematics books fare in LibraryThing’s thingISBN and OCLC’s xISBN services. Given an ISBN, they each return a list of ISBNs of other manifestations (that is, editions) of the same work. Other manifestations that they know about. Of course, if they don’t know about a book, or don’t think it matches with any others, or in LibraryThing’s case the users haven’t grouped it, they won’t have anything to say about it.

Here’s a table showing the results. Each book takes up two rows. Yes, the formatting is a bit ugly, but you can bear it. The top row has the title and author. On the second row are some numbers. The first is the combined and de-duped count of how many ISBNs both thingISBN and xISBN know about. Next is the thingISBN count, then the xISBN count, then the count taken from WorldCat’s Editions tab. (WorldCat’s numbers will never be greater than 25, because 25 is the limit of results it will show.) xISBN and WorldCat’s Editions tab are both from OCLC, but their sources aren’t always in sync. Follow the links to see the raw results.

Some things to notice about the list:

  • These books tend to the academic side of things, but some are quite popular. (As math books go.)
  • Most of them are paperback. University libraries would more likely have them in hardcover; however, xISBN is bound to do a good job of grouping the two together.
  • No-one on LibraryThing has my old Linear Algebra textbook. It’s probably not in use in first-year algebra courses now. My edition of Flatland is completely unknown to thingISBN, which is very surprising. No-one there has Mathematics and the Imagination either. My edition is a Penguin paperback, and I see it in used bookstores occasionally. The latter two results are unexpected.
  • thingISBN has a 28 count for my edition of Gödel, Escher, Bach, but xISBN doesn’t know about any others. xISBN is failing, or missing something.
  • Forever Undecided by Raymond Smullyan (mine is a trade paperback) gets a 5 at thingISBN but just a 2 at xISBN. I imagine it’s in a lot of libraries, though.
  • My two volumes of Heath’s translation of Euclid give confusing results. Volume 1 isn’t matched up with other editions at either place. Volume 2 gets a 19 from xISBN, but has no companions at thingISBN. Strange. Is it something to do with being Dover reprints? All of the 0-486 books are from Dover, who do a great job of reprinting old math books. Perhaps it’s because Euclid’s Elements has a confusing printing history.
  • Gödel’s Proof by Nagel and Newman is a classic, and thingISBN gives an 8, but xISBN only 1. I’m sure it’s widely held in many libraries and personal collections, so xISBN is failing or missing something.
  • Most things that aren’t extreme cases or probable misses or mistakes do well at both places. For example, Bertrand Russell’s Introduction to Mathematical Philosophy and Boolos and Jeffrey’s Computability and Logic, both old textbooks and classics in their fields, do about equally well.
  • I’m a bit surprised by the number of cases where thingISBN knows more than xISBN.
x+t   t   x  WC
Alan Turing: The Enigma (Hodges)
  8   8   1   4 0099116413
On Numbers and Games (Conway)
  2   2   2   4 0121863506
Elementary Differential Equations with Applications (Penney and Edwards)
  9   6   5   6 0132541297
Linear Algebra (Insel, Spence, and Friedberg)
  6   0   6   8 0135370191
Gödel, Escher, Bach: An Eternal Golden Braid (Hofstadter)
 28  28   1   0 0140055797
Mathematics and the Imagination (Kasner and Newman)
  4   0   4  20 0140803882
Forever Undecided: A Puzzle Guide to Gödel (Smullyan)
  5   5   2   3 0192821962
Reflections on Kurt Gödel (Wang)
  2   2   1   0 0262730871
The Fifty-Nine Icosahedra (Coxeter et al)
  3   2   3   1 038790770X
Differential Equations and Their Applications (Braun)
  9   4   7   7 0387908064
Uses of Infinity (Zippin)
  3   0   3   4 0394015630
The Universal History of Numbers (Ifrah)
  9   9   2   2 0471375683
Introduction to Mathematical Philosophy (Russell)
  8   6   6  23 0486277240
The Thirteen Books of Euclid's Elements (v 1) (Euclid and Heath)
  1   1   1   8 0486600882
The Thirteen Books of Euclid's Elements (v 2) (Euclid and Heath)
 19   1  19  25 0486600890
On Formally Undecidable Propositions (Gödel)
  1   1   1   4 0486669807
Proofs and Refutations (Lakatos)
  4   2   4   8 0521290384
Philosophy of Mathematics (Putnam and Benacerraf)
  4   4   2   2 052129648X
Computability and Logic (Boolos and Jeffrey)
  8   7   8   6 0521389232
Flatland (Abbott)
 28   0  28  25 0631029605
The Man Who Knew Infinity (Kanigel)
  4   4   3   8 0684192594
Gödel's Proof (Nagel and Newman)
  8   8   1   2 0710070780
Geometry Revisited (Coxeter and Greitzer)
  3   2   2   3 088385600X
Calculus (Spivak)
  5   5   4   6 0914098772
x+t   t   x  WC

It would be interesting, though somewhat onerous, to do a more in-depth project comparing thingISBN and xISBN, perhaps by comparing results for random samples of different kinds of books from different kinds of libraries. This would tell us something about how well xISBN works and what sorts of books LibraryThing users have and how well they’ve made their clusters. On the other hand, if you’re actually implementing something and need the best results, the same holds true as yesterday: use both.

Upshot of this comparison based on a small sample of my math books: Sometimes xISBN misses manifestations that must be there; something about the data or its algorithm stops it from doing the clustering. Sometimes thingISBN doesn’t know anything about a given book. For best results, combine and de-dupe results from both services.

Tomorrow: who knows more about all the books in my library? Summary results only! No big table.

(Slightly edited after first posting.)


Comparing xISBN and thingISBN (2)

Posted by: William Denton, 27 March 2007 7:32 am
Categories: Implementations, LibraryThing, OCLC

Last week I posted Comparing xISBN and thingISBN, where I took a quick, informal look at how the two services handled four books from my collection. The comments were interesting. Mia Massicotte speculated that paperbacks and fiction will probably do better in thingISBN and hardcovers and scholarly books will probably do better in xISBN. Today and tomorrow I’m doing some more comparisons to test this. Nothing scientific, just a bit of poking around with a few sets of books to see if any patterns emerge.

Today I do some paperback fiction and some picture books. Tomorrow I’ll do some mathematics books, most of which are fairly academic. Thursday I’ll try to run all my books through the services and post some aggregate numbers on who knows more.

In the result sets, you’ll see four columns of numbers at the left. x+t is first, but let me explain it third. t is the result count from thingISBN. x is the result count from xISBN. x+t is the count of the combined and de-duped results from both; that is, the two sets of ISBNs are put together and any duplicates removed. (Hence, this will always be equal to or greater than the greater of the thingISBN and xISBN result counts.)
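
As a small illustration of how the three counts relate, here is a minimal Ruby sketch. The `counts_row` method is a hypothetical name and the ISBN lists are placeholders, not real output from either service.

```ruby
# Hypothetical sketch: compute the x+t, t, and x columns for one book.
def counts_row(thing_isbns, x_isbns)
  combined = thing_isbns | x_isbns # array union drops duplicates
  { 'x+t' => combined.size, 't' => thing_isbns.uniq.size, 'x' => x_isbns.uniq.size }
end

# Placeholder ISBN lists: two ISBNs are known to both services.
row = counts_row(%w[111 222 333], %w[222 333 444 555])
puts format('%3d %3d %3d', row['x+t'], row['t'], row['x'])

# x+t is at least the larger count and at most the sum of the two.
larger = [row['t'], row['x']].max
puts row['x+t'].between?(larger, row['t'] + row['x'])
```

The combined count only equals t plus x when the two services have no ISBNs in common.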

WC is the count from WorldCat’s Editions tab. WorldCat never displays more than 25 other manifestations of a work, so this number will never be over 25. I asked Thom Hickey why xISBN and WorldCat’s Editions tabs sometimes showed different numbers and he said that the two systems get their clustering data from two different sources that may not be synchronized. They’re continuing to work on algorithms and the xISBN implementation so I expect both xISBN and the WorldCat numbers to get more accurate. For now, though, they can sometimes be quite different, which is interesting, so I’ve included them.

These books are all from my own library. I wrote a Ruby script to query my collection database and then check with LibraryThing and OCLC. My paperback fiction subjects here aren’t of Stephen King, Dan Brown, or Danielle Steel’s level of popularity, but I don’t have any of their books. I grabbed two sets of novels by writers I thought would give interesting results. Next time I might check George MacDonald Fraser, Kim Stanley Robinson, and Donald E. Westlake. They’re all still publishing today. Come to think of it, they’d probably be better subjects than the two I picked, but it’s too late now.

First, some paperback novels by Geoffrey Household. Rogue Male is certainly his best known book, and one of the best and most unusual thrillers of the past century. xISBN knows about 11 manifestations in a set including the one I have, thingISBN knows about 6, and between them they know about 13 different ones. WorldCat’s Editions tab matches two manifestations, and shows the one I have and one other. Household’s other novels are less popular and the numbers show that they’ve been printed in few manifestations. xISBN knows more about them than thingISBN does.

x+t   t   x  WC
  4   1   4   4   0140048359 Hostage: London
  5   2   5   7   0140052739 The Last Two Weeks of George Rivac
  3   0   3   6   0140045228 Red Anger
  3   1   3   4   0140068538 Rogue Justice
 13   6  11   2   0140006958 Rogue Male
  4   0   4  10   0140022732 A Rough Shoot

Next, here are books, almost all paperbacks, by John D. MacDonald, one of the greats of the paperback original era who didn’t get into hardcover originals until the early 1970s. His Travis McGee series for Fawcett Gold Medal was massively popular. These are the JDMs for which I have ISBNs. His early books came out before ISBNs were invented. I think they’ve all been reprinted during the ISBN era, so more recent editions have them, but a few of my copies are too early. I skipped them to save time.

You can see that I have two different manifestations (paperback and hardcover) of both Cinnamon Skin and One More Sunday. thingISBN and xISBN’s numbers for them all match up, which shows that they correctly group both of my manifestations together as being the same work.

All of the books with colours in the title, such as Free Fall in Crimson and The Lonely Silver Rain, are in the McGee series and have been reprinted many times. It’s not surprising to see double-digit numbers for most of them. Darker Than Amber and Nightmare in Pink are unusual: xISBN doesn’t know about any other matching manifestations, but thingISBN does. Seems odd. thingISBN wins there. For most of the others, xISBN has a slight edge, but both know about some that the other doesn’t.

WorldCat’s Editions tab usually groups more together than xISBN does, such as for Cinnamon Skin, where it groups 15 manifestations to xISBN’s 4 (and thingISBN’s 6).

x+t   t   x  WC
  3   3   1   0   0449129578 All These Condemned
  1   0   1   0   044902380X Ballroom of the Skies
  8   3   8  11   0449131793 Barrier Island
  3   3   2   6   0449137147 Border Town Girl
  3   3   1   0   0449141411 The Brass Cupcake
  4   3   3   7   0449141063 A Bullet for Cinderella
  7   6   4  15   0060149906 Cinnamon Skin
  7   6   4  15   044912505X Cinnamon Skin
  2   0   2   3   0449123596 Clemmie
  2   2   1   0   0449134296 Cry Hard, Cry Fast
 10  10   1   0   0449127524 Darker Than Amber
 14  11   8  19   039701032X A Deadly Shade of Gold
  4   0   4   8   0449143236 Death Trap
  3   1   3   4   0449140164 The Deceivers
 17  11  14  18   0449141497 The Empty Copper Sea
  4   4   1   1   0449140598 The Executioners
 17  10  15  16   0449144410 Free Fall in Crimson
 16  11  11  16   0449129152 The Girl in the Plain Brown Wrapper
 16  10  15  18   0449123995 The Green Ripper
  1   0   1   0   0449024814 A Key to the Suite
  8   6   7  11   0449125092 The Lonely Silver Rain
  8   8   1   0   0449129659 The Long Lavender Look
  2   2   1   0   0449129667 A Man of Affairs
  3   2   3   3   0449136027 Murder in the Wind
 10  10   1  21   0449133125 Nightmare in Pink
  8   4   8   9   044920703X One More Sunday
  8   4   8   9   0394536738 One More Sunday
  3   3   2   8   0449140806 Please Write for Details
  1   1   1   0   0881840114 Two

Upshot of paperback fiction: Seems like more often than not xISBN has the edge, but sometimes thingISBN knows more. Sometimes xISBN will fail to group your manifestation with others and give a misleading answer. For best results, combine them.

Next, some picture books. The Denton ones are by Kady MacDonald Denton, my mother. They’ve come out in hardcover and paperback, and often in a fresh edition a few years later. (All are excellent and I highly recommend them!) Most have been translated into several other languages, but that wouldn’t show up here. The two manifestations each of A Second is a Hiccup and Two Homes are hardcover and paperback; xISBN groups them but thingISBN hasn’t seen both. Le carrousel is the French version of A Second is a Hiccup but it’s alone. For these books, xISBN definitely knows more.

The Flack/Wiese and McCloskey classics are odd because for my editions of The Story About Ping and Make Way for Ducklings, xISBN doesn’t group them with the dozens of other manifestations. If it did, you’d see higher numbers for it than for thingISBN, as is true for Blueberries for Sal. More xISBN oddness, or a grouping failure.

Upshot: In general xISBN knows more about children’s books than thingISBN. However, in some cases xISBN will fail to group your manifestation with others. As usual, group both sets of results together.

x+t   t   x  WC
  3   0   3   4   1550745549 A Child's Treasury of Nursery Rhymes (Denton)
  2   0   2   2   0416130127 The Christmas Boot (Denton)
  6   0   6   5   0744514401 Granny is a Darling (Denton)
  6   1   6   2   0753452243 In the Light of the Moon and Other Bedtime Stories (Denton and McBratney)
  1   0   1   1   0439974011 Le carrousel: Un poeme sur l'enfance (Denton and Hutchins)
  4   1   4   3   0439949033 A Second is a Hiccup: A Child's Book of Time (Denton and Hutchins)
  4   0   4   3   0439974003 A Second is a Hiccup: A Child's Book of Time (Denton and Hutchins)
  4   0   4   4   0744589258 Two Homes (Denton and Masurel)
  4   2   4   4   0763605115 Two Homes (Denton and Masurel)
 10  10   1   1   0140502416 The Story About Ping (Flack and Wiese)
 19   8  15  25   014050169X Blueberries for Sal (McCloskey)
 21  21   1   7   0140501711 Make Way for Ducklings (McCloskey)

A few points:

  • For best results, check both xISBN and thingISBN, and combine and de-dupe the results.
  • xISBN usually knows more, but sometimes gives back strange results.
  • For a possibly more expansive, though more resource-intensive, set of matching manifestations, form a new set of ISBNs by taking the first results from thingISBN and looking up each ISBN in turn at xISBN, and taking the first results from xISBN and looking up each ISBN in turn at thingISBN. That is, if thingISBN gives a result count of 4 ISBNs and xISBN gives 6, look up each of thing’s 4 at xISBN and each of x’s 6 at thing. Form a new set of all the ISBNs returned, and de-dupe. Perhaps thingISBN groups two ISBNs that xISBN has in two different clusters, or vice versa. This would get around the strange behaviour shown above where xISBN only returns 1 result for Make Way for Ducklings: you’d have 20 fresh ISBNs from thingISBN to use when re-searching xISBN.
  • xISBN draws on WorldCat, which is made up of data from libraries all over the United States, and many from elsewhere around the world. Libraries do buy a lot of books, and in lots of different editions, and they’ve been doing so for decades. WorldCat’s database is huge, and I wouldn’t underestimate its holdings of any kind of book, be it cheap paperback or expensive academic text.
  • On the other hand, LibraryThing’s results are damned impressive. I also wouldn’t underestimate its holdings.
  • Its children’s book numbers are low, however. The Kady MacDonald Denton results above make me suspect that it won’t do well on children’s books that have not yet become classics that adults buy for themselves. How many LibraryThing users who are parents catalogue their children’s picture books? And how do those ownership numbers compare to the number of picture books they borrow from the library?
  • What about pre-ISBN books? I may test some of them.
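
The one-round cross-lookup suggested in the list above might look roughly like this in Ruby. It is a sketch under assumptions: `cross_lookup` is a hypothetical name, the two services are represented by callables backed by placeholder hashes, and no real requests are made.

```ruby
# Hypothetical sketch of a single round of cross-lookups.
# `thing` and `x` are callables that take an ISBN and return an array of ISBNs.
def cross_lookup(isbn, thing, x)
  first_thing = thing.call(isbn)
  first_x = x.call(isbn)
  # Look up thingISBN's first results at xISBN, and xISBN's at thingISBN.
  from_x = first_thing.flat_map { |i| x.call(i) }
  from_thing = first_x.flat_map { |i| thing.call(i) }
  (first_thing + first_x + from_x + from_thing).uniq.sort
end

# Placeholder data: the xISBN stand-in splits one work into separate clusters.
thing_data = { 'd1' => %w[d1 d2 d3], 'd2' => %w[d1 d2 d3], 'd3' => %w[d1 d2 d3] }
x_data     = { 'd1' => %w[d1], 'd2' => %w[d2 d4], 'd3' => %w[d3], 'd4' => %w[d2 d4] }

thing = ->(i) { thing_data.fetch(i, []) }
x     = ->(i) { x_data.fetch(i, []) }

puts cross_lookup('d1', thing, x).join(' ')
```

Here the xISBN stand-in returns only d1 for the starting ISBN, but re-searching it with d2 from the other service brings in d4 as well.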

de Oliveira Lima, An Adaptation of the FRBR Model to Legal Norms

Posted by: William Denton, 26 March 2007 7:53 am
Categories: Papers

João Alberto de Oliveira Lima sent me an e-mail telling me about a paper he wrote, “An Adaptation of the FRBR Model to Legal Norms,” which is available in Proceedings of the V Legislative XML Workshop (but not available on the web). He adds:

The FRBR model offers an excellent framework to deal with legal texts. In legal domain, we’ve a lot of derivations due to the constant amendments of normative acts. The application of the FRBR to legal norms is beyond the generic tasks (find, identify, select and obtain) of a catalog. It will be applied as a backbone to structure legal information systems.

The paper’s idea was well accepted in the workshop and now is a reality in the draft of the CEN METALEX standard and is influencing other projects like the Italian Norme in Rete, the Akoma Ntoso Project and the Brazilian LexML Project.

(Everyone is welcome to send me notices of publications, conference sessions, implementations, etc., and I’ll mention them here.)


Comparing xISBN and thingISBN

Posted by: William Denton, 23 March 2007 7:51 am
Categories: Implementations, LibraryThing, OCLC

I whipped up a little Ruby script to compare results from LibraryThing’s thingISBN and OCLC’s xISBN. (Tim Spalding of LibraryThing does some comparisons in his announcement of thingISBN, which is where I linked. He’d even added an option to thingISBN so it would return xISBN results as well, but OCLC put the kibosh on that.)

#!/usr/local/bin/ruby

# Use thingISBN and xISBN and put their answers together to get
# the most ISBNs of other manifestations of a work, given the ISBN of
# one manifestation of said work. Eliminate duplicates.

# Change the ISBN to anything you want. 

# Richard Pevear's new translation of THE THREE MUSKETEERS by Alexandre Dumas.
isbn = '0670037796'

# Oxford Classics edition that WorldCat has only one other manifestation for
# isbn = '0192835750'

# Anthony Powell, BOOKS DO FURNISH A ROOM, Fontana pb
# isbn = '0006130879'

# Charles Willeford, THE BURNT ORANGE HERESY, Black Lizard
# isbn = '0887390250'

require 'net/http'

require 'rubygems'
require 'xmlsimple'

puts "Finding manifestations of #{isbn} ..."

# First, get data from thingISBN at LibraryThing

thingURL = "http://www.librarything.com/api/thingISBN/"

url = thingURL + isbn
xml_data = Net::HTTP.get_response(URI.parse(url)).body

data = XmlSimple.xml_in(xml_data)

# Collect the ISBNs; fall back to an empty list if none came back.
thingISBNs = data['isbn'] || []
# thingISBNs.each { |i| puts "thingISBN: #{i}" }

# Next, get data from xISBN at OCLC

xISBNURL = "http://xisbn.worldcat.org/webservices/xid/isbn/"

url = xISBNURL + isbn + "?method=getEditions&format=xml"
xml_data = Net::HTTP.get_response(URI.parse(url)).body

data = XmlSimple.xml_in(xml_data)

# Collect the ISBNs; fall back to an empty list if none came back.
xISBNs = data['isbn'] || []
# xISBNs.each { |i| puts "xISBN: #{i}" }

allISBNs = (thingISBNs + xISBNs).uniq

# ISBNs known to one service but not the other (array difference).
xNotThing = (xISBNs - thingISBNs).uniq
thingNotX = (thingISBNs - xISBNs).uniq

puts " Known to thingISBN: #{thingISBNs.size} (#{thingNotX.size} of which not known to xISBN)"
puts " Known to     xISBN: #{xISBNs.size} (#{xNotThing.size} of which not known to thingISBN)"

puts "              Total: #{allISBNs.size}"

# Print ISBNs known to LibraryThing but not xISBN.
# thingNotX.sort.each do |isbn|
#   puts isbn
# end

I ran it on that first ISBN, the new and reportedly excellent Pevear translation of Dumas’s The Three Musketeers, and got this:

Finding manifestations of 0670037796 ...
 Known to thingISBN: 109 (52 of which not known to xISBN)
 Known to     xISBN: 226 (169 of which not known to thingISBN)
              Total: 278

“I say, what’s this?” I ejaculated, because I had just finished a P.G. Wodehouse novel. I’d imagined that xISBN, emerging as it does from OCLC’s vast WorldCat, made up of catalogue information from libraries all over (mostly from the United States), would vastly outnumber thingISBN, which draws on the work groupings done by users of LibraryThing, which, granted, is globally popular. But for this manifestation of this work, thingISBN knew of 109 manifestations of the same work (including the one I’d specified, so that’s 108 others), and 52 weren’t known to xISBN. xISBN, in turn, knew of 226 manifestations, 169 of which weren’t known to thingISBN. 278 different manifestations are known between the two.

So xISBN does outnumber thingISBN, but thingISBN knows about 52 manifestations that xISBN doesn’t! Is it because they aren’t in WorldCat, or because the work-grouping algorithm didn’t catch them?

I didn’t check this for all of the 52, but I did try it on my Oxford Classics edition of The Three Musketeers:

Finding manifestations of 0192835750 ...
 Known to thingISBN: 109 (108 of which not known to xISBN)
 Known to     xISBN: 1 (0 of which not known to thingISBN)
              Total: 109

This shows that this manifestation is one of the ones thingISBN has grouped into the work of The Three Musketeers, but xISBN thinks it stands alone. However, when you look it up at WorldCat, you find it’s been grouped with a 1956 edition from Longman’s (look under the Editions tab). I’m not sure what’s going on here but it seems odd. I expect both OCLC sources to agree.

(Conversely, I didn’t check the 169 manifestations that xISBN knows about that thingISBN doesn’t, so I don’t know if they’re not in LibraryThing at all or if they are but haven’t been grouped.)

Anthony Powell’s Books Do Furnish a Room has been published in a number of editions, and thingISBN wins for my Fontana paperback:

Finding manifestations of 0006130879 ...
 Known to thingISBN: 8 (7 of which not known to xISBN)
 Known to     xISBN: 1 (0 of which not known to thingISBN)
              Total: 8

Looking for other manifestations of Charles Willeford’s The Burnt Orange Heresy (an excellent noir crime novel about modern art — Willeford’s one of the great American writers of the twentieth century) gave me the sorts of results I’d expected in the first place, with thingISBN’s result being a proper subset of xISBN’s. This is for my Black Lizard edition:

Finding manifestations of 0887390250 ...
 Known to thingISBN: 3 (0 of which not known to xISBN)
 Known to     xISBN: 7 (4 of which not known to thingISBN)
              Total: 7

Upshot: If you have an ISBN in hand and want to find ISBNs of other manifestations of the same work, use both thingISBN and xISBN.


Draft of RDA Chapter 3

Posted by: William Denton, 22 March 2007 7:13 am
Categories: RDA

The Joint Steering Committee for the Revision of AACR announced yesterday that they have made available the revised draft of RDA Part A, Chapter 3 (558 KB PDF). It’s 150 pages long but some FRBR material will jump out at you on page four:

Alignment with FRBR

As agreed at the October 2006 JSC meeting, the elements covered in chapter 3 have been aligned more directly with the corresponding attributes of the manifestation as defined in FRBR. The realignment of instructions in chapter 3 has also resulted in the following FRBR attributes of manifestation being treated as separate elements: generation, foliation, font size, and reduction ratio.

To improve FRBR alignment, instructions on recording information relating to illustrative matter, duration and tactile systems of notation have been transferred to chapter 4 (Content), and these are included in the attached chapter 4 addendum.

To more closely reflect the FRBR attributes they relate to, the following instructions, included in the previous draft of chapter 3, will be transferred to other chapters:

  • instructions on recording information relating to mode of access (transferred to chapter 5 Terms of availability, etc.).
  • instructions on recording information relating to accompanying material and instructions on making notes on other formats available (transferred to chapter 7 Related resources).

At the October 2006 JSC meeting it was agreed to use the phrase “considered to be important for identification or selection” in instructions where needed, so that the need to support FRBR user tasks was taken into account by cataloguers when determining whether to record additional information.

On a related note, Will RDA Be DOA? by Roy Tennant in Library Journal has caused some rather warm discussion in cataloguing mailing lists and blogs.


Kemp, Catalog/Cataloging Changes and Web 2.0 Functionality

Posted by: William Denton, 19 March 2007 7:57 am
Categories: Aggregates, Papers

Rebecca Kemp has a paper coming out later this year, but, happily, we can download a copy now: Catalog/Cataloging Changes and Web 2.0 Functionality: New Directions for Serials (883 KB PDF) (The Serials Librarian 53:4).

ABSTRACT. This article presents an overview of some of the important recent developments in cataloging theory and practice and online catalog design. Changes in cataloging theory and practice include the incorporation of the Functional Requirements for Bibliographic Records principles into catalogs, the new Resource Description and Access cataloging manual, and the new CONSER Standard Record. Web 2.0 functionalities and advances in search technology and results displays are influencing online catalog design. The paper ends with hypothetical scenarios in which a catalog, enhanced by the developments described, fulfills the tasks of finding serials articles and titles.

… The paper will be organized into four sections, the first of which will review recent changes in cataloging theory that have yet to be fully developed into cataloging practice, namely, the Functional Requirements for Bibliographic Records (FRBR). Introducing identifiers into serial records in accordance with FRBR entities will allow better collocation of like titles and differentiation between unlike titles. This section will conclude with a view of the potential serial “superwork record.”

(Thanks to Jonathan Rochkind for putting me wise to this. He recommends it.)


Digitized De Revolutionibus

Posted by: William Denton, 17 March 2007 7:34 am
Categories: Uncategorized

A weekend note about Copernicus’s De Revolutionibus, which I’ve mentioned before. I hope to post a couple more things about it, so here’s something for future reference: Octavo make available a digitized copy of De Revolutionibus. Have a look.

I skimmed through a few pages and didn’t see any annotations, but I didn’t look at it all. This item comes from the Warnock Library in Oakland, California, and Gingerich’s Census will give full notes on it. The other books from their collection digitized at Octavo are by Newton, Ben Franklin, Dr. Johnson, Robert Hooke, and others.

The same pictures are available at Rare Book Room but the interface is a bit awkward and it will resize your browser window. Here’s De Revolutionibus at the Rare Book Room.

So, two identical copies of the same digitization of the same item. FRBRously intriguing.


All thingISBN data available in one huge file

Posted by: William Denton, 16 March 2007 7:45 am
Categories: LibraryThing

LibraryThing has done something very useful and generous. As you know, Bob, their thingISBN service is akin to OCLC’s xISBN: give it the ISBN of a book and it will return a list of ISBNs of other editions of the same book. In FRBR terms, we say: give it the ISBN of a manifestation and it will give you a list of ISBNs of manifestations of the same work.

Tim Spalding says in thingISBN Data in One File:

APIs, while nifty, can be a pain. Both thingISBN and xISBN have a 1,000-per-day limit. So, starting today, thingISBN is also available in feed format—one giant XML file with all the data from over two million unique ISBNs.

He doesn’t give a direct link to the full file, probably because it’s 16 MB in size and search engine crawlers would go at it relentlessly, but all you have to do is look for where he gives the URL in plain text. He even gives some sample SQL to help get the stuff into a database. Download it, fool around with it, find new uses for it! Good for LibraryThing.

