Comparing xISBN and thingISBN (2)
Last week I posted Comparing xISBN and thingISBN, where I took a quick, informal look at how the two services handled four books from my collection. The comments were interesting. Mia Massicotte speculated that paperbacks and fiction would probably do better in thingISBN, and hardcovers and scholarly books would probably do better in xISBN. Today and tomorrow I’m doing some more comparisons to test this. Nothing scientific, just a bit of poking around with a few sets of books to see if any patterns emerge.
Today I look at some paperback fiction and some picture books. Tomorrow I’ll do some mathematics books, most of which are fairly academic. On Thursday I’ll try to run all my books through both services and post some aggregate numbers on which one knows more.
In the result sets, you’ll see four columns of numbers at the left. x+t is first, but let me explain it third. t is the result count from thingISBN. x is the result count from xISBN. x+t is the count of the combined and de-duped results from both; that is, the two sets of ISBNs are put together and any duplicates removed. (Hence it will always be at least as large as the larger of the thingISBN and xISBN counts, and never larger than their sum.)
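The x+t column is a plain set union. As a minimal Python sketch, assuming each service’s response has already been reduced to a list of ISBN strings (the letters below are placeholders, not real ISBNs):

```python
def combined_count(thing_isbns, x_isbns):
    """Return (x+t, t, x) for one book's two result lists."""
    thing, xisbn = set(thing_isbns), set(x_isbns)
    combined = thing | xisbn  # union: an ISBN both services return counts once
    return len(combined), len(thing), len(xisbn)

# Placeholder results: the two services share "A" and "B".
print(combined_count(["A", "B", "C"], ["A", "B", "D", "E"]))  # (5, 3, 4)
```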
WC is the count from WorldCat’s Editions tab. WorldCat never displays more than 25 other manifestations of a work, so this number will never be over 25. I asked Thom Hickey why xISBN and WorldCat’s Editions tab sometimes showed different numbers, and he said that the two systems get their clustering data from two different sources that may not be synchronized. They’re continuing to work on the algorithms and the xISBN implementation, so I expect both the xISBN and WorldCat numbers to get more accurate. For now, though, they can be quite different, which is interesting, so I’ve included them.
These books are all from my own library. I wrote a Ruby script to query my collection database and then check each ISBN with LibraryThing and OCLC. My paperback fiction subjects here aren’t as popular as Stephen King, Dan Brown, or Danielle Steel, but I don’t have any of their books. I grabbed two sets of novels by writers I thought would give interesting results. Next time I might check George MacDonald Fraser, Kim Stanley Robinson, and Donald E. Westlake, who are all still publishing today. Come to think of it, they’d probably be better subjects than the two I picked, but it’s too late now.
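The Ruby script itself isn’t shown here. As a rough illustration only, a Python sketch of the comparison loop might look like this, assuming hypothetical `thing_lookup` and `x_lookup` functions that take an ISBN and return the related ISBNs from each service. How those functions actually talk to LibraryThing and OCLC is deliberately left out, and the WC column is not computed.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical lookup type: ISBN in, related ISBNs out.
Lookup = Callable[[str], Iterable[str]]

def compare_collection(
    isbns: Iterable[str], thing_lookup: Lookup, x_lookup: Lookup
) -> Iterator[tuple[int, int, int, str]]:
    """Yield (x+t, t, x, isbn) for each owned ISBN, like the tables' first columns."""
    for isbn in isbns:
        t_set = set(thing_lookup(isbn))
        x_set = set(x_lookup(isbn))
        yield len(t_set | x_set), len(t_set), len(x_set), isbn

# Stand-in lookups backed by dictionaries; a real script would query the services.
fake_thing = {"111": ["111", "222"]}
fake_x = {"111": ["111", "333", "444"]}

for row in compare_collection(
    ["111"], lambda i: fake_thing.get(i, []), lambda i: fake_x.get(i, [])
):
    print(*row)  # 4 2 3 111
```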
First, some paperback novels by Geoffrey Household. Rogue Male is certainly his best-known book, and one of the best and most unusual thrillers of the past century. xISBN knows about 11 manifestations in a set that includes the one I have, thingISBN knows about 6, and between them they know about 13 different ones. WorldCat’s Editions tab matches only two manifestations: the one I have and one other. Household’s other novels are less popular, and the numbers show that they’ve been printed in fewer manifestations. xISBN knows more about them than thingISBN does.
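Because x+t is the size of the union, inclusion-exclusion gives the overlap between the two services. For Rogue Male, writing X and T for the xISBN and thingISBN result sets (so |X| = 11, |T| = 6, |X ∪ T| = 13):

```latex
\[
|X \cap T| = |X| + |T| - |X \cup T| = 11 + 6 - 13 = 4
\]
```

So the two services agree on 4 manifestations, xISBN has 7 that thingISBN lacks, and thingISBN has 2 that xISBN lacks (4 + 7 + 2 = 13).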
x+t  t   x   WC  ISBN        Title
4    1   4   4   0140048359  Hostage: London
5    2   5   7   0140052739  The Last Two Weeks of George Rivac
3    0   3   6   0140045228  Red Anger
3    1   3   4   0140068538  Rogue Justice
13   6   11  2   0140006958  Rogue Male
4    0   4   10  0140022732  A Rough Shoot
Next, here are books, almost all paperbacks, by John D. MacDonald, one of the greats of the paperback original era, who didn’t move into hardcover originals until the early 1970s. His Travis McGee series for Fawcett Gold Medal was massively popular. These are the JDMs for which I have ISBNs. His early books came out before ISBNs were invented. I think they’ve all been reprinted during the ISBN era, so more recent editions have ISBNs, but a few of my copies are too early to carry one, and I skipped them to save time.
You can see that I have two different manifestations (paperback and hardcover) of both Cinnamon Skin and One More Sunday. thingISBN’s and xISBN’s numbers match within each pair, which indicates that each service correctly groups both of my manifestations as the same work.
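Matching counts are good evidence, but a stricter check compares the result sets themselves. A minimal Python sketch, assuming a hypothetical `lookup` function that returns one service’s list of ISBNs, and assuming each result list includes the queried ISBN itself (an assumption about the services, not something verified here):

```python
def grouped_together(isbn_a, isbn_b, lookup):
    """True if one service puts both ISBNs in the same cluster.

    Assumes (hypothetically) that a result list includes the queried ISBN,
    so two correctly grouped manifestations should return identical sets
    that contain both of them.
    """
    a_set, b_set = set(lookup(isbn_a)), set(lookup(isbn_b))
    return a_set == b_set and {isbn_a, isbn_b} <= a_set

# Made-up cluster for a hardcover ("hc") and a paperback ("pb") of one work.
clusters = {"hc": ["hc", "pb", "x1"], "pb": ["pb", "hc", "x1"]}
print(grouped_together("hc", "pb", lambda i: clusters.get(i, [])))  # True
```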
All of the books with colours in the title, such as Free Fall in Crimson and The Lonely Silver Rain, are in the McGee series and have been reprinted many times. It’s not surprising to see double-digit numbers for most of them. Darker Than Amber, Nightmare in Pink, and The Long Lavender Look are unusual: xISBN doesn’t know about any other matching manifestations, but thingISBN does. That seems odd, and thingISBN wins there. For most of the others, xISBN has a slight edge, but each knows about some that the other doesn’t.
WorldCat’s Editions tab usually groups more together than xISBN does. For Cinnamon Skin, for example, it groups 15 manifestations to xISBN’s 4 (and thingISBN’s 6).
x+t  t   x   WC  ISBN        Title
3    3   1   0   0449129578  All These Condemned
1    0   1   0   044902380X  Ballroom of the Skies
8    3   8   11  0449131793  Barrier Island
3    3   2   6   0449137147  Border Town Girl
3    3   1   0   0449141411  The Brass Cupcake
4    3   3   7   0449141063  A Bullet for Cinderella
7    6   4   15  0060149906  Cinnamon Skin
7    6   4   15  044912505X  Cinnamon Skin
2    0   2   3   0449123596  Clemmie
2    2   1   0   0449134296  Cry Hard, Cry Fast
10   10  1   0   0449127524  Darker Than Amber
14   11  8   19  039701032X  A Deadly Shade of Gold
4    0   4   8   0449143236  Death Trap
3    1   3   4   0449140164  The Deceivers
17   11  14  18  0449141497  The Empty Copper Sea
4    4   1   1   0449140598  The Executioners
17   10  15  16  0449144410  Free Fall in Crimson
16   11  11  16  0449129152  The Girl in the Plain Brown Wrapper
16   10  15  18  0449123995  The Green Ripper
1    0   1   0   0449024814  A Key to the Suite
8    6   7   11  0449125092  The Lonely Silver Rain
8    8   1   0   0449129659  The Long Lavender Look
2    2   1   0   0449129667  A Man of Affairs
3    2   3   3   0449136027  Murder in the Wind
10   10  1   21  0449133125  Nightmare in Pink
8    4   8   9   044920703X  One More Sunday
8    4   8   9   0394536738  One More Sunday
3    3   2   8   0449140806  Please Write for Details
1    1   1   0   0881840114  Two
Upshot of paperback fiction: more often than not xISBN seems to have the edge, but sometimes thingISBN knows more. Sometimes xISBN fails to group your manifestation with the others and gives a misleading answer. For best results, combine them.
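To spot likely grouping failures across a whole collection, one rough heuristic is to flag rows where one service returns a single ISBN (presumably just the one queried) while the other returns several. A Python sketch using a few (title, t, x) rows copied from the John D. MacDonald table; the `min_other` threshold is an arbitrary choice, and a flag is a hint, not proof:

```python
# (title, t, x) rows copied from the John D. MacDonald table.
rows = [
    ("Barrier Island", 3, 8),
    ("Darker Than Amber", 10, 1),
    ("The Empty Copper Sea", 11, 14),
    ("The Long Lavender Look", 8, 1),
    ("Nightmare in Pink", 10, 1),
    ("Two", 1, 1),
]

def possible_grouping_failures(rows, min_other=2):
    """Titles where one service returned a single ISBN but the other returned
    at least min_other, suggesting the first service didn't cluster the book."""
    flagged = []
    for title, t, x in rows:
        if (x == 1 and t >= min_other) or (t == 1 and x >= min_other):
            flagged.append(title)
    return flagged

print(possible_grouping_failures(rows))
# ['Darker Than Amber', 'The Long Lavender Look', 'Nightmare in Pink']
```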
Next, some picture books. The Denton ones are by Kady MacDonald Denton, my mother. They come out in hardcover and paperback, and often in a fresh edition a few years later. (All are excellent and I highly recommend them!) Most have been translated into several other languages, but that wouldn’t show up here. The two manifestations each of A Second is a Hiccup and Two Homes are a hardcover and a paperback; xISBN groups each pair, but thingISBN hasn’t seen both. Le carrousel is the French version of A Second is a Hiccup, but it comes back on its own. For these books, xISBN definitely knows more.
The Flack/Wiese and McCloskey classics are odd: for my editions of The Story About Ping and Make Way for Ducklings, xISBN doesn’t group them with the dozens of other manifestations. If it did, you’d see higher numbers for xISBN than for thingISBN, as is true for Blueberries for Sal. This is either more xISBN oddness or a grouping failure.
Upshot: in general, xISBN knows more about children’s books than thingISBN. However, in some cases xISBN will fail to group your manifestation with others. As usual, combine both sets of results.
x+t  t   x   WC  ISBN        Title
3    0   3   4   1550745549  A Child’s Treasury of Nursery Rhymes (Denton)
2    0   2   2   0416130127  The Christmas Boot (Denton)
6    0   6   5   0744514401  Granny is a Darling (Denton)
6    1   6   2   0753452243  In the Light of the Moon and Other Bedtime Stories (Denton and McBratney)
1    0   1   1   0439974011  Le carrousel: Un poème sur l’enfance (Denton and Hutchins)
4    1   4   3   0439949033  A Second is a Hiccup: A Child’s Book of Time (Denton and Hutchins)
4    0   4   3   0439974003  A Second is a Hiccup: A Child’s Book of Time (Denton and Hutchins)
4    0   4   4   0744589258  Two Homes (Denton and Masurel)
4    2   4   4   0763605115  Two Homes (Denton and Masurel)
10   10  1   1   0140502416  The Story About Ping (Flack and Wiese)
19   8   15  25  014050169X  Blueberries for Sal (McCloskey)
21   21  1   7   0140501711  Make Way for Ducklings (McCloskey)
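A simple per-book tally of which service returned more ISBNs is one way to get aggregate numbers on who knows more. A Python sketch over the (title, t, x) values from the picture-book table; treating a higher count as knowing more is an assumption, and it ignores cases where a service groups the wrong books together:

```python
from collections import Counter

# (title, t, x) for the picture books in the table above.
picture_books = [
    ("A Child's Treasury of Nursery Rhymes", 0, 3),
    ("The Christmas Boot", 0, 2),
    ("Granny is a Darling", 0, 6),
    ("In the Light of the Moon", 1, 6),
    ("Le carrousel", 0, 1),
    ("A Second is a Hiccup (0439949033)", 1, 4),
    ("A Second is a Hiccup (0439974003)", 0, 4),
    ("Two Homes (0744589258)", 0, 4),
    ("Two Homes (0763605115)", 2, 4),
    ("The Story About Ping", 10, 1),
    ("Blueberries for Sal", 8, 15),
    ("Make Way for Ducklings", 21, 1),
]

def who_knows_more(rows):
    """Count the books where xISBN, thingISBN, or neither returned more ISBNs."""
    tally = Counter()
    for _title, t, x in rows:
        if x > t:
            tally["xISBN"] += 1
        elif t > x:
            tally["thingISBN"] += 1
        else:
            tally["tie"] += 1
    return tally

print(who_knows_more(picture_books))  # Counter({'xISBN': 10, 'thingISBN': 2})
```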
A few points:
- For best results, check both xISBN and thingISBN, and combine and de-dupe the results.
- xISBN usually knows more, but sometimes gives back strange results.
- For a possibly more expansive, though more resource-intensive, set of matching manifestations, take the first results from thingISBN and look up each ISBN in turn at xISBN, and take the first results from xISBN and look up each ISBN in turn at thingISBN. That is, if thingISBN gives a result count of 4 ISBNs and xISBN gives 6, look up each of thingISBN’s 4 at xISBN and each of xISBN’s 6 at thingISBN. Form a new set of all the ISBNs returned, and de-dupe it. Perhaps thingISBN groups two ISBNs that xISBN has in two different clusters, or vice versa. This would get around the strange behaviour shown above, where xISBN returns only 1 result for Make Way for Ducklings: you’d have 20 fresh ISBNs from thingISBN to use when re-searching xISBN.
- xISBN draws on WorldCat, which is made up of data from libraries all over the United States and many elsewhere in the world. Libraries do buy a lot of books, in lots of different editions, and they’ve been doing so for decades. WorldCat’s database is huge, and I wouldn’t underestimate its holdings of any kind of book, be it cheap paperback or expensive academic text.
- On the other hand, LibraryThing’s results are damned impressive. I wouldn’t underestimate its holdings either.
- Its children’s book numbers are low, however. The Kady MacDonald Denton results above make me suspect that it won’t do well on children’s books that have not yet become classics that adults buy for themselves. How many LibraryThing users who are parents catalogue their children’s picture books? And how do those ownership numbers compare to the number of picture books they borrow from the library?
- What about pre-ISBN books? I may test some of them.
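The cross-lookup suggested in the list above can be sketched in Python as one round of expansion. `thing_lookup` and `x_lookup` are hypothetical stand-ins for calls to the two services, and the demo data loosely mimics the Make Way for Ducklings case using made-up ISBN labels:

```python
from typing import Callable, Iterable

# Hypothetical lookup type: ISBN in, related ISBNs out.
Lookup = Callable[[str], Iterable[str]]

def cross_expand(isbn: str, thing_lookup: Lookup, x_lookup: Lookup) -> set[str]:
    """One round of cross-lookup between thingISBN and xISBN.

    Each of thingISBN's first results is looked up at xISBN, and each of
    xISBN's first results at thingISBN. Everything returned is merged into
    one de-duped set.
    """
    thing_first = set(thing_lookup(isbn))
    x_first = set(x_lookup(isbn))
    combined = thing_first | x_first
    # Skip the original ISBN: it has already been looked up at both services.
    for other in thing_first - {isbn}:
        combined.update(x_lookup(other))
    for other in x_first - {isbn}:
        combined.update(thing_lookup(other))
    return combined

# Made-up data: xISBN leaves "Q" in a cluster of its own, but thingISBN
# links it to "B", which xISBN clusters with "C" and "D".
fake_thing = {"Q": ["Q", "B"]}
fake_x = {"Q": ["Q"], "B": ["B", "C", "D"]}

result = cross_expand("Q", lambda i: fake_thing.get(i, []), lambda i: fake_x.get(i, []))
print(sorted(result))  # ['B', 'C', 'D', 'Q']
```

Each round costs two lookups plus roughly one more per first-round result, which is where the extra resources go. Repeating the round until the set stops growing would cast a wider net still, at a higher cost.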