My superduping experiments were interesting in a few different ways, and I’m still trying some things out and hacking my scripts. I’ll give a few examples over a few days.
First, a brief introduction. Let’s consider Ross Thomas’s novel The Seersucker Whipsaw. My item of this work is an exemplar of the 1985 Perennial Library paperback manifestation, which is an embodiment of the author’s final edited text. The ISBN is 0060807288.
If we query thingISBN for 0060807288, we get 3 ISBNs back:
0060807288 0060808497 0446401692
And if we query xISBN for 0060807288, we also get 3 ISBNs back:
0060807288 0060808497 0446401692
The two result sets are identical. Nothing further need be done. That was simple, eh? As far as we can tell, this work has had only three manifestations.
In fact that’s false: these are all paperbacks, respectively from 1985, 1987, and 1992. The first edition was published by Morrow in 1967. Why isn’t it included in the results? It doesn’t have an ISBN! It was published too early to have one. That first ISBNless manifestation is out of luck and won’t show up in any xISBN or thingISBN results.
“That isn’t fair,” I hear you cry. It isn’t. Books that predate International Standard Book Numbers get the cold shoulder from xISBN and thingISBN, which, as you may have noticed from their names, are about ISBNs. “How do we get around that?” I hear you ask. Every work, expression, manifestation, and item will need to have a unique identifier. If one exists (like an ISBN for a manifestation), we can use it. If none exists, we’ll have to make one up and have everyone agree on it. (Or make up several and map them from one to the other.)
For the second example, let’s use another Ross Thomas novel, The Fools In Town Are On Our Side. (The title is from The Adventures of Huckleberry Finn: “Hain’t we got all the fools in town on our side? And ain’t that a big enough majority in any town?”) My item is an exemplar of the 2003 St. Martin’s trade paperback reprint, ISBN 0312315821. The first manifestation was published in 1970. It’s one of his best novels and has been reprinted more than The Seersucker Whipsaw.
If we query thingISBN for 0312315821, we get 4 ISBNs back:
0312315821 0380006871 0445405600 0445408677
And if we query xISBN for 0312315821, we get 8 ISBNs back:
0312315821 0340127376 0380006871 0417052502 0445405600 0445405619 0445408677 3548014402
The 4 thingISBN numbers appear in xISBN’s result set. In set theory lingo, one might say that xISBN’s results are a proper superset of thingISBN’s.
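The comparison is easy to check with ordinary set operations. This is only a small sketch using the two result sets quoted above; the variable names are mine and nothing here talks to either service.

```python
# The two result sets for 0312315821, copied from the lists above.
thing_isbns = {"0312315821", "0380006871", "0445405600", "0445408677"}
x_isbns = {
    "0312315821", "0340127376", "0380006871", "0417052502",
    "0445405600", "0445405619", "0445408677", "3548014402",
}

# On Python sets, ">" tests for a proper superset.
is_proper_superset = x_isbns > thing_isbns

# ISBNs that only xISBN reported.
only_in_x = sorted(x_isbns - thing_isbns)

# A set union combines and dedupes in one step.
combined = thing_isbns | x_isbns

print(is_proper_superset)
print(only_in_x)
print(len(combined))
```

Because one set contains the other, the union can’t hold anything that xISBN’s list doesn’t already have.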
If we combine and dedupe the results, we’ll get 8 ISBNs, all the ones from xISBN. What would superduping give us? Might it find more?
As it turns out, no. Here’s what happens:
First run through ISBNs in thingISBN result set
0312315821 is in xISBN's result set
0380006871 is in xISBN's result set
0445405600 is in xISBN's result set
0445408677 is in xISBN's result set
Now run through the ISBNs in xISBN's result set
The above four have been examined already; don't look at them again
0340127376 is unknown at thingISBN
0417052502 is unknown at thingISBN
0445405619 is unknown at thingISBN
3548014402 is unknown at thingISBN
thingISBN can’t give us any leads on new ISBNs. Four ISBNs were known to both places; their clusters sort of lined up. We knew four other ISBNs, from xISBN, and threw them at thingISBN, but we didn’t turn up any previously undiscovered manifestations.
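Here is a rough sketch of the idea in Python. It is not my actual script, just one plausible way to write it: start from one ISBN, ask every service about it, and keep asking about each newly discovered ISBN until nothing new turns up. The two services are faked with in-memory clusters built from the results above, and I’m assuming a query returns the whole cluster an ISBN belongs to, or nothing if the ISBN is unknown. The names make_lookup and superdupe are made up for this example.

```python
from collections import deque

# Toy stand-ins for the two services (assumption: one cluster each,
# taken from the result sets quoted above; no network calls are made).
THING_CLUSTERS = [{"0312315821", "0380006871", "0445405600", "0445408677"}]
X_CLUSTERS = [{
    "0312315821", "0340127376", "0380006871", "0417052502",
    "0445405600", "0445405619", "0445408677", "3548014402",
}]


def make_lookup(clusters):
    """Build a fake query function: ISBN in, cluster (or empty set) out."""
    def lookup(isbn):
        for cluster in clusters:
            if isbn in cluster:
                return set(cluster)
        return set()  # unknown at this service
    return lookup


def superdupe(seed, lookups):
    """Follow every lead from every service until no new ISBNs appear."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        isbn = queue.popleft()
        for lookup in lookups:
            for found in lookup(isbn):
                if found not in seen:  # never examine an ISBN twice
                    seen.add(found)
                    queue.append(found)
    return seen


result = superdupe(
    "0312315821",
    [make_lookup(THING_CLUSTERS), make_lookup(X_CLUSTERS)],
)
print(sorted(result))
```

With these toy clusters the loop ends with the same eight ISBNs that xISBN returned, because the fake thingISBN has nothing to say about the four it doesn’t know.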
So in this case, combining and deduping gives the same results as superduping. “That’s boring,” I hear you say. Next time I’ll give examples of where superduping breaks apart clusters and gives more complete results. And I’ll show examples of how this can fly out of control and go haywire.