A weblog following developments around the world in FRBR: Functional Requirements for Bibliographic Records.

Maintained by William Denton, Web Librarian at York University. Suggestions and comments welcome at wtd@pobox.com.


Confused? Try What Is FRBR? (2.8 MB PDF) by Barbara Tillett, or Jenn Riley's introduction. For more, see the basic reading list.

Books: FRBR: A Guide for the Perplexed by Robert Maxwell (ISBN 9780838909508) and Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools edited by Arlene Taylor (ISBN 9781591585091) (read my chapter FRBR and the History of Cataloging).


26 March 2008

Shafranovich, FRBRizing Amazon’s Content

Filed under: Blog Mentions, Implementations — William Denton @ 7:45 am

Yakov Shafranovich’s Monday blog post FRBRizing Amazon’s Content is very interesting.

About two weeks ago I accidentally stumbled on a third public service that does something similar. When Amazon launched their Kindle eBook reader they made lots of titles available as a Kindle eBook. HOWEVER, they did not want to change the ISBN numbers for these titles. So what they did is re-organize their catalog is a way that all editions of the same work now appear to be linked to together including audio, eBook, hard cover, etc. This ability is buried in their API right here and is called RelatedItems:

He’s put up bookchaser.com, a tool to compare the results from Amazon, thingISBN, and xISBN. Compare Harry Potter and the Goblet of Fire (starting with the ISBN of a Canadian manifestation): 81 results at xISBN, 105 at thingISBN, and 23 at Amazon, or, as Shafranovich calls his application, amazingISBN. On the other hand, here’s a book where Amazon knows more than the other two services.


27 August 2007

Primo demo at Vanderbilt

Filed under: Implementations — William Denton @ 7:32 am

Marshall Breeding points out their test implementation of Ex Libris’s new catalogue application, Primo: code-named Alphasearch, it is “the first time that a live Primo implementation has been made available to the general public.”

It does some manifestation grouping. Here are some search results for The Three Musketeers. Notice the link “11 versions in 3 languages published between 1893-1976.”

I find the interface awkward, especially its insistence on doing everything as a search, so I’ll leave further exploration up to you.


8 August 2007

Coyle on merging at Open Library

Filed under: Blog Mentions, Implementations, Open Library — William Denton @ 7:58 am

Over at the Open Library project, Karen Coyle posted a link to a record-merging algorithm. On the mailing list where she announced this, she said, “This algorithm was designed to bring together what you would call ‘manifestations’ in FRBR-speak, and what OpenLibrary calls an edition. It can basically be summed up as ‘things you would assign the same ISBN to.’”

It may be temporary on her site (the URL implies so), but if this builds into something permanent at the Open Library then it will be very useful. Certainly the algorithm is of immediate interest to anyone grouping MARC or ONIX records.


8 June 2007

Monte Sano FRBR Floater

Filed under: Implementations — William Denton @ 11:57 pm

Anyone know more about Monte Sano Associates’ FRBR Floater? It’s a subscription service that, it says, adds FRBRized views to a library’s catalogue without the library having to do anything.

Their one page PDF brochure explains:

FRBR Floater, from Monte Sano Associates, is an innovative new service that enables users to view, in an easy-to-read OPAC window, the various editions and formats owned by the library of any title searched. The user may then simply browse the list and select the one item that is most appropriate.

Libraries need not recatalog their collections or manipulate their bibliographic databases, because we use a sophisticated algorithm, based on the new international FRBR standard, to harvest the needed data from your existing MARC catalog records. As a result, libraries and their users can enjoy the benefits of FRBR without the time and expense of database analysis and re-design.

Seems to be something like LibraryThing for Libraries. Is it vapourware? I e-mailed Monte Sano a while ago but never got a reply.


24 May 2007

Nelsonville OH gets Editions tab in Koha

Filed under: Implementations — William Denton @ 9:51 am

LibLime works on and sells support for the two big free and open source integrated library systems, Koha and Evergreen. In More Web Services: FRBR, xISBN, ThingISBN last week they announced that the Nelsonville Public Library added a module to their Koha installation that makes an Editions tab in their catalogue. It uses xISBN and thingISBN. (Nelsonville is a town in Ohio, in the United States.)

There’s even a nice set of system preferences to manage this new feature. They allow the library to turn the feature on/off, specify whether or not to use ThingISBN, and throttle the number of queries to the xISBN service, ensuring compliance to the terms of the free service (499 queries per day).
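For illustration, here is a minimal Python sketch of the kind of per-day throttle described in that quote. It is not Koha's code: the class name, its methods, and the injectable `today` clock are all hypothetical, and the only number taken from the quote is the limit of 499 queries per day.

```python
import datetime
from typing import Callable


class DailyQuota:
    """Hypothetical per-day query counter; not taken from Koha itself."""

    def __init__(self, limit: int,
                 today: Callable[[], datetime.date] = datetime.date.today) -> None:
        self.limit = limit
        self._today = today      # injectable clock, which makes testing easy
        self._day = today()
        self._used = 0

    def allow(self) -> bool:
        """Return True and count the query if today's quota is not used up."""
        now = self._today()
        if now != self._day:     # a new day has started: reset the counter
            self._day = now
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


# The limit quoted above for the free xISBN service.
xisbn_quota = DailyQuota(limit=499)
if xisbn_quota.allow():
    pass  # this is where a catalogue would actually call xISBN
```

A catalogue using something like this would simply skip the xISBN lookup once `allow()` returns False and show whatever editions it already knows about.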

If you look at their entry for one of the manifestations of Harry Potter and the Prisoner of Azkaban and select the Editions tab, you’ll see links to five other manifestations of the work (and one of Harry Potter and the Chamber of Secrets).

Congratulations to the Nelsonville Public Library and all others involved! I don’t know if this plug-in will be made available to others, but I hope it is. A mention of it may turn up on the library’s web page about Koha and their use of it.


19 May 2007

LibraryThing for Libraries at Danbury Library

Filed under: Implementations, LibraryThing — William Denton @ 7:30 am

Danbury Library is the public library in Danbury, Connecticut, a city of about 80,000 people which is currently punching above its weight in the library world because it’s the first system to implement LibraryThing for Libraries, the new service offered by LibraryThing.

It looks very nice. Here are some sample pages:

  • Harry Potter and the Prisoner of Azkaban. (Strangely, the book is sixth in the result list for “azkaban,” but that’s because of their catalogue’s search, nothing to do with LibraryThing.) Note the “Other editions and translation” section, which links to the movie on DVD (a related work), the book read by Jim Dale (different expression), and the Spanish translation (different expression). There’s also a link to another Harry Potter book, which is a mistake but easily corrected. This is now a semi-FRBRized catalogue! LibraryThing also supplies links to related books and tags, which you’ll want to look at.
  • They only have one item of The Hero with a Thousand Faces so there’s nothing FRBRy there.
  • They have four items that are exemplars of a particular manifestation of Pride and Prejudice, and it links to an audiobook and what seems to be literary criticism, but not the movie. As with Harry Potter, searching for “pride and prejudice” shows movie results before the books.

Those are the first three books I tried, and the FRBRization isn’t perfect, but I’m not complaining. Congratulations to the Danbury Library and LibraryThing on getting this implemented! There’s a lot more going on than just the FRBRization, of course, but I’ll leave that to others to discuss, and will just say that it’s very nice work that all other libraries will want to ponder.


25 April 2007

Superduping an omnibus

Filed under: Implementations, LibraryThing, OCLC — William Denton @ 7:26 am

Before doing The Hobbit, here’s an interesting and short example of superduping an omnibus. (Omnibus editions are examples of “aggregates,” an unsettled subject in the FRBR world.) It doesn’t spiral out of control, but it does, shall we say, expand beyond its borders.

I have in hand Captain Hornblower RN, which contains three of the Horatio Hornblower novels by C.S. Forester: Hornblower and the Atropos, The Happy Return, and A Ship of the Line. Penguin did all the Hornblower novels in three omnibus editions: The Young Hornblower, this one, and Admiral Hornblower. They arranged them into internal chronological order, not publishing order. My item is an exemplar of the seventh impression of the Penguin manifestation of 1987. The ISBN is 0140081771. The copyright page says it was first published this way by Michael Joseph in 1965.

If we query thingISBN for 0140081771, we get back a cluster of four ISBNs: 0140081771, 0316288934, 2258039622, and 3548256554.

If we query xISBN for 0140081771, we get back two ISBNs: 0140081771 and 5859590202.

Let’s superdupe and group together all fragmented clusters at both services by comparing ISBNs back and forth.

The first ISBN is the one we started with, and it’s known to both services so it’s added to the superdupe array.

Super   Source  ISBN            ts      xs      Title
0       T       0140081771                      Captain Hornblower RN

The next one isn’t known at xISBN.

Super   Source  ISBN            ts      xs      Title
1       T       0316288934      3       1 + 0   Captain Horatio Hornblower
2       T       2258039622      2       1 + 2   Capitaine Hornblower (fre)

Line 2 opens up a cluster at xISBN: 2258039622 and 2258039614. They are Capitaine Hornblower, a French translation.

Super   Source  ISBN            ts      xs      Title
3       T       3548256554      1       2 + 4   Hornblower, Der Kapitän (deu)

That German edition opens up a cluster of four at xISBN: 0140008357, 0141027053, 3548024815, and 3548256554. Now that we’ve run through all of the numbers from thingISBN, we start running through the ones from xISBN and looking them up at thingISBN.

Super   Source  ISBN            ts      xs      Title
4       X       0140008357      0 + 9   5       The Happy Return

Interesting. This ISBN, which we got from xISBN in the Hornblower, Der Kapitän cluster, opened up a cluster of nine books at thingISBN. They are: 0140008357, 0141027053, 0316289329, 0523003854, 0523407351, 0523413904, 0718104692, 0736606548, 1859989969. The Happy Return is one of the novels in the omnibus, and it’s not unexpected that it would turn up. Someone wanting to read The Happy Return could find it in Captain Hornblower RN. Someone wanting the omnibus edition would probably be as happy with the three individual novels, and perhaps all they really need is one of the three.

Things continue apace for a little while:

Super   Source  ISBN            ts      xs      Title
5       X       0141027053                      The Happy Return
6       X       2258039614      7 + 0   3       Capitaine Hornblower (fre)
7       X       3548024815      7 + 0   2       Der Kapitän (deu)
8       X       5859590202      7 + 0   1       Kapitan Khornblouer (rus)

But now we get Beat to Quarters, which thingISBN clustered with The Happy Return. Why? It’s not part of this omnibus. The LibraryThing work information page shows that people have grouped the two together as being the same work. This may be because of an older omnibus edition that does group the two novels.

Now we run through a bunch more manifestations of Beat to Quarters and a couple of The Happy Return, and something in Swedish.

Super   Source  ISBN            ts      xs      Title
9       T       0316289329      7       0 + 5   Beat to Quarters
10      T       0523003854                      Beat to Quarters
11      T       0523407351      5       3 + 0   Beat to Quarters
12      T       0523413904      4       3 + 0   Beat to Quarters
13      T       0718104692      3       3 + 0   The Happy Return
14      T       0736606548      2       3 + 0   Beat to Quarters
15      T       1859989969      1       3 + 1   Hornblower and the Happy Return (audio)
16      X       0736688986      0 + 0   3       Beat to Quarters (audio)
17      X       0736691286      0 + 0   2       Beat to Quarters (audio)
18      X       9137058126      0 + 0   1       Order Och Kontraorder (swe)

I was expecting that Hornblower and the Atropos and A Ship of the Line (the other two novels in the omnibus) would show up as individual works, but they didn’t. I didn’t do any deep investigation into this, to check how xISBN handles aggregates or what LibraryThing users do with such collections. One possible cause may be that omnibus editions are far more popular than individual ones, at least during the ISBN era.

Instead, because of what may be some overzealous grouping by a LibraryThing user, Beat to Quarters came into the mix. It isn’t part of the omnibus in hand, and if we wanted to keep to just Captain Hornblower RN we’d have been better off not superduping. On the other hand, it certainly is related and of interest to the reader, so no harm is done. Perhaps it would help the user. Ideally, a catalogue would tie together all the Hornblower novels and the various omnibus editions so they are all easy to navigate.

Combining and deduping: 5
Superduping: 19 ISBNs
thingISBN: 4 at start; 8 calls; 9 ISBNs added; 6 unknown
    xISBN: 2 at start; 10 calls; 12 ISBNs added; 5 unknown

Superduping other omnibuses might show the constituent works being pulled out, but I’ll stop with this.

By the way, Forester’s Hornblower stories are all excellent and I recommend them.

I’m going to go on to some larger examples of superduping, but first I’m going to take a sidestep and bring in a new tool that will make it easier to see what’s going on.


20 April 2007

Superduping 2

Filed under: Implementations, LibraryThing, OCLC — William Denton @ 7:52 am

Here’s an example of how superduping and the thingISBN and xISBN services can go slightly wrong. The item in hand is my copy of From the Mixed-Up Files of Mrs. Basil E. Frankweiler, by E.L. Konigsburg. It was first published in 1967, and won the Newbery medal the next year. My item is an exemplar of, if I read the copyright page correctly, the forty-third impression of the Dell Yearling edition first published in 1977. The ISBN is 0440431808.

If we query thingISBN for 0440431808, we get back a cluster of 29 ISBNs. If we query xISBN for 0440431808, we get back a cluster of 34 ISBNs. I won’t show them all.

Here’s some of the output from my superduping script. I’ll include the title if it’s not what we expect. First, we check the ISBNs that thingISBN gave us.

Super   Source  ISBN            ts      xs
0       T       0140306811
1       T       032111583X      28      33 + 2  Final cut pro 3 for Macintosh

The first ISBN is in both initial result sets, so it’s moved to the superdupe array. The second ISBN isn’t in xISBN’s result set, so it’s checked over there, and found to be part of a cluster of two ISBNs. But the book is Final Cut Pro 3 for Macintosh! What the heck?

I’m not sure what’s going on. The LibraryThing page for this work shows 032111583X as the ISBN of a 1974 Atheneum hardcover manifestation. It’s the last in the list on the left-hand side, and if you follow the link there you can look up the book at various book-selling services. Notice the cover LibraryThing shows is for Final Cut Pro 3 for Macintosh, and notice too that if you follow the links to the booksellers you get some strange results. I searched listings at BookFinder and it appears there was a 1974 hardcover edition from Atheneum, but the ISBN was 0689205864.
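One thing that can be ruled out is a simple typing slip. Every ISBN-10 ends in a check digit: multiply the ten characters by the weights 10 down to 1 (a final X counts as 10) and the sum must be divisible by 11. Here is a small Python sketch of that standard check, with a function name of my own choosing, applied to the ISBNs mentioned above.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check the ISBN-10 check digit (weights 10 down to 1, sum divisible by 11)."""
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char in "0123456789":
            value = int(char)
        elif char in "Xx" and position == 9:
            value = 10           # X is only allowed as the final check character
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


for isbn in ("0440431808", "032111583X", "0689205864"):
    print(isbn, isbn10_is_valid(isbn))
```

All three pass, so the check digit is no help here: 032111583X is a perfectly well-formed ISBN that has simply been attached to the wrong work.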

So somehow the ISBN of a whole other work got into our list. Let’s see what happens. Those ISBNs will be queried at xISBN, so maybe things will spiral out of control.

Super   Source  ISBN            ts      xs
2       T       0333100646
3       T       0333462874
4       T       0395732514      25      32 + 1  Explore

The fifth ISBN checked from thingISBN’s initial result set hadn’t been checked at xISBN, so we check it, and find it’s a singleton. This means that it either is a singleton, or that xISBN knows nothing about it. Why does it say the title is Explore? Look up 0395732514 at WorldCat and you get three different listings! There’s the book we have in hand, a series (I think) called Invitations to Literacy, and whatever Explore is. The cover shown is for a Beverly Cleary book.

“I’m dashed confused,” I hear you say, and so am I. I have no idea what’s going on. Here are some points of interest from the rest of the output. Some other unrelated book comes in, and then we get into some non-English books. The Dutch one, Het Wonderlijke Archief van Mevrouw Fitzalan, must be a translation of our book, but I don’t know if the others are the right book or something else. (Sorry about the character sets not displaying properly.)

24      T       0807275565      5       17 + 1  Discovery packs of learning (eng)
...
37      X       8205104719      0 + 0   6       To rømlinger og en engel  (nor)
38      X       8906501692      0 + 0   5       Kʻŭllodia ŭi pimil = (kor)
39      X       9021471272      0 + 0   4       Het wonderlijke archief van mevrouw Fitzalan  (dut)
40      X       9510008230      0 + 0   3       Vanhan rouvan salaiset paperit  (fin)
41      X       9510043346      0 + 0   2       Vanhan rouvan salaiset paperit  (fin)
42      X       9570803193      0 + 0   1       Base fu ren di dang an chu  (chi)

Final results:

Combining and deduping: 42
Superduping: 43 ISBNs
thingISBN: 29 at start; 15 calls; 0 ISBNs added; 14 unknown
    xISBN: 34 at start; 9 calls; 9 ISBNs added; 0 unknown

So after all of that we only got one new ISBN that we wouldn’t have had by just combining and deduping. It’s 382731755X, incidentally, which xISBN clusters with 032111583X. It’s another edition of Final Cut Pro 3 for Macintosh. At least three of the ISBNs are not for From the Mixed-Up Files of Mrs. Basil E. Frankweiler, and the number may be higher, depending on what the non-English books are. Whether we superdupe or just combine and dedupe, our data sources are mixing in some wonky results.

Conclusion: Sometimes this stuff just doesn’t make sense. Remember what the Stoic sage Epictetus advised two thousand years ago: “Some things are up to us and some are not.”

Next up: Jane Austen, Frodo Baggins, or Horatio Hornblower spirals out of control.


19 April 2007

Superduping 1

Filed under: Implementations, LibraryThing, OCLC — William Denton @ 7:28 am

Here’s a simple example of superduping working well. We’ll start with an item in my collection, my copy of the 2005 HarperCollins trade paperback manifestation of Flashman on the March, the latest in the series of novels by George MacDonald Fraser about the outrageously libidinous and cowardly scoundrel Harry Flashman. The ISBN is 0007201532. It’s a UK edition; I ordered it from over the pond because the release here was delayed by six months or so.

If we query thingISBN for 0007201532, we get back a cluster of five ISBNs:

000719739X
0007197403
0007201532
1400044758
1400096464

And if we query xISBN for 0007201532, we get back a singleton, just the one ISBN:

0007201532

xISBN’s one result is also in thingISBN’s results, so xISBN’s cluster is a proper subset (fully contained in and not equal to) of thingISBN’s. This doesn’t happen often, and it shows some kind of problem or lack of information in how xISBN does its clustering. Happily, we can use the human-generated thingISBN cluster to improve results.

Notice that if we combine and dedupe the results, we just end up with thingISBN’s cluster.

Here’s how we’ll superdupe it. I’ll show the output from my superduping script and explain it line by line. We start off with two arrays of ISBNs, ts and xs, which at the start are set equal to the result sets. (They are pronounced tees and exes, as in t-plural and x-plural.) Whenever an ISBN is in both arrays we’re going to remove it from both and add it to the superdupe array. If it isn’t in both, we’ll look it up at the other service.

The Super column is how many ISBNs are in the superdupe array when this iteration starts. Source is T if the ISBN is coming out of ts and X if it’s coming from xs. The ts and xs columns show how many ISBNs are left in each array.

Super   Source  ISBN            ts      xs
0       T       000719739X      5       1 + 1

Explanation: Start with superdupe empty, with 0 items. Start with the thingISBN numbers (T) and take the first ISBN from the sorted list: 000719739X. Right now there are five ISBNs in ts and 1 in xs, that is, our original unaltered result sets. Look up 000719739X at xISBN and get back one ISBN: 000719739X, the one we queried about. xISBN doesn’t have anything clustered with it; it’s another singleton. The + 1 means we add that ISBN to xs because now we have checked it at xISBN. Then, because that number is in both arrays (it was in ts to start with and we just added it to xs), delete it from both arrays and add it to superdupe. Now superdupe has one ISBN in it, as shown at the start of the next line, and ts has four and xs has one.

Super   Source  ISBN            ts      xs
1       T       0007197403      4       1 + 4

When this ISBN is looked up at xISBN, a cluster of four come back! They are:

0007197403
1400044758
1405611154
1405621028

All of these are pushed onto xs. The first two were in thingISBN’s initial result set (in fact, the first is the one we queried about), but the last two are new. This is the third cluster of xISBN results we’ve seen so far (two singletons plus this) and we are using thingISBN’s cluster to group them all together. That’s superduping! Remove 0007197403 from both arrays and push it onto superdupe, which now has two numbers.

Super   Source  ISBN            ts      xs
2       T       0007201532
3       T       1400044758
4       T       1400096464      1       2 + 1

Three more ISBNs pulled out of ts. The first two above are also in xs, so there’s no need to look them up at xISBN. They are deleted from ts and xs and pushed onto superdupe. (The counts for ts and xs aren’t there partly because of where things get printed in my script and partly because nothing interesting happens so I don’t bother reporting it.) The third line there, for 1400096464, shows that it’s the last ISBN in ts (the 1 under ts) and that we have two in xs, plus one added by querying xISBN about 1400096464 and getting another singleton back. Remove 1400096464 from both arrays and push it into superdupe.

Now comes the really interesting part as we run through the ISBNs remaining in xs. These are ISBNs that were part of xISBN result sets but that we have not yet seen or checked at thingISBN. Just as we checked thingISBN-generated ISBNs at xISBN, now we will check xISBN-generated numbers at thingISBN.

5       X       1405611154      0 + 0   2
6       X       1405621028      0 + 0   1

But we don’t find anything interesting. The + 0 shows that thingISBN doesn’t even know about these numbers, much less have any more clusters of numbers to give back. If it had, we’d have pushed those numbers onto ts and started over again, back and forth until all numbers have been checked everywhere.
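For anyone who wants to experiment, here is a minimal Python sketch of the back-and-forth described above. It is not the script that produced the output shown. The two lookup functions are hypothetical stand-ins for calls to thingISBN and xISBN, filled with the clusters reported in this post; the assumption that xISBN returns the same cluster of four for every member of that cluster is mine.

```python
from typing import Callable, Iterable, List, Set

Lookup = Callable[[str], Iterable[str]]


def superdupe(start: str, thing_lookup: Lookup, x_lookup: Lookup) -> List[str]:
    """Return every ISBN found by checking clusters back and forth."""
    ts: Set[str] = set(thing_lookup(start))  # thingISBN results still to process
    xs: Set[str] = set(x_lookup(start))      # xISBN results still to process
    done: List[str] = []                     # the superdupe array, in order
    seen: Set[str] = set()

    while ts or xs:
        # Drain ts (checking at xISBN), then drain xs (checking at thingISBN).
        for source, other, other_lookup in ((ts, xs, x_lookup),
                                            (xs, ts, thing_lookup)):
            while source:
                isbn = min(source)  # lowest remaining ISBN, as in a sorted list
                if isbn not in other:
                    # Not yet checked at the other service: look it up and
                    # queue any ISBNs in its cluster we have not finished with.
                    other.update(set(other_lookup(isbn)) - seen)
                source.discard(isbn)
                other.discard(isbn)
                done.append(isbn)
                seen.add(isbn)
    return done


# Stand-in data taken from the clusters reported in this post.
THING_CLUSTER = ["000719739X", "0007197403", "0007201532",
                 "1400044758", "1400096464"]
X_CLUSTER = ["0007197403", "1400044758", "1405611154", "1405621028"]


def fake_thing(isbn: str) -> List[str]:
    # An empty list means thingISBN does not know the ISBN.
    return THING_CLUSTER if isbn in THING_CLUSTER else []


def fake_x(isbn: str) -> List[str]:
    # Anything outside the one known cluster comes back as a singleton.
    return X_CLUSTER if isbn in X_CLUSTER else [isbn]


result = superdupe("0007201532", fake_thing, fake_x)
print(len(result), result)
```

With this stand-in data the function processes the ISBNs in the same order as the rows shown above and ends with seven of them. A real version would replace the two fake functions with calls to the services and would need to decide how to treat errors and rate limits.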

The results:

Combining and deduping: 5
Superduping: 7 ISBNs
thingISBN: 5 at start; 3 calls; 0 ISBNs added; 2 unknown
    xISBN: 1 at start; 4 calls; 6 ISBNs added; 0 unknown

Combining and deduping gave us five ISBNs but superduping gave us seven. That’s not a huge improvement, but I suspect this is all of the existing manifestations.

We made three calls to thingISBN and four to xISBN. Two ISBNs were unknown to thingISBN, and none were unknown to xISBN.

“How do I know all those ISBNs really represent manifestations of Flashman on the March?” I hear you cry. I checked, and they do. But that’s not always the case, as we’ll soon see.


18 April 2007

Superduping: slow introduction

Filed under: Implementations, LibraryThing, OCLC — William Denton @ 7:30 am

My superduping experiments were interesting in a few different ways, and I’m still trying some things out and hacking my scripts. I’ll give a few examples over a few days.

First, a brief introduction. Let’s consider Ross Thomas’s novel The Seersucker Whipsaw. My item of this work is an exemplar of the 1985 Perennial Library paperback manifestation, which is an embodiment of the author’s final edited text. The ISBN is 0060807288.

If we query thingISBN for 0060807288, we get 3 ISBNs back:

0060807288
0060808497
0446401692

And if we query xISBN for 0060807288, we also get 3 ISBNs back:

0060807288
0060808497
0446401692

The two result sets are identical. Nothing further need be done. That was simple, eh? As far as we can tell, this work has had only three manifestations.

In fact that’s false: these are all paperbacks, respectively from 1985, 1987, and 1992. The first edition was published by Morrow in 1967. Why isn’t it included in the results? It doesn’t have an ISBN! It was published too early to have one. That first ISBNless manifestation is out of luck and won’t show up in any xISBN or thingISBN results.

“That isn’t fair,” I hear you cry. It isn’t. Books that predate International Standard Book Numbers get the cold shoulder from xISBN and thingISBN, which, as you may have noticed from their names, are about ISBNs. “How do we get around that?” I hear you ask. Every work, expression, manifestation, and item will need to have a unique identifier. If one exists (like an ISBN for a manifestation), we can use it. If none exists, we’ll have to make one up and have everyone agree on it. (Or make up several and map them from one to the other.)

For the second example, let’s use another Ross Thomas novel, The Fools In Town Are On Our Side. (The title is from The Adventures of Huckleberry Finn: “Hain’t we got all the fools in town on our side? And ain’t that a big enough majority in any town?”) My item is an exemplar of the 2003 St. Martin’s trade paperback reprint, ISBN 0312315821. The first manifestation was published in 1970. It’s one of his best novels and has been reprinted more than The Seersucker Whipsaw.

If we query thingISBN for 0312315821, we get 4 ISBNs back:

0312315821
0380006871
0445405600
0445408677

And if we query xISBN for 0312315821, we get 8 ISBNs back:

0312315821
0340127376
0380006871
0417052502
0445405600
0445405619
0445408677
3548014402

The 4 thingISBN numbers appear in xISBN’s result set. In set theory lingo, one might say that xISBN’s results are a proper superset of thingISBN’s.
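The set relationships are easy to check with the two result sets as listed above. This is only a small Python sketch using built-in sets; the variable names are mine.

```python
thing = {"0312315821", "0380006871", "0445405600", "0445408677"}
x = {"0312315821", "0340127376", "0380006871", "0417052502",
     "0445405600", "0445405619", "0445408677", "3548014402"}

# x > thing is True when x is a proper superset of thing.
is_proper_superset = x > thing

# "Combining and deduping" is just the union of the two sets.
combined = thing | x

# ISBNs that only xISBN knows about, which are the ones worth
# throwing at thingISBN when superduping.
only_at_x = sorted(x - thing)

print(is_proper_superset, len(combined), only_at_x)
```

Because one set contains the other, the union is the same as the larger set, so combining and deduping adds nothing beyond what xISBN already returned.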

If we combine and dedupe the results, we’ll get 8 ISBNs, all the ones from xISBN. What would superduping give us? Might it find more?

As it turns out, no. Here’s what happens:

First run through ISBNs in thingISBN result set
0312315821 is in xISBN's result set
0380006871 is in xISBN's result set
0445405600 is in xISBN's result set
0445408677 is in xISBN's result set
Now run through the ISBNs in xISBN's result set
The above four have been examined already; don't look at them again
0340127376 is unknown at thingISBN
0417052502 is unknown at thingISBN
0445405619 is unknown at thingISBN
3548014402 is unknown at thingISBN

thingISBN can’t give us any leads on new ISBNs. Four ISBNs were known to both places; their clusters sort of lined up. We knew four other ISBNs, from xISBN, and threw them at thingISBN, but we didn’t turn up any previously undiscovered manifestations.

So in this case, combining and deduping gives the same results as superduping. “That’s boring,” I hear you say. Next time I’ll give examples of where superduping breaks apart clusters and gives more complete results. And I’ll show examples of how this can fly out of control and go haywire.

