A weblog following developments around the world in FRBR: Functional Requirements for Bibliographic Records.

Maintained by William Denton, Web Librarian at York University. Suggestions and comments welcome at wtd@pobox.com.


Confused? Try What Is FRBR? (2.8 MB PDF) by Barbara Tillett, or Jenn Riley's introduction. For more, see the basic reading list.

Books: FRBR: A Guide for the Perplexed by Robert Maxwell (ISBN 9780838909508) and Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools edited by Arlene Taylor (ISBN 9781591585091) (read my chapter FRBR and the History of Cataloging).


Open Library, FurBurrization

Posted by: William Denton, 9 October 2009 7:03 am
Categories: Open Library

Hi. I’m a month behind with this one. You know how it goes sometimes. On Friday 4 September on the ol-discuss mailing list Lee Passey posted FurBurrization:

There are two things that continue to make OpenLibrary less than useful for me: 1. lack of links to source materials outside of Internet Archive (e.g. Google) and 2. the tremendous amount of unreconciled duplicate data in the OL archive.

Given the home-grown database management system used by OpenLibrary, it seems to me that the best way to solve problem #2 will probably be to FRBRize the existing data. It’s been a long time since I’ve seen any update on this effort; how’s it going?

There was a short thread following with some back and forth. Karen Coyle replied:

Lee, FRBR-ization is in test at the moment. The hardest part is figuring out a good user experience when only a few items have multiple editions. But that is in progress.

Yes, there are duplicates. Those will be removed by re-running the duplicate detection on the database, and that will be more efficient once the Works are gathered together because that pinpoints a lot of the duplicates. The algorithm will only take us so far, however, so there are plans to provide support for merging of items and authors by users. It all has to be coordinated with re-directing from previously used IDs so that no linking is broken.

I’m not sure about your #1 — there are links to Google on the Edition pages when a Google item is detected, and the link states whether it is a snippet, full view or now view. http://openlibrary.org/b/OL2873790M/Raintree-County [with ISBN] http://openlibrary.org/b/OL6026352M/Raintree-County [without ISBN]

Is this not what you are looking for?

Lee Passey said:

So, can you give us any more details, or is this something you will simply present to us when it’s completed to your satisfaction? For me, personally, the web interface is irrelevant; I’m much more interested in what the data will look like when it is retrieved via an API, and that can be exposed long before you have figured out how to present a “good user experience.”

Karen Coyle followed up:

Sorry, I have no idea how it will work in the APIs, and don’t know if that’s been worked out yet. I’ll try to get an answer for that, but it’s possible that it isn’t known yet. We’re still trying to figure out which data elements will be resident on the Work record/template and which on the edition(Manifestation) template. Just to give you an idea of the progress. And in terms of merging manifestations into works, we’re using author, title and uniform title when it is available. The first pass will have errors, and there will need to be a way to allow users to merge and unmerge as a way to correct those.

And then she said:

The database will consist of Works and Manifestations (called ‘editions’) — it will not be just display, but two linked bibliographic ‘levels’. There isn’t enough information in the bibliographic data to provide accurate expressions, but it should be possible to at least provide a separate view of different languages. (Not to mention that there isn’t a lot of agreement in the bibliographic world about where to divide expressions and works… but that’s a whole different conversation.)
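To make the two linked levels concrete, here is a minimal sketch in Python. It is not Open Library’s code: the class names, field names, sample records and normalization rules are all assumptions made up for illustration. It only follows the description above of gathering editions into works by author, title and uniform title when available, with a per-language view standing in for expressions.

```python
import re
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Edition:
    """Manifestation-level record (hypothetical fields)."""
    edition_id: str
    author: str
    title: str
    uniform_title: str = ""  # often missing in real records
    language: str = ""


@dataclass
class Work:
    """Work-level record linked to its editions."""
    key: tuple
    editions: list = field(default_factory=list)

    def by_language(self):
        """Group this work's editions by language code."""
        view = {}
        for ed in self.editions:
            view.setdefault(ed.language or "unknown", []).append(ed)
        return view


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def work_key(edition):
    """Use the uniform title when present, otherwise the plain title."""
    title = edition.uniform_title or edition.title
    return (normalize(edition.author), normalize(title))


def group_into_works(editions):
    """First-pass grouping: editions with the same key share a work."""
    works = {}
    for ed in editions:
        key = work_key(ed)
        if key not in works:
            works[key] = Work(key=key)
        works[key].editions.append(ed)
    return list(works.values())


# Made-up sample records, not real Open Library data.
editions = [
    Edition("e1", "Shakespeare, William", "Hamlet", language="eng"),
    Edition("e2", "Shakespeare, William", "The tragedy of Hamlet",
            uniform_title="Hamlet", language="eng"),
    Edition("e3", "Shakespeare, William", "Hamlet", language="fre"),
    Edition("e4", "Doyle, Arthur Conan", "The Hound of the Baskervilles",
            language="eng"),
]
works = group_into_works(editions)
```

A key this crude will both over-merge and under-merge, which is presumably why the messages above stress that users will need a way to merge and unmerge by hand.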

Browse the whole thread to get the whole exchange, but it’s all been over for a month.


Open Library FRBR work

Posted by: William Denton, 23 January 2009 9:02 pm
Categories: Open Library

From Progress on finding works, e-mail sent by Edward Betts to the ol-discuss list:

http://edwardbetts.com/ol/Arthur_Conan_Doyle.html lists 393 titles
found for books by Arthur Conan Doyle, sorted by the number of library
records for each title. The top of the list pretty clearly shows works
by the author. Some of the errors at the bottom of the list are caused
by titles entered in a foreign language, I think I might be able to
match some of these automatically. The table can be resorted by title.
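
The kind of list described in that e-mail can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (a dict with a "title" key), the function name and the sample titles are invented here, and nothing is known from the message about how the real list was built.

```python
from collections import Counter


def titles_by_record_count(records):
    """Return (title, count) pairs, most frequently catalogued title first."""
    counts = Counter(" ".join(r["title"].lower().split()) for r in records)
    return counts.most_common()


# Made-up sample records for one author.
records = [
    {"title": "The Hound of the Baskervilles"},
    {"title": "the hound of the  Baskervilles"},
    {"title": "Le chien des Baskerville"},
]
ranking = titles_by_record_count(records)
```

With data like this the translated title lands at the bottom with a single record, which matches the observation that foreign-language titles account for some of the stragglers at the end of the list.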

Follow the thread for replies.


OpenLibrary gets into FRBR

Posted by: William Denton, 19 August 2008 7:16 am
Categories: Open Library

Karen Coyle sent a message to an Open Library mailing list on Sunday: Work and Edition Fields.

In beginning our journey into “frbrization”, I have gone over the list of fields in the OL record and have separated them in terms of Work and Edition. http://openlibrary.org/about/work_edition. Comments welcome. Note that we haven’t yet grappled with “expression” in terms of FRBR, but I’m not sure that we’ll be able to have a separate level for expressions since we may not get the data that we need to make that distinction. I suspect that some expressions will get treated as editions (manifestations, in FRBR-speak) and others will be treated as Works. For example, we probably will not have a way to know when a book is a translation of another book. That would be an expression in FRBR, but it may be treated as a Work in OL until we find some way to bring together the translations for a work.


Open Library developers’ meeting videos

Posted by: William Denton, 10 March 2008 7:59 am
Categories: Open Library

The Open Library developers’ meeting on 29 February was recorded, and the videos are now up on their web site.


Open Library developers’ meeting

Posted by: William Denton, 5 March 2008 7:53 am
Categories: Open Library

Last Friday I attended a meeting the Open Library hosted in San Francisco. It was a solid geekfest, and one of the most inspiring days of my career as a librarian. My thanks to Brewster Kahle (director of the Internet Archive, where the Open Library is hosted), Aaron Swartz (Open Library project leader; here’s him talking about it last November), Alexis Rossi (Internet Archive project leader) and everyone else for organizing the day and inviting me.

It was a meeting for developers and library geeks, to introduce them to the Open Library and its framework and API (details here), to discuss what should happen next, and generally to get a bunch of people into one room, get them excited about the project, and to encourage them to work on it. It worked.

Me at the Internet Archive

I’d met some of the people before (Sian Meikle, Bess Sadler, Rob Styles, Casey Bisson, Jeremy Frumkin), and got to meet some others for the first time (Ed Summers, Karen Coyle, Emily Lynema, Eric Morgan, Matt Cordial, David Strauss), and merely saw some from across the room (Roy Tennant, Terry Reese). There were about thirty people there; I’m sorry I can’t list them all.

The day began at 8:30 with us all arriving at a building in the Presidio and getting some breakfast. Brewster Kahle introduced the project, then Aaron Swartz spoke and went into some technical details, as did an OL programmer. There were some lightning talks by people there — quick talks about things they were working on that were related to or might be useful in the Open Library. People suggested topics for further discussion, and after lunch we broke up into three (four?) groups.

The merging and identifiers group deep in discussion

The one I sat in on was about merging and identifiers. I didn’t take any notes, but other people did (and the entire event was filmed), and I hope their notes turn up online. A few points:

  • It’s generally agreed that the Open Library needs to be FRBRized. (Certainly Brewster Kahle thinks so.) But how? We need to get down to business and do it.
  • Use existing algorithms: the Library of Congress’s FRBR display tool, OCLC’s, Karen Coyle’s MELVYL merge algorithm.
  • Use whatever data LibraryThing might provide. In particular, could we use the work groupings that LibraryThing users have made to train an algorithm to do work matchings? If we know that LibraryThing says books A, B, C, and D are all the same work, look at what they have in common and assign weightings based on that, not on predetermined weights like “a match on ISBN counts for N” and “a 90% match on title counts for M.”
  • Rob Styles had a way of generating “organic” identifiers, based on facts known about a work or expression or manifestation, instead of using index numbers from database tables. For example, “hamletshakespeare” (or an MD5 checksum thereof) might identify Shakespeare’s Hamlet as a work. There was some debate about how useful this would be.
  • If algorithms are used for FRBRization and other work, but every page on the Open Library is editable by all users, then how would we manage the mix of people and machine changes? How to stop the machines from overwriting corrections by people?
  • What (possibly Ajaxy) tools could be used to help people group manifestations into expressions and works? Nothing exists now, but it can be built.
  • Either Rob Styles or Karen Coyle said that ISBNs don’t represent manifestations, they represent everything that has that ISBN on it. Sometimes that is a manifestation. Sometimes it’s not. ISBNs as a means of identifying manifestations are unreliable.
  • So how can we make manifestation-, expression-, and work-level identifiers, and share them around the world? Should the Open Library be the sole authoritative source for such numbers? Everyone has their own identifying numbers for things. Can the Open Library act as a translation tool to turn one ID number into others?
  • What about OCLC’s planned GLIMIR, Global Library Manifestation Identifier? Roy Tennant mentioned it but OCLC isn’t making anything public about it yet. I’ll post as soon as I hear more.
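
The “organic” identifier idea from the list above is easy to sketch. The only things taken from the discussion are the example string “hamletshakespeare” and the mention of an MD5 checksum; the function names and the normalization rule (concatenate title and creator, lowercase, keep only ASCII letters and digits) are assumptions for illustration, not Rob Styles’s actual method.

```python
import hashlib
import re


def organic_key(title, creator):
    """Join the known facts and keep only lowercase ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", (title + creator).lower())


def organic_id(title, creator):
    """MD5 checksum (hex) of the organic key."""
    key = organic_key(title, creator)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


key = organic_key("Hamlet", "Shakespeare")    # "hamletshakespeare"
work_id = organic_id("Hamlet", "Shakespeare")  # 32 hex characters
```

Anyone applying the same rule to the same facts gets the same identifier without asking a central registry. On the other hand, a record titled “Hamlet, Prince of Denmark” would get a different identifier under this rule, which may be one of the reasons the usefulness of the approach was debated.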

Other things were discussed, but that’s about all I remember. I hope Karen and others in the group post about it.
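
Going back to the LibraryThing point in the list above, here is one way the idea of learning weights from known work groupings could look. Nobody at the meeting proposed this particular method: it is a swapped-in, Fellegi-Sunter-style agreement weight chosen for illustration, and the function name, record layout and sample data (including the placeholder ISBNs) are all made up.

```python
import math
from itertools import combinations


def learn_field_weights(records, work_of, fields):
    """Estimate a match weight per field from records already grouped into works.

    records: dict of record id -> dict of field values
    work_of: dict of record id -> work label (e.g. from user-made groupings)
    """
    agree_same = {f: 0 for f in fields}  # agreements within one work
    agree_diff = {f: 0 for f in fields}  # agreements across different works
    n_same = n_diff = 0
    for a, b in combinations(records, 2):
        same = work_of[a] == work_of[b]
        if same:
            n_same += 1
        else:
            n_diff += 1
        for f in fields:
            value = records[a].get(f)
            if value and value == records[b].get(f):
                if same:
                    agree_same[f] += 1
                else:
                    agree_diff[f] += 1
    weights = {}
    for f in fields:
        # Add-one smoothing keeps the ratio finite when a count is zero.
        m = (agree_same[f] + 1) / (n_same + 2)
        u = (agree_diff[f] + 1) / (n_diff + 2)
        weights[f] = math.log2(m / u)
    return weights


# Made-up training data: four records in two known works.
records = {
    "a": {"author": "shakespeare", "title": "hamlet", "isbn": "111"},
    "b": {"author": "shakespeare", "title": "hamlet", "isbn": "222"},
    "c": {"author": "doyle", "title": "the hound of the baskervilles", "isbn": "333"},
    "d": {"author": "doyle", "title": "hound of the baskervilles", "isbn": "444"},
}
work_of = {"a": "w1", "b": "w1", "c": "w2", "d": "w2"}
weights = learn_field_weights(records, work_of, ["author", "title", "isbn"])
```

A field gets a high weight when agreeing on it is common inside a work and rare across works. In this toy data author agreement ends up weighted above title agreement, and ISBN agreement lowest, but with only four records that says nothing about real catalogue data.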

Around 4 the groups came back together and reported on what they’d talked about, what next steps could be taken, etc. Then we went outside for a group picture and visited the Internet Archive’s office, across the street. Then it was off for some wine and cheese and relaxed chit-chat. A fine day.

Aaron Swartz and Brewster Kahle

I have some ideas for FRBRy things to hack on. I’ll post about whatever I do. I encourage you to look at the Open Library and get involved somehow.


Brewster Kahle: “Do this FRBR thing”

Posted by: William Denton, 28 February 2008 7:21 am
Categories: Conferences,Open Library

I’m not at Code4Lib 2008. I wish I was. Hello to anyone there!

Nicole Engard’s Code4Lib 2008: The Internet Archive and Karen Coombs’s Code4Lib Day 1 Morning Talks both report on the talk that Brewster Kahle (of the Internet Archive) gave.

Engard writes:

The next step according to Brewster is to build the catalog and “we finally need to do this FRBR thing – come on guys, it’s not that hard!!!” Even if the digital copy of the book isn’t available yet, it makes sense to provide pages for the book with catalog data that pulls information from sites like Amazon and other book information sites.

Coombs summarizes this part of his talk: “FRBR is a must!!”


Swartz on the Open Library

Posted by: William Denton, 27 October 2007 7:23 am
Categories: Audio/Video,Blog Mentions,Open Library

Aaron Swartz was at the Berkman Center for Internet & Society at Harvard Law School on Tuesday and gave a talk about what the Open Library is doing and how it’s going. David Weinberger was there and blogged it. If you listen to the audio recording of Swartz’s talk (58 MB MP3) then you’ll hear that at about the seven minute mark he talks about FRBR. The Open Library plans on FRBRizing its collections, and from the sounds of it they’ll go beyond the usual stuff when they do relations between different entities. Excellent. Around the twenty-five minute mark, there’s a question about FRBR and how the relationships will be chosen and made. The whole thing is worth a listen.

UPDATE: Around thirty-six minutes in, Greg Crane is asked a question and some interesting stuff follows.


Coyle on merging at Open Library

Posted by: William Denton, 8 August 2007 7:58 am
Categories: Blog Mentions,Implementations,Open Library

Over at the Open Library project, Karen Coyle posted a link to a record-merging algorithm. On the mailing list where she announced this, she said, “This algorithm was designed to bring together what you would call ‘manifestations’ in FRBR-speak, and what OpenLibrary calls an edition. It can basically be summed up as ‘things you would assign the same ISBN to.’”

It may be temporary on her site (the URL implies so), but if this builds into something permanent at the Open Library then it will be very useful. Certainly the algorithm is of immediate interest to anyone grouping MARC or ONIX records.


Open Library

Posted by: William Denton, 17 July 2007 7:47 am
Categories: Open Library,Uncategorized

FRBRization is planned as part of The Open Library (just opened in demo mode) from the Internet Archive. This is wild stuff. Go look at it.