Last Friday I attended a meeting the Open Library hosted in San Francisco. It was a solid geekfest, and one of the most inspiring days of my career as a librarian. My thanks to Brewster Kahle (director of the Internet Archive, where the Open Library is hosted), Aaron Swartz (Open Library project leader; here’s him talking about it last November), Alexis Rossi (Internet Archive project leader) and everyone else for organizing the day and inviting me.
It was a meeting for developers and library geeks, to introduce them to the Open Library and its framework and API (details here), to discuss what should happen next, and generally to get a bunch of people into one room, get them excited about the project, and to encourage them to work on it. It worked.
I’d met some of the people before (Sian Meikle, Bess Sadler, Rob Styles, Casey Bisson, Jeremy Frumkin), and got to meet some others for the first time (Ed Summers, Karen Coyle, Emily Lynema, Eric Morgan, Matt Cordial, David Strauss), and merely saw some from across the room (Roy Tennant, Terry Reese). There were about thirty people there; I’m sorry I can’t list them all.
The day began at 8:30 with us all arriving at a building in the Presidio and getting some breakfast. Brewster Kahle introduced the project, then Aaron Swartz spoke and went into some technical details, as did an OL programmer. There were some lightning talks by people there — quick talks about things they were working on that were related to or might be useful in the Open Library. People suggested topics for further discussion, and after lunch we broke up into three (four?) groups.
The one I sat in on was about merging and identifiers. I didn’t take any notes, but other people did, and the entire event was filmed; I hope the notes and the video turn up online. A few points:
- It’s generally agreed that the Open Library needs to be FRBRized. (Certainly Brewster Kahle thinks so.) But how? We need to get down to business and do it.
- Use existing algorithms: the Library of Congress’s FRBR display tool, OCLC’s, Karen Coyle’s MELVYL merge algorithm.
- Use whatever data LibraryThing might provide. In particular, could we use the work groupings that LibraryThing users have made to train an algorithm to do work matchings? If we know that LibraryThing says books A, B, C, and D are all the same work, look at what they have in common and assign weightings based on that, not on predetermined weights like “a match on ISBN counts for N” and “a 90% match on title counts for M.”
- Rob Styles had a way of generating “organic” identifiers, based on facts known about a work or expression or manifestation, instead of using index numbers from database tables. For example, “hamletshakespeare” (or an MD5 checksum thereof) might identify Shakespeare’s Hamlet as a work. There was some debate about how useful this would be.
- If algorithms are used for FRBRization and other work, but every page on the Open Library is editable by all users, then how would we manage the mix of human and machine changes? How do we stop the machines from overwriting corrections made by people?
- What (possibly Ajaxy) tools could be used to help people group manifestations into expressions and works? Nothing exists now, but it can be built.
- Either Rob Styles or Karen Coyle said that ISBNs don’t represent manifestations, they represent everything that has that ISBN on it. Sometimes that is a manifestation. Sometimes it’s not. ISBNs as a means of identifying manifestations are unreliable.
- So how can we make manifestation-, expression-, and work-level identifiers, and share them around the world? Should the Open Library be the sole authoritative source for such numbers? Everyone has their own identifying numbers for things. Can the Open Library act as a translation tool to turn one ID number into others?
- What about OCLC’s planned GLIMIR, Global Library Manifestation Identifier? Roy Tennant mentioned it, but OCLC isn’t making anything public about it yet. I’ll post as soon as I hear more.
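To make the “organic” identifier idea from the list above a bit more concrete, here is a minimal Python sketch of how “hamletshakespeare” and its MD5 checksum could be derived. This is only my guess at the general shape, not Rob Styles’s actual method: the function names `organic_key` and `organic_id` are made up for illustration, and the normalization rules (lowercase, strip accents, keep only letters and digits) are assumptions.

```python
import hashlib
import re
import unicodedata


def organic_key(*facts: str) -> str:
    """Join known facts into one lowercase, ASCII, alphanumeric-only key.

    Hypothetical helper: the normalization rules are an assumption.
    """
    parts = []
    for fact in facts:
        # Decompose accented characters, then drop anything that is not ASCII.
        ascii_text = (
            unicodedata.normalize("NFKD", fact)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        # Keep only lowercase letters and digits.
        parts.append(re.sub(r"[^a-z0-9]", "", ascii_text.lower()))
    return "".join(parts)


def organic_id(*facts: str) -> str:
    """Return the MD5 hex digest of the organic key (hypothetical helper)."""
    return hashlib.md5(organic_key(*facts).encode("utf-8")).hexdigest()


print(organic_key("Hamlet", "Shakespeare"))  # hamletshakespeare
print(organic_id("Hamlet", "Shakespeare"))   # a 32-character hex string
```

In this sketch, “Hamlet” and “The Tragedy of Hamlet” would produce different identifiers, so the result depends entirely on which facts are chosen and how they are normalized.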
Other things were discussed, but that’s about all I remember. I hope Karen and others in the group post about it.
Around 4 the groups came back together and reported on what they’d talked about, what next steps could be taken, etc. Then we went outside for a group picture and visited the Internet Archive’s office, across the street. Then it was off for some wine and cheese and relaxed chit-chat. A fine day.
I have some ideas for FRBRy things to hack on. I’ll post about whatever I do. I encourage you to look at the Open Library and get involved somehow.