Open Library, FurBurrization
Hi. I’m a month behind with this one. You know how it goes sometimes. On Friday 4 September on the ol-discuss mailing list Lee Passey posted FurBurrization:
There are two things that continue to make OpenLibrary less than useful for me: 1. lack of links to source materials outside of Internet Archive (e.g. Google) and 2. the tremendous amount of unreconciled duplicate data in the OL archive.
Given the home-grown database management system used by OpenLibrary, it seems to me that the best way to solve problem #2 will probably be to FRBRize the existing data. It’s been a long time since I’ve seen any update on this effort; how’s it going?
A short thread followed, with some back and forth. Karen Coyle replied:
Lee, FRBR-ization is in test at the moment. The hardest part is figuring out a good user experience when only a few items have multiple editions. But that is in progress.
Yes, there are duplicates. Those will be removed by re-running the duplicate detection on the database, and that will be more efficient once the Works are gathered together because that pinpoints a lot of the duplicates. The algorithm will only take us so far, however, so there are plans to provide support for merging of items and authors by users. It all has to be coordinated with re-directing from previously used IDs so that no linking is broken.
I’m not sure about your #1: there are links to Google on the Edition pages when a Google item is detected, and the link states whether it is a snippet, full view or no view. http://openlibrary.org/b/OL2873790M/Raintree-County [with ISBN] http://openlibrary.org/b/OL6026352M/Raintree-County [without ISBN]
Is this not what you are looking for?
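As an aside, the redirect requirement Karen mentions can be sketched in a few lines: when two records are merged, the retired ID keeps resolving to the survivor, so old links don't break. This is only my illustration. The `RecordStore` class, its method names and the sample IDs are all made up here and say nothing about how Open Library actually implements it.

```python
class RecordStore:
    """Toy in-memory store (hypothetical) that keeps old IDs working after merges."""

    def __init__(self):
        self.records = {}    # live record ID -> record data
        self.redirects = {}  # retired record ID -> ID it was merged into

    def add(self, record_id, data):
        self.records[record_id] = data

    def resolve(self, record_id):
        """Follow redirects until a live ID is reached."""
        seen = set()
        while record_id in self.redirects:
            if record_id in seen:
                raise ValueError("redirect loop at " + record_id)
            seen.add(record_id)
            record_id = self.redirects[record_id]
        return record_id

    def merge(self, duplicate_id, keep_id):
        """Retire duplicate_id in favour of keep_id and leave a redirect behind."""
        duplicate_id = self.resolve(duplicate_id)
        keep_id = self.resolve(keep_id)
        if duplicate_id != keep_id:
            self.records.pop(duplicate_id)
            self.redirects[duplicate_id] = keep_id
        return keep_id

    def get(self, record_id):
        """Look up a record by any ID it has ever had."""
        return self.records[self.resolve(record_id)]


store = RecordStore()
store.add("ed-1", {"title": "Raintree County"})
store.add("ed-2", {"title": "Raintree county."})
store.merge("ed-2", "ed-1")
print(store.get("ed-2"))  # the old ID still resolves to the surviving record
```

Resolving both IDs before merging means a chain of merges (B into A, then A into C) still leaves every old ID pointing at a live record.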
So, can you give us any more details, or is this something you will simply present to us when it’s completed to your satisfaction? For me, personally, the web interface is irrelevant; I’m much more interested in what the data will look like when it is retrieved via an API, and that can be exposed long before you have figured out how to present a “good user experience.”
Sorry, I have no idea how it will work in the APIs, and don’t know if that’s been worked out yet. I’ll try to get an answer for that, but it’s possible that it isn’t known yet. We’re still trying to figure out which data elements will be resident on the Work record/template and which on the edition (Manifestation) template. Just to give you an idea of the progress. And in terms of merging manifestations into works, we’re using author, title and uniform title when it is available. The first pass will have errors, and there will need to be a way to allow users to merge and unmerge as a way to correct those.
The database will consist of Works and Manifestations (called ‘editions’) — it will not be just display, but two linked bibliographic ‘levels’. There isn’t enough information in the bibliographic data to provide accurate expressions, but it should be possible to at least provide a separate view of different languages. (Not to mention that there isn’t a lot of agreement in the bibliographic world about where to divide expressions and works… but that’s a whole different conversation.)
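To make that a bit more concrete, here is a rough sketch of what grouping editions into works by author and title (preferring the uniform title when there is one) might look like, with a per-language view on top. It is my own guess at the idea, not Open Library's code: the `Edition` and `Work` classes, their fields, the normalization rules and the sample records are all assumptions.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Edition:
    """Manifestation-level record (hypothetical fields)."""
    edition_id: str
    author: str
    title: str
    uniform_title: Optional[str] = None
    language: Optional[str] = None


@dataclass
class Work:
    """Work-level record linked to its editions (hypothetical fields)."""
    author: str
    title: str
    editions: List[Edition] = field(default_factory=list)


def normalize(text):
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def work_key(edition):
    """Author plus uniform title when available, otherwise author plus title."""
    return (normalize(edition.author),
            normalize(edition.uniform_title or edition.title))


def group_into_works(editions):
    """First-pass grouping; it will make mistakes that need manual merge/unmerge."""
    groups = defaultdict(list)
    for edition in editions:
        groups[work_key(edition)].append(edition)
    return [
        Work(author=members[0].author,
             title=members[0].uniform_title or members[0].title,
             editions=members)
        for members in groups.values()
    ]


def editions_by_language(work):
    """Separate view of a work's editions per language code."""
    view = defaultdict(list)
    for edition in work.editions:
        view[edition.language or "unknown"].append(edition.edition_id)
    return dict(view)


# Made-up sample records, for illustration only.
sample = [
    Edition("ed-1", "Lockridge, Ross", "Raintree County", language="eng"),
    Edition("ed-2", "lockridge, ross", "Raintree county.", language="eng"),
    Edition("ed-3", "Lockridge, Ross", "Comté de Raintree",
            uniform_title="Raintree County", language="fre"),
    Edition("ed-4", "Somebody, Else", "Another Book"),
]
for work in group_into_works(sample):
    print(work.title, editions_by_language(work))
```

A key this simple will both over-merge (different works that share an author and title) and under-merge (variant spellings of an author's name), which is presumably why Karen expects a user-driven merge and unmerge step after the first pass.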
Browse the thread for the full exchange, though it all wrapped up a month ago.