A weblog following developments around the world in FRBR: Functional Requirements for Bibliographic Records.

Maintained by William Denton, Web Librarian at York University. Suggestions and comments welcome at wtd@pobox.com.


Confused? Try What Is FRBR? (2.8 MB PDF) by Barbara Tillett, or Jenn Riley's introduction. For more, see the basic reading list.

Books: FRBR: A Guide for the Perplexed by Robert Maxwell (ISBN 9780838909508) and Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools edited by Arlene Taylor (ISBN 9781591585091) (read my chapter FRBR and the History of Cataloging).


Melvyl Recommender Project

Posted by: William Denton, 18 October 2006 7:46 am
Categories: Implementations

I haven’t mentioned the Melvyl Recommender Project at the California Digital Library before. They recently put out the Full Text Extension Supplementary Report (652 KB PDF), which has a very interesting section on FRBR. Check it out.

Using Lucene, they indexed a bunch of bibliographic records. Then they used an algorithm of their own devising to group manifestations together into works. (They found OCLC’s Work-Set Algorithm too restrictive.)

Title       All titles match exactly                       +100
            All titles match after subtitles are removed    +80
             as above
            One list is a (nonempty) subset of the other    +80
            No match                                       -100
Author      All authors match exactly                      +100
            Keyword match (all words in shorter author      +80
             match longer author), for all authors in list
            One list is a (nonempty) subset of the other    +80
            No match                                       -100
Date        Exact match                                     +50
            +/- two years                                   -25
            No match                                        -50
Identifier  All identifiers match exactly                  +100
            One list is a (nonempty) subset of the other    +80
            No match                                          0  

Total score Minimum threshold for match                      150
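
To make the table concrete, here is a minimal Python sketch of how two records might be scored against each other. It is one reading of the table, not the project's code. The `Record` class, the function names, the lowercasing, the rule that a subtitle is whatever follows the first colon, and the handling of missing dates are all assumptions made for illustration. The point values, including -25 for dates within two years, are copied from the table as printed, and the unlabelled "as above" row is left out.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional

THRESHOLD = 150  # minimum total score for a match, from the table


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified bibliographic record."""
    titles: FrozenSet[str]
    authors: FrozenSet[str]
    year: Optional[int] = None
    identifiers: FrozenSet[str] = frozenset()


def norm(values):
    # Assumption: comparison is case-insensitive and ignores outer whitespace.
    return frozenset(v.lower().strip() for v in values)


def strip_subtitles(titles):
    # Assumption: the subtitle is everything after the first colon.
    return frozenset(t.split(":", 1)[0].strip() for t in titles)


def subset_either_way(a, b):
    # "One list is a (nonempty) subset of the other"
    return bool(a) and bool(b) and (a <= b or b <= a)


def keyword_match(a, b):
    # All words in the shorter author string appear in the longer one.
    wa, wb = set(re.findall(r"\w+", a)), set(re.findall(r"\w+", b))
    shorter, longer = (wa, wb) if len(wa) <= len(wb) else (wb, wa)
    return bool(shorter) and shorter <= longer


def title_score(a, b):
    a, b = norm(a), norm(b)
    if a and a == b:
        return 100
    sa, sb = strip_subtitles(a), strip_subtitles(b)
    if sa and sa == sb:
        return 80
    return 80 if subset_either_way(a, b) else -100


def author_score(a, b):
    a, b = norm(a), norm(b)
    if a and a == b:
        return 100
    # Assumption: "for all authors in list" means every author on each
    # side has a keyword match on the other side.
    if (a and b
            and all(any(keyword_match(x, y) for y in b) for x in a)
            and all(any(keyword_match(x, y) for x in a) for y in b)):
        return 80
    return 80 if subset_either_way(a, b) else -100


def date_score(a, b):
    if a is None or b is None:
        return -50  # assumption: a missing date counts as "no match"
    if a == b:
        return 50
    return -25 if abs(a - b) <= 2 else -50  # -25 as printed in the table


def identifier_score(a, b):
    a, b = norm(a), norm(b)
    if a and a == b:
        return 100
    return 80 if subset_either_way(a, b) else 0


def total_score(r1, r2):
    return (title_score(r1.titles, r2.titles)
            + author_score(r1.authors, r2.authors)
            + date_score(r1.year, r2.year)
            + identifier_score(r1.identifiers, r2.identifiers))


def same_work(r1, r2):
    return total_score(r1, r2) >= THRESHOLD


full = Record(frozenset({"Moby-Dick: or, The Whale"}),
              frozenset({"Melville, Herman"}), 1851, frozenset({"id-1"}))
short = Record(frozenset({"Moby-Dick"}),
               frozenset({"Melville, Herman, 1819-1891"}), 1853)
print(total_score(full, short), same_work(full, short))
```

With these made-up records the pair scores 80 (title after subtitle removal) + 80 (author keyword match) - 25 (dates two years apart) + 0 (no identifiers to compare) = 135, which falls short of the 150 threshold. Under this reading, neither a strong title and author match nor a strong identifier match is enough on its own to group two records.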

They end by saying (and I think they mean “expression” where they say “item”):

The current algorithm attempts to form groups that represent a single FRBR “work.” It would be interesting to pursue a two level decomposition of the records into “work” and “items” within each work. It is unknown whether the metadata would support such a decomposition.

Certainly further tuning of the matching algorithm would be necessary and desirable if it were to be used in a production system. Inevitably there will be many corner cases that result in poor groupings using the current simplistic algorithm. Additionally, the creators of Melvyl developed a separate, somewhat different algorithm for grouping serials (as opposed to monographs), and it seems likely we would discover the need for this as well.

Though our project was able to obtain Library of Congress authority files, which are generally considered a necessary step in FRBRization, we ran out of time to integrate them into our FRBR process. Certainly this should be considered as a likely way to improve grouping for that fraction of documents that match the authority files (a baby step would be to quantify that fraction.)

