A weblog following developments around the world in FRBR: Functional Requirements for Bibliographic Records.

Maintained by William Denton, Web Librarian at York University. Suggestions and comments welcome at wtd@pobox.com.


Confused? Try What Is FRBR? (2.8 MB PDF) by Barbara Tillett, or Jenn Riley's introduction. For more, see the basic reading list.

Books: FRBR: A Guide for the Perplexed by Robert Maxwell (ISBN 9780838909508) and Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools edited by Arlene Taylor (ISBN 9781591585091) (read my chapter FRBR and the History of Cataloging).

Pride and Prejudice 7.5: Harry Potter and the Bride of Pemberley

Posted by: William Denton, 17 May 2007 10:34 am
Categories: Pride and Prejudice

Last night I realized there was perhaps another way of running MARC records through the LC FRBR Display Tool, and I tried it. It’s easier, cleaner, and involves less character-obliterating. I’ll post about it next week, but in the meantime, here’s a FRBRization of all the Harry Potter books by J.K. Rowling. I superduped the ISBNs of my copies of the books, found MARC records for 404 different manifestations, ran them through the tool, and ended up with this XML file. The results are good. Even some of the American Harry Potter and the Sorcerer’s Stone expressions and manifestations are grouped in with Harry Potter and the Philosopher’s Stone.

The question marks (“?”) you’ll see are my replacements for problem characters I had to clear out to get things to work. I didn’t wipe any MARC fields.


Pride and Prejudice 7: Triumph!

Posted by: William Denton, 7:24 am
Categories: Pride and Prejudice

Yesterday, in the sixth entry in this series, Bad MARC Data, I left you at this thrilling error:

Transforming the MARCXML into FRBR XML and saving to pp.xml ...
Error on line 15908 column 46 of file:///usr/home/wtd/frbr-lc-tool/tmp/slimfrbr.xml:
  Error reported by XML parser: Character reference "&#31" is an invalid XML character.

As it turned out, &#30 caused problems too, and I got rid of them both by the old Perl technique of editing files in place:

perl -pi.bak -e 's/\&#(30|31)//g' slimfrbr.xml
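Those two numbers are probably not random: in MARC 21, byte 30 (0x1E) is the field terminator and byte 31 (0x1F) is the subfield delimiter, so they most likely leaked out of malformed records. More generally, XML 1.0 forbids most control characters even when they are written as character references; below 0x20 only tab, line feed, and carriage return are allowed. Here is a minimal Python sketch that removes any numeric character reference XML 1.0 can't accept, assuming the references are complete (ending in a semicolon). It is an alternative to the one-liner above, not what I ran.

```python
import re


def is_xml10_char(cp: int) -> bool:
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


# Decimal (&#31;) and hexadecimal (&#x1F;) character references.
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def strip_invalid_char_refs(text: str) -> str:
    """Delete character references to code points XML 1.0 does not allow."""
    def replace(match):
        ref = match.group(1)
        cp = int(ref[1:], 16) if ref.startswith("x") else int(ref)
        return match.group(0) if is_xml10_char(cp) else ""
    return CHAR_REF.sub(replace, text)
```

Legitimate references such as &#233; (é) pass through untouched; only the illegal ones go.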

Now I’m editing generated files mid-process, which is bad. Tough.

The next step in the process now worked:

java -jar saxon7.jar -u -o clean.xml slimfrbr.xml \
  http://www.loc.gov/standards/marcxml/frbr/v2/clean.xsl

But then this didn’t:

java -jar saxon7.jar -u -o match.xml clean.xml \
  http://www.loc.gov/standards/marcxml/frbr/v2/match.xsl

It generated a big ugly stack trace. I switched to using Saxon (version 8.9). Why it works and the older version doesn’t, I don’t know, nor, at this point, did I particularly care.

saxon -u -o match.xml clean.xml \
 http://www.loc.gov/standards/marcxml/frbr/v2/match.xsl

It complained: Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor. But it worked.

But then the final FRBRization XSL failed!

$ saxon -u -o pp.xml match.xml \
 http://www.loc.gov/standards/marcxml/frbr/v2/FRBRize.xsl
Validation error on line 18 of http://www.loc.gov/standards/marcxml/frbr/v2/FRBRize.xsl:
  Cannot convert string " " to a double
Transformation failed: Run-time errors were reported

What does line 18 of that XSL file say?

<xsl:sort
 select="normalize-space(translate(substring(marc:datafield[@tag=130
 or @tag=240 or @tag=243 or
 @tag=245][1]/marc:subfield[@code='a'][1],marc:datafield[@tag=130]/@ind1 |
 marc:datafield[@tag=240 or
 @tag=243 or @tag=245][1]/@ind2),
 'abcdefghijklmnopqrstuvwxyz,.;/-:[]()','ABCDEFGHIJKLMNOPQRSTUVWXYZ'))"/>

The XSL expects to find a number in the nonfiling-characters indicator (the first indicator of a 130, or the second indicator of a 240, 243, or 245 Title Statement field), but in a few cases the 245 has a space there instead, and it gets confused.

$ grep 245 match.xml | grep '" "'
      <datafield tag="245" ind1=" " ind2=" ">
      <datafield tag="245" ind1="1" ind2=" ">
      <datafield tag="245" ind1=" " ind2="0">
      <datafield tag="245" ind1=" " ind2=" ">
      <datafield tag="245" ind1=" " ind2=" ">
      <datafield tag="245" ind1=" " ind2=" ">
      <datafield tag="245" ind1=" " ind2=" ">
      <datafield tag="245" ind1=" " ind2=" ">
      <datafield tag="245" ind1=" " ind2=" ">

The first indicator says whether or not there should be a title added entry, 0 for no, 1 for yes. The second indicator tells how many nonfiling characters there are at the start of the title (for The Three Musketeers it would be 4, so the title sorts under Three, not The).
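To see why a blank indicator breaks the sort, here is a rough Python re-creation of that sort key, assuming XPath semantics. substring() starts at the 1-based position given by the indicator, so an indicator of 4 on "The Three Musketeers" starts at the space before "Three", which normalize-space() then drops. translate() upper-cases ASCII letters and deletes the listed punctuation, because it has no partner in the third argument. Here float() raising ValueError on a blank stands in for the error Saxon 8.9 reported; under XSLT 1.0 number rules a blank would become NaN and give an empty sort key instead.

```python
import math

# translate(): lower-case letters map to upper case; the punctuation after
# them has no counterpart in UPPER, so it is deleted.
LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DELETE = ",.;/-:[]()"
TABLE = str.maketrans(LOWER, UPPER, DELETE)


def sort_key(title: str, nonfiling_indicator: str) -> str:
    """Approximate the FRBRize.xsl sort key for one title."""
    start = float(nonfiling_indicator)     # a blank " " raises ValueError here
    pos = math.floor(start + 0.5)          # XPath round()
    rest = title[max(pos, 1) - 1:]         # XPath substring(title, start), 1-based
    translated = rest.translate(TABLE)
    return " ".join(translated.split())    # normalize-space()
```

With the indicator set to 0, as in the fix below, the whole title becomes the sort key.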

The second indicator might matter for how the FRBRizing algorithm works, but I didn’t care. I just set all these bad indicators to 0 with Perl again. This is the second time I edited generated files mid-process, but I was so close to the end nothing could restrain me now.

perl -npi.bak -e 'next unless /tag="245/; s/" "/"0"/g;' match.xml
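A more targeted version of that fix would touch only the second indicator of 245 fields and leave the first indicator (the added-entry flag) alone. Here is a sketch using Python's standard ElementTree, matching elements by local name so it works whether or not the datafields are in a namespace. One caveat: ElementTree rewrites namespace prefixes (ns0: and so on) on output unless you register them with ET.register_namespace.

```python
import xml.etree.ElementTree as ET


def fix_blank_245_ind2(xml_text: str) -> str:
    """Set a blank second indicator on 245 fields to "0"; leave everything else alone."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip any {namespace} part
        if (local_name == "datafield"
                and element.get("tag") == "245"
                and element.get("ind2") == " "):
            element.set("ind2", "0")  # 0 nonfiling characters: sort on the whole title
    return ET.tostring(root, encoding="unicode")
```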

And then this step worked:

saxon -u -o pp.xml match.xml \
 http://www.loc.gov/standards/marcxml/frbr/v2/FRBRize.xsl

And the final step worked:

saxon -a -o pp.html pp.xml

Phew! You can see the results of FRBRizing a superduped Pride and Prejudice here.

Tomorrow: some comments. Have a look at the FRBRized results and give them a think, and leave a comment below or tomorrow.


Pride and Prejudice 6: Bad MARC Data

Posted by: William Denton, 16 May 2007 7:00 am
Categories: Pride and Prejudice

Today we’re going to try to run the big MARC file through Library of Congress’s FRBR tool. If that sentence is complete gibberish, look at the previous entry in this series and get caught up.

I have my marc2frbr.sh script, I have the MARC file, I have the LC stuff. When I ran the test with the Mahler MARC file the LC provided, everything went perfectly fine. Not an error in sight. Now I’ve got hundreds of MARC records I downloaded from hither and possibly yon. Perhaps one or two of them will cause a problem?

$ ./marc2frbr.sh pride-and-prejudice.marc pp
Transforming pride-and-prejudice.marc to MARCXML ...
** Error: Invalid directory length
   Record Number: 4240822
   Character: 90496

** Error: Directory not terminated
   Record Number: 4240822
   Character: 90604
[blah blah more errors blah blah]
[blah blah ugly stack trace blah blah]
[blah blah more errors blah blah]

“I say, Denton old bean,” you say. “That doesn’t look good.”

Indeed it doesn’t. There are several things I could have done at this stage. I took the “just hack on it until it works” approach. I didn’t care what the problem was with record number 4240822. I just wanted it gone. “Begone,” quoth I. If I had some fancy MARC editor, I might have fired it up and fixed the problem. I didn’t. I brute-forced it.

I did so with MARC/Perl, “a Perl 5 library for reading, manipulating, outputting and converting bibliographic records in the MARC format.” It’s a pretty hairy Perl module, more complicated and more powerful than the Ruby MARC library I’ve mentioned before.
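The first two errors in the marc2frbr.sh output above are complaints about the structure of a binary MARC record. In the ISO 2709 layout that MARC 21 uses, a 24-byte leader gives the record length (positions 00-04) and the base address of the data (positions 12-16). After the leader comes a directory of 12-byte entries (tag, field length, starting position), closed by a field terminator, byte 0x1E. Here is a Python sketch of those checks. It only illustrates the format; which exact conditions trigger MARC4J's "Invalid directory length" and "Directory not terminated" messages is an assumption here, not something taken from MARC4J.

```python
FIELD_TERMINATOR = b"\x1e"   # byte 30: ends the directory and each field
RECORD_TERMINATOR = b"\x1d"  # byte 29: ends the record
ENTRY_LENGTH = 12            # tag (3) + field length (4) + starting position (5)


def check_marc_record(rec: bytes) -> list:
    """Return structural problems found in one binary (ISO 2709) MARC record."""
    if len(rec) < 24:
        return ["shorter than the 24-byte leader"]
    leader = rec[:24]
    try:
        record_length = int(leader[0:5])    # leader/00-04: total record length
        base_address = int(leader[12:17])   # leader/12-16: where field data begins
    except ValueError:
        return ["record length or base address in the leader is not numeric"]
    if not 25 <= base_address <= len(rec):
        return [f"base address {base_address} is out of range"]

    problems = []
    if record_length != len(rec):
        problems.append(f"leader says {record_length} bytes, found {len(rec)}")
    if not rec.endswith(RECORD_TERMINATOR):
        problems.append("no record terminator")
    # The directory runs from byte 24 up to the field terminator that sits
    # just before the base address.
    directory = rec[24:base_address - 1]
    if len(directory) % ENTRY_LENGTH != 0:
        problems.append("directory length is not a multiple of 12")
    if rec[base_address - 1:base_address] != FIELD_TERMINATOR:
        problems.append("directory not terminated")
    return problems
```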

Before I could get rid of that record, I wanted a Perl script that would just open up the MARC file and parse it. That’s always the first step in doing anything like this. I wrote:

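#!/usr/local/bin/perl -w

use strict;
use MARC::Batch;

# Open a binary MARC file and print the title of every record in it.
my $marcfile = shift;
die "Usage: $0 marc.mrc\n" unless defined $marcfile;

my $batch = MARC::Batch->new('USMARC', $marcfile);
$batch->strict_off();    # keep going after bad MARC data instead of dying

while (my $record = $batch->next()) {
    print $record->title(), "\n";
}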

I ran it and got this error: utf8 "\xB9" does not map to Unicode at /usr/local/lib/perl5/5.8.8/mach/Encode.pm line 166.

The second record in the file had some kind of character encoding problem. I hate those. I asked about it on the perl4lib mailing list, and Jason Ronallo said it was probably there because the ruby-marc library I’d used in my MARC-record-grabbing script had a bug. I upgraded the library to the new bug-free version and started my script over, but it was going to take hours to run.

Ronallo had suggested editing out the offending character. I fired up Emacs and used the hex mode (M-x hexl-find-file) to get that character out of the way. And then another character. And another one. And another one. I ended up searching for multiple occurrences of thirty different characters and replacing them all with spaces. They were mostly in Chinese and Eastern European records.

If anyone ever says to you, “I say, old thing, would you mind editing this MARC file in a hex editor — your favourite, of course, I don’t care which — and removing all of these thirty different Unicode characters wherever they appear, until this Perl script runs more or less cleanly and doesn’t die with an error in the Encode module,” then I suggest you reply, “Sorry, old man, not even for one of Antoine’s famous omelettes and a half bot. of the Widow. And stop speaking like you’re in a Wodehouse novel.” (Why didn’t I do it in Perl? I didn’t think to try. My sed doesn’t seem to handle hex, and for some reason I went straight to my editor after that.)
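On doing it in a script rather than an editor: one detail makes replacing bad bytes with spaces safe for binary MARC. Field lengths and starting positions in the directory are counted in bytes, so swapping each offending byte sequence for the same number of spaces keeps every offset valid. Here is a minimal Python sketch; the list of byte sequences is whatever your own file chokes on (0xB9 is the one from the error above).

```python
def blank_out(data: bytes, bad_sequences) -> bytes:
    """Replace each bad byte sequence with as many spaces as it is long."""
    for seq in bad_sequences:
        # Same length in, same length out: leader and directory offsets stay valid.
        data = data.replace(seq, b" " * len(seq))
    return data

# Hypothetical usage, with file names from this series:
# with open("pride-and-prejudice.marc", "rb") as f:
#     raw = f.read()
# with open("pride-and-prejudice-cleaned-chars.marc", "wb") as f:
#     f.write(blank_out(raw, [b"\xb9"]))
```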

Anyway, I ended up with a file that could, with errors, be parsed by a Perl script: pride-and-prejudice-cleaned-chars.marc. I wasn’t any closer to running the MARC file through the LC tool, though: I still had record number 4240822 to get rid of.

I’m not going to go into all of the gory detail here, but I wrote another Perl script, marc-wiper.pl, which deleted that record and twenty-four others. They were all bad enough that the LC tool just couldn’t parse them. So, away with them.

On top of that, the LC tool was complaining about some particular MARC fields in some records. Some, such as 400, don’t exist in the MARC 21 specification. Others, like 520 Summary, Etc., do, but I didn’t care.

I ended up wiping all mentions of these fields: 200, 240, 280, 380, 400, 450, 500, 520, 600, 840. “240!” you cry. Yes, I deleted all mentions of 240 Uniform Title. Two records had the same record number and had some kind of problem with their 240, and when I couldn’t get rid of them individually I got fed up and just wiped all 240s. That is not the proper way to treat such an important field in an experiment like this, but I told you I was brute-forcing it.
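For comparison, here is a sketch of the same kind of wiping done on MARCXML instead of binary MARC. It is not marc-wiper.pl, which worked on the binary file with MARC/Perl. It assumes a collection element in the standard MARCXML namespace (http://www.loc.gov/MARC21/slim) with records identified by their 001 control number; the tag and control-number lists are up to you.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}
ET.register_namespace("marc", MARC_NS)  # keep a readable prefix on output


def wipe(xml_text, drop_tags, drop_ids=()):
    """Drop whole records by 001 control number, then drop datafields by tag."""
    root = ET.fromstring(xml_text)
    for record in root.findall("marc:record", NS):
        control = record.find("marc:controlfield[@tag='001']", NS)
        if control is not None and (control.text or "").strip() in drop_ids:
            root.remove(record)
            continue
        for field in record.findall("marc:datafield", NS):
            if field.get("tag") in drop_tags:
                record.remove(field)
    return ET.tostring(root, encoding="unicode")
```

Because findall() returns a list, removing elements inside the loops is safe.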

I seem to have mixed up something and now when I run the Perl script I get some warnings, so this may not work for you. I don’t recommend trying it, in any case. If you run marc-wiper.pl on the previous MARC file, you should end up with clean-pride-and-prejudice-cleaned-chars.marc. You may not. You can just download it directly, but I wouldn’t bother. There’s more ugliness to come.

Finally I could get through the first step in the process:

$ ./marc2frbr.sh clean-pride-and-prejudice-cleaned-chars.marc pp
Transforming clean-pride-and-prejudice-cleaned-chars.marc to MARCXML ...

But then …


Transforming the MARCXML into FRBR XML and saving to pp.xml ...
Error on line 15908 column 46 of file:///usr/home/wtd/frbr-lc-tool/tmp/slimfrbr.xml:
  Error reported by XML parser: Character reference "&#31" is an invalid XML character.

And on that cliffhanger, I leave you until tomorrow.


WoGroFuBiCo 2 commentary

Posted by: William Denton, 15 May 2007 7:52 am
Categories: Conferences,Library of Congress

(The Pride and Prejudice example took longer than expected to recreate and document, but it should be up tomorrow.)

Structures and Standards for Bibliographic Data, the second meeting of the Working Group on the Future of Bibliographic Control (WoGroFuBiCo), took place last Wednesday. Naturally FRBR came up, along with lots of other issues, so here are a few links to get you started.


Pride and Prejudice 5: Mahler

Posted by: William Denton, 14 May 2007 7:41 am
Categories: Pride and Prejudice

All right. If you haven’t downloaded the Library of Congress’s FRBR Display Tool, my shell scripts, and the big MARC file, go back to the last post, Pride and Prejudice 4: FRBR Display Tool and get yourself set.

Here are the files in the LC zip file:

-rw-r--r--  1 wtd  wtd     1076 Mar 30  2004 MARC2FRBR.BAT
-rw-rw-rw-  1 wtd  wtd  1270415 May  7  2003 MARC4J-B.ZIP
-rw-rw-rw-  1 wtd  wtd   261744 Jun  3  2003 MARC4J.JAR
-rw-r--r--  1 wtd  wtd      940 Feb  4  2003 MARCXML.BAT
-rw-rw-rw-  1 wtd  wtd     7009 Mar 24  2004 README.TXT
-rw-r--r--  1 wtd  wtd      941 Mar 30  2004 SLIM2FRBR.BAT
-rw-rw-rw-  1 wtd  wtd    25166 Jan 28  2004 mahler.html
-rw-rw-rw-  1 wtd  wtd    52148 Dec 19  2003 mahler.mrc
-rw-rw-rw-  1 wtd  wtd    47895 Aug 16  2005 mahler.mrk
-rw-rw-rw-  1 wtd  wtd    34495 Jan 28  2004 mahler.xml
-rw-rw-rw-  1 wtd  wtd    13563 Jul 28  2006 marcxml.jar
-rw-rw-rw-  1 wtd  wtd   966191 Nov 11  2003 saxon7.jar

README.TXT says at the bottom:

3.  Example using the MARC file "mahler.mrc"

C:\MARCXML>marc2frbr mahler.mrc mahler

will produce two files

mahler.xml:  The FRBR xml file
mahler.html: The FRBR HTML display file which you can view using any
             web browser.

As I said last time, I use Unix so I need a shell script, and I wrote marc2frbr.sh. Let’s run it on the sample MARC file that LC provided, and make sure everything is tickety-boo:

$ ./marc2frbr.sh mahler.mrc mahler
Transforming mahler.mrc to MARCXML ...
Transforming the MARCXML into FRBR XML and saving to mahler.xml ...
Transforming the FRBR XML into HTML and saving to mahler.html ...
Complete

Worked perfectly! The HTML generated is mahler.html. Have a look and see what it looks like. If you have any MARC editors or analysis tools, use them to have a look at mahler.mrc and compare the two.

Also of interest is mahler.xml, from which the HTML is generated. It uses the Metadata Object Description Schema (MODS) to help define metadata elements in a FRBR hierarchy with works, expressions, and manifestations. Here’s a section of the file, representing one particular work (which you can look for in the HTML):

<work>
  <mods:name type="personal">
    <mods:namePart>Mahler, Gustav, 1860</mods:namePart>
    <mods:role>
      <mods:text>creator</mods:text>
    </mods:role>
  </mods:name>
  <mods:titleInfo>
    <mods:title>Symphonies, no. 3, D minor. O Mensch. Vocal scores</mods:title>
  </mods:titleInfo>
  <expression>
    <mods:typeOfResource>text</mods:typeOfResource>
    <mods:language authority="iso639-2b">ger</mods:language>
    <manifestation>
      <imprint>
        <mods:titleInfo>
          <mods:title>Sehr Langsam, misterioso</mods:title>
        </mods:titleInfo>
        <mods:note type="statement of responsiblity">Gustav Mahler.</mods:note>
        <mods:originInfo>
          <mods:dateIssued>1896</mods:dateIssued>
        </mods:originInfo>
        <mods:physicalDescription>
          <mods:extent>1 ms. vocal score ([4] p.) ; 35 cm.</mods:extent>
        </mods:physicalDescription>
        <mods:identifier type="lccn">82770782</mods:identifier>
      </imprint>
    </manifestation>
  </expression>
</work>

(mods:something means that the something element comes from MODS and is being kept in the mods namespace. This is useful when mixing metadata elements from two different schemas.)

Even if you’re new to metadata in XML you can probably get the gist of what’s going on up there. There’s a work, with a creator and a title, and it has an expression and a manifestation. The MODS elements define attributes of the FRBR entities. All of this is transformed into HTML, as linked above.
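To see that hierarchy without the HTML, a short Python sketch can walk the FRBR XML and print an outline. It assumes the element names in the extract above (work, expression, manifestation, imprint) and that the mods prefix is bound to http://www.loc.gov/mods/v3; check the namespace declarations at the top of your own mahler.xml and adjust MODS if they differ.

```python
import xml.etree.ElementTree as ET

# Assumption: the mods prefix in the FRBR XML is bound to the MODS v3 namespace.
MODS = {"mods": "http://www.loc.gov/mods/v3"}


def frbr_outline(xml_text):
    """Return an indented work/expression/manifestation outline as a list of lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for work in root.iter("work"):  # iter() includes root itself if it is a <work>
        title = work.findtext("mods:titleInfo/mods:title", default="?", namespaces=MODS)
        lines.append(f"Work: {title}")
        for expression in work.findall("expression"):
            language = expression.findtext("mods:language", default="?", namespaces=MODS)
            lines.append(f"  Expression ({language})")
            for manifestation in expression.findall("manifestation"):
                m_title = manifestation.findtext(
                    "imprint/mods:titleInfo/mods:title", default="?", namespaces=MODS)
                issued = manifestation.findtext(
                    "imprint/mods:originInfo/mods:dateIssued", default="?", namespaces=MODS)
                lines.append(f"    Manifestation: {m_title} ({issued})")
    return lines
```

Run on the Mahler extract, it should give one work, one German-language expression, and one manifestation issued in 1896.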

That’s enough of LC’s example. Now we’re sure everything works with the test data. How well will it work with the MARC records I downloaded? Not very well, but we’ll deal with that.

If you have any questions about any step of this worked example, or if you’re doing it at home and run into some trouble, feel free to leave a comment. Comments are always welcome!


Pride and Prejudice 4: FRBR Display Toolkit

Posted by: William Denton, 11 May 2007 7:41 am
Categories: Pride and Prejudice

In Pride and Prejudice 3: MARC I told you about how I had superduped my copy of Jane Austen’s Pride and Prejudice, got 792 ISBNs, queried open Z39.50 servers around the world, and retrieved MARC records for 383 of those manifestations.

That’s a nice chunk of data to work with. “What,” I hear you cry, “will we do with it?” “We’re going to bung it into Library of Congress’s FRBR Display Tool and see what happens,” I reply. “Right ho!” you say.

Here’s a brief extract about the tool. Make your way over there to read the whole thing and poke around.

The FRBR Display Tool … is an XSLT program that transforms the bibliographic data found in MARC record retrieval files into meaningful displays by grouping the bibliographic data into the “Work,” “Expression” and “Manifestation” FRBR entities….

The FRBR Display Tool sorts and arranges bibliographic record sets using the FRBR model. It then generates useful hierarchical displays of these record sets containing works that consist of multiple expressions and manifestations.

The tool is very flexible. Because the tool is written in XSLT, it is easy to augment based on an institution’s individual needs. Likewise, the output may be augmented by simply changing the XSL stylesheet that controls display. No change in the XSLT program is needed.

The tool does not search bibliographic databases to create the record set on which it operates. A retrieved file (e.g., an OPAC search result) of MARC unit records must be created before using the tool.

We have the MARC records so we’re set to begin.

If you want to follow along at home, please download the FRBR Display Tool and get it working for next week. It requires Java.

The toolkit includes some Java JAR files and other things, including MARC2FRBR.BAT, a DOS or Windows batch file. That’s no good to me (I run FreeBSD) so I made a shell script replacement, marc2frbr.sh.

As it happened, while processing the MARC records I ran into various problems, which I’ll describe and solve next week. I ended up installing Saxon 8.9 to get around one of them. You’d be best off to install it too. For it, I made another new version of the batch file, new-marc2frbr.sh.

I worked through this at the command line on my Unix box, and that’s what I’ll show, so if you’re not using a Unix or a Linux, you’ll have to figure it out yourself or just follow and see the results.

“Strike me pink,” you say. “Do you mean you’re going to take those 383 MARC records and FRBRize Pride and Prejudice before my very eyes?”

Yes!


xISBN v2 pricing discussions

Posted by: William Denton, 10 May 2007 7:12 am
Categories: OCLC

In case you didn’t see the comments on yesterday’s post (which you should), or read some discussion of this elsewhere, I thought I’d point out the new xISBN pricing sheet. The main page about xISBN says, “The xISBN Web service is free for non-commercial use when usage does not exceed 500 requests per day…. The service is also available on a subscription basis for non-commercial and commercial use for usage that meets or exceeds 500 requests per day.”

There was some discussion on the code4lib mailing list following Eric Hellman’s announcement.

In What’s a Web Service Worth, Richard Wallis of Talis said, “OCLC are dipping their toe in the water on behalf of many of us who will be watching this service closely.”

I wish I knew more about how OCLC made the decisions it did. How did they decide on their pricing? What have they seen in the xISBN v1 usage logs? Who do they expect will pay? Who’s already paying?

You might not expect a change in the license on an obscure FRBR-related algorithm to generate this much discussion, but there’s more here than just ISBNs. OCLC, an enormous American non-profit organization made up of member libraries, had a free service that now they’re charging for. Talis, a UK for-profit company with a large union catalogue of its own but no xISBN-equivalent service, is watching to see how this evolves.

LibraryThing, a small American for-profit company that charges very little for its services, has thingISBN, which is free. Tim Spalding, who runs it, said on the code4lib list, “I’d love to see LT’s member-driven data mashed up with more traditional work-set analysis. At this point everyone is free to try that on their own, but we’ll move to a more traditional copyleft license, so improvements like that have to be shared.”

Behind some of the discussion about all this is the feeling, which I share, that all bibliographic metadata should be free. If it were, it would be simple to build a free xISBN replacement. Of course, it isn’t. But some of it is. Perhaps enough. My Pride and Prejudice experiment is one start along that path, and I’ll get back to it tomorrow and give you something to do over the weekend.


xISBN v2 available

Posted by: William Denton, 9 May 2007 7:31 am
Categories: OCLC

OCLC’s Eric Hellman announced that the new version of xISBN is up. The API documentation for xISBN explains it all and has lots of examples. It’s a nice improvement over the first version and has lots of new options.

Xiaoming Liu, also of OCLC, gave a short talk about this at the Code4Lib conference in Georgia in February. It was recorded, and the videos are online: go to the Code4Lib 2007 Lightning Talks page, browse down the Wednesday talks, and choose whichever of the “xISBN Update” links you prefer.

As you know, Bob, if you give xISBN an ISBN, it will return a list of ISBNs of other manifestations of the same work. LibraryThing’s thingISBN does the same thing, but there people, rather than algorithms as at OCLC, decide how to group things into works. Both have been the subject of much discussion here lately. They’re both very useful and work very well together.


More on DC/RDA data model meeting

Posted by: William Denton, 7 May 2007 7:28 am
Categories: RDA,Semantic Web

Here are some more links to discussion about the Dublin Core/Resource Description and Access meeting in London last week.

Andy Powell was there and posted about it in When Worlds Collide …, so go read that.

Jenn Riley knows a lot about metadata, and she has some reservations about the new project. In DC and RDA – The Beginning of a Beautiful Friendship? she says:

There’s nothing in the announcement that indicates the development of RDA proper will be affected by this work; in fact, the indication in the announcement that funding will be sought for the activities outlined implies the work is a long way off, likely entirely too late to have any real effect on RDA. This seems to be to be entirely backwards – trying to harmonize DC principles with RDA after the fact. Didn’t the DC community learn its lesson about the pitfalls of this approach when developing the Abstract Model, only realizing long after developing a metadata element set that it would benefit from an underlying model.

This general approach failed miserably with the DC Libraries Application Profile. There, the application profile developers wanted to use some elements from MODS, but weren’t able to because MODS doesn’t conform to the DCMI Abstract Model. So basically what the DC community said here was that application profiles are great, they form the fundamental basis of DC extensibility, but, oh yeah, you can’t actually use elements from any other standards unless they conform to the Abstract Model, even though are no approved encodings for even DC itself more than two years after the Abstract Model was released. OK then. Way to foster collaboration between metadata communities.

Jonathan Rochkind says in RDA, JSC, DCAM, RDF, FRBR that he doesn’t grok the Dublin Core Abstract Model. He points to Towards an Interoperability Framework for Metadata Standards (1 MB PDF) by Mikael Nilsson, Pete Johnston, Ambjörn Naeve, and Andy Powell, and says it has helped him get a better idea of what it means. Follow the links in his blog post for other interesting stuff.

Lorcan Dempsey’s Data Convergence says the new plan is “interesting food for thought.”


RDA + DC + FRBR + FRAD + RDF = OMG!!!

Posted by: William Denton, 4 May 2007 10:10 am
Categories: RDA,Semantic Web

Exciting news came out yesterday about a Monday-Tuesday meeting at the British Library with bigwigs from RDA and Dublin Core. (Remember, RDA is Resource Description and Access, an in-progress revision to cataloguing rules, and the Dublin Core Metadata Initiative (DCMI) is behind a widespread and fairly simple metadata schema, Dublin Core. RDA and DC are sometimes contrasted because the former is (or will be) a huge book full of complicated rules and DC can be used extremely casually.)

Recommendations:

The meeting participants agreed that RDA and DCMI should work together to build on the existing work of both communities.

The participants recommend that the RDA Committee of Principals and DCMI seek funding for work to develop an RDA Application Profile — specifically that the following activities be undertaken:

  • development of an RDA Element Vocabulary
  • development of an RDA DC Application Profile based on FRBR and FRAD
  • disclosure of RDA Value Vocabularies using RDF/RDFS/SKOS

Outcomes:

The benefits of this activity will be that:

  • the library community gets a metadata standard that is compatible with the Web Architecture and that is fully interoperable with other Semantic Web initiatives
  • the DCMI community gets a libraries application profile firmly based on the DCAM and FRBR (which will be a high profile exemplar for others to follow)
  • the Semantic Web community get a significant pool of well thought-out metadata terms to re-use
  • there is wider uptake of RDA

UPDATE: Don’t miss this wiki set up for the meeting, with lists and notes and links and useful background information. Thanks to Christine Schwartz for pointing it out; I missed it in all the excitement.

Karen Coyle’s Astonishing Announcement: RDA Goes 2.0 says “The call for a modernization of the library approach to metadata has been heard…. This is nothing short of revolutionary.” She does a mini-interview with Dublin Core honcho Diane Hillmann, who says about the proposed vocabulary: “It will look something like the Dublin Core registered terms…. Having the formal vocabulary means that there can be a testbed for the many and complex relationships that are being expressed in RDA, FRBR and FRAD.”

Alistair Miles was one of the people at the meeting, and in RDA: Resource Description and Access he says, “The main outcome of the meeting was a proposal to jointly develop a new Dublin Core Application Profile for libraries, based on the RDA and on FRBR. The profile would also be closely linked to the ePrints application profile developed in the UK for Intute. The ePrints AP is an example of how the Dublin Core community is moving beyond the “15 elements” towards providing a general framework to support rich and highly structured metadata, via the Dublin Core Abstract Model [latest draft].”

(Heery and Patel define an application profile as “schemas which consist of data elements drawn from one or more namespaces, combined together by implementors, and optimised for a particular local application.” There are lots of different kinds of metadata systems out there, and chances are if you’re doing some work and need some metadata, you need a few terms from this one, a few from that one, and one from over there. Or perhaps one existing system has everything you need, but in fact it has too much, so you pick out of it just what you need. You write up your own set of rules about how you’re going to use the elements you picked for your particular circumstances, and that’s an application profile.)
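To make the idea concrete, here is a toy application profile in Python. A hypothetical local project takes two terms from the Dublin Core element set and one from DCMI terms, says how often each may occur, and checks records against those rules. Real application profiles are written documents, often with formal machine-readable parts, not Python dictionaries; the namespaces below are real, but the profile itself is invented for illustration.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical profile: (namespace, term) -> (minimum, maximum) occurrences.
PROFILE = {
    (DC, "title"): (1, 1),        # exactly one title
    (DC, "creator"): (0, None),   # any number of creators
    (DCTERMS, "issued"): (0, 1),  # at most one date of issue
}


def check(record):
    """record: list of ((namespace, term), value) pairs. Returns rule violations."""
    problems = []
    counts = {}
    for term, _value in record:
        if term not in PROFILE:
            problems.append(f"term not in profile: {term[1]}")
        counts[term] = counts.get(term, 0) + 1
    for term, (minimum, maximum) in PROFILE.items():
        n = counts.get(term, 0)
        if n < minimum or (maximum is not None and n > maximum):
            problems.append(f"{term[1]} occurs {n} times")
    return problems
```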

Finally, in RDA/FRBR and the Semantic Web, Ed Summers points out that this is what Ian Davis, Richard Newman, and Bruce D’Arcus began in 2005: Expression of Core FRBR Concepts in RDF. Last June Ian Davis posted Harry Potter in FRBR, where he showed what a FRBRization of Harry Potter and the Goblet of Fire can look like in RDF. When the new work is done, we can FRBRize the same book again and see what’s different in the new rules.

The RDA + DC announcement is very interesting. Among other things, the work will bring some very useful standardization and formalization to all this stuff. We’ll have a standard set of terms that everyone can use, whether they’re reading Resource Description and Access and cataloguing a book at the National Library of Canada or whanging some metadata into a blog’s RSS feed in New Delhi. Also, we’ll have an application profile and defined rules about how to use it and express it in formats like RDF. Everyone everywhere can use the same words, with agreed-upon meanings, when they want to use RDA and FRBR and FRAD to describe things. I bet a lot of people will use it.

