Today I got a very disappointing note in my inbox, from the US National Libraries RDA Test Project. I guess I’d call it a “ding” letter, and I have to say it was more than a bit surprising. I had volunteered to help with the testing, not by creating records, mind you, but in analyzing the records other people create. Given the fact that I’ve been the co-chair of the DCMI/RDA Task Group, done the major part of the work in registering the RDA schemas and vocabularies, and have been involved in building the XML schemas that will be the basis of much of the data creation for many early RDA implementations, I figured my experience might come in handy. But apparently not …

Dear Diane: Thank you for your interest in the US National Libraries RDA Test project. The RDA Test Steering Committee regrets that you could not be selected as a formal test participant. Interest in the project was much greater than the Steering Committee originally anticipated, and it was necessary to select test partners from more than 90 applications. Every applicant had a great deal to offer to the project, and each was carefully considered. The Steering Committee based its final selections on the goal of ensuring that the RDA Test will reflect a cross-section of US cataloging agencies balanced by size, type of organization, OPAC and cataloging systems used, and areas of specialization in cataloging and collection development.

The Steering Committee will share the methodology for the test on its Website at URL . If you are interested in conducting your own test of RDA, we encourage you to produce records following this methodology and to share the results with the Steering Committee during the test period.

Thank you again for your interest in the RDA Test.

So, exactly what are they testing that makes my knowledge and experience useless? Darned if I know. But I can’t get beyond the notion that the testing regime I see described on the website is pretty limited, and it’s hard to imagine what the results can really tell us, aside from the obvious difficulties people will encounter in attempting to cram a FRBR-based structure into any one of our current flat MARC-based library systems.

Much more interesting, to me anyway, is the idea of what RDA records might look like in straight XML or RDF, without the necessity of the contortions involved in making it all “fit” into a MARC system. Without the layer of MARC contortion we might really be able to figure out whether catalogers could adjust to RDA and create FRBR-based records. It would be nice to think that some of the open source systems would find a way to play with these records and test some more forward-looking, rather than backward-looking implementation issues.

Any volunteers for an alternate testing regime?

By Diane Hillmann, May 29, 2009, 5:08 pm (UTC-5)

This week, Karen Coyle wrote a post about LCSH as linked data: beyond “dash-dash” which provoked a discussion on the id.loc.gov discussion list.

It seems to me that there are several memes at play in this conversation:

LCSH and SKOS

As Karen points out, LCSH is more than just a simple thesaurus. It’s also a set of instructions for building structured strings in a way that’s highly meaningful for ordering physical cards in a physical catalog. In addition, each string component has specific semantics related to its position in the string, so it’s possible, if everyone knows and agrees on the rules, to parse the string and derive the semantics of each individual component. The result is a pre-coordinated index string.

These stand-alone pre-coordinated strings are perhaps much less meaningful in the context of LOD, but this certainly doesn’t apply to the components. I think what Karen is pointing out is that, while it’s wonderful to have a subset of all of the components that can be used to construct LC Subject Headings published as LOD, there’s enough missing information to reduce the overall value. As I read it, she’s wishing for the missing semantics to be published as part of the LCSH linked data, and hoping that LC doesn’t rest on its well-earned laurels and call it a day.

Structured Strings

Dublin Core calls the rules that define a structured string a "Syntax Encoding Scheme" (SES) and basically, that’s what the rules defining the construction of LC Subject Headings seem to be. It’s structurally no different than saying that the string "05/10/09", if interpreted as a date using an encoding scheme/mask of "mm/dd/yy", ‘means’ day 10 in the month May in the year 2009 using the Gregorian calendar. Fascinatingly, that same ‘date’ can be expressed as a Julian date of "2454962", but I digress.
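To make the date analogy concrete, here is a minimal Python sketch. It assumes the "mm/dd/yy" mask corresponds to the `strptime` pattern `%m/%d/%y`, and that "Julian date" means the whole-number Julian Day Number; the helper names are mine, not part of any standard.

```python
from datetime import date, datetime

# Assumption: offset between Python's proleptic Gregorian ordinal
# (date.toordinal) and the integer Julian Day Number for that calendar day.
JDN_OFFSET = 1721425


def parse_with_mask(value: str, mask: str) -> date:
    """Interpret a structured string using a strptime-style mask."""
    return datetime.strptime(value, mask).date()


def julian_day_number(d: date) -> int:
    """Return the Julian Day Number for a Gregorian calendar date."""
    return d.toordinal() + JDN_OFFSET


parsed = parse_with_mask("05/10/09", "%m/%d/%y")
print(parsed.isoformat())         # 2009-05-10
print(julian_day_number(parsed))  # 2454962
```

The string carries no meaning on its own; only the pairing of string and mask does, which is the sense in which the rules act as a syntax encoding scheme.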

As far as I can tell, no one has figured out a universally accepted (or any) way to define the semantic structure of an SES in a way that can be used by common semantic inference engines, and I don’t think that anyone in this discussion is asking for that. What’s needed is a way to say "Here’s a pre-coordinated string expressed as a skos:prefLabel, it has an identity, and here are its semantic components."

Additional data

So…

"Italy--History--1492-1559--Fiction"

…is expressed in id.loc.gov/authorities/sh2008115565#concept as…

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix terms: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://id.loc.gov/authorities/sh2008115565#concept>
    skos:prefLabel "Italy--History--1492-1559--Fiction"@en ;
    rdf:type skos:Concept ;
    terms:modified "2008-03-15T08:10:27-04:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
    terms:created "2008-03-14T00:00:00-04:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
    owl:sameAs <info:lc/authorities/sh2008115565> ;
    skos:inScheme
        <http://id.loc.gov/authorities#geographicNames> ,
        <http://id.loc.gov/authorities#conceptScheme> ;
    terms:source "Work cat.: The family, 2001"@en . 

…and has a 151 field expressed in the authority file as…

151 __ |a Italy |x History |y 1492-1559 |v Fiction

…which has the additional minimal semantics of…

<http://id.loc.gov/authorities/sh2008115565#concept>
    loc_id:type "Geographic Name" ; #note that this is also expressed as a skos:inScheme property
    loc_id:topicalDivision "History" ;
    loc_id:chronologicalSubdivision "1492-1559" ;
    loc_id:formSubdivision "Fiction" ;
    loc_id:geographicName "Italy" .

…and this might also be expressed as…

<http://id.loc.gov/authorities/sh2008115565#concept>
    loc_id:type <http://id.loc.gov/authorities/sh2002011429> ;
    loc_id:topicalDivision <http://id.loc.gov/authorities/sh85061212> ;
    loc_id:formSubdivision <http://id.loc.gov/authorities/sh85048050> ;
    loc_id:geographicName <http://id.loc.gov/authorities/n79021783> ;
    terms:temporal "1492-1559" ;
    terms:spatial <http://sws.geonames.org/3175395/> ;
    terms:spatial <http://id.loc.gov/authorities/n79021783> .

Making sure that those strings in the first example are expressed as resource identifiers is also something that I think Karen is asking for. (BTW, the ability to look up a label by URL at id.loc.gov is really useful.)
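The subfield-to-property mapping sketched above can be illustrated in a few lines of Python. This is only a sketch under stated assumptions: the `loc_id:` property names are the made-up ones from my example, the code-to-property table applies to a 151 (geographic name) heading only, and the input is the simplified "|a ... |x ..." display form rather than real MARC.

```python
import re

# Hypothetical mapping from 151 subfield codes to the illustrative
# loc_id: properties used in this post (not a published vocabulary).
SUBFIELD_PROPERTIES = {
    "a": "loc_id:geographicName",
    "x": "loc_id:topicalDivision",
    "y": "loc_id:chronologicalSubdivision",
    "v": "loc_id:formSubdivision",
}


def parse_subfields(field: str) -> list[tuple[str, str]]:
    """Split a '|a Italy |x History' style string into (code, value) pairs."""
    pairs = re.findall(r"\|(\w)\s*([^|]*)", field)
    return [(code, value.strip()) for code, value in pairs]


def to_components(field: str) -> list[tuple[str, str]]:
    """Attach a property name to each subfield value we know how to map."""
    return [
        (SUBFIELD_PROPERTIES[code], value)
        for code, value in parse_subfields(field)
        if code in SUBFIELD_PROPERTIES
    ]


def to_pref_label(field: str) -> str:
    """Rebuild the pre-coordinated 'dash-dash' string from the subfields."""
    return "--".join(value for _, value in parse_subfields(field))


field_151 = "|a Italy |x History |y 1492-1559 |v Fiction"
print(to_pref_label(field_151))
for prop, value in to_components(field_151):
    print(f'    {prop} "{value}" ;')
```

The point is that the semantics live in the subfield codes, which the flattened skos:prefLabel throws away; once they're gone, a string split on "--" can only guess at what each piece means.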

I should point out that Ed, Antoine, Clay, and Dan’s DC2008 paper detailing the conversion of LCSH to SKOS goes into some detail (see section 2.7) about the LCSH to SKOS mapping, but doesn’t directly address the issue that Karen is raising about mapping the explicit semantics of the subfields.

By Jon, May 20, 2009, 3:45 pm (UTC-5)

Friday I was in Hyde Park, NY, at the site of the Franklin D. Roosevelt Presidential Library and Museum, attending the NYLINK Annual Meeting. I’d been asked to come and talk about change in cataloging. NYLINK, formerly the SUNY/OCLC Network, is one of the struggling regional service providers that used to be the primary brokers for OCLC services for smaller libraries, and did most of the training for OCLC in the process. These organizations are now providing other kinds of services and training for a wide variety of libraries, and are still one of the best places for figuring out what working librarians are thinking.

I was particularly impressed with how this day-long meeting was organized. After the welcome and logistics, two speakers—Liz Chabot from Ithaca College and Susan Currie from Binghamton University—got things off with a bang by proposing seven “what ifs”:

… we stopped cataloging?
… librarians individually and as a profession promoted, used and helped develop Wikipedia?
… we accepted open source software as a way of being in control of the customer experience?
… we required all library staff to have expertise using technology?
… mistakes were expected and embraced—and librarians became the mistake masters?
… we didn’t make our customers work so hard?
… we let customers determine their loan periods?

Meeting participants spent some time talking about these questions, in the meantime getting to know one another at their tables, and getting fired up about the topics. The morning speaker, Joe Lucia from Villanova University, then started off with this energized group, and did a great job putting the important issues in a more general perspective. He spoke compellingly about our plight in libraries and the parallels with newspapers, described changes in how people deal with linear text vs. hypertext, and pointed out the work of R. David Lankes (a colleague of mine at Syracuse University), who talks about the “library as conversation.” Joe’s primary point in bringing together these threads is his contention that libraries are not in the “information” business so much as we are in the “knowledge and conversation” business. He talked very passionately about the library as a “commons” where knowledge and conversation happen with the active engagement of the library as organization, bringing students, faculty and others out into the open for exchanges of ideas, not just providing places for their solitary study.

But of course it will be no surprise that I was most engaged with the question “What if we stopped cataloging?” I’m not sure that question is the right one, or perhaps just not specific enough. I might recast it as “What if we stopped cataloging the same old stuff?” By “same old stuff” I mean the secondary products of academic pursuits: books, journals, government documents, etc., the stuff that we have always cataloged. What if we began to catalog different stuff, the stuff we’re publishing ourselves, the podcasts we create of the talks given by our own faculty and those who visit us to discuss their research, the exhibitions we create out of our collections and materials borrowed from others, the primary materials our former faculty members have donated into our care—the things that we’ve always consigned to others (archivists, perhaps?) or not cataloged at all. What we have called cataloging, when it is not done for the first time, is these days for the most part consigned to staff who may not be catalogers, bought from vendors, or automatically claimed from larger databases to populate ours. Insofar as we have used these more efficient, more automated strategies, we have indeed “stopped cataloging.” What we haven’t done is modified our missions and our budgets to take more responsibility for these other often unique things that we have neglected, and as such we have been the instruments of our own demise.

So as we hear the bemoaning of the profession of cataloging, and ourselves sometimes obsess about the “how” of the changes in our lives brought on by technology, financial distress, and various other pressures, we would do well to remember that behind all this, the mission of the institutions we call home is changing significantly, and we can’t answer those “how” questions without looking anew at those shifting missions.

My presentation and Corey Harper’s came after lunch, and the group was exceptionally ready to hear what we had to say, thanks to the fabulous preparation provided by NYLINK, Joe Lucia, Liz Chabot and Susan Currie. I had some great conversations there, and on the ride home thought quite a bit about how proud I am to be a librarian, and the wonderful opportunities I continue to have to get to know some of the best people in the business.

By Diane Hillmann, May 11, 2009, 9:06 am (UTC-5)

This week I’ve been on the road, doing presentations at the New Jersey Library Association meetings on Wednesday, and Five Colleges in Western MA Friday morning. I’ve been doing all this travel in the faithful MetadataMobile, increasingly an object of interest, amusement and (almost) veneration by those who have heard about her. It’s been a great trip, with wonderful audiences, and as always, I learn a lot about what’s on the minds of those in the trenches.

I arrived later than I’d expected at NJLA, due to missing the exit I was supposed to use on the Garden State Parkway. I was very busy looking for a coffee and a rest stop, and it probably whizzed right by me. By the time I realized I was way too far down the Parkway, I decided to get off and use my newly acquired New Jersey Official Map to get me to my destination. I know, it’s very retro as a strategy, but maybe I was feeling that a little self-punishment was in order, or maybe I just needed to get off the big roads for a while.

As a child of the fifties, my first stop was a gas station as I needed to gas up and figured I’d confirm my location on my handy-dandy map and get some advice on my route. I had forgotten that New Jersey doesn’t do self-service gas pumping, and I’d also forgotten that virtually all the NJ gas stations I’ve visited in the past decade are staffed by (and possibly owned by) persons from somewhere else. When shown a map and asked for directions, they invariably act like people who have been lately dropped into their current location from the sky with no idea how to locate themselves in the world they inhabit (pretty close to reality, probably). I knew pretty much where I was, but wanted to figure out how to get to a big secondary East/West road that would take me to where I needed to be. It was hopeless—these guys (they’re always guys) couldn’t even locate the town they were in on my map so I thanked them for their trouble and got back on the road. I ended up following my nose (and the MM’s internal compass, which tells me whether I’m going N, S, E, W or any rational combination thereof). Thankfully, in NJ if you go east you find the Atlantic eventually.

It was a gorgeous day to be taking the long way, and I passed by what remains of the agricultural areas in NJ, including a whole bunch of horse farms (one was called “Due Process Stables” and you could pretty much guess how that was funded). I got to the conference venue in Long Branch (right on the beach) in time to take a nice walk on the boardwalk and decompress from those long hours behind the wheel.

The program was about FRBR and RDA for real people, and Rhonda Marker from Rutgers presented first. Rhonda and I have presented together before, and generally have fun with it, since we sometimes disagree and are very open about it, to the general amusement (or maybe consternation) of the audience. We only had two hours, which wasn’t enough, but the audience was very receptive and had some good questions. The slides for this presentation are available here.

From Long Branch, NJ, I pointed the MetadataMobile north, through the horror of Big Apple driving (BIG trucks!), through Connecticut and up to Amherst, Mass., for a confab with a group from the Five Colleges. This had been in the planning works for some months, but was made more interesting by the announcement a few weeks ago of some pending reorganization of technical services operations in the colleges in response to the same kinds of financial pressures being experienced everywhere. I’d been forewarned about this by the organizers of this session, and included some of the issues most relevant to the group in my presentation (though the slides, available here, don’t always reflect this directly). My underlying point was that they were not, in fact, dealing with only one crisis, the financial meltdown, but two, if you include (as I do) the pending changes in how we do business as data creators and managers, changes that are absolutely necessary to avoid the continuing marginalization of our efforts to provide information to our users.

One thing that impressed me about this group was that it included a number of the library chiefs, systems folks and others not part of the cataloging cohort that normally predominates in my audiences. This is good, very good in fact, because it signals to me that the issues I talk about are being seen as crucial to the discussions around a change in mission and strategy essential to creating positive change over the longer term, not just arguing against cuts in budgets for the short term. But, as President Obama points out, the one-crisis-at-a-time approach doesn’t work in a context of multiple crises related to one another in fundamental ways. Attempting the financially necessary reorganization without a refocus on mission, in the face of the huge challenges we face in remaining relevant in the current information environment, is self-defeating at best, suicidal at worst. I spoke to one of the chiefs after the presentation, and he said that he wished his public services folks had been there, too, and I agreed. Certainly the demise of the library catalog as we know it will affect their work hugely, and it behooves us all to break down those barriers of specialization as we face issues of our own survival as information providers.

What this also suggests is that the focus of the testing being done by the US national libraries cannot be limited to cost benefit analysis. If we fail to look at the issues of most importance to libraries, as the LC Working Group on the Future of Bibliographic Control certainly did, we risk our future entirely. Of course, in the current environment particularly, we need to pay attention to costs and efficiency as we move forward, but they cannot be our sole criteria for decision making.

For the knitters among my readers, I also made a pilgrimage to Webs and spent a goodly portion of my speaker’s fees to support the local economy. It was great, though at this point I’m going to have to do a “real” retirement sooner than anticipated to use up all the yarn and patterns I’ve amassed. Either that or recruit a chauffeur on these trips so I can knit on the way.

By Diane Hillmann, May 3, 2009, 2:23 pm (UTC-5)

There’s nothing quite so humbling as reading something one wrote some time ago, particularly if the ideas expressed, if not quite so old as oneself, at least betray an expired shelf life. This is particularly a problem when the item in question “looks” new, e.g., is recently “published.” These uncomfortable thoughts arose this week because the long-planned festschrift for my late colleague, Tom Turner, was released last week.

Metadata and Digital Collections: a Festschrift in honor of Tom Turner. Edited by Elaine Westbrooks, Associate Dean of University Libraries at the University of Nebraska at Lincoln, and Keith Jenkins, GIS/Geospatial Applications Librarian at Cornell University’s Albert R. Mann Library. Ithaca, New York, USA: Cornell University Library, 2009.

And my contribution, “Looking back—looking forward: reflections of a transitional librarian” is the last one on the table of contents.

Tom has been gone for about six years now, and in fact “Metadata in Practice,” published in 2004 (and still selling pretty well, thank you very much!) was dedicated to Tom, who was something of a missing link between me and my co-editor, Elaine Westbrooks. To explain that odd notion: for a time during my tenure at Cornell, at the beginning of the Age of Metadata, Tom was the only other person who understood what I was talking about when I tried to discuss metadata. Then Elaine came along, to take over Tom’s responsibilities when he became ill. When Tom unfortunately left us, I decided to become something of a mentor to Elaine, as Tom had been. I never asked her whether she wanted one, mind you, just noticed that she was smart and understood deadlines, and asked her to be co-editor of the book. It certainly worked out for me, especially when I broke my right arm while we were still setting up the book and the contract … but that’s another story. I’m not sure of all the reasons why the release of the festschrift took so long, but the result was that something I wrote eons ago (in Internet time) is now “just out.” I’m not assigning blame, mind you (I’m hardly one to cast the first stone), but it seemed like a good idea to point out where my thinking has moved on beyond that article and where the issues I brought up then still have legs.

The issue that still speaks loudest (at least to me) is the one about standards maintenance. The bald fact is, in 2005, when I wrote most of the article that appears in the festschrift, I wasn’t yet convinced that MARC was dead, and I was looking at the issue of how standards were maintained primarily in the context of Dublin Core. Most of my experience with “process” in that context had been on MARBI, and in fact MARBI was a direct progenitor to the Dublin Core Usage Board, on which body I served until last fall. In the festschrift article, I take a few potshots at MODS (as is my usual habit) primarily because, at the time, the maintenance process for MODS consisted of one mailing list dominated by XML geeks. The MODS squad has wised up since then, and has set up a maintenance group with some credibility, but some of the damage has been done and can’t easily be undone. (And if you think I’m picking unfairly on MODS, ask me about DC Source).

Process is not an insignificant issue in the life of a would-be standard, and it is indeed staring us in the face yet again, in the guise of RDA. There has been a bit of conversation on the RDA-L list recently, precipitated by the announcement that Tom Delsey will be stepping down as Editor once RDA is officially released. So the question was, “what’s the plan for maintenance?,” and the answer given by Marg Stewart was just the sort of answer that was intended to reassure, but certainly didn’t reassure me: “The JSC will revert to its process of handling updates to RDA as it had for AACR2. Namely, the update process will be handled by the JSC itself.”

Why am I not reassured? I can’t help looking back at the process of creating RDA in the first place. In the beginning, there was the closed room with the JSC working under the cone of silence. Then, after rumblings from the field, drafts were released only to specific people in selected organizations who then became gatekeepers for their communities. Then when that didn’t work very well, drafts were released publicly. But still, the same small group made the decisions, managed the discussion, and held their meetings essentially behind closed doors, with decisions handed down from on high in a process that seemed anachronistic at best and elitist at worst. As another un-reassured commentator noted:

“And the report of the last JSC meeting shows us, that we have created a process which is so huge, that we couldn’t manage it: only a third part of the suggestions could be discussed. In spite of modern means of communication, it’s not sure, that we are able to handle a world wide process for developing a cataloging code.”

I think we need to look at the process of updating and maintaining RDA anew, recognizing that the process that created the text was problematic at many levels, certainly overly expensive, and in this age of social networking and virtual communities, unnecessarily limited in its reach. There’s also the issue that the work done to create the vocabularies and schemas is now essential to RDA, and there needs to be a process that updates those vocabularies, and provides for their rational expansion (into full vocabularies, not just term lists) as well as their extension (with new terms and more appropriate participation). What the community can ill afford is a continuation of JSC as the Marie Antoinette of organizations, perfecting their embroidery skills while the revolution continues on outside the gates of their artificial world. The current process is ponderous, exclusionary, and ensures that the “quality control” measures operate primarily as a bottleneck, without meeting the overall goals of providing consistency and utility.

What might we like to see in an RDA update process going forward? To start that conversation, let me suggest the following requirements:

1. The process must include explicit ways for a broader array of libraries to participate, not just the Anglo-American national libraries and library societies. Including these libraries only in the review process is insufficient and condescending.

2. The process must be more nimble and lightweight, using the community based tools that enable all of us to operate as a virtual community, instead of relying on expensive face-to-face meetings in closed rooms and decision-making based only on formal proposals. The NSDL Registry will be adding capability that will allow simple threaded discussions at a granular level (by property or term), which, with appropriate management, could provide a much better way to flesh out and expand the current vocabularies. This sort of capability is essential at all levels of a newly envisioned maintenance process.

3. The process must not be based primarily on the elevation of “control” as the first principle. The assumption that seems to be behind the assertions by JSC that they must be the center of the process is that this is the only way to assure that the textual guidance and the vocabularies grow intelligently. This assumes the re-creation of the AACR2 update process with the same cumbersome, centralized methods, and the multi-year time lines for change. To my mind, this is the most Marie Antoinette-ish notion of all, an absolute denial of reality.

Can we finally look at what worked and didn’t with the RDA development process, at what the tools available to us provide to meet our needs for broad participation and quality control, and design something that makes more sense? We cannot just keep maintaining the powdered wigs and the formal dancing in the face of the revolution happening outside our gates.

By Diane Hillmann, April 9, 2009, 1:02 pm (UTC-5)

This past Friday I dug out my go-to-meeting clothes and made one of my occasional forays down one hill and up the other hill to my former place of work, Cornell’s Olin Library. The occasion was the monthly meeting of the CUL Metadata Working Group which I try to show up for when I’m not traveling. This month’s speaker was Sarah Shreeves, from the University of Illinois, Urbana-Champaign (UIUC). Now, I’ve known Sarah for some years, and have worked on a project or two with her, so this seemed like a perfect opportunity to dress up and get out of my cozy office for a morning to see what Sarah was up to. It helps that the weather is improving, this being pre-Spring in Ithaca (still largely gray skies, but mostly without the white underlay).

Sarah’s presentation is described on a WG page devoted to her visit, though her slides aren’t up yet as I write this. She talked primarily about a project UIUC is working on with the University of Wisconsin, called BibApp. The project’s aims are focused around research and collaboration on college campuses, looking for ways to bring together experts, their output, and information about them. Sarah’s role at UIUC includes their institutional repository, and the point person at UW is Dorothea Salo, a self-described “repository rat.” The presentation made quite clear how BibApp provides important support for IRs in general as well as alternatives to the clunky interfaces most of them come with.

From my admittedly metadata-centric point of view the most interesting things Sarah demonstrated were the backend work they do with author name disambiguation and automating the determination of what can legally be deposited in the IR from the oeuvre of a particular author, based on the publisher of the item. Depending on the research community, these items may have metadata available from A & I providers, MARC metadata, or no metadata at all (book chapters and conference publications often fall in that last category). The project is developing a tool that can support the work of librarians or other trained staff in determining who an author with a particular surname and forename initial might be, using a combination of human and machine intelligence. Although it’s not authority work as we know it, once we get to a mashup phase in the new world of authorities, the data provided by this tool will be quite useful, based as it is on the occurrences in the publications themselves, as well as linkages to additional information on the author (his/her institutional affiliation, email address, etc.). There’s already a somewhat primitive XML output, but even as it flashed by (not in the presentation, but as part of the discussion) I could see some possibilities.

In this application, being able to match the publisher name in the metadata with data about publisher policies to determine whether pre-prints, post-prints, author copies or all of the above can be legally deposited saves a great deal of time both for librarians and faculty. As Sarah was talking about this, I was struck by how clear a use case this was against traditional library practice of transcription of publisher information. I suggested this to Sarah and she agreed that they really needed standard forms, not transcribed data, to make this sort of tool function well. RDA’s approach is still focused on the idea that transcription is the best way to assure uniformity, but in my opinion that approach only provides the illusion of uniformity, and the number of LC Rule Interpretations for AACR2 on this point suggests that in a world where human resources will be spread thinner than ever it’s time to jettison transcription of important data elements. The alternative that we need to implement is to treat publishers as corporate bodies and control their names as we do other names, and this will certainly be technically possible in RDA, but is not what the guidance instruction tells catalogers to do.
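A toy Python sketch may make the use case clearer. Everything in it is invented for illustration (the publisher, its identifier, its policy, and the three citations); it is not BibApp's data model or code, just a picture of why string matching on transcribed names falls short where a controlled identifier does not.

```python
# Everything below (names, identifiers, policies) is invented for illustration.
PUBLISHER_POLICIES = {
    "pub-001": {"name": "Example University Press", "preprint": True, "postprint": True},
}

# Three citations to the same hypothetical publisher, each transcribed
# differently, alongside the controlled identifier a cataloger could assign.
citations = [
    {"title": "Paper A", "publisher_transcribed": "Example University Press", "publisher_id": "pub-001"},
    {"title": "Paper B", "publisher_transcribed": "Example Univ. Pr.", "publisher_id": "pub-001"},
    {"title": "Paper C", "publisher_transcribed": "Published by Example U.P.", "publisher_id": "pub-001"},
]

# Lookup keyed on the policy list's own form of the name.
policy_by_name = {policy["name"]: policy for policy in PUBLISHER_POLICIES.values()}

matched_by_string = [
    c["title"] for c in citations if c["publisher_transcribed"] in policy_by_name
]
matched_by_id = [
    c["title"] for c in citations if c["publisher_id"] in PUBLISHER_POLICIES
]

print(matched_by_string)  # ['Paper A']
print(matched_by_id)      # ['Paper A', 'Paper B', 'Paper C']
```

Fuzzy matching could rescue some of the transcribed forms, but someone still has to check the guesses, which is exactly the human time we no longer have.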

Another really nifty feature Sarah demonstrated was the analyses of publication patterns of different groups of researchers that were enabled by the tool once the publications themselves were available in one place. The BibApp tools are being tested using a variety of research groups, some science and others humanities, highlighting important differences between the cultures of these research areas. I was impressed by how sensible the approach of the developers has been, and how they appear to be using a nicely iterative approach to working through their results and keeping their goals in mind. In a world where university administrators need reminding of the importance of libraries and the work they do, this sort of project stands out like a beacon—meeting a number of promotional and marketing goals through a very librarian-ish organizational approach with some quick wins in the bargain.

That said, this is not the Ginsu knife that some may be looking for, and the developers are sensibly determined to keep it lightweight and focused. Nice work, Sarah, and a very effective presentation as well.

By Diane Hillmann, March 23, 2009, 8:50 am (UTC-5)

I’m at the code4lib conference this week and I’m writing this as a reminder to myself for next year that, well, I just don’t care…

I don’t care about the latest faceted search. I don’t care about the latest Open Source OPAC replacement. I don’t care about the latest Open Source ILS. I don’t care about project updates or software demos. I don’t care about API/standards/vendor/LC/OCLC bashing. I don’t care about reinventing the 20th century training wheels that library coders are so obsessed with (it’s their job after all).

The people here are great, and I care about them both individually and as a class — I have a genuine affinity for library geeks of all kinds and this is, after all, a conference that tends to filter out everyone else. So it’s not like it’s not nice to be here, but the information being presented is useless to me to the point of irritation. And despite the fact that I like so many of the folks here, there’s definitely a cliquish feel to the whole thing with the hugely active IRC back channel of ongoing commentary (which really should be displayed where everyone including the presenters can read it), and the in-jokes (dongles). However welcoming, it can be a hard place to be a newbie.

I want to stand up and shout “Folks. Listen. The world has moved on!! Deal with it.” I haven’t felt the need to set foot in a library, except for a meeting, in the last 10 years. I haven’t used a library OPAC or ILS, except for the purpose of researching the system itself, in at least a year. I have logged into a university library to get access to subscription-only journal articles, but I actually found those articles in Google. My library’s only utility to me is as the proxy gatekeeper for ACM or Elsevier, and these days it only keeps me out.

Many of the presentations and discussions are about alternatives: full-text vs. metadata; cataloger-created metadata vs. user-created metadata; Open Source vs. commercial; Google vs. the OPAC; data distribution standard A vs. standard B; MARC vs. everything else. There’s a corollary notion that interoperability means everyone has to agree at some level. What really needs to happen is that everyone needs to get used to the idea that the ideal world in which everyone supports a few standards and data formats is the one we used to have, or imagine we had, and it’s gone. And it’s not coming back.

What we have now is chaos. Many, many people are working very hard to maintain control over increasingly chaotic data and less-and-less predictable user behavior. They’re trying to hang on to users, even scholars, who are increasingly getting their information through Google or YouTube. They’re trying to figure out how to deal with steadily increasing digital collections, steadily decreasing user satisfaction, tone-deaf vendors, and sinking budgets.

Let it go. Chaos is good. Keep your systems open and flexible. Watch. Listen. Integrate instead of compete with Google. Integrate data from the social networks. Share everything. Aggregate. Watch. Listen.

Embrace.

By Jon, February 25, 2009, 2:31 pm (UTC-5)

Some of you may know that a few years ago (2005 actually, as I look at my files) I worked with the ALCTS Continuing Education group to set up a “Metadata Standards and Applications” workshop, as part of the series of workshops they were developing with LC funding. At the time, I was intrigued by the set of course series objectives I was given by the committee:

• To equip catalogers to deal with new types of resources and to recognize their unique characteristics
• To equip catalogers to evaluate competing approaches to and standards for providing access to resources
• To equip catalogers to think creatively and work collaboratively with others inside and outside their home institutions
• To ensure that catalogers have a broad enough understanding of the current environment to be able to make their local efforts compatible and interoperable with other efforts
• To prepare catalogers to be comfortable with ambiguity and being less than perfect
• To enable practicing catalogers to put themselves into the emerging digital information environment and to continue to play a significant role in shaping library services

All these seemed really important to me, and I was eager to try and set up something to meet those objectives. At the time, the notion was that LC would hold rights to the work, and I would contract to deliver the workshop materials. It wasn’t a lot of money, but I was sensible both of the honor of it and the challenge. After a great deal of back and forth with the committee and others, the first workshop was held in July of 2006, as a “train the trainers session.” The idea, as in most of the other successful ALCTS efforts, was that a group of trainers would be able to spread the effort around, and provide a variety of face-to-face efforts for both general groups of catalogers and specialized groups. At the time, workshop organizers needed to work with LC CDS to arrange for materials to be printed for trainers and participants—but last year LC got out of the business of printing materials (and attempting cost recovery from the workshops) and made the materials available for free.

I think I presented the workshop somewhere between six and eight times, usually with another trainer (two full days is a long haul for one person, even one who knows the materials as well as I do). I’ve no real idea how many times it was presented by all the folks who were in the trainers group, but it was my understanding that it was one of the most popular workshops in the series. In fall of 2007 I did a full revision of the slides, and sent them to CDS, but I found out later that the updates I did were never integrated with the workshop materials. In April of last year I was contacted by one of the participants in an earlier workshop, who wanted me to present it to a group of Canadians in the fall. We agreed on dates and a co-trainer and everything was going along as usual when I got an email from someone I knew slightly at LC who’d taken over the management of their training efforts.

At that point I was told that the materials had been entirely revised by two LC staffers (both of whom I knew) and that the version they’d created was now the “approved” version. I’d had no idea this was in the works, hadn’t been consulted or even informed, so needless to say I was pretty unhappy about this news. I was reassured that this was inadvertent, that everyone thought that someone else had “informed” me of this, but it seemed to me that if there had been a good faith effort to consult with me about suggested changes, the question of “being informed” would never have arisen.

When I was able to view the slides, I was even more unhappy about the situation. Not only did they now have LC slide themes, but the whole thrust of the workshop had changed. There were many more slides about MODS, MADS and METS (accompanied by more LC examples), and a new emphasis on PREMIS (including what I felt was an unrealistically complex exercise). There were also errors introduced about OAI-PMH (that section was cut drastically), and misstatements about Dublin Core, and a number of other worrying changes. I told my contact at LC that I would not present the Canadian workshop with the “approved” slides but would update the slides I had to reflect some of the changes in those slides that I thought were improvements. Thankfully I had help from Rhonda Marker, who’d presented with me before and was booked for the Canadian adventure, and we finished a new improved version of the slides in time for the Canadian workshop. We tweaked them a bit more, and shipped them to LC.

There were a few more interchanges, but it became increasingly clear that what was at issue was not the examples, or the specific content of slides, but a very basic difference of approach. My approach was not necessarily to provide solutions for librarians, but to survey the landscape with them and try to affect how they thought about their choices and how they make decisions. When I presented the workshop I spoke frankly about the advantages and disadvantages of the available metadata formats, and particularly how I thought they might hold up in the future. I tried to provide perspectives that were not just library perspectives on the world, recognizing that they were probably working with technical people with whom they had no shared language or frame of reference. It seemed to me that they needed to understand what was going on outside libraryland, and to think differently about how technologies being discussed in that world might affect their work.

I didn’t see that in the LC slides. I was told that the changes made were in response to comments by other trainers and participants, yet the evaluations from the workshops I gave were quite positive, and this sense was confirmed by several other trainers I knew, who did the workshop frequently and had made their own modifications in the slides and exercises to suit their styles and experience (this was always fine with me, by the way). One statement in the back-and-forth between me and LC stands out: the LC revisers of the slides felt that the workshop should be more “practical,” which their version indeed was, if one bought into the notion that what librarians really needed was more about MODS, METS, MADS and PREMIS.

As I pointed out to my contact at LC, I disagree pretty strongly with that approach. Librarians need to look beyond standards (whatever your definition of standards) created within the library community, based on the ways we’ve always done things in libraries. New standards are coming down the pike: RDA, FRAD, etc. Unlike MODS and MADS, these are based on the FRBR model, not the MARC model (if there is indeed such a thing) and have formal structures that can be used easily by libraries and by others outside libraries. We do our colleagues no favors by promoting solutions based primarily on how comfortable and familiar they seem, and what institution is responsible for maintaining them. Will MODS and MADS survive the transition to RDA, FRAD and FRBR-aware data? I don’t think so, and I can’t therefore recommend those solutions to librarians looking for answers to their digital project questions. As I said to my contact at LC:

“[Librarians] need to be able to THINK about these standards in a broader context, ‘get’ the issues they present in an implementation environment, and understand how they can make decisions today that will still seem sensible five or ten years down the road.”

If that’s impractical, so be it.

So, the upshot is that I’ve asked that my name be taken off LC’s version of the slides, and I’ve made it very clear that they should take me off the trainers list too, because I won’t present their version of “Metadata Standards and Applications.” I’ve also told them that they are welcome to use any of the slides I’ve sent them for any version of the workshop. My slides will continue to be available on my website at http://managemetadata.org/msa_r2/ and I’m hoping to continue to develop them to include RDA and other new standards as they become available. So, for those of you planning to present the workshop, expect to be asked to agree to give the workshop as LC has revised it (I hear that the LC slide themes will be removed, but the content will remain the same) or you won’t be able to advertise it as a Cat21 workshop.

I’ll be doing my workshop, or nothing; but that’s okay. I’ve got places to go, many interesting things to do, and I’m done arguing about this one.

By Diane Hillmann, February 6, 2009, 4:55 pm (UTC-5)

Yes, it was laughable to imagine that I might blog from Midwinter—silly me. My schedule was insanely full and I didn’t even get to peek into the exhibit hall, much less visit and pick up some swag for the grandchildren. As things move along with RDA, and as I become even more convinced that despite its flaws we need to have it out and used, my conference program became even more ridiculously focused on anything RDA. So there I was at both CC:DA meetings, the RDA Testing meeting, the RDA Implementation meeting, and on, and on.

I posted briefly before I left for Denver on the work Metadata Management Associates (me and Jon) will be doing with ALA Publishing in relation to the RDA Online product. Some of this will involve integrating the RDAVocab registered elements, roles, relationships and value vocabularies with the online product, thus ensuring that catalogers using the online product and applications using the Registry output will be linking to and/or referencing the same information (no synchronization issues!). Other tasks involve building XML schemas so that those seeking to build data, whether in the tool or in another application, have somewhere to start. We’ll be working on other bits and pieces to enable support for those trying to see their way forward from where they are to a world where RDA with a FRBR foundation is the way we do description. This is a tremendously exciting project for us and it was nice to see that when I explained it to others they thought so too!

The RDA Testing meeting, led by Beacher Wiggins, was an interesting one—more so than I had anticipated. The idea is that the effort will be led by LC, NLM and NAL, but it will be open and participants from a variety of venues (including library students and recent graduates) are encouraged to sign up. Beacher emphasized that testers will use the environments they have available, which for most means MARC systems, but if RDA Online can produce records as well, that’s an added opportunity, particularly for systems eager to figure out how these records might behave in newer systems. Testing will last about six months and include a training phase, an active record creation phase, and an assessment phase. After that, the three national libraries will make individual “go” or “no go” decisions. There will be a website at some point, most likely a subset of the website of the LC Working Group on the Future of Bibliographic Control.

There were a lot of questions about this testing effort, only some of which were asked or answered at the meeting. One of my concerns (which was too fuzzy and formless to ask at the meeting) has to do with the usefulness of testing how much time it takes to create a record under the old regime (AACR2 and MARC21) vs. the new regime (RDA and MARC21? or RDA and the RDA schemas and tools?)—you see part of the problem here. The deck seems pretty stacked towards the old and familiar, despite some attempts to create more balance by ensuring that some of the anticipated participants—the library students and recent grads primarily—will have as little experience with the old regime as they do with the new one. In all cases there will be “subjective” assessment solicited from participants as well as the “objective” results (the time invested in record creation). Part of the plan is that there will be a 20-30 item list of resources to be cataloged, and most participants will do one regime or the other (not both) for a particular resource, with hopes that aggregation of a large-ish number of results will provide a more reliable measure. I’m not sure the “timing test” is going to be particularly useful, given the number of uncontrollable variables likely to be introduced along the way—particularly in whatever technical environment is used—but I’m sympathetic with the notion that looking for a useful objective test in this situation is extremely challenging. It’s obvious to me that we’ll learn something from the objective and subjective portions of the testing, but whether what we learn is going to be clear enough to support either option with sufficient clarity for those who prefer evidence-based reality is quite another story.

Given the fact that the testing results are likely to be subject to various interpretations and caveats, it’s hard to imagine how any of the national libraries could justify a “no go” decision, and what that would actually entail, if one or more might be inclined in that direction. One hears rumors, which naturally can’t be attributed, that one or two are so inclined but it’s something I’m really curious about. Would a “no go” decision be something like “no go until 2011” or “we’re sticking with AACR2 and MARC21 until Hell freezes over,” and how long would that be supportable? Two years, five years, maybe? It all depends on how long and how messy one thinks the transition would be, and how many would be inclined to lag far behind the early adopters waiting for some undefined tipping point. I tend to think that there will be enough of the early adopters, particularly in the open source development community, that by the time the “go”/“no go” decisions are made, the “go” decision would be almost inevitable. But I’m an optimist on this, as I’m told fairly frequently.

Of course, the timeline for testing is dependent on RDA Online being released on time (3rd quarter, 2009) complete with vocabularies. I’m an optimist on that, too, which is a good thing, since some of it depends on me continuing to work relentlessly on some of my task list. More on that topic in other posts …

By Diane Hillmann, February 2, 2009, 6:46 pm (UTC-5)

Some of you will already have seen the blog post by Patrick Hogan on the ALA Tech Source blog. This is an exciting new collaboration for us, and in some ways represents the culmination of the DCMI/RDA Task Group efforts to register the RDA Elements, Roles, and value vocabularies. We figured that seeing the vocabularies in the Registry would highlight the importance of the formal representation of those vocabularies, and lead to good ideas for embedding them into actual products of use to libraries. This is certainly what seems to be happening.

Because this announcement came out so close to Midwinter, much discussion on the implications of this collaboration will happen in the context of RDA programs, meetings and hallway chats. I will certainly be reporting on some of those discussions from Midwinter or shortly thereafter.

Stay tuned.

By Diane Hillmann, January 21, 2009, 3:52 pm (UTC-5)