Like most people who do blogging (whether regularly, or sporadically like I do), I keep a list of ideas for posts, which I often add to, but less often write up. I’ve been a very poor blogger recently, and it’s not because nothing is going on, LOTS is going on. Perhaps it’s more that I’m waiting for some point where I could nail something down, and that moment seems not to be arriving. But one of my notes caused me to re-read a post I did over a year ago, and look at some of the other parts of that interview with the three luminaries at that ALISE program. I ran across a comment Janet Swan Hill made when asked about lessons learned from the last transition to AACR2:

“So I … think … the loss of independence, the loss of autonomy is one of the largest themes that I have seen. Another huge theme that I have seen in that period of time is we are still undergoing a period of grieving, I think, for the fact that we are learning that we have to put up with good enough.”

I agree with Janet’s insight—I see that kind of grieving frequently (most often displayed as anger) coming up in the cataloging venues of our profession. I sympathize, actually, much more than I often articulate—I’m far more likely to display frustration instead. But I think the problem lies in our definition of quality—we’ve put ourselves in a box where according to our deeply held notions of what quality is, we can never again achieve anything we can be proud of, because the world won’t pay for that particular kind of quality control anymore. All of us as human beings want to be appreciated for what we do, to achieve mastery in the area of work we’ve chosen, and somehow, many think it’s not possible to do that anymore in the world we see coming.

This is not entirely an illusion. The reality is that the old world where we built and maintained by hand our catalogs for users who needed our work to find the resources they required is gone, never to return. In fact, studies suggest that many of the newer users don’t understand the catalog at all, and use it infrequently, if ever. Certainly, because most libraries still have catalogs and still create information for them, it may be possible to maintain the illusion that there will always be catalogs, and therefore, there must always be catalogers to maintain them. We do all this work on computers, isn’t that enough?

Well, no, unfortunately it’s not enough, because we’re still creating catalogs and catalog cards, despite the computer technology we use today to create catalog records. But though I can understand the dismay about that disruptive fact, it seems to me that there’s plenty to look forward to. Make no mistake, with that forward looking vision there are still humans—well-trained and competent humans—continuing to pay attention to quality in their data, although using different techniques and certainly fewer human resources. Far too often the changes we see coming are translated in our brains as the death of quality in our world, but I don’t think that’s the case. How we define, measure, and assure quality will change, no doubt about it, but first we need to think realistically about what it means.

If we’re lucky and we do a good job figuring all this out, it will be ‘good enough.’ I would contend that ‘good enough’ was always the best we had on offer—there was never perfection, not ever. I remember when I was still working at Cornell, having routines that were run after every data load, to catch the known typos and other problems (some of which we’d created ourselves). Given how our catalogs were structured, this was important work, and made a difference to our users.
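For readers who think in code, here is a minimal sketch of what a post-load cleanup routine of that kind might look like. It is not the Cornell routine, which is not described here in any detail; the typo table, the function names, and the idea of running it over title strings are all assumptions made for illustration.

```python
import re

# Hypothetical table of known typos and their corrections. A real table
# would be built up locally from problems seen in earlier data loads.
KNOWN_TYPOS = {
    "goverment": "government",
    "pubisher": "publisher",
}

# One pattern that matches any known typo as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in KNOWN_TYPOS) + r")\b",
    re.IGNORECASE,
)


def _fix(match):
    """Return the correction for one matched typo, keeping an initial capital."""
    word = match.group(0)
    fixed = KNOWN_TYPOS[word.lower()]
    return fixed.capitalize() if word[0].isupper() else fixed


def clean_batch(titles):
    """Correct known typos in a batch of strings.

    Returns the cleaned list and the number of strings that changed,
    so each load leaves a simple count behind.
    """
    cleaned = [_PATTERN.sub(_fix, t) for t in titles]
    changed = sum(1 for old, new in zip(titles, cleaned) if old != new)
    return cleaned, changed
```

The point of the sketch is only that such a routine works on the whole batch at once and reports what it touched, rather than relying on someone reading each record.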

I can remember, too, during the many moons I spent on MARBI, that there were many discussions about whether or not the definition or structure of a particular field or subfield could potentially be misinterpreted or misused. My colleague Paul Weiss was particularly likely to argue that we should prevent people from doing such things, and one year I got a baseball cap with ‘USMARC Police’ stitched across the front that I would throw across the table when he started up that argument yet again. My point was that there was no MARC Police, and we’d better give up any fantasy that anyone would be on the enforcement end of good practices. (Although I recently noted that there’s a musical group called ‘Marc Police’ out there). More globally, there are no data police, so instead of pretending and wearing our baseball caps to prove they exist, we need to figure out some useful strategies for this new world we’re venturing into.

Consider, if the changes we’re talking about come to pass (and I believe they will), we’ll have statements instead of records, much less text, more batch improvement strategies, and to go along with that, different ways to measure quality. I wrote some of this up with my colleague Tom Bruce a few years ago, and it’s available here. The big message is that we need to change the conversation about quality and talk about it in an entirely new way. Quality is not about eyeballs trained one-by-one on individual records, but about new methods, new tools, and new attitudes. We will certainly need to use both our computer resources and our human resources more intelligently and flexibly, to share what we learn (whether effective or not), and to work closely with other collaborators in our endeavor—particularly the developers and coders who know more about what computers can do (and how to do it), than we do.

But I do keep my ‘USMARC Police’ cap in my office, just in case I ever need to throw it again.

By Diane Hillmann, December 7, 2011, 9:44 am (UTC-5)

Recently I retweeted the following:
“nice quote “your data ages like fine wine, whereas your software applications age like fish” in @mattwall’s j.mp/o8zsQG (via @edsu)”

Since then I’ve been thinking about the important lesson encapsulated in those less-than-140 characters, and how we’ve not really internalized this lesson in LibraryLand, no matter how many times we’ve migrated data. I remember many years ago, when I was working in the Cornell Law Library in the catalog card era, we were told by a university official that in case of fire, everyone should grab a shelf list drawer or two and head out the door. We were pretty stunned by this instruction, but they’d worked it all out—that catalog was the biggest investment the library had, and the only way to re-create it after a fire (if for nothing else than to determine the insurance to be paid for all those lost books), was via that shelf list.

Although a lot has changed since then, and most of those catalog cards were long ago recycled as scrap paper, the data they contained is (are?) still around, and still powering the online catalogs at Cornell. The catalog card drawers themselves were part of an ancient (and esthetically pleasing) piece of furniture, rescued from Boardman Hall, which was torn down in the 1950s to make way for Olin Library, a move many believe was a terrible mistake (Olin is the only modern building on Cornell’s Arts Quad). But I digress.

Like most libraries, Cornell used OCLC’s services to create catalog cards, not paying much attention to the data being created as part of that process until well down the road. Also like many, Cornell actually had a clutch of ‘holding libraries,’ physical spaces associated with particular schools and programs, each creating what was effectively its own database via OCLC. But unlike most, Cornell bit that multiple-records bullet early and when the data was loaded into NOTIS, there was only one iteration of a bibliographic record, with all the local ‘holding libraries’ attached to it. A mini-version of OCLC’s ‘master record’ is one way to look at it, I suppose. It was a sensible, if not particularly popular move, and we all had occasion later to thank our lucky stars we had crossed that bridge as a group, rather than as individuals, when we saw the headaches our comrades were coping with.

My last data migration for Cornell was the one that moved data from the old NOTIS system to Voyager, and it was a year-long project that, if nothing else, reaffirmed my biases towards standard data. Although, like everyone else, we had some standard data (MARC bibs and authorities) and a lot of non-standard data (acquisitions and circulation), the bibliographic portion was, we agreed, the most important part, because everything else ‘hung off’ that bib record. Clearly, the data remained where our investment lay—by the end we weren’t even installing new versions of NOTIS in all the modules we used (and the ones we did install turned out to be mistakes). NOTIS was very old fish indeed by the time we moved to Voyager, and Voyager now, like most of the so-called ‘new generation’ of integrated library systems based on relational databases, is fast becoming a pungent geriatric fish as well.

Enough of looking back (interesting as that can be). The questions now revolve around how different we think our future will look. Will we continue to use/reuse our considerable legacy of data to build the services we want moving forward? If so, what are the steps we need to take, to transform our legacy data to RDA or any other more modern packaging for our data? We have a large number of value vocabularies as well as the MARC 21 schema we still rely on, which we will need to consider part of that plan for re-use.

I’ve seen a lot of ‘new rules for data’, but these are mine:
–Data should be able to be encoded in a variety of ways, to suit a variety of functions, uses, and systems
–Data should be managed at a granular, statement level, but also be available in a variety of record ‘formats’ (with records being understood as primarily an on-the-fly method of aggregating data for a variety of downstream users)
–Although current data is expressed mostly as text strings, data improvement strategies will be designed to change most of them to URIs as soon as practicable.
–Data definitions and specifications will be easily available on the web, allowing mapping to be simpler and easier to tweak

And the most important rule:
–Never, never make data decisions to fit the system flavor of the month, and ‘out’ any system that degrades our data as the price of functionality
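Two of the rules above (statement-level data with records assembled on the fly, and text strings giving way to URIs) can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any existing system: the `ex:` identifiers, the property names, the `example.org` URI, and both function names are invented for the example.

```python
from collections import defaultdict

# Each statement is a (subject, property, value) triple.
# All identifiers and values here are invented for illustration.
statements = [
    ("ex:book1", "title", "Moby Dick"),
    ("ex:book1", "language", "English"),
    ("ex:book2", "title", "Walden"),
    ("ex:book2", "language", "english"),
]

# Hypothetical lookup from text strings to vocabulary URIs.
LANGUAGE_URIS = {"english": "http://example.org/language/eng"}


def strings_to_uris(stmts, prop, lookup):
    """Return new statements in which text values of one property are
    replaced by URIs wherever the lookup knows a match."""
    out = []
    for subj, p, val in stmts:
        if p == prop:
            # Unknown strings are left as they are, not guessed at.
            val = lookup.get(val.strip().lower(), val)
        out.append((subj, p, val))
    return out


def build_record(stmts, subject):
    """Aggregate the statements about one subject into a record on the fly."""
    record = defaultdict(list)
    for subj, p, val in stmts:
        if subj == subject:
            record[p].append(val)
    return dict(record)


improved = strings_to_uris(statements, "language", LANGUAGE_URIS)
record = build_record(improved, "ex:book2")
```

The record here is only a view built when someone asks for it; the statements remain the thing being managed, and the string-to-URI pass runs across all of them at once.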

This is not to say that the transition of our old data to what we need for a newer environment is going to be seamless, lossless or even easy. It will be none of those things. But I would contend that it’s not rocket science either, and we’d be well advised not to indulge in needless hand-wringing until we’ve explored the issues more fully. Stay tuned …

By Diane Hillmann, September 8, 2011, 9:30 am (UTC-5)

I’m supposed to be writing a paper (as part of a team and as designated herder) but like most people I have strategies for avoiding such tasks, not necessarily in ways that are entirely useless, just useless in the context of a particular deadline. In this instance, I’ve been listening to an interview of Janet Swan Hill done last summer at ALA Annual and now available on a website called “Gathering our Stories: Developing a National Oral History Program of Retiring/Retired Librarians”.

It’s definitely worth listening to the interview: Janet has been present for many of the important moments in the collective past of most catalogers in this country, and her viewpoint is always worth listening to. This is not to say that I always agree with her. I don’t always, and in particular I don’t agree with her position on RDA. Some of that disagreement arises from the fact that she, like most catalogers (and far too many library administrators), thinks of RDA as the successor to AACR2, the cataloging ‘rules’. I, on the other hand, don’t care at all about the rules (there, I’ve said it, are you all happy now?). Instead I see the potential of RDA elsewhere: in the vocabularies specifically, and not incidentally in the revolution they represent in the way we envision our future in metadata. Put more succinctly, it’s not what we say, but how we say it, that makes RDA a big leap forward.

But Houston, we have a problem. Janet defines it very well (quotes from the transcript accompanying the video interview above):

“I suspect that we will go ahead and implement RDA, uh, after I retire. (Laughter) I suspect that many libraries will not implement it because one of the things that proponents of RDA are most eager to say is “Oh, it won’t make that much difference. Your old records will be compatible with the new ones.” So a lot of libraries are going to listen to that and say so why should I implement the thing.”

People, this is a huge problem for us. It’s a REVOLUTION we’re talking about with RDA, not just shifting the deck chairs on the Titanic, and it has little or nothing to do with the rules. And yes, it will cost us something to implement, but the ridiculous testing regime initiated in part because Janet (yes, this Janet) convinced the LC Working Group on the Future of Bibliographic Control to include a recommendation that work on RDA be suspended, will not help us determine whether or how we should implement RDA.

If I sound frustrated, it’s because I am. For most of the past few years, as the RDA Vocabularies have been developed, the marketing effort for RDA mounted by the JSC and ALA Publishing has been wholly focused on the guidance text and the RDA Toolkit. Only very recently have the vocabularies and their value been included in the educational efforts that have been mounted nationally and internationally around RDA. [See the ALA Webinars coming up for evidence of change.] The small, cranky group that developed the vocabularies has gotten even crankier as a result, but there are days when I worry that without better understanding of what RDA represents, our efforts will be too little, too late. As we all wait for the result of the Testing Theater effort (see this previous post for my opinion on that) it seems less and less likely that a clear message will emerge from that confused process, and we definitely need a clear message from those most librarians still consider the leaders of the US library community.

The most recent cause for concern has been the draft ‘PoCo Discussion Paper on RDA Implementation alternatives’. The beginning portion ended with literally the only sentence in the problem statement portion of the report that I could easily agree with: “In any scenario PCC must adapt to a hybrid environment.” But the question is, will that hybrid environment be facing backward or forward? And the question not asked in the report, but definitely assumed: in that inevitable hybrid environment, what would be the role of an organization such as PCC? The current value of the PCC is built almost entirely on the consensus-based environment of the past, where agreements on basic functionality of cataloging records emerged from a common necessity to provide a standard ‘floor’ below which efforts to rein in costs should not sink. But is that value the same in the future environment? Based on this report, it seems clear that the thinking of the writers of the discussion paper is still deeply embedded in the past, and they see the future as an entirely problematic extension with few opportunities for libraries or users in the change that RDA represents. “Perpetuating the hybrid environment long term will have a negative (and costly) impact on our catalogs and on all areas of bibliographic control.”

It seems very clear from the issues presented in the discussion paper that the negative view of the future stems from the lack of understanding of what will actually need to change to enable libraries to fully implement RDA, and what that change offers us at this critical time for libraries. A real RDA implementation, with the benefits already under extensive discussion in the library community, cannot, CANNOT, actually happen in a MARC environment with the inwardly focused assumptions in evidence in the discussion paper. This is not to say that documentation, training, protecting our legacy in terms of our MARC records and authority files are not rightfully topics that we ought to be discussing, but those discussions need to happen with fuller understanding of the environment we will be working in as we move our focus to the web, and away from our current catalogs. [See Karen Coyle’s TechSource reports here and here for a great start in understanding what we need to do.]

At the end of its paper, the Task Group proposes the following:

“Recognizing that there is a cost associated with choosing a direction that is different from the US national libraries, recognizing that PCC institutions will face a hybrid environment, and recognizing that there is a value to the PCC in member contributions from either rule set, the PCC should formally adopt RDA, regardless of the outcome of the US RDA Test, and the decision of the US national libraries, but it should set no time limit on implementation of RDA by PCC institutions.”

I heartily agree with this conclusion, and I say to the Task Group—tell us how we can help you consider some options that don’t stop with the unsustainable assumption of cramming RDA into MARC. Persuade us that moving to RDA is something we should embrace. Because the route you seem to outline can’t result in success, and libraries need successful paths, as well as correct decisions.

By Diane Hillmann, April 24, 2011, 3:55 pm (UTC-5)

At my keynote at Code4Lib a few weeks ago [recorded here about 90 minutes in], I got a good laugh when I equated the continuum that catalogers and programmers inhabit to that described by Kinsey in his famous discussion of sexuality. Since then, perhaps as a response to my presentation and Eric Hellman’s at the end of Code4Lib, there seems to have been a resurgence of the conversation that comes and goes, particularly on cataloging blogs and discussion lists, about whether catalogers should learn to code and thus, perhaps, shift their personal position on the continuum I was describing (though probably not on Kinsey’s).

Some examples of this discussion can be found here and here.

To be honest, I get a little frustrated by these conversations, mostly because I think they miss the point about what it is that both catalogers and programmers bring to the table. Far too often, the conversation devolves to: ‘Why can’t you be more like me?’ I frankly don’t think that point makes any more sense now than it did some decades ago when the same arguments were made in support of all librarians learning to catalog. It’s not that I’m trying to discourage librarians, particularly the cohort at the beginning of their careers who see technology as a big part of their futures, from delving more deeply in the mysteries of code. Those who see the value and have the opportunity to learn should take advantage of that, just as more programmers working in the library sector should be exploring the history and culture of knowledge organization in libraries [A good place to start: The Intellectual Foundation of Information Organization, by Elaine Svenonius. Cambridge, MA : MIT Press, 2000]. Note that I didn’t say ‘cataloging’, because it’s more than that, just as what programmers do and how they think is only partially about coding. Whatever we can do to move ourselves closer to the middle of that continuum, to understand more about how technology works under the hood, and more about how library data was organized and created over the last century or so, the better we’ll be able to work together to solve the problems we see limiting our forward progress. For me, it’s about respect and understanding, which may or may not include emulation.

I’m perfectly willing to admit that some of my irritation with the argument that learning to code is necessary for librarians is that I don’t know programming at all, and the likelihood that I’ll learn to program at this stage of my life is similar to the likelihood that I’ll grow a few more inches (in a vertical direction, mind you) before I shuffle off this mortal coil. I don’t think my lack of programming knowledge has impeded me in learning what I need to know about the technology that interests me and has been the focus of my career for the last 15 years or so. In fact, one of the compliments I received a few years ago from a programmer is one I particularly treasure: he told me that I thought like a programmer. It will surprise nobody that I have no idea what that really means, but I took it as a compliment, and it was certainly meant as one.

I’m far more interested in learning more about ontologies, knowledge and vocabulary management, and information architecture, and it seems to me that this is an area where the significant gaps in librarian knowledge affect our ability to envision our future and make it happen. For the most part, we have some basic understanding about vocabularies but it’s almost entirely built on MARC (mine certainly was a few years ago), and that’s not going to help us much moving forward. This area is not, in my experience, one where programmers have either interest or knowledge, but it’s a natural extension of the path librarians are already headed down.

According to Myers-Briggs, I’m an ENTJ, and aside from learning the interesting categorizations of people that is a big part of Myers Briggs, my take-away from the workshops I attended was that there’s no good-better-best kind of personality or approach for any particular profession, task or team. Particularly for a team, what you want is diversity, not a group that thinks all one way. I’ve never forgotten this point, and still think it’s the key to any of our endeavors. I think the Code4Lib model is a terrific one for getting our heads together and figuring out how to move forward, and I hope to continue to look for ways to get more catalogers to attend and think about how they might contribute, as well as airing these issues in their own venues. (And many thanks to the programmers who show up regularly at ALA!)

Aside from my strong feeling that there are other, more significant gaps in our knowledge than coding, there are two additional aspects of this ‘librarians-should-be-coders’ discussion that really worry me: first is that it will discourage those who don’t have the opportunities to learn coding from learning what they need to know to understand the technology that drives our world, well enough to participate in the change we need. My second big concern is that we’ll start focusing again on the ‘why can’t you be more like me’ instead of remembering that we need the skills and understanding of a broad range of librarians and technologists to get where we need to go, not just the ones who have been convinced that coding is the best way to prove their enthusiasm and commitment to moving ahead.

By Diane Hillmann, March 2, 2011, 5:12 pm (UTC-5)

Some of you have already seen the live feed or the recordings for last week’s Code4Lib conference. If you have, you might already know that I was the keynote speaker for that conference. (The archive page is here, my part is about 90 minutes into session 1; slides are available too). The whole story of how I got there is interesting, and beyond that I’d like to talk about what I took away from it. I attended all of Tuesday and Wednesday, and left Thursday morning (after my return from ALA Midwinter in January, I’ve developed a strong disincentive to book the last flights into Ithaca from anywhere), thus missing the Thursday morning events. I’ve since caught up with those recordings.

The invitation came from conference host Robert McDonald, and was totally out of the blue. Code4Lib has an admirable process for choosing keynoters–they have a wiki and backchannel list (that anyone can join), which keeps the voting off the main discussion list. I’ve never attended Code4Lib before, though I’ve been a lurker and an occasional participant in the discussions on the list for some years, and I know many of the regulars. As someone who hadn’t attended the conference before, it never dawned on me to participate in the voting. I didn’t get the most votes, but when the high vote getter turned them down, I was asked. At first I was pretty intimidated by the whole idea, but that passed fairly quickly, and I started to get excited by the challenge it represented, both for me personally, and as a representative of a whole host of librarians who never get a chance to talk to a room full of library programmers. It was clearly not an opportunity to be wasted.

I gave a lot of thought to what I wanted to talk about, and started and abandoned several topics before settling on one. It clicked for me when I participated in a discussion at ALA Midwinter amongst attendees at the organizing meeting for the Linked Library Data IG. The discussion was about the discouraging fact that programmers and librarians (particularly catalogers) don’t seem to be connecting on the important issues of our libraries; instead, we talk past one another. I think the general assumption is that this is a cultural divide, and it is on a superficial level, but a much more important reason is that we almost never gather together to discuss where we’re going. We all work for institutions that we believe are critically important in today’s society, but we’re not working together to solve the problems we can see in front of us.

So my talk for C4L covered a number of areas, including advice to programmers on how to find and connect with librarians/catalogers in their institutions who might be ready to work with them more closely, and what the priorities should be for that work. Despite a fairly rough start to the talk (the IU laptop I was using had a new version of PowerPoint that behaved quite oddly in presenter mode), it went fairly well and the response was wonderful. During the rest of that day and the following one, I had some great conversations with other attendees about the issues I brought up, and there will be some follow through on several of those. I was very pleased in particular that my plea for building demonstration projects that would show how the RDA Vocabularies can be used was taken very seriously, and I will be following up on that one.

One question I threw out to the audience was whether anyone had read our article in D-Lib, ‘RDA Vocabularies: Process, Outcome, Use’. About a half dozen had, but probably twice that many tweeted the URL, so perhaps some more have read it subsequently. I’m not sure why such a disappointing number have seen the article, but I hope that some who are interested in moving away from the frustrating parsing of MARC data will see the light.

I also talked a bit about how the library world had been ill-served by the narrow marketing of RDA as primarily the guidance text (it’s still happening, unfortunately), as well as the whole RDA testing regime. Because the tests crammed RDA data into MARC, it really doesn’t operate as a test of RDA itself, or of the usefulness of FRBR. What we’ve ended up with is a vast amount of misunderstanding: many traditionalists still believe that RDA is not that different from AACR2, while those who believe that RDA isn’t enough change (or the change we need, to coin a phrase) believe the same thing but come to a different conclusion. As I said to the C4L group: “I get why catalogers like MARC, but I don’t get why you guys aren’t all over the RDA Vocabs.”

After my own time in the spotlight, I became just another participant (the difference was that everybody knew who I was and I had to squint at their badges to see who they were). Thankfully nobody got freaked out that I was knitting socks while listening to other people’s presentations (and at least one pulled out her half-knitted sock to show me). With a laptop in front of me (not to mention IRC and Twitter), I wouldn’t have heard a thing. But, listening to the wide variety of presentations, I was very impressed by the amount of creativity, and the diversity of projects presented. I understood most of it, at least at a general level (though not perhaps on an operational one), and took some notes about a few insights I wanted to think about as I work on various projects. It was really a great conference, and the organizers did a fabulous job with everything. Do take a look at the video, and think about how you might make some connections with the catalogers or programmers in your life. We are all in this together, and we need to find better ways to converse and collaborate to make our ideas real.

Oh, and lest I forget, thanks to all the folks who shared their wonderful and special beer with me during the after hours social time in the hospitality suite. You just may have turned me from an always-wine to a sometimes-beer broad. (And don’t worry, Declan, the beer washed out of my jacket just fine!)

By Diane Hillmann, February 15, 2011, 8:41 am (UTC-5)

Some of you may already have seen the announcement from the Cornell Legal Information Institute about our new project for the Library of Congress, where we propose to build some new ways to access legislative information. If you stumbled upon the original announcement (I’m betting few of you did, except for the odd law librarian in the bunch), you’ve perhaps been waiting with bated breath for me to spill more of the details of this, which Tom promised I would in his announcement. I’ve been distracted by a few other things in the meantime (like ALA Midwinter) but figured I’d better fulfill Tom’s promise before I get too far into the project and forget what we were thinking about when we wrote the proposal.

As is usual for those kinds of things, we looked around at what other people were building in other jurisdictions and noticed that a lot of people were using FRBR to model legislative information, including the UK. (For more about that project see this blog post). This decision made no sense to me, in particular (I’m not sure how much my LII colleagues know about FRBR, but I’ve been immersed in it for a while now) and I was pretty adamant that we shouldn’t go down that road. It’s unclear to me what the reasons would be to adopt the FRBR model in a legislative context, but I could speculate that part of it is that there’s been a lot of buzz around FRBR in the bibliographic community, and if you’re trying to do something sooner rather than later, reusing something that’s already there seems attractive. The Library of Congress, in its solicitation, was strongly in favor of reusing not only FRBR, but also the standards they have developed over the years for bibliographic data. In the normal course of things, that all makes a lot of sense, but for us it made much more sense not to adopt an explicitly bibliographic model which worked reasonably well for literary works but not so well for the kind of shape-shifting that goes on in the life of legislation.

From the proposal:

“Traditionally, libraries have approached the question of incorporating specialized kinds of materials into their descriptive workflow by focusing on the similarities between the new materials and the materials for which they normally provide descriptive metadata. In the past, this worked well–materials in newer formats and those for use in special communities were able to be incorporated into existing tools with a minimum of fuss. In the area of legal materials, the treatises, standard monographic materials, and standard serial titles were in general easily incorporated, while loose-leaf services and other materials with updating services were not. Primary legal materials were treated either as collections, as serials, or, in the case of most legislative materials, as standard monographs. Now that the digital revolution is well upon us, with full text more available and users’ experience with search engines generating more pressure to look beyond the simple access to printed materials, we’re starting to see more clearly how limiting our traditional approaches have been.

There are several areas where the traditional bibliographic approaches fail:
* the model of ‘stand-alone’ monographs with few (if any) relationships are insufficient to provide the functionality desired for specialized legal materials
* the new bibliographic approaches, such as RDA, are based on a FRBR model of published works, which, while rich in relationships, provide neither element sets nor relationships particularly useful for primary legal materials, legislation in particular
* primary legal materials have traditionally been entered under jurisdiction with collective uniform titles which are often meaningless to users
* insufficient distinction is made between jurisdiction and place”

Once we decided that we needed to start from the beginning on a model that worked specifically with legislation, and we thought about what kinds of materials we needed to cover, it seemed fairly clear that we had to think about an events-based model. Well sure, you say, but isn’t FRBR about events, too? Definitely, but those events have to do with traditional publications, not legislation, where events like ‘House vote’ constitute the kinds of events we need to think about.
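To make the idea of an events-based model a little more concrete, here is a minimal sketch in Python. It is not the project's actual model (which did not exist yet when this was written); the class names, the bill identifier, the event kinds other than ‘House vote’, and the dates are all hypothetical, chosen only to show legislation described as a sequence of events rather than as a fixed published work.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class LegislativeEvent:
    """One thing that happened to a piece of legislation (hypothetical shape)."""
    kind: str                      # e.g. "introduced", "House vote"
    when: date                     # date the event took place
    chamber: Optional[str] = None  # body where the event occurred, if any


@dataclass
class Bill:
    """A piece of legislation described by its events, not by a single state."""
    identifier: str
    events: List[LegislativeEvent] = field(default_factory=list)

    def record(self, event: LegislativeEvent) -> None:
        """Attach an event; events may be recorded in any order."""
        self.events.append(event)

    def history(self) -> List[LegislativeEvent]:
        """Return the events in chronological order."""
        return sorted(self.events, key=lambda e: e.when)

    def status(self) -> str:
        """Derive the current status from the most recent event."""
        ordered = self.history()
        return ordered[-1].kind if ordered else "no events recorded"


# Hypothetical example data, recorded out of order on purpose.
bill = Bill("example-bill-1")
bill.record(LegislativeEvent("House vote", date(2011, 3, 1), "House"))
bill.record(LegislativeEvent("introduced", date(2011, 1, 5), "House"))

print([e.kind for e in bill.history()])
print(bill.status())
```

The point of the sketch is that the "current state" of the bill is derived from its events rather than stored as a description of one fixed object, which is roughly the shape-shifting a purely bibliographic model has trouble with.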

One question that came up was whether this approach would end up building a silo from which legislative materials would never emerge to play well with related legal materials. One way we hope to forestall that possibility is to build the descriptions around these legislative events in ways that they can be reused in other environments, even bibliographic environments. Not too surprisingly for those of you who follow this blog, we’re talking about using some of the strategies for building the legislative data that we used to build the RDA Vocabularies.

We’re going to use the process defined by the DCMI Singapore Framework, which expects us to build use cases to figure out what people want to do with the data, functional requirements from those use cases with which to test our model, and a model that grows from those solid foundations. From there we will define description set profiles for our data, and, we hope, have something useful to talk about. I’m guessing we’ll be talking about it all along the road, if for no other reason than that we’re very excited by this project. We think we’ll learn a lot and enjoy the process, so do wish us well with it!

By Diane Hillmann, January 18, 2011, 9:07 am (UTC-5)

At the Friday ‘Big Heads’ meeting much of the conversation revolved around Incrementalism vs. Revolution, as have so many conversations, about so many things. Someone quoted David Mamet (I can’t find the quote) that what we need is sledge hammers, not chisels, and I thought it was a notion too good to pass up as a jumping off point to discuss that meeting.

There were a lot of interesting topics discussed at the meetings, but as is my habit I’m going to focus only on the topics of interest to me. As usual there were a number of vendors in the audience, and when a few of the ‘heads’ at the main table voiced the expectation that they would be depending on the vendor community for help as they experienced additional staff reductions and resource constraints in general, the vendors came up to the microphones to respond. A couple of vendors expressed their concern that the library community in general has not been able to articulate what they want from vendors, and this has made it difficult for them to develop business plans. I hear a variation of this line when I stroll the exhibit halls and talk to vendors about what their plans are for RDA implementation. Almost always I hear that they have not heard from their customers about what they want, and they’re waiting for that before making plans. As a result, when I’m presenting to librarians about RDA, I tell them that they should be talking to their vendors, asking when and how they will be implementing, etc., etc. The problem with that approach is that a) most of the time the librarians don’t know what to ask, beyond the when and how; and b) when they get an answer they often don’t know how to interpret it. Maybe I’m slow, but I’m coming to the conclusion that I should stop telling people to talk to their vendors about RDA. I’m not sure it matters.

I went up to the microphone for one of my usual rants, after hearing quite enough of this dancing around. Here’s the reality, as I see it:

1. Libraries are unlikely to agree on what they want (this has been true in the past, and will likely be true in the future)
2. Given the generally low level of understanding of RDA data issues across the library spectrum (and certainly the vendors), it’s unlikely that any articulation of needs to vendors would represent something that vendors could rely on to build a business plan
3. Vendors are still talking about the provision (i.e., sale) of bibliographic records as a basis for their services to libraries.

My rant included all three of those points and more. A little over a year ago, the R2 report on the marketplace for MARC records (upon which I blogged) assumed that there is a marketplace for MARC records which will continue and that a direct return on investment is possible (or desirable) for creators of data. I said then, and still believe, that such a viewpoint is both unrealistic and in fact destructive to the task of moving forward into a world where data is not the coin of the realm but freely available (this is the basis for linked open data) and the investment and return on investment is around data services, not data sale. After my rant to Big Heads, one of the vendors came up to talk to me and offered up some useful nuggets to support my view: a) they provide records, but don’t make much money on them; b) the realm of digital metadata is vastly more complicated than that for physical metadata. It’s a huge challenge for vendors to operate in this world, but clearly the usual answers are no longer working, even as the data revolution is not yet fully upon us. The inevitable conclusion is that vendors who wait for their customers to tell them what they want may not survive the coming revolution. This is no time for chisels.

In this context it’s good to meditate on Henry Ford’s famous statement: “If I had asked people what they wanted, they would have said faster horses.”

By Diane Hillmann, January 9, 2011, 5:25 pm (UTC-5)

Friday I attended the RDA Update, organized as the “Briefings From RDA Test Participants.” The room was full (overfull, actually), and I ended up sitting in the back on a chair pulled from the main seating area towards the back wall. Beacher Wiggins provided the background and updated the group on the plan and timetable. He suggested that there were three scenarios possible for the decision: one was that the group would agree to adopt RDA, another was that they would decide not to (either for now, or presumably ever), and a third was that they would decide to implement if and when the JSC made some specified changes in the rules. I was a bit taken aback by this last option, since it seemed very heavy handed and somewhat threatening. Of course, there will be options available for any implementing library or group of libraries (national or otherwise), but it seems a bit much to believe that among those options there might not be ways for LC/NLM/NAL to meet their specific or collective needs without holding the US RDA implementation hostage to their desires. If I were representing a non-US constituency (which in a small way I am, as the DCMI liaison to CC:DA) I would certainly take this possibility seriously, if nothing else as a gesture of US-centrism that should be repudiated by the rest of the US and international cataloging and metadata community. By all means, LC/NLM/NAL triumvirate, do what you think best, but don’t throw your considerable weight and credibility around in aid of getting what you think you want, or, just to prove you can. We look bad enough to the international community as it is, please don’t make that worse!

The presentations started out well, with Chris Cronin (U. Chicago) giving a useful summary of his group’s experience. He was followed by Penny Baker (Sterling and Francine Clark Art Institute) who had a very flashy set of slides that did not work well in a room with too much light, and too many people. While various people played with the lights, she tried to get through her slides, but was having trouble seeing the laptop when the lights were down (and her slides were visible), and lost her place a few times. Her main point (as far as I could tell)—that her group was able to show that the RDA relationships worked well in providing ways to link together the very interesting materials they chose to catalog—got lost in the shuffle. Towards the end, someone figured out how to dim the lights sufficiently to see the slides without plunging the room into darkness, and the room burst into cheers. The speaker, misinterpreting the audience response, thought she was being cued to finish up, and did so, apologizing as she left, by saying: “Sorry it took so long and was so messy.” The group in the back with me agreed that this was basically the story of RDA, though we should probably not expect a similar apology from JSC.

The remainder of the speakers plodded on with little to say that interested me: they did their testing work, gave their feedback, and determined internally whether they would continue doing ‘RDA Cataloging’ until the big decision comes down from the LC/NLM/NAL triumvirate, presumably on stone tablets for which some poor schlumpf will have to create a preservation strategy.

I have been dubious since the beginning about the usefulness of this testing regime, lately going so far as to compare it with the ‘Security Theater’ we are subjected to at airports these days (I have metal knees, so am always treated to a full, and now even more intrusive ‘pat down’, something that makes me long for a naked scanner at my local airport). The analogy here is that ‘Security Theater’ is to real security as ‘RDA Testing Theater’ is to real testing, one that includes the FRBR part of RDA and not just a smattering of rules changes and some token relationships. I still think that it’s hard to justify the time and expense of the testing that has just concluded, which tests RDA only as used in a MARC environment, not RDA itself. The result of this from the point of view of the community has been useful insofar as it has provided an avenue for some initial training and participation, but not so useful from the point of view of really providing any understanding of RDA implementation. Far too many catalogers think (hope?) that RDA can be implemented without much change in what they do, which qualifies in my opinion as a very poor result indeed.

By Diane Hillmann, January 9, 2011, 9:33 am (UTC-5)

One continuing theme of the recently concluded DC-2010 is that of the perpetual search for consensus on what the hell DCMI should be doing. I know this continual search for identity is a common phenomenon with this sort of organization, as it is for the human adolescent hovering around the age of 15 years. As with the teenager, it should be seen as a healthy thing, and as most of us older than 15 know, it pretty much goes on for the remainder of life.

For me the conference was preceded by a half day DCMI Advisory Board meeting, where one topic was the revision of the DCMI mission statement as well as the perennial topic of the conferences and how the conference series can be optimally funded and continued, and what exactly is its value for the organization. As usual, there was not much consensus, either at the beginning or the end of the discussion, though it must be said that the conference itself probably shifted some opinions about the value proposition. Generally the AB meeting has been scheduled after the conference, with the idea that this shift in perspective is a good thing for sparking discussion, but for logistical reasons the meeting was held prior to the conference this year.

As it turned out, this change in placement of the AB meeting was unfortunate, given that Mike Bergman’s keynote on Friday morning contributed some important outside opinion to that basic question of mission. (Mike’s post about the keynote is here.) The fact that Mike arrived in time to sample a good chunk of the conference and to talk to a variety of participants gave his opinion the credibility that only exposure to the culture of the organization and the personalities that affect that culture can bring to a keynote. It was clear to me, in talking to him during the conference, that his view of DCMI was not the insider view of a contentious, financially strapped and sometimes dysfunctional organization, but instead one that included recognition of the experience, knowledge and potential there as well. In a nutshell, Mike Bergman was telling us that DCMI’s role in the emerging linked data world was critical, and should be focused primarily on expanding the presence of useful semantics available to the web world, closing the ‘semantic gap’ he sees limiting the growth of linked open data.

Later in the day, my task was to lead a discussion of the work of the DCMI/RDA Task Group, of which I’m co-chair with Gordon Dunsire. Gordon and I had both prepared slides for that meeting—mine covering the history and work of the task group, what we’d learned, and what remained, and Gordon’s covering the related important work he’d been doing in parallel with IFLA. I’ve been frustrated for some time with the lack of attention and traction we’ve received for this important work, both from DCMI and the Joint Steering Committee for the Development of RDA (JSC). We have found ourselves at the stage where DCMI is waiting for JSC to make some statement about the work done (in the form of approval), and the JSC is waiting for DCMI to endorse the Task Group’s assertions made about the work done and its usefulness for the Semantic Web—or at least this is the way it seems from the point of view of the co-chairs. It’s as if both groups are standing opposite one another in a middle school gym, the ‘boys’ and the ‘girls’ waiting for someone to move towards the middle. Nobody seems to want to make that first move, and though in the case of the TG, each conversation with representatives of both parties seems to be positive, resolution of the concrete issues moves at a frustratingly glacial pace.

But as I spoke about this work to various people, continuing to think about the ongoing conversation about what the role of DCMI should be, particularly in the context of Mike’s keynote, it struck me that the DCMI/RDA Task Group was in some sense a model for what DCMI could do to fulfill the role Mike saw for us in the world. In essence, the TG came into being because Don Chatham at ALA Publishing took the initiative to bring DCMI and the JSC together, where the DCMI message was “How can we help?” The rest is history, but we seem not to have learned from that how powerful that simple question is, and where it could lead. DC-2010 brought a number of new communities to the conference, representing a variety of groups interested in moving into the wider web world of information, but lacking in-house knowledge and skills necessary to make progress. Helping them move forward requires much more than attracting them to the conference and talking to them in the hall or after tutorials. We need to offer more concrete help, like we did for the JSC, and move the knowledge and experience that the DCMI Community has assembled into the broader world of information.

By Diane Hillmann, October 23, 2010, 11:26 am (UTC-5)

This morning’s highlight was Stu Weibel’s opening keynote address to the assembled conference attendees (yesterday included primarily workshop sessions and tutorials). Stu was asked to talk about DC’s past and future, and he gave many of us food for thought.

The first thing he did was typical Stu—he took a photo of the assembled group. I hope those will show up on his blog sometime soon (if so I’ll link to them). Will they look that much different from those he took 15 years ago, when Dublin Core began, or those taken at various points along the way? As is also typical, he asked how many people were returning participants in DC, and how many were brand new to the DC conference. Surprising to some of us, roughly one third of the group were new participants, a nicely healthy proportion of new people.

Stu asked some interesting questions, and gave DCMI some letter grades for performance in a number of areas. His first question, “Why didn’t we just stop after the 15 elements?”, suggests the possibility that nothing done since then (the mid-nineties) was worth the effort. He pointed out that a number of the assumptions made at that time have since been repudiated by experience—the Web is more than just a collection of document-like-objects that can be described in much the same way that we’ve traditionally described library materials. He got a laugh when he reminded us (including the three of us who were actually there in Mar. 1995 when DC was born) that we thought we could solve the syntax problem fairly easily—but of course our notion of syntax in those days was HTML.

Stu gave some personal assessments of the 15 years of DCMI in the form of grades:

For providing an international basis for the effort (A)
For including a diverse group of participants (B+)
For becoming a sustained and solvent organization (D)
Moving work from consensus to completion (C)
Establishing objectives and completing them (C-)
Documentation of decisions (B+) (I have to say I think he was somewhat over generous on this one, for reasons I’ll explain later)

Stu also pointed out the places DCMI got stuck along the way. Among those were ‘tarpits’ (his word) of our own making—for example, data models (for which he gave DCMI an ‘F’). Those that were not our fault or not entirely of our making were things like the aforementioned syntax confusion (C+), which he believes stemmed from trying to do too much and getting overwhelmed (probably the rapid evolution of syntaxes had an effect, too). The ‘tarpits’ created by others included LOM (learning object metadata, promulgated primarily by IEEE), and INDECS (a now dead effort that was touted some years ago as a business model based approach).

Some other points Stu made which are hard to argue include the notion that cooperation with organizations with different business models is difficult, but such cooperation is critically important, for reasons around convergence of effort, identification of similar models and related technologies—all amplifying the network effect of what DCMI has done. Increasingly, cooperation is seen as an expensive value, particularly in terms of time and travel, and certainly DCMI has seen those issues having a big effect on conference planning and ability to build on past efforts. Stu also gave DCMI an ‘A’ for its standardization efforts, including the work to make DC a recognized standard via IETF, NISO, and ISO. He pointed out that these efforts were essential to allow DC to be adopted by government agencies and others that have requirements for such an endorsement.

On the Singapore Framework, Stu is dubious, in particular thinking of the four levels of interoperability [link] that include the Description Set Profile (DSP) and the Dublin Core Abstract Model (DCAM). He pointed out that lots of metadata is still used primarily at the lowest, human level, and is not yet useful to machines. This issue is particularly relevant given that the DCAM is currently under review, with opinions flying around (in the halls, on twitter, etc.) in ways they didn’t even when that document was new, and pretty much nobody understood it. Understanding is still an issue, to a great extent because specification and user documentation are not the same thing (something that the DCMI technical folks don’t seem to understand). Stu contends that the DCAM has failed, and feels that its authors still don’t agree about its motivations or implications. It will be interesting to see whether that assessment is widely held throughout the conference.

On linked data, Stu was clearly somewhat ambivalent, calling it “An aspirational target of great promise and unproven benefit”. Stu was around and in the fray when the RDF standard was still in diapers, and recalled for the audience how easy we all thought it would be then to bring its promise to fruition. At least a decade later, we’re still trying to do that.

In Stu’s opinion, the Web is the data model, and we shouldn’t deviate from it. He pointed out that with the issues of flexibility vs. constraints we are drawn in by the Siren Call of Flexibility but would be better off with more constraints.

On the positive side, Stu points out that the linked data bulge has brought us a strong commitment to identifiers, some useful conventions about vocabularies and syntax, some tools to build ontologies and models, and some expectations of utility due to broad adoption – network benefits, in other words. But we still need to worry about data quality, usefulness, and bridging the boundaries between the existing semantic communities.

Stu is skeptical of linked data as the new grail, but still thinks we’re on an exciting threshold—we need metadata more than ever, but we’re drowning in it. He quoted Tom Baker: “Data that cannot speak for itself will be more vulnerable to becoming irrelevant”.

Stu’s last points covered DCMI as an ongoing experiment in social engineering. He cited Malcolm Gladwell’s article in the Oct. 4 New Yorker where Gladwell asserts that social media are largely broadly disseminated, networked, weak tie activities, with low barriers, low commitment, low persistence. Gladwell contends that systemic change requires strong ties, hierarchical social structures, leadership organization, and f2f work. Stu believes that DCMI is a strong tie phenomenon, and its impact is amplified by this fact.

By Diane Hillmann, October 21, 2010, 4:20 pm (UTC-5)