Transcript for:
Overview of MARC and Library Cataloging

Great. Well, good morning, afternoon, and evening, everyone. As Sylvia mentioned in my introduction, my primary responsibility at Penn State is working on the library's catalog.

This has required deep familiarity with it, and particularly with the data, which is encoded as MARC, Machine-Readable Cataloging. In the course of my work, I've gotten really interested in library technology history, including the history of the MARC standard: the functions it initially served, the good and bad ways it's been updated, and the functions it serves now. And so while I can't get into all of that in this 15-minute presentation, I'm happy to go into greater depth if folks have questions that flow from it. So first, I'm going to set the scene.

Libraries have been cataloging books for ages. They've used handwritten volumes, manuscript rolls, and then, thanks to Charles Ammi Cutter, we shifted to handwritten cards and then eventually to typed cards. Principles from the field like Ranganathan's Five Laws of Library Science have continued to influence practice, like "every book its reader." In the mid-century, catalogers might have handwritten descriptions of information about a book and then taken them to typists who would actually create the cards. Now we flash forward to the 1960s.

People are experimenting with systems that let you input, index, search, and retrieve data. And for libraries, this seems like a huge opportunity. Not only could librarians offer entirely new ways to search their collections, but they might reduce the effort of individually writing everything out and typing everything up by sharing files. They might also be able to search another library's collections remotely instead of doing something like trading microfiche, which is actually something that has been done, or printing big volumes. But it's also the 1960s.

So there are some constraints. Most libraries were nowhere near having any kind of technical infrastructure that they would need for this. My hometown library didn't get computers until the late 80s, early 90s. So MARC had to be designed to meet these technological needs, that was the whole point, but it also had to be able to handle and print a catalog card.

And in fact this led to a booming business in the printing of catalog cards, because instead of having to type them manually, you could use one record and print and sell catalog cards to a whole bunch of institutions, which could then benefit from that effort and not have to spend all that time typing the same information themselves. So some bonuses there. But the other challenge they faced was that computing processing power and file storage were almost unimaginably different when compared to today's computers. In the mid-1970s, Penn State bought its first computer. It cost hundreds of thousands of dollars.

It could be accessed through up to 16 terminals that were all wired into it. So not separate computers, but all stations wired into one computer. Requirements were things like: removable storage had to be able to hold half a million characters.

There wasn't even the language of processing that we might use today. And when I've done interviews with library developers who worked in the 70s or even the 80s, I've heard stories about things like, "Yeah, and then we had to cut down on the text of the error message because the program was getting too big." So MARC had to be very small and very efficient. I saw a question in the chat about how they interchanged records.

They actually used to mail data tapes to each other, and then you could load the records off of the tape and select which records you wanted to keep, which is kind of fun. It was like sneakernet, but for computers, because we didn't really have the Internet. So.

We know the constraints, but what kind of data is it having to manage? These are some of the basic bibliographic record needs. You have to be able to search for something by the title, the author, the subject, maybe an identifier.

ISSNs and ISBNs were just coming into being around this time. Similarly, barcoding technology was really new. Sometimes the title might be specialized. So concerto is not a good title. You'll get tons of stuff back for it.

You have to have the special music name for a concerto. You might want to be able to limit or sort your results by something like publication date, which could also be a range, or by format or material type.

Even before we started having things like DVDs, CD-ROMs, video games, which is one of the examples I'll be using here, we had like journals, musical scores, archival things, government documents. And so if you said, well, I really only want to search government documents, you should be able to limit your search to doing that. And then once you get the record, you have to display information to help the person determine, is this the item that they're looking for?

So this is both massively simplified and actually still covers the primary areas that we focus on today. Can you search by some of these main ideas and fields? Can you limit and sort?

And can you look at the item and determine if it's what you want? So what did they come up with? Well, they came up with this.

And my display here is being a little weird. Those boxes are actually supposed to be separators. But is that one really long, big string with a bunch of like text and stuff? Yes.

Fortunately, we don't have to look at it like that. So this is a screenshot of that same MARC record. Sylvia, if you are checking on the chat, yeah, thank you so much. Sylvia is putting some links in the chat, so you can actually go view this live yourself either during or after the presentation.

So this is the same record formatted really nicely by a MARC parser. And one of the first things you may notice is that the giant block of numbers and letters has been replaced by much shorter strings of letters and numbers. So let's switch back quickly, and you'll see that even accounting for the different format, this big block of numbers that I've highlighted here just doesn't show up in the pretty view. This is in the raw record, and it tells the parser, okay, here's a field, here's where you find it. Here's a field, here's where you find it.

Here's a field, here's where you find it. This is a map: it tells you the byte where every field starts, not every letter or anything, but it says, okay, you've got your 245 field. I could probably hunt down the 245 in there and find out what byte it starts at. So these records come with their own little built-in maps to help the system parse them. They also have what's called a leader, which is up top.

I'll be talking about the leader and the fixed fields at the beginning, but you see those first five numbers, 0, 3, 7, 2, 8? This is about a four-kilobyte file, and that is the file size. So it starts off by telling the system, okay, this is the size of the file you're looking at. Then it gives some other data.

And then it says, okay, here's the map to find everything else in the file. So it is really self-describing at a level that you don't see in most other systems today. All right.
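To make that self-describing structure concrete, here is a minimal Python sketch of how a parser reads the leader's record length, the base address, and the directory entries ("the map") of a raw MARC (ISO 2709) record. The one-field record it builds is synthetic, made up for illustration, not a real catalog record:

```python
FT = "\x1e"  # field terminator
RT = "\x1d"  # record terminator

def parse_directory(raw: str):
    """Return (record_length, [(tag, field_data), ...]) from a raw record."""
    record_length = int(raw[0:5])        # leader/00-04: total record size
    base_address = int(raw[12:17])       # leader/12-16: where field data starts
    directory = raw[24:raw.index(FT)]    # 12-byte entries after the leader
    fields = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]                 # e.g. "245"
        length = int(directory[i + 3:i + 7])     # field length incl. terminator
        start = int(directory[i + 7:i + 12])     # offset from the base address
        fields.append((tag, raw[base_address + start:base_address + start + length - 1]))
    return record_length, fields

# Build the synthetic record: leader + directory + field + record terminator.
field = "10\x1faTest record" + FT               # indicators, subfield $a, terminator
directory = "245" + f"{len(field):04d}" + "00000" + FT
base = 24 + len(directory)
leader = f"{24 + len(directory) + len(field) + 1:05d}nam a22{base:05d}   4500"
record = leader + directory + field + RT

length, fields = parse_directory(record)
print(length, fields)   # 54 [('245', '10\x1faTest record')]
```

The system never has to scan for field boundaries: the directory gives it the tag, length, and byte offset of every field up front, exactly the "here's a field, here's where you find it" behavior described above.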

So moving back forward, this record is from 2021. So it's got a lot more fields of flexibility than an early MARC record, but it has the same overall components that I'll be talking about. First, MARC records have two main sections. This bit at the top is what's called the leader and the fixed fields.

These include data about the record, like I showed you with the 03728, as well as data about the thing that's being described. So these are a mix of numbers, dates, codes, and blanks. And when I look at this, I can actually see things that jump out at me.

Oh, it has a single publication date. That's 2021. It was published in California. It's in English. I'll show you.

where we see these as we go through. But this is now done through drop-down menus. People don't hand-code this stuff in. Back in the day, they had forms that they would fill out, and then they would type the forms into little boxes on the screen. So people have never had to just type out this whole string, to my knowledge.

So what else can we tell about this record? Well, it's a computer file. It's a monographic work, which just means it's a thing in itself.

It's not a journal or something like that. It was published in 2021. It's for adults. It's stored on a physical medium, which is some sort of direct electronic material. So that could be often a CD-ROM, but it could be a USB or something.

It's a game. It's in English. It was published in California. We have to switch screens to avoid clutter.

It's an electronic resource. It's on an optical disc. It's multicolored. It's a CD, DVD size, and it's got sound.

So if you put all of this together, you've got a colorful game that's meant for adults, something that's complete in itself. It's got sound. It's in English.

It was published by a California company in 2021, and it's on an optical disc. That doesn't tell you everything about it, but these data points might be useful in searching, faceting, or displaying the record. And some things aren't as useful, like it has sound.

You know, when we're looking for video games, we don't distinguish by sound. These ones have sound. These ones don't have sound.

But the overall descriptive universe gives us ways to, say, discover how many items in our catalog have sound, for example. So all of these things are really specific data points where you have letters or numbers that you actually choose to represent what's going on here. And there's lots of coded tables.

But again, we use systems that make it much easier so people don't have to memorize this. And then there's everything else in the record, which is a lot longer than what I'm showing here.

This section includes everything from the identifiers, to the title, to the publication information, to information about how many discs it has. There are also long note sections which are pretty much textual, and they contain info about everything: which games are on which discs, a blurb from the container, the ratings, the system requirements (because this is an Xbox game and not a PS4 or PS5 game). Each of these fields has a designated field code, three characters long. So a 245 is the title, a 505 is the table of contents, a 521 is the target audience note, which could say, hey, this is a book written for preschoolers, or this is rated M. And they include other kinds of encoding data that I'm going to talk about briefly: what are called indicators and subfields. So first, there's indicators.

Every MARC field after the leader has two spots for indicators, indicator one and indicator two. These can refine the field by telling you more about what's in it. So we have these big concepts for some fields, and then the indicators will tell you more detail about what's actually in the field.

Or they can even contain specific machine-processing instructions for the record, like that map that said, go look for this field at this byte. They can contain that kind of stuff too. So a 264 is a more recent field, because we used to have a 260, and they said, you know what, publication can actually mean a lot of things.

It can mean the copyright date. It can mean where it was published. It can mean where it was manufactured.

We want to be able to record all of this. So they created 264, which is broadly about publication. And then we have these two different indicators here. An indicator value of 1 is for actual publication; a value of 4 is for copyright.

We get to have indicator values zero, one, two, three, and four, and have all that information. But normally that's not needed. It would be really cool for a rare book.

That's a good example of one where you say, okay, it was published by this person, and it was printed by this specific rare book printing house, and you can get really into the minutiae of that. That doesn't apply so much here, but we have two different ones. And so you use the indicators to disambiguate between kinds of fields. You can also use them for things like saying, hey, the vocabulary in this field comes from one of these major standards, or it comes from a different standard that I will then specify later.
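As a sketch of how those indicator values line up, the 264 second indicator values below follow the MARC 21 bibliographic documentation (a plain lookup, just to make the value-to-function mapping concrete):

```python
# 264 second indicator: which "function" the field records (MARC 21).
FUNCTION_264 = {
    "0": "Production",
    "1": "Publication",
    "2": "Distribution",
    "3": "Manufacture",
    "4": "Copyright notice date",
}

print(FUNCTION_264["1"], "/", FUNCTION_264["4"])   # Publication / Copyright notice date
```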

So the other thing that indicators can do is provide really specific machine-related data. I had to switch records for this because I needed a title that began with an article or the like. Some book titles begin with an article like "the," "an," "a," or "le," "la," "les." They can begin with a quotation mark or something else that should not be indexed when we're making a title index. And if you remember, these had to be printed for alphabetical cards.

Sometimes title indexing could be really important, especially in the early days, and we still sometimes use it now. So what we have here is an instruction saying, okay, chop the first four characters off of this: T-H-E and space. You've got to remember the space, so that it indexes as "happiness industry." This is a really efficient way of not repeating things, of not having to deal with all the edge cases of what might be in a title, because again, we are working in the 1960s. And it avoids having to deal with all the languages that might be represented, because we record the actual language of an item in the title, even if we also might record an English translation as well.
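That "chop the first four characters" instruction can be sketched like this: in MARC 21, the 245 second indicator holds the number of non-filing characters to skip, including the trailing space, and applying it is just a slice. A simplified illustration, ignoring edge cases a real indexer would handle:

```python
def title_sort_key(title: str, nonfiling: int) -> str:
    """Drop the leading article (and its space) before indexing the title."""
    return title[nonfiling:]

# "The " = T-H-E plus the space, so the indicator value is 4.
assert title_sort_key("The happiness industry", 4) == "happiness industry"
# An indicator of 0 means nothing is skipped.
assert title_sort_key("Concerto in D", 0) == "Concerto in D"
```

The cataloger supplies one digit per record instead of the system needing article lists for every language, which is why this works for Arabic or French as well as English.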

So while we write our records pretty much in English here, MARC is multilingual. If we're doing a book from, say, an Arabic library: I actually do not know if there are articles or the like in Arabic, but the Arabic cataloger does. And so they can decide if there should be something there.

So these are actually still hand-coded every time by the cataloger. And a lot of it is muscle memory, right? You see a "The," you think "four." But I think that's kind of amazingly archaic and also really beautiful.

So the last thing I'll talk about are subfields. If we look at the publication field again, the first one, we see that there are three subfields, and we'll focus on the first two. These are the place of publication, the 264 $a, and the 264 $b, which is the publisher's name.

Now, MARC still encodes punctuation for display and printing, which is a whole thing. It can be kind of a nightmare. But when we index it in our catalog, we index this as a string, which means we make it nice and pretty and printable, we put it in the record, and then we index that "Electronic Arts" so that if I search by publisher in our advanced search, I can get everything that was published by Electronic Arts, and then see that we have 50 games by them, because we have somebody who collects video games, which is really cool.

So this is a way in which the subfields can break up stuff that might need to be indexed separately while still allowing it all to be in one field together. So you could have lots of separate things, but again, we're trying to figure out how to print a catalog card, how to print a pretty record on the screen with minimal effort, and then also how to pull out something like the publisher info. So, this is all still in use. Why?
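A rough sketch of that subfield splitting and indexing follows. The 264 field body here is hypothetical, invented for illustration: in the raw record, the delimiter byte 0x1F marks each subfield, and the trailing ISBD display punctuation gets stripped before the value goes into the index:

```python
SUBFIELD = "\x1f"  # MARC subfield delimiter byte

def subfields(field_data: str):
    """Return [(code, value), ...] for a field body (indicators excluded)."""
    parts = field_data.split(SUBFIELD)[1:]   # chunk before the first delimiter is empty
    return [(p[0], p[1:]) for p in parts]

def index_value(value: str) -> str:
    """Strip trailing ISBD punctuation (':', ',', '/', etc.) for indexing."""
    return value.rstrip(" :,;/.")

# Hypothetical 264 body: $a place, $b publisher, $c date.
data = "\x1faRedwood City, CA :\x1fbElectronic Arts,\x1fc[2021]"
pairs = dict(subfields(data))
print(index_value(pairs["b"]))   # Electronic Arts
```

The display string keeps its punctuation for the card or screen, while the cleaned subfield value is what a publisher search would actually match.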

Well, in the context of the 1960s, it was really exciting, but it's still in use today, and I think that there are five reasons which I have ordered from least to most impactful. The first is small is still good. An extract of our 8 million records is quite small compared to if we had the same data broken up with like XML tags or even JSON and the like.

Once you add all that granularity that's being represented by a subfield $b or a $g or something like that, you're going to get a much larger file. So small is good. We have years of refinement: we have added things to it, whether it's new fields or new options within a field.

So it is possible to describe many, not all and not always well, more modern things with it. One of the reasons for MARC was its shareability and reusability, and that continues to this day. We have systems for interchange. At this point we primarily go through one major vendor, but you can reuse records. We just got a report this morning: there was a title error in a record in the catalog.

We could fix it manually, but we could also go to the vendor system and do what we call an overlay of that record: protect all of our local info, like where the book is located and other really important stuff, and bring in all the new info that maybe somebody has updated and improved since. And we can also add things back if we want to.

And on that note, we have workflows, systems, and data. We have all of this legacy material and processes. And to a great extent, much of the time, a lot of it works. And so we would have to change our workflows.

We would have to change our underlying systems. We would have to migrate all of our data. And of course, that has a huge impact on the people who are actually doing this work.

And we would need really robust systems to move forward. And that moves into the last thing, which is the time, energy and funding. Libraries across the world and including in the U.S. are facing a lot of budget crises, cuts in positions and other things.

And if we have time, energy, and funding for things, it doesn't seem like this is always the best return. When we have something we can use, it would be an enormous shift for the entire profession to switch to something else. And people are working on ideas for these things.

But until there's a way to shift everyone, and there's a way to do it that takes into account the time, the energy, the funding, the need for workflows, the need for systems, and the need to handle our data, I don't really foresee us going anywhere. So I have covered a lot of ground. I've tried to make it something that's approachable for people.

I'm happy to get into in-depth stuff with the records that you could look at, or you could send me something on Mastodon later. And yeah, so this is the stuff that's still behind what you're using in library catalogs today. And I see that there's been a lot of chat.

So if folks would like to, Sylvia, if you'd like to go through the questions, I am happy to answer things now. Okay, thank you very much. I shared a lot of the excitement of the people in the chat, like, how interesting is this, and all kinds of associations and questions. So I tried to keep track of the order of questions in the chat.

Of course, other hands are also welcome. There were a number of questions from Muvu. I don't know if you would like to ask some of them yourself. I saw the first one was answered by Ruth already during the talk, but there were a few others about place codes and human readability of the initial map and so on.

I will ask you to unmute if you would like, you can speak otherwise I will read them for you. Hello, can you hear me? Yes. Yeah, wait. I sort of forgot which one of them I asked.

But yeah, I mean, regarding the fields, I'd like to ask the question that I asked last. Who decides which field gets modified? I mean, as for the MARC file format, did it change after it first got formed? Yeah. Yeah, so, let's see.

I'm looking at some of the other questions too and seeing how they might relate to this. So MARC is primarily managed, and I don't want to get this wrong, by the Library of Congress.

And so they take proposals. That is in the U.S., but they kind of handle it for the world. There were at points various versions of MARC. There was a big format integration in the 1990s when a lot of things came together.

So there was USMARC and there were other kinds of MARC. And we pretty much have MARC 21, which was the MARC for the 21st century. So some of the things, like the codes that get decided for locations.

I saw that in the chat about California. U.S. states all have location codes. Some areas have more broken-down locations.

Part of it is based on the size of the country and how many places of publication there are. I want to say that in the U.K., you've got London, Oxford, and Cambridge, because they are all major publishing locations, and then you kind of just have Britain, Scotland, Wales, and Northern Ireland as locations. There are lists of these codes. I will put a link to the specification in the chat in a minute.

So people will put forward proposals and say we need to do this. The proposals tend to come from groups of users like music people saying, I want to be able to specify every part of the music. Like this piece is for four violins and two cellos and a flute. And they bring this problem, they maybe bring some proposals, there's a working group, it goes back and forth, and then there's a decision made. And sometimes you look at it and go, well, that's not functionally useful.

And sometimes you look at it and go, oh, that's actually kind of good. That makes a lot of sense. And another question: these librarians, when they are studying library science, are they expected to get familiar with MARC there itself, or do they get familiar with it after becoming librarians?

That is a really good question. As a rule, they tended to, and I'm going to grab the MARC bibliographic specification. I should say I was talking specifically about bibliographic MARC; there are some other subtypes of MARC that aren't as widely used, and you won't encounter them in your library catalog the same way. So that's the specification link I just put in the chat. As a rule, there are maybe one or two classes in library school.

I don't think that they always do a good job. And I actually tried to frame this presentation in ways that I would have liked to hear about it: okay, think about it as having these two big parts, the leader and fixed fields and then everything else, and then the subfields, the indicators, how these function, and how it all is from a historical perspective.

Most people have some ideas about MARC, but it really varies a lot based on your library school experience. I mentioned MARC.

Oh, so MARC was developed in the US by the Library of Congress. A lot of other people have been using MARC, so it's pretty much used worldwide by libraries at this point. There were periods where people, like I think the UK, tried to invent their own MARC. Part of the advantage of it is that it is mostly letters and numbers.

So you can translate what those letters and numbers mean. And then of course, the text, right? But the text, you just write in your own language.

So it's used in a lot of libraries. I know I see a lot of German records, for example, when I'm going through this big database of all the shared records. I definitely have seen Chinese and Japanese records as well.

So it's really used all over. But the specification comes out of the US and is still pretty much controlled by the Library of Congress for better and worse. And like I said, there was a big fracture, and then kind of everybody came back together because of the need for interchange.

I had seen somebody ask about the question about the map up front, and someone else asked about the systems thing. And the answer is, yes, there has been a lot of work put into designing systems that can read this. So early on, I'm not entirely sure. Like I said, people would mail each other tapes of things.

We would get a tape from the Library of Congress and load it onto a local computer and select out the records we wanted. And then we would mail them a tape or mail somebody else a tape of all of our records. And they would say, okay, so this is what Penn State has. And they might put it on a shared server that that way, if somebody wanted something through interlibrary loan.

they could get in touch with us. We came up with better methods, like what's called the Z39.50 protocol, which is an international standard that allows people to query other people's catalogs. So even today, you can use the Z39.50 protocol to say, give us the MARC record for, I don't know, Managing the Library Automation Project, which is an old book that I have on my shelf right now. And it'll do that. You can get ours right out of the...

right out of the thing, and then you could download it and upload it to your system. And you don't have to pay for that. That's completely free. We have this big shared database as well, but we have these free things. So there's been a lot of work put into that.

And I think that's why, like I said, the big impediments are really workflows, systems, training habits, and just getting everybody onto it. So I think that's... Yeah, like when I was designing our catalog stuff, I had to make all of these decisions by myself, because the software we were using allowed us to index MARC, but it had very basic default indexing. And so it's like, okay, well, what are we going to do with the 521 note? What do we want to call it?

Do we want to break it out? Do the 264s get broken out by type? Or do I just say it's all publication information?

It doesn't matter to the end user. It's more for the librarians and for researchers that are trying to do data stuff. So.

All of these decisions have to be made every time. And we're using a system that's 20 years old right now to handle ours. And we're probably not going to migrate anytime soon because it, again, mostly works. Someone had asked about examples of where it doesn't work well. And I've been trying to think.

I think that it is tricky for things like, really things like video games, games, physical objects, stuffed animals. We have those. We have an education library. We have puppets.

The farther something diverges from a book, and the more unique an item is, the trickier it gets. Archival materials are maybe a good example. You can make a stub record that says, here's the info about the collection, but you can't do the very low-level description. So there's a whole other standard to say,

here's the entire contents of this collection. That's like 50 boxes, right? A MARC record is like, I want a title. I want some creators.

I want some subjects. I want an abstract. I just, I don't want to give you 50 boxes worth of information. That's not what it was built for.

So people have invented whole other standards to deal with that. So I don't think it necessarily does a good job there. And somebody says they hadn't known about libraries having puppets.

It depends a lot. It depends partly on your country and your area, but at our library at Penn State, we support so many departments that we have things like GPS units, and we have a watt meter to help you figure out if your house is using too much electricity and where. We have puppets, we have skulls, not real ones. Do museums have their own MARC-like thing to catalog museum stuff?

And museums can, yeah, real skulls is a whole other thing. That's museums and they're welcome to them. I don't want to, yeah, that's, there's a whole ethical thing in that.

But I did once have a bunch of fake, but really nice quality skulls hanging out in my office because I'm a digital person and I didn't need my shelves. And the person who was supposed to catalog all the medical bottles was like, please, Ruth, I need to put these somewhere. So I was like, it's my own little mausoleum, my own little wall of skulls. I found it very happy.

I gave them all sticky notes with names. But back to museums. Museums have management systems. To my knowledge, they don't tend to use MARC for their things like paintings and the like.

There are some standards for that. People sometimes use MARC records for that kind of thing, just using it as a database management system, but it's not well suited to that. I guess it's more well suited to saying, this is a book that has illustrations. Museum libraries, for example, could totally use it, but much like with archival collections, it's not great. What do I think the future of MARC is? Oh my gosh. I think that for now... gosh, this is being recorded, sorry. This is a thing that people have a lot of opinions on. I think that there are attempts to come up with these standards to replace MARC.

I don't think MARC is going anywhere as long as we don't have functional systems and practices, and lots of heavy lifting done to get us there. The transition from printed catalog cards to MARC took us a long time.

Obviously, computing moves a lot faster these days than it did back then. Those same computers that cost $300,000, a Raspberry Pi could probably do as well as them now. But at the same time, I see the future of MARC as being sort of lightly tweaked to go on trying to accommodate things while people try to sort this stuff out amongst themselves. There are some libraries and places that are doing heavy lifting trying to get these systems in place.

If and when they do, and if and when it's built into the kind of vended systems that we're already buying, like this 20-year-old system that we're on, and it becomes more of a feasible transition for most libraries, then I think that we could see a transition, but I am not going to put a date on it. And I don't know if it's going to be the one that everybody's working on now, or if that one will eventually be put aside and something else will come along. Or if we'll decide to keep doing MARC and then do something else to augment MARC records. Someone asks if this covers virtual or digital records like e-books.

Yes. So we have records for physical books and we have records for e-books. The leader will tell you, like, this is an online resource, for example, versus a print book. It'll say it's a monograph, but one will say this is in print and another one will say this is an online resource.

It can be kind of annoying, because that means you have two separate records for the same book. But we also tend to license our books. So that means that when you have a book that you've licensed access to for, say, a couple of years, versus one that you own, we want to be able to just delete the record instead of going in and editing it to remove a link. But you can put URLs in them, some of the earliest URLs, and you indicate the kind of URL in the indicators.

You could say, like, this is an HTTP URL. You can also say, this is an FTP URL, or this is an email URL. I don't even know what that would mean, really.
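For reference, those access-method values live in the 856 first indicator; the lookup below follows the values listed in the MARC 21 documentation:

```python
# 856 first indicator: access method for the electronic resource (MARC 21).
ACCESS_METHOD = {
    "0": "Email",
    "1": "FTP",
    "2": "Remote login (Telnet)",
    "3": "Dial-up",
    "4": "HTTP",
    "7": "Method specified in subfield $2",
}

print(ACCESS_METHOD["4"])   # HTTP
```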

But that is the 856 field, if you want to look at the MARC documentation and see what it allows for. Yeah, so we keep MARC records for our journals to which we have revocable subscriptions. At Penn State, every month we reload all of the journals we subscribe to, which is a little annoying, but it's easier than trying to pick them all out each time. So we go to the vendor, we get a list of every journal we subscribe to and all the records for those, we reload them back in, and we delete the old ones.

UKMARC: I don't think that's still in use. I think that format integration did away with it, but I am not entirely sure. And like everything that's required to transition, it's possible somebody is still using it.

It was certainly used in the UK at the time, but I think everybody's been using MARC 21 now for a while. I'm going to go grab the 856 documentation in all of its glory and put it in here.

So you can see you have telnet kinds of things, you have dial-up kinds of things; you could specify all that stuff because at the time that was useful. And we tend not to take things out of MARC, because if you had a record that said a link was dial-up, it actually might be useful to find all those links and remove or update them. And if we changed the value to mean something else now, that would be a problem. Does MARC have problems with Unicode? I kind of wish that one of my co-workers were here for this, because... yes, sort of.

There are several different ways of encoding MARC. You have MARC-8 and Unicode MARC, and these are not separate standards. The codes and everything else are still the same; it's the actual file encoding process that differs.

And they record characters slightly differently. The systems that we use to share records used to be really intolerant of any differences in encoding, which was kind of good, because you had to fix your record before it went in. It might work fine in your system, but now you can download a record and come up with all kinds of weird random characters.
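As a sketch of what that encoding signal looks like: in a binary MARC record, position 9 of the 24-byte leader says whether the record claims to be Unicode ('a') or MARC-8 (a blank). The leaders below are fabricated for illustration, not real records:

```python
def record_encoding(raw: bytes) -> str:
    """Peek at Leader/09 of a raw MARC record: 'a' means UCS/Unicode
    (in practice UTF-8), a space means MARC-8."""
    leader = raw[:24].decode("ascii", errors="replace")
    return "utf-8" if leader[9] == "a" else "marc-8"

# Fabricated 24-byte leaders, not full records:
print(record_encoding(b"00120nam a2200061 a 4500"))  # -> utf-8
print(record_encoding(b"00120nam  2200061 a 4500"))  # -> marc-8
```

Of course, a record's leader can lie about its actual byte content, which is exactly how you end up with the weird random characters described above.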

We sometimes get vendor-supplied records, which have their own weird encoding quirks, because they were created by a machine, not by a person. And if any of you do scripting, you know how that goes. Scraping the table of contents is a good example: "Machine generated table of contents note" is a prefix that a lot of them include.

And whenever I see that, I'm like, okay, it's entirely based on how well the machine did, so that varies wildly. And we definitely have a lot of these: we have about 400,000 records in the catalog

that can't be exported and re-imported because they will break. And that's too many to fix manually. And that's just in our catalog. Yeah.

It's a fun thing. Like we're managing data, some of which was encoded 40 years ago on a completely different system. Some of which was encoded by vendors running scripts to try to auto-create data from eBooks. Some of which was encoded by people at other libraries that then we've integrated into our system.

And one of the things I didn't mention in my presentation is coverage. You can make really basic MARC records or you can make really in-depth ones. Both can be valid.

But it does affect our ability to use some of the features. Because if, you know, half the people aren't encoding one of the fixed fields, leaving it blank because it's valid to leave it blank, then you can't reliably say, well, I want things that were published in California. That one's actually required, but we still have some really bad data from back in the day about it. But you might have a field like that, where half the time people didn't record where it was published.

So we can say, well, to our knowledge, these books were published in California, and these books don't have that data, and therefore we can't tell. And we might be able to pull the 260s and 264s out of all of the other books, run some processing, look for CA in all capitals, or Calif., or other ways that people abbreviate it, and do some cleanup.
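A rough sketch of that kind of cleanup pass, with made-up publication statements standing in for real 260/264 subfield $a values:

```python
import re

# Match the common ways catalogers have written California over the
# years: the postal abbreviation, the old "Calif." abbreviation, or
# the full name. The sample data below is invented for illustration.
CALIFORNIA = re.compile(r"\b(CA|Calif\.?|California)\b")

places = [
    "Berkeley, CA :",
    "Sacramento, Calif. :",
    "New York :",
    "[Place of publication not identified]",
]
hits = [p for p in places if CALIFORNIA.search(p)]
print(hits)  # -> ['Berkeley, CA :', 'Sacramento, Calif. :']
```

In practice you would run something like this over an export of the 260/264 fields, then review the matches before touching any records.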

So it is, yeah, it is a really interesting legacy system. And that also affects our ability to migrate forward; we have to be able to handle all of it. And yes, unknown or missing data tends to be left blank. Every once in a while, I'll find a language code that's N/A.

This is not a valid language code. Language codes are the classic three-letter codes, you know, S-P-A, D-E-U, E-N-G, all those. And N/A is not one of them.

Or, if the language is unknown, I think you use something like U-N-D. There is a code for "I don't know the language" and one for "language doesn't apply." And yeah, you sometimes end up with weird bad data.
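A minimal sketch of that kind of sanity check, using a tiny allowlist standing in for the full MARC language code list (the real list has hundreds of entries):

```python
# 'und' is the real code for an undetermined language, and 'zxx' means
# no linguistic content; the rest of this allowlist is deliberately tiny.
KNOWN_CODES = {"eng", "spa", "deu", "fre", "und", "zxx"}

def check_lang(code: str) -> str:
    if code in KNOWN_CODES:
        return "ok"
    if code.strip() == "":
        return "blank"   # acceptable: unknown data left empty
    return "bad"         # e.g. 'N/A' is not a valid code

for code in ["eng", "", "N/A"]:
    print(repr(code), "->", check_lang(code))
```

This is the whole point of "blank is better": a blank is detectably empty, while a made-up code like N/A just pollutes the data.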

So blank is better. Okay, yes, the MARC 21 formats are updated two times per year. Yes, that sounds about right. That's when the proposals come in, and often they will be small things: we add a subfield here, we add a new indicator there, we add a little way of doing something.

Sometimes there'll be a new field added. I'll just tab over quickly to see if I can find the most recent set of MARC 21 updates.

Um, so one example is... okay, I'm looking at the... yeah, oh yes, MARC 21 means 21st century. So one thing I should add that MARC has, which is interesting, is there are what are kind of called 9XX fields, and these are fields that allow you to record local data.

So a 9 in the tens place of any hundred range pretty much lets you create something local. So 292 would be local; I actually had never thought of 292 before. But 590, 690, or 9-anything.

So 900. We use 949 for our barcodes, for example. We actually keep the barcodes in a separate place in our management system, but when we export a record, we put them into the 949 along with the library name and everything else.

So there are ways to specify your own thing, and this allows people to create their own local data as well. And those are some of the protected fields when we put somebody else's data in: one, we don't put in their specialized fields, and two, we keep our own.
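The protection rule just described can be sketched like this, with fields modeled as simple (tag, value) pairs rather than a real MARC structure:

```python
def is_local_tag(tag: str) -> bool:
    """Local per the convention above: 9XX, or a 9 in the tens
    place (x9x), like 590 or 690."""
    return tag[0] == "9" or tag[1] == "9"

def overlay(ours, theirs):
    """Lay an incoming record over ours: take their standard fields,
    drop their local fields, and keep our own local fields (like a
    949 holding our barcode)."""
    kept = [(t, v) for t, v in ours if is_local_tag(t)]
    incoming = [(t, v) for t, v in theirs if not is_local_tag(t)]
    return incoming + kept

ours = [("245", "Old title"), ("949", "barcode 31234567890")]
theirs = [("245", "Better title"), ("949", "their barcode")]
print(overlay(ours, theirs))
# -> [('245', 'Better title'), ('949', 'barcode 31234567890')]
```

The tag test and merge policy follow the speaker's description; a real system would work on full MARC records with indicators and subfields, not tuples.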

So even if we're putting another, newer record on top of ours, we keep all of our own local fields. And yeah, the spec does still get enhanced and modified. I'm trying to recall the best way to follow that, unfortunately.

I don't, like... I find the alerts that it's been updated, but that's not terribly helpful. There should be something in some sort of news feed to say, oh hey, these are the most recent changes that have been made. Yeah, so update number 34 was put in, with changes indicated in red and so forth. Unfortunately, I don't actually see the contents of update 34, so... oh, here we go, a list of changes included. I'll put this in the chat. Or actually, I could just switch my screen share, couldn't I? Here we go. So this is an example.

We have a couple of new fields, indicator values, and info. So this is about parallel description in another language of cataloging, which is a repeatable field, and representative expression characteristics. These are some...

like the key of representative expression in 384, indicator value 2 in the first indicator. So essentially it'll say, hey, there's a data provenance subfield in your 856 now, and in all of these fields, so you can say where you got your data about this entry. So data provenance seems to have been the most recent big thing.

And because the subfield codes are sometimes assigned differently, here we've got a 7, here we've got an E, here we've got an L; R means it's repeatable, and NR means it's not repeatable. Some changes in content designator names, physical medium, and field description scopes; these are just some redefinitions. The reduction ratio element was renamed in physical medium.

And "institution to which field applies" is now available in this one. These are all extremely, extremely niche things. But this is the kind of stuff where, if you're building a system, you can start putting it in. And when you're building a catalog, sometimes you don't put it in right away, because there's no data for it right away. But then you can go through and do the last two years of changes in one big go or something like that.

So data provenance is probably what we want to add. And so if we go to the bibliographic format, and we go back to 856, and we go to full: here's data provenance, standard information governing access.

These are all new ones, terms governing access; so these are all fairly recent. And yeah, this was hand-coded as HTML way back in the day, and I think they just update it as HTML. So this is an example with the indicators.

You can say this is actually the resource, like an ebook. This is a version of the resource, maybe like a scan. This is a related resource.

So it's not actually the same thing as the item. And this one means we don't know. Then there's information about finding it. Your big one is subfield $u, which is the URI.

Normally it's just a URL to something. You can have labels. We don't fill in a lot of this data, because you don't need to know the operating system, right?

Or the port. You just say, here's the $u, and here's the link text right here. And you combine the $u and the $y and you build a beautiful link, and everybody's happy and goes home. Are there any Easter eggs or fun tricks in MARC? I think the Easter eggs for me are mostly the history things, where you're just like, oh.

We got to Telnet links, eh? You know, a dial-up-specific thing, and you have no idea what that would mean and why it would be different, exactly, from HTTP. So I think those are kind of the fun Easter eggs, the random things that you run into in the history.

Yeah, Telnet is pretty great, Eric. I grew up using my library's catalog over Telnet. Are MARC records made searchable for researchers? Are there central databases of MARC records collected from various libraries?

Yes and no. The Library of Congress allows you to download MARC records. We do have the Z39.50 protocol that I mentioned, so someone could do certain things with that. And people are sometimes willing to share their MARC records from an institution.

We have some terms and conditions on our vendor ones, but we're happy to share some of our others. So the answer is that people doing MARC record research are more likely to use the Library of Congress's data. I'm messing around to see if I can pull up the Z39.50 protocol so I could show a quick way of how you would get it. Something called WorldCat? Yes, it is based on MARC.

So WorldCat is run by a vendor. This is the vendor to whom we contribute all of these records. And we give them our records.

and we buy access to more records, and there are some token exchanges involved, but it's definitely something that I wish were more open. So you can search by specific format types. Just looking over to... okay, sorry, my fingers were not in the right place on the keyboard, so that's going to be an interesting search. I was trying to search Never Home Alone. Yeah, so WorldCat is based on all of these records, and it also knows what we hold.

So I can search here and then it will tell me what's in the Penn State libraries. And I know that bringing this up might have made my computer a little slower, but this is a book I have checked out right now. These are all based on and derived from MARC records. And it has information too about how you can find stuff. So it says, get it at Penn State.

So let's click that and see what happens. You see, it says that Penn State has it. So we tell them what we have and then they're going to redirect it. And let's see what it does. I think it'll actually send me to the wrong page.

Yeah, we've been having a problem with this one, with this linking, but if you click on the Penn State URL in the list of locations where you can find it, you should still get it. So this is all based on MARC. They also incorporate other services, like reviews from places and the like. This should take us to my baby, the catalog.

And here you see we did an identifier search for the ISBN. And then we come here. And voila, here's the book. You can click on the record.

And this is a good example. Like we have a little stub up there. Now we have all the info. It's showing what's available. You can see I have that one checked out.

You have your subjects. So we set these up so that like you can search for biology, popular works, or all biology. This is a good example just of what we do with data in the catalog today. So this is all of our biology popular works. So for people who don't want to read, you know, biology textbooks or serious work, you know, like, oh, you can learn about biological organisms or have no idea what that is.

We have some weird books. We collect books over time. Yeah. What software is being used for library management at Penn State? So behind the scenes, we use something called Sirsi Symphony.

That is a very old system that allows us to manage everything from library circulation to MARC records to buying books and that sort of stuff. Koha is a fairly well-known free, open-source one. It's a little better for smaller libraries.

We have 20-something campuses. 30-something libraries that all have to be in one thing. So it's very complicated.

And once we got this set up, it's going to be hard to move. But the front end is something called Blacklight, which is a Ruby-based app running against Solr, which is indexing software. And it's fairly commonly used to do catalogs in libraries that don't want to migrate to newer systems.

Because sometimes people get rid of these MARC-based classic catalogs. Yeah, it's spelled Symphony, like the symphony. So bye, class.

Um, it was originally called Sirsi Unicorn, which was really fun. And it was first created in 1982, I want to say at Georgia Tech. And some elements of it have not changed a lot since, which is also interesting: partly because they work, partly because of inertia. Pros and cons.

But the front-end catalog was so old that we had the option of switching to something newer that we didn't like as well, or building our own. And that's my job: to help us build our own, which we have mostly done. We still have a few things left, like browsing by call number, which will allow us to browse across all libraries and create a big virtual shelf, so that you could see everything the library has as though our 30-something libraries were all lined up together and all shelved together, which I think is really exciting, and we're hoping to get it out soon. So it's just a lot of technical complexity.

I see we have about seven minutes left. I don't know if there are other questions or... There was actually a question about how that was spelled. Sirsi Symphony? Was that the name you used?

Okay. Yes. So Sirsi is the vendor, and they also have a product called Horizon that they... I think that one they got from Dynix when they merged.

So you had Sirsi Unicorn and you had Dynix Horizon. No, it's not free or open source; we pay non-disclosure-agreement amounts of money for it. And of course, back in the day when it was adopted, there was not really open-source software for this.

And when we migrate, there is one open-source option that's looking promising, but we'll have to see how robust it gets and whether it can handle all of our complexities, because it's a lot more than just MARC. It's all the business functions of the library, talking to our fund accounting system and the like. Thank you, Sam, for coming. I hope, yeah, Australia gets some rest.

Oh, what makes Koha different? So I would have to dig in and look at Koha specifically, but I don't feel qualified offhand to make that declaration, not having done a recent evaluation of it and how it would or wouldn't meet our needs. But we have big requirements, like having to hook into the provost's office and other places. And I think that if we did enough local customization of Koha, we could probably make it work. But I genuinely don't know how much work we'd have to do.

The upside is we could customize it. The downside is we'd have to put a lot of time and effort into that. And there are vended systems that we could use instead.

I think there probably are people who've done evaluations. Most large libraries aren't on Koha. And some of it is inertia, right? Like I said, migrating to anything is a lot of work.

And so if we do migrate to something, we want to be sure that it does our stuff out of the box, or we'd put a lot of work into making sure that it can handle it. Because one of my areas of research is actually the psychological impact of large technological changes.

Because I think, you know, like I said, catalogers just know to put a 4 if they see a V; they don't even have to think about that anymore once they get used to it. And similarly with circ people: you hit tab, tab, then you swipe the barcode, then you hit tab, or enter, or you don't hit anything, and then you swipe the barcode again. And when you're done, you hit this and this to print a receipt. All of that becomes muscle memory. And then you start having to think about it again when you get to a new system.

And so for every single person in the library, besides all the breakdowns, training needs, and the like, there's also that component. So I think that's one reason we don't migrate a lot, besides the cost. But yeah, I'd be interested in seeing where Koha is now.

But we haven't looked at it a lot. Maybe if we do migrate, that'll be one of the ones we look at. But FOLIO is the one more intended for academic libraries that we would be looking at. It's kind of still under construction, but it's also got open development, and there are a number of places committing to it that are bigger libraries like us.

I'm so glad everybody came today. I find this really exciting. I find it exciting to talk to people who are not part of libraries.

Would QR codes be more convenient to scan than barcodes? Is there a reason to stick with barcodes? I think the difference is fairly negligible. We have barcodes, we have barcode scanners.

It's fairly straightforward. And barcoding projects are actually a big thing too, because we have millions of physical items. So anything that we would change would require every physical item getting a new thing.

So at the time it was barcoding, and these were projects where they'd take the whole summer, bring on a bunch of students, bring in a bunch of other people, pull a bunch of staff out of their regular jobs, and say, go barcode a shelf. And then put it all together. And people wrote papers about the most efficient ways to run a giant barcoding project.

So a QR code could probably work. It could work for circ from a phone, which is the kind of thing people are developing. But in terms of resources, it's probably not something we could pull off. Great.

Well, I guess I will end the recording now.