Amen to that! I'm continually struck by how powerful and versatile MarcEdit is and what a gift it is to the library community. Vendor records would be the death of me without it (instead of just making me very tired and out of sorts).
Richard A. Stewart
Cataloging Supervisor
Indian Trails Public Library District
355 South Schoenbeck Road
Wheeling, Illinois 60090-4499
USA
Tel: 847-279-2214
Fax: 847-459-4760
[log in to unmask]
http://www.itpld.lib.il.us
>>> Kimberly Montgomery 03/13/13 3:54 PM >>>
... And thank you very, very much for creating MarcEdit at all! It is one of the most useful tools I have encountered. It solves a lot of problems that come in vendor records.
Kimberly Montgomery
[log in to unmask]
From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Reese, Terry
Sent: Wednesday, March 13, 2013 4:49 PM
To: [log in to unmask]
Subject: MarcEdit and long records: was: MarcEdit & Voyager
I figure I should probably clarify what MarcEdit does when it encounters long records, what you can do to identify long records, and future plans.
How MarcEdit handles Long Records:
The answer depends on whether you are processing XML data or MARC data. Let's start with XML data.
Working with XML Data
When working with XML data in MarcEdit, MarcEdit *will* automatically truncate records and split fields if the data fields or records are too long. This occurs automatically (with a caveat) and it happens because it is very likely you will get XML data that goes beyond the limits of a MARC record. Now, I say that this happens automatically - and that's true if you have enabled MarcEdit to process MARCXML data using the Native (non-XSLT) processing option. This is set in the Preferences (see the highlighted image attached).
[cid:image001.jpg@01CE2013.68476560]
When the non-XSLT process option is enabled for MARCXML=>MARC processing - the tool *will* automatically do this truncation and field splitting. It has to because more and more XML data is being created that doesn't fit within the MARC container structure. If you don't want it to truncate - you can use the XSLT processing option. Just realize that this is much, much slower and has some practical size limitations when processing XML data. For example, the Native MARCXML option was created specifically for processing XML data over 1 GB in size. In fact, I use this method all the time to process databases over 200 GB with good performance. This simply isn't possible using the XSLT option due to the limitations of DOM. *However* -- when a record is truncated - MarcEdit tells you it has done it. You can see an example of the truncation message in the attached image.
[cid:image002.jpg@01CE2013.68476560]
You will see that the results text turns indigo, and the text tells the user a record had to be truncated during the conversion process. At present, MarcEdit *does not* tell the user if it automatically splits a MARC field when processing data from *XML into MARC*. However, I'm willing to add a flag that identifies this action if there is interest. *Moreover*, as I noted, this *only* occurs when processing data from XML to MARC, and when using the Native MARCXML processing option.
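For readers who want to see what field splitting involves, here is a minimal Python sketch. It is not MarcEdit's algorithm, which is not described in this thread. It assumes a single-subfield field whose binary MARC length must fit the four-digit directory length (9,999 bytes), and the names `split_field_text`, `MAX_FIELD_BYTES`, and `MAX_TEXT_BYTES` are made up for illustration.

```python
MAX_FIELD_BYTES = 9999  # largest value a four-digit directory length can hold
# Assumed overhead for a one-subfield field: two indicators, subfield
# delimiter, subfield code, and the field terminator.
MAX_TEXT_BYTES = MAX_FIELD_BYTES - 5


def split_field_text(text: str, limit: int = MAX_TEXT_BYTES) -> list[str]:
    """Split text into chunks whose UTF-8 encoding is at most `limit` bytes.

    Breaks at spaces where possible so that no chunk stops mid-word.
    """
    if limit < 4:
        raise ValueError("limit must hold at least one UTF-8 character")
    chunks: list[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate.encode("utf-8")) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single word longer than the limit has to be cut inside the word.
        while len(word.encode("utf-8")) > limit:
            head = word.encode("utf-8")[:limit].decode("utf-8", errors="ignore")
            chunks.append(head)
            word = word[len(head):]
        current = word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then become its own repeated field (for example, several 505 fields). Whether repeating a given field is acceptable is a cataloging decision the sketch does not make.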
Working with MARC data
When MarcEdit processes MARC data - it *does not* truncate or split fields automatically (or by request). There are a number of reasons for this - but at this point, I've made the decision that the MARCEngine will not perform this task for you. Largely, that is because I find that when field data becomes unwieldy, it is usually due to a conversion from XML to MARC - and those instances are handled above.
If you receive a record that is too long - MarcEdit doesn't process the file and will provide an error message that you have records in your file that are too long. If you have fields that are too long, MarcEdit can generally process them using the permissive algorithm and will notify you that it used the fallback by changing the results text to *red*.
Identifying MARC records that may have these two particular errors is a fairly simple process. Using the MARCValidator - you can select the Identify Invalid records option and identify which records are too long (and after the update - have incorrect field lengths). Once identified, the Validator has another option that allows you to extract the invalid records and move them to a separate file for processing - leaving you with a file of valid records and a file of invalid data records.
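The same identify-and-separate idea can be sketched outside MarcEdit. The Python below is an illustration only, not the MARCValidator's logic: it assumes binary MARC (ISO 2709) input, measures actual byte lengths rather than trusting the leader or directory, and uses hypothetical function names.

```python
RECORD_TERMINATOR = b"\x1d"
FIELD_TERMINATOR = b"\x1e"
MAX_RECORD_BYTES = 99999  # leader positions 00-04 hold five digits
MAX_FIELD_BYTES = 9999    # a directory entry's length portion holds four digits


def iter_raw_records(data: bytes):
    """Yield each raw record, terminator included, from a MARC byte stream."""
    start = 0
    while True:
        end = data.find(RECORD_TERMINATOR, start)
        if end == -1:
            break  # trailing bytes without a record terminator are ignored
        yield data[start:end + 1]
        start = end + 1


def length_problems(record: bytes) -> list[str]:
    """Describe any record-length or field-length violations in one record."""
    problems = []
    if len(record) > MAX_RECORD_BYTES:
        problems.append(f"record is {len(record)} bytes")
    # The directory ends at the first field terminator after the 24-byte leader.
    directory_end = record.find(FIELD_TERMINATOR, 24)
    if directory_end == -1:
        problems.append("no directory terminator found")
        return problems
    body = record[directory_end + 1:-1]  # drop the record terminator
    for position, field in enumerate(body.split(FIELD_TERMINATOR), start=1):
        if len(field) + 1 > MAX_FIELD_BYTES:  # +1 counts the field terminator
            problems.append(f"field {position} is {len(field) + 1} bytes")
    return problems


def partition_records(data: bytes) -> tuple[list[bytes], list[bytes]]:
    """Return (valid, invalid) lists of raw records based on length checks."""
    valid, invalid = [], []
    for record in iter_raw_records(data):
        (invalid if length_problems(record) else valid).append(record)
    return valid, invalid
```

In use, you would read a file with `pathlib.Path("records.mrc").read_bytes()` (the file name is a placeholder), call `partition_records`, and write each list back out with `b"".join(...)` to get one file of valid records and one of outliers to review by hand.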
Processing records that are too long in the future
I haven't made any decisions on whether I'm interested in changing how MarcEdit handles these types of records in the future. I've had a request to allow the MARCEngine to process records that are too long (rather than just stopping the process and flagging them) within MarcEdit's permissive mode. It's possible - I used to do it, but it did cause some processing issues when working with data that didn't follow a specific set of assumptions (which primarily would affect international, non-MARC21 users, which is why I don't process these records now). I've had requests to automatically truncate and split data fields directly within the MARCEngine (i.e., when processing MARC data) - but again - I don't see this being a big issue (since long records generally are a result of transformations - and MarcEdit can handle that) so I haven't been particularly interested in doing that either. Rather, I've been working on strengthening the Validator to give folks tools to identify and isolate these outlier records, and unless I hear otherwise, this is generally going to be the development philosophy that I use going forward.
I hope all that makes sense. If you have specific questions or need some clarification (in case I wasn't completely clear above) - let me know.
--TR
From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Kimberly Montgomery
Sent: Wednesday, March 13, 2013 11:28 AM
To: [log in to unmask]
Subject: Re: [MARCEDIT-L] MarcEdit & Voyager
If there were an automatic truncation, I would like to have the ability to turn it off. MarcEdit is the only way I have of dealing with problem records that won't load. So I would like to be able to review each problem record separately and make my own determination of what to do.
That said, one vendor sent a record with a table of contents that was the equivalent of twenty-one pages long in Word. It would take too long to edit bunches of these down to a reasonable size on a regular basis, so I have instructed the LTA who works with me to just delete such machine-generated nonsense for one discovery record project we have.
Kimberly Montgomery
Electronic Resources Cataloger Librarian
Cataloging Services Department
University of Central Florida Libraries
[log in to unmask]
From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Stacy Pober
Sent: Wednesday, March 13, 2013 2:10 PM
To: [log in to unmask]
Subject: Re: MarcEdit & Voyager
I have not gotten this error problem for quite a while. It could be that the "cascading renumbering" after an oversize field is no longer affecting subsequent records. I was working with an older version of MarcEdit when this happened.
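One plausible mechanism for that kind of renumbering, offered as an assumption rather than a confirmed diagnosis: a MARC directory entry is exactly 12 characters (3 for the tag, 4 for the field length, 5 for the starting position), so a length of 10,000 or more written without a limit check takes one extra character and pushes every later entry out of alignment. The Python sketch below uses hypothetical helper names to show the effect on the tags a fixed-width reader would see.

```python
def naive_directory(fields: list[tuple[str, int]]) -> str:
    """Build a directory from (tag, field_length) pairs with no limit check."""
    entries = []
    offset = 0
    for tag, length in fields:
        # Intended layout: 3-character tag, 4-digit length, 5-digit start.
        # A length above 9999 silently widens the entry to 13 characters.
        entries.append(f"{tag}{length:04d}{offset:05d}")
        offset += length
    return "".join(entries)


def read_tags(directory: str) -> list[str]:
    """Read tags from complete 12-character slices, as a fixed-width parser would."""
    return [directory[i:i + 3] for i in range(0, len(directory) - 11, 12)]


normal = [("100", 30), ("245", 60), ("505", 900), ("650", 25), ("856", 40)]
oversize = [("100", 30), ("245", 60), ("505", 12000), ("650", 25), ("856", 40)]

print(read_tags(naive_directory(normal)))    # ['100', '245', '505', '650', '856']
print(read_tags(naive_directory(oversize)))  # ['100', '245', '505', '065', '085']
```

In this toy case only the entries after the oversize field are misread, and the damage stays inside one record. How a given tool writes or recovers from such a directory will vary.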
As to Terry's question of what should be done, I routinely validate any file before converting it from mrk to mrc. Having the validator detect oversize fields would allow me to preserve the data by splitting it into multiple fields. Once the information is gone, it's often hard to find the version with the full field information.
I get collection sets from OCLC where some publisher has mechanically truncated the 520 or 505 fields (long before the 10k character limit) and those always look terrible. They stop mid-sentence or mid-word. So I don't really like the idea of automatic truncation of oversize fields.
--
Stacy Pober
Information Alchemist
Manhattan College Library
Riverdale, NY 10471
[log in to unmask]
On Wed, Mar 13, 2013 at 1:01 PM, Heidi P Frank wrote:
aah, so this explains things for me too - I had a file of 1424 records
for archival collections, and 2 of those were "bad" records, where the
MARC fields got "shifted" within the record. I did find them with the
MARC validator because those 2 records had a number of invalid MARC
fields, and am pretty sure it had to do with field lengths over the
limit.
however, mine were interspersed as well - meaning, only those 2 were
affected and the ones following were fine.
anyway, Terry, does this mean your fix would keep the MARC fields from
renumbering in the bad record if one of the fields is over-length, and
that I'd be able to just delete the field that is too long? my only
alternative that I could think of is to delete the long fields from
the original MARCXML files pre-MarcEdit, but am hoping I could somehow
make it part of my automated tasks.
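As a rough illustration of that pre-MarcEdit alternative, the Python sketch below drops over-length data fields from a MARCXML document using only the standard library. It assumes the standard MARCXML namespace and estimates each field's binary size from its indicators and subfields. The function names and the decision to delete rather than split are illustrative, not something MarcEdit provides.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
MAX_FIELD_BYTES = 9999  # largest length a four-digit directory entry can hold


def field_size(datafield: ET.Element) -> int:
    """Estimate the field's size in a binary MARC record, in bytes."""
    size = 2 + 1  # two indicators plus the field terminator
    for subfield in datafield.findall(f"{{{MARC_NS}}}subfield"):
        text = subfield.text or ""
        size += 2 + len(text.encode("utf-8"))  # delimiter + code + data
    return size


def drop_long_fields(xml_text: str, limit: int = MAX_FIELD_BYTES) -> tuple[str, list[str]]:
    """Remove datafields larger than `limit`; return cleaned XML and removed tags."""
    ET.register_namespace("", MARC_NS)  # keep the default namespace on output
    root = ET.fromstring(xml_text)
    removed = []
    for record in root.iter(f"{{{MARC_NS}}}record"):
        for datafield in list(record.findall(f"{{{MARC_NS}}}datafield")):
            if field_size(datafield) > limit:
                removed.append(datafield.get("tag", "???"))
                record.remove(datafield)
    return ET.tostring(root, encoding="unicode"), removed
```

The list of removed tags is returned so a script can log what was dropped, since the full text is hard to recover once it is gone.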
cheers,
heidi
Heidi Frank
Electronic Resources & Special Formats Cataloger
New York University Libraries
Knowledge Access & Resources Management Services
20 Cooper Square, 3rd Floor
New York, NY 10003
212-998-2499 (office)
212-995-4366 (fax)
[log in to unmask]
Skype: hfrank71
On Wed, Mar 13, 2013 at 12:49 PM, Kathy Martlock wrote:
> Terry -- You are more than welcome to hijack any thread you like.
>
> Stacy -- It's one bad record buried in the midst of other records.
>
> Kathy
>
>
> On Wed, Mar 13, 2013 at 12:25 PM, Reese, Terry wrote:
>>
>> I hadn't realized the validator didn't identify field length errors - but
>> I guess that makes sense given exactly what it does. I've added this error
>> check to the next update.
>>
>>
>>
>> (sorry to hijack this thread for a moment. Unfortunately, I don't use
>> voyager so I can't offer a suggestion)
>>
>>
>>
>> --TR
>>
>>
>>
>> From: MarcEdit support in technical and instructional matters
>> [mailto:[log in to unmask]] On Behalf Of Stacy Pober
>> Sent: Wednesday, March 13, 2013 9:11 AM
>> To: [log in to unmask]
>> Subject: Re: [MARCEDIT-L] MarcEdit & Voyager
>>
>>
>>
>> I've had that happen to files. Usually, there is one problem record and
>> whatever error it contains somehow corrupts subsequent records in the file,
>> changing their field numbers to invalid/inappropriate ones.
>>
>>
>>
>> In the files I've worked with, it was pretty straightforward, in that the
>> problem record corrupted all of the following records. If you can find the
>> last good record, the problem usually starts after that. A particular type
>> of error that would cause this would be a field length over 10,000
>> characters. Oversize field lengths are not detected by the MarcEdit
>> validation.
>>
>>
>>
>> If you are having alternate good/bad records, then your problem is
>> something else.
>>
>>
>>
>>
>>
>> --
>> Stacy Pober
>> Information Alchemist
>> Manhattan College Library
>> Riverdale, NY 10471
>> [log in to unmask]
>>
>>
>>
>> On Wed, Mar 13, 2013 at 9:36 AM, Kathy Martlock wrote:
>>
>> Well, on a second look at this process this a.m., I see that the field data
>> comes through fine; however, the field tags are wrong, i.e. the 245 field is
>> numbered 501 -- the 100 field is numbered 000, etc.
>>
>>
>>
>> Does anyone know how to fix this?
>>
>>
>>
>> Thanks, again.
>>
>>
>>
>> Kathy
>>
>>
>>
>> On Tue, Mar 12, 2013 at 5:11 PM, Kathy Martlock wrote:
>>
>> Thank you ... thank you ... I cannot thank you enough ... so simple, yet
>> effective. Exactly what I wanted.
>>
>>
>>
>> Kathy
>>
>>
>>
>> On Tue, Mar 12, 2013 at 4:53 PM, Bornheimer, Bee wrote:
>>
>> This may be a crazy (and wrong) way to do this, but I just tried this by
>> copying and pasting the text (minus the heading) in a text file, then
>> changing the extension to .mrc and letting MarcEdit do its magic.
>>
>>
>>
>> From: MarcEdit support in technical and instructional matters
>> [mailto:[log in to unmask]] On Behalf Of Kathy Martlock
>> Sent: Tuesday, March 12, 2013 1:29 PM
>> To: [log in to unmask]
>> Subject: MarcEdit & Voyager
>>
>>
>>
>> Hi Everyone --
>>
>>
>>
>> I am rather new at MarcEdit and am just discovering all the features
>> available. I want to say thanks to all of you for this ListServ - it has
>> proven to be very helpful.
>>
>>
>>
>> I have been unable to find help on this and figured it was time to reach
>> out to the experts.
>>
>>
>>
>> Here's my mystery :
>>
>>
>>
>> I am using MarcEdit and Voyager.
>>
>>
>>
>> I am trying to open the err.imp marc records that do not bulk import
>> properly.
>>
>>
>>
>> When I open the report file at err.imp.yyyymmdd.hhmm, I get a very nice
>> looking web page that shows a heading and what appears to be a tab delimited
>> marc record (but I don't think it is). I can get the actual marc record
>> from the original source file (which is huge and cumbersome) or I can
>> contact our database administrator and he can send me the marc file(s) that
>> have errored/rejected/discarded.
>>
>>
>>
>> Is there an easier way ... can I somehow capture the marc file from the
>> web page report? (without being db admin)?
>>
>>
>>
>> Thanks for any help you can offer,
>>
>>
>>
>> Kathy Martlock
>>
>> Library Specialist III - Cataloging
>>
>> Z. Smith Reynolds Library
>>
>> Wake Forest University
>>
>> ____________________________________________________________
>>
>> ________________________________________________________________________
>>
>> This message comes to you via MARCEDIT-L, a Listserv(R) list for technical
>> and instructional support in MarcEdit. If you wish to communicate directly
>> with the list owners, write to [log in to unmask] To
>> unsubscribe, send a message "SIGNOFF MARCEDIT-L" to
>> [log in to unmask]