I *think* I'd try to do this in Perl, since it practically
shouts for a simple equality test, even though my Perl is
so rudimentary as to verge on nonexistence. Basically,
redefine the input record separator so that Perl reads one
whole MARC record at a time rather than one line; collect
the values of =035 $a(LIBRIS) and =776 $w into variables;
when a 035 value matches a 776 value, mark that 035 field
for deletion (then delete it); and clear the variables after
each record. I do something like this to identify records in
which the image count in the physical description no longer
matches the current image count (to identify rescanned image
sets).
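
For what it's worth, here is the same per-record idea as a rough
sketch in Python rather than Perl (my swap, only because I trust
myself further there). It assumes a .mrk file with one field per
line and a blank line between records, and that the 776 $w value is
the bare LIBRIS number, as in the snippet quoted below. The function
name drop_matching_035 is made up for illustration.

```python
import re

# Hypothetical helper: per record, drop each =035 $a(LIBRIS) line whose
# ID also appears in a =776 $w of the SAME record.
# Assumes .mrk text: one field per line, blank line between records.
LIBRIS_035 = re.compile(r"=035 +..\$a\(LIBRIS\)([^$\s]+)")
SUBFIELD_W = re.compile(r"\$w([^$\s]+)")

def drop_matching_035(mrk_text):
    cleaned = []
    for record in re.split(r"\n\s*\n", mrk_text.strip()):
        lines = record.splitlines()
        # Gather the 776 $w values for this record only, so the
        # comparison never leaks across records.
        linked = set()
        for line in lines:
            if line.startswith("=776"):
                linked.update(SUBFIELD_W.findall(line))
        kept = []
        for line in lines:
            m = LIBRIS_035.match(line)
            if m and m.group(1) in linked:
                continue  # this 035 duplicates a 776 $w: delete it
            kept.append(line)
        cleaned.append("\n".join(kept))
    return "\n\n".join(cleaned) + "\n"
```

Because the set of 776 values is rebuilt for every record, nothing
needs resetting by hand, and a record with several LIBRIS 035s is
handled without extra work. Try it on a copy of the file first.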

Alternatively, avoiding Perl and sticking to basic utilities,
extract the values of =035 $a(LIBRIS) and =776 $w into
two simple lists, each sorted and uniq'd, then compare them
with comm (the Unix command-line utility, also available
as a Windows port) to find all the IDs that appear
in both lists (i.e. comm -12 035list.txt 776list.txt >
sharedIDs.txt). Then use the resulting 'shared IDs' list
to remove the 035s that carry those IDs, using whatever
method you have to hand for making batch changes from
a list of IDs. This method assumes that the
=035 $a(LIBRIS) value from one record will never match
the =776 $w value from a different record, which may be
too bold an assumption.
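
If comm isn't to hand, the list comparison can be roughed out in
Python as well. This is only a sketch under the same assumptions as
before (a .mrk file, one field per line, bare LIBRIS numbers in
776 $w); shared_ids is a made-up name. It pools IDs from the whole
file, so it shares the cross-record weakness just mentioned.

```python
import re

# Hypothetical helper: file-wide equivalent of
#   comm -12 035list.txt 776list.txt
# i.e. the IDs present both as =035 $a(LIBRIS) and as =776 $w.
def shared_ids(mrk_text):
    ids_035 = set()
    ids_776 = set()
    for line in mrk_text.splitlines():
        if line.startswith("=035"):
            m = re.match(r"=035 +..\$a\(LIBRIS\)([^$\s]+)", line)
            if m:
                ids_035.add(m.group(1))
        elif line.startswith("=776"):
            ids_776.update(re.findall(r"\$w([^$\s]+)", line))
    # Set intersection plays the role of comm -12; sorting just
    # makes the output stable and easy to eyeball.
    return sorted(ids_035 & ids_776)
```

Writing the result out one ID per line gives the same sort of
sharedIDs.txt to feed into a batch delete.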

Both methods assume that you're working with either a .mrk
(mnemonic text) file or a MARCXML file.

pfs
On Fri, Apr 7, 2017, at 04:40, Andreas Hedström Mace wrote:
> Hi!
>
> I have a smaller set of records (~500) that have two 035$a for LIBRIS
> (the Swedish union catalogue), one for the print edition and one for the
> electronic, where I have been trying for the last week or so to delete
> the 035$a for the electronic version as these today have separate
> records. One part of the problem is that it is difficult to know which
> 035 field is which, just looking at the MARC data, as this is not shown.
> But for an even smaller sample of records (~170) the 035$a identifier is
> also found in 776$w. These I would like to delete.
>
> A snippet of the data looks like this:
> =035 \\$a(DIVA)urn:nbn:se:su:diva-26822
> =035 \\$a(LIBRIS)11441661
> =035 \\$a(LIBRIS)11488189
>
> =776 08$iÄven utgiven elektroniskt$tGestaltandets pedagogik om att skapa
> konsthantverk /$d2009$z9789171558589$w11441661
>
> I’ve been trying to remove the 035$a that matches the 776$w, but
> unfortunately my knowledge of regex is somewhat lacking. I’ve managed to
> delete all fields that contain (LIBRIS) while keeping the 035$a that have
> other identifiers (DIVA or OCLC for example), but haven’t been able to
> actually match/single out the identifiers on 035$a(LIBRIS)/776$w. (The
> closest I got was with
> "(=035.{4}\$a\(LIBRIS\))([^\r\n]*)|(=776.*\$w)(\2[^\n] \n)” I think.)
>
> Any help with getting a working regex would be greatly appreciated! (If
> this kind of conditional delete is possible in MarcEdit, or if I should
> look at other options?)
>
> Best regards,
> Andreas Hedström Mace
> Stockholm University Library
>
> ________________________________________________________________________
>
> This message comes to you via MARCEDIT-L, a Listserv(R) list for
> technical and instructional support in MarcEdit. If you wish to
> communicate directly with the list owners, write to
> [log in to unmask] To unsubscribe, send a message
> "SIGNOFF MARCEDIT-L" to [log in to unmask]
--
Paul Schaffner UM Library : Digital Content & Collections
[log in to unmask] | http://www.umich.edu/~pfs/