MARCEDIT-L Archives

January 2013

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
From:
"Fox, Chris" <[log in to unmask]>
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Fri, 18 Jan 2013 11:41:27 -0700
Content-Type:
text/plain
Parts/Attachments:
text/plain (1066 lines)
Your earlier post came through to me. I wasn't able to get back to this yesterday but I'll see if I can today.  The file of records I am working on somehow got corrupted and I've been trying to address that problem. I might post this separate problem to the list a bit later today.  Thanks, everyone, for your help.
Chris

On Jan 18, 2013, at 6:31 AM, "McDonald, Stephen" <[log in to unmask]> wrote:

I already posted about this earlier.  Did it not come through?  This is indeed the problem.  The “[^]” needs to be changed to “[^ ]”, adding a space after the caret.  Without the space, the caret was modifying the ] character, which means that the opening bracket [ was not properly closed.  All of the parsing after that point was messed up as the parser kept looking for the matching closing bracket.

Steve McDonald
[log in to unmask]
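
The diagnosis above can be reproduced outside MarcEdit. The sketch below uses Python's `re` module, which is an assumption made for illustration only: MarcEdit uses the .NET regex engine, but Python likewise treats a `]` placed directly after `[^` as a literal member of the class rather than as its closing bracket. The names `BROKEN`, `FIXED`, and `compiles` are mine, not from the thread.

```python
import re

# Regex as posted to the list: the second negated class is written "[^]".
BROKEN = r"=300  \\\\\$a((.*([^:]|[^ ]:)\$b)|([^$]*(ill|map|port))|(.*([^;]|[^];)\$c)|(.*\$b[^$]*(\d| )cm))"

# Same regex with a space added after the caret, per the fix described above.
FIXED = BROKEN.replace("[^];", "[^ ];")


def compiles(pattern: str) -> bool:
    """Return True if the pattern parses, False if the engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(BROKEN))  # False: the class swallows a ")" and a "(" is left open
print(compiles(FIXED))   # True
```

In the broken version the class only ends at the `]` of the later `[^$]`, so one opening parenthesis never finds its partner, which matches the unbalanced-parenthesis error reported on the list.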

From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Andy Helck
Sent: Thursday, January 17, 2013 5:30 PM
To: [log in to unmask]
Subject: Re: Converting 300 field to provider neutral


I also count 9 & 9, so I am not sure what the problem is. A couple of things did catch my eye though:



this construct got me wondering:

    ([^:]|[^ ]:)

it's not wrong per se, but could be simplified to

    (.:)

since the meaning of the original is kinda like "give me a character that is not : or one that is not space"; any character satisfies that constraint, including : and space.



this one has me a bit more puzzled:

    ([^;]|[^];)

because the second character class [^] has no members. I don't know if this is a problem, but I think it's also equivalent to

    (.;)

since again the meaning is "give me a character that is not a semicolon or is not {empty set}", which again is pretty much everything.



I did wonder if the [^] is breaking the regex because it expects some content -- but I could not find an answer to this online, or find time to try it.
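
The open question about `[^]` can be tried in isolation. This is a minimal sketch in Python's `re` module, assumed here to behave like the engine behind MarcEdit on this one point; the helper `try_compile` and the test patterns are made up for the demonstration.

```python
import re


def try_compile(pattern):
    """Return the compiled pattern, or None if the engine rejects it."""
    try:
        return re.compile(pattern)
    except re.error:
        return None


# "[^]" alone: the "]" is read as the first member of the class,
# so the class is never closed and the pattern is rejected.
empty_negated = try_compile(r"[^]")

# "[^];)x]": parses as "any character except ], ;, ) or x".
closed_later = try_compile(r"[^];)x]")

print(empty_negated is None)                     # True
print(closed_later.fullmatch(";") is None)       # True: ";" is excluded
print(closed_later.fullmatch("a") is not None)   # True: "a" is allowed
```

So `[^]` is not an empty class; it is the start of a class whose first excluded character is `]`.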



Hope this helps,





Andy Helck
Wilkinson Public Library
Telluride CO 81435
________________________________
From: MarcEdit support in technical and instructional matters [[log in to unmask]] on behalf of McDonald, Stephen [[log in to unmask]]
Sent: Thursday, January 17, 2013 1:36 PM
To: [log in to unmask]
Subject: Re: [MARCEDIT-L] Converting 300 field to provider neutral
No, the number of parentheses matches.  There are nine ( and nine ).

Steve McDonald
[log in to unmask]

From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Doug Rippey
Sent: Thursday, January 17, 2013 2:21 PM
To: [log in to unmask]
Subject: Re: Converting 300 field to provider neutral

Chris:

You are correct about the meaning of the message.  In the regex you give, I count nine left parens, but only eight right parens:

    =300  \\\\\$a((.*([^:]|[^ ]:)\$b)|([^$]*(ill|map|port))|(.*([^;]|[^];)\$c)|(.*\$b[^$]*(\d| )cm))


One possible fix (there may well be other syntax issues, too): adding one more right paren to the end of the regex made a Find All execute for me in ME 5.9.4756 (slightly behind the current rev):


    =300  \\\\\$a((.*([^:]|[^ ]:)\$b)|([^$]*(ill|map|port))|(.*([^;]|[^];)\$c)|(.*\$b[^$]*(\d| )cm)))

That regex is definitely something that can only be run at the front end of your process, and even then it may drop some records with RDA-acceptable content in MARC 300s.
I would tread with caution, and carefully consider this in light of whatever local policy decisions you may have made.

Doug Rippey
Metadata Technician IV
Penrose Library
Archives Processing and
   Digital Content Metadata
303.668.7669 (mobile)
[log in to unmask]
http://library.du.edu





From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Fox, Chris
Sent: Thursday, January 17, 2013 10:30 AM
To: [log in to unmask]
Subject: Re: [MARCEDIT-L] Converting 300 field to provider neutral


Kristina and other MarcEditors,



I need help with a regular expression I received from Kristina Spurgin a few months ago as part of her process for converting 300 fields to provider neutral (see below for full procedure).  The regex in question is the first one:



=300  \\\\\$a((.*([^:]|[^ ]:)\$b)|([^$]*(ill|map|port))|(.*([^;]|[^];)\$c)|(.*\$b[^$]*(\d| )cm))



When I run this command using Find All, with Use Regular Expressions checked, I get the following error message:

[image002.jpg: screenshot of the error message]



I’ve been over this regex with a fine-tooth comb, and I can’t seem to find where there is an extra parenthesis or where one might be lacking.  I think that’s what the error message is saying.  Can you, Kristina, or anyone else, tell me why this regex isn’t working?  I don’t know how to do regex at all, so I have to depend on the willingness of those who do to help me out.  Thank you in advance.



Chris Fox

Catalog Librarian

McKay Library

Brigham Young Univ.-Idaho

[log in to unmask]



-----Original Message-----

From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Kristina Spurgin

Sent: Thursday, September 13, 2012 9:57 AM

To: [log in to unmask]

Subject: Re: Converting 300 field to provider neutral



Here's what we do. It may not catch everything or work in all cases, but we find it acceptable:



Check for non-standard formatting/punctuation of 300s by doing regex find all on:

=300  \\\\\$a((.*([^:]|[^ ]:)\$b)|([^$]*(ill|map|port))|(.*([^;]|[^];)\$c)|(.*\$b[^$]*(\d| )cm))



If no results are found:

  - run MARCedit task "Eresourcify 300s" (task text file attached: task_x-300e.txt)



If any results are found:

  - run MARCedit task that cleans up common punctuation/formatting problems (task file attached: task_x-300cl.txt)

  - do regex find all search again

  -- if there are no results, run the "Eresourcify 300s" task

  -- if there are results, clean up manually. If there are recurring patterns in the problems, send examples to Kristina so she can write them into the cleanup task
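
The decision at the top of this procedure can be sketched in a few lines. This is an illustration in Python rather than MarcEdit itself; it assumes the records are available as mnemonic (.mrk) text lines, and it uses the check regex with the space added in `[^ ];`, as corrected later on the list. The function names and the sample lines are hypothetical.

```python
import re

# Check regex from the message above, with "[^ ];" (space after the caret).
CHECK_300 = re.compile(
    r"=300  \\\\\$a((.*([^:]|[^ ]:)\$b)|([^$]*(ill|map|port))"
    r"|(.*([^;]|[^ ];)\$c)|(.*\$b[^$]*(\d| )cm))"
)


def nonstandard_300s(mrk_lines):
    """Return the 300 lines whose punctuation the check regex flags."""
    return [line for line in mrk_lines if CHECK_300.search(line)]


def next_step(mrk_lines):
    """Mirror the branch in the procedure: clean up first, or convert."""
    if nonstandard_300s(mrk_lines):
        return "run cleanup task"
    return "run Eresourcify 300s"


sample = [
    r"=300  \\$aix, 341 p. :$bill.",   # standard " :" before $b
    r"=300  \\$aix, 341 p.$bill.",     # missing " :" before $b
]
print(nonstandard_300s(sample))  # only the second line is flagged
print(next_step(sample))         # run cleanup task
```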



-=-=



To test the tasks, you will need to put them in the MARCedit Application Data directory, which will be located somewhere like:



C:\Users\{Your Windows User Name}\AppData\marcedit\macros or C:\Documents and Settings\{Your User Name}\Application Data\marcedit\macros



Then, you will need to open the "_tasks.txt" file found in the Application Data directory with a text editor, and add two new lines pointing to the new tasks. You should just be able to copy an existing line, change the name of the task to suit you (first thing on the line), and edit the file name at the end of the file path string.



Restart MARCedit, and you should be able to try the tasks out.



best,

-=-

Kristina M. Spurgin

    E-RESOURCES CATALOGER

      E-Resources & Serials Management, Davis Library

                       University of North Carolina at Chapel Hill

              CB#3938, Davis Library -- Chapel Hill, NC 27514-8890

                            919-962-2050 -- [log in to unmask]





On 9/13/2012 10:24 AM, Donley, Leah wrote:

> Hi Shelley,

>

>

>

> Thank you for taking the time to look at this.  I would love to simplify my steps :) I tested your suggestion and while it works on the examples I mentioned, my file also contains records that are already in provider neutral format, and those ended up with an extra "1 online resource":

>

> \\$a1 online resource (1 online resource (xiv, 480 p.))

>

>

>

> I don’t understand how, but part of my second step seems to prevent that from occurring.  The files I’m working on include almost every possible 300 field format so I'm trying to catch and address them all correctly in the simplest way possible.  These examples are also not addressed by my procedure:

>

> \\$ap.$ccm.

>

> \\$ap. cm.

>

>

>

> Using your below steps, they change to:

>

> \\$a1 online resource (p.)$ccm.

>

> \\$a1 online resource (p. cm.)

>

>

>

> I’m currently handling these examples by extracting the records (I sort using the “Select Individual Records to Make” tool to find the ugly 300 fields and replace them all with “1 online resource.).  If it’s not possible to catch these another way, I think going forward I will incorporate your steps at this point because the result is preferable over a blanket replace with “one online resource” which may or may not be accurate.  Ideally, I would love a procedure that perfectly addresses all of the variations I’m coming across, but understand that may be wishful thinking!

>

>

>

> Thanks,

>

> Leah
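
One way to avoid the doubled "1 online resource" described above is to skip fields whose $a already starts with that phrase. The sketch below is a hypothetical guard written in Python against mnemonic text lines; it is not the Edit Subfield tool, and `WRAP_A` and `wrap_extent` are names invented for the example. The negative lookahead is the same idea used in step 2 of the original procedure quoted further down.

```python
import re

# ".." stands for the two indicator characters shown as "\\" in mnemonic format.
# (?!1 online resource) makes the match fail when $a is already provider neutral.
WRAP_A = re.compile(r"^(=300  ..\$a)(?!1 online resource)([^$]+)")


def wrap_extent(line: str) -> str:
    """Wrap the contents of $a in "1 online resource (...)" unless already done."""
    return WRAP_A.sub(r"\g<1>1 online resource (\2)", line)


print(wrap_extent(r"=300  \\$axiv, 480 p."))
# =300  \\$a1 online resource (xiv, 480 p.)
print(wrap_extent(r"=300  \\$a1 online resource (xiv, 480 p.)"))
# unchanged
```

The guard does nothing for placeholder fields such as `\\$ap. cm.`, which would still be wrapped as-is.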

>

>

>

>

>

> -----Original Message-----

>

> From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Shelley Doljack

>

> Sent: Wednesday, September 12, 2012 1:34 PM

>

> To: [log in to unmask]

>

> Subject: Re: [MARCEDIT-L] Converting 300 field to provider neutral

>

>

>

> It seems to me that the steps you have are over-complicating it. This is what I would do (and tested it and it seems to work):

>

>

>

> Original 300 fields:

>

> =300  \\$avi, 514 p.
> =300  \\$av.
> =300  \\$aix, 341 p. :$bill.
> =300  \\$axviii, 263 p. :$bill. ; $e1 CD-ROM (4 3/4 in.)
> =300  \\$axiii, 357 p. :$bill. ;$c24 cm.
> =300  \\$ax, 399 p. :$bill. ;$c24 cm. +$e1 CD-ROM (4 3/4 in.)

>

>

>

> 1. Use the Edit Subfield tool:

>

> Field: 300

>

> Subfield: a

>

> Field Data: a(.+)

>

> Replace with: a1 online resource ($1)

>

> check regex

>

> click replace text button

>

>

>

> 300 fields after step 1:

>

> =300  \\$a1 online resource (vi, 514 p.)
> =300  \\$a1 online resource (v.)
> =300  \\$a1 online resource (ix, 341 p. :)$bill.
> =300  \\$a1 online resource (xviii, 263 p. :)$bill. ; $e1 CD-ROM (4 3/4 in.)
> =300  \\$a1 online resource (xiii, 357 p. :)$bill. ;$c24 cm.
> =300  \\$a1 online resource (x, 399 p. :)$bill. ;$c24 cm. +$e1 CD-ROM (4 3/4 in.)

>

>

>

> 2. Use the Find/Replace tool:

>

> Find: (=300.+)([\s]:\))

>

> Replace: $1) :

>

> check regex

>

> click replace all button

>

>

>

> 300 fields after step 2:

>

> =300  \\$a1 online resource (vi, 514 p.)
> =300  \\$a1 online resource (v.)
> =300  \\$a1 online resource (ix, 341 p.) :$bill.
> =300  \\$a1 online resource (xviii, 263 p.) :$bill. ; $e1 CD-ROM (4 3/4 in.)
> =300  \\$a1 online resource (xiii, 357 p.) :$bill. ;$c24 cm.
> =300  \\$a1 online resource (x, 399 p.) :$bill. ;$c24 cm. +$e1 CD-ROM (4 3/4 in.)
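
Steps 1 and 2 above can be approximated on mnemonic text lines as follows. This is a sketch in Python, not MarcEdit's own tools: `step1` only imitates what the Edit Subfield tool is described as doing, by stopping at the next `$`, and the .NET replacement token `$1` is written as `\1`. Function names are made up for the example.

```python
import re


def step1(line: str) -> str:
    """Wrap the contents of 300 $a in "1 online resource (...)"."""
    if not line.startswith("=300"):
        return line
    # [^$]+ stops at the next subfield delimiter, so only $a is wrapped.
    return re.sub(r"\$a([^$]+)", r"$a1 online resource (\1)", line, count=1)


def step2(line: str) -> str:
    """Move a trailing " :" from inside the closing parenthesis to after it."""
    return re.sub(r"(=300.+)([\s]:\))", r"\1) :", line)


originals = [
    r"=300  \\$avi, 514 p.",
    r"=300  \\$aix, 341 p. :$bill.",
    r"=300  \\$axiii, 357 p. :$bill. ;$c24 cm.",
]
for original in originals:
    print(step2(step1(original)))
# =300  \\$a1 online resource (vi, 514 p.)
# =300  \\$a1 online resource (ix, 341 p.) :$bill.
# =300  \\$a1 online resource (xiii, 357 p.) :$bill. ;$c24 cm.
```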

>

>

>

> 3. Remove $c's and $e's as you like.

>

>

>

>

>

> Regards,

>

> Shelley

>

>

>

> ----- Original Message -----

>

>> From: "Leah Donley" <[log in to unmask]>

>

>> To: [log in to unmask]

>

>> Sent: Wednesday, September 12, 2012 6:43:08 AM

>

>> Subject: Converting 300 field to provider neutral

>

>>

>

>>

>

>>

>

>>

>

>> Good morning,

>

>>

>

>>

>

>>

>

>> I’ve come across another 300 field “format” in ebook records. I’d

>

>> like to find a way to change these to provider-neutral format, but

>

>> again, my procedure (copied below) is not picking these instances

>

>> up. A few examples are below. Any suggestions would be appreciated –

>

>> whether it’s modifying the regex in a step or adding another

>

>> step(s).

>

>>

>

>>

>

>>

>

>> Example 300 fields:

>

>>

>

>> \\$aviii, 248 p.
>> \\$axiv, 480 p.
>> \\$av.

>

>>

>

>>

>

>>

>

>>

>

>>

>

>> 1) Add semicolon to 300 $a that ends without semicolon or colon, and

>

>> replace it with a 300 $a that does:

>

>>

>

>> FIND: =300\s\s\\\\\$a(\d)\sv.$

>

>>

>

>> REPLACE: =300 \\$a$1 v. ;

>

>>

>

>> USE REGULAR EXPRESSIONS checked

>

>>

>

>>

>

>>

>

>> 2) This step will insert “1 online resource” and move the original contents of subfield a into the parentheses:

>

>>

>

>> Find: (?<one>=300.*\$a)(?!1 online resource \()(.*?)( [;:].*)

>

>>

>

>> Replace: ${one}1 online resource ($1)$2

>

>>

>

>>

>

>>

>

>> 3) This will remove any subfield c’s

>

>>

>

>> Find: (=300.*)( ;\$c.*)

>

>>

>

>> Replace: $1

>

>>

>

>>

>

>>

>

>> 4) Remove semicolons from ends of any subfield b’s

>

>>

>

>> Field: 300

>

>>

>

>> Subfield: b

>

>>

>

>> Field data: b(.+)\s;

>

>>

>

>> Replace with: b$1

>

>>

>

>> Click "Replace text"

>

>>

>

>>

>

>>

>

>> 5) Remove “trick” semicolon you inserted into subfield a in the first

>

>> step:

>

>>

>

>> Field: 300

>

>>

>

>> Subfield: a

>

>>

>

>> Field data: a(.+)\s;

>

>>

>

>> Replace with: a$1

>

>>

>

>> Click "Replace text"

>

>>

>

>>

>

>>

>

>> 6) My final step is to add 300 fields to the records that don’t have

>

>> them using the add/delete field function
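
Steps 2 and 3 of this procedure translate to Python roughly as below. This is a sketch for experimenting outside MarcEdit, with two syntax changes that are easy to miss: Python writes the named group as `(?P<one>...)` and refers to it as `\g<one>`, and Python numbers all groups left to right, whereas .NET numbers unnamed groups first. The `$1` and `$2` in the original replacement therefore become `\2` and `\3` here. `STEP2`, `STEP3`, and `convert` are names chosen for the example.

```python
import re

STEP2 = re.compile(r"(?P<one>=300.*\$a)(?!1 online resource \()(.*?)( [;:].*)")
STEP3 = re.compile(r"(=300.*)( ;\$c.*)")


def convert(line: str) -> str:
    """Apply step 2 (wrap $a) and step 3 (drop $c) to one mnemonic line."""
    line = STEP2.sub(r"\g<one>1 online resource (\2)\3", line)
    return STEP3.sub(r"\1", line)


print(convert(r"=300  \\$axiii, 357 p. :$bill. ;$c24 cm."))
# =300  \\$a1 online resource (xiii, 357 p.) :$bill.
print(convert(r"=300  \\$aviii, 248 p."))
# unchanged: there is no " ;" or " :" for step 2 to anchor on
```

The second call shows why the example fields in this message are not picked up: step 2 requires a space followed by a semicolon or colon somewhere after $a.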

>

>>

>

>>

>

>>

>

>> Thank you,

>

>>

>

>> Leah Donley

>

>>

>

>>

>

>>

>

>>

>

>>

>

>> Leah Donley

>

>> Information Specialist

>

>> Brookhaven National Laboratory

>

>> Email: [log in to unmask]

>

>>

>

>>

>

>>

>

>> ________________________________________________________________________
>>
>> This message comes to you via MARCEDIT-L, a Listserv(R) list for technical and instructional support in MarcEdit. If you wish to communicate directly with the list owners, write to [log in to unmask]. To unsubscribe, send a message "SIGNOFF MARCEDIT-L" to [log in to unmask].

>

>

>

> --

>

> Shelley Doljack

>

> E-Resources Metadata Librarian

>

> Metadata and Library Systems

>

> Stanford University Libraries

>

> [log in to unmask]

>

> 650-725-0167

>

>

>


>


