MARCEDIT-L Archives

August 2012

MARCEDIT-L@LISTSERV.GMU.EDU

Subject: Re: [MARCEDIT-L] question about regular expression to specify particular subfields
From: Heidi P Frank <[log in to unmask]>
Reply To: MarcEdit support in technical and instructional matters <[log in to unmask]>
Date: Wed, 1 Aug 2012 15:08:16 -0400
Content-Type: text/plain

YES!  This seems to get it!   I didn't realize you could have
quantifiers applied to the negation class, so that's extremely useful
to know.  thanks for figuring this out, and also for sending the
tutorial.

cheers,
heidi

Heidi Frank
Electronic Resources / Special Formats Cataloger
New York University Libraries
Technical Services Department
20 Cooper Square, 3rd Floor
New York, NY  10003
212-998-2499 (office)
212-995-4366 (fax)
[log in to unmask]
Skype: hfrank71
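
For readers who want to check the working pattern from Walter's reply (quoted below) outside MarcEdit, here is a minimal sketch in Python. It assumes Python's `re` module treats this particular pattern the same way MarcEdit's regex engine does, which seems likely for these basic constructs but has not been verified here. The function name and the third sample record are hypothetical; the first two records are the headings quoted later in this thread.

```python
import re

# Walter's pattern: after $a and after $d, allow only non-"$" characters,
# so any intervening subfield (e.g. $c or $q) makes the match fail.
A_D_E = re.compile(r"=100.+\$a[^\$]+\$d[^\$]+\$e")

def has_a_d_e_sequence(record_line: str) -> bool:
    """Return True if $a is followed directly by $d and then $e."""
    return A_D_E.search(record_line) is not None

samples = [
    r"=100  1\$aLimerick, Thomas Dongan$cEarl of$d1634-1715$ecreator",
    r"=100  1\$aDe Peyster, J. Watts$d1821-1907$qJohn Watts$ecreator",
    r"=100  1\$aDoe, Jane,$d1900-1980,$ecreator",  # hypothetical record
]

for line in samples:
    print(has_a_d_e_sequence(line), line)
```

Only the third line should be reported as a match. Note that the pattern does not look at what follows $e, so a field with further subfields after $e would still match.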


On Wed, Aug 1, 2012 at 10:10 AM, Nickeson, Walter
<[log in to unmask]> wrote:
> Yes, I apologize for my incorrect answer. Please let me try again:
>
> =100.+\$a[^\$]+\$d[^\$]+\$e
>
> This says: Find a 100 field in which subfield $a is followed by one or
> more characters that are NOT the subfield delimiter until subfield $d is
> found; furthermore, this subfield $d must itself be followed by one or
> more characters that are NOT the subfield delimiter until subfield $e is
> found. So in your first example below, what I think happens is that the
> regex engine, after it's matched "$a", continues along until it comes to
> a delimiter, but that delimiter is not the first part of "$d" (it's the
> first part of "$c"), so the matching stops and the next 100 field is
> examined; thus this field is not included in your search results. The
> same thing happens after it's matched a subfield $d, which is the case
> of your second example. The first delimiter after "$d" is not part of
> "$e" so the match fails, and this field too is discarded.
>
> Thanks to insight from Regular-expressions.info
> <http://www.regular-expressions.info/repeat.html>.
>
> This worked on my little test file; I hope it works for you.
>
> *****************************************
>   Walter F. Nickeson, Catalog &
>     Metadata Management Librarian
>   Rush Rhees Library
>   University of Rochester
>   Rochester, NY  14627-0055
>   [log in to unmask]
>   (585) 273-2326  fax: (585) 273-1032
> *****************************************
>> -----Original Message-----
>> From: MarcEdit support in technical and instructional matters
>> [mailto:[log in to unmask]] On Behalf Of Heidi P Frank
>> Sent: Tuesday, July 31, 2012 5:49 PM
>> To: [log in to unmask]
>> Subject: Re: [MARCEDIT-L] question about regular expression to specify
>> particular subfields
>>
>> Thanks for the quick suggestion - but it didn't work :(
>> It's still matching headings like this:
>> =100  1\$aLimerick, Thomas Dongan$cEarl of$d1634-1715$ecreator
>> and
>> =100  1\$aDe Peyster, J. Watts$d1821-1907$qJohn Watts$ecreator
>>
>> It seems like the .+? is still matching any element before it sees a
>> $d (or $e) together - i.e., not distinguishing the presence of just
>> the $ by itself.  so as long as the next character after the $ is not
>> a "d" (or "e"), then it keeps (greedily!) matching the next wildcard
>> character.  (oh why can't my regexes really be lazy :)
>>
>> for now, it seems this expression is working best (though it seems a
>> little *too* specific):
>> (=100  1.)(\$a[\w\s,.-]*)(\$d[\w\s,.-]*)(\$e.*$)
>>
>> I appreciate your taking a stab at it, and if you've any other ideas
>> for identifying the end of a subfield, please share...
>> thanks very much!
>> heidi
>>
>> On Tue, Jul 31, 2012 at 4:51 PM, Nickeson, Walter
>> <[log in to unmask]> wrote:
>> > I think this will work ...
>> >
>> > Find: (=100.+\$a.+?\$)(d.+?\$)(e.+)
>> >
>> > The use of the question mark changes the find from greedy, in which as
>> > many characters as possible are grabbed until the last available match,
>> > to lazy, in which as few characters as possible are grabbed until the
>> > first available match. So the first matching group says: Find a 100
>> > field and stop at the first subfield delimiter following subfield $a.
>> > (Without the question mark, the match would go on as long as needed to
>> > find "$d". The "lazy operator" ends the match quickly.) The second
>> > matching group says: Continue if the subfield code following that first
>> > delimiter is "d", and stop at the next subfield delimiter. The third
>> > matching group says: Continue if the subfield code after this next
>> > delimiter is "e".
>> >
>> > I don't think you have to be concerned with specifying word or other
>> > characters; you can just accept any old character, using the dot
>> > notation.
>> >
>> >> -----Original Message-----
>> >> From: MarcEdit support in technical and instructional matters
>> >> [mailto:[log in to unmask]] On Behalf Of Heidi P Frank
>> >> Sent: Tuesday, July 31, 2012 1:59 PM
>> >> To: [log in to unmask]
>> >> Subject: [MARCEDIT-L] question about regular expression to specify
>> >> particular subfields
>> >>
>> >> Hi all,
>> >> I'm trying to set up regular expressions to find particular subfield
>> >> patterns within the 100/700 fields, but I'm not sure how or if I can
>> >> specify the end of each subfield...
>> >>
>> >> For example, I'm looking for 100 fields having subfields = $a $d $e
>> >> only.  (i.e., I don't want patterns $a $c $d $e, or $a $d $q $e)
>> >> Is there a way to specify that you want all of $a but if there's a $c
>> >> before $d, don't match?
>> >>
>> >> I've tested this regex:
>> >> (=100  1.)(\$a[\w\s,.]*)(\$d[\w\s,.]*)(\$e)
>> >>
>> >> saying, I want $a followed by any word character, space, comma or
>> >> period, any number of times - (\$a[\w\s,.]*)
>> >> then followed by $d with the same - (\$d[\w\s,.]*)
>> >> then followed by $e
>> >>
>> >> It seems to work, but is there a way to say you want $a followed by
>> >> any character but NOT including a dollar sign within those characters?
>> >> Or is the above the best means to do so? I'm just worried there
>> >> are characters I'm not thinking about that could appear in $a and
>> >> would prefer to say, find any character except $.
>> >>
>> >> anyone have ideas or examples how to do this?
>> >>
>> >>
>> >> NOTE:
>> >> I found in one of Terry's ppt presentations an example where he uses
>> >> [^$] - in this expression:
>> >> (=856.{4})(\$u.*[^$])(\$u.*)
>> >> and explains that the second group should match $u but stop at the end
>> >> of the subfield.
>> >>
>> >> However, it seems this is only the case in his example because it's
>> >> expecting only two $u subfields that he's splitting into separate
>> >> 856s, so [^$] is not really saying stop at the end of the subfield,
>> >> but rather stop when you see a dollar sign before another $u, so if
>> >> there were intermittent subfields, they would still match the pattern.
>> >> ...I'm also wondering why the $ doesn't need to be escaped in the
>> >> negation clause, like this: [^\$]
>> >>
>> >> any insights much appreciated!
>> >> heidi
>
> ________________________________________________________________________
>
> This message comes to you via MARCEDIT-L, a Listserv(R) list for technical and instructional support in MarcEdit.  If you wish to communicate directly with the list owners, write to [log in to unmask] To unsubscribe, send a message "SIGNOFF MARCEDIT-L" to [log in to unmask]
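
Two points from the thread above can be illustrated with a short Python sketch: why the lazy version of the pattern still matched the unwanted headings, and the question of whether `$` needs escaping inside a negated class. This assumes Python's `re` module behaves like MarcEdit's engine for these constructs, which has not been checked here; the variable names are hypothetical.

```python
import re

# The two headings from the thread that should NOT match.
unwanted = [
    r"=100  1\$aLimerick, Thomas Dongan$cEarl of$d1634-1715$ecreator",
    r"=100  1\$aDe Peyster, J. Watts$d1821-1907$qJohn Watts$ecreator",
]

lazy = re.compile(r"(=100.+\$a.+?\$)(d.+?\$)(e.+)")
negated = re.compile(r"=100.+\$a[^\$]+\$d[^\$]+\$e")

# ".+?" prefers the shortest match, but "." can still consume "$", so the
# engine simply extends the lazy match past $c or $q until "$d"/"$e" fit.
lazy_hits = [bool(lazy.search(line)) for line in unwanted]

# "[^\$]+" can never cross a delimiter, so there is nothing to extend into.
negated_hits = [bool(negated.search(line)) for line in unwanted]

# Inside a character class "$" is a literal character, so escaping it is optional.
same_split = re.findall(r"[^$]+", "a$b$c") == re.findall(r"[^\$]+", "a$b$c")

print(lazy_hits, negated_hits, same_split)
```

Under these assumptions the lazy pattern matches both unwanted headings while the negated-class pattern matches neither, and `[^$]` and `[^\$]` behave identically.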
