MARCEDIT-L Archives

August 2012

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
Re: [MARCEDIT-L] question about regular expression to specify particular subfields
From:
"Nickeson, Walter" <[log in to unmask]>
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Wed, 1 Aug 2012 10:10:07 -0400
Yes, I apologize for my incorrect answer. Please let me try again:

=100.+\$a[^\$]+\$d[^\$]+\$e

This says: Find a 100 field in which subfield $a is followed by one or
more characters that are NOT the subfield delimiter until subfield $d is
found; furthermore, this subfield $d must itself be followed by one or
more characters that are NOT the subfield delimiter until subfield $e is
found. So in your first example below, what I think happens is that the
regex engine, after it's matched "$a", continues along until it comes to
a delimiter, but that delimiter is not the first part of "$d" (it's the
first part of "$c"), so the matching stops and the next 100 field is
examined; thus this field is not included in your search results. The
same thing happens after it's matched a subfield $d, which is the case
of your second example. The first delimiter after "$d" is not part of
"$e" so the match fails, and this field too is discarded.
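
The behavior described above can be checked outside MarcEdit with a minimal sketch. It assumes Python's re module handles this pattern the same way MarcEdit's regex engine does, which holds for the simple constructs used here. The first two records are the ones quoted below; the third is a hypothetical record made up for illustration, as is the helper name matching_fields.

```python
import re

# Pattern from the message above. Inside a character class "$" is a literal
# character, so [^\$] and [^$] behave the same way.
PATTERN = re.compile(r"=100.+\$a[^\$]+\$d[^\$]+\$e")
UNESCAPED = re.compile(r"=100.+\$a[^$]+\$d[^$]+\$e")

records = [
    # The two headings from the quoted message that should NOT match.
    "=100  1\\$aLimerick, Thomas Dongan$cEarl of$d1634-1715$ecreator",
    "=100  1\\$aDe Peyster, J. Watts$d1821-1907$qJohn Watts$ecreator",
    # Hypothetical heading with only $a $d $e, which should match.
    "=100  1\\$aDoe, Jane,$d1900-1980,$ecreator",
]


def matching_fields(lines, pattern=PATTERN):
    """Return the lines in which the pattern is found."""
    return [line for line in lines if pattern.search(line)]


for line in matching_fields(records):
    print(line)
```

Note that the pattern stops checking at "$e", so a field that carries further subfields after $e would still be found; whether that matters depends on the data.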

Thanks to insight from Regular-expressions.info
<http://www.regular-expressions.info/repeat.html>.

This worked on my little test file; I hope it works for you.
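
For anyone who wants to see why the earlier lazy version still matched the unwanted headings, here is a small comparison sketch, again assuming Python's re module as a stand-in for MarcEdit's engine. The helper name lazy_groups is made up for illustration. A lazy quantifier only prefers the shortest match; when the next part of the pattern fails, it keeps expanding, so it can run straight over an intervening delimiter.

```python
import re

# Earlier suggestion from the quoted message (lazy quantifiers).
LAZY = re.compile(r"(=100.+\$a.+?\$)(d.+?\$)(e.+)")
# Corrected suggestion (negated character class).
NEGATED = re.compile(r"=100.+\$a[^\$]+\$d[^\$]+\$e")

with_c = "=100  1\\$aLimerick, Thomas Dongan$cEarl of$d1634-1715$ecreator"
with_q = "=100  1\\$aDe Peyster, J. Watts$d1821-1907$qJohn Watts$ecreator"


def lazy_groups(line):
    """Return the three captured groups of the lazy pattern, or None."""
    match = LAZY.search(line)
    return match.groups() if match else None


for line in (with_c, with_q):
    # The lazy pattern matches, and one group swallows the extra subfield.
    print(lazy_groups(line))
    # The negated-class pattern finds nothing in either heading.
    print(bool(NEGATED.search(line)))
```

In the first heading the first group ends up containing "$cEarl of", and in the second heading the second group contains "$qJohn Watts", which is exactly the over-matching reported in the reply quoted below.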

*****************************************
  Walter F. Nickeson, Catalog &
    Metadata Management Librarian
  Rush Rhees Library
  University of Rochester
  Rochester, NY  14627-0055
  [log in to unmask] 
  (585) 273-2326  fax: (585) 273-1032
*****************************************
> -----Original Message-----
> From: MarcEdit support in technical and instructional matters
> [mailto:[log in to unmask]] On Behalf Of Heidi P Frank
> Sent: Tuesday, July 31, 2012 5:49 PM
> To: [log in to unmask]
> Subject: Re: [MARCEDIT-L] question about regular expression to specify
> particular subfields
> 
> Thanks for the quick suggestion - but it didn't work :(
> It's still matching headings like this:
> =100  1\$aLimerick, Thomas Dongan$cEarl of$d1634-1715$ecreator
> and
> =100  1\$aDe Peyster, J. Watts$d1821-1907$qJohn Watts$ecreator
> 
> It seems like the .+? is still matching any element before it sees a
> $d (or $e) together - i.e., not distinguishing the presence of just
> the $ by itself.  so as long as the next character after the $ is not
> a "d" (or "e"), then it keeps (greedily!) matching the next wildcard
> character.  (oh why can't my regexes really be lazy :)
> 
> for now, it seems this expression is working best (though it seems a
> little *too* specific):
> (=100  1.)(\$a[\w\s,.-]*)(\$d[\w\s,.-]*)(\$e.*$)
> 
> I appreciate your taking a stab at it, and if you've any other ideas
> for identifying the end of a subfield, please share...
> thanks very much!
> heidi
> 
> Heidi Frank
> Electronic Resources / Special Formats Cataloger
> New York University Libraries
> Technical Services Department
> 20 Cooper Square, 3rd Floor
> New York, NY  10003
> 212-998-2499 (office)
> 212-995-4366 (fax)
> [log in to unmask]
> Skype: hfrank71
> 
> 
> On Tue, Jul 31, 2012 at 4:51 PM, Nickeson, Walter
> <[log in to unmask]> wrote:
> > I think this will work ...
> >
> > Find: (=100.+\$a.+?\$)(d.+?\$)(e.+)
> >
> > The use of the question mark changes the find from greedy, in which as
> > many characters as possible are grabbed until the last available match,
> > to lazy, in which as few characters as possible are grabbed until the
> > first available match. So the first matching group says: Find a 100
> > field and stop at the first subfield delimiter following subfield $a.
> > (Without the question mark, the match would go on as long as needed to
> > find "$d". The "lazy operator" ends the match quickly.) The second
> > matching group says: Continue if the subfield code following that first
> > delimiter is "d", and stop at the next subfield delimiter. The third
> > matching group says: Continue if the subfield code after this next
> > delimiter is "e".
> >
> > I don't think you have to be concerned with specifying word or other
> > characters; you can just accept any old character, using the dot
> > notation.
> >
> > *****************************************
> >   Walter F. Nickeson, Catalog &
> >     Metadata Management Librarian
> >   Rush Rhees Library
> >   University of Rochester
> >   Rochester, NY  14627-0055
> >   [log in to unmask]
> >   (585) 273-2326  fax: (585) 273-1032
> > *****************************************
> >
> >> -----Original Message-----
> >> From: MarcEdit support in technical and instructional matters
> >> [mailto:[log in to unmask]] On Behalf Of Heidi P Frank
> >> Sent: Tuesday, July 31, 2012 1:59 PM
> >> To: [log in to unmask]
> >> Subject: [MARCEDIT-L] question about regular expression to specify
> >> particular subfields
> >>
> >> Hi all,
> >> I'm trying to set up regular expressions to find particular subfield
> >> patterns within the 100/700 fields, but I'm not sure how or if I can
> >> specify the end of each subfield...
> >>
> >> For example, I'm looking for 100 fields having subfields = $a $d $e
> >> only.  (i.e., I don't want patterns $a $c $d $e, or $a $d $q $e)
> >> Is there a way to specify that you want all of $a but if there's a $c
> >> before $d, don't match?
> >>
> >> I've tested this regex:
> >> (=100  1.)(\$a[\w\s,.]*)(\$d[\w\s,.]*)(\$e)
> >>
> >> saying, I want $a followed by any word character, space, comma or
> >> period, any number of times - (\$a[\w\s,.]*)
> >> then followed by $d with the same - (\$d[\w\s,.]*)
> >> then followed by $e
> >>
> >> It seems to work, but is there a way to say you want $a followed by
> >> any character but NOT including a dollar sign within those characters?
> >> Or is the above the best means to do so? I'm just worried there
> >> are characters I'm not thinking about that could appear in $a and
> >> would prefer to say, find any character except $.
> >>
> >> anyone have ideas or examples how to do this?
> >>
> >>
> >> NOTE:
> >> I found in one of Terry's ppt presentations an example where he uses
> >> [^$] - in this expression:
> >> (=856.{4})(\$u.*[^$])(\$u.*)
> >> and explains that the second group should match $u but stop at the end
> >> of the subfield.
> >>
> >> However, it seems this is only the case in his example because it's
> >> expecting only two $u subfields that he's splitting into separate
> >> 856s, so [^$] is not really saying stop at the end of the subfield,
> >> but rather stop when you see a dollar sign before another $u, so if
> >> there were intermittent subfields, they would still match the pattern.
> >>  ...I'm also wondering why the $ doesn't need to be escaped in the
> >> negation clause, like this: [^\$]
> >>
> >> any insights much appreciated!
> >> heidi
> >>
> >> Heidi Frank
> >> Electronic Resources / Special Formats Cataloger
> >> New York University Libraries
> >> Technical Services Department
> >> 20 Cooper Square, 3rd Floor
> >> New York, NY  10003
> >> 212-998-2499 (office)
> >> 212-995-4366 (fax)
> >> [log in to unmask]
> >> Skype: hfrank71

________________________________________________________________________

This message comes to you via MARCEDIT-L, a Listserv(R) list for technical and instructional support in MarcEdit.  If you wish to communicate directly with the list owners, write to [log in to unmask] To unsubscribe, send a message "SIGNOFF MARCEDIT-L" to [log in to unmask]
