Yes, I apologize for my incorrect answer. Please let me try again:
=100.+\$a[^\$]+\$d[^\$]+\$e
This says: Find a 100 field in which subfield $a is followed by one or
more characters that are NOT the subfield delimiter and then immediately
by subfield $d; furthermore, this subfield $d must itself be followed by
one or more non-delimiter characters and then immediately by subfield $e.
So in your first example below, what I think happens is that the regex
engine, after it has matched "$a", continues along until it comes to a
delimiter, but that delimiter is not the first part of "$d" (it's the
first part of "$c"). Because [^\$]+ can't step over a delimiter, the
match fails and the next 100 field is examined; thus this field is not
included in your search results. The same thing happens after "$d" is
matched in your second example: the first delimiter after "$d" is part
of "$q", not "$e", so the match fails and this field too is discarded.
Thanks to insight from Regular-expressions.info
<http://www.regular-expressions.info/repeat.html>.
This worked on my little test file; I hope it works for you.
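The pattern can be checked outside MarcEdit with a few lines of Python. This is a sketch, assuming Python's re module handles these character classes and quantifiers the same way as MarcEdit's .NET-based engine; the two Smith headings are made-up test data. One caveat it exposes: nothing in the pattern requires $e to be the last subfield, so a heading with a trailing $4 still matches.

```python
import re

# The revised pattern: $a, one or more non-delimiter characters, then
# immediately $d, one or more non-delimiter characters, then immediately $e.
PATTERN = re.compile(r"=100.+\$a[^\$]+\$d[^\$]+\$e")

# 100 fields in MarcEdit mnemonic style ("\" is a blank indicator).
# The first two come from this thread; the Smith headings are made up.
headings = {
    "a-c-d-e": r"=100 1\$aLimerick, Thomas Dongan$cEarl of$d1634-1715$ecreator",
    "a-d-q-e": r"=100 1\$aDe Peyster, J. Watts$d1821-1907$qJohn Watts$ecreator",
    "a-d-e": r"=100 1\$aSmith, John$d1900-1980$ecreator",
    "a-d-e-4": r"=100 1\$aSmith, John$d1900-1980$ecreator$4cre",
}

# search() looks for the pattern anywhere in each line.
results = {name: PATTERN.search(text) is not None for name, text in headings.items()}

for name, matched in results.items():
    print(f"{name}: {'match' if matched else 'no match'}")
# a-c-d-e: no match
# a-d-q-e: no match
# a-d-e: match
# a-d-e-4: match   (nothing requires $e to be the last subfield)
```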
*****************************************
Walter F. Nickeson, Catalog &
Metadata Management Librarian
Rush Rhees Library
University of Rochester
Rochester, NY 14627-0055
[log in to unmask]
(585) 273-2326 fax: (585) 273-1032
*****************************************
> -----Original Message-----
> From: MarcEdit support in technical and instructional matters
> [mailto:[log in to unmask]] On Behalf Of Heidi P Frank
> Sent: Tuesday, July 31, 2012 5:49 PM
> To: [log in to unmask]
> Subject: Re: [MARCEDIT-L] question about regular expression to specify
> particular subfields
>
> Thanks for the quick suggestion - but it didn't work :(
> It's still matching headings like this:
> =100 1\$aLimerick, Thomas Dongan$cEarl of$d1634-1715$ecreator
> and
> =100 1\$aDe Peyster, J. Watts$d1821-1907$qJohn Watts$ecreator
>
> It seems like the .+? is still matching any characters before it sees a
> $d (or $e) together - i.e., it isn't treating a $ by itself as a
> stopping point. so as long as the next character after the $ is not
> a "d" (or "e"), then it keeps (greedily!) matching the next wildcard
> character. (oh why can't my regexes really be lazy :)
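This diagnosis can be reproduced in Python (a sketch, assuming Python's re behaves like MarcEdit's engine for these constructs). A lazy quantifier takes as few characters as it can while still letting the whole pattern succeed, so when the "d" after the first delimiter fails to match, the engine backtracks and lets .+? run on past that delimiter:

```python
import re

# The earlier lazy suggestion, quoted further down in this thread.
LAZY = re.compile(r"(=100.+\$a.+?\$)(d.+?\$)(e.+)")

headings = [
    r"=100 1\$aLimerick, Thomas Dongan$cEarl of$d1634-1715$ecreator",
    r"=100 1\$aDe Peyster, J. Watts$d1821-1907$qJohn Watts$ecreator",
]

for text in headings:
    m = LAZY.search(text)
    # Both headings match. In the first, group 1 grows past "$c" to
    # "...$cEarl of$"; in the second, group 2 grows past "$q" to
    # "d1821-1907$qJohn Watts$".
    print(m.groups())
```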
>
> for now, it seems this expression is working best (though it seems a
> little *too* specific):
> (=100 1.)(\$a[\w\s,.-]*)(\$d[\w\s,.-]*)(\$e.*$)
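The worry about being *too* specific is easy to demonstrate: any character missing from the whitelist, such as an apostrophe, makes a legitimate $a $d $e heading fail. The Python sketch below (assuming Python's re treats these classes as MarcEdit's engine does; the O'Hara heading is made-up test data) compares the pattern above with a variant that replaces each whitelist with [^$]*, i.e., "any character except the delimiter":

```python
import re

# The "working best" pattern: subfield text limited to word characters,
# whitespace, comma, period and hyphen.
WHITELIST = re.compile(r"(=100 1.)(\$a[\w\s,.-]*)(\$d[\w\s,.-]*)(\$e.*$)")
# Variant: each whitelist replaced by [^$]*, "anything but the delimiter".
NEGATED = re.compile(r"(=100 1.)(\$a[^$]*)(\$d[^$]*)(\$e.*$)")

# Made-up $a $d $e heading with an apostrophe in $a.
apostrophe = r"=100 1\$aO'Hara, Jane$d1900-1980$ecreator"
# Unwanted $a $c $d $e heading from this thread.
with_c = r"=100 1\$aLimerick, Thomas Dongan$cEarl of$d1634-1715$ecreator"

print(WHITELIST.search(apostrophe) is not None)  # False: ' is not in the class
print(NEGATED.search(apostrophe) is not None)    # True
print(NEGATED.search(with_c) is not None)        # False: $c still blocks it
```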
>
> I appreciate your taking a stab at it, and if you've any other ideas
> for identifying the end of a subfield, please share...
> thanks very much!
> heidi
>
> Heidi Frank
> Electronic Resources / Special Formats Cataloger
> New York University Libraries
> Technical Services Department
> 20 Cooper Square, 3rd Floor
> New York, NY 10003
> 212-998-2499 (office)
> 212-995-4366 (fax)
> [log in to unmask]
> Skype: hfrank71
>
>
> On Tue, Jul 31, 2012 at 4:51 PM, Nickeson, Walter
> <[log in to unmask]> wrote:
> > I think this will work ...
> >
> > Find: (=100.+\$a.+?\$)(d.+?\$)(e.+)
> >
> > The use of the question mark changes the find from greedy, in which as
> > many characters as possible are grabbed until the last available match,
> > to lazy, in which as few characters as possible are grabbed until the
> > first available match. So the first matching group says: Find a 100
> > field and stop at the first subfield delimiter following subfield $a.
> > (Without the question mark, the match would go on as long as needed to
> > find "$d". The "lazy operator" ends the match quickly.) The second
> > matching group says: Continue if the subfield code following that first
> > delimiter is "d", and stop at the next subfield delimiter. The third
> > matching group says: Continue if the subfield code after this next
> > delimiter is "e".
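The greedy/lazy description holds for a pattern that ends at the lazy group, as this Python sketch shows (the Smith heading is made-up test data; it assumes Python's re matches MarcEdit's engine here). Once further tokens follow the lazy group, as (d.+?\$) does in the suggestion above, the engine will still extend .+? past a non-matching delimiter if that is the only way to complete the match.

```python
import re

# Made-up heading used only to compare the two quantifiers.
heading = r"=100 1\$aSmith, John$d1900-1980$ecreator"

# Greedy: .+ runs to the end of the line, then backs off to the LAST "$".
greedy = re.search(r"\$a.+\$", heading).group(0)
# Lazy: .+? grows one character at a time and stops at the FIRST "$".
lazy = re.search(r"\$a.+?\$", heading).group(0)

print(greedy)  # $aSmith, John$d1900-1980$
print(lazy)    # $aSmith, John$
```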
> >
> > I don't think you have to be concerned with specifying word or other
> > characters; you can just accept any old character, using the dot
> > notation.
> >
> >> -----Original Message-----
> >> From: MarcEdit support in technical and instructional matters
> >> [mailto:[log in to unmask]] On Behalf Of Heidi P Frank
> >> Sent: Tuesday, July 31, 2012 1:59 PM
> >> To: [log in to unmask]
> >> Subject: [MARCEDIT-L] question about regular expression to specify
> >> particular subfields
> >>
> >> Hi all,
> >> I'm trying to set up regular expressions to find particular subfield
> >> patterns within the 100/700 fields, but I'm not sure how or if I can
> >> specify the end of each subfield...
> >>
> >> For example, I'm looking for 100 fields having subfields = $a $d $e
> >> only. (i.e., I don't want patterns $a $c $d $e, or $a $d $q $e)
> >> Is there a way to specify that you want all of $a but if there's a $c
> >> before $d, don't match?
> >>
> >> I've tested this regex:
> >> (=100 1.)(\$a[\w\s,.]*)(\$d[\w\s,.]*)(\$e)
> >>
> >> saying, I want $a followed by any word character, space, comma or
> >> period, any number of times - (\$a[\w\s,.]*)
> >> then followed by $d with the same - (\$d[\w\s,.]*)
> >> then followed by $e
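Whitelisting has a second pitfall worth checking: [\w\s,.] contains no hyphen, so a typical date range like 1900-1980 in $d stops the match. A Python sketch (assuming Python's re treats these classes like MarcEdit's engine; the Smith heading is made-up test data) compares this pattern with the later version in the thread that adds "-" to each class:

```python
import re

# First attempt: [\w\s,.] has no hyphen in it.
FIRST = re.compile(r"(=100 1.)(\$a[\w\s,.]*)(\$d[\w\s,.]*)(\$e)")
# Later version from this thread: "-" added to each class.
LATER = re.compile(r"(=100 1.)(\$a[\w\s,.-]*)(\$d[\w\s,.-]*)(\$e.*$)")

# Made-up heading with an ordinary date range in $d.
heading = r"=100 1\$aSmith, John$d1900-1980$ecreator"

print(FIRST.search(heading) is not None)  # False: "-" stops [\w\s,.]*
print(LATER.search(heading) is not None)  # True
```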
> >>
> >> It seems to work, but is there a way to say you want $a followed by
> >> any character but NOT including a dollar sign within those characters?
> >> or is the above the best means to do so? I'm just worried there
> >> are characters I'm not thinking about that could appear in $a and
> >> would prefer to say, find any character except $.
> >>
> >> anyone have ideas or examples how to do this?
> >>
> >>
> >> NOTE:
> >> I found in one of Terry's ppt presentations an example where he uses
> >> [^$] - in this expression:
> >> (=856.{4})(\$u.*[^$])(\$u.*)
> >> and explains that the second group should match $u but stop at the end
> >> of the subfield.
> >>
> >> However, it seems this is only the case in his example because it's
> >> expecting only two $u subfields that he's splitting into separate
> >> 856s, so [^$] is not really saying stop at the end of the subfield,
> >> but rather stop when you see a dollar sign before another $u, so if
> >> there were intermittent subfields, they would still match the pattern.
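That reading of the 856 expression is correct: [^$] matches exactly one character, not "the rest of the subfield". A Python sketch (assuming Python's re behaves like MarcEdit's engine here; the 856 field and example.org URLs are made-up test data) shows an intervening $z being absorbed into the second group:

```python
import re

# The 856 pattern quoted above.
PATTERN = re.compile(r"(=856.{4})(\$u.*[^$])(\$u.*)")

# Made-up 856 with a $z between the two $u subfields.
field = r"=856  40$uhttp://example.org/a$zFirst copy$uhttp://example.org/b"

m = PATTERN.search(field)
# [^$] matches ONE non-"$" character (the last one before the second "$u");
# the greedy .* in front of it is free to cross "$z".
print(m.group(2))  # $uhttp://example.org/a$zFirst copy
print(m.group(3))  # $uhttp://example.org/b
```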
> >> ...I'm also wondering why the $ doesn't need to be escaped in the
> >> negation clause, like this: [^\$]
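On the escaping question: inside a character class, $ loses its end-of-line meaning and is just a literal dollar sign, so [^$] and [^\$] mean the same thing and the backslash is harmless. This is standard in mainstream regex flavors; the Python check below assumes MarcEdit's .NET-based engine treats $ inside brackets the same way.

```python
import re

# Inside [...], "$" is an ordinary character, so escaping it is optional.
for cls in (r"[^$]", r"[^\$]"):
    pattern = re.compile(cls)
    rejects_dollar = pattern.fullmatch("$") is None
    accepts_letter = pattern.fullmatch("a") is not None
    print(cls, rejects_dollar, accepts_letter)  # True True for both
```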
> >>
> >> any insights much appreciated!
> >> heidi
________________________________________________________________________
This message comes to you via MARCEDIT-L, a Listserv(R) list for technical and instructional support in MarcEdit. If you wish to communicate directly with the list owners, write to [log in to unmask] To unsubscribe, send a message "SIGNOFF MARCEDIT-L" to [log in to unmask]