MARCEDIT-L Archives

October 2019

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
From:
Lisa Hatt <[log in to unmask]>
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Wed, 16 Oct 2019 18:43:58 +0000
On 10/16/2019 7:48 AM, Terry Reese wrote:

> Fuzzy matching was designed specifically to work for near term
> matches.  It was introduced for a library combining authority files
> from different languages.

This question is now merely academic, since the project itself has been
completed (we determined most 500 fields in our WMS local bib data were
either duplicated or otherwise sufficiently represented in the master
records, and nuked and paved them), but I'm still curious.

Do you have any samples of data that would be considered match/no match
at the various confidence level settings? Earlier this year I was trying
to see whether the contents of 500 fields in one file were matched by
any 500, 505, or 520 fields in another file. I turned the confidence
level down as far as it would go, but the results seemed inconsistent.
Sometimes fields were considered a match and discarded from the file
when their contents were so wildly different that I didn't understand
how even 65% confidence would have matched them. In some cases a
"match" was found even when the field to be matched against was
entirely absent. In other cases, data that was almost identical except
for spaces around punctuation marks (the sort of thing you advised me
in the first place to turn on fuzzy matching to handle) was considered
unique.
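
To make concrete what I would have expected from a confidence threshold,
here is a rough Python sketch. It is not MarcEdit's matching algorithm,
which I don't know; it uses the ratio from Python's standard difflib as
a stand-in score. The function names, the punctuation set, and the 0.65
default are all assumptions made up for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop whitespace around punctuation, collapse spaces."""
    # Assumed punctuation set; real note fields may need more characters.
    text = re.sub(r"\s*([,.;:/])\s*", r"\1", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def similarity(a, b):
    """Stand-in score between 0.0 and 1.0 (difflib, not MarcEdit)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def has_match(field, candidates, threshold=0.65):
    """True if any candidate field scores at or above the threshold.

    With no candidate fields at all, there is nothing to compare
    against, so the result is False.
    """
    return any(similarity(field, c) >= threshold for c in candidates)


if __name__ == "__main__":
    note = "Includes index / by J. Smith."
    # Differs only in the spacing around punctuation marks.
    print(has_match(note, ["Includes index/by J. Smith ."]))
    # The field to be matched against is entirely absent.
    print(has_match(note, []))
```

Under this sketch, two notes that differ only in the spacing around
punctuation normalize to the same string and so match at any threshold,
while a record with no candidate fields can never produce a match.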


-- 
Lisa Hatt
Cataloging | De Anza College Library
[log in to unmask] | 408-864-8459
