Date: Wed, 16 Oct 2019 18:43:58 +0000
On 10/16/2019 7:48 AM, Terry Reese wrote:
> Fuzzy matching was designed specifically to work for near term
> matches. It was introduced for a library combining authority files
> from different languages.

This question is now merely academic, since the project itself has been
completed: we determined that most 500 fields in our WMS local bib data
were either duplicated or otherwise sufficiently represented in the
master records, and we nuked and paved them. But:

Do you have any samples of data that would be considered match/no match
at the various confidence level settings? When I was working on my
project earlier this year, trying to see whether the contents of 500
fields in one file were matched by any 500, 505, or 520 fields in
another file, I turned the confidence level down as far as it would go,
but the results seemed inconsistent.

Sometimes fields were considered a match and discarded from the file
when their contents were so wildly different that I didn't understand
how even 65% confidence would have matched them. In some cases a "match"
was found even when the field to be matched to was entirely absent. In
other cases, data that was almost identical except for spaces around
punctuation marks (the sort of thing you advised me in the first place
to turn on fuzzy matching to handle) was considered unique.
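
As a point of comparison for the behavior described above, here is a
minimal sketch of threshold-based fuzzy matching. It is not MarcEdit's
algorithm, which this thread does not describe. It assumes Python's
standard difflib.SequenceMatcher as the similarity measure, and the
names normalize, similarity, and is_match are hypothetical. Under these
assumptions, spacing differences around punctuation compare as
identical, and an absent target field can never produce a match.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Collapse whitespace, drop spaces before punctuation, lowercase."""
    text = re.sub(r"\s+", " ", text.strip())
    text = re.sub(r"\s+([.,;:])", r"\1", text)
    return text.lower()


def similarity(a, b):
    """Return a 0.0-1.0 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(candidate, targets, threshold=0.65):
    """True if any target field meets the threshold (assumed default 0.65).

    An empty list of targets (the field is absent) never matches.
    """
    return any(similarity(candidate, t) >= threshold for t in targets)
```

Printing similarity(a, b) for a handful of real note fields at several
thresholds would be one way to build the kind of match/no match samples
asked about here.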
--
Lisa Hatt
Cataloging | De Anza College Library
[log in to unmask] | 408-864-8459