On 11/13/2012 9:32 AM, Leslie Engelson wrote:
> If I’m understanding you correctly, I cannot get a file
> of only the records that are unique. Is that correct?
I'm not aware of a way to do that in MarcEdit.
I wrote a Ruby script that can output a file of the records that are
unique, but it will only match on one field. I usually use it on the 001.
Using it requires you to:
- run the script from the command line
- have Ruby 1.9 installed
- have the following Ruby Gems installed:
-- trie
-- marc
If that sounds doable to you, let me know and I'll document the script a
little better and send it your way.
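
In the meantime, here is a minimal sketch of the general idea: count how often each value of one match field occurs across both files, then keep the records whose value occurs exactly once. This is not the script mentioned above. It uses plain Ruby hashes as hypothetical stand-ins for parsed MARC records and does not use the marc or trie gems; the sample data and the unique_records name are made up for illustration. The optional block is an assumption added here to show how a value such as an 020$a with a trailing "(ebook)" could be normalized before matching.

```ruby
# Return the records whose match-field value occurs exactly once.
# Records are hypothetical stand-ins: a Hash of field tag => value.
# An optional block normalizes each value before it is compared.
def unique_records(records, tag, &normalize)
  normalize ||= lambda { |value| value }
  keys = records.map { |rec| normalize.call(rec[tag]) }

  # Count how many records share each (normalized) match value.
  counts = Hash.new(0)
  keys.each { |key| counts[key] += 1 }

  # Keep only the records whose match value was seen once.
  records.zip(keys).select { |_rec, key| counts[key] == 1 }.map(&:first)
end

file_a = [
  { "001" => "ocm0001", "020" => "9780000000001" },
  { "001" => "ocm0002", "020" => "9780000000002" }
]
file_b = [
  { "001" => "ocm0001", "020" => "9780000000001 (ebook)" },
  { "001" => "ocm0003", "020" => "9780000000003" }
]
all_records = file_a + file_b

# Match on the 001, comparing values exactly.
by_001 = unique_records(all_records, "001")

# Match on the 020 exactly, then again with a simple normalizer that
# drops hyphens and keeps only the leading run of ISBN characters.
by_020_exact = unique_records(all_records, "020")
by_020_normalized = unique_records(all_records, "020") do |value|
  value.to_s.delete("-")[/\d+[Xx]?/]
end

puts by_001.map { |rec| rec["001"] }.join(", ")
puts by_020_exact.size
puts by_020_normalized.map { |rec| rec["001"] }.join(", ")
```

With this sample data, matching on the 001 leaves ocm0002 and ocm0003. Matching on the 020 without normalizing treats all four records as unique, because "9780000000001" and "9780000000001 (ebook)" are different strings; with the normalizer, the result is again ocm0002 and ocm0003.
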
-=-
Kristina M. Spurgin
E-RESOURCES CATALOGER
E-Resources & Serials Management, Davis Library
University of North Carolina at Chapel Hill
CB#3938, Davis Library -- Chapel Hill, NC 27514-8890
919-962-2050 -- [log in to unmask]
On 11/13/2012 9:32 AM, Leslie Engelson wrote:
> I had it match on the 001 and then on the 035. I’ve compared these
> fields in a handful of records and they are exact matches. I’m
> essentially looking for a couple of unique records out of over 12,000
> duplicates. If I’m understanding you correctly, I cannot get a file
> of only the records that are unique. Is that correct?
>
>
>
> Leslie
>
>
>
> From: MarcEdit support in technical and instructional matters
> [mailto:[log in to unmask]] On Behalf Of Reese, Terry
> Sent: Monday, November 12, 2012 3:06 PM
> To: [log in to unmask]
> Subject: Re: Comparing two files of records and extracting unique
> records
>
>
>
> The Dedup tool allows you to merge multiple files together and look
> for duplicates. Essentially, the tool's purpose is to look for
> duplicate records and print out a file that doesn’t include dups.
> The second file provides the file of records that is left out.
>
>
>
> When the program looks for duplicates, it requires exact matches. So, if
> you looked at an 020$a, for example, and one record has an ISBN and the
> second has an ISBN + (ebook), these won’t match because the data
> isn’t normalized.
>
>
>
> --tr
>
>
>
> *************************************
> Terry Reese, Associate Professor
> Gray Family Chair for Innovative Library Services
> 121 Valley Library
> Corvallis, OR 97331
> tel: 541.737.6384
> *************************************
>
>
>
> From: Leslie Engelson
> Sent: November 12, 2012 12:09 PM
> To: [log in to unmask]
> Subject: Re: [MARCEDIT-L] Comparing two files of records and extracting
> unique records
>
>
>
> I have been playing around with this deduping feature all morning and
> am quite confused as to what it’s doing.
>
>
>
> I’m unclear as to what the file created from step 5 contains. Is this
> all the unique records from both files? Only the unique records from
> the first file? Only the unique records from the second file?
>
>
>
> What does the second file contain (from step 9)? I thought it would
> have the duplicate records but my results aren’t confirming that.
>
>
>
> When I select Print unique items, nothing prints.
>
>
>
> I need to dedupe two files and have as a result, a file of unique
> records but have followed the steps listed below and am not getting
> these unique records.
>
>
>
> Thanks for your help.
>
>
>
> Leslie
>
>
>
> Leslie Engelson
>
> Technical Services Librarian
>
> 224 Waterfield Library
>
> Murray State University
>
> Murray, KY 42071
>
> 270-809-4818
>
> [log in to unmask]
>
>
>
>
>
>
>
> From: MarcEdit support in technical and instructional matters
> [mailto:[log in to unmask]] On Behalf Of Reese, Terry
> Sent: Wednesday, October 03, 2012 4:52 PM
> To: [log in to unmask]
> Subject: Re: Comparing two files of records and extracting unique
> records
>
>
>
> So, if I was going to give this a first try – what I would end up
> doing is the following:
>
> 1) Open MarcEdit
>
> 2) Select Tools/Find Duplicate Records
>
> 3) First file selected would be the 12,692 record file (since
> this is the record you want)
>
> 4) Second file added would be the 10,914 record file
>
> 5) I’d set a save file
>
> 6) I’d set my match point as the ISBN. MarcEdit will compare
> all 020s in a record; it creates a separate hash object, with
> embedded hashes for objects with multiple identifiers, so multiple
> 020s shouldn’t matter as long as they indeed match.
>
> 7) Leave Dedup on as blank
>
> 8) Select Print unique items as the Option
>
> 9) I wouldn’t worry about saving the deduped items
>
> 10) Then I would process.
>
>
>
> If that doesn’t get what you are looking for – zip the files and send
> them my way ([log in to unmask]) and I’ll take a quick look to see why
> they aren’t deduping. As I say, so long as an 020 that can work as a
> match point exists, this should work.
>
>
>
> --tr
>
>
>
> ________________________________________________________________________
>
> This message comes to you via MARCEDIT-L, a Listserv(R) list for
> technical and instructional support in MarcEdit. If you wish to
> communicate directly with the list owners, write to
> [log in to unmask] To unsubscribe, send a message
> "SIGNOFF MARCEDIT-L" to [log in to unmask]
>