MARCEDIT-L Archives

November 2012

MARCEDIT-L@LISTSERV.GMU.EDU

From: Kristina Spurgin <[log in to unmask]>
Date: Tue, 13 Nov 2012 13:20:30 -0500

On 11/13/2012 9:32 AM, Leslie Engelson wrote:
 > If I’m understanding you correctly, I cannot get a file
 > of only the records that are unique. Is that correct?

I'm not aware of a way to do that in MarcEdit.

I wrote a Ruby script that can output a file of the records that are
unique, but it will only match on one field. I usually use it on the 001.

Using it requires you to:
- run the script from the command line
- have Ruby 1.9 installed
- have the following Ruby Gems installed:
-- trie
-- marc

If that sounds doable to you, let me know and I'll document the script a
little better and send it your way.
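
For readers curious about the general approach, here is a minimal sketch of matching on a single field and keeping only the records whose key occurs exactly once. It is not the script mentioned above: the method name `unique_records` is hypothetical, a plain Hash stands in for the trie gem, and records that lack the match field are skipped, which is an assumption rather than described behavior. The marc gem is loaded only when the file is run from the command line with three arguments.

```ruby
# Sketch only: keep the records whose match key occurs exactly once
# across all input sets. Names are illustrative, not from the real script.

# record_sets: one or more Enumerables of record objects
# key_of:      block returning the match key for a record (nil if absent)
def unique_records(*record_sets, &key_of)
  all = record_sets.flat_map(&:to_a)

  # Pair each record with its key, skipping records without one (assumption).
  keyed = all.map { |rec| [key_of.call(rec), rec] }.reject { |key, _| key.nil? }

  # Count how often each key appears across every input set.
  counts = Hash.new(0)
  keyed.each { |key, _| counts[key] += 1 }

  # A record is unique when its key was seen exactly once.
  keyed.select { |key, _| counts[key] == 1 }.map { |_, rec| rec }
end

# Command-line use: ruby unique.rb first.mrc second.mrc unique_out.mrc
if __FILE__ == $PROGRAM_NAME && ARGV.length == 3
  require 'marc' # the marc gem, needed only for reading and writing MARC files

  file_a, file_b, out_path = ARGV
  readers = [MARC::Reader.new(file_a), MARC::Reader.new(file_b)]
  writer = MARC::Writer.new(out_path)

  # Match on the 001 control field, ignoring surrounding whitespace.
  unique_records(*readers) { |rec| rec['001'] && rec['001'].value.strip }
    .each { |rec| writer.write(rec) }
  writer.close
end
```

Note that a key repeated within a single file also counts as a duplicate here, and that all records are held in memory, so this sketch suits files of tens of thousands of records rather than millions.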

-=-
Kristina M. Spurgin
    E-RESOURCES CATALOGER
      E-Resources & Serials Management, Davis Library
                       University of North Carolina at Chapel Hill
              CB#3938, Davis Library -- Chapel Hill, NC 27514-8890
                            919-962-2050 -- [log in to unmask]


On 11/13/2012 9:32 AM, Leslie Engelson wrote:
> I had it match on the 001 and then on the 035. I’ve compared these
> fields in a handful of records and they are exact matches. I’m
> essentially looking for a couple of unique records out of over 12,000
> duplicates. If I’m understanding you correctly, I cannot get a file
> of only the records that are unique. Is that correct?
>
>
>
> Leslie
>
>
>
> From: MarcEdit support in technical and instructional matters
> [mailto:[log in to unmask]] On Behalf Of Reese, Terry
> Sent: Monday, November 12, 2012 3:06 PM
> To: [log in to unmask]
> Subject: Re: Comparing two files of records and extracting unique
> records
>
>
>
> The Dedup tool allows you to merge multiple files together and look
> for duplicates.  Essentially, the tool’s purpose is to look for
> duplicate records and print out a file that doesn’t include dups.
> The second file contains the records that were left out.
>
>
>
> When the program looks for duplicates, it requires exact matches.  So, if
> you looked at a 020$a for example and one record has an isbn and the
> second has an isbn + (ebook) -- these won’t match due to the fact
> that the data isn’t normalized.
>
>
>
> --tr
>
>
>
> *************************************
> Terry Reese, Associate Professor
> Gray Family Chair for Innovative Library Services
> 121 Valley Library
> Corvallis, OR  97331
> tel: 541.737.6384
> *************************************
>
>
>
> From: Leslie Engelson
> Sent: November 12, 2012 12:09 PM
> To: [log in to unmask]
> Subject: Re: [MARCEDIT-L] Comparing two files of records and extracting unique
> records
>
>
>
> I have been playing around with this deduping feature all morning and
> am quite confused as to what it’s doing.
>
>
>
> I’m unclear as to what the file created from step 5 contains. Is this
> all the unique records from both files? Only the unique records from
> the first file? Only the unique records from the second file?
>
>
>
> What does the second file contain (from step 9)? I thought it would
> have the duplicate records but my results aren’t confirming that.
>
>
>
> When I select Print unique items, nothing prints.
>
>
>
> I need to dedupe two files and end up with a file of unique
> records, but I have followed the steps listed below and am not getting
> these unique records.
>
>
>
> Thanks for your help.
>
>
>
> Leslie
>
>
>
> Leslie Engelson
>
> Technical Services Librarian
>
> 224 Waterfield Library
>
> Murray State University
>
> Murray, KY 42071
>
> 270-809-4818
>
> [log in to unmask]
>
>
>
>
>
>
>
> From: MarcEdit support in technical and instructional matters
> [mailto:[log in to unmask]] On Behalf Of Reese, Terry
> Sent: Wednesday, October 03, 2012 4:52 PM
> To: [log in to unmask]
> Subject: Re: Comparing two files of records and extracting unique
> records
>
>
>
> So, if I was going to give this a first try – what I would end up
> doing is the following:
>
> 1)      Open MarcEdit
>
> 2)      Select Tools/Find Duplicate Records
>
> 3)      First file selected would be the 12,692 record file (since
> this is the record you want)
>
> 4)      Second file added would be the 10,914 record file
>
> 5)      I’d set a save file
>
> 6)      I’d set my match point as the ISBN.  MarcEdit will compare
> all 020s in a record; it creates a separate hash object, with
> embedded hashes for objects with multiple identifiers – so multiple
> 020s shouldn’t matter as long as they indeed match.
>
> 7)      Leave Dedup on as blank
>
> 8)      Select Print unique items as the Option
>
> 9)      I wouldn’t worry about saving the deduped items
>
> 10)   Then I would process.
>
>
>
> If that doesn’t get what you are looking for – zip the files and send
> them my way ([log in to unmask]) and I’ll take a quick look to see why
> they aren’t deduping.  As I say, so long as an 020 that can work as a
> match point exists, this should work.
>
>
>
> --tr
>
>
>
> ________________________________________________________________________
>
>  This message comes to you via MARCEDIT-L, a Listserv(R) list for
> technical and instructional support in MarcEdit. If you wish to
> communicate directly with the list owners, write to
> <mailto:[log in to unmask]>
> [log in to unmask] To unsubscribe, send a message
> "SIGNOFF MARCEDIT-L" to  <mailto:[log in to unmask]>
> [log in to unmask]
>
>
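
The quoted point about exact matching (an ISBN versus the same ISBN followed by "(ebook)") suggests normalizing the match key before comparing. The following is a rough sketch of one way to do that outside MarcEdit; it does not describe how MarcEdit itself compares fields. The method name `normalize_isbn` is hypothetical, and the rules (take the first whitespace-delimited token, drop hyphens, accept only 10- or 13-character results) are assumptions that will not cover every 020$a found in real data.

```ruby
# Sketch only: reduce an 020$a value to a bare ISBN so that values such as
# "9780306406157 (ebook)" and "978-0-306-40615-7" compare as equal.
def normalize_isbn(raw)
  return nil if raw.nil?

  # Keep the first whitespace-delimited token; this drops trailing
  # qualifiers such as "(ebook)" or "(pbk.)".
  token = raw.strip.split(/\s+/).first
  return nil if token.nil?

  # Remove hyphens and stray punctuation, keeping digits and the
  # ISBN-10 check character X.
  isbn = token.upcase.gsub(/[^0-9X]/, '')

  # Accept only plausible ISBN lengths; anything else is treated as no key.
  [10, 13].include?(isbn.length) ? isbn : nil
end
```

This sketch does not validate check digits or convert ISBN-10 to ISBN-13, so the same title recorded under both forms would still fail to match.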

