MARCEDIT-L Archives

September 2015

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
From:
Terry Reese <[log in to unmask]>
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Tue, 22 Sep 2015 09:14:21 -0400
Content-Type:
text/plain
Parts/Attachments:
text/plain (34 lines)
Hi Caitlin, 

I wanted to answer this question directly because it comes up occasionally.  Exporting a set of records directly into a delimited format without user intervention (i.e., without selecting the fields to export) isn't something that is supported.  I've made an offer in the past, and it still stands: if someone could provide a set of sane expectations for how this might work, I'd consider taking it on.  The problem, as I see it right now, is that you'd need a column for every field/subfield combination in a record, and you wouldn't know how many combinations that would be until you either preprocessed the records or reserved columns ahead of time.  So, say we reserved all the potential columns ahead of time for speed -- there is the potential for 999 different numerical fields, with each field holding an average of 7 potential subfield combinations.  That means you'd need to reserve almost 7,000 columns.  However, we also have the fields themselves -- those need their own columns.  That means you add ~1,000 more columns, and now we are up to 8,000.  Since MarcEdit supports flavors of MARC beyond MARC21, you also need to make room for alphanumeric fields.  I regularly see all-alpha or mixed field tags, so let's say another 2,000 columns.  We are now up to a spreadsheet with potentially 10,000 columns, and that still doesn't capture all the variations that are possible, because fields can be longer than 3 bytes if the leader value is set (I've seen them).  The gist is, creating a spreadsheet for all potential combinations produces a spreadsheet with more columns than most software can support.
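The reserve-ahead-of-time arithmetic can be written out as a tiny Python function. This is only a sketch: the defaults are the approximate figures quoted in the message (999 numeric tags, ~7 subfield combinations each, ~1,000 columns for the fields themselves, ~2,000 for alphanumeric tags), and the function name and parameters are hypothetical, chosen for illustration.

```python
# Rough upper-bound estimate of spreadsheet columns if every possible
# field/subfield combination were reserved ahead of time.
# Defaults are the approximate figures quoted in the message.

def reserved_column_estimate(
    numeric_fields: int = 999,   # possible numeric tags
    avg_subfields: int = 7,      # average subfield combinations per field
    field_columns: int = 1000,   # ~1 column per field for the field itself
    alpha_columns: int = 2000,   # allowance for alpha/alphanumeric tags
) -> int:
    subfield_columns = numeric_fields * avg_subfields  # 6,993: "almost 7,000"
    return subfield_columns + field_columns + alpha_columns


if __name__ == "__main__":
    print(reserved_column_estimate())  # 9993, i.e. "about 10,000"
```

Note that this still omits tags longer than 3 bytes, so it is a floor for the reserve-everything approach rather than a true maximum.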

If we go with the just-in-time method (preprocessing the records first), we are still looking at significant numbers, especially if we assume each field needs its own columns, allowing for duplicate (repeated) fields.  I took a set of 2,500 records and ran them through the field count tool to get an idea of what we are talking about.  With just these 2,500 records, giving each field/subfield pair its own column would generate a spreadsheet with 3,493 columns.  Again, that is still a lot of columns.
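The just-in-time approach amounts to a pre-scan: read every record once, collect the distinct field/subfield pairs, and only then build the header row. Below is a minimal Python sketch of that idea. It is not MarcEdit's code; the record shape (a list of `(tag, [(subfield_code, value), ...])` tuples) and the column naming `TAG#occurrence$code` are assumptions made for illustration. Repeated fields get separate columns by numbering each occurrence, which is why duplicates drive the count up.

```python
from collections import Counter

# Hypothetical record shape, for illustration only (not MarcEdit's internal
# model): a record is a list of (tag, [(subfield_code, value), ...]) tuples.
Record = list[tuple[str, list[tuple[str, str]]]]


def jit_columns(records: list[Record]) -> list[str]:
    """Pre-scan the records and return one column name per distinct
    field-occurrence/subfield pair, e.g. 650#1$a and 650#2$a, so repeated
    fields ("dups") each get their own columns."""
    columns: set[str] = set()
    for record in records:
        occurrences: Counter = Counter()  # times each tag has appeared in this record
        for tag, subfields in record:
            occurrences[tag] += 1
            prefix = f"{tag}#{occurrences[tag]}"
            for code, _value in subfields:
                # A subfield code repeated inside one field shares a column.
                columns.add(f"{prefix}${code}")
    return sorted(columns)


if __name__ == "__main__":
    sample = [
        [("245", [("a", "Title one")]),
         ("650", [("a", "Cats"), ("x", "History")]),
         ("650", [("a", "Dogs")])],
        [("245", [("a", "Title two"), ("b", "Subtitle")]),
         ("650", [("a", "Birds")])],
    ]
    cols = jit_columns(sample)
    print(len(cols), cols)  # 5 columns for just 2 small records
```

Even on two toy records the column count grows with every new tag, subfield code, and repeat, which is the effect the 2,500-record test shows at scale.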

So, it's not that I haven't been willing to consider how to make this work -- I just don't see a feasible way to do it.  This is why the tool asks users to define the fields and tell it how they want to break those fields/subfields up.
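For comparison, here is a minimal sketch of the user-defined approach: the caller names the `(tag, subfield_code)` pairs to export, and each pair becomes exactly one column. This is an illustration only, not how MarcEdit's Export Tab Delimited Records function is implemented; the record shape, the function name `export_selected`, and the choice to join repeated values with `"; "` (rather than adding columns) are all assumptions.

```python
import csv
import io

# Hypothetical record shape, for illustration only (not MarcEdit's API):
# a record is a list of (tag, [(subfield_code, value), ...]) tuples.
Record = list[tuple[str, list[tuple[str, str]]]]


def export_selected(records: list[Record],
                    selection: list[tuple[str, str]],
                    delimiter: str = "\t",
                    joiner: str = "; ") -> str:
    """Export only the user-chosen (tag, subfield_code) pairs, one column each.
    Repeated values within a record are joined with `joiner`, so the column
    count stays fixed at len(selection)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow([f"{tag}${code}" for tag, code in selection])  # header row
    for record in records:
        row = []
        for want_tag, want_code in selection:
            values = [value
                      for tag, subfields in record if tag == want_tag
                      for code, value in subfields if code == want_code]
            row.append(joiner.join(values))
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = [
        [("245", [("a", "Title one")]), ("650", [("a", "Cats")]), ("650", [("a", "Dogs")])],
        [("245", [("a", "Title two")]), ("650", [("a", "Birds")])],
    ]
    # Comma-delimited output, e.g. for loading subjects into a clustering tool.
    print(export_selected(sample, [("245", "a"), ("650", "a")], delimiter=","))
```

Because the user fixes the selection up front, the spreadsheet width is known before any record is read, which avoids both the reserve-everything and pre-scan problems.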

--tr

-----Original Message-----
From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Hammer, Caitlin
Sent: Monday, September 21, 2015 6:03 PM
To: [log in to unmask]
Subject: [MARCEDIT-L] Exporting to CSV / Using MARC Edit with Open Refine

Hi,

I'm a new user and I'm trying to use MARC Edit with Open Refine.  Ideally, I'd like to be able to bring a set of about 5,000 records into MARC Edit, do some clean-up, then export to CSV so that I can use Open Refine's clustering function to fix typos and errors in subjects.  I see that I can use the Export Tab Delimited Records function to identify and export specific fields, but I don’t see an efficient way to export all of the fields to csv at once.  I feel like I must be missing something - help please?

Thanks,

Caitlin Hammer
Integrated Library System Program Office
Library of Congress
Washington, DC 20540-4010
[log in to unmask]
202.707.2757

________________________________________________________________________

This message comes to you via MARCEDIT-L, a Listserv(R) list for technical and instructional support in MarcEdit.  If you wish to communicate directly with the list owners, write to [log in to unmask] To unsubscribe, send a message "SIGNOFF MARCEDIT-L" to [log in to unmask]
