MARCEDIT-L Archives

November 2011

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
From:
"Wilson, Margaret" <[log in to unmask]>
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Mon, 7 Nov 2011 00:25:16 +0000
If I understand this thread correctly, I have a similar problem with a Films on Demand file I'm working on.   Some, but not all, apostrophes are encoded incorrectly throughout the file.  Would the search you are adding enable me to find those records?

Margaret Wilson
University of Kansas Libraries

________________________________________
From: MarcEdit support in technical and instructional matters [[log in to unmask]] on behalf of Reese, Terry [[log in to unmask]]
Sent: Friday, November 04, 2011 1:38 PM
To: [log in to unmask]
Subject: Re: [MARCEDIT-L] Correcting character encoding issues

Yes -- I can add a search to Marc Spy.

--TR

-----Original Message-----
From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Shelley Doljack
Sent: Friday, November 04, 2011 11:37 AM
To: [log in to unmask]
Subject: [MARCEDIT-L] Correcting character encoding issues

Hi Terry,

I found a whole bunch of character encoding problems in a set of vendor ebook MARC records. The LDR/09 value is "a" for UTF-8, but when I break the file (without selecting the "translate to UTF-8/MARC-8" checkboxes), the diacritic is not correct. For instance, where there should be an a-umlaut (ä), MarcEdit displays:

=100  1\$aKl{copy}?ger, Roland.

My font is set to Arial Unicode MS, so that's not the problem, and I'm pretty sure MarcEdit is not the problem either. I used the MARC Spy tool to look at the hex byte values for the a-umlaut, and they are C3 3F. But C3 3F is not a valid UTF-8 sequence, I think: C3 starts a two-byte sequence, and 3F is a plain ASCII "?" rather than a continuation byte. The a-umlaut should be C3 A4 according to http://www.fileformat.info/info/unicode/char/e4/index.htm. When I change the bytes to those values and save the file, the correct diacritic displays when I re-break it.
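
For anyone who wants to locate such bytes outside of MARC Spy, here is a minimal Python sketch that reports every position where a byte string fails to decode as UTF-8. It is not a MarcEdit feature; the function name find_invalid_utf8 and the sample bytes are made up for illustration, and it assumes the records really are meant to be UTF-8 (LDR/09 = "a").

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, hex of the bad byte and the one after it) for each
    spot where `data` is not valid UTF-8. Hypothetical helper, not part
    of MarcEdit."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # If the rest of the data decodes cleanly, we are done.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            start = pos + err.start          # absolute offset of the bad byte
            snippet = " ".join(f"{b:02X}" for b in data[start:start + 2])
            problems.append((start, snippet))
            pos = pos + err.end              # resume just past the bad byte(s)
    return problems


# Sample bytes modeled on the 100 field above: C3 followed by 3F ("?").
sample = b"=100  1\\$aKl\xc3\x3fger, Roland."
print(find_invalid_utf8(sample))  # [(12, 'C3 3F')]
```

To check a whole file, the same function could be run on open("records.mrc", "rb").read(), where records.mrc is a placeholder file name. The sketch only finds the damaged positions. Because the original second byte has already been replaced by 3F, the correct character still has to be worked out record by record.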

I was wondering if you could add a search or find function to the MARC Spy tool so that correcting the incorrect byte values would be easier. Or maybe others on this list could recommend how they deal with correcting character encoding issues. I normally let the vendor know, but this particular vendor is supposedly not able to replicate the problem, or they don't want to deal with it.

I tried attaching a zip and .mrc file of records with character encoding issues but the list rejected my email both times. If anybody wants a copy of the file, let me know and I'll send it to you directly.

Thanks,
Shelley

----
Shelley Doljack
E-Resources Metadata Librarian
Metadata and Library Systems
Stanford University Libraries
[log in to unmask]
650-725-0167

________________________________________________________________________

This message comes to you via MARCEDIT-L, a Listserv(R) list for technical and instructional support in MarcEdit.  If you wish to communicate directly with the list owners, write to [log in to unmask] To unsubscribe, send a message "SIGNOFF MARCEDIT-L" to [log in to unmask]
