MARCEDIT-L Archives

November 2010

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
From:
Archives and Collections Society <[log in to unmask]>
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Fri, 19 Nov 2010 11:04:16 -0500
At 08:57 AM 11/19/2010 -0500, Dillon, Vicki wrote:

>While this info is not directly related to MarcEdit, it may be useful.  An 
>Alexander Street Press TCPT collection file that we processed was not 
>MARC8 or UTF8.  It appeared to be ISO 8859-1, since using a conversion 
>from that to UTF8 gave reasonable results. [snip]

Probably an MS Windows glitch; please beware of Windows "Latin-whatever". 
You should stick to official ISO Latin-1 (ISO 8859-1) and not use the 
Windows CP 1252 codepage. In true ISO Latin-1, character codes in the range 
128-159 have no printable characters assigned (127 is the DEL control 
character). Microsoft's CP 1252 ("Windows Latin-1") assigns printable 
characters to most of these codes; e.g., in Windows Latin-1 the Euro symbol 
has the code 128 rather than its proper Unicode code point U+20AC. [The 
same kind of problem affects users of ISO-8859-15, "Latin-9".]
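
To make the difference concrete, here is a small Python sketch (my own 
illustration, not part of the vendor file or of MarcEdit) that decodes the 
same byte under each encoding. The variable names are made up for the 
example; the codec names are the ones Python's standard library ships with.

```python
# Decode byte 128 (0x80) under the two "Latin-1" flavours.
raw = bytes([0x80])

as_cp1252 = raw.decode("cp1252")       # Windows "Latin-1": the Euro sign
as_latin1 = raw.decode("iso-8859-1")   # ISO Latin-1: a C1 control code, U+0080

# The Euro sign's real Unicode code point and its UTF-8 byte sequence.
euro = "\u20ac"
euro_utf8 = euro.encode("utf-8")       # three bytes: E2 82 AC

# ISO-8859-15 ("Latin-9") puts the Euro at byte 164 (0xA4), where
# ISO Latin-1 has the generic currency sign instead.
latin9_a4 = bytes([0xA4]).decode("iso-8859-15")
latin1_a4 = bytes([0xA4]).decode("iso-8859-1")

print(repr(as_cp1252), repr(as_latin1), euro_utf8, repr(latin9_a4), repr(latin1_a4))
```

So a file that is really CP 1252 but is converted as ISO-8859-1 will not 
fail; it will quietly turn Euro signs (and curly quotes, dashes, etc.) into 
invisible control characters.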

*nix users should look at iconv, e.g. 
iconv -f ISO-8859-1 -t UTF-8 filename.txt > filename-utf8.txt 
(iconv writes to standard output, hence the redirect). Win users often have 
options too, e.g. "TextPad" (by Helios) allows you to "save as" UTF-8. 
OpenOffice (both *nix and Win flavours) allows you to "save as" "text 
encoded" with numerous options, including the true ISO standards.
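
If neither iconv nor a suitable editor is at hand, roughly the same 
conversion can be sketched in Python. This is only an illustration for 
plain text files: the function names transcode_to_utf8 and 
suspicious_c1_bytes are hypothetical, and the check for bytes 128-159 is a 
heuristic hint that the source may really be CP 1252, not proof.

```python
from pathlib import Path


def suspicious_c1_bytes(data):
    """Return offsets of bytes in the range 0x80-0x9F (128-159).

    ISO Latin-1 text should not normally contain these, so any hit
    suggests the file may actually be Windows CP 1252.
    """
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


def transcode_to_utf8(src, dst, encoding="iso-8859-1"):
    """Read src as `encoding`, write it to dst as UTF-8.

    Returns the offsets flagged by suspicious_c1_bytes so the caller
    can decide whether to rerun with encoding="cp1252".
    """
    data = Path(src).read_bytes()
    text = data.decode(encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return suspicious_c1_bytes(data)
```

Calling transcode_to_utf8("filename.txt", "filename-utf8.txt") and getting 
back a non-empty list would be a reason to look at the file more closely 
before trusting the result.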

Best regards,
Paul

