Subject: | |
From: | |
Reply To: | |
Date: | Fri, 19 Nov 2010 11:04:16 -0500 |
Content-Type: | text/plain |
Parts/Attachments: |
At 08:57 AM 11/19/2010 -0500, Dillon, Vicki wrote:
>While this info is not directly related to MarcEdit, it may be useful. An
>Alexander Street Press TCPT collection file that we processed was not
>MARC8 or UTF8. It appeared to be ISO 8859-1, since using a conversion
>from that to UTF8 gave reasonable results. [snip]
This is probably an MS Windows glitch; beware of Windows "Latin-whatever". You
should stick to official ISO Latin-1 (ISO 8859-1) and not use the Windows
CP 1252 codepage. In true ISO Latin-1, character codes in the range 128-159
are undefined. Microsoft's CP 1252 ("Windows Latin-1") assigns printable
characters to most of these codes; e.g., in Windows Latin-1 the Euro symbol
has the code 128, whereas its proper Unicode code point is U+20AC. [A similar
mismatch affects users of ISO-8859-15 ("Latin-9"), which puts the Euro symbol
at code 164.]
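For anyone who wants to see the difference directly, here is a small Python sketch. It relies only on the codecs shipped with Python; the helper name looks_like_cp1252 is my own invention for illustration, and the check is only a rough heuristic, not a reliable detector.

```python
# Byte 0x80 (decimal 128) decoded under the two "Latin-1" variants.
raw = b"\x80"

as_cp1252 = raw.decode("cp1252")      # Windows "Latin-1": the Euro sign, U+20AC
as_latin1 = raw.decode("iso-8859-1")  # Python's codec yields the C1 control U+0080

# ISO-8859-15 ("Latin-9") keeps the Euro sign at 0xA4 (decimal 164) instead.
as_latin9 = b"\xa4".decode("iso-8859-15")


def looks_like_cp1252(data: bytes) -> bool:
    """Hypothetical helper: True if any byte falls in 0x80-0x9F.

    Bytes in that range have no printable meaning in true ISO 8859-1,
    so their presence hints that the text is really CP 1252.
    """
    return any(0x80 <= b <= 0x9F for b in data)
```

If looks_like_cp1252 returns True for a file that is labelled Latin-1, decoding it as "cp1252" rather than "iso-8859-1" is usually worth trying before converting to UTF-8.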
*nix users should look at iconv (e.g. iconv -f ISO-8859-1 -t UTF-8
filename.txt). Windows users often have options too, e.g. "TextPad" (by
Helios) allows you to "save as" UTF-8. OpenOffice (both *nix and Windows
flavours) allows you to "save as" "text encoded" with numerous options,
including the true ISO standards.
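Where neither iconv nor a suitable editor is to hand, the same conversion can be sketched in a few lines of Python. The function name convert_to_utf8 and its parameters are my own choices for illustration; the source encoding defaults to ISO 8859-1 but should be set to whatever the file really uses.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "iso-8859-1") -> None:
    """Re-encode the file at src into UTF-8 and write the result to dst.

    Works on raw bytes so that line endings pass through unchanged.
    """
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)  # raises UnicodeDecodeError on invalid input
    Path(dst).write_bytes(text.encode("utf-8"))
```

For a file that turns out to be Windows-encoded, calling convert_to_utf8("in.txt", "out.txt", source_encoding="cp1252") would be the variant to try; the file names here are placeholders.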
Best regards,
Paul
________________________________________________________________________
This message comes to you via MARCEDIT-L, a Listserv(R) list for technical and instructional support in MarcEdit. If you wish to communicate directly with the list owners, write to [log in to unmask] To unsubscribe, send a message "SIGNOFF MARCEDIT-L" to [log in to unmask]