MARCEDIT-L Archives

February 2014

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
From:
Bryan Baldus <[log in to unmask]>
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Thu, 13 Feb 2014 12:09:44 -0600
On Thursday, February 13, 2014 11:00 AM, Jacquie Slater wrote:
>I’m wondering if anyone can help me with a diacritics problem:
>The file I received from the vendor (.mrc), when viewed in Notepad, shows the diacritics like this (as I want them to appear in the end):
>Le roi Arthur et les Chevaliers de la Table Ronde se lancent à la conquête du Graal, chevauchant de fantômatiques montures dans un bruitage de noix de coco cognées.
>When I MarcBreak the records (clicking on Translate to MARC-8, which I’ve always done) for editing, the line above now appears as:
>Le roi Arthur et les Chevaliers de la Table Ronde se lancent &#xFFFD; la conqu&#xFFFD;te du Graal, chevauchant de fant&#xFFFD;matiques montures dans un bruitage de noix de coco cogn&#xFFFD;es
>I thought these were the Unicode equivalents, but each diacritic has been translated to the exact same code, even though they are different in the original source.  After editing the records and recompiling into MARC, the &#xFFFD remains.
>I’ve spent a lot of time trying to figure this out now, without any success, so decided it was time to turn to the experts.

Viewed in Notepad, a Unicode version of a similar record (LCCN 2009381805) looks like:
une légende en devenir : exposition présentée aux Champs Libres à Rennes
in MARC-8:
une lâegende en devenir : exposition prâesentâee aux Champs Libres áa Rennes
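
The MARC-8 line looks odd because MARC-8 stores a combining diacritic as a separate byte before the letter it modifies, and Notepad shows each byte as its own character. A minimal Python sketch of that effect, assuming the acute and grave accents are the ANSEL bytes 0xE2 and 0xE1 and that Notepad is interpreting the file as Windows-1252 (the helper name as_notepad_shows is made up for illustration):

```python
# MARC-8 (ANSEL) combining diacritics come BEFORE the base letter.
MARC8_ACUTE = b"\xe2"  # combining acute accent in MARC-8
MARC8_GRAVE = b"\xe1"  # combining grave accent in MARC-8


def as_notepad_shows(raw: bytes) -> str:
    """Interpret raw bytes as Windows-1252 (assumed to be what Notepad does)."""
    return raw.decode("cp1252")


# "légende" and "à Rennes" as MARC-8 bytes: accent byte first, then the letter.
legende = b"l" + MARC8_ACUTE + b"egende"
a_rennes = MARC8_GRAVE + b"a Rennes"

shown_legende = as_notepad_shows(legende)    # "lâegende"
shown_a_rennes = as_notepad_shows(a_rennes)  # "áa Rennes"
```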

In your case, it looks like the records are encoded in something else. I'm not overly familiar with character encoding issues, but it looks like the standard Windows encoding (Windows-1252) or something similar (ISO 8859-1?):
une légende en devenir : exposition présentée aux Champs Libres à Rennes
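
One plausible explanation for every accented letter turning into the same &#xFFFD; (the Unicode replacement character, U+FFFD): if single-byte Windows-1252 text is read as though it were UTF-8, each accented byte is an invalid sequence and gets swapped for the same placeholder. The Python sketch below illustrates that mechanism only; it is not a description of what MarcEdit does internally, and the function names are hypothetical.

```python
# A phrase from the vendor record, encoded the way the file is suspected to be.
SAMPLE = "se lancent à la conquête du Graal"
legacy_bytes = SAMPLE.encode("cp1252")  # one byte per accented letter


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def read_as_utf8_lossy(raw: bytes) -> str:
    """Decode as UTF-8, replacing every invalid byte with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Different accented letters all collapse to the same replacement character.
mangled = read_as_utf8_lossy(legacy_bytes)
```

A quick check such as is_valid_utf8 on the raw field bytes is one way to tell whether a file is really UTF-8 before choosing a conversion.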

Unfortunately, I'm not familiar with a user-friendly way of converting from that encoding to MARC-8/UTF-8.

The description of Yaz, listed at [http://www.loc.gov/marc/marctools.html], says that it converts records between "the following encodings: UTF-8, ISO-8859-1, ISO-5426, ISO-5428, Danmarc2 + all encodings supported by a local iconv library."
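
For the text itself, going from the suspected encoding to UTF-8 is a decode followed by an encode. A minimal Python sketch, assuming the source really is Windows-1252 (the function name is hypothetical). It should not be run over a whole .mrc file: the leader and directory of a MARC record store byte lengths and offsets, and an accented letter takes more bytes in UTF-8 than in a single-byte encoding, so a MARC-aware tool has to rebuild the record around the converted text.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode Windows-1252 text as UTF-8 (text only, not a full MARC record)."""
    return raw.decode("cp1252").encode("utf-8")


# Field text as it would sit in a Windows-1252 file: one byte per accented letter.
field_text = b"exposition pr\xe9sent\xe9e aux Champs Libres \xe0 Rennes"
converted = cp1252_to_utf8(field_text)

# Each of the three accented letters takes one extra byte in UTF-8, which is
# why stored field lengths would no longer match after a naive conversion.
growth = len(converted) - len(field_text)
```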

Sorry I can't be more help,

Bryan Baldus
Senior Cataloger
Quality Books Inc.
The Best of America's Independent Presses
1-800-323-4241x402
[log in to unmask]

