At 12:25 AM 11/7/2011 +0000, Wilson, Margaret wrote:
>If I understand this thread correctly, I have a similar problem with a
>Films on Demand file I'm working on. Some, but not all, apostrophes are
>encoded incorrectly throughout the file. Would the search you are adding
>enable me to find those records?
The "apostrophe problem" (Basic Latin 0027) is complex, as there are
multiple character codes in multiple character sets. MarcEdit (and Koha, as
well as other MySQL-based applications) try to resolve this by using UTF-8,
which should allow the necessary rigour, hence the need for conversion. The
most common starting point for errors is CP-1252 (the default Windows code
page, loosely called "ANSI", which if confused with ISO-8859-1 gives errors
interpreting Microsoft "smartquotes"), which uses a curly right single
quote (byte 0x92, Unicode 2019; often but not always converted correctly
to UTF-8), but there are many more: (Unicode) 02B9 is a modifier letter
prime, 02BC is a modifier letter apostrophe, 2032 is a prime, A78C is a
Latin small letter saltillo, etc.
All of these can be found, used as apostrophes, in various biblio resources
that we have come across; our technical policy is that a very robust
conversion to UTF-8 is required on all biblios from unknown and
Windows-produced sources. Some or most of this can be automated (macros in
OOo/LibreOffice are our method of choice), but a good "Mark 1 eyeball" (no
pun intended) over the finished file can find a glitch or two capable of
giving MarcEdit a dose of indigestion -- gedit (Linux), Vim (*nix and now
Windows) and TextPad (Windows) are all capable of sophisticated search and
editing.
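
For anyone who would rather script the hunt than eyeball it, here is a
minimal Python sketch of the idea. It is not what MarcEdit or our macros do.
The function names are made up for illustration, the list of look-alikes is
only the code points named above plus the two CP-1252 single smartquotes,
and the "try UTF-8, fall back to CP-1252" rule is an assumption about the
source files. Review the hits before replacing anything, since some of these
characters are legitimate in romanized data.

```python
import unicodedata

# Code points named in this message, plus the CP-1252 "smartquote"
# single quotes (U+2018, U+2019). Extend the set as new ones turn up.
LOOKALIKES = {"\u2018", "\u2019", "\u02b9", "\u02bc", "\u2032", "\ua78c"}


def decode_unknown(raw):
    """Decode bytes as UTF-8, falling back to CP-1252 (an assumption)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # CP-1252 leaves a few bytes undefined, hence errors="replace".
        return raw.decode("cp1252", errors="replace")


def find_lookalikes(text):
    """Return (offset, 'U+XXXX', Unicode name) for each look-alike found."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in LOOKALIKES
    ]


def normalize_apostrophes(text):
    """Replace every look-alike with Basic Latin U+0027."""
    return text.translate({ord(ch): "'" for ch in LOOKALIKES})


if __name__ == "__main__":
    raw = b"Films on Demand\x92s"      # 0x92 is the CP-1252 curly apostrophe
    text = decode_unknown(raw)
    print(find_lookalikes(text))       # [(15, 'U+2019', 'RIGHT SINGLE QUOTATION MARK')]
    print(normalize_apostrophes(text)) # Films on Demand's
```

Running find_lookalikes() first gives a list to check by eye, which keeps
the "Mark 1 eyeball" in the loop before normalize_apostrophes() changes
anything.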
I'm sure Terry is on top of all of this -- I'm just saying that it's a
complex matter that must be resolved before any data can be handled by
MySQL, Perl, PHP, etc., as the apostrophe is a necessary part of their
syntax (a string delimiter) and must be escaped or treated differently if
it's part of the data.
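
To illustrate the "escaped or treated differently" point, here is a small
sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an
assumption made so the example runs anywhere; the table name and function
name are hypothetical). Passing the title as a bound parameter means the
apostrophe is treated as data and never as a string delimiter. Placeholder
syntax differs between database drivers, so check the one you use.

```python
import sqlite3


def store_and_fetch(title):
    """Insert a title via a bound parameter and read it back unchanged."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE biblio (title TEXT)")
        # The "?" placeholder keeps the apostrophe out of the SQL text.
        conn.execute("INSERT INTO biblio (title) VALUES (?)", (title,))
        row = conn.execute(
            "SELECT title FROM biblio WHERE title = ?", (title,)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(store_and_fetch("The Handmaid's Tale"))  # The Handmaid's Tale
```

Building the same statement by pasting the title between quote marks would
end the string at the apostrophe, which is exactly the failure described
above.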
</end of rant about apostrophes>
Paul
Tired old sys-admin
________________________________________________________________________
This message comes to you via MARCEDIT-L, a Listserv(R) list for technical and instructional support in MarcEdit.