MARCEDIT-L Archives

November 2011

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
From:
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Mon, 7 Nov 2011 10:39:55 -0500
At 12:25 AM 11/7/2011 +0000, Wilson, Margaret wrote:
>If I understand this thread correctly, I have a similar problem with a 
>Films on Demand file I'm working on.   Some, but not all, apostrophes are 
>encoded incorrectly throughout the file.  Would the search you are adding 
>enable me to find those records?

The "apostrophe problem" (Basic Latin 0027) is complex because several
different character codes, in several character sets, get used as
apostrophes.  MarcEdit (like Koha and other MySQL-based applications) tries
to resolve this by using UTF-8, which should allow the necessary rigour,
hence the need for conversion.  The most common starting point for errors is
CP-1252 (the default Windows code page, which, if confused with ISO-8859-1
or, worse, loosely labelled "ANSI", gives errors interpreting Microsoft
"smartquotes").  CP-1252 uses a curly single quote as the apostrophe (byte
0x92, Unicode 2019; often but not always converted to 0027), but there are
many more: (Unicode) 02B9 is a modifier letter prime, 02BC is a modifier
letter apostrophe, 2032 is a prime, A78C is a latin small letter saltillo, etc.
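
To make the code-page confusion concrete, here is a minimal Python sketch. It decodes the same byte string two ways and then folds the look-alikes listed above to Basic Latin 0027. The table and the name normalize_apostrophes are hypothetical illustrations, not anything MarcEdit or Koha provides, and whether folding is appropriate at all is a local policy decision, since some of these characters are genuine letters in some languages.

```python
# Hypothetical folding table: the look-alikes named above, plus the two
# curly single quotes that CP-1252 keeps at bytes 0x91 and 0x92.
APOSTROPHE_LIKE = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK (CP-1252 byte 0x91)
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (CP-1252 byte 0x92)
    "\u02b9": "'",  # MODIFIER LETTER PRIME
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
    "\u2032": "'",  # PRIME
    "\ua78c": "'",  # LATIN SMALL LETTER SALTILLO
}
_TABLE = str.maketrans(APOSTROPHE_LIKE)


def normalize_apostrophes(text):
    """Replace every character in APOSTROPHE_LIKE with U+0027."""
    return text.translate(_TABLE)


# The same three bytes, read with two different assumptions.
raw = b"It\x92s"
as_cp1252 = raw.decode("cp1252")   # 0x92 becomes U+2019, a curly quote
as_latin1 = raw.decode("latin-1")  # 0x92 becomes U+0092, a control code

print(ascii(as_cp1252), ascii(as_latin1))
print(normalize_apostrophes(as_cp1252))
print(as_cp1252.encode("utf-8"))   # the bytes a UTF-8 file should contain
```

Note that the ISO-8859-1 reading does not raise an error; it silently produces an invisible control character, which is why these records are hard to spot by eye.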

All of these can be found, used as apostrophes, in various biblio resources
that we have come across; our technical policy is that a very robust
conversion to UTF-8 is required on all biblios from unknown and
Windows-produced sources.  Some, if not most, of this can be automated
(macros in OpenOffice.org or LibreOffice are our method of choice), but a
good "Mark 1 eyeball" (no pun intended) over the finished file can find a
glitch or two capable of giving MarcEdit a dose of indigestion.  Gedit
(Linux), Vim (*nix and now Windows) and TextPad (Windows) are all capable
of sophisticated search and editing.
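
As an aid to the eyeball pass, a short script can list where the suspect characters sit. This is only a sketch in Python, not the macro approach described above: the character class is an assumption built from the code points named earlier, plus the C1 control range (what CP-1252 smartquotes turn into when read as ISO-8859-1) and U+FFFD, and find_suspects is a hypothetical name.

```python
import re
import unicodedata

# Assumed list of suspects: C1 controls, curly single quotes, the
# look-alikes named earlier, and the Unicode replacement character.
SUSPECT = re.compile("[\u0080-\u009f\u2018\u2019\u02b9\u02bc\u2032\ua78c\ufffd]")


def find_suspects(lines):
    """Return (line, column, code point, name) for each suspect character.

    Line and column numbers are 1-based, to match what a text editor shows.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for match in SUSPECT.finditer(line):
            ch = match.group()
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((lineno, match.start() + 1, f"U+{ord(ch):04X}", name))
    return hits


sample = ["A plain title", "It\u2019s a Wonderful Life", "Don\x92t Look Now"]
for hit in find_suspects(sample):
    print(hit)
```

To run this over a real export, the file would first have to be opened with the encoding it is believed to be in; the sample list here only stands in for that.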

I'm sure Terry is on top of all of this; I'm just saying that it's a
complex matter that must be resolved before any data can be handled by
MySQL, Perl, PHP, etc., because there the apostrophe is part of the syntax
(it delimits strings) and so must be escaped or otherwise treated
differently when it's part of the data.
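
A small sketch of the two usual ways of handling an apostrophe that is data rather than syntax. It is written in Python with the standard-library sqlite3 module standing in for MySQL purely so that it runs without a server; the table name, the function names and the "?" placeholder style are assumptions of this example (MySQL drivers commonly use "%s" instead).

```python
import sqlite3


def store_and_fetch(title):
    """Round-trip a title through SQL using bound parameters.

    The driver passes the value separately from the statement, so an
    apostrophe in the data never meets the SQL string delimiter.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE biblio (title TEXT)")
    conn.execute("INSERT INTO biblio (title) VALUES (?)", (title,))
    row = conn.execute(
        "SELECT title FROM biblio WHERE title = ?", (title,)
    ).fetchone()
    conn.close()
    return row[0]


def sql_quote(value):
    """Standard SQL escaping: double each apostrophe inside the literal."""
    return "'" + value.replace("'", "''") + "'"


print(store_and_fetch("Gulliver's Travels"))
print(sql_quote("O'Brien"))
```

Bound parameters are generally the safer of the two, since nothing has to be escaped by hand. Neither approach touches the look-alike characters, which pass through untouched and still need the conversion discussed above.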

</end of rant about apostrophes>

Paul
Tired old sys-admin 

________________________________________________________________________

This message comes to you via MARCEDIT-L, a Listserv(R) list for technical and instructional support in MarcEdit.  If you wish to communicate directly with the list owners, write to [log in to unmask] To unsubscribe, send a message "SIGNOFF MARCEDIT-L" to [log in to unmask]
