MARCEDIT-L Archives

May 2015

MARCEDIT-L@LISTSERV.GMU.EDU

Subject: Re: encoding of marc strings from MARCEngine5.Query.Z3950Search
From: Terry Reese <[log in to unmask]>
Reply To: MarcEdit support in technical and instructional matters <[log in to unmask]>
Date: Thu, 21 May 2015 23:47:40 -0400

2 - That makes sense.  My system defaults to UTF8, so Encoding.Default gives
a different value.  However, I've always found that when working with the
StreamReaders/Writers, you should always set your encoding explicitly,
because Encoding.Default will not be the same from machine to machine.
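
To illustrate why the platform default encoding (Encoding.Default in .NET) is risky here, the sketch below shows the same string producing a different number of bytes under two encodings. The class name, method name, and sample text are hypothetical and not part of MarcEdit; the value printed for the default encoding depends on the machine and runtime.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Number of bytes the given text occupies once encoded.
    public static int ByteLength(string text, Encoding encoding) => encoding.GetByteCount(text);

    public static void Main()
    {
        // Varies by machine and runtime, so it should not be relied on.
        Console.WriteLine(Encoding.Default.WebName);

        string text = "caf\u00E9"; // hypothetical sample: "café"
        Console.WriteLine(ByteLength(text, Encoding.UTF8));                      // 5
        Console.WriteLine(ByteLength(text, Encoding.GetEncoding("ISO-8859-1"))); // 4
    }
}
```

Passing the encoding explicitly to the reader or writer removes that dependency on the machine.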

3 -- I'll have to check; I changed how the values are queued internally, so
this may not be set.  There is a different function that will actually
return records as an array, which is the easiest way to expose the count.
But I'll see why this value doesn't appear to be set.

--tr

-----Original Message-----
From: MarcEdit support in technical and instructional matters
[mailto:[log in to unmask]] On Behalf Of Brown, Alan
Sent: Thursday, May 21, 2015 9:39 AM
To: [log in to unmask]
Subject: Re: [MARCEDIT-L] encoding of marc strings from
MARCEngine5.Query.Z3950Search

Thanks Terry,

1, We did ask Innovative to change the encoding. Our db is UTF-8, and when
checked using yaz-client the MARC is UTF-8.

2, Tried a few ways of writing to file, StreamWriter, File.WriteAllText. I
don't think I had a BOM problem, the issue appeared to be that the system
was encoding a utf-8 string as latin-1 (which I guess is Encoding.Default
for this computer).

What worked for me in the end was

byte[] bytes = Encoding.Default.GetBytes(mrc);
File.WriteAllBytes(found, bytes);

This works for both MARC-8 and UTF-8 targets. It became a bit clearer to me
when I rewrote the code in Perl and didn't specify any encoding for the
output filehandle.
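
A minimal sketch of why the GetBytes approach above preserves the record, assuming the string came from decoding the wire bytes one byte per character as Latin-1 (as guessed above). ISO-8859-1 is named explicitly here instead of Encoding.Default, so the result does not depend on the machine. The class, method, and sample bytes are hypothetical.

```csharp
using System;
using System.Text;

public static class RawMarcBytes
{
    // ISO-8859-1 maps each byte 0x00-0xFF to the code point of the same value,
    // so decoding and re-encoding with it is lossless for arbitrary bytes.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Recover the original bytes from a string that was decoded byte-per-char.
    public static byte[] ToOriginalBytes(string mrc)
    {
        return Latin1.GetBytes(mrc);
    }

    public static void Main()
    {
        // Hypothetical sample: "café" in UTF-8, where é is the two bytes C3 A9.
        byte[] wire = { 0x63, 0x61, 0x66, 0xC3, 0xA9 };
        string mrc = Latin1.GetString(wire); // byte-per-char decode

        byte[] recovered = ToOriginalBytes(mrc);
        byte[] reencoded = Encoding.UTF8.GetBytes(mrc); // what a UTF-8 text writer would emit

        Console.WriteLine(BitConverter.ToString(recovered)); // 63-61-66-C3-A9
        Console.WriteLine(BitConverter.ToString(reencoded)); // 63-61-66-C3-83-C2-A9
    }
}
```

The second line of output shows the extra bytes that appear when the same string is run through a UTF-8 text writer.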

3, I cannot seem to get Query.Hits to display anything other than zero. If
I do

string mrc = query.Z3950Search("9781119010173", 7);
int hits = query.Hits;
Console.WriteLine(hits);

I get 0 even though there is a hit. Am I missing something here as well?

Sorry for all of the questions.
regards

Alan


-----Original Message-----
From: Terry Reese [mailto:[log in to unmask]]
Sent: 20 May 2015 16:25
Subject: Re: encoding of marc strings from MARCEngine5.Query.Z3950Search

This is coming from Innovative -- so two comments -- though I think the
second is the more applicable:

1) Are you sure Innovative is sending you the data in UTF8?  By default,
Innovative doesn't -- it provides data in ISO-8859-1 unless you specifically
ask them to change the encoding.  So, your database may be UTF8 (or MARC8),
but your Z39.50 data would not be.  I'd confirm that the string is actually
what you think that it is.  

2) The second thing I would look at is how you are writing your file.  The
StreamWriter class writes the BOM characters at the beginning of a UTF8
encoded file when it is given Encoding.UTF8 -- this will invalidate a MARC
file, as these characters are not part of the spec.  When creating the
StreamWriter, you have to specifically tell it not to output the BOM
characters automatically.  The reason the Streaming functions work is that,
by default, MarcEdit filters those characters (and a few others) from the
data, since they are not valid.  If you are reading and writing files
directly, you will need to do that filtering yourself.
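
One way to write UTF8 text without the BOM is to pass a UTF8Encoding constructed with its BOM flag set to false, as sketched below. The class name, method name, and sample text are hypothetical, and the sketch assumes the text is already a correctly decoded .NET string.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomFreeWriter
{
    // Write text as UTF-8 with no leading EF BB BF bytes.
    public static void WriteUtf8NoBom(string path, string text)
    {
        // UTF8Encoding(false) has an empty preamble, so StreamWriter emits no BOM.
        var utf8NoBom = new UTF8Encoding(false);
        using (var writer = new StreamWriter(path, false, utf8NoBom))
        {
            writer.Write(text);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        WriteUtf8NoBom(path, "abc"); // hypothetical sample text

        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine(bytes.Length);                       // 3
        Console.WriteLine(Encoding.UTF8.GetPreamble().Length); // 3 (the BOM that Encoding.UTF8 would add)
        File.Delete(path);
    }
}
```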


--tr

-----Original Message-----
From: MarcEdit support in technical and instructional matters
[mailto:[log in to unmask]] On Behalf Of Brown, Alan
Sent: Wednesday, May 20, 2015 4:03 AM
To: [log in to unmask]
Subject: [MARCEDIT-L] encoding of marc strings from
MARCEngine5.Query.Z3950Search

Hi,

This is probably me, but I am struggling a bit with how to deal with the
output of a search in C#, e.g. a simple ISBN search.

string mrc = query.Z3950Search("9781119010173", 7);

Our library z3950 is UTF-8. But if I write this mrc to a file or just write
to console the record is invalid. In order for this to be a valid record I
need to change the encoding of the string from the default to UTF-8. After a
bit of googling I got

byte[] bytes = Encoding.Default.GetBytes(mrc);
mrc = Encoding.UTF8.GetString(bytes);

This string writes to file nicely and looks right in the console. Everything
would be fine except our Bibliographic data supplier uses MARC8 and for
this, the mrc string seems to write to console OK, but when writing to file
I get an invalid record with some unexpected characters again. Is there a
similar way to encode this string so it produces a valid record?

The MARCEngine5.MARC21.MARC2stream method appears to do the right thing
regardless of the encoding of the z39 target. However, I would like to work
out how to save the raw MARC correctly.

Sample code below

--snip--
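using System;
using System.IO;
using System.Text;   // only needed if the commented-out lines are enabled
using MARCEngine5;   // assumed namespace for Query and MARC21

public class Z3950Sample   // wrapper class name is illustrative
{
    public static void Main()
    {
        string path = "c:\\utils\\scripts\\files\\";
        string found = path + "found.mrc";

        // Configure the Z39.50 target.
        Query query = new Query();
        query.Start = 0;
        query.Limit = 1;
        query.Host = "library.bury.gov.uk";
        query.Database = "INNOPAC";
        query.Port = 210;
        query.Syntax = "USMARC";

        // ISBN search; the raw MARC record comes back as a string.
        string mrc = query.Z3950Search("9781119010173", 7);

        // Writing the string directly is what produces the invalid record.
        using (StreamWriter writer = new StreamWriter(found)) {
            writer.Write(mrc);
        }
        Console.WriteLine(mrc);

        // Round trip through the mnemonic format, which appears to work.
        MARC21 rec = new MARC21();
        string mn = rec.MARC2Stream(mrc);
        Console.WriteLine(mn);
        mrc = rec.Mnemonic2Stream(mn);
        Console.WriteLine(mrc);

        //byte[] bytes = Encoding.Default.GetBytes(mrc);
        //mrc = Encoding.UTF8.GetString(bytes);
    }
}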

-----------------------------------------------------------------
Why not visit our website www.bury.gov.uk
-----------------------------------------------------------------
Incoming and outgoing e-mail messages are routinely monitored for compliance
with our information security policy.
The information contained in this e-mail and any files transmitted with it
is for the intended recipient(s) alone. It may contain confidential
information that is exempt from the disclosure under English law and may
also be covered by legal,professional or other privilege.
If you are not the intended recipient, you must not copy, distribute or take
any action in reliance on it. 
If you have received this e-mail in error, please notify us immediately by
using the reply facility on your e-mail system.
If this message is being transmitted over the Internet, be aware that it may
be intercepted by third parties.
As a public body, the Council may be required to disclose this e-mail or any
response to it under the Freedom of Information Act 2000 unless the
information in it is covered by one of the exemptions in the Act.  
Electronic service accepted only at [log in to unmask] and on fax
number
0161 253 5119 .
*************************************************************

________________________________________________________________________


This message comes to you via MARCEDIT-L, a Listserv(R) list for technical
and instructional support in MarcEdit.  If you wish to communicate directly
with the list owners, write to [log in to unmask] To
unsubscribe, send a message "SIGNOFF MARCEDIT-L" to
[log in to unmask]
