MARCEDIT-L Archives

May 2015

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
From:
"Brown, Alan" <[log in to unmask]>
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Thu, 21 May 2015 14:39:08 +0100
Thanks Terry,

1. We did ask Innovative to change the encoding. Our database is UTF-8, and when checked with yaz-client the MARC is UTF-8.

2. I tried a few ways of writing to file: StreamWriter and File.WriteAllText. I don't think I had a BOM problem; the issue appeared to be that the system was encoding a UTF-8 string as Latin-1 (which I guess is Encoding.Default for this computer).

What worked for me in the end was:

byte[] bytes = Encoding.Default.GetBytes(mrc);
File.WriteAllBytes(found, bytes);

This works for both MARC-8 and UTF-8 targets. It became a bit clearer to me when I rewrote the code in Perl and didn't specify any encoding for the output filehandle.
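
A minimal sketch of why the byte-level write may behave differently from a text writer. It assumes the string returned by the search holds one char per raw byte, which is an assumption and not something confirmed for MarcEdit. ISO-8859-1 stands in for Encoding.Default here, and the class and method names are hypothetical.

```csharp
using System.Text;

// Hypothetical helper for illustration; not part of MarcEdit.
public static class RawMarcBytes
{
    // ISO-8859-1 maps bytes 0x00-0xFF one-to-one onto chars U+0000-U+00FF.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Assumption: each char of mrc carries exactly one original byte,
    // so a single-byte encoding gives those bytes back unchanged.
    public static byte[] ToRawBytes(string mrc)
    {
        return Latin1.GetBytes(mrc);
    }

    // The bytes a UTF-8 text writer would produce for the same string:
    // every char above 0x7F becomes two bytes.
    public static byte[] ToUtf8Text(string mrc)
    {
        return new UTF8Encoding(false).GetBytes(mrc);
    }
}
```

For an "é" that arrived as the UTF-8 bytes C3 A9, ToRawBytes returns the same two bytes, while ToUtf8Text returns four (C3 83 C2 A9), which would also throw off the record lengths in the MARC leader and directory.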

3. I cannot seem to get Query.Hits to return anything other than zero. If I do

string mrc = query.Z3950Search("9781119010173", 7);
int hits = query.Hits;
Console.WriteLine(hits);

I get 0 even though there is a hit. Am I missing something here as well?

Sorry for all of the questions.
regards

Alan


-----Original Message-----
From: Terry Reese [mailto:[log in to unmask]] 
Sent: 20 May 2015 16:25
Subject: Re: encoding of marc strings from MARCEngine5.Query.Z3950Search

This is coming from Innovative -- so two comments -- though I think the second is the most applicable:

1) Are you sure Innovative is sending you the data in UTF8?  By default, Innovative doesn't -- it provides data in ISO-8859-1 unless you specifically ask them to change the encoding.  So, your database may be UTF8 (or MARC8), but your Z39.50 data would not be.  I'd confirm that the string is actually what you think that it is.  

2) The second thing I would look at is how you are writing your file.  The StreamWriter class always includes the BOM characters at the beginning of a UTF8 encoded file -- this will invalidate a MARC file as these characters are not part of the spec.  When creating the StreamWriter class, you have to specifically tell the class not to automatically output the BOM characters.  The reason the Streaming functions work is that, by default, MarcEdit filters those characters (and a few others) from the data since they are not valid.  If you are reading and writing to files directly -- you will need to do that process yourself.
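
A minimal sketch of the BOM point, using only standard .NET classes. A StreamWriter given Encoding.UTF8 writes the three-byte preamble EF BB BF, while one given new UTF8Encoding(false) does not. The helper name is hypothetical, and a MemoryStream stands in for the output file.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of MarcEdit.
public static class BomDemo
{
    // Returns the bytes a StreamWriter emits for text with the given encoding.
    public static byte[] Write(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            }
            // ToArray is still valid after the writer has closed the stream.
            return stream.ToArray();
        }
    }
}
```

When writing to a file, the equivalent would be something like new StreamWriter(found, false, new UTF8Encoding(false)).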


--tr

-----Original Message-----
From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Brown, Alan
Sent: Wednesday, May 20, 2015 4:03 AM
To: [log in to unmask]
Subject: [MARCEDIT-L] encoding of marc strings from MARCEngine5.Query.Z3950Search

Hi,

This is probably me, but I am struggling a bit with how to deal with the output of a search in C#, e.g. a simple ISBN search.

string mrc = query.Z3950Search("9781119010173", 7);

Our library's Z39.50 server is UTF-8. But if I write this mrc string to a file, or just write it to the console, the record is invalid. For the record to be valid I need to change the encoding of the string from the default to UTF-8. After a bit of googling I got

byte[] bytes = Encoding.Default.GetBytes(mrc);
mrc = Encoding.UTF8.GetString(bytes);

This string writes to file nicely and looks right in the console. Everything would be fine, except that our bibliographic data supplier uses MARC8. For that target the mrc string seems to write to the console OK, but when writing to file I again get an invalid record with some unexpected characters. Is there a similar way to encode this string so that it produces a valid record?
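
A minimal sketch of one possible reason the same conversion fails for MARC8 data. It assumes the returned string carries one char per raw byte, and uses ISO-8859-1 in place of Encoding.Default; both are assumptions, and the class name is hypothetical. Bytes that are not well-formed UTF-8 are replaced with U+FFFD during decoding, so they cannot be recovered afterwards.

```csharp
using System.Text;

// Hypothetical helper for illustration; not part of MarcEdit.
public static class RedecodeDemo
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Mirrors the two-step conversion: string -> bytes -> decode as UTF-8.
    public static string RedecodeAsUtf8(string mrc)
    {
        byte[] bytes = Latin1.GetBytes(mrc);
        // Invalid UTF-8 byte sequences decode to U+FFFD (replacement char).
        return Encoding.UTF8.GetString(bytes);
    }
}
```

The UTF-8 bytes C3 A9 decode to "é" as intended. A high byte followed directly by an ASCII letter, such as E2 65 (the pattern MARC8 uses for a diacritic placed before its base letter), decodes to U+FFFD followed by "e", and the original E2 byte is lost.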

The MARCEngine5.MARC21.MARC2Stream method appears to do the right thing regardless of the encoding of the Z39.50 target. However, I would like to work out how to save the raw MARC correctly.

Sample code below

--snip--
        public static void Main()
        {
            string path = "c:\\utils\\scripts\\files\\";
            string found = path + "found.mrc";

            Query query = new Query();
            query.Start = 0;
            query.Limit = 1;
            query.Host = "library.bury.gov.uk";
            query.Database = "INNOPAC";
            query.Port = 210;
            query.Syntax = "USMARC";

            string mrc = query.Z3950Search("9781119010173", 7);

            using (StreamWriter writer = new StreamWriter(found)) {
                writer.Write(mrc);
            }
            Console.WriteLine(mrc);

            MARC21 rec = new MARC21();
            string mn = rec.MARC2Stream(mrc);
            Console.WriteLine(mn);
            mrc = rec.Mnemonic2Stream(mn);
            Console.WriteLine(mrc);

            //byte[] bytes = Encoding.Default.GetBytes(mrc);
            //mrc = Encoding.UTF8.GetString(bytes);
        }

-----------------------------------------------------------------
Why not visit our website www.bury.gov.uk
-----------------------------------------------------------------
Incoming and outgoing e-mail messages are routinely monitored for compliance with our information security policy.
The information contained in this e-mail and any files transmitted with it is for the intended recipient(s) alone. It may contain confidential information that is exempt from the disclosure under English law and may also be covered by legal,professional or other privilege.
If you are not the intended recipient, you must not copy, distribute or take any action in reliance on it. 
If you have received this e-mail in error, please notify us immediately by using the reply facility on your e-mail system.
If this message is being transmitted over the Internet, be aware that it may be intercepted by third parties.
As a public body, the Council may be required to disclose this e-mail or any response to it under the Freedom of Information Act 2000 unless the information in it is covered by one of the exemptions in the Act.  
Electronic service accepted only at [log in to unmask] and on fax number
0161 253 5119 .
*************************************************************

________________________________________________________________________


This message comes to you via MARCEDIT-L, a Listserv(R) list for technical and instructional support in MarcEdit.  If you wish to communicate directly with the list owners, write to [log in to unmask] To unsubscribe, send a message "SIGNOFF MARCEDIT-L" to [log in to unmask]
