Re: MarcXML and char encodings

From: Sheila M. Morrissey <Sheila.Morrissey_at_nyob> Date: Tue, 17 Apr 2012 16:22:53 -0400 To: CODE4LIB_at_LISTSERV.ND.EDU

In XML standard:

	It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using 	their registered names; other encodings SHOULD use names starting with an "x-" prefix. XML processors SHOULD match character encoding names in a case-insensitive way and SHOULD 	either interpret an IANA-registered name as the encoding registered at IANA for that name or treat it as unknown (processors are, of course, not required to support all IANA-	registered encodings).

As I suggested -- since MARC8 isn't (so far as I know) registered -- you won't get far with most standard tools, in whatever language -- you'll have to extend them to first recognize the encoding name, and second, decode the content.

smm

-----Original Message-----
From: Jonathan Rochkind [mailto:rochkind_at_jhu.edu] 
Sent: Tuesday, April 17, 2012 4:19 PM
To: Code for Libraries
Cc: Sheila M. Morrissey
Subject: Re: [CODE4LIB] MarcXML and char encodings

On 4/17/2012 3:01 PM, Sheila M. Morrissey wrote:
> No -- it is perfectly legal - -but you MUST declare the encoding to BE Marc8 in the XML prolog,

Wait, how canyou declare a Marc8 encoding in an XML 
decleration/prolog/whatever it's called?

The things that appear there need to be from a specific list, and I 
didn't think Marc8 was on that list?

Can you give me an example?  And, if you happen to have it, link to XML 
standard that says this is legal?