From the XML standard:
It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings SHOULD use names starting with an "x-" prefix. XML processors SHOULD match character encoding names in a case-insensitive way and SHOULD either interpret an IANA-registered name as the encoding registered at IANA for that name or treat it as unknown (processors are, of course, not required to support all IANA-registered encodings).
As I suggested, since MARC8 isn't (so far as I know) registered, you won't get far with most standard tools, in whatever language. You'll have to extend them, first to recognize the encoding name, and second to decode the content.
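To illustrate the two steps with one tool, here is a minimal sketch using Python's codec registry. The name "x-marc8" is a hypothetical label that follows the "x-" prefix convention quoted above, and the decode function is only a placeholder that handles the ASCII range; a real MARC-8 decoder would also have to handle escape sequences and the other character sets, which this does not attempt.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python's codec registry recognizes the encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def _marc8_decode(data, errors="strict"):
    # Placeholder only: treats the input as plain ASCII.
    # This is NOT a real MARC-8 decoder.
    raw = bytes(data)
    return raw.decode("ascii", errors), len(raw)


def _marc8_encode(text, errors="strict"):
    # Placeholder only: ASCII range, mirroring the decoder above.
    return text.encode("ascii", errors), len(text)


def _search(name):
    # The registry normalizes names before calling search functions;
    # depending on the Python version the hyphen may arrive as an underscore.
    if name in ("x-marc8", "x_marc8"):
        return codecs.CodecInfo(
            name="x-marc8", encode=_marc8_encode, decode=_marc8_decode
        )
    return None


# Step 0: out of the box, the name is unknown.
known_before = is_known_encoding("x-marc8")

# Step 1: make the tool recognize the encoding name.
codecs.register(_search)
known_after = is_known_encoding("x-marc8")

# Step 2: decode content through the registered (placeholder) codec.
decoded = b"Code4Lib".decode("x-marc8")
```

Whether a given XML parser will consult a registry like this for the name in the XML declaration depends on the parser, so that part would still need checking tool by tool.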
smm
-----Original Message-----
From: Jonathan Rochkind [mailto:rochkind_at_jhu.edu]
Sent: Tuesday, April 17, 2012 4:19 PM
To: Code for Libraries
Cc: Sheila M. Morrissey
Subject: Re: [CODE4LIB] MarcXML and char encodings
On 4/17/2012 3:01 PM, Sheila M. Morrissey wrote:
> No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8 in the XML prolog,
Wait, how can you declare a Marc8 encoding in an XML
declaration/prolog/whatever it's called?
The things that appear there need to be from a specific list, and I
didn't think Marc8 was on that list?
Can you give me an example? And, if you happen to have it, link to XML
standard that says this is legal?
Received on Tue Apr 17 2012 - 16:23:59 EDT