Fwd: [camms-ccaam] Common encoding errors

From: Andrew Cunningham <lang.support_at_nyob>
Date: Tue, 23 Feb 2016 15:45:48 +1100
To: CODE4LIB_at_LISTSERV.ND.EDU
On behalf of Charles Riley:

---------- Forwarded message ----------
From: Riley, Charles <charles.riley_at_yale.edu>
Date: 23 February 2016 at 05:37
Subject: [camms-ccaam] Common encoding errors
To: "voyager-l_at_listserv.nd.edu" <voyager-l_at_listserv.nd.edu>,
"lita-l_at_lists.ala.org" <lita-l_at_lists.ala.org>,
"camms-ccaam_at_lists.ala.org" <camms-ccaam_at_lists.ala.org>,
"ol-tech-bounces_at_archive.org" <ol-tech-bounces_at_archive.org>,
"ole.technical.usergroup_at_kuali.org" <ole.technical.usergroup_at_kuali.org>,
"autocat_at_listserv.syr.edu" <autocat_at_listserv.syr.edu>


Hi all,



This is something I’ve noticed happening fairly regularly, and probably
more often lately: a class of problems with records containing either
escaped entity references from HTML or XML (like ‘&nbsp;’), or accented
characters that have become corrupted in a data migration (like ‘français
<https://openlibrary.org/works/OL10004281W/Les_archets_fran%c3%83%c2%a7ais>’).  I was
asked by another librarian if I could point them to any resources that deal
with this class of issues, and rounded up a few that I thought would be
good to share.  Here’s what I came across, in terms of examples and
explanations for some of the more common cases:



http://markmcb.com/2011/11/07/replacing-ae%E2%80%9C-ae%E2%84%A2-aeoe-etc-with-utf-8-characters-in-ruby-on-rails/



https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
(But treat this list with caution when using it to build searches; a search
for ‘amp;’, for example, will return false positives.)



http://www.i18nqa.com/debug/utf8-debug.html (See also associated links on
this page.)
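
As a rough illustration of the two cases above, here is a minimal Python
sketch using only the standard library. The function names are made up for
this example. The repair step assumes the text was UTF-8 that was misread as
Windows-1252 exactly once; other migration paths will need a different
encoding pair. A match on something like ‘&amp;’ is only a candidate for
review, since it can be a legitimately escaped ampersand.

```python
import html
import re

# Named (&nbsp;), decimal (&#233;) and hexadecimal (&#xE9;) references.
# Requiring the leading '&' avoids hits on plain text such as "camp;".
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def find_entities(text):
    """Return every substring that looks like an entity reference."""
    return ENTITY_RE.findall(text)


def unescape_entities(text):
    """Decode entity references once; double-escaped text needs a second pass."""
    return html.unescape(text)


def repair_mojibake(text, wrong_encoding="cp1252"):
    """Undo one round of 'UTF-8 bytes read as wrong_encoding' (an assumption).

    Text that does not round-trip cleanly is returned unchanged, so strings
    that are already correct are left alone.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

For instance, repair_mojibake("français") returns "français", while a
string that is already correct, such as "français", comes back unchanged.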



Hope this helps!



Charles Riley



*Charles Riley*

*Interim Librarian for African Studies and Catalog Librarian*

*Sterling Memorial Library*

*Yale University*



*charles.riley_at_yale.edu*

*(203)432-7566 or (203)432-9301*







-- 
Andrew Cunningham
lang.support_at_gmail.com
Received on Mon Feb 22 2016 - 23:46:53 EST