As an aggregator of metadata about research datasets/collections, we (the
Australian National Data Service) currently treat the content of description
elements as xsd:string but strip out any tagging when rendering, so the result
is unformatted text (except for what can be done with spaces and returns).
Our aim is to be aggregated by larger discovery services and we already
support OpenSearch, SRU, RSS and OAI-PMH.
We want to support minimal markup because many of our contributors support
it in their own systems from which they are exporting metadata for us to
harvest. They would like us to preserve at least some formatting, e.g. lists,
super/subscripting for chemical compounds, emphasis, etc.
Is there consensus on best practice in this area and/or what is common
practice?
My initial reaction for our own aggregation and portal was to
(a) set a minimal subset of XHTML which we guarantee to pass through to our
portal display (are there any popular ones?)
(b) accept anything but strip out what's not in the minimal subset for display
content
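
To make (a) and (b) concrete, a filter along the following lines might be one
way to do it. This is only a rough sketch in Python using the standard library
html.parser module. The tag list, the names ALLOWED_TAGS and
filter_description, and the choice to discard attributes and script/style
content are all assumptions for illustration, not an agreed subset. It does
not repair unbalanced tags, so a vetted sanitising library would probably be
a safer choice in production.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical minimal subset; the real list would be a policy decision.
ALLOWED_TAGS = {"p", "br", "ul", "ol", "li", "em", "strong", "sub", "sup"}
VOID_TAGS = {"br"}                  # elements with no closing tag
DROP_CONTENT = {"script", "style"}  # drop these tags and the text inside them


class SubsetFilter(HTMLParser):
    """Pass allowlisted tags through (attributes removed), drop other tags
    but keep their text, and re-escape all text content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip += 1
        elif tag in ALLOWED_TAGS:
            # Attributes (onclick, style, ...) are deliberately not copied.
            self.parts.append(f"<{tag} />" if tag in VOID_TAGS else f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self._skip:
                self._skip -= 1
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            # Entities were decoded by the parser, so escape them again.
            self.parts.append(escape(data, quote=False))


def filter_description(markup: str) -> str:
    """Return markup reduced to the allowlisted subset."""
    parser = SubsetFilter()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<p onclick="x()">H<sub>2</sub>O &amp; <b>salt</b><script>alert(1)</script></p>'
    print(filter_description(sample))
```

With the sample above, the sub element survives, the onclick attribute and the
b tags are removed (the word inside b is kept), and the script element is
dropped together with its content.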
But what do we expose for others to harvest?
(a) exactly what was provided by the contributor
(b) what was provided but cleaned of possible malevolent tagging
(c) just text with all tagging stripped out
(d) what we ourselves render, i.e. the minimal subset (using the namespace of
the minimal subset)
Monica Omodei (formerly Berko)
Senior Research Analyst
Australian National Data Service
Received on Fri Jun 24 2011 - 00:39:39 EDT