Re: XML split and transform in Java

From: Chris Fitzpatrick <chrisfitzpat_at_nyob>
Date: Sun, 8 Sep 2013 20:22:24 +0200
To: CODE4LIB_at_LISTSERV.ND.EDU

Hi,

Would something like this work?

https://github.com/marc4j/marc4j/blob/master/src/org/marc4j/samples/StylesheetChainExample.java
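
If you'd rather stay with plain JAXP, here is a minimal sketch of the
read-one-record-and-hand-it-to-the-transform idea using StAX. Everything
named here is an assumption for illustration: the class RecordSplitter,
its methods, and the record element name are hypothetical, and I have not
tried it on namespaced input or with an XSLT processor other than the one
bundled with the JDK. It streams the file, so memory use should track the
size of one record rather than the whole document.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: splits a large XML stream on one element name and
// runs each matching element through an XSLT transform, one at a time.
class RecordSplitter {

    static void splitAndTransform(Reader xml, String recordName,
                                  Transformer transformer, Consumer<String> sink)
            throws XMLStreamException, TransformerException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.getEventType() == XMLStreamConstants.START_ELEMENT
                        && recordName.equals(reader.getLocalName())) {
                    // The transformer consumes just this element's subtree.
                    StringWriter out = new StringWriter();
                    transformer.transform(new StAXSource(reader),
                                          new StreamResult(out));
                    sink.accept(out.toString());
                    // Do not advance here: the transformer may already have
                    // moved the reader past the record's end tag, so the
                    // loop re-checks the current event instead.
                } else {
                    reader.next();
                }
            }
        } finally {
            reader.close();
        }
    }

    // Builds a reusable (not thread-safe) Transformer from a stylesheet string.
    static Transformer compile(String xslt) throws TransformerException {
        return TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
    }
}
```

The sink is just a callback, so it could write each transformed record to
a file or to whatever key/value store you settle on. In practice you would
pass a FileReader or similar rather than a string.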

On Sun, Sep 8, 2013 at 6:22 PM, Tod Olson <tod_at_uchicago.edu> wrote:

> code4lib,
>
> I'm looking for some advice on splitting and transforming XML data using
> Java. The context is writing a mixin for SolrMARC to enhance our bib data,
> bringing in table of contents and summary data. The data is in XML,
> isomorphic to MARCXML. I need to split it up, transform it, and store it
> for use at import time. I expect the input XML to be up to a few GB, so
> slurping the whole thing into a DOM seems questionable. I've done one
> implementation for a split-only version of the problem, but the transform
> requirement is causing me to rethink it.
>
> And maybe someone out there has already done this exact thing.
>
> I think the basic approach is to read a record from start tag to end tag,
> and create a reader/stream/whatever to hand exactly that record to the
> transform API. Lots of options for this: SAX, StAX events, or what have
> you. Any thoughts on what seems the most straightforward for this
> split-and-transform scenario would be welcome.
>
> On a related note, any thoughts on your favorite lightweight key/value
> pair persistent storage for Java would be welcome. I expect the data to be
> a little large for a serialized HashMap.
>
> Best,
>
> -Tod
>
>
> Tod Olson <tod_at_uchicago.edu>
> Systems Librarian
> University of Chicago Library
>
Received on Sun Sep 08 2013 - 14:22:45 EDT