Re: [Spam] Re:

From: Kevin S. Clarke <ksclarke_at_nyob>
Date: Mon, 9 Feb 2004 12:27:31 -0800
On Mon, 2004-02-09 at 11:41, Walter Lewis wrote:

>     One of the issues that I bumped into was that what passes for HTML in
> some email programs is [insert expletive of choice here].  Putting it in
> an XML data store was going to cause tons of validation errors.

Some success might be found with TagSoup:

It delivers SAX events from less than well-formed HTML.  It doesn't
correct validation or style problems though...  just provides a
consistent, well-formed interface to sloppy HTML.
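
To make the idea concrete, here is a rough sketch of "balanced events from
sloppy markup". It is not TagSoup (which is a Java SAX parser) and makes no
claim about how TagSoup works internally. It uses only Python's standard
html.parser module. The names BalancedEvents, VOID, and to_xml are made up
for this illustration, and the balancing rule is a naive assumption: close
whatever is still open, without using any knowledge of HTML content models.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never take a closing tag in HTML (partial list, for the sketch).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class BalancedEvents(HTMLParser):
    """Collect (kind, value, attrs) events whose start/end tags always balance."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []
        self.open_tags = []  # stack of element names still waiting for a close

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))
        if tag in VOID:
            # Void elements are closed immediately so the stream stays balanced.
            self.events.append(("end", tag, None))
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray close tag with no matching open: drop it
        # Close anything left open inside the element being closed.
        while self.open_tags:
            name = self.open_tags.pop()
            self.events.append(("end", name, None))
            if name == tag:
                break

    def handle_data(self, data):
        self.events.append(("text", data, None))

    def close(self):
        super().close()  # flush any buffered text first
        while self.open_tags:
            self.events.append(("end", self.open_tags.pop(), None))


def to_xml(html):
    """Serialize the balanced event stream as a well-formed XML fragment."""
    parser = BalancedEvents()
    parser.feed(html)
    parser.close()
    out = []
    for kind, value, attrs in parser.events:
        if kind == "start":
            # Valueless attributes (e.g. "disabled") reuse their name as value.
            attr_text = "".join(
                f" {k}={quoteattr(v if v is not None else k)}"
                for k, v in attrs.items()
            )
            out.append(f"<{value}{attr_text}>")
        elif kind == "end":
            out.append(f"</{value}>")
        else:
            out.append(escape(value))
    return "".join(out)


print(to_xml("<p>Hello <b>world<p class=x>Second & last"))
```

As in the description above, nothing gets validated or tidied here: the
nesting that comes out is whatever the naive rule produces, but every tag is
closed, attributes are quoted, and a bare "&" is escaped.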

An alternate approach, JTidy, will do a good job of fixing many
validation problems, but it may fail depending on how bad the HTML is.

TagSoup doesn't fail... "Just Keep[s] On Truckin'"

Kevin S. Clarke
Lane Medical Library, Stanford University
Received on Mon Feb 09 2004 - 15:38:39 EST