RPM Community Forums

Mailing List Message of <rpm-devel>

Anyone know of a tasteful LGPL HTML parser in C?

From: Jeff Johnson <n3npq@mac.com>
Date: Sat 09 Feb 2008 - 15:41:15 CET
Message-Id: <03111912-A3DE-49A1-9F00-8ED9C836D9D8@mac.com>
(aside) I first made this request 4+ years ago:
     https://lists.dulug.duke.edu/pipermail/rpm-devel/2004-November/000139.html

That's how long it's taken to restart rpm development, dealing with
issues like rpmrc files and NPTL in rpmdb and multilib and selinux and
forks and ...

Since 2004 I have managed to get back to the point where the href's
contained within a plain (non-DAV) URI need to be iterated for
Opendir/Glob functionality in rpmio.

The best (i.e. most maintainable and least surprising, imho) choice
proposed was -lxml2:

I have a modified testHTML.c from libxml2, and the HTML parser in
libxml2 can indeed be used.

There's also lhtml, a Lua HTML parser, around these days.

If it's up to me, I'm going to pare down testHTML.c to extract the
href's within, so that rpmio Opendir/Glob function over plain HTTP.

That does mean that libxml2 is mandatory if you want plain HTTP support.
neon already needs an XML parser (typically expat is used), but one could
in principle choose the already supported libxml2 for neon's use.

Any other ideas?

73 de Jeff
Received on Sat Feb 9 15:42:16 2008