RPM Community Forums

Mailing List Message of <rpm-devel>

Re: Anyone know of a tasteful LGPL HTML parser in C?

From: Jeff Johnson <n3npq@mac.com>
Date: Thu 14 Feb 2008 - 04:03:02 CET
Message-Id: <222D9270-BDB2-45E3-8512-149893852927@mac.com>

On Feb 13, 2008, at 9:32 PM, Jeff Johnson wrote:
>
> Dunno. More actual experience is needed, I'll hack up some scriptie
> tomorrow.
>
> (off the wall aside) I never would have dreamed that I would ever find
> colorized grep output useful. Adding --color to display the value that
> is matched by the pattern is so so so much less eye bleed.
>

Bingo!

Snooping at http://download.fedora.redhat.com/ traversals using rpmgrep,
I can see that "containers" (as in subdirectories that need traversing)
have HTML hrefs that look like

     ...<a href="Packages/">...

while "elements" (as in *.rpm files in a directory) have HTML hrefs
that look like

     ...<a href="zlib-1.2.3-17.fc9.i386.rpm">...

while random crufty pointless (i.e. don't bother traversing a la
Lstat(2)) URIs look like

     ...<a href="http://fedora.redhat.com/legal/">...

That heuristic alone is enough for me to get a proof-of-concept plain
HTTP traversal together using Opendir(3) and Glob(3).
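
For the record, here is a minimal sketch of that heuristic in C. This is
illustration only, not what is in rpm: the names href_kind and
classify_href are hypothetical, and the extra skip rules (rooted paths,
"../" parent links, "?..." sort links, "#..." fragments) are my
assumption about what typical directory listings emit, not something
observed above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for illustration; not part of any rpm API. */
enum href_kind {
    HREF_CONTAINER,   /* subdirectory that needs traversing */
    HREF_ELEMENT,     /* plain file in this directory, e.g. *.rpm */
    HREF_SKIP         /* cruft: don't bother traversing */
};

/* Classify one href value taken from a directory listing page. */
static enum href_kind classify_href(const char *href)
{
    size_t len;

    assert(href != NULL);
    len = strlen(href);

    /* Empty values and absolute URIs (scheme://...) are cruft. */
    if (len == 0 || strstr(href, "://") != NULL)
        return HREF_SKIP;

    /* Assumption: rooted paths, sort links and fragments are cruft too. */
    if (href[0] == '/' || href[0] == '?' || href[0] == '#')
        return HREF_SKIP;

    /* Assumption: "./" and "../" point at self or parent, so skip them. */
    if (href[0] == '.' && (href[1] == '/' || href[1] == '.'))
        return HREF_SKIP;

    /* A relative href with a trailing slash is a container. */
    if (href[len - 1] == '/')
        return HREF_CONTAINER;

    /* Any other relative href is treated as an element. */
    return HREF_ELEMENT;
}
```

With this, "Packages/" classifies as a container,
"zlib-1.2.3-17.fc9.i386.rpm" as an element, and
"http://fedora.redhat.com/legal/" gets skipped. Pulling the href values
out of the HTML in the first place is a separate problem and not shown.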

I'll worry about other goopiness and performance tuning using HTTP HEAD
requests later.

(aside) Amazing how useful --color is. I'm shocked, simply shocked,
that I have any interest whatsoever in colorized output spew.

73 de Jeff
Received on Thu Feb 14 04:03:15 2008