RPM Community Forums

Mailing List Message of <rpm-devel>

Re: rsyncable gzdio

From: Jeff Johnson <n3npq@mac.com>
Date: Wed 09 Jul 2008 - 14:02:41 CEST
Message-id: <40319142-AFE3-465C-8984-550EA3278EA0@mac.com>
Grrr, this is my 4th attempt at getting this reply delivered ...

On Jul 8, 2008, at 6:57 PM, Alexey Tourbin wrote:

> On Mon, Jul 07, 2008 at 11:44:07PM +0200, Jeff Johnson wrote:
>>     - make gzdio.c standalone.
>
> BTW, I have an rsyncable gzdio implementation (it does not require a
> patched zlib; one only has to call gzflush() at certain sync points).
>

I'd prefer not to carry an internal zlib.

You are absolutely correct that the --rsyncable issue comes down to
calling gzflush() at certain sync points.
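To make "calling gzflush() at certain sync points" concrete, here is a minimal C sketch of content-defined sync points. It assumes a rolling byte sum over a 4096-byte window, with a sync point wherever that sum is a multiple of the window size; this follows the general idea behind gzip's --rsyncable patch but is not claimed to match it exactly. The names RSYNC_WIN, rsync_state, rsync_update and rsync_find_syncs are hypothetical and come from neither rpmio nor zlib. At each returned offset a writer would call gzflush(gz, Z_FULL_FLUSH); per the zlib manual, that flush mode also resets the compression state, so output after the point does not depend on earlier input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical window size, mirroring the idea of gzip --rsyncable. */
#define RSYNC_WIN 4096u

/* Rolling state carried across successive writes. */
struct rsync_state {
    unsigned char window[RSYNC_WIN]; /* ring buffer of the last bytes seen */
    size_t pos;                      /* next slot to write; oldest byte once full */
    size_t filled;                   /* number of valid bytes in window */
    unsigned long sum;               /* sum of the bytes currently in window */
};

static void rsync_init(struct rsync_state *st)
{
    assert(st != NULL);
    memset(st, 0, sizeof *st);
}

/* Feed one byte; return 1 if a sync point falls right after it. */
static int rsync_update(struct rsync_state *st, unsigned char c)
{
    if (st->filled == RSYNC_WIN)
        st->sum -= st->window[st->pos];  /* drop the oldest byte */
    else
        st->filled++;
    st->window[st->pos] = c;
    st->sum += c;
    st->pos = (st->pos + 1) % RSYNC_WIN;
    /* Only decide once a full window is seen, so the decision depends on
     * the last RSYNC_WIN bytes alone, not on the offset from the start. */
    return st->filled == RSYNC_WIN && st->sum % RSYNC_WIN == 0;
}

/* Scan buf and store up to max sync offsets (byte counts from the start
 * of buf) where a writer would call gzflush(). Returns the total found. */
static size_t rsync_find_syncs(struct rsync_state *st,
                               const unsigned char *buf, size_t len,
                               size_t *offs, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (rsync_update(st, buf[i])) {
            if (n < max)
                offs[n] = i + 1;
            n++;
        }
    }
    return n;
}
```

Because each decision looks only at the last 4096 input bytes, inserting a byte early in a payload shifts later sync points by one rather than moving them arbitrarily. With the compressor state reset at each point, the compressed output after the next common sync point can match the old file, which is what lets rsync reuse those blocks.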

> It is known to work well.  Please review the patches and tell me
> whether you want it or not.
> http://git.altlinux.org/people/at/packages/rpm.git?a=commitdiff;h=c761902b
> http://git.altlinux.org/people/at/packages/rpm.git?a=commitdiff;h=f7b5ee1e
> http://git.altlinux.org/people/at/packages/rpm.git?a=commitdiff;h=8d5e355e
> http://git.altlinux.org/people/at/packages/rpm.git?a=commitdiff;h=52b2499a

OTOH, here are other fundamental issues that are relevant:

0) No one understands why --rsyncable is important, why gzip != zlib,
or why the "fuzzy" name patch in rsync would be a tremendous bandwidth
saving for *.rpm packages. I've been tracking the issue for 6+ years,
and what is fundamentally needed is a very clear demonstration,
including publicized benchmarks and likely a drop-in "production"-ready
transport implementation, for any --rsyncable code to be worth the
effort. JMHO, based on 6+ years of explaining ...

So "Using unpatched zlib external!" is the wrong reason to justify the
changes. One needs to make the fundamental reasons for the changes more
obvious to end lusers.

1) Your changes add 2-3 new wrapper layers (at least cpio/rsync) to an
already complex code base (with stacked I/O handlers accessed through
libio, and with emulated libio for non-glibc portability).

I personally dunno whether I can debug rpm problems if your additional
patches are added.

(OTOH, I have no problems at all with an alternative gzdio
implementation, or with #ifdef'd additions, just not in the main
"production" RPM code path please).

2) From a fundamental coding design POV, adding --rsyncable to rpmio
code is just plain wrong. That's as true for the patched internal zlib
as it is for your gzdio patches. When gzflush() is called is
fundamentally a compression and rsync transport issue, not a *.rpm
payload packaging issue.

So isn't a pure API/ABI drop-in replacement for gzio(3), exporting a
"gzwrite" symbol rather than a "rsyncable_gzwrite" one, a better
approach? There are many applications, not just rpm, that _MUST_
participate in an efficient *.rpm distribution framework, so patching
gzdio in rpmio is just a tiny piece of a much bigger puzzle, one where
"gzwrite" rather than "rsyncable_gzwrite" is likelier to succeed.
JMHO, YMMV, as always.

3) As sole developer/designer of RPMIO, I also believe that my choice
to use the gzopen(3) API was a design brain fart. The gzopen(3) API
was appealingly simple at the time (1999), but it is increasingly
becoming _THE_ roadblock to adding functionality, like XAR and network
transport, to RPMIO.

This code (from rpm2cpio.c, an insanely simple program) breaks if
handlers, rather than plain old integer fdno's, are used with RPMIO:

     gzdi = Fdopen(fdi, rpmio_flags);    /* XXX gzdi == fdi */
     if (gzdi == NULL) {
         fprintf(stderr, _("cannot re-open payload: %s\n"), Fstrerror(gzdi));
         exit(EXIT_FAILURE);
     }

Note that neon and XAR handlers are based not on fdno's but on opaque
pointers, and so interoperate badly with the gzdio/bzdio/lzdio pointer
handlers. There's no provision for, say, pushing neon under a
compression library, and RPMIO <-> XAR have dueling compression stacks.

Which is why I call the choice of using gzopen(3) a "brain fart": a
compression layer doesn't need heavy gzopen(3) I/O methods, only buffer
compress/decompress methods.

(aside) See the XAR and rsync implementations using gzip/bzip2/lzma for
a more precise picture of what I think RPMIO needs to do instead of
using the gzopen(3) API.
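A minimal C sketch of the separation described above, under stated assumptions: the codec exposes only buffer-in/buffer-out methods and knows nothing about file descriptors, while the transport is an opaque pointer plus a write callback, the way neon and XAR handles are opaque pointers rather than fdno's. The types struct codec and struct transport, their members, CODEC_ERR, push_compressed, the in-memory transport, and the toy run-length codec are all hypothetical stand-ins invented for illustration; a real codec would wrap zlib's deflate()/inflate(), bzip2, or lzma behind the same two methods.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CODEC_ERR ((size_t)-1)

/* Hypothetical buffer-only codec: no open/read/write/close. Each method
 * returns bytes written to out, or CODEC_ERR on bad input or no room. */
struct codec {
    const char *name;
    size_t (*compress)(const unsigned char *in, size_t inlen,
                       unsigned char *out, size_t outcap);
    size_t (*decompress)(const unsigned char *in, size_t inlen,
                         unsigned char *out, size_t outcap);
};

/* Hypothetical transport: opaque context plus a write callback. */
struct transport {
    void *ctx;
    int (*put)(void *ctx, const unsigned char *buf, size_t len);
};

/* Toy run-length codec standing in for gzip/bzip2/lzma:
 * each run is encoded as a (count, byte) pair with count in 1..255. */
static size_t rle_compress(const unsigned char *in, size_t inlen,
                           unsigned char *out, size_t outcap)
{
    size_t o = 0;
    for (size_t i = 0; i < inlen; ) {
        size_t run = 1;
        while (i + run < inlen && in[i + run] == in[i] && run < 255)
            run++;
        if (o + 2 > outcap)
            return CODEC_ERR;
        out[o++] = (unsigned char)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

static size_t rle_decompress(const unsigned char *in, size_t inlen,
                             unsigned char *out, size_t outcap)
{
    size_t o = 0;
    if (inlen % 2 != 0)
        return CODEC_ERR;
    for (size_t i = 0; i < inlen; i += 2) {
        size_t run = in[i];
        if (run == 0 || o + run > outcap)
            return CODEC_ERR;
        memset(out + o, in[i + 1], run);
        o += run;
    }
    return o;
}

static const struct codec rle_codec = {
    .name = "rle", .compress = rle_compress, .decompress = rle_decompress
};

/* Example transport: append into a fixed in-memory buffer. */
struct membuf {
    unsigned char data[8192];
    size_t len;
};

static int mem_put(void *ctx, const unsigned char *buf, size_t len)
{
    struct membuf *m = ctx;
    if (m->len + len > sizeof m->data)
        return -1;
    memcpy(m->data + m->len, buf, len);
    m->len += len;
    return 0;
}

/* Compress one buffer with any codec and hand it to any transport. */
static int push_compressed(const struct codec *c, const struct transport *t,
                           const unsigned char *buf, size_t len)
{
    unsigned char tmp[8192];
    assert(c != NULL && t != NULL);
    size_t n = c->compress(buf, len, tmp, sizeof tmp);
    if (n == CODEC_ERR)
        return -1;
    return t->put(t->ctx, tmp, n);
}
```

Swapping the codec or the transport touches only one struct; a gzopen(3)-style stream API fuses the two, which is what makes stacking, say, neon under a compressor awkward.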

In summary, your patches are quite nicely done and extremely clever,
but I'm not sure that RPMIO could/should use them, as I've tried to
describe above.

I'm most certainly willing to be convinced otherwise however.

73 de Jeff
Received on Fri Jul 11 21:39:30 2008
Driven by Jeff Johnson and the RPM project team.
Hosted by OpenPKG and Ralf S. Engelschall.
Powered by FreeBSD and OpenPKG.