On Wed, Jul 09, 2008 at 05:35:46AM -0700, Jeff Johnson wrote:
>0) No one understands why --rsyncable is important, or why gzip != zlib,
>or why the "fuzzy" name patch in rsync would be a tremendous bandwidth
>saving for *.rpm packages. I've been tracking the issue for like 6+
>years, and what is fundamentally needed is a very clear demonstration,
>including publicized benchmarks and likely a drop-in "production" ready
>transport implementation, for any --rsyncable code to be worth the
>effort. JMHO based on 6+ years of explaining ...

I have done some comprehensive testing of rsyncable gzdio with respect
to rpm packaging, and posted the results (in Russian) to our ALT Linux
Team development list. The upshot is that 1) it works reliably, meaning
at least no segfaults and no corrupted data; 2) it barely degrades the
compression ratio, thanks to cpio hints (avg 0.09% loss, compared
to avg 1% for patched zlib); 3) the rsyncability effect can be worthwhile:
about 1/3 bandwidth saving on real data transfer.
Some details. I've tested rsyncability of our two directories:
/ALT/archive/Sisyphus/2008/03/01/files/x86_64/RPMS
/ALT/archive/Sisyphus/2008/04/01/files/x86_64/RPMS
(they are just what you may think.)
1) From these two directories, I select package tuples which have the
same %{NAME} (but file names %name-%version-%release.x86_64.rpm differ).
This means I test whether rsyncability is worthwhile for packages that
have been updated within one month. This includes %version upgrades as well
as minor %release updates (so the data is reasonably representative).
2) For each package in a tuple, I repackage its cpio archive
with rsyncable gzdio.
3) Small packages are excluded: repackaged cpio must be at least
32K each.
4) rsync is run (with a small trick) to measure whether there is any
speedup.
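
A rough sketch of the pairing in step 1, under stated assumptions. This is
not the script actually used for the test: the helper name pair_updated and
the two-column "NAME FILE" list format are made up for illustration.

```shell
# pair_updated OLD_LIST NEW_LIST  (hypothetical helper, not from the original test)
# Each list holds one "NAME FILE" pair per line. Such lists could be built
# with something like:
#   for f in DIR/*.rpm; do echo "$(rpm -qp --qf '%{NAME}' "$f") ${f##*/}"; done
# Prints "NAME OLD_FILE NEW_FILE" for names present in both lists whose
# file names differ, i.e. packages updated between the two snapshots.
pair_updated() {
    awk 'NR == FNR { old[$1] = $2; next }
         ($1 in old) && old[$1] != $2 { print $1, old[$1], $2 }' "$1" "$2"
}
```

Packages whose file name is identical in both snapshots are dropped, since
there is nothing to transfer for them.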
The resulting table is
rpm-1 size-1 rpm-2 size-2 rsync-sent rsync-recv speedup
----- ------ ----- ------ ---------- ---------- -------
The table (the attachment) and some more details are available here:
http://lists.altlinux.org/pipermail/devel/2008-May/074937.html
$ wc -l <rsyncability.txt
1360
$
-- Total 1360 packages updated from 1 Mar to 1 Apr.
$ awk '$NF>2' rsyncability.txt |wc -l
211
$
-- 211 packages have a high rsyncability rate (speedup above 2, i.e. one
has to download less than 1/2 of the new package size).
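
To put that count in proportion, here is a small sketch that reports the
share of rows above a speedup threshold. The helper name high_speedup_share
is hypothetical; like the awk filter above, it assumes the speedup is the
last field of each row. With the figures above, 211 of 1360 is about 15.5%.

```shell
# high_speedup_share TABLE [THRESHOLD]  (hypothetical helper)
# Assumes the speedup is the last whitespace-separated field of each row.
# Prints "COUNT TOTAL PERCENT"; THRESHOLD defaults to 2.
high_speedup_share() {
    awk -v t="${2:-2}" '
        NF  { total++; if ($NF + 0 > t) hits++ }
        END { if (total) printf "%d %d %.1f\n", hits, total, 100 * hits / total }
    ' "$1"
}
```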
$ sum() { perl -MList::Util=sum -ln0 -e 'print sum split'; }
$ cut -f4 rsyncability.txt |sum
2433627
$
-- New packages are 2.32G total.
$ cut -f5 rsyncability.txt |sum
14017
$
-- rsync downloaded 1.57G.
(End of details.)
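
The overall saving follows from the two totals quoted above (2.32G of new
packages, 1.57G downloaded). A minimal sketch of the arithmetic; the helper
name saved_fraction is hypothetical:

```shell
# saved_fraction NEW_TOTAL DOWNLOADED  (hypothetical helper)
# Prints the percentage of bandwidth saved: 100 * (1 - DOWNLOADED / NEW_TOTAL).
saved_fraction() {
    awk -v total="$1" -v got="$2" \
        'BEGIN { printf "%.1f%%\n", 100 * (1 - got / total) }'
}

saved_fraction 2.32 1.57    # prints 32.3%
```

That is, roughly one third of the transfer is avoided.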

This means that rsyncable gzdio *can* be worthwhile -- one can expect
to save about 1/3 of bandwidth. However, this comes with some
requirements: 1) you must already have the older rpms (otherwise you
save nothing); 2) you must synchronize two directories, and you must
use 'rsync --fuzzy' to catch file renames; 3) both old and new files
must be compressed with rsyncable gzdio.
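
For requirement 2, a directory-to-directory sync with fuzzy matching might
look like the sketch below. The wrapper name sync_rpms is hypothetical, and
this is not necessarily the invocation (or the "small trick") used in the
test. --no-whole-file is included only because rsync skips the delta
algorithm by default when both directories are local.

```shell
# sync_rpms SRC_DIR DEST_DIR  (hypothetical wrapper)
# -a               archive mode (recurse, preserve attributes)
# --fuzzy          when a file is missing in DEST_DIR, look for a similarly
#                  named file in the same directory to use as the delta basis
# --no-whole-file  use the delta algorithm even for local copies
# --stats          print transfer statistics, including bytes sent/received
sync_rpms() {
    rsync -a --fuzzy --no-whole-file --stats "$1"/ "$2"/
}
```

DEST_DIR should already hold the older rpms, so that a renamed
%name-%version-%release file can be matched against its predecessor.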
I hope this gives some idea of what rsyncable gzdio can do.
Received on Thu Jul 10 06:36:27 2008