RPM Community Forums

Mailing List Message of <rpm-devel>

Re: RPM: rpm/ CHANGES rpm/build/ files.c

From: Jeff Johnson <n3npq@mac.com>
Date: Tue 17 Jun 2008 - 15:01:09 CEST
Message-id: <CCF53645-958A-4198-9C43-0E483DF6C5AB@mac.com>

On Jun 16, 2008, at 7:50 PM, Alexey Tourbin wrote:

> On Mon, Jun 16, 2008 at 10:21:56AM -0400, Jeff Johnson wrote:
>> Hmm, at some point, I start to question whether permitting duplicates
>> like
>>    /foo/1
>>    /foo/1
>> in %files, or worrying about %ghost (and %attr and %verify and
>> %exclude and ...,
>> there's *at least* one other place that the test for %ghost/%exclude
>> is needed)
>> scoping over pathologies as in your example above, is worth the
>> effort.
>
> It is worth having a correct algorithm for the RPMTAG_SIZE counter.

I'd say slightly differently but I absolutely agree that "correct" as in
objectively verifiable, is needed for RPMTAG_SIZE (and every other
metadata item).

> The algorithm is correct iff cpio file data bytes match RPMTAG_SIZE
> value (this is why we move the code to genCpioListAndHeader()).
>
> rpm2cpio foo.rpm |catenate_cpio_file_data |wc -c
> rpmquery --qf '%{SIZE}\n' -p foo.rpm
>
> My point is that as far as we can build valid cpio archive, we can
> also mimic some cpio logic a bit and get valid RPMTAG_SIZE value.
> It has nothing to do with specfile pathologies (or we just can't
> build cpio archive otherwise).
>

Hmmm, rpm permits tar payloads, and is likely to pick up other formats,
ISO-9660 would be very interesting, from attempting to use libarchive
for payload handling.

So tying RPMTAG_SIZE too closely to the cpio format is likely not best.
There are also padding issues (cpio pads to a 4-byte boundary), and the
variable-length strings for paths and symlink end-points are additional
payload overhead with cpio that is not reflected in RPMTAG_SIZE.
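A rough sketch of that overhead, assuming the SVR4 "newc" cpio layout
(110-byte ASCII header, NUL-terminated name, header+name and data each
padded to a 4-byte boundary). The function names are made up for
illustration and are not RPM code:

```python
HEADER_LEN = 110  # fixed-size ASCII header of a "newc" cpio entry


def pad4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def newc_entry_size(path, data_len):
    """Bytes one entry occupies in the archive, padding included."""
    name_len = len(path.encode()) + 1  # stored name carries a trailing NUL
    return pad4(HEADER_LEN + name_len) + pad4(data_len)


def payload_vs_data(entries):
    """entries: iterable of (path, data_len) pairs.

    Returns (sum of file data bytes, total archive bytes), where the
    archive total also counts the closing "TRAILER!!!" entry.
    """
    entries = list(entries)
    data = sum(n for _, n in entries)
    archive = sum(newc_entry_size(p, n) for p, n in entries)
    archive += newc_entry_size("TRAILER!!!", 0)
    return data, archive
```

The difference between the two returned numbers is exactly the part of
the payload that a sum of file sizes never sees.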

(aside) But RPMTAG_CPIOSIZE, with analogues RPMTAG_TARSIZE etc,
that _ARE_ specifically tied to archive payloads, would be quite useful
if done as header extensions. I'd tie an error message or even an
assertion to a comparison between the actual and computed size in a
heartbeat if reliable computed values existed.

There are other aliasing issues encountered with archive formats: it's
quite possible (and quite insane) to have identical paths with different
content in an archive format.

So I'd say that RPMTAG_SIZE is more usefully compared to the file
system than the payload. But file systems have block (and fragment)
padding, and have soft/hard link aliasing issues. The killer issue for
tying RPMTAG_SIZE more closely to a file system reference comparison
is that the target file system attributes cannot be known a priori
during build.

But that issue might be finessed by assuming a model file system like,
say, ext2 attributes, with 4K blocks, 1K fragments.
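
One possible reading of such a model, as a sketch only: full 4K blocks
plus a tail rounded up to 1K fragments. It ignores inodes, indirect
blocks and directory entries, and the names are hypothetical:

```python
BLOCK = 4096  # assumed model block size
FRAG = 1024   # assumed model fragment size


def model_allocated(size, block=BLOCK, frag=FRAG):
    """Bytes a file of `size` bytes would occupy in the model file system.

    Whole blocks are counted as-is; the remaining tail is rounded up to
    a whole number of fragments.
    """
    full, tail = divmod(size, block)
    tail_frags = -(-tail // frag)  # ceiling division
    return full * block + tail_frags * frag
```

Summing model_allocated() over the file sizes in a package would give a
build-time number that does not depend on the eventual target file system.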

Then, in order to simplify writing the find/ls script that objectively
verifies that RPMTAG_SIZE has the correct value, one might
(I'm not suggesting, just pointing out the simplification) even
choose to double count hard linked file sizes. It's really
annoying, sufficiently annoying that few attempt it, to identify
hard links by {dev, inode} pairs or link count to verify a
sum of file sizes removing duplicates.
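
For reference, a minimal sketch of both variants over an installed tree,
using {dev, inode} pairs to drop hard link duplicates. This is an
illustration with a made-up function name, not the check RPM performs:

```python
import os
import stat


def tree_size(root, dedupe_hardlinks=True):
    """Sum st_size of regular files under root.

    With dedupe_hardlinks=True, a file reachable through several hard
    links is counted once, keyed on its (st_dev, st_ino) pair.
    With dedupe_hardlinks=False, every path is counted (double counting).
    """
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, fifos, ...
            if dedupe_hardlinks and st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue
                seen.add(key)
            total += st.st_size
    return total
```

Note that the deduplicated sum only sees links inside the tree it walks;
a hard link shared with a path outside root is still counted once here.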

>> With %ghost, the issue runs to a fundamental spec file design flaw,
>> there
>> is plain and simply no way _BY DEFINITION_ to know the file type
>> associated
>> with the path that has a %ghost attribute.
>>
>> The RPMTAG_FILEMODES associated with %ghost files has ugo rwx, but
>> not the file type.
>
> I can't quite understand what you mean.  It looks like you're trying
> to say that, in genCpioListAndHeader(), if (flp->flags & RPMFILE_GHOST)
> holds, then we cannot reliably check for S_ISREG(flp->fl_mode).  I
> can't see why yet.
>

You likely won't find the issue in RPM build code.

I know that the 4 bits that identify file type contained in
RPMTAG_FILEMODES for %ghost files cannot be used reliably. I tried to
use the bits when attaching SELinux file contexts to %ghost paths; it
did not work correctly or reliably.

There's also a case of %ghost in a RHL package that pointed to a path
that was sometimes a directory and sometimes a file (I can likely dig
out the actual usage case if necessary, a notting package iirc) that
RPM needed to handle.

Which tells me that the original (and very broken imho) intent for
%ghost in RPM is/was flawed.

>> Why shouldn't all of the above be treated as syntax errors instead
>> of quietly assuming that indeed, there is some real world need to
>> have sloppy goosey-loosey spec file syntax permitting duplicates (and
>> file marker attribute scoping across hard links, or primitive
>> filtering directives like %exclude) anyways.
>
> rpm seems to permit file dups by design, and, while issuing a warning,
> it also has some code to fold dups and merge their flags correctly.
>

Heh. I added the code that permits file dupes. The reason for doing
so was so that I did not have to assist every n00b trying to use
RPM with identifying a duplicate path. It was cheaper/easier
for me to add a warning message for the n00b to ignore and
complete the build producing *.rpm packages than it was
(the original/previous behavior in rpm) to identify the flaw, stop
the build with an error message, and force the n00b to get
smarter. Guess who got to teach a whole lotta n00bs how to package?

> There is a good reason -- glob(3) is not quite flexible at times.
> Sometimes you do:
>
> %files
> /foo/prefix.*
> /foo/*.suffix
>
> Overlaps are okay here.
>

The output of glob(3) is deterministic. It is trivial to detect
duplicates and stop the build when they are encountered. In fact that
is what RPM used to do. Always. Period. Q.E.D.
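
A sketch of how little is involved, using Python's fnmatch as a stand-in
for glob(3) (unlike glob(3), its "*" also matches "/"). The helper names
and the sample tree are hypothetical:

```python
from collections import Counter
from fnmatch import fnmatchcase


def expand(patterns, tree):
    """Expand each %files-style pattern against a list of known paths,
    keeping one hit per pattern match so overlaps show up as repeats."""
    out = []
    for pat in patterns:
        out.extend(sorted(p for p in tree if fnmatchcase(p, pat)))
    return out


def duplicates(paths):
    """Return the paths that occur more than once, sorted."""
    return sorted(p for p, n in Counter(paths).items() if n > 1)


tree = ["/foo/prefix.a", "/foo/prefix.suffix", "/foo/b.suffix"]
dups = duplicates(expand(["/foo/prefix.*", "/foo/*.suffix"], tree))
# A strict build would stop here if dups is non-empty.
```

With the two overlapping patterns from the example above, dups holds the
single path matched by both, /foo/prefix.suffix.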

And preventing package errors is arguably correct pedantic behavior
when attempting "reproducible builds" for binary packages that are
shipped with a vendor distro.

All depends on POV. See what is in RPM (and reread why) if you wish
to know my POV.

>> But as always, since there's no grammar for spec files, anything
>> goes.
>> Even with a grammar, the issues of scoping through implicit hard link
>> aliasing are semantic, not syntactical (but duplicates are syntax).
>>

These were all trick questions ...

>> Guess what happens in your example when the install runs with
>>     --excludepath=/foo/1
>>

A: What likely (unchecked) happens is that the path /foo/1 is not
created.

>> What link count should be checked with --verify, particularly if
>> another package also contains
>>     %ghost /foo/2
>> and /foo/1 explicitly had --excludepath when installed?
>>

A: --verify does not check file link count. rpm -qlv displays a computed
link count that is gud enuf in most cases, but has no well defined
behavior.
Directory link counts have their own speshul pain.

>> And to return to the original RPMTAG_SIZE issue, what value should
>>     --qf '%{size}'
>> report for a given package with excluded hard linked paths that span
>> multiple
>> packages as above?
>

A: The point is that %{size} could have a dynamic value computed from
the actual disk space used by an installed package. IMHO, that would
be closest to what end users would find useful.

> I think that %{SIZE} should report cpio file data bytes (i.e. cpio
> archive size excluding 110 bytes per cpio entry, filenames, and
> alignment bytes).

Good, you know the cpio issues I pointed out.

Still, there are non-cpio payloads like tar, which are already
permitted by rpm.

Note RPMTAG_CPIOSIZE and other suggestions above.

Anyways, your changes should be checked in everywhere already. The
above is just discussion fodder.

73 de Jeff
Received on Tue Jun 17 15:02:43 2008