RPM Community Forums

Mailing List Message of <rpm-devel>

Re: Size limit on tag values?

From: Jeff Johnson <n3npq@mac.com>
Date: Tue 25 Dec 2007 - 16:59:22 CET
Message-Id: <4CC8E406-2004-4AEC-8AEF-FC1A29A61818@mac.com>
Nicely done.

Just to check that I'm reading the code correctly:

You're serializing all pkg NVR on the build machine into
an arbitrary Environment: tag using {} as a separator

    Environment: N-V-R{}N-V-R{}....{}

Couple of random comments below.

On Dec 25, 2007, at 5:52 AM, Ralf S. Engelschall wrote:

> On Fri, Dec 14, 2007, Thomas Lotterer wrote:
>
>>>>> On Thursday, 13. December 2007 at 8:34 pm, "Ralf S. Engelschall"  
>>>>> wrote:
>>> I'm this evening trying to implement for OpenPKG one of Thomas
>>> Lotterer's long awaited features related to security engineering: to
>>> *recursively* attach to an RPM package the "list of all packages which
>>> were installed at the build time of the package". [...]
>>>
>> Oh yeah. I want to turn back time. This feature is amazing.
>>
>> BTW, the OpenPKG world absolutely requires the "package options" to be
>> remembered.
>>
>> I examined this feature, dumping all data using "rpm -qa --xml". A
>> typical software stack with ~65 packages adds a 10MB header to every
>> package and the DB. The XML bloat can be somewhat defeated with
>> compression, bzip2 shrinks it down to 10%, lzma to 5%. Still that means
>> 500K to 1MB size increase for every package.
>>
>> Another issue is the infinite ancestry tracking. Without countermeasure,
>> repetitive build/installs or natural updates keep the whole genealogical
>> tree including all ancestors. Some mechanism must be put into force to
>> prune ancient data. Maybe keep less details for every generation of the
>> same package or just cut off after n-th generation of the package.
>
> Ok, I've now finished a possible complete implementation of this
> "recursive package environment tracking". It is now implemented with the
> help of RPM Lua and I've decided to use the Lua table constructor as
> the serialization format as this allows us to easily parse the existing
> environments for subsequent pruning.
>
> ======================================================================
>
> %description\
> %{?__hook_description_1}\
> %{?__hook_description_2}\
> %{?__hook_description_3}\
> %%description
>

This assumes that %description is the 1st section marker that causes
the parser to leave preamble parsing. Perhaps a better longer-term
implementation would be to create syntax and a macro primitive that
forces information into the preamble from deeper within spec file
section parsing. E.g. mumble mumble
     %{preamble:Environment: %(...)}
which could then be expanded anywhere without the clunkiness of
overloading %description.

> [...]
>
> %__environment_delete_nve_regex    ^gpg-pubkey-[^-]+-[^-]+$
> %__environment_prune_nve_regex     ^%{name}-[^-]+-[^-]+$
> %__environment_prune_depth_number  3
> %__environment_debug               yes
>
> [...]
>
> %__hook_description_3 %{lua: \
>    --  determine current environment \
>    local keyvalue = rpm.expand( \
>        "%(%{l_rpm} -qa --qf " .. \
>        "'\\\\[\\"%%{name}-%%{version}-%%{release}\\"\\\\] =" .. \
>        " %%|environment?{%%{environment}}:{\\\\{\\\\}}|,')" \
>    ) \
>    loadstring("environment = { " .. keyvalue .. "}")() \
>    local debug = rpm.expand("%{?__environment_debug}%{!?__environment_debug:no}") \
>    if debug == "yes" then \
>        io.stdout:write("Environment(original): " .. util.dump_object(environment, false) .. "\\n") \
>    end
>    \

The lua binding design is slightly inverted here. You are invoking rpm
through %(...) macros to do a headerSprintf() call through rpm --query,
mostly because lua is at the rpmio layer, while headerSprintf() is one
layer above at the rpmdb layer.

Perhaps a lua method that calls headerSprintf() directly through a
callback would be a cleaner and more useful rpmlua implementation.
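
For anyone who wants to poke at the serialization step quoted above without
an RPM build, here is a minimal plain-Lua sketch of the round trip: text in
Lua table constructor syntax is parsed into a nested table and dumped back
out. It is only an illustration under assumptions. The dump() and parse()
helpers and the sample package names are made up for this sketch; dump()
stands in for OpenPKG's util.dump_object, whose exact output format is not
shown in this thread, and no rpm.* functions are used.

```lua
-- Plain-Lua sketch; no rpm.* or OpenPKG util.* functions are used.
-- loadstring exists in Lua 5.0/5.1; Lua 5.2+ accepts strings in load.
local compile = loadstring or load

-- Hypothetical stand-in for util.dump_object: emit a nested table of
-- N-V-R keys as a Lua table constructor, keys sorted for stable output.
local function dump(env)
    local keys = {}
    for nvr in pairs(env) do
        keys[#keys + 1] = nvr
    end
    if #keys == 0 then
        return "{}"
    end
    table.sort(keys)
    local parts = {}
    for _, nvr in ipairs(keys) do
        parts[#parts + 1] = string.format("[%q] = %s", nvr, dump(env[nvr]))
    end
    return "{ " .. table.concat(parts, ", ") .. " }"
end

-- Parse a serialized environment back into a table.
-- This executes the text as Lua code, so it is only appropriate for
-- trusted input such as the local RPM database.
local function parse(text)
    local chunk, err = compile("return " .. text)
    if not chunk then
        error("cannot parse environment: " .. err)
    end
    return chunk()
end

-- Made-up sample of what a per-package query format might emit for two
-- installed packages, one of which already carries an environment.
local queried = '["openssl-0.9.8g-1"] = { ["zlib-1.2.3-1"] = {} },'
             .. '["zlib-1.2.3-1"] = {},'

local environment = parse("{ " .. queried .. " }")
local serialized = dump(environment)
print(serialized)
```

Since the parsed form is an ordinary nested table keyed by N-V-R, any later
pruning pass is just a recursive table walk.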

>
>    --  prune environment according to configuration \
>    function prune_environment (environment, depth) \
>        local delete_nve_regex   = tostring(rpm.expand("%{?__environment_delete_nve_regex}")) \
>        local prune_nve_regex    = tostring(rpm.expand("%{?__environment_prune_nve_regex}")) \
>        local prune_depth_number = tonumber(rpm.expand("%{?__environment_prune_depth_number}")) \
>        if environment ~= nil then \
>            for nve, _ in pairs(environment) do \
>                if (prune_depth_number ~= nil and depth > prune_depth_number) or \
>                   (delete_nve_regex ~= "" and util.rmatch(nve, delete_nve_regex) ~= nil) then \
>                    environment[nve] = nil \
>                elseif prune_nve_regex ~= nil and util.rmatch(nve, prune_nve_regex) ~= nil then \
>                    environment[nve] = {} \
>                else \
>                    prune_environment(environment[nve], depth + 1) \
>                end \
>            end \
>        end \
>    end \

All this filtering of gpg-pubkey everywhere using --query gets annoying.
Perhaps generalizing the mire patterns, currently implemented only as
     rpm -qa 'name=!gpg-pubkey-*'
to general --query and headerSprintf() filtering, not just rpm -qa, is
finally perceived as sufficiently useful to attempt.
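
To see what the quoted delete/prune/depth rules do to a concrete tree, the
logic can be tried in plain Lua outside of rpm. The sketch below makes
assumptions: Lua patterns via string.match stand in for OpenPKG's
util.rmatch (so every literal "-" has to be written as "%-"), the three
settings are passed in a made-up cfg table instead of the %__environment_*
macros, and the package names are invented sample data.

```lua
-- Plain-Lua sketch of the quoted pruning logic. Assumptions: Lua patterns
-- replace util.rmatch regexes, and cfg replaces the %__environment_* macros.
local function prune(env, depth, cfg)
    if env == nil then
        return
    end
    for nvr in pairs(env) do
        if (cfg.max_depth ~= nil and depth > cfg.max_depth)
           or (cfg.delete ~= "" and string.match(nvr, cfg.delete)) then
            env[nvr] = nil      -- drop the entry entirely
        elseif cfg.prune ~= "" and string.match(nvr, cfg.prune) then
            env[nvr] = {}       -- keep the entry, forget its ancestry
        else
            prune(env[nvr], depth + 1, cfg)
        end
    end
end

-- Invented sample environment, as it might look while building openssl.
local environment = {
    ["gpg-pubkey-63c4cb9f-3c591eda"] = {},
    ["openssl-0.9.8f-1"] = { ["zlib-1.2.3-1"] = {} },
    ["perl-5.8.8-1"] = {
        ["gcc-4.2.2-1"] = {
            ["binutils-2.18-1"] = { ["make-3.81-1"] = {} },
        },
    },
}

prune(environment, 1, {
    delete    = "^gpg%-pubkey%-[^%-]+%-[^%-]+$",  -- remove public keys
    prune     = "^openssl%-[^%-]+%-[^%-]+$",      -- cut own older builds
    max_depth = 3,                                -- drop level 4 and deeper
})
```

After the call, the gpg-pubkey entry is gone, the older openssl build is
kept as an empty table, and the perl branch is cut off below its third
level.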

>
>    prune_environment(environment, 1) \
>    if debug == "yes" then \
>        io.stdout:write("Environment(pruned): " .. util.dump_object(environment, false) .. "\\n") \
>    end \
>    \
>    --  export serialized environment as an RPM tag \
>    if environment ~= nil then \
>        local tag = "Environment: " .. util.dump_object(environment, true) .. "\\n" \
>        print(tag) \
>        if debug == "yes" then \
>            io.stdout:write(tag) \
>        end \
>    end \
> }
>
> ======================================================================
>
> As you can see, one now can completely delete packages (like the public
> keys) from the environment, prune after some packages (like the current
> package) and prune the tree at some recursion depth. With pruning the
> tree is not really fully *complete* any longer, but doesn't explode too
> much and should be still sufficiently complete for practical purposes
> (like library embeddings for security engineering, etc).
>

Nothing wrong with Environment:, but to my taste adding into existing
dependency metadata, rather than inventing Yet Another representation
of metadata linking packages, might be a more "natural" implementation.

But that's very much an offhand design aesthetic comment, rather than
criticism or an RFE of what you've done.

73 de Jeff
Received on Tue Dec 25 16:59:29 2007