RPM Community Forums

Mailing List Message of <rpm-devel>

rpmdb flaws in need of fixing (was Re: rpmdbNextIterator infinite loop)

From: Jeff Johnson <n3npq@mac.com>
Date: Sun 24 Aug 2008 - 18:48:01 CEST
Message-id: <42F3E7CB-548D-4C6F-AF5F-745EF22179AE@mac.com>

On Aug 24, 2008, at 12:15 PM, Jeff Johnson wrote:

>
> Another fundamental design flaw is that rpmdb is doing joins (actually
> secondary lookup's) without assistance from the underlying db  
> implementation.
>
>

While I'm on the subject ...

One of the consequences of doing manual "joins" between primary
lookups and 2ndary retrieval is that there is a RMW cycle introduced
that is likely the major performance hit doing rpmdbAdd().

(aside) rpmdbAdd i mebbe 10-15% of total install time at worst,
so divide any measured savings by (at least) 10. Note also that
the real flaw in rpm is that redundant data is repeatedly looked up.
Removing the redundant lookups would have a far larger
saving than anything done to rpmdb/rpmdb.c, caveat emptor.

That's likely an obscure statement, so let me illustrate with an
example.

Consider how
     Requires: libc.so.6
is added to /var/lib/rpm/Requirename.

     The string "libc.so.6" is used as a primary retrieval key.
     The set of header instances is retrieved.
     A new header instance integer is added to the array.
     The array of header instances is qsort'ed.
     The array is rewritten to disk (with O_SYNC or equivalent).

Lather rinse repeat. That's what I'm calling the "RMW cycle".

(aside) mergesort, rather than qucksort, should be attempted,
resorting an already sorted array is "worst case" quicksort
performance. I've looked, but just never gotten there, qsort()
in glibc already has tweaks to avoid the lose.

Permitting duplicate primary keys for "libc.so.6" breaks the
RMW cycle. E.g. the flow I outlined previously becomes
     dbiPut is called with "libc.so.6" as primary key and header  
instance value.

However, all the retrieves then need to handle possibly duplicate  
primary keys.
The rpmdb/rpmdb.c code is maybe 60-70% complete to handle duplicate
primary keys in indices, thereby breaking the "RMW cycle" in rpmdbAdd().

If/when completed, all the silly and stoopid goopiness to avoid the  
consequences
of O_SYNC used to ensure that the data in an rpmdb ends up on disk  
likely
becomes superfluous.

I hope the above makes sense. If not, holler and I'll try to explain  
better.

73 de Jeff
Received on Sun Aug 24 18:48:57 2008
Driven by Jeff Johnson and the RPM project team.
Hosted by OpenPKG and Ralf S. Engelschall.
Powered by FreeBSD and OpenPKG.