RPM Community Forums

Mailing List Message of <rpm-devel>

strange RPMDB problem: messed up entries (regularly)

From: Ralf S. Engelschall <rse+rpm-devel@rpm5.org>
Date: Mon 18 Aug 2008 - 13:37:14 CEST
Message-ID: <20080818113714.GA10442@engelschall.com>
This has now happened to me several times in a row, so I think it
is a real bug rather than just an unusual accident. The symptom is
the following:

I have dozens of OpenPKG/RPM5 (RPM 5.1.4) based software stacks on
production machines. They all work just fine and I'm very happy with
RPM 5.1.4, as it works perfectly with all of the more than 1200 OpenPKG
RPM packages during installation, upgrading, etc. But after some time
(between a few days and 1-3 weeks, as it looks) the following can be
seen in multiple OpenPKG instances (not in every instance on a machine,
but usually in more than one instance on the same machine at the same
time):

| # /usr/opkg/bin/openpkg rpm -qa
| make-3.81-20080101
| m4-1.4.11-20080403
| binutils-2.18-20080101
| grep-2.5.3-20080101
| autoconf-2.62-20080409
| automake-1.10-20080101
| libiconv-1.12-20080101
| gettext-0.17-20080101
| less-418-20080103
| perl-openpkg-5.10.0-20080409
| procmail-3.22-20080101
| pkgconfig-0.23-20080117
| flex-2.5.35-20080227
| flowtools-0.68-20080101
| perl-stats-5.10.0-20080101
| libart-2.3.20-20080130
| fsl-1.7.0-20080101
| perl-term-5.10.0-20080208
| bzip2-1.0.5-20080318
| zlib-1.2.3-20080101
| diffutils-2.8.7-20080101
| w3m-0.5.2-20080101
| screen-4.0.3-20080101
| sed-4.1.5-20080101
| flow2rrd-0.9.1-20080101
| cfg-0.9.11-20080101
| gc-6.8-20080101
| expat-2.0.1-20080101
| gzip-1.3.12-20080101
| pcre-7.7-20080508
| lzo-2.03-20080430
| bash-3.2.39-20080502
| texinfo-4.12-20080421
| readline-5.2.12-20080502
| png-1.2.29-20080508
| lsof-4.80-20080518
| gcc-4.2.4-20080521
| openssl-0.9.8h-20080528
| rrdtool-1.2.27-20080521
| lftp-3.7.3-20080524
| error: rpmdb: skipping h#      42 blob size(4140): BAD, 8 + 16 * il(959996723) + dl(825374516)
| error: db4 error(-30986) from dbcursor->get: DB_PAGE_NOTFOUND: Requested page not found
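
For reference, the blob size check in the first error line can be
redone by hand. This is only a rough sketch: the formula is taken from
the error message itself, while reading "il" as the index entry count
and "dl" as the data length assumes the usual RPM header layout. The
function name expected_size is made up for illustration.

```shell
# Values copied from the "skipping h# 42" error line.
blob_size=4140        # bytes actually stored for header #42
il=959996723          # claimed number of index entries (assumed meaning)
dl=825374516          # claimed data length in bytes (assumed meaning)

# Size implied by the two counters, per the formula in the error message:
# 8 bytes for il and dl themselves, 16 bytes per index entry, plus the data.
# Needs a shell with 64-bit arithmetic.
expected_size() {
    echo $(( 8 + 16 * $1 + $2 ))
}

implied=$(expected_size "$il" "$dl")
echo "stored: $blob_size bytes, implied by il/dl: $implied bytes"

# Print both counters as hex to look at their individual bytes.
printf 'il=0x%08x dl=0x%08x\n' "$il" "$dl"
```

On a shell with 64-bit arithmetic this prints an implied size of
16185322092 bytes against 4140 stored bytes, and the hex values
0x39386333 and 0x31323734. Every byte of both counters falls into the
printable ASCII range ("98c3" and "1274"), which might hint that text
was written over the start of the record, but that is only a guess.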

Notice the error at the end. I can easily recover from this problem
by just running "openpkg rpm --rebuilddb". The rebuild itself prints
the same error once more, but at least the next "openpkg rpm -qa" then
runs without errors. BUT! The problem is still not gone: "openpkg rpm
-qi openpkg" then shows:

| # /usr/opkg/bin/openpkg rpm -qi openpkg
| package openpkg is not installed

Woohooo... "openpkg" is our RPM "root" package and can normally never
be missing (otherwise the whole instance would be gone, because erasing
this package removes all artefacts of the OpenPKG instance with it). So
it is really gone from the database only. OK, I can also recover from
this problem by running "openpkg rpm -Uvh --justdb openpkg*.rpm". Then
everything is just fine again.
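
The two recovery steps described above could be wrapped up roughly like
this. It is a minimal sketch, not a tested procedure: the function name
recover_rpmdb and the OPENPKG variable are made up for illustration,
and the package file has to be passed in explicitly. Only the rpm
options already quoted in this message are used.

```shell
# Path of the OpenPKG front-end used in this message; override as needed.
OPENPKG=${OPENPKG:-/usr/opkg/bin/openpkg}

# recover_rpmdb PKGFILE
# Rebuild the RPMDB, then re-register the "openpkg" root package in the
# database only (--justdb does not touch the files on disk) if the
# rebuild has dropped it.
recover_rpmdb() {
    pkgfile=$1

    # The rebuild is reported to print the old error once more, so a
    # failure here is logged but not treated as fatal.
    "$OPENPKG" rpm --rebuilddb || echo "rebuilddb reported errors" >&2

    # rpm -q exits non-zero when the package is not in the database.
    if ! "$OPENPKG" rpm -q openpkg >/dev/null 2>&1; then
        "$OPENPKG" rpm -Uvh --justdb "$pkgfile"
    fi

    # Final check: the function succeeds only if the query works now.
    "$OPENPKG" rpm -q openpkg >/dev/null 2>&1
}
```

It would be called as "recover_rpmdb /path/to/openpkg-*.rpm" with the
binary package matching the installed instance.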

But this has now occurred about the 5th time for me and it is always
exactly the same symptom. I do not understand why entries (here the
"openpkg" package) get garbled in the RPMDB from time to time, even
without any administrative RPM actions in between (no writes to the
RPMDB from commands like "openpkg rpm -i", etc). There are just
"openpkg rpm -q openpkg" commands running regularly via crond(8) on all
OpenPKG instances. How can those mess up the database? Hmmm... any
clues? Can I do anything reasonable here the next time this occurs?
Perhaps preserve and share the garbled RPMDB for manual inspection?
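
One way to preserve the garbled RPMDB would be to let the cron job take
a snapshot as soon as the query fails, before any --rebuilddb changes
the files. The following is just a sketch under assumptions: the
function names and the OPENPKG variable are hypothetical, the database
directory is taken from the %{_dbpath} macro via "rpm --eval", and a
tar with -C and -z support is assumed.

```shell
# Path of the OpenPKG front-end used in this message; override as needed.
OPENPKG=${OPENPKG:-/usr/opkg/bin/openpkg}

# snapshot_rpmdb DBDIR OUTDIR
# Archive the database directory as it is and print the archive path.
snapshot_rpmdb() {
    dbdir=$1
    outdir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$outdir/rpmdb-$stamp.tar.gz"
    tar -C "$(dirname "$dbdir")" -czf "$archive" "$(basename "$dbdir")" \
        && echo "$archive"
}

# check_and_snapshot OUTDIR
# Intended to run from cron in place of the plain query: if the query
# for the root package fails, keep a copy of the database for inspection.
check_and_snapshot() {
    outdir=$1
    if ! "$OPENPKG" rpm -q openpkg >/dev/null 2>&1; then
        dbdir=$("$OPENPKG" rpm --eval '%{_dbpath}')
        snapshot_rpmdb "$dbdir" "$outdir"
    fi
}
```

A cron entry would then call "check_and_snapshot /some/spare/dir" once
per instance. As long as the query succeeds, nothing is written.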

                                       Ralf S. Engelschall
                                       rse@engelschall.com
                                       www.engelschall.com
Received on Mon Aug 18 13:37:20 2008