RPM Community Forums

Mailing List Message of <rpm-devel>

Re: database support in rpm

From: Jeff Johnson <n3npq@mac.com>
Date: Tue 31 Jul 2007 - 15:04:55 CEST
Message-Id: <B90C7B90-B5F0-4527-BDF1-ED68B53783D5@mac.com>

On Jul 31, 2007, at 5:02 AM, Thomas Lotterer wrote:

> On Monday, 30. July 2007 at 9:26 pm, Jeff Johnson <n3npq@mac.com>  
> wrote:
>> On Jul 30, 2007, at 2:58 PM, Thomas Lotterer wrote:
>>> Decision:
>>> - exclusively use Berkeley DB
>>
>> I'd change "exclusively" to "primarily". Otherwise we have an
>> external feature regression.
>>
> That's exactly the intention! My observations are there is not enough
> interest in supporting a different DB. The current incomplete support
> for SQLite sends a false signal to those who are interested in it.
> Everyone should know for sure: BDB only.
>

We should start separating implementation from architecture issues.

For example, the --initdb discussion is confusing "database" with
"table" wrt an "init" method. The issue is really whether lazy index
creation should be supported (i.e. "table"), and whether a lazy mkdir
on "rpm -qa" should be permitted. ATM, we confuse ourselves by
discussing the existence of a --initdb option rather than the
underlying issues.
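
To make the distinction concrete, here is a minimal sketch of "lazy"
behaviour, using Python's sqlite3 module purely as a stand-in for the
real rpmdb backend. The function name, the table layout, and the path
are hypothetical and are not taken from rpm itself.

```python
import os
import sqlite3
import tempfile

def open_table(dbpath, name):
    """Open index table `name`, creating directory and table on first use."""
    # Lazy mkdir: the database directory is created only when first needed.
    os.makedirs(os.path.dirname(dbpath), exist_ok=True)
    conn = sqlite3.connect(dbpath)
    # Lazy index creation: the table appears on first open,
    # not through an explicit "init" step.
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{name}" (key TEXT, hdrnum INTEGER)')
    return conn

# Hypothetical location under a scratch directory.
root = tempfile.mkdtemp()
conn = open_table(os.path.join(root, "var", "lib", "rpm", "index.db"), "Name")
```

With both behaviours in place a plain query works against an empty
root, and an explicit init option becomes a convenience rather than a
requirement.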

In this thread, using SQL for queries, as well as having a SQL schema,
are the useful engineering goals. Whether the implementation is SQLite
or MySQL or Oracle is a very different matter.
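
As an illustration only, a reference schema plus one query might look
like the sketch below. The table and column names are assumptions made
up for this example, not an agreed rpm schema, and SQLite (through
Python's sqlite3 module) is used only because it is easy to run.

```python
import sqlite3

# Hypothetical reference schema: one row per package, one row per file.
SCHEMA = """
CREATE TABLE packages (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    epoch   INTEGER,
    version TEXT NOT NULL,
    release TEXT NOT NULL,
    arch    TEXT NOT NULL
);
CREATE TABLE files (
    package_id INTEGER NOT NULL REFERENCES packages(id),
    dirname    TEXT NOT NULL,
    basename   TEXT NOT NULL
);
CREATE INDEX files_basename ON files(basename);
"""

def owners_of(conn, path):
    """Return the names of packages owning `path`, roughly "rpm -qf"."""
    dirname, _, basename = path.rpartition("/")
    rows = conn.execute(
        "SELECT p.name FROM packages p "
        "JOIN files f ON f.package_id = p.id "
        "WHERE f.dirname = ? AND f.basename = ? ORDER BY p.name",
        (dirname + "/", basename))
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO packages VALUES (1, 'bash', NULL, '3.2', '1', 'i386')")
conn.execute("INSERT INTO files VALUES (1, '/bin/', 'bash')")
print(owners_of(conn, "/bin/bash"))  # prints ['bash']
```

Nothing in the schema or the query is specific to SQLite, which is the
point: the schema is the architecture, the engine is the
implementation.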

>> The engineering issues wrto a SQL db should be addressed. No matter
>> what, a reference SQL schema for rpm package metadata has been needed
>> for years. Candidates are [...] Rolling a SQL schema from scratch is
>> not impossible either.
>>
> Can be done later if "Reasons" are reviewed and found to be obsolete.
>

OK

>> Alternatives to Berkeley DB for the 3-4 usage cases:
>>     1) licensing
>>
> What's the issue here?

rpm using Berkeley DB cannot be used by commercial vendors
who do not have a Berkeley DB license.

The intent in rpm has _ALWAYS_ been to be free for commercial use.

Alas, that also puts rpm development on the LGPL lunatic fringe, where
even GPL licensed code cannot be used. I would have used wget code
already if it wasn't for the GPL license. There's a really nice HTML
parser in wget, short and sweet. But there is similar in libxml2 with
LGPL license too.

>
>>     2) embedded and -NPTL locking (although fcntl with BDB likely
>> addresses)
>>
> So BDB is fine.
>

I believe fcntl locking is fine, but have not investigated. We could
flip to fcntl locking on HEAD for a month or so to ensure there are no
surprises. I do think NPTL locking, when available, is the better
implementation going forward because it unifies thread and process
locking.
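
For reference, a minimal sketch of fcntl (POSIX record) locking between
two processes, written with Python's fcntl module rather than through
Berkeley DB. The helper names are hypothetical; this shows only the
underlying mechanism, not how rpm or Berkeley DB actually drive it.

```python
import fcntl
import os
import tempfile

def try_lock(fd):
    """Try to take an exclusive fcntl lock on `fd` without blocking."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False

def child_can_lock(path):
    """Fork a child and report whether it could lock `path`."""
    pid = os.fork()
    if pid == 0:
        code = 2  # stays 2 if the child fails before trying the lock
        try:
            fd = os.open(path, os.O_RDWR)
            code = 0 if try_lock(fd) else 1
        finally:
            os._exit(code)  # the child's lock, if any, is dropped on exit
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0

fd, path = tempfile.mkstemp()
print(try_lock(fd))           # the parent takes the lock
print(child_can_lock(path))   # a second process is refused meanwhile
fcntl.lockf(fd, fcntl.LOCK_UN)
print(child_can_lock(path))   # and succeeds once the lock is released
```

fcntl locks are owned by the process, so they serialize processes but
not threads inside one process. That is the gap a unified thread and
process lock closes.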

BTW, the entire locking subsystem is now exposed in Berkeley DB. Adding
methods in the DBI layer to permit applications to acquire Berkeley DB
locks, of the nptl/fcntl/internal persuasion, should likely be
attempted.

OTOH, if I continue with DBI methods that have exactly the same
programming signature as Berkeley DB, then I implicitly add the need
for that functionality in additional databases under the DBI layer like
sqlite (or whatever other implementation one might choose, like gdbm
et al).

So far I've been quite conservative in using the existing DBI methods.
See rpmdb/db_emu.h for the handful of "necessary" stuff that has been
swiped from Berkeley DB, mostly a DBT container and a random set of
named bits.

Using (*join) is likely the most important win; that was the motivation
for writing the toy rpmdb/tjfn.c (jfn -- Join File Name) I pointed you
to.
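
The idea behind a join over secondary indices can be sketched as
follows. This is plain Python, not the Berkeley DB API and not the code
in tjfn.c. It assumes, for illustration, that the join is between a
basename index and a dirname index, each mapping a key to header
numbers; the index contents are invented.

```python
def join(*hit_lists):
    """Return the record numbers present in every index hit list."""
    if not hit_lists:
        return []
    common = set(hit_lists[0])
    for hits in hit_lists[1:]:
        common &= set(hits)  # keep only records matching all keys so far
    return sorted(common)

# Hypothetical secondary indices: key -> header (package) numbers.
basenames = {"bash": [7, 12], "sh": [7]}
dirnames = {"/bin/": [7, 30], "/usr/bin/": [12]}

def owners(path):
    """Find headers owning `path` by joining the two indices."""
    dirname, _, basename = path.rpartition("/")
    return join(basenames.get(basename, []),
                dirnames.get(dirname + "/", []))

print(owners("/bin/bash"))      # prints [7]
print(owners("/usr/bin/bash"))  # prints [12]
```

The win is that the intersection happens on the index entries alone, so
only headers that match every key need to be loaded.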


>>     3) NFS support
>>
> This can be rewritten to "networked filesystem support". BDB, SQLite
> and any embedded DB require proper filesystem locking. Given the
> environmental issues of a networked filesystem this will never work
> reliably. Many NFS admins have disabled locking to avoid client hangs
> on server outages and the price is obviously that locking doesn't
> work. If they would use locking then this problem is gone at the
> price of unpredictable client hangs. I spent years of my life trying
> to improve DOS/Windows applications with embedded DB like M$ Access
> in networked environments using CIFS and NCP - no way to ever fix 'em
> through filesystem access. The only real fix is to use applications
> that are designed for client/server environments. If rpm is ever
> split into a client/server application then the server still is
> likely to use BDB.
>

I don't disagree. However, an answer for client/server rpmdb needs to
be identified. The current answers are sunrpc and sqlite.

Note that Berkeley DB has replication. If you're serious about package
management on cloned clients with "hardened" and "absolutely reliable"
rpmdb, then a proof-of-concept replication implementation should be
attempted. Similar things can be said about transactions and logging
...

73 de Jeff
Received on Tue Jul 31 15:05:16 2007