RPM Community Forums

Mailing List Message of <rpm-devel>

Re: Hierarchical data cache using UUID's

From: Jeff Johnson <n3npq@mac.com>
Date: Tue 13 May 2008 - 23:25:36 CEST
Message-Id: <18FEA1D1-4284-4B41-8849-7D90D23A166D@mac.com>

On May 13, 2008, at 4:42 PM, devzero2000 wrote:

> He, sorry. I have not understood well, my fault.
>

No, I'm likely being way too terse explaining what I'm trying to do.  
Apologies.

> What is the ultimate goal, what is the problem you would like to  
> solve ?
>

I'm looking for a means to use UUID's to handle a package cache.

Think of a URI path to a *.rpm as being a piece of content. If the path
is known, and the *.rpm can be retrieved from that path, then the
package can be installed.

A UUIDv5 is a way to assign a unique bit string to a URI string.

The simplest representation for repository contents can be  
represented as the collection
of URI paths literally, aka a manifest. That filesystem analogue is  
an absolute path to a file.

The problem is that a manifest is not the most efficient store. All  
paths
must be compared by linear searching.

To have an efficient lookup, the UUIDv5 can be used as a hash key,
with the URI path as the value. So a lookup of a full URI path is
     1) generate the UUIDv5 for some URI path you want to lookup
     2) if the UUIDv5 exists, you win, otherwise, you lose, when looking
     for a package.

That's a win if UUID's can be generated cheaply. Yes I've only
described the reason for hash retrieval so far ...

More generally, a means to split a relative "N-V-R.A.rpm" string from  
the
leading part of the URI needs to be devised, so that, say, multiple  
repository
URI's can be used as containers for identical packages. The  
filesystem analogue
of a repository container is a directory, and the file system analogue
of a UUID is an inode. Now each pair {UUID, "N-V.R.A.rpm"} can be
represented equivalently, even when contained in multiple repositories.

And just like a file system, if strings that, say, don't end in *.rpm  
are treated
as URI path components rather than packages, then a full path to a  
pkg can
be represented in the same store, with unique UUID's being used as  
the analogue
of a filesystem inode (or {dev,inode} pair if you prefer).

So that's the "hierarchical" piece of the puzzle, a repeated lookup  
of parent
components in path until the root node is reached, and the path  
necessary to retrieve
a *.rpm package is fully reconstructed.

Adding a reversed ("N-V.R.A.rpm", UUID} index captures the 1-to-many  
relation
when a package is found in multiple repositores as well.

Before committing to the effort in designing a database schema, I'm  
looking
for a cheap implementation using a file system, where the UUID is  
used as the
file name, and either (literally) the "N-V.R.A.rpm" or the basename 
(3) component
of a path, is saved as the file contents.

There's a problem with throwing thousands of files into one directory  
however.
The fix is what I'm calling "terminfo-like", where all terminfo  
entries that start
with 'a' are save as a/a*, etc etc to statistically minimize the  
number of files
in each sub-directory.

Someone, somewhere, has done the implementation I've just described, its
too obvious and too simple not to have been attempted (my guess).

So I'm looking for pointers to existing implementations in C.

Does that help clarify what I'm looking for?

73 de Jeff
Received on Tue May 13 23:27:13 2008
Driven by Jeff Johnson and the RPM project team.
Hosted by OpenPKG and Ralf S. Engelschall.
Powered by FreeBSD and OpenPKG.