On May 13, 2008, at 4:42 PM, devzero2000 wrote:
> He, sorry. I have not understood well, my fault.
>
No, I'm likely being way too terse explaining what I'm trying to do.
Apologies.
> What is the ultimate goal, what is the problem you would like to
> solve ?
>
I'm looking for a means to use UUID's to handle a package cache.
Think of a URI path to a *.rpm as being a piece of content. If the path
is known, and the *.rpm can be retrieved from that path, then the
package can be installed.
A UUIDv5 is a way to assign a unique bit string to a URI string.
The simplest representation for repository contents can be
represented as the collection
of URI paths literally, aka a manifest. That filesystem analogue is
an absolute path to a file.
The problem is that a manifest is not the most efficient store. All
paths
must be compared by linear searching.
To have an efficient lookup, the UUIDv5 can be used as a hash key,
with the URI path as the value. So a lookup of a full URI path is
1) generate the UUIDv5 for some URI path you want to lookup
2) if the UUIDv5 exists, you win, otherwise, you lose, when looking
for a package.
That's a win if UUID's can be generated cheaply. Yes I've only
described the reason for hash retrieval so far ...
More generally, a means to split a relative "N-V-R.A.rpm" string from
the
leading part of the URI needs to be devised, so that, say, multiple
repository
URI's can be used as containers for identical packages. The
filesystem analogue
of a repository container is a directory, and the file system analogue
of a UUID is an inode. Now each pair {UUID, "N-V.R.A.rpm"} can be
represented equivalently, even when contained in multiple repositories.
And just like a file system, if strings that, say, don't end in *.rpm
are treated
as URI path components rather than packages, then a full path to a
pkg can
be represented in the same store, with unique UUID's being used as
the analogue
of a filesystem inode (or {dev,inode} pair if you prefer).
So that's the "hierarchical" piece of the puzzle, a repeated lookup
of parent
components in path until the root node is reached, and the path
necessary to retrieve
a *.rpm package is fully reconstructed.
Adding a reversed ("N-V.R.A.rpm", UUID} index captures the 1-to-many
relation
when a package is found in multiple repositores as well.
Before committing to the effort in designing a database schema, I'm
looking
for a cheap implementation using a file system, where the UUID is
used as the
file name, and either (literally) the "N-V.R.A.rpm" or the basename
(3) component
of a path, is saved as the file contents.
There's a problem with throwing thousands of files into one directory
however.
The fix is what I'm calling "terminfo-like", where all terminfo
entries that start
with 'a' are save as a/a*, etc etc to statistically minimize the
number of files
in each sub-directory.
Someone, somewhere, has done the implementation I've just described, its
too obvious and too simple not to have been attempted (my guess).
So I'm looking for pointers to existing implementations in C.
Does that help clarify what I'm looking for?
73 de Jeff
Received on Tue May 13 23:27:13 2008