RPM Community Forums

Mailing List Message of <rpm-devel>

Re: Hierarchical data cache using UUID's

From: Ralf S. Engelschall <rse+rpm-devel@rpm5.org>
Date: Wed 14 May 2008 - 10:28:00 CEST
Message-ID: <20080514082800.GA23888@engelschall.com>

On Tue, May 13, 2008, Jeff Johnson wrote:

> On May 13, 2008, at 7:08 PM, Dmitry V. Levin wrote:
>
>> On Tue, May 13, 2008 at 05:25:36PM -0400, Jeff Johnson wrote:
>> [...]
>>> There's a problem with throwing thousands of files into one directory,
>>> however. The fix is what I'm calling "terminfo-like", where all
>>> terminfo entries that start with 'a' are saved as a/a*, etc., to
>>> statistically minimize the number of files in each sub-directory.
>>>
>>> Someone, somewhere, has done the implementation I've just described; it's
>>> too obvious and too simple not to have been attempted (my guess).
>>>
>>> So I'm looking for pointers to existing implementations in C.
>>
>> Something similar, implemented in C:
>> Git stores objects in the filesystem using SHA-1 as a hash.
>> It uses a simple dir/file scheme:
>> the hex representation of the first byte forms the directory name, and
>> the hex representation of the remaining 19 bytes forms the file name
>> within this directory, e.g. de/adbeefdeadc0decafef00dbadc0deddeadbeef.
>
> Poifect, exactly what I was looking for. Thanks.
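
The dir/file split quoted above can be sketched in a few lines of C. This
is only an illustration of the naming scheme, not code taken from Git; the
function name fanout_path and its buffer convention are assumptions made
up for this example.

```c
#include <stddef.h>

/* Hypothetical helper (not from Git): format a 20-byte digest as
 * "xx/<38 hex digits>", i.e. the first byte names the directory and the
 * remaining 19 bytes name the file.
 * Returns 0 on success, -1 if the output buffer is too small. */
int fanout_path(const unsigned char digest[20], char *out, size_t outlen)
{
    static const char hex[] = "0123456789abcdef";
    size_t i, pos = 0;

    if (outlen < 42)              /* 2 + '/' + 38 + terminating NUL */
        return -1;
    for (i = 0; i < 20; i++) {
        out[pos++] = hex[digest[i] >> 4];
        out[pos++] = hex[digest[i] & 0x0f];
        if (i == 0)
            out[pos++] = '/';     /* directory separator after byte 0 */
    }
    out[pos] = '\0';
    return 0;
}
```

With one byte as the directory key there are at most 256 sub-directories,
so this only spreads files evenly if that byte is itself evenly
distributed, which holds for a SHA-1 digest.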

But one has to carefully select _which_ parts of the hash to use for
the directory structure to end up with a reasonable distribution of
files over directories. The hex encoding reflects part of the internal
structure of a UUID (and each UUID version has a different structure),
so one has to make sure not to pick the wrong bytes. For instance, if
one picked the last bytes of a UUIDv1, one would end up with all files
in the same directory, as a UUIDv1 usually (if not made out of random
multicast addresses) has the host's MAC address encoded there (which
does not change!). For UUIDv3, UUIDv4 and UUIDv5 it is less of a
problem as long as one doesn't pick from near the middle (as the
version, etc. are encoded there).
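
A minimal sketch of such a version-aware choice, assuming the binary
16-byte layout of RFC 4122: the version sits in the high nibble of byte 6,
and for a UUIDv1 bytes 0..3 hold time_low (most significant byte first)
while bytes 10..15 hold the node ID. The policy below (byte 3 for UUIDv1,
byte 0 otherwise) and the names uuid_dir_byte and uuid_cache_path are
assumptions for illustration only, not an existing API.

```c
#include <stddef.h>

/* Version number: high nibble of byte 6 in the RFC 4122 layout. */
static int uuid_version(const unsigned char u[16])
{
    return u[6] >> 4;
}

/* Hypothetical policy: index of the byte used as the directory key.
 * UUIDv1: byte 3, the least significant byte of time_low, which changes
 * fastest; never bytes 10..15, which hold the (constant) node ID.
 * Other versions: byte 0, which is hash or random data. */
static size_t uuid_dir_byte(const unsigned char u[16])
{
    return uuid_version(u) == 1 ? 3 : 0;
}

/* Write "xx/<32 hex digits>" into out, where xx is the key byte and the
 * file name is the whole UUID. Returns 0 on success, -1 if out is too
 * small. */
int uuid_cache_path(const unsigned char u[16], char *out, size_t outlen)
{
    static const char hex[] = "0123456789abcdef";
    size_t i, pos = 0, k = uuid_dir_byte(u);

    if (outlen < 36)              /* 2 + '/' + 32 + terminating NUL */
        return -1;
    out[pos++] = hex[u[k] >> 4];
    out[pos++] = hex[u[k] & 0x0f];
    out[pos++] = '/';
    for (i = 0; i < 16; i++) {
        out[pos++] = hex[u[i] >> 4];
        out[pos++] = hex[u[i] & 0x0f];
    }
    out[pos] = '\0';
    return 0;
}
```

How evenly byte 3 of a UUIDv1 spreads in practice depends on the clock
resolution of the generator, so the actual distribution should be measured
before settling on a layout.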

                                       Ralf S. Engelschall
                                       rse@engelschall.com
                                       www.engelschall.com
Received on Wed May 14 10:33:17 2008