RPM Community Forums

Mailing List Message of <rpm-devel>

Re: Hierarchical data cache using UUID's

From: devzero2000 <pinto.elia@gmail.com>
Date: Wed 14 May 2008 - 16:53:07 CEST
Message-ID: <b086760e0805140753i270e919k5a5c6aa96b8a8eb8@mail.gmail.com>
I can't speak for Jeff.

But I'll post this anyway, to summarize the argument and to check
whether I have finally understood it.

So the plan is to equip rpm with a built-in dependency resolver (based on
sat-solver, I think) and therefore also with an efficient structure, based
on UUIDs, for storing a package cache.

If so, it should be described in the roadmap.

In my opinion it would be the greatest evolution of RPM in the last
few years.

Certainly a very ambitious plan, imho.




On Wed, May 14, 2008 at 2:13 PM, Ralf S. Engelschall
<rse+rpm-devel@rpm5.org>
wrote:

> On Wed, May 14, 2008, Jeff Johnson wrote:
>
> > 1) I had to make certain choices (like for the UUIDv1 clock bits)
> > in order to retrofit what (imho) is a "universal" (and useful) time
> > stamp. What should be put in the clock field of a retrofitted UUIDv1
> > time stamp so that it can be usefully shared?
>
> According to the forthcoming UUID RFC (still not released):
>
> | 4.1.5.  Clock Sequence
> |
> |    For UUID version 1, the clock sequence is used to help avoid
> |    duplicates that could arise when the clock is set backwards in time
> |    or if the node ID changes.
> |
> |    If the clock is set backwards, or might have been set backwards
> |    (e.g., while the system was powered off), and the UUID generator can
> |    not be sure that no UUIDs were generated with timestamps larger than
> |    the value to which the clock was set, then the clock sequence has to
> |    be changed.  If the previous value of the clock sequence is known, it
> |    can just be incremented; otherwise it should be set to a random or
> |    high-quality pseudo-random value.
> |
> |    Similarly, if the node ID changes (e.g., because a network card has
> |    been moved between machines), setting the clock sequence to a random
> |    number minimizes the probability of a duplicate due to slight
> |    differences in the clock settings of the machines.  If the value of
> |    clock sequence associated with the changed node ID were known, then
> |    the clock sequence could just be incremented, but that is unlikely.
> |
> |    The clock sequence MUST be originally (i.e., once in the lifetime of
> |    a system) initialized to a random number to minimize the correlation
> |    across systems.  This provides maximum protection against node
> |    identifiers that may move or switch from system to system rapidly.
> |    The initial value MUST NOT be correlated to the node identifier.
> |
> |    For UUID version 3 or 5, the clock sequence is a 14-bit value
> |    constructed from a name as described in Section 4.3.
> |
> |    For UUID version 4, clock sequence is a randomly or pseudo-randomly
> |    generated 14-bit value as described in Section 4.4.
>
> So, optimally a steadily increasing number unique to RPM should be used,
> but alternatively you could use a random number, too. Just avoid using
> a fixed number such as zero. That's not appropriate.
>
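To make the clock-sequence advice concrete, here is a minimal sketch in
Python of retrofitting a UUIDv1 from an existing Unix time stamp. The
function name uuid1_from_unix_time and the node value in the usage lines
are placeholders chosen for illustration; they are not anything rpm
defines. The clock sequence defaults to a random 14-bit value rather than
a fixed number such as zero.

```python
import secrets
import uuid
from typing import Optional

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_from_unix_time(unix_seconds: int, node: int,
                         clock_seq: Optional[int] = None) -> uuid.UUID:
    """Build a version 1 UUID from a given time stamp (hypothetical helper)."""
    timestamp = unix_seconds * 10_000_000 + UUID_EPOCH_OFFSET  # 60-bit value
    if clock_seq is None:
        # Random 14-bit clock sequence, never a fixed constant.
        clock_seq = secrets.randbits(14)
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi = (timestamp >> 48) & 0x0FFF
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi = (clock_seq >> 8) & 0x3F
    # version=1 makes the constructor set the variant and version bits.
    return uuid.UUID(
        fields=(time_low, time_mid, time_hi, clock_seq_hi, clock_seq_low, node),
        version=1,
    )


if __name__ == "__main__":
    # The node value is an arbitrary 48-bit placeholder.
    print(uuid1_from_unix_time(1210776787, node=0x0123456789AB))
```

If the previous clock sequence is known, it can be passed in incremented
by one instead of drawing a new random value.
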
> > 2) Is there any guidance on how to substitute MD5/SHA1 with some
> > other hash/digest? Both UUIDv3/UUIDv5 substitute for the 60bit time
> > stamp in a UUIDv1 iirc. Can other 60 bit strings be used similarly?
> > I can see uses for a crc64 that can be linearly combined with other
> > UUID/crc64's, and yet still should be sufficiently collision free to
> > preserve uniqueness.
>
> No, not just the 60 bits of the time stamp are used! MD5 hashes are 128
> bits in size and are stored whole into the UUID, except for the few bits
> overwritten by the UUID branding. SHA-1 hashes are 160 bits in size and
> are clipped to the first 128 bits (the leading bytes) for storing into
> a UUID.
>
> Only MD5 is defined for UUIDv3 and SHA-1 for UUIDv5, but the same
> approach (calculate the digest, clip it to the first 128 bits, store
> it into the UUID buffer and brand it as a UUID of a certain version by
> changing some of the middle bits) can be used with any digest algorithm.
> The resulting UUIDs are simply not DCE 1.1 (variant) UUIDv[1345]
> (version) ones, and should usually be branded with a custom UUID
> _variant_ and _version_. Microsoft GUIDs are such things: they use a
> non-DCE 1.1 variant AFAIK. You can see this when you decode those GUIDs
> with "uuid -d".
>
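The clip-and-brand approach described above can be sketched in Python as
follows. The helper names brand and digest_uuid are hypothetical. Hashing
the namespace bytes followed by the name mirrors how UUIDv3 and UUIDv5 are
built, so with MD5 or SHA-1 the result should match the standard library.
The SHA-256 call with version 0xF in the usage lines is only an
illustration: it keeps the DCE 1.1 variant bits for simplicity and picks
an arbitrary version number, which is an assumption and not a registered
value.

```python
import hashlib
import uuid


def brand(raw: bytes, version: int) -> uuid.UUID:
    """Clip a digest to 128 bits and overwrite the variant/version bits."""
    b = bytearray(raw[:16])                  # keep only the first 128 bits
    b[6] = (b[6] & 0x0F) | (version << 4)    # version: top 4 bits of octet 6
    b[8] = (b[8] & 0x3F) | 0x80              # DCE 1.1 variant: bits "10"
    return uuid.UUID(bytes=bytes(b))


def digest_uuid(algorithm: str, namespace: uuid.UUID, name: str,
                version: int) -> uuid.UUID:
    """Name-based UUID from any hashlib digest of at least 128 bits."""
    digest = hashlib.new(algorithm, namespace.bytes + name.encode("utf-8"))
    return brand(digest.digest(), version)


if __name__ == "__main__":
    ns = uuid.NAMESPACE_DNS
    print(digest_uuid("md5", ns, "rpm5.org", 3))       # same as uuid.uuid3
    print(digest_uuid("sha1", ns, "rpm5.org", 5))      # same as uuid.uuid5
    print(digest_uuid("sha256", ns, "rpm5.org", 0xF))  # illustrative only
```

Only 6 of the 128 clipped bits are overwritten by the branding, which is
why the full MD5 digest "except for the few bits" survives in a UUIDv3.
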
> > 3) Is there some conventional (or de facto dominant) store using UUIDs
> > that is commonly used on web sites? Or is that usually just handled
> > with a db schema, which is certainly pretty easy to do. Perhaps crc60
> > instead if truncated crc64 cannot be linearly combined (but the 4 bits
> > can perhaps be stashed somewhere).
>
> RDBMSs usually store UUIDs either in a CHAR(36) [= 128/4+4 chars] or
> in a custom UUID type which is 128 bits in size (as PostgreSQL 8.3
> does now). In general there are at least 3 usual representations of a
> UUID, all of which can be used for storing it: textual representation
> (the hex encoding plus dashes), number representation (the 128 bits
> treated as one large unsigned integer and output in the decimal system)
> and binary representation (the 128 bits in canonical byte ordering). The
> most efficient encoding is the binary one, the most readable is the
> textual representation.
>
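The three representations mentioned above can be illustrated with a short
Python sketch. The variable names are arbitrary, and the sample value is
the well-known DNS namespace UUID, used here only as an example.

```python
import uuid

u = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")

text = str(u)        # hex encoding plus dashes, fits a CHAR(36) column
number = str(u.int)  # the 128 bits as one unsigned integer, in decimal
binary = u.bytes     # the 128 bits in canonical (big-endian) byte order

# Each representation converts back to the same UUID.
from_text = uuid.UUID(text)
from_number = uuid.UUID(int=int(number))
from_binary = uuid.UUID(bytes=binary)

if __name__ == "__main__":
    print(len(text), "characters:", text)
    print(len(number), "decimal digits:", number)
    print(len(binary), "bytes:", binary.hex())
```
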
> > 4) The design of the URI hierarchical name space (for a package cache,
> > or more generally, for *.rpm metadata) likely needs more thought than
> > the backing store, or using UUID's as primary keys. Any thoughts?
>
> Here I have to admit that I cannot say anything yet, as I have not
> closely followed all your mails related to this. If you can summarize in
> more detail what exactly you want to implement and how you at least plan
> to implement it, I can give you some feedback on this, of course.
>
>                                       Ralf S. Engelschall
>                                       rse@engelschall.com
>                                       www.engelschall.com
>
> ______________________________________________________________________
> RPM Package Manager                                    http://rpm5.org
> Developer Communication List                        rpm-devel@rpm5.org
>
Received on Wed May 14 16:53:10 2008