I've just checked in a proof-of-concept implementation to
use Bloom filters to handle arbitrary option string values
The problem that I am trying to solve in popt (and for RPM and
for option processing in general) is to devise
a means to handle arbitrary (as in unknown when
code is compiled) string options and values and
Bloom filters, because they partition the space,
not the data, are the best means to that end that
I'm aware of.
All that the above means is that a Bloom filter
bit set with all 1's: 11111111111.... will match
*everything* that is fed to it. Consider the alternative,
an explicit table of "known" values, which will fail
if/when a Newer! Better! Bestest! string is added
but someone fergits to add it to the "known" value table.
The cost (as in no. of bytes required) is quite modest:
Assuming a population of an option attribute set of ~1024 arbitrary
strings (>2 orders of magnitude more than the number of
fingers that I have ;-), uisng 16 linearly combined hashes (to
reduce the probability of false positives to < 0.01%), popt
will lazily alloc a bit set of approx.
(3 * 1024) / 2
which is 1536 bits (or 192 bytes). *shrug*
All of the above parameters are tunable, the only hard question is
What should the default pramaters in popt be?
I won't know whether I like the popt implementation (Bloom filters
are not at all bloaty) until I try using it at an application
Other opinions welcomed, particularly on the API. There's
still some polishing that remains.
73 de Jeff
Received on Sun Jul 26 00:56:09 2009