One of the more subtle additions in popt-1.16 is
This a Bloom filter data type that parses option
arguments as a CSV and adds to the "bit set".
What is difficult to describe is why/how Bloom filters
"work" (wikipedia does an excellent job) and particularly
how to describe
Why would I _EVER_ need Bloom filters doing CLI option parsing?
So here's a perfectly sane usage case that I'll be implementing in RPM
One of the mysteries in RPM is %config handling. In short, a file path
marked with %config is handled with a state machine that tries to
avoid accidentally discarding local customizations during "upgrade".
However, what is often needed is a means (through POPT options) to
disable some functionality, or also to negate the behavior (like "grep -v")
Since there are like 300K possible file paths on existing systems, this is a rather
complex task if arrays/lists are used to attach a per-file disabler using POPT.
One can certainly fiddle up the application code to process POPT
options in any number of ways.
Or (as I'm gonna do) one can let POPT_ARG_BITSET do most of the heavy lifting for you
handling a large estimated universe (in this case potentially 300K file paths)
with almost no additional application code needed. This POPT helper
int poptBitsChk(poptBits bits, const char * s);
will return TRUE/FALSE depending on whether a string was mentioned (or not).
There's already a a test case (popt/tdict.c) based on /usr/share/dict/words
in popt code.
Fiddle around with tdict.c if you wish to attempt using POPT_ARG_BITSET
in some application.
Wayne: RSYNC+POPT likely has zillions of usage cases for POPT_ARG_BITSET
with large unknown(a priori) populations of file paths (and enablers/disablers
attached to a specific path). Note that the "default" expected population size
in POPT is way too small for 300K+ file paths (from popt/popt.h):
#define _POPT_BITS_N 1024U /* estimated population */
#define _POPT_BITS_M ((3U * _POPT_BITS_N) / 2U)
#define _POPT_BITS_K 16U /* no. of linear hash combinations */
but can be overridden by assigning values to
extern unsigned int _poptBitsN;
extern unsigned int _poptBitsM;
extern unsigned int _poptBitsK;
There's also the attached bit of deep dark chocolate from RPM code (where
the POPT_ARG_BITSET Bloom filter implementation in POPT was swiped from)
if interested in an "optimal" set of parameters with a "false positive"
failure rate less than specified for a given estimated population size.
See the appended routine.
I left rpmbfParams() (the mnemonic is RPMBF == RPM Bloom Filter, duh) routine
out of POPT largely because of the math complexity and no clearly established
usage case for POPT_ARG_BITSET so far. I also hadn't implemented rpmbfParams()
yet at the time I did the port to POPT ;-).
I almost certainly will include some variant of the attached routine
to estimate "optimal" bitset parameters in POPT 2.0 before I'm done.
73 de Jeff
void rpmbfParams(size_t n, double e, size_t * mp, size_t * kp)
static size_t _nmin = 10;
static size_t _n = 10000;
static size_t _nmax = 1 * 1000 * 1000 * 1000;
static double _emin = 1.0e-10;
static double _e = 1.0e-4;
static double _emax = 1.0e-2;
/* XXX sanity checks on (n,e) args. */
if (!(n >= _nmin && _n <= _nmax))
n = _n;
if (!(e >= _emin && _e <= _emax))
e = _e;
m = (size_t)((n * log(e)) / (log(1.0 / pow(2.0, log(2.0)))) + 0.5);
k = (size_t) ((m * log(2.0)) / n);
if (mp) *mp = m;
if (kp) *kp = k;
Received on Mon Jun 21 16:41:47 2010