RPM Community Forums

Mailing List Message of <popt-devel>

A better example usage case for POPT_ARG_BITSET

From: Jeff Johnson <n3npq@mac.com>
Date: Mon 21 Jun 2010 - 16:41:21 CEST
Message-id: <511FA3FF-8FCD-4909-BF55-33B12C605F33@mac.com>
One of the more subtle additions in popt-1.16 is

This a Bloom filter data type that parses option
arguments as a CSV and adds to the "bit set".

What is difficult to describe is why/how Bloom filters
"work" (wikipedia does an excellent job) and particularly
how to describe
	Why would I _EVER_ need Bloom filters doing CLI option parsing?

So here's a perfectly sane usage case that I'll be implementing in RPM

One of the mysteries in RPM is %config handling. In short, a file path
marked with %config is handled with a state machine that tries to
avoid accidentally discarding local customizations during "upgrade".

However, what is often needed is a means (through POPT options) to
disable some functionality, or also to negate the behavior (like "grep -v")

Since there are like 300K possible file paths on existing systems, this is a rather
complex task if arrays/lists are used to attach a per-file disabler using POPT.

One can certainly fiddle up the application code to process POPT
options in any number of ways.

Or (as I'm gonna do) one can let POPT_ARG_BITSET do most of the heavy lifting for you
handling a large estimated universe (in this case potentially 300K file paths)
with almost no additional application code needed. This POPT helper
	int poptBitsChk(poptBits bits, const char * s);
will return TRUE/FALSE depending on whether a string was mentioned (or not).

There's already a a test case (popt/tdict.c) based on /usr/share/dict/words
in popt code.

Fiddle around with tdict.c if you wish to attempt using POPT_ARG_BITSET
in some application.

Wayne: RSYNC+POPT likely has zillions of usage cases for POPT_ARG_BITSET
with large unknown(a priori) populations of file paths (and enablers/disablers
attached to a specific path). Note that the "default" expected population size
in POPT is way too small for 300K+ file paths (from popt/popt.h):

	#define _POPT_BITS_N    1024U    /* estimated population */
	#define _POPT_BITS_M    ((3U * _POPT_BITS_N) / 2U)
	#define _POPT_BITS_K    16U      /* no. of linear hash combinations */

but can be overridden by assigning values to

	extern unsigned int _poptBitsN;
	extern  unsigned int _poptBitsM;
	extern  unsigned int _poptBitsK;

There's also the attached bit of deep dark chocolate from RPM code (where
the POPT_ARG_BITSET Bloom filter implementation in POPT was swiped from)
if interested in an "optimal" set of parameters with a "false positive"
failure rate less than specified for a given estimated population size.

See the appended routine.

I left rpmbfParams() (the mnemonic is RPMBF == RPM Bloom Filter, duh) routine
out of POPT largely because of the math complexity and no clearly established
usage case for POPT_ARG_BITSET so far. I also hadn't implemented rpmbfParams()
yet at the time I did the port to POPT ;-).

I almost certainly will include some variant of the attached routine
to estimate "optimal" bitset parameters in POPT 2.0 before I'm done.


73 de Jeff
void rpmbfParams(size_t n, double e, size_t * mp, size_t * kp)
    static size_t _nmin = 10;
    static size_t _n = 10000;
    static size_t _nmax = 1 * 1000 * 1000 * 1000;
    static double _emin = 1.0e-10;
    static double _e = 1.0e-4;
    static double _emax = 1.0e-2;
    size_t m;
    size_t k;

    /* XXX sanity checks on (n,e) args. */
    if (!(n >= _nmin && _n <= _nmax))
        n = _n;
    if (!(e >= _emin && _e <= _emax))
        e = _e;

    m = (size_t)((n * log(e)) / (log(1.0 / pow(2.0, log(2.0)))) + 0.5);
    k = (size_t) ((m * log(2.0)) / n);
    if (mp) *mp = m;
    if (kp) *kp = k;
Received on Mon Jun 21 16:41:47 2010
Driven by Jeff Johnson and the RPM project team.
Hosted by OpenPKG and Ralf S. Engelschall.
Powered by FreeBSD and OpenPKG.