On Dec 4, 2009, at 11:04 AM, Jeff Johnson wrote:
> I am now recovered from Thanksgiving and am proceeding
> with finalizing rpmdb access with patterns, and speeding
> up "rpm -qa" another order of magnitude.
>
> I already have full blown pattern matching for PCRE patterns
> applied to the RPMTAG_NAME index:
>
> http://rpm5.org/community/rpm-devel/4012.html
>
> What remains is to choose a pattern syntax, and apply to the
> RPMTAG_NVRA index instead.
>
> I can easily do globs or PCREs or strcmp or ... for all rpm CLI
> package (i.e. anything that is looked up in an rpmdb) arguments
> if I can devise some means to infer what the input is.
>
> ATM, the best I can think of is to use the presence of a leading '^'
> or trailing '$' anchor as a positive confirmation that a
> pattern, not a string, is intended, and the pattern will
> not be escaped further.
>
> If no anchors, I will add both anchors and whatever else
> is needed to preserve the rather goosey-loosey legacy
> behavior of RPM, where the V and R are optional, and A
> is restricted to a known set of arch keywords, in
> N-V-R.A
> and escape all PCRE characters found in the string.
>
> Any other ideas?
>
> Should I dumb the implementation down to fnmatch(3)? The problem
> with fnmatch(3) is that negative matches like "everything but"
> as in "grep -v ..." don't work well enough with glob patterns.
>
Well the deed is largely done:

[jbj@fedora10 wdj]$ /usr/bin/time rpm -qa '[a-z][ab]*' | wc -l
0.02user 0.31system 0:00.44elapsed 77%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19610minor)pagefaults 0swaps
104
[jbj@fedora10 wdj]$ /usr/bin/time rpm -q '^.[ab]' | wc -l
0.03user 0.14system 0:00.32elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4043minor)pagefaults 0swaps
113

One can either use globs with rpm -qa, or use PCRE patterns with rpm -q ...

The PCRE patterns are faster because only the indexes are looked at, whereas
rpm -qa loads every header in order to apply the glob (transformed to PCRE)
to the RPMTAG_NAME in each header.

Note that the PCRE patterns should be functional everywhere an RPMDBI_LABEL
index is used: with --query/--verify/--erase as well as through bindings
that use the rpmmi match iterator.
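For anyone who wants to play with the anchor heuristic from the quoted
proposal, here is a rough sketch in Python rather than rpm's C. It is not
the rpm implementation: label_to_pattern() and the KNOWN_ARCHES list are
made up for illustration, and the "optional V, R and A" tail is only an
approximation of the legacy N-V-R.A matching.

```python
import re

# Hypothetical arch list; the real set of arch keywords lives in rpm itself.
KNOWN_ARCHES = ("noarch", "i386", "i586", "i686", "x86_64", "ppc", "ppc64")


def label_to_pattern(arg: str) -> str:
    """Turn a CLI package argument into a regex over N-V-R.A index keys."""
    # A leading '^' or trailing '$' is taken as confirmation that the
    # argument is already a pattern: pass it through without escaping.
    if arg.startswith("^") or arg.endswith("$"):
        return arg

    # Otherwise the argument is a plain string: escape regex metacharacters
    # so that e.g. the '.' in a version only matches a literal dot.
    literal = re.escape(arg)

    # Approximate the legacy behavior: up to two more '-' separated fields
    # (V and R) may follow, then an optional '.arch' from the known set.
    arches = "|".join(re.escape(a) for a in KNOWN_ARCHES)
    tail = r"(?:-[^-]+){0,2}(?:\.(?:%s))?" % arches

    # Add both anchors so the whole key has to match.
    return "^" + literal + tail + "$"
```

With this sketch, "bash" and "bash-3.2" both match the key
"bash-3.2-29.fc10.i386", while "bash" does not match
"bash-completion-1.0-1.noarch", and '^.[ab]' is used verbatim.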
(aside)
There's still a largish (like >10x) performance win possible once I figure
out DB_DBT_PARTIAL retrieves with *RE stems, and eliminate the
headerLoad() by applying the *RE patterns solely to the index keys.
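The stem idea in the aside can be sketched as follows, again in Python and
under loud assumptions: a sorted list stands in for the btree index, no
Berkeley DB calls are made, and literal_stem() and match_keys() are
hypothetical helpers, not rpm functions. The point is only that a
'^'-anchored pattern with a literal prefix needs to look at the index keys
sharing that prefix, and never at a header.

```python
import bisect
import re

# Characters that end the literal prefix ("stem") of a pattern.
_META = set(".^$*+?{}[]\\|()")


def literal_stem(pattern: str) -> str:
    """Return the literal prefix of a '^'-anchored pattern, or ''."""
    # Be conservative: no anchor or any alternation means no usable stem.
    if not pattern.startswith("^") or "|" in pattern:
        return ""
    body = pattern[1:]
    n = 0
    while n < len(body) and body[n] not in _META:
        n += 1
    # A following '*', '?' or '{' can make the last literal optional,
    # so that character cannot be part of the stem.
    if 0 < n < len(body) and body[n] in "*?{":
        n -= 1
    return body[:n]


def match_keys(sorted_keys, pattern):
    """Apply the pattern only to index keys that share its stem."""
    stem = literal_stem(pattern)
    rx = re.compile(pattern)
    # Position at the first key >= stem, like a range lookup on a btree.
    start = bisect.bisect_left(sorted_keys, stem)
    hits = []
    for key in sorted_keys[start:]:
        if not key.startswith(stem):
            break  # walked past the keys that share the stem
        if rx.search(key):
            hits.append(key)
    return hits
```

A pattern like '^.[ab]' has an empty stem and still walks every key, but it
walks keys only; something like '^glib.*i386$' touches just the keys that
start with "glib".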
73 de Jeff
Received on Fri Dec 4 19:33:46 2009