(Dunno why this msg is not getting through ... 3rd time's a charm)
For various rpmio development reasons, I needed PCRE expressions
applied to HTML content delivered by plain HTTP (not DAV enabled)
transport.
In order to achieve that goal, I've rewritten pcregrep (from
pcre-7.6) to use -lpopt and -lrpmio.
IMHO, the result has uses outside of rpm (and rpmio), so I'm going to
install the executable (at least through rpm-5.1 development) in bindir
(i.e. /usr/bin).
We'll see later about whether /usr/bin/rpmgrep should be included in
rpm-5.1 (or not). For now, I need to hear problem reports with rpmgrep,
and that simply isn't going to happen unless I install /usr/bin/rpmgrep
in PATH.
Here's a very brief intro to what rpmgrep adds to pcregrep (note the
URL argument, and I hope the HTML in the output spew makes it through
mail):
$ ./rpmgrep Fedora http://jbj.org/
<title>Test Page for the Apache HTTP Server on Fedora</title>
<h1>Fedora <strong>Test Page</strong></h1>
<p>For information on Fedora, please visit the <a href="http://fedoraproject.org/">Fedora Project website</a>.</p>
<p>You are free to use the images below on Apache and Fedora powered HTTP servers. Thanks for using Apache and Fedora!</p>
<p><a href="http://httpd.apache.org/"><img src="/icons/apache_pb2.gif" alt="[ Powered by Apache ]"/></a> <a href="http://fedoraproject.org/"><img src="/icons/poweredby.png" alt="[ Powered by Fedora ]" width="88" height="31" /></a></p>
(aside) Yes, the same functionality could have been done with
$ curl http://jbj.org/ | grep Fedora
if I were writing a grep program.
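To make the "fetch, then match line by line" idea concrete, here is a minimal sketch in Python. It is not how rpmgrep is implemented: Python's re module stands in for PCRE, and the helper name grep_lines and the sample input are made up for illustration.

```python
import re
from typing import Iterable, Iterator


def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield each line that contains a match for pattern (like `grep PATTERN`)."""
    regex = re.compile(pattern)  # Python's re, standing in for PCRE here
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")


# Hypothetical sample input, modelled on the test page output above.
SAMPLE_HTML = """<html>
<title>Test Page for the Apache HTTP Server on Fedora</title>
<body>
<h1>Fedora <strong>Test Page</strong></h1>
<p>This line does not mention the distribution.</p>
</body>
</html>
"""

matches = list(grep_lines("Fedora", SAMPLE_HTML.splitlines()))
for line in matches:
    print(line)
```

To run the same filter over a live page, the decoded lines of urllib.request.urlopen(url) could be passed in place of the sample text.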
(aside) I'm not writing a grep program, but rather using rpmgrep as an
external executable to stabilize PCRE patterns, hierarchical path
traversal, and HTTP transport before enabling the same functionality
within rpm itself.
The output of rpmgrep --help is appended below. Note that everywhere
"file" is mentioned, you should be able to substitute a URI. That's
what rpmio is about.
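For readers who haven't met the "a URI anywhere a file is accepted" idea, here is a rough sketch of it in Python. It does not use rpmio and makes no claim about rpmio's API; the helper names open_path_or_uri and read_all are hypothetical, and the scheme test is a deliberately simple assumption.

```python
import os
import pathlib
import tempfile
import urllib.request
from urllib.parse import urlparse


def open_path_or_uri(target: str):
    """Open a local path or a URI for reading bytes.

    Hypothetical helper: strings with a URL scheme such as http:// or
    file:// go through urllib; anything else is opened as a local path.
    """
    scheme = urlparse(target).scheme
    if len(scheme) > 1:  # a one-letter "scheme" is more likely a drive letter
        return urllib.request.urlopen(target)
    return open(target, "rb")


def read_all(target: str) -> bytes:
    """Read the whole resource, whichever way it was named."""
    with open_path_or_uri(target) as stream:
        return stream.read()


# Demo without network access: the same file named as a path and as a URI.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write(b"<h1>Fedora</h1>\n")
    local_path = tmp.name

from_path = read_all(local_path)
from_uri = read_all(pathlib.Path(local_path).as_uri())
os.unlink(local_path)
print(from_path == from_uri)
```

The caller never needs to know which transport was used, which is the property that lets a URL appear wherever the help text says "file".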
I'll get a rpmgrep man page together soonishly ...
Enjoy!
73 de Jeff
====================================================
[jbj@wellfleet rpmio]$ ./rpmgrep --help
Usage: lt-rpmgrep [OPTION...]
  -A, --after-context=number    set number of following context lines
  -B, --before-context=number   set number of prior context lines
      --color                   matched text color option
      --colour                  matched text colour option
  -C, --context=number          set number of context lines, before & after
  -c, --count                   print only a count of matching lines per FILE
  -D, --devices=action          how to handle devices, FIFOs, and sockets
  -d, --directories=action      how to handle directories
  -e, --regex(p)                specify pattern (may be used more than once)
  -F, --fixed_strings           patterns are sets of newline-separated strings
  -f, --file=path               read patterns from file
      --file-offsets            output file offsets, not text
  -H, --with-filename           force the prefixing filename on output
  -h, --no-filename             suppress the prefixing filename on output
  -i, --ignore-case             ignore case distinctions
  -l, --files-with-matches      print only FILE names containing matches
  -L, --files-without-match     print only FILE names not containing matches
      --label=name              set name for standard input
      --line-offsets            output line numbers and offsets, not text
      --locale=locale           use the named locale
  -M, --multiline               run in multiline mode
  -N, --newline=type            set newline type (CR, LF, CRLF, ANYCRLF or ANY)
  -n, --line-number             print line number with output lines
  -o, --only-matching           show only the part of the line that matched
  -q, --quiet                   suppress output, just set return code
  -r, --recursive               recursively scan sub-directories
      --exclude=pattern         exclude matching files when recursing
      --include=pattern         include matching files when recursing
  -s, --no-messages             suppress error messages
  -u, --utf-8                   use UTF-8 mode
  -V, --version                 print version information and exit
  -v, --invert-match            select non-matching lines
  -w, --word-regex              force patterns to match only as words
  -x, --line-regex              force patterns to match only whole lines
Common options for all rpmio executables:
  -D, --define='MACRO EXPR'     define MACRO with value EXPR
      --undefine='MACRO'        undefine MACRO
  -E, --eval='EXPR'             print macro expansion of EXPR
  -r, --root=ROOT               use ROOT as top level directory (default: "/")
      --quiet                   provide less detailed output
  -v, --verbose                 provide more detailed output
      --version                 print the version
Help options:
  -?, --help                    Show this help message
      --usage                   Display brief usage message
  --                            Terminate options
Usage: rpmgrep [OPTION...] [PATTERN] [FILE1 FILE2 ...]
Search for PATTERN in each FILE or standard input.
PATTERN must be present if neither -e nor -f is used.
"-" can be used as a file name to mean STDIN.
All files are read as plain files, without any interpretation.
Example: rpmgrep -i 'hello.*world' menu.h main.c
When reading patterns from a file instead of using a command line option,
trailing white space is removed and blank lines are ignored.
With no FILEs, read standard input. If fewer than two FILEs given, assume -h.
On Feb 13, 2008, at 5:40 PM, Jeff Johnson wrote:
> RPM Package Manager, CVS Repository
> http://rpm5.org/cvs/
>
> ____________________________________________________________________________
>
> Server: rpm5.org                         Name:   Jeff Johnson
> Root:   /v/rpm/cvs                       Email:  jbj@rpm5.org
> Module: rpm                              Date:   13-Feb-2008 23:40:59
> Branch: HEAD                             Handle: 2008021322405801
>
> Modified files:
> rpm CHANGES
> rpm/rpmio Makefile.am rpmgrep.1
>
> Log:
> - jbj: rpmgrep: install in bindir with man page.
>
> Summary:
> Revision    Changes     Path
> 1.2175      +1  -0      rpm/CHANGES
> 1.133       +6  -1      rpm/rpmio/Makefile.am
> 1.2         +26 -26     rpm/rpmio/rpmgrep.1
>
> ____________________________________________________________________________
>
> patch -p0 <<'@@ .'
> Index: rpm/CHANGES
>
> ============================================================================
> $ cvs diff -u -r1.2174 -r1.2175 CHANGES
> --- rpm/CHANGES 12 Feb 2008 05:36:10 -0000 1.2174
> +++ rpm/CHANGES 13 Feb 2008 22:40:58 -0000 1.2175
> @@ -1,4 +1,5 @@
> 5.0.0 -> 5.1a1:
> + - jbj: rpmgrep: install in bindir with man page.
> - rpm-maint: fix: limit exit codes to 254 to keep xargs happy.
> - jbj: mire: add vallen argument to mireRegexec().
> - jbj: borrow pcregrep.c from pcre-7.6, rename as rpmgrep.c.
> @@ .
> patch -p0 <<'@@ .'
> Index: rpm/rpmio/Makefile.am
>
> ============================================================================
> $ cvs diff -u -r1.132 -r1.133 Makefile.am
> --- rpm/rpmio/Makefile.am 11 Feb 2008 22:23:49 -0000 1.132
> +++ rpm/rpmio/Makefile.am 13 Feb 2008 22:40:59 -0000 1.133
> @@ -8,6 +8,9 @@
>
> EXTRA_PROGRAMS = thkp thtml tinv tkey tmacro tmagic tput tpw trpmio tsw dumpasn1 lookup3
>
> +bin_PROGRAMS =
> +man_MANS =
> +
> TESTS =
> check_PROGRAMS = tdir tfts tget tglob tmire
> check_SCRIPTS = testit.sh
> @@ -108,7 +111,9 @@
>
> TESTS += RunGrepTest
> dist_noinst_SCRIPTS += RunGrepTest
> -check_PROGRAMS += rpmgrep
> +bin_PROGRAMS += rpmgrep
> +man_MANS += rpmgrep.1
> +
> rpmgrep_SOURCES = rpmgrep.c
> rpmgrep_LDADD = $(RPMIO_LDADD)
>
> @@ .
> patch -p0 <<'@@ .'
> Index: rpm/rpmio/rpmgrep.1
>
> ============================================================================
> $ cvs diff -u -r1.1 -r1.2 rpmgrep.1
> --- rpm/rpmio/rpmgrep.1 13 Feb 2008 22:09:48 -0000 1.1
> +++ rpm/rpmio/rpmgrep.1 13 Feb 2008 22:40:59 -0000 1.2
> @@ -1,13 +1,13 @@
> .TH PCREGREP 1
> .SH NAME
> -pcregrep - a grep with Perl-compatible regular expressions.
> +rpmgrep - a grep with Perl-compatible regular expressions.
> .SH SYNOPSIS
> -.B pcregrep [options] [long options] [pattern] [path1 path2 ...]
> +.B rpmgrep [options] [long options] [pattern] [path1 path2 ...]
> .
> .SH DESCRIPTION
> .rs
> .sp
> -\fBpcregrep\fP searches files for character patterns, in the same way as other
> +\fBrpmgrep\fP searches files for character patterns, in the same way as other
> grep commands do, but it uses the PCRE regular expression library to support
> patterns that are compatible with the regular expressions of Perl 5. See
> .\" HREF
> @@ -19,7 +19,7 @@
> Patterns, whether supplied on the command line or in a separate file, are given
> without delimiters. For example:
> .sp
> - pcregrep Thursday /etc/motd
> + rpmgrep Thursday /etc/motd
> .sp
> If you attempt to use delimiters (for example, by surrounding a pattern with
> slashes, as is common in Perl scripts), they are interpreted as part of the
> @@ -33,16 +33,16 @@
> arguments are treated as path names. At least one of \fB-e\fP, \fB-f\fP, or an
> argument pattern must be provided.
> .P
> -If no files are specified, \fBpcregrep\fP reads the standard input. The
> +If no files are specified, \fBrpmgrep\fP reads the standard input. The
> standard input can also be referenced by a name consisting of a single hyphen.
> For example:
> .sp
> - pcregrep some-pattern /file1 - /file3
> + rpmgrep some-pattern /file1 - /file3
> .sp
> By default, each line that matches a pattern is copied to the standard
> output, and if there is more than one file, the file name is output at the
> start of each line, followed by a colon. However, there are options that can
> -change how \fBpcregrep\fP behaves. In particular, the \fB-M\fP option makes it
> +change how \fBrpmgrep\fP behaves. In particular, the \fB-M\fP option makes it
> possible to search for patterns that span line boundaries. What defines a line
> boundary is controlled by the \fB-N\fP (\fB--newline\fP) option.
> .P
> @@ -62,13 +62,13 @@
> earlier part of the line.
> .P
> If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set,
> -\fBpcregrep\fP uses the value to set a locale when calling the PCRE library.
> +\fBrpmgrep\fP uses the value to set a locale when calling the PCRE library.
> The \fB--locale\fP option can be used to override this.
> .
> .SH "SUPPORT FOR COMPRESSED FILES"
> .rs
> .sp
> -It is possible to compile \fBpcregrep\fP so that it uses \fBlibz\fP or
> +It is possible to compile \fBrpmgrep\fP so that it uses \fBlibz\fP or
> \fBlibbz2\fP to read files whose names end in \fB.gz\fP or \fB.bz2\fP,
> respectively. You can find out whether your binary has support for one or both
> of these file types by running it with the \fB--help\fP option. If the
> @@ -88,7 +88,7 @@
> and/or line numbers are being output, a hyphen separator is used instead of a
> colon for the context lines. A line containing "--" is output between each
> group of lines, unless they are in fact contiguous in the input file. The value
> -of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP
> +of \fInumber\fP is expected to be relatively small. However, \fBrpmgrep\fP
> guarantees to have up to 8K of following text available for context output.
> .TP
> \fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP
> @@ -96,7 +96,7 @@
> and/or line numbers are being output, a hyphen separator is used instead of a
> colon for the context lines. A line containing "--" is output between each
> group of lines, unless they are in fact contiguous in the input file. The value
> -of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP
> +of \fInumber\fP is expected to be relatively small. However, \fBrpmgrep\fP
> guarantees to have up to 8K of preceding text available for context output.
> .TP
> \fB-C\fP \fInumber\fP, \fB--context=\fP\fInumber\fP
> @@ -150,13 +150,13 @@
> of the order in which these options are specified. Note that multiple use of
> \fB-e\fP is not the same as a single pattern with alternatives. For example,
> X|Y finds the first character in a line that is X or Y, whereas if the two
> -patterns are given separately, \fBpcregrep\fP finds X if it is present, even if
> +patterns are given separately, \fBrpmgrep\fP finds X if it is present, even if
> it follows Y in the line. It finds Y only if there is no X in the line. This
> really matters only if you are using \fB-o\fP to show the part(s) of the line
> that matched.
> .TP
> \fB--exclude\fP=\fIpattern\fP
> -When \fBpcregrep\fP is searching the files in a directory as a consequence of
> +When \fBrpmgrep\fP is searching the files in a directory as a consequence of
> the \fB-r\fP (recursive search) option, any files whose names match the pattern
> are excluded. The pattern is a PCRE regular expression. If a file name matches
> both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no short
> @@ -211,7 +211,7 @@
> Ignore upper/lower case distinctions during comparisons.
> .TP
> \fB--include\fP=\fIpattern\fP
> -When \fBpcregrep\fP is searching the files in a directory as a consequence of
> +When \fBrpmgrep\fP is searching the files in a directory as a consequence of
> the \fB-r\fP (recursive search) option, only those files whose names match the
> pattern are included. The pattern is a PCRE regular expression. If a file name
> matches both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no
> @@ -254,8 +254,8 @@
> and $ characters. The output for any one match may consist of more than one
> line. When this option is set, the PCRE library is called in "multiline" mode.
> There is a limit to the number of lines that can be matched, imposed by the way
> -that \fBpcregrep\fP buffers the input file as it scans it. However,
> -\fBpcregrep\fP ensures that at least 8K characters or the rest of the document
> +that \fBrpmgrep\fP buffers the input file as it scans it. However,
> +\fBrpmgrep\fP ensures that at least 8K characters or the rest of the document
> (whichever is the shorter) are available for forward matching, and similarly
> the previous 8K characters (or all the previous characters, if fewer than 8K)
> are guaranteed to be available for lookbehind assertions.
> @@ -272,12 +272,12 @@
> .sp
> When the PCRE library is built, a default line-ending sequence is specified.
> This is normally the standard sequence for the operating system. Unless
> -otherwise specified by this option, \fBpcregrep\fP uses the library's default.
> +otherwise specified by this option, \fBrpmgrep\fP uses the library's default.
> The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
> -makes it possible to use \fBpcregrep\fP on files that have come from other
> +makes it possible to use \fBrpmgrep\fP on files that have come from other
> environments without having to modify their line endings. If the data that is
> being scanned does not agree with the convention set by this option,
> -\fBpcregrep\fP may behave in strange ways.
> +\fBrpmgrep\fP may behave in strange ways.
> .TP
> \fB-n\fP, \fB--line-number\fP
> Precede each output line by its line number in the file, followed by a colon
> @@ -316,7 +316,7 @@
> UTF-8 characters.
> .TP
> \fB-V\fP, \fB--version\fP
> -Write the version numbers of \fBpcregrep\fP and the PCRE library that is being
> +Write the version numbers of \fBrpmgrep\fP and the PCRE library that is being
> used to the standard error stream.
> .TP
> \fB-v\fP, \fB--invert-match\fP
> @@ -346,9 +346,9 @@
> .SH "NEWLINES"
> .rs
> .sp
> -The \fB-N\fP (\fB--newline\fP) option allows \fBpcregrep\fP to scan files with
> +The \fB-N\fP (\fB--newline\fP) option allows \fBrpmgrep\fP to scan files with
> different newline conventions from the default. However, the setting of this
> -option does not affect the way in which \fBpcregrep\fP writes information to
> +option does not affect the way in which \fBrpmgrep\fP writes information to
> the standard error and output streams. It uses the string "\en" in C
> \fBprintf()\fP calls to indicate newlines, relying on the C I/O library to
> convert this to an appropriate sequence if the output is sent to a file.
> @@ -357,11 +357,11 @@
> .SH "OPTIONS COMPATIBILITY"
> .rs
> .sp
> -The majority of short and long forms of \fBpcregrep\fP's options are the same
> +The majority of short and long forms of \fBrpmgrep\fP's options are the same
> as in the GNU \fBgrep\fP program. Any long option of the form
> \fB--xxx-regexp\fP (GNU terminology) is also available as \fB--xxx-regex\fP
> (PCRE terminology). However, the \fB--locale\fP, \fB-M\fP, \fB--multiline\fP,
> -\fB-u\fP, and \fB--utf-8\fP options are specific to \fBpcregrep\fP.
> +\fB-u\fP, and \fB--utf-8\fP options are specific to \fBrpmgrep\fP.
> .
> .
> .SH "OPTIONS WITH DATA"
> @@ -399,9 +399,9 @@
> fail to match certain lines. Such patterns normally involve nested indefinite
> repeats, for example: (a+)*\ed when matched against a line of a's with no final
> digit. The PCRE matching function has a resource limit that causes it to abort
> -in these circumstances. If this happens, \fBpcregrep\fP outputs an error
> +in these circumstances. If this happens, \fBrpmgrep\fP outputs an error
> message and the line that caused the problem to the standard error stream. If
> -there are more than 20 such errors, \fBpcregrep\fP gives up.
> +there are more than 20 such errors, \fBrpmgrep\fP gives up.
> .
> .
> .SH DIAGNOSTICS
> @@ .
> ______________________________________________________________________
> RPM Package Manager http://rpm5.org
> CVS Sources Repository rpm-cvs@rpm5.org
Received on Wed Feb 13 23:46:09 2008