-.\" Copyright (c) 2001 Andreas S. Oesterhelt <oes@oesterhelt.org>
+.\" Copyright (c) 2001-2003 Andreas S. Oesterhelt <oes@oesterhelt.org>
.\"
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, write to the Free
-.\" Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111,
-.\" USA.
+.\" Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+.\" MA 02111, USA.
.\"
-.TH PCRS 3 "4 March 2002"
+.TH PCRS 3 "2 December 2003" "pcrs-0.0.3"
.SH NAME
pcrs - Perl-compatible regular substitution.
.SH SYNOPSIS
.br
-.BI "#include <pcrs.h>"
+.B "#include <pcrs.h>"
.PP
.br
-.BI "pcrs_job *pcrs_compile(const char *" pattern ","
+.BI "pcrs_job *pcrs_compile(const char *" pattern ","
.ti +5n
.BI "const char *" substitute ", const char *" options ,
.ti +5n
The
.SM PCRS
-library is a supplement to the
+library is a supplement to the
.SB PCRE(3)
library that implements
.RB "regular expression based substitution, like provided by " Perl(1) "'s 's'"
In a first step, the information on a substitution, i.e. the pattern, the
substitute and the options are compiled from Perl syntax to an internal form
-.RB "called " pcrs_job " by using either the " pcrs_compile() " or "
+.RB "called " pcrs_job " by using either the " pcrs_compile() " or "
.BR pcrs_compile_command() " functions."
-Once the job is compiled, it can be used on subjects, which are arbitrary
+Once the job is compiled, it can be used on subjects, which are arbitrary
memory areas containing string or binary data, by calling
.BR pcrs_execute() ". Jobs can be chained to joblists and whole"
.RB "joblists can be applied to a subject using " pcrs_execute_list() .
.RB "follows the '" "s" "' will be used as the delimiter. Patterns or substitutes"
that contain the delimiter need to quote it:
\fBs/th\\/is/th\\/at/\fR
-.RB "will replace " "th/is" " by " "th/at" " and can be written more simply as"
+.RB "will replace " "th/is" " by " "th/at" " and can be written more simply as"
.BR "s|th/is|th/at|" "."
.IR "pattern" ", " "substitute" ", " "options" " and " "command" " must be"
.SS Substitutes
.RI "The " "substitute" " uses"
.RB "Perl syntax as documented in the " "perlre(1)" " manual page, with"
-some exceptions:
+some exceptions:
Most notably and evidently, since
.SM PCRS
if a global substitution previously matched.
.PP
-Perl4-style references to subpattern matches of the form
+Perl4-style references to subpattern matches of the form
\fB\\1, \\2, ...\fR
-.RB "which only exist in Perl5 for backwards compatibility, are " "not"
+.RB "which only exist in Perl5 for backwards compatibility, are " "not"
supported.
Also, since the substitute is a double-quoted string in Perl, you
.RI "The first " subject_length " bytes following " subject " are processed, so"
.RI "a " subject_length " that exceeds the actual " subject " is dangerous."
-Note that if you want to get your zero-terminated C strings back including their
-.RI "termination, you must let " subject_length " include the binary zero, i.e."
-set it to
-.BI strlen( subject ") + 1."
+.RI "Note that for zero-terminated C strings, you should set " subject_length " to"
+.BI strlen( subject ) \fR,
+so that the dollar metacharacter matches at the end of the string, not after
+the string-terminating null byte. For convenience, an extra null byte is
+appended to the result so it can again be used as a string.
.RI "The " subject " itself is left untouched, and the " *result " is dynamically"
.RB "allocated, so it is the caller's responsibility to " free() " it when it's"
no longer needed.
-.RI "The result's length is written to " *result_length "."
+.RI "The result's length (excluding the extra null byte) is written to " *result_length "."
.RB "If the job matched, the " PCRS_SUCCESS " flag in"
.IB job ->flags
is set.
+
+.SS String subjects
+If your
+
.SS Return value and diagnostics
.RB "On success, " pcrs_execute() " returns the number of substitutions that"
Chaining the jobs is up to you, but once you have built a linked list of jobs,
.RI "you can execute a whole " joblist " on a given subject by"
.RB "a single call to " pcrs_execute_list() ", which will sequentially traverse"
-.RB "the linked list until it reaches a " NULL " pointer, and call " pcrs_execute()
+.RB "the linked list until it reaches a " NULL " pointer, and call " pcrs_execute()
.RI "for each job it encounters, feeding the " result " and " result_length " of each"
.RI "call into the next as the " subject " and " subject_length ". As in the single"
.RI "job case, the original " subject " remains untouched, but all interim " result "s"
While compiling the pattern,
.SM PCRE
ran out of memory.
-.TP
+.TP
.B PCRS_ERR_NOMEM
While compiling the job,
.SM PCRS
ran out of memory.
-.TP
+.TP
.B PCRS_ERR_CMDSYNTAX
.BR pcrs_compile_command() " didn't find four tokens while parsing the"
.IR command .
-.TP
+.TP
.B PCRS_ERR_STUDY
A
.SM PCRE
.RB "error occured while studying the compiled pattern. Since " pcre_study()
only provides textual diagnostic information, the details are lost.
-.TP
+.TP
.B PCRS_WARN_BADREF
.RI "The " substitute " contains a reference to a capturing subpattern that"
.RI "has a higher index than the number of capturing subpatterns in the " pattern
ran out of memory. This can only happen if there are more than 33 backrefrences
.RI "in the " pattern "(!)"
.BR and " memory is too tight to extend storage for more."
-.TP
+.TP
.B PCRS_ERR_NOMEM
While executing the job,
.SM PCRS
ran out of memory.
-.TP
+.TP
.B PCRS_ERR_BADJOB
.RB "The " pcrs_job "* passed to " pcrs_execute " was NULL, or the"
.RB "job is bogus (it contains " NULL " pointers to the compiled
{
pcrs_job *job;
char *result;
- int newsize, err;
+ size_t newsize;
+ int err;
if (Argc != 3)
{
if (NULL == (job = pcrs_compile_command(Argv[1], &err)))
{
- printf("Compile error: %s (%d).\\n", pcrs_strerror(err), err);
+ fprintf(stderr, "%s: compile error: %s (%d).\\n", Argv[0], pcrs_strerror(err), err);
}
- if (0 > (err = pcrs_execute(job, Argv[2], strlen(Argv[2]) + 1, &result, &newsize)))
+ if (0 > (err = pcrs_execute(job, Argv[2], strlen(Argv[2]), &result, &newsize)))
{
- printf("Exec error: %s (%d).\\n", pcrs_strerror(err), err);
+ fprintf(stderr, "%s: exec error: %s (%d).\\n", Argv[0], pcrs_strerror(err), err);
}
else
{
is a Bad Thing[tm]. It should be dynamically expanded until it reaches the
.SM PCRE
limit of 99.
+.br
+This limitation is particularly embarrassing since
+.SM PCRE
+3.5 has raised the capturing subpattern limit to 65K.
All of the above values can be adjusted in the "Capacity" section
.RB "of " pcrs.h "."
.SH HISTORY
.SM PCRS
-was originally written for the Internet Junkbuster project
-(http://sourceforge.net/projects/ijbswa/).
+was originally written for the Privoxy project
+(https://www.privoxy.org/).
.SH SEE ALSO
.B PCRE(3), perl(1), perlre(1)
.SH AUTHOR
.SM PCRS
-is Copyright 2000, 2001 by Andreas Oesterhelt <andreas@oesterhelt.org> and is
-licensed under the terms of the GNU Lesser General Public License (LGPL),
-version 2.1, which should be included in this distribution, with the exception
-that the permission to replace that license with the GNU General Public
-License (GPL) given in section 3 is restricted to version 2 of the GPL.
-
-If it is missing from this distribution, the LGPL can be obtained from
-http://www.gnu.org/licenses/lgpl.html or by mail: Write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+is Copyright 2000 - 2003 by Andreas Oesterhelt <andreas@oesterhelt.org>
+and is free software.
+
+You can redistribute it and/or modify it under the terms of the GNU
+General Public License as published by the Free Software Foundation;
+either version 2 of the License, or (at your option) any later
+version.
+
+The GNU General Public License should be included with this file.
+If not, you can view it at https://www.gnu.org/copyleft/gpl.html or write
+to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+Boston, MA 02111-1307, USA.