X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fpcrs.3;h=6b96b3755ee108192c7b269312e9257d181485bf;hp=c6925652e51279c66bbf0df7164b076c14116c52;hb=ec5b42c05e3a0f068b2efedcc35f9886ba580bda;hpb=9f9883408d76529bf25cba580429ed68e4f74eda
diff --git a/doc/pcrs.3 b/doc/pcrs.3
index c6925652..6b96b375 100644
--- a/doc/pcrs.3
+++ b/doc/pcrs.3
@@ -1,4 +1,4 @@
-.\" Copyright (c) 2001 Andreas S. Oesterhelt
+.\" Copyright (c) 2001-2003 Andreas S. Oesterhelt
 .\"
 .\" This is free documentation; you can redistribute it and/or
 .\" modify it under the terms of the GNU General Public License as
@@ -17,82 +17,84 @@
 .\"
 .\" You should have received a copy of the GNU General Public
 .\" License along with this manual; if not, write to the Free
-.\" Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111,
-.\" USA.
+.\" Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+.\" MA 02111, USA.
 .\"
-.TH PCRS 3 "17 August 2001"
+.TH PCRS 3 "2 December 2003" "pcrs-0.0.3"
 .SH NAME
 pcrs - Perl-compatible regular substitution.
 .SH SYNOPSIS
 .br
-.BI "#include <pcrs.h>"
+.B "#include <pcrs.h>"
 .PP
 .br
-.BI "pcrs_job *pcrs_compile(const char *" "pattern" ","
+.BI "pcrs_job *pcrs_compile(const char *" pattern ","
 .ti +5n
-.BI "const char *" "substitute" ", const char *" "options" ","
+.BI "const char *" substitute ", const char *" options ,
 .ti +5n
-.BI "int *" "errptr" ");"
+.BI "int *" errptr );
 .PP
 .br
-.BI "pcrs_job *pcrs_compile_command(const char *" "command" ","
+.BI "pcrs_job *pcrs_compile_command(const char *" command ,
 .ti +5n
-.BI "int *" "errptr");
+.BI "int *" errptr );
 .PP
 .br
-.BI "int pcrs_execute(pcrs_job *" "job" ", char *" "subject" ","
+.BI "int pcrs_execute(pcrs_job *" job ", char *" subject ,
 .ti +5n
-.BI "int " "subject_length" ", char **" "result" ","
+.BI "int " subject_length ", char **" result ,
 .ti +5n
-.BI "int *" "result_length" ");"
+.BI "int *" result_length );
 .PP
 .br
-.BI "int pcrs_execute_list (pcrs_job *" "joblist" ", char *" "subject" ","
+.BI "int pcrs_execute_list (pcrs_job *" joblist ", char *" subject ,
 .ti +5n
-.BI "int " "subject_length" ", char **" "result" ","
+.BI "int " subject_length ", char **" result ,
 .ti +5n
-.BI "int *" "result_length" ");"
+.BI "int *" result_length );
 .PP
 .br
-.BI "pcrs_job *pcrs_free_job(pcrs_job *" "job" ");"
+.BI "pcrs_job *pcrs_free_job(pcrs_job *" job );
 .PP
 .br
-.BI "void pcrs_free_joblist(pcrs_job *" "joblist" ");"
+.BI "void pcrs_free_joblist(pcrs_job *" joblist );
 .PP
 .br
-.BI "char *pcrs_strerror(int " "err" ");"
+.BI "char *pcrs_strerror(int " err );
 .PP
 .br
 .SH DESCRIPTION
+
 The
 .SM PCRS
-library is a supplement to the
+library is a supplement to the
 .SB PCRE(3)
 library that implements
-.RB "regular expression based substitution, like provided by " "Perl(1)" "'s 's'"
+.RB "regular expression based substitution, like provided by " Perl(1) "'s 's'"
 operator. It uses the same syntax and semantics as Perl 5, with just a few
 differences (see below).
 In a first step, the information on a substitution, i.e.
 the pattern, the substitute and the options are compiled from Perl syntax to an
 internal form
-.RB "called " "pcrs_job" " by using either the " "pcrs_compile()" " or "
-.BR "pcrs_compile_command()" " functions."
+.RB "called " pcrs_job " by using either the " pcrs_compile() " or "
+.BR pcrs_compile_command() " functions."
-Once the job is compiled, it can be used on subjects, which are arbitrary
+Once the job is compiled, it can be used on subjects, which are arbitrary
 memory areas containing string or binary data, by calling
-.BR "pcrs_execute()" ". Jobs can be chained to joblists and whole"
-.RB "joblists can be applied to a subject using " "pcrs_execute_list()" "."
+.BR pcrs_execute() ". Jobs can be chained to joblists and whole"
+.RB "joblists can be applied to a subject using " pcrs_execute_list() .
 There are also convenience functions for freeing the jobs and for errno-to-string
-.RB "conversion, namely " "pcrs_free_job()" ", " "pcrs_free_joblist()" " and "
-.BR "pcrs_strerror()" "."
+.RB "conversion, namely " pcrs_free_job() ", " pcrs_free_joblist() " and "
+.BR pcrs_strerror() .
 .SH COMPILING JOBS
-.RB "The function " "pcrs_compile()" " is called to compile a " "pcrs_job"
-.RI "from a " "pattern" ", " "substitute" " and " "options" " string.
+
+.RB "The function " pcrs_compile() " is called to compile a " pcrs_job
+.RI "from a " pattern ", " substitute " and " options " string."
 .RB "The resulting " "pcrs_job" " structure is dynamically allocated and it"
-.RB "is the caller's responsibility to " "free()" " it when it's no longer needed."
+.RB "is the caller's responsibility to call " "pcrs_free_job()" " when it's no longer needed."
 .BR "pcrs_compile_command()" " is a convenience wrapper function that parses a Perl"
 .IR "command" " of the form"
@@ -102,7 +104,7 @@ There are also convenience functions for freeing the jobs and for errno-to-strin
 .RB "follows the '" "s" "' will be used as the delimiter. Patterns or substitutes"
 that contain the delimiter need to quote it:
 \fBs/th\\/is/th\\/at/\fR
-.RB "will replace " "th/is" " by " "th/at" " and can be written more simply as"
+.RB "will replace " "th/is" " by " "th/at" " and can be written more simply as"
 .BR "s|th/is|th/at|" "."
 .IR "pattern" ", " "substitute" ", " "options" " and " "command" " must be"
@@ -121,7 +123,7 @@ On success, both functions return a pointer to the compiled job.
 .SS Substitutes
 .RI "The " "substitute" " uses"
 .RB "Perl syntax as documented in the " "perlre(1)" " manual page, with"
-some exceptions:
+some exceptions:
 Most notably and evidently, since
 .SM PCRS
@@ -145,9 +147,9 @@ refers to what the last capturing subpattern matched.
 if a global substitution previously matched.
 .PP
-Perl4-style references to subpattern matches of the form
+Perl4-style references to subpattern matches of the form
 \fB\\1, \\2, ...\fR
-.RB "which only exist in Perl5 for backwards compatibility, are " "not"
+.RB "which only exist in Perl5 for backwards compatibility, are " "not"
 supported.
 Also, since the substitute is a double-quoted string in Perl, you
@@ -249,34 +251,39 @@ Unsupported options are silently ignored.
 .RI "The first " subject_length " bytes following " subject " are processed, so"
 .RI "a " subject_length " that exceeds the actual " subject " is dangerous."
-Note that if you want to get your zero-terminated C strings back including their
-.RI "termination, you must let " subject_length " include the binary zero, i.e."
-set it to
-.BI strlen( subject ") + 1."
+.RI "Note that for zero-terminated C strings, you should set " subject_length " to"
+.BI strlen( subject ) \fR,
+so that the dollar metacharacter matches at the end of the string, not after
+the string-terminating null byte. For convenience, an extra null byte is
+appended to the result so it can again be used as a string.
 .RI "The " subject " itself is left untouched, and the " *result " is dynamically"
 .RB "allocated, so it is the caller's responsibility to " free() " it when it's"
 no longer needed.
-.RI "The result's length is written to " *result_length "."
+.RI "The result's length (excluding the extra null byte) is written to " *result_length "."
 .RB "If the job matched, the " PCRS_SUCCESS " flag in"
 .IB job ->flags
 is set.
+
+.SS String subjects
+If your
+
 .SS Return value and diagnostics
 .RB "On success, " pcrs_execute() " returns the number of substitutions that"
 were made, which is limited to 0 or 1 for non-global searches.
-.RI "On failure, a negative error code is returned and " *result " is set"
+.RI "On failure, a negative error code is returned and " result " is set"
 .RB "to " NULL .
 .SH FREEING JOBS
 .RB "It is not sufficient to call " free() " on a " pcrs_job ", because it "
 contains pointers to other dynamically allocated structures.
-.RB "Use " pcrs_free() " instead. It is safe to pass " NULL " pointers "
+.RB "Use " pcrs_free_job() " instead. It is safe to pass " NULL " pointers "
 .RB "(or pointers to invalid " pcrs_job "s that contain " NULL " pointers"
-.RB "to dependant structures) to " pcrs_free() "."
+.RB "to dependant structures) to " pcrs_free_job() "."
 .SS Return value
 .RB "The value of the job's " next " pointer."
@@ -291,7 +298,7 @@ contains pointers to other dynamically allocated structures.
 Chaining the jobs is up to you, but once you have built a linked list of jobs,
 .RI "you can execute a whole " joblist " on a given subject by"
 .RB "a single call to " pcrs_execute_list() ", which will sequentially traverse"
-.RB "the linked list until it reaches a " NULL " pointer, and call " pcrs_execute()
+.RB "the linked list until it reaches a " NULL " pointer, and call " pcrs_execute()
 .RI "for each job it encounters, feeding the " result " and " result_length " of each"
 .RI "call into the next as the " subject " and " subject_length ". As in the single"
 .RI "job case, the original " subject " remains untouched, but all interim " result "s"
@@ -308,7 +315,7 @@ The quote character is (surprise!) '\fB\\\fR'. It quotes the delimiter in a
 .IR command ", the"
 .RB ' $ "' in a"
 .IR substitute ", and, of course, itself. Note that the"
-.RB ' $ "'doesn't need to be quoted if it isn't followed by " [0-9+'`&] "."
+.RB ' $ "' doesn't need to be quoted if it isn't followed by " [0-9+'`&] "."
 .RI "For quoting in the " pattern ", please refer to"
 .BR PCRE(3) .
@@ -328,22 +335,22 @@ Under normal circumstances, it can take the following values:
 While compiling the pattern,
 .SM PCRE
 ran out of memory.
-.TP
+.TP
 .B PCRS_ERR_NOMEM
 While compiling the job,
 .SM PCRS
 ran out of memory.
-.TP
+.TP
 .B PCRS_ERR_CMDSYNTAX
 .BR pcrs_compile_command() " didn't find four tokens while parsing the"
 .IR command .
-.TP
+.TP
 .B PCRS_ERR_STUDY
 A
 .SM PCRE
 .RB "error occured while studying the compiled pattern. Since " pcre_study()
 only provides textual diagnostic information, the details are lost.
-.TP
+.TP
 .B PCRS_WARN_BADREF
 .RI "The " substitute " contains a reference to a capturing subpattern that"
 .RI "has a higher index than the number of capturing subpatterns in the " pattern
@@ -361,12 +368,12 @@ While matching the pattern,
 ran out of memory. This can only happen if there are more than 33 backrefrences
 .RI "in the " pattern "(!)"
 .BR and " memory is too tight to extend storage for more."
-.TP
+.TP
 .B PCRS_ERR_NOMEM
 While executing the job,
 .SM PCRS
 ran out of memory.
-.TP
+.TP
 .B PCRS_ERR_BADJOB
 .RB "The " pcrs_job "* passed to " pcrs_execute " was NULL, or the"
 .RB "job is bogus (it contains " NULL " pointers to the compiled
@@ -403,7 +410,8 @@ int main(int Argc, char **Argv)
 {
    pcrs_job *job;
    char *result;
-   int newsize, err;
+   size_t newsize;
+   int err;
    if (Argc != 3)
    {
@@ -413,21 +421,22 @@ int main(int Argc, char **Argv)
    if (NULL == (job = pcrs_compile_command(Argv[1], &err)))
    {
-      printf("Compile error: %s (%d).\\n", pcrs_strerror(err), err);
+      fprintf(stderr, "%s: compile error: %s (%d).\\n", Argv[0], pcrs_strerror(err), err);
    }
-   if (0 > (err = pcrs_execute(job, Argv[2], strlen(Argv[2]) + 1, &result, &newsize)))
+   if (0 > (err = pcrs_execute(job, Argv[2], strlen(Argv[2]), &result, &newsize)))
    {
-      printf("Exec error: %s (%d).\\n", pcrs_strerror(err), err);
+      fprintf(stderr, "%s: exec error: %s (%d).\\n", Argv[0], pcrs_strerror(err), err);
+   }
+   else
+   {
+      printf("Result: *%s*\\n", result);
+      free(result);
    }
-
-   /* Will tolerate NULL result */
-   printf("Result: *%s*\\n", result);
    pcrs_free_job(job);
-   if (result) free(result);
+   return(err < 0);
-   return 0;
 }
 .fi
@@ -436,12 +445,16 @@ int main(int Argc, char **Argv)
 .SH LIMITATIONS
 The number of matches that a global job can have is only limited by the
 available memory. An initial storage for 40 matches is reserved, which
-is dynamically resized by the factor 1.6 if exhausted.
+is dynamically resized by the factor 1.6 whenever it is exhausted.
 The number of capturing subpatterns is currently limited to 33, which is
 a Bad Thing[tm]. It should be dynamically expanded until it reaches the
 .SM PCRE
 limit of 99.
+.br
+This limitation is particularly embarassing since
+.SM PCRE
+3.5 has raised the capturing subpattern limit to 65K.
 All of the above values can be adjusted in the "Capacity" section
 .RB "of " pcrs.h "."
@@ -455,8 +468,8 @@ and should be considered high risk.
 .SH HISTORY
 .SM PCRS
-was originally written for the Internet Junkbuster project
-(http://sourceforge.net/projects/ijbswa/).
+was originally written for the Privoxy project
+(http://www.privoxy.org/).
 .SH SEE ALSO
 .B PCRE(3), perl(1), perlre(1)
@@ -464,9 +477,12 @@ was originally written for the Internet Junkbuster project
 .SH AUTHOR
 .SM PCRS
-is Copyright 2000, 2001 by Andreas Oesterhelt and is
-licensed under the Terms of the GNU Lesser General Public License (LGPL), which
-should be included in this distribution.
-
-If not, refer to http://www.gnu.org/licenses/lgpl.html or write to the Free
-Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
\ No newline at end of file
+is Copyright 2000 - 2003 by Andreas Oesterhelt and is
+licensed under the terms of the GNU Lesser General Public License (LGPL),
+version 2.1, which should be included in this distribution, with the exception
+that the permission to replace that license with the GNU General Public
+License (GPL) given in section 3 is restricted to version 2 of the GPL.
+
+If it is missing from this distribution, the LGPL can be obtained from
+http://www.gnu.org/licenses/lgpl.html or by mail: Write to the Free Software
+Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.