- .example.com/images/ads/</PRE
-></TD
-></TR
-></TABLE
->
- </P
-><P
-> You can trace this process for URL patterns and any given URL by visiting <A
-HREF="http://config.privoxy.org/show-url-info"
-TARGET="_top"
->http://config.privoxy.org/show-url-info</A
->.</P
-><P
-> Examples and more detail on this is provided in the Appendix, <A
-HREF="appendix.html#ACTIONSANAT"
-> Troubleshooting: Anatomy of an Action</A
-> section.</P
-></DIV
-><DIV
-CLASS="SECT2"
-><H2
-CLASS="SECT2"
-><A
-NAME="AF-PATTERNS"
->8.4. Patterns</A
-></H2
-><P
->
- As mentioned, <SPAN
-CLASS="APPLICATION"
->Privoxy</SPAN
-> uses <SPAN
-CLASS="QUOTE"
->"patterns"</SPAN
->
- to determine what <SPAN
-CLASS="emphasis"
-><I
-CLASS="EMPHASIS"
->actions</I
-></SPAN
-> might apply to which sites and
- pages your browser attempts to access. These <SPAN
-CLASS="QUOTE"
->"patterns"</SPAN
-> use wild
- card type <SPAN
-CLASS="emphasis"
-><I
-CLASS="EMPHASIS"
->pattern</I
-></SPAN
-> matching to achieve a high degree of
- flexibility. This allows one expression to be expanded and potentially match
- against many similar patterns.</P
-><P
-> Generally, an URL pattern has the form
- <TT
-CLASS="LITERAL"
-><domain>/<path></TT
->, where both the
- <TT
-CLASS="LITERAL"
-><domain></TT
-> and <TT
-CLASS="LITERAL"
-><path></TT
-> are
- optional. (This is why the special <TT
-CLASS="LITERAL"
->/</TT
-> pattern matches all
- URLs). Note that the protocol portion of the URL pattern (e.g.
- <TT
-CLASS="LITERAL"
->http://</TT
->) should <SPAN
-CLASS="emphasis"
-><I
-CLASS="EMPHASIS"
->not</I
-></SPAN
-> be included in
- the pattern. This is assumed already!</P
-><P
-> The pattern matching syntax is different for the domain and path parts of
- the URL. The domain part uses a simple globbing type matching technique,
- while the path part uses more flexible
- <A
-HREF="http://en.wikipedia.org/wiki/Regular_expressions"
-TARGET="_top"
-><SPAN
-CLASS="QUOTE"
->"Regular
- Expressions"</SPAN
-></A
-> (POSIX 1003.2).</P
-><P
-></P
-><DIV
-CLASS="VARIABLELIST"
-><DL
-><DT
-><TT
-CLASS="LITERAL"
->www.example.com/</TT
-></DT
-><DD
-><P
-> is a domain-only pattern and will match any request to <TT
-CLASS="LITERAL"
->www.example.com</TT
->,
- regardless of which document on that server is requested. So ALL pages in
- this domain would be covered by the scope of this action. Note that a
- simple <TT
-CLASS="LITERAL"
->example.com</TT
-> is different and would NOT match.
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->www.example.com</TT
-></DT
-><DD
-><P
-> means exactly the same. For domain-only patterns, the trailing <TT
-CLASS="LITERAL"
->/</TT
-> may
- be omitted.
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->www.example.com/index.html</TT
-></DT
-><DD
-><P
-> matches all the documents on <TT
-CLASS="LITERAL"
->www.example.com</TT
->
- whose name starts with <TT
-CLASS="LITERAL"
->/index.html</TT
->.
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->www.example.com/index.html$</TT
-></DT
-><DD
-><P
-> matches only the single document <TT
-CLASS="LITERAL"
->/index.html</TT
->
- on <TT
-CLASS="LITERAL"
->www.example.com</TT
->.
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->/index.html$</TT
-></DT
-><DD
-><P
-> matches the document <TT
-CLASS="LITERAL"
->/index.html</TT
->, regardless of the domain,
- i.e. on <SPAN
-CLASS="emphasis"
-><I
-CLASS="EMPHASIS"
->any</I
-></SPAN
-> web server anywhere.
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->index.html</TT
-></DT
-><DD
-><P
-> matches nothing, since it would be interpreted as a domain name and
- there is no top-level domain called <TT
-CLASS="LITERAL"
->.html</TT
->. So its
- a mistake.
- </P
-></DD
-></DL
-></DIV
-><DIV
-CLASS="SECT3"
-><H3
-CLASS="SECT3"
-><A
-NAME="AEN2386"
->8.4.1. The Domain Pattern</A
-></H3
-><P
-> The matching of the domain part offers some flexible options: if the
- domain starts or ends with a dot, it becomes unanchored at that end.
- For example:</P
-><P
-></P
-><DIV
-CLASS="VARIABLELIST"
-><DL
-><DT
-><TT
-CLASS="LITERAL"
->.example.com</TT
-></DT
-><DD
-><P
-> matches any domain with first-level domain <TT
-CLASS="LITERAL"
->com</TT
->
- and second-level domain <TT
-CLASS="LITERAL"
->example</TT
->.
- For example <TT
-CLASS="LITERAL"
->www.example.com</TT
->,
- <TT
-CLASS="LITERAL"
->example.com</TT
-> and <TT
-CLASS="LITERAL"
->foo.bar.baz.example.com</TT
->.
- Note that it wouldn't match if the second-level domain was <TT
-CLASS="LITERAL"
->another-example</TT
->.
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->www.</TT
-></DT
-><DD
-><P
-> matches any domain that <SPAN
-CLASS="emphasis"
-><I
-CLASS="EMPHASIS"
->STARTS</I
-></SPAN
-> with
- <TT
-CLASS="LITERAL"
->www.</TT
-> (It also matches the domain
- <TT
-CLASS="LITERAL"
->www</TT
-> but most of the time that doesn't matter.)
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->.example.</TT
-></DT
-><DD
-><P
-> matches any domain that <SPAN
-CLASS="emphasis"
-><I
-CLASS="EMPHASIS"
->CONTAINS</I
-></SPAN
-> <TT
-CLASS="LITERAL"
->.example.</TT
->.
- And, by the way, also included would be any files or documents that exist
- within that domain since no path limitations are specified. (Correctly
- speaking: It matches any FQDN that contains <TT
-CLASS="LITERAL"
->example</TT
-> as
- a domain.) This might be <TT
-CLASS="LITERAL"
->www.example.com</TT
->,
- <TT
-CLASS="LITERAL"
->news.example.de</TT
->, or
- <TT
-CLASS="LITERAL"
->www.example.net/cgi/testing.pl</TT
-> for instance. All these
- cases are matched.
- </P
-></DD
-></DL
-></DIV
-><P
-> Additionally, there are wild-cards that you can use in the domain names
- themselves. These work similarly to shell globbing type wild-cards:
- <SPAN
-CLASS="QUOTE"
->"*"</SPAN
-> represents zero or more arbitrary characters (this is
- equivalent to the
- <A
-HREF="http://en.wikipedia.org/wiki/Regular_expressions"
-TARGET="_top"
-><SPAN
-CLASS="QUOTE"
->"Regular
- Expression"</SPAN
-></A
-> based syntax of <SPAN
-CLASS="QUOTE"
->".*"</SPAN
->),
- <SPAN
-CLASS="QUOTE"
->"?"</SPAN
-> represents any single character (this is equivalent to the
- regular expression syntax of a simple <SPAN
-CLASS="QUOTE"
->"."</SPAN
->), and you can define
- <SPAN
-CLASS="QUOTE"
->"character classes"</SPAN
-> in square brackets which is similar to
- the same regular expression technique. All of this can be freely mixed:</P
-><P
-></P
-><DIV
-CLASS="VARIABLELIST"
-><DL
-><DT
-><TT
-CLASS="LITERAL"
->ad*.example.com</TT
-></DT
-><DD
-><P
-> matches <SPAN
-CLASS="QUOTE"
->"adserver.example.com"</SPAN
->,
- <SPAN
-CLASS="QUOTE"
->"ads.example.com"</SPAN
->, etc but not <SPAN
-CLASS="QUOTE"
->"sfads.example.com"</SPAN
->
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->*ad*.example.com</TT
-></DT
-><DD
-><P
-> matches all of the above, and then some.
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->.?pix.com</TT
-></DT
-><DD
-><P
-> matches <TT
-CLASS="LITERAL"
->www.ipix.com</TT
->,
- <TT
-CLASS="LITERAL"
->pictures.epix.com</TT
->, <TT
-CLASS="LITERAL"
->a.b.c.d.e.upix.com</TT
-> etc.
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->www[1-9a-ez].example.c*</TT
-></DT
-><DD
-><P
-> matches <TT
-CLASS="LITERAL"
->www1.example.com</TT
->,
- <TT
-CLASS="LITERAL"
->www4.example.cc</TT
->, <TT
-CLASS="LITERAL"
->wwwd.example.cy</TT
->,
- <TT
-CLASS="LITERAL"
->wwwz.example.com</TT
-> etc., but <SPAN
-CLASS="emphasis"
-><I
-CLASS="EMPHASIS"
->not</I
-></SPAN
->
- <TT
-CLASS="LITERAL"
->wwww.example.com</TT
->.
- </P
-></DD
-></DL
-></DIV
-><P
-> While flexible, this is not the sophistication of full regular expression based syntax.</P
-></DIV
-><DIV
-CLASS="SECT3"
-><H3
-CLASS="SECT3"
-><A
-NAME="AEN2462"
->8.4.2. The Path Pattern</A
-></H3
-><P
-> <SPAN
-CLASS="APPLICATION"
->Privoxy</SPAN
-> uses <SPAN
-CLASS="QUOTE"
->"modern"</SPAN
-> POSIX 1003.2
- <A
-HREF="http://en.wikipedia.org/wiki/Regular_expressions"
-TARGET="_top"
-><SPAN
-CLASS="QUOTE"
->"Regular
- Expressions"</SPAN
-></A
-> for matching the path portion (after the slash),
- and is thus more flexible.</P
-><P
-> There is an <A
-HREF="appendix.html#REGEX"
->Appendix</A
-> with a brief quick-start into regular
- expressions, you also might want to have a look at your operating system's documentation
- on regular expressions (try <TT
-CLASS="LITERAL"
->man re_format</TT
->).</P
-><P
-> Note that the path pattern is automatically left-anchored at the <SPAN
-CLASS="QUOTE"
->"/"</SPAN
->,
- i.e. it matches as if it would start with a <SPAN
-CLASS="QUOTE"
->"^"</SPAN
-> (regular expression speak
- for the beginning of a line).</P
-><P
-> Please also note that matching in the path is <SPAN
-CLASS="emphasis"
-><I
-CLASS="EMPHASIS"
->CASE INSENSITIVE</I
-></SPAN
->
- by default, but you can switch to case sensitive at any point in the pattern by using the
- <SPAN
-CLASS="QUOTE"
->"(?-i)"</SPAN
-> switch: <TT
-CLASS="LITERAL"
->www.example.com/(?-i)PaTtErN.*</TT
-> will match
- only documents whose path starts with <TT
-CLASS="LITERAL"
->PaTtErN</TT
-> in
- <SPAN
-CLASS="emphasis"
-><I
-CLASS="EMPHASIS"
->exactly</I
-></SPAN
-> this capitalization.</P
-><P
-></P
-><DIV
-CLASS="VARIABLELIST"
-><DL
-><DT
-><TT
-CLASS="LITERAL"
->.example.com/.*</TT
-></DT
-><DD
-><P
-> Is equivalent to just <SPAN
-CLASS="QUOTE"
->".example.com"</SPAN
->, since any documents
- within that domain are matched with or without the <SPAN
-CLASS="QUOTE"
->".*"</SPAN
->
- regular expression. This is redundant
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->.example.com/.*/index.html$</TT
-></DT
-><DD
-><P
-> Will match any page in the domain of <SPAN
-CLASS="QUOTE"
->"example.com"</SPAN
-> that is
- named <SPAN
-CLASS="QUOTE"
->"index.html"</SPAN
->, and that is part of some path. For
- example, it matches <SPAN
-CLASS="QUOTE"
->"www.example.com/testing/index.html"</SPAN
-> but
- NOT <SPAN
-CLASS="QUOTE"
->"www.example.com/index.html"</SPAN
-> because the regular
- expression called for at least two <SPAN
-CLASS="QUOTE"
->"/'s"</SPAN
->, thus the path
- requirement. It also would match
- <SPAN
-CLASS="QUOTE"
->"www.example.com/testing/index_html"</SPAN
->, because of the
- special meta-character <SPAN
-CLASS="QUOTE"
->"."</SPAN
->.
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->.example.com/(.*/)?index\.html$</TT
-></DT
-><DD
-><P
-> This regular expression is conditional so it will match any page
- named <SPAN
-CLASS="QUOTE"
->"index.html"</SPAN
-> regardless of path which in this case can
- have one or more <SPAN
-CLASS="QUOTE"
->"/'s"</SPAN
->. And this one must contain exactly
- <SPAN
-CLASS="QUOTE"
->".html"</SPAN
-> (but does not have to end with that!).
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->.example.com/(.*/)(ads|banners?|junk)</TT
-></DT
-><DD
-><P
-> This regular expression will match any path of <SPAN
-CLASS="QUOTE"
->"example.com"</SPAN
->
- that contains any of the words <SPAN
-CLASS="QUOTE"
->"ads"</SPAN
->, <SPAN
-CLASS="QUOTE"
->"banner"</SPAN
->,
- <SPAN
-CLASS="QUOTE"
->"banners"</SPAN
-> (because of the <SPAN
-CLASS="QUOTE"
->"?"</SPAN
->) or <SPAN
-CLASS="QUOTE"
->"junk"</SPAN
->.
- The path does not have to end in these words, just contain them.
- </P
-></DD
-><DT
-><TT
-CLASS="LITERAL"
->.example.com/(.*/)(ads|banners?|junk)/.*\.(jpe?g|gif|png)$</TT
-></DT
-><DD
-><P
-> This is very much the same as above, except now it must end in either
- <SPAN
-CLASS="QUOTE"
->".jpg"</SPAN
->, <SPAN
-CLASS="QUOTE"
->".jpeg"</SPAN
->, <SPAN
-CLASS="QUOTE"
->".gif"</SPAN
-> or <SPAN
-CLASS="QUOTE"
->".png"</SPAN
->. So this
- one is limited to common image formats.
- </P
-></DD
-></DL
-></DIV
-><P
-> There are many, many good examples to be found in <TT
-CLASS="FILENAME"
->default.action</TT
->,
- and more tutorials below in <A
-HREF="appendix.html#REGEX"
->Appendix on regular expressions</A
->.</P
-></DIV
-><DIV
-CLASS="SECT3"
-><H3
-CLASS="SECT3"
-><A
-NAME="TAG-PATTERN"
->8.4.3. The Tag Pattern</A
-></H3
-><P
-> Tag patterns are used to change the applying actions based on the
- request's tags. Tags can be created with either the
- <A
-HREF="actions-file.html#CLIENT-HEADER-TAGGER"
->client-header-tagger</A
->
- or the <A
-HREF="actions-file.html#SERVER-HEADER-TAGGER"
->server-header-tagger</A
-> action.</P
-><P
-> Tag patterns have to start with <SPAN
-CLASS="QUOTE"
->"TAG:"</SPAN
->, so <SPAN
-CLASS="APPLICATION"
->Privoxy</SPAN
->
- can tell them apart from URL patterns. Everything after the colon
- including white space, is interpreted as a regular expression with
- path pattern syntax, except that tag patterns aren't left-anchored
- automatically (<SPAN
-CLASS="APPLICATION"
->Privoxy</SPAN
-> doesn't silently add a <SPAN
-CLASS="QUOTE"
->"^"</SPAN
->,
- you have to do it yourself if you need it).</P
-><P
-> To match all requests that are tagged with <SPAN
-CLASS="QUOTE"
->"foo"</SPAN
->
- your pattern line should be <SPAN
-CLASS="QUOTE"
->"TAG:^foo$"</SPAN
->,
- <SPAN
-CLASS="QUOTE"
->"TAG:foo"</SPAN
-> would work as well, but it would also
- match requests whose tags contain <SPAN
-CLASS="QUOTE"
->"foo"</SPAN
-> somewhere.
- <SPAN
-CLASS="QUOTE"
->"TAG: foo"</SPAN
-> wouldn't work as it requires white space.</P
-><P
-> Sections can contain URL and tag patterns at the same time,
- but tag patterns are checked after the URL patterns and thus
- always overrule them, even if they are located before the URL patterns.</P
-><P
-> Once a new tag is added, Privoxy checks right away if it's matched by one
- of the tag patterns and updates the action settings accordingly. As a result
- tags can be used to activate other tagger actions, as long as these other
- taggers look for headers that haven't already be parsed.</P
-><P
-> For example you could tag client requests which use the
- <TT
-CLASS="LITERAL"
->POST</TT
-> method,
- then use this tag to activate another tagger that adds a tag if cookies
- are sent, and then use a block action based on the cookie tag. This allows
- the outcome of one action, to be input into a subsequent action. However if
- you'd reverse the position of the described taggers, and activated the
- method tagger based on the cookie tagger, no method tags would be created.
- The method tagger would look for the request line, but at the time
- the cookie tag is created, the request line has already been parsed.</P
-><P
-> While this is a limitation you should be aware of, this kind of
- indirection is seldom needed anyway and even the example doesn't
- make too much sense.</P
-></DIV
-></DIV
-><DIV
-CLASS="SECT2"
-><H2
-CLASS="SECT2"
-><A
-NAME="ACTIONS"
->8.5. Actions</A
-></H2
-><P
-> All actions are disabled by default, until they are explicitly enabled
- somewhere in an actions file. Actions are turned on if preceded with a
- <SPAN
-CLASS="QUOTE"
->"+"</SPAN
->, and turned off if preceded with a <SPAN
-CLASS="QUOTE"
->"-"</SPAN
->. So a
- <TT
-CLASS="LITERAL"
->+action</TT
-> means <SPAN
-CLASS="QUOTE"
->"do that action"</SPAN
->, e.g.
- <TT
-CLASS="LITERAL"
->+block</TT
-> means <SPAN
-CLASS="QUOTE"
->"please block URLs that match the
- following patterns"</SPAN
->, and <TT
-CLASS="LITERAL"
->-block</TT
-> means <SPAN
-CLASS="QUOTE"
->"don't
- block URLs that match the following patterns, even if <TT
-CLASS="LITERAL"
->+block</TT
->
- previously applied."</SPAN
-> </P
-><P
->
- Again, actions are invoked by placing them on a line, enclosed in curly braces and
- separated by whitespace, like in
- <TT
-CLASS="LITERAL"
->{+some-action -some-other-action{some-parameter}}</TT
->,
- followed by a list of URL patterns, one per line, to which they apply.
- Together, the actions line and the following pattern lines make up a section
- of the actions file. </P
-><P
->
- Actions fall into three categories:</P
-><P
-> <P
-></P
-><UL
-><LI
-><P
->
- Boolean, i.e the action can only be <SPAN
-CLASS="QUOTE"
->"enabled"</SPAN
-> or
- <SPAN
-CLASS="QUOTE"
->"disabled"</SPAN
->. Syntax:
- </P
-><P
-> <TABLE
-BORDER="0"
-BGCOLOR="#E0E0E0"
-WIDTH="90%"
-><TR
-><TD
-><PRE
-CLASS="SCREEN"
-> +<TT
-CLASS="REPLACEABLE"
-><I
->name</I
-></TT
-> # enable action <TT
-CLASS="REPLACEABLE"
-><I
->name</I
-></TT
->
- -<TT
-CLASS="REPLACEABLE"
-><I
->name</I
-></TT
-> # disable action <TT
-CLASS="REPLACEABLE"
-><I
->name</I
-></TT
-></PRE
-></TD
-></TR
-></TABLE
->
- </P
-><P
->
- Example: <TT
-CLASS="LITERAL"
->+handle-as-image</TT
->
- </P
-></LI
-><LI
-><P
->
- Parameterized, where some value is required in order to enable this type of action.
- Syntax:
- </P
-><P
-> <TABLE
-BORDER="0"
-BGCOLOR="#E0E0E0"
-WIDTH="90%"
-><TR
-><TD
-><PRE
-CLASS="SCREEN"
-> +<TT
-CLASS="REPLACEABLE"
-><I
->name</I
-></TT
->{<TT
-CLASS="REPLACEABLE"
-><I
->param</I
-></TT
->} # enable action and set parameter to <TT
-CLASS="REPLACEABLE"
-><I
->param</I
-></TT
->,
+ .example.com/images/ads/</pre>
+ </td>
+ </tr>
+ </table>
+ <p>You can trace this process for URL patterns and any given URL by visiting <a href=
+ "http://config.privoxy.org/show-url-info" target="_top">http://config.privoxy.org/show-url-info</a>.</p>
+ <p>Examples and more detail on this are provided in the Appendix, <a href=
+ "appendix.html#ACTIONSANAT">Troubleshooting: Anatomy of an Action</a> section.</p>
+ </div>
+ <div class="SECT2">
+ <h2 class="SECT2"><a name="AF-PATTERNS" id="AF-PATTERNS">8.4. Patterns</a></h2>
+ <p>As mentioned, <span class="APPLICATION">Privoxy</span> uses <span class="QUOTE">"patterns"</span> to determine
+ what <span class="emphasis"><i class="EMPHASIS">actions</i></span> might apply to which sites and pages your
+ browser attempts to access. These <span class="QUOTE">"patterns"</span> use wild card type <span class=
+ "emphasis"><i class="EMPHASIS">pattern</i></span> matching to achieve a high degree of flexibility. This allows
+ one expression to be expanded and potentially match against many similar patterns.</p>
+ <p>Generally, a URL pattern has the form <tt class="LITERAL"><host><port>/<path></tt>, where
+ the <tt class="LITERAL"><host></tt>, the <tt class="LITERAL"><port></tt> and the <tt class=
+ "LITERAL"><path></tt> are optional. (This is why the special <tt class="LITERAL">/</tt> pattern matches all
+ URLs.) Note that the protocol portion of the URL (e.g. <tt class="LITERAL">http://</tt>) should
+ <span class="emphasis"><i class="EMPHASIS">not</i></span> be included in the pattern; the protocol is
+ implied.</p>
+ <p>The pattern matching syntax is different for the host and path parts of the URL. The host part uses a simple
+ globbing type matching technique, while the path part uses more flexible <a href=
+ "http://en.wikipedia.org/wiki/Regular_expressions" target="_top"><span class="QUOTE">"Regular
+ Expressions"</span></a> (POSIX 1003.2).</p>
+ <p>The port part of a pattern is a decimal port number preceded by a colon (<tt class="LITERAL">:</tt>). If the
+ host part contains a numerical IPv6 address, it has to be put into angle brackets (<tt class="LITERAL"><</tt>,
+ <tt class="LITERAL">></tt>).</p>
+ <div class="VARIABLELIST">
+ <dl>
+ <dt><tt class="LITERAL">www.example.com/</tt></dt>
+ <dd>
+ <p>is a host-only pattern and will match any request to <tt class="LITERAL">www.example.com</tt>,
+ regardless of which document on that server is requested. So ALL pages in this domain would be covered by
+ the scope of this action. Note that a simple <tt class="LITERAL">example.com</tt> is different and would
+ NOT match.</p>
+ </dd>
+ <dt><tt class="LITERAL">www.example.com</tt></dt>
+ <dd>
+ <p>means exactly the same. For host-only patterns, the trailing <tt class="LITERAL">/</tt> may be
+ omitted.</p>
+ </dd>
+ <dt><tt class="LITERAL">www.example.com/index.html</tt></dt>
+ <dd>
+ <p>matches all the documents on <tt class="LITERAL">www.example.com</tt> whose name starts with <tt class=
+ "LITERAL">/index.html</tt>.</p>
+ </dd>
+ <dt><tt class="LITERAL">www.example.com/index.html$</tt></dt>
+ <dd>
+ <p>matches only the single document <tt class="LITERAL">/index.html</tt> on <tt class=
+ "LITERAL">www.example.com</tt>.</p>
+ </dd>
+ <dt><tt class="LITERAL">/index.html$</tt></dt>
+ <dd>
+ <p>matches the document <tt class="LITERAL">/index.html</tt>, regardless of the domain, i.e. on
+ <span class="emphasis"><i class="EMPHASIS">any</i></span> web server anywhere.</p>
+ </dd>
+ <dt><tt class="LITERAL">/</tt></dt>
+ <dd>
+ <p>Matches any URL because there's no requirement for either the domain or the path to match anything.</p>
+ </dd>
+ <dt><tt class="LITERAL">:8000/</tt></dt>
+ <dd>
+ <p>Matches any URL pointing to TCP port 8000.</p>
+ </dd>
+ <dt><tt class="LITERAL">10.0.0.1/</tt></dt>
+ <dd>
+ <p>Matches any URL with the host address <tt class="LITERAL">10.0.0.1</tt>.</p>
+ </dd>
+ <dt><tt class="LITERAL"><2001:db8::1>/</tt></dt>
+ <dd>
+ <p>Matches any URL with the host address <tt class="LITERAL">2001:db8::1</tt>. (Note that the real URL uses
+ square brackets, not angle brackets, e.g. <tt class="LITERAL">http://[2001:db8::1]/</tt>.)</p>
+ </dd>
+ <dt><tt class="LITERAL">index.html</tt></dt>
+ <dd>
+ <p>matches nothing, since it would be interpreted as a domain name and there is no top-level domain called
+ <tt class="LITERAL">.html</tt>. So it's a mistake.</p>
+ </dd>
+ </dl>
+ </div>
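+ <p>For illustration, here is a minimal actions file section that applies one action to several of the pattern
+ types above. It is a sketch rather than a recommended configuration. The hosts are placeholders, and it assumes
+ a <span class="APPLICATION">Privoxy</span> version in which the <tt class="LITERAL">block</tt> action takes a
+ reason as its parameter.</p>
+ <table border="0" bgcolor="#E0E0E0" width="100%">
+ <tr>
+ <td>
+ <pre class="SCREEN"># Patterns below, in order:
+#   host and path, with the path anchored at the end by "$"
+#   any host, TCP port 8000
+#   any URL whose host is the IPv4 address 10.0.0.1
+#   the path /index.html on any host
+{ +block{Example: matched by host, port or path pattern.} }
+www.example.com/index.html$
+:8000/
+10.0.0.1/
+/index.html$</pre>
+ </td>
+ </tr>
+ </table>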
+ <div class="SECT3">
+ <h3 class="SECT3"><a name="HOST-PATTERN" id="HOST-PATTERN">8.4.1. The Host Pattern</a></h3>
+ <p>The matching of the host part offers some flexible options: if the host pattern starts or ends with a dot,
+ it becomes unanchored at that end. The host pattern is often referred to as the domain pattern, as it is
+ usually used to match domain names rather than IP addresses. For example:</p>
+ <div class="VARIABLELIST">
+ <dl>
+ <dt><tt class="LITERAL">.example.com</tt></dt>
+ <dd>
+ <p>matches any domain with first-level domain <tt class="LITERAL">com</tt> and second-level domain
+ <tt class="LITERAL">example</tt>. For example <tt class="LITERAL">www.example.com</tt>, <tt class=
+ "LITERAL">example.com</tt> and <tt class="LITERAL">foo.bar.baz.example.com</tt>. Note that it wouldn't
+ match if the second-level domain was <tt class="LITERAL">another-example</tt>.</p>
+ </dd>
+ <dt><tt class="LITERAL">www.</tt></dt>
+ <dd>
+ <p>matches any domain that <span class="emphasis"><i class="EMPHASIS">STARTS</i></span> with <tt class=
+ "LITERAL">www.</tt> (It also matches the domain <tt class="LITERAL">www</tt> but most of the time that
+ doesn't matter.)</p>
+ </dd>
+ <dt><tt class="LITERAL">.example.</tt></dt>
+ <dd>
+ <p>matches any domain that <span class="emphasis"><i class="EMPHASIS">CONTAINS</i></span> <tt class=
+ "LITERAL">.example.</tt>. And, by the way, also included would be any files or documents that exist
+ within that domain since no path limitations are specified. (Strictly speaking, it matches any FQDN that
+ contains <tt class="LITERAL">example</tt> as a domain.) This might be <tt class=
+ "LITERAL">www.example.com</tt>, <tt class="LITERAL">news.example.de</tt>, or <tt class=
+ "LITERAL">www.example.net/cgi/testing.pl</tt> for instance. All these cases are matched.</p>
+ </dd>
+ </dl>
+ </div>
+ <p>Additionally, there are wild-cards that you can use in the domain names themselves. These work similarly to
+ shell globbing type wild-cards: <span class="QUOTE">"*"</span> represents zero or more arbitrary characters
+ (this is equivalent to the <a href="http://en.wikipedia.org/wiki/Regular_expressions" target=
+ "_top"><span class="QUOTE">"Regular Expression"</span></a> based syntax of <span class="QUOTE">".*"</span>),
+ <span class="QUOTE">"?"</span> represents any single character (this is equivalent to the regular expression
+ syntax of a simple <span class="QUOTE">"."</span>), and you can define <span class="QUOTE">"character
+ classes"</span> in square brackets, which work like the corresponding regular expression construct. All of
+ this can be freely mixed:</p>
+ <div class="VARIABLELIST">
+ <dl>
+ <dt><tt class="LITERAL">ad*.example.com</tt></dt>
+ <dd>
+ <p>matches <span class="QUOTE">"adserver.example.com"</span>, <span class=
+ "QUOTE">"ads.example.com"</span>, etc., but not <span class="QUOTE">"sfads.example.com"</span>.</p>
+ </dd>
+ <dt><tt class="LITERAL">*ad*.example.com</tt></dt>
+ <dd>
+ <p>matches all of the above, and then some.</p>
+ </dd>
+ <dt><tt class="LITERAL">.?pix.com</tt></dt>
+ <dd>
+ <p>matches <tt class="LITERAL">www.ipix.com</tt>, <tt class="LITERAL">pictures.epix.com</tt>, <tt class=
+ "LITERAL">a.b.c.d.e.upix.com</tt> etc.</p>
+ </dd>
+ <dt><tt class="LITERAL">www[1-9a-ez].example.c*</tt></dt>
+ <dd>
+ <p>matches <tt class="LITERAL">www1.example.com</tt>, <tt class="LITERAL">www4.example.cc</tt>,
+ <tt class="LITERAL">wwwd.example.cy</tt>, <tt class="LITERAL">wwwz.example.com</tt> etc., but
+ <span class="emphasis"><i class="EMPHASIS">not</i></span> <tt class="LITERAL">wwww.example.com</tt>.</p>
+ </dd>
+ </dl>
+ </div>
+ <p>While flexible, this is not as sophisticated as full regular expression syntax.</p>
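+ <p>A host-only section using the globbing wild-cards described above might look like the following sketch.
+ The host names are placeholders and the block reason is arbitrary text. Since no path is given, every document
+ on a matching host is covered.</p>
+ <table border="0" bgcolor="#E0E0E0" width="100%">
+ <tr>
+ <td>
+ <pre class="SCREEN"># "ad*" only matches hosts whose name starts with "ad", e.g. ads.example.com,
+# but not sfads.example.com. The leading dot in ".?pix.com" unanchors the
+# pattern on the left, so www.ipix.com matches as well.
+{ +block{Host matched a wild-card pattern.} }
+ad*.example.com
+.?pix.com</pre>
+ </td>
+ </tr>
+ </table>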
+ </div>
+ <div class="SECT3">
+ <h3 class="SECT3"><a name="PATH-PATTERN" id="PATH-PATTERN">8.4.2. The Path Pattern</a></h3>
+ <p><span class="APPLICATION">Privoxy</span> uses <span class="QUOTE">"modern"</span> POSIX 1003.2 <a href=
+ "http://en.wikipedia.org/wiki/Regular_expressions" target="_top"><span class="QUOTE">"Regular
+ Expressions"</span></a> for matching the path portion (after the slash), and is thus more flexible.</p>
+ <p>There is an <a href="appendix.html#REGEX">Appendix</a> with a brief quick-start into regular expressions;
+ you might also want to have a look at your operating system's documentation on regular expressions (try
+ <tt class="LITERAL">man re_format</tt>).</p>
+ <p>Note that the path pattern is automatically left-anchored at the <span class="QUOTE">"/"</span>, i.e. it
+ matches as if it started with a <span class="QUOTE">"^"</span> (regular expression speak for the beginning
+ of a line).</p>
+ <p>Please also note that matching in the path is <span class="emphasis"><i class="EMPHASIS">CASE
+ INSENSITIVE</i></span> by default, but you can switch to case sensitive at any point in the pattern by using
+ the <span class="QUOTE">"(?-i)"</span> switch: <tt class="LITERAL">www.example.com/(?-i)PaTtErN.*</tt> will
+ match only documents whose path starts with <tt class="LITERAL">PaTtErN</tt> in <span class=
+ "emphasis"><i class="EMPHASIS">exactly</i></span> this capitalization.</p>
+ <div class="VARIABLELIST">
+ <dl>
+ <dt><tt class="LITERAL">.example.com/.*</tt></dt>
+ <dd>
+ <p>Is equivalent to just <span class="QUOTE">".example.com"</span>, since any documents within that
+ domain are matched with or without the <span class="QUOTE">".*"</span> regular expression. This is
+ redundant.</p>
+ </dd>
+ <dt><tt class="LITERAL">.example.com/.*/index.html$</tt></dt>
+ <dd>
+ <p>Will match any page in the domain of <span class="QUOTE">"example.com"</span> that is named
+ <span class="QUOTE">"index.html"</span>, and that is part of some path. For example, it matches
+ <span class="QUOTE">"www.example.com/testing/index.html"</span> but NOT <span class=
+ "QUOTE">"www.example.com/index.html"</span> because the regular expression called for at least two
+ <span class="QUOTE">"/'s"</span>, thus the path requirement. It also would match <span class=
+ "QUOTE">"www.example.com/testing/index_html"</span>, because of the special meta-character <span class=
+ "QUOTE">"."</span>.</p>
+ </dd>
+ <dt><tt class="LITERAL">.example.com/(.*/)?index\.html$</tt></dt>
+ <dd>
+ <p>The group <tt class="LITERAL">(.*/)</tt> is optional, so this regular expression will match any page named
+ <span class="QUOTE">"index.html"</span> regardless of its path, which in this case can contain one or more
+ <span class="QUOTE">"/'s"</span>. Because the dot is escaped, this one must contain exactly <span class=
+ "QUOTE">".html"</span> (and end with that!).</p>
+ </dd>
+ <dt><tt class="LITERAL">.example.com/(.*/)(ads|banners?|junk)</tt></dt>
+ <dd>
+ <p>This regular expression will match any path of <span class="QUOTE">"example.com"</span> that contains
+ any of the words <span class="QUOTE">"ads"</span>, <span class="QUOTE">"banner"</span>, <span class=
+ "QUOTE">"banners"</span> (because of the <span class="QUOTE">"?"</span>) or <span class=
+ "QUOTE">"junk"</span>. The path does not have to end in these words, just contain them. The path has to
+ contain at least two slashes (including the one at the beginning).</p>
+ </dd>
+ <dt><tt class="LITERAL">.example.com/(.*/)(ads|banners?|junk)/.*\.(jpe?g|gif|png)$</tt></dt>
+ <dd>
+ <p>This is very much the same as above, except now it must end in either <span class=
+ "QUOTE">".jpg"</span>, <span class="QUOTE">".jpeg"</span>, <span class="QUOTE">".gif"</span> or
+ <span class="QUOTE">".png"</span>. So this one is limited to common image formats.</p>
+ </dd>
+ </dl>
+ </div>
+ <p>There are many, many good examples to be found in <tt class="FILENAME">default.action</tt>, and more
+ tutorials below in the <a href="appendix.html#REGEX">Appendix on regular expressions</a>.</p>
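+ <p>The following sketch combines a host pattern with one of the path regular expressions listed above. It also
+ adds a second section that uses the <span class="QUOTE">"(?-i)"</span> switch. The host names, directory names
+ and block reasons are placeholders.</p>
+ <table border="0" bgcolor="#E0E0E0" width="100%">
+ <tr>
+ <td>
+ <pre class="SCREEN"># Images in ad-related directories on any example.com host (see the list above).
+{ +block{Ad image.} +handle-as-image }
+.example.com/(.*/)(ads|banners?|junk)/.*\.(jpe?g|gif|png)$
+
+# Case sensitive from "(?-i)" on: matches /Banner/..., but not /banner/...
+{ +block{Case-sensitive path example.} }
+www.example.com/(?-i)Banner/</pre>
+ </td>
+ </tr>
+ </table>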
+ </div>
+ <div class="SECT3">
+ <h3 class="SECT3"><a name="TAG-PATTERN" id="TAG-PATTERN">8.4.3. The Request Tag Pattern</a></h3>
+ <p>Request tag patterns are used to change which actions apply, based on the request's tags. Tags can be
+ created based on HTTP headers with either the <a href=
+ "actions-file.html#CLIENT-HEADER-TAGGER">client-header-tagger</a> or the <a href=
+ "actions-file.html#SERVER-HEADER-TAGGER">server-header-tagger</a> action.</p>
+ <p>Request tag patterns have to start with <span class="QUOTE">"TAG:"</span>, so <span class=
+ "APPLICATION">Privoxy</span> can tell them apart from other patterns. Everything after the colon, including
+ white space, is interpreted as a regular expression with path pattern syntax, except that tag patterns aren't
+ left-anchored automatically (<span class="APPLICATION">Privoxy</span> doesn't silently add a <span class=
+ "QUOTE">"^"</span>, you have to do it yourself if you need it).</p>
+ <p>To match all requests that are tagged with <span class="QUOTE">"foo"</span>, your pattern line should be
+ <span class="QUOTE">"TAG:^foo$"</span>. <span class="QUOTE">"TAG:foo"</span> would work as well, but it would
+ also match requests whose tags contain <span class="QUOTE">"foo"</span> somewhere. <span class="QUOTE">"TAG:
+ foo"</span> wouldn't work, as the leading space is part of the regular expression and would have to be present
+ in the tag.</p>
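+ <p>As a sketch, a section that disables blocking for requests tagged with exactly <span class=
+ "QUOTE">"foo"</span> could look like this. It assumes that a client-header-tagger defined elsewhere actually
+ creates the tag <span class="QUOTE">"foo"</span>. The tag name is only an example.</p>
+ <table border="0" bgcolor="#E0E0E0" width="100%">
+ <tr>
+ <td>
+ <pre class="SCREEN"># Assumes a client-header-tagger defined elsewhere creates the tag "foo".
+# The explicit "^" and "$" anchors make this an exact match.
+{ -block }
+TAG:^foo$</pre>
+ </td>
+ </tr>
+ </table>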
+ <p>Sections can contain URL and request tag patterns at the same time, but request tag patterns are checked
+ after the URL patterns and thus always overrule them, even if they are located before the URL patterns.</p>
+ <p>Once a new request tag is added, Privoxy checks right away if it's matched by one of the request tag
+ patterns and updates the action settings accordingly. As a result, request tags can be used to activate other
+ tagger actions, as long as these other taggers look for headers that haven't already been parsed.</p>
+ <p>For example, you could tag client requests which use the <tt class="LITERAL">POST</tt> method, then use this
+ tag to activate another tagger that adds a tag if cookies are sent, and then use a block action based on the
+ cookie tag. This allows the outcome of one action to be input into a subsequent action. However, if you
+ reversed the position of the described taggers and activated the method tagger based on the cookie tagger, no
+ method tags would be created. The method tagger would look for the request line, but at the time the cookie tag
+ is created, the request line has already been parsed.</p>
+ <p>While this is a limitation you should be aware of, this kind of indirection is seldom needed anyway and even
+ the example doesn't make too much sense.</p>
+ </div>
+ <div class="SECT3">
+ <h3 class="SECT3"><a name="NEGATIVE-TAG-PATTERNS" id="NEGATIVE-TAG-PATTERNS">8.4.4. The Negative Request Tag
+ Patterns</a></h3>
+ <p>To match requests that do not have a certain request tag, specify a negative tag pattern by prefixing the
+ tag pattern line with either <span class="QUOTE">"NO-REQUEST-TAG:"</span> or <span class=
+ "QUOTE">"NO-RESPONSE-TAG:"</span> instead of <span class="QUOTE">"TAG:"</span>.</p>
+ <p>Negative request tag patterns created with <span class="QUOTE">"NO-REQUEST-TAG:"</span> are checked after
+ all client headers are scanned; the ones created with <span class="QUOTE">"NO-RESPONSE-TAG:"</span> are checked
+ after all server headers are scanned. In both cases all the created tags are considered.</p>
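+ <p>As a sketch, the following sections block requests whose <tt class="LITERAL">User-Agent</tt> header doesn't
+ start with <span class="QUOTE">"Mozilla"</span>. They assume that the <tt class="LITERAL">user-agent</tt> tagger
+ is available (for example from <tt class="LITERAL">default.filter</tt>) and tags each request with its complete
+ <tt class="LITERAL">User-Agent</tt> header. The block reason is only an example.</p>
+ <table border="0" bgcolor="#E0E0E0" width="100%">
+ <tr>
+ <td>
+ <pre class="SCREEN"># Tag every request with its User-Agent header
+# (assumes the user-agent tagger is defined).
+{+client-header-tagger{user-agent}}
+/
+
+# Checked after all client headers have been scanned:
+# block requests without a tag starting with
+# "User-Agent: Mozilla".
+{+block{Only Mozilla-compatible clients are allowed.}}
+NO-REQUEST-TAG:^User-Agent: Mozilla</pre>
+ </td>
+ </tr>
+ </table>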
+ </div>
+ <div class="SECT3">
+ <h3 class="SECT3"><a name="CLIENT-TAG-PATTERN" id="CLIENT-TAG-PATTERN">8.4.5. The Client Tag Pattern</a></h3>
+ <div class="WARNING">
+ <table class="WARNING" border="1" width="100%">
+ <tr>
+ <td align="center"><b>Warning</b></td>
+ </tr>
+ <tr>
+ <td align="left">
+ <p>This is an experimental feature. The syntax is likely to change in future versions.</p>
+ </td>
+ </tr>
+ </table>
+ </div>
+ <p>Client tag patterns are not set based on HTTP headers but based on the client's IP address. Users can enable
+ them themselves, but the Privoxy admin controls which tags are available and what their effect is.</p>
+ <p>After a client-specific tag has been defined with the <a href=
+ "config.html#CLIENT-SPECIFIC-TAG">client-specific-tag</a> directive, action sections can be activated based on
+ the tag by using a CLIENT-TAG pattern. The CLIENT-TAG pattern is evaluated at the same priority as URL
+ patterns; as a result, the last matching pattern wins. Tags that are created based on client or server headers
+ are evaluated later on and can overrule CLIENT-TAG and URL patterns!</p>
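+ <p>A minimal sketch of such a definition in the main configuration file (not in an actions file) might look like
+ the following. It assumes the directive takes the tag name followed by a description that is shown to clients;
+ the description text is only an example, so check the <a href=
+ "config.html#CLIENT-SPECIFIC-TAG">client-specific-tag</a> documentation for the exact syntax.</p>
+ <table border="0" bgcolor="#E0E0E0" width="100%">
+ <tr>
+ <td>
+ <pre class="SCREEN"># Main config file: define the tag circumvent-blocks,
+# followed by a description for the client-tags page.
+client-specific-tag circumvent-blocks Overrule blocks but do not affect other actions</pre>
+ </td>
+ </tr>
+ </table>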
+ <p>The tag is set for all requests that come from clients that requested it to be set. Note that <span class=
+ "QUOTE">"clients"</span> are differentiated by IP address; if the IP address changes, the tag has to be
+ requested again.</p>
+ <p>Clients can request tags to be set by using the CGI interface <a href=
+ "http://config.privoxy.org/client-tags" target="_top">http://config.privoxy.org/client-tags</a>.</p>
+ <p>Example:</p>
+ <table border="0" bgcolor="#E0E0E0" width="100%">
+ <tr>
+ <td>
+ <pre class="SCREEN"># If the admin defined the client-specific-tag circumvent-blocks,
+# and the request comes from a client that previously requested
+# the tag to be set, overrule all previous +block actions that
+# are enabled based on URL or CLIENT-TAG patterns.
+{-block}
+CLIENT-TAG:^circumvent-blocks$
+
+# This section is not overruled because it's located after
+# the previous one.
+{+block{Nobody is supposed to request this.}}
+example.org/blocked-example-page</pre>
+ </td>
+ </tr>
+ </table>
+ </div>
+ </div>
+ <div class="SECT2">
+ <h2 class="SECT2"><a name="ACTIONS" id="ACTIONS">8.5. Actions</a></h2>
+ <p>All actions are disabled by default, until they are explicitly enabled somewhere in an actions file. Actions
+ are turned on if preceded with a <span class="QUOTE">"+"</span>, and turned off if preceded with a <span class=
+ "QUOTE">"-"</span>. So a <tt class="LITERAL">+action</tt> means <span class="QUOTE">"do that action"</span>, e.g.
+ <tt class="LITERAL">+block</tt> means <span class="QUOTE">"please block URLs that match the following
+ patterns"</span>, and <tt class="LITERAL">-block</tt> means <span class="QUOTE">"don't block URLs that match the
+ following patterns, even if <tt class="LITERAL">+block</tt> previously applied."</span></p>
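+ <p>As a small illustration, with the following two sections URLs below the hypothetical path <tt class=
+ "LITERAL">/ads/self-promotion/</tt> on example.com are not blocked, because the later <tt class=
+ "LITERAL">-block</tt> section overrules the earlier <tt class="LITERAL">+block</tt> one. The paths and the block
+ reason are only examples.</p>
+ <table border="0" bgcolor="#E0E0E0" width="100%">
+ <tr>
+ <td>
+ <pre class="SCREEN"># Block everything below /ads/ on example.com
+# and its subdomains.
+{+block{Advertisement.}}
+.example.com/ads/
+
+# Matching URLs are not blocked, even though
+# +block previously applied to them.
+{-block}
+.example.com/ads/self-promotion/</pre>
+ </td>
+ </tr>
+ </table>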
+ <p>Again, actions are invoked by placing them on a line, enclosed in curly braces and separated by whitespace,
+ like in <tt class="LITERAL">{+some-action -some-other-action{some-parameter}}</tt>, followed by a list of URL
+ patterns, one per line, to which they apply. Together, the actions line and the following pattern lines make up a
+ section of the actions file.</p>
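+ <p>For example, the following section enables two actions, one of them with a parameter, for two URL patterns.
+ The patterns and the block reason are only examples.</p>
+ <table border="0" bgcolor="#E0E0E0" width="100%">
+ <tr>
+ <td>
+ <pre class="SCREEN"># One actions line, followed by one URL pattern per line.
+# Both patterns get +block (with a reason) and +handle-as-image.
+{+block{Tracking pixel.} +handle-as-image}
+.example.com/pixel/
+tracker.example.net/</pre>
+ </td>
+ </tr>
+ </table>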
+ <p>Actions fall into three categories:</p>
+ <ul>
+ <li>
+ <p>Boolean, i.e. the action can only be <span class="QUOTE">"enabled"</span> or <span class=
+ "QUOTE">"disabled"</span>. Syntax:</p>
+ <table border="0" bgcolor="#E0E0E0" width="90%">
+ <tr>
+ <td>
+ <pre class="SCREEN"> +<tt class="REPLACEABLE"><i>name</i></tt> # enable action <tt class=
+ "REPLACEABLE"><i>name</i></tt>
+ -<tt class="REPLACEABLE"><i>name</i></tt> # disable action <tt class="REPLACEABLE"><i>name</i></tt></pre>
+ </td>
+ </tr>
+ </table>
+ <p>Example: <tt class="LITERAL">+handle-as-image</tt></p>
+ </li>
+ <li>
+ <p>Parameterized, where some value is required in order to enable this type of action. Syntax:</p>
+ <table border="0" bgcolor="#E0E0E0" width="90%">
+ <tr>
+ <td>
+ <pre class="SCREEN"> +<tt class="REPLACEABLE"><i>name</i></tt>{<tt class=
+ "REPLACEABLE"><i>param</i></tt>} # enable action and set parameter to <tt class=
+ "REPLACEABLE"><i>param</i></tt>,