- Privacy Features
- </td>
- <td>
- low
- </td>
- <td>
- medium
- </td>
- <td>
- medium/high
- </td>
- </tr>
- <tr>
- <td>
- Cookie handling
- </td>
- <td>
- none
- </td>
- <td>
- session-only
- </td>
- <td>
- kill
- </td>
- </tr>
- <tr>
- <td>
- Referer forging
- </td>
- <td>
- no
- </td>
- <td>
- yes
- </td>
- <td>
- yes
- </td>
- </tr>
- <tr>
- <td>
- GIF de-animation
- </td>
- <td>
- no
- </td>
- <td>
- yes
- </td>
- <td>
- yes
- </td>
- </tr>
- <tr>
- <td>
- Fast redirects
- </td>
- <td>
- no
- </td>
- <td>
- no
- </td>
- <td>
- yes
- </td>
- </tr>
- <tr>
- <td>
- HTML taming
- </td>
- <td>
- no
- </td>
- <td>
- no
- </td>
- <td>
- yes
- </td>
- </tr>
- <tr>
- <td>
- JavaScript taming
- </td>
- <td>
- no
- </td>
- <td>
- no
- </td>
- <td>
- yes
- </td>
- </tr>
- <tr>
- <td>
- Web-bug killing
- </td>
- <td>
- no
- </td>
- <td>
- yes
- </td>
- <td>
- yes
- </td>
- </tr>
- <tr>
- <td>
- Image tag reordering
- </td>
- <td>
- no
- </td>
- <td>
- yes
- </td>
- <td>
- yes
- </td>
- </tr>
- </tbody>
- </table>
- </div>
- </li>
- </ul>
-
- <p>
- The list of actions files to be used is defined in the main
- configuration file, and the files are processed in the order they are defined
- (e.g. <tt class="FILENAME">default.action</tt> is typically processed
- before <tt class="FILENAME">user.action</tt>). The content of these
- can all be viewed and edited from <a href=
- "http://config.privoxy.org/show-status" target=
- "_top">http://config.privoxy.org/show-status</a>. The overriding
- principle when applying actions is that the last action that matches
- a given URL wins. The broadest, most general rules go first (defined
- in <tt class="FILENAME">default.action</tt>), followed by any
- exceptions (typically also in <tt class=
- "FILENAME">default.action</tt>), which are then followed lastly by
- any local preferences (typically in <span class="emphasis"><i class=
- "EMPHASIS">user</i></span><tt class="FILENAME">.action</tt>).
- Generally, <tt class="FILENAME">user.action</tt> has the last word.
- </p>
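- <p>
- As a rough sketch of this ordering (the host names are placeholders,
- and the split between the two files is only assumed to follow the
- description above), a general rule and a later local exception for
- the same site could look like this:
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
-# In default.action (processed first): a broad rule.
-{+block{Ad server (example rule).}}
-ads.example.com
-
-# In user.action (processed later): a local exception.
-# For URLs that match both sections this is the last match, so it wins.
-{-block}
-ads.example.com/newsletter/
-</pre>
- </td>
- </tr>
- </table>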
- <p>
- An actions file typically has multiple sections. If you want to use
- <span class="QUOTE">"aliases"</span> in an actions file, you have to
- place the (optional) <a href="actions-file.html#ALIASES">alias
- section</a> at the top of that file. Then comes the default set of
- rules which will apply universally to all sites and pages (be <span
- class="emphasis"><i class="EMPHASIS">very careful</i></span> with
- using such a universal set in <tt class="FILENAME">user.action</tt>
- or any other actions file after <tt class=
- "FILENAME">default.action</tt>, because it will override the result
- from consulting any previous file). And then below that, exceptions
- to the defined universal policies. You can regard <tt class=
- "FILENAME">user.action</tt> as an appendix to <tt class=
- "FILENAME">default.action</tt>, with the advantage that it is a
- separate file, which makes preserving your personal settings across
- <span class="APPLICATION">Privoxy</span> upgrades easier.
- </p>
- <p>
- Actions can be used to block anything you want, including ads,
- banners, or just some obnoxious URL whose content you would rather
- not see. Cookies can be accepted or rejected, or accepted only during
- the current browser session (i.e. not written to disk), content can
- be modified, some JavaScripts tamed, user-tracking fooled, and much
- more. See below for a <a href="actions-file.html#ACTIONS">complete
- list of actions</a>.
- </p>
- <div class="SECT2">
- <h2 class="SECT2">
- <a name="RIGHT-MIX">8.1. Finding the Right Mix</a>
- </h2>
- <p>
- Note that some <a href="actions-file.html#ACTIONS">actions</a>,
- like cookie suppression or script disabling, may render some sites
- unusable that rely on these techniques to work properly. Finding
- the right mix of actions is not always easy and certainly a matter
- of personal taste. And, things can always change, requiring
- refinements in the configuration. In general, it can be said that
- the more <span class="QUOTE">"aggressive"</span> your default
- settings (in the top section of the actions file) are, the more
- exceptions for <span class="QUOTE">"trusted"</span> sites you will
- have to make later. If, for example, you want to crunch all cookies
- per default, you'll have to make exceptions from that rule for
- sites that you regularly use and that require cookies for actually
- useful purposes, like maybe your bank, favorite shop, or newspaper.
- </p>
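- <p>
- The cookie case could be sketched as follows. The site names are
- placeholders, and the two cookie actions are assumed to be available
- under these names in your version; check them against the list of
- actions before copying anything.
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
-# Aggressive default: crunch cookies in both directions for all sites.
-{+crunch-incoming-cookies +crunch-outgoing-cookies}
-/
-
-# Exceptions for trusted sites that need cookies to work.
-{-crunch-incoming-cookies -crunch-outgoing-cookies}
-.bank.example.com
-shop.example.org
-</pre>
- </td>
- </tr>
- </table>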
- <p>
- We have tried to provide you with reasonable rules to start from in
- the distribution actions files. But there is no general rule of
- thumb on these things. There just are too many variables, and sites
- are constantly changing. Sooner or later you will want to change
- the rules (and read this chapter again :).
- </p>
- </div>
- <div class="SECT2">
- <h2 class="SECT2">
- <a name="HOW-TO-EDIT">8.2. How to Edit</a>
- </h2>
- <p>
- The easiest way to edit the actions files is with a browser by
- using our browser-based editor, which can be reached from <a href=
- "http://config.privoxy.org/show-status" target=
- "_top">http://config.privoxy.org/show-status</a>. Note: the config
- file option <a href=
- "config.html#ENABLE-EDIT-ACTIONS">enable-edit-actions</a> must be
- enabled for this to work. The editor allows both fine-grained
- control over every single feature on a per-URL basis, and easy
- choosing from wholesale sets of defaults like <span class=
- "QUOTE">"Cautious"</span>, <span class="QUOTE">"Medium"</span> or
- <span class="QUOTE">"Advanced"</span>. Warning: the <span class=
- "QUOTE">"Advanced"</span> setting is more aggressive, and will be
- more likely to cause problems for some sites. Experienced users
- only!
- </p>
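- <p>
- A minimal sketch of the corresponding line in the main configuration
- file, assuming the option takes <tt class="LITERAL">1</tt> to enable
- the editor and <tt class="LITERAL">0</tt> to disable it:
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
-# Allow the browser-based actions file editor.
-enable-edit-actions 1
-</pre>
- </td>
- </tr>
- </table>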
- <p>
- If you prefer plain text editing to GUIs, you can of course also
- edit the actions files directly with your favorite text editor.
- Look at <tt class="FILENAME">default.action</tt>, which is richly
- commented and contains many good examples.
- </p>
- </div>
- <div class="SECT2">
- <h2 class="SECT2">
- <a name="ACTIONS-APPLY">8.3. How Actions are Applied to
- Requests</a>
- </h2>
- <p>
- Actions files are divided into sections. There are special
- sections, like the <span class="QUOTE">"<a href=
- "actions-file.html#ALIASES">alias</a>"</span> sections which will
- be discussed later. For now let's concentrate on regular sections:
- They have a heading line (often split across multiple lines for
- readability) which consists of a list of actions, separated by
- whitespace and enclosed in curly braces. Below that, there is a
- list of URL and tag patterns, each on a separate line.
- </p>
- <p>
- To determine which actions apply to a request, the URL of the
- request is compared to all URL patterns in each <span class=
- "QUOTE">"actions file"</span>. Every time it matches, the list of
- applicable actions for the request is incrementally updated, using
- the heading of the section in which the pattern is located. The
- same is done again for tags and tag patterns later on.
- </p>
- <p>
- If multiple applying sections set the same action differently, the
- last match wins. If not, the effects are aggregated. E.g. a URL
- might match a regular section with a heading line of <tt class=
- "LITERAL">{ +<a href=
- "actions-file.html#HANDLE-AS-IMAGE">handle-as-image</a> }</tt>,
- then later another one with just <tt class="LITERAL">{ +<a href=
- "actions-file.html#BLOCK">block</a> }</tt>, with the result that <span
- class="emphasis"><i class="EMPHASIS">both</i></span> actions
- apply. There may well be cases where you want to combine
- actions. Such a section might look like this:
- </p>
- <p>
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
- { +<tt class="LITERAL">handle-as-image</tt> +<tt class=
-"LITERAL">block{Banner ads.}</tt> }
- # Block these as if they were images. Send no block page.
- banners.example.com
- media.example.com/.*banners
- .example.com/images/ads/
-</pre>
- </td>
- </tr>
- </table>
-
- <p>
- You can trace this process for URL patterns and any given URL by
- visiting <a href="http://config.privoxy.org/show-url-info" target=
- "_top">http://config.privoxy.org/show-url-info</a>.
- </p>
- <p>
- Examples and more detail on this are provided in the Appendix, in the <a
- href="appendix.html#ACTIONSANAT">Troubleshooting: Anatomy of an
- Action</a> section.
- </p>
- </div>
- <div class="SECT2">
- <h2 class="SECT2">
- <a name="AF-PATTERNS">8.4. Patterns</a>
- </h2>
- <p>
- As mentioned, <span class="APPLICATION">Privoxy</span> uses <span
- class="QUOTE">"patterns"</span> to determine what <span class=
- "emphasis"><i class="EMPHASIS">actions</i></span> might apply to
- which sites and pages your browser attempts to access. These <span
- class="QUOTE">"patterns"</span> use wild card type <span class=
- "emphasis"><i class="EMPHASIS">pattern</i></span> matching to
- achieve a high degree of flexibility. This allows one expression to
- be expanded and potentially match against many similar patterns.
- </p>
- <p>
- Generally, a URL pattern has the form <tt class=
- "LITERAL">&lt;host&gt;&lt;port&gt;/&lt;path&gt;</tt>, where the <tt
- class="LITERAL">&lt;host&gt;</tt>, the <tt class=
- "LITERAL">&lt;port&gt;</tt> and the <tt class=
- "LITERAL">&lt;path&gt;</tt> are optional. (This is why the special
- <tt class="LITERAL">/</tt> pattern matches all URLs). Note that the
- protocol portion of the URL (e.g. <tt class=
- "LITERAL">http://</tt>) should <span class="emphasis"><i class=
- "EMPHASIS">not</i></span> be included in the pattern. This is
- assumed already!
- </p>
- <p>
- The pattern matching syntax is different for the host and path
- parts of the URL. The host part uses a simple globbing type
- matching technique, while the path part uses more flexible <a href=
- "http://en.wikipedia.org/wiki/Regular_expressions" target=
- "_top"><span class="QUOTE">"Regular Expressions"</span></a> (POSIX
- 1003.2).
- </p>
- <p>
- The port part of a pattern is a decimal port number preceded by a
- colon (<tt class="LITERAL">:</tt>). If the host part contains a
- numerical IPv6 address, it has to be put into angle brackets (<tt
- class="LITERAL">&lt;</tt>, <tt class="LITERAL">&gt;</tt>).
- </p>
- <div class="VARIABLELIST">
- <dl>
- <dt>
- <tt class="LITERAL">www.example.com/</tt>
- </dt>
- <dd>
- <p>
- is a host-only pattern and will match any request to <tt
- class="LITERAL">www.example.com</tt>, regardless of which
- document on that server is requested. So ALL pages in this
- domain would be covered by the scope of this action. Note
- that a simple <tt class="LITERAL">example.com</tt> is
- different and would NOT match.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">www.example.com</tt>
- </dt>
- <dd>
- <p>
- means exactly the same. For host-only patterns, the trailing
- <tt class="LITERAL">/</tt> may be omitted.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">www.example.com/index.html</tt>
- </dt>
- <dd>
- <p>
- matches all the documents on <tt class=
- "LITERAL">www.example.com</tt> whose name starts with <tt
- class="LITERAL">/index.html</tt>.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">www.example.com/index.html$</tt>
- </dt>
- <dd>
- <p>
- matches only the single document <tt class=
- "LITERAL">/index.html</tt> on <tt class=
- "LITERAL">www.example.com</tt>.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">/index.html$</tt>
- </dt>
- <dd>
- <p>
- matches the document <tt class="LITERAL">/index.html</tt>,
- regardless of the domain, i.e. on <span class="emphasis"><i
- class="EMPHASIS">any</i></span> web server anywhere.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">/</tt>
- </dt>
- <dd>
- <p>
- Matches any URL because there's no requirement for either the
- domain or the path to match anything.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">:8000/</tt>
- </dt>
- <dd>
- <p>
- Matches any URL pointing to TCP port 8000.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">10.0.0.1/</tt>
- </dt>
- <dd>
- <p>
- Matches any URL with the host address <tt class=
- "LITERAL">10.0.0.1</tt>.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">&lt;2001:db8::1&gt;/</tt>
- </dt>
- <dd>
- <p>
- Matches any URL with the host address <tt class=
- "LITERAL">2001:db8::1</tt>. (Note that the real URL uses
- square brackets, not angle brackets.)
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">index.html</tt>
- </dt>
- <dd>
- <p>
- matches nothing, since it would be interpreted as a domain
- name and there is no top-level domain called <tt class=
- "LITERAL">.html</tt>. So it's a mistake.
- </p>
- </dd>
- </dl>
- </div>
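- <p>
- To show how the host, port and path parts combine, here is a short
- sketch of one section that uses several of the forms listed above.
- The <tt class="LITERAL">+block</tt> heading and the
- <tt class="LITERAL">/private/</tt> path are only there for
- illustration.
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
-{+block{Pattern syntax illustration only.}}
-# Host, port and path together.
-www.example.com:8080/private/
-# Port only: any host, TCP port 8000.
-:8000/
-# Numerical IPv6 host, written in angle brackets.
-&lt;2001:db8::1&gt;/
-# Path only: the document /index.html on any host.
-/index.html$
-</pre>
- </td>
- </tr>
- </table>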
- <div class="SECT3">
- <h3 class="SECT3">
- <a name="HOST-PATTERN">8.4.1. The Host Pattern</a>
- </h3>
- <p>
- The matching of the host part offers some flexible options: if
- the host pattern starts or ends with a dot, it becomes unanchored
- at that end. The host pattern is often referred to as domain
- pattern as it is usually used to match domain names and not IP
- addresses. For example:
- </p>
- <div class="VARIABLELIST">
- <dl>
- <dt>
- <tt class="LITERAL">.example.com</tt>
- </dt>
- <dd>
- <p>
- matches any domain with first-level domain <tt class=
- "LITERAL">com</tt> and second-level domain <tt class=
- "LITERAL">example</tt>. For example <tt class=
- "LITERAL">www.example.com</tt>, <tt class=
- "LITERAL">example.com</tt> and <tt class=
- "LITERAL">foo.bar.baz.example.com</tt>. Note that it
- wouldn't match if the second-level domain was <tt class=
- "LITERAL">another-example</tt>.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">www.</tt>
- </dt>
- <dd>
- <p>
- matches any domain that <span class="emphasis"><i class=
- "EMPHASIS">STARTS</i></span> with <tt class=
- "LITERAL">www.</tt> (It also matches the domain <tt class=
- "LITERAL">www</tt> but most of the time that doesn't
- matter.)
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">.example.</tt>
- </dt>
- <dd>
- <p>
- matches any domain that <span class="emphasis"><i class=
- "EMPHASIS">CONTAINS</i></span> <tt class=
- "LITERAL">.example.</tt>. And, by the way, also included
- would be any files or documents that exist within that
- domain since no path limitations are specified. (Correctly
- speaking: It matches any FQDN that contains <tt class=
- "LITERAL">example</tt> as a domain.) This might be <tt
- class="LITERAL">www.example.com</tt>, <tt class=
- "LITERAL">news.example.de</tt>, or <tt class=
- "LITERAL">www.example.net/cgi/testing.pl</tt> for instance.
- All these cases are matched.
- </p>
- </dd>
- </dl>
- </div>
- <p>
- Additionally, there are wild-cards that you can use in the domain
- names themselves. These work similarly to shell globbing type
- wild-cards: <span class="QUOTE">"*"</span> represents zero or
- more arbitrary characters (this is equivalent to the <a href=
- "http://en.wikipedia.org/wiki/Regular_expressions" target=
- "_top"><span class="QUOTE">"Regular Expression"</span></a> based
- syntax of <span class="QUOTE">".*"</span>), <span class=
- "QUOTE">"?"</span> represents any single character (this is
- equivalent to the regular expression syntax of a simple <span
- class="QUOTE">"."</span>), and you can define <span class=
- "QUOTE">"character classes"</span> in square brackets which is
- similar to the same regular expression technique. All of this can
- be freely mixed:
- </p>
- <div class="VARIABLELIST">
- <dl>
- <dt>
- <tt class="LITERAL">ad*.example.com</tt>
- </dt>
- <dd>
- <p>
- matches <span class="QUOTE">"adserver.example.com"</span>,
- <span class="QUOTE">"ads.example.com"</span>, etc., but not
- <span class="QUOTE">"sfads.example.com"</span>.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">*ad*.example.com</tt>
- </dt>
- <dd>
- <p>
- matches all of the above, and then some.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">.?pix.com</tt>
- </dt>
- <dd>
- <p>
- matches <tt class="LITERAL">www.ipix.com</tt>, <tt class=
- "LITERAL">pictures.epix.com</tt>, <tt class=
- "LITERAL">a.b.c.d.e.upix.com</tt> etc.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">www[1-9a-ez].example.c*</tt>
- </dt>
- <dd>
- <p>
- matches <tt class="LITERAL">www1.example.com</tt>, <tt
- class="LITERAL">www4.example.cc</tt>, <tt class=
- "LITERAL">wwwd.example.cy</tt>, <tt class=
- "LITERAL">wwwz.example.com</tt> etc., but <span class=
- "emphasis"><i class="EMPHASIS">not</i></span> <tt class=
- "LITERAL">wwww.example.com</tt>.
- </p>
- </dd>
- </dl>
- </div>
- <p>
- While flexible, this does not offer the sophistication of full
- regular expression syntax.
- </p>
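- <p>
- Because wild-cards can match more than intended, a host pattern is
- often paired with a later exception. The following sketch uses
- placeholder host names; it relies only on the
- <span class="QUOTE">"*"</span> semantics described above and on the
- rule that the last matching section wins.
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
-# Block every host in example.com whose name starts with "ad" ...
-{+block{Ad hosts (placeholder domain).}}
-ad*.example.com
-
-# ... except this one: "admin" also matches "ad*", so it needs
-# an exception in a later section.
-{-block}
-admin.example.com
-</pre>
- </td>
- </tr>
- </table>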
- </div>
- <div class="SECT3">
- <h3 class="SECT3">
- <a name="PATH-PATTERN">8.4.2. The Path Pattern</a>
- </h3>
- <p>
- <span class="APPLICATION">Privoxy</span> uses <span class=
- "QUOTE">"modern"</span> POSIX 1003.2 <a href=
- "http://en.wikipedia.org/wiki/Regular_expressions" target=
- "_top"><span class="QUOTE">"Regular Expressions"</span></a> for
- matching the path portion (after the slash), and is thus more
- flexible.
- </p>
- <p>
- There is an <a href="appendix.html#REGEX">Appendix</a> with a
- brief quick-start into regular expressions. You might also want
- to have a look at your operating system's documentation on
- regular expressions (try <tt class="LITERAL">man re_format</tt>).
- </p>
- <p>
- Note that the path pattern is automatically left-anchored at the
- <span class="QUOTE">"/"</span>, i.e. it matches as if it
- started with a <span class="QUOTE">"^"</span> (regular expression
- speak for the beginning of a line).
- </p>
- <p>
- Please also note that matching in the path is <span class=
- "emphasis"><i class="EMPHASIS">CASE INSENSITIVE</i></span> by
- default, but you can switch to case sensitive at any point in the
- pattern by using the <span class="QUOTE">"(?-i)"</span> switch:
- <tt class="LITERAL">www.example.com/(?-i)PaTtErN.*</tt> will
- match only documents whose path starts with <tt class=
- "LITERAL">PaTtErN</tt> in <span class="emphasis"><i class=
- "EMPHASIS">exactly</i></span> this capitalization.
- </p>
- <div class="VARIABLELIST">
- <dl>
- <dt>
- <tt class="LITERAL">.example.com/.*</tt>
- </dt>
- <dd>
- <p>
- Is equivalent to just <span class=
- "QUOTE">".example.com"</span>, since any documents within
- that domain are matched with or without the <span class=
- "QUOTE">".*"</span> regular expression. This is redundant.
- </dd>
- <dt>
- <tt class="LITERAL">.example.com/.*/index.html$</tt>
- </dt>
- <dd>
- <p>
- Will match any page in the domain of <span class=
- "QUOTE">"example.com"</span> that is named <span class=
- "QUOTE">"index.html"</span>, and that is part of some path.
- For example, it matches <span class=
- "QUOTE">"www.example.com/testing/index.html"</span> but NOT
- <span class="QUOTE">"www.example.com/index.html"</span>
- because the regular expression called for at least two
- <span class="QUOTE">"/'s"</span>, thus the path
- requirement. It also would match <span class=
- "QUOTE">"www.example.com/testing/index_html"</span>,
- because of the special meta-character <span class=
- "QUOTE">"."</span>.
- </p>
- </dd>
- <dt>
- <tt class="LITERAL">.example.com/(.*/)?index\.html$</tt>
- </dt>
- <dd>
- <p>
- This regular expression is conditional so it will match any
- page named <span class="QUOTE">"index.html"</span>
- regardless of path, which in this case can have one or more
- <span class="QUOTE">"/'s"</span>. And this one must contain
- exactly <span class="QUOTE">".html"</span> (and, because of the
- trailing <span class="QUOTE">"$"</span>, must end with that!).
- </dd>
- <dt>
- <tt class=
- "LITERAL">.example.com/(.*/)(ads|banners?|junk)</tt>
- </dt>
- <dd>
- <p>
- This regular expression will match any path of <span class=
- "QUOTE">"example.com"</span> that contains any of the words
- <span class="QUOTE">"ads"</span>, <span class=
- "QUOTE">"banner"</span>, <span class=
- "QUOTE">"banners"</span> (because of the <span class=
- "QUOTE">"?"</span>) or <span class="QUOTE">"junk"</span>.
- The path does not have to end in these words, just contain
- them.
- </p>
- </dd>
- <dt>
- <tt class=
- "LITERAL">.example.com/(.*/)(ads|banners?|junk)/.*\.(jpe?g|gif|png)$</tt>
- </dt>
- <dd>
- <p>
- This is very much the same as above, except now it must end
- in either <span class="QUOTE">".jpg"</span>, <span class=
- "QUOTE">".jpeg"</span>, <span class="QUOTE">".gif"</span>
- or <span class="QUOTE">".png"</span>. So this one is
- limited to common image formats.
- </p>
- </dd>
- </dl>
- </div>
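- <p>
- Putting the path syntax together, a section might combine a
- case-insensitive pattern with a case-sensitive one. This is only a
- sketch: the domain and the file name
- <tt class="LITERAL">AdFrame.html</tt> are made up for the example.
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
-{+block{Image ads (placeholder domain).}}
-# Common image formats inside an ads/, banner/ or banners/ directory,
-# in any capitalization.
-.example.com/(.*/)?(ads|banners?)/.*\.(jpe?g|gif|png)$
-# Only this exact capitalization of the file name.
-.example.com/(?-i)AdFrame\.html$
-</pre>
- </td>
- </tr>
- </table>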
- <p>
- There are many, many good examples to be found in <tt class=
- "FILENAME">default.action</tt>, and more tutorials below in <a
- href="appendix.html#REGEX">Appendix on regular expressions</a>.
- </p>
- </div>
- <div class="SECT3">
- <h3 class="SECT3">
- <a name="TAG-PATTERN">8.4.3. The Request Tag Pattern</a>
- </h3>
- <p>
- Request tag patterns are used to change the applying actions
- based on the request's tags. Tags can be created based on HTTP
- headers with either the <a href=
- "actions-file.html#CLIENT-HEADER-TAGGER">client-header-tagger</a>
- or the <a href=
- "actions-file.html#SERVER-HEADER-TAGGER">server-header-tagger</a>
- action.
- </p>
- <p>
- Request tag patterns have to start with <span class=
- "QUOTE">"TAG:"</span>, so <span class=
- "APPLICATION">Privoxy</span> can tell them apart from other
- patterns. Everything after the colon, including white space, is
- interpreted as a regular expression with path pattern syntax,
- except that tag patterns aren't left-anchored automatically
- (<span class="APPLICATION">Privoxy</span> doesn't silently add a
- <span class="QUOTE">"^"</span>, you have to do it yourself if you
- need it).
- </p>
- <p>
- To match all requests that are tagged with <span class=
- "QUOTE">"foo"</span>, your pattern line should be <span class=
- "QUOTE">"TAG:^foo$"</span>. <span class="QUOTE">"TAG:foo"</span>
- would work as well, but it would also match requests whose tags
- contain <span class="QUOTE">"foo"</span> somewhere. <span class=
- "QUOTE">"TAG: foo"</span> wouldn't work, as it requires white
- space before <span class="QUOTE">"foo"</span>.
- </p>
- <p>
- Sections can contain URL and request tag patterns at the same
- time, but request tag patterns are checked after the URL patterns
- and thus always overrule them, even if they are located before
- the URL patterns.
- </p>
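- <p>
- A minimal sketch of tagging and a matching tag pattern follows. It
- assumes that a tagger named <tt class="LITERAL">user-agent</tt> is
- defined in your filter file and that it creates a tag consisting of
- the complete User-Agent header line; the agent name
- <tt class="LITERAL">ExampleFetcher</tt> is hypothetical.
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
-# Tag every request with its User-Agent header.
-{+client-header-tagger{user-agent}}
-/
-
-# Tagging alone changes nothing; a section with a TAG pattern does.
-# Here: do not filter content for one particular download agent.
-{-filter}
-TAG:^User-Agent: ExampleFetcher/
-</pre>
- </td>
- </tr>
- </table>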
- <p>
- Once a new request tag is added, Privoxy checks right away if
- it's matched by one of the request tag patterns and updates the
- action settings accordingly. As a result, request tags can be used
- to activate other tagger actions, as long as these other taggers
- look for headers that haven't already been parsed.
- </p>
- <p>
- For example, you could tag client requests which use the <tt
- class="LITERAL">POST</tt> method, then use this tag to activate
- another tagger that adds a tag if cookies are sent, and then use
- a block action based on the cookie tag. This allows the outcome
- of one action to be used as input for a subsequent action. However, if
- you reversed the position of the described taggers and
- activated the method tagger based on the cookie tagger, no method
- tags would be created. The method tagger would look for the
- request line, but at the time the cookie tag is created, the
- request line has already been parsed.
- </p>
- <p>
- While this is a limitation you should be aware of, this kind of
- indirection is seldom needed anyway and even the example doesn't
- make too much sense.
- </p>
- </div>
- <div class="SECT3">
- <h3 class="SECT3">
- <a name="NEGATIVE-TAG-PATTERNS">8.4.4. The Negative Request Tag
- Patterns</a>
- </h3>
- <p>
- To match requests that do not have a certain request tag, specify
- a negative tag pattern by prefixing the tag pattern line with
- either <span class="QUOTE">"NO-REQUEST-TAG:"</span> or <span
- class="QUOTE">"NO-RESPONSE-TAG:"</span> instead of <span class=
- "QUOTE">"TAG:"</span>.
- </p>
- <p>
- Negative request tag patterns created with <span class=
- "QUOTE">"NO-REQUEST-TAG:"</span> are checked after all client
- headers are scanned; the ones created with <span class=
- "QUOTE">"NO-RESPONSE-TAG:"</span> are checked after all server
- headers are scanned. In both cases all the created tags are
- considered.
- </p>
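- <p>
- As a sketch, a negative request tag pattern could be used to act on
- requests that lack a header. The example again assumes a
- <tt class="LITERAL">user-agent</tt> tagger that creates a tag from
- the User-Agent header line, so treat it as an illustration of the
- syntax rather than a tested rule.
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
-# Tag every request with its User-Agent header (tagger assumed to exist).
-{+client-header-tagger{user-agent}}
-/
-
-# Applies to requests for which no such tag was created,
-# i.e. requests that sent no User-Agent header.
-{+block{No User-Agent header.}}
-NO-REQUEST-TAG:^User-Agent:
-</pre>
- </td>
- </tr>
- </table>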
- </div>
- <div class="SECT3">
- <h3 class="SECT3">
- <a name="CLIENT-TAG-PATTERN">8.4.5. The Client Tag Pattern</a>
- </h3>
- <div class="WARNING">
- <table class="WARNING" border="1" width="100%">
- <tr>
- <td align="CENTER">
- <b>Warning</b>
- </td>
- </tr>
- <tr>
- <td align="LEFT">
- <p>
- This is an experimental feature. The syntax is likely to
- change in future versions.
- </p>
- </td>
- </tr>
- </table>
- </div>
- <p>
- Client tags are not set based on HTTP headers but based
- on the client's IP address. Users can enable them themselves, but
- the Privoxy admin controls which tags are available and what
- their effect is.
- </p>
- <p>
- After a client-specific tag has been defined with the <a href=
- "config.html#CLIENT-SPECIFIC-TAG">client-specific-tag</a>
- directive, action sections can be activated based on the tag by
- using a CLIENT-TAG pattern. The CLIENT-TAG pattern is evaluated
- at the same priority as URL patterns; as a result, the last
- matching pattern wins. Tags that are created based on client or
- server headers are evaluated later on and can overrule CLIENT-TAG
- and URL patterns!
- </p>
- <p>
- The tag is set for all requests that come from clients that
- requested it to be set. Note that "clients" are differentiated by
- IP address; if the IP address changes, the tag has to be requested
- again.
- </p>
- <p>
- Clients can request tags to be set by using the CGI interface <a
- href="http://config.privoxy.org/client-tags" target=
- "_top">http://config.privoxy.org/client-tags</a>.
- </p>
- <p>
- Example:
- </p>
- <p>
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
-# If the admin defined the client-specific-tag circumvent-blocks,
-# and the request comes from a client that previously requested
-# the tag to be set, overrule all previous +block actions that
-# are enabled based on URL or CLIENT-TAG patterns.
-{-block}
-CLIENT-TAG:^circumvent-blocks$
-
-# This section is not overruled because it's located after
-# the previous one.
-{+block{Nobody is supposed to request this.}}
-example.org/blocked-example-page
-</pre>
- </td>
- </tr>
- </table>
- </div>
- </div>
- <div class="SECT2">
- <h2 class="SECT2">
- <a name="ACTIONS">8.5. Actions</a>
- </h2>
- <p>
- All actions are disabled by default, until they are explicitly
- enabled somewhere in an actions file. Actions are turned on if
- preceded with a <span class="QUOTE">"+"</span>, and turned off if
- preceded with a <span class="QUOTE">"-"</span>. So a <tt class=
- "LITERAL">+action</tt> means <span class="QUOTE">"do that
- action"</span>, e.g. <tt class="LITERAL">+block</tt> means <span
- class="QUOTE">"please block URLs that match the following
- patterns"</span>, and <tt class="LITERAL">-block</tt> means <span
- class="QUOTE">"don't block URLs that match the following patterns,
- even if <tt class="LITERAL">+block</tt> previously
- applied."</span>
- </p>
- <p>
- Again, actions are invoked by placing them on a line, enclosed in
- curly braces and separated by whitespace, like in <tt class=
- "LITERAL">{+some-action -some-other-action{some-parameter}}</tt>,
- followed by a list of URL patterns, one per line, to which they
- apply. Together, the actions line and the following pattern lines
- make up a section of the actions file.
- </p>
- <p>
- Actions fall into three categories:
- </p>
- <p>
- </p>
- <ul>
- <li>
- <p>
- Boolean, i.e. the action can only be <span class=
- "QUOTE">"enabled"</span> or <span class=
- "QUOTE">"disabled"</span>. Syntax:
- </p>
- <p>
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
- +<tt class="REPLACEABLE"><i>name</i></tt> # enable action <tt class=
-"REPLACEABLE"><i>name</i></tt>
- -<tt class="REPLACEABLE"><i>name</i></tt> # disable action <tt
-class="REPLACEABLE"><i>name</i></tt>
-</pre>
- </td>
- </tr>
- </table>
-
- <p>
- Example: <tt class="LITERAL">+handle-as-image</tt>
- </p>
- </li>
- <li>
- <p>
- Parameterized, where some value is required in order to enable
- this type of action. Syntax:
- </p>
- <p>
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
- +<tt class="REPLACEABLE"><i>name</i></tt>{<tt class=
-"REPLACEABLE"><i>param</i></tt>} # enable action and set parameter to <tt
-class="REPLACEABLE"><i>param</i></tt>,
- # overwriting parameter from previous match if necessary
- -<tt class=
-"REPLACEABLE"><i>name</i></tt> # disable action. The parameter can be omitted
-</pre>
- </td>
- </tr>
- </table>
-
- <p>
- Note that if the URL matches multiple positive forms of a
- parameterized action, the last match wins, i.e. the params from
- earlier matches are simply ignored.
- </p>
- <p>
- Example: <tt class="LITERAL">+hide-user-agent{Mozilla/5.0 (X11;
- U; FreeBSD i386; en-US; rv:1.8.1.4) Gecko/20070602
- Firefox/2.0.0.4}</tt>
- </p>
- </li>
- <li>
- <p>
- Multi-value. These look exactly like parameterized actions, but
- they behave differently: If the action applies multiple times
- to the same URL, but with different parameters, <span class=
- "emphasis"><i class="EMPHASIS">all</i></span> the parameters
- from <span class="emphasis"><i class="EMPHASIS">all</i></span>
- matches are remembered. This is used for actions that can be
- executed for the same request repeatedly, like adding multiple
- headers, or filtering through multiple filters. Syntax:
- </p>
- <p>
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
- +<tt class="REPLACEABLE"><i>name</i></tt>{<tt class=
-"REPLACEABLE"><i>param</i></tt>} # enable action and add <tt class=
-"REPLACEABLE"><i>param</i></tt> to the list of parameters
- -<tt class="REPLACEABLE"><i>name</i></tt>{<tt class=
-"REPLACEABLE"><i>param</i></tt>} # remove the parameter <tt class=
-"REPLACEABLE"><i>param</i></tt> from the list of parameters
- # If it was the last one left, disable the action.
- <tt class=
-"REPLACEABLE"><i>-name</i></tt> # disable this action completely and remove all parameters from the list
-</pre>
- </td>
- </tr>
- </table>
-
- <p>
- Examples: <tt class="LITERAL">+add-header{X-Fun-Header: Some
- text}</tt> and <tt class=
- "LITERAL">+filter{html-annoyances}</tt>
- </p>
- </li>
- </ul>
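- <p>
- The difference between the three categories is easiest to see when
- two sections match the same URL. The following sketch uses only the
- actions from the examples above; the User-Agent strings are
- placeholders, and the comment at the end describes the expected
- outcome according to the rules just given.
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
-{ +handle-as-image \
-  +hide-user-agent{Example-Agent/1.0} \
-  +filter{html-annoyances} }
-.example.com
-
-{ +hide-user-agent{Example-Agent/2.0} \
-  +add-header{X-Fun-Header: Some text} }
-www.example.com
-
-# Expected result for www.example.com, which matches both sections:
-#   handle-as-image  enabled (Boolean, set by the first section)
-#   hide-user-agent  Example-Agent/2.0 (parameterized, last match wins)
-#   filter           html-annoyances (multi-value, kept)
-#   add-header       X-Fun-Header: Some text (multi-value, added)
-</pre>
- </td>
- </tr>
- </table>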
-
- <p>
- If nothing is specified in any actions file, no <span class=
- "QUOTE">"actions"</span> are taken. So in this case <span class=
- "APPLICATION">Privoxy</span> would just be a normal, non-blocking,
- non-filtering proxy. You must specifically enable the privacy and
- blocking features you need (although the provided default actions
- files will give a good starting point).
- </p>
- <p>
- Later defined action sections always override earlier ones of the
- same type. So exceptions to any rules you make should come in the
- latter part of the file (or in a file that is processed later when
- using multiple actions files such as <tt class=
- "FILENAME">user.action</tt>). For multi-valued actions, the actions
- are applied in the order they are specified. Actions files are
- processed in the order they are defined in <tt class=
- "FILENAME">config</tt> (the default installation has three actions
- files). It is also quite possible for any given URL to match more than
- one <span class="QUOTE">"pattern"</span> (because of wildcards and
- regular expressions), and thus to trigger more than one set of
- actions! Last match wins.
- </p>
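- <p>
- For a multi-value action, a later section can also remove a single
- parameter without disabling the action as a whole. A sketch, with
- made-up header names and a placeholder domain:
- </p>
- <table border="0" bgcolor="#E0E0E0" width="100%">
- <tr>
- <td>
-<pre class="SCREEN">
-# General rule: add two headers to all requests.
-{+add-header{X-Fun-Header: Some text} +add-header{X-Other-Header: More text}}
-/
-
-# Later exception: drop only the second header for one domain.
-# The first header is still added there.
-{-add-header{X-Other-Header: More text}}
-.example.org
-</pre>
- </td>
- </tr>
- </table>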
- <p>
- The valid <span class="APPLICATION">Privoxy</span> actions
- are:
- </p>
- <div class="SECT3">
- <h4 class="SECT3">
- <a name="ADD-HEADER">8.5.1. add-header</a>
- </h4>
- <div class="VARIABLELIST">
- <dl>
- <dt>
- Typical use:
- </dt>
- <dd>
- <p>
- Confuse log analysis, custom applications
- </p>
- </dd>
- <dt>
- Effect:
- </dt>
- <dd>
- <p>
- Sends a user defined HTTP header to the web server.
- </p>
- </dd>
- <dt>
- Type:
- </dt>
- <dd>
- <p>
- Multi-value.
- </p>
- </dd>
- <dt>
- Parameter:
- </dt>
- <dd>
- <p>
- Any string value is possible. Validity of the defined HTTP
- headers is not checked. It is recommended that you use the
- <span class="QUOTE">"<tt class="LITERAL">X-</tt>"</span>
- prefix for custom headers.
- </p>
- </dd>
- <dt>
- Notes:
- </dt>
- <dd>
- <p>
- This action may be specified multiple times, in order to
- define multiple headers. This is rarely needed for the
- typical user. If you don't know what <span class=
- "QUOTE">"HTTP headers"</span> are, you definitely don't
- need to worry about this one.
- </p>
- <p>
- Headers added by this action are not modified by other
- actions.
- </p>
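- <p>
- A sketch of specifying the action more than once in a single
- section; the header names, values and host are invented for
- illustration:
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
-# Hypothetical headers and host, for illustration only.
-{+add-header{X-Example-One: first} +add-header{X-Example-Two: second}}
-.example.com
-</pre>
- </td>
- </tr>
- </table>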
- </dd>
- <dt>
- Example usage:
- </dt>
- <dd>
- <p>
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
-# Add a DNT ("Do not track") header to all requests,
-# even to those that already have one.
-#
-# This is just an example, not a recommendation.
-#
-# There is no reason to believe that user-tracking websites care
-# about the DNT header and depending on the User-Agent, adding the
-# header may make user-tracking easier.
-{+add-header{DNT: 1}}
-/
-</pre>
- </td>
- </tr>
- </table>
- </dd>
- </dl>
- </div>
- </div>
- <div class="SECT3">
- <h4 class="SECT3">
- <a name="BLOCK">8.5.2. block</a>
- </h4>
- <div class="VARIABLELIST">
- <dl>
- <dt>
- Typical use:
- </dt>
- <dd>
- <p>
- Block ads or other unwanted content
- </p>
- </dd>
- <dt>
- Effect:
- </dt>
- <dd>
- <p>
- Requests for URLs to which this action applies are blocked,
- i.e. the requests are trapped by <span class=
- "APPLICATION">Privoxy</span> and the requested URL is never
- retrieved, but is answered locally with a substitute page
- or image, as determined by the <tt class="LITERAL"><a href=
- "actions-file.html#HANDLE-AS-IMAGE">handle-as-image</a></tt>,
- <tt class="LITERAL"><a href=
- "actions-file.html#SET-IMAGE-BLOCKER">set-image-blocker</a></tt>,
- and <tt class="LITERAL"><a href=
- "actions-file.html#HANDLE-AS-EMPTY-DOCUMENT">handle-as-empty-document</a></tt>
- actions.
- </p>
- </dd>
- <dt>
- Type:
- </dt>
- <dd>
- <p>
- Parameterized.
- </p>
- </dd>
- <dt>
- Parameter:
- </dt>
- <dd>
- <p>
- A block reason that should be given to the user.
- </p>
- </dd>
- <dt>
- Notes:
- </dt>
- <dd>
- <p>
- <span class="APPLICATION">Privoxy</span> sends a special
- <span class="QUOTE">"BLOCKED"</span> page for requests to
- blocked pages. This page contains the block reason given as
- parameter, a link to find out why the block action applies,
- and a click-through to the blocked content (the latter only
- if the force feature is available and enabled).
- </p>
- <p>
- A very important exception occurs if <span class=
- "emphasis"><i class="EMPHASIS">both</i></span> <tt class=
- "LITERAL">block</tt> and <tt class="LITERAL"><a href=
- "actions-file.html#HANDLE-AS-IMAGE">handle-as-image</a></tt>
- apply to the same request: it will then be replaced by an
- image. If <tt class="LITERAL"><a href=
- "actions-file.html#SET-IMAGE-BLOCKER">set-image-blocker</a></tt>
- (see below) also applies, the type of image will be
- determined by its parameter; if not, the standard
- checkerboard pattern is sent.
- </p>
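- <p>
- A sketch of all three actions combined in one section. The host
- name is made up, and <tt class="LITERAL">blank</tt> is assumed
- here as the <tt class="LITERAL">set-image-blocker</tt> parameter;
- check that action's description for the values it accepts:
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
-# Hypothetical host, for illustration only.
-# The "blank" parameter is an assumption; see set-image-blocker.
-{+block{Banner ads.} +handle-as-image +set-image-blocker{blank}}
-.banners.example.com
-</pre>
- </td>
- </tr>
- </table>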
- <p>
- It is important to understand this process, in order to
- understand how <span class="APPLICATION">Privoxy</span>
- deals with ads and other unwanted content. Blocking is a
- core feature, and one upon which various other features
- depend.
- </p>
- <p>
- The <tt class="LITERAL"><a href=
- "actions-file.html#FILTER">filter</a></tt> action can
- perform a very similar task, by <span class=
- "QUOTE">"blocking"</span> banner images and other content
- through rewriting the relevant URLs in the document's HTML
- source, so they don't get requested in the first place.
- Note that this is a totally different technique, and it's
- easy to confuse the two.
- </p>
- </dd>
- <dt>
- Example usage (section):
- </dt>
- <dd>
- <p>
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
-{+block{No nasty stuff for you.}}
-# Block and replace with "blocked" page
- .nasty-stuff.example.com
-
-{+block{Doubleclick banners.} +handle-as-image}
-# Block and replace with image
- .ad.doubleclick.net
- .ads.r.us/banners/
-
-{+block{Layered ads.} +handle-as-empty-document}
-# Block and then ignore
- adserver.example.net/.*\.js$
-</pre>
- </td>
- </tr>
- </table>
- </dd>
- </dl>
- </div>
- </div>
- <div class="SECT3">
- <h4 class="SECT3">
- <a name="CHANGE-X-FORWARDED-FOR">8.5.3.
- change-x-forwarded-for</a>
- </h4>
- <div class="VARIABLELIST">
- <dl>
- <dt>
- Typical use:
- </dt>
- <dd>
- <p>
- Improve privacy by not forwarding the source of the request
- in the HTTP headers.
- </p>
- </dd>
- <dt>
- Effect:
- </dt>
- <dd>
- <p>
- Deletes the <span class="QUOTE">"X-Forwarded-For:"</span>
- HTTP header from the client request, or adds a new one.
- </p>
- </dd>
- <dt>
- Type:
- </dt>
- <dd>
- <p>
- Parameterized.
- </p>
- </dd>
- <dt>
- Parameter:
- </dt>
- <dd>
- <ul>
- <li>
- <p>
- <span class="QUOTE">"block"</span> to delete the
- header.
- </p>
- </li>
- <li>
- <p>
- <span class="QUOTE">"add"</span> to create the header
- (or append the client's IP address to an already
- existing one).
- </p>
- </li>
- </ul>
- </dd>
- <dt>
- Notes:
- </dt>
- <dd>
- <p>
- It is safe and recommended to use <tt class=
- "LITERAL">block</tt>.
- </p>
- <p>
- Forwarding the source address of the request may make sense
- in some multi-user setups but is also a privacy risk.
- </p>
- </dd>
- <dt>
- Example usage:
- </dt>
- <dd>
- <p>
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
-+change-x-forwarded-for{block}
-</pre>
- </td>
- </tr>
- </table>
- </dd>
- </dl>
- </div>
- </div>
- <div class="SECT3">
- <h4 class="SECT3">
- <a name="CLIENT-HEADER-FILTER">8.5.4. client-header-filter</a>
- </h4>
- <div class="VARIABLELIST">
- <dl>
- <dt>
- Typical use:
- </dt>
- <dd>
- <p>
- Rewrite or remove single client headers.
- </p>
- </dd>
- <dt>
- Effect:
- </dt>
- <dd>
- <p>
- All client headers to which this action applies are
- filtered on-the-fly through the specified regular
- expression based substitutions.
- </p>
- </dd>
- <dt>
- Type:
- </dt>
- <dd>
- <p>
- Multi-value.
- </p>
- </dd>
- <dt>
- Parameter:
- </dt>
- <dd>
- <p>
- The name of a client-header filter, as defined in one of
- the <a href="filter-file.html">filter files</a>.
- </p>
- </dd>
- <dt>
- Notes:
- </dt>
- <dd>
- <p>
- Client-header filters are applied to each header on its
- own, not to all at once. This makes it easier to diagnose
- problems, but on the downside you can't write filters that
- only change header x if header y's value is z. You can do
- that by using tags though.
- </p>
- <p>
- Client-header filters are executed after the other header
- actions have finished and use their output as input.
- </p>
- <p>
- If the request URI gets changed, <span class=
- "APPLICATION">Privoxy</span> will detect that and use the
- new one. This can be used to rewrite the request
- destination behind the client's back, for example to
- specify a Tor exit relay for certain requests.
- </p>
- <p>
- Please refer to the <a href="filter-file.html">filter file
- chapter</a> to learn which client-header filters are
- available by default, and how to create your own.
- </p>
- </dd>
- <dt>
- Example usage (section):
- </dt>
- <dd>
- <p>
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
-# Hide Tor exit notation in Host and Referer Headers
-{+client-header-filter{hide-tor-exit-notation}}
-/
-
-</pre>
- </td>
- </tr>
- </table>
- </dd>
- </dl>
- </div>
- </div>
- <div class="SECT3">
- <h4 class="SECT3">
- <a name="CLIENT-HEADER-TAGGER">8.5.5. client-header-tagger</a>
- </h4>
- <div class="VARIABLELIST">
- <dl>
- <dt>
- Typical use:
- </dt>
- <dd>
- <p>
- Block requests based on their headers.
- </p>
- </dd>
- <dt>
- Effect:
- </dt>
- <dd>
- <p>
- Client headers to which this action applies are filtered
- on-the-fly through the specified regular expression based
- substitutions; the result is used as a tag.
- </p>
- </dd>
- <dt>
- Type:
- </dt>
- <dd>
- <p>
- Multi-value.
- </p>
- </dd>
- <dt>
- Parameter:
- </dt>
- <dd>
- <p>
- The name of a client-header tagger, as defined in one of
- the <a href="filter-file.html">filter files</a>.
- </p>
- </dd>
- <dt>
- Notes:
- </dt>
- <dd>
- <p>
- Client-header taggers are applied to each header on its
- own, and as the header isn't modified, each tagger <span
- class="QUOTE">"sees"</span> the original.
- </p>
- <p>
- Client-header taggers are the first actions that are
- executed and their tags can be used to control every other
- action.
- </p>
- </dd>
- <dt>
- Example usage (section):
- </dt>
- <dd>
- <p>
- </p>
- <table border="0" bgcolor="#E0E0E0" width="90%">
- <tr>
- <td>
-<pre class="SCREEN">
-# Tag every request with the User-Agent header