<!entity license SYSTEM "license.sgml">
<!entity p-authors SYSTEM "p-authors.sgml">
<!entity config SYSTEM "p-config.sgml">
-<!entity p-version "3.0.6">
+<!entity p-version "3.0.7">
<!entity p-status "stable">
<!entity % p-authors-formal "INCLUDE"> <!-- include additional text, etc -->
<!entity % p-not-stable "IGNORE">
This file belongs into
ijbswa.sourceforge.net:/home/groups/i/ij/ijbswa/htdocs/
- $Id: user-manual.sgml,v 2.27 2006/11/14 01:57:47 hal9 Exp $
+ $Id: user-manual.sgml,v 2.31 2007/06/02 14:01:37 fabiankeil Exp $
- Copyright (C) 2001- 2006 Privoxy Developers http://www.privoxy.org
+ Copyright (C) 2001-2007 Privoxy Developers http://www.privoxy.org/
See LICENSE.
========================================================================
<subscript>
<!-- Completely the wrong markup, but very little is allowed -->
<!-- in this part of an article. FIXME -->
- <link linkend="copyright">Copyright</link> &my-copy; 2001 - 2006 by
+ <link linkend="copyright">Copyright</link> &my-copy; 2001 - 2007 by
<ulink url="http://www.privoxy.org/">Privoxy Developers</ulink>
</subscript>
</pubdate>
-<pubdate>$Id: user-manual.sgml,v 2.27 2006/11/14 01:57:47 hal9 Exp $</pubdate>
+<pubdate>$Id: user-manual.sgml,v 2.31 2007/06/02 14:01:37 fabiankeil Exp $</pubdate>
<!--
How to install the binary packages depends on your operating system:
</para>
+<!-- XXX: The installation sections should be sorted -->
+
<!-- ~~~~~ New section ~~~~~ -->
<sect3 id="installation-pack-rpm"><title>Red Hat and Fedora RPMs</title>
</sect3>
<!-- ~~~~~ New section ~~~~~ -->
-<sect3 id="installation-pack-bintgz"><title>Solaris, NetBSD, FreeBSD, HP-UX</title>
+<sect3 id="installation-pack-bintgz"><title>Solaris, NetBSD, HP-UX</title>
<para>
Create a new directory, <literal>cd</literal> to it, then unzip and
</para>
</sect3>
+<!-- ~~~~~ New section ~~~~~ -->
+<sect3 id="installation-tbz"><title>FreeBSD</title>
+
+<para>
+ Privoxy is part of FreeBSD's Ports Collection; you can build and install
+ it with <literal>cd /usr/ports/www/privoxy; make install clean</literal>.
+</para>
+<para>
+ If you don't use the ports, you can fetch and install
+ the package with <literal>pkg_add -r privoxy</literal>.
+</para>
+<para>
+ The port skeleton and the package can also be downloaded from the
+ <ulink url="https://sourceforge.net/project/showfiles.php?group_id=11118">File Release
+ Page</ulink>, but if you're interested in stable releases only, you don't
+ gain anything by using them.
+</para>
+</sect3>
+
<!-- ~~~~~ New section ~~~~~ -->
<sect3 id="installattion-gentoo"><title>Gentoo</title>
<para>
<sect1 id="whatsnew">
<title>What's New in this Release</title>
<para>
- There are many improvements and new features since <application>Privoxy 3.0.3</application>, the last stable release:
+ There are many improvements and new features since <application>Privoxy 3.0.6</application>, the last stable release:
</para>
<para>
<itemizedlist>
<listitem>
<para>
- Multiple <link linkend="filter-file">filter files</link> can now be specified in <filename>config</filename>. This allows for
- locally defined filters that can be maintained separately from the filters as
- supplied by the developers, i.e. <filename>default.filter</filename>.
+ Header filtering can be done with dedicated header filters now. As a result
+ the actions <quote>filter-client-headers</quote> and <quote>filter-server-headers</quote>
+ that were introduced with <application>Privoxy 3.0.5</application> to apply
+ the content filters to the headers as well, have been removed again.
</para>
</listitem>
-
+
+<!-- pre-3.0.6 changes:
<listitem>
<para>
There are a number of new <link linkend="actions-file">actions</link>:
configuration updates for better ad blocking and junk elimination.
</para>
</listitem>
-
+-->
</itemizedlist>
</para>
<para>
The list of actions files to be used are defined in the main configuration
file, and are processed in the order they are defined (e.g.
- <filename>default.action</filename> is typically process before
+ <filename>default.action</filename> is typically processed before
<filename>user.action</filename>). The content of these can all be viewed and
edited from <ulink
url="http://config.privoxy.org/show-status">http://config.privoxy.org/show-status</ulink>.
<sect2 id="actions-apply">
-<title>How Actions are Applied to URLs</title>
+<title>How Actions are Applied to Requests</title>
<para>
Actions files are divided into sections. There are special sections,
like the <quote><link linkend="aliases">alias</link></quote> sections which will
be discussed later. For now let's concentrate on regular sections: They have a
heading line (often split up to multiple lines for readability) which consist
of a list of actions, separated by whitespace and enclosed in curly braces.
- Below that, there is a list of URL patterns, each on a separate line.
+ Below that, there is a list of URL and tag patterns, each on a separate line.
</para>
<para>
To determine which actions apply to a request, the URL of the request is
- compared to all patterns in each <quote>action file</quote> file. Every time it matches, the list of
- applicable actions for the URL is incrementally updated, using the heading
- of the section in which the pattern is located. If multiple matches for
- the same URL set the same action differently, the last match wins. If not,
- the effects are aggregated. E.g. a URL might match a regular section with
- a heading line of <literal>{
+ compared to all URL patterns in each <quote>action file</quote>.
+ Every time it matches, the list of applicable actions for the request is
+ incrementally updated, using the heading of the section in which the
+ pattern is located. The same is done again for tags and tag patterns later on.
+</para>
+
+<para>
+ If multiple matching sections set the same action differently,
+ the last match wins. If not, the effects are aggregated.
+ E.g. a URL might match a regular section with a heading line of <literal>{
+<link linkend="handle-as-image">handle-as-image</link> }</literal>,
then later another one with just <literal>{
+<link linkend="block">block</link> }</literal>, resulting
</para>
<para>
- You can trace this process for any given URL by visiting <ulink
+ You can trace this process for URL patterns and any given URL by visiting <ulink
url="http://config.privoxy.org/show-url-info">http://config.privoxy.org/show-url-info</ulink>.
</para>
</para>
<para>
- Generally, a <application>Privoxy</application> pattern has the form
+ Generally, a URL pattern has the form
<literal><domain>/<path></literal>, where both the
<literal><domain></literal> and <literal><path></literal> are
optional. (This is why the special <literal>/</literal> pattern matches all
</sect3>
+<!-- ~ End section ~ -->
+
+
+<!-- ~~~~~ New section ~~~~~ -->
+<sect3 id="tag-pattern"><title>The Tag Pattern</title>
+
+<para>
+ Tag patterns are used to change the applicable actions based on the
+ request's tags. Tags can be created with either the
+ <link linkend="CLIENT-HEADER-TAGGER">client-header-tagger</link>
+ or the <link linkend="SERVER-HEADER-TAGGER">server-header-tagger</link> action.
+</para>
+
+<para>
+ Tag patterns have to start with <quote>TAG:</quote>, so &my-app;
+ can tell them apart from URL patterns. Everything after the colon,
+ including white space, is interpreted as a regular expression with
+ path pattern syntax, except that tag patterns aren't left-anchored
+ automatically (Privoxy doesn't silently add a <quote>^</quote>,
+ you have to do it yourself if you need it).
+</para>
+
+<para>
+ To match all requests that are tagged with <quote>foo</quote>,
+ your pattern line should be <quote>TAG:^foo$</quote>.
+ <quote>TAG:foo</quote> would work as well, but it would also
+ match requests whose tags merely contain <quote>foo</quote> somewhere.
+</para>
+
+<para>
+ Sections can contain URL and tag patterns at the same time,
+ but tag patterns are checked after the URL patterns and thus
+ always overrule them, even if they are located before the URL patterns.
+</para>
+
+<para>
+ Once a new tag is added, Privoxy checks right away if it's matched by one
+ of the tag patterns and updates the action settings accordingly. As a result
+ tags can be used to activate other tagger actions, as long as these other
+ taggers look for headers that haven't already been parsed.
+</para>
+
+<para>
+ For example you could tag client requests which use the POST method,
+ use this tag to activate another tagger that adds a tag if cookies
+ are sent, and then block based on the cookie tag. However, if you
+ reversed the position of the described taggers, and activated the method
+ tagger based on the cookie tagger, no method tags would be created.
+ The method tagger would look for the request line, but at the time
+ the cookie tag is created the request line has already been parsed.
+</para>
+
+<para>
+ While this is a limitation you should be aware of, this kind of
+ indirection is seldom needed anyway and even the example doesn't
+ make too much sense.
+</para>
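+
+<para>
+ As a minimal sketch of how tagging and tag patterns fit together, the
+ following sections assume that a client-header tagger named
+ <quote>user-agent</quote> is defined in one of the filter files and that
+ it produces tags of the form <quote>User-Agent: ...</quote>. The agent
+ name <quote>ExampleBot</quote> is made up for illustration.
+</para>
+
+<para>
+ <screen>
+# Tag every request with its User-Agent header
+# (assumes a tagger called "user-agent" exists)
+{+client-header-tagger{user-agent}}
+/
+
+# Block requests whose tag starts with the hypothetical agent name.
+# The ^ is written out because tag patterns aren't left-anchored.
+{+block}
+TAG:^User-Agent: ExampleBot
+ </screen>
+</para>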
+
+</sect3>
+
</sect2>
<!-- ~ End section ~ -->
</sect3>
+<!-- ~~~~~ New section ~~~~~ -->
+<sect3 renderas="sect4" id="client-header-filter">
+<title>client-header-filter</title>
+
+<variablelist>
+ <varlistentry>
+ <term>Typical use:</term>
+ <listitem>
+ <para>
+ Rewrite or remove single client headers.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Effect:</term>
+ <listitem>
+ <para>
+ All client headers to which this action applies are filtered on-the-fly through
+ the specified regular expression based substitutions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Type:</term>
+ <!-- boolean, parameterized, Multi-value -->
+ <listitem>
+ <para>Parameterized.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Parameter:</term>
+ <listitem>
+ <para>
+ The name of a client-header filter, as defined in one of the
+ <link linkend="filter-file">filter files</link>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Notes:</term>
+ <listitem>
+ <para>
+ Client-header filters are applied to each header on its own, not to
+ all at once. This makes it easier to diagnose problems, but on the downside
+ you can't write filters that only change header x if header y's value is z.
+ You can do that by using tags though.
+ </para>
+ <para>
+ Client-header filters are executed after the other header actions have finished
+ and use their output as input.
+ </para>
+ <para>
+ Please refer to the <link linkend="filter-file">filter file chapter</link>
+ to learn which client-header filters are available by default, and how to
+ create your own.
+ </para>
+ </listitem>
+
+ </varlistentry>
+
+ <varlistentry>
+ <term>Example usage (section):</term>
+ <listitem>
+ <para>
+ <screen>
+{+client-header-filter{hide-tor-exit-notation}}
+.exit/
+ </screen>
+ </para>
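+ <para>
+ The filter named in the action has to be defined in a filter file.
+ The following is only a sketch: it assumes that client-header filters
+ are introduced with the <quote>CLIENT-HEADER-FILTER:</quote> keyword,
+ and the filter name <quote>galeon-to-firefox</quote> is made up for
+ illustration. Anchoring the pattern with <literal>^</literal> and
+ <literal>$</literal> and spelling out the header name keeps the
+ substitution from touching other headers.
+ </para>
+ <para>
+ <screen>
+# Hypothetical filter definition, not part of the shipped filter files
+CLIENT-HEADER-FILTER: galeon-to-firefox Strip the Galeon token from the User-Agent header
+s@^(User-Agent:.*) Galeon/\d\.\d\.\d (Firefox/\d\.\d\.\d\.\d)$@$1 $2@
+ </screen>
+ </para>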
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+</sect3>
+
+
+<!-- ~~~~~ New section ~~~~~ -->
+<sect3 renderas="sect4" id="client-header-tagger">
+<title>client-header-tagger</title>
+
+<variablelist>
+ <varlistentry>
+ <term>Typical use:</term>
+ <listitem>
+ <para>
+ Block requests based on their headers.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Effect:</term>
+ <listitem>
+ <para>
+ Client headers to which this action applies are filtered on-the-fly through
+ the specified regular expression based substitutions; the result is used as
+ a tag.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Type:</term>
+ <!-- boolean, parameterized, Multi-value -->
+ <listitem>
+ <para>Parameterized.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Parameter:</term>
+ <listitem>
+ <para>
+ The name of a client-header tagger, as defined in one of the
+ <link linkend="filter-file">filter files</link>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Notes:</term>
+ <listitem>
+ <para>
+ Client-header taggers are applied to each header on its own,
+ and as the header isn't modified, each tagger <quote>sees</quote>
+ the original.
+ </para>
+ <para>
+ Client-header taggers are the first actions that are executed
+ and their tags can be used to control every other action.
+ </para>
+ </listitem>
+
+ </varlistentry>
+
+ <varlistentry>
+ <term>Example usage (section):</term>
+ <listitem>
+ <para>
+ <screen>
+# Tag every request with the User-Agent header
+{+client-header-tagger{user-agent}}
+/
+ </screen>
+ </para>
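+ <para>
+ A tagger definition for the <quote>user-agent</quote> tagger in a filter
+ file might look like the sketch below. It assumes that client-header
+ taggers are introduced with the <quote>CLIENT-HEADER-TAGGER:</quote>
+ keyword and that the result of a successful substitution becomes the tag;
+ please check the <link linkend="filter-file">filter file chapter</link>
+ for the definitions that are actually shipped.
+ </para>
+ <para>
+ <screen>
+# Sketch only: use the whole User-Agent header as the tag
+CLIENT-HEADER-TAGGER: user-agent Tag the request with the User-Agent header
+s@^(User-Agent:.*)@$1@i
+ </screen>
+ </para>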
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+</sect3>
+
+
<!-- ~~~~~ New section ~~~~~ -->
<sect3 renderas="sect4" id="content-type-overwrite">
<!--
This limitation exists for a reason, think twice before circumventing it.
</para>
<para>
- Most of the time it's easier to enable
- <literal><link linkend="filter-server-headers">filter-server-headers</link></literal>
- and replace this action with a custom regular expression. It allows you
- to activate it for every document of a certain site and it will still
+ Most of the time it's easier to replace this action with a custom
+ <literal><link linkend="server-header-filter">server-header filter</link></literal>.
+ It allows you to activate it for every document of a certain site and it will still
only replace the content types you aimed at.
</para>
<para>
<para>
<literal>crunch-client-header</literal> is only meant for quick tests.
If you have to block several different headers, or only want to modify
- parts of them, you should enable
- <literal><link linkend="filter-client-headers">filter-client-headers</link></literal>
- and create your own filter.
+ parts of them, you should use a
+ <literal><link linkend="client-header-filter">client-header filter</link></literal>.
</para>
<warning>
<para>
<para>
<literal>crunch-server-header</literal> is only meant for quick tests.
If you have to block several different headers, or only want to modify
- parts of them, you should enable
- <literal><link linkend="filter-server-headers">filter-server-headers</link></literal>
- and create your own filter.
+ parts of them, you should use a custom
+ <literal><link linkend="server-header-filter">server-header filter</link></literal>.
</para>
<warning>
<para>
followed by another parameter. <literal>fast-redirects</literal> doesn't know that
and will cause a redirect to <quote>http://www.example.net/&foo=bar</quote>.
Depending on the target server configuration, the parameter will be silently ignored
- or lead to a <quote>page not found</quote> error. It is possible to fix these redirected
- requests with <literal><link linkend="filter-client-headers">filter-client-headers</link></literal>
- but it requires a little effort.
+ or lead to a <quote>page not found</quote> error. You can prevent this problem by
+ first using the <literal><link linkend="redirect">redirect</link></literal> action
+ to remove the last part of the URL, but it requires a little effort.
</para>
<para>
To detect a redirection URL, <literal>fast-redirects</literal> only
<term>Effect:</term>
<listitem>
<para>
- All files of text-based type, most notably HTML and
- JavaScript, to which this action applies, can be filtered on-the-fly
- through the specified regular expression based substitutions. (Note: as of
- version 3.0.3 plain text documents are exempted from filtering, because
- web servers often use the <literal>text/plain</literal> MIME type for all
- files whose type they don't know.) By default, filtering works only on the
- raw document content itself (that which can be seen with <literal>View
- Source</literal>),
- not the headers.
+ All instances of text-based type, most notably HTML and JavaScript, to which
+ this action applies, can be filtered on-the-fly through the specified regular
+ expression based substitutions. (Note: as of version 3.0.3 plain text documents
+ are exempted from filtering, because web servers often use the
+ <literal>text/plain</literal> MIME type for all files whose type they don't know.)
</para>
</listitem>
</varlistentry>
<term>Parameter:</term>
<listitem>
<para>
- The name of a filter, as defined in the <link linkend="filter-file">filter file</link>.
+ The name of a content filter, as defined in the <link linkend="filter-file">filter file</link>.
Filters can be defined in one or more files as defined by the
<literal><link linkend="filterfile">filterfile</link></literal>
option in the <link linkend="config">config file</link>.
by defining appropriate <literal>-filter</literal> exceptions.
</para>
<para>
- At this time, <application>Privoxy</application> cannot uncompress compressed
- documents. If you want filtering to work on all documents, even those that
- would normally be sent compressed, you must use the
- <literal><link linkend="prevent-compression">prevent-compression</link></literal>
+ Compressed content can't be filtered either, unless &my-app;
+ is compiled with zlib support (requires at least &my-app; 3.0.7),
+ in which case &my-app; will decompress the content before filtering
+ it.
+ </para>
+ <para>
+ If you use a &my-app; version without zlib support, but want filtering to work on
+ as many documents as possible, even those that would normally be sent compressed,
+ you must use the <literal><link linkend="prevent-compression">prevent-compression</link></literal>
action in conjunction with <literal>filter</literal>.
</para>
<para>
- Filtering can achieve some of the same effects as the
+ Content filtering can achieve some of the same effects as the
<literal><link linkend="block">block</link></literal>
action, i.e. it can be used to block ads and banners. But the mechanism
works quite differently. One effective use, is to block ad banners
<anchor id="filter-blogspot">
<screen>+filter{blogspot} # Cleans up Blogspot blogs</screen>
</para>
- <para>
- <anchor id="filter-html-to-xml">
- <screen>+filter{html-to-xml} # Header filter to change the Content-Type from html to xml</screen>
- </para>
- <para>
- <anchor id="filter-xml-to-html">
- <screen>+filter{xml-to-html} # Header filter to change the Content-Type from xml to html</screen>
- </para>
<para>
<anchor id="filter-no-ping">
<screen>+filter{no-ping} # Removes non-standard ping attributes from anchor and area tags</screen>
</para>
- <para>
- <anchor id="filter-hide-tor-exit-notation">
- <screen>+filter{hide-tor-exit-notation} # Header filter to remove the Tor exit node notation in Host and Referer headers</screen>
- </para>
</listitem>
</varlistentry>
</variablelist>
<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="filter-client-headers">
-<title>filter-client-headers</title>
-
+<sect3 renderas="sect4" id="force-text-mode">
+<title>force-text-mode</title>
+<!--
+new action
+-->
<variablelist>
<varlistentry>
<term>Typical use:</term>
<listitem>
- <para>
- To apply filtering to the client's (browser's) headers
- </para>
+ <para>Force <application>Privoxy</application> to treat a document as if it was in some kind of <emphasis>text</emphasis> format. </para>
</listitem>
</varlistentry>
<term>Effect:</term>
<listitem>
<para>
- By default, <application>Privoxy's</application> filters only apply
- to the document content itself. This will extend those filters to
- include the client's headers as well.
- </para>
+ Declares a document as text, even if the <quote>Content-Type:</quote> isn't detected as such.
+ </para>
</listitem>
</varlistentry>
<varlistentry>
<term>Type:</term>
- <!-- boolean, parameterized, Multi-value -->
+ <!-- Boolean, Parameterized, Multi-value -->
<listitem>
<para>Boolean.</para>
</listitem>
</para>
</listitem>
</varlistentry>
-
-<varlistentry>
+
+ <varlistentry>
<term>Notes:</term>
<listitem>
<para>
- Regular expressions can be used to filter headers as well. Check your
- filters closely before activating this action, as it can easily lead to broken
- requests.
- </para>
- <para>
- These filters are applied to each header on its own, not to them
- all at once. This makes it easier to diagnose problems, but on the downside
- you can't write filters that only change header x if header y's value is
- z.
- </para>
- <para>
- The filters are used after the other header actions have finished and can
- use their output as input.
+ As explained <literal><link linkend="filter">above</link></literal>,
+ <application>Privoxy</application> tries to only filter files that are
+ in some kind of text format. The same restrictions apply to
+ <literal><link linkend="content-type-overwrite">content-type-overwrite</link></literal>.
+ <literal>force-text-mode</literal> declares a document as text,
+ without looking at the <quote>Content-Type:</quote> first.
</para>
-
- <para>
- Whenever possible one should specify <literal>^</literal>,
- <literal>$</literal>, the whole header name and the colon, to make sure
- the filter doesn't cause havoc to other headers or the
- page itself. For example if you want to transform
- <application>Galeon</application> User-Agents to
- <application>Firefox</application> User-Agents you
- shouldn't use:
-</para>
-<para>
-<screen>
-s@Galeon/\d\.\d\.\d @@
-</screen>
-</para><para>
- but:
-</para><para>
-<screen>
-s@^(User-Agent:.*) Galeon/\d\.\d\.\d (Firefox/\d\.\d\.\d\.\d)$@$1 $2@
-</screen>
-</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Example usage (section):</term>
- <listitem>
+ <warning>
<para>
- <screen>
-{+filter-client-headers +filter{test_filter}}
-problem-host.example.com
- </screen>
+ Think twice before activating this action. Filtering binary data
+ with regular expressions can cause file damage.
</para>
- </listitem>
- </varlistentry>
-
-</variablelist>
-</sect3>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="filter-server-headers">
-<title>filter-server-headers</title>
-
-<variablelist>
- <varlistentry>
- <term>Typical use:</term>
- <listitem>
- <para>
- To apply filtering to the server's headers
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Effect:</term>
- <listitem>
- <para>
- By default, <application>Privoxy's</application> filters only apply
- to the document content itself. This will extend those filters to
- include the server's headers as well.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Type:</term>
- <!-- boolean, parameterized, Multi-value -->
- <listitem>
- <para>Boolean.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Parameter:</term>
- <listitem>
- <para>
- N/A
- </para>
+ </warning>
</listitem>
</varlistentry>
-<varlistentry>
- <term>Notes:</term>
- <listitem>
- <para>
- Similar to <literal>filter-client-headers</literal>, but works on
- the server instead. To filter both server and client, use both.
- </para>
- <para>
- As with <literal>filter-client-headers</literal>, check your
- filters before activating this action, as it can easily lead to broken
- requests.
- </para>
- <para>
- These filters are applied to each header on its own, not to them
- all at once. This makes it easier to diagnose problems, but on the downside
- you can't write filters that only change header x if header y's value is
- z.
- </para>
- <para>
- The filters are used after the other header actions have finished and can
- use their output as input.
- </para>
- <para>
- Remember too, whenever possible one should specify <literal>^</literal>,
- <literal>$</literal>, the whole header name and the colon, to make sure
- the filter doesn't cause havoc to other headers or the
- page itself. See above for example.
- </para>
-
- </listitem>
- </varlistentry>
-
<varlistentry>
- <term>Example usage (section):</term>
+ <term>Example usage:</term>
<listitem>
- <para>
+ <para>
<screen>
-{+filter-server-headers +filter{test_filter}}
-problem-host.example.com
- </screen>
- </para>
++force-text-mode
+ </screen>
+ </para>
</listitem>
</varlistentry>
-
</variablelist>
</sect3>
<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="force-text-mode">
-<title>force-text-mode</title>
+<sect3 renderas="sect4" id="forward-override">
+<title>forward-override</title>
<!--
new action
-->
<varlistentry>
<term>Typical use:</term>
<listitem>
- <para>Force <application>Privoxy</application> to treat a document as if it was in some kind of <emphasis>text</emphasis> format. </para>
+ <para>Change the forwarding settings based on User-Agent or request origin</para>
</listitem>
</varlistentry>
<term>Effect:</term>
<listitem>
<para>
- Declares a document as text, even if the <quote>Content-Type:</quote> isn't detected as such.
+ Overrules the forward directives in the configuration files.
</para>
</listitem>
</varlistentry>
<term>Type:</term>
<!-- Boolean, Parameterized, Multi-value -->
<listitem>
- <para>Boolean.</para>
+ <para>Multi-value.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Parameter:</term>
<listitem>
- <para>
- N/A
- </para>
+ <itemizedlist>
+ <listitem>
+ <para><quote>forward .</quote> to use a direct connection without any additional proxies.</para>
+ </listitem>
+ <listitem>
+ <para>
+ <quote>forward 127.0.0.1:8123</quote> to use the HTTP proxy listening at 127.0.0.1 port 8123.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <quote>forward-socks4a 127.0.0.1:9050 .</quote> to use the socks4a proxy listening at 127.0.0.1 port 9050.
+ Replace <quote>forward-socks4a</quote> with <quote>forward-socks4</quote> to use a socks4 connection (with local DNS
+ resolution) instead.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <quote>forward-socks4a 127.0.0.1:9050 proxy.example.org:8000</quote> to use the socks4a proxy
+ listening at 127.0.0.1 port 9050 to reach the HTTP proxy listening at proxy.example.org port 8000.
+ Replace <quote>forward-socks4a</quote> with <quote>forward-socks4</quote> to use a socks4 connection (with local DNS
+ resolution) instead.
+ </para>
+ </listitem>
+ </itemizedlist>
</listitem>
</varlistentry>
<term>Notes:</term>
<listitem>
<para>
- As explained <literal><link linkend="filter">above</link></literal>,
- <application>Privoxy</application> tries to only filter files that are
- in some kind of text format. The same restrictions apply to
- <literal><link linkend="content-type-overwrite">content-type-overwrite</link></literal>.
- <literal>force-text-mode</literal> declares a document as text,
- without looking at the <quote>Content-Type:</quote> first.
+ This action takes parameters similar to the <!-- I hope this link actually works -->
+ <link linkend="forwarding">forward</link> directives in the configuration
+ file, but without the URL pattern. It can be used as a replacement, but normally it's only
+ used in cases where matching based on the request URL isn't sufficient.
</para>
<warning>
<para>
- Think twice before activating this action. Filtering binary data
- with regular expressions can cause file damage.
+ Please read the description for the <link linkend="forwarding">forward</link> directives before
+ using this action. Forwarding to the wrong people will reduce your privacy and increase the
+ chances of man-in-the-middle attacks.
+ </para>
+ <para>
+ If the ports are missing or invalid, default values will be used. This might change
+ in the future and you shouldn't rely on it. Otherwise incorrect syntax causes Privoxy
+ to exit.
+ </para>
+ <para>
+ Use the <ulink url="http://config.privoxy.org/show-url-info">show-url-info CGI page</ulink>
+ to verify that your forward settings do what you thought they do.
</para>
</warning>
</listitem>
<listitem>
<para>
<screen>
-+force-text-mode
+# Always use direct connections for requests previously tagged as
+# <quote>User-Agent: fetch libfetch/2.0</quote> and make sure
+# resuming downloads continues to work.
+# This way you can continue to use Tor for your normal browsing,
+# without overloading the Tor network with your FreeBSD ports updates
+# or downloads of bigger files like ISOs.
+{+forward-override{forward .} \
+ -hide-if-modified-since \
+ -overwrite-last-modified \
+}
+TAG:^User-Agent: fetch libfetch/2.0$
</screen>
</para>
</listitem>
<listitem>
<para>
More and more websites send their content compressed by default, which
- is generally a good idea and saves bandwidth. But for the <literal><link
+ is generally a good idea and saves bandwidth. But the <literal><link
linkend="filter">filter</link></literal>, <literal><link linkend="deanimate-gifs">deanimate-gifs</link></literal>
- and <literal><link linkend="kill-popups">kill-popups</link></literal> actions to work,
- <application>Privoxy</application> needs access to the uncompressed data.
- Unfortunately, <application>Privoxy</application> can't yet(!) uncompress, filter, and
- re-compress the content on the fly. So if you want to ensure that all websites, including
- those that normally compress, can be filtered, you need to use this action.
+ and <literal><link linkend="kill-popups">kill-popups</link></literal> actions need
+ access to the uncompressed data.
+ </para>
+ <para>
+ When &my-app; is compiled with zlib support (available since &my-app; 3.0.7), content that
+ should be filtered is decompressed on-the-fly and you don't have to worry about this action.
+ If you are using an older &my-app; version, or one that hasn't been compiled with zlib
+ support, this action can be used to convince the server to send the content uncompressed.
</para>
<para>
- This will slow down transfers from those websites, though. If you use any of the above-mentioned
- actions, you will typically want to use <literal>prevent-compression</literal> in conjunction
- with them.
+ Most text-based instances compress very well: the size is seldom decreased by less than 50%,
+ and for markup-heavy instances like news feeds, saving more than 90% of the original size isn't
+ unusual.
+ </para>
+ <para>
+ Not using compression will therefore slow down the transfer, and you should only
+ enable this action if you really need it. As of &my-app; 3.0.7 it's disabled in all
+ predefined action settings.
</para>
<para>
Note that some (rare) ill-configured sites don't handle requests for uncompressed
- documents correctly (they send an empty document body). If you use <literal>prevent-compression</literal>
- per default, you'll have to add exceptions for those sites. See the example for how to do that.
+ documents correctly. Broken PHP applications tend to send an empty document body;
+ some IIS versions only send the beginning of the content. If you enable
+ <literal>prevent-compression</literal> per default, you might want to add
+ exceptions for those sites. See the example for how to do that.
</para>
</listitem>
</varlistentry>
{ +prevent-compression }
/ # Match all sites
-# Then maybe make exceptions for ill-behaved sites:
+# Then maybe make exceptions for broken sites:
#
{ -prevent-compression }
- .debianhelp.org
- www.pclinuxonline.com</screen>
+.compusa.com/</screen>
</para>
</listitem>
</varlistentry>
<term>Parameter:</term>
<listitem>
<para>
- Any URL.
+ An absolute URL or a single pcrs command.
</para>
</listitem>
</varlistentry>
<term>Notes:</term>
<listitem>
<para>
- This action is useful to replace whole documents with ones of your
- choosing. This can be used to enforce safe surfing, or just as a simple
- convenience.
- </para>
- <para>
- You can do the same by combining the actions
- <literal><link linkend="block">block</link></literal>,
- <literal><link linkend="handle-as-image">handle-as-image</link></literal> and
- <literal><link linkend="set-image-blocker">set-image-blocker{URL}</link></literal>.
- It doesn't sound right for non-image documents, and that's why this action
- was created.
+ Requests to which this action applies are answered with an
+ HTTP redirect to URLs of your choosing. The new URL is
+ either provided as parameter, or derived by applying a
+ single pcrs command to the original URL.
</para>
<para>
This action will be ignored if you use it together with
<literal><link linkend="block">block</link></literal>.
+ It can be combined with
+ <literal><link linkend="fast-redirects">fast-redirects{check-decoded-url}</link></literal>
+ to redirect to a decoded version of a rewritten URL.
+ </para>
+ <para>
+ Use this action carefully: make sure not to create redirection loops
+ and be aware that using your own redirects might make it
+ possible to fingerprint your requests.
</para>
</listitem>
</varlistentry>
example.com/stylesheet\.css
# Create a short, easy to remember nickname for a favorite site
+# (relies on the browser accepting and forwarding invalid URLs to &my-app;)
{ +redirect{http://www.privoxy.org/user-manual/actions-file.html} }
- a</screen>
+ a
+
+# Always use the expanded view for Undeadly.org articles
+# (Note the $ at the end of the URL pattern to make sure
+# the request for the rewritten URL isn't redirected as well)
+{+redirect{s@$@&mode=expanded@}}
+undeadly.org/cgi\?action=article&sid=\d*$</screen>
</para>
</listitem>
</varlistentry>
</sect3>
+<!-- ~~~~~ New section ~~~~~ -->
+<sect3 renderas="sect4" id="server-header-filter">
+<title>server-header-filter</title>
+
+<variablelist>
+ <varlistentry>
+ <term>Typical use:</term>
+ <listitem>
+ <para>
+ Rewrite or remove single server headers.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Effect:</term>
+ <listitem>
+ <para>
+ All server headers to which this action applies are filtered on-the-fly
+ through the specified regular expression based substitutions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Type:</term>
+ <!-- boolean, parameterized, Multi-value -->
+ <listitem>
+ <para>Parameterized.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Parameter:</term>
+ <listitem>
+ <para>
+ The name of a server-header filter, as defined in one of the
+ <link linkend="filter-file">filter files</link>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Notes:</term>
+ <listitem>
+ <para>
+ Server-header filters are applied to each header on its own, not to
+ all at once. This makes it easier to diagnose problems, but on the downside
+ you can't write filters that only change header x if header y's value is z.
+ You can do that by using tags though.
+ </para>
+ <para>
+ Server-header filters are executed after the other header actions have finished
+ and use their output as input.
+ </para>
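+ <para>
+ To illustrate the per-header matching, here is a minimal sketch of a
+ server-header filter definition. The filter name
+ <literal>xml-to-html-sketch</literal> and the regular expression are
+ assumptions made for this example and may differ from the filters shipped
+ in <filename>default.filter</filename>. Because the pattern is anchored to
+ the header name, every other header passes through unchanged:
+ </para>
+ <para>
+ <screen>
+SERVER-HEADER-FILTER: xml-to-html-sketch Illustrative only: rewrites an XML Content-Type to text/html.
+s@^(Content-Type:\s*)(?:application|text)/(?:xhtml\+)?xml(?=\s*(?:;|$))@$1text/html@i
+ </screen>
+ </para>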
+ <para>
+ Please refer to the <link linkend="filter-file">filter file chapter</link>
+ to learn which server-header filters are available by default, and how to
+ create your own.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Example usage (section):</term>
+ <listitem>
+ <para>
+ <screen>
+{+server-header-filter{html-to-xml}}
+example.org/xml-instance-that-is-delivered-as-html
+
+{+server-header-filter{xml-to-html}}
+example.org/instance-that-is-delivered-as-xml-but-is-not
+ </screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+</sect3>
+
+
+<!-- ~~~~~ New section ~~~~~ -->
+<sect3 renderas="sect4" id="server-header-tagger">
+<title>server-header-tagger</title>
+
+<variablelist>
+ <varlistentry>
+ <term>Typical use:</term>
+ <listitem>
+ <para>
+ Enable or disable filters based on the Content-Type header.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Effect:</term>
+ <listitem>
+ <para>
+ Server headers to which this action applies are filtered on-the-fly through
+ the specified regular expression based substitutions; the result is used as
+ a tag.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Type:</term>
+ <!-- boolean, parameterized, Multi-value -->
+ <listitem>
+ <para>Parameterized.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Parameter:</term>
+ <listitem>
+ <para>
+ The name of a server-header tagger, as defined in one of the
+ <link linkend="filter-file">filter files</link>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Notes:</term>
+ <listitem>
+ <para>
+ Server-header taggers are applied to each header on its own,
+ and as the header isn't modified, each tagger <quote>sees</quote>
+ the original.
+ </para>
+ <para>
+ Server-header taggers are executed before all other header actions
+ that modify server headers. Their tags can be used to control
+ all of the other server-header actions, the content filters
+ and the crunch actions (<link linkend="redirect">redirect</link>
+ and <link linkend="block">block</link>).
+ </para>
+ <para>
+ Obviously crunching based on tags created by server-header taggers
+ doesn't prevent the request from showing up in the server's log file.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Example usage (section):</term>
+ <listitem>
+ <para>
+ <screen>
+# Tag every request with the declared content type
+{+server-header-tagger{content-type}}
+/
+ </screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+</sect3>
+
+
<!-- ~~~~~ New section ~~~~~ -->
<sect3 renderas="sect4" id="session-cookies-only">
<title>session-cookies-only</title>
<screen>+set-image-blocker{pattern}</screen>
</para>
<para>
- Redirect to the BSD devil:
+ Redirect to the BSD daemon:
</para>
<para>
<screen>+set-image-blocker{http://www.freebsd.org/gifs/dae_up3.gif}</screen>
For <quote>Connect</quote> requests the clients tell
<application>Privoxy</application> which host they are interested
in, but not which document they plan to get later. As a result, the
- <quote>Go there anyway</quote> link becomes rather useless:
- it lets the client request the home page of the forbidden host
- through unencrypted HTTP, still using the port of the last request.
- </para>
- <para>
- If you previously configured <application>Privoxy</application> to do the
- request through a SSL tunnel, everything will work. Most likely you haven't
- and the server will respond with an error message because it is expecting
- HTTPS (SSL).
+ <quote>Go there anyway</quote> link wouldn't work and is therefore suppressed.
</para>
</listitem>
</varlistentry>
##########################################################################
{ \
-<link linkend="ADD-HEADER">add-header</link> \
+ -<link linkend="CLIENT-HEADER-FILTER">client-header-filter{hide-tor-exit-notation}</link> \
-<link linkend="BLOCK">block</link> \
-<link linkend="CONTENT-TYPE-OVERWRITE">content-type-overwrite</link> \
-<link linkend="CRUNCH-CLIENT-HEADER">crunch-client-header</link> \
-<link linkend="FILTER-FUN">filter{fun}</link> \
-<link linkend="FILTER-CRUDE-PARENTAL">filter{crude-parental}</link> \
+<link linkend="FILTER-IE-EXPLOITS">filter{ie-exploits}</link> \
- -<link linkend="FILTER-CLIENT-HEADERS">filter-client-headers</link> \
- -<link linkend="FILTER-SERVER-HEADERS">filter-server-headers</link> \
- -<link linkend="FILTER-GOOGLE">filter-google</link> \
- -<link linkend="FILTER-YAHOO">filter-yahoo</link> \
- -<link linkend="FILTER-MSN">filter-msn</link> \
- -<link linkend="FILTER-BLOGSPOT">filter-blogspot</link> \
- -<link linkend="FILTER-XML-TO-HTML">filter-xml-to-html</link> \
- -<link linkend="FILTER-HTML-TO-XML">filter-html-to-xml</link> \
- -<link linkend="FILTER-NO-PING">filter-no-ping</link> \
- -<link linkend="FILTER-HIDE-TOR-EXIT-NOTATION">filter-hide-tor-exit-notation</link> \
+ -<link linkend="FILTER-GOOGLE">filter{google}</link> \
+ -<link linkend="FILTER-YAHOO">filter{yahoo}</link> \
+ -<link linkend="FILTER-MSN">filter{msn}</link> \
+ -<link linkend="FILTER-BLOGSPOT">filter{blogspot}</link> \
+ -<link linkend="FILTER-NO-PING">filter{no-ping}</link> \
-<link linkend="FORCE-TEXT-MODE">force-text-mode</link> \
-<link linkend="HANDLE-AS-EMPTY-DOCUMENT">handle-as-empty-document</link> \
-<link linkend="HANDLE-AS-IMAGE">handle-as-image</link> \
-<link linkend="REDIRECT">redirect</link> \
-<link linkend="SEND-VANILLA-WAFER">send-vanilla-wafer</link> \
-<link linkend="SEND-WAFER">send-wafer</link> \
+ -<link linkend="SERVER-HEADER-FILTER">server-header-filter{xml-to-html}</link> \
+ -<link linkend="SERVER-HEADER-FILTER">server-header-filter{html-to-xml}</link> \
+<link linkend="SESSION-COOKIES-ONLY">session-cookies-only</link> \
+<link linkend="SET-IMAGE-BLOCKER">set-image-blocker{pattern}</link> \
-<link linkend="TREAT-FORBIDDEN-CONNECTS-LIKE-BLOCKS">treat-forbidden-connects-like-blocks</link> \
<title>Filter Files</title>
<para>
- On-the-fly text substitutions that can be invoked through the
- <literal><link linkend="filter">filter</link></literal> action need
+ On-the-fly text substitutions need
to be defined in a <quote>filter file</quote>. Once defined, they
- can then be invoked as an <quote>action</quote>. Multiple filter files can be
- defined through the <literal> <link
+ can then be invoked as an <quote>action</quote>.
+</para>
+
+<para>
+ &my-app; supports three different filter actions:
+ <literal><link linkend="filter">filter</link></literal> to
+ rewrite the content that is sent to the client,
+ <literal><link linkend="client-header-filter">client-header-filter</link></literal>
+ to rewrite headers that are sent by the client, and
+ <literal><link linkend="server-header-filter">server-header-filter</link></literal>
+ to rewrite headers that are sent by the server.
+</para>
+
+<para>
+ &my-app; also supports two tagger actions:
+ <literal><link linkend="client-header-tagger">client-header-tagger</link></literal>
+ and
+ <literal><link linkend="server-header-tagger">server-header-tagger</link></literal>.
+ Taggers and filters use the same syntax in the filter files. The difference
+ is that taggers don't modify the text they are filtering, but use a rewritten
+ version of the filtered text as a tag. The tags can then be used to change the
+ applying actions through sections with <link linkend="tag-pattern">tag-patterns</link>.
+</para>
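+
+<para>
+ As a rough sketch of how a tagger and a tag pattern fit together, a
+ server-header tagger could be defined in a filter file as shown below.
+ The tagger name <literal>content-type-sketch</literal> and its regular
+ expression are assumptions made for this example and are not necessarily
+ what <filename>default.filter</filename> ships; the example also assumes
+ that tagger definitions start with the keyword
+ <literal>SERVER-HEADER-TAGGER:</literal>.
+</para>
+<para>
+ <screen>
+SERVER-HEADER-TAGGER: content-type-sketch Illustrative only: uses the declared content type as tag.
+s@^Content-Type:\s*([^;\s]+).*@$1@i
+ </screen>
+</para>
+<para>
+ An actions file could then enable the tagger for every request and
+ use a tag pattern to change the actions for the tagged requests.
+ Here, content filtering is switched off whenever the tag starts with
+ <quote>image/</quote>:
+</para>
+<para>
+ <screen>
+{+server-header-tagger{content-type-sketch}}
+/
+
+# Only applies to requests whose tag looks like an image content type
+{-filter}
+TAG:^image/.*$
+ </screen>
+</para>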
+
+
+<para>
+ Multiple filter files can be defined through the <literal> <link
linkend="filterfile">filterfile</link></literal> config directive. The filters
as supplied by the developers will be found in
<filename>default.filter</filename>. It is recommended that any locally
</para>
<para>
- Typical reasons for doing these kinds of substitutions are to eliminate
- common annoyances in HTML and JavaScript, such as pop-up windows,
+ Common tasks for content filters are to eliminate common annoyances in
+ HTML and JavaScript, such as pop-up windows,
exit consoles, crippled windows without navigation tools, the
infamous <BLINK> tag etc, to suppress images with certain
width and height attributes (standard banner sizes or web-bugs),
- or just to have fun. The possibilities are endless.
+ or just to have fun.
</para>
<para>
- Filtering works on any text-based document type, including
+ Content filtering works on any text-based document type, including
HTML, JavaScript, CSS etc. (all <literal>text/*</literal>
MIME types, <emphasis>except</emphasis> <literal>text/plain</literal>).
Substitutions are made at the source level, so if you want to <quote>roll
your own</quote> filters, you should first be familiar with HTML syntax,
- and, of course, regular expressions. By default, filters are only applied
- to the raw document content, but can be extended to the HTTP headers with
- the supplemental actions:
- <link linkend="filter-client-headers">filter-client-headers</link> and
- <link linkend="filter-server-headers">filter-server-headers</link>.
+ and, of course, regular expressions.
</para>
<para>
Just like the <link linkend="actions-file">actions files</link>, the
filter file is organized in sections, which are called <emphasis>filters</emphasis>
- here. Each filter consists of a heading line, that starts with the
- <emphasis>keyword</emphasis> <literal>FILTER:</literal>, followed by
- the filter's <emphasis>name</emphasis>, and a short (one line)
+ here. Each filter consists of a heading line, that starts with one of the
+ <emphasis>keywords</emphasis> <literal>FILTER:</literal>,
+ <literal>CLIENT-HEADER-FILTER:</literal> or <literal>SERVER-HEADER-FILTER:</literal>
+ followed by the filter's <emphasis>name</emphasis>, and a short (one line)
<emphasis>description</emphasis> of what it does. Below that line
come the <emphasis>jobs</emphasis>, i.e. lines that define the actual
text substitutions. By convention, the name of a filter
</para>
<para>
- A filter header line for a filter called <quote>foo</quote> could look
+ Filter definitions start with a header line that contains the filter
+ type, the filter name and the filter description.
+ A content filter header line for a filter called <quote>foo</quote> could look
like this:
</para>
<sect2><title>Filter File Tutorial</title>
<para>
- Now, let's complete our <quote>foo</quote> filter. We have already defined
+ Now, let's complete our <quote>foo</quote> content filter. We have already defined
the heading, but the jobs are still missing. Since all it does is to replace
<quote>foo</quote> with <quote>bar</quote>, there is only one (trivial) job
needed:
<term><emphasis>xml-to-html</emphasis></term>
<listitem>
<para>
- Header filter to change the Content-Type from xml to html.
+ Server-header filter to change the Content-Type from xml to html.
</para>
</listitem>
</varlistentry>
<term><emphasis>html-to-xml</emphasis></term>
<listitem>
<para>
- Header filter to change the Content-Type from html to xml.
+ Server-header filter to change the Content-Type from html to xml.
</para>
</listitem>
</varlistentry>
<term><emphasis>hide-tor-exit-notation</emphasis></term>
<listitem>
<para>
- Header filter to remove the <command>Tor</command> exit node notation
+ Client-header filter to remove the <command>Tor</command> exit node notation
found in Host and Referer headers.
</para>
+ <para>
+ If &my-app; and <command>Tor</command> are chained and &my-app;
+ is configured to use socks4a, one can use <quote>http://www.example.org.foobar.exit/</quote>
+ to access the host <quote>www.example.org</quote> through the
+ <command>Tor</command> exit node <quote>foobar</quote>.
+ </para>
+ <para>
+ As the HTTP client isn't aware of this notation, it treats the
+ whole string <quote>www.example.org.foobar.exit</quote> as host and uses it
+ for the <quote>Host</quote> and <quote>Referer</quote> headers. From the
+ server's point of view the resulting headers are invalid and can cause problems.
+ </para>
+ <para>
+ An invalid <quote>Referer</quote> header can trigger <quote>hot-linking</quote>
+ protections, an invalid <quote>Host</quote> header will make it impossible for
+ the server to find the right vhost (several domains hosted on the same IP address).
+ </para>
+ <para>
+ This client-header filter removes the <quote>foobar.exit</quote> part in those headers
+ to prevent the mentioned problems. Note that it only modifies
+ the HTTP headers; it doesn't make it impossible for the server
+ to detect your <command>Tor</command> exit node based on the IP address
+ the request is coming from.
+ </para>
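+ <para>
+ A minimal sketch of what such a client-header filter might look like
+ follows. The filter name <literal>hide-exit-notation-sketch</literal> and
+ the regular expression are assumptions made for illustration; the
+ definition shipped in <filename>default.filter</filename> may differ.
+ </para>
+ <para>
+ <screen>
+CLIENT-HEADER-FILTER: hide-exit-notation-sketch Illustrative only: strips ".nodename.exit" from Host and Referer.
+s@^((?:Host|Referer):\s*(?:https?://)?[^/\s]*?)\.[^./\s]+\.exit(?=[:/]|\s*$)@$1@i
+ </screen>
+ </para>
+ <para>
+ With this sketch, <quote>Host: www.example.org.foobar.exit</quote> would
+ become <quote>Host: www.example.org</quote>, while headers other than
+ <quote>Host</quote> and <quote>Referer</quote> are left alone.
+ </para>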
</listitem>
</varlistentry>
{-add-header
-block
+ -client-header-filter{hide-tor-exit-notation}
-content-type-overwrite
-crunch-client-header
-crunch-if-none-match
-filter {yahoo}
-filter {msn}
-filter {blogspot}
- -filter {xml-to-html}
- -filter {html-to-xml}
-filter {no-ping}
- -filter{hide-tor-exit-notation}
- -filter-client-headers
- -filter-server-headers
-force-text-mode
-handle-as-empty-document
-handle-as-image
-redirect
-send-vanilla-wafer
-send-wafer
+ -server-header-filter{xml-to-html}
+ -server-header-filter{html-to-xml}
+session-cookies-only
+set-image-blocker {pattern}
-treat-forbidden-connects-like-blocks }
-add-header
-block
+ -client-header-filter{hide-tor-exit-notation}
-content-type-overwrite
-crunch-client-header
-crunch-if-none-match
-filter {yahoo}
-filter {msn}
-filter {blogspot}
- -filter {xml-to-html}
- -filter {html-to-xml}
-filter {no-ping}
- -filter{hide-tor-exit-notation}
- -filter-client-headers
- -filter-server-headers
-force-text-mode
-handle-as-empty-document
-handle-as-image
-redirect
-send-vanilla-wafer
-send-wafer
+ -server-header-filter{xml-to-html}
+ -server-header-filter{html-to-xml}
-session-cookies-only
+set-image-blocker {pattern}
-treat-forbidden-connects-like-blocks </screen>
{-add-header
-block
+ -client-header-filter{hide-tor-exit-notation}
-content-type-overwrite
-crunch-client-header
-crunch-if-none-match
-filter {yahoo}
-filter {msn}
-filter {blogspot}
- -filter {xml-to-html}
- -filter {html-to-xml}
-filter {no-ping}
- -filter{hide-tor-exit-notation}
- -filter-client-headers
- -filter-server-headers
-force-text-mode
-handle-as-empty-document
-handle-as-image
+prevent-compression
-redirect
-send-vanilla-wafer
- -send-wafer
+ -send-wafer
+ -server-header-filter{xml-to-html}
+ -server-header-filter{html-to-xml}
+session-cookies-only
+set-image-blocker{blank}
-treat-forbidden-connects-like-blocks }
USA
$Log: user-manual.sgml,v $
+ Revision 2.31 2007/06/02 14:01:37 fabiankeil
+ Start to document forward-override{}.
+
+ Revision 2.30 2007/04/25 15:10:36 fabiankeil
+ - Describe installation for FreeBSD.
+ - Start to document taggers and tag patterns.
+ - Don't confuse devils and daemons.
+
+ Revision 2.29 2007/04/05 11:47:51 fabiankeil
+ Some updates regarding header filtering,
+ handling of compressed content and redirect's
+ support for pcrs commands.
+
+ Revision 2.28 2006/12/10 23:42:48 hal9
+ Fix various typos reported by Adam P. Thanks.
+
Revision 2.27 2006/11/14 01:57:47 hal9
Dump all docs prior to 3.0.6 release. Various minor changes to faq and user
manual.