-</sect3>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="limit-connect">
-<title>limit-connect</title>
-
-<variablelist>
- <varlistentry>
- <term>Typical use:</term>
- <listitem>
- <para>Prevent abuse of <application>Privoxy</application> as a TCP proxy relay, or disable SSL for untrusted sites.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Effect:</term>
- <listitem>
- <para>
- Specifies the ports to which HTTP CONNECT requests are allowed.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Type:</term>
- <!-- Boolean, Parameterized, Multi-value -->
- <listitem>
- <para>Parameterized.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Parameter:</term>
- <listitem>
- <para>
- A comma-separated list of ports or port ranges (the latter using dashes, with the minimum
- defaulting to 0 and the maximum to 65K).
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Notes:</term>
- <listitem>
- <para>
- By default, i.e. if no <literal>limit-connect</literal> action applies,
- <application>Privoxy</application> allows HTTP CONNECT requests to all
- ports. Use <literal>limit-connect</literal> if fine-grained control
- is desired for some or all destinations.
- </para>
- <para>
- The CONNECT method exists in HTTP to allow access to secure websites
- (<quote>https://</quote> URLs) through proxies. It works very simply:
- the proxy connects to the server on the specified port, and then
- short-circuits its connections to the client and to the remote server.
- This means CONNECT-enabled proxies can be used as TCP relays very easily.
- </para>
- <para>
- <application>Privoxy</application> relays HTTPS traffic without seeing
- the decoded content. Websites can leverage this limitation to circumvent &my-app;'s
- filters. By specifying an invalid port range you can disable HTTPS entirely.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Example usages:</term>
- <listitem>
- <!-- I had trouble getting the spacing to look right in my browser -->
- <!-- I probably have the wrong font setup, bollocks. -->
- <!-- Apparently the emphasis tag uses a proportional font no matter what -->
- <para>
- <screen>+limit-connect{443} # Port 443 is OK.
-+limit-connect{80,443} # Ports 80 and 443 are OK.
-+limit-connect{-3, 7, 20-100, 500-} # Ports less than 3, 7, 20 to 100 and above 500 are OK.
-+limit-connect{-} # All ports are OK
-+limit-connect{,} # No HTTPS/SSL traffic is allowed</screen>
- </para>
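- <para>
- In section form, a policy along the following lines would restrict CONNECT
- to the usual HTTPS port for all sites and disable it for one destination.
- This is only a sketch; the host name <literal>untrusted.example.com</literal>
- is a made-up placeholder:
- </para>
- <para>
- <screen># Allow CONNECT requests to port 443 only
-{ +limit-connect{443} }
-/
-
-# No HTTPS/SSL traffic for this (hypothetical) site
-{ +limit-connect{,} }
-untrusted.example.com</screen>
- </para>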
- </listitem>
- </varlistentry>
-</variablelist>
-</sect3>
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="prevent-compression">
-<title>prevent-compression</title>
-
-<variablelist>
- <varlistentry>
- <term>Typical use:</term>
- <listitem>
- <para>
- Ensure that servers send the content uncompressed, so it can be
- passed through <literal><link linkend="filter">filter</link></literal>s.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Effect:</term>
- <listitem>
- <para>
- Removes the Accept-Encoding header which can be used to ask for compressed transfer.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Type:</term>
- <!-- Boolean, Parameterized, Multi-value -->
- <listitem>
- <para>Boolean.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Parameter:</term>
- <listitem>
- <para>
- N/A
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Notes:</term>
- <listitem>
- <para>
- More and more websites send their content compressed by default, which
- is generally a good idea and saves bandwidth. But the <literal><link
- linkend="filter">filter</link></literal> and
- <literal><link linkend="deanimate-gifs">deanimate-gifs</link></literal>
- actions need access to the uncompressed data.
- </para>
- <para>
- When &my-app; is compiled with zlib support (available since version 3.0.7), content that should be
- filtered is decompressed on-the-fly and you don't have to worry about this action.
- If you are using an older &my-app; version, or one that hasn't been compiled with zlib
- support, this action can be used to convince the server to send the content uncompressed.
- </para>
- <para>
- Most text-based instances compress very well: the size is seldom decreased by less than 50%,
- and for markup-heavy instances like news feeds, saving more than 90% of the original size isn't
- unusual.
- </para>
- <para>
- Not using compression will therefore slow down the transfer, and you should only
- enable this action if you really need it. As of &my-app; 3.0.7 it's disabled in all
- predefined action settings.
- </para>
- <para>
- Note that some (rare) ill-configured sites don't handle requests for uncompressed
- documents correctly. Broken PHP applications tend to send an empty document body,
- and some IIS versions only send the beginning of the content. If you enable
- <literal>prevent-compression</literal> by default, you might want to add
- exceptions for those sites. See the example for how to do that.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Example usage (sections):</term>
- <listitem>
- <para>
- <screen>
-# Selectively turn off compression, and enable a filter
-#
-{ +filter{tiny-textforms} +prevent-compression }
-# Match only these sites
- .google.
- sourceforge.net
- sf.net
-
-# Or instead, we could set a universal default:
-#
-{ +prevent-compression }
- / # Match all sites
-
-# Then maybe make exceptions for broken sites:
-#
-{ -prevent-compression }
-.compusa.com/</screen>
- </para>
- </listitem>
- </varlistentry>
-
-</variablelist>
-</sect3>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="overwrite-last-modified">
-<title>overwrite-last-modified</title>
-<!--
-new action
--->
-<variablelist>
- <varlistentry>
- <term>Typical use:</term>
- <listitem>
- <para>Prevent yet another way to track the user's steps between sessions.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Effect:</term>
- <listitem>
- <para>
- Deletes the <quote>Last-Modified:</quote> HTTP server header or modifies its value.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Type:</term>
- <!-- Boolean, Parameterized, Multi-value -->
- <listitem>
- <para>Parameterized.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Parameter:</term>
- <listitem>
- <para>
- One of the keywords <quote>block</quote>, <quote>reset-to-request-time</quote>
- or <quote>randomize</quote>.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Notes:</term>
- <listitem>
- <para>
- Removing the <quote>Last-Modified:</quote> header is useful for filter
- testing, where you want to force a real reload instead of getting status
- code <quote>304</quote>, which would cause the browser to reuse the old
- version of the page.
- </para>
- <para>
- The <quote>randomize</quote> option overwrites the value of the
- <quote>Last-Modified:</quote> header with a randomly chosen time
- between the original value and the current time. In theory the server
- could send each document with a different <quote>Last-Modified:</quote>
- header to track visits without using cookies. <quote>Randomize</quote>
- makes this impossible, while the browser can still revalidate cached documents.
- </para>
- <para>
- <quote>reset-to-request-time</quote> overwrites the value of the
- <quote>Last-Modified:</quote> header with the current time. You could use
- this option together with
- <literal><link linkend="hide-if-modified-since">hide-if-modified-since</link></literal>
- to further customize your random range.
- </para>
- <para>
- The preferred parameter here is <quote>randomize</quote>. It is safe
- to use, as long as the time settings are more or less correct.
- If the server sets the <quote>Last-Modified:</quote> header to the time
- of the request, the random range becomes zero and the value stays the same.
- Therefore you should later randomize it a second time with
- <literal><link linkend="hide-if-modified-since">hide-if-modified-since</link></literal>,
- just to be sure.
- </para>
- <para>
- It is also recommended to use this action together with
- <literal><link linkend="crunch-if-none-match">crunch-if-none-match</link></literal>.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Example usage:</term>
- <listitem>
- <para>
- <screen># Let the browser revalidate without being tracked across sessions
-{ +hide-if-modified-since{-60} \
- +overwrite-last-modified{randomize} \
- +crunch-if-none-match}
-/</screen>
- </para>
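- <para>
- For the filter-testing scenario mentioned in the notes, a section like the
- following sketch could be used instead. The host name
- <literal>test.example.org</literal> is only a placeholder:
- </para>
- <para>
- <screen># Remove the Last-Modified: header while testing filters
-# (hypothetical test host)
-{ +overwrite-last-modified{block} }
-test.example.org</screen>
- </para>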
- </listitem>
- </varlistentry>
-</variablelist>
-</sect3>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="redirect">
-<title>redirect</title>
-<!--
-new action
--->
-<variablelist>
- <varlistentry>
- <term>Typical use:</term>
- <listitem>
- <para>
- Redirect requests to other sites.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Effect:</term>
- <listitem>
- <para>
- Convinces the browser that the requested document has been moved
- to another location and the browser should get it from there.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Type:</term>
- <!-- Boolean, Parameterized, Multi-value -->
- <listitem>
- <para>Parameterized.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Parameter:</term>
- <listitem>
- <para>
- An absolute URL or a single pcrs command.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Notes:</term>
- <listitem>
- <para>
- Requests to which this action applies are answered with an
- HTTP redirect to a URL of your choosing. The new URL is
- either provided as parameter, or derived by applying a
- single pcrs command to the original URL.
- </para>
- <para>
- This action will be ignored if you use it together with
- <literal><link linkend="block">block</link></literal>.
- It can be combined with
- <literal><link linkend="fast-redirects">fast-redirects{check-decoded-url}</link></literal>
- to redirect to a decoded version of a rewritten URL.
- </para>
- <para>
- Use this action carefully. Make sure not to create redirection loops,
- and be aware that using your own redirects might make it
- possible to fingerprint your requests.
- </para>
- <para>
- In case of problems with your redirects, or simply to watch
- them working, enable <link linkend="DEBUG">debug 128</link>.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Example usages:</term>
- <listitem>
- <para>
- <screen># Replace example.com's style sheet with another one
-{ +redirect{http://localhost/css-replacements/example.com.css} }
- example.com/stylesheet\.css
-
-# Create a short, easy to remember nickname for a favorite site
-# (relies on the browser to accept and forward invalid URLs to &my-app;)
-{ +redirect{http://www.privoxy.org/user-manual/actions-file.html} }
- a
-
-# Always use the expanded view for Undeadly.org articles
-# (Note the $ at the end of the URL pattern to make sure
-# the request for the rewritten URL isn't redirected as well)
-{+redirect{s@$@&mode=expanded@}}
-undeadly.org/cgi\?action=article&sid=\d*$
-
-# Redirect Google search requests to MSN
-{+redirect{s@^http://[^/]*/search\?q=([^&]*).*@http://search.msn.com/results.aspx?q=$1@}}
-.google.com/search
-
-# Redirect MSN search requests to Yahoo
-{+redirect{s@^http://[^/]*/results\.aspx\?q=([^&]*).*@http://search.yahoo.com/search?p=$1@}}
-search.msn.com/results\.aspx\?q=
-
-# Redirect remote requests for this manual
-# to the local version delivered by Privoxy
-{+redirect{s@^http://www@http://config@}}
-www.privoxy.org/user-manual/</screen>
- </para>
- </listitem>
- </varlistentry>
-
-</variablelist>
-</sect3>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="server-header-filter">
-<title>server-header-filter</title>
-
-<variablelist>
- <varlistentry>
- <term>Typical use:</term>
- <listitem>
- <para>
- Rewrite or remove single server headers.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Effect:</term>
- <listitem>
- <para>
- All server headers to which this action applies are filtered on-the-fly
- through the specified regular expression based substitutions.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Type:</term>
- <!-- boolean, parameterized, Multi-value -->
- <listitem>
- <para>Parameterized.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Parameter:</term>
- <listitem>
- <para>
- The name of a server-header filter, as defined in one of the
- <link linkend="filter-file">filter files</link>.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Notes:</term>
- <listitem>
- <para>
- Server-header filters are applied to each header on its own, not to
- all at once. This makes it easier to diagnose problems, but on the downside
- you can't write filters that only change header x if header y's value is z.
- You can do that by using tags though.
- </para>
- <para>
- Server-header filters are executed after the other header actions have finished
- and use their output as input.
- </para>
- <para>
- Please refer to the <link linkend="filter-file">filter file chapter</link>
- to learn which server-header filters are available by default, and how to
- create your own.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Example usage (section):</term>
- <listitem>
- <para>
- <screen>
-{+server-header-filter{html-to-xml}}
-example.org/xml-instance-that-is-delivered-as-html
-
-{+server-header-filter{xml-to-html}}
-example.org/instance-that-is-delivered-as-xml-but-is-not
- </screen>
- </para>
- </listitem>
- </varlistentry>
-
-</variablelist>
-</sect3>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="server-header-tagger">
-<title>server-header-tagger</title>
-
-<variablelist>
- <varlistentry>
- <term>Typical use:</term>
- <listitem>
- <para>
- Enable or disable filters based on the Content-Type header.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Effect:</term>
- <listitem>
- <para>
- Server headers to which this action applies are filtered on-the-fly through
- the specified regular expression based substitutions; the result is used as
- a tag.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Type:</term>
- <!-- boolean, parameterized, Multi-value -->
- <listitem>
- <para>Parameterized.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Parameter:</term>
- <listitem>
- <para>
- The name of a server-header tagger, as defined in one of the
- <link linkend="filter-file">filter files</link>.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Notes:</term>
- <listitem>
- <para>
- Server-header taggers are applied to each header on its own,
- and as the header isn't modified, each tagger <quote>sees</quote>
- the original.
- </para>
- <para>
- Server-header taggers are executed before all other header actions
- that modify server headers. Their tags can be used to control
- all of the other server-header actions, the content filters
- and the crunch actions (<link linkend="redirect">redirect</link>
- and <link linkend="block">block</link>).
- </para>
- <para>
- Obviously crunching based on tags created by server-header taggers
- doesn't prevent the request from showing up in the server's log file.
- </para>
-
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Example usage (section):</term>
- <listitem>
- <para>
- <screen>
-# Tag every request with the content type declared by the server
-{+server-header-tagger{content-type}}
-/
- </screen>
- </para>
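- <para>
- Tags created this way can then be matched by later sections. The sketch
- below rests on two assumptions that are not shown in this section: that the
- <literal>content-type</literal> tagger produces tags such as
- <quote>image/png</quote>, and that tag patterns are written with a
- <literal>TAG:</literal> prefix as covered in the
- <link linkend="af-patterns">patterns</link> section. Check both against
- your installed version before relying on it:
- </para>
- <para>
- <screen># Assumption: responses tagged with an image content type
-# should not be passed through the content filters
-{ -filter }
-TAG:^image/</screen>
- </para>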
- </listitem>
- </varlistentry>
-
-</variablelist>
-</sect3>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="session-cookies-only">
-<title>session-cookies-only</title>
-
-<variablelist>
- <varlistentry>
- <term>Typical use:</term>
- <listitem>
- <para>
- Allow only temporary <quote>session</quote> cookies (for the current
- browser session <emphasis>only</emphasis>).
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Effect:</term>
- <listitem>
- <para>
- Deletes the <quote>expires</quote> field from <quote>Set-Cookie:</quote>
- server headers. Most browsers will not store such cookies permanently and
- forget them in between sessions.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Type:</term>
- <!-- Boolean, Parameterized, Multi-value -->
- <listitem>
- <para>Boolean.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Parameter:</term>
- <listitem>
- <para>
- N/A
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Notes:</term>
- <listitem>
- <para>
- This is less strict than <literal><link linkend="crunch-incoming-cookies">crunch-incoming-cookies</link></literal> /
- <literal><link linkend="crunch-outgoing-cookies">crunch-outgoing-cookies</link></literal> and allows you to browse
- websites that insist or rely on setting cookies, without compromising your privacy too badly.
- </para>
- <para>
- Most browsers will not permanently store cookies that have been processed by
- <literal>session-cookies-only</literal> and will forget about them between sessions.
- This makes profiling cookies useless, but won't break sites which require cookies so
- that you can log in for transactions. This is generally turned on for all
- sites, and is the recommended setting.
- </para>
- <para>
- It makes <emphasis>no sense at all</emphasis> to use <literal>session-cookies-only</literal>
- together with <literal><link linkend="crunch-incoming-cookies">crunch-incoming-cookies</link></literal> or
- <literal><link linkend="crunch-outgoing-cookies">crunch-outgoing-cookies</link></literal>. If you do, cookies
- will be plainly killed.
- </para>
- <para>
- Note that it is up to the browser how it handles such cookies without an <quote>expires</quote>
- field. If you use an exotic browser, you might want to try it out to be sure.
- </para>
- <para>
- This setting also has no effect on cookies that may have been stored
- previously by the browser before starting <application>Privoxy</application>.
- These would have to be removed manually.
- </para>
- <para>
- <application>Privoxy</application> also uses
- the <link linkend="filter-content-cookies">content-cookies filter</link>
- to block some types of cookies. Content cookies are not affected by
- <literal>session-cookies-only</literal>.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Example usage:</term>
- <listitem>
- <para>
- <screen>+session-cookies-only</screen>
- </para>
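- <para>
- In section form, the action might be enabled for all sites and switched off
- again for a site where persistent cookies are wanted. This is a sketch only;
- the host name <literal>forum.example.net</literal> is purely illustrative:
- </para>
- <para>
- <screen># Session cookies only, everywhere
-{ +session-cookies-only }
-/
-
-# Keep persistent cookies for this (hypothetical) site
-{ -session-cookies-only }
-.forum.example.net</screen>
- </para>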
- </listitem>
- </varlistentry>
-</variablelist>
-</sect3>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect3 renderas="sect4" id="set-image-blocker">
-<title>set-image-blocker</title>
-
-<variablelist>
- <varlistentry>
- <term>Typical use:</term>
- <listitem>
- <para>Choose the replacement for blocked images.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Effect:</term>
- <listitem>
- <para>
- This action alone doesn't do anything noticeable. If <emphasis>both</emphasis>
- <literal><link linkend="block">block</link></literal> <emphasis>and</emphasis> <literal><link
- linkend="handle-as-image">handle-as-image</link></literal> <emphasis>also</emphasis>
- apply, i.e. if the request is to be blocked as an image,
- <emphasis>then</emphasis> the parameter of this action decides what will be
- sent as a replacement.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Type:</term>
- <!-- Boolean, Parameterized, Multi-value -->
- <listitem>
- <para>Parameterized.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Parameter:</term>
- <listitem>
- <itemizedlist>
- <listitem>
- <para>
- <quote>pattern</quote> to send a built-in checkerboard pattern image. The image is visually
- decent, scales very well, and makes it obvious where banners were busted.
- </para>
- </listitem>
- <listitem>
- <para>
- <quote>blank</quote> to send a built-in transparent image. This makes banners disappear
- completely, but makes it hard to detect where <application>Privoxy</application> has blocked
- images on a given page and complicates troubleshooting if <application>Privoxy</application>
- has blocked innocent images, like navigation icons.
- </para>
- </listitem>
- <listitem>
- <para>
- <quote><replaceable class="parameter">target-url</replaceable></quote> to
- send a redirect to <replaceable class="parameter">target-url</replaceable>. You can redirect
- to any image anywhere, even in your local filesystem via a <quote>file:///</quote> URL
- (but note that not all browsers support redirecting to the local file system).
- </para>
- <para>
- A good application of redirects is to use special <application>Privoxy</application>-built-in
- URLs, which send the built-in images, as <replaceable class="parameter">target-url</replaceable>.
- This has the same visual effect as specifying <quote>blank</quote> or <quote>pattern</quote> in
- the first place, but enables your browser to cache the replacement image, instead of requesting
- it over and over again.
- </para>
- </listitem>
- </itemizedlist>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Notes:</term>
- <listitem>
- <para>
- The URLs for the built-in images are <quote>http://config.privoxy.org/send-banner?type=<replaceable
- class="parameter">type</replaceable></quote>, where <replaceable class="parameter">type</replaceable> is
- either <quote>blank</quote> or <quote>pattern</quote>.
- </para>
- <para>
- There is a third (advanced) type, called <quote>auto</quote>. It is <emphasis>NOT</emphasis> to be
- used in <literal>set-image-blocker</literal>, but meant for use from <link linkend="filter-file">filters</link>.
- Auto will select the type of image that would have applied to the referring page, had it been an image.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Example usage:</term>
- <listitem>
- <para>
- Built-in pattern:
- </para>
- <para>
- <screen>+set-image-blocker{pattern}</screen>
- </para>
- <para>
- Redirect to the BSD daemon:
- </para>
- <para>
- <screen>+set-image-blocker{http://www.freebsd.org/gifs/dae_up3.gif}</screen>
- </para>
- <para>
- Redirect to the built-in pattern for better caching:
- </para>
- <para>
- <screen>+set-image-blocker{http://config.privoxy.org/send-banner?type=pattern}</screen>
- </para>
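- <para>
- Because the action only takes effect when a request is blocked as an image,
- a complete section might look like the following sketch. The pattern
- <literal>ads.example.com</literal> is a placeholder:
- </para>
- <para>
- <screen># Block as image and answer with the built-in transparent image
-{ +block{Blocked image.} +handle-as-image +set-image-blocker{blank} }
-ads.example.com</screen>
- </para>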
- </listitem>
- </varlistentry>
-</variablelist>
-</sect3>
-
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect3>
-<title>Summary</title>
-<para>
- Note that many of these actions have the potential to cause a page to
- misbehave, possibly even not to display at all. Site designers can build
- their sites in many different ways, and may depend on particular HTTP header
- content and other criteria. There is no way to have hard
- and fast rules for all sites. See the <link
- linkend="ACTIONSANAT">Appendix</link> for a brief example on troubleshooting
- actions.
-</para>
-</sect3>
-</sect2>
-
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2 id="aliases">
-<title>Aliases</title>
-<para>
- Custom <quote>actions</quote>, known to <application>Privoxy</application>
- as <quote>aliases</quote>, can be defined by combining other actions.
- These can in turn be invoked just like the built-in actions.
- Currently, an alias name can contain any character except space, tab,
- <quote>=</quote>,
- <quote>{</quote> and <quote>}</quote>, but we <emphasis>strongly
- recommend</emphasis> that you only use <quote>a</quote> to <quote>z</quote>,
- <quote>0</quote> to <quote>9</quote>, <quote>+</quote>, and <quote>-</quote>.
- Alias names are not case sensitive, and are not required to start with a
- <quote>+</quote> or <quote>-</quote> sign, since they are merely textually
- expanded.
-</para>
-<para>
- Aliases can be used throughout the actions file, but they <emphasis>must be
- defined in a special section at the top of the file!</emphasis>
- And there can only be one such section per actions file. Each actions file may
- have its own alias section, and the aliases defined in it are only visible
- within that file.
-</para>
-<para>
- There are two main reasons to use aliases: One is to save typing for frequently
- used combinations of actions, the other one is a gain in flexibility: If you
- decide once how you want to handle shops by defining an alias called
- <quote>shop</quote>, you can later change your policy on shops in
- <emphasis>one</emphasis> place, and your changes will take effect everywhere
- in the actions file where the <quote>shop</quote> alias is used. Calling aliases
- by their purpose also makes your actions files more readable.
-</para>
-<para>
- Currently, there is one big drawback to using aliases, though:
- <application>Privoxy</application>'s built-in web-based action file
- editor honors aliases when reading the actions files, but it expands
- them before writing. So the effects of your aliases are of course preserved,
- but the aliases themselves are lost when you edit sections that use aliases
- with it.
-</para>
-
-<para>
- Now let's define some aliases...
-</para>
-
-<para>
- <screen>
- # Useful custom aliases we can use later.
- #
- # Note the (required!) section header line and that this section
- # must be at the top of the actions file!
- #
- {{alias}}
-
- # These aliases just save typing later:
- # (Note that some already use other aliases!)
- #
- +crunch-all-cookies = +<link linkend="CRUNCH-INCOMING-COOKIES">crunch-incoming-cookies</link> +<link linkend="CRUNCH-OUTGOING-COOKIES">crunch-outgoing-cookies</link>
- -crunch-all-cookies = -<link linkend="CRUNCH-INCOMING-COOKIES">crunch-incoming-cookies</link> -<link linkend="CRUNCH-OUTGOING-COOKIES">crunch-outgoing-cookies</link>
- +block-as-image = +block{Blocked image.} +handle-as-image
- allow-all-cookies = -crunch-all-cookies -<link linkend="SESSION-COOKIES-ONLY">session-cookies-only</link> -<link linkend="FILTER-CONTENT-COOKIES">filter{content-cookies}</link>
-
- # These aliases define combinations of actions
- # that are useful for certain types of sites:
- #
- fragile = -<link linkend="BLOCK">block</link> -<link linkend="FILTER">filter</link> -crunch-all-cookies -<link linkend="FAST-REDIRECTS">fast-redirects</link> -<link linkend="HIDE-REFERER">hide-referrer</link> -<link linkend="PREVENT-COMPRESSION">prevent-compression</link>
-
- shop = -crunch-all-cookies -<link linkend="FILTER-ALL-POPUPS">filter{all-popups}</link>
-
- # Short names for other aliases, for really lazy people ;-)
- #
- c0 = +crunch-all-cookies
- c1 = -crunch-all-cookies</screen>
-</para>
-
-<para>
- ...and put them to use. These sections would appear in the lower part of an
- actions file and define exceptions to the default actions (as specified further
- up for the <quote>/</quote> pattern):
-</para>
-
-<para>
- <screen>
- # These sites are either very complex or very keen on
- # user data and require minimal interference to work:
- #
- {fragile}
- .office.microsoft.com
- .windowsupdate.microsoft.com
- # Gmail is really mail.google.com, not gmail.com
- mail.google.com
-
- # Shopping sites:
- # Allow cookies (for setting and retrieving your customer data)
- #
- {shop}
- .quietpc.com
- .worldpay.com # for quietpc.com
- mybank.example.com
-
- # These shops require pop-ups:
- #
- {-filter{all-popups} -filter{unsolicited-popups}}
- .dabs.com
- .overclockers.co.uk</screen>
-</para>
-
-<para>
- Aliases like <quote>shop</quote> and <quote>fragile</quote> are typically used for
- <quote>problem</quote> sites that require more than one action to be disabled
- in order to function properly.
-</para>
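-
-<para>
- Because aliases are merely textually expanded, the <literal>{shop}</literal>
- section above behaves as if it had been written out in full. Following the
- alias definitions given earlier, the hand-expanded equivalent would be:
-</para>
-
-<para>
- <screen>
- # {shop} with both aliases expanded by hand
- {-crunch-incoming-cookies -crunch-outgoing-cookies -filter{all-popups}}
- .quietpc.com
- .worldpay.com # for quietpc.com
- mybank.example.com</screen>
-</para>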
-</sect2>
-<!--
-hal stop here
--->
-<!-- ~~~~~ New section ~~~~~ -->
-<sect2 id="act-examples">
-<title>Actions Files Tutorial</title>
-<para>
- The above chapters have shown <link linkend="actions-file">which actions files
- there are and how they are organized</link>, how actions are <link
- linkend="actions">specified</link> and <link linkend="actions-apply">applied
- to URLs</link>, how <link linkend="af-patterns">patterns</link> work, and how to
- define and use <link linkend="aliases">aliases</link>. Now, let's look at an
- example <filename>match-all.action</filename>, <filename>default.action</filename>
- and <filename>user.action</filename> file and see how all these pieces come together:
-</para>
-
-<sect3>
-<title>match-all.action</title>
-<para>
- Remember <emphasis>all actions are disabled when matching starts</emphasis>,
- so we have to explicitly enable the ones we want.
-</para>
-
-<para>
- While the <filename>match-all.action</filename> file only contains a
- single section, it is probably the most important one. It has only one
- pattern, <quote><literal>/</literal></quote>, but this pattern
- <link linkend="af-patterns">matches all URLs</link>. Therefore, the set of
- actions used in this <quote>default</quote> section <emphasis>will
- be applied to all requests as a start</emphasis>. It can be partly or
- wholly overridden by other actions files like <filename>default.action</filename>
- and <filename>user.action</filename>, but it will still be largely responsible
- for your overall browsing experience.
-</para>
-
-<para>
- Again, at the start of matching, all actions are disabled, so there is
- no need to disable any actions here. (Remember: a <quote>+</quote>
- preceding the action name enables the action, a <quote>-</quote> disables!).
- Also note how this long line has been made more readable by splitting it into
- multiple lines with line continuation.
-</para>
-
-<para>
- <screen>
-{ \
- +<link linkend="CHANGE-X-FORWARDED-FOR">change-x-forwarded-for{block}</link> \
- +<link linkend="HIDE-FROM-HEADER">hide-from-header{block}</link> \
- +<link linkend="SET-IMAGE-BLOCKER">set-image-blocker{pattern}</link> \
-}
-/ # Match all URLs
- </screen>
-</para>
-
-<para>
- The default behavior is now set.
-</para>
-</sect3>
-
-<sect3>
-<title>default.action</title>
-
-<para>
- If you aren't a developer, there's no need for you to edit the
- <filename>default.action</filename> file. It is maintained by
- the &my-app; developers and if you disagree with some of the
- sections, you should overrule them in your <filename>user.action</filename>.
-</para>
-
-<para>
- Understanding the <filename>default.action</filename> file can
- help you with your <filename>user.action</filename>, though.
-</para>
-
-<para>
- The first section in this file is a special section for internal use
- that prevents older &my-app; versions from reading the file:
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Settings -- Don't change! For internal Privoxy use ONLY.
-##########################################################################
-{{settings}}
-for-privoxy-version=3.0.11</screen>
-</para>
-
-<para>
- After that comes the (optional) alias section. We'll use the example
- section from the above <link linkend="aliases">chapter on aliases</link>,
- which also explains why and how aliases are used:
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Aliases
-##########################################################################
-{{alias}}
-
- # These aliases just save typing later:
- # (Note that some already use other aliases!)
- #
- +crunch-all-cookies = +<link linkend="CRUNCH-INCOMING-COOKIES">crunch-incoming-cookies</link> +<link linkend="CRUNCH-OUTGOING-COOKIES">crunch-outgoing-cookies</link>
- -crunch-all-cookies = -<link linkend="CRUNCH-INCOMING-COOKIES">crunch-incoming-cookies</link> -<link linkend="CRUNCH-OUTGOING-COOKIES">crunch-outgoing-cookies</link>
- +block-as-image = +block{Blocked image.} +handle-as-image
- mercy-for-cookies = -crunch-all-cookies -<link linkend="SESSION-COOKIES-ONLY">session-cookies-only</link> -<link linkend="FILTER-CONTENT-COOKIES">filter{content-cookies}</link>
-
- # These aliases define combinations of actions
- # that are useful for certain types of sites:
- #
- fragile = -<link linkend="BLOCK">block</link> -<link linkend="FILTER">filter</link> -crunch-all-cookies -<link linkend="FAST-REDIRECTS">fast-redirects</link> -<link linkend="HIDE-REFERER">hide-referrer</link>
- shop = -crunch-all-cookies -<link linkend="FILTER-ALL-POPUPS">filter{all-popups}</link></screen>
-</para>
-
-<para>
- The first of our specialized sections is concerned with <quote>fragile</quote>
- sites, i.e. sites that require minimum interference, because they are either
- very complex or very keen on tracking you (and have mechanisms in place that
- make them unusable for people who avoid being tracked). We will simply use
- our pre-defined <literal>fragile</literal> alias instead of stating the list
- of actions explicitly:
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Exceptions for sites that'll break under the default action set:
-##########################################################################
-
-# "Fragile" Use a minimum set of actions for these sites (see alias above):
-#
-{ fragile }
-.office.microsoft.com # surprise, surprise!
-.windowsupdate.microsoft.com
-mail.google.com</screen>
-</para>
-
-<para>
- Shopping sites are not as fragile, but they typically
- require cookies to log in, and pop-up windows for shopping
- carts or item details. Again, we'll use a pre-defined alias:
-</para>
-
-<para>
- <screen>
-# Shopping sites:
-#
-{ shop }
-.quietpc.com
-.worldpay.com # for quietpc.com
-.jungle.com
-.scan.co.uk</screen>
-</para>
-
-<para>
- The <literal><link linkend="FAST-REDIRECTS">fast-redirects</link></literal>
- action, which may have been enabled in <filename>match-all.action</filename>,
- breaks some sites. So disable it for popular sites where we know it misbehaves:
-</para>
-
-<para>
- <screen>
-{ -<link linkend="FAST-REDIRECTS">fast-redirects</link> }
-login.yahoo.com
-edit.*.yahoo.com
-.google.com
-.altavista.com/.*(like|url|link):http
-.altavista.com/trans.*urltext=http
-.nytimes.com</screen>
-</para>
-
-<para>
- It is important that <application>Privoxy</application> knows which
- URLs belong to images, so that <emphasis>if</emphasis> they are to
- be blocked, a substitute image can be sent, rather than an HTML page.
- Contacting the remote site to find out is not an option, since it
- would destroy the loading time advantage of banner blocking, and it
- would feed the advertisers information about you. We can mark any
- URL as an image with the <literal><link
- linkend="handle-as-image">handle-as-image</link></literal> action,
- and marking all URLs that end in a known image file extension is a
- good start:
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Images:
-##########################################################################
-
-# Define which file types will be treated as images, in case they get
-# blocked further down this file:
-#
-{ +<link linkend="HANDLE-AS-IMAGE">handle-as-image</link> }
-/.*\.(gif|jpe?g|png|bmp|ico)$</screen>
-</para>
-
-<para>
- And then there are known banner sources. They often use scripts to
- generate the banners, so it won't be visible from the URL that the
- request is for an image. Hence we block them <emphasis>and</emphasis>
- mark them as images in one go, with the help of our
- <literal>+block-as-image</literal> alias defined above. (We could of
- course just as well use <literal>+<link linkend="block">block</link>
- +<link linkend="handle-as-image">handle-as-image</link></literal> here.)
- Remember that the type of the replacement image is chosen by the
- <literal><link linkend="set-image-blocker">set-image-blocker</link></literal>
- action. Since all URLs have matched the default section with its
- <literal>+<link linkend="set-image-blocker">set-image-blocker</link>{pattern}</literal>
- action before, it still applies and needn't be repeated:
-</para>
-
-<para>
- <screen>
-# Known ad generators:
-#
-{ +block-as-image }
-ar.atwola.com
-.ad.doubleclick.net
-.ad.*.doubleclick.net
-.a.yimg.com/(?:(?!/i/).)*$
-.a[0-9].yimg.com/(?:(?!/i/).)*$
-bs*.gsanet.com
-.qkimg.net</screen>
-</para>
-
-<para>
- One of the most important jobs of <application>Privoxy</application>
- is to block banners. Many of these can be <quote>blocked</quote>
- by the <literal><link linkend="filter">filter</link>{banners-by-size}</literal>
- action, which we enabled above, and which deletes the references to banner
- images from the pages while they are loaded, so the browser doesn't request
- them anymore, and hence they don't need to be blocked here. But this naturally
- doesn't catch all banners, and some people choose not to use filters, so we
- need a comprehensive list of patterns for banner URLs here, and apply the
- <literal><link linkend="block">block</link></literal> action to them.
-</para>
-<para>
- First come many generic patterns, which do most of the work by
- matching typical domain and path name components of banners. Then comes
- a list of individual patterns for specific sites, which is omitted here
- to keep the example short:
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Block these fine banners:
-##########################################################################
-{ <link linkend="BLOCK">+block{Banner ads.}</link> }
-
-# Generic patterns:
-#
-ad*.
-.*ads.
-banner?.
-count*.
-/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?)
-/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/
-
-# Site-specific patterns (abbreviated):
-#
-.hitbox.com</screen>
-</para>
-
-<para>
- It's quite remarkable how many advertisers actually call their banner
- servers ads.<replaceable>company</replaceable>.com, or call the directory
- in which the banners are stored simply <quote>banners</quote>. So the above
- generic patterns are surprisingly effective.
-</para>
-<para>
- But being very generic, they necessarily also catch URLs that we don't want
- to block. The pattern <literal>.*ads.</literal>, for example, catches
- <quote>nasty-<emphasis>ads</emphasis>.nasty-corp.com</quote> as intended,
- but also <quote>downlo<emphasis>ads</emphasis>.sourceforge.net</quote> or
- <quote><emphasis>ads</emphasis>l.some-provider.net</quote>. So here come some
- well-known exceptions to the <literal>+<link linkend="BLOCK">block</link></literal>
- section above.
-</para>
-<para>
- Note that these are exceptions to exceptions from the default! Consider the URL
- <quote>downloads.sourceforge.net</quote>: Initially, all actions are deactivated,
- so it wouldn't get blocked. Then comes the defaults section, which matches the
- URL, but just deactivates the <literal><link linkend="BLOCK">block</link></literal>
- action once again. Then it matches <literal>.*ads.</literal>, an exception to the
- general non-blocking policy, and suddenly
- <literal><link linkend="BLOCK">+block</link></literal> applies. And now, it'll match
- <literal>.*loads.</literal>, where <literal><link linkend="BLOCK">-block</link></literal>
- applies, so (unless it matches <emphasis>again</emphasis> further down) it ends up
- with no <literal><link linkend="BLOCK">block</link></literal> action applying.
-</para>
-
-<para>
- <screen>
-##########################################################################
-# Save some innocent victims of the above generic block patterns:
-##########################################################################
-
-# By domain:
-#
-{ -<link linkend="BLOCK">block</link> }
-adv[io]*. # (for advogato.org and advice.*)
-adsl. # (has nothing to do with ads)
-adobe. # (has nothing to do with ads either)
-ad[ud]*. # (adult.* and add.*)
-.edu # (universities don't host banners (yet!))
-.*loads. # (downloads, uploads etc)
-
-# By path:
-#
-/.*loads/
-
-# Site-specific:
-#
-www.globalintersec.com/adv # (adv = advanced)
-www.ugu.com/sui/ugu/adv</screen>
-</para>
-
-<para>
- Filtering source code can have nasty side effects,
- so make an exception for our friends at sourceforge.net,
- and all paths with <quote>cvs</quote> in them. Note that
- <literal>-<link linkend="FILTER">filter</link></literal>
- disables <emphasis>all</emphasis> filters in one fell swoop!
-</para>
-
-<para>
- <screen>
-# Don't filter code!
-#
-{ -<link linkend="FILTER">filter</link> }
-/(.*/)?cvs
-bugzilla.
-developer.
-wiki.
-.sourceforge.net</screen>
-</para>
-
-<para>
- The actual <filename>default.action</filename> is of course much more
- comprehensive, but we hope this example made clear how it works.
-</para>
-
-</sect3>
-
-<sect3><title>user.action</title>
-
-<para>
- So far we are painting with a broad brush by setting general policies,
- which would be a reasonable starting point for many people. Now,
- you might want to be more specific and have customized rules that
- are more suitable to your personal habits and preferences. These would
- be for narrowly defined situations like your ISP or your bank, and should
- be placed in <filename>user.action</filename>, which is parsed after all other
- actions files and hence has the last word, overriding any previously
- defined actions. <filename>user.action</filename> is also a
- <emphasis>safe</emphasis> place for your personal settings, since
- <filename>default.action</filename> is actively maintained by the
- <application>Privoxy</application> developers and you'll probably want
- to install updated versions from time to time.
-</para>
-
-<para>
- So let's look at a few examples of things that one might typically do in
- <filename>user.action</filename>:
-</para>
-
-
-<!-- brief sample user.action here -->
-
-<para>
- <screen>
-# My user.action file. <fred@example.com></screen>
-</para>
-
-<para>
- As <link linkend="aliases">aliases</link> are local to the actions
- file that they are defined in, you can't use the ones from
- <filename>default.action</filename>, unless you repeat them here:
-</para>
-
-<para>
- <screen>
-# Aliases are local to the file they are defined in.
-# (Re-)define aliases for this file:
-#
-{{alias}}
-#
-# These aliases just save typing later, and the alias names should
-# be self explanatory.
-#
-+crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies
--crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies
- allow-all-cookies = -crunch-all-cookies -session-cookies-only
- allow-popups = -filter{all-popups}
-+block-as-image = +block{Blocked as image.} +handle-as-image
--block-as-image = -block
-
-# These aliases define combinations of actions that are useful for
-# certain types of sites:
-#
-fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referrer
-shop = -crunch-all-cookies allow-popups
-
-# Allow ads for selected useful free sites:
-#
-allow-ads = -block -filter{banners-by-size} -filter{banners-by-link}
-
-# Alias for specific file types that are text, but might have conflicting
-# MIME types. We want the browser to force these to be text documents.
-handle-as-text = -<link linkend="FILTER">filter</link> +-<link linkend="content-type-overwrite">content-type-overwrite{text/plain}</link> +-<link linkend="FORCE-TEXT-MODE">force-text-mode</link> -<link linkend="HIDE-CONTENT-DISPOSITION">hide-content-disposition</link></screen>
-
-</para>
-
-<para>
- Say you have accounts on some sites that you visit regularly, and
- you don't want to have to log in manually each time. So you'd like
- to allow persistent cookies for these sites. The
- <literal>allow-all-cookies</literal> alias defined above does exactly
- that, i.e. it disables crunching of cookies in any direction, and the
- processing of cookies to make them only temporary.
-</para>
-
-<para>
- <screen>
-{ allow-all-cookies }
- sourceforge.net
- .yahoo.com
- .msdn.microsoft.com
- .redhat.com</screen>
-</para>
-
-<para>
- Your bank is allergic to some filter, but you don't know which, so you disable them all:
-</para>
-
-<para>
- <screen>
-{ -<link linkend="FILTER">filter</link> }
- .your-home-banking-site.com</screen>
-</para>
-
-<para>
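- You may also not want to filter certain sites or file types, for various
- reasons. The patterns below continue the <literal>{ -filter }</literal>
- section started above: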
-</para>
-
-<para>
- <screen>
-# Technical documentation is likely to contain strings that might
-# erroneously get altered by the JavaScript-oriented filters:
-#
-.tldp.org
-/(.*/)?selfhtml/
-
-# And this stupid host sends streaming video with a wrong MIME type,
-# so that Privoxy thinks it is getting HTML and starts filtering:
-#
-stupid-server.example.com/</screen>
-</para>
-
-<para>
- Example of a simple <link linkend="BLOCK">block</link> action. Say you've
- seen an ad on your favourite page on example.com that you want to get rid of.
- You right-click the image, select <quote>copy image location</quote>,
- and paste the URL, minus the leading http://, into a
- <literal>{ +block{} }</literal> section. Note that <literal>{ +handle-as-image
- }</literal> need not be specified, since all URLs ending in
- <literal>.gif</literal> will be tagged as images by the general rules set
- in <filename>default.action</filename> anyway:
-</para>
-
-<para>
- <screen>
-{ +<link linkend="BLOCK">block</link>{Nasty ads.} }
- www.example.com/nasty-ads/sponsor\.gif
- another.example.net/more/junk/here/</screen>
-</para>
-
-<para>
- The URLs of dynamically generated banners, especially from large banner
- farms, often don't use the well-known image file name extensions, which
- makes it impossible for <application>Privoxy</application> to guess
- the file type just by looking at the URL.
- You can use the <literal>+block-as-image</literal> alias defined above for
- these cases.
- Note that objects which match this rule but then turn out NOT to be an
- image are typically rendered as a <quote>broken image</quote> icon by the
- browser. Use cautiously.
-</para>
-
-<para>
- <screen>
-{ +block-as-image }
- .doubleclick.net
- .fastclick.net
- /Realmedia/ads/
- ar.atwola.com/</screen>
-</para>
-
-<para>
- Now you noticed that the default configuration breaks Forbes Magazine,
- but you were too lazy to find out which action is the culprit, and you
- were again too lazy to give <link linkend="contact">feedback</link>, so
- you just used the <literal>fragile</literal> alias on the site, and
- -- <emphasis>whoa!</emphasis> -- it worked. The <literal>fragile</literal>
- alias disables those actions that are most likely to break a site. It is
- also useful for testing whether it is <application>Privoxy</application>
- that is causing the problem or not. We later find other regular sites
- that misbehave, and add those to our personalized list of troublemakers:
-</para>
-
-<para>
-<screen>
-{ fragile }
- .forbes.com
- webmail.example.com
- .mybank.com</screen>
-</para>
-
-<para>
- You like the <quote>fun</quote> text replacements in <filename>default.filter</filename>,
- but they are disabled in the distributed actions file.
- So you'd like to turn them on in your private,
- update-safe config, once and for all:
-</para>
-
-<para>
-<screen>
-{ +<link linkend="filter-fun">filter{fun}</link> }
- / # For ALL sites!</screen>
-</para>
-
-<para>
- Note that the above is not really a good idea: There are exceptions
- to the filters in <filename>default.action</filename> for things that
- really shouldn't be filtered, like code on CVS->Web interfaces. Since
- <filename>user.action</filename> has the last word, these exceptions
- won't be valid for the <quote>fun</quote> filtering specified here.
-</para>
-
-<para>
- You might also worry about how your favourite free websites are
- funded, and find that they rely on displaying banner advertisements
- to survive. So you might want to specifically allow banners for those
- sites that you feel provide value to you:
-</para>
-
-<para>
-<screen>
-{ allow-ads }
- .sourceforge.net
- .slashdot.org
- .osdn.net</screen>
-</para>
-
-<para>
- Note that <literal>allow-ads</literal> has been aliased to
- <literal>-<link linkend="block">block</link></literal>,
- <literal>-<link linkend="filter-banners-by-size">filter{banners-by-size}</link></literal>, and
- <literal>-<link linkend="filter-banners-by-link">filter{banners-by-link}</link></literal> above.
-</para>
-
-<para>
- Invoke another alias here to force an override of the MIME type
- <literal>application/x-sh</literal>, which would typically open a download
- dialog. In my case, I want to look at the shell script first, and can then
- save it should I choose to.
-</para>
-
-<para>
-<screen>
-{ handle-as-text }
- /.*\.sh$</screen>
-</para>
-
-<para>
- <filename>user.action</filename> is generally the best place to define
- exceptions and additions to the default policies of
- <filename>default.action</filename>. Some actions are safe to have their
- default policies set here though. So let's set a default policy to have a
- <quote>blank</quote> image as opposed to the checkerboard pattern for
- <emphasis>ALL</emphasis> sites. <quote>/</quote> of course matches all URL
- paths and patterns:
-</para>
-
-<para>
-<screen>
-{ +<link linkend="set-image-blocker">set-image-blocker{blank}</link> }
-/ # ALL sites</screen>
-</para>
-
-</sect3>
-</sect2>
-
-<!-- ~ End section ~ -->
-
-</sect1>
-
-<!-- ~ End section ~ -->
-
-<!-- ~~~~~~~~ New section Header ~~~~~~~~~ -->
-
-<sect1 id="filter-file">
-<title>Filter Files</title>
-
-<para>
- On-the-fly text substitutions need
- to be defined in a <quote>filter file</quote>. Once defined, they
- can then be invoked as an <quote>action</quote>.
-</para>
-
-<para>
- &my-app; supports three different filter actions:
- <literal><link linkend="filter">filter</link></literal> to
- rewrite the content that is sent to the client,
- <literal><link linkend="client-header-filter">client-header-filter</link></literal>
- to rewrite headers that are sent by the client, and
- <literal><link linkend="server-header-filter">server-header-filter</link></literal>
- to rewrite headers that are sent by the server.
-</para>
-
-<para>
- &my-app; also supports two tagger actions:
- <literal><link linkend="client-header-tagger">client-header-tagger</link></literal>
- and
- <literal><link linkend="server-header-tagger">server-header-tagger</link></literal>.
- Taggers and filters use the same syntax in the filter files; the difference
- is that taggers don't modify the text they are filtering, but use a rewritten
- version of the filtered text as tag. The tags can then be used to change the
- applying actions through sections with <link linkend="tag-pattern">tag-patterns</link>.
-</para>
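-
-<para>
- As a minimal sketch of how the two pieces fit together, a tagger could be
- defined in a filter file and then referenced from an actions file as shown
- below. The tagger name <literal>my-user-agent</literal>, the
- <quote>ExampleBot</quote> client and the decision to disable all filters
- for it are assumptions made purely for illustration:
-</para>
-
-<para>
- <screen>
-# In a filter file: use the complete User-Agent header as the tag
-#
-CLIENT-HEADER-TAGGER: my-user-agent Tag the request with the User-Agent header
-s@^(User-Agent:.*)@$1@i
-
-# In an actions file: enable the tagger for all requests ...
-#
-{ +client-header-tagger{my-user-agent} }
-/
-
-# ... and change the applying actions for requests whose tag matches:
-#
-{ -filter }
-TAG:^User-Agent: ExampleBot</screen>
-</para>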
-
-
-<para>
- Multiple filter files can be defined through the <literal> <link
- linkend="filterfile">filterfile</link></literal> config directive. The filters
- as supplied by the developers are located in
- <filename>default.filter</filename>. It is recommended that any locally
- defined or modified filters go in a separately defined file such as
- <filename>user.filter</filename>.
- </para>
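-
-<para>
- In the main configuration file this might look as follows, assuming
- your own filters live in a file called <filename>user.filter</filename>:
-</para>
-
-<para>
- <screen>
-filterfile default.filter
-filterfile user.filter      # User customizations</screen>
-</para>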
-
-<para>
- Common tasks for content filters are to eliminate common annoyances in
- HTML and JavaScript, such as pop-up windows,
- exit consoles, crippled windows without navigation tools, the
- infamous <BLINK> tag etc, to suppress images with certain
- width and height attributes (standard banner sizes or web-bugs),
- or just to have fun.
-</para>
-
-<para>
- Enabled content filters are applied to any content whose
- <quote>Content-Type</quote> header is recognised as a sign
- of text-based content, with the exception of <literal>text/plain</literal>.
- Use the <link linkend="FORCE-TEXT-MODE">force-text-mode</link> action
- to also filter other content.
-</para>
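-
-<para>
- For instance, a section along the following lines would also apply the
- <quote>fun</quote> filter to plain-text files on one host. The host name
- is a placeholder, and whether filtering such files is a good idea depends
- on the site:
-</para>
-
-<para>
- <screen>
-# Also filter content that is served as text/plain (hypothetical host):
-#
-{ +force-text-mode +filter{fun} }
-text-host.example.org/.*\.txt$</screen>
-</para>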
-
-<para>
- Substitutions are made at the source level, so if you want to <quote>roll
- your own</quote> filters, you should first be familiar with HTML syntax,
- and, of course, regular expressions.
-</para>
-
-<para>
- Just like the <link linkend="actions-file">actions files</link>, the
- filter file is organized in sections, which are called <emphasis>filters</emphasis>
- here. Each filter consists of a heading line, that starts with one of the
- <emphasis>keywords</emphasis> <literal>FILTER:</literal>,
- <literal>CLIENT-HEADER-FILTER:</literal> or <literal>SERVER-HEADER-FILTER:</literal>
- followed by the filter's <emphasis>name</emphasis>, and a short (one line)
- <emphasis>description</emphasis> of what it does. Below that line
- come the <emphasis>jobs</emphasis>, i.e. lines that define the actual
- text substitutions. By convention, the name of a filter
- should describe what the filter <emphasis>eliminates</emphasis>. The
- description is used in the <ulink url="http://config.privoxy.org/">web-based
- user interface</ulink>.
-</para>
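-
-<para>
- Header filters are laid out the same way. As a sketch, with a made-up
- filter name and replacement value, a server header filter and the actions
- file section that enables it might look like this:
-</para>
-
-<para>
- <screen>
-# In a filter file (name and replacement text are examples only):
-#
-SERVER-HEADER-FILTER: example-server-banner Replace the value of the Server header
-s@^Server:.*@Server: unknown@i
-
-# In an actions file:
-#
-{ +server-header-filter{example-server-banner} }
-/</screen>
-</para>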
-
-<para>
- Once a filter called <replaceable>name</replaceable> has been defined
- in the filter file, it can be invoked by using an action of the form
- +<literal><link linkend="filter">filter</link>{<replaceable>name</replaceable>}</literal>
- in any <link linkend="actions-file">actions file</link>.
-</para>
-
-<para>
- Filter definitions start with a header line that contains the filter
- type, the filter name and the filter description.
- A content filter header line for a filter called <quote>foo</quote> could look
- like this:
-</para>
-
-<para>
- <screen>FILTER: foo Replace all "foo" with "bar"</screen>
-</para>
-
-<para>
- Below that line, and up to the next header line, come the jobs that
- define what text replacements the filter executes. They are specified
- in a syntax that imitates <ulink url="http://www.perl.org/">Perl</ulink>'s
- <literal>s///</literal> operator. If you are familiar with Perl, you
- will find this to be quite intuitive, and may want to look at the
- PCRS documentation for the subtle differences to Perl behaviour. Most
- notably, the non-standard option letter <literal>U</literal> is supported,
- which turns the default to ungreedy matching.
-</para>
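-
-<para>
- The effect of the <literal>U</literal> option is easiest to see on a small
- example. The following filter is hypothetical and only meant to contrast
- greedy and ungreedy matching on a line such as
- <literal>a="1" b="2"</literal>:
-</para>
-
-<para>
- <screen>
-FILTER: example-empty-strings Hypothetical example: empty double-quoted strings
-
-# Greedy (the default): .* runs from the first quote on the line to the
-# last one, so the line above would become   a=""
-#
-#s/".*"/""/g
-
-# With U, .* stops at the nearest closing quote, so the line above
-# would become   a="" b=""
-#
-s/".*"/""/Ug</screen>
-</para>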
-
-<para>
- If you are new to
- <ulink url="http://en.wikipedia.org/wiki/Regular_expressions"><quote>Regular
- Expressions</quote></ulink>, you might want to take a look at
- the <link linkend="regex">Appendix on regular expressions</link>, and
- see the <ulink url="http://perldoc.perl.org/perlre.html">Perl
- manual</ulink> for
- <ulink url="http://perldoc.perl.org/perlop.html">the
- <literal>s///</literal> operator's syntax</ulink> and <ulink
- url="http://perldoc.perl.org/perlre.html">Perl-style regular
- expressions</ulink> in general.
- The below examples might also help to get you started.
-</para>
-
-
-<!-- ~~~~~~~~ New section Header ~~~~~~~~~ -->
-
-<sect2><title>Filter File Tutorial</title>
-<para>
- Now, let's complete our <quote>foo</quote> content filter. We have already defined
- the heading, but the jobs are still missing. Since all it does is to replace
- <quote>foo</quote> with <quote>bar</quote>, there is only one (trivial) job
- needed:
-</para>
-
-<para>
- <screen>s/foo/bar/</screen>
-</para>
-
-<para>
- But wait! Didn't the description say that <emphasis>all</emphasis> occurrences
- of <quote>foo</quote> should be replaced? Our current job will only take
- care of the first <quote>foo</quote> on each page. For global substitution,
- we'll need to add the <literal>g</literal> option:
-</para>
-
-<para>
- <screen>s/foo/bar/g</screen>
-</para>
-
-<para>
- Our complete filter now looks like this:
-</para>
-<para>
- <screen>FILTER: foo Replace all "foo" with "bar"
-s/foo/bar/g</screen>
-</para>
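-
-<para>
- To try the filter out, it still has to be enabled in an actions file.
- A sketch, assuming the filter was saved in a filter file that
- <application>Privoxy</application> loads, and using a placeholder domain:
-</para>
-
-<para>
- <screen>
-{ +filter{foo} }
-.example.com</screen>
-</para>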
-
-<para>
- Let's look at some real filters for more interesting examples. Here you see
- a filter that protects against some common annoyances that arise from JavaScript
- abuse. Let's look at its jobs one after the other:
-</para>
-
-
-<para>
- <screen>
-FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse
-
-# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm
-#
-s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg</screen>
-</para>
-
-<para>
- Following the header line and a comment, you see the job. Note that it uses
- <literal>|</literal> as the delimiter instead of <literal>/</literal>, because
- the pattern contains a forward slash, which would otherwise have to be escaped
- by a backslash (<literal>\</literal>).
-</para>
-
-<para>
- Now, let's examine the pattern: it starts with the text <literal><script.*</literal>
- enclosed in parentheses. Since the dot matches any character, and <literal>*</literal>
- means: <quote>Match an arbitrary number of the element left of myself</quote>, this
- matches <quote><script</quote>, followed by <emphasis>any</emphasis> text, i.e.
- it matches the whole page, from the start of the first <script> tag.
-</para>
-
-<para>
- That's more than we want, but the pattern continues: <literal>document\.referrer</literal>
- matches only the exact string <quote>document.referrer</quote>. The dot needed to
- be <emphasis>escaped</emphasis>, i.e. preceded by a backslash, to take away its
- special meaning as a joker, and make it just a regular dot. So far, the meaning is:
- Match from the start of the first <script> tag in the page, up to, and including,
- the text <quote>document.referrer</quote>, if <emphasis>both</emphasis> are present
- in the page (and appear in that order).
-</para>
-
-<para>
- But there's still more pattern to go. The next element, again enclosed in parentheses,
- is <literal>.*</script></literal>. You already know what <literal>.*</literal>
- means, so the whole pattern translates to: Match from the start of the first <script>
- tag in a page to the end of the last </script> tag, provided that the text
- <quote>document.referrer</quote> appears somewhere in between.
-</para>
-
-<para>
- This is still not the whole story, since we have ignored the options and the parentheses:
- The portions of the page matched by sub-patterns that are enclosed in parentheses, will be
- remembered and be available through the variables <literal>$1, $2, ...</literal> in
- the substitute. The <literal>U</literal> option switches to ungreedy matching, which means
- that the first <literal>.*</literal> in the pattern will only <quote>eat up</quote> all
- text in between <quote><script</quote> and the <emphasis>first</emphasis> occurrence
- of <quote>document.referrer</quote>, and that the second <literal>.*</literal> will
- only span the text up to the <emphasis>first</emphasis> <quote></script></quote>
- tag. Furthermore, the <literal>s</literal> option says that the match may span
- multiple lines in the page, and the <literal>g</literal> option again means that the
- substitution is global.
-</para>
-
-<para>
- So, to summarize, the pattern means: Match all scripts that contain the text
- <quote>document.referrer</quote>. Remember the parts of the script from
- (and including) the start tag up to (and excluding) the string
- <quote>document.referrer</quote> as <literal>$1</literal>, and the part following
- that string, up to and including the closing tag, as <literal>$2</literal>.
-</para>
-
-<para>
- Now the pattern is deciphered, but wasn't this about substituting things? So
- let's look at the substitute: <literal>$1"Not Your Business!"$2</literal> is
- easy to read: The text remembered as <literal>$1</literal>, followed by
- <literal>"Not Your Business!"</literal> (<emphasis>including</emphasis>
- the quotation marks!), followed by the text remembered as <literal>$2</literal>.
- This produces an exact copy of the original string, with the middle part
- (the <quote>document.referrer</quote>) replaced by <literal>"Not Your
- Business!"</literal>.
-</para>
-
-<para>
- The whole job now reads: Replace <quote>document.referrer</quote> by
- <literal>"Not Your Business!"</literal> wherever it appears inside a
- <script> tag. Note that this job won't break JavaScript syntax,
- since both the original and the replacement are syntactically valid
- string objects. The script just won't have access to the referrer
- information anymore.
-</para>
-
-<para>
- We'll show you two other jobs from the JavaScript taming department, but
- this time only point out the constructs of special interest:
-</para>
-
-<para>
- <screen>
-# The status bar is for displaying link targets, not pointless blahblah
-#
-s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig</screen>
-</para>
-
-<para>
- <literal>\s</literal> stands for whitespace characters (space, tab, newline,
- carriage return, form feed), so that <literal>\s*</literal> means: <quote>zero
- or more whitespace</quote>. The <literal>?</literal> in <literal>.*?</literal>
- makes this matching of arbitrary text ungreedy. (Note that the <literal>U</literal>
- option is not set). The <literal>['"]</literal> construct means: <quote>a single
- <emphasis>or</emphasis> a double quote</quote>. Finally, <literal>\1</literal> is
- a back-reference to the first parenthesis just like <literal>$1</literal> above,
- with the difference that in the <emphasis>pattern</emphasis>, a backslash indicates
- a back-reference, whereas in the <emphasis>substitute</emphasis>, it's the dollar.
-</para>
-
-<para>
- So what does this job do? It replaces assignments of single- or double-quoted
- strings to the <quote>window.status</quote> object with a dummy assignment
- (using a variable name that is hopefully odd enough not to conflict with
- real variables in scripts). Thus, it catches many cases where e.g. pointless
- descriptions are displayed in the status bar instead of the link target when
- you move your mouse over links.
-</para>
-
-<para>
- <screen>
-# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html
-#
-s/(<body [^>]*)onunload(.*>)/$1never$2/iU</screen>
-</para>
-
-<para>
- Including the
- <ulink url="http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/events.html#Events-eventgroupings-htmlevents">OnUnload
- event binding</ulink> in the HTML DOM was a <emphasis>CRIME</emphasis>.
- When I close a browser window, I want it to close and die. Basta.
- This job replaces the <quote>onunload</quote> attribute in
- <quote><body></quote> tags with the dummy word <literal>never</literal>.
- Note that the <literal>i</literal> option makes the pattern matching
- case-insensitive. Also note that ungreedy matching alone doesn't always guarantee
- a minimal match: In the first parenthesis, we had to use <literal>[^>]*</literal>
- instead of <literal>.*</literal> to prevent the match from exceeding the
- <body> tag if it doesn't contain <quote>OnUnload</quote>, but the page's
- content does.
-</para>
-
-<para>
- The last example is from the fun department:
-</para>
-
-<para>
- <screen>
-FILTER: fun Fun text replacements
-
-# Spice the daily news:
-#
-s/microsoft(?!\.com)/MicroSuck/ig</screen>
-</para>
-
-<para>
- Note the <literal>(?!\.com)</literal> part (a so-called negative lookahead)
- in the job's pattern, which means: Don't match, if the string
- <quote>.com</quote> appears directly following <quote>microsoft</quote>
- in the page. This prevents links to microsoft.com from being trashed, while
- still replacing the word everywhere else.
-</para>
-
-<para>
- <screen>
-# Buzzword Bingo (example for extended regex syntax)
-#
-s* industry[ -]leading \
-| cutting[ -]edge \
-| customer[ -]focused \
-| market[ -]driven \
-| award[ -]winning # Comments are OK, too! \
-| high[ -]performance \
-| solutions[ -]based \
-| unmatched \
-| unparalleled \
-| unrivalled \
-*<font color="red"><b>BINGO!</b></font> \
-*igx</screen>
-</para>
-
-<para>
- The <literal>x</literal> option in this job turns on extended syntax, and allows for
- e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting.
-</para>
-
-<para>
- You get the idea?
-</para>
-</sect2>
-
-<!-- ~~~~~~~~ New section Header ~~~~~~~~~ -->
-
-<sect2 id="predefined-filters"><title>The Pre-defined Filters</title>
-
-<!--
-
- Note each filter is also listed in the +filter action section above. Please
- keep these listings in sync.
-
--->