This file belongs in
ijbswa.sourceforge.net:/home/groups/i/ij/ijbswa/htdocs/
- $Id: user-manual.sgml,v 1.76 2002/04/16 04:25:51 hal9 Exp $
+ $Id: user-manual.sgml,v 1.82 2002/04/18 12:04:50 oes Exp $
Written by and Copyright (C) 2001 the SourceForge
Privoxy team. http://www.privoxy.org/
<artheader>
<title>Privoxy User Manual</title>
-<pubdate>$Id: user-manual.sgml,v 1.76 2002/04/16 04:25:51 hal9 Exp $</pubdate>
+<pubdate>$Id: user-manual.sgml,v 1.82 2002/04/18 12:04:50 oes Exp $</pubdate>
<authorgroup>
<author>
<title>New Features</title>
<para>
In addition to <application>Internet Junkbuster's</application> traditional
- feature of ad and banner blocking and cookie management,
+ features of ad and banner blocking and cookie management,
<application>Privoxy</application> provides new features<![%p-not-stable;[,
some of them currently under development]]>:
<anchor id="testing"/>
<!-- ~~~~~ New section ~~~~~ -->
<sect1 id="installation"><title>Installation</title>
+
<para>
<application>Privoxy</application> is available both in convenient pre-compiled
packages for a wide range of operating systems, and as raw source code.
For most users, we recommend using the packages, which can be downloaded from our
<ulink url="http://sourceforge.net/projects/ijbswa/">Privoxy Project Page</ulink>.
</para>
+
<para>
If you like to live on the bleeding edge and are not afraid of using
possibly unstable development versions, you can check out the up-to-the-minute
</para>
<!-- Include supported.sgml boilerplate -->
-&supported;
+ &supported;
<!-- end boilerplate -->
<!-- ~~~~~ New section ~~~~~ -->
<sect2 id="installation-packages"><title>Binary Packages</title>
+
<para>
- The packages can be downloaded from our <ulink
- url="http://sourceforge.net/projects/ijbswa/">Privoxy Project Page</ulink>.
+ Note: If you have a previous <application>Junkbuster</application> or
+ <application>Privoxy</application> installation on your system, you
+ will need to remove it first, unless the setup procedure does this
+ for you (see below for your platform).
</para>
<para>
- How to install them depends on your operating system:
+ In any case, <emphasis>be sure to back up your old configuration
+ if it is valuable to you.</emphasis> In that case, also see the
+ <link linkend="upgradersnote">note to upgraders</link>.
+</para>
+
+<para>
+ How to install the binary packages depends on your operating system:
</para>
<!-- ~~~~~ New section ~~~~~ -->
<sect3 id="installation-pack-rpm"><title>Redhat and SuSE RPMs</title>
<para>
- RPMs can be installed with <literal>rpm -i <name-of-rpm.rpm></literal>,
+ RPMs can be installed with <literal>rpm -Uvh <name-of-rpm.rpm></literal>,
and will use <filename>/etc/privoxy</filename> for configuration files.
</para>
<para>
- Note that if you have a Junkbuster RPM installed on your system, you
- need to remove it first, because the packages conflict.
+ Note that if you have a <application>Junkbuster</application> RPM installed
+ on your system, you need to remove it first, because the packages conflict.
+ If you do not, RPM will try to remove Junkbuster automatically before
+ installing <application>Privoxy</application>.
</para>
</sect3>
<!-- ~~~~~ New section ~~~~~ -->
-<sect3 id="installation-pack-bintgz"><title>Solaris, NetBSD, HP-UX</title>
-
+<sect3 id="installation-deb"><title>Debian</title>
<para>
- Create a new directory, <literal>cd</literal> to it, then unzip and
- untar the archive. For the most part, you'll have to figure out where
- things go. FIXME.
+ FIXME.
</para>
</sect3>
</para>
</sect3>
+<!-- ~~~~~ New section ~~~~~ -->
+<sect3 id="installation-pack-bintgz"><title>Solaris, NetBSD, FreeBSD, HP-UX</title>
+
+<para>
+ Create a new directory, <literal>cd</literal> to it, then unzip and
+ untar the archive. For the most part, you'll have to figure out where
+ things go. FIXME.
+</para>
+</sect3>
+
<!-- ~~~~~ New section ~~~~~ -->
<sect3 id="installation-os2"><title>OS/2</title>
<para>
- Just double-click the WarpIN self-installing archive, which will guide
- you through the installation process. A shadow of the
+ First, make sure that no previous installations of
+ <application>Junkbuster</application> and / or
+ <application>Privoxy</application> are left on your
+ system.
+</para>
+
+<para>
+ Then, just double-click the WarpIN self-installing archive, which will
+ guide you through the installation process. A shadow of the
<application>Privoxy</application> executable will be placed in your
startup folder so it will start automatically whenever OS/2 starts.
</para>
</sect3>
<!-- ~~~~~ New section ~~~~~ -->
-<sect3 id="installation-deb"><title>Debian</title>
+<sect3 id="installation-mac"><title>Mac OS X</title>
<para>
FIXME.
</para>
<!-- include buildsource.sgml boilerplate: -->
&buildsource;
<!-- end boilerplate -->
-
-<para>
- For more detailed instructions, on how to build Redhat and SuSE RPMs,
- Windows self-extracting installers etc, please consult the <ulink
- url="../developer-manual/newrelease.html">developer manual</ulink>.
-</para>
</sect2>
</sect1>
<para>
A <quote>filter file</quote> (typically <filename>default.filter</filename>)
is new with <application>Privoxy 2.9.x</application>, and provides some
- of the new sophisticaton (explained below). <filename>config</filename> is
+ of the new sophistication (explained below). <filename>config</filename> is
much the same as before.
</para>
<para>
The primary configuration file for cookie management, ad and banner
blocking, and many other aspects of <application>Privoxy</application>
configuration is <filename>default.action</filename>. It is strongly
- recommended to make oneself familiar with the new actions concept below
- before modifying that file.
+ recommended to become familiar with the new actions concept below
+ before modifying this file.
</para>
</listitem>
<listitem>
<para>
After doing this, flush your browser's disk and memory caches to force a
- re-reading of all pages and get rid of any ads that may be cached. You
+ re-reading of all pages and to get rid of any ads that may be cached. You
are now ready to start enjoying the benefits of using
<application>Privoxy</application>.
</para>
</para>
<para>
- Another feature where you will propably want to define exceptions for trusted
+ Another feature where you will probably want to define exceptions for trusted
sites is the popup-killing (through the <literal>+popup</literal> and
- <literal>+filter{popups}</literal> actions), because your favourite shopping,
+ <literal>+filter{popups}</literal> actions), because your favorite shopping,
banking, or leisure site may need popups.
</para>
try to force HTTP/1.0 compatibility. For Mozilla, look under <literal>Edit ->
Preferences -> Debug -> Networking</literal>.
Alternatively, set the <quote>+downgrade</quote> config option in
- <filename>default.action</filename> which will downgrade you brower's HTTP
+ <filename>default.action</filename> which will downgrade your browser's HTTP
requests from HTTP/1.1 to HTTP/1.0 before processing them.
</para>
If you encounter problems, try loading the page without
<application>Privoxy</application>. If that helps, enter the URL where
you have the problems into <ulink url="http://p.p/show-url-info">the browser
- based rule tracing utility</ulink>. Watch out which rules apply and why, and
+ based rule tracing utility</ulink>. See which rules apply and why, and
then try turning them off for that site one after the other, until the problem
is gone. When you have found the culprit, you might want to turn the rest on
again.
</para>
<para>
On startup, write the process ID to <emphasis>FILE</emphasis>. Delete the
- <emphasis>FILE</emphasis> on exit. Failiure to create or delete the
+ <emphasis>FILE</emphasis> on exit. Failure to create or delete the
<emphasis>FILE</emphasis> is non-fatal. If no <emphasis>FILE</emphasis>
option is given, no PID file will be used. Unix only.
</para>
<application>Privoxy</application>'s user interface can be reached through the special
URL <ulink url="http://config.privoxy.org/">http://config.privoxy.org/</ulink>
(shortcut: <ulink url="http://p.p/">http://p.p/</ulink>),
- which is a built-in page and works without internet access.
+ which is a built-in page and works without Internet access.
You will see the following section:
</para>
<para>
The main config file controls all aspects of <application>Privoxy</application>'s
- operation that are not location dependent (i.e. that apply invariantly no matter
- where in the web you are surfing).
+ operation that are not location dependent (i.e. they apply universally, no matter
+ where you may be surfing).
</para>
<para>
<application>Privoxy</application> can (and normally does) use a number of
- other files for addidtional configuration and logging.
+ other files for additional configuration and logging.
This section of the configuration file tells <application>Privoxy</application>
where to find those other files.
</para>
<para>
When development goes modular and multi-user, the blocker, filter, and
per-user config will be stored in subdirectories of <quote>confdir</quote>.
- For now, the configuration dir structure is flat, except for
+ For now, the configuration directory structure is flat, except for
<filename>confdir/templates</filename>, where the HTML templates for CGI
- output reside.
+ output reside (e.g. <application>Privoxy's</application> 404 error page).
</para>
</listitem>
</varlistentry>
<sect4><title>actionsfile</title>
-<variablelist>
- <varlistentry>
- <term>Specifies:</term>
- <listitem>
- <para>
- The actions file to use
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>Type of value:</term>
- <listitem>
- <para>File name, relative to <literal>confdir</literal></para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>Default value:</term>
- <listitem>
- <para>default.action (Unix) <emphasis>or</emphasis> default.action.txt (Windows)</para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>Effect if unset:</term>
- <listitem>
- <para>
- No action is taken at all. Simple neutral proxying.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>Notes:</term>
- <listitem>
- <para>
- There is no point in using <application>Privoxy</application> without
- an actions file.
- </para>
- </listitem>
- </varlistentry>
-</variablelist>
-</sect4>
-
-<sect4><title>actionsfile</title>
-
<variablelist>
<varlistentry>
<term>Specifies:</term>
<listitem>
<para>
There is no point in using <application>Privoxy</application> without
- an actions file. There are three diffrent actions files included in the
+ an actions file. There are three different actions files included in the
distribution, with varying degrees of aggressiveness:
<filename>default.action</filename>, <filename>intermediate.action</filename> and
<filename>advanced.action</filename>.
<term>Notes:</term>
<listitem>
<para>
- The windows version will additionally log to the console
+ The Windows version will additionally log to the console.
</para>
<para>
The logfile is where all logging and error messages are written. The level
<term>Effect if unset:</term>
<listitem>
<para>
- The whole trust mechansim is turned off.
+ The whole trust mechanism is turned off.
</para>
</listitem>
</varlistentry>
<term>Notes:</term>
<listitem>
<para>
- The trust mechansim is an experimental feature for building whitelists and should
+ The trust mechanism is an experimental feature for building white-lists and should
be used with care. It is <emphasis>NOT</emphasis> recommended for the casual user.
</para>
<para>
the effect that access to untrusted sites will be granted, if a link from a
trusted referrer was used.
The link target will then be added to the <quote>trustfile</quote>.
- Possible applications include limiting internet access for children.
+ Possible applications include limiting Internet access for children.
</para>
<para>
If you use the <literal>+</literal> operator in the trust file, it may grow considerably over time.
<!-- ~~~~~ New section ~~~~~ -->
<sect3>
-<title>Local Setup Documentation</title>
+<title>Local Set-up Documentation</title>
<para>
If you intend to operate <application>Privoxy</application> for more users
activated. (See <literal>trustfile</literal> above.)
</para>
<para>
- If you use the trust mechanism, it is a good idea to write up some online
+ If you use the trust mechanism, it is a good idea to write up some on-line
documentation about your trust policy and to specify the URL(s) here.
Use multiple times for multiple URLs.
</para>
<term>Specifies:</term>
<listitem>
<para>
- Keys that determine what information gets logged.
+ Key values that determine what information gets logged.
</para>
</listitem>
</varlistentry>
as it happens. <emphasis>1, 4096 and 8192 are highly recommended</emphasis>
so that you will notice when things go wrong. The other levels are probably
only of interest if you are hunting down a specific problem. They can produce
- a hell of output (especially 16).
+ a huge amount of output (especially 16).
</para>
<para>
The reporting of <emphasis>fatal</emphasis> errors (i.e. ones which crash
<varlistentry>
<term>Type of value:</term>
<listitem>
- <para>[<replaceable class="parameter">IP-Adddress</replaceable>]:<replaceable class="parameter">Port</replaceable></para>
+ <para>[<replaceable class="parameter">IP-Address</replaceable>]:<replaceable class="parameter">Port</replaceable></para>
</listitem>
</varlistentry>
<varlistentry>
If you leave out the IP address, <application>Privoxy</application> will
bind to all interfaces (addresses) on your machine and may become reachable
from the Internet. In that case, consider using access control lists (ACLs)
- (see <quote>Acls</quote> below), or a firewall.
+ (see <quote>ACLs</quote> below), or a firewall.
</para>
</listitem>
</varlistentry>
</para>
<para>
For the time being, access to the toggle feature can <emphasis>not</emphasis> be
- controlled separately by <quote>Acls</quote> or HTTP authentication,
+ controlled separately by <quote>ACLs</quote> or HTTP authentication,
so that everybody who can access <application>Privoxy</application> (see
- <quote>Acls</quote> and <literal>listen-address</literal> above) can
+ <quote>ACLs</quote> and <literal>listen-address</literal> above) can
toggle it for all users. So this option is <emphasis>not recommended</emphasis>
for multi-user environments with untrusted users.
</para>
<listitem>
<para>
For the time being, access to the editor can <emphasis>not</emphasis> be
- controlled separately by <quote>Acls</quote> or HTTP authentication,
+ controlled separately by <quote>ACLs</quote> or HTTP authentication,
so that everybody who can access <application>Privoxy</application> (see
- <quote>Acls</quote> and <literal>listen-address</literal> above) can
+ <quote>ACLs</quote> and <literal>listen-address</literal> above) can
modify its configuration for all users. So this option is <emphasis>not
recommended</emphasis> for multi-user environments with untrusted users.
</para>
</variablelist>
</sect4>
-<sect4><title>Acls: permit-access and deny-access</title>
+<sect4><title>ACLs: permit-access and deny-access</title>
<variablelist>
<varlistentry>
<term>Specifies:</term>
weaknesses.
</para>
<para>
- Multiple acl lines are OK.
- If any acls are specified, then the <application>Privoxy</application>
+ Multiple ACL lines are OK.
+ If any ACLs are specified, then the <application>Privoxy</application>
talks only to IP addresses that match at least one <literal>permit-access</literal> line
and don't match any subsequent <literal>deny-access</literal> line. In other words, the
last match wins, with the default being <literal>deny-access</literal>.
IP addresses, only the first one is used.
</para>
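<para>
To illustrate the <quote>last match wins</quote> rule, here is a minimal Python sketch
using the standard <literal>ipaddress</literal> module. It is not
<application>Privoxy</application>'s code: the ACL is modeled as a hypothetical list of
(action, source network) pairs, the optional destination address is ignored, and it
assumes that an empty ACL imposes no restriction of its own.
</para>

```python
from __future__ import annotations

import ipaddress


def is_permitted(client_ip: str, acl: list[tuple[str, str]]) -> bool:
    # Assumption: with no ACL lines at all, the ACLs restrict nothing
    # (listen-address still limits who can connect in the first place).
    if not acl:
        return True
    addr = ipaddress.ip_address(client_ip)
    verdict = False  # the default is deny-access
    for action, network in acl:
        # strict=False also accepts networks written with host bits set.
        if addr in ipaddress.ip_network(network, strict=False):
            verdict = action == "permit-access"  # the last match wins
    return verdict


# Hypothetical ACL: permit the whole LAN, except one subnet.
ACL = [
    ("permit-access", "192.168.0.0/16"),
    ("deny-access", "192.168.1.0/24"),
]
```

<para>
With this ACL, 192.168.2.5 is allowed, 192.168.1.7 is denied by the later
<literal>deny-access</literal> line, and 10.0.0.1 matches nothing and falls through
to the default deny.
</para>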
<para>
- Denying access to particular sites by acl may have undesired side effects
+ Denying access to particular sites by ACL may have undesired side effects
if the site in question is hosted on a machine which also hosts other sites.
</para>
</listitem>
<term>Examples:</term>
<listitem>
<para>
- Explicitly define the defauklt behaviour if no acl and
+ Explicitly define the default behavior if no ACL and
<literal>listen-address</literal> are set: <quote>localhost</quote>
is OK. The absence of a <replaceable class="parameter">dst_addr</replaceable> implies that
<emphasis>all</emphasis> destination addresses are OK:
through an anonymous public proxy (see e.g. <ulink
url="http://www.multiproxy.org/anon_list.htm">http://www.multiproxy.org/anon_list.htm</ulink>)
Or to use a caching proxy to speed up browsing. Or chaining to a parent
- proxy may be necessary because the mackine that <application>Privoxy</application>
- runs on has no direct internet access.
+ proxy may be necessary because the machine that <application>Privoxy</application>
+ runs on has no direct Internet access.
</para>
<para>
<term>Examples:</term>
<listitem>
<para>
- From the company example.com, direct connections are made to all <quote>internal</quote>
- domains, but everything outbound goes through their ISP's proxy by way example.com's
- corporate SOCKS 4A gateway to the Internet.
+ From the company example.com, direct connections are made to all
+ <quote>internal</quote> domains, but everything outbound goes through
+ their ISP's proxy by way of example.com's corporate SOCKS 4A gateway to
+ the Internet.
</para>
<para>
<screen>
</para>
<para>
- Now, you users can set their browser's proxy to use either
+ Now, your users can set their browser's proxy to use either
host-a or host-b and be able to browse the internal content
- on both isp-a or isp-b.
+ of both isp-a and isp-b.
</para>
<para>
<title>The Actions File</title>
<para>
- The <quote>default.action</quote> file (formerly
+ The actions file (<filename>default.action</filename>, formerly
<filename>actionsfile</filename> or <filename>ijb.action</filename>) is used
- to define what actions <application>Privoxy</application> takes, and thus
- determines how ad images, cookies and various other aspects of HTTP content
- and transactions are handled. These can be accepted or rejected for all
- sites, or just those sites you choose. See below for a complete list of
- actions.
+ to define what actions <application>Privoxy</application> takes for which
+ URLs, and thus determines how ad images, cookies and various other aspects
+ of HTTP content and transactions are handled on which sites (or even parts
+ thereof).
</para>
+
<para>
Anything you want can be blocked, including ads, banners, or just some obnoxious
URL that you would rather not see. Cookies can be accepted or rejected, or
- accepted only during the current browser session (i.e. not written to disk).
- Changes to <filename>default.action</filename> should be immediately visible
- to <application>Privoxy</application> without the need to restart.
+ accepted only during the current browser session (i.e. not written to disk),
+ content can be modified, JavaScripts tamed, user-tracking fooled, and much more.
+ See below for a complete list of available actions.
</para>
+<!-- ~~~~~ New section ~~~~~ -->
+<sect3>
+<title>Finding the Right Mix</title>
<para>
- Note that some sites may misbehave, or possibly not work at all with some
- actions. This may require some tinkering with the rules to get the most
- mileage of <application>Privoxy's</application> features, and still be
- able to see and enjoy just what you want to. There is no general rule of
- thumb on these things. There just are too many variables, and sites are
- always changing.
-
+ Note that some actions, like cookie suppression or script disabling, may
+ render sites unusable if they rely on these techniques to work properly.
+ Finding the right mix of actions is not easy and certainly a matter of personal
+ taste. In general, the more <quote>aggressive</quote> your default settings
+ (in the top section of the actions file) are, the more exceptions for
+ <quote>trusted</quote> sites you will have to make later. If, for example,
+ you want to kill popup windows by default, you'll have to make exceptions
+ to that rule for sites that you regularly use and that require popups for
+ actually useful content, like maybe your bank, favorite shop, or newspaper.
</para>
<para>
- The easiest way to edit the <quote>actions</quote> file is with a browser by
- loading <ulink url="http://config.privoxy.org/">http://config.privoxy.org/</ulink>
- (shortcut: <ulink url="http://p.p/">http://p.p/</ulink>), and then select
- <quote>Edit Actions List</quote>. A text editor can also be used.
+ We have tried to provide you with reasonable rules to start from in the
+ distribution actions file. But there is no general rule of thumb on these
+ things. There just are too many variables, and sites are constantly changing.
+ Sooner or later you will want to change the rules (and read this chapter).
</para>
+</sect3>
+<!-- ~~~~~ New section ~~~~~ -->
+<sect3>
+<title>How to Edit</title>
<para>
- To determine which actions apply to a request, the URL of the request is
- compared to all patterns in this file. Every time it matches, the list of
- applicable actions for the URL is incrementally updated. You can trace
- this process by visiting <ulink
- url="http://p.p/show-url-info">http://p.p/show-url-info</ulink>.
+ The easiest way to edit the <quote>actions</quote> file is with our
+ browser-based editor, which is available at <ulink
+ url="http://config.privoxy.org/edit-actions">http://config.privoxy.org/edit-actions</ulink>.
</para>
-
<para>
- There are four types of lines in this file: comments (begin with a
- <quote>#</quote> character), actions, aliases and patterns, all of which are
- explained below, as well as the configuration file syntax that
- <application>Privoxy</application> understands.
-
+ If you prefer plain text editing to GUIs, you can of course also directly edit the
+ <filename>default.action</filename> file.
</para>
+</sect3>
-<!-- ~~~~~ New section ~~~~~ -->
<sect3>
-<title>URL Domain and Path Syntax</title>
+<title>How Actions are Applied to URLs</title>
<para>
- Generally, a pattern has the form <domain>/<path>, where both the
- <domain> and <path> part are optional. If you only specify a
- domain part, the <quote>/</quote> can be left out:
+ The actions file is divided into sections. There are special sections,
+ like the alias sections, which will be discussed later. For now, let's
+ concentrate on regular sections: they have a heading line (often split
+ across multiple lines for readability) which consists of a list of actions,
+ separated by whitespace and enclosed in curly braces. Below that, there
+ is a list of URL patterns, each on a separate line.
</para>
<para>
- <emphasis>www.example.com</emphasis> - is a domain only pattern and will match any request to
- <quote>www.example.com</quote>.
+ To determine which actions apply to a request, the URL of the request is
+ compared to all patterns in this file. Every time it matches, the list of
+ applicable actions for the URL is incrementally updated, using the heading
+ of the section in which the pattern is located. If multiple matches for
+ the same URL set the same action differently, the last match wins.
</para>
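<para>
The following minimal Python sketch models the incremental update described above.
It is not <application>Privoxy</application>'s actual code: the section list and the
action names (<literal>block</literal>, <literal>kill-popups</literal>) are hypothetical,
and URL matching is reduced to an exact host comparison (with <literal>/</literal>
matching everything) so that the focus stays on how section headings are merged.
</para>

```python
# Hypothetical model of an actions file: each section pairs its heading
# (action name -> True for "+action", False for "-action") with a list of
# URL patterns. Sections are listed in file order.
SECTIONS = [
    ({"block": False, "kill-popups": True}, ["/"]),  # defaults for all URLs
    ({"block": True}, ["ads.example.com"]),          # block an ad server
    ({"kill-popups": False}, ["www.mybank.example"]),  # bank needs popups
]


def host_matches(pattern: str, host: str) -> bool:
    # Deliberately simplified matcher: "/" matches every URL, any other
    # pattern must equal the host name exactly.
    return pattern == "/" or pattern == host


def actions_for(host: str) -> dict:
    # Walk all sections in order; every matching section updates the result,
    # so a later match overrides an earlier setting of the same action.
    result: dict = {}
    for heading, patterns in SECTIONS:
        if any(host_matches(p, host) for p in patterns):
            result.update(heading)
    return result


print(actions_for("ads.example.com"))  # {'block': True, 'kill-popups': True}
```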
<para>
- <emphasis>www.example.com/</emphasis> - means exactly the same.
+ You can trace this process by visiting <ulink
+ url="http://config.privoxy.org/show-url-info">http://config.privoxy.org/show-url-info</ulink>.
</para>
<para>
- <emphasis>www.example.com/index.html</emphasis> - matches only the single
- document <quote>/index.html</quote> on <quote>www.example.com</quote>.
+ More detail on this is provided in the Appendix, <link linkend="ACTIONSANAT">
+ Anatomy of an Action</link>.
</para>
+</sect3>
+<!-- ~~~~~ New section ~~~~~ -->
+<sect3>
+<title>Patterns</title>
<para>
- <emphasis>/index.html</emphasis> - matches the document <quote>/index.html</quote>,
- regardless of the domain. So would match any page named <quote>index.html</quote>
- on any site.
+ Generally, a pattern has the form <literal><domain>/<path></literal>,
+ where both the <literal><domain></literal> and <literal><path></literal>
+ are optional. (This is why the pattern <literal>/</literal> matches all URLs.)
</para>
-<para>
- <emphasis>index.html</emphasis> - matches nothing, since it would be
- interpreted as a domain name and there is no top-level domain called
- <quote>.html</quote>.
-</para>
+<variablelist>
+ <varlistentry>
+ <term><literal>www.example.com/</literal></term>
+ <listitem>
+ <para>
+ is a domain-only pattern and will match any request to <literal>www.example.com</literal>,
+ regardless of which document on that server is requested.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>www.example.com</literal></term>
+ <listitem>
+ <para>
+ means exactly the same. For domain-only patterns, the trailing <literal>/</literal> may
+ be omitted.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>www.example.com/index.html</literal></term>
+ <listitem>
+ <para>
+ matches only the single document <literal>/index.html</literal>
+ on <literal>www.example.com</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>/index.html</literal></term>
+ <listitem>
+ <para>
+ matches the document <literal>/index.html</literal>, regardless of the domain,
+ i.e. on <emphasis>any</emphasis> web server.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>index.html</literal></term>
+ <listitem>
+ <para>
+ matches nothing, since it would be interpreted as a domain name and
+ there is no top-level domain called <literal>.html</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+</variablelist>
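<para>
To make the split into domain and path parts concrete, here is a small Python sketch.
The helper <literal>split_pattern</literal> is hypothetical and not taken from
<application>Privoxy</application>'s source; it simply cuts a pattern at its first
<literal>/</literal>, and an empty domain part stands for <quote>any host</quote>.
</para>

```python
from __future__ import annotations


def split_pattern(pattern: str) -> tuple[str, str]:
    # Everything before the first "/" is the domain part; the rest,
    # including that "/", is the path part. Either part may be empty.
    slash = pattern.find("/")
    if slash == -1:
        # Domain-only pattern written without the trailing "/".
        return pattern, ""
    return pattern[:slash], pattern[slash:]


for p in ["www.example.com/", "www.example.com",
          "www.example.com/index.html", "/index.html", "index.html"]:
    print(repr(p), "->", split_pattern(p))
```

<para>
Note that <literal>index.html</literal> ends up entirely in the domain part, which is
why it matches nothing.
</para>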
+
+<sect4><title>The Domain Pattern</title>
<para>
The matching of the domain part offers some flexible options: if the
For example:
</para>
-<para>
- <emphasis>.example.com</emphasis> - matches any domain or sub-domain that
- <emphasis>ENDS</emphasis> in <quote>.example.com</quote>.
-</para>
-
-<para>
- <emphasis>www.</emphasis> - matches any domain that <emphasis>STARTS</emphasis> with
- <quote>www</quote>.
-</para>
+<variablelist>
+ <varlistentry>
+ <term><literal>.example.com</literal></term>
+ <listitem>
+ <para>
+ matches any domain that <emphasis>ENDS</emphasis> in
+ <literal>.example.com</literal>
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>www.</literal></term>
+ <listitem>
+ <para>
+ matches any domain that <emphasis>STARTS</emphasis> with
+ <literal>www.</literal>
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>.example.</literal></term>
+ <listitem>
+ <para>
+ matches any domain that <emphasis>CONTAINS</emphasis> <literal>.example.</literal>
+ (Strictly speaking, it matches any FQDN that contains <literal>example</literal> as a domain component.)
+ </para>
+ </listitem>
+ </varlistentry>
+</variablelist>
<para>
Additionally, there are wild-cards that you can use in the domain names
themselves. They work much like shell wild-cards: <quote>*</quote>
stands for zero or more arbitrary characters, <quote>?</quote> stands for
- any single character. And you can define character classes in square
- brackets and they can be freely mixed:
+ any single character, and you can define character classes in square
+ brackets. All of these can be freely mixed:
</para>
-<para>
- <emphasis>ad*.example.com</emphasis> - matches <quote>adserver.example.com</quote>,
- <quote>ads.example.com</quote>, etc but not <quote>sfads.example.com</quote>.
-</para>
+<variablelist>
+ <varlistentry>
+ <term><literal>ad*.example.com</literal></term>
+ <listitem>
+ <para>
+ matches <quote>adserver.example.com</quote>,
+ <quote>ads.example.com</quote>, etc., but not <quote>sfads.example.com</quote>
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>*ad*.example.com</literal></term>
+ <listitem>
+ <para>
+ matches all of the above, and then some.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>.?pix.com</literal></term>
+ <listitem>
+ <para>
+ matches <literal>www.ipix.com</literal>,
+ <literal>pictures.epix.com</literal>, <literal>a.b.c.d.e.upix.com</literal> etc.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>www[1-9a-ez].example.c*</literal></term>
+ <listitem>
+ <para>
+ matches <literal>www1.example.com</literal>,
+ <literal>www4.example.cc</literal>, <literal>wwwd.example.cy</literal>,
+ <literal>wwwz.example.com</literal> etc., but <emphasis>not</emphasis>
+ <literal>wwww.example.com</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+</variablelist>
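<para>
The domain rules above can be approximated in a few lines of Python. This is a sketch
under stated assumptions, not <application>Privoxy</application>'s implementation: it
assumes that wild-cards apply within a single domain label (they never match a dot),
treats a leading or trailing dot as <quote>more labels may appear here</quote>, and uses
Python's <literal>fnmatch</literal> module, whose <literal>[...]</literal> classes are
close to, but not guaranteed identical to, <application>Privoxy</application>'s.
</para>

```python
from fnmatch import fnmatchcase


def domain_matches(pattern: str, host: str) -> bool:
    # A leading "." lets extra labels precede the pattern (".example.com"),
    # a trailing "." lets extra labels follow it ("www.").
    open_left = pattern.startswith(".")
    open_right = pattern.endswith(".")
    pat = pattern.lower().strip(".").split(".")
    labels = host.lower().split(".")
    n = len(pat)
    for start in range(len(labels) - n + 1):
        if start > 0 and not open_left:
            break  # anchored at the left: only start == 0 is allowed
        if start + n < len(labels) and not open_right:
            continue  # anchored at the right: must end at the last label
        # Shell-style wild-cards (*, ?, [...]) are applied label by label.
        window = labels[start:start + n]
        if all(fnmatchcase(label, p) for label, p in zip(window, pat)):
            return True
    return False
```

<para>
For example, <literal>domain_matches(".?pix.com", "a.b.c.d.e.upix.com")</literal> is
true, while <literal>domain_matches("ad*.example.com", "sfads.example.com")</literal>
is false, as in the list above.
</para>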
-<para>
- <emphasis>*ad*.example.com</emphasis> - matches all of the above, and then some.
-</para>
+</sect4>
-<para>
- <emphasis>.?pix.com</emphasis> - matches <quote>www.ipix.com</quote>,
- <quote>pictures.epix.com</quote>, <quote>a.b.c.d.e.upix.com</quote>, etc.
-</para>
+<sect4><title>The Path Pattern</title>
<para>
- <emphasis>www[1-9a-ez].example.com</emphasis> - matches <quote>www1.example.com</quote>,
- <quote>www4.example.com</quote>, <quote>wwwd.example.com</quote>,
- <quote>wwwz.example.com</quote>, etc., but <emphasis>not</emphasis>
- <quote>wwww.example.com</quote>.
+ <application>Privoxy</application> uses Perl compatible regular expressions
+ (through the <ulink url="http://www.pcre.org/">PCRE</ulink> library) for
+ matching the path.
</para>
<para>
- If <application>Privoxy</application> was compiled with
- <quote>pcre</quote> support (the default), Perl compatible regular expressions
- can be used. These are more flexible and powerful than other types
- of <quote>regular expressions</quote>. See the <filename>pcre/docs/</filename> directory or <quote>man
- perlre</quote> (also available on <ulink
- url="http://www.perldoc.com/perl5.6/pod/perlre.html">http://www.perldoc.com/perl5.6/pod/perlre.html</ulink>)
- for details. A brief discussion of regular expressions is in the
- <link linkend="regex">Appendix</link>. For instance:
+ There is an <link linkend="regex">Appendix</link> with a brief introduction to regular
+ expressions, and full (very technical) documentation on PCRE regex syntax is available on-line
+ at <ulink url="http://www.pcre.org/man.txt">http://www.pcre.org/man.txt</ulink>.
+ You might also find the Perl man page on regular expressions (<literal>man perlre</literal>)
+ useful; it is available on-line at <ulink
+ url="http://www.perldoc.com/perl5.6/pod/perlre.html">http://www.perldoc.com/perl5.6/pod/perlre.html</ulink>.
</para>
<para>
- <emphasis>/.*/advert[0-9]+\.jpe?g</emphasis> - would match a URL from any
- domain, with any path that includes <quote>advert</quote> followed
- immediately by one or more digits, then a <quote>.</quote> and ending in
- either <quote>jpeg</quote> or <quote>jpg</quote>. So we match
- <quote>example.com/ads/advert2.jpg</quote>, and
- <quote>www.example.com/ads/banners/advert39.jpeg</quote>, but not
- <quote>www.example.com/ads/banners/advert39.gif</quote> (no gifs in the
- example pattern).
+ Note that the path pattern is automatically left-anchored at the <quote>/</quote>,
+ i.e. it matches as if it started with a <quote>^</quote>.
</para>
<para>
- Please note that matching in the path is case
+ Please also note that matching in the path is case
<emphasis>INSENSITIVE</emphasis> by default, but you can switch to case
sensitive at any point in the pattern by using the
<quote>(?-i)</quote> switch:
-</para>
-
-<para>
- <emphasis>www.example.com/(?-i)PaTtErN.*</emphasis> - will match only
- documents whose path starts with <quote>PaTtErN</quote> in
+ <literal>www.example.com/(?-i)PaTtErN.*</literal> will match only
+ documents whose path starts with <literal>PaTtErN</literal> in
<emphasis>exactly</emphasis> this capitalization.
</para>
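<para>
A rough Python equivalent of these path rules is sketched below. Python's
<literal>re</literal> module is not PCRE, but for simple patterns like these it behaves
the same way. Two assumptions: the implicit left anchor is modeled with
<literal>re.match</literal>, which anchors at the start of the path but not at its end,
and since Python only accepts the scoped form <literal>(?-i:...)</literal>, the sketch
rewrites a bare <literal>(?-i)</literal> into that form. The helper names are hypothetical.
</para>

```python
import re


def compile_path_pattern(path: str) -> re.Pattern:
    # Matching is case-insensitive by default. Python's re module has no
    # bare "(?-i)" switch, so turn "head(?-i)tail" into the scoped group
    # "head(?-i:tail)", which makes only the tail case-sensitive.
    head, sep, tail = path.partition("(?-i)")
    if sep:
        path = head + "(?-i:" + tail + ")"
    return re.compile(path, re.IGNORECASE)


def path_matches(path_pattern: str, url_path: str) -> bool:
    # re.match anchors at the start of url_path only, mirroring the
    # implicit leading "^"; the end of the path is not anchored.
    return compile_path_pattern(path_pattern).match(url_path) is not None
```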
+</sect4>
</sect3>
</simplelist>
<simplelist>
<member>
- <emphasis>no-poups</emphasis>: Kill all popups in JS and HTML
+ <emphasis>content-cookies</emphasis>: Kill cookies that come in the HTML or JS content
</member>
</simplelist>
<simplelist>
<member>
- <emphasis>frameset-borders</emphasis>: Give frames a border
+ <emphasis>popups</emphasis>: Kill all popups in JS and HTML
+ </member>
+ </simplelist>
+ <simplelist>
+ <member>
+ <emphasis>frameset-borders</emphasis>: Give frames a border and make them resizable
</member>
</simplelist>
<simplelist>
</simplelist>
<simplelist>
<member>
- <emphasis>no-refresh</emphasis>: Automatic refresh sucks on auto-dialup lines
+ <emphasis>refresh-tags</emphasis>: Kill automatic refresh tags (for dial-on-demand setups)
</member>
</simplelist>
<simplelist>
</simplelist>
<simplelist>
<member>
- <emphasis>nimda</emphasis>: Remove (virus) Nimda code.
+ <emphasis>nimda</emphasis>: Remove Nimda (virus) code.
+ </member>
+ </simplelist>
+ <simplelist>
+ <member>
+ <emphasis>banners-by-size</emphasis>: Kill banners by size (<emphasis>very</emphasis> efficient!)
</member>
</simplelist>
<simplelist>
<member>
- <emphasis>banners-by-size</emphasis>: Kill banners by size
+ <emphasis>shockwave-flash</emphasis>: Kill embedded Shockwave Flash objects
</member>
</simplelist>
<simplelist>
</simplelist>
</blockquote>
+
<para>
Note: Filtering requires buffering the page content, which may appear to slow down
page rendering since nothing is displayed until all content has passed
<listitem>
<para> Decides what to do with URLs that end up tagged with <quote>{+block
- +image}</quote>, e.g an advertizement. There are five options.
+ +image}</quote>, e.g. an advertisement. There are four options.
<quote>-image-blocker</quote> will send a HTML <quote>blocked</quote> page,
usually resulting in a <quote>broken image</quote> icon.
<!-- <quote>+image-blocker{logo}</quote> will send a -->
image. And finally, <quote>+image-blocker{http://xyz.com}</quote> will send a
HTTP temporary redirect to the specified image. This has the advantage of the
-icon being being cached by the browser, which will speed up the display.
+icon being cached by the browser, which will speed up the display.
-<quote>+image-blocker{pattern}</quote> will send a checkboard type pattern
+<quote>+image-blocker{pattern}</quote> will send a checkerboard type pattern:
<!-- , -->
<!-- which scales better than the logo (which can get blocky if the browser -->
<!-- enlarges it too much). -->
<para>
Turn on page filtering according to rules in the defined sections
- of <filename>refilterfile</filename>, and make one exception for
- sourceforge:
+ of <filename>default.filter</filename>, and make one exception for
+ SourceForge:
</para>
<para>
To save them, right-click the link and choose <quote>Add to Favorites</quote>
(IE) or <quote>Add Bookmark</quote> (Netscape). You will get a warning that
the bookmark <quote>may not be safe</quote> - just click OK. Then you can run the
- Bookmarklet directly from your favourites/bookmarks. For even faster access,
+ Bookmarklet directly from your favorites/bookmarks. For even faster access,
you can put them on the <quote>Links</quote> bar (IE) or the <quote>Personal
Toolbar</quote> (Netscape), and run them with a single click.
</para>
easy to understand what is happening. And sometimes we need to be able to
<emphasis>see</emphasis> just what <application>Privoxy</application> is
doing. Especially, if something <application>Privoxy</application> is doing
- is causing us a problem inadvertantly. It can be a little daunting to look at
+ is causing us a problem inadvertently. It can be a little daunting to look at
the actions and filters files themselves, since they tend to be filled with
<quote>regular expressions</quote> whose consequences are not always
so obvious. <application>Privoxy</application> provides the
how the current configuration will handle it. This will not
help with filtering effects from the <filename>default.filter</filename> file! It
also will not tell you about any other URLs that may be embedded within the
- URL you are testing. For instance, images such as ads are expressed as URLs
+ URL you are testing (e.g. a web page). For instance, images such as ads are expressed as URLs
within the raw page source of HTML pages. So you will only get info for the
actual URL that is pasted into the prompt area -- not any sub-URLs. If you
want to know about embedded URLs like ads, you will have to dig those out of
These are the default actions we have enabled. But we can define additional
actions that would be exceptions to these general rules, and then list
specific URLs that these exceptions would apply to. Last match wins.
- Just below this then are two explict matches for <quote>.google.com</quote>.
+ Just below this then are two explicit matches for <quote>.google.com</quote>.
The first is negating our various cookie blocking actions (i.e. we will allow
cookies here). The second is allowing <quote>fast-redirects</quote>. Note
that there is a leading dot here -- <quote>.google.com</quote>. This will
<para>
-And now we pull it altogether in the bottom section and summarize how
+And now we pull it all together in the bottom section and summarize how
- <application>Privoxy</application> is appying all its <quote>actions</quote>
+ <application>Privoxy</application> is applying all its <quote>actions</quote>
to <quote>google.com</quote>:
</para>
<para>
Ooops, the <quote>/adsl/</quote> is matching <quote>/ads</quote>! But
we did not want this at all! Now we see why we get the blank page. We could
- now add a new action below this that explictly does <emphasis>not</emphasis>
+ now add a new action below this that explicitly does <emphasis>not</emphasis>
block (-block) pages with <quote>adsl</quote>. There are various ways to
handle such exceptions. Example:
</para>
Temple Place - Suite 330, Boston, MA 02111-1307, USA.
$Log: user-manual.sgml,v $
+ Revision 1.82 2002/04/18 12:04:50 oes
+ Cosmetics
+
+ Revision 1.81 2002/04/18 11:50:24 oes
+ Extended Install section - needs fixing by packagers
+
+ Revision 1.80 2002/04/18 10:45:19 oes
+ Moved text to buildsource.sgml, renamed some filters, details
+
+ Revision 1.79 2002/04/18 03:18:06 hal9
+ Spellcheck, and minor touchups.
+
+ Revision 1.78 2002/04/17 18:04:16 oes
+ Proofreading part 2
+
+ Revision 1.77 2002/04/17 13:51:23 oes
+ Proofreading, part one
+
Revision 1.76 2002/04/16 04:25:51 hal9
-Added 'Note to Upgraders' and re-ordered the 'Quickstart' section.
-Note about proxy may need requests to re-read config files.