X-Git-Url: http://www.privoxy.org/gitweb/?a=blobdiff_plain;f=doc%2Fsource%2Fuser-manual.sgml;h=6d9c5d6c26ebb2e66447c3c9933b26af94eef35f;hb=2e8c7e4321104708859ad7bf3e5697c0897778c5;hp=408a61281a326995a3692f6833dd2e2d47040a2f;hpb=0428133610c525457cb16f7ac6a54203a2743d6c;p=privoxy.git
diff --git a/doc/source/user-manual.sgml b/doc/source/user-manual.sgml
index 408a6128..6d9c5d6c 100644
--- a/doc/source/user-manual.sgml
+++ b/doc/source/user-manual.sgml
@@ -9,9 +9,11 @@
+
-
+
+
@@ -34,9 +36,9 @@
This file belongs into
ijbswa.sourceforge.net:/home/groups/i/ij/ijbswa/htdocs/
- $Id: user-manual.sgml,v 2.134 2011/08/18 11:45:02 fabiankeil Exp $
+ $Id: user-manual.sgml,v 2.221 2017/05/20 09:27:54 fabiankeil Exp $
- Copyright (C) 2001-2011 Privoxy Developers http://www.privoxy.org/
+ Copyright (C) 2001-2017 Privoxy Developers https://www.privoxy.org/
See LICENSE.
========================================================================
@@ -55,12 +57,12 @@
- Copyright &my-copy; 2001-2011 by
- Privoxy Developers
+ Copyright &my-copy; 2001-2017 by
+ Privoxy Developers
-$Id: user-manual.sgml,v 2.134 2011/08/18 11:45:02 fabiankeil Exp $
+$Id: user-manual.sgml,v 2.221 2017/05/20 09:27:54 fabiankeil Exp $
@@ -99,14 +101,11 @@ Hal.
You can find the latest version of the Privoxy User Manual at http://www.privoxy.org/user-manual/ .
+ url="https://www.privoxy.org/user-manual/">https://www.privoxy.org/user-manual/.
Please see the Contact section on how to
contact the developers.
-
-
-
@@ -115,7 +114,7 @@ Hal.
Introduction
This documentation is included with the current &p-status; version of
- Privoxy , v.&p-version;Privoxy, &p-version;Privoxy is available both in convenient pre-compiled
packages for a wide range of operating systems, and as raw source code.
For most users, we recommend using the packages, which can be downloaded from our
- Privoxy Project
+ Privoxy Project
Page .
@@ -180,36 +179,6 @@ How to install the binary packages depends on your operating system:
-
-Red Hat and Fedora RPMs
-
-
- RPMs can be installed with rpm -Uvh privoxy-&p-version;-1.rpm ,
- and will use /etc/privoxy for the location
- of configuration files.
-
-
-
- Note that on Red Hat, Privoxy will
- not be automatically started on system boot. You will
- need to enable that using chkconfig ,
- ntsysv , or similar methods.
-
-
-
- If you have problems with failed dependencies, try rebuilding the SRC RPM:
- rpm --rebuild privoxy-&p-version;-1.src.rpm . This
- will use your locally installed libraries and RPM version.
-
-
-
- Also note that if you have a Junkbuster RPM installed
- on your system, you need to remove it first, because the packages conflict.
- Otherwise, RPM will try to remove Junkbuster
- automatically if found, before installing Privoxy .
-
-
-
Debian and Ubuntu
@@ -262,16 +231,6 @@ How to install the binary packages depends on your operating system:
-
-Solaris
-
-
- Create a new directory, cd to it, then unzip and
- untar the archive. For the most part, you'll have to figure out where
- things go.
-
-
-
OS/2
@@ -301,72 +260,83 @@ How to install the binary packages depends on your operating system:
Mac OS X
- Unzip the downloaded file (you can either double-click on the zip file
- icon from the Finder, or from the desktop if you downloaded it there).
- Then, double-click on the package installer icon and follow the
- installation process.
+ Installation instructions for the OS X platform depend upon whether
+ you downloaded a ready-built installation package (.pkg or .mpkg) or have
+ downloaded the source code.
+
+
+Installation from ready-built package
- The privoxy service will automatically start after a successful
- installation (in addition to every time your computer starts up). To
- prevent the privoxy service from automatically starting when your
- computer starts up, remove or rename the folder named
- /Library/StartupItems/Privoxy .
+ The downloaded file will either be a .pkg (for OS X 10.5 upwards) or a bzipped
+ .mpkg file (for OS X 10.4). The former can be double-clicked as is and the
+ installation will start; double-clicking the latter will unzip the .mpkg file
+ which can then be double-clicked to commence the installation.
- To manually start or stop the privoxy service, use the Privoxy Utility
- for Mac OS X. This application controls the privoxy service (e.g.
- starting and stopping the service as well as uninstalling the software).
+ The privoxy service will automatically start after a successful installation
+ (and thereafter every time your computer starts up); however, you will need to
+ configure your web browser(s) to use it. To do so, configure them to use a
+ proxy for HTTP and HTTPS at the address 127.0.0.1:8118.
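The same proxy setting can also be applied to a script instead of a browser. The following is only a minimal sketch in Python, assuming Privoxy listens on the address given above; the helper names `privoxy_proxies` and `build_privoxy_opener` are made up for this illustration and are not part of Privoxy.

```python
import urllib.request

# Address taken from the text above; adjust it if your setup differs.
PRIVOXY_ADDRESS = "127.0.0.1:8118"


def privoxy_proxies(address=PRIVOXY_ADDRESS):
    """Return a proxy mapping that sends HTTP and HTTPS through the proxy."""
    proxy_url = f"http://{address}"
    return {"http": proxy_url, "https": proxy_url}


def build_privoxy_opener(address=PRIVOXY_ADDRESS):
    """Build a urllib opener whose requests are routed through the proxy."""
    handler = urllib.request.ProxyHandler(privoxy_proxies(address))
    return urllib.request.build_opener(handler)
```

If the service is running, a request such as `build_privoxy_opener().open("http://config.privoxy.org/")` should be answered by Privoxy itself; if it is not running, the call is expected to fail with a connection error.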
+
+
+ To prevent the privoxy service from automatically starting when your computer
+ starts up, remove or rename the file /Library/LaunchDaemons/org.ijbswa.privoxy.plist
+ (on OS X 10.5 and higher) or the folder named
+ /Library/StartupItems/Privoxy (on OS X 10.4 'Tiger').
-
-
-
-AmigaOS
- Copy and then unpack the lha archive to a suitable location.
- All necessary files will be installed into Privoxy
- directory, including all configuration and log files. To uninstall, just
- remove this directory.
+ To manually start or stop the privoxy service, use the scripts startPrivoxy.sh
+ and stopPrivoxy.sh supplied in /Applications/Privoxy. They must be run from an
+ administrator account, using sudo.
+
+
+ To uninstall, run /Applications/Privoxy/uninstall.command as sudo from an
+ administrator account.
-
-
-FreeBSD
-
+
+Installation from source
- Privoxy is part of FreeBSD's Ports Collection, you can build and install
- it with cd /usr/ports/www/privoxy; make install clean .
+ To build and install the Privoxy source code on OS X you will need to obtain
+ the macsetup module from the Privoxy Sourceforge CVS repository (refer to
+ Sourceforge help for details of how to set up a CVS client to have read-only
+ access to the repository). This module contains scripts that leverage the usual
+ open-source tools (available as part of Apple's free-of-charge Xcode
+ distribution or via the usual open-source software package managers for OS X,
+ such as MacPorts, Homebrew or Fink) to build and then install the privoxy binary
+ and associated files. The macsetup module's README file contains complete
+ instructions for its use.
- If you don't use the ports, you can fetch and install
- the package with pkg_add -r privoxy .
+ The privoxy service will automatically start after a successful installation
+ (and thereafter every time your computer starts up); however, you will need to
+ configure your web browser(s) to use it. To do so, configure them to use a
+ proxy for HTTP and HTTPS at the address 127.0.0.1:8118.
- The port skeleton and the package can also be downloaded from the
- File Release
- Page , but there's no reason to use them unless you're interested in the
- beta releases which are only available there.
+ To prevent the privoxy service from automatically starting when your computer
+ starts up, remove or rename the file /Library/LaunchDaemons/org.ijbswa.privoxy.plist
+ (on OS X 10.5 and higher) or the folder named
+ /Library/StartupItems/Privoxy (on OS X 10.4 'Tiger').
-
-
-
-Gentoo
- Gentoo source packages (Ebuilds) for Privoxy are
- contained in the Gentoo Portage Tree (they are not on the download page,
- but there is a Gentoo section, where you can see when a new
- Privoxy Version is added to the Portage Tree).
+ To manually start or stop the privoxy service, use the Privoxy Utility
+ for Mac OS X (also part of the macsetup module). This application can start
+ and stop the privoxy service and display its log and configuration files.
- Before installing Privoxy under Gentoo just do
- first emerge --sync to get the latest changes from the
- Portage tree. With emerge privoxy you install the latest
- version.
+ To uninstall, run the macsetup module's uninstall.sh as sudo from an
+ administrator account.
+
+
+
+FreeBSD
+
- Configuration files are in /etc/privoxy , the
- documentation is in /usr/share/doc/privoxy-&p-version;
- and the Log directory is in /var/log/privoxy .
+ Privoxy is part of FreeBSD's Ports Collection; you can build and install
+ it with cd /usr/ports/www/privoxy; make install clean .
@@ -378,14 +348,14 @@ How to install the binary packages depends on your operating system:
The most convenient way to obtain the Privoxy sources
is to download the source tarball from our
- project download
+ project download
page .
If you like to live on the bleeding edge and are not afraid of using
possibly unstable development versions, you can check out the up-to-the-minute
- version directly from the
+ version directly from the
CVS repository .
Keeping your Installation Up-to-Date
-
- As user feedback comes in and development continues, we will make updated versions
- of both the main actions file (as a separate
- package ) and the software itself (including the actions file) available for
- download.
-
If you wish to receive an email notification whenever we release updates of
Privoxy or the actions file, subscribe
- to our announce mailing list , ijbswa-announce@lists.sourceforge.net.
+ url="https://lists.privoxy.org/mailman/listinfo/privoxy-announce">subscribe
+ to our announce mailing list , privoxy-announce@lists.privoxy.org.
@@ -436,674 +399,154 @@ How to install the binary packages depends on your operating system:
What's New in this Release
+
+&changelog;
+
+
+
+
+Note to Upgraders
+
- Privoxy 3.0.17 is a stable release.
- The changes since 3.0.16 stable are:
+ A quick list of things to be aware of before upgrading from earlier
+ versions of Privoxy :
-
-
- Fixed last-chunk-detection for responses where the content was small
- enough to be read with the body, causing Privoxy to wait for the
- end of the content until the server closed the connection or the
- request timed out. Reported by "Karsten" in #3028326.
-
-
-
-
- Responses with status code 204 weren't properly detected as body-less
- like RFC2616 mandates. Like the previous bug, this caused Privoxy to
- wait for the end of the content until the server closed the connection
- or the request timed out. Fixes #3022042 and #3025553, reported by a
- user with no visible name. Most likely also fixes a bunch of other
- AJAX-related problem reports that got closed in the past due to
- insufficient information and lack of feedback.
-
-
-
-
- Fixed an ACL bug that made it impossible to build a blacklist.
- Usually the ACL directives are used in a whitelist, which worked
- as expected, but blacklisting is still useful for public proxies
- where one only needs to deny known abusers access.
-
-
-
-
- Added LOG_LEVEL_RECEIVED to log the not-yet-parsed data read from the
- network. This should make debugging various parsing issues a lot easier.
-
-
-
-
- The IPv6 code is enabled by default on Windows versions that support it.
- Patch submitted by oCameLo in #2942729.
-
-
-
-
- In mingw32 versions, the user.filter file is reachable through the
- GUI, just like default.filter is. Feature request 3040263.
-
+
+
+
+ The recommended way to upgrade &my-app; is to back up your old
+ configuration files, install the new ones, verify that &my-app;
+ is working correctly and finally merge back your changes using
+ diff and maybe patch .
+
+
+ There are a number of new features in each &my-app; release and
+ most of them have to be explicitly enabled in the configuration
+ files. Old configuration files obviously don't do that and due
+ to syntax changes using old configuration files with a new
+ &my-app; isn't always possible anyway.
+
+
+
+
+ Note that some installers remove earlier versions completely,
+ including configuration files, therefore you should really save
+ any important configuration files!
+
+
+
+
+ On the other hand, other installers don't overwrite existing configuration
+ files, thinking you will want to do that yourself.
+
+
+
+
+ In the default configuration only fatal errors are logged now.
+ You can change that in the debug section
+ of the configuration file. You may also want to enable more verbose
+ logging until you have verified that the new &my-app; version is working
+ as expected.
+
+
+
+
+
+ Three other config file settings are now off by default:
+ enable-remote-toggle,
+ enable-remote-http-toggle,
+ and enable-edit-actions.
+ If you use or want these, you will need to explicitly enable them, and
+ be aware of the security issues involved.
+
+
+
+
+
+
+
-
-
-Note to Upgraders
-
-
- A quick list of things to be aware of before upgrading from earlier
- versions of Privoxy :
-
-
+Quickstart to Using Privoxy
- The recommended way to upgrade &my-app; is to backup your old
- configuration files, install the new ones, verify that &my-app;
- is working correctly and finally merge back your changes using
- diff and maybe patch .
-
-
- There are a number of new features in each &my-app; release and
- most of them have to be explicitly enabled in the configuration
- files. Old configuration files obviously don't do that and due
- to syntax changes using old configuration files with a new
- &my-app; isn't always possible anyway.
-
+ Install Privoxy . See the Installation Section below for platform specific
+ information.
+
-
-
- Note that some installers remove earlier versions completely,
- including configuration files, therefore you should really save
- any important configuration files!
-
-
-
-
- On the other hand, other installers don't overwrite existing configuration
- files, thinking you will want to do that yourself.
-
-
-
-
- standard.action has been merged into
- the default.action file.
-
-
-
-
- In the default configuration only fatal errors are logged now.
- You can change that in the debug section
- of the configuration file. You may also want to enable more verbose
- logging until you verified that the new &my-app; version is working
- as expected.
-
-
-
-
-
- Three other config file settings are now off by default:
- enable-remote-toggle,
- enable-remote-http-toggle,
- and enable-edit-actions.
- If you use or want these, you will need to explicitly enable them, and
- be aware of the security issues involved.
-
-
-
-
-
-
-
-
-
-
-
-
-
-Quickstart to Using Privoxy
-
-
-
-
-
- Install Privoxy . See the Installation Section below for platform specific
- information.
-
-
-
+
Advanced users and those who want to offer Privoxy
@@ -1178,18 +621,6 @@ How to install the binary packages depends on your operating system:
-
-
Please see the section Contacting the
@@ -1617,39 +1048,40 @@ How to install the binary packages depends on your operating system:
directory. Except on Win32 where it will try config.txt .
-
-Red Hat and Fedora
+
+Debian
- A default Red Hat installation may not start &my-app; upon boot. It will use
- the file /etc/privoxy/config as its main configuration
+ We use a script. Note that Debian typically starts &my-app; upon booting by
+ default. It will use the file
+ /etc/privoxy/config as its main configuration
file.
- # /etc/rc.d/init.d/privoxy start
+ # /etc/init.d/privoxy start
+
+
+
+FreeBSD and ElectroBSD
- Or ...
+ To start Privoxy upon booting, add
+ "privoxy_enable='YES'" to /etc/rc.conf .
+ Privoxy will use
+ /usr/local/etc/privoxy/config as its main
+ configuration file.
-
- # service privoxy start
-
+ If you installed Privoxy into a jail, the
+ paths above are relative to the jail root.
-
-
-
-Debian
- We use a script. Note that Debian typically starts &my-app; upon booting per
- default. It will use the file
- /etc/privoxy/config as its main configuration
- file.
+ To start Privoxy manually, run:
- # /etc/init.d/privoxy start
+ # service privoxy onestart
@@ -1673,15 +1105,21 @@ Click on the &my-app; Icon to start Privoxy . If no co
-Solaris, NetBSD, FreeBSD, HP-UX and others
+Generic instructions for Unix derivatives (Solaris, NetBSD, HP-UX etc.)
Example Unix startup command:
- # /usr/sbin/privoxy /etc/privoxy/config
+ # /usr/sbin/privoxy --user privoxy /etc/privoxy/config
+
+ Note that if you installed Privoxy through
+ a package manager, the package will probably contain a platform-specific
+ script or configuration file to start Privoxy
+ upon boot.
+
@@ -1697,71 +1135,24 @@ Example Unix startup command:
Mac OS X
- After downloading the privoxy software, unzip the downloaded file by
- double-clicking on the zip file icon. Then, double-click on the
- installer package icon and follow the installation process.
-
-
- The privoxy service will automatically start after a successful
- installation. In addition, the privoxy service will automatically
- start every time your computer starts up.
-
-
- To prevent the privoxy service from automatically starting when your
- computer starts up, remove or rename the folder named
- /Library/StartupItems/Privoxy.
-
-
- A simple application named Privoxy Utility has been created which
- enables administrators to easily start and stop the privoxy service.
-
-
- In addition, the Privoxy Utility presents a simple way for
- administrators to edit the various privoxy config files. A method
- to uninstall the software is also available.
+ The privoxy service will automatically start after a successful installation
+ (and thereafter every time your computer starts up); however, you will need to
+ configure your web browser(s) to use it. To do so, configure them to use a
+ proxy for HTTP and HTTPS at the address 127.0.0.1:8118.
- An administrator username and password must be supplied in order for
- the Privoxy Utility to perform any of the tasks.
+ To prevent the privoxy service from automatically starting when your computer
+ starts up, remove or rename the file /Library/LaunchDaemons/org.ijbswa.privoxy.plist
+ (on OS X 10.5 and higher) or the folder named
+ /Library/StartupItems/Privoxy (on OS X 10.4 'Tiger').
-
-
-
-
-AmigaOS
- Start Privoxy (with RUN <>NIL:) in your
- startnet script (AmiTCP), in
- s:user-startup (RoadShow), as startup program in your
- startup script (Genesis), or as startup action (Miami and MiamiDx).
- Privoxy will automatically quit when you quit your
- TCP/IP stack (just ignore the harmless warning your TCP/IP stack may display that
- Privoxy is still running).
+ To manually start or stop the privoxy service, use the scripts startPrivoxy.sh
+ and stopPrivoxy.sh supplied in /Applications/Privoxy. They must be run from an
+ administrator account, using sudo.
-
-Gentoo
-
- A script is again used. It will use the file /etc/privoxy/config
- as its main configuration file.
-
-
-
- /etc/init.d/privoxy start
-
-
-
- Note that Privoxy is not automatically started at
- boot time by default. You can change this with the rc-update
- command.
-
-
-
- rc-update add privoxy default
-
-
-
-
+
Controlling Privoxy with Your Web Browser
Privoxy 's user interface can be reached through the special
@@ -2026,7 +1439,7 @@ for details.
▪ Documentation
+ url="https://www.privoxy.org/&p-version;/user-manual/">Documentation
@@ -2048,10 +1461,7 @@ for details.
it as a test to see whether it is Privoxy
causing the problem or not. Privoxy continues
to run as a proxy in this case, but all manipulation is disabled, i.e.
- Privoxy acts like a normal forwarding proxy. There
- is even a toggle Bookmarklet offered, so
- that you can toggle Privoxy with one click from
- your browser.
+ Privoxy acts like a normal forwarding proxy.
@@ -2460,7 +1870,7 @@ for details.
-
+
Finding the Right Mix
Note that some actions, like cookie suppression
@@ -2485,7 +1895,7 @@ for details.
-
+
How to Edit
The easiest way to edit the actions files is with a browser by
@@ -2575,23 +1985,23 @@ for details.
Generally, an URL pattern has the form
- <domain><port>/<path> , where the
- <domain> , the <port>
+ <host><port>/<path> , where the
+ <host> , the <port>
and the <path> are optional. (This is why the special
/ pattern matches all URLs). Note that the protocol
portion of the URL pattern (e.g. http:// ) should
not be included in the pattern. This is assumed already!
- The pattern matching syntax is different for the domain and path parts of
- the URL. The domain part uses a simple globbing type matching technique,
+ The pattern matching syntax is different for the host and path parts of
+ the URL. The host part uses a simple globbing type matching technique,
while the path part uses more flexible
Regular
Expressions
(POSIX 1003.2).
The port part of a pattern is a decimal port number preceded by a colon
- (: ). If the domain part contains a numerical IPv6 address,
+ (: ). If the host part contains a numerical IPv6 address,
it has to be put into angle brackets
(< , > ).
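To make the three optional parts of a URL pattern concrete, here is a rough Python sketch that splits a pattern into host, port and path. It is an illustration of the syntax described above, not Privoxy's own parser; the function name `split_url_pattern` and the regular expression are assumptions made for this example.

```python
import re

# host (or an IPv6 address in angle brackets), optional ":port", optional "/path"
_URL_PATTERN = re.compile(
    r"^(?P<host><[^>]*>|[^:/]*)"
    r"(?::(?P<port>\d+))?"
    r"(?P<path>/.*)?$"
)


def split_url_pattern(pattern):
    """Split '<host><port>/<path>' into (host, port, path); missing parts are None."""
    match = _URL_PATTERN.match(pattern)
    if match is None:
        raise ValueError(f"not a recognizable URL pattern: {pattern!r}")
    host = match.group("host") or None
    return host, match.group("port"), match.group("path")
```

For instance, `split_url_pattern("/")` yields no host and no port, only the path `/`, which is why that pattern applies to every URL.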
@@ -2601,7 +2011,7 @@ for details.
www.example.com/
- is a domain-only pattern and will match any request to www.example.com ,
+ is a host-only pattern and will match any request to www.example.com ,
regardless of which document on that server is requested. So ALL pages in
this domain would be covered by the scope of this action. Note that a
simple example.com is different and would NOT match.
@@ -2612,7 +2022,7 @@ for details.
www.example.com
- means exactly the same. For domain-only patterns, the trailing / may
+ means exactly the same. For host-only patterns, the trailing / may
be omitted.
@@ -2661,6 +2071,15 @@ for details.
+
+ 10.0.0.1/
+
+
+ Matches any URL with the host address 10.0.0.1 .
+
+
+
<2001:db8::1>/
@@ -2684,11 +2103,13 @@ for details.
-The Domain Pattern
+The Host Pattern
- The matching of the domain part offers some flexible options: if the
- domain starts or ends with a dot, it becomes unanchored at that end.
+ The matching of the host part offers some flexible options: if the
+ host pattern starts or ends with a dot, it becomes unanchored at that end.
+ The host pattern is often referred to as domain pattern as it is usually
+ used to match domain names and not IP addresses.
For example:
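The anchoring rule can also be sketched in Python. This is a simplified model under stated assumptions, not Privoxy's implementation: it compares dot-separated labels, uses `fnmatch` as a stand-in for the globbing syntax, and the function name `host_matches` is made up for this illustration.

```python
from fnmatch import fnmatchcase


def host_matches(pattern, host):
    """Return True if host matches a host pattern with dot-based anchoring."""
    left_open = pattern.startswith(".")   # leading dot: unanchored on the left
    right_open = pattern.endswith(".")    # trailing dot: unanchored on the right
    wanted = [label for label in pattern.lower().split(".") if label]
    actual = host.lower().split(".")
    n = len(wanted)
    if n > len(actual):
        return False

    def labels_match(offset):
        # Compare the pattern labels against a window of host labels.
        window = actual[offset:offset + n]
        return all(fnmatchcase(have, want) for have, want in zip(window, wanted))

    if left_open and right_open:
        return any(labels_match(o) for o in range(len(actual) - n + 1))
    if left_open:
        return labels_match(len(actual) - n)
    if right_open:
        return labels_match(0)
    return n == len(actual) and labels_match(0)
```

With this model, `host_matches(".example.com", "www.example.com")` is true, while the fully anchored `host_matches("example.com", "www.example.com")` is false.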
@@ -2795,7 +2216,7 @@ for details.
-The Path Pattern
+The Path Pattern
Privoxy uses modern
POSIX 1003.2
@@ -2857,7 +2278,7 @@ for details.
This regular expression is conditional so it will match any page
named index.html
regardless of path which in this case can
have one or more /'s
. And this one must contain exactly
- .html
(but does not have to end with that!).
+ .html
(and end with that!).
@@ -2869,6 +2290,7 @@ for details.
that contains any of the words ads
, banner
,
banners
(because of the ?
) or junk
.
The path does not have to end in these words, just contain them.
+ The path has to contain at least two slashes (including the one at the beginning).
@@ -2895,18 +2317,18 @@ for details.
-The Tag Pattern
+The Request Tag Pattern
- Tag patterns are used to change the applying actions based on the
- request's tags. Tags can be created with either the
- client-header-tagger
+ Request tag patterns are used to change the applying actions based on the
+ request's tags. Tags can be created based on HTTP headers with either
+ the client-header-tagger
or the server-header-tagger action.
- Tag patterns have to start with TAG:
, so &my-app;
- can tell them apart from URL patterns. Everything after the colon
+ Request tag patterns have to start with TAG:
, so &my-app;
+ can tell them apart from other patterns. Everything after the colon
including white space, is interpreted as a regular expression with
path pattern syntax, except that tag patterns aren't left-anchored
automatically (&my-app; doesn't silently add a ^
,
@@ -2922,15 +2344,15 @@ for details.
- Sections can contain URL and tag patterns at the same time,
- but tag patterns are checked after the URL patterns and thus
+ Sections can contain URL and request tag patterns at the same time,
+ but request tag patterns are checked after the URL patterns and thus
always overrule them, even if they are located before the URL patterns.
- Once a new tag is added, Privoxy checks right away if it's matched by one
- of the tag patterns and updates the action settings accordingly. As a result
- tags can be used to activate other tagger actions, as long as these other
+ Once a new request tag is added, Privoxy checks right away if it's matched by one
+ of the request tag patterns and updates the action settings accordingly. As a result
+ request tags can be used to activate other tagger actions, as long as these other
taggers look for headers that haven't already been parsed.
@@ -2954,6 +2376,82 @@ for details.
+
+The Negative Request Tag Patterns
+
+
+ To match requests that do not have a certain request tag, specify a negative tag pattern
+ by prefixing the tag pattern line with either NO-REQUEST-TAG:
+ or NO-RESPONSE-TAG:
instead of TAG:
.
+
+
+
+ Negative request tag patterns created with NO-REQUEST-TAG:
are checked
+ after all client headers are scanned, the ones created with NO-RESPONSE-TAG:
+ are checked after all server headers are scanned. In both cases all the created
+ tags are considered.
+
+
+
+
+The Client Tag Pattern
+
+
+
+
+
+ This is an experimental feature. The syntax is likely to change in future versions.
+
+
+
+
+ Client tag patterns are not set based on HTTP headers but based on
+ the client's IP address. Users can enable them themselves, but the
+ Privoxy admin controls which tags are available and what their effect
+ is.
+
+
+
+ After a client-specific tag has been defined with the
+ client-specific-tag
+ directive, action sections can be activated based on the tag by using a
+ CLIENT-TAG pattern. The CLIENT-TAG pattern is evaluated at the same priority
+ as URL patterns; as a result, the last matching pattern wins. Tags that
+ are created based on client or server headers are evaluated later on
+ and can overrule CLIENT-TAG and URL patterns!
+
+
+ The tag is set for all requests that come from clients that requested
+ it to be set. Note that "clients" are differentiated by IP address;
+ if the IP address changes, the tag has to be requested again.
+
+
+ Clients can request tags to be set by using the CGI interface http://config.privoxy.org/client-tags .
+
+
+
+ Example:
+
+
+
+
+# If the admin defined the client-specific-tag circumvent-blocks,
+# and the request comes from a client that previously requested
+# the tag to be set, overrule all previous +block actions that
+# are enabled based on URL or CLIENT-TAG patterns.
+{-block}
+CLIENT-TAG:^circumvent-blocks$
+
+# This section is not overruled because it's located after
+# the previous one.
+{+block{Nobody is supposed to request this.}}
+example.org/blocked-example-page
+
+
+
+
@@ -3145,7 +2643,16 @@ for details.
Example usage:
- +add-header{X-User-Tracking: sucks}
+ # Add a DNT ("Do not track") header to all requests,
+# even to those that already have one.
+#
+# This is just an example, not a recommendation.
+#
+# There is no reason to believe that user-tracking websites care
+# about the DNT header and depending on the User-Agent, adding the
+# header may make user-tracking easier.
+{+add-header{DNT: 1}}
+/
@@ -3355,7 +2862,7 @@ for details.
Type:
- Parameterized.
+ Multi-value.
@@ -3383,7 +2890,7 @@ for details.
and use their output as input.
- If the request URL gets changed, &my-app; will detect that and use the new
+ If the request URI gets changed, &my-app; will detect that and use the new
one. This can be used to rewrite the request destination behind the client's
back, for example to specify a Tor exit relay for certain requests.
@@ -3405,7 +2912,7 @@ for details.
{+client-header-filter{hide-tor-exit-notation}}
/
-
+
@@ -3442,7 +2949,7 @@ for details.
Type:
- Parameterized.
+ Multi-value.
@@ -3499,6 +3006,37 @@ TAG:^User-Agent: fetch libfetch/
TAG:^User-Agent: Ubuntu APT-HTTP/
TAG:^User-Agent: MPlayer/
+
+
+
+# Tag all requests with the Range header set
+{+client-header-tagger{range-requests}}
+/
+
+# Disable filtering for the tagged requests.
+#
+# With filtering enabled Privoxy would remove the Range headers
+# to be able to filter the whole response. The downside is that
+# it prevents clients from resuming downloads or skipping over
+# parts of multimedia files.
+{-filter -deanimate-gifs}
+TAG:^RANGE-REQUEST$
+
+
+
+
+# Tag all requests with the client IP address
+#
+# (Technically the client IP address isn't included in the
+# client headers but client-header taggers can set it anyway.
+# For details see the tagger in default.filter)
+{+client-header-tagger{client-ip-address}}
+/
+
+# Change forwarding settings for requests coming from address 10.0.0.1
+{+forward-override{forward-socks5 127.0.1.2:2222 .}}
+TAG:^IP-ADDRESS: 10\.0\.0\.1$
+
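The escaped dots and the anchors in the last tag pattern matter. The following Python sketch shows why, using Python's `re` module as a stand-in for Privoxy's regular expression engine (an assumption; details of the two engines may differ). The function names are hypothetical.

```python
import re

# The expression after "TAG:" in the example above.
STRICT = re.compile(r"^IP-ADDRESS: 10\.0\.0\.1$")
# The same expression without escaped dots and without the end anchor.
LOOSE = re.compile(r"^IP-ADDRESS: 10.0.0.1")


def strict_match(tag):
    """True only for the tag naming exactly the address 10.0.0.1."""
    return STRICT.search(tag) is not None


def loose_match(tag):
    """Also true for unintended tags, since '.' matches any character."""
    return LOOSE.search(tag) is not None
```

Here `strict_match("IP-ADDRESS: 10.0.0.10")` is false, whereas `loose_match` accepts it, so a loosely written pattern would apply the forwarding settings to more clients than intended.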
@@ -4117,9 +3655,19 @@ new action
This is a left-over from the time when Privoxy
didn't support important HTTP/1.1 features well. It is left here for the
- unlikely case that you experience HTTP/1.1 related problems with some server
- out there. Not all HTTP/1.1 features and requirements are supported yet,
- so there is a chance you might need this action.
+ unlikely case that you experience HTTP/1.1-related problems with some server
+ out there.
+
+
+ Note that enabling this action is only a workaround. It should not
+ be enabled for sites that work without it. While it shouldn't break
+ any pages, it has a (usually negative) performance impact.
+
+
+ If you come across a site where enabling this action helps, please report it,
+ so the cause of the problem can be analyzed. If the problem turns out to be
+ caused by a bug in Privoxy it should be
+ fixed so the following release works without the workaround.
@@ -4138,14 +3686,14 @@ problem-host.example.com
-
-fast-redirects
+
+external-filter
Typical use:
- Fool some click-tracking scripts and speed up indirect links.
+ Modify content using a programming language of your choice.
@@ -4153,8 +3701,12 @@ problem-host.example.com
Effect:
- Detects redirection URLs and redirects the browser without contacting
- the redirection server first.
+ All instances of text-based type, most notably HTML and JavaScript, to which
+ this action applies, can be filtered on-the-fly through the specified external
+ filter.
+ By default plain text documents are exempted from filtering, because web
+ servers often use the text/plain MIME type for all files
+ whose type they don't know.
@@ -4163,7 +3715,91 @@ problem-host.example.com
Type:
- Parameterized.
+ Multi-value.
+
+
+
+
+ Parameter:
+
+
+ The name of an external content filter, as defined in the
+ filter file.
+ External filters can be defined in one or more files as defined by the
+ filterfile
+ option in the config file.
+
+
+ When used in its negative form,
+ and without parameters, all filtering with external
+ filters is completely disabled.
+
+
+
+
+
+ Notes:
+
+
+ External filters are scripts or programs that can modify the content in
+ case common filters
+ aren't powerful enough. With the exception that this action doesn't
+ use pcrs-based filters, the notes in the
+ filter section apply.
+
+
+
+ Currently external filters are executed with &my-app;'s privileges.
+ Only use external filters you understand and trust.
+
+
+
+ This feature is experimental; the syntax
+ may change in the future.
+
+
+
+
+
+
+ Example usage:
+
+
+ +external-filter{fancy-filter}
+
+
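As a rough idea of what such an external filter could look like, here is a minimal Python sketch. It assumes the filter program receives the content on standard input and returns the modified content on standard output; the transformation (dropping HTML comments) and the function names are chosen for this illustration only, and the filter file entry that would define `fancy-filter` is not shown.

```python
import re
import sys

# Non-greedy match so that each comment is removed separately.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(content):
    """Return the content with all HTML comments removed."""
    return _HTML_COMMENT.sub("", content)


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read the whole document, filter it, and write the result."""
    stdout.write(strip_html_comments(stdin.read()))

# A real filter script would end by calling main(); it is left uncalled
# here so that the functions can be imported and tried out separately.
```

Because the filter runs with &my-app;'s privileges, a script like this should stay small enough to be reviewed before it is enabled.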
+
+
+
+
+
+
+fast-redirects
+
+
+
+ Typical use:
+
+ Fool some click-tracking scripts and speed up indirect links.
+
+
+
+
+ Effect:
+
+
+ Detects redirection URLs and redirects the browser without contacting
+ the redirection server first.
+
+
+
+
+
+ Type:
+
+
+ Parameterized.
@@ -4291,7 +3927,7 @@ problem-host.example.com
Type:
- Parameterized.
+ Multi-value.
@@ -4356,10 +3992,10 @@ problem-host.example.com
by defining appropriate -filter exceptions.
- Compressed content can't be filtered either, unless &my-app;
- is compiled with zlib support (requires at least &my-app; 3.0.7),
- in which case &my-app; will decompress the content before filtering
- it.
+ Compressed content can't be filtered either, but if &my-app;
+ is compiled with zlib support and a supported compression algorithm
+ is used (gzip or deflate), &my-app; can first decompress the content
+ and then filter it.
If you use a &my-app; version without zlib support, but want filtering to work on
@@ -4399,7 +4035,7 @@ problem-host.example.com
- +filter{js-events} # Kill all JS event bindings and timers (Radically destructive! Only for extra nasty sites).
+ +filter{js-events} # Kill JavaScript event bindings and timers (Radically destructive! Only for extra nasty sites).
@@ -4411,5478 +4047,4697 @@ problem-host.example.com
- +filter{refresh-tags} # Kill automatic refresh tags (for dial-on-demand setups).
+ +filter{refresh-tags} # Kill automatic refresh tags if refresh time is larger than 9 seconds.
-
-
-
-
- +filter{img-reorder} # Reorder attributes in <img> tags to make the banners-by-* filters more effective.
-
-
-
- +filter{banners-by-size} # Kill banners by size.
-
-
-
- +filter{banners-by-link} # Kill banners by their links to known clicktrackers.
-
-
-
- +filter{webbugs} # Squish WebBugs (1x1 invisible GIFs used for user tracking).
-
-
-
- +filter{tiny-textforms} # Extend those tiny textareas up to 40x80 and kill the hard wrap.
-
-
-
- +filter{jumping-windows} # Prevent windows from resizing and moving themselves.
-
-
-
- +filter{frameset-borders} # Give frames a border and make them resizable.
-
-
-
- +filter{demoronizer} # Fix MS's non-standard use of standard charsets.
-
-
-
- +filter{shockwave-flash} # Kill embedded Shockwave Flash objects.
-
-
-
- +filter{quicktime-kioskmode} # Make Quicktime movies saveable.
-
-
-
- +filter{fun} # Text replacements for subversive browsing fun!
-
-
-
- +filter{crude-parental} # Crude parental filtering. Note that this filter doesn't work reliably.
-
-
-
- +filter{ie-exploits} # Disable some known Internet Explorer bug exploits.
-
-
-
- +filter{site-specifics} # Cure for site-specific problems. Don't apply generally!
-
-
-
- +filter{no-ping} # Removes non-standard ping attributes in <a> and <area> tags.
-
-
-
- +filter{google} # CSS-based block for Google text ads. Also removes a width limitation and the toolbar advertisement.
-
-
-
- +filter{yahoo} # CSS-based block for Yahoo text ads. Also removes a width limitation.
-
-
-
- +filter{msn} # CSS-based block for MSN text ads. Also removes tracking URLs and a width limitation.
-
-
-
- +filter{blogspot} # Cleans up some Blogspot blogs. Read the fine print before using this.
-
-
-
-
-
-
-
-
-
-force-text-mode
-
-
-
- Typical use:
-
- Force Privoxy to treat a document as if it was in some kind of text format.
-
-
-
-
- Effect:
-
-
- Declares a document as text, even if the Content-Type:
isn't detected as such.
-
-
-
-
-
- Type:
-
-
- Boolean.
-
-
-
-
- Parameter:
-
-
- N/A
-
-
-
-
-
- Notes:
-
-
- As explained above ,
- Privoxy tries to only filter files that are
- in some kind of text format. The same restrictions apply to
- content-type-overwrite .
- force-text-mode declares a document as text,
- without looking at the Content-Type:
first.
-
-
-
- Think twice before activating this action. Filtering binary data
- with regular expressions can cause file damage.
-
-
-
-
-
-
- Example usage:
-
-
-
-+force-text-mode
-
-
-
-
-
-
-
-
-
-
-forward-override
-
-
-
- Typical use:
-
- Change the forwarding settings based on User-Agent or request origin
-
-
-
-
- Effect:
-
-
- Overrules the forward directives in the configuration file.
-
-
-
-
-
- Type:
-
-
- Multi-value.
-
-
-
-
- Parameter:
-
-
-
- forward .
to use a direct connection without any additional proxies.
-
-
-
- forward 127.0.0.1:8123
to use the HTTP proxy listening at 127.0.0.1 port 8123.
-
-
-
-
- forward-socks4a 127.0.0.1:9050 .
to use the socks4a proxy listening at
- 127.0.0.1 port 9050. Replace forward-socks4a
with forward-socks4
- to use a socks4 connection (with local DNS resolution) instead, use forward-socks5
- for socks5 connections (with remote DNS resolution).
-
-
-
-
- forward-socks4a 127.0.0.1:9050 proxy.example.org:8000
to use the socks4a proxy
- listening at 127.0.0.1 port 9050 to reach the HTTP proxy listening at proxy.example.org port 8000.
- Replace forward-socks4a
with forward-socks4
to use a socks4 connection
- (with local DNS resolution) instead, use forward-socks5
- for socks5 connections (with remote DNS resolution).
-
-
-
-
-
-
-
- Notes:
-
-
- This action takes parameters similar to the
- forward directives in the configuration
- file, but without the URL pattern. It can be used as replacement, but normally it's only
- used in cases where matching based on the request URL isn't sufficient.
-
-
-
- Please read the description for the forward directives before
- using this action. Forwarding to the wrong people will reduce your privacy and increase the
- chances of man-in-the-middle attacks.
-
-
- If the ports are missing or invalid, default values will be used. This might change
- in the future and you shouldn't rely on it. Otherwise incorrect syntax causes Privoxy
- to exit.
-
-
- Use the show-url-info CGI page
- to verify that your forward settings do what you thought they do.
-
-
-
-
-
-
- Example usage:
-
-
-
-# Always use direct connections for requests previously tagged as
-# "User-Agent: fetch libfetch/2.0" and make sure
-# resuming downloads continues to work.
-# This way you can continue to use Tor for your normal browsing,
-# without overloading the Tor network with your FreeBSD ports updates
-# or downloads of bigger files like ISOs.
-# Note that HTTP headers are easy to fake and therefore their
-# values are as (un)trustworthy as your clients and users.
-{+forward-override{forward .} \
- -hide-if-modified-since \
- -overwrite-last-modified \
-}
-TAG:^User-Agent: fetch libfetch/2\.0$
-
-
-
-
-
-
-
-
-
-
-handle-as-empty-document
-
-
-
- Typical use:
-
- Mark URLs that should be replaced by empty documents if they get blocked
-
-
-
-
- Effect:
-
-
- This action alone doesn't do anything noticeable. It just marks URLs.
- If the block action also applies ,
- the presence or absence of this mark decides whether an HTML BLOCKED
- page, or an empty document will be sent to the client as a substitute for the blocked content.
- The empty document isn't literally empty, but actually contains a single space.
-
-
-
-
-
- Type:
-
-
- Boolean.
-
-
-
-
- Parameter:
-
-
- N/A
-
-
-
-
-
- Notes:
-
-
- Some browsers complain about syntax errors if JavaScript documents
- are blocked with Privoxy's
- default HTML page; this option can be used to silence them.
- And of course this action can also be used to eliminate the &my-app;
- BLOCKED message in frames.
-
-
- The content type for the empty document can be specified with
- content-type-overwrite{} ,
- but usually this isn't necessary.
-
-
-
-
-
- Example usage:
-
-
- # Block all documents on example.org that end with ".js",
-# but send an empty document instead of the usual HTML message.
-{+block{Blocked JavaScript} +handle-as-empty-document}
-example.org/.*\.js$
-
-
-
-
-
-
-
-
-
-
-handle-as-image
-
-
-
- Typical use:
-
- Mark URLs as belonging to images (so they'll be replaced by images if they do get blocked , rather than HTML pages)
-
-
-
-
- Effect:
-
-
- This action alone doesn't do anything noticeable. It just marks URLs as images.
- If the block action also applies ,
- the presence or absence of this mark decides whether an HTML blocked
- page, or a replacement image (as determined by the set-image-blocker action) will be sent to the
- client as a substitute for the blocked content.
-
-
-
-
-
- Type:
-
-
- Boolean.
-
-
-
-
- Parameter:
-
-
- N/A
-
-
-
-
-
- Notes:
-
-
- The below generic example section is actually part of default.action .
- It marks all URLs with well-known image file name extensions as images and should
- be left intact.
-
-
- Users will probably only want to use the handle-as-image action in conjunction with
- block , to block sources of banners, whose URLs don't
- reflect the file type, like in the second example section.
-
-
- Note that you cannot treat HTML pages as images in most cases. For instance, (in-line) ad
- frames require an HTML page to be sent, or they won't display properly.
- Forcing handle-as-image in this situation will not replace the
- ad frame with an image, but lead to error messages.
-
-
-
-
-
- Example usage (sections):
-
-
- # Generic image extensions:
-#
-{+handle-as-image}
-/.*\.(gif|jpg|jpeg|png|bmp|ico)$
-
-# These don't look like images, but they're banners and should be
-# blocked as images:
-#
-{+block{Nasty banners.} +handle-as-image}
-nasty-banner-server.example.com/junk.cgi\?output=trash
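
As a rough illustration of how the marks described above combine with block, the following Python sketch models the choice of substitute response. The function name, the set-of-strings representation and the precedence of handle-as-image over handle-as-empty-document are assumptions made for this example; this is not Privoxy's actual implementation.

```python
def blocked_response(actions):
    """Return a label for the substitute sent for a request.

    `actions` is a set of action names that apply to the URL.
    Hypothetical model: the precedence between the two marks is assumed.
    """
    if "block" not in actions:
        # The marks alone do nothing noticeable.
        return "original content"
    if "handle-as-image" in actions:
        # The image itself is chosen by set-image-blocker.
        return "replacement image"
    if "handle-as-empty-document" in actions:
        # Not literally empty: the manual says it contains a single space.
        return "empty document"
    return "HTML BLOCKED page"


print(blocked_response({"block", "handle-as-image"}))
```

The sketch makes the point from the text explicit: without block, neither mark changes what the client receives.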
-
-
-
-
-
-
-
-
-
-
-hide-accept-language
-
-
-
- Typical use:
-
- Pretend to use different language settings.
-
-
-
-
- Effect:
-
-
- Deletes or replaces the Accept-Language:
HTTP header in client requests.
-
-
-
-
-
- Type:
-
-
- Parameterized.
-
-
-
-
- Parameter:
-
-
- Keyword: block
, or any user defined value.
-
-
-
-
-
- Notes:
-
-
- Faking the browser's language settings can be useful to make a
- foreign User-Agent set with
- hide-user-agent
- more believable.
-
-
- However some sites with content in different languages check the
- Accept-Language:
to decide which one to take by default.
- Sometimes it isn't possible to later switch to another language without
- changing the Accept-Language:
header first.
-
-
- Therefore it's a good idea to either only change the
- Accept-Language:
header to languages you understand,
- or to languages that aren't widespread.
-
-
- Before setting the Accept-Language:
header
- to a rare language, you should consider that it helps to
- make your requests unique and thus easier to trace.
- If you don't plan to change this header frequently,
- you should stick to a common language.
-
-
-
-
-
- Example usage (section):
-
-
- # Pretend to use Canadian language settings.
-{+hide-accept-language{en-ca} \
-+hide-user-agent{Mozilla/5.0 (X11; U; OpenBSD i386; en-CA; rv:1.8.0.4) Gecko/20060628 Firefox/1.5.0.4} \
-}
-/
-
-
-
-
-
-
-
-
-
-hide-content-disposition
-
-
-
- Typical use:
-
- Prevent download menus for content you prefer to view inside the browser.
-
-
-
-
- Effect:
-
-
- Deletes or replaces the Content-Disposition:
HTTP header set by some servers.
-
-
-
-
-
- Type:
-
-
- Parameterized.
-
-
-
-
- Parameter:
-
-
- Keyword: block
, or any user defined value.
-
-
-
-
-
- Notes:
-
-
- Some servers set the Content-Disposition:
HTTP header for
- documents they assume you want to save locally before viewing them.
- The Content-Disposition:
header contains the file name
- the browser is supposed to use by default.
-
-
- In most browsers that understand this header, it makes it impossible to
- just view the document, without downloading it first,
- even if it's just a simple text file or an image.
-
-
- Removing the Content-Disposition:
header helps
- to prevent this annoyance, but some browsers additionally check the
- Content-Type:
header, before they decide if they can
- display a document without saving it first. In these cases, you have
- to change this header as well, before the browser stops displaying
- download menus.
-
-
- It is also possible to change the server's file name suggestion
- to another one, but in most cases it isn't worth the time to set
- it up.
-
-
- This action will probably be removed in the future,
- use server-header filters instead.
-
-
-
-
-
- Example usage:
-
-
- # Disarm the download link in Sourceforge's patch tracker
-{ -filter \
- +content-type-overwrite{text/plain}\
- +hide-content-disposition{block} }
- .sourceforge.net/tracker/download\.php
-
-
-
-
-
-
-
-
-
-hide-if-modified-since
-
-
-
- Typical use:
-
- Prevent yet another way to track the user's steps between sessions.
-
-
-
-
- Effect:
-
-
- Deletes the If-Modified-Since:
HTTP client header or modifies its value.
-
-
-
-
-
- Type:
-
-
- Parameterized.
-
-
-
-
- Parameter:
-
-
- Keyword: block
, or a user defined value that specifies a range of minutes.
-
-
-
-
-
- Notes:
-
-
- Removing this header is useful for filter testing, where you want to force a real
- reload instead of getting status code 304
, which would cause the
- browser to use a cached copy of the page.
-
-
- Instead of removing the header, hide-if-modified-since can
- also add or subtract a random amount of time to/from the header's value.
- You specify a range of minutes where the random factor should be chosen from and
- Privoxy does the rest. A negative value means
- subtracting, a positive value adding.
-
-
- Randomizing the value of the If-Modified-Since:
makes
- it less likely that the server can use the time as a cookie replacement,
- but you will run into caching problems if the random range is too high.
-
-
- It is a good idea to only use a small negative value and let
- overwrite-last-modified
- handle the greater changes.
-
-
- It is also recommended to use this action together with
- crunch-if-none-match ,
- otherwise it's more or less pointless.
-
-
-
-
-
- Example usage (section):
-
-
- # Let the browser revalidate but make tracking based on the time less likely.
-{+hide-if-modified-since{-60} \
- +overwrite-last-modified{randomize} \
- +crunch-if-none-match}
-/
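
The randomization described in the notes can be sketched in Python as follows. This is a minimal model, assuming the offset is drawn uniformly between zero and the given number of minutes and that the header carries an RFC-style GMT date; the function name and the `rng` parameter are hypothetical and not part of Privoxy.

```python
import random
from datetime import timedelta
from email.utils import format_datetime, parsedate_to_datetime


def randomize_if_modified_since(value, minutes, rng=random):
    """Shift an If-Modified-Since date by a random amount.

    A negative `minutes` subtracts time, a positive one adds it.
    Assumption: the offset is uniform in [0, |minutes|] minutes.
    """
    original = parsedate_to_datetime(value)
    offset_seconds = rng.randint(0, abs(minutes) * 60)
    if minutes < 0:
        offset_seconds = -offset_seconds
    shifted = original + timedelta(seconds=offset_seconds)
    return format_datetime(shifted, usegmt=True)


print(randomize_if_modified_since("Sat, 29 Oct 1994 19:43:31 GMT", -60))
```

With a small negative range, as in the example section above, the shifted date never lies after the original one, so the browser can still revalidate its cached copy.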
-
-
-
-
-
-
-
-
-
-
-
-
-
-hide-referrer
-
-
-
- Typical use:
-
- Conceal which link you followed to get to a particular site
-
-
-
-
- Effect:
-
-
- Deletes the Referer:
(sic) HTTP header from the client request,
- or replaces it with a forged one.
-
-
-
-
-
- Type:
-
-
- Parameterized.
-
-
-
-
- Parameter:
-
-
-
- conditional-block
to delete the header completely if the host has changed.
-
-
- conditional-forge
to forge the header if the host has changed.
-
-
- block
to delete the header unconditionally.
-
-
- forge
to pretend to be coming from the homepage of the server we are talking to.
-
-
- Any other string to set a user defined referrer.
-
-
-
-
-
-
- Notes:
-
-
- conditional-block is the only parameter,
- that isn't easily detected in the server's log file. If it blocks the
- referrer, the request will look like the visitor used a bookmark or
- typed in the address directly.
-
-
- Leaving the referrer unmodified for requests on the same host
- allows the server owner to see the visitor's click path
,
- but in most cases she could also get that information by comparing
- other parts of the log file: for example the User-Agent if it isn't
- a very common one, or the user's IP address if it doesn't change between
- different requests.
-
-
- Always blocking the referrer, or using a custom one, can lead to
- failures on servers that check the referrer before they answer any
- requests, in an attempt to prevent their content from being
- embedded or linked to elsewhere.
-
-
- Both conditional-block and forge
- will work with referrer checks, as long as content and valid referring page
- are on the same host. Most of the time that's the case.
-
-
- hide-referer is an alternate spelling of
- hide-referrer and the two can be freely
- substituted with each other. ("referrer" is the
- correct English spelling; however, the HTTP specification has a bug: it
- requires it to be spelled as "referer".)
-
-
-
-
-
- Example usage:
-
-
- +hide-referrer{forge} or
- +hide-referrer{http://www.yahoo.com/}
-
-
-
-
-
-
-
-
-
-hide-user-agent
-
-
-
- Typical use:
-
- Try to conceal your type of browser and client operating system
-
-
-
-
- Effect:
-
-
- Replaces the value of the User-Agent:
HTTP header
- in client requests with the specified value.
-
-
-
-
-
- Type:
-
-
- Parameterized.
-
-
-
-
- Parameter:
-
-
- Any user-defined string.
-
-
-
-
-
- Notes:
-
-
-
- This can lead to problems on web sites that depend on looking at this header in
- order to customize their content for different browsers (which, by the
- way, is NOT the right thing to do: good web sites
- work browser-independently).
-
-
-
- Using this action in multi-user setups or wherever different types of
- browsers will access the same Privoxy is
- not recommended . In single-user, single-browser
- setups, you might use it to delete your OS version information from
- the headers, because it is an invitation to exploit known bugs for your
- OS. It is also occasionally useful to forge this in order to access
- sites that won't let you in otherwise (though there may be a good
- reason in some cases).
-
-
- More information on known user-agent strings can be found at
- http://www.user-agents.org/
- and
- http://en.wikipedia.org/wiki/User_agent .
-
-
-
-
-
- Example usage:
-
-
- +hide-user-agent{Netscape 6.1 (X11; I; Linux 2.4.18 i686)}
-
-
-
-
-
-
-
-
-
-limit-connect
-
-
-
- Typical use:
-
- Prevent abuse of Privoxy as a TCP proxy relay or disable SSL for untrusted sites
-
-
-
-
- Effect:
-
-
- Specifies to which ports HTTP CONNECT requests are allowable.
-
-
-
-
-
- Type:
-
-
- Parameterized.
-
-
-
-
- Parameter:
-
-
- A comma-separated list of ports or port ranges (the latter using dashes, with the minimum
- defaulting to 0 and the maximum to 65K).
-
-
-
-
-
- Notes:
-
-
- By default, i.e. if no limit-connect action applies,
- Privoxy allows HTTP CONNECT requests to all
- ports. Use limit-connect if fine-grained control
- is desired for some or all destinations.
-
-
- The CONNECT method exists in HTTP to allow access to secure websites
- (https:// URLs) through proxies. It works very simply:
- the proxy connects to the server on the specified port, and then
- short-circuits its connections to the client and to the remote server.
- This means CONNECT-enabled proxies can be used as TCP relays very easily.
-
-
- Privoxy relays HTTPS traffic without seeing
- the decoded content. Websites can leverage this limitation to circumvent &my-app;'s
- filters. By specifying an invalid port range you can disable HTTPS entirely.
-
-
-
-
-
- Example usages:
-
-
-
-
-
- +limit-connect{443} # Port 443 is OK.
-+limit-connect{80,443} # Ports 80 and 443 are OK.
-+limit-connect{-3, 7, 20-100, 500-} # Ports less than 3, 7, 20 to 100 and above 500 are OK.
-+limit-connect{-} # All ports are OK
-+limit-connect{,} # No HTTPS/SSL traffic is allowed
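
To make the port-list syntax concrete, here is a small Python sketch of how such a list could be interpreted. It assumes that range bounds are inclusive and that "65K" means 65535; the function names are hypothetical and the code is a model of the described syntax, not Privoxy's parser.

```python
def parse_port_list(spec):
    """Turn a limit-connect style list into (low, high) inclusive ranges."""
    ranges = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # empty entries, as in "{,}", allow nothing
        if "-" in item:
            low_text, high_text = item.split("-", 1)
            # A missing minimum defaults to 0, a missing maximum to 65535.
            low = int(low_text) if low_text.strip() else 0
            high = int(high_text) if high_text.strip() else 65535
        else:
            low = high = int(item)
        ranges.append((low, high))
    return ranges


def connect_allowed(spec, port):
    """Return True if a CONNECT request to `port` would be permitted."""
    return any(low <= port <= high for low, high in parse_port_list(spec))


print(connect_allowed("-3, 7, 20-100, 500-", 50))
```

Under these assumptions, `connect_allowed(",", 443)` is false, which matches the last example above: a list without any valid port disables CONNECT entirely.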
-
-
-
-
-
-
-
-
-prevent-compression
-
-
-
- Typical use:
-
-
- Ensure that servers send the content uncompressed, so it can be
- passed through filters.
-
-
-
-
-
- Effect:
-
-
- Removes the Accept-Encoding header which can be used to ask for compressed transfer.
-
-
-
-
-
- Type:
-
-
- Boolean.
-
-
-
-
- Parameter:
-
-
- N/A
-
-
-
-
-
- Notes:
-
-
- More and more websites send their content compressed by default, which
- is generally a good idea and saves bandwidth. But the filter and
- deanimate-gifs
- actions need access to the uncompressed data.
-
-
- When compiled with zlib support (available since &my-app; 3.0.7), content that should be
- filtered is decompressed on-the-fly and you don't have to worry about this action.
- If you are using an older &my-app; version, or one that hasn't been compiled with zlib
- support, this action can be used to convince the server to send the content uncompressed.
-
-
- Most text-based instances compress very well, the size is seldom decreased by less than 50%,
- for markup-heavy instances like news feeds saving more than 90% of the original size isn't
- unusual.
-
-
- Not using compression will therefore slow down the transfer, and you should only
- enable this action if you really need it. As of &my-app; 3.0.7 it's disabled in all
- predefined action settings.
-
-
- Note that some (rare) ill-configured sites don't handle requests for uncompressed
- documents correctly. Broken PHP applications tend to send an empty document body,
- some IIS versions only send the beginning of the content. If you enable
- prevent-compression per default, you might want to add
- exceptions for those sites. See the example for how to do that.
-
-
-
-
-
- Example usage (sections):
-
-
-
-# Selectively turn off compression, and enable a filter
-#
-{ +filter{tiny-textforms} +prevent-compression }
-# Match only these sites
- .google.
- sourceforge.net
- sf.net
-
-# Or instead, we could set a universal default:
-#
-{ +prevent-compression }
- / # Match all sites
-
-# Then maybe make exceptions for broken sites:
-#
-{ -prevent-compression }
-.compusa.com/
-
-
-
-
-
-
-
-
-
-
-overwrite-last-modified
-
-
-
- Typical use:
-
- Prevent yet another way to track the user's steps between sessions.
-
-
-
-
- Effect:
-
-
- Deletes the Last-Modified:
HTTP server header or modifies its value.
-
-
-
-
-
- Type:
-
-
- Parameterized.
-
-
-
-
- Parameter:
-
-
- One of the keywords: block
, reset-to-request-time
- and randomize
-
-
-
-
-
- Notes:
-
-
- Removing the Last-Modified:
header is useful for filter
- testing, where you want to force a real reload instead of getting status
- code 304
, which would cause the browser to reuse the old
- version of the page.
-
-
- The randomize
option overwrites the value of the
- Last-Modified:
header with a randomly chosen time
- between the original value and the current time. In theory the server
- could send each document with a different Last-Modified:
- header to track visits without using cookies. Randomize
- makes it impossible and the browser can still revalidate cached documents.
-
-
- reset-to-request-time
overwrites the value of the
- Last-Modified:
header with the current time. You could use
- this option together with
- hide-if-modified-since
- to further customize your random range.
-
-
- The preferred parameter here is randomize
. It is safe
- to use, as long as the time settings are more or less correct.
- If the server sets the Last-Modified:
header to the time
- of the request, the random range becomes zero and the value stays the same.
- Therefore you should later randomize it a second time with
- hide-if-modified-since,
- just to be sure.
-
-
- It is also recommended to use this action together with
- crunch-if-none-match .
-
-
-
-
-
- Example usage:
-
-
- # Let the browser revalidate without being tracked across sessions
-{ +hide-if-modified-since{-60} \
- +overwrite-last-modified{randomize} \
- +crunch-if-none-match}
-/
-
-
-
-
-
-
-
-
-
-redirect
-
-
-
- Typical use:
-
-
- Redirect requests to other sites.
-
-
-
-
-
- Effect:
-
-
- Convinces the browser that the requested document has been moved
- to another location and the browser should get it from there.
-
-
-
-
-
- Type:
-
-
- Parameterized
-
-
-
-
- Parameter:
-
-
- An absolute URL or a single pcrs command.
-
-
-
-
-
- Notes:
-
-
- Requests to which this action applies are answered with a
- HTTP redirect to URLs of your choosing. The new URL is
- either provided as parameter, or derived by applying a
- single pcrs command to the original URL.
-
-
- This action will be ignored if you use it together with
- block .
- It can be combined with
- fast-redirects{check-decoded-url}
- to redirect to a decoded version of a rewritten URL.
-
-
- Use this action carefully, make sure not to create redirection loops
- and be aware that using your own redirects might make it
- possible to fingerprint your requests.
-
-
- In case of problems with your redirects, or simply to watch
- them working, enable debug 128.
-
-
-
-
-
- Example usages:
-
-
- # Replace example.com's style sheet with another one
-{ +redirect{http://localhost/css-replacements/example.com.css} }
- example.com/stylesheet\.css
-
-# Create a short, easy to remember nickname for a favorite site
-# (relies on the browser to accept and forward invalid URLs to &my-app;)
-{ +redirect{http://www.privoxy.org/user-manual/actions-file.html} }
- a
-
-# Always use the expanded view for Undeadly.org articles
-# (Note the $ at the end of the URL pattern to make sure
-# the request for the rewritten URL isn't redirected as well)
-{+redirect{s@$@&mode=expanded@}}
-undeadly.org/cgi\?action=article&sid=\d*$
-
-# Redirect Google search requests to MSN
-{+redirect{s@^http://[^/]*/search\?q=([^&]*).*@http://search.msn.com/results.aspx?q=$1@}}
-.google.com/search
-
-# Redirect MSN search requests to Yahoo
-{+redirect{s@^http://[^/]*/results\.aspx\?q=([^&]*).*@http://search.yahoo.com/search?p=$1@}}
-search.msn.com/results\.aspx\?q=
-
-# Redirect remote requests for this manual
-# to the local version delivered by Privoxy
-{+redirect{s@^http://www@http://config@}}
-www.privoxy.org/user-manual/
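
The effect of a single pcrs command on a request URL can be approximated in Python as shown below. This is only a sketch: it uses Python's `re` module instead of PCRE, does not handle escaped delimiters inside the command, and the function name is an assumption made for the example.

```python
import re


def apply_pcrs_substitution(command, url):
    """Apply a command of the form s<delim>pattern<delim>replacement<delim>flags."""
    if len(command) < 2 or command[0] != "s":
        raise ValueError("only substitution commands are modelled")
    delimiter = command[1]
    parts = command[2:].split(delimiter)
    if len(parts) != 3:
        raise ValueError("expected pattern, replacement and flags")
    pattern, replacement, flags = parts
    # Translate Perl-style $1 references into Python's \g<1> syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    count = 0 if "g" in flags else 1  # without "g", replace the first match only
    return re.sub(pattern, template, url, count=count)


print(apply_pcrs_substitution(
    r"s@^http://www@http://config@",
    "http://www.privoxy.org/user-manual/",
))
```

Running the command from the last example section on the manual's URL yields the address served by Privoxy itself, which is the redirect target the browser would receive.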
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-session-cookies-only
-
-
-
- Typical use:
-
-
- Allow only temporary "session" cookies (for the current
- browser session only).
-
-
-
-
-
- Effect:
-
-
- Deletes the expires
field from Set-Cookie:
- server headers. Most browsers will not store such cookies permanently and
- forget them in between sessions.
-
-
-
-
-
- Type:
-
-
- Boolean.
-
-
-
-
- Parameter:
-
-
- N/A
-
-
-
-
-
- Notes:
-
-
- This is less strict than crunch-incoming-cookies /
- crunch-outgoing-cookies and allows you to browse
- websites that insist or rely on setting cookies, without compromising your privacy too badly.
-
-
- Most browsers will not permanently store cookies that have been processed by
- session-cookies-only and will forget about them between sessions.
- This makes profiling cookies useless, but won't break sites which require cookies so
- that you can log in for transactions. This is generally turned on for all
- sites, and is the recommended setting.
-
-
- It makes no sense at all to use session-cookies-only
- together with crunch-incoming-cookies or
- crunch-outgoing-cookies . If you do, cookies
- will be plainly killed.
-
-
- Note that it is up to the browser how it handles such cookies without an expires
- field. If you use an exotic browser, you might want to try it out to be sure.
-
-
- This setting also has no effect on cookies that may have been stored
- previously by the browser before starting Privoxy .
- These would have to be removed manually.
-
-
- Privoxy also uses
- the content-cookies filter
- to block some types of cookies. Content cookies are not affected by
- session-cookies-only .
-
-
-
-
-
- Example usage:
-
-
- +session-cookies-only
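
The header rewrite described under "Effect" can be pictured with the following Python sketch. It assumes that only the expires attribute is dropped and that attributes are separated by semicolons; the function name is hypothetical and the code does not claim to mirror Privoxy's header parser.

```python
def strip_cookie_expiry(set_cookie_value):
    """Remove the expires attribute from a Set-Cookie header value."""
    parts = [part.strip() for part in set_cookie_value.split(";")]
    name_value, attributes = parts[0], parts[1:]
    kept = [
        attribute
        for attribute in attributes
        if attribute and attribute.split("=", 1)[0].strip().lower() != "expires"
    ]
    return "; ".join([name_value] + kept)


print(strip_cookie_expiry("sid=abc; expires=Wed, 09 Jun 2021 10:18:14 GMT; path=/"))
```

Whether a cookie without an expiry date is really discarded at the end of the session remains up to the browser, as the notes above point out.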
-
-
-
-
-
-
-
-
-
-set-image-blocker
-
-
-
- Typical use:
-
- Choose the replacement for blocked images
-
-
-
-
- Effect:
-
-
- This action alone doesn't do anything noticeable. If both
- block and handle-as-image also
- apply, i.e. if the request is to be blocked as an image,
- then the parameter of this action decides what will be
- sent as a replacement.
-
-
-
-
-
- Type:
-
-
- Parameterized.
-
-
-
-
- Parameter:
-
-
-
-
- pattern
to send a built-in checkerboard pattern image. The image is visually
- decent, scales very well, and makes it obvious where banners were busted.
-
-
-
-
- blank
to send a built-in transparent image. This makes banners disappear
- completely, but makes it hard to detect where Privoxy has blocked
- images on a given page and complicates troubleshooting if Privoxy
- has blocked innocent images, like navigation icons.
-
-
-
-
- target-url
to
- send a redirect to target-url . You can redirect
- to any image anywhere, even in your local filesystem via file:///
URL.
- (But note that not all browsers support redirecting to a local file system).
-
-
- A good application of redirects is to use special Privoxy -built-in
- URLs, which send the built-in images, as target-url .
- This has the same visual effect as specifying blank
or pattern
in
- the first place, but enables your browser to cache the replacement image, instead of requesting
- it over and over again.
-
-
-
-
-
-
-
- Notes:
-
-
- The URLs for the built-in images are http://config.privoxy.org/send-banner?type=type
, where type is
- either blank
or pattern
.
-
-
- There is a third (advanced) type, called auto
. It is NOT to be
- used in set-image-blocker , but meant for use from filters.
- Auto will select the type of image that would have applied to the referring page, had it been an image.
-
-
-
-
-
- Example usage:
-
-
- Built-in pattern:
-
-
- +set-image-blocker{pattern}
-
-
- Redirect to the BSD daemon:
-
-
- +set-image-blocker{http://www.freebsd.org/gifs/dae_up3.gif}
-
-
- Redirect to the built-in pattern for better caching:
-
-
- +set-image-blocker{http://config.privoxy.org/send-banner?type=pattern}
-
-
-
-
-
-
-
-
-
-Summary
-
- Note that many of these actions have the potential to cause a page to
- misbehave, possibly even not to display at all. There are many ways
- a site designer may choose to design his site, and what HTTP header
- content, and other criteria, he may depend on. There is no way to have hard
- and fast rules for all sites. See the Appendix for a brief example on troubleshooting
- actions.
-
-
-
-
-
-
-Aliases
-
- Custom actions
, known to Privoxy
- as aliases
, can be defined by combining other actions.
- These can in turn be invoked just like the built-in actions.
- Currently, an alias name can contain any character except space, tab,
- =
,
- {
and }
, but we strongly
- recommend that you only use a
to z
,
- 0
to 9
, +
, and -
.
- Alias names are not case sensitive, and are not required to start with a
- +
or -
sign, since they are merely textually
- expanded.
-
-
- Aliases can be used throughout the actions file, but they must be
- defined in a special section at the top of the file!
- And there can only be one such section per actions file. Each actions file may
- have its own alias section, and the aliases defined in it are only visible
- within that file.
-
-
- There are two main reasons to use aliases: One is to save typing for frequently
- used combinations of actions, the other one is a gain in flexibility: If you
- decide once how you want to handle shops by defining an alias called
- shop
, you can later change your policy on shops in
- one place, and your changes will take effect everywhere
- in the actions file where the shop
alias is used. Calling aliases
- by their purpose also makes your actions files more readable.
-
-
- Currently, there is one big drawback to using aliases, though:
- Privoxy 's built-in web-based action file
- editor honors aliases when reading the actions files, but it expands
- them before writing. So the effects of your aliases are of course preserved,
- but the aliases themselves are lost when you edit sections that use aliases
- with it.
-
-
-
- Now let's define some aliases...
-
-
-
-
- # Useful custom aliases we can use later.
- #
- # Note the (required!) section header line and that this section
- # must be at the top of the actions file!
- #
- {{alias}}
-
- # These aliases just save typing later:
- # (Note that some already use other aliases!)
- #
- +crunch-all-cookies = + crunch-incoming-cookies + crunch-outgoing-cookies
- -crunch-all-cookies = - crunch-incoming-cookies - crunch-outgoing-cookies
- +block-as-image = +block{Blocked image.} +handle-as-image
- allow-all-cookies = -crunch-all-cookies - session-cookies-only - filter{content-cookies}
-
- # These aliases define combinations of actions
- # that are useful for certain types of sites:
- #
- fragile = - block - filter -crunch-all-cookies - fast-redirects - hide-referrer - prevent-compression
-
- shop = -crunch-all-cookies - filter{all-popups}
-
- # Short names for other aliases, for really lazy people ;-)
- #
- c0 = +crunch-all-cookies
- c1 = -crunch-all-cookies
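
Because aliases are merely textually expanded, their behaviour can be modelled with a few lines of Python. The sketch below assumes the alias section has already been split into tokens; the dictionary layout and function name are invented for the example, and no protection against aliases that refer to themselves is included.

```python
ALIASES = {
    "+crunch-all-cookies": ["+crunch-incoming-cookies", "+crunch-outgoing-cookies"],
    "-crunch-all-cookies": ["-crunch-incoming-cookies", "-crunch-outgoing-cookies"],
    "shop": ["-crunch-all-cookies", "-filter{all-popups}"],
    "c0": ["+crunch-all-cookies"],
    "c1": ["-crunch-all-cookies"],
}


def expand_aliases(tokens, aliases):
    """Replace alias names by their definitions until only actions remain."""
    expanded = []
    for token in tokens:
        # Alias names are not case sensitive.
        definition = aliases.get(token.lower())
        if definition is None:
            expanded.append(token)
        else:
            expanded.extend(expand_aliases(definition, aliases))
    return expanded


print(expand_aliases(["shop"], ALIASES))
```

Expanding "shop" resolves the nested "-crunch-all-cookies" alias as well, which is why aliases may build on other aliases in the definitions above.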
-
-
-
- ...and put them to use. These sections would appear in the lower part of an
- actions file and define exceptions to the default actions (as specified further
- up for the /
pattern):
-
-
-
-
- # These sites are either very complex or very keen on
- # user data and require minimal interference to work:
- #
- {fragile}
- .office.microsoft.com
- .windowsupdate.microsoft.com
- # Gmail is really mail.google.com, not gmail.com
- mail.google.com
-
- # Shopping sites:
- # Allow cookies (for setting and retrieving your customer data)
- #
- {shop}
- .quietpc.com
- .worldpay.com # for quietpc.com
- mybank.example.com
-
- # These shops require pop-ups:
- #
- {-filter{all-popups} -filter{unsolicited-popups}}
- .dabs.com
- .overclockers.co.uk
-
-
-
- Aliases like shop
and fragile
are typically used for
- problem
sites that require more than one action to be disabled
- in order to function properly.
-
-
-
-
-
-Actions Files Tutorial
-
- The above chapters have shown which actions files
- there are and how they are organized, how actions are specified and applied
- to URLs, how patterns work, and how to
- define and use aliases. Now, let's look at an
- example match-all.action , default.action
- and user.action file and see how all these pieces come together:
-
-
-
-match-all.action
-
- Remember all actions are disabled when matching starts ,
- so we have to explicitly enable the ones we want.
-
-
-
- While the match-all.action file only contains a
- single section, it is probably the most important one. It has only one
- pattern, /
, but this pattern
- matches all URLs. Therefore, the set of
- actions used in this default
section will
- be applied to all requests as a start . It can be partly or
- wholly overridden by other actions files like default.action
- and user.action , but it will still be largely responsible
- for your overall browsing experience.
-
-
-
- Again, at the start of matching, all actions are disabled, so there is
- no need to disable any actions here. (Remember: a +
- preceding the action name enables the action, a -
disables!).
- Also note how this long line has been made more readable by splitting it into
- multiple lines with line continuation.
-
-
-
-
-{ \
- + change-x-forwarded-for{block} \
- + hide-from-header{block} \
- + set-image-blocker{pattern} \
-}
-/ # Match all URLs
-
-
-
-
- The default behavior is now set.
-
-
-
-
-default.action
-
-
- If you aren't a developer, there's no need for you to edit the
- default.action file. It is maintained by
- the &my-app; developers and if you disagree with some of the
- sections, you should overrule them in your user.action .
-
-
-
- Understanding the default.action file can
- help you with your user.action , though.
-
-
-
- The first section in this file is a special section for internal use
- that prevents older &my-app; versions from reading the file:
-
-
-
-
-##########################################################################
-# Settings -- Don't change! For internal Privoxy use ONLY.
-##########################################################################
-{{settings}}
-for-privoxy-version=3.0.11
-
-
-
- After that comes the (optional) alias section. We'll use the example
- section from the above chapter on aliases,
- that also explains why and how aliases are used:
-
-
-
-
-##########################################################################
-# Aliases
-##########################################################################
-{{alias}}
-
- # These aliases just save typing later:
- # (Note that some already use other aliases!)
- #
- +crunch-all-cookies = + crunch-incoming-cookies + crunch-outgoing-cookies
- -crunch-all-cookies = - crunch-incoming-cookies - crunch-outgoing-cookies
- +block-as-image = +block{Blocked image.} +handle-as-image
- mercy-for-cookies = -crunch-all-cookies - session-cookies-only - filter{content-cookies}
-
- # These aliases define combinations of actions
- # that are useful for certain types of sites:
- #
- fragile = - block - filter -crunch-all-cookies - fast-redirects - hide-referrer
- shop = -crunch-all-cookies - filter{all-popups}
-
-
-
- The first of our specialized sections is concerned with fragile
- sites, i.e. sites that require minimum interference, because they are either
- very complex or very keen on tracking you (and have mechanisms in place that
- make them unusable for people who avoid being tracked). We will simply use
- our pre-defined fragile alias instead of stating the list
- of actions explicitly:
-
-
-
-
-##########################################################################
-# Exceptions for sites that'll break under the default action set:
-##########################################################################
-
-# "Fragile" Use a minimum set of actions for these sites (see alias above):
-#
-{ fragile }
-.office.microsoft.com # surprise, surprise!
-.windowsupdate.microsoft.com
-mail.google.com
-
-
-
- Shopping sites are not as fragile, but they typically
- require cookies to log in, and pop-up windows for shopping
- carts or item details. Again, we'll use a pre-defined alias:
-
-
-
-
-# Shopping sites:
-#
-{ shop }
-.quietpc.com
-.worldpay.com # for quietpc.com
-.jungle.com
-.scan.co.uk
-
-
-
- The fast-redirects
- action, which may have been enabled in match-all.action ,
- breaks some sites. So disable it for popular sites where we know it misbehaves:
-
-
-
-
-{ - fast-redirects }
-login.yahoo.com
-edit.*.yahoo.com
-.google.com
-.altavista.com/.*(like|url|link):http
-.altavista.com/trans.*urltext=http
-.nytimes.com
-
-
-
- It is important that Privoxy knows which
- URLs belong to images, so that if they are to
- be blocked, a substitute image can be sent, rather than an HTML page.
- Contacting the remote site to find out is not an option, since it
- would destroy the loading time advantage of banner blocking, and it
- would feed the advertisers information about you. We can mark any
- URL as an image with the handle-as-image action,
- and marking all URLs that end in a known image file extension is a
- good start:
-
-
-
-
-##########################################################################
-# Images:
-##########################################################################
-
-# Define which file types will be treated as images, in case they get
-# blocked further down this file:
-#
-{ + handle-as-image }
-/.*\.(gif|jpe?g|png|bmp|ico)$
-
-
-
- And then there are known banner sources. They often use scripts to
- generate the banners, so it won't be visible from the URL that the
- request is for an image. Hence we block them and
- mark them as images in one go, with the help of our
- +block-as-image alias defined above. (We could of
- course just as well use + block
- + handle-as-image here.)
- Remember that the type of the replacement image is chosen by the
- set-image-blocker
- action. Since all URLs have matched the default section with its
- + set-image-blocker{pattern}
- action before, it still applies and needn't be repeated:
-
-
-
-
-# Known ad generators:
-#
-{ +block-as-image }
-ar.atwola.com
-.ad.doubleclick.net
-.ad.*.doubleclick.net
-.a.yimg.com/(?:(?!/i/).)*$
-.a[0-9].yimg.com/(?:(?!/i/).)*$
-bs*.gsanet.com
-.qkimg.net
-
-
-
- One of the most important jobs of Privoxy
- is to block banners. Many of these can be blocked
- by the filter{banners-by-size}
- action, which we enabled above, and which deletes the references to banner
- images from the pages while they are loaded, so the browser doesn't request
- them anymore, and hence they don't need to be blocked here. But this naturally
- doesn't catch all banners, and some people choose not to use filters, so we
- need a comprehensive list of patterns for banner URLs here, and apply the
- block action to them.
-
-
- First come many generic patterns, which do most of the work, by
- matching typical domain and path name components of banners. Then comes
- a list of individual patterns for specific sites, which is omitted here
- to keep the example short:
-
-
-
-
-##########################################################################
-# Block these fine banners:
-##########################################################################
-{ +block{Banner ads.} }
-
-# Generic patterns:
-#
-ad*.
-.*ads.
-banner?.
-count*.
-/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?)
-/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/
-
-# Site-specific patterns (abbreviated):
-#
-.hitbox.com
-
-
-
- It's quite remarkable how many advertisers actually call their banner
- servers ads.company .com, or call the directory
- in which the banners are stored simply banners
. So the above
- generic patterns are surprisingly effective.
-
-
- But being very generic, they necessarily also catch URLs that we don't want
- to block. The pattern .*ads. e.g. catches
- nasty-ads .nasty-corp.com
as intended,
- but also downloads .sourcefroge.net
or
- ads l.some-provider.net.
So here come some
- well-known exceptions to the + block
- section above.
-
-
- Note that these are exceptions to exceptions from the default! Consider the URL
- downloads.sourcefroge.net
: Initially, all actions are deactivated,
- so it wouldn't get blocked. Then comes the defaults section, which matches the
- URL, but just deactivates the block
- action once again. Then it matches .*ads. , an exception to the
- general non-blocking policy, and suddenly
- +block applies. And now, it'll match
- .*loads. , where -block
- applies, so (unless it matches again further down) it ends up
- with no block action applying.
-
-
-
-
-##########################################################################
-# Save some innocent victims of the above generic block patterns:
-##########################################################################
-
-# By domain:
-#
-{ - block }
-adv[io]*. # (for advogato.org and advice.*)
-adsl. # (has nothing to do with ads)
-adobe. # (has nothing to do with ads either)
-ad[ud]*. # (adult.* and add.*)
-.edu # (universities don't host banners (yet!))
-.*loads. # (downloads, uploads etc)
-
-# By path:
-#
-/.*loads/
-
-# Site-specific:
-#
-www.globalintersec.com/adv # (adv = advanced)
-www.ugu.com/sui/ugu/adv
-
-
-
- Filtering source code can have nasty side effects,
- so make an exception for our friends at sourceforge.net,
- and all paths with cvs
in them. Note that
- - filter
- disables all filters in one fell swoop!
-
-
-
-
-# Don't filter code!
-#
-{ - filter }
-/(.*/)?cvs
-bugzilla.
-developer.
-wiki.
-.sourceforge.net
-
-
-
- The actual default.action is of course much more
- comprehensive, but we hope this example made clear how it works.
-
-
-
-
-user.action
-
-
- So far we are painting with a broad brush by setting general policies,
- which would be a reasonable starting point for many people. Now,
- you might want to be more specific and have customized rules that
- are more suitable to your personal habits and preferences. These would
- be for narrowly defined situations like your ISP or your bank, and should
- be placed in user.action , which is parsed after all other
- actions files and hence has the last word, over-riding any previously
- defined actions. user.action is also a
- safe place for your personal settings, since
- default.action is actively maintained by the
- Privoxy developers and you'll probably want
- to install updated versions from time to time.
-
-
-
- So let's look at a few examples of things that one might typically do in
- user.action :
-
-
-
-
-
-
-
-# My user.action file. <fred@example.com>
-
-
-
- As aliases are local to the actions
- file that they are defined in, you can't use the ones from
- default.action , unless you repeat them here:
-
-
-
-
-# Aliases are local to the file they are defined in.
-# (Re-)define aliases for this file:
-#
-{{alias}}
-#
-# These aliases just save typing later, and the alias names should
-# be self explanatory.
-#
-+crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies
--crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies
- allow-all-cookies = -crunch-all-cookies -session-cookies-only
- allow-popups = -filter{all-popups}
-+block-as-image = +block{Blocked as image.} +handle-as-image
--block-as-image = -block
-
-# These aliases define combinations of actions that are useful for
-# certain types of sites:
-#
-fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referrer
-shop = -crunch-all-cookies allow-popups
-
-# Allow ads for selected useful free sites:
-#
-allow-ads = -block -filter{banners-by-size} -filter{banners-by-link}
-
-# Alias for specific file types that are text, but might have conflicting
-# MIME types. We want the browser to force these to be text documents.
-handle-as-text = - filter +- content-type-overwrite{text/plain} +- force-text-mode - hide-content-disposition
-
-
-
-
- Say you have accounts on some sites that you visit regularly, and
- you don't want to have to log in manually each time. So you'd like
- to allow persistent cookies for these sites. The
- allow-all-cookies alias defined above does exactly
- that, i.e. it disables crunching of cookies in any direction, and the
- processing of cookies to make them only temporary.
-
-
-
-
-{ allow-all-cookies }
- sourceforge.net
- .yahoo.com
- .msdn.microsoft.com
- .redhat.com
-
-
-
- Your bank is allergic to some filter, but you don't know which, so you disable them all:
-
-
-
-
-{ - filter }
- .your-home-banking-site.com
-
-
-
- Some file types you may not want to filter for various reasons:
-
-
-
-
-# Technical documentation is likely to contain strings that might
-# erroneously get altered by the JavaScript-oriented filters:
-#
-.tldp.org
-/(.*/)?selfhtml/
-
-# And this stupid host sends streaming video with a wrong MIME type,
-# so that Privoxy thinks it is getting HTML and starts filtering:
-#
-stupid-server.example.com/
-
-
-
- Example of a simple block action. Say you've
- seen an ad on your favourite page on example.com that you want to get rid of.
- You have right-clicked the image, selected copy image location
- and pasted the URL below while removing the leading http://, into a
- { +block{} } section. Note that { +handle-as-image
- } need not be specified, since all URLs ending in
- .gif will be tagged as images by the general rules as set
- in default.action anyway:
-
-
-
-
-{ + block{Nasty ads.} }
- www.example.com/nasty-ads/sponsor\.gif
- another.example.net/more/junk/here/
-
-
-
- The URLs of dynamically generated banners, especially from large banner
- farms, often don't use the well-known image file name extensions, which
- makes it impossible for Privoxy to guess
- the file type just by looking at the URL.
- You can use the +block-as-image alias defined above for
- these cases.
- Note that objects which match this rule but then turn out NOT to be an
- image are typically rendered as a broken image
icon by the
- browser. Use cautiously.
-
-
-
-
-{ +block-as-image }
- .doubleclick.net
- .fastclick.net
- /Realmedia/ads/
- ar.atwola.com/
-
-
-
- Now you noticed that the default configuration breaks Forbes Magazine,
- but you were too lazy to find out which action is the culprit, and you
- were again too lazy to give feedback, so
- you just used the fragile alias on the site, and
- -- whoa! -- it worked. The fragile
- alias disables those actions that are most likely to break a site. It is also
- useful for testing whether it is Privoxy
- that is causing the problem. We later find other regular sites
- that misbehave, and add those to our personalized list of troublemakers:
-
-
-
-
-{ fragile }
- .forbes.com
- webmail.example.com
- .mybank.com
-
-
-
- You like the fun
text replacements in default.filter ,
- but it is disabled in the distributed actions file.
- So you'd like to turn it on in your private,
- update-safe config, once and for all:
-
-
-
-
-{ + filter{fun} }
- / # For ALL sites!
-
-
-
- Note that the above is not really a good idea: There are exceptions
- to the filters in default.action for things that
- really shouldn't be filtered, like code on CVS->Web interfaces. Since
- user.action has the last word, these exceptions
- won't be valid for the fun
filtering specified here.
-
-
-
- You might also worry about how your favourite free websites are
- funded, and find that they rely on displaying banner advertisements
- to survive. So you might want to specifically allow banners for those
- sites that you feel provide value to you:
-
+ +filter{unsolicited-popups} # Disable only unsolicited pop-up windows.
+
+
+
+
+
+ +filter{img-reorder} # Reorder attributes in <img> tags to make the banners-by-* filters more effective.
+
+
+
+ +filter{banners-by-size} # Kill banners by size.
+
+
+
+ +filter{banners-by-link} # Kill banners by their links to known clicktrackers.
+
+
+
+ +filter{webbugs} # Squish WebBugs (1x1 invisible GIFs used for user tracking).
+
+
+
+ +filter{tiny-textforms} # Extend those tiny textareas up to 40x80 and kill the hard wrap.
+
+
+
+ +filter{jumping-windows} # Prevent windows from resizing and moving themselves.
+
+
+
+ +filter{frameset-borders} # Give frames a border and make them resizable.
+
+
+
+ +filter{iframes} # Removes all detected iframes. Should only be enabled for individual sites.
+
+
+
+ +filter{demoronizer} # Fix MS's non-standard use of standard charsets.
+
+
+
+ +filter{shockwave-flash} # Kill embedded Shockwave Flash objects.
+
+
+
+ +filter{quicktime-kioskmode} # Make Quicktime movies saveable.
+
+
+
+ +filter{fun} # Text replacements for subversive browsing fun!
+
+
+
+ +filter{crude-parental} # Crude parental filtering. Note that this filter doesn't work reliably.
+
+
+
+ +filter{ie-exploits} # Disable some known Internet Explorer bug exploits.
+
+
+
+ +filter{site-specifics} # Cure for site-specific problems. Don't apply generally!
+
+
+
+ +filter{no-ping} # Removes non-standard ping attributes in <a> and <area> tags.
+
+
+
+ +filter{google} # CSS-based block for Google text ads. Also removes a width limitation and the toolbar advertisement.
+
+
+
+ +filter{yahoo} # CSS-based block for Yahoo text ads. Also removes a width limitation.
+
+
+
+ +filter{msn} # CSS-based block for MSN text ads. Also removes tracking URLs and a width limitation.
+
+
+
+ +filter{blogspot} # Cleans up some Blogspot blogs. Read the fine print before using this.
+
+
+
+
+
-
-
-{ allow-ads }
- .sourceforge.net
- .slashdot.org
- .osdn.net
-
-
- Note that allow-ads has been aliased to
- - block ,
- - filter{banners-by-size} , and
- - filter{banners-by-link} above.
-
+
+
+force-text-mode
+
+
+
+ Typical use:
+
+ Force Privoxy to treat a document as if it was in some kind of text format.
+
+
-
- Invoke another alias here to force an over-ride of the MIME type
- application/x-sh which typically would open a download type
- dialog. In my case, I want to look at the shell script, and then I can save
- it should I choose to.
-
+
+ Effect:
+
+
+ Declares a document as text, even if the "Content-Type:" isn't detected as such.
+
+
+
-
-
-{ handle-as-text }
- /.*\.sh$
-
+
+ Type:
+
+
+ Boolean.
+
+
-
- user.action is generally the best place to define
- exceptions and additions to the default policies of
- default.action . Some actions are safe to have their
- default policies set here though. So let's set a default policy to have a
- blank
image as opposed to the checkerboard pattern for
- ALL sites. /
of course matches all URL
- paths and patterns:
-
+
+ Parameter:
+
+
+ N/A
+
+
+
-
-
-{ + set-image-blocker{blank} }
-/ # ALL sites
-
+
+ Notes:
+
+
+ As explained above ,
+ Privoxy tries to only filter files that are
+ in some kind of text format. The same restrictions apply to
+ content-type-overwrite .
+ force-text-mode declares a document as text,
+ without looking at the "Content-Type:" first.
+
+
+
+ Think twice before activating this action. Filtering binary data
+ with regular expressions can cause file damage.
+
+
+
+
+
+ Example usage:
+
+
+
++force-text-mode
+
+
+
+
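The bare action above is normally used inside a section together with a filter. The following is a minimal sketch of such a section, not a recommended default: the host name mislabeled.example.com and the /articles/ path are hypothetical, and banners-by-size merely stands in for whichever content filter you actually want applied.

```
# Hypothetical host that serves HTML with a non-text Content-Type.
# Keep the pattern narrow: filtering binary data can damage files.
{+force-text-mode +filter{banners-by-size}}
mislabeled.example.com/articles/
```

Restricting the pattern to paths that are known to carry text keeps the risk described in the notes small.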
+
-
-
-
-
-
-
-
-
-
-
-Filter Files
-
- On-the-fly text substitutions need
- to be defined in a filter file
. Once defined, they
- can then be invoked as an action
.
-
+
+
+forward-override
+
+
+
+ Typical use:
+
+ Change the forwarding settings based on User-Agent or request origin
+
+
-
- &my-app; supports three different filter actions:
- filter to
- rewrite the content that is sent to the client,
- client-header-filter
- to rewrite headers that are sent by the client, and
- server-header-filter
- to rewrite headers that are sent by the server.
-
+
+ Effect:
+
+
+ Overrules the forward directives in the configuration file.
+
+
+
-
- &my-app; also supports two tagger actions:
- client-header-tagger
- and
- server-header-tagger .
- Taggers and filters use the same syntax in the filter files, the difference
- is that taggers don't modify the text they are filtering, but use a rewritten
- version of the filtered text as tag. The tags can then be used to change the
- applying actions through sections with tag-patterns.
-
+
+ Type:
+
+
+ Parameterized.
+
+
+
+ Parameter:
+
+
+
+ "forward ." to use a direct connection without any additional proxies.
+
+
+
+ "forward 127.0.0.1:8123" to use the HTTP proxy listening at 127.0.0.1 port 8123.
+
+
+
+
+ "forward-socks4a 127.0.0.1:9050 ." to use the socks4a proxy listening at
+ 127.0.0.1 port 9050. Replace "forward-socks4a" with "forward-socks4"
+ to use a socks4 connection (with local DNS resolution) instead, or use "forward-socks5"
+ for socks5 connections (with remote DNS resolution).
+
+
+
+
+ "forward-socks4a 127.0.0.1:9050 proxy.example.org:8000" to use the socks4a proxy
+ listening at 127.0.0.1 port 9050 to reach the HTTP proxy listening at proxy.example.org port 8000.
+ Replace "forward-socks4a" with "forward-socks4" to use a socks4 connection
+ (with local DNS resolution) instead, or use "forward-socks5"
+ for socks5 connections (with remote DNS resolution).
+
+
+
+
+ "forward-webserver 127.0.0.1:80" to use the HTTP
+ server listening at 127.0.0.1 port 80 without adjusting the
+ request headers.
+
+
+ This makes it more convenient to use Privoxy to make
+ existing websites available as onion services as well.
+
+
+ Many websites serve content with hardcoded URLs and
+ can't be easily adjusted to change the domain based
+ on the one used by the client.
+
+
+ Putting Privoxy between Tor and the webserver (or an stunnel
+ that forwards to the webserver) allows headers and content to be
+ rewritten so that client and server are both happy at the same time.
+
+
+ Using Privoxy for webservers that are only reachable through
+ onion addresses and whose location is supposed to be secret
+ is not recommended and should not be necessary anyway.
+
+
+
+
+
-
- Multiple filter files can be defined through the filterfile config directive. The filters
- as supplied by the developers are located in
- default.filter . It is recommended that any locally
- defined or modified filters go in a separately defined file such as
- user.filter .
-
+
+ Notes:
+
+
+ This action takes parameters similar to the
+ forward directives in the configuration
+ file, but without the URL pattern. It can be used as replacement, but normally it's only
+ used in cases where matching based on the request URL isn't sufficient.
+
+
+
+ Please read the description for the forward directives before
+ using this action. Forwarding to the wrong people will reduce your privacy and increase the
+ chances of man-in-the-middle attacks.
+
+
+ If the ports are missing or invalid, default values will be used. This might change
+ in the future and you shouldn't rely on it. Otherwise incorrect syntax causes Privoxy
+ to exit. Due to design limitations, invalid parameter syntax isn't detected until the
+ action is used the first time.
+
+
+ Use the show-url-info CGI page
+ to verify that your forward settings do what you thought they do.
+
+
+
+
-
- Common tasks for content filters are to eliminate common annoyances in
- HTML and JavaScript, such as pop-up windows,
- exit consoles, crippled windows without navigation tools, the
- infamous <BLINK> tag etc, to suppress images with certain
- width and height attributes (standard banner sizes or web-bugs),
- or just to have fun.
-
+
+ Example usage:
+
+
+
+# Use an ssh tunnel for requests previously tagged as
+# "User-Agent: fetch libfetch/2.0" and make sure
+# resuming downloads continues to work.
+#
+# This way you can continue to use Tor for your normal browsing,
+# without overloading the Tor network with your FreeBSD ports updates
+# or downloads of bigger files like ISOs.
+#
+# Note that HTTP headers are easy to fake and therefore their
+# values are as (un)trustworthy as your clients and users.
+{+forward-override{forward-socks5 10.0.0.2:2222 .} \
+ -hide-if-modified-since \
+ -overwrite-last-modified \
+}
+TAG:^User-Agent: fetch libfetch/2\.0$
+
+
+
+
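The TAG: pattern in the example only matches if a tagger has already attached the User-Agent header to the request as a tag. A minimal sketch of the two pieces together follows. It assumes that a tagger named user-agent is defined in one of your loaded filter files, and it uses "forward ." only as an illustrative target.

```
# Assumption: a tagger called "user-agent" exists in a loaded filter file.
# Tag every request with its complete User-Agent header.
{+client-header-tagger{user-agent}}
/

# Requests carrying the matching tag bypass any configured parent proxy.
{+forward-override{forward .}}
TAG:^User-Agent: fetch libfetch/2\.0$
```

The tagger section applies to all URLs, while the override section only applies to requests whose tag matches, which is why the two are kept in separate sections.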
+
+
-
- Enabled content filters are applied to any content whose
- Content Type
header is recognised as a sign
- of text-based content, with the exception of text/plain .
- Use the force-text-mode action
- to also filter other content.
-
-
- Substitutions are made at the source level, so if you want to roll
- your own
filters, you should first be familiar with HTML syntax,
- and, of course, regular expressions.
-
+
+
+handle-as-empty-document
+
+
+
+ Typical use:
+
+ Mark URLs that should be replaced by empty documents if they get blocked
+
+
-
- Just like the actions files, the
- filter file is organized in sections, which are called filters
- here. Each filter consists of a heading line, that starts with one of the
- keywords FILTER: ,
- CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER:
- followed by the filter's name , and a short (one line)
- description of what it does. Below that line
- come the jobs , i.e. lines that define the actual
- text substitutions. By convention, the name of a filter
- should describe what the filter eliminates . The
- comment is used in the web-based
- user interface .
-
+
+ Effect:
+
+
+ This action alone doesn't do anything noticeable. It just marks URLs.
+ If the block action also applies ,
+ the presence or absence of this mark decides whether an HTML BLOCKED
+ page, or an empty document will be sent to the client as a substitute for the blocked content.
+ The empty document isn't literally empty, but actually contains a single space.
+
+
+
-
- Once a filter called name has been defined
- in the filter file, it can be invoked by using an action of the form
- + filter{name }
- in any actions file.
-
+
+ Type:
+
+
+ Boolean.
+
+
-
- Filter definitions start with a header line that contains the filter
- type, the filter name and the filter description.
- A content filter header line for a filter called foo
could look
- like this:
-
+
+ Parameter:
+
+
+ N/A
+
+
+
-
- FILTER: foo Replace all "foo" with "bar"
-
+
+ Notes:
+
+
+ Some browsers complain about syntax errors if JavaScript documents
+ are blocked with Privoxy's
+ default HTML page; this option can be used to silence them.
+ And of course this action can also be used to eliminate the &my-app;
+ BLOCKED message in frames.
+
+
+ The content type for the empty document can be specified with
+ content-type-overwrite{} ,
+ but usually this isn't necessary.
+
+
+
-
- Below that line, and up to the next header line, come the jobs that
- define what text replacements the filter executes. They are specified
- in a syntax that imitates Perl 's
- s/// operator. If you are familiar with Perl, you
- will find this to be quite intuitive, and may want to look at the
- PCRS documentation for the subtle differences to Perl behaviour. Most
- notably, the non-standard option letter U is supported,
- which turns the default to ungreedy matching.
-
+
+ Example usage:
+
+
+ # Block all documents on example.org that end with ".js",
+# but send an empty document instead of the usual HTML message.
+{+block{Blocked JavaScript} +handle-as-empty-document}
+example.org/.*\.js$
+
+
+
+
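The notes mention that the content type of the empty document can be set with content-type-overwrite{}. A hedged sketch of that combination is shown below. The host tracker.example.net is hypothetical, and the MIME type is only an assumption about what a browser would accept for an empty script.

```
# Hypothetical tracker script: block it, send an empty document,
# and declare that document as JavaScript.
{+block{Tracking script.} \
 +handle-as-empty-document \
 +content-type-overwrite{application/javascript} \
}
tracker.example.net/.*\.js$
```

As the notes say, the overwrite is usually unnecessary, so try the section without it first.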
+
+
-
- If you are new to
- Regular
- Expressions
, you might want to take a look at
- the Appendix on regular expressions, and
- see the Perl
- manual for
- the
- s/// operator's syntax and Perl-style regular
- expressions in general.
- The below examples might also help to get you started.
-
+
+
+handle-as-image
-
+
+
+ Typical use:
+
+ Mark URLs as belonging to images (so they'll be replaced by images if they do get blocked , rather than HTML pages)
+
+
-Filter File Tutorial
-
- Now, let's complete our foo
content filter. We have already defined
- the heading, but the jobs are still missing. Since all it does is to replace
- foo
with bar
, there is only one (trivial) job
- needed:
-
+
+ Effect:
+
+
+ This action alone doesn't do anything noticeable. It just marks URLs as images.
+ If the block action also applies ,
+ the presence or absence of this mark decides whether an HTML blocked
+ page, or a replacement image (as determined by the set-image-blocker action) will be sent to the
+ client as a substitute for the blocked content.
+
+
+
-
- s/foo/bar/
-
+
+ Type:
+
+
+ Boolean.
+
+
-
- But wait! Didn't the comment say that all occurrences
- of foo
should be replaced? Our current job will only take
- care of the first foo
on each page. For global substitution,
- we'll need to add the g option:
-
+
+ Parameter:
+
+
+ N/A
+
+
+
-
- s/foo/bar/g
-
+
+ Notes:
+
+
+ The below generic example section is actually part of default.action .
+ It marks all URLs with well-known image file name extensions as images and should
+ be left intact.
+
+
+ Users will probably only want to use the handle-as-image action in conjunction with
+ block , to block sources of banners, whose URLs don't
+ reflect the file type, like in the second example section.
+
+
+ Note that you cannot treat HTML pages as images in most cases. For instance, (in-line) ad
+ frames require an HTML page to be sent, or they won't display properly.
+ Forcing handle-as-image in this situation will not replace the
+ ad frame with an image, but lead to error messages.
+
+
+
-
- Our complete filter now looks like this:
-
-
- FILTER: foo Replace all "foo" with "bar"
-s/foo/bar/g
-
+
+ Example usage (sections):
+
+
+ # Generic image extensions:
+#
+{+handle-as-image}
+/.*\.(gif|jpg|jpeg|png|bmp|ico)$
-
- Let's look at some real filters for more interesting examples. Here you see
- a filter that protects against some common annoyances that arise from JavaScript
- abuse. Let's look at its jobs one after the other:
-
+# These don't look like images, but they're banners and should be
+# blocked as images:
+#
+{+block{Nasty banners.} +handle-as-image}
+nasty-banner-server.example.com/junk.cgi\?output=trash
+
+
+
+
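Because the replacement image is chosen by set-image-blocker, the two actions can also be set in the same section. The following is a small sketch under assumed names: the host ads.example.net and its serve.cgi path are hypothetical.

```
# Hypothetical ad script whose URL does not look like an image.
# Block it, mark it as an image, and answer with a blank image
# instead of the checkerboard pattern.
{+block{Ad script output.} +handle-as-image +set-image-blocker{blank}}
ads.example.net/serve\.cgi
```

If set-image-blocker is already enabled for all URLs by an earlier section, it need not be repeated here.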
+
+
-
-
-FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse
+
+
+hide-accept-language
+
+
+
+ Typical use:
+
+ Pretend to use different language settings.
+
+
-# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm
-#
-s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg
-
+
+ Effect:
+
+
+ Deletes or replaces the "Accept-Language:" HTTP header in client requests.
+
+
+
-
- Following the header line and a comment, you see the job. Note that it uses
- | as the delimiter instead of / , because
- the pattern contains a forward slash, which would otherwise have to be escaped
- by a backslash (\ ).
-
+
+ Type:
+
+
+ Parameterized.
+
+
-
- Now, let's examine the pattern: it starts with the text <script.*
- enclosed in parentheses. Since the dot matches any character, and *
- means: Match an arbitrary number of the element left of myself
, this
- matches <script
, followed by any text, i.e.
- it matches the whole page, from the start of the first <script> tag.
-
+
+ Parameter:
+
+
+ Keyword: "block", or any user defined value.
+
+
+
-
- That's more than we want, but the pattern continues: document\.referrer
- matches only the exact string document.referrer
. The dot needed to
- be escaped , i.e. preceded by a backslash, to take away its
- special meaning as a joker, and make it just a regular dot. So far, the meaning is:
- Match from the start of the first <script> tag in the page, up to, and including,
- the text document.referrer
, if both are present
- in the page (and appear in that order).
-
+
+ Notes:
+
+
+ Faking the browser's language settings can be useful to make a
+ foreign User-Agent set with
+ hide-user-agent
+ more believable.
+
+
+ However, some sites with content in different languages check the
+ "Accept-Language:" header to decide which one to take by default.
+ Sometimes it isn't possible to later switch to another language without
+ changing the "Accept-Language:" header first.
+
+
+ Therefore it's a good idea to either only change the
+ "Accept-Language:" header to languages you understand,
+ or to languages that aren't widespread.
+
+
+ Before setting the "Accept-Language:" header
+ to a rare language, you should consider that it helps to
+ make your requests unique and thus easier to trace.
+ If you don't plan to change this header frequently,
+ you should stick to a common language.
+
+
+
-
- But there's still more pattern to go. The next element, again enclosed in parentheses,
- is .*</script> . You already know what .*
- means, so the whole pattern translates to: Match from the start of the first <script>
- tag in a page to the end of the last <script> tag, provided that the text
- document.referrer
appears somewhere in between.
-
+
+ Example usage (section):
+
+
+ # Pretend to use Canadian language settings.
+{+hide-accept-language{en-ca} \
++hide-user-agent{Mozilla/5.0 (X11; U; OpenBSD i386; en-CA; rv:1.8.0.4) Gecko/20060628 Firefox/1.5.0.4} \
+}
+/
+
+
+
+
+
-
- This is still not the whole story, since we have ignored the options and the parentheses:
- The portions of the page matched by sub-patterns that are enclosed in parentheses, will be
- remembered and be available through the variables $1, $2, ... in
- the substitute. The U option switches to ungreedy matching, which means
- that the first .* in the pattern will only eat up
all
- text in between <script
and the first occurrence
- of document.referrer
, and that the second .* will
- only span the text up to the first </script>
- tag. Furthermore, the s option says that the match may span
- multiple lines in the page, and the g option again means that the
- substitution is global.
-
-
- So, to summarize, the pattern means: Match all scripts that contain the text
- document.referrer
. Remember the parts of the script from
- (and including) the start tag up to (and excluding) the string
- document.referrer
as $1 , and the part following
- that string, up to and including the closing tag, as $2 .
-
+
+
+hide-content-disposition
+
+
+
+ Typical use:
+
+ Prevent download menus for content you prefer to view inside the browser.
+
+
-
- Now the pattern is deciphered, but wasn't this about substituting things? So
- lets look at the substitute: $1"Not Your Business!"$2 is
- easy to read: The text remembered as $1 , followed by
- "Not Your Business!" (including
- the quotation marks!), followed by the text remembered as $2 .
- This produces an exact copy of the original string, with the middle part
- (the document.referrer
) replaced by "Not Your
- Business!" .
-
+
+ Effect:
+
+
+ Deletes or replaces the Content-Disposition:
HTTP header set by some servers.
+
+
+
-
- The whole job now reads: Replace document.referrer
by
- "Not Your Business!" wherever it appears inside a
- <script> tag. Note that this job won't break JavaScript syntax,
- since both the original and the replacement are syntactically valid
- string objects. The script just won't have access to the referrer
- information anymore.
-
+
+ Type:
+
+
+ Parameterized.
+
+
-
- We'll show you two other jobs from the JavaScript taming department, but
- this time only point out the constructs of special interest:
-
+
+ Parameter:
+
+
+ Keyword: block
, or any user defined value.
+
+
+
-
-
-# The status bar is for displaying link targets, not pointless blahblah
-#
-s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
-
+
+ Notes:
+
+
+ Some servers set the Content-Disposition:
HTTP header for
+ documents they assume you want to save locally before viewing them.
+ The Content-Disposition:
header contains the file name
+ the browser is supposed to use by default.
+
+
+ In most browsers that understand this header, it makes it impossible to
+ just view the document, without downloading it first,
+ even if it's just a simple text file or an image.
+
+
+ Removing the Content-Disposition:
header helps
+ to prevent this annoyance, but some browsers additionally check the
+ Content-Type:
header, before they decide if they can
+ display a document without saving it first. In these cases, you have
+ to change this header as well, before the browser stops displaying
+ download menus.
+
+
+ It is also possible to change the server's file name suggestion
+ to another one, but in most cases it isn't worth the time to set
+ it up.
+
+
+ This action will probably be removed in the future,
+ use server-header filters instead.
+
+
+
-
- \s stands for whitespace characters (space, tab, newline,
- carriage return, form feed), so that \s* means: zero
- or more whitespace
. The ? in .*?
- makes this matching of arbitrary text ungreedy. (Note that the U
- option is not set). The ['"] construct means: a single
- or a double quote
. Finally, \1 is
- a back-reference to the first parenthesis just like $1 above,
- with the difference that in the pattern , a backslash indicates
- a back-reference, whereas in the substitute , it's the dollar.
-
+
+ Example usage:
+
+
+ # Disarm the download link in Sourceforge's patch tracker
+{ -filter \
+ +content-type-overwrite{text/plain}\
+ +hide-content-disposition{block} }
+ .sourceforge.net/tracker/download\.php
+
+
+
+
+
-
- So what does this job do? It replaces assignments of single- or double-quoted
- strings to the window.status
object with a dummy assignment
- (using a variable name that is hopefully odd enough not to conflict with
- real variables in scripts). Thus, it catches many cases where e.g. pointless
- descriptions are displayed in the status bar instead of the link target when
- you move your mouse over links.
-
-
-
-# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html
-#
-s/(<body [^>]*)onunload(.*>)/$1never$2/iU
-
+
+
+hide-if-modified-since
+
+
+
+ Typical use:
+
+ Prevent yet another way to track the user's steps between sessions.
+
+
-
- Including the
- OnUnload
- event binding in the HTML DOM was a CRIME .
- When I close a browser window, I want it to close and die. Basta.
- This job replaces the onunload
attribute in
- <body>
tags with the dummy word never .
- Note that the i option makes the pattern matching
- case-insensitive. Also note that ungreedy matching alone doesn't always guarantee
- a minimal match: In the first parenthesis, we had to use [^>]*
- instead of .* to prevent the match from exceeding the
- <body> tag if it doesn't contain OnUnload
, but the page's
- content does.
-
+
+ Effect:
+
+
+ Deletes the If-Modified-Since:
HTTP client header or modifies its value.
+
+
+
-
- The last example is from the fun department:
-
+
+ Type:
+
+
+ Parameterized.
+
+
-
-
-FILTER: fun Fun text replacements
+
+ Parameter:
+
+
+ Keyword: block
, or a user defined value that specifies a range of minutes.
+
+
+
-# Spice the daily news:
-#
-s/microsoft(?!\.com)/MicroSuck/ig
-
+
+ Notes:
+
+
+ Removing this header is useful for filter testing, where you want to force a real
+ reload instead of getting status code 304
, which would cause the
+ browser to use a cached copy of the page.
+
+
+ Instead of removing the header, hide-if-modified-since can
+ also add or subtract a random amount of time to/from the header's value.
+ You specify a range of minutes where the random factor should be chosen from and
+ Privoxy does the rest. A negative value means
+ subtracting, a positive value adding.
+
+
+ Randomizing the value of the If-Modified-Since:
header makes
+ it less likely that the server can use the time as a cookie replacement,
+ but you will run into caching problems if the random range is too high.
+
+
+ It is a good idea to only use a small negative value and let
+ overwrite-last-modified
+ handle the greater changes.
+
+
+ It is also recommended to use this action together with
+ crunch-if-none-match ,
+ otherwise it's more or less pointless.
+
+
+
-
- Note the (?!\.com) part (a so-called negative lookahead)
- in the job's pattern, which means: Don't match, if the string
- .com
appears directly following microsoft
- in the page. This prevents links to microsoft.com from being trashed, while
- still replacing the word everywhere else.
-
+
+ Example usage (section):
+
+
+ # Let the browser revalidate but make tracking based on the time less likely.
+{+hide-if-modified-since{-60} \
+ +overwrite-last-modified{randomize} \
+ +crunch-if-none-match}
+/
+
+
+
+
+
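+ The randomization described in the notes above can be sketched in a few
+ lines of Python. This is only an illustration, not Privoxy's actual code:
+ the function name randomize_if_modified_since and the rng argument are
+ hypothetical, and the sketch assumes the range is given in minutes, as in
+ the example section.

```python
import random
from datetime import timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def randomize_if_modified_since(header_value, range_minutes, rng=random):
    """Shift an HTTP date by a random offset inside the given range of minutes.

    A negative range subtracts time, a positive range adds time, and 0
    leaves the value untouched. (Hypothetical helper, for illustration only.)
    """
    if range_minutes == 0:
        return header_value
    original = parsedate_to_datetime(header_value)
    if original.tzinfo is None:
        # Treat dates without a usable zone as UTC.
        original = original.replace(tzinfo=timezone.utc)
    # Pick a whole number of seconds between 0 and the range limit.
    offset_seconds = rng.randint(0, abs(range_minutes) * 60)
    if range_minutes < 0:
        offset_seconds = -offset_seconds
    shifted = original + timedelta(seconds=offset_seconds)
    return format_datetime(shifted.astimezone(timezone.utc), usegmt=True)


if __name__ == "__main__":
    print(randomize_if_modified_since("Sat, 20 May 2017 09:27:54 GMT", -60))
```

+ With a range of -60 the result is never later than the original value and
+ at most one hour earlier, which keeps revalidation working while blurring
+ the exact timestamp.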
-
-
-# Buzzword Bingo (example for extended regex syntax)
-#
-s* industry[ -]leading \
-| cutting[ -]edge \
-| customer[ -]focused \
-| market[ -]driven \
-| award[ -]winning # Comments are OK, too! \
-| high[ -]performance \
-| solutions[ -]based \
-| unmatched \
-| unparalleled \
-| unrivalled \
-*<font color="red"><b>BINGO!</b></font> \
-*igx
-
-
- The x option in this job turns on extended syntax, and allows for
- e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting.
-
+
+
-
-The distribution default.filter file contains a selection of
-pre-defined filters for your convenience:
-
+
+
+hide-referrer
+
- js-annoyances
+ Typical use:
+
+ Conceal which link you followed to get to a particular site
+
+
+
+
+ Effect:
- The purpose of this filter is to get rid of particularly annoying JavaScript abuse.
- To that end, it
+ Deletes the Referer:
(sic) HTTP header from the client request,
+ or replaces it with a forged one.
+
+
+
+
+
+ Type:
+
+
+ Parameterized.
+
+
+
+
+ Parameter:
+
-
- replaces JavaScript references to the browser's referrer information
- with the string "Not Your Business!". This compliments the hide-referrer action on the content level.
-
+ conditional-block
to delete the header completely if the host has changed.
-
- removes the bindings to the DOM's
- unload
- event which we feel has no right to exist and is responsible for most exit consoles
, i.e.
- nasty windows that pop up when you close another one.
-
+ conditional-forge
to forge the header if the host has changed.
-
- removes code that causes new windows to be opened with undesired properties, such as being
- full-screen, non-resizeable, without location, status or menu bar etc.
-
+ block
to delete the header unconditionally.
+
+
+ forge
to pretend to be coming from the homepage of the server we are talking to.
+
+
+ Any other string to set a user defined referrer.
+
+
+
+
+ Notes:
+
+
+ conditional-block is the only parameter
+ that isn't easily detected in the server's log file. If it blocks the
+ referrer, the request will look like the visitor used a bookmark or
+ typed in the address directly.
- Use with caution. This is an aggressive filter, and can break sites that
- rely heavily on JavaScript.
+ Leaving the referrer unmodified for requests on the same host
+ allows the server owner to see the visitor's click path
,
+ but in most cases she could also get that information by comparing
+ other parts of the log file: for example the User-Agent if it isn't
+ a very common one, or the user's IP address if it doesn't change between
+ different requests.
+
+
+ Always blocking the referrer, or using a custom one, can lead to
+ failures on servers that check the referrer before they answer any
+ requests, in an attempt to prevent their content from being
+ embedded or linked to elsewhere.
+
+
+ Both conditional-block and forge
+ will work with referrer checks, as long as content and valid referring page
+ are on the same host. Most of the time that's the case.
+
+
+ hide-referer is an alternate spelling of
+ hide-referrer and the two can be freely
+ substituted with each other. (referrer
is the
+ correct English spelling, however the HTTP specification has a bug - it
+ requires it to be spelled as referer
.)
- js-events
+ Example usage:
- This is a very radical measure. It removes virtually all JavaScript event bindings, which
- means that scripts can not react to user actions such as mouse movements or clicks, window
- resizing etc, anymore. Use with caution!
-
-
- We strongly discourage using this filter as a default since it breaks
- many legitimate scripts. It is meant for use only on extra-nasty sites (should you really
- need to go there).
+ +hide-referrer{forge} or
+ +hide-referrer{http://www.yahoo.com/}
+
+
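+ The five parameter choices listed above can be summarized as a small
+ decision function. The following Python sketch is an illustration under
+ stated assumptions, not Privoxy's implementation: the function name
+ rewrite_referer is hypothetical, and the forged value is assumed to be
+ the "http://" homepage of the requested host.

```python
from urllib.parse import urlsplit


def rewrite_referer(parameter, request_url, referer):
    """Return the Referer value to send, or None to drop the header.

    Hypothetical helper that mirrors the parameter list of hide-referrer.
    """
    request_host = urlsplit(request_url).hostname
    host_changed = urlsplit(referer).hostname != request_host
    # Assumption: a forged referrer is the homepage of the requested host.
    forged = "http://%s/" % request_host
    if parameter == "block":
        return None
    if parameter == "forge":
        return forged
    if parameter == "conditional-block":
        return None if host_changed else referer
    if parameter == "conditional-forge":
        return forged if host_changed else referer
    # Any other string is used as the referrer verbatim.
    return parameter


if __name__ == "__main__":
    print(rewrite_referer("conditional-block",
                          "http://example.org/page2",
                          "http://example.org/page1"))
```

+ Requests that stay on the same host keep their original referrer with
+ both conditional variants, which is why referrer checks on the same host
+ usually keep working.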
-
- html-annoyances
+
+
+
+hide-user-agent
+
+
+
+ Typical use:
+
+ Try to conceal your type of browser and client operating system
+
+
+
+
+ Effect:
- This filter will undo many common instances of HTML based abuse.
+ Replaces the value of the User-Agent:
HTTP header
+ in client requests with the specified value.
+
+
+
+
+ Type:
+
+
+ Parameterized.
+
+
+
+
+ Parameter:
+
- The BLINK and MARQUEE tags
- are neutralized (yeah baby!), and browser windows will be created as
- resizeable (as of course they should be!), and will have location,
- scroll and menu bars -- even if specified otherwise.
+ Any user-defined string.
- content-cookies
+ Notes:
+
+
+ This can lead to problems on web sites that depend on looking at this header in
+ order to customize their content for different browsers (which, by the
+ way, is NOT the right thing to do: good web sites
+ work browser-independently).
+
+
- Most cookies are set in the HTTP dialog, where they can be intercepted
- by the
- crunch-incoming-cookies
- and crunch-outgoing-cookies
- actions. But web sites increasingly make use of HTML meta tags and JavaScript
- to sneak cookies to the browser on the content level.
+ Using this action in multi-user setups or wherever different types of
+ browsers will access the same Privoxy is
+ not recommended . In single-user, single-browser
+ setups, you might use it to delete your OS version information from
+ the headers, because it is an invitation to exploit known bugs for your
+ OS. It is also occasionally useful to forge this in order to access
+ sites that won't let you in otherwise (though there may be a good
+ reason in some cases).
- This filter disables most HTML and JavaScript code that reads or sets
- cookies. It cannot detect all clever uses of these types of code, so it
- should not be relied on as an absolute fix. Use it wherever you would also
- use the cookie crunch actions.
+ More information on known user-agent strings can be found at
+ http://www.user-agents.org/
+ and
+ http://en.wikipedia.org/wiki/User_agent .
+
+
+
+
+
+ Example usage:
+
+
+ +hide-user-agent{Netscape 6.1 (X11; I; Linux 2.4.18 i686)}
+
+
+
+
+
+
+limit-connect
+
- refresh tags
+ Typical use:
-
- Disable any refresh tags if the interval is greater than nine seconds (so
- that redirections done via refresh tags are not destroyed). This is useful
- for dial-on-demand setups, or for those who find this HTML feature
- annoying.
-
+ Prevent abuse of Privoxy as a TCP proxy relay or disable SSL for untrusted sites
- unsolicited-popups
+ Effect:
- This filter attempts to prevent only unsolicited
pop-up
- windows from opening, yet still allow pop-up windows that the user
- has explicitly chosen to open. It was added in version 3.0.1,
- as an improvement over earlier such filters.
-
-
- Technical note: The filter works by redefining the window.open JavaScript
- function to a dummy function, PrivoxyWindowOpen() ,
- during the loading and rendering phase of each HTML page access, and
- restoring the function afterward.
-
-
- This is recommended only for browsers that cannot perform this function
- reliably themselves. And be aware that some sites require such windows
- in order to function normally. Use with caution.
+ Specifies to which ports HTTP CONNECT requests are allowable.
- all-popups
+ Type:
+
-
- Attempt to prevent all pop-up windows from opening.
- Note this should be used with even more discretion than the above, since
- it is more likely to break some sites that require pop-ups for normal
- usage. Use with caution.
-
+ Parameterized.
- img-reorder
+ Parameter:
- This is a helper filter that has no value if used alone. It makes the
- banners-by-size and banners-by-link
- (see below) filters more effective and should be enabled together with them.
+ A comma-separated list of ports or port ranges (the latter using dashes, with the minimum
+ defaulting to 0 and the maximum to 65535).
- banners-by-size
+ Notes:
- This filter removes image tags purely based on what size they are. Fortunately
- for us, many ads and banner images tend to conform to certain standardized
- sizes, which makes this filter quite effective for ad stripping purposes.
-
-
- Occasionally this filter will cause false positives on images that are not ads,
- but just happen to be of one of the standard banner sizes.
+ By default, i.e. if no limit-connect action applies,
+ Privoxy allows HTTP CONNECT requests to all
+ ports. Use limit-connect if fine-grained control
+ is desired for some or all destinations.
- Recommended only for those who require extreme ad blocking. The default
- block rules should catch 95+% of all ads without this filter enabled.
-
+ The CONNECT method exists in HTTP to allow access to secure websites
+ (https://
URLs) through proxies. It works very simply:
+ the proxy connects to the server on the specified port, and then
+ short-circuits its connections to the client and to the remote server.
+ This means CONNECT-enabled proxies can be used as TCP relays very easily.
+
+
+ Privoxy relays HTTPS traffic without seeing
+ the decoded content. Websites can leverage this limitation to circumvent &my-app;'s
+ filters. By specifying an invalid port range you can disable HTTPS entirely.
+
- banners-by-link
+ Example usages:
-
- This is an experimental filter that attempts to kill any banners if
- their URLs seem to point to known or suspected click trackers. It is currently
- not of much value and is not recommended for use by default.
+
+
+
+
+ +limit-connect{443} # Port 443 is OK.
++limit-connect{80,443} # Ports 80 and 443 are OK.
++limit-connect{-3, 7, 20-100, 500-} # Ports less than 3, 7, 20 to 100 and above 500 are OK.
++limit-connect{-} # All ports are OK
++limit-connect{,} # No HTTPS/SSL traffic is allowed
+
+
+
+
+
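+ To make the port list syntax from the examples above more concrete, here
+ is a minimal Python sketch of how such a list could be interpreted. The
+ names parse_port_list and connect_allowed are hypothetical, and the
+ sketch assumes that range bounds are inclusive and that empty entries
+ allow nothing; it is not taken from Privoxy's source.

```python
MAX_PORT = 65535


def parse_port_list(spec):
    """Turn a list such as "-3, 7, 20-100, 500-" into (low, high) ranges."""
    ranges = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            # Empty entries allow nothing, so "," yields no ranges at all.
            continue
        if "-" in entry:
            low, high = (part.strip() for part in entry.split("-", 1))
            # A missing bound falls back to the minimum or maximum port.
            ranges.append((int(low) if low else 0,
                           int(high) if high else MAX_PORT))
        else:
            ranges.append((int(entry), int(entry)))
    return ranges


def connect_allowed(spec, port):
    """Check whether a CONNECT request to the given port would be allowed."""
    return any(low <= port <= high for low, high in parse_port_list(spec))


if __name__ == "__main__":
    for port in (2, 7, 50, 200, 8080):
        print(port, connect_allowed("-3, 7, 20-100, 500-", port))
```

+ A lone dash has both bounds missing and therefore covers every port,
+ while a lone comma contains only empty entries and covers none.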
+
+limit-cookie-lifetime
+
- webbugs
+ Typical use:
-
- Webbugs are small, invisible images (technically 1X1 GIF images), that
- are used to track users across websites, and collect information on them.
- As an HTML page is loaded by the browser, an embedded image tag causes the
- browser to contact a third-party site, disclosing the tracking information
- through the requested URL and/or cookies for that third-party domain, without
- the user ever becoming aware of the interaction with the third-party site.
- HTML-ized spam also uses a similar technique to verify email addresses.
-
-
- This filter removes the HTML code that loads such webbugs
.
-
+ Limit the lifetime of HTTP cookies to a couple of minutes or hours.
- tiny-textforms
+ Effect:
- A rather special-purpose filter that can be used to enlarge textareas (those
- multi-line text boxes in web forms) and turn off hard word wrap in them.
- It was written for the sourceforge.net tracker system where such boxes are
- a nuisance, but it can be handy on other sites, too.
-
-
- It is not recommended to use this filter as a default.
+ Overwrites the expires field in Set-Cookie server headers if it's above the specified limit.
- jumping-windows
+ Type:
+
-
- Many consider windows that move, or resize themselves to be abusive. This filter
- neutralizes the related JavaScript code. Note that some sites might not display
- or behave as intended when using this filter. Use with caution.
-
+ Parameterized.
- frameset-borders
+ Parameter:
- Some web designers seem to assume that everyone in the world will view their
- web sites using the same browser brand and version, screen resolution etc,
- because only that assumption could explain why they'd use static frame sizes,
- yet prevent their frames from being resized by the user, should they be too
- small to show their whole content.
-
-
- This filter removes the related HTML code. It should only be applied to sites
- which need it.
+ The lifetime limit in minutes, or 0.
- demoronizer
+ Notes:
- Many Microsoft products that generate HTML use non-standard extensions (read:
- violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those
- HTML documents to display with errors on standard-compliant platforms.
+ This action reduces the lifetime of HTTP cookies coming from the
+ server to the specified number of minutes, starting from the time
+ the cookie passes Privoxy.
- This filter translates the MS-only characters into Latin-1 equivalents.
- It is not necessary when using MS products, and will cause corruption of
- all documents that use 8-bit character sets other than Latin-1. It's mostly
- worthwhile for Europeans on non-MS platforms, if weird garbage characters
- sometimes appear on some pages, or user agents that don't correct for this on
- the fly.
-
+ Cookies with a lifetime below the limit are not modified.
+ The lifetime of session cookies is set to the specified limit.
+
+
+ The effect of this action depends on the server.
+
+
+ In case of servers which refresh their cookies with each response
+ (or at least frequently), the lifetime limit set by this action
+ is updated as well.
+ Thus, a session associated with the cookie continues to work with
+ this action enabled, as long as a new request is made before the
+ last limit set is reached.
+
+
+ However, some servers send their cookies once, with a lifetime of several
+ years (the year 2037 is a popular choice), and do not refresh them
+ until a certain event in the future, for example the user logging out.
+ In this case this action may limit the absolute lifetime of the session,
+ even if requests are made frequently.
+
+
+ If the parameter is 0
, this action behaves like
+ session-cookies-only .
- shockwave-flash
+ Example usages:
-
- A filter for shockwave haters. As the name suggests, this filter strips code
- out of web pages that is used to embed shockwave flash objects.
-
-
+
+ +limit-cookie-lifetime{60}
+
+
+
+
+
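+ The rules from the notes above (long lifetimes are cut down, short ones
+ are kept, session cookies receive the limit, and 0 turns everything into
+ a session cookie) can be sketched as follows. This is an illustrative
+ Python sketch with a hypothetical function name, limit_cookie_expiry,
+ where None stands for a session cookie without an expiry date.

```python
from datetime import datetime, timedelta, timezone


def limit_cookie_expiry(expires, limit_minutes, now):
    """Return the expiry to send to the client, or None for a session cookie.

    Hypothetical helper; `expires` is None for session cookies.
    """
    if limit_minutes == 0:
        # Behaves like session-cookies-only: drop the expiry date.
        return None
    cap = now + timedelta(minutes=limit_minutes)
    if expires is None or expires > cap:
        # Session cookies and long-lived cookies both get the limit.
        return cap
    # Cookies that already expire before the limit are left alone.
    return expires


if __name__ == "__main__":
    now = datetime(2017, 5, 20, 9, 27, 54, tzinfo=timezone.utc)
    far_future = datetime(2037, 1, 1, tzinfo=timezone.utc)
    print(limit_cookie_expiry(far_future, 60, now))
```

+ Because the cap is computed from the time the cookie passes through, a
+ server that refreshes its cookie on every response keeps pushing the
+ expiry forward, while a cookie sent only once expires after the limit.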
+
+prevent-compression
+
- quicktime-kioskmode
+ Typical use:
- Change HTML code that embeds Quicktime objects so that kioskmode, which
- prevents saving, is disabled.
+ Ensure that servers send the content uncompressed, so it can be
+ passed through filters.
- fun
+ Effect:
- Text replacements for subversive browsing fun. Make fun of your favorite
- Monopolist or play buzzword bingo.
+ Removes the Accept-Encoding header which can be used to ask for compressed transfer.
- crude-parental
+ Type:
+
-
- A demonstration-only filter that shows how Privoxy
- can be used to delete web content on a keyword basis.
-
+ Boolean.
- ie-exploits
+ Parameter:
- An experimental collection of text replacements to disable malicious HTML and JavaScript
- code that exploits known security holes in Internet Explorer.
-
-
- Presently, it only protects against Nimda and a cross-site scripting bug, and
- would need active maintenance to provide more substantial protection.
+ N/A
- site-specifics
+ Notes:
- Some web sites have very specific problems, the cure for which doesn't apply
- anywhere else, or could even cause damage on other sites.
+ More and more websites send their content compressed by default, which
+ is generally a good idea and saves bandwidth. But the filter and
+ deanimate-gifs
+ actions need access to the uncompressed data.
- This is a collection of such site-specific cures which should only be applied
- to the sites they were intended for, which is what the supplied
- default.action file does. Users shouldn't need to change
- anything regarding this filter.
+ When compiled with zlib support (available since &my-app; 3.0.7), content that should be
+ filtered is decompressed on-the-fly and you don't have to worry about this action.
+ If you are using an older &my-app; version, or one that hasn't been compiled with zlib
+ support, this action can be used to convince the server to send the content uncompressed.
+
+
+ Most text-based instances compress very well: the size is seldom decreased by less than 50%,
+ and for markup-heavy instances like news feeds, saving more than 90% of the original size isn't
+ unusual.
+
+
+ Not using compression will therefore slow down the transfer, and you should only
+ enable this action if you really need it. As of &my-app; 3.0.7 it's disabled in all
+ predefined action settings.
+
+
+ Note that some (rare) ill-configured sites don't handle requests for uncompressed
+ documents correctly. Broken PHP applications tend to send an empty document body,
+ some IIS versions only send the beginning of the content. If you enable
+ prevent-compression by default, you might want to add
+ exceptions for those sites. See the example for how to do that.
- google
+ Example usage (sections):
- A CSS based block for Google text ads. Also removes a width limitation
- and the toolbar advertisement.
-
-
-
+
+# Selectively turn off compression, and enable a filter
+#
+{ +filter{tiny-textforms} +prevent-compression }
+# Match only these sites
+ .google.
+ sourceforge.net
+ sf.net
+
+# Or instead, we could set a universal default:
+#
+{ +prevent-compression }
+ / # Match all sites
-
- yahoo
-
-
- Another CSS based block, this time for Yahoo text ads. And removes
- a width limitation as well.
+# Then maybe make exceptions for broken sites:
+#
+{ -prevent-compression }
+.compusa.com/
-
- msn
-
-
- Another CSS based block, this time for MSN text ads. And removes
- tracking URLs, as well as a width limitation.
-
-
-
+
+
+
+
+
+overwrite-last-modified
+
+
- blogspot
+ Typical use:
-
- Cleans up some Blogspot blogs. Read the fine print before using this one!
-
-
- This filter also intentionally removes some navigation stuff and sets the
- page width to 100%. As a result, some rounded corners
would
- appear to early or not at all and as fixing this would require a browser
- that understands background-size (CSS3), they are removed instead.
-
+ Prevent yet another way to track the user's steps between sessions.
-
- xml-to-html
+
+ Effect:
- Server-header filter to change the Content-Type from xml to html.
+ Deletes the Last-Modified:
HTTP server header or modifies its value.
-
- html-to-xml
+
+ Type:
+
-
- Server-header filter to change the Content-Type from html to xml.
-
+ Parameterized.
-
- no-ping
+
+ Parameter:
- Removes the non-standard ping attribute from
- anchor and area HTML tags.
+ One of the keywords: block
, reset-to-request-time
+ and randomize
-
- hide-tor-exit-notation
+
+ Notes:
- Client-header filter to remove the Tor exit node notation
- found in Host and Referer headers.
+ Removing the Last-Modified:
header is useful for filter
+ testing, where you want to force a real reload instead of getting status
+ code 304
, which would cause the browser to reuse the old
+ version of the page.
- If &my-app; and Tor are chained and &my-app;
- is configured to use socks4a, one can use http://www.example.org.foobar.exit/
- to access the host www.example.org
through the
- Tor exit node foobar
.
+ The randomize
option overwrites the value of the
+ Last-Modified:
header with a randomly chosen time
+ between the original value and the current time. In theory the server
+ could send each document with a different Last-Modified:
+ header to track visits without using cookies. Randomize
+ makes it impossible and the browser can still revalidate cached documents.
- As the HTTP client isn't aware of this notation, it treats the
- whole string www.example.org.foobar.exit
as host and uses it
- for the Host
and Referer
headers. From the
- server's point of view the resulting headers are invalid and can cause problems.
+ reset-to-request-time
overwrites the value of the
+ Last-Modified:
header with the current time. You could use
+ this option together with
+ hide-if-modified-since
+ to further customize your random range.
- An invalid Referer
header can trigger hot-linking
- protections, an invalid Host
header will make it impossible for
- the server to find the right vhost (several domains hosted on the same IP address).
+ The preferred parameter here is randomize
. It is safe
+ to use, as long as the time settings are more or less correct.
+ If the server sets the Last-Modified:
header to the time
+ of the request, the random range becomes zero and the value stays the same.
+ Therefore you should later randomize it a second time with
+ hide-if-modified-since ,
+ just to be sure.
- This client-header filter removes the foo.exit
part in those headers
- to prevent the mentioned problems. Note that it only modifies
- the HTTP headers, it doesn't make it impossible for the server
- to detect your Tor exit node based on the IP address
- the request is coming from.
+ It is also recommended to use this action together with
+ crunch-if-none-match .
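+ The behavior of the randomize parameter described above, including the
+ case where the random range collapses to zero, can be sketched in Python.
+ The function name randomize_last_modified and the rng argument are
+ hypothetical; this is an illustration of the idea, not Privoxy's code.

```python
import random
from datetime import datetime, timedelta, timezone


def randomize_last_modified(last_modified, now, rng=random):
    """Pick a random time between the original value and the current time."""
    if last_modified >= now:
        # The range is zero (or negative): the value stays the same.
        return last_modified
    span_seconds = int((now - last_modified).total_seconds())
    return last_modified + timedelta(seconds=rng.randint(0, span_seconds))


if __name__ == "__main__":
    now = datetime(2017, 5, 20, 9, 27, 54, tzinfo=timezone.utc)
    original = now - timedelta(days=3)
    print(randomize_last_modified(original, now))
```

+ When the server stamps the document with the time of the request, the
+ sketch returns the value unchanged, which is the situation in which a
+ second randomization step on the client header is suggested above.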
-
-
-
-
-
-
-
-
-
-
-
-
-Privoxy's Template Files
-
- All Privoxy built-in pages, i.e. error pages such as the
- 404 - No Such Domain
- error page , the BLOCKED
- page
- and all pages of its web-based
- user interface , are generated from templates .
- (Privoxy must be running for the above links to work as
- intended.)
-
-
-
- These templates are stored in a subdirectory of the configuration
- directory called templates . On Unixish platforms,
- this is typically
- /etc/privoxy/templates/ .
-
-
-
- The templates are basically normal HTML files, but with place-holders (called symbols
- or exports), which Privoxy fills at run time. It
- is possible to edit the templates with a normal text editor, should you want
- to customize them. (Not recommended for the casual
- user ). Should you create your own custom templates, you should use
- the config setting templdir
- to specify an alternate location, so your templates do not get overwritten
- during upgrades.
-
-
- Note that just like in configuration files, lines starting
- with # are ignored when the templates are filled in.
-
-
-
- The place-holders are of the form @name@ , and you will
- find a list of available symbols, which vary from template to template,
- in the comments at the start of each file. Note that these comments are not
- always accurate, and that it's probably best to look at the existing HTML
- code to find out which symbols are supported and what they are filled in with.
-
-
-
- A special application of this substitution mechanism is to make whole
- blocks of HTML code disappear when a specific symbol is set. We use this
- for many purposes, one of them being to include the beta warning in all
- our user interface (CGI) pages when Privoxy
- is in an alpha or beta development stage:
-
-
-
-
-<!-- @if-unstable-start -->
-
- ... beta warning HTML code goes here ...
-
-<!-- if-unstable-end@ -->
-
-
-
- If the "unstable" symbol is set, everything in between and including
- @if-unstable-start and if-unstable-end@
- will disappear, leaving nothing but an empty comment:
-
-
-
- <!-- -->
-
-
-
- There's also an if-then-else construct and an #include
- mechanism, but you'll sure find out if you are inclined to edit the
- templates ;-)
-
-
-
- All templates refer to a style located at
- http://config.privoxy.org/send-stylesheet .
- This is, of course, locally served by Privoxy
- and the source for it can be found and edited in the
- cgi-style.css template.
-
-
-
-
-
-
+
+
+redirect
+
+
+
+ Typical use:
+
+
+ Redirect requests to other sites.
+
+
+
-Contacting the Developers, Bug Reporting and Feature
-Requests
-
-
- &contacting;
-
+
+ Effect:
+
+
+ Convinces the browser that the requested document has been moved
+ to another location and the browser should get it from there.
+
+
+
-
+
+ Type:
+
+
+ Parameterized.
+
+
-
+
+ Parameter:
+
+
+ An absolute URL or a single pcrs command.
+
+
+
+
+ Notes:
+
+
+ Requests to which this action applies are answered with an
+ HTTP redirect to URLs of your choosing. The new URL is
+ either provided as parameter, or derived by applying a
+ single pcrs command to the original URL.
+
+
+ The syntax for pcrs commands is documented in the
+ filter file section.
+
+
+ Requests can't be blocked and redirected at the same time;
+ applying this action together with
+ block
+ is a configuration error. Currently the request is blocked
+ and an error message is logged, but the behavior may change in the
+ future and result in Privoxy rejecting the action file.
+
+
+ This action can be combined with
+ fast-redirects{check-decoded-url}
+ to redirect to a decoded version of a rewritten URL.
+
+
+ Use this action carefully: make sure not to create redirection loops
+ and be aware that using your own redirects might make it
+ possible to fingerprint your requests.
+
+
+ In case of problems with your redirects, or simply to watch
+ them working, enable debug 128.
+
+
+
-
-Privoxy Copyright, License and History
+
+ Example usages:
+
+
+ # Replace example.com's style sheet with another one
+{ +redirect{http://localhost/css-replacements/example.com.css} }
+ example.com/stylesheet\.css
-
- &copyright;
-
+# Create a short, easy to remember nickname for a favorite site
+# (relies on the browser to accept and forward invalid URLs to &my-app;)
+{ +redirect{https://www.privoxy.org/user-manual/actions-file.html} }
+ a
-
-License
-
- &license;
-
-
-
+# Always use the expanded view for Undeadly.org articles
+# (Note the $ at the end of the URL pattern to make sure
+# the request for the rewritten URL isn't redirected as well)
+{+redirect{s@$@&mode=expanded@}}
+undeadly.org/cgi\?action=article&sid=\d*$
+# Redirect Google search requests to MSN
+{+redirect{s@^http://[^/]*/search\?q=([^&]*).*@http://search.msn.com/results.aspx?q=$1@}}
+.google.com/search
-
+# Redirect MSN search requests to Yahoo
+{+redirect{s@^http://[^/]*/results\.aspx\?q=([^&]*).*@http://search.yahoo.com/search?p=$1@}}
+search.msn.com/results\.aspx\?q=
-History
-
- &history;
-
-
+# Redirect http://example.com/&bla=fasel&toChange=foo (and any other value but "bar")
+# to http://example.com/&bla=fasel&toChange=bar
+#
+# The URL pattern makes sure that the following request isn't redirected again.
+{+redirect{s@toChange=[^&]+@toChange=bar@}}
+example.com/.*toChange=(?!bar)
-Authors
-
- &p-authors;
-
-
+# Add a shortcut to look up illumos bugs
+{+redirect{s@^http://i([0-9]+)/.*@https://www.illumos.org/issues/$1@}}
+# Redirected URL = http://i4974/
+# Redirect Destination = https://www.illumos.org/issues/4974
+i[0-9][0-9][0-9][0-9]*/
-
+# Redirect remote requests for this manual
+# to the local version delivered by Privoxy
+{+redirect{s@^http://www@http://config@}}
+www.privoxy.org/user-manual/
+
+
+
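The pcrs commands in the examples above follow Perl's s/// substitution syntax. The Python sketch below is not Privoxy's implementation; it only illustrates, under simplifying assumptions, how such a command maps an original URL to a redirect target. The function name apply_pcrs_command is hypothetical, the delimiter is assumed not to occur inside the pattern or the replacement, and only numbered back-references such as $1 are translated.

```python
import re


def apply_pcrs_command(command, url):
    """Apply one s<delim>pattern<delim>replacement<delim>flags command to url."""
    if len(command) < 4 or command[0] != "s":
        raise ValueError("only s/// style commands are handled in this sketch")
    delimiter = command[1]
    parts = command[2:].split(delimiter)
    if len(parts) != 3:
        raise ValueError("expected pattern, replacement and flags")
    pattern, replacement, flags = parts
    # Translate Perl-style $1 into Python's \g<1> back-reference syntax.
    replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    # Without the 'g' flag only the first match is replaced.
    count = 0 if "g" in flags else 1
    return re.sub(pattern, replacement, url, count=count)


if __name__ == "__main__":
    command = "s@^http://i([0-9]+)/.*@https://www.illumos.org/issues/$1@"
    print(apply_pcrs_command(command, "http://i4974/"))
```

Running a candidate command through a helper like this before adding it to an actions file is one way to check that the rewritten URL no longer matches the URL pattern, which is what prevents redirection loops.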
-
+
+
-See Also
-
- &seealso;
-
-
+
-
- Regular expressions do essentially the same thing, but are much, much more
- powerful. There are many more special characters
and ways of
- building complex patterns however. Let's look at a few of the common ones,
- and then some examples:
-
-
-
- . - Matches any single character, e.g. a
,
- A
, 4
, :
, or @
.
-
-
+
+
-
- And now something a little more complex:
-
-
- /.*/adv((er)?ts?|ertis(ing|ements?))?/ -
- We have several literal forward slashes again (/
), so we are
- building another expression that is a file path statement. We have another
- .*
, so we are matching against any conceivable sub-path, just so
- it matches our expression. The only true literal that must
- match our pattern is adv , together with
- the forward slashes. What comes after the adv
string is the
- interesting part.
-
+
+
+session-cookies-only
-
- Remember the ?
means the preceding expression (either a
- literal character or anything grouped with (...)
in this case)
- can exist or not, since this means either zero or one match. So
- ((er)?ts?|ertis(ing|ements?))
is optional, as are the
- individual sub-expressions: (er)
,
- (ing|ements?)
, and the s
. The |
- means or
. We have two of those. For instance,
- (ing|ements?)
, can expand to match either ing
- OR ements?
. What is being done here, is an
- attempt at matching as many variations of advertisement
, and
- similar, as possible. So this would expand to match just adv
,
- or advert
, or adverts
, or
- advertising
, or advertisement
, or
- advertisements
. You get the idea. But it would not match
- advertizements
(with a z
). We could fix that by
- changing our regular expression to:
- /.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/
, which would then match
- either spelling.
-
+
+
+ Typical use:
+
+
+ Allow only temporary session
cookies (for the current
+ browser session only ).
+
+
+
-
- /.*/advert[0-9]+\.(gif|jpe?g) - Again
- another path statement with forward slashes. Anything in the square brackets
- [ ]
can be matched. This is using 0-9
as a
- shorthand expression to mean any digit zero through nine. It is the same as
- saying 0123456789
. So any digit matches. The +
- means one or more of the preceding expression must be included. The preceding
- expression here is what is in the square brackets -- in this case, any digit
- zero through nine. Then, at the end, we have a grouping: (gif|jpe?g)
.
- This includes a |
, so this needs to match the expression on
- either side of that bar character also. A simple gif
on one side, and the other
- side will in turn match either jpeg
or jpg
,
- since the ?
means the letter e
is optional and
- can be matched once or not at all. So we are building an expression here to
- match image GIF or JPEG type image file. It must include the literal
- string advert
, then one or more digits, and a .
- (which is now a literal, and not a special character, since it is escaped
- with \
), and lastly either gif
, or
- jpeg
, or jpg
. Some possible matches would
- include: //advert1.jpg
,
- /nasty/ads/advert1234.gif
,
- /banners/from/hell/advert99.jpg
. It would not match
- advert1.gif
(no leading slash), or
- /adverts232.jpg
(the expression does not include an
- s
), or /advert1.jsp
(jsp
is not
- in the expression anywhere).
-
+
+ Effect:
+
+
+ Deletes the expires
field from Set-Cookie:
+ server headers. Most browsers will not store such cookies permanently and
+ forget them in between sessions.
+
+
+
-
- We are barely scratching the surface of regular expressions here so that you
- can understand the default Privoxy
- configuration files, and maybe use this knowledge to customize your own
- installation. There is much, much more that can be done with regular
- expressions. Now that you know enough to get started, you can learn more on
- your own :/
-
+
+ Type:
+
+
+ Boolean.
+
+
-
- More reading on Perl Compatible Regular expressions:
- http://perldoc.perl.org/perlre.html
-
+
+ Parameter:
+
+
+ N/A
+
+
+
-
- For information on regular expression based substitutions and their applications
- in filters, please see the filter file tutorial
- in this manual.
-
-
+
+ Notes:
+
+
+ This is less strict than crunch-incoming-cookies /
+ crunch-outgoing-cookies and allows you to browse
+ websites that insist or rely on setting cookies, without compromising your privacy too badly.
+
+
+ Most browsers will not permanently store cookies that have been processed by
+ session-cookies-only and will forget about them between sessions.
+ This makes profiling cookies useless, but won't break sites which require cookies so
+ that you can log in for transactions. This is generally turned on for all
+ sites, and is the recommended setting.
+
+
+ It makes no sense at all to use session-cookies-only
+ together with crunch-incoming-cookies or
+ crunch-outgoing-cookies. If you do, cookies
+ will be plainly killed.
+
+
+ Note that it is up to the browser how it handles such cookies without an expires
+ field. If you use an exotic browser, you might want to try it out to be sure.
+
+
+ This setting also has no effect on cookies that may have been stored
+ previously by the browser before starting Privoxy.
+ These would have to be removed manually.
+
+
+ Privoxy also uses
+ the content-cookies filter
+ to block some types of cookies. Content cookies are not affected by
+ session-cookies-only.
+
+
+
-
+
+ Example usage:
+
+
+ +session-cookies-only
+
+
+
+
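As a rough illustration of the effect described above, the following Python sketch removes the expires attribute from the value of a Set-Cookie header. It is an assumption-laden sketch, not Privoxy's code: the function name strip_expires is hypothetical, attributes are assumed to be separated by semicolons, and other lifetime attributes are left untouched.

```python
def strip_expires(set_cookie_value):
    """Return the Set-Cookie value without its expires attribute."""
    attributes = [part.strip() for part in set_cookie_value.split(";")]
    kept = [
        attribute
        for attribute in attributes
        if attribute and not attribute.lower().startswith("expires=")
    ]
    return "; ".join(kept)


if __name__ == "__main__":
    header = "sid=abc123; Expires=Wed, 21 Oct 2037 07:28:00 GMT; Path=/"
    print(strip_expires(header))
```

Whether the browser then treats the cookie as a session cookie is, as noted above, up to the browser.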
+
-
-Privoxy's Internal Pages
+
+set-image-blocker
-
- Since Privoxy proxies each requested
- web page, it is easy for Privoxy to
- trap certain special URLs. In this way, we can talk directly to
- Privoxy , and see how it is
- configured, see how our rules are being applied, change these
- rules and other configuration options, and even turn
- Privoxy's filtering off, all with
- a web browser.
+
+
+ Typical use:
+
+ Choose the replacement for blocked images
+
+
-
+
+ Effect:
+
+
+ This action alone doesn't do anything noticeable. If both
+ block and handle-as-image also
+ apply, i.e. if the request is to be blocked as an image,
+ then the parameter of this action decides what will be
+ sent as a replacement.
+
+
+
-
- The URLs listed below are the special ones that allow direct access
- to Privoxy . Of course,
- Privoxy must be running to access these. If
- not, you will get a friendly error message. Internet access is not
- necessary either.
-
+
+ Type:
+
+
+ Parameterized.
+
+
-
-
+
+ Parameter:
+
+
+
+
+ pattern
to send a built-in checkerboard pattern image. The image is visually
+ decent, scales very well, and makes it obvious where banners were busted.
+
+
+
+
+ blank
to send a built-in transparent image. This makes banners disappear
+ completely, but makes it hard to detect where Privoxy has blocked
+ images on a given page and complicates troubleshooting if Privoxy
+ has blocked innocent images, like navigation icons.
+
+
+
+
+ target-url
to
+ send a redirect to target-url. You can redirect
+ to any image anywhere, even in your local filesystem via file:///
URL.
+ (But note that not all browsers support redirecting to a local file system).
+
+
+ A good application of redirects is to use special Privoxy -built-in
+ URLs, which send the built-in images, as target-url.
+ This has the same visual effect as specifying blank
or pattern
in
+ the first place, but enables your browser to cache the replacement image, instead of requesting
+ it over and over again.
+
+
+
+
+
-
-
- Privoxy main page:
-
-
+
+ Notes:
+
- http://config.privoxy.org/
+ The URLs for the built-in images are http://config.privoxy.org/send-banner?type=type
, where type is
+ either blank
or pattern
.
-
-
- There is a shortcut: http://p.p/ (But it
- doesn't provide a fall-back to a real page, in case the request is not
- sent through Privoxy )
-
-
-
-
-
- Show information about the current configuration, including viewing and
- editing of actions files:
-
-
- http://config.privoxy.org/show-status
+ There is a third (advanced) type, called auto
. It is NOT to be
+ used in set-image-blocker, but meant for use from filters.
+ Auto will select the type of image that would have applied to the referring page, had it been an image.
-
-
+
+
-
-
- Show the source code version numbers:
-
-
+
+ Example usage:
+
- http://config.privoxy.org/show-version
+ Built-in pattern:
-
-
-
-
-
- Show the browser's request headers:
-
-
- http://config.privoxy.org/show-request
+ +set-image-blocker{pattern}
-
-
-
-
-
- Show which actions apply to a URL and why:
-
-
- http://config.privoxy.org/show-url-info
+ Redirect to the BSD daemon:
-
-
-
-
-
- Toggle Privoxy on or off. This feature can be turned off/on in the main
- config file. When toggled off
, Privoxy
- continues to run, but only as a pass-through proxy, with no actions taking
- place:
-
-
- http://config.privoxy.org/toggle
+ +set-image-blocker{http://www.freebsd.org/gifs/dae_up3.gif}
-
-
- Short cuts. Turn off, then on:
-
-
- http://config.privoxy.org/toggle?set=disable
+ Redirect to the built-in pattern for better caching:
-
-
- http://config.privoxy.org/toggle?set=enable
+ +set-image-blocker{http://config.privoxy.org/send-banner?type=pattern}
-
-
+
+
+
+
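The interplay between block, handle-as-image and set-image-blocker can be summarized in a small decision function. This Python sketch only restates the documented behavior for a request that is already blocked; the function name and the returned descriptions are hypothetical, and other actions that influence the response are ignored.

```python
def blocked_response(handle_as_image, image_blocker):
    """Describe what a blocked request is answered with.

    image_blocker is the set-image-blocker parameter:
    "pattern", "blank", or a target URL.
    """
    if not handle_as_image:
        # Not marked as an image: the normal HTML block page is sent.
        return "HTML block page"
    if image_blocker == "pattern":
        return "built-in checkerboard image"
    if image_blocker == "blank":
        return "built-in transparent image"
    # Anything else is treated as a target URL to redirect to.
    return "redirect to " + image_blocker


if __name__ == "__main__":
    print(blocked_response(True, "pattern"))
    print(blocked_response(True, "http://config.privoxy.org/send-banner?type=blank"))
```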
-
-
+
+
+Summary
- These may be bookmarked for quick reference. See next.
+ Note that many of these actions have the potential to cause a page to
+ misbehave, possibly even not to display at all. Site designers build their
+ sites in many different ways and may depend on particular HTTP header
+ content or other criteria, so there can be no hard
+ and fast rules for all sites. See the Appendix for a brief example on troubleshooting
+ actions.
+
+
+
+
+
+Aliases
+
+ Custom actions
, known to Privoxy
+ as aliases
, can be defined by combining other actions.
+ These can in turn be invoked just like the built-in actions.
+ Currently, an alias name can contain any character except space, tab,
+ =
,
+ {
and }
, but we strongly
+ recommend that you only use a
to z
,
+ 0
to 9
, +
, and -
.
+ Alias names are not case sensitive, and are not required to start with a
+ +
or -
sign, since they are merely textually
+ expanded.
+
+
+ Aliases can be used throughout the actions file, but they must be
+ defined in a special section at the top of the file!
+ And there can only be one such section per actions file. Each actions file may
+ have its own alias section, and the aliases defined in it are only visible
+ within that file.
+
+
+ There are two main reasons to use aliases: One is to save typing for frequently
+ used combinations of actions, the other one is a gain in flexibility: If you
+ decide once how you want to handle shops by defining an alias called
+ shop
, you can later change your policy on shops in
+ one place, and your changes will take effect everywhere
+ in the actions file where the shop
alias is used. Calling aliases
+ by their purpose also makes your actions files more readable.
+
+
+ Currently, there is one big drawback to using aliases, though:
+ Privoxy 's built-in web-based action file
+ editor honors aliases when reading the actions files, but it expands
+ them before writing. So the effects of your aliases are of course preserved,
+ but the aliases themselves are lost when you edit sections that use aliases
+ with it.
-
-Bookmarklets
- Below are some bookmarklets
to allow you to easily access a
- mini
version of some of Privoxy's
- special pages. They are designed for MS Internet Explorer, but should work
- equally well in Netscape, Mozilla, and other browsers which support
- JavaScript. They are designed to run directly from your bookmarks - not by
- clicking the links below (although that should work for testing).
-
-
- To save them, right-click the link and choose Add to Favorites
- (IE) or Add Bookmark
(Netscape). You will get a warning that
- the bookmark may not be safe
- just click OK. Then you can run the
- Bookmarklet directly from your favorites/bookmarks. For even faster access,
- you can put them on the Links
bar (IE) or the Personal
- Toolbar
(Netscape), and run them with a single click.
+ Now let's define some aliases...
-
+
+ # Useful custom aliases we can use later.
+ #
+ # Note the (required!) section header line and that this section
+ # must be at the top of the actions file!
+ #
+ {{alias}}
-
-
- Privoxy - Enable
-
-
+ # These aliases just save typing later:
+ # (Note that some already use other aliases!)
+ #
+ +crunch-all-cookies = + crunch-incoming-cookies + crunch-outgoing-cookies
+ -crunch-all-cookies = - crunch-incoming-cookies - crunch-outgoing-cookies
+ +block-as-image = +block{Blocked image.} +handle-as-image
+ allow-all-cookies = -crunch-all-cookies - session-cookies-only - filter{content-cookies}
-
-
- Privoxy - Disable
-
-
+ # These aliases define combinations of actions
+ # that are useful for certain types of sites:
+ #
+ fragile = - block - filter -crunch-all-cookies - fast-redirects - hide-referrer - prevent-compression
-
-
- Privoxy - Toggle Privoxy (Toggles between enabled and disabled)
-
-
+ shop = -crunch-all-cookies - filter{all-popups}
-
-
- Privoxy- View Status
-
-
-
-
-
- Privoxy - Why?
-
-
-
+ # Short names for other aliases, for really lazy people ;-)
+ #
+ c0 = +crunch-all-cookies
+ c1 = -crunch-all-cookies
- Credit: The site which gave us the general idea for these bookmarklets is
- www.bookmarklets.com . They
- have more information about bookmarklets.
+ ...and put them to use. These sections would appear in the lower part of an
+ actions file and define exceptions to the default actions (as specified further
+ up for the /
pattern):
+
+
+ # These sites are either very complex or very keen on
+ # user data and require minimal interference to work:
+ #
+ {fragile}
+ .office.microsoft.com
+ .windowsupdate.microsoft.com
+ # Gmail is really mail.google.com, not gmail.com
+ mail.google.com
-
+ # Shopping sites:
+ # Allow cookies (for setting and retrieving your customer data)
+ #
+ {shop}
+ .quietpc.com
+ .worldpay.com # for quietpc.com
+ mybank.example.com
+
+ # These shops require pop-ups:
+ #
+ {-filter{all-popups} -filter{unsolicited-popups}}
+ .dabs.com
+ .overclockers.co.uk
+
+
+ Aliases like shop
and fragile
are typically used for
+ problem
sites that require more than one action to be disabled
+ in order to function properly.
+
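Because aliases are merely textually expanded, their effect can be sketched in a few lines of Python. The sketch below is not how Privoxy parses actions files; the table ALIASES and the function expand_aliases are hypothetical, and for simplicity signs are written directly in front of action names and parameters contain no whitespace.

```python
ALIASES = {
    "+crunch-all-cookies": "+crunch-incoming-cookies +crunch-outgoing-cookies",
    "-crunch-all-cookies": "-crunch-incoming-cookies -crunch-outgoing-cookies",
    "shop": "-crunch-all-cookies -filter{all-popups}",
    "c1": "-crunch-all-cookies",
}


def expand_aliases(actions, aliases=ALIASES, max_depth=10):
    """Expand alias names in a whitespace-separated action list."""
    if max_depth < 0:
        raise ValueError("alias definitions appear to be circular")
    expanded = []
    for token in actions.split():
        # Alias names are not case sensitive.
        definition = aliases.get(token.lower())
        if definition is None:
            expanded.append(token)
        else:
            # Aliases may themselves use other aliases.
            expanded.extend(expand_aliases(definition, aliases, max_depth - 1))
    return expanded


if __name__ == "__main__":
    print(expand_aliases("shop"))
```

Changing the definition of shop in the table changes the result everywhere the alias is used, which is the flexibility argument made above.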
+
+
+
+Actions Files Tutorial
+
+ The above chapters have shown which actions files
+ there are and how they are organized, how actions are specified and applied
+ to URLs, how patterns work, and how to
+ define and use aliases. Now, let's look at an
+ example match-all.action , default.action
+ and user.action file and see how all these pieces come together:
+
+
+match-all.action
+
+ Remember all actions are disabled when matching starts,
+ so we have to explicitly enable the ones we want.
+
-
-
-Chain of Events
- Let's take a quick look at how some of Privoxy's
- core features are triggered, and the ensuing sequence of events when a web
- page is requested by your browser:
+ While the match-all.action file only contains a
+ single section, it is probably the most important one. It has only one
+ pattern, /
, but this pattern
+ matches all URLs. Therefore, the set of
+ actions used in this default
section will
+ be applied to all requests as a start. It can be partly or
+ wholly overridden by other actions files like default.action
+ and user.action , but it will still be largely responsible
+ for your overall browsing experience.
-
-
-
- First, your web browser requests a web page. The browser knows to send
- the request to Privoxy , which will in turn,
- relay the request to the remote web server after passing the following
- tests:
-
-
-
-
- Privoxy traps any request for its own internal CGI
- pages (e.g. http://p.p/) and sends the CGI page back to the browser.
-
-
-
-
- Next, Privoxy checks to see if the URL
- matches any +block
patterns. If
- so, the URL is then blocked, and the remote web server will not be contacted.
- +handle-as-image
- and
- +handle-as-empty-document
- are then checked, and if there is no match, an
- HTML BLOCKED
page is sent back to the browser. Otherwise, if
- it does match, an image is returned for the former, and an empty text
- document for the latter. The type of image would depend on the setting of
- +set-image-blocker
- (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere).
-
-
-
-
- Untrusted URLs are blocked. If URLs are being added to the
- trust file, then that is done.
-
-
-
-
- If the URL pattern matches the +fast-redirects
action,
- it is then processed. Unwanted parts of the requested URL are stripped.
-
-
-
-
- Now the rest of the client browser's request headers are processed. If any
- of these match any of the relevant actions (e.g. +hide-user-agent
,
- etc.), headers are suppressed or forged as determined by these actions and
- their parameters.
-
-
-
-
- Now the web server starts sending its response back (i.e. typically a web
- page).
-
-
-
-
- First, the server headers are read and processed to determine, among other
- things, the MIME type (document type) and encoding. The headers are then
- filtered as determined by the
- +crunch-incoming-cookies
,
- +session-cookies-only
,
- and +downgrade-http-version
- actions.
-
-
-
-
- If any +filter
action
- or +deanimate-gifs
- action applies (and the document type fits the action), the rest of the page is
- read into memory (up to a configurable limit). Then the filter rules (from
- default.filter and any other filter files) are
- processed against the buffered content. Filters are applied in the order
- they are specified in one of the filter files. Animated GIFs, if present,
- are reduced to either the first or last frame, depending on the action
- setting. The entire page, which is now filtered, is then sent by
- Privoxy back to your browser.
-
-
- If neither a +filter
action
- nor +deanimate-gifs
- matches, then Privoxy passes the raw data through
- to the client browser as it becomes available.
-
-
-
-
- As the browser receives the now (possibly filtered) page content, it
- reads and then requests any URLs that may be embedded within the page
- source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g.
- frames), sounds, etc. For each of these objects, the browser issues a
- separate request (this is easily viewable in Privoxy's
- logs). And each such request is in turn processed just as above. Note that a
- complex web page will have many, many such embedded URLs. If these
- secondary requests are to a different server, then quite possibly a very
- differing set of actions is triggered.
-
-
+ Again, at the start of matching, all actions are disabled, so there is
+ no need to disable any actions here. (Remember: a +
+ preceding the action name enables the action, a -
disables!).
+ Also note how this long line has been made more readable by splitting it into
+ multiple lines with line continuation.
+
-
+
+
+{ \
+ + change-x-forwarded-for{block} \
+ + hide-from-header{block} \
+ + set-image-blocker{pattern} \
+}
+/ # Match all URLs
+
+
- NOTE: This is somewhat of a simplistic overview of what happens with each URL
- request. For the sake of brevity and simplicity, we have focused on
- Privoxy's core features only.
+ The default behavior is now set.
+
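How a default set of actions is later overridden can be pictured as applying matching sections in order to an initially empty state. The Python sketch below is a simplified model under stated assumptions, not Privoxy's matching code: merge_sections is a hypothetical name, each action holds at most one parameter, and multi-valued actions such as filter are not modeled.

```python
def merge_sections(sections):
    """Apply action lists in order; later sections override earlier ones."""
    state = {}  # all actions start out disabled
    for section in sections:
        for token in section:
            sign, rest = token[0], token[1:]
            name, _, parameter = rest.partition("{")
            parameter = parameter.rstrip("}")
            if sign == "+":
                state[name] = parameter or True
            elif sign == "-":
                state.pop(name, None)
            else:
                raise ValueError("action must start with + or -: " + token)
    return state


if __name__ == "__main__":
    match_all = [
        "+change-x-forwarded-for{block}",
        "+hide-from-header{block}",
        "+set-image-blocker{pattern}",
    ]
    # Hypothetical later sections that also match the request.
    print(merge_sections([match_all, ["-hide-from-header"], ["+set-image-blocker{blank}"]]))
```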
-
-
+
+default.action
-
-
-Troubleshooting: Anatomy of an Action
+
+ If you aren't a developer, there's no need for you to edit the
+ default.action file. It is maintained by
+ the &my-app; developers and if you disagree with some of the
+ sections, you should overrule them in your user.action .
+
- The way Privoxy applies
- actions and filters
- to any given URL can be complex, and not always so
- easy to understand what is happening. And sometimes we need to be able to
- see just what Privoxy is
- doing. Especially, if something Privoxy is doing
- is causing us a problem inadvertently. It can be a little daunting to look at
- the actions and filters files themselves, since they tend to be filled with
- regular expressions whose consequences are not
- always so obvious.
+ Understanding the default.action file can
+ help you with your user.action , though.
- One quick test to see if Privoxy is causing a problem
- or not, is to disable it temporarily. This should be the first troubleshooting
- step. See the Bookmarklets section on a quick
- and easy way to do this (be sure to flush caches afterward!). Looking at the
- logs is a good idea too. (Note that both the toggle feature and logging are
- enabled via config file settings, and may need to be
- turned on
.)
+ The first section in this file is a special section for internal use
+ that prevents older &my-app; versions from reading the file:
+
- Another easy troubleshooting step to try is if you have done any
- customization of your installation, revert back to the installed
- defaults and see if that helps. There are times the developers get complaints
- about one thing or another, and the problem is more related to a customized
- configuration issue.
+
+##########################################################################
+# Settings -- Don't change! For internal Privoxy use ONLY.
+##########################################################################
+{{settings}}
+for-privoxy-version=3.0.11
- Privoxy also provides the
- http://config.privoxy.org/show-url-info
- page that can show us very specifically how actions
- are being applied to any given URL. This is a big help for troubleshooting.
+ After that comes the (optional) alias section. We'll use the example
+ section from the above chapter on aliases,
+ that also explains why and how aliases are used:
- First, enter one URL (or partial URL) at the prompt, and then
- Privoxy will tell us
- how the current configuration will handle it. This will not
- help with filtering effects (i.e. the +filter
action) from
- one of the filter files since this is handled very
- differently and not so easy to trap! It also will not tell you about any other
- URLs that may be embedded within the URL you are testing. For instance, images
- such as ads are expressed as URLs within the raw page source of HTML pages. So
- you will only get info for the actual URL that is pasted into the prompt area
- -- not any sub-URLs. If you want to know about embedded URLs like ads, you
- will have to dig those out of the HTML source. Use your browser's View
- Page Source
option for this. Or right click on the ad, and grab the
- URL.
+
+##########################################################################
+# Aliases
+##########################################################################
+{{alias}}
+
+ # These aliases just save typing later:
+ # (Note that some already use other aliases!)
+ #
+ +crunch-all-cookies = + crunch-incoming-cookies + crunch-outgoing-cookies
+ -crunch-all-cookies = - crunch-incoming-cookies - crunch-outgoing-cookies
+ +block-as-image = +block{Blocked image.} +handle-as-image
+ mercy-for-cookies = -crunch-all-cookies - session-cookies-only - filter{content-cookies}
+
+ # These aliases define combinations of actions
+ # that are useful for certain types of sites:
+ #
+ fragile = - block - filter -crunch-all-cookies - fast-redirects - hide-referrer
+ shop = -crunch-all-cookies - filter{all-popups}
- Let's try an example, google.com ,
- and look at it one section at a time in a sample configuration (your real
- configuration may vary):
+ The first of our specialized sections is concerned with fragile
+ sites, i.e. sites that require minimum interference, because they are either
+ very complex or very keen on tracking you (and have mechanisms in place that
+ make them unusable for people who avoid being tracked). We will use
+ our pre-defined fragile alias instead of stating the list
+ of actions explicitly:
- Matches for http://www.google.com:
-
- In file: default.action [ View ] [ Edit ]
+##########################################################################
+# Exceptions for sites that'll break under the default action set:
+##########################################################################
- {+change-x-forwarded-for{block}
- +deanimate-gifs {last}
- +fast-redirects {check-decoded-url}
- +filter {refresh-tags}
- +filter {img-reorder}
- +filter {banners-by-size}
- +filter {webbugs}
- +filter {jumping-windows}
- +filter {ie-exploits}
- +hide-from-header {block}
- +hide-referrer {forge}
- +session-cookies-only
- +set-image-blocker {pattern}
-/
+# "Fragile" Use a minimum set of actions for these sites (see alias above):
+#
+{ fragile }
+.office.microsoft.com # surprise, surprise!
+.windowsupdate.microsoft.com
+mail.google.com
+
- { -session-cookies-only }
- .google.com
+
+ Shopping sites are not as fragile, but they typically
+ require cookies to log in, and pop-up windows for shopping
+ carts or item details. Again, we'll use a pre-defined alias:
+
- { -fast-redirects }
- .google.com
+
+
+# Shopping sites:
+#
+{ shop }
+.quietpc.com
+.worldpay.com # for quietpc.com
+.jungle.com
+.scan.co.uk
+
-In file: user.action [ View ] [ Edit ]
-(no matches in this file)
-
+
+ The fast-redirects
+ action, which may have been enabled in match-all.action ,
+ breaks some sites. So disable it for popular sites where we know it misbehaves:
- This is telling us how we have defined our
- actions
, and
- which ones match for our test case, google.com
.
- Displayed is all the actions that are available to us. Remember,
- the + sign denotes on
. -
- denotes off
. So some are on
here, but many
- are off
. Each example we try may provide a slightly different
- end result, depending on our configuration directives.
+
+{ - fast-redirects }
+login.yahoo.com
+edit.*.yahoo.com
+.google.com
+.altavista.com/.*(like|url|link):http
+.altavista.com/trans.*urltext=http
+.nytimes.com
+
- The first listing
- is for our default.action file. The large, multi-line
- listing, is how the actions are set to match for all URLs, i.e. our default
- settings. If you look at your actions
file, this would be the
- section just below the aliases
section near the top. This
- will apply to all URLs as signified by the single forward slash at the end
- of the listing -- /
.
+ It is important that Privoxy knows which
+ URLs belong to images, so that if they are to
+ be blocked, a substitute image can be sent, rather than an HTML page.
+ Contacting the remote site to find out is not an option, since it
+ would destroy the loading time advantage of banner blocking, and it
+ would feed the advertisers information about you. We can mark any
+ URL as an image with the handle-as-image action,
+ and marking all URLs that end in a known image file extension is a
+ good start:
- But we have defined additional actions that would be exceptions to these general
- rules, and then we list specific URLs (or patterns) that these exceptions
- would apply to. Last match wins. Just below this then are two explicit
- matches for .google.com
. The first is negating our previous
- cookie setting, which was for +session-cookies-only
- (i.e. not persistent). So we will allow persistent cookies for google, at
- least that is how it is in this example. The second turns
- off any +fast-redirects
- action, allowing this to take place unmolested. Note that there is a leading
- dot here -- .google.com
. This will match any hosts and
- sub-domains, in the google.com domain also, such as
- www.google.com
or mail.google.com
. But it would not
- match www.google.de
! So, apparently, we have these two actions
- defined as exceptions to the general rules at the top somewhere in the lower
- part of our default.action file, and
- google.com
is referenced somewhere in these latter sections.
+
+##########################################################################
+# Images:
+##########################################################################
+
+# Define which file types will be treated as images, in case they get
+# blocked further down this file:
+#
+{ + handle-as-image }
+/.*\.(gif|jpe?g|png|bmp|ico)$
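To see which paths the image pattern above would select, the regular expression can be tried out in Python. This is only a sketch: Python's re module stands in for the regular expression library Privoxy uses, matching is assumed to start at the beginning of the path, and looks_like_image is a hypothetical helper name.

```python
import re

# The path pattern from the section above, unchanged.
IMAGE_PATH = re.compile(r"/.*\.(gif|jpe?g|png|bmp|ico)$")


def looks_like_image(path):
    """Return True if the path ends in one of the listed image extensions."""
    return IMAGE_PATH.match(path) is not None


if __name__ == "__main__":
    for path in ("/images/logo.png", "/a/photo.jpeg", "/index.html"):
        print(path, looks_like_image(path))
```

Because of the trailing $, a path only matches if the extension is the very last thing in it.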
- Then, for our user.action file, we again have no hits.
- So there is nothing google-specific that we might have added to our own, local
- configuration. If there was, those actions would over-rule any actions from
- previously processed files, such as default.action .
- user.action typically has the last word. This is the
- best place to put hard and fast exceptions,
+ And then there are known banner sources. They often use scripts to
+ generate the banners, so it won't be visible from the URL that the
+ request is for an image. Hence we block them and
+ mark them as images in one go, with the help of our
+ +block-as-image alias defined above. (We could of
+ course just as well use + block
+ + handle-as-image here.)
+ Remember that the type of the replacement image is chosen by the
+ set-image-blocker
+ action. Since all URLs have matched the default section with its
+ + set-image-blocker{pattern}
+ action before, it still applies and needn't be repeated:
- And finally we pull it all together in the bottom section and summarize how
- Privoxy is applying all its actions
- to google.com
:
+
+# Known ad generators:
+#
+{ +block-as-image }
+ar.atwola.com
+.ad.doubleclick.net
+.ad.*.doubleclick.net
+.a.yimg.com/(?:(?!/i/).)*$
+.a[0-9].yimg.com/(?:(?!/i/).)*$
+bs*.gsanet.com
+.qkimg.net
+
+
+ One of the most important jobs of Privoxy
+ is to block banners. Many of these can be blocked
+ by the filter{banners-by-size}
+ action, which we enabled above, and which deletes the references to banner
+ images from the pages while they are loaded, so the browser doesn't request
+ them anymore, and hence they don't need to be blocked here. But this naturally
+ doesn't catch all banners, and some people choose not to use filters, so we
+ need a comprehensive list of patterns for banner URLs here, and apply the
+ block action to them.
+
+
+ First come many generic patterns, which do most of the work by
+ matching typical domain and path name components of banners. Then comes
+ a list of individual patterns for specific sites, which is omitted here
+ to keep the example short:
+##########################################################################
+# Block these fine banners:
+##########################################################################
+{ +block{Banner ads.} }
- Final results:
+# Generic patterns:
+#
+ad*.
+.*ads.
+banner?.
+count*.
+/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?)
+/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/
- -add-header
- -block
- +change-x-forwarded-for{block}
- -client-header-filter{hide-tor-exit-notation}
- -content-type-overwrite
- -crunch-client-header
- -crunch-if-none-match
- -crunch-incoming-cookies
- -crunch-outgoing-cookies
- -crunch-server-header
- +deanimate-gifs {last}
- -downgrade-http-version
- -fast-redirects
- -filter {js-events}
- -filter {content-cookies}
- -filter {all-popups}
- -filter {banners-by-link}
- -filter {tiny-textforms}
- -filter {frameset-borders}
- -filter {demoronizer}
- -filter {shockwave-flash}
- -filter {quicktime-kioskmode}
- -filter {fun}
- -filter {crude-parental}
- -filter {site-specifics}
- -filter {js-annoyances}
- -filter {html-annoyances}
- +filter {refresh-tags}
- -filter {unsolicited-popups}
- +filter {img-reorder}
- +filter {banners-by-size}
- +filter {webbugs}
- +filter {jumping-windows}
- +filter {ie-exploits}
- -filter {google}
- -filter {yahoo}
- -filter {msn}
- -filter {blogspot}
- -filter {no-ping}
- -force-text-mode
- -handle-as-empty-document
- -handle-as-image
- -hide-accept-language
- -hide-content-disposition
- +hide-from-header {block}
- -hide-if-modified-since
- +hide-referrer {forge}
- -hide-user-agent
- -limit-connect
- -overwrite-last-modified
- -prevent-compression
- -redirect
- -server-header-filter{xml-to-html}
- -server-header-filter{html-to-xml}
- -session-cookies-only
- +set-image-blocker {pattern}
+# Site-specific patterns (abbreviated):
+#
+.hitbox.com
- Notice the only difference here to the previous listing, is to
- fast-redirects
and session-cookies-only
,
- which are activated specifically for this site in our configuration,
- and thus show in the Final Results
.
+ It's quite remarkable how many advertisers actually call their banner
+ servers ads.company.com, or call the directory
+ in which the banners are stored literally banners
. So the above
+ generic patterns are surprisingly effective.
-
- Now another example, ad.doubleclick.net
:
+ But being very generic, they necessarily also catch URLs that we don't want
+ to block. The pattern .*ads., for example, catches
+ nasty-ads.nasty-corp.com as intended,
+ but also downloads.sourcefroge.net or
+ adsl.some-provider.net. So here come some
+ well-known exceptions to the +block
+ section above.
+
+
+ Note that these are exceptions to exceptions from the default! Consider the URL
+ downloads.sourcefroge.net
: Initially, all actions are deactivated,
+ so it wouldn't get blocked. Then comes the defaults section, which matches the
+ URL, but just deactivates the block
+ action once again. Then it matches .*ads., an exception to the
+ general non-blocking policy, and suddenly
+ +block applies. And now, it'll match
+ .*loads., where -block
+ applies, so (unless it matches again further down) it ends up
+ with no block action applying.
+##########################################################################
+# Save some innocent victims of the above generic block patterns:
+##########################################################################
- { +block{Domains starts with "ad"} }
- ad*.
+# By domain:
+#
+{ -block }
+adv[io]*. # (for advogato.org and advice.*)
+adsl. # (has nothing to do with ads)
+adobe. # (has nothing to do with ads either)
+ad[ud]*. # (adult.* and add.*)
+.edu # (universities don't host banners (yet!))
+.*loads. # (downloads, uploads etc)
- { +block{Domain contains "ad"} }
- .ad.
+# By path:
+#
+/.*loads/
- { +block{Doubleclick banner server} +handle-as-image }
- .[a-vx-z]*.doubleclick.net
-
+# Site-specific:
+#
+www.globalintersec.com/adv # (adv = advanced)
+www.ugu.com/sui/ugu/adv
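+ The "last match wins" behaviour described above can be illustrated with
+ a small Python sketch. This is only a model under stated assumptions, not
+ Privoxy's implementation: the RULES list and the is_blocked helper are
+ hypothetical, and shell-style globs stand in for Privoxy's own domain
+ pattern syntax.

```python
from fnmatch import fnmatchcase

# (glob, block_state) pairs in file order; True models +block, False -block.
# The globs only approximate Privoxy's domain patterns (an assumption).
RULES = [
    ("*", False),         # defaults section: matches everything, -block
    ("*ads.*", True),     # generic pattern ".*ads."  -> +block
    ("*loads.*", False),  # exception ".*loads."      -> -block
]


def is_blocked(host: str) -> bool:
    """Return the block state left behind by the last matching rule."""
    blocked = False  # initially, all actions are deactivated
    for pattern, state in RULES:
        if fnmatchcase(host, pattern):
            blocked = state  # later matches override earlier ones
    return blocked


print(is_blocked("downloads.sourcefroge.net"))  # False
print(is_blocked("nasty-ads.nasty-corp.com"))   # True
```

+ Because every matching section is applied in order, moving the exception
+ above the generic pattern in this model would leave the download host blocked.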
- We'll just show the interesting part here - the explicit matches. It is
- matched three different times. Two +block{}
sections,
- and a +block{} +handle-as-image
,
- which is the expanded form of one of our aliases that had been defined as:
- +block-as-image
. (Aliases
are defined in
- the first section of the actions file and typically used to combine more
- than one action.)
+ Filtering source code can have nasty side effects,
+ so make an exception for our friends at sourceforge.net,
+ and all paths with cvs
in them. Note that
+ -filter
+ disables all filters in one fell swoop!
- Any one of these would have done the trick and blocked this as an unwanted
- image. This is unnecessarily redundant since the last case effectively
- would also cover the first. No point in taking chances with these guys
- though ;-) Note that if you want an ad or obnoxious
- URL to be invisible, it should be defined as ad.doubleclick.net
- is done here -- as both a +block{}
- and an
- +handle-as-image
.
- The custom alias +block-as-image
just
- simplifies the process and make it more readable.
+
+# Don't filter code!
+#
+{ -filter }
+/(.*/)?cvs
+bugzilla.
+developer.
+wiki.
+.sourceforge.net
- One last example. Let's try http://www.example.net/adsl/HOWTO/
.
- This one is giving us problems. We are getting a blank page. Hmmm ...
+ The actual default.action is of course much more
+ comprehensive, but we hope this example made clear how it works.
-
-
-
- Matches for http://www.example.net/adsl/HOWTO/:
-
- In file: default.action [ View ] [ Edit ]
+
- {-add-header
- -block
- +change-x-forwarded-for{block}
- -client-header-filter{hide-tor-exit-notation}
- -content-type-overwrite
- -crunch-client-header
- -crunch-if-none-match
- -crunch-incoming-cookies
- -crunch-outgoing-cookies
- -crunch-server-header
- +deanimate-gifs
- -downgrade-http-version
- +fast-redirects {check-decoded-url}
- -filter {js-events}
- -filter {content-cookies}
- -filter {all-popups}
- -filter {banners-by-link}
- -filter {tiny-textforms}
- -filter {frameset-borders}
- -filter {demoronizer}
- -filter {shockwave-flash}
- -filter {quicktime-kioskmode}
- -filter {fun}
- -filter {crude-parental}
- -filter {site-specifics}
- -filter {js-annoyances}
- -filter {html-annoyances}
- +filter {refresh-tags}
- -filter {unsolicited-popups}
- +filter {img-reorder}
- +filter {banners-by-size}
- +filter {webbugs}
- +filter {jumping-windows}
- +filter {ie-exploits}
- -filter {google}
- -filter {yahoo}
- -filter {msn}
- -filter {blogspot}
- -filter {no-ping}
- -force-text-mode
- -handle-as-empty-document
- -handle-as-image
- -hide-accept-language
- -hide-content-disposition
- +hide-from-header{block}
- +hide-referer{forge}
- -hide-user-agent
- -overwrite-last-modified
- +prevent-compression
- -redirect
- -server-header-filter{xml-to-html}
- -server-header-filter{html-to-xml}
- +session-cookies-only
- +set-image-blocker{blank} }
- /
+user.action
- { +block{Path contains "ads".} +handle-as-image }
- /ads
-
+
+ So far we are painting with a broad brush by setting general policies,
+ which would be a reasonable starting point for many people. Now,
+ you might want to be more specific and have customized rules that
+ are more suitable to your personal habits and preferences. These would
+ be for narrowly defined situations like your ISP or your bank, and should
+ be placed in user.action, which is parsed after all other
+ actions files and hence has the last word, overriding any previously
+ defined actions. user.action is also a
+ safe place for your personal settings, since
+ default.action is actively maintained by the
+ Privoxy developers and you'll probably want
+ to install updated versions from time to time.
- Ooops, the /adsl/
is matching /ads
in our
- configuration! But we did not want this at all! Now we see why we get the
- blank page. It is actually triggering two different actions here, and
- the effects are aggregated so that the URL is blocked, and &my-app; is told
- to treat the block as if it were an image. But this is, of course, all wrong.
- We could now add a new action below this (or better in our own
- user.action file) that explicitly
- un blocks (
- {-block}
) paths with
- adsl
in them (remember, last match in the configuration
- wins). There are various ways to handle such exceptions. Example:
+ So let's look at a few examples of things that one might typically do in
+ user.action :
+
+
+
+# My user.action file. <fred@example.com>
+
- { -block }
- /adsl
-
+
+ As aliases are local to the actions
+ file that they are defined in, you can't use the ones from
+ default.action , unless you repeat them here:
- Now the page displays ;-)
- Remember to flush your browser's caches when making these kinds of changes to
- your configuration to insure that you get a freshly delivered page! Or, try
- using Shift+Reload .
+
+# Aliases are local to the file they are defined in.
+# (Re-)define aliases for this file:
+#
+{{alias}}
+#
+# These aliases just save typing later, and the alias names should
+# be self explanatory.
+#
++crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies
+-crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies
+ allow-all-cookies = -crunch-all-cookies -session-cookies-only
+ allow-popups = -filter{all-popups}
++block-as-image = +block{Blocked as image.} +handle-as-image
+-block-as-image = -block
+
+# These aliases define combinations of actions that are useful for
+# certain types of sites:
+#
+fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referrer
+shop = -crunch-all-cookies allow-popups
+
+# Allow ads for selected useful free sites:
+#
+allow-ads = -block -filter{banners-by-size} -filter{banners-by-link}
+
+# Alias for specific file types that are text, but might have conflicting
+# MIME types. We want the browser to force these to be text documents.
+handle-as-text = -filter +content-type-overwrite{text/plain} +force-text-mode -hide-content-disposition
+
- But now what about a situation where we get no explicit matches like
- we did with:
+ Say you have accounts on some sites that you visit regularly, and
+ you don't want to have to log in manually each time. So you'd like
+ to allow persistent cookies for these sites. The
+ allow-all-cookies alias defined above does exactly
+ that, i.e. it disables crunching of cookies in any direction, and the
+ processing of cookies to make them only temporary.
-
- { +block{Path starts with "ads".} +handle-as-image }
- /ads
-
+{ allow-all-cookies }
+ sourceforge.net
+ .yahoo.com
+ .msdn.microsoft.com
+ .redhat.com
- That actually was very helpful and pointed us quickly to where the problem
- was. If you don't get this kind of match, then it means one of the default
- rules in the first section of default.action is causing
- the problem. This would require some guesswork, and maybe a little trial and
- error to isolate the offending rule. One likely cause would be one of the
- +filter
actions.
- These tend to be harder to troubleshoot.
- Try adding the URL for the site to one of aliases that turn off
- +filter
:
+ Your bank is allergic to some filter, but you don't know which, so you disable them all:
-
- { shop }
- .quietpc.com
- .worldpay.com # for quietpc.com
- .jungle.com
- .scan.co.uk
- .forbes.com
-
+{ -filter }
+ .your-home-banking-site.com
- { shop }
is an alias
that expands to
- { -filter -session-cookies-only }
.
- Or you could do your own exception to negate filtering:
-
+ Some file types you may not want to filter for various reasons:
+# Technical documentation is likely to contain strings that might
+# erroneously get altered by the JavaScript-oriented filters:
+#
+.tldp.org
+/(.*/)?selfhtml/
- { -filter }
- # Disable ALL filter actions for sites in this section
- .forbes.com
- developer.ibm.com
- localhost
-
+# And this stupid host sends streaming video with a wrong MIME type,
+# so that Privoxy thinks it is getting HTML and starts filtering:
+#
+stupid-server.example.com/
- This would turn off all filtering for these sites. This is best
- put in user.action , for local site
- exceptions. Note that when a simple domain pattern is used by itself (without
- the subsequent path portion), all sub-pages within that domain are included
- automatically in the scope of the action.
+ Example of a simple block action. Say you've
+ seen an ad on your favourite page on example.com that you want to get rid of.
+ You have right-clicked the image, selected copy image location
+ and pasted the URL below while removing the leading http://, into a
+ { +block{} } section. Note that { +handle-as-image
+ } need not be specified, since all URLs ending in
+ .gif will be tagged as images by the general rules as set
+ in default.action anyway:
- Images that are inexplicably being blocked, may well be hitting the
-+filter{banners-by-size}
- rule, which assumes
- that images of certain sizes are ad banners (works well
- most of the time since these tend to be standardized).
+
+{ +block{Nasty ads.} }
+ www.example.com/nasty-ads/sponsor\.gif
+ another.example.net/more/junk/here/
- { fragile }
is an alias that disables most
- actions that are the most likely to cause trouble. This can be used as a
- last resort for problem sites.
+ The URLs of dynamically generated banners, especially from large banner
+ farms, often don't use the well-known image file name extensions, which
+ makes it impossible for Privoxy to guess
+ the file type just by looking at the URL.
+ You can use the +block-as-image alias defined above for
+ these cases.
+ Note that objects which match this rule but then turn out NOT to be an
+ image are typically rendered as a broken image
icon by the
+ browser. Use cautiously.
+
-
- { fragile }
- # Handle with care: easy to break
- mail.google.
- mybank.example.com
+{ +block-as-image }
+ .doubleclick.net
+ .fastclick.net
+ /Realmedia/ads/
+ ar.atwola.com/
-
- Remember to flush caches! Note that the
- mail.google reference lacks the TLD portion (e.g.
- .com
). This will effectively match any TLD with
- google in it, such as mail.google.de. ,
- just as an example.
+ Now you noticed that the default configuration breaks Forbes Magazine,
+ but you were too lazy to find out which action is the culprit, and you
+ were again too lazy to give feedback, so
+ you just used the fragile alias on the site, and
+ -- whoa! -- it worked. The fragile
+ alias disables those actions that are most likely to break a site. It is
+ also good for testing purposes, to see whether Privoxy
+ is causing the problem or not. We later find other regular sites
+ that misbehave, and add those to our personalized list of troublemakers:
+
- If this still does not work, you will have to go through the remaining
- actions one by one to find which one(s) is causing the problem.
+
+{ fragile }
+ .forbes.com
+ webmail.example.com
+ .mybank.com
-
-
-
-
-
- Revision 2.90 2008/09/26 16:53:09 fabiankeil
- Update "What's new" section.
+
- Revision 2.89 2008/09/21 15:38:56 fabiankeil
- Fix Portage tree sync instructions in Gentoo section.
- Anonymously reported at ijbswa-developers@.
+
- Revision 2.88 2008/09/21 14:42:52 fabiankeil
- Add documentation for change-x-forwarded-for{},
- remove documentation for hide-forwarded-for-headers.
+
- Revision 2.87 2008/08/30 15:37:35 fabiankeil
- Update entities.
+
+Filter Files
- Revision 2.86 2008/08/16 10:12:23 fabiankeil
- Merge two sentences and move the URL to the end of the item.
+
+ On-the-fly text substitutions need
+ to be defined in a filter file
. Once defined, they
+ can then be invoked as an action
.
+
- Revision 2.85 2008/08/16 10:04:59 fabiankeil
- Some more syntax fixes. This version actually builds.
+
+ &my-app; supports three different pcrs-based filter actions:
+ filter to
+ rewrite the content that is sent to the client,
+ client-header-filter
+ to rewrite headers that are sent by the client, and
+ server-header-filter
+ to rewrite headers that are sent by the server.
+
- Revision 2.84 2008/08/16 09:42:45 fabiankeil
- Turns out building docs works better if the syntax is valid.
+
+ &my-app; also supports two tagger actions:
+ client-header-tagger
+ and
+ server-header-tagger .
+ Taggers and filters use the same syntax in the filter files; the difference
+ is that taggers don't modify the text they are filtering, but use a rewritten
+ version of the filtered text as a tag. The tags can then be used to change the
+ applying actions through sections with tag-patterns.
+
- Revision 2.83 2008/08/16 09:32:02 fabiankeil
- Mention changes since 3.0.9 beta.
+
+ Finally &my-app; supports the
+ external-filter action
+ to enable external filters
+ written in proper programming languages.
+
- Revision 2.82 2008/08/16 09:00:52 fabiankeil
- Fix example URL pattern (once more with feeling).
- Revision 2.81 2008/08/16 08:51:28 fabiankeil
- Update version-related entities.
+
+ Multiple filter files can be defined through the filterfile config directive. The filters
+ as supplied by the developers are located in
+ default.filter . It is recommended that any locally
+ defined or modified filters go in a separately defined file such as
+ user.filter .
+
- Revision 2.80 2008/07/18 16:54:30 fabiankeil
- Remove erroneous whitespace in documentation link.
- Reported by John Chronister in #2021611.
+
+ Common tasks for content filters are to eliminate common annoyances in
+ HTML and JavaScript, such as pop-up windows,
+ exit consoles, crippled windows without navigation tools, the
+ infamous <BLINK> tag etc, to suppress images with certain
+ width and height attributes (standard banner sizes or web-bugs),
+ or just to have fun.
+
- Revision 2.79 2008/06/27 18:00:53 markm68k
- remove outdated startup information for mac os x
+
+ Enabled content filters are applied to any content whose
+ Content-Type header is recognised as a sign
+ of text-based content, with the exception of text/plain .
+ Use the force-text-mode action
+ to also filter other content.
+
- Revision 2.78 2008/06/21 17:03:03 fabiankeil
- Fix typo.
+
+ Substitutions are made at the source level, so if you want to roll
+ your own
filters, you should first be familiar with HTML syntax,
+ and, of course, regular expressions.
+
- Revision 2.77 2008/06/14 13:45:22 fabiankeil
- Re-add a colon I unintentionally removed a few revisions ago.
+
+ Just like the actions files, the
+ filter file is organized in sections, which are called filters
+ here. Each filter consists of a heading line that starts with one of the
+ keywords FILTER:,
+ CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER:,
+ followed by the filter's name, and a short (one line)
+ description of what it does. Below that line
+ come the jobs , i.e. lines that define the actual
+ text substitutions. By convention, the name of a filter
+ should describe what the filter eliminates . The
+ comment is used in the web-based
+ user interface .
+
- Revision 2.76 2008/06/14 13:21:28 fabiankeil
- Prepare for the upcoming 3.0.9 beta release.
+
+ Once a filter called name has been defined
+ in the filter file, it can be invoked by using an action of the form
+ +filter{name}
+ in any actions file.
+
- Revision 2.75 2008/06/13 16:06:48 fabiankeil
- Update the "What's New in this Release" section with
- the ChangeLog entries changelog2doc.pl could handle.
+
+ Filter definitions start with a header line that contains the filter
+ type, the filter name and the filter description.
+ A content filter header line for a filter called foo
could look
+ like this:
+
- Revision 2.74 2008/05/26 15:55:46 fabiankeil
- - Update "default profiles" table.
- - Add some more pcrs redirect examples and note that
- enabling debug 128 helps to get redirects working.
+
+ FILTER: foo Replace all "foo" with "bar"
+
- Revision 2.73 2008/05/23 14:43:18 fabiankeil
- Remove previously out-commented block that caused syntax problems.
+
+ Below that line, and up to the next header line, come the jobs that
+ define what text replacements the filter executes. They are specified
+ in a syntax that imitates Perl's
+ s/// operator. If you are familiar with Perl, you
+ will find this to be quite intuitive, and may want to look at the
+ PCRS documentation for the subtle differences to Perl behaviour.
+
- Revision 2.72 2008/05/12 10:26:14 fabiankeil
- Synchronize content filter descriptions with the ones in default.filter.
+
+ Most notably, the non-standard option letter U is supported,
+ which turns the default to ungreedy matching (add ? to
+ quantifiers to turn them greedy again).
+
- Revision 2.71 2008/04/10 17:37:16 fabiankeil
- Actually we use "modern" POSIX 1003.2 regular
- expressions in path patterns, not PCRE.
+
+ The non-standard option letter D (dynamic) allows the use
+ of the variables $host, $origin (the IP address the request came from),
+ $path, $url and $listen-address (the address on which Privoxy accepted the
+ client request, for example 127.0.0.1:8118).
+ They will be replaced with the value they refer to before the filter
+ is executed.
+
- Revision 2.70 2008/04/10 15:59:12 fabiankeil
- Add another section to the client-header-tagger example that shows
- how to actually change the action settings once the tag is created.
+
+ Note that '$' is a bad choice for a delimiter in a dynamic filter as you
+ might end up with unintended variables if you use a variable name
+ directly after the delimiter. Variables will be resolved without
+ escaping anything, therefore you also have to be careful not to choose
+ delimiters that appear in the replacement text. For example '<' should
+ be safe, while '?' will sooner or later cause conflicts with $url.
+
- Revision 2.69 2008/03/29 12:14:25 fabiankeil
- Remove send-wafer and send-vanilla-wafer actions.
+
+ The non-standard option letter T (trivial) prevents
+ parsing for backreferences in the substitute. Use it if you want to include
+ text like '$&' in your substitute without quoting.
+
- Revision 2.68 2008/03/28 15:13:43 fabiankeil
- Remove inspect-jpegs action.
+
+ If you are new to
+ Regular
+ Expressions
, you might want to take a look at
+ the Appendix on regular expressions, and
+ see the Perl
+ manual for
+ the
+ s/// operator's syntax and Perl-style regular
+ expressions in general.
+ The below examples might also help to get you started.
+
- Revision 2.67 2008/03/27 18:31:21 fabiankeil
- Remove kill-popups action.
- Revision 2.66 2008/03/06 16:33:47 fabiankeil
- If limit-connect isn't used, don't limit CONNECT requests to port 443.
+
- Revision 2.65 2008/03/04 18:30:40 fabiankeil
- Remove the treat-forbidden-connects-like-blocks action. We now
- use the "blocked" page for forbidden CONNECT requests by default.
+Filter File Tutorial
+
+ Now, let's complete our foo
content filter. We have already defined
+ the heading, but the jobs are still missing. Since all it does is to replace
+ foo
with bar
, there is only one (trivial) job
+ needed:
+
- Revision 2.64 2008/03/01 14:10:28 fabiankeil
- Use new block syntax. Still needs some polishing.
+
+ s/foo/bar/
+
- Revision 2.63 2008/02/22 05:50:37 markm68k
- fix merge problem
+
+ But wait! Didn't the comment say that all occurrences
+ of foo
should be replaced? Our current job will only take
+ care of the first foo
on each page. For global substitution,
+ we'll need to add the g option:
+
- Revision 2.62 2008/02/11 11:52:23 hal9
- Fix entity ... s/&/&
+
+ s/foo/bar/g
+
- Revision 2.61 2008/02/11 03:41:47 markm68k
- more updates for mac os x
+
+ Our complete filter now looks like this:
+
+
+ FILTER: foo Replace all "foo" with "bar"
+s/foo/bar/g
+
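+ The effect of the g option can be tried outside of Privoxy. The following
+ sketch uses Python's re module as a stand-in for PCRS (an assumption; the
+ two engines are not identical), with a made-up sample string:

```python
import re

page = "foo, foo and more foo"

# Analogue of s/foo/bar/ : only the first occurrence is replaced.
first_only = re.sub(r"foo", "bar", page, count=1)

# Analogue of s/foo/bar/g : every occurrence is replaced.
everywhere = re.sub(r"foo", "bar", page)

print(first_only)  # bar, foo and more foo
print(everywhere)  # bar, bar and more bar
```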
- Revision 2.60 2008/02/11 03:40:25 markm68k
- more updates for mac os x
+
+ Let's look at some real filters for more interesting examples. Here you see
+ a filter that protects against some common annoyances that arise from JavaScript
+ abuse. Let's look at its jobs one after the other:
+
- Revision 2.59 2008/02/11 00:52:34 markm68k
- reflect new changes for mac os x
- Revision 2.58 2008/02/03 21:37:40 hal9
- Apply patch from Mark: s/OSX/OS X/
+
+
+FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse
- Revision 2.57 2008/02/03 19:10:14 fabiankeil
- Mention forward-socks5.
+# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm
+#
+s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg
+
- Revision 2.56 2008/01/31 19:11:35 fabiankeil
- Let the +client-header-filter{hide-tor-exit-notation} example apply
- to all requests as "tainted" Referers aren't limited to exit TLDs.
+
+ Following the header line and a comment, you see the job. Note that it uses
+ | as the delimiter instead of / , because
+ the pattern contains a forward slash, which would otherwise have to be escaped
+ by a backslash (\ ).
+
- Revision 2.55 2008/01/19 21:26:37 hal9
- Add IE7 to configuration section per Gerry.
+
+ Now, let's examine the pattern: it starts with the text <script.*
+ enclosed in parentheses. Since the dot matches any character, and *
+ means: Match an arbitrary number of the element left of myself
, this
+ matches <script
, followed by any text, i.e.
+ it matches the whole page, from the start of the first <script> tag.
+
- Revision 2.54 2008/01/19 17:52:39 hal9
- Re-commit to fix various minor issues for new release.
+
+ That's more than we want, but the pattern continues: document\.referrer
+ matches only the exact string document.referrer
. The dot needed to
+ be escaped , i.e. preceded by a backslash, to take away its
+ special meaning as a joker, and make it just a regular dot. So far, the meaning is:
+ Match from the start of the first <script> tag in the page, up to, and including,
+ the text document.referrer
, if both are present
+ in the page (and appear in that order).
+
- Revision 2.53 2008/01/19 15:03:05 hal9
- Doc sources tagged for 3.0.8 release.
+
+ But there's still more pattern to go. The next element, again enclosed in parentheses,
+ is .*</script> . You already know what .*
+ means, so the whole pattern translates to: Match from the start of the first <script>
+ tag in a page to the end of the last <script> tag, provided that the text
+ document.referrer
appears somewhere in between.
+
- Revision 2.52 2008/01/17 01:49:51 hal9
- Change copyright notice for docs s/2007/2008/. All these will be rebuilt soon
- enough.
+
+ This is still not the whole story, since we have ignored the options and the parentheses:
+ The portions of the page matched by sub-patterns that are enclosed in parentheses, will be
+ remembered and be available through the variables $1, $2, ... in
+ the substitute. The U option switches to ungreedy matching, which means
+ that the first .* in the pattern will only eat up
all
+ text in between <script
and the first occurrence
+ of document.referrer
, and that the second .* will
+ only span the text up to the first </script>
+ tag. Furthermore, the s option says that the match may span
+ multiple lines in the page, and the g option again means that the
+ substitution is global.
+
- Revision 2.51 2007/12/23 16:48:24 fabiankeil
- Use more precise example descriptions for the mysterious domain patterns.
+
+ So, to summarize, the pattern means: Match all scripts that contain the text
+ document.referrer
. Remember the parts of the script from
+ (and including) the start tag up to (and excluding) the string
+ document.referrer
as $1 , and the part following
+ that string, up to and including the closing tag, as $2 .
+
- Revision 2.50 2007/12/08 12:44:36 fabiankeil
- - Remove already commented out pre-3.0.7 changes.
- - Update the "new log defaults" paragraph.
+
+ Now the pattern is deciphered, but wasn't this about substituting things? So
+ let's look at the substitute: $1"Not Your Business!"$2 is
+ easy to read: The text remembered as $1 , followed by
+ "Not Your Business!" (including
+ the quotation marks!), followed by the text remembered as $2 .
+ This produces an exact copy of the original string, with the middle part
+ (the document.referrer
) replaced by "Not Your
+ Business!" .
+
- Revision 2.49 2007/12/06 18:21:55 fabiankeil
- Update hide-forwarded-for-headers description.
+
+ The whole job now reads: Replace document.referrer
by
+ "Not Your Business!" wherever it appears inside a
+ <script> tag. Note that this job won't break JavaScript syntax,
+ since both the original and the replacement are syntactically valid
+ string objects. The script just won't have access to the referrer
+ information anymore.
+
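+ To experiment with this job, a rough Python analogue may help. It is a
+ sketch only: Python's re module has no U option, so the ungreedy matching
+ is written out as .*?, the s option becomes re.DOTALL, and backreferences
+ in the substitute are written \1 and \2 instead of $1 and $2. The
+ hide_referrer helper and the sample page are made up for illustration.

```python
import re

# Analogue of:
#   s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg
JOB = re.compile(r"(<script.*?)document\.referrer(.*?</script>)", re.DOTALL)


def hide_referrer(html: str) -> str:
    """Replace document.referrer inside script elements with a string literal."""
    return JOB.sub(r'\1"Not Your Business!"\2', html)


page = (
    "<script>\nvar r = document.referrer;\n</script>"
    "<p>text</p>"
    "<script>track(document.referrer)</script>"
)
print(hide_referrer(page))
```

+ Both script elements are rewritten independently, and the paragraph between
+ them is left untouched, which is the effect of the ungreedy quantifiers.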
- Revision 2.48 2007/11/24 19:07:17 fabiankeil
- - Mention request rewriting.
- - Enable the conditional-forge paragraph.
- - Minor rewordings.
+
+ We'll show you two other jobs from the JavaScript taming department, but
+ this time only point out the constructs of special interest:
+
- Revision 2.47 2007/11/18 14:59:47 fabiankeil
- A few "Note to Upgraders" updates.
+
+
+# The status bar is for displaying link targets, not pointless blahblah
+#
+s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
+
- Revision 2.46 2007/11/17 17:24:44 fabiankeil
- - Use new action defaults.
- - Minor fixes and rewordings.
+
+ \s stands for whitespace characters (space, tab, newline,
+ carriage return, form feed), so that \s* means: zero
+ or more whitespace
. The ? in .*?
+ makes this matching of arbitrary text ungreedy. (Note that the U
+ option is not set). The ['"] construct means: a single
+ or a double quote
. Finally, \1 is
+ a back-reference to the first parenthesis just like $1 above,
+ with the difference that in the pattern , a backslash indicates
+ a back-reference, whereas in the substitute , it's the dollar.
+
- Revision 2.45 2007/11/16 11:48:46 hal9
- Fix one typo, and add a couple of small refinements.
+
+ So what does this job do? It replaces assignments of single- or double-quoted
+ strings to the window.status
object with a dummy assignment
+ (using a variable name that is hopefully odd enough not to conflict with
+ real variables in scripts). Thus, it catches many cases where e.g. pointless
+ descriptions are displayed in the status bar instead of the link target when
+ you move your mouse over links.
+
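+ The backreference in the pattern can be tried with a short Python sketch
+ (Python's re module is assumed here as a stand-in for PCRS; the sample
+ link and the kill_status helper are made up):

```python
import re

# Analogue of: s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
JOB = re.compile(r"""window\.status\s*=\s*(['"]).*?\1""", re.IGNORECASE)


def kill_status(html: str) -> str:
    """Replace quoted-string assignments to window.status with a dummy."""
    return JOB.sub("dUmMy=1", html)


link = """<a href="/x" onmouseover="window.status = 'Click me!'; return true">x</a>"""
print(kill_status(link))
```

+ Since \1 must repeat whichever quote character opened the string, a double
+ quote inside a single-quoted string does not end the match early.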
- Revision 2.44 2007/11/15 03:30:20 hal9
- Results of spell check.
+
+
+# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html
+#
+s/(<body [^>]*)onunload(.*>)/$1never$2/iU
+
- Revision 2.43 2007/11/14 18:45:39 fabiankeil
- - Mention some more contributors in the "New in this Release" list.
- - Minor rewordings.
+
+ Including the
+ OnUnload
+ event binding in the HTML DOM was a CRIME .
+ When I close a browser window, I want it to close and die. Basta.
+ This job replaces the onunload
attribute in
+ <body>
tags with the dummy word never .
+ Note that the i option makes the pattern matching
+ case-insensitive. Also note that ungreedy matching alone doesn't always guarantee
+ a minimal match: In the first parenthesis, we had to use [^>]*
+ instead of .* to prevent the match from exceeding the
+ <body> tag if it doesn't contain OnUnload
, but the page's
+ content does.
+
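+ The difference between [^>]* and an ungreedy .* in the first parenthesis
+ can be seen in the following Python sketch. It is an approximation under
+ assumptions: the U option is written out as .*?, and the NAIVE variant,
+ the helpers and the sample pages are hypothetical.

```python
import re

# Analogue of: s/(<body [^>]*)onunload(.*>)/$1never$2/iU
SAFE = re.compile(r"(<body [^>]*)onunload(.*?>)", re.IGNORECASE)
# The same job with an ungreedy .* in the first group, for comparison.
NAIVE = re.compile(r"(<body .*?)onunload(.*?>)", re.IGNORECASE)


def apply_job(job: re.Pattern, html: str) -> str:
    """Run one of the two jobs on a piece of HTML."""
    return job.sub(r"\1never\2", html)


# The body tag has no onunload attribute, but the page text mentions it.
page = '<body bgcolor="white"><p>The onunload event fires when ...</p>'

print(apply_job(SAFE, page))   # unchanged
print(apply_job(NAIVE, page))  # the page text gets altered
```

+ With [^>]* the first group cannot leave the body tag, so the job only
+ fires when the attribute is really there.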
- Revision 2.42 2007/11/12 03:32:40 hal9
- Updates for "What's New" and "Notes to Upgraders". Various other changes in
- preparation for new release. User Manual is almost ready.
+
+ The last example is from the fun department:
+
- Revision 2.41 2007/11/11 16:32:11 hal9
- This is primarily syncing What's New and Note to Upgraders sections with the many
- new features and changes (gleaned from memory but mostly from ChangeLog).
+
+
+FILTER: fun Fun text replacements
- Revision 2.40 2007/11/10 17:10:59 fabiankeil
- In the first third of the file, mention several times that
- the action editor is disabled by default in 3.0.7 beta and later.
+# Spice the daily news:
+#
+s/microsoft(?!\.com)/MicroSuck/ig
+
- Revision 2.39 2007/11/05 02:34:49 hal9
- Various changes in preparation for the upcoming release. Much yet to be done.
+
+ Note the (?!\.com) part (a so-called negative lookahead)
+ in the job's pattern, which means: Don't match, if the string
+ .com
appears directly following microsoft
+ in the page. This prevents links to microsoft.com from being trashed, while
+ still replacing the word everywhere else.
+
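+ The negative lookahead behaves the same way in most Perl-style regex
+ engines. A minimal sketch in Python (assumed here as a stand-in for PCRS,
+ with a made-up sample text and a hypothetical spice helper):

```python
import re

# Analogue of: s/microsoft(?!\.com)/MicroSuck/ig
JOB = re.compile(r"microsoft(?!\.com)", re.IGNORECASE)


def spice(text: str) -> str:
    """Replace the word unless it is directly followed by .com."""
    return JOB.sub("MicroSuck", text)


text = 'Microsoft news at <a href="http://www.microsoft.com/">microsoft.com</a>'
print(spice(text))
```

+ Only the first word is replaced; both occurrences that are followed by
+ .com survive, so the link keeps working.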
- Revision 2.38 2007/09/22 16:01:42 fabiankeil
- Update embedded show-url-info output.
+
+
+# Buzzword Bingo (example for extended regex syntax)
+#
+s* industry[ -]leading \
+| cutting[ -]edge \
+| customer[ -]focused \
+| market[ -]driven \
+| award[ -]winning # Comments are OK, too! \
+| high[ -]performance \
+| solutions[ -]based \
+| unmatched \
+| unparalleled \
+| unrivalled \
+*<font color="red"><b>BINGO!</b></font> \
+*igx
+
- Revision 2.37 2007/08/27 16:09:55 fabiankeil
- Fix pre-chroot-nslookup description which I failed to
- copy and paste properly. Reported by Stephen Gildea.
+
+ The x option in this job turns on extended syntax, and allows for
+ e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting.
+
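Extended syntax behaves much the same in other Perl-style regex engines. The following sketch uses Python's re.X flag as a stand-in for the x option; the shortened word list and the sample sentence are assumptions for illustration only.

```python
import re

# re.X (verbose mode) ignores whitespace and "#" comments in the pattern,
# except inside character classes such as [ -].
buzzwords = re.compile(r"""
      industry[ -]leading
    | cutting[ -]edge
    | award[ -]winning      # comments are OK, too
    | unparalleled
    """, re.I | re.X)

text = "Our cutting-edge, Award Winning product is unparalleled."
result = buzzwords.sub("<b>BINGO!</b>", text)
print(result)
```

Note that the space inside [ -] is still significant, which is why both "cutting-edge" and "Award Winning" are caught.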
- Revision 2.36 2007/08/26 16:47:14 fabiankeil
- Add Stephen Gildea's pre-chroot-nslookup patch [#1276666],
- extensive comments moved to user manual.
+
+ You get the idea?
+
+
- Revision 2.35 2007/08/26 14:59:49 fabiankeil
- Minor rewordings and fixes.
+
- Revision 2.34 2007/08/05 15:19:50 fabiankeil
- - Don't claim HTTP/1.1 compliance.
- - Use $ in some of the path pattern examples.
- - Use a hide-user-agent example argument without
- leading and trailing space.
- - Make it clear that the cookie actions work with
- HTTP cookies only.
- - Rephrase the inspect-jpegs text to underline
- that it's only meant to protect against a single
- exploit.
+The Pre-defined Filters
- Revision 2.33 2007/07/27 10:57:35 hal9
- Add references for user-agent strings for hide-user-agenet
+
- Revision 2.30 2007/04/25 15:10:36 fabiankeil
- - Describe installation for FreeBSD.
- - Start to document taggers and tag patterns.
- - Don't confuse devils and daemons.
+
+The distribution default.filter file contains a selection of
+pre-defined filters for your convenience:
+
- Revision 2.29 2007/04/05 11:47:51 fabiankeil
- Some updates regarding header filtering,
- handling of compressed content and redirect's
- support for pcrs commands.
+
+
+ js-annoyances
+
+
+ The purpose of this filter is to get rid of particularly annoying JavaScript abuse.
+ To that end, it
+
+
+
+ replaces JavaScript references to the browser's referrer information
+ with the string "Not Your Business!". This complements the hide-referrer action on the content level.
+
+
+
+
+ removes the bindings to the DOM's
+ unload
+ event which we feel has no right to exist and is responsible for most exit consoles
, i.e.
+ nasty windows that pop up when you close another one.
+
+
+
+
+ removes code that causes new windows to be opened with undesired properties, such as being
+ full-screen, non-resizeable, without location, status or menu bar etc.
+
+
+
+
+
+ Use with caution. This is an aggressive filter, and can break sites that
+ rely heavily on JavaScript.
+
+
+
- Revision 2.28 2006/12/10 23:42:48 hal9
- Fix various typos reported by Adam P. Thanks.
+
+ js-events
+
+
+ This is a very radical measure. It removes virtually all JavaScript event bindings, which
+ means that scripts can no longer react to user actions such as mouse movements or clicks, window
+ resizing etc. Use with caution!
+
+
+ We strongly discourage using this filter as a default since it breaks
+ many legitimate scripts. It is meant for use only on extra-nasty sites (should you really
+ need to go there).
+
+
+
- Revision 2.27 2006/11/14 01:57:47 hal9
- Dump all docs prior to 3.0.6 release. Various minor changes to faq and user
- manual.
+
+ html-annoyances
+
+
+ This filter will undo many common instances of HTML based abuse.
+
+
+ The BLINK and MARQUEE tags
+ are neutralized (yeah baby!), and browser windows will be created as
+ resizeable (as of course they should be!), and will have location,
+ scroll and menu bars -- even if specified otherwise.
+
+
+
- Revision 2.26 2006/10/24 11:16:44 hal9
- Add new filters.
+
+ content-cookies
+
+
+ Most cookies are set in the HTTP dialog, where they can be intercepted
+ by the
+ crunch-incoming-cookies
+ and crunch-outgoing-cookies
+ actions. But web sites increasingly make use of HTML meta tags and JavaScript
+ to sneak cookies to the browser on the content level.
+
+
+ This filter disables most HTML and JavaScript code that reads or sets
+ cookies. It cannot detect all clever uses of these types of code, so it
+ should not be relied on as an absolute fix. Use it wherever you would also
+ use the cookie crunch actions.
+
+
+
- Revision 2.25 2006/10/18 10:50:33 hal9
- Add note that since filters are off in Cautious, compression is ON. Turn off
- compression to make filters work on all sites.
+
+ refresh-tags
+
+
+ Disable any refresh tags if the interval is greater than nine seconds (so
+ that redirections done via refresh tags are not destroyed). This is useful
+ for dial-on-demand setups, or for those who find this HTML feature
+ annoying.
+
+
+
- Revision 2.24 2006/10/03 11:13:54 hal9
- More references to the new filters. Include html this time around.
+
+ unsolicited-popups
+
+
+ This filter attempts to prevent only unsolicited
pop-up
+ windows from opening, yet still allow pop-up windows that the user
+ has explicitly chosen to open. It was added in version 3.0.1,
+ as an improvement over earlier such filters.
+
+
+ Technical note: The filter works by redefining the window.open JavaScript
+ function to a dummy function, PrivoxyWindowOpen() ,
+ during the loading and rendering phase of each HTML page access, and
+ restoring the function afterward.
+
+
+ This is recommended only for browsers that cannot perform this function
+ reliably themselves. And be aware that some sites require such windows
+ in order to function normally. Use with caution.
+
+
+
- Revision 2.23 2006/10/02 22:43:53 hal9
- Contains new filter definitions from Fabian, and few other miscellaneous
- touch-ups.
+
+ all-popups
+
+
+ Attempt to prevent all pop-up windows from opening.
+ Note this should be used with even more discretion than the above, since
+ it is more likely to break some sites that require pop-ups for normal
+ usage. Use with caution.
+
+
+
- Revision 2.22 2006/09/22 01:27:55 hal9
- Final commit of probably various minor changes here and there. Unless
- something changes this should be ready for pending release.
+
+ img-reorder
+
+
+ This is a helper filter that has no value if used alone. It makes the
+ banners-by-size and banners-by-link
+ (see below) filters more effective and should be enabled together with them.
+
+
+
- Revision 2.21 2006/09/20 03:21:36 david__schmidt
- Just the tiniest tweak. Wafer thin!
+
+ banners-by-size
+
+
+ This filter removes image tags purely based on what size they are. Fortunately
+ for us, many ads and banner images tend to conform to certain standardized
+ sizes, which makes this filter quite effective for ad stripping purposes.
+
+
+ Occasionally this filter will cause false positives on images that are not ads,
+ but just happen to be of one of the standard banner sizes.
+
+
+ Recommended only for those who require extreme ad blocking. The default
+ block rules should catch 95+% of all ads without this filter enabled.
+
+
+
- Revision 2.20 2006/09/10 14:53:54 hal9
- Results of spell check. User manual has some updates to standard.actions file
- info.
+
+ banners-by-link
+
+
+ This is an experimental filter that attempts to kill any banners if
+ their URLs seem to point to known or suspected click trackers. It is currently
+ not of much value and is not recommended for use by default.
+
+
+
- Revision 2.19 2006/09/08 12:19:02 fabiankeil
- Adjust hide-if-modified-since example values
- to reflect the recent changes.
+
+ webbugs
+
+
+ Webbugs are small, invisible images (technically 1x1 GIF images) that
+ are used to track users across websites, and collect information on them.
+ As an HTML page is loaded by the browser, an embedded image tag causes the
+ browser to contact a third-party site, disclosing the tracking information
+ through the requested URL and/or cookies for that third-party domain, without
+ the user ever becoming aware of the interaction with the third-party site.
+ HTML-ized spam also uses a similar technique to verify email addresses.
+
+
+ This filter removes the HTML code that loads such webbugs
.
+
+
+
- Revision 2.18 2006/09/08 02:38:57 hal9
- Various changes:
- -Fix a number of broken links.
- -Migrate the new Windows service command line options, and reference as
- needed.
- -Rebuild so that can be used with the new "user-manual" config capabilities.
- -Etc.
+
+ tiny-textforms
+
+
+ A rather special-purpose filter that can be used to enlarge textareas (those
+ multi-line text boxes in web forms) and turn off hard word wrap in them.
+ It was written for the sourceforge.net tracker system where such boxes are
+ a nuisance, but it can be handy on other sites, too.
+
+
+ It is not recommended to use this filter as a default.
+
+
+
- Revision 2.17 2006/09/05 13:25:12 david__schmidt
- Add Windows service invocation stuff (duplicated) in FAQ and in user manual under Windows startup. One probably ought to reference the other.
+
+ jumping-windows
+
+
+ Many consider windows that move or resize themselves to be abusive. This filter
+ neutralizes the related JavaScript code. Note that some sites might not display
+ or behave as intended when using this filter. Use with caution.
+
+
+
- Revision 2.16 2006/09/02 12:49:37 hal9
- Various small updates for new actions, filterfiles, etc.
+
+ frameset-borders
+
+
+ Some web designers seem to assume that everyone in the world will view their
+ web sites using the same browser brand and version, screen resolution etc,
+ because only that assumption could explain why they'd use static frame sizes,
+ yet prevent their frames from being resized by the user, should they be too
+ small to show their whole content.
+
+
+ This filter removes the related HTML code. It should only be applied to sites
+ which need it.
+
+
+
- Revision 2.15 2006/08/30 11:15:22 hal9
- More work on the new actions, especially filter-*-headers, and What's New
- section. User Manual is close to final form for 3.0.4 release. Some tinkering
- and proof reading left to do.
+
+ demoronizer
+
+
+ Many Microsoft products that generate HTML use non-standard extensions (read:
+ violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those
+ HTML documents to display with errors on standard-compliant platforms.
+
+
+ This filter translates the MS-only characters into Latin-1 equivalents.
+ It is not necessary when using MS products, and will cause corruption of
+ all documents that use 8-bit character sets other than Latin-1. It's mostly
+ worthwhile for Europeans on non-MS platforms who sometimes see weird garbage
+ characters on some pages, or for user agents that don't correct for this on
+ the fly.
+
+
+
+
- Revision 2.14 2006/08/29 10:59:36 hal9
- Add a "Whats New in this release" Section. Further work on multiple filter
- files, and assorted other minor changes.
+
+ shockwave-flash
+
+
+ A filter for shockwave haters. As the name suggests, this filter strips code
+ out of web pages that is used to embed shockwave flash objects.
+
+
+
+
+
- Revision 2.13 2006/08/22 11:04:59 hal9
- Silence warnings and errors. This should build now. New filters were only
- stubbed in. More to be done.
+
+ quicktime-kioskmode
+
+
+ Change HTML code that embeds Quicktime objects so that kioskmode, which
+ prevents saving, is disabled.
+
+
+
- Revision 2.12 2006/08/14 08:40:39 fabiankeil
- Documented new actions that were part of
- the "minor Privoxy improvements".
+
+ fun
+
+
+ Text replacements for subversive browsing fun. Make fun of your favorite
+ Monopolist or play buzzword bingo.
+
+
+
- Revision 2.11 2006/07/18 14:48:51 david__schmidt
- Reorganizing the repository: swapping out what was HEAD (the old 3.1 branch)
- with what was really the latest development (the v_3_0_branch branch)
+
+ crude-parental
+
+
+ A demonstration-only filter that shows how Privoxy
+ can be used to delete web content on a keyword basis.
+
+
+
- Revision 1.123.2.43 2005/05/23 09:59:10 hal9
- Fix typo 'loose'
+
+ ie-exploits
+
+
+ An experimental collection of text replacements to disable malicious HTML and JavaScript
+ code that exploits known security holes in Internet Explorer.
+
+
+ Presently, it only protects against Nimda and a cross-site scripting bug, and
+ would need active maintenance to provide more substantial protection.
+
+
+
- Revision 1.123.2.42 2004/12/04 14:39:57 hal9
- Fix two minor typos per bug SF report.
+
+ site-specifics
+
+
+ Some web sites have very specific problems, the cure for which doesn't apply
+ anywhere else, or could even cause damage on other sites.
+
+
+ This is a collection of such site-specific cures which should only be applied
+ to the sites they were intended for, which is what the supplied
+ default.action file does. Users shouldn't need to change
+ anything regarding this filter.
+
+
+
- Revision 1.123.2.41 2004/03/23 12:58:42 oes
- Fixed an inaccuracy
+
+ google
+
+
+ A CSS based block for Google text ads. Also removes a width limitation
+ and the toolbar advertisement.
+
+
+
- Revision 1.123.2.40 2004/02/27 12:48:49 hal9
- Add comment re: redirecting to local file system for set-image-blocker may
- is dependent on browser.
+
+ yahoo
+
+
+ Another CSS based block, this time for Yahoo text ads. It also removes
+ a width limitation.
+
+
+
- Revision 1.123.2.39 2004/01/30 22:31:40 oes
- Added a hint re bookmarklets to Quickstart section
+
+ msn
+
+
+ Another CSS based block, this time for MSN text ads. It also removes
+ tracking URLs, as well as a width limitation.
+
+
+
- Revision 1.123.2.38 2004/01/30 16:47:51 oes
- Some minor clarifications
+
+ blogspot
+
+
+ Cleans up some Blogspot blogs. Read the fine print before using this one!
+
+
+ This filter also intentionally removes some navigation stuff and sets the
+ page width to 100%. As a result, some rounded corners
would
+ appear too early or not at all, and as fixing this would require a browser
+ that understands background-size (CSS3), they are removed instead.
+
+
+
- Revision 1.123.2.37 2004/01/29 22:36:11 hal9
- Updates for no longer filtering text/plain, and demoronizer default settings,
- and copyright notice dates.
+
+ xml-to-html
+
+
+ Server-header filter to change the Content-Type from xml to html.
+
+
+
- Revision 1.123.2.36 2003/12/10 02:26:26 hal9
- Changed the demoronizer filter description.
+
+ html-to-xml
+
+
+ Server-header filter to change the Content-Type from html to xml.
+
+
+
- Revision 1.123.2.35 2003/11/06 13:36:37 oes
- Updated link to nightly CVS tarball
+
+ no-ping
+
+
+ Removes the non-standard ping attribute from
+ anchor and area HTML tags.
+
+
+
- Revision 1.123.2.34 2003/06/26 23:50:16 hal9
- Add a small bit on filtering and problems re: source code being corrupted.
+
+ hide-tor-exit-notation
+
+
+ Client-header filter to remove the Tor exit node notation
+ found in Host and Referer headers.
+
+
+ If &my-app; and Tor are chained and &my-app;
+ is configured to use socks4a, one can use http://www.example.org.foobar.exit/
+ to access the host www.example.org
through the
+ Tor exit node foobar
.
+
+
+ As the HTTP client isn't aware of this notation, it treats the
+ whole string www.example.org.foobar.exit
as host and uses it
+ for the Host
and Referer
headers. From the
+ server's point of view the resulting headers are invalid and can cause problems.
+
+
+ An invalid Referer
header can trigger hot-linking
+ protections, an invalid Host
header will make it impossible for
+ the server to find the right vhost (several domains hosted on the same IP address).
+
+
+ This client-header filter removes the foo.exit
part in those headers
+ to prevent the mentioned problems. Note that it only modifies
+ the HTTP headers; it doesn't make it impossible for the server
+ to detect your Tor exit node based on the IP address
+ the request is coming from.
+
+
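What the filter does to the two headers can be sketched in Python. The pattern below is an illustrative assumption written for this example; it is not the job shipped in default.filter, and Python's re module merely stands in for PCRE.

```python
import re

# Hypothetical pattern: keep everything up to the real host name and drop
# the trailing ".<node>.exit" part. Only Host and Referer headers are touched.
EXIT_NOTATION = re.compile(
    r"^((?:Host|Referer):\s*(?:https?://)?[^/\s]*?)\.[^./\s]+\.exit(?=[/:]|$)",
    re.I,
)


def strip_exit_notation(header: str) -> str:
    """Return the header line without the Tor exit node notation."""
    return EXIT_NOTATION.sub(r"\1", header)


print(strip_exit_notation("Host: www.example.org.foobar.exit"))
print(strip_exit_notation("Referer: http://www.example.org.foobar.exit/page"))
```

Headers without the notation pass through unchanged, so such a job is harmless for requests that don't use Tor.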
+
- Revision 1.123.2.33 2003/05/08 18:17:33 roro
- Use apt-get instead of dpkg to install Debian package, which is more
- solid, uses the correct and most recent Debian version automatically.
+
+
- Revision 1.123.2.32 2003/04/11 03:13:57 hal9
- Add small note about only one filterfile (as opposed to multiple actions
- files).
+
- Revision 1.123.2.31 2003/03/26 02:03:43 oes
- Updated hard-coded copyright dates
+
+External filter syntax
+
+ External filters are scripts or programs that can modify the content in
+ case common filters
+ aren't powerful enough.
+
+
+ External filters can be written in any language the platform &my-app; runs
+ on supports.
+
+
+ They are controlled with the
+ external-filter action
+ and have to be defined in the filterfile
+ first.
+
+
+ The header looks like any other filter, but instead of pcrs jobs, external
+ filters contain a single job which can be a program or a shell script (which
+ may call other scripts or programs).
+
+
+ External filters read the content from STDIN and write the rewritten
+ content to STDOUT.
+ The environment variables PRIVOXY_URL, PRIVOXY_PATH, PRIVOXY_HOST,
+ PRIVOXY_ORIGIN, PRIVOXY_LISTEN_ADDRESS can be used to get some details
+ about the client request.
+
+
+ &my-app; will temporarily store the content to filter in the
+ temporary-directory .
+
+
+
+EXTERNAL-FILTER: cat Pointless example filter that doesn't actually modify the content
+/bin/cat
- Revision 1.123.2.30 2003/03/24 12:58:56 hal9
- Add new section on Predefined Filters.
+# Incorrect reimplementation of the filter above in POSIX shell.
+#
+# Note that it's a single job that spans multiple lines, the line
+# breaks are not passed to the shell, thus the semicolons are required.
+#
+# If the script isn't trivial, it is recommended to put it into an external file.
+#
+# In general, writing external filters entirely in POSIX shell is not
+# considered a good idea.
+EXTERNAL-FILTER: cat2 Pointless example filter that despite its name may actually modify the content
+while read line; \
+do \
+ echo "$line"; \
+done
+
+EXTERNAL-FILTER: rotate-image Rotate an image by 180 degrees. Test filter with limited value.
+/usr/local/bin/convert - -rotate 180 -
+
+EXTERNAL-FILTER: citation-needed Adds a "[citation needed]" tag to an image. The coordinates may need adjustment.
+/usr/local/bin/convert - -pointsize 16 -fill white -annotate +17+418 "[citation needed]" -
+
+
- Revision 1.123.2.29 2003/03/20 02:45:29 hal9
- More problems with \-\-chroot causing markup problems :(
+
+
+ Currently external filters are executed with &my-app;'s privileges!
+ Only use external filters you understand and trust.
+
+
+
+ External filters are experimental and the syntax may change in the future.
+
+
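As a rough sketch of what a non-trivial external filter might look like when kept in its own file, here is a small Python script. The function name add_marker and the marker text are hypothetical; the only things taken from the description above are that the content arrives on STDIN, the result goes to STDOUT, and PRIVOXY_URL is available in the environment.

```python
#!/usr/bin/env python3
import os
import sys


def add_marker(content: bytes, url: str) -> bytes:
    """Insert an HTML comment naming the request URL before </body>."""
    # Drop "--" so the URL cannot close the comment early.
    safe_url = url.replace("--", "")
    marker = b"<!-- filtered: " + safe_url.encode("utf-8", "replace") + b" -->"
    head, sep, tail = content.rpartition(b"</body>")
    if not sep:
        # No closing body tag found: pass the content through unchanged.
        return content
    return head + marker + sep + tail


if __name__ == "__main__":
    # The content to filter is read from STDIN, the result written to STDOUT.
    url = os.environ.get("PRIVOXY_URL", "")
    sys.stdout.buffer.write(add_marker(sys.stdin.buffer.read(), url))
```

Such a script would be defined in the filter file like the cat example, with the path to the script as the single job. Working on bytes rather than text avoids guessing the page's character set.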
- Revision 1.123.2.28 2003/03/19 00:35:24 hal9
- Manual edit of revision log because 'chroot' (even inside a comment) was
- causing Docbook to hang here (due to double hyphen and the processor thinking
- it was a comment).
+
- Revision 1.123.2.27 2003/03/18 19:37:14 oes
- s/Advanced|Radical/Adventuresome/g to avoid complaints re fun filter
+
- Revision 1.123.2.26 2003/03/17 16:50:53 oes
- Added documentation for new chroot option
- Revision 1.123.2.25 2003/03/15 18:36:55 oes
- Adapted to the new filters
- Revision 1.123.2.24 2002/11/17 06:41:06 hal9
- Move default profiles table from FAQ to U-M, and other minor related changes.
- Add faq on cookies.
+
- Revision 1.123.2.23 2002/10/21 02:32:01 hal9
- Updates to the user.action examples section. A few new ones.
+
+Privoxy's Template Files
+
+ All Privoxy built-in pages, i.e. error pages such as the
+ 404 - No Such Domain
+ error page , the BLOCKED
+ page
+ and all pages of its web-based
+ user interface , are generated from templates .
+ (Privoxy must be running for the above links to work as
+ intended.)
+
- Revision 1.123.2.22 2002/10/12 00:51:53 hal9
- Add demoronizer to filter section.
+
+ These templates are stored in a subdirectory of the configuration
+ directory called templates . On Unixish platforms,
+ this is typically
+ /etc/privoxy/templates/ .
+
- Revision 1.123.2.21 2002/10/10 04:09:35 hal9
- s/Advanced/Radical/ and added very brief note.
+
+ The templates are basically normal HTML files, but with place-holders (called symbols
+ or exports), which Privoxy fills at run time. It
+ is possible to edit the templates with a normal text editor, should you want
+ to customize them. (Not recommended for the casual
+ user ). Should you create your own custom templates, you should use
+ the config setting templdir
+ to specify an alternate location, so your templates do not get overwritten
+ during upgrades.
+
+
+ Note that just like in configuration files, lines starting
+ with # are ignored when the templates are filled in.
+
- Revision 1.123.2.20 2002/10/10 03:49:21 hal9
- Add notes to session-cookies-only and Quickstart about pre-existing
- cookies. Also, note content-cookies work differently.
+
+ The place-holders are of the form @name@ , and you will
+ find a list of available symbols, which vary from template to template,
+ in the comments at the start of each file. Note that these comments are not
+ always accurate, and that it's probably best to look at the existing HTML
+ code to find out which symbols are supported and what they are filled in with.
+
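The substitution mechanism described above can be imitated in a few lines of Python. This is a sketch of the idea only, not Privoxy's actual implementation; the function fill_template, the symbol name version and its value are assumptions made up for the example.

```python
import re


def fill_template(template: str, exports: dict) -> str:
    """Fill @name@ place-holders and skip lines starting with '#'."""
    # Lines starting with "#" are ignored, as in the configuration files.
    lines = [line for line in template.splitlines() if not line.startswith("#")]
    text = "\n".join(lines)
    # Replace each @name@ whose name is a known symbol; leave others alone.
    return re.sub(
        r"@([A-Za-z0-9_-]+)@",
        lambda m: exports.get(m.group(1), m.group(0)),
        text,
    )


template = "# Exports: version\n<p>Privoxy version @version@</p>"
print(fill_template(template, {"version": "x.y.z"}))
```

Unknown symbols are left in place here, which makes it easy to spot a place-holder that a given template does not actually support.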
- Revision 1.123.2.19 2002/09/26 01:25:36 hal9
- More explanation on Privoxy patterns, more on content-cookies and SSL.
+
+ A special application of this substitution mechanism is to make whole
+ blocks of HTML code disappear when a specific symbol is set. We use this
+ for many purposes, one of them being to include the beta warning in all
+ our user interface (CGI) pages when Privoxy
+ is in an alpha or beta development stage:
+
- Revision 1.123.2.18 2002/08/22 23:47:58 hal9
- Add 'Documentation' to Privoxy Menu shot in Configuration section to match
- CGIs.
+
+
+<!-- @if-unstable-start -->
- Revision 1.123.2.17 2002/08/18 01:13:05 hal9
- Spell checked (only one typo this time!).
+ ... beta warning HTML code goes here ...
- Revision 1.123.2.16 2002/08/09 19:20:54 david__schmidt
- Update to Mac OS X startup script name
+<!-- if-unstable-end@ -->
+
- Revision 1.123.2.15 2002/08/07 17:32:11 oes
- Converted some internal links from ulink to link for PDF creation; no content changed
+
+ If the "unstable" symbol is set, everything in between and including
+ @if-unstable-start and if-unstable-end@
+ will disappear, leaving nothing but an empty comment:
+
- Revision 1.123.2.14 2002/08/06 09:16:13 oes
- Nits re: actions file download
+
+ <!-- -->
+
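The way such a block disappears can be reproduced with a single substitution. The Python sketch below only imitates the described behavior and is not how Privoxy implements it internally; the surrounding template text is an assumption.

```python
import re

template = (
    "<h1>Status</h1>\n"
    "<!-- @if-unstable-start -->\n"
    "<p>... beta warning HTML code goes here ...</p>\n"
    "<!-- if-unstable-end@ -->\n"
)

# Remove everything from @if-unstable-start through if-unstable-end@,
# including the two markers. re.S lets "." span line breaks.
result = re.sub(r"@if-unstable-start.*?if-unstable-end@", "", template, flags=re.S)
print(result)
```

Because the markers sit inside HTML comments, what remains of the two comments joins into one empty comment.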
- Revision 1.123.2.13 2002/08/02 18:23:19 g_sauthoff
- Just 2 small corrections to the Gentoo sections
+
+ There's also an if-then-else construct and an #include
+ mechanism, but you'll surely find out if you are inclined to edit the
+ templates ;-)
+
- Revision 1.123.2.12 2002/08/02 18:17:21 g_sauthoff
- Added 2 Gentoo sections
+
+ All templates refer to a style located at
+ http://config.privoxy.org/send-stylesheet .
+ This is, of course, locally served by Privoxy
+ and the source for it can be found and edited in the
+ cgi-style.css template.
+
- Revision 1.123.2.11 2002/07/26 15:20:31 oes
- - Added version info to title
- - Added info on new filters
- - Revised parts of the filter file tutorial
- - Added info on where to get updated actions files
+
- Revision 1.123.2.10 2002/07/25 21:42:29 hal9
- Add brief notes on not proxying non-HTTP protocols.
+
- Revision 1.123.2.9 2002/07/11 03:40:28 david__schmidt
- Updated Mac OS X sections due to installation location change
- Revision 1.123.2.8 2002/06/09 16:36:32 hal9
- Clarifications on filtering and MIME. Hardcode 'latest release' in index.html.
+
- Revision 1.123.2.7 2002/06/09 00:29:34 hal9
- Touch ups on filtering, in actions section and Anatomy.
+Contacting the Developers, Bug Reporting and Feature
+Requests
- Revision 1.123.2.6 2002/06/06 23:11:03 hal9
- Fix broken link. Linkchecked all docs.
+
+ &contacting;
+
- Revision 1.123.2.5 2002/05/29 02:01:02 hal9
- This is break out of the entire config section from u-m, so it can
- eventually be used to generate the comments, etc in the main config file
- so that these are in sync with each other.
+
- Revision 1.123.2.4 2002/05/27 03:28:45 hal9
- Ooops missed something from David.
+
- Revision 1.123.2.3 2002/05/27 03:23:17 hal9
- Fix FIXMEs for OS2 and Mac OS X startup. Fix Redhat typos (should be Red Hat).
- That's a wrap, I think.
- Revision 1.123.2.2 2002/05/26 19:02:09 hal9
- Move Amiga stuff around to take of FIXME in start up section.
+
+Privoxy Copyright, License and History
- Revision 1.123.2.1 2002/05/26 17:04:25 hal9
- -Spellcheck, very minor edits, and sync across branches
+
+ &copyright;
+
- Revision 1.123 2002/05/24 23:19:23 hal9
- Include new image (Proxy setup). More fun with guibutton.
- Minor corrections/clarifications here and there.
+
+ Privoxy is free software; you can
+ redistribute it and/or modify it under the terms of the
+ GNU General Public License , version 2,
+ as published by the Free Software Foundation and included in
+ the next section.
+
- Revision 1.122 2002/05/24 13:24:08 oes
- Added Bookmarklet for one-click pre-filled access to show-url-info
+
+License
+
+
+
- Revision 1.121 2002/05/23 23:20:17 oes
- - Changed more (all?) references to actions to the
- style.
- - Small fixes in the actions chapter
- - Small clarifications in the quickstart to ad blocking
- - Removed from s since the new doc CSS
- renders them red (bad in TOC).
+
+
- Revision 1.120 2002/05/23 19:16:43 roro
- Correct Debian specials (installation and startup).
- Revision 1.119 2002/05/22 17:17:05 oes
- Added Security hint
+
- Revision 1.118 2002/05/21 04:54:55 hal9
- -New Section: Quickstart to Ad Blocking
- -Reformat Actions Anatomy to match new CGI layout
+History
+
+ &history;
+
+
- Revision 1.117 2002/05/17 13:56:16 oes
- - Reworked & extended Templates chapter
- - Small changes to Regex appendix
- - #included authors.sgml into (C) and hist chapter
+Authors
+
+ &p-authors;
+
+
- Revision 1.116 2002/05/17 03:23:46 hal9
- Fixing merge conflict in Quickstart section.
+
- Revision 1.115 2002/05/16 16:25:00 oes
- Extended the Filter File chapter & minor fixes
+
- Revision 1.114 2002/05/16 09:42:50 oes
- More ulink->link, added some hints to Quickstart section
- Revision 1.113 2002/05/15 21:07:25 oes
- Extended and further commented the example actions files
+
+See Also
+
+ &seealso;
+
+
- Revision 1.112 2002/05/15 03:57:14 hal9
- Spell check. A few minor edits here and there for better syntax and
- clarification.
- Revision 1.111 2002/05/14 23:01:36 oes
- Fixing the fixes
- Revision 1.110 2002/05/14 19:10:45 oes
- Restored alphabetical order of actions
+
+Appendix
- Revision 1.109 2002/05/14 17:23:11 oes
- Renamed the prevent-*-cookies actions, extended aliases section and moved it before the example AFs
- Revision 1.108 2002/05/14 15:29:12 oes
- Completed proofreading the actions chapter
+
+
+Regular Expressions
+
+ Privoxy uses Perl-style regular
+ expressions
in its actions
+ files and filter file,
+ through the PCRE and
+
+ PCRS libraries.
+
- Revision 1.107 2002/05/12 03:20:41 hal9
- Small clarifications for 127.0.0.1 vs localhost for listen-address since this
- apparently an important distinction for some OS's.
+
+ If you are reading this, you probably don't understand what regular
+ expressions
are, or what they can do. So this will be a very brief
+ introduction only. A full explanation would require a book ;-)
+
- Revision 1.106 2002/05/10 01:48:20 hal9
- This is mostly proposed copyright/licensing additions and changes. Docs
- are still GPL, but licensing and copyright are more visible. Also, copyright
- changed in doc header comments (eliminate references to JB except FAQ).
+
+ Regular expressions provide a language to describe patterns that can be
+ run against strings of characters (letters, numbers, etc.), to see if they
+ match the string or not. The patterns are themselves (sometimes complex)
+ strings of literal characters, combined with wild-cards, and other special
+ characters, called meta-characters. The meta-characters
have
+ special meanings and are used to build complex patterns to be matched against.
+ Perl Compatible Regular Expressions are an especially convenient
+ dialect
of the regular expression language.
+
- Revision 1.105 2002/05/05 20:26:02 hal9
- Sorting out license vs copyright in these docs.
+
+ To make a simple analogy, we do something similar when we use wild-card
+ characters when listing files with the dir command in DOS.
+ *.* matches all filenames. The special
+ character here is the asterisk which matches any and all characters. We can be
+ more specific and use ? to match just individual
+ characters. So dir file?.txt
would match
+ file1.txt
, file2.txt
, etc. We are pattern
+ matching, using a similar technique to regular expressions
!
+
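The wild-card analogy can be tried without a DOS prompt. This small sketch uses Python's fnmatch module, which implements shell-style wild-cards; the file names are made up for the example.

```python
from fnmatch import fnmatchcase

names = ["file1.txt", "file2.txt", "file10.txt", "notes.txt"]

# "?" stands for exactly one character in a shell-style wild-card pattern,
# so "file10.txt" (two characters after "file") is not matched.
matches = [name for name in names if fnmatchcase(name, "file?.txt")]
print(matches)
```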
- Revision 1.104 2002/05/04 08:44:45 swa
- bumped version
+
+ Regular expressions do essentially the same thing, but are much, much more
+ powerful. There are many more special characters
and ways of
+ building complex patterns however. Let's look at a few of the common ones,
+ and then some examples:
+
- Revision 1.103 2002/05/04 00:40:53 hal9
- -Remove the TOC first page kludge. It's fixed proper now in ldp.dsl.in.
- -Some minor additions to Quickstart.
+
+
+ . - Matches any single character, e.g. a
,
+ A
, 4
, :
, or @
.
+
+
- Revision 1.102 2002/05/03 17:46:00 oes
- Further proofread & reactivated short build instructions
+
+
+ ? - The preceding character or expression is matched ZERO or ONE
+ times. Either/or.
+
+
- Revision 1.101 2002/05/03 03:58:30 hal9
- Move the user-manual config directive to top of section. Add note about
- Privoxy needing read permissions for configs, and write for logs.
+
+
+ + - The preceding character or expression is matched ONE or MORE
+ times.
+
+
- Revision 1.100 2002/04/29 03:05:55 hal9
- Add clarification on differences of new actions files.
+
+
+ * - The preceding character or expression is matched ZERO or MORE
+ times.
+
+
- Revision 1.99 2002/04/28 16:59:05 swa
- more structure in starting section
+
+
+ \ - The escape
character denotes that
+ the following character should be taken literally. This is used where one of the
+ special characters (e.g. .
) needs to be taken literally and
+ not as a special meta-character. Example: example\.com
, makes
+ sure the period is recognized only as a period (and not expanded to its
+ meta-character meaning of any single character).
+
+
- Revision 1.98 2002/04/28 05:43:59 hal9
- This is the break up of configuration.html into multiple files. This
- will probably break links elsewhere :(
+
+
+ [ ] - Characters enclosed in brackets will be matched if
+ any of the enclosed characters are encountered. For instance, [0-9]
+ matches any numeric digit (zero through nine). As an example, we can combine
+ this with +
to match any digit one or more times: [0-9]+
.
+
+
- Revision 1.97 2002/04/27 21:04:42 hal9
- -Rewrite of Actions File example.
- -Add section for user-manual directive in config.
+
+
+ ( ) - parentheses are used to group a sub-expression,
+ or multiple sub-expressions.
+
+
- Revision 1.96 2002/04/27 05:32:00 hal9
- -Add short section to Filter Files to tie in with +filter action.
- -Start rewrite of examples in Actions Examples (not finished).
+
+
+ | - The bar
character works like an
+ or
conditional statement. A match is successful if the
+ sub-expression on either side of |
matches. As an example:
+ /(this|that) example/
uses grouping and the bar character
+ and would match either this example
or that
+ example
, and nothing else.
+
+
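The meta-characters listed above can be explored interactively. The following sketch uses Python's re module, whose syntax agrees with Perl-style regular expressions for these basic constructs; the helper matches and all sample strings are assumptions for illustration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


checks = {
    "dot": matches(r"a.c", "a@c"),                # "." is any single character
    "optional": matches(r"colou?r", "color"),     # "?" is zero or one
    "plus": matches(r"[0-9]+", "2017"),           # one or more digits
    "plus_empty": matches(r"[0-9]+", ""),         # "+" needs at least one
    "star_empty": matches(r"[0-9]*", ""),         # "*" accepts zero
    "escaped": matches(r"example\.com", "exampleXcom"),  # "\." is a literal period
    "either": matches(r"(this|that) example", "that example"),
}
print(checks)
```

Only plus_empty and escaped come out false: the first because + demands at least one digit, the second because the escaped period no longer matches an arbitrary character.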
- Revision 1.95 2002/04/26 17:23:29 swa
- bookmarks cleaned, changed structure of user manual, screen and programlisting cleanups, and numerous other changes that I forgot
+
+ These are just some of the ones you are likely to use when matching URLs with
+ Privoxy, and are a long way from a definitive
+ list. This is enough to get us started with a few simple examples which may
+ be more illuminating:
+
- Revision 1.94 2002/04/26 05:24:36 hal9
- -Add most of Andreas suggestions to Chain of Events section.
- -A few other minor corrections and touch up.
+
+ /.*/banners/.* - A simple example
+ that uses the common combination of . and * to
+ denote any character, zero or more times. In other words, any string at all.
+ So we start with a literal forward slash, then our regular expression pattern
+ (.*), another literal forward slash, the string
+ banners, another forward slash, and lastly another
+ .*. We are building
+ a directory path here. This will match any file whose path has a
+ directory named banners in it. The .* matches
+ any characters, and this could conceivably be more forward slashes, so it
+ might expand into a much longer looking path. For example, this could match:
+ /eye/hate/spammers/banners/annoy_me_please.gif, or just
+ //banners/annoying.html (note the two leading slashes: the expression
+ requires one slash before the .* and another before banners, so a plain
+ /banners/annoying.html would not match), or almost an infinite number of other
+ possible combinations, just so it has banners in the path
+ somewhere.
+
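+ A small check of this expression, again only as a sketch with Python's re
+ module standing in for Privoxy's own matching (an assumption). The last two
+ paths are hypothetical. Note that, read strictly, the expression needs one
+ slash before the .* and a second one before banners, so a path that starts
+ directly with /banners/ is not matched.
+

```python
import re

# The expression from the text, applied to the path part of a URL.
banners = re.compile(r"/.*/banners/.*")

paths = [
    "/eye/hate/spammers/banners/annoy_me_please.gif",
    "//banners/annoying.html",
    "/banners/annoying.html",   # only one slash in front of "banners"
    "/images/logo.gif",         # no "banners" directory at all
]

banner_results = {p: bool(banners.match(p)) for p in paths}
for path, hit in banner_results.items():
    print(f"{hit!s:5}  {path}")
```

+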
- Revision 1.92 2002/04/25 18:55:13 hal9
- More catchups on new actions files, and new actions names.
- Other assorted cleanups, and minor modifications.
+
+ And now something a little more complex:
+
- Revision 1.91 2002/04/24 02:39:31 hal9
- Add 'Chain of Events' section.
+
+ /.*/adv((er)?ts?|ertis(ing|ements?))?/ -
+ We have several literal forward slashes again (/), so we are
+ building another expression that is a file path statement. We have another
+ .*, so we are matching against any conceivable sub-path, just so
+ it matches our expression. The only true literal that must
+ match our pattern is adv, together with
+ the forward slashes. What comes after the adv string is the
+ interesting part.
+
- Revision 1.90 2002/04/23 21:41:25 hal9
- Linuxconf is deprecated on RH, substitute chkconfig.
+
+ Remember the ? means the preceding expression (either a
+ literal character or anything grouped with (...) in this case)
+ can exist or not, since this means either zero or one match. So
+ ((er)?ts?|ertis(ing|ements?)) is optional, as are the
+ sub-expression (er) and each s that is followed by a
+ ?. The group (ing|ements?) is not optional by itself:
+ once ertis has matched, one of its two alternatives must follow. The |
+ means "or". We have two of those. For instance,
+ (ing|ements?) can expand to match either ing
+ OR ements?. What is being done here is an
+ attempt at matching as many variations of advertisement, and
+ similar, as possible. So this would expand to match just adv,
+ or advert, or adverts, or
+ advertising, or advertisement, or
+ advertisements. You get the idea. But it would not match
+ advertizements (with a z). We could fix that by
+ changing our regular expression to:
+ /.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/, which would then match
+ either spelling.
+
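+ The variations listed above can be verified mechanically. This sketch uses
+ Python's re module in place of Privoxy's matching (an assumption), and the
+ /x/ prefix is a made-up parent directory so that the leading /.*/ has
+ something to match.
+

```python
import re

original = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
with_z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

words = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements", "advertizements"]

def matched_words(pattern):
    """Return the words whose directory path /x/<word>/ matches completely."""
    return [w for w in words if pattern.fullmatch(f"/x/{w}/")]

print(matched_words(original))  # everything except "advertizements"
print(matched_words(with_z))    # all seven words
```

+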
- Revision 1.89 2002/04/23 21:05:28 oes
- Added hint for startup on Red Hat
+
+ /.*/advert[0-9]+\.(gif|jpe?g) - Again
+ another path statement with forward slashes. Anything in the square brackets
+ [ ] can be matched. This is using 0-9 as a
+ shorthand expression to mean any digit zero through nine. It is the same as
+ saying 0123456789. So any digit matches. The +
+ means one or more of the preceding expression must be included. The preceding
+ expression here is what is in the square brackets -- in this case, any digit
+ zero through nine. Then, at the end, we have a grouping: (gif|jpe?g).
+ This includes a |, so this needs to match the expression on
+ either side of that bar character. A simple gif on one side, and the other
+ side will in turn match either jpeg or jpg,
+ since the ? means the letter e is optional and
+ can be matched once or not at all. So we are building an expression here to
+ match a GIF or JPEG image file. It must include the literal
+ string advert, then one or more digits, and a .
+ (which is now a literal, and not a special character, since it is escaped
+ with \), and lastly either gif, or
+ jpeg, or jpg. Some possible matches would
+ include: //advert1.jpg,
+ /nasty/ads/advert1234.gif,
+ /banners/from/hell/advert99.jpg. It would not match
+ advert1.gif (no leading slash), or
+ /adverts232.jpg (the expression does not include an
+ s), or /advert1.jsp (jsp is not
+ in the expression anywhere).
+
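+ The matching and non-matching paths named above can be checked the same way.
+ As before, this is a sketch that assumes Python's re module behaves like
+ Privoxy's engine for this expression.
+

```python
import re

advert = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

candidates = [
    "//advert1.jpg",
    "/nasty/ads/advert1234.gif",
    "/banners/from/hell/advert99.jpg",
    "advert1.gif",        # listed in the text as a non-match
    "/adverts232.jpg",    # listed in the text as a non-match
    "/advert1.jsp",       # listed in the text as a non-match
]

advert_results = {p: bool(advert.fullmatch(p)) for p in candidates}
for path, hit in advert_results.items():
    print(f"{path:35} {'match' if hit else 'no match'}")
```

+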
- Revision 1.88 2002/04/23 05:37:54 hal9
- Add AmigaOS install stuff.
+
+ We are barely scratching the surface of regular expressions here so that you
+ can understand the default Privoxy
+ configuration files, and maybe use this knowledge to customize your own
+ installation. There is much, much more that can be done with regular
+ expressions. Now that you know enough to get started, you can learn more on
+ your own :/
+
- Revision 1.87 2002/04/23 02:53:15 david__schmidt
- Updated Mac OS X installation section
- Added a few English tweaks here an there
+
+ More reading on Perl Compatible Regular expressions:
+ http://perldoc.perl.org/perlre.html
+
- Revision 1.86 2002/04/21 01:46:32 hal9
- Re-write actions section.
+
+ For information on regular expression based substitutions and their applications
+ in filters, please see the filter file tutorial
+ in this manual.
+
+
- Revision 1.85 2002/04/18 21:23:23 hal9
- Fix ugly typo (mine).
+
- Revision 1.84 2002/04/18 21:17:13 hal9
- Spell Redhat correctly (ie Red Hat). A few minor grammar corrections.
- Revision 1.83 2002/04/18 18:21:12 oes
- Added RPM install detail
+
+
+Privoxy's Internal Pages
- Revision 1.82 2002/04/18 12:04:50 oes
- Cosmetics
+
+ Since Privoxy proxies each requested
+ web page, it is easy for Privoxy to
+ trap certain special URLs. In this way, we can talk directly to
+ Privoxy, and see how it is
+ configured, see how our rules are being applied, change these
+ rules and other configuration options, and even turn
+ Privoxy's filtering off, all with
+ a web browser.
- Revision 1.81 2002/04/18 11:50:24 oes
- Extended Install section - needs fixing by packagers
+
- Revision 1.80 2002/04/18 10:45:19 oes
- Moved text to buildsource.sgml, renamed some filters, details
+
+ The URLs listed below are the special ones that allow direct access
+ to Privoxy. Of course,
+ Privoxy must be running to access these. If
+ not, you will get a friendly error message. Internet access is not
+ necessary.
+
- Revision 1.79 2002/04/18 03:18:06 hal9
- Spellcheck, and minor touchups.
+
+
- Revision 1.78 2002/04/17 18:04:16 oes
- Proofreading part 2
+
+
+ Privoxy main page:
+
+
+
+ http://config.privoxy.org/
+
+
+
+ There is a shortcut: http://p.p/ (but it
+ doesn't provide a fall-back to a real page, in case the request is not
+ sent through Privoxy).
+
+
- Revision 1.77 2002/04/17 13:51:23 oes
- Proofreading, part one
+
+
+ Show information about the current configuration, including viewing and
+ editing of actions files:
+
+
+
+ http://config.privoxy.org/show-status
+
+
+
- Revision 1.76 2002/04/16 04:25:51 hal9
- -Added 'Note to Upgraders' and re-ordered the 'Quickstart' section.
- -Note about proxy may need requests to re-read config files.
+
+
+ Show the source code version numbers:
+
+
+
+ http://config.privoxy.org/show-version
+
+
+
- Revision 1.75 2002/04/12 02:08:48 david__schmidt
- Remove OS/2 building info... it is already in the developer-manual
+
+
+ Show the browser's request headers:
+
+
+
+ http://config.privoxy.org/show-request
+
+
+
- Revision 1.74 2002/04/11 00:54:38 hal9
- Add small section on submitting actions.
+
+
+ Show which actions apply to a URL and why:
+
+
+
+ http://config.privoxy.org/show-url-info
+
+
+
- Revision 1.73 2002/04/10 18:45:15 swa
- generated
+
+
+ Toggle Privoxy on or off. This feature can be turned off/on in the main
+ config file. When toggled "off", Privoxy
+ continues to run, but only as a pass-through proxy, with no actions taking
+ place:
+
+
+
+ http://config.privoxy.org/toggle
+
+
+
+ Short cuts. Turn off, then on:
+
+
+
+ http://config.privoxy.org/toggle?set=disable
+
+
+
+
+ http://config.privoxy.org/toggle?set=enable
+
+
+
- Revision 1.72 2002/04/10 04:06:19 hal9
- Added actions feedback to Bookmarklets section
+
+
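+ Since the internal pages are ordinary URLs, they can also be requested from a
+ script. The sketch below builds the URLs listed above; internal_url and
+ fetch_via_privoxy are hypothetical helper names, and the proxy address
+ 127.0.0.1:8118 is an assumption about the local listen address that must be
+ adjusted to your own configuration.
+

```python
from urllib.parse import urlencode
import urllib.request

CONFIG_HOST = "http://config.privoxy.org"

def internal_url(page="", **params):
    """Build the URL of one of the internal pages listed above."""
    url = f"{CONFIG_HOST}/{page}"
    if params:
        url += "?" + urlencode(params)
    return url

def fetch_via_privoxy(url, proxy="http://127.0.0.1:8118"):
    """Fetch a URL through a local Privoxy (the address is an assumption)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy}))
    with opener.open(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

print(internal_url("show-status"))
print(internal_url("toggle", set="disable"))
# html = fetch_via_privoxy(internal_url("show-status"))  # needs a running Privoxy
```

+ The fetch call is left commented out because it only works when the request
+ really passes through a running Privoxy.
+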
- Revision 1.71 2002/04/08 22:59:26 hal9
- Version update. Spell chkconfig correctly :)
+
- Revision 1.70 2002/04/08 20:53:56 swa
- ?
- Revision 1.69 2002/04/06 05:07:29 hal9
- -Add privoxy-man-page.sgml, for man page.
- -Add authors.sgml for AUTHORS (and p-authors.sgml)
- -Reworked various aspects of various docs.
- -Added additional comments to sub-docs.
+
+
+Chain of Events
+
+ Let's take a quick look at how some of Privoxy's
+ core features are triggered, and the ensuing sequence of events when a web
+ page is requested by your browser:
+
- Revision 1.68 2002/04/04 18:46:47 swa
- consistent look. reuse of copyright, history et. al.
+
+
+
+
+ First, your web browser requests a web page. The browser knows to send
+ the request to Privoxy, which will in turn
+ relay the request to the remote web server once it has passed the following
+ tests:
+
+
+
+
+ Privoxy traps any request for its own internal CGI
+ pages (e.g. http://p.p/) and sends the CGI page back to the browser.
+
+
+
+
+ Next, Privoxy checks to see if the URL
+ matches any +block patterns. If
+ so, the URL is then blocked, and the remote web server will not be contacted.
+ +handle-as-image
+ and
+ +handle-as-empty-document
+ are then checked, and if there is no match, an
+ HTML "BLOCKED" page is sent back to the browser. Otherwise, if
+ it does match, an image is returned for the former, and an empty text
+ document for the latter. The type of image would depend on the setting of
+ +set-image-blocker
+ (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere).
+
+
+
+
+ Untrusted URLs are blocked. If URLs are being added to the
+ trust file, then that is done.
+
+
+
+
+ If the URL pattern matches the +fast-redirects action,
+ it is then processed. Unwanted parts of the requested URL are stripped.
+
+
+
+
+ Now the rest of the client browser's request headers are processed. If any
+ of these match any of the relevant actions (e.g. +hide-user-agent,
+ etc.), headers are suppressed or forged as determined by these actions and
+ their parameters.
+
+
+
+
+ Now the web server starts sending its response back (i.e. typically a web
+ page).
+
+
+
+
+ First, the server headers are read and processed to determine, among other
+ things, the MIME type (document type) and encoding. The headers are then
+ filtered as determined by the
+ +crunch-incoming-cookies,
+ +session-cookies-only,
+ and +downgrade-http-version
+ actions.
+
+
+
+
+ If any +filter action
+ or +deanimate-gifs
+ action applies (and the document type fits the action), the rest of the page is
+ read into memory (up to a configurable limit). Then the filter rules (from
+ default.filter and any other filter files) are
+ processed against the buffered content. Filters are applied in the order
+ they are specified in one of the filter files. Animated GIFs, if present,
+ are reduced to either the first or last frame, depending on the action
+ setting. The entire page, which is now filtered, is then sent by
+ Privoxy back to your browser.
+
+
+ If neither a +filter action
+ nor +deanimate-gifs
+ matches, then Privoxy passes the raw data through
+ to the client browser as it becomes available.
+
+
+
+
+ As the browser receives the (now possibly filtered) page content, it
+ reads and then requests any URLs that may be embedded within the page
+ source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g.
+ frames), sounds, etc. For each of these objects, the browser issues a
+ separate request (this is easily viewable in Privoxy's
+ logs). And each such request is in turn processed just as above. Note that a
+ complex web page will have many, many such embedded URLs. If these
+ secondary requests are to a different server, then quite possibly a very
+ different set of actions is triggered.
+
+
- Revision 1.67 2002/04/04 17:27:57 swa
- more single file to be included at multiple points. make maintaining easier
+
+
+
+ NOTE: This is a somewhat simplistic overview of what happens with each URL
+ request. For the sake of brevity and simplicity, we have focused on
+ Privoxy's core features only.
+
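+ The sequence above can be condensed into a toy model. This is emphatically
+ not how Privoxy is implemented; it is a sketch in Python whose function
+ names, labels and dictionary layout are invented for illustration. The order
+ in which the image and empty-document cases are tested is an assumption,
+ since the text does not say which wins when both are set.
+

```python
def first_response(actions, is_internal=False, trusted=True):
    """Toy model of the request-side decisions; returns a short label."""
    if is_internal:                       # e.g. http://p.p/
        return "internal CGI page"
    if actions.get("block"):
        if actions.get("handle-as-image"):
            return "blocked: image"
        if actions.get("handle-as-empty-document"):
            return "blocked: empty document"
        return "blocked: HTML BLOCKED page"
    if not trusted:
        return "blocked: untrusted"
    return "forwarded to web server"

def body_handling(actions):
    """Response side: buffer and filter, or stream the data through."""
    if actions.get("filter") or actions.get("deanimate-gifs"):
        return "buffer, filter, then send"
    return "pass through as received"

print(first_response({"block": True, "handle-as-image": True}))
print(first_response({"block": True}))
print(first_response({}))
print(body_handling({"filter": ["webbugs"]}))
print(body_handling({}))
```

+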
- Revision 1.66 2002/04/04 06:48:37 hal9
- Structural changes to allow for conditional inclusion/exclusion of content
- based on entity toggles, e.g. 'entity % p-not-stable "INCLUDE"'. And
- definition of internal entities, e.g. 'entity p-version "2.9.13"' that will
- eventually be set by Makefile.
- More boilerplate text for use across multiple docs.
+
- Revision 1.65 2002/04/03 19:52:07 swa
- enhance squid section due to user suggestion
- Revision 1.64 2002/04/03 03:53:43 hal9
- A few minor bug fixes, and touch ups. Ready for review.
+
+
+Troubleshooting: Anatomy of an Action
- Revision 1.63 2002/04/01 16:24:49 hal9
- Define entities to include boilerplate text. See doc/source/*.
+
+ The way Privoxy applies
+ actions and filters
+ to any given URL can be complex, and it is not always
+ easy to understand what is happening. Sometimes we need to be able to
+ see just what Privoxy is
+ doing, especially if something Privoxy is doing
+ is inadvertently causing us a problem. It can be a little daunting to look at
+ the actions and filters files themselves, since they tend to be filled with
+ regular expressions whose consequences are not
+ always so obvious.
+
- Revision 1.62 2002/03/30 04:15:53 hal9
- - Fix privoxy.org/config links.
- - Paste in Bookmarklets from Toggle page.
- - Move Quickstart nearer top, and minor rework.
+
+ One quick test to see if Privoxy is causing a problem
+ or not is to disable it temporarily. This should be the first troubleshooting
+ step (be sure to flush caches afterward!). Looking at the
+ logs is a good idea too. (Note that both the toggle feature and logging are
+ enabled via config file settings, and may need to be
+ "turned on".)
+
+
+ Another easy troubleshooting step, if you have done any
+ customization of your installation, is to revert to the installed
+ defaults and see if that helps. There are times the developers get complaints
+ about one thing or another, and the problem turns out to be related to a
+ customized configuration.
+
- Revision 1.61 2002/03/29 01:31:08 hal9
- Minor update.
+
+ Privoxy also provides the
+ http://config.privoxy.org/show-url-info
+ page that can show us very specifically how actions
+ are being applied to any given URL. This is a big help for troubleshooting.
+
- Revision 1.60 2002/03/27 01:57:34 hal9
- Added more to Anatomy section.
+
+ First, enter one URL (or partial URL) at the prompt, and then
+ Privoxy will tell us
+ how the current configuration will handle it. This will not
+ help with filtering effects (i.e. the +filter action) from
+ one of the filter files, since this is handled very
+ differently and is not so easy to trap! It also will not tell you about any other
+ URLs that may be embedded within the URL you are testing. For instance, images
+ such as ads are expressed as URLs within the raw page source of HTML pages. So
+ you will only get info for the actual URL that is pasted into the prompt area
+ -- not any sub-URLs. If you want to know about embedded URLs like ads, you
+ will have to dig those out of the HTML source. Use your browser's "View
+ Page Source" option for this. Or right click on the ad, and grab the
+ URL.
+
- Revision 1.59 2002/03/27 00:54:33 hal9
- Touch up intro for new name.
+
+ Let's try an example, google.com,
+ and look at it one section at a time in a sample configuration (your real
+ configuration may vary):
+
- Revision 1.58 2002/03/26 22:29:55 swa
- we have a new homepage!
+
+
+ Matches for http://www.google.com:
- Revision 1.57 2002/03/24 20:33:30 hal9
- A few minor catch ups with name change.
+ In file: default.action [ View ] [ Edit ]
- Revision 1.56 2002/03/24 16:17:06 swa
- configure needs to be generated.
+ {+change-x-forwarded-for{block}
+ +deanimate-gifs {last}
+ +fast-redirects {check-decoded-url}
+ +filter {refresh-tags}
+ +filter {img-reorder}
+ +filter {banners-by-size}
+ +filter {webbugs}
+ +filter {jumping-windows}
+ +filter {ie-exploits}
+ +hide-from-header {block}
+ +hide-referrer {forge}
+ +session-cookies-only
+ +set-image-blocker {pattern}
+/
- Revision 1.55 2002/03/24 16:08:08 swa
- we are too lazy to make a block-built
- privoxy logo. hence removed the option.
+ { -session-cookies-only }
+ .google.com
- Revision 1.54 2002/03/24 15:46:20 swa
- name change related issue.
+ { -fast-redirects }
+ .google.com
- Revision 1.53 2002/03/24 11:51:00 swa
- name change. changed filenames.
+In file: user.action [ View ] [ Edit ]
+(no matches in this file)
+
+
- Revision 1.52 2002/03/24 11:01:06 swa
- name change
+
+ This is telling us how we have defined our
+ "actions", and
+ which ones match for our test case, google.com.
+ Displayed are all the actions that are available to us. Remember,
+ the + sign denotes "on". -
+ denotes "off". So some are "on" here, but many
+ are "off". Each example we try may provide a slightly different
+ end result, depending on our configuration directives.
+
+
+ The first listing
+ is for our default.action file. The large, multi-line
+ listing is how the actions are set to match for all URLs, i.e. our default
+ settings. If you look at your "actions" file, this would be the
+ section just below the "aliases" section near the top. This
+ will apply to all URLs as signified by the single forward slash at the end
+ of the listing -- /.
+
- Revision 1.51 2002/03/23 15:13:11 swa
- renamed every reference to the old name with foobar.
- fixed "application foobar application" tag, fixed
- "the foobar" with "foobar". left junkbustser in cvs
- comments and remarks to history untouched.
+
+ But we have defined additional actions that would be exceptions to these general
+ rules, and then we list specific URLs (or patterns) that these exceptions
+ would apply to. Last match wins. Just below this then are two explicit
+ matches for .google.com. The first is negating our previous
+ cookie setting, which was for +session-cookies-only
+ (i.e. not persistent). So we will allow persistent cookies for google, at
+ least that is how it is in this example. The second turns
+ off any +fast-redirects
+ action, allowing this to take place unmolested. Note that there is a leading
+ dot here -- .google.com. This will match any hosts and
+ sub-domains in the google.com domain also, such as
+ www.google.com or mail.google.com. But it would not
+ match www.google.de! So, apparently, we have these two actions
+ defined as exceptions to the general rules at the top somewhere in the lower
+ part of our default.action file, and
+ google.com is referenced somewhere in these latter sections.
+
- Revision 1.50 2002/03/23 05:06:21 hal9
- Touch up.
+
+ Then, for our user.action file, we again have no hits.
+ So there is nothing google-specific that we might have added to our own, local
+ configuration. If there was, those actions would over-rule any actions from
+ previously processed files, such as default.action.
+ user.action typically has the last word. This is the
+ best place to put hard and fast exceptions.
+
- Revision 1.49 2002/03/21 17:01:05 hal9
- New section in Appendix.
+
+ And finally we pull it all together in the bottom section and summarize how
+ Privoxy is applying all its actions
+ to google.com:
- Revision 1.48 2002/03/12 06:33:01 hal9
- Catching up to Andreas and re_filterfile changes.
+
- Revision 1.47 2002/03/11 13:13:27 swa
- correct feedback channels
+
+
- Revision 1.46 2002/03/10 00:51:08 hal9
- Added section on JB internal pages in Appendix.
+ Final results:
- Revision 1.45 2002/03/09 17:43:53 swa
- more distros
+ -add-header
+ -block
+ +change-x-forwarded-for{block}
+ -client-header-filter{hide-tor-exit-notation}
+ -content-type-overwrite
+ -crunch-client-header
+ -crunch-if-none-match
+ -crunch-incoming-cookies
+ -crunch-outgoing-cookies
+ -crunch-server-header
+ +deanimate-gifs {last}
+ -downgrade-http-version
+ -fast-redirects
+ -filter {js-events}
+ -filter {content-cookies}
+ -filter {all-popups}
+ -filter {banners-by-link}
+ -filter {tiny-textforms}
+ -filter {frameset-borders}
+ -filter {demoronizer}
+ -filter {shockwave-flash}
+ -filter {quicktime-kioskmode}
+ -filter {fun}
+ -filter {crude-parental}
+ -filter {site-specifics}
+ -filter {js-annoyances}
+ -filter {html-annoyances}
+ +filter {refresh-tags}
+ -filter {unsolicited-popups}
+ +filter {img-reorder}
+ +filter {banners-by-size}
+ +filter {webbugs}
+ +filter {jumping-windows}
+ +filter {ie-exploits}
+ -filter {google}
+ -filter {yahoo}
+ -filter {msn}
+ -filter {blogspot}
+ -filter {no-ping}
+ -force-text-mode
+ -handle-as-empty-document
+ -handle-as-image
+ -hide-accept-language
+ -hide-content-disposition
+ +hide-from-header {block}
+ -hide-if-modified-since
+ +hide-referrer {forge}
+ -hide-user-agent
+ -limit-connect
+ -overwrite-last-modified
+ -prevent-compression
+ -redirect
+ -server-header-filter{xml-to-html}
+ -server-header-filter{html-to-xml}
+ -session-cookies-only
+ +set-image-blocker {pattern}
+
- Revision 1.44 2002/03/09 17:08:48 hal9
- New section on Jon's actions file editor, and move some stuff around.
+
+ Notice the only difference here to the previous listing is to
+ fast-redirects and session-cookies-only,
+ which are turned off specifically for this site in our configuration,
+ and thus show as disabled in the "Final Results".
+
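+ How the final results come about can be sketched as a merge in file order
+ where the last matching section wins. The Python below is an illustration
+ only: host_matches is a deliberately simplified, hypothetical matcher that
+ understands just the catch-all / and leading-dot patterns from this example,
+ and only three of the actions are modelled.
+

```python
def host_matches(pattern, host):
    """Simplified check for '/' (everything) and leading-dot patterns."""
    if pattern == "/":
        return True
    if pattern.startswith("."):
        bare = pattern[1:]
        return host == bare or host.endswith(pattern)
    return host == pattern

# (actions, patterns) in file order; True stands for "+", False for "-"
sections = [
    ({"fast-redirects": True, "session-cookies-only": True,
      "deanimate-gifs": True}, ["/"]),
    ({"session-cookies-only": False}, [".google.com"]),
    ({"fast-redirects": False}, [".google.com"]),
]

def final_actions(host):
    """Apply every matching section in order; later settings win."""
    result = {}
    for actions, patterns in sections:
        if any(host_matches(p, host) for p in patterns):
            result.update(actions)
    return result

print(final_actions("www.google.com"))
print(final_actions("www.google.de"))
```

+ For www.google.com the two exceptions override the defaults, while
+ www.google.de only picks up the defaults because .google.com does not match it.
+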
- Revision 1.43 2002/03/08 00:47:32 hal9
- Added imageblock{pattern}.
+
+ Now another example, ad.doubleclick.net:
+
- Revision 1.42 2002/03/07 18:16:55 swa
- looks better
+
+
- Revision 1.41 2002/03/07 16:46:43 hal9
- Fix a few markup problems for jade.
+ { +block{Domains starts with "ad"} }
+ ad*.
- Revision 1.40 2002/03/07 16:28:39 swa
- provide correct feedback channels
+ { +block{Domain contains "ad"} }
+ .ad.
- Revision 1.39 2002/03/06 16:19:28 hal9
- Note on perceived filtering slowdown per FR.
+ { +block{Doubleclick banner server} +handle-as-image }
+ .[a-vx-z]*.doubleclick.net
+
+
- Revision 1.38 2002/03/05 23:55:14 hal9
- Stupid I did it again. Double hyphen in comment breaks jade.
+
+ We'll just show the interesting part here - the explicit matches. It is
+ matched three different times. Two +block{} sections,
+ and a +block{} +handle-as-image,
+ which is the expanded form of one of our aliases that had been defined as:
+ +block-as-image. ("Aliases" are defined in
+ the first section of the actions file and typically used to combine more
+ than one action.)
+
- Revision 1.37 2002/03/05 23:53:49 hal9
- jade barfs on '- -' embedded in comments. - -user option broke it.
+
+ Any one of these would have done the trick and blocked this as an unwanted
+ image. This is unnecessarily redundant since the last case effectively
+ would also cover the first. No point in taking chances with these guys
+ though ;-) Note that if you want an ad or obnoxious
+ URL to be invisible, it should be defined as ad.doubleclick.net
+ is done here -- as both a +block{}
+ and a
+ +handle-as-image.
+ The custom alias +block-as-image just
+ simplifies the process and makes it more readable.
+
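+ Why all three patterns hit ad.doubleclick.net can be shown with a rough
+ model of domain pattern matching. This is an approximation written for this
+ illustration, not Privoxy's code: it assumes that a pattern is compared label
+ by label, that a leading or trailing dot leaves that side open, and that
+ Python's fnmatch wildcards are close enough to the ones used inside a label.
+

```python
from fnmatch import fnmatchcase

def domain_matches(pattern, host):
    """Approximate matching of a domain pattern against a host name."""
    open_left = pattern.startswith(".")    # leading dot: any prefix allowed
    open_right = pattern.endswith(".")     # trailing dot: any suffix allowed
    want = pattern.strip(".").split(".")
    have = host.split(".")
    if len(want) > len(have):
        return False
    if open_left and open_right:
        starts = range(len(have) - len(want) + 1)
    elif open_left:
        starts = [len(have) - len(want)]
    elif open_right:
        starts = [0]
    else:
        if len(want) != len(have):
            return False
        starts = [0]
    return any(
        all(fnmatchcase(h, w) for h, w in zip(have[s:s + len(want)], want))
        for s in starts)

for pattern in ["ad*.", ".ad.", ".[a-vx-z]*.doubleclick.net"]:
    print(pattern, domain_matches(pattern, "ad.doubleclick.net"))
```

+ Under the same assumptions, the third pattern does not match
+ www.doubleclick.net, because the bracket expression leaves out the letter w.
+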
- Revision 1.36 2002/03/05 22:53:28 hal9
- Add new - - user option.
+
+ One last example. Let's try http://www.example.net/adsl/HOWTO/.
+ This one is giving us problems. We are getting a blank page. Hmmm ...
+
- Revision 1.35 2002/03/05 00:17:27 hal9
- Added section on command line options.
+
+
- Revision 1.34 2002/03/04 19:32:07 oes
- Changed default port to 8118
+ Matches for http://www.example.net/adsl/HOWTO/:
- Revision 1.33 2002/03/03 19:46:13 hal9
- Emphasis on where/how to report bugs, etc
+ In file: default.action [ View ] [ Edit ]
- Revision 1.32 2002/03/03 09:26:06 joergs
- AmigaOS changes, config is now loaded from PROGDIR: instead of
- AmiTCP:db/junkbuster/ if no configuration file is specified on the
- command line.
+ {-add-header
+ -block
+ +change-x-forwarded-for{block}
+ -client-header-filter{hide-tor-exit-notation}
+ -content-type-overwrite
+ -crunch-client-header
+ -crunch-if-none-match
+ -crunch-incoming-cookies
+ -crunch-outgoing-cookies
+ -crunch-server-header
+ +deanimate-gifs
+ -downgrade-http-version
+ +fast-redirects {check-decoded-url}
+ -filter {js-events}
+ -filter {content-cookies}
+ -filter {all-popups}
+ -filter {banners-by-link}
+ -filter {tiny-textforms}
+ -filter {frameset-borders}
+ -filter {demoronizer}
+ -filter {shockwave-flash}
+ -filter {quicktime-kioskmode}
+ -filter {fun}
+ -filter {crude-parental}
+ -filter {site-specifics}
+ -filter {js-annoyances}
+ -filter {html-annoyances}
+ +filter {refresh-tags}
+ -filter {unsolicited-popups}
+ +filter {img-reorder}
+ +filter {banners-by-size}
+ +filter {webbugs}
+ +filter {jumping-windows}
+ +filter {ie-exploits}
+ -filter {google}
+ -filter {yahoo}
+ -filter {msn}
+ -filter {blogspot}
+ -filter {no-ping}
+ -force-text-mode
+ -handle-as-empty-document
+ -handle-as-image
+ -hide-accept-language
+ -hide-content-disposition
+ +hide-from-header{block}
+ +hide-referer{forge}
+ -hide-user-agent
+ -overwrite-last-modified
+ +prevent-compression
+ -redirect
+ -server-header-filter{xml-to-html}
+ -server-header-filter{html-to-xml}
+ +session-cookies-only
+ +set-image-blocker{blank} }
+ /
- Revision 1.31 2002/03/02 22:45:52 david__schmidt
- Just tweaking
+ { +block{Path contains "ads".} +handle-as-image }
+ /ads
+
+
- Revision 1.30 2002/03/02 22:00:14 hal9
- Updated 'New Features' list. Ran through spell-checker.
+
+ Ooops, the /adsl/ is matching /ads in our
+ configuration! But we did not want this at all! Now we see why we get the
+ blank page. It is actually triggering two different actions here, and
+ the effects are aggregated so that the URL is blocked, and &my-app; is told
+ to treat the block as if it were an image. But this is, of course, all wrong.
+ We could now add a new action below this (or better in our own
+ user.action file) that explicitly
+ unblocks ({-block}) paths with
+ adsl in them (remember, last match in the configuration
+ wins). There are various ways to handle such exceptions. Example:
+
- Revision 1.29 2002/03/02 20:34:07 david__schmidt
- Update OS/2 build section
+
+
- Revision 1.28 2002/02/24 14:34:24 jongfoster
- Formatting changes. Now changing the doctype to DocBook XML 4.1
- will work - no other changes are needed.
+ { -block }
+ /adsl
+
+
- Revision 1.27 2002/01/11 14:14:32 hal9
- Added a very short section on Templates
+
+ Now the page displays ;-)
+ Remember to flush your browser's caches when making these kinds of changes to
+ your configuration to ensure that you get a freshly delivered page! Or, try
+ using Shift+Reload.
+
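+ The effect of the exception can be reproduced in a few lines. This sketch
+ assumes, as the example above implies, that a path pattern is tried from the
+ start of the path only, which is why /ads also hits /adsl/. The rule list and
+ the is_blocked helper are invented for illustration, and /ads/banner.gif is a
+ hypothetical path.
+

```python
import re

# (pattern, block?) in configuration order
rules = [
    (r"/ads", True),     # { +block{...} +handle-as-image }
    (r"/adsl", False),   # { -block } exception added afterwards
]

def is_blocked(path, active_rules):
    """Walk the rules in order; the last matching rule decides."""
    blocked = False
    for pattern, block in active_rules:
        if re.match(pattern, path):   # anchored at the start of the path
            blocked = block
    return blocked

print(is_blocked("/adsl/HOWTO/", rules[:1]))  # before the exception
print(is_blocked("/adsl/HOWTO/", rules))      # after the exception
print(is_blocked("/ads/banner.gif", rules))   # real ads stay blocked
```

+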
- Revision 1.26 2002/01/09 20:02:50 hal9
- Fix bug re: auto-detect config file changes.
+
+ But now what about a situation where we get no explicit matches like
+ we did with:
+
- Revision 1.25 2002/01/09 18:20:30 hal9
- Touch ups for *.action files.
+
+
- Revision 1.24 2001/12/02 01:13:42 hal9
- Fix typo.
+ { +block{Path starts with "ads".} +handle-as-image }
+ /ads
+
+
- Revision 1.23 2001/12/02 00:20:41 hal9
- Updates for recent changes.
+
+ That actually was very helpful and pointed us quickly to where the problem
+ was. If you don't get this kind of match, then it means one of the default
+ rules in the first section of default.action is causing
+ the problem. This would require some guesswork, and maybe a little trial and
+ error to isolate the offending rule. One likely cause would be one of the
+ +filter actions.
+ These tend to be harder to troubleshoot.
+ Try adding the URL for the site to one of the aliases that turn off
+ +filter:
+
- Revision 1.22 2001/11/05 23:57:51 hal9
- Minor update for startup now daemon mode.
+
+
- Revision 1.21 2001/10/31 21:11:03 hal9
- Correct 2 minor errors
+ { shop }
+ .quietpc.com
+ .worldpay.com # for quietpc.com
+ .jungle.com
+ .scan.co.uk
+ .forbes.com
+
+
- Revision 1.18 2001/10/24 18:45:26 hal9
- *** empty log message ***
+
+ { shop } is an "alias" that expands to
+ { -filter -session-cookies-only }.
+ Or you could do your own exception to negate filtering:
- Revision 1.17 2001/10/24 17:10:55 hal9
- Catching up with Jon's recent work, and a few other things.
+
- Revision 1.16 2001/10/21 17:19:21 swa
- wrong url in documentation
+
+
- Revision 1.15 2001/10/14 23:46:24 hal9
- Various minor changes. Fleshed out SEE ALSO section.
+ { -filter }
+ # Disable ALL filter actions for sites in this section
+ .forbes.com
+ developer.ibm.com
+ localhost
+
+
- Revision 1.13 2001/10/10 17:28:33 hal9
- Very minor changes.
+
+ This would turn off all filtering for these sites. This is best
+ put in user.action, for local site
+ exceptions. Note that when a simple domain pattern is used by itself (without
+ the subsequent path portion), all sub-pages within that domain are included
+ automatically in the scope of the action.
+
- Revision 1.12 2001/09/28 02:57:04 hal9
- Ditto :/
+
+ Images that are inexplicably being blocked may well be hitting the
++filter{banners-by-size}
+ rule, which assumes
+ that images of certain sizes are ad banners (works well
+ most of the time since these tend to be standardized).
+
- Revision 1.11 2001/09/28 02:25:20 hal9
- Ditto.
+
+ { fragile } is an alias that disables most of the
+ actions that are most likely to cause trouble. This can be used as a
+ last resort for problem sites.
+
+
+
- Revision 1.9 2001/09/27 23:50:29 hal9
- A few changes. A short section on regular expression in appendix.
+ { fragile }
+ # Handle with care: easy to break
+ mail.google.
+ mybank.example.com
+
- Revision 1.8 2001/09/25 00:34:59 hal9
- Some additions, and re-arranging.
- Revision 1.7 2001/09/24 14:31:36 hal9
- Diddling.
+
+ Remember to flush caches! Note that the
+ mail.google reference lacks the TLD portion (e.g.
+ .com). This will effectively match any TLD with
+ google in it, such as mail.google.de,
+ just as an example.
+
+
+ If this still does not work, you will have to go through the remaining
+ actions one by one to find which one(s) is causing the problem.
+
- Revision 1.6 2001/09/24 14:10:32 hal9
- Including David's OS/2 installation instructions.
+
- Revision 1.2 2001/09/13 15:27:40 swa
- cosmetics
+
- Revision 1.1 2001/09/12 15:36:41 swa
- source files for junkbuster documentation
+