X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fsource%2Fuser-manual.sgml;h=e0ec372ea09cc071f39ee2435df0b22b34904720;hp=68673ad8fc7390fd6c75276359f24b9f10c42a22;hb=305ebc18d945e01bd76a2ac36529489e9f650414;hpb=4ca1b3964ffd8d2ac27cf2a9f4de9bb7ac67259e
diff --git a/doc/source/user-manual.sgml b/doc/source/user-manual.sgml
index 68673ad8..e0ec372e 100644
--- a/doc/source/user-manual.sgml
+++ b/doc/source/user-manual.sgml
@@ -9,13 +9,15 @@
+
-
-
+
+
+
-
-
+
+
@@ -34,9 +36,9 @@
 This file belongs into ijbswa.sourceforge.net:/home/groups/i/ij/ijbswa/htdocs/
- $Id: user-manual.sgml,v 2.145 2011/12/26 17:04:19 fabiankeil Exp $
+ $Id: user-manual.sgml,v 2.205 2016/03/17 10:42:54 fabiankeil Exp $
- Copyright (C) 2001-2011 Privoxy Developers http://www.privoxy.org/
+ Copyright (C) 2001-2014 Privoxy Developers http://www.privoxy.org/
 See LICENSE.
 ========================================================================
@@ -55,12 +57,12 @@
- Copyright &my-copy; 2001-2011 by
+ Copyright &my-copy; 2001-2014 by
 Privoxy Developers
-$Id: user-manual.sgml,v 2.145 2011/12/26 17:04:19 fabiankeil Exp $
+$Id: user-manual.sgml,v 2.205 2016/03/17 10:42:54 fabiankeil Exp $
-
-Red Hat and Fedora RPMs
-
-
- RPMs can be installed with rpm -Uvh privoxy-&p-version;-1.rpm,
- and will use /etc/privoxy for the location
- of configuration files.
-
-
-
- Note that on Red Hat, Privoxy will
- not be automatically started on system boot. You will
- need to enable that using chkconfig,
- ntsysv, or similar methods.
-
-
-
- If you have problems with failed dependencies, try rebuilding the SRC RPM:
- rpm --rebuild privoxy-&p-version;-1.src.rpm. This
- will use your locally installed libraries and RPM version.
-
-
-
- Also note that if you have a Junkbuster RPM installed
- on your system, you need to remove it first, because the packages conflict.
- Otherwise, RPM will try to remove Junkbuster
- automatically if found, before installing Privoxy.
- - - Debian and Ubuntu @@ -262,16 +234,6 @@ How to install the binary packages depends on your operating system: - -Solaris <!--, NetBSD, HP-UX--> - - - Create a new directory, cd to it, then unzip and - untar the archive. For the most part, you'll have to figure out where - things go. - - - OS/2 @@ -301,72 +263,83 @@ How to install the binary packages depends on your operating system: Mac OS X - Unzip the downloaded file (you can either double-click on the zip file - icon from the Finder, or from the desktop if you downloaded it there). - Then, double-click on the package installer icon and follow the - installation process. + Installation instructions for the OS X platform depend upon whether + you downloaded a ready-built installation package (.pkg or .mpkg) or have + downloaded the source code. + + +Installation from ready-built package - The privoxy service will automatically start after a successful - installation (in addition to every time your computer starts up). To - prevent the privoxy service from automatically starting when your - computer starts up, remove or rename the folder named - /Library/StartupItems/Privoxy. + The downloaded file will either be a .pkg (for OS X 10.5 upwards) or a bzipped + .mpkg file (for OS X 10.4). The former can be double-clicked as is and the + installation will start; double-clicking the latter will unzip the .mpkg file + which can then be double-clicked to commence the installation. - To manually start or stop the privoxy service, use the Privoxy Utility - for Mac OS X. This application controls the privoxy service (e.g. - starting and stopping the service as well as uninstalling the software). + The privoxy service will automatically start after a successful installation + (and thereafter every time your computer starts up) however you will need to + configure your web browser(s) to use it. To do so, configure them to use a + proxy for HTTP and HTTPS at the address 127.0.0.1:8118. 
+ + + To prevent the privoxy service from automatically starting when your computer + starts up, remove or rename the file /Library/LaunchDaemons/org.ijbswa.privoxy.plist + (on OS X 10.5 and higher) or the folder named + /Library/StartupItems/Privoxy (on OS X 10.4 'Tiger'). - - - -AmigaOS - Copy and then unpack the lha archive to a suitable location. - All necessary files will be installed into Privoxy - directory, including all configuration and log files. To uninstall, just - remove this directory. + To manually start or stop the privoxy service, use the scripts startPrivoxy.sh + and stopPrivoxy.sh supplied in /Applications/Privoxy. They must be run from an + administrator account, using sudo. + + + To uninstall, run /Applications/Privoxy/uninstall.command as sudo from an + administrator account. - - -FreeBSD - + +Installation from source - Privoxy is part of FreeBSD's Ports Collection, you can build and install - it with cd /usr/ports/www/privoxy; make install clean. + To build and install the Privoxy source code on OS X you will need to obtain + the macsetup module from the Privoxy Sourceforge CVS repository (refer to + Sourceforge help for details of how to set up a CVS client to have read-only + access to the repository). This module contains scripts that leverage the usual + open-source tools (available as part of Apple's free of charge Xcode + distribution or via the usual open-source software package managers for OS X + (MacPorts, Homebrew, Fink etc.) to build and then install the privoxy binary + and associated files. The macsetup module's README file contains complete + instructions for its use. - If you don't use the ports, you can fetch and install - the package with pkg_add -r privoxy. + The privoxy service will automatically start after a successful installation + (and thereafter every time your computer starts up) however you will need to + configure your web browser(s) to use it. 
To do so, configure them to use a + proxy for HTTP and HTTPS at the address 127.0.0.1:8118. - The port skeleton and the package can also be downloaded from the - File Release - Page, but there's no reason to use them unless you're interested in the - beta releases which are only available there. + To prevent the privoxy service from automatically starting when your computer + starts up, remove or rename the file /Library/LaunchDaemons/org.ijbswa.privoxy.plist + (on OS X 10.5 and higher) or the folder named + /Library/StartupItems/Privoxy (on OS X 10.4 'Tiger'). - - - -Gentoo - Gentoo source packages (Ebuilds) for Privoxy are - contained in the Gentoo Portage Tree (they are not on the download page, - but there is a Gentoo section, where you can see when a new - Privoxy Version is added to the Portage Tree). + To manually start or stop the privoxy service, use the Privoxy Utility + for Mac OS X (also part of the macsetup module). This application can start + and stop the privoxy service and display its log and configuration files. - Before installing Privoxy under Gentoo just do - first emerge --sync to get the latest changes from the - Portage tree. With emerge privoxy you install the latest - version. + To uninstall, run the macsetup module's uninstall.sh as sudo from an + administrator account. + + + +FreeBSD + - Configuration files are in /etc/privoxy, the - documentation is in /usr/share/doc/privoxy-&p-version; - and the Log directory is in /var/log/privoxy. + Privoxy is part of FreeBSD's Ports Collection, you can build and install + it with cd /usr/ports/www/privoxy; make install clean. @@ -402,13 +375,6 @@ How to install the binary packages depends on your operating system: Keeping your Installation Up-to-Date - - As user feedback comes in and development continues, we will make updated versions - of both the main actions file (as a separate - package) and the software itself (including the actions file) available for - download. 
- If you wish to receive an email notification whenever we release updates of @@ -436,5677 +402,2196 @@ How to install the binary packages depends on your operating system: What's New in this Release + +&changelog; + + + + +Note to Upgraders + - Privoxy 3.0.19 is a stable release. - The changes since 3.0.18 stable are: + A quick list of things to be aware of before upgrading from earlier + versions of Privoxy: - + + + + The recommended way to upgrade &my-app; is to backup your old + configuration files, install the new ones, verify that &my-app; + is working correctly and finally merge back your changes using + diff and maybe patch. + + + There are a number of new features in each &my-app; release and + most of them have to be explicitly enabled in the configuration + files. Old configuration files obviously don't do that and due + to syntax changes using old configuration files with a new + &my-app; isn't always possible anyway. + + + + + Note that some installers remove earlier versions completely, + including configuration files, therefore you should really save + any important configuration files! + + + + + On the other hand, other installers don't overwrite existing configuration + files, thinking you will want to do that yourself. + + + + + In the default configuration only fatal errors are logged now. + You can change that in the debug section + of the configuration file. You may also want to enable more verbose + logging until you verified that the new &my-app; version is working + as expected. + + + + + + Three other config file settings are now off by default: + enable-remote-toggle, + enable-remote-http-toggle, + and enable-edit-actions. + If you use or want these, you will need to explicitly enable them, and + be aware of the security issues involved. 
+ + + + + + - - The following changes were made between 3.0.17 and 3.0.18: - + + + +Quickstart to Using Privoxy - - - Bug fixes: - - - - If a generated redirect URL contains characters RFC 3986 doesn't - permit, they are (re)encoded. Not doing this makes Privoxy versions - from 3.0.5 to 3.0.17 susceptible to HTTP response splitting (CWE-113) - attacks if the +fast-redirects{check-decoded-url} action is used. - - - - - Fix a logic bug that could cause Privoxy to reuse a server - socket after it got tainted by a server-header-tagger-induced - block that was triggered before the whole server response had - been read. If keep-alive was enabled and the request following - the blocked one was to the same host and using the same forwarding - settings, Privoxy would send it on the tainted server socket. - While the server would simply treat it as a pipelined request, - Privoxy would later on fail to properly parse the server's - response as it would try to parse the unread data from the - first response as server headers for the second one. - Regression introduced in 3.0.17. - - - - - When implying keep-alive in client_connection(), remember that - the client didn't. Fixes a regression introduced in 3.0.13 that - would cause Privoxy to wait for additional client requests after - receiving a HTTP/1.1 request with "Connection: close" set - and connection sharing enabled. - With clients which terminates the client connection after detecting - that the whole body has been received it doesn't really matter, - but with clients that don't the connection would be kept open until - it timed out. - - - - - Fix a subtle race condition between prepare_csp_for_next_request() - and sweep(). A thread preparing itself for the next client request - could briefly appear to be inactive. - If all other threads were already using more recent files, - the thread could get its files swept away under its feet. 
- So far this has only been reproduced while stress testing in - valgrind while touching action files in a loop. It's unlikely - to have caused any actual problems in the real world. - - - - - Disable filters if SDCH compression is used unless filtering is forced. - If SDCH was combined with a supported compression algorithm, Privoxy - previously could try to decompress it and ditch the Content-Encoding - header even though the SDCH compression wasn't dealt with. - Reported by zebul666 in #3225863. - - - - - Make a copy of the --user value and only mess with that when splitting - user and group. On some operating systems modifying the value directly - is reflected in the output of ps and friends and can be misleading. - Reported by zepard in #3292710. - - - - - If forwarded-connect-retries is set, only retry if Privoxy is actually - forwarding the request. Previously direct connections would be retried - as well. - - - - - Fixed a small memory leak when retrying connections with IPv6 - support enabled. - - - - - Remove an incorrect assertion in compile_dynamic_pcrs_job_list() - It could be triggered by a pcrs job with an invalid pcre - pattern (for example one that contains a lone quantifier). - - - - - If the --user argument user[.group] contains a dot, always bail out - if no group has been specified. Previously the intended, but undocumented - (and apparently untested), behaviour was to try interpreting the whole - argument as user name, but the detection was flawed and checked for '0' - instead of '\0', thus merely preventing group names beginning with a zero. - - - - - In html_code_map[], use a numeric character reference instead of ' - which wasn't standardized before XHTML 1.0. - - - - - Fix an invalid free when compiled with FEATURE_GRACEFUL_TERMINATION - and shut down through http://config.privoxy.org/die - - - - - In get_actions(), fix the "temporary" backwards compatibility hack - to accept block actions without reason. 
- It also covered other actions that should be rejected as invalid. - Reported by Billy Crook. - - - - - - - - General improvements: - - - - Privoxy can (re)compress buffered content before delivering - it to the client. Disabled by default as most users wouldn't - benefit from it. - - - - - The +fast-redirects{check-decoded-url} action checks URL - segments separately. If there are other parameters behind - the redirect URL, this makes it unnecessary to cut them off - by additionally using a +redirect{} pcrs command. - Initial patch submitted by Jamie Zawinski in #3429848. - - - - - When loading action sections, verify that the referenced filters - exist. Currently missing filters only result in an error message, - but eventually the severity will be upgraded to fatal. - - - - - Allow to bind to multiple separate addresses. - Patch set submitted by Petr Pisar in #3354485. - - - - - Set socket_error to errno if connecting fails in rfc2553_connect_to(). - Previously rejected direct connections could be incorrectly reported - as DNS issues if Privoxy was compiled with IPv6 support. - - - - - Adjust url_code_map[] so spaces are replaced with %20 instead of '+' - While '+' can be used by client's submitting form data, this is not - actually what Privoxy is using the lookups for. This is more of a - cosmetic issue and doesn't fix any known problems. - - - - - When compiled without FEATURE_FAST_REDIRECTS, do not silently - ignore +fast-redirect{} directives - - - - - Added a workaround for GNU libc's strptime() reporting negative - year values when the parsed year is only specified with two digits. - On affected systems cookies with such a date would not be turned - into session cookies by the +session-cookies-only action. - Reported by Vaeinoe in #3403560 - - - - - Fixed bind failures with certain GNU libc versions if no non-loopback - IP address has been configured on the system. 
This is mainly an issue - if the system is using DHCP and Privoxy is started before the network - is completely configured. - Reported by Raphael Marichez in #3349356. - Additional insight from Petr Pisar. - - - - - Privoxy log messages now use the ISO 8601 date format %Y-%m-%d. - It's only slightly longer than the old format, but contains - the full date including the year and allows sorting by date - (when grepping in multiple log files) without hassle. - - - - - In get_last_url(), do not bother trying to decode URLs that do - not contain at least one '%' sign. It reduces the log noise and - a number of unnecessary memory allocations. - - - - - In case of SOCKS5 failures, dump the socks response in the log message. - - - - - Simplify the signal setup in main(). - - - - - Streamline socks5_connect() slightly. - - - - - In socks5_connect(), require a complete socks response from the server. - Previously Privoxy didn't care how much data the server response - contained as long as the first two bytes contained the expected - values. While at it, shrink the buffer size so Privoxy can't read - more than a whole socks response. - - - - - In chat(), do not bother to generate a client request in case of - direct CONNECT requests. It will not be used anyway. - - - - - Reduce server_last_modified()'s stack size. - - - - - Shorten get_http_time() by using strftime(). - - - - - Constify the known_http_methods pointers in unknown_method(). - - - - - Constify the time_formats pointers in parse_header_time(). - - - - - Constify the formerly_valid_actions pointers in action_used_to_be_valid(). - - - - - Introduce a GNUMakefile MAN_PAGE variable that defaults to privoxy.1. - The Debian package uses section 8 for the man page and this - should simplify the patch. 
- - - - - Deduplicate the INADDR_NONE definition for Solaris by moving it to jbsockets.h - - - - - In block_url(), ditch the obsolete workaround for ancient Netscape versions - that supposedly couldn't properly deal with status code 403. - - - - - Remove a useless NULL pointer check in load_trustfile(). - - - - - Remove two useless NULL pointer checks in load_one_re_filterfile(). - - - - - Change url_code_map[] from an array of pointers to an array of arrays - It removes an unnecessary layer of indirection and on 64bit system reduces - the size of the binary a bit. - - - - - Fix various typos. Fixes taken from Debian's 29_typos.dpatch by Roland Rosenfeld. - - - - - Add a dok-tidy GNUMakefile target to clean up the messy HTML - generated by the other dok targets. - - - - - GNUisms in the GNUMakefile have been removed. - - - - - Change the HTTP version in static responses to 1.1 - - - - - Synced config.sub and config.guess with upstream - 2011-11-11/386c7218162c145f5f9e1ff7f558a3fbb66c37c5. - - - - - Add a dedicated function to parse the values of toggles. Reduces duplicated - code in load_config() and provides better error handling. Invalid or missing - toggle values are now a fatal error instead of being silently ignored. - - - - - Terminate HTML lines in static error messages with \n instead of \r\n. - - - - - Simplify cgi_error_unknown() a bit. - - - - - In LogPutString(), don't bother looking at pszText when not - actually logging anything. - - - - - Change ssplit()'s fourth parameter from int to size_t. - Fixes a clang complaint. - - - - - Add a warning that the statistics currently can't be trusted. - Mention Privoxy-Log-Parser's --statistics option as - an alternative for the time being. - - - - - In rfc2553_connect_to(), start setting cgi->error_message on error. - - - - - Change the expected status code returned for http://p.p/die depending - on whether or not FEATURE_GRACEFUL_TERMINATION is available. 
- - - - - In cgi_die(), mark the client connection for closing. - If the client will fetch the style sheet through another connection - it gets the main thread out of the accept() state and should thus - trigger the actual shutdown. - - - - - Add a proper CGI message for cgi_die(). - - - - - Don't enforce a logical line length limit in read_config_line(). - - - - - Slightly refactor server_last_modified() to remove useless gmtime*() calls. - - - - - In get_content_type(), also recognize '.jpeg' as JPEG extension. - - - - - Add '.png' to the list of recognized file extensions in get_content_type(). - - - - - In block_url(), consistently use the block reason "Request blocked by Privoxy" - In two places the reason was "Request for blocked URL" which hides the - fact that the request got blocked by Privoxy and isn't necessarily - correct as the block may be due to tags. - - - - - In listen_loop(), reload the configuration files after accepting - a new connection instead of before. - Previously the first connection that arrived after a configuration - change would still be handled with the old configuration. - - - - - In chat()'s receive-data loop, skip a client socket check if - the socket will be written to right away anyway. This can - increase the transfer speed for unfiltered content on fast - network connections. - - - - - The socket timeout is used for SOCKS negotiations as well which - previously couldn't timeout. - - - - - Don't keep the client connection alive if any configuration file - changed since the time the connection came in. This is closer to - Privoxy's behaviour before keep-alive support for client connection - has been added and also less confusing in general. - - - - - Treat all Content-Type header values containing the pattern - 'script' as a sign of text. Reported by pribog in #3134970. 
- - - - - - - - Action file improvements: - - - - Moved the site-specific block pattern section below the one for the - generic patterns so for requests that are matched in both, the block - reason for the domain is shown which is usually more useful than showing - the one for the generic pattern. - - - - - Remove -prevent-compression from the fragile alias. It's no longer - used anywhere by default and isn't known to break stuff anyway. - - - - - Add a (disabled) section to block various Facebook tracking URLs. - Reported by Dan Stahlke in #3421764. - - - - - Add a (disabled) section to rewrite and redirect click-tracking - URLs used on news.google.com. - Reported by Dan Stahlke in #3421755. - - - - - Unblock linuxcounter.net/. - Reported by Dan Stahlke in #3422612. - - - - - Block 'www91.intel.com/' which is used by Omniture. - Reported by Adam Piggott in #3167370. - - - - - Disable the handle-as-empty-doc-returns-ok option and mark it as deprecated. - Reminded by tceverling in #2790091. - - - - - Add ".ivwbox.de/" to the "Cross-site user tracking" section. - Reported by Nettozahler in #3172525. - - - - - Unblock and fast-redirect ".awin1.com/.*=http://". - Reported by Adam Piggott in #3170921. - - - - - Block "b.collective-media.net/". - - - - - Widen the Debian popcon exception to "qa.debian.org/popcon". - Seen in Debian's 05_default_action.dpatch by Roland Rosenfeld. - - - - - Block ".gemius.pl/" which only seems to be used for user tracking. - Reported by johnd16 in #3002731. Additional input from Lee and movax. - - - - - Disable banners-by-size filters for '.thinkgeek.com/'. - The filter only seems to catch pictures of the inventory. - - - - - Block requests for 'go.idmnet.bbelements.com/please/showit/'. - Reported by kacperdominik in #3372959. - - - - - Unblock adainitiative.org/. - - - - - Add a fast-redirects exception for '.googleusercontent.com/.*=cache'. - - - - - Add a fast-redirects exception for webcache.googleusercontent.com/. 
- - - - - Unblock http://adassier.wordpress.com/ and http://adassier.files.wordpress.com/. - - - - - - - - Filter file improvements: - - - - Let the yahoo filter hide '.ads'. - - - - - Let the msn filter hide overlay ads for Facebook 'likes' in search - results and elements with the id 's_notf_div'. They only seem to be - used to advertise site 'enhancements'. - - - - - Let the js-events filter additionally disarm setInterval(). - Suggested by dg1727 in #3423775. - - - - - - - - Documentation improvements: - - - - Clarify the effect of compiling Privoxy with zlib support. - Suggested by dg1727 in #3423782. - - - - - Point out that the SourceForge messaging system works like a black - hole and should thus not be used to contact individual developers. - - - - - Mention some of the problems one can experience when not explicitly - configuring an IP addresses as listen address. - - - - - Explicitly mention that hostnames can be used instead of IP addresses - for the listen-address, that only the first address returned will be - used and what happens if the address is invalid. - Requested by Calestyo in #3302213. - - - - - - - - Log message improvements: - - - - If only the server connection is kept alive, do not pretend to - wait for a new client request. - - - - - Remove a superfluous log message in forget_connection(). - - - - - In chat(), properly report missing server responses as such - instead of calling them empty. - - - - - In forwarded_connect(), fix a log message nobody should ever see. - - - - - Fix a log message in socks5_connect(), a failed write operation - was logged as failed read operation. - - - - - Let load_one_actions_file() properly complain about a missing - '{' at the beginning of the file. - Simply stating that a line is invalid isn't particularly helpful. - - - - - Do not claim to listen on a socket until Privoxy actually does. 
- Patch submitted by Petr Pisar #3354485 - - - - - Prevent a duplicated LOG_LEVEL_CLF message when sending out - the "no-server-data" response. - - - - - Also log the client socket when dropping a connection. - - - - - Include the destination host in the 'Request ... marked for - blocking. limit-connect{...} doesn't allow CONNECT ...' message - Patch submitted by Saperski in #3296250. - - - - - Prevent a duplicated log message if none of the resolved IP - addresses were reachable. - - - - - In connect_to(), do not pretend to retry if forwarded-connect-retries - is zero or unset. - - - - - When a specified user or group can't be found, put the name in - single-quotes when logging it. - - - - - In rfc2553_connect_to(), explain getnameinfo() errors better. - - - - - Remove a useless log message in chat(). - - - - - When retrying to connect, also log the maximum number of connection - attempts. - - - - - Rephrase a log message in compile_dynamic_pcrs_job_list(). - Divide the error code and its meaning with a colon. Call the pcrs - job dynamic and not the filter. Filters may contain dynamic and - non-dynamic pcrs jobs at the same time. Only mention the name of - the filter or tagger, but don't claim it's a filter when it could - be a tagger. - - - - - In a fatal error message in load_one_actions_file(), cover both - URL and TAG patterns. - - - - - In pcrs_strerror(), properly report unknown positive error code - values as such. Previously they were handled like 0 (no error). - - - - - In compile_dynamic_pcrs_job_list(), also log the actual error code as - pcrs_strerror() doesn't handle all errors reported by pcre. - - - - - Don't bother trying to continue chatting if the client didn't ask for it. - Reduces log noise a bit. - - - - - Make two fatal error message in load_one_actions_file() more descriptive. - - - - - In cgi_send_user_manual(), log when rejecting a file name due to '/' or '..'. - - - - - In load_file(), log a message if opening a file failed. 
- The CGI error message alone isn't too helpful. - - - - - In connection_destination_matches(), improve two log messages - to help understand why the destinations don't match. - - - - - Rephrase a log message in serve(). Client request arrival - should be differentiated from closed client connections now. - - - - - In serve(), log if a client connection isn't reused due to a - configuration file change. - - - - - Let mark_server_socket_tainted() always mark the server socket tainted, - just don't talk about it in cases where it has no effect. It doesn't change - Privoxy's behaviour, but makes understanding the log file easier. - - - - - - - - configure: - - - - Added a --disable-ipv6-support switch for platforms where support - is detected but doesn't actually work. - - - - - Do not check for the existence of strerror() and memmove() twice - - - - - Remove a useless test for setpgrp(2). Privoxy doesn't need it and - it can cause problems when cross-compiling. - - - - - Rename the --disable-acl-files switch to --disable-acl-support. - Since about 2001, ACL directives are specified in the standard - config file. - - - - - Update the URL of the 'Removing outdated PCRE version after the - next stable release' posting. The old URL stopped working after - one of SF's recent site "optimizations". Reported by Han Liu. - - - - - - - - Privoxy-Regression-Test: - - - - Added --shuffle-tests option to increase the chances of detection race conditions. - - - - - Added a --local-test-file option that allows to use Privoxy-Regression-Test without Privoxy. - - - - - Added tests for missing socks4 and socks4a forwarders. - - - - - The --privoxy-address option now works with IPv6 addresses containing brackets, too. - - - - - Perform limited sanity checks for parameters that are supposed to have numerical values. - - - - - Added a --sleep-time option to specify a number of seconds to - sleep between tests, defaults to 0. 
- - - - - Disable the range-requests tagger for tests that break if it's enabled. - - - - - Log messages use the ISO 8601 date format %Y-%m-%d. - - - - - Fix spelling in two error messages. - - - - - In the --help output, include a list of supported tests and their default levels. - - - - - Adjust the tests to properly deal with FEATURE_TOGGLE being disabled. - - - - - - - - Privoxy-Log-Parser: - - - - Perform limited sanity checks for command line parameters that - are supposed to have numerical values. - - - - - Implement a --unbreak-lines-only option to try to revert MUA breakage. - - - - - Accept and highlight: Added header: Content-Encoding: deflate - - - - - Accept and highlight: Compressed content from 29258 to 8630 bytes. - - - - - Accept and highlight: Client request arrived in time on socket 21. - - - - - Highlight: Didn't receive data in time: a.fsdn.com:443 - - - - - Accept log messages with ISO 8601 time stamps, too. - - - - - - - - uagen: - - - - Bump generated Firefox version to 8.0. - - - - - Only randomize the release date if the new --randomize-release-date - option is enabled. Firefox versions after 4 use a fixed date string - without meaning. - - - - - - - - - - - - -Note to Upgraders - - - A quick list of things to be aware of before upgrading from earlier - versions of Privoxy: - - - - - - - - The recommended way to upgrade &my-app; is to backup your old - configuration files, install the new ones, verify that &my-app; - is working correctly and finally merge back your changes using - diff and maybe patch. - - - There are a number of new features in each &my-app; release and - most of them have to be explicitly enabled in the configuration - files. Old configuration files obviously don't do that and due - to syntax changes using old configuration files with a new - &my-app; isn't always possible anyway. 
- - - - - Note that some installers remove earlier versions completely, - including configuration files, therefore you should really save - any important configuration files! - - - - - On the other hand, other installers don't overwrite existing configuration - files, thinking you will want to do that yourself. - - - - - standard.action has been merged into - the default.action file. - - - - - In the default configuration only fatal errors are logged now. - You can change that in the debug section - of the configuration file. You may also want to enable more verbose - logging until you verified that the new &my-app; version is working - as expected. - - - - - - Three other config file settings are now off by default: - enable-remote-toggle, - enable-remote-http-toggle, - and enable-edit-actions. - If you use or want these, you will need to explicitly enable them, and - be aware of the security issues involved. - - - - - - - - - - - - - -Quickstart to Using Privoxy - - - - - - Install Privoxy. See the Installation Section below for platform specific - information. - - - - - - Advanced users and those who want to offer Privoxy - service to more than just their local machine should check the main config file, especially the security-relevant options. These are - off by default. - - - - - - Start Privoxy, if the installation program has - not done this already (may vary according to platform). See the section - Starting Privoxy. - - - - - - Set your browser to use Privoxy as HTTP and - HTTPS (SSL) proxy - by setting the proxy configuration for address of - 127.0.0.1 and port 8118. - DO NOT activate proxying for FTP or - any protocols besides HTTP and HTTPS (SSL) unless you intend to prevent your - browser from using these protocols. - - - - - - Flush your browser's disk and memory caches, to remove any cached ad images. - If using Privoxy to manage - cookies, - you should remove any currently stored cookies too. 
- - - - - - A default installation should provide a reasonable starting point for - most. There will undoubtedly be occasions where you will want to adjust the - configuration, but that can be dealt with as the need arises. Little - to no initial configuration is required in most cases, you may want - to enable the - web-based action editor though. - Be sure to read the warnings first. - - - See the Configuration section for more - configuration options, and how to customize your installation. - You might also want to look at the next section for a quick - introduction to how Privoxy blocks ads and - banners. - - - - - - If you experience ads that slip through, innocent images that are - blocked, or otherwise feel the need to fine-tune - Privoxy's behavior, take a look at the actions files. As a quick start, you might - find the richly commented examples - helpful. You can also view and edit the actions files through the web-based user interface. The - Appendix Troubleshooting: Anatomy of an - Action has hints on how to understand and debug actions that - misbehave. - - - - - - - - Please see the section Contacting the - Developers on how to report bugs, problems with websites or to get - help. - - - - - - Now enjoy surfing with enhanced control, comfort and privacy! - - - - - - - - - - -Quickstart to Ad Blocking - - - Ad blocking is but one of Privoxy's - array of features. Many of these features are for the technically minded advanced - user. But, ad and banner blocking is surely common ground for everybody. - - - This section will provide a quick summary of ad blocking so - you can get up to speed quickly without having to read the more extensive - information provided below, though this is highly recommended. - - - First a bit of a warning ... blocking ads is much like blocking SPAM: the - more aggressive you are about it, the more likely you are to block - things that were not intended. And the more likely that some things - may not work as intended. 
So there is a trade off here. If you want - extreme ad free browsing, be prepared to deal with more - problem sites, and to spend more time adjusting the - configuration to solve these unintended consequences. In short, there is - not an easy way to eliminate all ads. Either take - the easy way and settle for most ads blocked with the - default configuration, or jump in and tweak it for your personal surfing - habits and preferences. - - - Secondly, a brief explanation of Privoxy's - actions. Actions in this context, are - the directives we use to tell Privoxy to perform - some task relating to HTTP transactions (i.e. web browsing). We tell - Privoxy to take some action. Each - action has a unique name and function. While there are many potential - actions in Privoxy's - arsenal, only a few are used for ad blocking. Actions, and action - configuration files, are explained in depth below. - - - Actions are specified in Privoxy's configuration, - followed by one or more URLs to which the action should apply. URLs - can actually be URL type patterns that use - wildcards so they can apply potentially to a range of similar URLs. The - actions, together with the URL patterns are called a section. - - - When you connect to a website, the full URL will either match one or more - of the sections as defined in Privoxy's configuration, - or not. If so, then Privoxy will perform the - respective actions. If not, then nothing special happens. Furthermore, web - pages may contain embedded, secondary URLs that your web browser will - use to load additional components of the page, as it parses the - original page's HTML content. An ad image for instance, is just an URL - embedded in the page somewhere. The image itself may be on the same server, - or a server somewhere else on the Internet. Complex web pages will have many - such embedded URLs. 
 &my-app; can deal with each URL individually, so, for - instance, the main page text is not touched, but images from such-and-such - server are blocked. - - - - The most important actions for basic ad blocking are: block, handle-as-image, - handle-as-empty-document, and - set-image-blocker: - - - - - - - - block - this is perhaps - the single most used action, and is particularly important for ad blocking. - This action stops any contact between your browser and any URL patterns - that match this action's configuration. It can be used for blocking ads, - but also anything that is determined to be unwanted. By itself, it simply - stops any communication with the remote server and sends - Privoxy's own built-in BLOCKED page instead to - let you know what has happened (with some exceptions, see below). - - - - - - handle-as-image - - tells Privoxy to treat this URL as an image. - Privoxy's default configuration already does this - for all common image types (e.g. GIF), but there are many situations where this - is not so easy to determine. So we'll force it in these cases. This is particularly - important for ad blocking, since only if we know that it's an image of - some kind, can we replace it with an image of our choosing, instead of the - Privoxy BLOCKED page (which would only result in - a broken image icon). There are some limitations to this - though. For instance, you can't just brute-force an image substitution for - an entire HTML page in most situations. - - - - - - handle-as-empty-document - - sends an empty document instead of Privoxy's - normal BLOCKED HTML page. This is useful for file types that are neither - HTML nor images, such as blocking JavaScript files. - - - - - - set-image-blocker - tells - Privoxy what to display in place of an ad image that - has hit a block rule. For this to come into play, the URL must match a - block action somewhere in the - configuration, and it must also match a - handle-as-image action.
 - - - The configuration options on what to display instead of the ad are: - - - -    pattern - a checkerboard pattern, so that an ad - replacement is obvious. This is the default. - - - - -    blank - A very small empty GIF image is displayed. - This is the so-called invisible configuration option. - - - - -    http://<URL> - A redirect to any image anywhere - of the user's choosing (advanced usage). - - - - - - - - - Advanced users will eventually want to explore &my-app; - filters as well. Filters - are very different from blocks. - A block blocks a site, page, or unwanted content. Filters - are a way of filtering or modifying what is actually on the page. An example - filter usage: a text replacement of no-no for - nasty-word. That is a very simple example. This process can be - used for ad blocking, but it is more in the realm of advanced usage and has - some pitfalls to be wary of. - - - - The quickest way to adjust any of these settings is with your browser through - the special Privoxy editor at http://config.privoxy.org/show-status - (shortcut: http://p.p/show-status). This - is an internal page, and does not require Internet access. - - - - Note that as of Privoxy 3.0.7 beta the - action editor is disabled by default. Check the - enable-edit-actions - section in the configuration file to learn why and in which - cases it's safe to enable again. - - - - If you decided to enable the action editor, select the appropriate - actions file, and click - Edit. It is best to put personal or - local preferences in user.action since this is not - meant to be overwritten during upgrades, and will override the settings in - other files. Here you can insert new actions, and URLs for ad - blocking or other purposes, and make other adjustments to the configuration. - Privoxy will detect these changes automatically.
- - - - A quick and simple step by step example: - - - - - - - - Right click on the ad image to be blocked, then select - Copy Link Location from the - pop-up menu. - - - - - Set your browser to - http://config.privoxy.org/show-status - - - - - Find user.action in the top section, and click - on Edit: - - - - -
Actions Files in Use - - - - - - [ Screenshot of Actions Files in Use ] - - -
-
-
- - - - You should have a section with only - block listed under - Actions:. - If not, click a Insert new section below - button, and in the new section that just appeared, click the - Edit button right under the word Actions:. - This will bring up a list of all actions. Find - block near the top, and click - in the Enabled column, then Submit - just below the list. - - - - - Now, in the block actions section, - click the Add button, and paste the URL the - browser got from Copy Link Location. - Remove the http:// at the beginning of the URL. Then, click - Submit (or - OK if in a pop-up window). - - - - - Now go back to the original page, and press SHIFT-Reload - (or flush all browser caches). The image should be gone now. - - - -
-
- - - This is a very crude and simple example. There might be good reasons to use a - wildcard pattern match to include potentially similar images from the same - site. For a more extensive explanation of patterns, and - the entire actions concept, see the Actions - section. - - - - For advanced users who want to hand edit their config files, you might want - to now go to the Actions Files Tutorial. - The ideas explained therein also apply to the web-based editor. - - - There are also various - filters that can be used for ad blocking - (filters are a special subset of actions). These - fall into the advanced usage category, and are explained in - depth in later sections. - - -
- -
- - - - - - -Starting Privoxy - - Before launching Privoxy for the first time, you - will want to configure your browser(s) to use - Privoxy as a HTTP and HTTPS (SSL) - proxy. The default is - 127.0.0.1 (or localhost) for the proxy address, and port 8118 (earlier versions - used port 8000). This is the one configuration step that must be done -! - - - Please note that Privoxy can only proxy HTTP and - HTTPS traffic. It will not work with FTP or other protocols. - - - - -
Proxy Configuration Showing - Mozilla/Netscape HTTP and HTTPS (SSL) Settings - - - - - - [ Screenshot of Mozilla Proxy Configuration ] - - -
-
- - - - With Firefox, this is typically set under: - - - - Tools -> Options -> Advanced -> Network ->Connection -> Settings - - - - - Or optionally on some platforms: - - - - Edit -> Preferences -> General -> Connection Settings -> Manual Proxy Configuration - - - - - - With Netscape (and - Mozilla), this can be set under: - - - - - - - Edit -> Preferences -> Advanced -> Proxies -> HTTP Proxy - - - - - For Internet Explorer v.5-7: - - - - Tools -> Internet Options -> Connections -> LAN Settings - - - - Then, check Use Proxy and fill in the appropriate info - (Address: 127.0.0.1, Port: 8118). Include HTTPS (SSL), if you want HTTPS - proxy support too (sometimes labeled Secure). Make sure any - checkboxes like Use the same proxy server for all protocols is - UNCHECKED. You want only HTTP and HTTPS (SSL)! - - - - -
Proxy Configuration Showing - Internet Explorer HTTP and HTTPS (Secure) Settings - - - - - - [ Screenshot of IE Proxy Configuration ] - - -
-
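The same proxy settings can also be applied from a script rather than a browser dialog. The following is a minimal Python sketch, assuming Privoxy listens on the default 127.0.0.1:8118 described above. The helper names privoxy_proxies and build_privoxy_opener are made up for this illustration and are not part of Privoxy. Only HTTP and HTTPS are mapped to the proxy, in line with the advice not to proxy other protocols.

```python
import urllib.request

# Default Privoxy listening address as given in this manual.
PRIVOXY_ADDRESS = "127.0.0.1:8118"


def privoxy_proxies(address=PRIVOXY_ADDRESS):
    """Return a proxy mapping for HTTP and HTTPS only (hypothetical helper).

    FTP and other protocols are deliberately left out, since Privoxy
    only proxies HTTP and HTTPS traffic.
    """
    proxy_url = "http://" + address
    return {"http": proxy_url, "https": proxy_url}


def build_privoxy_opener(address=PRIVOXY_ADDRESS):
    """Build a urllib opener that sends HTTP/HTTPS requests through Privoxy."""
    handler = urllib.request.ProxyHandler(privoxy_proxies(address))
    return urllib.request.build_opener(handler)


# Example use (requires a running Privoxy instance):
#     opener = build_privoxy_opener()
#     opener.open("http://p.p/", timeout=10)
```

Building the opener does not contact the proxy; a request is only made when open() is called.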
- - - - After doing this, flush your browser's disk and memory caches to force a - re-reading of all pages and to get rid of any ads that may be cached. Remove - any cookies, - if you want Privoxy to manage that. You are now - ready to start enjoying the benefits of using - Privoxy! - - - - Privoxy itself is typically started by specifying the - main configuration file to be used on the command line. If no configuration - file is specified on the command line, Privoxy - will look for a file named config in the current - directory. Except on Win32 where it will try config.txt. - - - -Red Hat and Fedora - - A default Red Hat installation may not start &my-app; upon boot. It will use - the file /etc/privoxy/config as its main configuration - file. - - - - # /etc/rc.d/init.d/privoxy start - - - - Or ... - - - - # service privoxy start - - - - - -Debian - - We use a script. Note that Debian typically starts &my-app; upon booting per - default. It will use the file - /etc/privoxy/config as its main configuration - file. - - - - # /etc/init.d/privoxy start - - - - - -Windows - -Click on the &my-app; Icon to start Privoxy. If no configuration file is - specified on the command line, Privoxy will look - for a file named config.txt. Note that Windows will - automatically start &my-app; when the system starts if you chose that option - when installing. - - - Privoxy can run with full Windows service functionality. - On Windows only, the &my-app; program has two new command line arguments - to install and uninstall &my-app; as a service. See the - Windows Installation - instructions for details. - - - - -Solaris, NetBSD, FreeBSD, HP-UX and others - -Example Unix startup command: - - - - # /usr/sbin/privoxy /etc/privoxy/config - - - - - -OS/2 - - During installation, Privoxy is configured to - start automatically when the system restarts. You can start it manually by - double-clicking on the Privoxy icon in the - Privoxy folder. 
- - - - -Mac OS X - - After downloading the privoxy software, unzip the downloaded file by - double-clicking on the zip file icon. Then, double-click on the - installer package icon and follow the installation process. - - - The privoxy service will automatically start after a successful - installation. In addition, the privoxy service will automatically - start every time your computer starts up. - - - To prevent the privoxy service from automatically starting when your - computer starts up, remove or rename the folder named - /Library/StartupItems/Privoxy. - - - A simple application named Privoxy Utility has been created which - enables administrators to easily start and stop the privoxy service. - - - In addition, the Privoxy Utility presents a simple way for - administrators to edit the various privoxy config files. A method - to uninstall the software is also available. - - - An administrator username and password must be supplied in order for - the Privoxy Utility to perform any of the tasks. - - - - - -AmigaOS - - Start Privoxy (with RUN <>NIL:) in your - startnet script (AmiTCP), in - s:user-startup (RoadShow), as startup program in your - startup script (Genesis), or as startup action (Miami and MiamiDx). - Privoxy will automatically quit when you quit your - TCP/IP stack (just ignore the harmless warning your TCP/IP stack may display that - Privoxy is still running). - - - - -Gentoo - - A script is again used. It will use the file /etc/privoxy/config - as its main configuration file. - - - - /etc/init.d/privoxy start - - - - Note that Privoxy is not automatically started at - boot time by default. You can change this with the rc-update - command. - - - - rc-update add privoxy default - - - - - - - - -Command Line Options - - Privoxy may be invoked with the following - command-line options: - - - - - - - - --version - - - Print version info and exit. Unix only. - - - - - --help - - - Print short usage info and exit. Unix only. 
- - - - - --no-daemon - - - Don't become a daemon, i.e. don't fork and become process group - leader, and don't detach from controlling tty. Unix only. - - - - - --pidfile FILE - - - On startup, write the process ID to FILE. Delete the - FILE on exit. Failure to create or delete the - FILE is non-fatal. If no FILE - option is given, no PID file will be used. Unix only. - - - - - --user USER[.GROUP] - - - After (optionally) writing the PID file, assume the user ID of - USER, and if included the GID of GROUP. Exit if the - privileges are not sufficient to do so. Unix only. - - - - - --chroot - - - Before changing to the user ID given in the --user option, - chroot to that user's home directory, i.e. make the kernel pretend to the &my-app; - process that the directory tree starts there. If set up carefully, this can limit - the impact of possible vulnerabilities in &my-app; to the files contained in that hierarchy. - Unix only. - - - - - --pre-chroot-nslookup hostname - - - Specifies a hostname to look up before doing a chroot. On some systems, initializing the - resolver library involves reading config files from /etc and/or loading additional shared - libraries from /lib. On these systems, doing a hostname lookup before the chroot reduces - the number of files that must be copied into the chroot tree. - - - For fastest startup speed, a good value is a hostname that is not in /etc/hosts but that - your local name server (listed in /etc/resolv.conf) can resolve without recursion - (that is, without having to ask any other name servers). The hostname need not exist, - but if it doesn't, an error message (which can be ignored) will be output. - - - - - - configfile - - - If no configfile is included on the command line, - Privoxy will look for a file named - config in the current directory (except on Win32 - where it will look for config.txt instead). Specify - full path to avoid confusion. If no config file is found, - Privoxy will fail to start. 
 - - - - - - - - On MS Windows only there are two additional - command-line options to allow Privoxy to install and - run as a service. See the -Windows Installation section -for details. - - - - -
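As an illustration of how the Unix options above combine, here is a small Python sketch that assembles a Privoxy command line. The function build_privoxy_command and the PID file path in the usage note are assumptions made for this example. The option names, the /usr/sbin/privoxy binary path and the /etc/privoxy/config path are taken from this manual. The sketch only builds the argument list and does not start anything.

```python
def build_privoxy_command(config_file, no_daemon=False, pidfile=None,
                          user=None, binary="/usr/sbin/privoxy"):
    """Assemble an argument list for starting Privoxy (hypothetical helper).

    Only options documented in this section are used; the configuration
    file is given last, as a full path to avoid confusion.
    """
    argv = [binary]
    if no_daemon:
        # Stay in the foreground instead of becoming a daemon.
        argv.append("--no-daemon")
    if pidfile is not None:
        # Write the process ID to this file on startup.
        argv.extend(["--pidfile", pidfile])
    if user is not None:
        # USER or USER.GROUP, applied after the PID file is written.
        argv.extend(["--user", user])
    argv.append(config_file)
    return argv
```

For example, build_privoxy_command("/etc/privoxy/config", no_daemon=True, pidfile="/var/run/privoxy.pid") yields a list that could be handed to subprocess.run on a Unix system where Privoxy is installed. The PID file location here is only a placeholder.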
- - - - - -Privoxy Configuration - - All Privoxy configuration is stored - in text files. These files can be edited with a text editor. - Many important aspects of Privoxy can - also be controlled easily with a web browser. - - - - - - -Controlling Privoxy with Your Web Browser - - Privoxy's user interface can be reached through the special - URL http://config.privoxy.org/ - (shortcut: http://p.p/), - which is a built-in page and works without Internet access. - You will see the following section: - - - - - - -     Privoxy Menu - - - -         ▪  View & change the current configuration - - -         ▪  View the source code version numbers - - -         ▪  View the request headers. - - -         ▪  Look up which actions apply to a URL and why - - -         ▪  Toggle Privoxy on or off - - -         ▪  Documentation - - - - - - - - This should be self-explanatory. Note the first item leads to an editor for the - actions files, which is where the ad, banner, - cookie, and URL blocking magic is configured as well as other advanced features of - Privoxy. This is an easy way to adjust various - aspects of Privoxy configuration. The actions - file, and other configuration files, are explained in detail below. - - - - Toggle Privoxy On or Off is handy for sites that might - have problems with your current actions and filters. You can in fact use - it as a test to see whether it is Privoxy - causing the problem or not. Privoxy continues - to run as a proxy in this case, but all manipulation is disabled, i.e. - Privoxy acts like a normal forwarding proxy. There - is even a toggle Bookmarklet offered, so - that you can toggle Privoxy with one click from - your browser. - - - - Note that several of the features described above are disabled by default - in Privoxy 3.0.7 beta and later. - Check the - configuration file to learn why - and in which cases it's safe to enable them again. 
 - - - - - - - - - - - - -Configuration Files Overview - - For Unix, *BSD and Linux, all configuration files are located in - /etc/privoxy/ by default. For MS Windows, OS/2, and - AmigaOS these are all in the same directory as the - Privoxy executable. - - - - The installed defaults provide a reasonable starting point, though - some settings may be aggressive by some standards. For the time being, the - principal configuration files are: - - - - - - - - The main configuration file is named config - on Linux, Unix, BSD, OS/2, and AmigaOS and config.txt - on Windows. This is a required file. - - - - - - match-all.action is used to define which actions - relating to banner-blocking, images, pop-ups, content modification, cookie handling - etc. should be applied by default. It should be the first actions file loaded. - - - default.action defines many exceptions (both positive and negative) - from the default set of actions that's configured in match-all.action. - It should be the second actions file loaded and shouldn't be edited by the user. - - - Multiple actions files may be defined in config. These - are processed in the order they are defined. Local customizations and locally - preferred exceptions to the default policies as defined in - match-all.action (which you will most probably want - to define sooner or later) are best applied in user.action, - where you can preserve them across upgrades. The file isn't installed by all - installers, but you can easily create it yourself with a text editor. - - - There is also a web-based editor that can be accessed from - http://config.privoxy.org/show-status - (Shortcut: http://p.p/show-status) for the - various actions files. - - - - - - Filter files (the filter - file) can be used to re-write the raw page content, including - viewable text as well as embedded HTML and JavaScript, and whatever else - lurks on any given web page.
 The filtering jobs are only pre-defined here; - whether to apply them or not is up to the actions files. - default.filter includes various filters made - available for use by the developers. Some are much more intrusive than - others, and all should be used with caution. You may define additional - filter files in config as you can with - actions files. We suggest user.filter for any - locally defined filters or customizations. - - - - - - - - The syntax of the configuration and filter files may change between different - Privoxy versions; unfortunately, some enhancements cost backwards compatibility. - - - - - All files use the # character to denote a - comment (the rest of the line will be ignored) and understand line continuation - through placing a backslash ("\") as the very last character - in a line. If the # is preceded by a backslash, it loses - its special function. Placing a # in front of an otherwise - valid configuration line to prevent it from being interpreted is called "commenting - out" that line. Blank lines are ignored. - - - - The actions files and filter files - can use Perl style regular expressions for - maximum flexibility. - - - - After making any changes, there is no need to restart - Privoxy in order for the changes to take - effect. Privoxy detects such changes - automatically. Note, however, that it may take one or two additional - requests for the change to take effect. When changing the listening address - of Privoxy, these wake up requests - must obviously be sent to the old listening address. - - - - While under development, the configuration content is subject to change. - The below documentation may not be accurate by the time you read this. - Also, what constitutes a default setting, may change, so - please check all your configuration files on important issues. -
-]]> - -
- - - - - - - - - - &config; - - - - - - - - - -Actions Files - - - - - The actions files are used to define what actions - Privoxy takes for which URLs, and thus determines - how ad images, cookies and various other aspects of HTTP content and - transactions are handled, and on which sites (or even parts thereof). - There are a number of such actions, with a wide range of functionality. - Each action does something a little different. - These actions give us a veritable arsenal of tools with which to exert - our control, preferences and independence. Actions can be combined so that - their effects are aggregated when applied against a given set of URLs. - - - There - are three action files included with Privoxy with - differing purposes: - - - - - - match-all.action - is used to define which - actions relating to banner-blocking, images, pop-ups, - content modification, cookie handling etc should be applied by default. - It should be the first actions file loaded - - - - - default.action - defines many exceptions (both - positive and negative) from the default set of actions that's configured - in match-all.action. It is a set of rules that should - work reasonably well as-is for most users. This file is only supposed to - be edited by the developers. It should be the second actions file loaded. - - - - - user.action - is intended to be for local site - preferences and exceptions. As an example, if your ISP or your bank - has specific requirements, and need special handling, this kind of - thing should go here. This file will not be upgraded. - - - - - Edit Set to Cautious Set to Medium Set to Advanced - - - These have increasing levels of aggressiveness and have no - influence on your browsing unless you select them explicitly in the - editor. A default installation should be pre-set to - Cautious. New users should try this for a while before - adjusting the settings to more aggressive levels. 
 The more aggressive - the settings, the greater the likelihood of problems such as sites - not working as they should. - - - The Edit button allows you to turn each - action on/off individually for fine-tuning. The Cautious - button changes the actions list to low/safe settings which will activate - ad blocking and a minimal set of &my-app;'s features, and subsequently - there will be less of a chance for accidental problems. The - Medium button sets the list to a medium level of - other features and a low level set of privacy features. The - Advanced button sets the list to a high level of - ad blocking and medium level of privacy. See the chart below. The latter - three buttons override any changes made with the - Edit button. More fine-tuning can be done in the - lower sections of this internal page. - - - While the actions file editor allows you to enable these settings in all - actions files, they are only supposed to be enabled in the first one - to make sure you don't unintentionally overrule earlier rules. - - - The default profiles, and their associated actions, as pre-defined in - default.action are: - - - Default Configurations - - - - - - - - Feature - Cautious - Medium - Advanced - - - - - - - - - - - - - - Ad-blocking Aggressiveness - medium - high - high - - - - Ad-filtering by size - no - yes - yes - - - - Ad-filtering by link - no - no - yes - - - Pop-up killing - blocks only - blocks only - blocks only - - - - Privacy Features - low - medium - medium/high - - - - Cookie handling - none - session-only - kill - - - - Referer forging - no - yes - yes - - - - GIF de-animation - no - yes - yes - - - - Fast redirects - no - no - yes - - - - HTML taming - no - no - yes - - - - JavaScript taming - no - no - yes - - - - Web-bug killing - no - yes - yes - - - - Image tag reordering - no - yes - yes - - - - -
-
- -
-
-
- - - The list of actions files to be used are defined in the main configuration - file, and are processed in the order they are defined (e.g. - default.action is typically processed before - user.action). The content of these can all be viewed and - edited from http://config.privoxy.org/show-status. - The over-riding principle when applying actions, is that the last action that - matches a given URL wins. The broadest, most general rules go first - (defined in default.action), - followed by any exceptions (typically also in - default.action), which are then followed lastly by any - local preferences (typically in user.action). - Generally, user.action has the last word. - - - - An actions file typically has multiple sections. If you want to use - aliases in an actions file, you have to place the (optional) - alias section at the top of that file. - Then comes the default set of rules which will apply universally to all - sites and pages (be very careful with using such a - universal set in user.action or any other actions file after - default.action, because it will override the result - from consulting any previous file). And then below that, - exceptions to the defined universal policies. You can regard - user.action as an appendix to default.action, - with the advantage that it is a separate file, which makes preserving your - personal settings across Privoxy upgrades easier. - - - - Actions can be used to block anything you want, including ads, banners, or - just some obnoxious URL whose content you would rather not see. Cookies can be accepted - or rejected, or accepted only during the current browser session (i.e. not - written to disk), content can be modified, some JavaScripts tamed, user-tracking - fooled, and much more. See below for a complete list - of actions. - - - - -Finding the Right Mix - - Note that some actions, like cookie suppression - or script disabling, may render some sites unusable that rely on these - techniques to work properly. 
 Finding the right mix of actions is not always easy and - certainly a matter of personal taste. And, things can always change, requiring - refinements in the configuration. In general, it can be said that the more - aggressive your default settings (in the top section of the - actions file) are, the more exceptions for trusted sites you - will have to make later. If, for example, you want to crunch all cookies per - default, you'll have to make exceptions from that rule for sites that you - regularly use and that require cookies for actually useful purposes, like maybe - your bank, favorite shop, or newspaper. - - - - We have tried to provide you with reasonable rules to start from in the - distribution actions files. But there is no general rule of thumb on these - things. There just are too many variables, and sites are constantly changing. - Sooner or later you will want to change the rules (and read this chapter again :). - - - - - -How to Edit - - The easiest way to edit the actions files is with a browser by - using our browser-based editor, which can be reached from http://config.privoxy.org/show-status. - Note: the config file option enable-edit-actions must be enabled for - this to work. The editor allows both fine-grained control over every single - feature on a per-URL basis, and easy choosing from wholesale sets of defaults - like Cautious, Medium or - Advanced. Warning: the Advanced setting is more - aggressive, and will be more likely to cause problems for some sites. - Experienced users only! - - - - If you prefer plain text editing to GUIs, you can of course also directly edit - the actions files with your favorite text editor. Look at - default.action, which is richly commented with many - good examples. - - - - - -How Actions are Applied to Requests - - Actions files are divided into sections. There are special sections, - like the alias sections which will - be discussed later.
For now let's concentrate on regular sections: They have a - heading line (often split up to multiple lines for readability) which consist - of a list of actions, separated by whitespace and enclosed in curly braces. - Below that, there is a list of URL and tag patterns, each on a separate line. - - - - To determine which actions apply to a request, the URL of the request is - compared to all URL patterns in each action file. - Every time it matches, the list of applicable actions for the request is - incrementally updated, using the heading of the section in which the - pattern is located. The same is done again for tags and tag patterns later on. - - - - If multiple applying sections set the same action differently, - the last match wins. If not, the effects are aggregated. - E.g. a URL might match a regular section with a heading line of { - +handle-as-image }, - then later another one with just { - +block }, resulting - in both actions to apply. And there may well be - cases where you will want to combine actions together. Such a section then - might look like: - - - - - { +handle-as-image +block{Banner ads.} } - # Block these as if they were images. Send no block page. - banners.example.com - media.example.com/.*banners - .example.com/images/ads/ - - - - You can trace this process for URL patterns and any given URL by visiting http://config.privoxy.org/show-url-info. - - - - Examples and more detail on this is provided in the Appendix, - Troubleshooting: Anatomy of an Action section. - - - - - -Patterns - - As mentioned, Privoxy uses patterns - to determine what actions might apply to which sites and - pages your browser attempts to access. These patterns use wild - card type pattern matching to achieve a high degree of - flexibility. This allows one expression to be expanded and potentially match - against many similar patterns. 
- - - - Generally, an URL pattern has the form - <domain><port>/<path>, where the - <domain>, the <port> - and the <path> are optional. (This is why the special - / pattern matches all URLs). Note that the protocol - portion of the URL pattern (e.g. http://) should - not be included in the pattern. This is assumed already! - - - The pattern matching syntax is different for the domain and path parts of - the URL. The domain part uses a simple globbing type matching technique, - while the path part uses more flexible - Regular - Expressions (POSIX 1003.2). - - - The port part of a pattern is a decimal port number preceded by a colon - (:). If the domain part contains a numerical IPv6 address, - it has to be put into angle brackets - (<, >). - - - - - www.example.com/ - - - is a domain-only pattern and will match any request to www.example.com, - regardless of which document on that server is requested. So ALL pages in - this domain would be covered by the scope of this action. Note that a - simple example.com is different and would NOT match. - - - - - www.example.com - - - means exactly the same. For domain-only patterns, the trailing / may - be omitted. - - - - - www.example.com/index.html - - - matches all the documents on www.example.com - whose name starts with /index.html. - - - - - www.example.com/index.html$ - - - matches only the single document /index.html - on www.example.com. - - - - - /index.html$ - - - matches the document /index.html, regardless of the domain, - i.e. on any web server anywhere. - - - - - / - - - Matches any URL because there's no requirement for either the - domain or the path to match anything. - - - - - :8000/ - - - Matches any URL pointing to TCP port 8000. - - - - - <2001:db8::1>/ - - - Matches any URL with the host address 2001:db8::1. - (Note that the real URL uses plain brackets, not angle brackets.) 
- - - - - index.html - - - matches nothing, since it would be interpreted as a domain name and - there is no top-level domain called .html. So it's - a mistake. - - - - - - - -The Domain Pattern - - - The matching of the domain part offers some flexible options: if the - domain starts or ends with a dot, it becomes unanchored at that end. - For example: - - - - - .example.com - - - matches any domain with first-level domain com - and second-level domain example. - For example www.example.com, - example.com and foo.bar.baz.example.com. - Note that it wouldn't match if the second-level domain was another-example. - - - - - www. - - - matches any domain that STARTS with - www. (It also matches the domain - www but most of the time that doesn't matter.) - - - - - .example. - - - matches any domain that CONTAINS .example.. - And, by the way, also included would be any files or documents that exist - within that domain since no path limitations are specified. (Correctly - speaking: It matches any FQDN that contains example as - a domain.) This might be www.example.com, - news.example.de, or - www.example.net/cgi/testing.pl for instance. All these - cases are matched. - - - - - - - Additionally, there are wild-cards that you can use in the domain names - themselves. These work similarly to shell globbing type wild-cards: - * represents zero or more arbitrary characters (this is - equivalent to the - Regular - Expression based syntax of .*), - ? represents any single character (this is equivalent to the - regular expression syntax of a simple .), and you can define - character classes in square brackets which is similar to - the same regular expression technique. All of this can be freely mixed: - - - - - ad*.example.com - - - matches adserver.example.com, - ads.example.com, etc., but not sfads.example.com - - - - - *ad*.example.com - - - matches all of the above, and then some. - - - - - .?pix.com - - - matches www.ipix.com, - pictures.epix.com, a.b.c.d.e.upix.com etc.
- - - - - www[1-9a-ez].example.c* - - - matches www1.example.com, - www4.example.cc, wwwd.example.cy, - wwwz.example.com etc., but not - wwww.example.com. - - - - - - - While flexible, this is not the sophistication of full regular expression based syntax. - - - - - - - - -The Path Pattern - - - Privoxy uses modern POSIX 1003.2 - Regular - Expressions for matching the path portion (after the slash), - and is thus more flexible. - - - - There is an Appendix with a brief quick-start into regular - expressions, you also might want to have a look at your operating system's documentation - on regular expressions (try man re_format). - - - - Note that the path pattern is automatically left-anchored at the /, - i.e. it matches as if it would start with a ^ (regular expression speak - for the beginning of a line). - - - - Please also note that matching in the path is CASE INSENSITIVE - by default, but you can switch to case sensitive at any point in the pattern by using the - (?-i) switch: www.example.com/(?-i)PaTtErN.* will match - only documents whose path starts with PaTtErN in - exactly this capitalization. - - - - - .example.com/.* - - - Is equivalent to just .example.com, since any documents - within that domain are matched with or without the .* - regular expression. This is redundant - - - - - .example.com/.*/index.html$ - - - Will match any page in the domain of example.com that is - named index.html, and that is part of some path. For - example, it matches www.example.com/testing/index.html but - NOT www.example.com/index.html because the regular - expression called for at least two /'s, thus the path - requirement. It also would match - www.example.com/testing/index_html, because of the - special meta-character .. - - - - - .example.com/(.*/)?index\.html$ - - - This regular expression is conditional so it will match any page - named index.html regardless of path which in this case can - have one or more /'s. 
And this one must contain exactly - .html (but does not have to end with that!). - - - - - .example.com/(.*/)(ads|banners?|junk) - - - This regular expression will match any path of example.com - that contains any of the words ads, banner, - banners (because of the ?) or junk. - The path does not have to end in these words, just contain them. - - - - - .example.com/(.*/)(ads|banners?|junk)/.*\.(jpe?g|gif|png)$ - - - This is very much the same as above, except now it must end in either - .jpg, .jpeg, .gif or .png. So this - one is limited to common image formats. - - - - - - - There are many, many good examples to be found in default.action, - and more tutorials below in Appendix on regular expressions. - - - - - - - - -The Tag Pattern - - - Tag patterns are used to change the applying actions based on the - request's tags. Tags can be created with either the - client-header-tagger - or the server-header-tagger action. - - - - Tag patterns have to start with TAG:, so &my-app; - can tell them apart from URL patterns. Everything after the colon - including white space, is interpreted as a regular expression with - path pattern syntax, except that tag patterns aren't left-anchored - automatically (&my-app; doesn't silently add a ^, - you have to do it yourself if you need it). - - - - To match all requests that are tagged with foo - your pattern line should be TAG:^foo$, - TAG:foo would work as well, but it would also - match requests whose tags contain foo somewhere. - TAG: foo wouldn't work as it requires white space. - - - - Sections can contain URL and tag patterns at the same time, - but tag patterns are checked after the URL patterns and thus - always overrule them, even if they are located before the URL patterns. - - - - Once a new tag is added, Privoxy checks right away if it's matched by one - of the tag patterns and updates the action settings accordingly. 
As a result - tags can be used to activate other tagger actions, as long as these other - taggers look for headers that haven't already been parsed. - - - - For example you could tag client requests which use the - POST method, - then use this tag to activate another tagger that adds a tag if cookies - are sent, and then use a block action based on the cookie tag. This allows - the outcome of one action to be used as input to a subsequent action. However, if - you reversed the position of the described taggers, and activated the - method tagger based on the cookie tagger, no method tags would be created. - The method tagger would look for the request line, but at the time - the cookie tag is created, the request line has already been parsed. - - - - While this is a limitation you should be aware of, this kind of - indirection is seldom needed anyway and even the example doesn't - make too much sense. - - - - - - - - - - - - -Actions - - All actions are disabled by default, until they are explicitly enabled - somewhere in an actions file. Actions are turned on if preceded with a - +, and turned off if preceded with a -. So a - +action means do that action, e.g. - +block means please block URLs that match the - following patterns, and -block means don't - block URLs that match the following patterns, even if +block - previously applied. - - - - - Again, actions are invoked by placing them on a line, enclosed in curly braces and - separated by whitespace, like in - {+some-action -some-other-action{some-parameter}}, - followed by a list of URL patterns, one per line, to which they apply. - Together, the actions line and the following pattern lines make up a section - of the actions file. - - - - Actions fall into three categories: - - - - - - - Boolean, i.e. the action can only be enabled or - disabled.
Syntax: - - - - +name # enable action name - -name # disable action name - - - Example: +handle-as-image - - - - - - - Parameterized, where some value is required in order to enable this type of action. - Syntax: - - - - +name{param} # enable action and set parameter to param, - # overwriting parameter from previous match if necessary - -name # disable action. The parameter can be omitted - - - Note that if the URL matches multiple positive forms of a parameterized action, - the last match wins, i.e. the params from earlier matches are simply ignored. - - - Example: +hide-user-agent{Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.8.1.4) Gecko/20070602 Firefox/2.0.0.4} - - - - - - Multi-value. These look exactly like parameterized actions, - but they behave differently: If the action applies multiple times to the - same URL, but with different parameters, all the parameters - from all matches are remembered. This is used for actions - that can be executed for the same request repeatedly, like adding multiple - headers, or filtering through multiple filters. Syntax: - - - - +name{param} # enable action and add param to the list of parameters - -name{param} # remove the parameter param from the list of parameters - # If it was the last one left, disable the action. - -name # disable this action completely and remove all parameters from the list - - - Examples: +add-header{X-Fun-Header: Some text} and - +filter{html-annoyances} - - - - - - - - If nothing is specified in any actions file, no actions are - taken. So in this case Privoxy would just be a - normal, non-blocking, non-filtering proxy. You must specifically enable the - privacy and blocking features you need (although the provided default actions - files will give a good starting point). - - - - Later defined action sections always over-ride earlier ones of the same type. 
- So exceptions to any rules you make should come in the latter part of the file (or - in a file that is processed later when using multiple actions files such - as user.action). For multi-valued actions, the actions - are applied in the order they are specified. Actions files are processed in - the order they are defined in config (the default - installation has three actions files). It is also quite possible for any given - URL to match more than one pattern (because of wildcards and - regular expressions), and thus to trigger more than one set of actions! Last - match wins. - - - - - The valid Privoxy actions are: - - - - - - - - - - - - - -add-header - - - - Typical use: - - Confuse log analysis, custom applications - - - - - Effect: - - - Sends a user defined HTTP header to the web server. - - - - - - Type: - - - Multi-value. - - - - - Parameter: - - - Any string value is possible. Validity of the defined HTTP headers is not checked. - It is recommended that you use the X- prefix - for custom headers. - - - - - - Notes: - - - This action may be specified multiple times, in order to define multiple - headers. This is rarely needed for the typical user. If you don't know what - HTTP headers are, you definitely don't need to worry about this - one. - - - Headers added by this action are not modified by other actions. - - - - - - Example usage: - - - +add-header{X-User-Tracking: sucks} - - - - - - - - - -block - - - - Typical use: - - Block ads or other unwanted content - - - - - Effect: - - - Requests for URLs to which this action applies are blocked, i.e. the - requests are trapped by &my-app; and the requested URL is never retrieved, - but is answered locally with a substitute page or image, as determined by - the handle-as-image, - set-image-blocker, and - handle-as-empty-document actions. - - - - - - - Type: - - - Parameterized. - - - - - Parameter: - - A block reason that should be given to the user.
- - - - - Notes: - - - Privoxy sends a special BLOCKED page - for requests to blocked pages. This page contains the block reason given as - parameter, a link to find out why the block action applies, and a click-through - to the blocked content (the latter only if the force feature is available and - enabled). - - - A very important exception occurs if both - block and handle-as-image, - apply to the same request: it will then be replaced by an image. If - set-image-blocker - (see below) also applies, the type of image will be determined by its parameter, - if not, the standard checkerboard pattern is sent. - - - It is important to understand this process, in order - to understand how Privoxy deals with - ads and other unwanted content. Blocking is a core feature, and one - upon which various other features depend. - - - The filter - action can perform a very similar task, by blocking - banner images and other content through rewriting the relevant URLs in the - document's HTML source, so they don't get requested in the first place. - Note that this is a totally different technique, and it's easy to confuse the two. - - - - - - Example usage (section): - - - {+block{No nasty stuff for you.}} -# Block and replace with "blocked" page - .nasty-stuff.example.com - -{+block{Doubleclick banners.} +handle-as-image} -# Block and replace with image - .ad.doubleclick.net - .ads.r.us/banners/ - -{+block{Layered ads.} +handle-as-empty-document} -# Block and then ignore - adserver.example.net/.*\.js$ - - - - - - - - - - - -change-x-forwarded-for - - - - Typical use: - - Improve privacy by not forwarding the source of the request in the HTTP headers. - - - - - Effect: - - - Deletes the X-Forwarded-For: HTTP header from the client request, - or adds a new one. - - - - - - Type: - - - Parameterized. - - - - - Parameter: - - - - block to delete the header. - - - - add to create the header (or append - the client's IP address to an already existing one). 
- - - - - - - - Notes: - - - It is safe and recommended to use block. - - - Forwarding the source address of the request may make - sense in some multi-user setups but is also a privacy risk. - - - - - Example usage: - - - +change-x-forwarded-for{block} - - - - - - - - -client-header-filter - - - - Typical use: - - - Rewrite or remove single client headers. - - - - - - Effect: - - - All client headers to which this action applies are filtered on-the-fly through - the specified regular expression based substitutions. - - - - - - Type: - - - Parameterized. - - - - - Parameter: - - - The name of a client-header filter, as defined in one of the - filter files. - - - - - - Notes: - - - Client-header filters are applied to each header on its own, not to - all at once. This makes it easier to diagnose problems, but on the downside - you can't write filters that only change header x if header y's value is z. - You can do that by using tags though. - - - Client-header filters are executed after the other header actions have finished - and use their output as input. - - - If the request URL gets changed, &my-app; will detect that and use the new - one. This can be used to rewrite the request destination behind the client's - back, for example to specify a Tor exit relay for certain requests. - - - Please refer to the filter file chapter - to learn which client-header filters are available by default, and how to - create your own. - - - - - - - Example usage (section): - - - -# Hide Tor exit notation in Host and Referer Headers -{+client-header-filter{hide-tor-exit-notation}} -/ - - - - - - - - - - - -client-header-tagger - - - - Typical use: - - - Block requests based on their headers. - - - - - - Effect: - - - Client headers to which this action applies are filtered on-the-fly through - the specified regular expression based substitutions, the result is used as - tag. - - - - - - Type: - - - Parameterized. 
- - - - - Parameter: - - - The name of a client-header tagger, as defined in one of the - filter files. - - - - - - Notes: - - - Client-header taggers are applied to each header on its own, - and as the header isn't modified, each tagger sees - the original. - - - Client-header taggers are the first actions that are executed - and their tags can be used to control every other action. - - - - - - Example usage (section): - - - -# Tag every request with the User-Agent header -{+client-header-tagger{user-agent}} -/ - -# Tagging itself doesn't change the action -# settings, sections with TAG patterns do: -# -# If it's a download agent, use a different forwarding proxy, -# show the real User-Agent and make sure resume works. -{+forward-override{forward-socks5 10.0.0.2:2222 .} \ - -hide-if-modified-since \ - -overwrite-last-modified \ - -hide-user-agent \ - -filter \ - -deanimate-gifs \ -} -TAG:^User-Agent: NetBSD-ftp/ -TAG:^User-Agent: Novell ZYPP Installer -TAG:^User-Agent: RPM APT-HTTP/ -TAG:^User-Agent: fetch libfetch/ -TAG:^User-Agent: Ubuntu APT-HTTP/ -TAG:^User-Agent: MPlayer/ - - - - - - - - - - - -content-type-overwrite - - - - Typical use: - - Stop useless download menus from popping up, or change the browser's rendering mode - - - - - Effect: - - - Replaces the Content-Type: HTTP server header. - - - - - - Type: - - - Parameterized. - - - - - Parameter: - - - Any string. - - - - - - Notes: - - - The Content-Type: HTTP server header is used by the - browser to decide what to do with the document. The value of this - header can cause the browser to open a download menu instead of - displaying the document by itself, even if the document's format is - supported by the browser. - - - The declared content type can also affect which rendering mode - the browser chooses. If XHTML is delivered as text/html, - many browsers treat it as yet another broken HTML document. 
- If it is sent as application/xml, browsers with - XHTML support will only display it, if the syntax is correct. - - - If you see a web site that proudly uses XHTML buttons, but sets - Content-Type: text/html, you can use &my-app; - to overwrite it with application/xml and validate - the web master's claim inside your XHTML-supporting browser. - If the syntax is incorrect, the browser will complain loudly. - - - You can also go the opposite direction: if your browser prints - error messages instead of rendering a document falsely declared - as XHTML, you can overwrite the content type with - text/html and have it rendered as a broken HTML document. - - - By default content-type-overwrite only replaces - Content-Type: headers that look like some kind of text. - If you want to overwrite it unconditionally, you have to combine it with - force-text-mode. - This limitation exists for a reason, think twice before circumventing it. - - - Most of the time it's easier to replace this action with a custom - server-header filter. - It allows you to activate it for every document of a certain site and it will still - only replace the content types you aimed at. - - - Of course you can apply content-type-overwrite - to a whole site and then make URL based exceptions, but it's a lot - more work to get the same precision. - - - - - - Example usage (sections): - - - # Check if www.example.net/ really uses valid XHTML -{ +content-type-overwrite{application/xml} } -www.example.net/ - -# but leave the content type unmodified if the URL looks like a style sheet -{-content-type-overwrite} -www.example.net/.*\.css$ -www.example.net/.*style - - - - - - - - - - - -crunch-client-header - - - - Typical use: - - Remove a client header Privoxy has no dedicated action for. - - - - - Effect: - - - Deletes every header sent by the client that contains the string the user supplied as parameter. - - - - - - Type: - - - Parameterized. - - - - - Parameter: - - - Any string.
- - - - - - Notes: - - - This action allows you to block client headers for which no dedicated - Privoxy action exists. - Privoxy will remove every client header that - contains the string you supplied as parameter. - - - Regular expressions are not supported and you can't - use this action to block different headers in the same request, unless - they contain the same string. - - - crunch-client-header is only meant for quick tests. - If you have to block several different headers, or only want to modify - parts of them, you should use a - client-header filter. - - - - Don't block any header without understanding the consequences. - - - - - - - Example usage (section): - - - # Block the non-existent "Privacy-Violation:" client header -{ +crunch-client-header{Privacy-Violation:} } -/ - - - - - - + + + Install Privoxy. See the Installation Section below for platform specific + information. + + - - -crunch-if-none-match - - - - Typical use: - - Prevent yet another way to track the user's steps between sessions. - - - - - Effect: - - - Deletes the If-None-Match: HTTP client header. - - - + + + Advanced users and those who want to offer Privoxy + service to more than just their local machine should check the main config file, especially the security-relevant options. These are + off by default. + + - - Type: - - - Boolean. - - + + + Start Privoxy, if the installation program has + not done this already (may vary according to platform). See the section + Starting Privoxy. + + - - Parameter: - - - N/A - - - + + + Set your browser to use Privoxy as HTTP and + HTTPS (SSL) proxy + by setting the proxy configuration for address of + 127.0.0.1 and port 8118. + DO NOT activate proxying for FTP or + any protocols besides HTTP and HTTPS (SSL) unless you intend to prevent your + browser from using these protocols. 
+ + - - Notes: - - - Removing the If-None-Match: HTTP client header - is useful for filter testing, where you want to force a real - reload instead of getting status code 304 which - would cause the browser to use a cached copy of the page. - - - It is also useful to make sure the header isn't used as a cookie - replacement (unlikely but possible). - - - Blocking the If-None-Match: header shouldn't cause any - caching problems, as long as the If-Modified-Since: header - isn't blocked or missing as well. - - - It is recommended to use this action together with - hide-if-modified-since - and - overwrite-last-modified. - - - + + + Flush your browser's disk and memory caches, to remove any cached ad images. + If using Privoxy to manage + cookies, + you should remove any currently stored cookies too. + + - - Example usage (section): - - - # Let the browser revalidate cached documents but don't -# allow the server to use the revalidation headers for user tracking. -{+hide-if-modified-since{-60} \ - +overwrite-last-modified{randomize} \ - +crunch-if-none-match} -/ - - - - - + + + A default installation should provide a reasonable starting point for + most. There will undoubtedly be occasions where you will want to adjust the + configuration, but that can be dealt with as the need arises. Little + to no initial configuration is required in most cases, you may want + to enable the + web-based action editor though. + Be sure to read the warnings first. + + + See the Configuration section for more + configuration options, and how to customize your installation. + You might also want to look at the next section for a quick + introduction to how Privoxy blocks ads and + banners. + + + + + If you experience ads that slip through, innocent images that are + blocked, or otherwise feel the need to fine-tune + Privoxy's behavior, take a look at the actions files. As a quick start, you might + find the richly commented examples + helpful. 
You can also view and edit the actions files through the web-based user interface. The + Appendix Troubleshooting: Anatomy of an + Action has hints on how to understand and debug actions that + misbehave. + + - - -crunch-incoming-cookies + + + Please see the section Contacting the + Developers on how to report bugs, problems with websites or to get + help. + + - - - Typical use: - - - Prevent the web server from setting HTTP cookies on your system - - - + + + Now enjoy surfing with enhanced control, comfort and privacy! + + - - Effect: - - - Deletes any Set-Cookie: HTTP headers from server replies. - - - + + - - Type: - - - Boolean. - - - - Parameter: - - - N/A - - - + - - Notes: - - - This action is only concerned with incoming HTTP cookies. For - outgoing HTTP cookies, use - crunch-outgoing-cookies. - Use both to disable HTTP cookies completely. - - - It makes no sense at all to use this action in conjunction - with the session-cookies-only action, - since it would prevent the session cookies from being set. See also - filter-content-cookies. - - - + +Quickstart to Ad Blocking + + + Ad blocking is but one of Privoxy's + array of features. Many of these features are for the technically minded advanced + user. But, ad and banner blocking is surely common ground for everybody. + + + This section will provide a quick summary of ad blocking so + you can get up to speed quickly without having to read the more extensive + information provided below, though this is highly recommended. + + + First a bit of a warning ... blocking ads is much like blocking SPAM: the + more aggressive you are about it, the more likely you are to block + things that were not intended. And the more likely that some things + may not work as intended. So there is a trade off here. If you want + extreme ad free browsing, be prepared to deal with more + problem sites, and to spend more time adjusting the + configuration to solve these unintended consequences. 
In short, there is + not an easy way to eliminate all ads. Either take + the easy way and settle for most ads blocked with the + default configuration, or jump in and tweak it for your personal surfing + habits and preferences. + + + Secondly, a brief explanation of Privoxy's + actions. Actions in this context, are + the directives we use to tell Privoxy to perform + some task relating to HTTP transactions (i.e. web browsing). We tell + Privoxy to take some action. Each + action has a unique name and function. While there are many potential + actions in Privoxy's + arsenal, only a few are used for ad blocking. Actions, and action + configuration files, are explained in depth below. + + + Actions are specified in Privoxy's configuration, + followed by one or more URLs to which the action should apply. URLs + can actually be URL type patterns that use + wildcards so they can apply potentially to a range of similar URLs. The + actions, together with the URL patterns are called a section. + + + When you connect to a website, the full URL will either match one or more + of the sections as defined in Privoxy's configuration, + or not. If so, then Privoxy will perform the + respective actions. If not, then nothing special happens. Furthermore, web + pages may contain embedded, secondary URLs that your web browser will + use to load additional components of the page, as it parses the + original page's HTML content. An ad image for instance, is just an URL + embedded in the page somewhere. The image itself may be on the same server, + or a server somewhere else on the Internet. Complex web pages will have many + such embedded URLs. &my-app; can deal with each URL individually, so, for + instance, the main page text is not touched, but images from such-and-such + server are blocked. 
+ - - Example usage: - - - +crunch-incoming-cookies - - - - - + + The most important actions for basic ad blocking are: block, handle-as-image, + handle-as-empty-document, and + set-image-blocker: + + + - - -crunch-server-header - - - - Typical use: - - Remove a server header Privoxy has no dedicated action for. - - + + + block - this is perhaps + the single most used action, and is particularly important for ad blocking. + This action stops any contact between your browser and any URL patterns + that match this action's configuration. It can be used for blocking ads, + but also anything that is determined to be unwanted. By itself, it simply + stops any communication with the remote server and sends + Privoxy's own built-in BLOCKED page instead to + let you know what has happened (with some exceptions, see below). + + - - Effect: - - - Deletes every header sent by the server that contains the string the user supplied as parameter. - - - + + + handle-as-image - + tells Privoxy to treat this URL as an image. + Privoxy's default configuration already does this + for all common image types (e.g. GIF), but there are many situations where this + is not so easy to determine. So we'll force it in these cases. This is particularly + important for ad blocking, since only if we know that it's an image of + some kind, can we replace it with an image of our choosing, instead of the + Privoxy BLOCKED page (which would only result in + a broken image icon). There are some limitations to this + though. For instance, you can't just brute-force an image substitution for + an entire HTML page in most situations. + + - - Type: - - - Parameterized. - - + + + handle-as-empty-document - + sends an empty document instead of Privoxy's + normal BLOCKED HTML page. This is useful for file types that are neither + HTML nor images, such as blocking JavaScript files. + + - - Parameter: - - - Any string.
- + + + set-image-blocker - tells + Privoxy what to display in place of an ad image that + has hit a block rule. For this to come into play, the URL must match a + block action somewhere in the + configuration, and, it must also match an + handle-as-image action. + + + The configuration options on what to display instead of the ad are: + + + +    pattern - a checkerboard pattern, so that an ad + replacement is obvious. This is the default. + + + + +    blank - A very small empty GIF image is displayed. + This is the so-called invisible configuration option. + + + + +    http://<URL> - A redirect to any image anywhere + of the user's choosing (advanced usage). + + - - - Notes: - - - This action allows you to block server headers for which no dedicated - Privoxy action exists. Privoxy - will remove every server header that contains the string you supplied as parameter. - - - Regular expressions are not supported and you can't - use this action to block different headers in the same request, unless - they contain the same string. - - - crunch-server-header is only meant for quick tests. - If you have to block several different headers, or only want to modify - parts of them, you should use a custom - server-header filter. - - - - Don't block any header without understanding the consequences. - - - - + + - - Example usage (section): - - - # Crunch server headers that try to prevent caching -{ +crunch-server-header{no-cache} } -/ - - - - -
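The header crunching described in the crunch-server-header notes above (remove every header that contains the supplied string, with no regular expression support) can be sketched as follows. This is a minimal illustration rather than Privoxy's code: the function name crunch_headers and the sample header list are made up for the example, and the case-sensitive comparison is an assumption.

```python
def crunch_headers(headers, needle):
    """Drop every header line that contains needle as a plain substring.

    No regular expressions are involved, mirroring the limitation stated in
    the notes. The comparison is case-sensitive, which is an assumption here.
    """
    return [line for line in headers if needle not in line]


# Hypothetical server response headers.
server_headers = [
    "Content-Type: text/html",
    "Cache-Control: no-cache",
    "Pragma: no-cache",
]

kept = crunch_headers(server_headers, "no-cache")
```

Because the test is a simple substring check, a single parameter removes both the Cache-Control and the Pragma line here, which shows how different headers are only affected together when they contain the same string.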
+ + Advanced users will eventually want to explore &my-app; + filters as well. Filters + are very different from blocks. + A block blocks a site, page, or unwanted contented. Filters + are a way of filtering or modifying what is actually on the page. An example + filter usage: a text replacement of no-no for + nasty-word. That is a very simple example. This process can be + used for ad blocking, but it is more in the realm of advanced usage and has + some pitfalls to be wary off. + + + The quickest way to adjust any of these settings is with your browser through + the special Privoxy editor at http://config.privoxy.org/show-status + (shortcut: http://p.p/show-status). This + is an internal page, and does not require Internet access. + - - -crunch-outgoing-cookies + + Note that as of Privoxy 3.0.7 beta the + action editor is disabled by default. Check the + enable-edit-actions + section in the configuration file to learn why and in which + cases it's safe to enable again. + - - - Typical use: - - - Prevent the web server from reading any HTTP cookies from your system - - - + + If you decided to enable the action editor, select the appropriate + actions file, and click + Edit. It is best to put personal or + local preferences in user.action since this is not + meant to be overwritten during upgrades, and will over-ride the settings in + other files. Here you can insert new actions, and URLs for ad + blocking or other purposes, and make other adjustments to the configuration. + Privoxy will detect these changes automatically. + - - Effect: - - - Deletes any Cookie: HTTP headers from client requests. - - - + + A quick and simple step by step example: + - - Type: - - - Boolean. - - + + - - Parameter: - N/A + Right click on the ad image to be blocked, then select + Copy Link Location from the + pop-up menu. - - - - Notes: - This action is only concerned with outgoing HTTP cookies. For - incoming HTTP cookies, use - crunch-incoming-cookies. 
- Use both to disable HTTP cookies completely. - - - It makes no sense at all to use this action in conjunction - with the session-cookies-only action, - since it would prevent the session cookies from being read. + Set your browser to + http://config.privoxy.org/show-status - - - - Example usage: - +crunch-outgoing-cookies + Find user.action in the top section, and click + on Edit: - - - - + + +
Actions Files in Use + + + + + + [ Screenshot of Actions Files in Use ] + + +
+
+ + + + You should have a section with only + block listed under + Actions:. + If not, click a Insert new section below + button, and in the new section that just appeared, click the + Edit button right under the word Actions:. + This will bring up a list of all actions. Find + block near the top, and click + in the Enabled column, then Submit + just below the list. + + + + + Now, in the block actions section, + click the Add button, and paste the URL the + browser got from Copy Link Location. + Remove the http:// at the beginning of the URL. Then, click + Submit (or + OK if in a pop-up window). + + + + + Now go back to the original page, and press SHIFT-Reload + (or flush all browser caches). The image should be gone now. + + - - -deanimate-gifs + + - - - Typical use: - - Stop those annoying, distracting animated GIF images. - - + + This is a very crude and simple example. There might be good reasons to use a + wildcard pattern match to include potentially similar images from the same + site. For a more extensive explanation of patterns, and + the entire actions concept, see the Actions + section. + - - Effect: - - - De-animate GIF animations, i.e. reduce them to their first or last image. - - - + + For advanced users who want to hand edit their config files, you might want + to now go to the Actions Files Tutorial. + The ideas explained therein also apply to the web-based editor. + + + There are also various + filters that can be used for ad blocking + (filters are a special subset of actions). These + fall into the advanced usage category, and are explained in + depth in later sections. + - - Type: - - - Parameterized. - - + - - Parameter: - - - last or first - - - + - - Notes: - - - This will also shrink the images considerably (in bytes, not pixels!). If - the option first is given, the first frame of the animation - is used as the replacement. 
If last is given, the last - frame of the animation is used instead, which probably makes more sense for - most banner animations, but also has the risk of not showing the entire - last frame (if it is only a delta to an earlier frame). - - - You can safely use this action with patterns that will also match non-GIF - objects, because no attempt will be made at anything that doesn't look like - a GIF. - - - + - - Example usage: - - - +deanimate-gifs{last} - - - - - - -downgrade-http-version + +Starting Privoxy + + Before launching Privoxy for the first time, you + will want to configure your browser(s) to use + Privoxy as a HTTP and HTTPS (SSL) + proxy. The default is + 127.0.0.1 (or localhost) for the proxy address, and port 8118 (earlier versions + used port 8000). This is the one configuration step that must be done +! + + + Please note that Privoxy can only proxy HTTP and + HTTPS traffic. It will not work with FTP or other protocols. + - - - Typical use: - - Work around (very rare) problems with HTTP/1.1 - - + + +
Proxy Configuration Showing + Mozilla/Netscape HTTP and HTTPS (SSL) Settings + + + + + + [ Screenshot of Mozilla Proxy Configuration ] + + +
+
- - Effect: - - - Downgrades HTTP/1.1 client requests and server replies to HTTP/1.0. - - - - - Type: - - - Boolean. - - + + With Firefox, this is typically set under: + + + + Tools -> Options -> Advanced -> Network ->Connection -> Settings + + + + + Or optionally on some platforms: + + + + Edit -> Preferences -> General -> Connection Settings -> Manual Proxy Configuration + + + - - Parameter: - - - N/A - - - + + With Netscape (and + Mozilla), this can be set under: + - - Notes: - - - This is a left-over from the time when Privoxy - didn't support important HTTP/1.1 features well. It is left here for the - unlikely case that you experience HTTP/1.1-related problems with some server - out there. - - - Note that enabling this action is only a workaround. It should not - be enabled for sites that work without it. While it shouldn't break - any pages, it has an (usually negative) performance impact. - - - If you come across a site where enabling this action helps, please report it, - so the cause of the problem can be analyzed. If the problem turns out to be - caused by a bug in Privoxy it should be - fixed so the following release works without the work around. - - - - - Example usage (section): - - - {+downgrade-http-version} -problem-host.example.com - - - + + + + Edit -> Preferences -> Advanced -> Proxies -> HTTP Proxy -
-
+ - - -fast-redirects + + For Internet Explorer v.5-7: + - - - Typical use: - - Fool some click-tracking scripts and speed up indirect links. - - + + Tools -> Internet Options -> Connections -> LAN Settings + - - Effect: - - - Detects redirection URLs and redirects the browser without contacting - the redirection server first. - - - + + Then, check Use Proxy and fill in the appropriate info + (Address: 127.0.0.1, Port: 8118). Include HTTPS (SSL), if you want HTTPS + proxy support too (sometimes labeled Secure). Make sure any + checkboxes like Use the same proxy server for all protocols are + UNCHECKED. You want only HTTP and HTTPS (SSL)! + - - Type: - - - Parameterized. - - + + +
Proxy Configuration Showing + Internet Explorer HTTP and HTTPS (Secure) Settings + + + + + + [ Screenshot of IE Proxy Configuration ] + + +
+
- - Parameter: - - - - - simple-check to just search for the string http:// - to detect redirection URLs. - - - - - check-decoded-url to decode URLs (if necessary) before searching - for redirection URLs. - - - - - - - Notes: - - - Many sites, like yahoo.com, don't just link to other sites. Instead, they - will link to some script on their own servers, giving the destination as a - parameter, which will then redirect you to the final target. URLs - resulting from this scheme typically look like: - http://www.example.org/click-tracker.cgi?target=http%3a//www.example.net/. - - - Sometimes, there are even multiple consecutive redirects encoded in the - URL. These redirections via scripts make your web browsing more traceable, - since the server from which you follow such a link can see where you go - to. Apart from that, valuable bandwidth and time is wasted, while your - browser asks the server for one redirect after the other. Plus, it feeds - the advertisers. - - - This feature is currently not very smart and is scheduled for improvement. - If it is enabled by default, you will have to create some exceptions to - this action. It can lead to failures in several ways: - - - Not every URLs with other URLs as parameters is evil. - Some sites offer a real service that requires this information to work. - For example a validation service needs to know, which document to validate. - fast-redirects assumes that every URL parameter that - looks like another URL is a redirection target, and will always redirect to - the last one. Most of the time the assumption is correct, but if it isn't, - the user gets redirected anyway. - - - Another failure occurs if the URL contains other parameters after the URL parameter. - The URL: - http://www.example.org/?redirect=http%3a//www.example.net/&foo=bar. - contains the redirection URL http://www.example.net/, - followed by another parameter. fast-redirects doesn't know that - and will cause a redirect to http://www.example.net/&foo=bar. 
- Depending on the target server configuration, the parameter will be silently ignored - or lead to a page not found error. You can prevent this problem by - first using the redirect action - to remove the last part of the URL, but it requires a little effort. - - - To detect a redirection URL, fast-redirects only - looks for the string http://, either in plain text - (invalid but often used) or encoded as http%3a//. - Some sites use their own URL encoding scheme, encrypt the address - of the target server or replace it with a database id. In theses cases - fast-redirects is fooled and the request reaches the - redirection server where it probably gets logged. - - - + + After doing this, flush your browser's disk and memory caches to force a + re-reading of all pages and to get rid of any ads that may be cached. Remove + any cookies, + if you want Privoxy to manage that. You are now + ready to start enjoying the benefits of using + Privoxy! + - - Example usage: - - - - { +fast-redirects{simple-check} } - one.example.com + + Privoxy itself is typically started by specifying the + main configuration file to be used on the command line. If no configuration + file is specified on the command line, Privoxy + will look for a file named config in the current + directory (except on Win32, where it will try config.txt). + - { +fast-redirects{check-decoded-url} } - another.example.com/testing - - - + +Debian + + We use a script. Note that Debian typically starts &my-app; upon booting by + default. It will use the file + /etc/privoxy/config as its main configuration + file. + + + + # /etc/init.d/privoxy start + + + -
-
+ +FreeBSD and ElectroBSD + + To start Privoxy upon booting, add + "privoxy_enable='YES'" to /etc/rc.conf. + Privoxy will use + /usr/local/etc/privoxy/config as its main + configuration file. + + + If you installed Privoxy into a jail, the + paths above are relative to the jail root. + + + To start Privoxy manually, run: + + + + # service privoxy onestart + + + + +Windows + +Click on the &my-app; icon to start Privoxy. If no configuration file is + specified on the command line, Privoxy will look + for a file named config.txt. Note that Windows will + automatically start &my-app; when the system starts if you chose that option + when installing. + + + Privoxy can run with full Windows service functionality. + On Windows only, the &my-app; program has two new command line arguments + to install and uninstall &my-app; as a service. See the + Windows Installation + instructions for details. + + - - -filter + +Generic instructions for Unix derivatives (Solaris, NetBSD, HP-UX, etc.) + +Example Unix startup command: + + + + # /usr/sbin/privoxy --user privoxy /etc/privoxy/config + + + + Note that if you installed Privoxy through + a package manager, the package will probably contain a platform-specific + script or configuration file to start Privoxy + upon boot. + + - - - Typical use: - - Get rid of HTML and JavaScript annoyances, banner advertisements (by size), - do fun text replacements, add personalized effects, etc. - - + +OS/2 + + During installation, Privoxy is configured to + start automatically when the system restarts. You can start it manually by + double-clicking on the Privoxy icon in the + Privoxy folder. + + - - Effect: - - - All instances of text-based type, most notably HTML and JavaScript, to which - this action applies, can be filtered on-the-fly through the specified regular - expression based substitutions.
(Note: as of version 3.0.3 plain text documents - are exempted from filtering, because web servers often use the - text/plain MIME type for all files whose type they don't know.) - - - + +Mac OS X + + The privoxy service will automatically start after a successful installation + (and thereafter every time your computer starts up) however you will need to + configure your web browser(s) to use it. To do so, configure them to use a + proxy for HTTP and HTTPS at the address 127.0.0.1:8118. + + + To prevent the privoxy service from automatically starting when your computer + starts up, remove or rename the file /Library/LaunchDaemons/org.ijbswa.privoxy.plist + (on OS X 10.5 and higher) or the folder named + /Library/StartupItems/Privoxy (on OS X 10.4 'Tiger'). + + + To manually start or stop the privoxy service, use the scripts startPrivoxy.sh + and stopPrivoxy.sh supplied in /Applications/Privoxy. They must be run from an + administrator account, using sudo. + + - - Type: - - - Parameterized. - - - - Parameter: - - - The name of a content filter, as defined in the filter file. - Filters can be defined in one or more files as defined by the - filterfile - option in the config file. - default.filter is the collection of filters - supplied by the developers. Locally defined filters should go - in their own file, such as user.filter. - - - When used in its negative form, - and without parameters, all filtering is completely disabled. - - - + - -force-text-mode - - - - Typical use: - - Force Privoxy to treat a document as if it was in some kind of text format. - - + + You will probably want to keep an eye out for sites for which you may prefer + persistent cookies, and add these to your actions configuration as needed. By + default, most of these will be accepted only during the current browser + session (aka session cookies), unless you add them to the + configuration. 
If you want the browser to handle this instead, you will need + to edit user.action (or through the web based interface) + and disable this feature. If you use more than one browser, it would make + more sense to let Privoxy handle this. In which + case, the browser(s) should be set to accept all cookies. + - - Effect: - - - Declares a document as text, even if the Content-Type: isn't detected as such. - - - + + Another feature where you will probably want to define exceptions for trusted + sites is the popup-killing (through +filter{popups}), + because your favorite shopping, banking, or leisure site may need + popups (explained below). + - - Type: - - - Boolean. - - + + Privoxy does not support all of the optional HTTP/1.1 + features yet. In the unlikely event that you experience inexplicable problems + with browsers that use HTTP/1.1 per default + (like Mozilla or recent versions of I.E.), you might + try to force HTTP/1.0 compatibility. For Mozilla, look under Edit -> + Preferences -> Debug -> Networking. + Alternatively, set the +downgrade-http-version config option in + default.action which will downgrade your browser's HTTP + requests from HTTP/1.1 to HTTP/1.0 before processing them. + - - Parameter: - - - N/A - - - + + After running Privoxy for a while, you can + start to fine tune the configuration to suit your personal, or site, + preferences and requirements. There are many, many aspects that can + be customized. Actions + can be adjusted by pointing your browser to + http://config.privoxy.org/ + (shortcut: http://p.p/), + and then follow the link to View & Change the Current Configuration. + (This is an internal page and does not require Internet access.) + - - Notes: - - - As explained above, - Privoxy tries to only filter files that are - in some kind of text format. The same restrictions apply to - content-type-overwrite. - force-text-mode declares a document as text, - without looking at the Content-Type: first. 
- - - - Think twice before activating this action. Filtering binary data - with regular expressions can cause file damage. - - - - + + In fact, various aspects of Privoxy + configuration can be viewed from this page, including + current configuration parameters, source code version numbers, + the browser's request headers, and actions that apply + to a given URL. In addition to the actions file + editor mentioned above, Privoxy can also + be turned on and off (toggled) from this page. + - - Example usage: - - - -+force-text-mode - - - - - - + + If you encounter problems, try loading the page without + Privoxy. If that helps, enter the URL where + you have the problems into the browser + based rule tracing utility. See which rules apply and why, and + then try turning them off for that site one after the other, until the problem + is gone. When you have found the culprit, you might want to turn the rest on + again. + + + + If the above paragraph sounds gibberish to you, you might want to read more about the actions concept + or even dive deep into the Appendix + on actions. + + + + If you can't get rid of the problem at all, think you've found a bug in + Privoxy, want to propose a new feature or smarter rules, please see the + section Contacting the + Developers below. + + +--> + + + +Command Line Options + + Privoxy may be invoked with the following + command-line options: + + + + + + + + --config-test + + + Exit after loading the configuration files before binding to + the listen address. The exit code signals whether or not the + configuration files have been successfully loaded. + + + If the exit code is 1, at least one of the configuration files + is invalid, if it is 0, all the configuration files have been + successfully loaded (but may still contain errors that can + currently only be detected at run time). + + + This option doesn't affect the log setting, combination with + --no-daemon is recommended if a configured + log file shouldn't be used. 
+ + + + + --version + + + Print version info and exit. Unix only. + + + + + --help + + + Print short usage info and exit. Unix only. + + + + + --no-daemon + + + Don't become a daemon, i.e. don't fork and become process group + leader, and don't detach from controlling tty. Unix only. + + + + + --pidfile FILE + + + On startup, write the process ID to FILE. Delete the + FILE on exit. Failure to create or delete the + FILE is non-fatal. If no FILE + option is given, no PID file will be used. Unix only. + + + + + --user USER[.GROUP] + + + After (optionally) writing the PID file, assume the user ID of + USER, and if included the GID of GROUP. Exit if the + privileges are not sufficient to do so. Unix only. + + + + + --chroot + + + Before changing to the user ID given in the --user option, + chroot to that user's home directory, i.e. make the kernel pretend to the &my-app; + process that the directory tree starts there. If set up carefully, this can limit + the impact of possible vulnerabilities in &my-app; to the files contained in that hierarchy. + Unix only. + + + + + --pre-chroot-nslookup hostname + + + Specifies a hostname (for example www.privoxy.org) to look up before doing a chroot. + On some systems, initializing the resolver library involves reading config files from + /etc and/or loading additional shared libraries from /lib. + On these systems, doing a hostname lookup before the chroot reduces + the number of files that must be copied into the chroot tree. + + + For fastest startup speed, a good value is a hostname that is not in /etc/hosts but that + your local name server (listed in /etc/resolv.conf) can resolve without recursion + (that is, without having to ask any other name servers). The hostname need not exist, + but if it doesn't, an error message (which can be ignored) will be output. 
+ + + + + + configfile + + + If no configfile is included on the command line, + Privoxy will look for a file named + config in the current directory (except on Win32 + where it will look for config.txt instead). Specify + full path to avoid confusion. If no config file is found, + Privoxy will fail to start. + + + + - - -forward-override - - - - Typical use: - - Change the forwarding settings based on User-Agent or request origin - - + + On MS Windows only there are two additional + command-line options to allow Privoxy to install and + run as a service. See the +Window Installation section +for details. + - - Effect: - - - Overrules the forward directives in the configuration file. - - - + - - Type: - - - Multi-value. - - + - - Parameter: - - - - forward . to use a direct connection without any additional proxies. - - - - forward 127.0.0.1:8123 to use the HTTP proxy listening at 127.0.0.1 port 8123. - - - - - forward-socks4a 127.0.0.1:9050 . to use the socks4a proxy listening at - 127.0.0.1 port 9050. Replace forward-socks4a with forward-socks4 - to use a socks4 connection (with local DNS resolution) instead, use forward-socks5 - for socks5 connections (with remote DNS resolution). - - - - - forward-socks4a 127.0.0.1:9050 proxy.example.org:8000 to use the socks4a proxy - listening at 127.0.0.1 port 9050 to reach the HTTP proxy listening at proxy.example.org port 8000. - Replace forward-socks4a with forward-socks4 to use a socks4 connection - (with local DNS resolution) instead, use forward-socks5 - for socks5 connections (with remote DNS resolution). - - - - - + - - Notes: - - - This action takes parameters similar to the - forward directives in the configuration - file, but without the URL pattern. It can be used as replacement, but normally it's only - used in cases where matching based on the request URL isn't sufficient. - - - - Please read the description for the forward directives before - using this action. 
Forwarding to the wrong people will reduce your privacy and increase the - chances of man-in-the-middle attacks. - - - If the ports are missing or invalid, default values will be used. This might change - in the future and you shouldn't rely on it. Otherwise incorrect syntax causes Privoxy - to exit. - - - Use the show-url-info CGI page - to verify that your forward settings do what you thought the do. - - - - - - Example usage: - - - -# Always use direct connections for requests previously tagged as -# User-Agent: fetch libfetch/2.0 and make sure -# resuming downloads continues to work. -# This way you can continue to use Tor for your normal browsing, -# without overloading the Tor network with your FreeBSD ports updates -# or downloads of bigger files like ISOs. -# Note that HTTP headers are easy to fake and therefore their -# values are as (un)trustworthy as your clients and users. -{+forward-override{forward .} \ - -hide-if-modified-since \ - -overwrite-last-modified \ -} -TAG:^User-Agent: fetch libfetch/2\.0$ - - - - - - + +Privoxy Configuration + + All Privoxy configuration is stored + in text files. These files can be edited with a text editor. + Many important aspects of Privoxy can + also be controlled easily with a web browser. + - -handle-as-empty-document - - - - Typical use: - - Mark URLs that should be replaced by empty documents if they get blocked - - - - Effect: - - - This action alone doesn't do anything noticeable. It just marks URLs. - If the block action also applies, - the presence or absence of this mark decides whether an HTML BLOCKED - page, or an empty document will be sent to the client as a substitute for the blocked content. - The empty document isn't literally empty, but actually contains a single space. 
- - - + +Controlling Privoxy with Your Web Browser + + Privoxy's user interface can be reached through the special + URL http://config.privoxy.org/ + (shortcut: http://p.p/), + which is a built-in page and works without Internet access. + You will see the following section: - - Type: - - - Boolean. - - + - - Parameter: - - - N/A - - - + + + +     Privoxy Menu + + + +         ▪  View & change the current configuration + + +         ▪  View the source code version numbers + + +         ▪  View the request headers. + + +         ▪  Look up which actions apply to a URL and why + + +         ▪  Toggle Privoxy on or off + + +         ▪  Documentation + + + + + + + + This should be self-explanatory. Note the first item leads to an editor for the + actions files, which is where the ad, banner, + cookie, and URL blocking magic is configured as well as other advanced features of + Privoxy. This is an easy way to adjust various + aspects of Privoxy configuration. The actions + file, and other configuration files, are explained in detail below. + + + + Toggle Privoxy On or Off is handy for sites that might + have problems with your current actions and filters. You can in fact use + it as a test to see whether it is Privoxy + causing the problem or not. Privoxy continues + to run as a proxy in this case, but all manipulation is disabled, i.e. + Privoxy acts like a normal forwarding proxy. + + + + Note that several of the features described above are disabled by default + in Privoxy 3.0.7 beta and later. + Check the + configuration file to learn why + and in which cases it's safe to enable them again. + + + + + - - Notes: - - - Some browsers complain about syntax errors if JavaScript documents - are blocked with Privoxy's - default HTML page; this option can be used to silence them. - And of course this action can also be used to eliminate the &my-app; - BLOCKED message in frames. 
- - - The content type for the empty document can be specified with - content-type-overwrite{}, - but usually this isn't necessary. - - - - - Example usage: - - - # Block all documents on example.org that end with ".js", -# but send an empty document instead of the usual HTML message. -{+block{Blocked JavaScript} +handle-as-empty-document} -example.org/.*\.js$ - - - - - - - -handle-as-image - - - Typical use: - - Mark URLs as belonging to images (so they'll be replaced by images if they do get blocked, rather than HTML pages) - - + +Configuration Files Overview + + For Unix, *BSD and Linux, all configuration files are located in + /etc/privoxy/ by default. For MS Windows, OS/2, and + AmigaOS these are all in the same directory as the + Privoxy executable. + - - Effect: - - - This action alone doesn't do anything noticeable. It just marks URLs as images. - If the block action also applies, - the presence or absence of this mark decides whether an HTML blocked - page, or a replacement image (as determined by the set-image-blocker action) will be sent to the - client as a substitute for the blocked content. - - - + + The installed defaults provide a reasonable starting point, though + some settings may be aggressive by some standards. For the time being, the + principle configuration files are: + - - Type: - - - Boolean. - - + + - - Parameter: - N/A + The main configuration file is named config + on Linux, Unix, BSD, OS/2, and AmigaOS and config.txt + on Windows. This is a required file. - - - Notes: - The below generic example section is actually part of default.action. - It marks all URLs with well-known image file name extensions as images and should - be left intact. + match-all.action is used to define which actions + relating to banner-blocking, images, pop-ups, content modification, cookie handling + etc should be applied by default. It should be the first actions file loaded. 
- Users will probably only want to use the handle-as-image action in conjunction with - block, to block sources of banners, whose URLs don't - reflect the file type, like in the second example section. + default.action defines many exceptions (both positive and negative) + from the default set of actions that's configured in match-all.action. + It should be the second actions file loaded and shouldn't be edited by the user. - Note that you cannot treat HTML pages as images in most cases. For instance, (in-line) ad - frames require an HTML page to be sent, or they won't display properly. - Forcing handle-as-image in this situation will not replace the - ad frame with an image, but lead to error messages. + Multiple actions files may be defined in config. These + are processed in the order they are defined. Local customizations and locally + preferred exceptions to the default policies as defined in + match-all.action (which you will most probably want + to define sooner or later) are best applied in user.action, + where you can preserve them across upgrades. The file isn't installed by all + installers, but you can easily create it yourself with a text editor. + + + There is also a web based editor that can be accessed from + http://config.privoxy.org/show-status + (Shortcut: http://p.p/show-status) for the + various actions files. - - - Example usage (sections): - # Generic image extensions: -# -{+handle-as-image} -/.*\.(gif|jpg|jpeg|png|bmp|ico)$ - -# These don't look like images, but they're banners and should be -# blocked as images: -# -{+block{Nasty banners.} +handle-as-image} -nasty-banner-server.example.com/junk.cgi\?output=trash - + Filter files (the filter + file) can be used to re-write the raw page content, including + viewable text as well as embedded HTML and JavaScript, and whatever else + lurks on any given web page. The filtering jobs are only pre-defined here; + whether to apply them or not is up to the actions files. 
+ default.filter includes various filters made + available for use by the developers. Some are much more intrusive than + others, and all should be used with caution. You may define additional + filter files in config as you can with + actions files. We suggest user.filter for any + locally defined filters or customizations. - - - + + - - -hide-accept-language - - - - Typical use: - - Pretend to use different language settings. - - + + The syntax of the configuration and filter files may change between different + Privoxy versions; unfortunately, some enhancements cost backwards compatibility. + + - - Effect: - - - Deletes or replaces the Accept-Language: HTTP header in client requests. - - - + + All files use the # character to denote a + comment (the rest of the line will be ignored) and understand line continuation + through placing a backslash ("\") as the very last character + in a line. If the # is preceded by a backslash, it loses + its special function. Placing a # in front of an otherwise + valid configuration line to prevent it from being interpreted is called "commenting + out" that line. Blank lines are ignored. + - - Type: - - - Parameterized. - - + + The actions files and filter files + can use Perl-style regular expressions for + maximum flexibility. + - - Parameter: - - - Keyword: block, or any user defined value. - - - + + After making any changes, there is no need to restart + Privoxy in order for the changes to take + effect. Privoxy detects such changes + automatically. Note, however, that it may take one or two additional + requests for the change to take effect. When changing the listening address + of Privoxy, these wake-up requests + must obviously be sent to the old listening address. + - - Notes: - - - Faking the browser's language settings can be useful to make a - foreign User-Agent set with - hide-user-agent - more believable.
- - - However some sites with content in different languages check the - Accept-Language: to decide which one to take by default. - Sometimes it isn't possible to later switch to another language without - changing the Accept-Language: header first. - - - Therefore it's a good idea to either only change the - Accept-Language: header to languages you understand, - or to languages that aren't wide spread. - - - Before setting the Accept-Language: header - to a rare language, you should consider that it helps to - make your requests unique and thus easier to trace. - If you don't plan to change this header frequently, - you should stick to a common language. - - - + + While under development, the configuration content is subject to change. + The below documentation may not be accurate by the time you read this. + Also, what constitutes a default setting, may change, so + please check all your configuration files on important issues. + +]]> - - Example usage (section): - - - # Pretend to use Canadian language settings. -{+hide-accept-language{en-ca} \ -+hide-user-agent{Mozilla/5.0 (X11; U; OpenBSD i386; en-CA; rv:1.8.0.4) Gecko/20060628 Firefox/1.5.0.4} \ -} -/ - - - - - + + + + + + + + + + + &config; + + + + + + + + + +Actions Files - - -hide-content-disposition - - - Typical use: - - Prevent download menus for content you prefer to view inside the browser. - - - - - Effect: + + The actions files are used to define what actions + Privoxy takes for which URLs, and thus determines + how ad images, cookies and various other aspects of HTTP content and + transactions are handled, and on which sites (or even parts thereof). + There are a number of such actions, with a wide range of functionality. + Each action does something a little different. + These actions give us a veritable arsenal of tools with which to exert + our control, preferences and independence. Actions can be combined so that + their effects are aggregated when applied against a given set of URLs. 
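The aggregation described above can be sketched with two sections in an actions file. This is only an illustration: the host ads.example.com and the block reason are made up, and the pattern syntax is the one this manual describes for URL patterns.

```
# First section: block everything on the (hypothetical) ad server.
{ +block{Example ad server.} }
ads.example.com

# Later section: additionally treat its GIFs as images.
{ +handle-as-image }
ads.example.com/.*\.gif$
```

A request for ads.example.com/banner.gif matches both sections. Because the two headings set different actions, neither overrides the other and both +block and +handle-as-image end up applying to that request.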
+ + + There + are three actions files included with Privoxy with + differing purposes: + + + - Deletes or replaces the Content-Disposition: HTTP header set by some servers. + match-all.action - is used to define which + actions relating to banner-blocking, images, pop-ups, + content modification, cookie handling, etc. should be applied by default. + It should be the first actions file loaded. - - - - Type: - - Parameterized. + + default.action - defines many exceptions (both + positive and negative) from the default set of actions that's configured + in match-all.action. It is a set of rules that should + work reasonably well as-is for most users. This file is only supposed to + be edited by the developers. It should be the second actions file loaded. + - - - - Parameter: - Keyword: block, or any user defined value. + user.action - is intended to be for local site + preferences and exceptions. As an example, if your ISP or your bank + has specific requirements that need special handling, this kind of + thing should go here. This file is not replaced during upgrades. - - - - Notes: - Some servers set the Content-Disposition: HTTP header for - documents they assume you want to save locally before viewing them. - The Content-Disposition: header contains the file name - the browser is supposed to use by default. + Edit Set to Cautious Set to Medium Set to Advanced - In most browsers that understand this header, it makes it impossible to - just view the document, without downloading it first, - even if it's just a simple text file or an image. + These have increasing levels of aggressiveness and have no + influence on your browsing unless you select them explicitly in the + editor. A default installation should be pre-set to + Cautious. New users should try this for a while before + adjusting the settings to more aggressive levels. The more aggressive + the settings, the more likely problems become, such as sites + not working as they should. 
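The load order of the three files listed above might be expressed in the main configuration file roughly as follows. This is a sketch that assumes the actionsfile directive of the main configuration file and the three file names from the list; the location of the files and the exact comments in a shipped configuration may differ.

```
# Loaded first: the defaults that apply to all sites.
actionsfile match-all.action
# Loaded second: the developers' exceptions to those defaults.
actionsfile default.action
# Loaded last: local preferences, which therefore have the last word.
actionsfile user.action
```

Since files are processed in the order they are listed, keeping user.action at the end is what lets local rules override the distributed ones.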
- Removing the Content-Disposition: header helps - to prevent this annoyance, but some browsers additionally check the - Content-Type: header, before they decide if they can - display a document without saving it first. In these cases, you have - to change this header as well, before the browser stops displaying - download menus. + The Edit button allows you to turn each + action on/off individually for fine-tuning. The Cautious + button changes the actions list to low/safe settings which will activate + ad blocking and a minimal set of &my-app;'s features, and subsequently + there will be less of a chance for accidental problems. The + Medium button sets the list to a medium level of + other features and a low level set of privacy features. The + Advanced button sets the list to a high level of + ad blocking and medium level of privacy. See the chart below. The latter + three buttons override any changes made with the + Edit button. More fine-tuning can be done in the + lower sections of this internal page. - It is also possible to change the server's file name suggestion - to another one, but in most cases it isn't worth the time to set - it up. + While the actions file editor allows you to enable these settings in all + actions files, they are only supposed to be enabled in the first one + to make sure you don't unintentionally overrule earlier rules. - This action will probably be removed in the future, - use server-header filters instead. 
+ The default profiles, and their associated actions, as pre-defined in + default.action are: + + Default Configurations + + + + + + + + Feature + Cautious + Medium + Advanced + + + + + + + + + + + + + + Ad-blocking Aggressiveness + medium + high + high + + + + Ad-filtering by size + no + yes + yes + + + + Ad-filtering by link + no + no + yes + + + Pop-up killing + blocks only + blocks only + blocks only + + + + Privacy Features + low + medium + medium/high + + + + Cookie handling + none + session-only + kill + + + + Referer forging + no + yes + yes + + + + GIF de-animation + no + yes + yes + + + + Fast redirects + no + no + yes + + + + HTML taming + no + no + yes + + + + JavaScript taming + no + no + yes + + + + Web-bug killing + no + yes + yes + + + + Image tag reordering + no + yes + yes + + + + +
+
+
-
+ + + + + The list of actions files to be used are defined in the main configuration + file, and are processed in the order they are defined (e.g. + default.action is typically processed before + user.action). The content of these can all be viewed and + edited from http://config.privoxy.org/show-status. + The over-riding principle when applying actions, is that the last action that + matches a given URL wins. The broadest, most general rules go first + (defined in default.action), + followed by any exceptions (typically also in + default.action), which are then followed lastly by any + local preferences (typically in user.action). + Generally, user.action has the last word. + + + + An actions file typically has multiple sections. If you want to use + aliases in an actions file, you have to place the (optional) + alias section at the top of that file. + Then comes the default set of rules which will apply universally to all + sites and pages (be very careful with using such a + universal set in user.action or any other actions file after + default.action, because it will override the result + from consulting any previous file). And then below that, + exceptions to the defined universal policies. You can regard + user.action as an appendix to default.action, + with the advantage that it is a separate file, which makes preserving your + personal settings across Privoxy upgrades easier. + - - Example usage: - - - # Disarm the download link in Sourceforge's patch tracker -{ -filter \ - +content-type-overwrite{text/plain}\ - +hide-content-disposition{block} } - .sourceforge.net/tracker/download\.php - - - -
-
+ + Actions can be used to block anything you want, including ads, banners, or + just some obnoxious URL whose content you would rather not see. Cookies can be accepted + or rejected, or accepted only during the current browser session (i.e. not + written to disk), content can be modified, some JavaScripts tamed, user-tracking + fooled, and much more. See below for a complete list + of actions. + + + + +Finding the Right Mix + + Note that some actions, like cookie suppression + or script disabling, may render some sites unusable that rely on these + techniques to work properly. Finding the right mix of actions is not always easy and + certainly a matter of personal taste. And, things can always change, requiring + refinements in the configuration. In general, it can be said that the more + aggressive your default settings (in the top section of the + actions file) are, the more exceptions for trusted sites you + will have to make later. If, for example, you want to crunch all cookies per + default, you'll have to make exceptions from that rule for sites that you + regularly use and that require cookies for actually useful purposes, like maybe + your bank, favorite shop, or newspaper. + + + We have tried to provide you with reasonable rules to start from in the + distribution actions files. But there is no general rule of thumb on these + things. There just are too many variables, and sites are constantly changing. + Sooner or later you will want to change the rules (and read this chapter again :). + + - -hide-if-modified-since - - - - Typical use: - - Prevent yet another way to track the user's steps between sessions. - - + +How to Edit + + The easiest way to edit the actions files is with a browser by + using our browser-based editor, which can be reached from http://config.privoxy.org/show-status. + Note: the config file option enable-edit-actions must be enabled for + this to work. 
The editor allows both fine-grained control over every single + feature on a per-URL basis, and easy choosing from wholesale sets of defaults + like Cautious, Medium or + Advanced. Warning: the Advanced setting is more + aggressive, and will be more likely to cause problems for some sites. + Experienced users only! + - - Effect: - - - Deletes the If-Modified-Since: HTTP client header or modifies its value. - - - + + If you prefer plain text editing to GUIs, you can of course also directly edit + the actions files with your favorite text editor. Look at + default.action which is richly commented with many + good examples. + + - - Type: - - - Parameterized. - - - - Parameter: - - - Keyword: block, or a user defined value that specifies a range of hours. - - - + +How Actions are Applied to Requests + + Actions files are divided into sections. There are special sections, + like the alias sections which will + be discussed later. For now let's concentrate on regular sections: They have a + heading line (often split up to multiple lines for readability) which consists + of a list of actions, separated by whitespace and enclosed in curly braces. + Below that, there is a list of URL and tag patterns, each on a separate line. + - - Notes: - - - Removing this header is useful for filter testing, where you want to force a real - reload instead of getting status code 304, which would cause the - browser to use a cached copy of the page. - - - Instead of removing the header, hide-if-modified-since can - also add or subtract a random amount of time to/from the header's value. - You specify a range of minutes where the random factor should be chosen from and - Privoxy does the rest. A negative value means - subtracting, a positive value adding. - - - Randomizing the value of the If-Modified-Since: makes - it less likely that the server can use the time as a cookie replacement, - but you will run into caching problems if the random range is too high. 
- - - It is a good idea to only use a small negative value and let - overwrite-last-modified - handle the greater changes. - - - It is also recommended to use this action together with - crunch-if-none-match, - otherwise it's more or less pointless. - - - + + To determine which actions apply to a request, the URL of the request is + compared to all URL patterns in each action file. + Every time it matches, the list of applicable actions for the request is + incrementally updated, using the heading of the section in which the + pattern is located. The same is done again for tags and tag patterns later on. + - - Example usage (section): - - - # Let the browser revalidate but make tracking based on the time less likely. -{+hide-if-modified-since{-60} \ - +overwrite-last-modified{randomize} \ - +crunch-if-none-match} -/ - - - - - + + If multiple applying sections set the same action differently, + the last match wins. If not, the effects are aggregated. + E.g. a URL might match a regular section with a heading line of { + +handle-as-image }, + then later another one with just { + +block }, resulting + in both actions to apply. And there may well be + cases where you will want to combine actions together. Such a section then + might look like: + + + + + { +handle-as-image +block{Banner ads.} } + # Block these as if they were images. Send no block page. + banners.example.com + media.example.com/.*banners + .example.com/images/ads/ + + + + You can trace this process for URL patterns and any given URL by visiting http://config.privoxy.org/show-url-info. + + + Examples and more detail on this is provided in the Appendix, + Troubleshooting: Anatomy of an Action section. + + - -hide-from-header + +Patterns + + As mentioned, Privoxy uses patterns + to determine what actions might apply to which sites and + pages your browser attempts to access. These patterns use wild + card type pattern matching to achieve a high degree of + flexibility. 
This allows one expression to be expanded and potentially match + against many similar patterns. + - - - Typical use: - - Keep your (old and ill) browser from telling web servers your email address - - + + Generally, an URL pattern has the form + <host><port>/<path>, where the + <host>, the <port> + and the <path> are optional. (This is why the special + / pattern matches all URLs). Note that the protocol + portion of the URL pattern (e.g. http://) should + not be included in the pattern. This is assumed already! + + + The pattern matching syntax is different for the host and path parts of + the URL. The host part uses a simple globbing type matching technique, + while the path part uses more flexible + Regular + Expressions (POSIX 1003.2). + + + The port part of a pattern is a decimal port number preceded by a colon + (:). If the host part contains a numerical IPv6 address, + it has to be put into angle brackets + (<, >). + + - Effect: + www.example.com/ - Deletes any existing From: HTTP header, or replaces it with the - specified string. - - - - - - Type: - - - Parameterized. + is a host-only pattern and will match any request to www.example.com, + regardless of which document on that server is requested. So ALL pages in + this domain would be covered by the scope of this action. Note that a + simple example.com is different and would NOT match. + - - Parameter: + www.example.com - Keyword: block, or any user defined value. + means exactly the same. For host-only patterns, the trailing / may + be omitted. - - Notes: + www.example.com/index.html - The keyword block will completely remove the header - (not to be confused with the block - action). - - - Alternately, you can specify any value you prefer to be sent to the web - server. If you do, it is a matter of fairness not to use any address that - is actually used by a real person. - - - This action is rarely needed, as modern web browsers don't send - From: headers anymore. 
+ matches all the documents on www.example.com + whose name starts with /index.html. - - Example usage: + www.example.com/index.html$ - +hide-from-header{block} or - +hide-from-header{spam-me-senseless@sittingduck.example.com} + matches only the single document /index.html + on www.example.com. - - - - - - -hide-referrer - - - Typical use: + /index.html$ - Conceal which link you followed to get to a particular site + + matches the document /index.html, regardless of the domain, + i.e. on any web server anywhere. + - - Effect: + / - Deletes the Referer: (sic) HTTP header from the client request, - or replaces it with a forged one. + Matches any URL because there's no requirement for either the + domain or the path to match anything. - - Type: - + :8000/ - Parameterized. + + Matches any URL pointing to TCP port 8000. + - - Parameter: + 10.0.0.1/ - - - conditional-block to delete the header completely if the host has changed. - - - conditional-forge to forge the header if the host has changed. - - - block to delete the header unconditionally. - - - forge to pretend to be coming from the homepage of the server we are talking to. - - - Any other string to set a user defined referrer. - - + + Matches any URL with the host address 10.0.0.1. + - - Notes: + <2001:db8::1>/ - conditional-block is the only parameter, - that isn't easily detected in the server's log file. If it blocks the - referrer, the request will look like the visitor used a bookmark or - typed in the address directly. - - - Leaving the referrer unmodified for requests on the same host - allows the server owner to see the visitor's click path, - but in most cases she could also get that information by comparing - other parts of the log file: for example the User-Agent if it isn't - a very common one, or the user's IP address if it doesn't change between - different requests. 
- - - Always blocking the referrer, or using a custom one, can lead to - failures on servers that check the referrer before they answer any - requests, in an attempt to prevent their content from being - embedded or linked to elsewhere. - - - Both conditional-block and forge - will work with referrer checks, as long as content and valid referring page - are on the same host. Most of the time that's the case. - - - hide-referer is an alternate spelling of - hide-referrer and the two can be can be freely - substituted with each other. (referrer is the - correct English spelling, however the HTTP specification has a bug - it - requires it to be spelled as referer.) + Matches any URL with the host address 2001:db8::1. + (Note that the real URL uses square brackets, not angle brackets.) - - Example usage: + index.html - +hide-referrer{forge} or - +hide-referrer{http://www.yahoo.com/} + matches nothing, since it would be interpreted as a domain name and + there is no top-level domain called .html. So it's + a mistake. - - -hide-user-agent +The Host Pattern + + + The matching of the host part offers some flexible options: if the + host pattern starts or ends with a dot, it becomes unanchored at that end. + The host pattern is often referred to as domain pattern as it is usually + used to match domain names and not IP addresses. + For example: + - Typical use: + .example.com - Try to conceal your type of browser and client operating system + + matches any domain with first-level domain com + and second-level domain example. + For example www.example.com, + example.com and foo.bar.baz.example.com. + Note that it wouldn't match if the second-level domain was another-example. + - - Effect: + www. - Replaces the value of the User-Agent: HTTP header - in client requests with the specified value. + matches any domain that STARTS with + www. (It also matches the domain + www but most of the time that doesn't matter.) - - Type: - + .example. - Parameterized. 
+ + matches any domain that CONTAINS .example.. + And, by the way, also included would be any files or documents that exist + within that domain since no path limitations are specified. (Correctly + speaking: It matches any FQDN that contains example as + a domain.) This might be www.example.com, + news.example.de, or + www.example.net/cgi/testing.pl for instance. All these + cases are matched. + + + + + Additionally, there are wild-cards that you can use in the domain names + themselves. These work similarly to shell globbing type wild-cards: + * represents zero or more arbitrary characters (this is + equivalent to the + Regular + Expression based syntax of .*), + ? represents any single character (this is equivalent to the + regular expression syntax of a simple .), and you can define + character classes in square brackets which is similar to + the same regular expression technique. All of this can be freely mixed: + + - Parameter: + ad*.example.com - Any user-defined string. + matches adserver.example.com, + ads.example.com, etc but not sfads.example.com - - Notes: + *ad*.example.com - - - This can lead to problems on web sites that depend on looking at this header in - order to customize their content for different browsers (which, by the - way, is NOT the right thing to do: good web sites - work browser-independently). - - - Using this action in multi-user setups or wherever different types of - browsers will access the same Privoxy is - not recommended. In single-user, single-browser - setups, you might use it to delete your OS version information from - the headers, because it is an invitation to exploit known bugs for your - OS. It is also occasionally useful to forge this in order to access - sites that won't let you in otherwise (though there may be a good - reason in some cases). + matches all of the above, and then some. 
+ + + + .?pix.com + - More information on known user-agent strings can be found at - http://www.user-agents.org/ - and - http://en.wikipedia.org/wiki/User_agent. + matches www.ipix.com, + pictures.epix.com, a.b.c.d.e.upix.com etc. - + - - Example usage: + www[1-9a-ez].example.c* - +hide-user-agent{Netscape 6.1 (X11; I; Linux 2.4.18 i686)} + matches www1.example.com, + www4.example.cc, wwwd.example.cy, + wwwz.example.com etc., but not + wwww.example.com. + + + While flexible, this is not the sophistication of full regular expression based syntax. + + + + + + +The Path Pattern + + + Privoxy uses modern POSIX 1003.2 + Regular + Expressions for matching the path portion (after the slash), + and is thus more flexible. + + + + There is an Appendix with a brief quick-start into regular + expressions, you also might want to have a look at your operating system's documentation + on regular expressions (try man re_format). + + + + Note that the path pattern is automatically left-anchored at the /, + i.e. it matches as if it would start with a ^ (regular expression speak + for the beginning of a line). + - - -limit-connect + + Please also note that matching in the path is CASE INSENSITIVE + by default, but you can switch to case sensitive at any point in the pattern by using the + (?-i) switch: www.example.com/(?-i)PaTtErN.* will match + only documents whose path starts with PaTtErN in + exactly this capitalization. + - Typical use: - - Prevent abuse of Privoxy as a TCP proxy relay or disable SSL for untrusted sites - - - - - Effect: + .example.com/.* - Specifies to which ports HTTP CONNECT requests are allowable. + Is equivalent to just .example.com, since any documents + within that domain are matched with or without the .* + regular expression. This is redundant - - Type: - + .example.com/.*/index.html$ - Parameterized. + + Will match any page in the domain of example.com that is + named index.html, and that is part of some path. 
For + example, it matches www.example.com/testing/index.html but + NOT www.example.com/index.html because the regular + expression called for at least two /'s, thus the path + requirement. It also would match + www.example.com/testing/index_html, because of the + special meta-character .. + - - Parameter: + .example.com/(.*/)?index\.html$ - A comma-separated list of ports or port ranges (the latter using dashes, with the minimum - defaulting to 0 and the maximum to 65K). + This regular expression is conditional so it will match any page + named index.html regardless of path which in this case can + have one or more /'s. And this one must contain exactly + .html (but does not have to end with that!). - - Notes: + .example.com/(.*/)(ads|banners?|junk) - By default, i.e. if no limit-connect action applies, - Privoxy allows HTTP CONNECT requests to all - ports. Use limit-connect if fine-grained control - is desired for some or all destinations. + This regular expression will match any path of example.com + that contains any of the words ads, banner, + banners (because of the ?) or junk. + The path does not have to end in these words, just contain them. - - The CONNECT methods exists in HTTP to allow access to secure websites - (https:// URLs) through proxies. It works very simply: - the proxy connects to the server on the specified port, and then - short-circuits its connections to the client and to the remote server. - This means CONNECT-enabled proxies can be used as TCP relays very easily. - - - Privoxy relays HTTPS traffic without seeing - the decoded content. Websites can leverage this limitation to circumvent &my-app;'s - filters. By specifying an invalid port range you can disable HTTPS entirely. - - - Example usages: + .example.com/(.*/)(ads|banners?|junk)/.*\.(jpe?g|gif|png)$ - - - - - +limit-connect{443} # Port 443 is OK. -+limit-connect{80,443} # Ports 80 and 443 are OK. 
-+limit-connect{-3, 7, 20-100, 500-} # Ports less than 3, 7, 20 to 100 and above 500 are OK. -+limit-connect{-} # All ports are OK -+limit-connect{,} # No HTTPS/SSL traffic is allowed + + This is very much the same as above, except now it must end in either + .jpg, .jpeg, .gif or .png. So this + one is limited to common image formats. + + + There are many, many good examples to be found in default.action, + and more tutorials below in Appendix on regular expressions. + + + + + - -prevent-compression +The Request Tag Pattern - - - Typical use: - - - Ensure that servers send the content uncompressed, so it can be - passed through filters. - - - + + Request tag patterns are used to change the applying actions based on the + request's tags. Tags can be created based on HTTP headers with either + the client-header-tagger + or the server-header-tagger action. + - - Effect: - - - Removes the Accept-Encoding header which can be used to ask for compressed transfer. - - - + + Request tag patterns have to start with TAG:, so &my-app; + can tell them apart from other patterns. Everything after the colon + including white space, is interpreted as a regular expression with + path pattern syntax, except that tag patterns aren't left-anchored + automatically (&my-app; doesn't silently add a ^, + you have to do it yourself if you need it). + + + + To match all requests that are tagged with foo + your pattern line should be TAG:^foo$, + TAG:foo would work as well, but it would also + match requests whose tags contain foo somewhere. + TAG: foo wouldn't work as it requires white space. + + + + Sections can contain URL and request tag patterns at the same time, + but request tag patterns are checked after the URL patterns and thus + always overrule them, even if they are located before the URL patterns. + + + + Once a new request tag is added, Privoxy checks right away if it's matched by one + of the request tag patterns and updates the action settings accordingly. 
As a result, + request tags can be used to activate other tagger actions, as long as these other + taggers look for headers that haven't already been parsed. + + + + For example, you could tag client requests which use the + POST method, + then use this tag to activate another tagger that adds a tag if cookies + are sent, and then use a block action based on the cookie tag. This allows + the outcome of one action to be used as input to a subsequent action. However, if + you reversed the position of the described taggers, and activated the + method tagger based on the cookie tagger, no method tags would be created. + The method tagger would look for the request line, but at the time + the cookie tag is created, the request line has already been parsed. + + + + While this is a limitation you should be aware of, this kind of + indirection is seldom needed anyway and even the example doesn't + make too much sense. + + + + + +The Negative Request Tag Patterns + + + To match requests that do not have a certain request tag, specify a negative tag pattern + by prefixing the tag pattern line with either NO-REQUEST-TAG: + or NO-RESPONSE-TAG: instead of TAG:. + + + + Negative request tag patterns created with NO-REQUEST-TAG: are checked + after all client headers are scanned; the ones created with NO-RESPONSE-TAG: + are checked after all server headers are scanned. In both cases all the created + tags are considered. + + + +The Client Tag Pattern + + + + + + This is an experimental feature. The syntax is likely to change in future versions. + + + + + Client tag patterns are not set based on HTTP headers but based on + the client's IP address. Users can enable them themselves, but the + Privoxy admin controls which tags are available and what their effect + is. + + + + After a client-specific tag has been defined with the + client-specific-tag + directive, action sections can be activated based on the tag by using a + CLIENT-TAG pattern. 
The CLIENT-TAG pattern is evaluated at the same priority + as URL patterns; as a result, the last matching pattern wins. Tags that + are created based on client or server headers are evaluated later on + and can overrule CLIENT-TAG and URL patterns! + + + The tag is set for all requests that come from clients that requested + it to be set. Note that "clients" are differentiated by IP address; + if the IP address changes, the tag has to be requested again. + + + Clients can request tags to be set by using the CGI interface http://config.privoxy.org/show-client-tags. + + + + Example: + + + + +# If the admin defined the client-specific-tag circumvent-blocks, +# and the request comes from a client that previously requested +# the tag to be set, overrule all previous +block actions that +# are enabled based on URL or CLIENT-TAG patterns. +{-block} +CLIENT-TAG:^circumvent-blocks$ + +# This section is not overruled because it's located after +# the previous one. +{+block{Nobody is supposed to request this.}} +example.org/blocked-example-page + + + + + + + + + + +Actions + + All actions are disabled by default, until they are explicitly enabled + somewhere in an actions file. Actions are turned on if preceded with a + +, and turned off if preceded with a -. So a + +action means do that action, e.g. + +block means please block URLs that match the + following patterns, and -block means don't + block URLs that match the following patterns, even if +block + previously applied. + + + + + Again, actions are invoked by placing them on a line, enclosed in curly braces and + separated by whitespace, like in + {+some-action -some-other-action{some-parameter}}, + followed by a list of URL patterns, one per line, to which they apply. + Together, the actions line and the following pattern lines make up a section + of the actions file. + + + + Actions fall into three categories: + + + + + + + Boolean, i.e. the action can only be enabled or + disabled. 
Syntax: + + + + +name # enable action name + -name # disable action name + + + Example: +handle-as-image + + + + + + + Parameterized, where some value is required in order to enable this type of action. + Syntax: + + + + +name{param} # enable action and set parameter to param, + # overwriting parameter from previous match if necessary + -name # disable action. The parameter can be omitted + + + Note that if the URL matches multiple positive forms of a parameterized action, + the last match wins, i.e. the params from earlier matches are simply ignored. + + + Example: +hide-user-agent{Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.8.1.4) Gecko/20070602 Firefox/2.0.0.4} + + - - Type: - - - Boolean. - - + + + Multi-value. These look exactly like parameterized actions, + but they behave differently: If the action applies multiple times to the + same URL, but with different parameters, all the parameters + from all matches are remembered. This is used for actions + that can be executed for the same request repeatedly, like adding multiple + headers, or filtering through multiple filters. Syntax: + + + + +name{param} # enable action and add param to the list of parameters + -name{param} # remove the parameter param from the list of parameters + # If it was the last one left, disable the action. + -name # disable this action completely and remove all parameters from the list + + + Examples: +add-header{X-Fun-Header: Some text} and + +filter{html-annoyances} + + - - Parameter: - - - N/A - - - + + - - Notes: - - - More and more websites send their content compressed by default, which - is generally a good idea and saves bandwidth. But the filter and - deanimate-gifs - actions need access to the uncompressed data. - - - When compiled with zlib support (available since &my-app; 3.0.7), content that should be - filtered is decompressed on-the-fly and you don't have to worry about this action. 
- If you are using an older &my-app; version, or one that hasn't been compiled with zlib - support, this action can be used to convince the server to send the content uncompressed. - - - Most text-based instances compress very well, the size is seldom decreased by less than 50%, - for markup-heavy instances like news feeds saving more than 90% of the original size isn't - unusual. - - - Not using compression will therefore slow down the transfer, and you should only - enable this action if you really need it. As of &my-app; 3.0.7 it's disabled in all - predefined action settings. - - - Note that some (rare) ill-configured sites don't handle requests for uncompressed - documents correctly. Broken PHP applications tend to send an empty document body, - some IIS versions only send the beginning of the content. If you enable - prevent-compression per default, you might want to add - exceptions for those sites. See the example for how to do that. - - - + + If nothing is specified in any actions file, no actions are + taken. So in this case Privoxy would just be a + normal, non-blocking, non-filtering proxy. You must specifically enable the + privacy and blocking features you need (although the provided default actions + files will give a good starting point). + - - Example usage (sections): - - - -# Selectively turn off compression, and enable a filter -# -{ +filter{tiny-textforms} +prevent-compression } -# Match only these sites - .google. - sourceforge.net - sf.net + + Later defined action sections always over-ride earlier ones of the same type. + So exceptions to any rules you make, should come in the latter part of the file (or + in a file that is processed later when using multiple actions files such + as user.action). For multi-valued actions, the actions + are applied in the order they are specified. Actions files are processed in + the order they are defined in config (the default + installation has three actions files). 
It also quite possible for any given + URL to match more than one pattern (because of wildcards and + regular expressions), and thus to trigger more than one set of actions! Last + match wins. + -# Or instead, we could set a universal default: -# -{ +prevent-compression } - / # Match all sites + + + The list of valid Privoxy actions are: + -# Then maybe make exceptions for broken sites: -# -{ -prevent-compression } -.compusa.com/ - - - - - + + + + + - -overwrite-last-modified - + + +add-header + Typical use: - Prevent yet another way to track the user's steps between sessions. + Confuse log analysis, custom applications @@ -6114,16 +2599,16 @@ new action Effect: - Deletes the Last-Modified: HTTP server header or modifies its value. + Sends a user defined HTTP header to the web server. Type: - + - Parameterized. + Multi-value. @@ -6131,48 +2616,24 @@ new action Parameter: - One of the keywords: block, reset-to-request-time - and randomize + Any string value is possible. Validity of the defined HTTP headers is not checked. + It is recommended that you use the X- prefix + for custom headers. - + Notes: - Removing the Last-Modified: header is useful for filter - testing, where you want to force a real reload instead of getting status - code 304, which would cause the browser to reuse the old - version of the page. - - - The randomize option overwrites the value of the - Last-Modified: header with a randomly chosen time - between the original value and the current time. In theory the server - could send each document with a different Last-Modified: - header to track visits without using cookies. Randomize - makes it impossible and the browser can still revalidate cached documents. - - - reset-to-request-time overwrites the value of the - Last-Modified: header with the current time. You could use - this option together with - hide-if-modified-since - to further customize your random range. - - - The preferred parameter here is randomize. 
It is safe - to use, as long as the time settings are more or less correct. - If the server sets the Last-Modified: header to the time - of the request, the random range becomes zero and the value stays the same. - Therefore you should later randomize it a second time with - hided-if-modified-since, - just to be sure. + This action may be specified multiple times, in order to define multiple + headers. This is rarely needed for the typical user. If you don't know what + HTTP headers are, you definitely don't need to worry about this + one. - It is also recommended to use this action together with - crunch-if-none-match. + Headers added by this action are not modified by other actions. @@ -6181,11 +2642,7 @@ new action Example usage: - # Let the browser revalidate without being tracked across sessions -{ +hide-if-modified-since{-60} \ - +overwrite-last-modified{randomize} \ - +crunch-if-none-match} -/ + +add-header{X-User-Tracking: sucks} @@ -6194,18 +2651,14 @@ new action - -redirect - + +block + Typical use: - - Redirect requests to other sites. - + Block ads or other unwanted content @@ -6213,107 +2666,103 @@ new action Effect: - Convinces the browser that the requested document has been moved - to another location and the browser should get it from there. + Requests for URLs to which this action applies are blocked, i.e. the + requests are trapped by &my-app; and the requested URL is never retrieved, + but is answered locally with a substitute page or image, as determined by + the handle-as-image, + set-image-blocker, and + handle-as-empty-document actions. + Type: - + - Parameterized + Parameterized. Parameter: - - An absolute URL or a single pcrs command. - + A block reason that should be given to the user. - + Notes: - Requests to which this action applies are answered with a - HTTP redirect to URLs of your choosing. The new URL is - either provided as parameter, or derived by applying a - single pcrs command to the original URL. 
+ Privoxy sends a special BLOCKED page + for requests to blocked pages. This page contains the block reason given as + parameter, a link to find out why the block action applies, and a click-through + to the blocked content (the latter only if the force feature is available and + enabled). - This action will be ignored if you use it together with - block. - It can be combined with - fast-redirects{check-decoded-url} - to redirect to a decoded version of a rewritten URL. + A very important exception occurs if both + block and handle-as-image, + apply to the same request: it will then be replaced by an image. If + set-image-blocker + (see below) also applies, the type of image will be determined by its parameter, + if not, the standard checkerboard pattern is sent. - - Use this action carefully, make sure not to create redirection loops - and be aware that using your own redirects might make it - possible to fingerprint your requests. + + It is important to understand this process, in order + to understand how Privoxy deals with + ads and other unwanted content. Blocking is a core feature, and one + upon which various other features depend. - In case of problems with your redirects, or simply to watch - them working, enable debug 128. + The filter + action can perform a very similar task, by blocking + banner images and other content through rewriting the relevant URLs in the + document's HTML source, so they don't get requested in the first place. + Note that this is a totally different technique, and it's easy to confuse the two. 
- Example usages: + Example usage (section): - - # Replace example.com's style sheet with another one -{ +redirect{http://localhost/css-replacements/example.com.css} } - example.com/stylesheet\.css - -# Create a short, easy to remember nickname for a favorite site -# (relies on the browser accept and forward invalid URLs to &my-app;) -{ +redirect{http://www.privoxy.org/user-manual/actions-file.html} } - a - -# Always use the expanded view for Undeadly.org articles -# (Note the $ at the end of the URL pattern to make sure -# the request for the rewritten URL isn't redirected as well) -{+redirect{s@$@&mode=expanded@}} -undeadly.org/cgi\?action=article&sid=\d*$ - -# Redirect Google search requests to MSN -{+redirect{s@^http://[^/]*/search\?q=([^&]*).*@http://search.msn.com/results.aspx?q=$1@}} -.google.com/search + + {+block{No nasty stuff for you.}} +# Block and replace with "blocked" page + .nasty-stuff.example.com -# Redirect MSN search requests to Yahoo -{+redirect{s@^http://[^/]*/results\.aspx\?q=([^&]*).*@http://search.yahoo.com/search?p=$1@}} -search.msn.com//results\.aspx\?q= +{+block{Doubleclick banners.} +handle-as-image} +# Block and replace with image + .ad.doubleclick.net + .ads.r.us/banners/ -# Redirect remote requests for this manual -# to the local version delivered by Privoxy -{+redirect{s@^http://www@http://config@}} -www.privoxy.org/user-manual/ - +{+block{Layered ads.} +handle-as-empty-document} +# Block and then ignore + adserver.example.net/.*\.js$ + + - -server-header-filter + +change-x-forwarded-for Typical use: - - Rewrite or remove single server headers. - + Improve privacy by not forwarding the source of the request in the HTTP headers. @@ -6321,15 +2770,15 @@ www.privoxy.org/user-manual/ Effect: - All server headers to which this action applies are filtered on-the-fly - through the specified regular expression based substitutions. + Deletes the X-Forwarded-For: HTTP header from the client request, + or adds a new one. 
Type: - + Parameterized. @@ -6338,10 +2787,17 @@ www.privoxy.org/user-manual/ Parameter: - - The name of a server-header filter, as defined in one of the - filter files. - + + + block to delete the header. + + + + add to create the header (or append + the client's IP address to an already existing one). + + + @@ -6349,52 +2805,35 @@ www.privoxy.org/user-manual/ Notes: - Server-header filters are applied to each header on its own, not to - all at once. This makes it easier to diagnose problems, but on the downside - you can't write filters that only change header x if header y's value is z. - You can do that by using tags though. - - - Server-header filters are executed after the other header actions have finished - and use their output as input. + It is safe and recommended to use block. - Please refer to the filter file chapter - to learn which server-header filters are available by default, and how to - create your own. + Forwarding the source address of the request may make + sense in some multi-user setups but is also a privacy risk. - + - - Example usage (section): + Example usage: - -{+server-header-filter{html-to-xml}} -example.org/xml-instance-that-is-delivered-as-html - -{+server-header-filter{xml-to-html}} -example.org/instance-that-is-delivered-as-xml-but-is-not - - + +change-x-forwarded-for{block} + - - - -server-header-tagger + +client-header-filter Typical use: - Enable or disable filters based on the Content-Type header. + Rewrite or remove single client headers. @@ -6403,9 +2842,8 @@ example.org/instance-that-is-delivered-as-xml-but-is-not Effect: - Server headers to which this action applies are filtered on-the-fly through - the specified regular expression based substitutions, the result is used as - tag. + All client headers to which this action applies are filtered on-the-fly through + the specified regular expression based substitutions. @@ -6414,7 +2852,7 @@ example.org/instance-that-is-delivered-as-xml-but-is-not Type: - Parameterized. 
+ Multi-value. @@ -6422,7 +2860,7 @@ example.org/instance-that-is-delivered-as-xml-but-is-not Parameter: - The name of a server-header tagger, as defined in one of the + The name of a client-header filter, as defined in one of the filter files. @@ -6432,23 +2870,27 @@ example.org/instance-that-is-delivered-as-xml-but-is-not Notes: - Server-header taggers are applied to each header on its own, - and as the header isn't modified, each tagger sees - the original. + Client-header filters are applied to each header on its own, not to + all at once. This makes it easier to diagnose problems, but on the downside + you can't write filters that only change header x if header y's value is z. + You can do that by using tags though. - Server-header taggers are executed before all other header actions - that modify server headers. Their tags can be used to control - all of the other server-header actions, the content filters - and the crunch actions (redirect - and block). + Client-header filters are executed after the other header actions have finished + and use their output as input. - Obviously crunching based on tags created by server-header taggers - doesn't prevent the request from showing up in the server's log file. + If the request URI gets changed, &my-app; will detect that and use the new + one. This can be used to rewrite the request destination behind the client's + back, for example to specify a Tor exit relay for certain requests. + + + Please refer to the filter file chapter + to learn which client-header filters are available by default, and how to + create your own. 
- + @@ -6456,11 +2898,11 @@ example.org/instance-that-is-delivered-as-xml-but-is-not -# Tag every request with the content type declared by the server -{+server-header-tagger{content-type}} +# Hide Tor exit notation in Host and Referer Headers +{+client-header-filter{hide-tor-exit-notation}} / - + @@ -6469,16 +2911,15 @@ example.org/instance-that-is-delivered-as-xml-but-is-not - -session-cookies-only + +client-header-tagger Typical use: - Allow only temporary session cookies (for the current - browser session only). + Block requests based on their headers. @@ -6487,18 +2928,18 @@ example.org/instance-that-is-delivered-as-xml-but-is-not Effect: - Deletes the expires field from Set-Cookie: - server headers. Most browsers will not store such cookies permanently and - forget them in between sessions. + Client headers to which this action applies are filtered on-the-fly through + the specified regular expression based substitutions, the result is used as + tag. - + Type: - + - Boolean. + Multi-value. @@ -6506,7 +2947,8 @@ example.org/instance-that-is-delivered-as-xml-but-is-not Parameter: - N/A + The name of a client-header tagger, as defined in one of the + filter files. @@ -6515,62 +2957,78 @@ example.org/instance-that-is-delivered-as-xml-but-is-not Notes: - This is less strict than crunch-incoming-cookies / - crunch-outgoing-cookies and allows you to browse - websites that insist or rely on setting cookies, without compromising your privacy too badly. - - - Most browsers will not permanently store cookies that have been processed by - session-cookies-only and will forget about them between sessions. - This makes profiling cookies useless, but won't break sites which require cookies so - that you can log in for transactions. This is generally turned on for all - sites, and is the recommended setting. - - - It makes no sense at all to use session-cookies-only - together with crunch-incoming-cookies or - crunch-outgoing-cookies. 
If you do, cookies - will be plainly killed. - - - Note that it is up to the browser how it handles such cookies without an expires - field. If you use an exotic browser, you might want to try it out to be sure. - - - This setting also has no effect on cookies that may have been stored - previously by the browser before starting Privoxy. - These would have to be removed manually. + Client-header taggers are applied to each header on its own, + and as the header isn't modified, each tagger sees + the original. - Privoxy also uses - the content-cookies filter - to block some types of cookies. Content cookies are not effected by - session-cookies-only. + Client-header taggers are the first actions that are executed + and their tags can be used to control every other action. - + - Example usage: + Example usage (section): - - +session-cookies-only + + +# Tag every request with the User-Agent header +{+client-header-tagger{user-agent}} +/ + +# Tagging itself doesn't change the action +# settings, sections with TAG patterns do: +# +# If it's a download agent, use a different forwarding proxy, +# show the real User-Agent and make sure resume works. +{+forward-override{forward-socks5 10.0.0.2:2222 .} \ + -hide-if-modified-since \ + -overwrite-last-modified \ + -hide-user-agent \ + -filter \ + -deanimate-gifs \ +} +TAG:^User-Agent: NetBSD-ftp/ +TAG:^User-Agent: Novell ZYPP Installer +TAG:^User-Agent: RPM APT-HTTP/ +TAG:^User-Agent: fetch libfetch/ +TAG:^User-Agent: Ubuntu APT-HTTP/ +TAG:^User-Agent: MPlayer/ + + + +# Tag all requests with the Range header set +{+client-header-tagger{range-requests}} +/ + +# Disable filtering for the tagged requests. +# +# With filtering enabled Privoxy would remove the Range headers +# to be able to filter the whole response. The downside is that +# it prevents clients from resuming downloads or skipping over +# parts of multimedia files. 
+{-filter -deanimate-gifs} +TAG:^RANGE-REQUEST$ + + + - -set-image-blocker + +content-type-overwrite Typical use: - Choose the replacement for blocked images + Stop useless download menus from popping up, or change the browser's rendering mode @@ -6578,12 +3036,7 @@ example.org/instance-that-is-delivered-as-xml-but-is-not Effect: - This action alone doesn't do anything noticeable. If both - block and handle-as-image also - apply, i.e. if the request is to be blocked as an image, - then the parameter of this action decides what will be - sent as a replacement. + Replaces the Content-Type: HTTP server header. @@ -6599,37 +3052,9 @@ example.org/instance-that-is-delivered-as-xml-but-is-not Parameter: - - - - pattern to send a built-in checkerboard pattern image. The image is visually - decent, scales very well, and makes it obvious where banners were busted. - - - - - blank to send a built-in transparent image. This makes banners disappear - completely, but makes it hard to detect where Privoxy has blocked - images on a given page and complicates troubleshooting if Privoxy - has blocked innocent images, like navigation icons. - - - - - target-url to - send a redirect to target-url. You can redirect - to any image anywhere, even in your local filesystem via file:/// URL. - (But note that not all browsers support redirecting to a local file system). - - - A good application of redirects is to use special Privoxy-built-in - URLs, which send the built-in images, as target-url. - This has the same visual effect as specifying blank or pattern in - the first place, but enables your browser to cache the replacement image, instead of requesting - it over and over again. - - - + + Any string. + @@ -6637,3816 +3062,5654 @@ example.org/instance-that-is-delivered-as-xml-but-is-not Notes: - The URLs for the built-in images are http://config.privoxy.org/send-banner?type=type, where type is - either blank or pattern. - - - There is a third (advanced) type, called auto. 
It is NOT to be - used in set-image-blocker, but meant for use from filters. - Auto will select the type of image that would have applied to the referring page, had it been an image. + The Content-Type: HTTP server header is used by the + browser to decide what to do with the document. The value of this + header can cause the browser to open a download menu instead of + displaying the document by itself, even if the document's format is + supported by the browser. - - - - - Example usage: - - Built-in pattern: + The declared content type can also affect which rendering mode + the browser chooses. If XHTML is delivered as text/html, + many browsers treat it as yet another broken HTML document. + If it is send as application/xml, browsers with + XHTML support will only display it, if the syntax is correct. - +set-image-blocker{pattern} + If you see a web site that proudly uses XHTML buttons, but sets + Content-Type: text/html, you can use &my-app; + to overwrite it with application/xml and validate + the web master's claim inside your XHTML-supporting browser. + If the syntax is incorrect, the browser will complain loudly. - Redirect to the BSD daemon: + You can also go the opposite direction: if your browser prints + error messages instead of rendering a document falsely declared + as XHTML, you can overwrite the content type with + text/html and have it rendered as broken HTML document. - +set-image-blocker{http://www.freebsd.org/gifs/dae_up3.gif} + By default content-type-overwrite only replaces + Content-Type: headers that look like some kind of text. + If you want to overwrite it unconditionally, you have to combine it with + force-text-mode. + This limitation exists for a reason, think twice before circumventing it. - Redirect to the built-in pattern for better caching: + Most of the time it's easier to replace this action with a custom + server-header filter. 
+ It allows you to activate it for every document of a certain site and it will still + only replace the content types you aimed at. - +set-image-blocker{http://config.privoxy.org/send-banner?type=pattern} + Of course you can apply content-type-overwrite + to a whole site and then make URL based exceptions, but it's a lot + more work to get the same precision. - - - - - - -Summary - - Note that many of these actions have the potential to cause a page to - misbehave, possibly even not to display at all. There are many ways - a site designer may choose to design his site, and what HTTP header - content, and other criteria, he may depend on. There is no way to have hard - and fast rules for all sites. See the Appendix for a brief example on troubleshooting - actions. - - - - - - -Aliases - - Custom actions, known to Privoxy - as aliases, can be defined by combining other actions. - These can in turn be invoked just like the built-in actions. - Currently, an alias name can contain any character except space, tab, - =, - { and }, but we strongly - recommend that you only use a to z, - 0 to 9, +, and -. - Alias names are not case sensitive, and are not required to start with a - + or - sign, since they are merely textually - expanded. - - - Aliases can be used throughout the actions file, but they must be - defined in a special section at the top of the file! - And there can only be one such section per actions file. Each actions file may - have its own alias section, and the aliases defined in it are only visible - within that file. - - - There are two main reasons to use aliases: One is to save typing for frequently - used combinations of actions, the other one is a gain in flexibility: If you - decide once how you want to handle shops by defining an alias called - shop, you can later change your policy on shops in - one place, and your changes will take effect everywhere - in the actions file where the shop alias is used. 
Calling aliases - by their purpose also makes your actions files more readable. - - - Currently, there is one big drawback to using aliases, though: - Privoxy's built-in web-based action file - editor honors aliases when reading the actions files, but it expands - them before writing. So the effects of your aliases are of course preserved, - but the aliases themselves are lost when you edit sections that use aliases - with it. - - - - Now let's define some aliases... - - - - - # Useful custom aliases we can use later. - # - # Note the (required!) section header line and that this section - # must be at the top of the actions file! - # - {{alias}} - - # These aliases just save typing later: - # (Note that some already use other aliases!) - # - +crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies - -crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies - +block-as-image = +block{Blocked image.} +handle-as-image - allow-all-cookies = -crunch-all-cookies -session-cookies-only -filter{content-cookies} - - # These aliases define combinations of actions - # that are useful for certain types of sites: - # - fragile = -block -filter -crunch-all-cookies -fast-redirects -hide-referrer -prevent-compression - - shop = -crunch-all-cookies -filter{all-popups} - - # Short names for other aliases, for really lazy people ;-) - # - c0 = +crunch-all-cookies - c1 = -crunch-all-cookies - - - - ...and put them to use. 
These sections would appear in the lower part of an - actions file and define exceptions to the default actions (as specified further - up for the / pattern): - - - - - # These sites are either very complex or very keen on - # user data and require minimal interference to work: - # - {fragile} - .office.microsoft.com - .windowsupdate.microsoft.com - # Gmail is really mail.google.com, not gmail.com - mail.google.com - - # Shopping sites: - # Allow cookies (for setting and retrieving your customer data) - # - {shop} - .quietpc.com - .worldpay.com # for quietpc.com - mybank.example.com - # These shops require pop-ups: - # - {-filter{all-popups} -filter{unsolicited-popups}} - .dabs.com - .overclockers.co.uk - + + Example usage (sections): + + + # Check if www.example.net/ really uses valid XHTML +{ +content-type-overwrite{application/xml} } +www.example.net/ - - Aliases like shop and fragile are typically used for - problem sites that require more than one action to be disabled - in order to function properly. - - - - - -Actions Files Tutorial - - The above chapters have shown which actions files - there are and how they are organized, how actions are specified and applied - to URLs, how patterns work, and how to - define and use aliases. Now, let's look at an - example match-all.action, default.action - and user.action file and see how all these pieces come together: - +# but leave the content type unmodified if the URL looks like a style sheet +{-content-type-overwrite} +www.example.net/.*\.css$ +www.example.net/.*style + + + + + + - -match-all.action - - Remember all actions are disabled when matching starts, - so we have to explicitly enable the ones we want. - - - While the match-all.action file only contains a - single section, it is probably the most important one. It has only one - pattern, /, but this pattern - matches all URLs. Therefore, the set of - actions used in this default section will - be applied to all requests as a start. 
It can be partly or - wholly overridden by other actions files like default.action - and user.action, but it will still be largely responsible - for your overall browsing experience. - + + + +crunch-client-header - - Again, at the start of matching, all actions are disabled, so there is - no need to disable any actions here. (Remember: a + - preceding the action name enables the action, a - disables!). - Also note how this long line has been made more readable by splitting it into - multiple lines with line continuation. - + + + Typical use: + + Remove a client header Privoxy has no dedicated action for. + + - - -{ \ - +change-x-forwarded-for{block} \ - +hide-from-header{block} \ - +set-image-blocker{pattern} \ -} -/ # Match all URLs - - + + Effect: + + + Deletes every header sent by the client that contains the string the user supplied as parameter. + + + - - The default behavior is now set. - + + Type: + + + Parameterized. + + + + + Parameter: + + + Any string. + + + + + + Notes: + + + This action allows you to block client headers for which no dedicated + Privoxy action exists. + Privoxy will remove every client header that + contains the string you supplied as parameter. + + + Regular expressions are not supported and you can't + use this action to block different headers in the same request, unless + they contain the same string. + + + crunch-client-header is only meant for quick tests. + If you have to block several different headers, or only want to modify + parts of them, you should use a + client-header filter. + + + + Don't block any header without understanding the consequences. + + + + + + + Example usage (section): + + + # Block the non-existent "Privacy-Violation:" client header +{ +crunch-client-header{Privacy-Violation:} } +/ + + + + + - -default.action - - If you aren't a developer, there's no need for you to edit the - default.action file. 
It is maintained by - the &my-app; developers and if you disagree with some of the - sections, you should overrule them in your user.action. - + + +crunch-if-none-match + + + + Typical use: + + Prevent yet another way to track the user's steps between sessions. + + - - Understanding the default.action file can - help you with your user.action, though. - + + Effect: + + + Deletes the If-None-Match: HTTP client header. + + + - - The first section in this file is a special section for internal use - that prevents older &my-app; versions from reading the file: - + + Type: + + + Boolean. + + - - -########################################################################## -# Settings -- Don't change! For internal Privoxy use ONLY. -########################################################################## -{{settings}} -for-privoxy-version=3.0.11 - + + Parameter: + + + N/A + + + - - After that comes the (optional) alias section. We'll use the example - section from the above chapter on aliases, - that also explains why and how aliases are used: - + + Notes: + + + Removing the If-None-Match: HTTP client header + is useful for filter testing, where you want to force a real + reload instead of getting status code 304 which + would cause the browser to use a cached copy of the page. + + + It is also useful to make sure the header isn't used as a cookie + replacement (unlikely but possible). + + + Blocking the If-None-Match: header shouldn't cause any + caching problems, as long as the If-Modified-Since: header + isn't blocked or missing as well. + + + It is recommended to use this action together with + hide-if-modified-since + and + overwrite-last-modified. 
+ + + - - -########################################################################## -# Aliases -########################################################################## -{{alias}} + + Example usage (section): + + + # Let the browser revalidate cached documents but don't +# allow the server to use the revalidation headers for user tracking. +{+hide-if-modified-since{-60} \ + +overwrite-last-modified{randomize} \ + +crunch-if-none-match} +/ + + + + + - # These aliases just save typing later: - # (Note that some already use other aliases!) - # - +crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies - -crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies - +block-as-image = +block{Blocked image.} +handle-as-image - mercy-for-cookies = -crunch-all-cookies -session-cookies-only -filter{content-cookies} - # These aliases define combinations of actions - # that are useful for certain types of sites: - # - fragile = -block -filter -crunch-all-cookies -fast-redirects -hide-referrer - shop = -crunch-all-cookies -filter{all-popups} - + + +crunch-incoming-cookies - - The first of our specialized sections is concerned with fragile - sites, i.e. sites that require minimum interference, because they are either - very complex or very keen on tracking you (and have mechanisms in place that - make them unusable for people who avoid being tracked). We will simply use - our pre-defined fragile alias instead of stating the list - of actions explicitly: - + + + Typical use: + + + Prevent the web server from setting HTTP cookies on your system + + + - - -########################################################################## -# Exceptions for sites that'll break under the default action set: -########################################################################## + + Effect: + + + Deletes any Set-Cookie: HTTP headers from server replies. 
+ + + -# "Fragile" Use a minimum set of actions for these sites (see alias above): -# -{ fragile } -.office.microsoft.com # surprise, surprise! -.windowsupdate.microsoft.com -mail.google.com - + + Type: + + + Boolean. + + - - Shopping sites are not as fragile, but they typically - require cookies to log in, and pop-up windows for shopping - carts or item details. Again, we'll use a pre-defined alias: - + + Parameter: + + + N/A + + + - - -# Shopping sites: -# -{ shop } -.quietpc.com -.worldpay.com # for quietpc.com -.jungle.com -.scan.co.uk - + + Notes: + + + This action is only concerned with incoming HTTP cookies. For + outgoing HTTP cookies, use + crunch-outgoing-cookies. + Use both to disable HTTP cookies completely. + + + It makes no sense at all to use this action in conjunction + with the session-cookies-only action, + since it would prevent the session cookies from being set. See also + filter-content-cookies. + + + - - The fast-redirects - action, which may have been enabled in match-all.action, - breaks some sites. So disable it for popular sites where we know it misbehaves: - + + Example usage: + + + +crunch-incoming-cookies + + + + + - - -{ -fast-redirects } -login.yahoo.com -edit.*.yahoo.com -.google.com -.altavista.com/.*(like|url|link):http -.altavista.com/trans.*urltext=http -.nytimes.com - - - It is important that Privoxy knows which - URLs belong to images, so that if they are to - be blocked, a substitute image can be sent, rather than an HTML page. - Contacting the remote site to find out is not an option, since it - would destroy the loading time advantage of banner blocking, and it - would feed the advertisers information about you. We can mark any - URL as an image with the handle-as-image action, - and marking all URLs that end in a known image file extension is a - good start: - + + +crunch-server-header + + + + Typical use: + + Remove a server header Privoxy has no dedicated action for. 
+ + - - -########################################################################## -# Images: -########################################################################## + + Effect: + + + Deletes every header sent by the server that contains the string the user supplied as parameter. + + + -# Define which file types will be treated as images, in case they get -# blocked further down this file: -# -{ +handle-as-image } -/.*\.(gif|jpe?g|png|bmp|ico)$ - + + Type: + + + Parameterized. + + - - And then there are known banner sources. They often use scripts to - generate the banners, so it won't be visible from the URL that the - request is for an image. Hence we block them and - mark them as images in one go, with the help of our - +block-as-image alias defined above. (We could of - course just as well use +block - +handle-as-image here.) - Remember that the type of the replacement image is chosen by the - set-image-blocker - action. Since all URLs have matched the default section with its - +set-image-blocker{pattern} - action before, it still applies and needn't be repeated: - + + Parameter: + + + Any string. + + + - - -# Known ad generators: -# -{ +block-as-image } -ar.atwola.com -.ad.doubleclick.net -.ad.*.doubleclick.net -.a.yimg.com/(?:(?!/i/).)*$ -.a[0-9].yimg.com/(?:(?!/i/).)*$ -bs*.gsanet.com -.qkimg.net - + + Notes: + + + This action allows you to block server headers for which no dedicated + Privoxy action exists. Privoxy + will remove every server header that contains the string you supplied as parameter. + + + Regular expressions are not supported and you can't + use this action to block different headers in the same request, unless + they contain the same string. + + + crunch-server-header is only meant for quick tests. + If you have to block several different headers, or only want to modify + parts of them, you should use a custom + server-header filter. + + + + Don't block any header without understanding the consequences. 
+ + + + - - One of the most important jobs of Privoxy - is to block banners. Many of these can be blocked - by the filter{banners-by-size} - action, which we enabled above, and which deletes the references to banner - images from the pages while they are loaded, so the browser doesn't request - them anymore, and hence they don't need to be blocked here. But this naturally - doesn't catch all banners, and some people choose not to use filters, so we - need a comprehensive list of patterns for banner URLs here, and apply the - block action to them. - - - First comes many generic patterns, which do most of the work, by - matching typical domain and path name components of banners. Then comes - a list of individual patterns for specific sites, which is omitted here - to keep the example short: - + + Example usage (section): + + + # Crunch server headers that try to prevent caching +{ +crunch-server-header{no-cache} } +/ + + + + + - - -########################################################################## -# Block these fine banners: -########################################################################## -{ +block{Banner ads.} } -# Generic patterns: -# -ad*. -.*ads. -banner?. -count*. -/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?) -/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/ + + +crunch-outgoing-cookies + + + + Typical use: + + + Prevent the web server from reading any HTTP cookies from your system + + + -# Site-specific patterns (abbreviated): -# -.hitbox.com - + + Effect: + + + Deletes any Cookie: HTTP headers from client requests. + + + - - It's quite remarkable how many advertisers actually call their banner - servers ads.company.com, or call the directory - in which the banners are stored simply banners. So the above - generic patterns are surprisingly effective. - - - But being very generic, they necessarily also catch URLs that we don't want - to block. The pattern .*ads. e.g. 
catches - nasty-ads.nasty-corp.com as intended, - but also downloads.sourcefroge.net or - adsl.some-provider.net. So here come some - well-known exceptions to the +block - section above. - - - Note that these are exceptions to exceptions from the default! Consider the URL - downloads.sourcefroge.net: Initially, all actions are deactivated, - so it wouldn't get blocked. Then comes the defaults section, which matches the - URL, but just deactivates the block - action once again. Then it matches .*ads., an exception to the - general non-blocking policy, and suddenly - +block applies. And now, it'll match - .*loads., where -block - applies, so (unless it matches again further down) it ends up - with no block action applying. - + + Type: + + + Boolean. + + - - -########################################################################## -# Save some innocent victims of the above generic block patterns: -########################################################################## + + Parameter: + + + N/A + + + -# By domain: -# -{ -block } -adv[io]*. # (for advogato.org and advice.*) -adsl. # (has nothing to do with ads) -adobe. # (has nothing to do with ads either) -ad[ud]*. # (adult.* and add.*) -.edu # (universities don't host banners (yet!)) -.*loads. # (downloads, uploads etc) + + Notes: + + + This action is only concerned with outgoing HTTP cookies. For + incoming HTTP cookies, use + crunch-incoming-cookies. + Use both to disable HTTP cookies completely. + + + It makes no sense at all to use this action in conjunction + with the session-cookies-only action, + since it would prevent the session cookies from being read. + + + -# By path: -# -/.*loads/ + + Example usage: + + + +crunch-outgoing-cookies + + + -# Site-specific: -# -www.globalintersec.com/adv # (adv = advanced) -www.ugu.com/sui/ugu/adv - + + - - Filtering source code can have nasty side effects, - so make an exception for our friends at sourceforge.net, - and all paths with cvs in them. 
Note that - -filter - disables all filters in one fell swoop! - - - -# Don't filter code! -# -{ -filter } -/(.*/)?cvs -bugzilla. -developer. -wiki. -.sourceforge.net - + + +deanimate-gifs - - The actual default.action is of course much more - comprehensive, but we hope this example made clear how it works. - + + + Typical use: + + Stop those annoying, distracting animated GIF images. + + - + + Effect: + + + De-animate GIF animations, i.e. reduce them to their first or last image. + + + -user.action + + Type: + + + Parameterized. + + - - So far we are painting with a broad brush by setting general policies, - which would be a reasonable starting point for many people. Now, - you might want to be more specific and have customized rules that - are more suitable to your personal habits and preferences. These would - be for narrowly defined situations like your ISP or your bank, and should - be placed in user.action, which is parsed after all other - actions files and hence has the last word, over-riding any previously - defined actions. user.action is also a - safe place for your personal settings, since - default.action is actively maintained by the - Privoxy developers and you'll probably want - to install updated versions from time to time. - + + Parameter: + + + last or first + + + - - So let's look at a few examples of things that one might typically do in - user.action: - + + Notes: + + + This will also shrink the images considerably (in bytes, not pixels!). If + the option first is given, the first frame of the animation + is used as the replacement. If last is given, the last + frame of the animation is used instead, which probably makes more sense for + most banner animations, but also has the risk of not showing the entire + last frame (if it is only a delta to an earlier frame). + + + You can safely use this action with patterns that will also match non-GIF + objects, because no attempt will be made at anything that doesn't look like + a GIF. 
+ + + + + Example usage: + + + +deanimate-gifs{last} + + + + + - + + +downgrade-http-version - - -# My user.action file. <fred@example.com> - + + + Typical use: + + Work around (very rare) problems with HTTP/1.1 + + - - As aliases are local to the actions - file that they are defined in, you can't use the ones from - default.action, unless you repeat them here: - + + Effect: + + + Downgrades HTTP/1.1 client requests and server replies to HTTP/1.0. + + + - - -# Aliases are local to the file they are defined in. -# (Re-)define aliases for this file: -# -{{alias}} -# -# These aliases just save typing later, and the alias names should -# be self explanatory. -# -+crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies --crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies - allow-all-cookies = -crunch-all-cookies -session-cookies-only - allow-popups = -filter{all-popups} -+block-as-image = +block{Blocked as image.} +handle-as-image --block-as-image = -block + + Type: + + + Boolean. + + -# These aliases define combinations of actions that are useful for -# certain types of sites: -# -fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referrer -shop = -crunch-all-cookies allow-popups + + Parameter: + + + N/A + + + -# Allow ads for selected useful free sites: -# -allow-ads = -block -filter{banners-by-size} -filter{banners-by-link} + + Notes: + + + This is a left-over from the time when Privoxy + didn't support important HTTP/1.1 features well. It is left here for the + unlikely case that you experience HTTP/1.1-related problems with some server + out there. + + + Note that enabling this action is only a workaround. It should not + be enabled for sites that work without it. While it shouldn't break + any pages, it has a (usually negative) performance impact. + + + If you come across a site where enabling this action helps, please report it, + so the cause of the problem can be analyzed.
If the problem turns out to be + caused by a bug in Privoxy, it should be + fixed so the following release works without the workaround. + + + -# Alias for specific file types that are text, but might have conflicting -# MIME types. We want the browser to force these to be text documents. -handle-as-text = -filter +-content-type-overwrite{text/plain} +-force-text-mode -hide-content-disposition + + Example usage (section): + + + {+downgrade-http-version} +problem-host.example.com + + + - + + - - Say you have accounts on some sites that you visit regularly, and - you don't want to have to log in manually each time. So you'd like - to allow persistent cookies for these sites. The - allow-all-cookies alias defined above does exactly - that, i.e. it disables crunching of cookies in any direction, and the - processing of cookies to make them only temporary. - + + +external-filter - - -{ allow-all-cookies } - sourceforge.net - .yahoo.com - .msdn.microsoft.com - .redhat.com - + + + Typical use: + + Modify content using a programming language of your choice. + + - - Your bank is allergic to some filter, but you don't know which, so you disable them all: - + + Effect: + + + All instances of text-based type, most notably HTML and JavaScript, to which + this action applies, can be filtered on-the-fly through the specified external + filter. + By default, plain text documents are exempted from filtering, because web + servers often use the text/plain MIME type for all files + whose type they don't know. + + + - - -{ -filter } - .your-home-banking-site.com - + + Type: + + + Multi-value. + + - - Some file types you may not want to filter for various reasons: - + + Parameter: + + + The name of an external content filter, as defined in the + filter file. + External filters can be defined in one or more files as defined by the + filterfile + option in the config file.
+ + + When used in its negative form, + and without parameters, all filtering with external + filters is completely disabled. + + + - - -# Technical documentation is likely to contain strings that might -# erroneously get altered by the JavaScript-oriented filters: -# -.tldp.org -/(.*/)?selfhtml/ + + Notes: + + + External filters are scripts or programs that can modify the content in + case common filters + aren't powerful enough. With the exception that this action doesn't + use pcrs-based filters, the notes in the + filter section apply. + + + + Currently external filters are executed with &my-app;'s privileges. + Only use external filters you understand and trust. + + + + This feature is experimental, the syntax + may change in the future. + -# And this stupid host sends streaming video with a wrong MIME type, -# so that Privoxy thinks it is getting HTML and starts filtering: -# -stupid-server.example.com/ - + + - - Example of a simple block action. Say you've - seen an ad on your favourite page on example.com that you want to get rid of. - You have right-clicked the image, selected copy image location - and pasted the URL below while removing the leading http://, into a - { +block{} } section. Note that { +handle-as-image - } need not be specified, since all URLs ending in - .gif will be tagged as images by the general rules as set - in default.action anyway: - + + Example usage: + + + +external-filter{fancy-filter} + + + + + - - -{ +block{Nasty ads.} } - www.example.com/nasty-ads/sponsor\.gif - another.example.net/more/junk/here/ - + + +fast-redirects - - The URLs of dynamically generated banners, especially from large banner - farms, often don't use the well-known image file name extensions, which - makes it impossible for Privoxy to guess - the file type just by looking at the URL. - You can use the +block-as-image alias defined above for - these cases. 
- Note that objects which match this rule but then turn out NOT to be an - image are typically rendered as a broken image icon by the - browser. Use cautiously. - + + + Typical use: + + Fool some click-tracking scripts and speed up indirect links. + + - - -{ +block-as-image } - .doubleclick.net - .fastclick.net - /Realmedia/ads/ - ar.atwola.com/ - + + Effect: + + + Detects redirection URLs and redirects the browser without contacting + the redirection server first. + + + - - Now you noticed that the default configuration breaks Forbes Magazine, - but you were too lazy to find out which action is the culprit, and you - were again too lazy to give feedback, so - you just used the fragile alias on the site, and - -- whoa! -- it worked. The fragile - aliases disables those actions that are most likely to break a site. Also, - good for testing purposes to see if it is Privoxy - that is causing the problem or not. We later find other regular sites - that misbehave, and add those to our personalized list of troublemakers: - + + Type: + + + Parameterized. + + - - -{ fragile } - .forbes.com - webmail.example.com - .mybank.com - + + Parameter: + + + + + simple-check to just search for the string http:// + to detect redirection URLs. + + + + + check-decoded-url to decode URLs (if necessary) before searching + for redirection URLs. + + + + + + + + Notes: + + + Many sites, like yahoo.com, don't just link to other sites. Instead, they + will link to some script on their own servers, giving the destination as a + parameter, which will then redirect you to the final target. URLs + resulting from this scheme typically look like: + http://www.example.org/click-tracker.cgi?target=http%3a//www.example.net/. + + + Sometimes, there are even multiple consecutive redirects encoded in the + URL. These redirections via scripts make your web browsing more traceable, + since the server from which you follow such a link can see where you go + to. 
Apart from that, valuable bandwidth and time are wasted, while your + browser asks the server for one redirect after the other. Plus, it feeds + the advertisers. + + + This feature is currently not very smart and is scheduled for improvement. + If it is enabled by default, you will have to create some exceptions to + this action. It can lead to failures in several ways: + + + Not every URL with other URLs as parameters is evil. + Some sites offer a real service that requires this information to work. + For example, a validation service needs to know which document to validate. + fast-redirects assumes that every URL parameter that + looks like another URL is a redirection target, and will always redirect to + the last one. Most of the time the assumption is correct, but if it isn't, + the user gets redirected anyway. + + + Another failure occurs if the URL contains other parameters after the URL parameter. + The URL: + http://www.example.org/?redirect=http%3a//www.example.net/&foo=bar. + contains the redirection URL http://www.example.net/, + followed by another parameter. fast-redirects doesn't know that + and will cause a redirect to http://www.example.net/&foo=bar. + Depending on the target server configuration, the parameter will be silently ignored + or lead to a page not found error. You can prevent this problem by + first using the redirect action + to remove the last part of the URL, but it requires a little effort. + + + To detect a redirection URL, fast-redirects only + looks for the string http://, either in plain text + (invalid but often used) or encoded as http%3a//. + Some sites use their own URL encoding scheme, encrypt the address + of the target server or replace it with a database id. In these cases + fast-redirects is fooled and the request reaches the + redirection server where it probably gets logged. + + + - - You like the fun text replacements in default.filter, - but it is disabled in the distributed actions file.
- So you'd like to turn it on in your private, - update-safe config, once and for all: - + + Example usage: + + + + { +fast-redirects{simple-check} } + one.example.com - - -{ +filter{fun} } - / # For ALL sites! - + { +fast-redirects{check-decoded-url} } + another.example.com/testing + + + - - Note that the above is not really a good idea: There are exceptions - to the filters in default.action for things that - really shouldn't be filtered, like code on CVS->Web interfaces. Since - user.action has the last word, these exceptions - won't be valid for the fun filtering specified here. - + + - - You might also worry about how your favourite free websites are - funded, and find that they rely on displaying banner advertisements - to survive. So you might want to specifically allow banners for those - sites that you feel provide value to you: - - - -{ allow-ads } - .sourceforge.net - .slashdot.org - .osdn.net - + + +filter - - Note that allow-ads has been aliased to - -block, - -filter{banners-by-size}, and - -filter{banners-by-link} above. - + + + Typical use: + + Get rid of HTML and JavaScript annoyances, banner advertisements (by size), + do fun text replacements, add personalized effects, etc. + + - - Invoke another alias here to force an over-ride of the MIME type - application/x-sh which typically would open a download type - dialog. In my case, I want to look at the shell script, and then I can save - it should I choose to. - + + Effect: + + + All instances of text-based type, most notably HTML and JavaScript, to which + this action applies, can be filtered on-the-fly through the specified regular + expression based substitutions. (Note: as of version 3.0.3 plain text documents + are exempted from filtering, because web servers often use the + text/plain MIME type for all files whose type they don't know.) + + + - - -{ handle-as-text } - /.*\.sh$ - + + Type: + + + Multi-value. 
+ + - - user.action is generally the best place to define - exceptions and additions to the default policies of - default.action. Some actions are safe to have their - default policies set here though. So let's set a default policy to have a - blank image as opposed to the checkerboard pattern for - ALL sites. / of course matches all URL - paths and patterns: - + + Parameter: + + + The name of a content filter, as defined in the filter file. + Filters can be defined in one or more files as defined by the + filterfile + option in the config file. + default.filter is the collection of filters + supplied by the developers. Locally defined filters should go + in their own file, such as user.filter. + + + When used in its negative form, + and without parameters, all filtering is completely disabled. + + + - - -{ +set-image-blocker{blank} } -/ # ALL sites - + + Notes: + + + For your convenience, there are a number of pre-defined filters available + in the distribution filter file that you can use. See the examples below for + a list. + + + Filtering requires buffering the page content, which may appear to + slow down page rendering since nothing is displayed until all content has + passed the filters. (The total time until the page is completely rendered + doesn't change much, but it may be perceived as slower since the page is + not incrementally displayed.) + This effect will be more noticeable on slower connections. + + + Rolling your own + filters requires knowledge of + Regular + Expressions and + HTML. + This is a very powerful feature, and potentially very intrusive. + Filters should be used with caution, and only where an equivalent + action is not available. + + + The amount of data that can be filtered is limited by the + buffer-limit + option in the main config file. The + default is 4096 KB (4 MB). Once this limit is exceeded, the buffered + data, and all pending data, is passed through unfiltered.
+ + + Inappropriate MIME types, such as zipped files, are not filtered at all. + (Again, only text-based types except plain text are filtered.) Encrypted SSL data + (from HTTPS servers) cannot be filtered either, since this would violate + the integrity of the secure transaction. In some situations it might + be necessary to protect certain text, like source code, from filtering + by defining appropriate -filter exceptions. + + + Compressed content can't be filtered either, but if &my-app; + is compiled with zlib support and a supported compression algorithm + is used (gzip or deflate), &my-app; can first decompress the content + and then filter it. + + + If you use a &my-app; version without zlib support, but want filtering to work on + as many documents as possible, even those that would normally be sent compressed, + you must use the prevent-compression + action in conjunction with filter. + + + Content filtering can achieve some of the same effects as the + block + action, i.e. it can be used to block ads and banners. But the mechanism + works quite differently. One effective use is to block ad banners + based on their size (see below), since many of these seem to be somewhat + standardized. + + + Feedback with suggestions for new or + improved filters is particularly welcome! + + + The list below has only the names and a one-line description of each + predefined filter. There are more + verbose explanations of what these filters do in the filter file chapter. + + + + + Example usage (with filters from the distribution default.filter file). + See the Predefined Filters section for + more explanation on each: + + + + +filter{js-annoyances} # Get rid of particularly annoying JavaScript abuse. + + + + +filter{js-events} # Kill JavaScript event bindings and timers (Radically destructive! Only for extra nasty sites). + + + + +filter{html-annoyances} # Get rid of particularly annoying HTML abuse.
+ + + + +filter{content-cookies} # Kill cookies that come in the HTML or JS content. + + + + +filter{refresh-tags} # Kill automatic refresh tags if refresh time is larger than 9 seconds. + + + + +filter{unsolicited-popups} # Disable only unsolicited pop-up windows. + + + + +filter{all-popups} # Kill all popups in JavaScript and HTML. + + + + +filter{img-reorder} # Reorder attributes in <img> tags to make the banners-by-* filters more effective. + + + + +filter{banners-by-size} # Kill banners by size. + + + + +filter{banners-by-link} # Kill banners by their links to known clicktrackers. + + + + +filter{webbugs} # Squish WebBugs (1x1 invisible GIFs used for user tracking). + + + + +filter{tiny-textforms} # Extend those tiny textareas up to 40x80 and kill the hard wrap. + + + + +filter{jumping-windows} # Prevent windows from resizing and moving themselves. + + + + +filter{frameset-borders} # Give frames a border and make them resizable. + + + + +filter{iframes} # Removes all detected iframes. Should only be enabled for individual sites. + + + + +filter{demoronizer} # Fix MS's non-standard use of standard charsets. + + + + +filter{shockwave-flash} # Kill embedded Shockwave Flash objects. + + + + +filter{quicktime-kioskmode} # Make Quicktime movies saveable. + + + + +filter{fun} # Text replacements for subversive browsing fun! + + + + +filter{crude-parental} # Crude parental filtering. Note that this filter doesn't work reliably. + + + + +filter{ie-exploits} # Disable some known Internet Explorer bug exploits. + + + + +filter{site-specifics} # Cure for site-specific problems. Don't apply generally! + + + + +filter{no-ping} # Removes non-standard ping attributes in <a> and <area> tags. + + + + +filter{google} # CSS-based block for Google text ads. Also removes a width limitation and the toolbar advertisement. + + + + +filter{yahoo} # CSS-based block for Yahoo text ads. Also removes a width limitation. + + + + +filter{msn} # CSS-based block for MSN text ads. 
Also removes tracking URLs and a width limitation. + + + + +filter{blogspot} # Cleans up some Blogspot blogs. Read the fine print before using this. + + + + - - -
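The filter notes above can be pulled together in a small actions-file sketch. It is only an illustration: the host names are placeholders, and the choice of filters is an assumption rather than a recommended default. It enables two of the predefined filters for one site and then protects a source-code path from all filtering with the parameterless negative form.

```
# Sketch only: .example.com and code.example.com are placeholder hosts.

# Enable two predefined content filters for one site.
{ +filter{banners-by-size} +filter{webbugs} }
.example.com

# Protect source code from being rewritten: -filter without
# parameters disables all content filters for matching URLs.
{ -filter }
code.example.com/source/
```

Because later sections override earlier ones, the -filter section has to come after the section that enables the filters.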
+ + +force-text-mode + + + + Typical use: + + Force Privoxy to treat a document as if it was in some kind of text format. + + - + + Effect: + + + Declares a document as text, even if the Content-Type: isn't detected as such. + + + - + + Type: + + + Boolean. + + - -Filter Files + + Parameter: + + + N/A + + + - - On-the-fly text substitutions need - to be defined in a filter file. Once defined, they - can then be invoked as an action. - + + Notes: + + + As explained above, + Privoxy tries to only filter files that are + in some kind of text format. The same restrictions apply to + content-type-overwrite. + force-text-mode declares a document as text, + without looking at the Content-Type: first. + + + + Think twice before activating this action. Filtering binary data + with regular expressions can cause file damage. + + + + - - &my-app; supports three different filter actions: - filter to - rewrite the content that is send to the client, - client-header-filter - to rewrite headers that are send by the client, and - server-header-filter - to rewrite headers that are send by the server. - + + Example usage: + + + ++force-text-mode + + + + + + - - &my-app; also supports two tagger actions: - client-header-tagger - and - server-header-tagger. - Taggers and filters use the same syntax in the filter files, the difference - is that taggers don't modify the text they are filtering, but use a rewritten - version of the filtered text as tag. The tags can then be used to change the - applying actions through sections with tag-patterns. - + + +forward-override + + + + Typical use: + + Change the forwarding settings based on User-Agent or request origin + + - - Multiple filter files can be defined through the filterfile config directive. The filters - as supplied by the developers are located in - default.filter. It is recommended that any locally - defined or modified filters go in a separately defined file such as - user.filter. 
- + + Effect: + + + Overrules the forward directives in the configuration file. + + + - - Common tasks for content filters are to eliminate common annoyances in - HTML and JavaScript, such as pop-up windows, - exit consoles, crippled windows without navigation tools, the - infamous <BLINK> tag etc, to suppress images with certain - width and height attributes (standard banner sizes or web-bugs), - or just to have fun. - + + Type: + + + Parameterized. + + - - Enabled content filters are applied to any content whose - Content Type header is recognised as a sign - of text-based content, with the exception of text/plain. - Use the force-text-mode action - to also filter other content. - + + Parameter: + + + + forward . to use a direct connection without any additional proxies. + + + + forward 127.0.0.1:8123 to use the HTTP proxy listening at 127.0.0.1 port 8123. + + + + + forward-socks4a 127.0.0.1:9050 . to use the socks4a proxy listening at + 127.0.0.1 port 9050. Replace forward-socks4a with forward-socks4 + to use a socks4 connection (with local DNS resolution) instead, use forward-socks5 + for socks5 connections (with remote DNS resolution). + + + + + forward-socks4a 127.0.0.1:9050 proxy.example.org:8000 to use the socks4a proxy + listening at 127.0.0.1 port 9050 to reach the HTTP proxy listening at proxy.example.org port 8000. + Replace forward-socks4a with forward-socks4 to use a socks4 connection + (with local DNS resolution) instead, use forward-socks5 + for socks5 connections (with remote DNS resolution). + + + + + forward-webserver 127.0.0.1:80 to use the HTTP + server listening at 127.0.0.1 port 80 without adjusting the + request headers. + + + This makes it more convenient to use Privoxy to make + existing websites available as onion services as well. + + + Many websites serve content with hardcoded URLs and + can't be easily adjusted to change the domain based + on the one used by the client. 
+ + + Putting Privoxy between Tor and the webserver (or an stunnel + that forwards to the webserver) allows Privoxy to rewrite headers and + content to make client and server happy at the same time. + + + Using Privoxy for webservers that are only reachable through + onion addresses and whose location is supposed to be secret + is not recommended and should not be necessary anyway. + + + + + - - Substitutions are made at the source level, so if you want to roll - your own filters, you should first be familiar with HTML syntax, - and, of course, regular expressions. - + + Notes: + + + This action takes parameters similar to the + forward directives in the configuration + file, but without the URL pattern. It can be used as a replacement, but normally it's only + used in cases where matching based on the request URL isn't sufficient. + + + + Please read the description for the forward directives before + using this action. Forwarding to the wrong people will reduce your privacy and increase the + chances of man-in-the-middle attacks. + + + If the ports are missing or invalid, default values will be used. This might change + in the future and you shouldn't rely on it. Otherwise, incorrect syntax causes Privoxy + to exit. Due to design limitations, invalid parameter syntax isn't detected until the + action is used the first time. + + + Use the show-url-info CGI page + to verify that your forward settings do what you think they do. + + + + - - Just like the actions files, the - filter file is organized in sections, which are called filters - here. Each filter consists of a heading line, that starts with one of the - keywords FILTER:, - CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER: - followed by the filter's name, and a short (one line) - description of what it does. Below that line - come the jobs, i.e. lines that define the actual - text substitutions. By convention, the name of a filter - should describe what the filter eliminates.
The - comment is used in the web-based - user interface. - + + Example usage: + + + +# Use an ssh tunnel for requests previously tagged as +# User-Agent: fetch libfetch/2.0 and make sure +# resuming downloads continues to work. +# +# This way you can continue to use Tor for your normal browsing, +# without overloading the Tor network with your FreeBSD ports updates +# or downloads of bigger files like ISOs. +# +# Note that HTTP headers are easy to fake and therefore their +# values are as (un)trustworthy as your clients and users. +{+forward-override{forward-socks5 10.0.0.2:2222 .} \ + -hide-if-modified-since \ + -overwrite-last-modified \ +} +TAG:^User-Agent: fetch libfetch/2\.0$ + + + + + + - - Once a filter called name has been defined - in the filter file, it can be invoked by using an action of the form - +filter{name} - in any actions file. - - - Filter definitions start with a header line that contains the filter - type, the filter name and the filter description. - A content filter header line for a filter called foo could look - like this: - + + +handle-as-empty-document + + + + Typical use: + + Mark URLs that should be replaced by empty documents if they get blocked + + - - FILTER: foo Replace all "foo" with "bar" - + + Effect: + + + This action alone doesn't do anything noticeable. It just marks URLs. + If the block action also applies, + the presence or absence of this mark decides whether an HTML BLOCKED + page, or an empty document will be sent to the client as a substitute for the blocked content. + The empty document isn't literally empty, but actually contains a single space. + + + - - Below that line, and up to the next header line, come the jobs that - define what text replacements the filter executes. They are specified - in a syntax that imitates Perl's - s/// operator. If you are familiar with Perl, you - will find this to be quite intuitive, and may want to look at the - PCRS documentation for the subtle differences to Perl behaviour. 
Most - notably, the non-standard option letter U is supported, - which turns the default to ungreedy matching. - + + Type: + + + Boolean. + + - - If you are new to - Regular - Expressions, you might want to take a look at - the Appendix on regular expressions, and - see the Perl - manual for - the - s/// operator's syntax and Perl-style regular - expressions in general. - The below examples might also help to get you started. - + + Parameter: + + + N/A + + + + + Notes: + + + Some browsers complain about syntax errors if JavaScript documents + are blocked with Privoxy's + default HTML page; this option can be used to silence them. + And of course this action can also be used to eliminate the &my-app; + BLOCKED message in frames. + + + The content type for the empty document can be specified with + content-type-overwrite{}, + but usually this isn't necessary. + + + - + + Example usage: + + + # Block all documents on example.org that end with ".js", +# but send an empty document instead of the usual HTML message. +{+block{Blocked JavaScript} +handle-as-empty-document} +example.org/.*\.js$ + + + + + + -Filter File Tutorial - - Now, let's complete our foo content filter. We have already defined - the heading, but the jobs are still missing. Since all it does is to replace - foo with bar, there is only one (trivial) job - needed: - - - s/foo/bar/ - + + +handle-as-image - - But wait! Didn't the comment say that all occurrences - of foo should be replaced? Our current job will only take - care of the first foo on each page. For global substitution, - we'll need to add the g option: - + + + Typical use: + + Mark URLs as belonging to images (so they'll be replaced by images if they do get blocked, rather than HTML pages) + + - - s/foo/bar/g - + + Effect: + + + This action alone doesn't do anything noticeable. It just marks URLs as images. 
+ If the block action also applies, + the presence or absence of this mark decides whether an HTML blocked + page, or a replacement image (as determined by the set-image-blocker action) will be sent to the + client as a substitute for the blocked content. + + + - - Our complete filter now looks like this: - - - FILTER: foo Replace all "foo" with "bar" -s/foo/bar/g - + + Type: + + + Boolean. + + - - Let's look at some real filters for more interesting examples. Here you see - a filter that protects against some common annoyances that arise from JavaScript - abuse. Let's look at its jobs one after the other: - + + Parameter: + + + N/A + + + + + Notes: + + + The below generic example section is actually part of default.action. + It marks all URLs with well-known image file name extensions as images and should + be left intact. + + + Users will probably only want to use the handle-as-image action in conjunction with + block, to block sources of banners, whose URLs don't + reflect the file type, like in the second example section. + + + Note that you cannot treat HTML pages as images in most cases. For instance, (in-line) ad + frames require an HTML page to be sent, or they won't display properly. + Forcing handle-as-image in this situation will not replace the + ad frame with an image, but lead to error messages. + + + - - -FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse + + Example usage (sections): + + + # Generic image extensions: +# +{+handle-as-image} +/.*\.(gif|jpg|jpeg|png|bmp|ico)$ -# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm +# These don't look like images, but they're banners and should be +# blocked as images: # -s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg - +{+block{Nasty banners.} +handle-as-image} +nasty-banner-server.example.com/junk.cgi\?output=trash + + + + + + - - Following the header line and a comment, you see the job. 
Note that it uses - | as the delimiter instead of /, because - the pattern contains a forward slash, which would otherwise have to be escaped - by a backslash (\). - - - Now, let's examine the pattern: it starts with the text <script.* - enclosed in parentheses. Since the dot matches any character, and * - means: Match an arbitrary number of the element left of myself, this - matches <script, followed by any text, i.e. - it matches the whole page, from the start of the first <script> tag. - + + +hide-accept-language + + + + Typical use: + + Pretend to use different language settings. + + - - That's more than we want, but the pattern continues: document\.referrer - matches only the exact string document.referrer. The dot needed to - be escaped, i.e. preceded by a backslash, to take away its - special meaning as a joker, and make it just a regular dot. So far, the meaning is: - Match from the start of the first <script> tag in a the page, up to, and including, - the text document.referrer, if both are present - in the page (and appear in that order). - + + Effect: + + + Deletes or replaces the Accept-Language: HTTP header in client requests. + + + - - But there's still more pattern to go. The next element, again enclosed in parentheses, - is .*</script>. You already know what .* - means, so the whole pattern translates to: Match from the start of the first <script> - tag in a page to the end of the last <script> tag, provided that the text - document.referrer appears somewhere in between. - + + Type: + + + Parameterized. + + - - This is still not the whole story, since we have ignored the options and the parentheses: - The portions of the page matched by sub-patterns that are enclosed in parentheses, will be - remembered and be available through the variables $1, $2, ... in - the substitute. 
The U option switches to ungreedy matching, which means - that the first .* in the pattern will only eat up all - text in between <script and the first occurrence - of document.referrer, and that the second .* will - only span the text up to the first </script> - tag. Furthermore, the s option says that the match may span - multiple lines in the page, and the g option again means that the - substitution is global. - + + Parameter: + + + Keyword: block, or any user defined value. + + + - - So, to summarize, the pattern means: Match all scripts that contain the text - document.referrer. Remember the parts of the script from - (and including) the start tag up to (and excluding) the string - document.referrer as $1, and the part following - that string, up to and including the closing tag, as $2. - + + Notes: + + + Faking the browser's language settings can be useful to make a + foreign User-Agent set with + hide-user-agent + more believable. + + + However some sites with content in different languages check the + Accept-Language: to decide which one to take by default. + Sometimes it isn't possible to later switch to another language without + changing the Accept-Language: header first. + + + Therefore it's a good idea to either only change the + Accept-Language: header to languages you understand, + or to languages that aren't widespread. + + + Before setting the Accept-Language: header + to a rare language, you should consider that it helps to + make your requests unique and thus easier to trace. + If you don't plan to change this header frequently, + you should stick to a common language. + + + - - Now the pattern is deciphered, but wasn't this about substituting things? So - let's look at the substitute: $1"Not Your Business!"$2 is - easy to read: The text remembered as $1, followed by - "Not Your Business!" (including - the quotation marks!), followed by the text remembered as $2.
- This produces an exact copy of the original string, with the middle part - (the document.referrer) replaced by "Not Your - Business!". - + + Example usage (section): + + + # Pretend to use Canadian language settings. +{+hide-accept-language{en-ca} \ ++hide-user-agent{Mozilla/5.0 (X11; U; OpenBSD i386; en-CA; rv:1.8.0.4) Gecko/20060628 Firefox/1.5.0.4} \ +} +/ + + + + + - - The whole job now reads: Replace document.referrer by - "Not Your Business!" wherever it appears inside a - <script> tag. Note that this job won't break JavaScript syntax, - since both the original and the replacement are syntactically valid - string objects. The script just won't have access to the referrer - information anymore. - - - We'll show you two other jobs from the JavaScript taming department, but - this time only point out the constructs of special interest: - + + +hide-content-disposition + + + + Typical use: + + Prevent download menus for content you prefer to view inside the browser. + + + + + Effect: + + + Deletes or replaces the Content-Disposition: HTTP header set by some servers. + + + + + + Type: + + + Parameterized. + + + + + Parameter: + + + Keyword: block, or any user defined value. + + + - - -# The status bar is for displaying link targets, not pointless blahblah -# -s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig - + + Notes: + + + Some servers set the Content-Disposition: HTTP header for + documents they assume you want to save locally before viewing them. + The Content-Disposition: header contains the file name + the browser is supposed to use by default. + + + In most browsers that understand this header, it makes it impossible to + just view the document, without downloading it first, + even if it's just a simple text file or an image. + + + Removing the Content-Disposition: header helps + to prevent this annoyance, but some browsers additionally check the + Content-Type: header, before they decide if they can + display a document without saving it first. 
In these cases, you have + to change this header as well, before the browser stops displaying + download menus. + + + It is also possible to change the server's file name suggestion + to another one, but in most cases it isn't worth the time to set + it up. + + + This action will probably be removed in the future, + use server-header filters instead. + + + - - \s stands for whitespace characters (space, tab, newline, - carriage return, form feed), so that \s* means: zero - or more whitespace. The ? in .*? - makes this matching of arbitrary text ungreedy. (Note that the U - option is not set). The ['"] construct means: a single - or a double quote. Finally, \1 is - a back-reference to the first parenthesis just like $1 above, - with the difference that in the pattern, a backslash indicates - a back-reference, whereas in the substitute, it's the dollar. - + + Example usage: + + + # Disarm the download link in Sourceforge's patch tracker +{ -filter \ + +content-type-overwrite{text/plain}\ + +hide-content-disposition{block} } + .sourceforge.net/tracker/download\.php + + + + + - - So what does this job do? It replaces assignments of single- or double-quoted - strings to the window.status object with a dummy assignment - (using a variable name that is hopefully odd enough not to conflict with - real variables in scripts). Thus, it catches many cases where e.g. pointless - descriptions are displayed in the status bar instead of the link target when - you move your mouse over links. - - - -# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html -# -s/(<body [^>]*)onunload(.*>)/$1never$2/iU - + + +hide-if-modified-since + + + + Typical use: + + Prevent yet another way to track the user's steps between sessions. + + - - Including the - OnUnload - event binding in the HTML DOM was a CRIME. - When I close a browser window, I want it to close and die. Basta. - This job replaces the onunload attribute in - <body> tags with the dummy word never. 
- Note that the i option makes the pattern matching - case-insensitive. Also note that ungreedy matching alone doesn't always guarantee - a minimal match: In the first parenthesis, we had to use [^>]* - instead of .* to prevent the match from exceeding the - <body> tag if it doesn't contain OnUnload, but the page's - content does. - + + Effect: + + + Deletes the If-Modified-Since: HTTP client header or modifies its value. + + + - - The last example is from the fun department: - + + Type: + + + Parameterized. + + - - -FILTER: fun Fun text replacements + + Parameter: + + + Keyword: block, or a user defined value that specifies a range of hours. + + + -# Spice the daily news: -# -s/microsoft(?!\.com)/MicroSuck/ig - + + Notes: + + + Removing this header is useful for filter testing, where you want to force a real + reload instead of getting status code 304, which would cause the + browser to use a cached copy of the page. + + + Instead of removing the header, hide-if-modified-since can + also add or subtract a random amount of time to/from the header's value. + You specify a range of minutes where the random factor should be chosen from and + Privoxy does the rest. A negative value means + subtracting, a positive value adding. + + + Randomizing the value of the If-Modified-Since: makes + it less likely that the server can use the time as a cookie replacement, + but you will run into caching problems if the random range is too high. + + + It is a good idea to only use a small negative value and let + overwrite-last-modified + handle the greater changes. + + + It is also recommended to use this action together with + crunch-if-none-match, + otherwise it's more or less pointless. + + + - - Note the (?!\.com) part (a so-called negative lookahead) - in the job's pattern, which means: Don't match, if the string - .com appears directly following microsoft - in the page. This prevents links to microsoft.com from being trashed, while - still replacing the word everywhere else. 
- + + Example usage (section): + + + # Let the browser revalidate but make tracking based on the time less likely. +{+hide-if-modified-since{-60} \ + +overwrite-last-modified{randomize} \ + +crunch-if-none-match} +/ + + + + + - - -# Buzzword Bingo (example for extended regex syntax) -# -s* industry[ -]leading \ -| cutting[ -]edge \ -| customer[ -]focused \ -| market[ -]driven \ -| award[ -]winning # Comments are OK, too! \ -| high[ -]performance \ -| solutions[ -]based \ -| unmatched \ -| unparalleled \ -| unrivalled \ -*<font color="red"><b>BINGO!</b></font> \ -*igx - - - The x option in this job turns on extended syntax, and allows for - e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting. - + + +hide-from-header - - You get the idea? - - + + + Typical use: + + Keep your (old and ill) browser from telling web servers your email address + + - + + Effect: + + + Deletes any existing From: HTTP header, or replaces it with the + specified string. + + + -The Pre-defined Filters + + Type: + + + Parameterized. + + - + + Example usage: + + + +hide-from-header{block} or + +hide-from-header{spam-me-senseless@sittingduck.example.com} + + + + +
- -The distribution default.filter file contains a selection of -pre-defined filters for your convenience: - + + +hide-referrer + - js-annoyances + Typical use: + + Conceal which link you followed to get to a particular site + + + + + Effect: - The purpose of this filter is to get rid of particularly annoying JavaScript abuse. - To that end, it + Deletes the Referer: (sic) HTTP header from the client request, + or replaces it with a forged one. + + + + + + Type: + + + Parameterized. + + + + + Parameter: + - - replaces JavaScript references to the browser's referrer information - with the string "Not Your Business!". This compliments the hide-referrer action on the content level. - + conditional-block to delete the header completely if the host has changed. - - removes the bindings to the DOM's - unload - event which we feel has no right to exist and is responsible for most exit consoles, i.e. - nasty windows that pop up when you close another one. - + conditional-forge to forge the header if the host has changed. - - removes code that causes new windows to be opened with undesired properties, such as being - full-screen, non-resizeable, without location, status or menu bar etc. - + block to delete the header unconditionally. + + + forge to pretend to be coming from the homepage of the server we are talking to. + + + Any other string to set a user defined referrer. - - - Use with caution. This is an aggressive filter, and can break sites that - rely heavily on JavaScript. - - js-events + Notes: - This is a very radical measure. It removes virtually all JavaScript event bindings, which - means that scripts can not react to user actions such as mouse movements or clicks, window - resizing etc, anymore. Use with caution! + conditional-block is the only parameter, + that isn't easily detected in the server's log file. If it blocks the + referrer, the request will look like the visitor used a bookmark or + typed in the address directly. 
- We strongly discourage using this filter as a default since it breaks - many legitimate scripts. It is meant for use only on extra-nasty sites (should you really - need to go there). + Leaving the referrer unmodified for requests on the same host + allows the server owner to see the visitor's click path, + but in most cases she could also get that information by comparing + other parts of the log file: for example the User-Agent if it isn't + a very common one, or the user's IP address if it doesn't change between + different requests. - - - - - html-annoyances - - This filter will undo many common instances of HTML based abuse. + Always blocking the referrer, or using a custom one, can lead to + failures on servers that check the referrer before they answer any + requests, in an attempt to prevent their content from being + embedded or linked to elsewhere. - The BLINK and MARQUEE tags - are neutralized (yeah baby!), and browser windows will be created as - resizeable (as of course they should be!), and will have location, - scroll and menu bars -- even if specified otherwise. + Both conditional-block and forge + will work with referrer checks, as long as content and valid referring page + are on the same host. Most of the time that's the case. + + + hide-referer is an alternate spelling of + hide-referrer and the two can be freely + substituted with each other. (referrer is the + correct English spelling, however the HTTP specification has a bug - it + requires it to be spelled as referer.) - content-cookies + Example usage: - Most cookies are set in the HTTP dialog, where they can be intercepted - by the - crunch-incoming-cookies - and crunch-outgoing-cookies - actions. But web sites increasingly make use of HTML meta tags and JavaScript - to sneak cookies to the browser on the content level.
+ +hide-referrer{forge} or + +hide-referrer{http://www.yahoo.com/} + + + + + + + + +hide-user-agent + + + + Typical use: + + Try to conceal your type of browser and client operating system + + + + + Effect: + - This filter disables most HTML and JavaScript code that reads or sets - cookies. It cannot detect all clever uses of these types of code, so it - should not be relied on as an absolute fix. Use it wherever you would also - use the cookie crunch actions. + Replaces the value of the User-Agent: HTTP header + in client requests with the specified value. - refresh tags + Type: + + + Parameterized. + + + + + Parameter: - Disable any refresh tags if the interval is greater than nine seconds (so - that redirections done via refresh tags are not destroyed). This is useful - for dial-on-demand setups, or for those who find this HTML feature - annoying. + Any user-defined string. - unsolicited-popups + Notes: + + + This can lead to problems on web sites that depend on looking at this header in + order to customize their content for different browsers (which, by the + way, is NOT the right thing to do: good web sites + work browser-independently). + + - This filter attempts to prevent only unsolicited pop-up - windows from opening, yet still allow pop-up windows that the user - has explicitly chosen to open. It was added in version 3.0.1, - as an improvement over earlier such filters. + Using this action in multi-user setups or wherever different types of + browsers will access the same Privoxy is + not recommended. In single-user, single-browser + setups, you might use it to delete your OS version information from + the headers, because it is an invitation to exploit known bugs for your + OS. It is also occasionally useful to forge this in order to access + sites that won't let you in otherwise (though there may be a good + reason in some cases). 
- Technical note: The filter works by redefining the window.open JavaScript - function to a dummy function, PrivoxyWindowOpen(), - during the loading and rendering phase of each HTML page access, and - restoring the function afterward. + More information on known user-agent strings can be found at + http://www.user-agents.org/ + and + http://en.wikipedia.org/wiki/User_agent. + + + + + Example usage: + - This is recommended only for browsers that cannot perform this function - reliably themselves. And be aware that some sites require such windows - in order to function normally. Use with caution. + +hide-user-agent{Netscape 6.1 (X11; I; Linux 2.4.18 i686)} + + + + + + + + + +limit-connect + + + + Typical use: + + Prevent abuse of Privoxy as a TCP proxy relay or disable SSL for untrusted sites + + + + + Effect: + + + Specifies to which ports HTTP CONNECT requests are allowable. - all-popups + Type: + - - Attempt to prevent all pop-up windows from opening. - Note this should be used with even more discretion than the above, since - it is more likely to break some sites that require pop-ups for normal - usage. Use with caution. - + Parameterized. - img-reorder + Parameter: - This is a helper filter that has no value if used alone. It makes the - banners-by-size and banners-by-link - (see below) filters more effective and should be enabled together with them. + A comma-separated list of ports or port ranges (the latter using dashes, with the minimum + defaulting to 0 and the maximum to 65K). - banners-by-size + Notes: - This filter removes image tags purely based on what size they are. Fortunately - for us, many ads and banner images tend to conform to certain standardized - sizes, which makes this filter quite effective for ad stripping purposes. - - - Occasionally this filter will cause false positives on images that are not ads, - but just happen to be of one of the standard banner sizes. + By default, i.e. 
if no limit-connect action applies, + Privoxy allows HTTP CONNECT requests to all + ports. Use limit-connect if fine-grained control + is desired for some or all destinations. - Recommended only for those who require extreme ad blocking. The default - block rules should catch 95+% of all ads without this filter enabled. - + The CONNECT method exists in HTTP to allow access to secure websites + (https:// URLs) through proxies. It works very simply: + the proxy connects to the server on the specified port, and then + short-circuits its connections to the client and to the remote server. + This means CONNECT-enabled proxies can be used as TCP relays very easily. + + + Privoxy relays HTTPS traffic without seeing + the decoded content. Websites can leverage this limitation to circumvent &my-app;'s + filters. By specifying an invalid port range you can disable HTTPS entirely. + - banners-by-link + Example usages: - - This is an experimental filter that attempts to kill any banners if - their URLs seem to point to known or suspected click trackers. It is currently - not of much value and is not recommended for use by default. + + + + + +limit-connect{443} # Port 443 is OK. ++limit-connect{80,443} # Ports 80 and 443 are OK. ++limit-connect{-3, 7, 20-100, 500-} # Ports less than 3, 7, 20 to 100 and above 500 are OK. ++limit-connect{-} # All ports are OK ++limit-connect{,} # No HTTPS/SSL traffic is allowed + + + + + +limit-cookie-lifetime + + - webbugs + Typical use: - - Webbugs are small, invisible images (technically 1X1 GIF images), that - are used to track users across websites, and collect information on them. - As an HTML page is loaded by the browser, an embedded image tag causes the - browser to contact a third-party site, disclosing the tracking information - through the requested URL and/or cookies for that third-party domain, without - the user ever becoming aware of the interaction with the third-party site.
- HTML-ized spam also uses a similar technique to verify email addresses. - - - This filter removes the HTML code that loads such webbugs. - + Limit the lifetime of HTTP cookies to a couple of minutes or hours. - tiny-textforms + Effect: - A rather special-purpose filter that can be used to enlarge textareas (those - multi-line text boxes in web forms) and turn off hard word wrap in them. - It was written for the sourceforge.net tracker system where such boxes are - a nuisance, but it can be handy on other sites, too. - - - It is not recommended to use this filter as a default. + Overwrites the expires field in Set-Cookie server headers if it's above the specified limit. - jumping-windows + Type: + - - Many consider windows that move, or resize themselves to be abusive. This filter - neutralizes the related JavaScript code. Note that some sites might not display - or behave as intended when using this filter. Use with caution. - + Parameterized. - frameset-borders + Parameter: - Some web designers seem to assume that everyone in the world will view their - web sites using the same browser brand and version, screen resolution etc, - because only that assumption could explain why they'd use static frame sizes, - yet prevent their frames from being resized by the user, should they be too - small to show their whole content. - - - This filter removes the related HTML code. It should only be applied to sites - which need it. + The lifetime limit in minutes, or 0. - demoronizer + Notes: - Many Microsoft products that generate HTML use non-standard extensions (read: - violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those - HTML documents to display with errors on standard-compliant platforms. + This action reduces the lifetime of HTTP cookies coming from the + server to the specified number of minutes, starting from the time + the cookie passes Privoxy. - This filter translates the MS-only characters into Latin-1 equivalents. 
- It is not necessary when using MS products, and will cause corruption of - all documents that use 8-bit character sets other than Latin-1. It's mostly - worthwhile for Europeans on non-MS platforms, if weird garbage characters - sometimes appear on some pages, or user agents that don't correct for this on - the fly. - + Cookies with a lifetime below the limit are not modified. + The lifetime of session cookies is set to the specified limit. + + + The effect of this action depends on the server. + + + In case of servers which refresh their cookies with each response + (or at least frequently), the lifetime limit set by this action + is updated as well. + Thus, a session associated with the cookie continues to work with + this action enabled, as long as a new request is made before the + last limit set is reached. + + + However, some servers send their cookies once, with a lifetime of several + years (the year 2037 is a popular choice), and do not refresh them + until a certain event in the future, for example the user logging out. + In this case this action may limit the absolute lifetime of the session, + even if requests are made frequently. + + + If the parameter is 0, this action behaves like + session-cookies-only. - shockwave-flash + Example usages: - - A filter for shockwave haters. As the name suggests, this filter strips code - out of web pages that is used to embed shockwave flash objects. - - + + +limit-cookie-lifetime{60} + + + + + +prevent-compression + + - quicktime-kioskmode + Typical use: - Change HTML code that embeds Quicktime objects so that kioskmode, which - prevents saving, is disabled. + Ensure that servers send the content uncompressed, so it can be + passed through filters. - fun + Effect: - Text replacements for subversive browsing fun. Make fun of your favorite - Monopolist or play buzzword bingo. + Removes the Accept-Encoding header which can be used to ask for compressed transfer. - crude-parental + Type: + + + Boolean. 
+ + + + + Parameter: - A demonstration-only filter that shows how Privoxy - can be used to delete web content on a keyword basis. + N/A - ie-exploits + Notes: - An experimental collection of text replacements to disable malicious HTML and JavaScript - code that exploits known security holes in Internet Explorer. + More and more websites send their content compressed by default, which + is generally a good idea and saves bandwidth. But the filter and + deanimate-gifs + actions need access to the uncompressed data. - Presently, it only protects against Nimda and a cross-site scripting bug, and - would need active maintenance to provide more substantial protection. + When compiled with zlib support (available since &my-app; 3.0.7), content that should be + filtered is decompressed on-the-fly and you don't have to worry about this action. + If you are using an older &my-app; version, or one that hasn't been compiled with zlib + support, this action can be used to convince the server to send the content uncompressed. + + + Most text-based instances compress very well, the size is seldom decreased by less than 50%, + for markup-heavy instances like news feeds saving more than 90% of the original size isn't + unusual. + + + Not using compression will therefore slow down the transfer, and you should only + enable this action if you really need it. As of &my-app; 3.0.7 it's disabled in all + predefined action settings. + + + Note that some (rare) ill-configured sites don't handle requests for uncompressed + documents correctly. Broken PHP applications tend to send an empty document body, + some IIS versions only send the beginning of the content. If you enable + prevent-compression per default, you might want to add + exceptions for those sites. See the example for how to do that. - site-specifics + Example usage (sections): - Some web sites have very specific problems, the cure for which doesn't apply - anywhere else, or could even cause damage on other sites. 
+ +# Selectively turn off compression, and enable a filter +# +{ +filter{tiny-textforms} +prevent-compression } +# Match only these sites + .google. + sourceforge.net + sf.net + +# Or instead, we could set a universal default: +# +{ +prevent-compression } + / # Match all sites + +# Then maybe make exceptions for broken sites: +# +{ -prevent-compression } +.compusa.com/ + + + + + + + + + +overwrite-last-modified + + + + Typical use: + + Prevent yet another way to track the user's steps between sessions. + + + + + Effect: + - This is a collection of such site-specific cures which should only be applied - to the sites they were intended for, which is what the supplied - default.action file does. Users shouldn't need to change - anything regarding this filter. + Deletes the Last-Modified: HTTP server header or modifies its value. - google + Type: + - - A CSS based block for Google text ads. Also removes a width limitation - and the toolbar advertisement. - + Parameterized. - - yahoo + + Parameter: - Another CSS based block, this time for Yahoo text ads. And removes - a width limitation as well. + One of the keywords: block, reset-to-request-time + and randomize - - msn + + Notes: - Another CSS based block, this time for MSN text ads. And removes - tracking URLs, as well as a width limitation. + Removing the Last-Modified: header is useful for filter + testing, where you want to force a real reload instead of getting status + code 304, which would cause the browser to reuse the old + version of the page. + + + The randomize option overwrites the value of the + Last-Modified: header with a randomly chosen time + between the original value and the current time. In theory the server + could send each document with a different Last-Modified: + header to track visits without using cookies. Randomize + makes it impossible and the browser can still revalidate cached documents. 
+ + + reset-to-request-time overwrites the value of the + Last-Modified: header with the current time. You could use + this option together with + hide-if-modified-since + to further customize your random range. + + + The preferred parameter here is randomize. It is safe + to use, as long as the time settings are more or less correct. + If the server sets the Last-Modified: header to the time + of the request, the random range becomes zero and the value stays the same. + Therefore you should later randomize it a second time with + hide-if-modified-since, + just to be sure. + + + It is also recommended to use this action together with + crunch-if-none-match. - blogspot + Example usage: - - Cleans up some Blogspot blogs. Read the fine print before using this one! - - - This filter also intentionally removes some navigation stuff and sets the - page width to 100%. As a result, some rounded corners would - appear too early or not at all and as fixing this would require a browser - that understands background-size (CSS3), they are removed instead. + + # Let the browser revalidate without being tracked across sessions +{ +hide-if-modified-since{-60} \ + +overwrite-last-modified{randomize} \ + +crunch-if-none-match} +/ + + - - xml-to-html + + + +redirect + + + + Typical use: - Server-header filter to change the Content-Type from xml to html. + Redirect requests to other sites. - - html-to-xml + + Effect: - Server-header filter to change the Content-Type from html to xml. + Convinces the browser that the requested document has been moved + to another location and the browser should get it from there. - - no-ping + + Type: + - - Removes the non-standard ping attribute from - anchor and area HTML tags. - + Parameterized - - hide-tor-exit-notation + + Parameter: - Client-header filter to remove the Tor exit node notation - found in Host and Referer headers. + An absolute URL or a single pcrs command.
+ + + + + Notes: + - If &my-app; and Tor are chained and &my-app; - is configured to use socks4a, one can use http://www.example.org.foobar.exit/ - to access the host www.example.org through the - Tor exit node foobar. + Requests to which this action applies are answered with a + HTTP redirect to URLs of your choosing. The new URL is + either provided as parameter, or derived by applying a + single pcrs command to the original URL. - As the HTTP client isn't aware of this notation, it treats the - whole string www.example.org.foobar.exit as host and uses it - for the Host and Referer headers. From the - server's point of view the resulting headers are invalid and can cause problems. + The syntax for pcrs commands is documented in the + filter file section. - An invalid Referer header can trigger hot-linking - protections, an invalid Host header will make it impossible for - the server to find the right vhost (several domains hosted on the same IP address). + Requests can't be blocked and redirected at the same time, + applying this action together with + block + is a configuration error. Currently the request is blocked + and an error message logged, the behavior may change in the + future and result in Privoxy rejecting the action file. - This client-header filter removes the foo.exit part in those headers - to prevent the mentioned problems. Note that it only modifies - the HTTP headers, it doesn't make it impossible for the server - to detect your Tor exit node based on the IP address - the request is coming from. + This action can be combined with + fast-redirects{check-decoded-url} + to redirect to a decoded version of a rewritten URL. - - - - - - - - - - - - - - - - -Privoxy's Template Files - - All Privoxy built-in pages, i.e. error pages such as the - 404 - No Such Domain - error page, the BLOCKED - page - and all pages of its web-based - user interface, are generated from templates. - (Privoxy must be running for the above links to work as - intended.) 
- - - - These templates are stored in a subdirectory of the configuration - directory called templates. On Unixish platforms, - this is typically - /etc/privoxy/templates/. - - - - The templates are basically normal HTML files, but with place-holders (called symbols - or exports), which Privoxy fills at run time. It - is possible to edit the templates with a normal text editor, should you want - to customize them. (Not recommended for the casual - user). Should you create your own custom templates, you should use - the config setting templdir - to specify an alternate location, so your templates do not get overwritten - during upgrades. - - - Note that just like in configuration files, lines starting - with # are ignored when the templates are filled in. - - - - The place-holders are of the form @name@, and you will - find a list of available symbols, which vary from template to template, - in the comments at the start of each file. Note that these comments are not - always accurate, and that it's probably best to look at the existing HTML - code to find out which symbols are supported and what they are filled in with. - - - - A special application of this substitution mechanism is to make whole - blocks of HTML code disappear when a specific symbol is set. We use this - for many purposes, one of them being to include the beta warning in all - our user interface (CGI) pages when Privoxy - is in an alpha or beta development stage: - - - - -<!-- @if-unstable-start --> - - ... beta warning HTML code goes here ... 
- -<!-- if-unstable-end@ --> - - - - If the "unstable" symbol is set, everything in between and including - @if-unstable-start and if-unstable-end@ - will disappear, leaving nothing but an empty comment: - - - - <!-- --> - - - - There's also an if-then-else construct and an #include - mechanism, but you'll sure find out if you are inclined to edit the - templates ;-) - - - - All templates refer to a style located at - http://config.privoxy.org/send-stylesheet. - This is, of course, locally served by Privoxy - and the source for it can be found and edited in the - cgi-style.css template. - - - - - - - + In case of problems with your redirects, or simply to watch + them working, enable debug 128. + + + - + + Example usages: + + + # Replace example.com's style sheet with another one +{ +redirect{http://localhost/css-replacements/example.com.css} } + example.com/stylesheet\.css -Contacting the Developers, Bug Reporting and Feature -Requests +# Create a short, easy to remember nickname for a favorite site +# (relies on the browser to accept and forward invalid URLs to &my-app;) +{ +redirect{http://www.privoxy.org/user-manual/actions-file.html} } + a - - &contacting; - +# Always use the expanded view for Undeadly.org articles +# (Note the $ at the end of the URL pattern to make sure +# the request for the rewritten URL isn't redirected as well) +{+redirect{s@$@&mode=expanded@}} +undeadly.org/cgi\?action=article&sid=\d*$ - +# Redirect Google search requests to MSN +{+redirect{s@^http://[^/]*/search\?q=([^&]*).*@http://search.msn.com/results.aspx?q=$1@}} +.google.com/search - +# Redirect MSN search requests to Yahoo +{+redirect{s@^http://[^/]*/results\.aspx\?q=([^&]*).*@http://search.yahoo.com/search?p=$1@}} +search.msn.com//results\.aspx\?q= +# Redirect http://example.com/&bla=fasel&toChange=foo (and any other value but "bar") +# to http://example.com/&bla=fasel&toChange=bar +# +# The URL pattern makes sure that the following request isn't redirected again. 
+{+redirect{s@toChange=[^&]+@toChange=bar@}} +example.com/.*toChange=(?!bar) - -Privoxy Copyright, License and History +# Add a shortcut to look up illumos bugs +{+redirect{s@^http://i([0-9]+)/.*@https://www.illumos.org/issues/$1@}} +# Redirected URL = http://i4974/ +# Redirect Destination = https://www.illumos.org/issues/4974 +i[0-9][0-9][0-9][0-9]*/ - - ©right; - +# Redirect remote requests for this manual +# to the local version delivered by Privoxy +{+redirect{s@^http://www@http://config@}} +www.privoxy.org/user-manual/ + + + - -License - - &license; - - - + + + +server-header-filter -History - - &history; - - - -Authors - - &p-authors; - - + + + Typical use: + + + Rewrite or remove single server headers. + + + - + + Effect: + + + All server headers to which this action applies are filtered on-the-fly + through the specified regular expression based substitutions. + + + - + + Type: + + + Multi-value. + + + + Parameter: + + + The name of a server-header filter, as defined in one of the + filter files. + + + - -See Also - - &seealso; - - + + Notes: + + + Server-header filters are applied to each header on its own, not to + all at once. This makes it easier to diagnose problems, but on the downside + you can't write filters that only change header x if header y's value is z. + You can do that by using tags though. + + + Server-header filters are executed after the other header actions have finished + and use their output as input. + + + Please refer to the filter file chapter + to learn which server-header filters are available by default, and how to + create your own. 
+ + + + + Example usage (section): + + + +{+server-header-filter{html-to-xml}} +example.org/xml-instance-that-is-delivered-as-html +{+server-header-filter{xml-to-html}} +example.org/instance-that-is-delivered-as-xml-but-is-not + + + + - -Appendix + + - -Regular Expressions - - Privoxy uses Perl-style regular - expressions in its actions - files and filter file, - through the PCRE and - - PCRS libraries. - - - - If you are reading this, you probably don't understand what regular - expressions are, or what they can do. So this will be a very brief - introduction only. A full explanation would require a book ;-) - - - - Regular expressions provide a language to describe patterns that can be - run against strings of characters (letter, numbers, etc), to see if they - match the string or not. The patterns are themselves (sometimes complex) - strings of literal characters, combined with wild-cards, and other special - characters, called meta-characters. The meta-characters have - special meanings and are used to build complex patterns to be matched against. - Perl Compatible Regular Expressions are an especially convenient - dialect of the regular expression language. - - - - To make a simple analogy, we do something similar when we use wild-card - characters when listing files with the dir command in DOS. - *.* matches all filenames. The special - character here is the asterisk which matches any and all characters. We can be - more specific and use ? to match just individual - characters. So dir file?.text would match - file1.txt, file2.txt, etc. We are pattern - matching, using a similar technique to regular expressions! - - - - Regular expressions do essentially the same thing, but are much, much more - powerful. There are many more special characters and ways of - building complex patterns however. Let's look at a few of the common ones, - and then some examples: - - - - - . - Matches any single character, e.g. a, - A, 4, :, or @. - - + +server-header-tagger - - - ? 
- The preceding character or expression is matched ZERO or ONE - times. Either/or. - - + + + Typical use: + + + Enable or disable filters based on the Content-Type header. + + + - - - + - The preceding character or expression is matched ONE or MORE - times. - - + + Effect: + + + Server headers to which this action applies are filtered on-the-fly through + the specified regular expression based substitutions, the result is used as + tag. + + + - - - * - The preceding character or expression is matched ZERO or MORE - times. - - + + Type: + + + Multi-value. + + - - - \ - The escape character denotes that - the following character should be taken literally. This is used where one of the - special characters (e.g. .) needs to be taken literally and - not as a special meta-character. Example: example\.com, makes - sure the period is recognized only as a period (and not expanded to its - meta-character meaning of any single character). - - + + Parameter: + + + The name of a server-header tagger, as defined in one of the + filter files. + + + - - - [ ] - Characters enclosed in brackets will be matched if - any of the enclosed characters are encountered. For instance, [0-9] - matches any numeric digit (zero through nine). As an example, we can combine - this with + to match any digit one or more times: [0-9]+. - - + + Notes: + + + Server-header taggers are applied to each header on its own, + and as the header isn't modified, each tagger sees + the original. + + + Server-header taggers are executed before all other header actions + that modify server headers. Their tags can be used to control + all of the other server-header actions, the content filters + and the crunch actions (redirect + and block). + + + Obviously crunching based on tags created by server-header taggers + doesn't prevent the request from showing up in the server's log file. + - - - ( ) - parentheses are used to group a sub-expression, - or multiple sub-expressions.
- - + + - - - | - The bar character works like an - or conditional statement. A match is successful if the - sub-expression on either side of | matches. As an example: - /(this|that) example/ uses grouping and the bar character - and would match either this example or that - example, and nothing else. - - + + Example usage (section): + + + +# Tag every request with the content type declared by the server +{+server-header-tagger{content-type}} +/ - - These are just some of the ones you are likely to use when matching URLs with - Privoxy, and is a long way from a definitive - list. This is enough to get us started with a few simple examples which may - be more illuminating: - +# If the response has a tag starting with 'image/' enable an external +# filter that only applies to images. +# +# Note that the filter is not available by default, it's just a +# silly example. +{+external-filter{rotate-image} +force-text-mode} +TAG:^image/ + + + + - - /.*/banners/.* - A simple example - that uses the common combination of . and * to - denote any character, zero or more times. In other words, any string at all. - So we start with a literal forward slash, then our regular expression pattern - (.*) another literal forward slash, the string - banners, another forward slash, and lastly another - .*. We are building - a directory path here. This will match any file with the path that has a - directory named banners in it. The .* matches - any characters, and this could conceivably be more forward slashes, so it - might expand into a much longer looking path. For example, this could match: - /eye/hate/spammers/banners/annoy_me_please.gif, or just - /banners/annoying.html, or almost an infinite number of other - possible combinations, just so it has banners in the path - somewhere. 
- + + - - And now something a little more complex: - - - /.*/adv((er)?ts?|ertis(ing|ements?))?/ - - We have several literal forward slashes again (/), so we are - building another expression that is a file path statement. We have another - .*, so we are matching against any conceivable sub-path, just so - it matches our expression. The only true literal that must - match our pattern is adv, together with - the forward slashes. What comes after the adv string is the - interesting part. - + + +session-cookies-only - - Remember the ? means the preceding expression (either a - literal character or anything grouped with (...) in this case) - can exist or not, since this means either zero or one match. So - ((er)?ts?|ertis(ing|ements?)) is optional, as are the - individual sub-expressions: (er), - (ing|ements?), and the s. The | - means or. We have two of those. For instance, - (ing|ements?), can expand to match either ing - OR ements?. What is being done here, is an - attempt at matching as many variations of advertisement, and - similar, as possible. So this would expand to match just adv, - or advert, or adverts, or - advertising, or advertisement, or - advertisements. You get the idea. But it would not match - advertizements (with a z). We could fix that by - changing our regular expression to: - /.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/, which would then match - either spelling. - + + + Typical use: + + + Allow only temporary session cookies (for the current + browser session only). + + + - - /.*/advert[0-9]+\.(gif|jpe?g) - Again - another path statement with forward slashes. Anything in the square brackets - [ ] can be matched. This is using 0-9 as a - shorthand expression to mean any digit zero through nine. It is the same as - saying 0123456789. So any digit matches. The + - means one or more of the preceding expression must be included. The preceding - expression here is what is in the square brackets -- in this case, any digit - zero through nine.
Then, at the end, we have a grouping: (gif|jpe?g). - This includes a |, so this needs to match the expression on - either side of that bar character also. A simple gif on one side, and the other - side will in turn match either jpeg or jpg, - since the ? means the letter e is optional and - can be matched once or not at all. So we are building an expression here to - match image GIF or JPEG type image file. It must include the literal - string advert, then one or more digits, and a . - (which is now a literal, and not a special character, since it is escaped - with \), and lastly either gif, or - jpeg, or jpg. Some possible matches would - include: //advert1.jpg, - /nasty/ads/advert1234.gif, - /banners/from/hell/advert99.jpg. It would not match - advert1.gif (no leading slash), or - /adverts232.jpg (the expression does not include an - s), or /advert1.jsp (jsp is not - in the expression anywhere). - + + Effect: + + + Deletes the expires field from Set-Cookie: + server headers. Most browsers will not store such cookies permanently and + forget them in between sessions. + + + - - We are barely scratching the surface of regular expressions here so that you - can understand the default Privoxy - configuration files, and maybe use this knowledge to customize your own - installation. There is much, much more that can be done with regular - expressions. Now that you know enough to get started, you can learn more on - your own :/ - + + Type: + + + Boolean. + + - - More reading on Perl Compatible Regular expressions: - http://perldoc.perl.org/perlre.html - + + Parameter: + + + N/A + + + - - For information on regular expression based substitutions and their applications - in filters, please see the filter file tutorial - in this manual. - - + + Notes: + + + This is less strict than crunch-incoming-cookies / + crunch-outgoing-cookies and allows you to browse + websites that insist or rely on setting cookies, without compromising your privacy too badly. 
+ + + Most browsers will not permanently store cookies that have been processed by + session-cookies-only and will forget about them between sessions. + This makes profiling cookies useless, but won't break sites which require cookies so + that you can log in for transactions. This is generally turned on for all + sites, and is the recommended setting. + + + It makes no sense at all to use session-cookies-only + together with crunch-incoming-cookies or + crunch-outgoing-cookies. If you do, cookies + will be plainly killed. + + + Note that it is up to the browser how it handles such cookies without an expires + field. If you use an exotic browser, you might want to try it out to be sure. + + + This setting also has no effect on cookies that may have been stored + previously by the browser before starting Privoxy. + These would have to be removed manually. + + + Privoxy also uses + the content-cookies filter + to block some types of cookies. Content cookies are not affected by + session-cookies-only. + + + - + + Example usage: + + + +session-cookies-only + + + + + - -Privoxy's Internal Pages + +set-image-blocker - - Since Privoxy proxies each requested - web page, it is easy for Privoxy to - trap certain special URLs. In this way, we can talk directly to - Privoxy, and see how it is - configured, see how our rules are being applied, change these - rules and other configuration options, and even turn - Privoxy's filtering off, all with - a web browser. + + + Typical use: + + Choose the replacement for blocked images + + - + + Effect: + + + This action alone doesn't do anything noticeable. If both + block and handle-as-image also + apply, i.e. if the request is to be blocked as an image, + then the parameter of this action decides what will be + sent as a replacement. + + + - - The URLs listed below are the special ones that allow direct access - to Privoxy. Of course, - Privoxy must be running to access these. If - not, you will get a friendly error message.
Internet access is not - necessary either. - + + Type: + + + Parameterized. + + - - + + Parameter: + + + + + pattern to send a built-in checkerboard pattern image. The image is visually + decent, scales very well, and makes it obvious where banners were busted. + + + + + blank to send a built-in transparent image. This makes banners disappear + completely, but makes it hard to detect where Privoxy has blocked + images on a given page and complicates troubleshooting if Privoxy + has blocked innocent images, like navigation icons. + + + + + target-url to + send a redirect to target-url. You can redirect + to any image anywhere, even in your local filesystem via file:/// URL. + (But note that not all browsers support redirecting to a local file system). + + + A good application of redirects is to use special Privoxy-built-in + URLs, which send the built-in images, as target-url. + This has the same visual effect as specifying blank or pattern in + the first place, but enables your browser to cache the replacement image, instead of requesting + it over and over again. + + + + + - - - Privoxy main page: - -
+ + Notes: + - http://config.privoxy.org/ + The URLs for the built-in images are http://config.privoxy.org/send-banner?type=type, where type is + either blank or pattern. -
- - There is a shortcut: http://p.p/ (But it - doesn't provide a fall-back to a real page, in case the request is not - sent through Privoxy) - -
- - - - Show information about the current configuration, including viewing and - editing of actions files: - -
- http://config.privoxy.org/show-status + There is a third (advanced) type, called auto. It is NOT to be + used in set-image-blocker, but meant for use from filters. + Auto will select the type of image that would have applied to the referring page, had it been an image. -
-
+ + - - - Show the source code version numbers: - -
+ + Example usage: + - http://config.privoxy.org/show-version + Built-in pattern: -
-
- - - - Show the browser's request headers: - -
- http://config.privoxy.org/show-request + +set-image-blocker{pattern} -
-
- - - - Show which actions apply to a URL and why: - -
- http://config.privoxy.org/show-url-info + Redirect to the BSD daemon: -
-
- - - - Toggle Privoxy on or off. This feature can be turned off/on in the main - config file. When toggled off, Privoxy - continues to run, but only as a pass-through proxy, with no actions taking - place: - -
- http://config.privoxy.org/toggle + +set-image-blocker{http://www.freebsd.org/gifs/dae_up3.gif} -
- - Short cuts. Turn off, then on: - -
- http://config.privoxy.org/toggle?set=disable + Redirect to the built-in pattern for better caching: -
-
- http://config.privoxy.org/toggle?set=enable + +set-image-blocker{http://config.privoxy.org/send-banner?type=pattern} -
-
+ + + +
+ + + + +Summary + + Note that many of these actions have the potential to cause a page to + misbehave, possibly even not to display at all. There are many ways + a site designer may choose to design his site, and what HTTP header + content, and other criteria, he may depend on. There is no way to have hard + and fast rules for all sites. See the Appendix for a brief example on troubleshooting + actions. + + +
+ + + +Aliases + + Custom actions, known to Privoxy + as aliases, can be defined by combining other actions. + These can in turn be invoked just like the built-in actions. + Currently, an alias name can contain any character except space, tab, + =, + { and }, but we strongly + recommend that you only use a to z, + 0 to 9, +, and -. + Alias names are not case sensitive, and are not required to start with a + + or - sign, since they are merely textually + expanded. + + + Aliases can be used throughout the actions file, but they must be + defined in a special section at the top of the file! + And there can only be one such section per actions file. Each actions file may + have its own alias section, and the aliases defined in it are only visible + within that file. + + + There are two main reasons to use aliases: One is to save typing for frequently + used combinations of actions, the other one is a gain in flexibility: If you + decide once how you want to handle shops by defining an alias called + shop, you can later change your policy on shops in + one place, and your changes will take effect everywhere + in the actions file where the shop alias is used. Calling aliases + by their purpose also makes your actions files more readable. + + + Currently, there is one big drawback to using aliases, though: + Privoxy's built-in web-based action file + editor honors aliases when reading the actions files, but it expands + them before writing. So the effects of your aliases are of course preserved, + but the aliases themselves are lost when you edit sections that use aliases + with it. + + + + Now let's define some aliases... + + + + + # Useful custom aliases we can use later. + # + # Note the (required!) section header line and that this section + # must be at the top of the actions file! + # + {{alias}} + + # These aliases just save typing later: + # (Note that some already use other aliases!) 
+ # + +crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies + -crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies + +block-as-image = +block{Blocked image.} +handle-as-image + allow-all-cookies = -crunch-all-cookies -session-cookies-only -filter{content-cookies} - + # These aliases define combinations of actions + # that are useful for certain types of sites: + # + fragile = -block -filter -crunch-all-cookies -fast-redirects -hide-referrer -prevent-compression + + shop = -crunch-all-cookies -filter{all-popups} + + # Short names for other aliases, for really lazy people ;-) + # + c0 = +crunch-all-cookies + c1 = -crunch-all-cookies + + + + ...and put them to use. These sections would appear in the lower part of an + actions file and define exceptions to the default actions (as specified further + up for the / pattern): - These may be bookmarked for quick reference. See next. + + # These sites are either very complex or very keen on + # user data and require minimal interference to work: + # + {fragile} + .office.microsoft.com + .windowsupdate.microsoft.com + # Gmail is really mail.google.com, not gmail.com + mail.google.com + + # Shopping sites: + # Allow cookies (for setting and retrieving your customer data) + # + {shop} + .quietpc.com + .worldpay.com # for quietpc.com + mybank.example.com + # These shops require pop-ups: + # + {-filter{all-popups} -filter{unsolicited-popups}} + .dabs.com + .overclockers.co.uk - -Bookmarklets - Below are some bookmarklets to allow you to easily access a - mini version of some of Privoxy's - special pages. They are designed for MS Internet Explorer, but should work - equally well in Netscape, Mozilla, and other browsers which support - JavaScript. They are designed to run directly from your bookmarks - not by - clicking the links below (although that should work for testing). 
+ Aliases like shop and fragile are typically used for + problem sites that require more than one action to be disabled + in order to function properly. + + + + +Actions Files Tutorial - To save them, right-click the link and choose Add to Favorites - (IE) or Add Bookmark (Netscape). You will get a warning that - the bookmark may not be safe - just click OK. Then you can run the - Bookmarklet directly from your favorites/bookmarks. For even faster access, - you can put them on the Links bar (IE) or the Personal - Toolbar (Netscape), and run them with a single click. + The above chapters have shown which actions files + there are and how they are organized, how actions are specified and applied + to URLs, how patterns work, and how to + define and use aliases. Now, let's look at an + example match-all.action, default.action + and user.action file and see how all these pieces come together: + +match-all.action - - - - - Privoxy - Enable - - - - - - Privoxy - Disable - - - - - - Privoxy - Toggle Privoxy (Toggles between enabled and disabled) - - + Remember all actions are disabled when matching starts, + so we have to explicitly enable the ones we want. + - - - Privoxy- View Status - - - - - - Privoxy - Why? - - - + + While the match-all.action file only contains a + single section, it is probably the most important one. It has only one + pattern, /, but this pattern + matches all URLs. Therefore, the set of + actions used in this default section will + be applied to all requests as a start. It can be partly or + wholly overridden by other actions files like default.action + and user.action, but it will still be largely responsible + for your overall browsing experience. - Credit: The site which gave us the general idea for these bookmarklets is - www.bookmarklets.com. They - have more information about bookmarklets. + Again, at the start of matching, all actions are disabled, so there is + no need to disable any actions here. 
(Remember: a + + preceding the action name enables the action, a - disables!). + Also note how this long line has been made more readable by splitting it into + multiple lines with line continuation. + + +{ \ + +change-x-forwarded-for{block} \ + +hide-from-header{block} \ + +set-image-blocker{pattern} \ +} +/ # Match all URLs + + + + The default behavior is now set. + - + +default.action + + If you aren't a developer, there's no need for you to edit the + default.action file. It is maintained by + the &my-app; developers and if you disagree with some of the + sections, you should overrule them in your user.action. + - - -Chain of Events - Let's take a quick look at how some of Privoxy's - core features are triggered, and the ensuing sequence of events when a web - page is requested by your browser: + Understanding the default.action file can + help you with your user.action, though. - - - - First, your web browser requests a web page. The browser knows to send - the request to Privoxy, which will in turn, - relay the request to the remote web server after passing the following - tests: - - - - - Privoxy traps any request for its own internal CGI - pages (e.g http://p.p/) and sends the CGI page back to the browser. - - - - - Next, Privoxy checks to see if the URL - matches any +block patterns. If - so, the URL is then blocked, and the remote web server will not be contacted. - +handle-as-image - and - +handle-as-empty-document - are then checked, and if there is no match, an - HTML BLOCKED page is sent back to the browser. Otherwise, if - it does match, an image is returned for the former, and an empty text - document for the latter. The type of image would depend on the setting of - +set-image-blocker - (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere). - - - - - Untrusted URLs are blocked. If URLs are being added to the - trust file, then that is done. - - - - - If the URL pattern matches the +fast-redirects action, - it is then processed. 
Unwanted parts of the requested URL are stripped. - - - - - Now the rest of the client browser's request headers are processed. If any - of these match any of the relevant actions (e.g. +hide-user-agent, - etc.), headers are suppressed or forged as determined by these actions and - their parameters. - - - - - Now the web server starts sending its response back (i.e. typically a web - page). - - - - - First, the server headers are read and processed to determine, among other - things, the MIME type (document type) and encoding. The headers are then - filtered as determined by the - +crunch-incoming-cookies, - +session-cookies-only, - and +downgrade-http-version - actions. - - - - - If any +filter action - or +deanimate-gifs - action applies (and the document type fits the action), the rest of the page is - read into memory (up to a configurable limit). Then the filter rules (from - default.filter and any other filter files) are - processed against the buffered content. Filters are applied in the order - they are specified in one of the filter files. Animated GIFs, if present, - are reduced to either the first or last frame, depending on the action - setting. The entire page, which is now filtered, is then sent by - Privoxy back to your browser. - - - If neither a +filter action - nor +deanimate-gifs - matches, then Privoxy passes the raw data through - to the client browser as it becomes available. - - - - - As the browser receives the now (possibly filtered) page content, it - reads and then requests any URLs that may be embedded within the page - source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g. - frames), sounds, etc. For each of these objects, the browser issues a - separate request (this is easily viewable in Privoxy's - logs). And each such request is in turn processed just as above. Note that a - complex web page will have many, many such embedded URLs.
If these - secondary requests are to a different server, then quite possibly a very - differing set of actions is triggered. - - + The first section in this file is a special section for internal use + that prevents older &my-app; versions from reading the file: + - + + +########################################################################## +# Settings -- Don't change! For internal Privoxy use ONLY. +########################################################################## +{{settings}} +for-privoxy-version=3.0.11 + - NOTE: This is somewhat of a simplistic overview of what happens with each URL - request. For the sake of brevity and simplicity, we have focused on - Privoxy's core features only. + After that comes the (optional) alias section. We'll use the example + section from the above chapter on aliases, + that also explains why and how aliases are used: - - + + +########################################################################## +# Aliases +########################################################################## +{{alias}} - - -Troubleshooting: Anatomy of an Action + # These aliases just save typing later: + # (Note that some already use other aliases!) + # + +crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies + -crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies + +block-as-image = +block{Blocked image.} +handle-as-image + mercy-for-cookies = -crunch-all-cookies -session-cookies-only -filter{content-cookies} - - The way Privoxy applies - actions and filters - to any given URL can be complex, and not always so - easy to understand what is happening. And sometimes we need to be able to - see just what Privoxy is - doing. Especially, if something Privoxy is doing - is causing us a problem inadvertently. It can be a little daunting to look at - the actions and filters files themselves, since they tend to be filled with - regular expressions whose consequences are not - always so obvious. 
+ # These aliases define combinations of actions + # that are useful for certain types of sites: + # + fragile = -block -filter -crunch-all-cookies -fast-redirects -hide-referrer + shop = -crunch-all-cookies -filter{all-popups} - One quick test to see if Privoxy is causing a problem - or not, is to disable it temporarily. This should be the first troubleshooting - step. See the Bookmarklets section on a quick - and easy way to do this (be sure to flush caches afterward!). Looking at the - logs is a good idea too. (Note that both the toggle feature and logging are - enabled via config file settings, and may need to be - turned on.) + The first of our specialized sections is concerned with fragile + sites, i.e. sites that require minimum interference, because they are either + very complex or very keen on tracking you (and have mechanisms in place that + make them unusable for people who avoid being tracked). We will simply use + our pre-defined fragile alias instead of stating the list + of actions explicitly: + - Another easy troubleshooting step to try is if you have done any - customization of your installation, revert back to the installed - defaults and see if that helps. There are times the developers get complaints - about one thing or another, and the problem is more related to a customized - configuration issue. + +########################################################################## +# Exceptions for sites that'll break under the default action set: +########################################################################## + +# "Fragile" Use a minimum set of actions for these sites (see alias above): +# +{ fragile } +.office.microsoft.com # surprise, surprise! +.windowsupdate.microsoft.com +mail.google.com - Privoxy also provides the - http://config.privoxy.org/show-url-info - page that can show us very specifically how actions - are being applied to any given URL. This is a big help for troubleshooting. 
+ Shopping sites are not as fragile, but they typically + require cookies to log in, and pop-up windows for shopping + carts or item details. Again, we'll use a pre-defined alias: - First, enter one URL (or partial URL) at the prompt, and then - Privoxy will tell us - how the current configuration will handle it. This will not - help with filtering effects (i.e. the +filter action) from - one of the filter files since this is handled very - differently and not so easy to trap! It also will not tell you about any other - URLs that may be embedded within the URL you are testing. For instance, images - such as ads are expressed as URLs within the raw page source of HTML pages. So - you will only get info for the actual URL that is pasted into the prompt area - -- not any sub-URLs. If you want to know about embedded URLs like ads, you - will have to dig those out of the HTML source. Use your browser's View - Page Source option for this. Or right click on the ad, and grab the - URL. + +# Shopping sites: +# +{ shop } +.quietpc.com +.worldpay.com # for quietpc.com +.jungle.com +.scan.co.uk - Let's try an example, google.com, - and look at it one section at a time in a sample configuration (your real - configuration may vary): + The fast-redirects + action, which may have been enabled in match-all.action, + breaks some sites. So disable it for popular sites where we know it misbehaves: - Matches for http://www.google.com: +{ -fast-redirects } +login.yahoo.com +edit.*.yahoo.com +.google.com +.altavista.com/.*(like|url|link):http +.altavista.com/trans.*urltext=http +.nytimes.com + - In file: default.action [ View ] [ Edit ] + + It is important that Privoxy knows which + URLs belong to images, so that if they are to + be blocked, a substitute image can be sent, rather than an HTML page. + Contacting the remote site to find out is not an option, since it + would destroy the loading time advantage of banner blocking, and it + would feed the advertisers information about you. 
We can mark any + URL as an image with the handle-as-image action, + and marking all URLs that end in a known image file extension is a + good start: + - {+change-x-forwarded-for{block} - +deanimate-gifs {last} - +fast-redirects {check-decoded-url} - +filter {refresh-tags} - +filter {img-reorder} - +filter {banners-by-size} - +filter {webbugs} - +filter {jumping-windows} - +filter {ie-exploits} - +hide-from-header {block} - +hide-referrer {forge} - +session-cookies-only - +set-image-blocker {pattern} -/ + + +########################################################################## +# Images: +########################################################################## - { -session-cookies-only } - .google.com +# Define which file types will be treated as images, in case they get +# blocked further down this file: +# +{ +handle-as-image } +/.*\.(gif|jpe?g|png|bmp|ico)$ + - { -fast-redirects } - .google.com + + And then there are known banner sources. They often use scripts to + generate the banners, so it won't be visible from the URL that the + request is for an image. Hence we block them and + mark them as images in one go, with the help of our + +block-as-image alias defined above. (We could of + course just as well use +block + +handle-as-image here.) + Remember that the type of the replacement image is chosen by the + set-image-blocker + action. Since all URLs have matched the default section with its + +set-image-blocker{pattern} + action before, it still applies and needn't be repeated: + -In file: user.action [ View ] [ Edit ] -(no matches in this file) - + + +# Known ad generators: +# +{ +block-as-image } +ar.atwola.com +.ad.doubleclick.net +.ad.*.doubleclick.net +.a.yimg.com/(?:(?!/i/).)*$ +.a[0-9].yimg.com/(?:(?!/i/).)*$ +bs*.gsanet.com +.qkimg.net - This is telling us how we have defined our - actions, and - which ones match for our test case, google.com. - Displayed is all the actions that are available to us. Remember, - the + sign denotes on. 
- - denotes off. So some are on here, but many - are off. Each example we try may provide a slightly different - end result, depending on our configuration directives. + One of the most important jobs of Privoxy + is to block banners. Many of these can be blocked + by the filter{banners-by-size} + action, which we enabled above, and which deletes the references to banner + images from the pages while they are loaded, so the browser doesn't request + them anymore, and hence they don't need to be blocked here. But this naturally + doesn't catch all banners, and some people choose not to use filters, so we + need a comprehensive list of patterns for banner URLs here, and apply the + block action to them. - The first listing - is for our default.action file. The large, multi-line - listing, is how the actions are set to match for all URLs, i.e. our default - settings. If you look at your actions file, this would be the - section just below the aliases section near the top. This - will apply to all URLs as signified by the single forward slash at the end - of the listing -- / . + First comes many generic patterns, which do most of the work, by + matching typical domain and path name components of banners. Then comes + a list of individual patterns for specific sites, which is omitted here + to keep the example short: - But we have defined additional actions that would be exceptions to these general - rules, and then we list specific URLs (or patterns) that these exceptions - would apply to. Last match wins. Just below this then are two explicit - matches for .google.com. The first is negating our previous - cookie setting, which was for +session-cookies-only - (i.e. not persistent). So we will allow persistent cookies for google, at - least that is how it is in this example. The second turns - off any +fast-redirects - action, allowing this to take place unmolested. Note that there is a leading - dot here -- .google.com. 
This will match any hosts and - sub-domains, in the google.com domain also, such as - www.google.com or mail.google.com. But it would not - match www.google.de! So, apparently, we have these two actions - defined as exceptions to the general rules at the top somewhere in the lower - part of our default.action file, and - google.com is referenced somewhere in these latter sections. + +########################################################################## +# Block these fine banners: +########################################################################## +{ +block{Banner ads.} } + +# Generic patterns: +# +ad*. +.*ads. +banner?. +count*. +/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?) +/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/ + +# Site-specific patterns (abbreviated): +# +.hitbox.com + + + + It's quite remarkable how many advertisers actually call their banner + servers ads.company.com, or call the directory + in which the banners are stored simply banners. So the above + generic patterns are surprisingly effective. - - Then, for our user.action file, we again have no hits. - So there is nothing google-specific that we might have added to our own, local - configuration. If there was, those actions would over-rule any actions from - previously processed files, such as default.action. - user.action typically has the last word. This is the - best place to put hard and fast exceptions, + But being very generic, they necessarily also catch URLs that we don't want + to block. The pattern .*ads. e.g. catches + nasty-ads.nasty-corp.com as intended, + but also downloads.sourcefroge.net or + adsl.some-provider.net. So here come some + well-known exceptions to the +block + section above. - - And finally we pull it all together in the bottom section and summarize how - Privoxy is applying all its actions - to google.com: - + Note that these are exceptions to exceptions from the default! 
Consider the URL + downloads.sourcefroge.net: Initially, all actions are deactivated, + so it wouldn't get blocked. Then comes the defaults section, which matches the + URL, but just deactivates the block + action once again. Then it matches .*ads., an exception to the + general non-blocking policy, and suddenly + +block applies. And now, it'll match + .*loads., where -block + applies, so (unless it matches again further down) it ends up + with no block action applying. +########################################################################## +# Save some innocent victims of the above generic block patterns: +########################################################################## - Final results: +# By domain: +# +{ -block } +adv[io]*. # (for advogato.org and advice.*) +adsl. # (has nothing to do with ads) +adobe. # (has nothing to do with ads either) +ad[ud]*. # (adult.* and add.*) +.edu # (universities don't host banners (yet!)) +.*loads. # (downloads, uploads etc) - -add-header - -block - +change-x-forwarded-for{block} - -client-header-filter{hide-tor-exit-notation} - -content-type-overwrite - -crunch-client-header - -crunch-if-none-match - -crunch-incoming-cookies - -crunch-outgoing-cookies - -crunch-server-header - +deanimate-gifs {last} - -downgrade-http-version - -fast-redirects - -filter {js-events} - -filter {content-cookies} - -filter {all-popups} - -filter {banners-by-link} - -filter {tiny-textforms} - -filter {frameset-borders} - -filter {demoronizer} - -filter {shockwave-flash} - -filter {quicktime-kioskmode} - -filter {fun} - -filter {crude-parental} - -filter {site-specifics} - -filter {js-annoyances} - -filter {html-annoyances} - +filter {refresh-tags} - -filter {unsolicited-popups} - +filter {img-reorder} - +filter {banners-by-size} - +filter {webbugs} - +filter {jumping-windows} - +filter {ie-exploits} - -filter {google} - -filter {yahoo} - -filter {msn} - -filter {blogspot} - -filter {no-ping} - -force-text-mode - -handle-as-empty-document - 
-handle-as-image - -hide-accept-language - -hide-content-disposition - +hide-from-header {block} - -hide-if-modified-since - +hide-referrer {forge} - -hide-user-agent - -limit-connect - -overwrite-last-modified - -prevent-compression - -redirect - -server-header-filter{xml-to-html} - -server-header-filter{html-to-xml} - -session-cookies-only - +set-image-blocker {pattern} +# By path: +# +/.*loads/ + +# Site-specific: +# +www.globalintersec.com/adv # (adv = advanced) +www.ugu.com/sui/ugu/adv - Notice the only difference here to the previous listing, is to - fast-redirects and session-cookies-only, - which are activated specifically for this site in our configuration, - and thus show in the Final Results. + Filtering source code can have nasty side effects, + so make an exception for our friends at sourceforge.net, + and all paths with cvs in them. Note that + -filter + disables all filters in one fell swoop! - Now another example, ad.doubleclick.net: + +# Don't filter code! +# +{ -filter } +/(.*/)?cvs +bugzilla. +developer. +wiki. +.sourceforge.net - + The actual default.action is of course much more + comprehensive, but we hope this example made clear how it works. + - { +block{Domains starts with "ad"} } - ad*. + - { +block{Domain contains "ad"} } - .ad. +user.action - { +block{Doubleclick banner server} +handle-as-image } - .[a-vx-z]*.doubleclick.net - + + So far we are painting with a broad brush by setting general policies, + which would be a reasonable starting point for many people. Now, + you might want to be more specific and have customized rules that + are more suitable to your personal habits and preferences. These would + be for narrowly defined situations like your ISP or your bank, and should + be placed in user.action, which is parsed after all other + actions files and hence has the last word, over-riding any previously + defined actions. 
user.action is also a + safe place for your personal settings, since + default.action is actively maintained by the + Privoxy developers and you'll probably want + to install updated versions from time to time. - We'll just show the interesting part here - the explicit matches. It is - matched three different times. Two +block{} sections, - and a +block{} +handle-as-image, - which is the expanded form of one of our aliases that had been defined as: - +block-as-image. (Aliases are defined in - the first section of the actions file and typically used to combine more - than one action.) + So let's look at a few examples of things that one might typically do in + user.action: + + + - Any one of these would have done the trick and blocked this as an unwanted - image. This is unnecessarily redundant since the last case effectively - would also cover the first. No point in taking chances with these guys - though ;-) Note that if you want an ad or obnoxious - URL to be invisible, it should be defined as ad.doubleclick.net - is done here -- as both a +block{} - and an - +handle-as-image. - The custom alias +block-as-image just - simplifies the process and make it more readable. + +# My user.action file. <fred@example.com> - One last example. Let's try http://www.example.net/adsl/HOWTO/. - This one is giving us problems. We are getting a blank page. Hmmm ... + As aliases are local to the actions + file that they are defined in, you can't use the ones from + default.action, unless you repeat them here: +# Aliases are local to the file they are defined in. +# (Re-)define aliases for this file: +# +{{alias}} +# +# These aliases just save typing later, and the alias names should +# be self explanatory. 
+# ++crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies +-crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies + allow-all-cookies = -crunch-all-cookies -session-cookies-only + allow-popups = -filter{all-popups} ++block-as-image = +block{Blocked as image.} +handle-as-image +-block-as-image = -block - Matches for http://www.example.net/adsl/HOWTO/: - - In file: default.action [ View ] [ Edit ] - - {-add-header - -block - +change-x-forwarded-for{block} - -client-header-filter{hide-tor-exit-notation} - -content-type-overwrite - -crunch-client-header - -crunch-if-none-match - -crunch-incoming-cookies - -crunch-outgoing-cookies - -crunch-server-header - +deanimate-gifs - -downgrade-http-version - +fast-redirects {check-decoded-url} - -filter {js-events} - -filter {content-cookies} - -filter {all-popups} - -filter {banners-by-link} - -filter {tiny-textforms} - -filter {frameset-borders} - -filter {demoronizer} - -filter {shockwave-flash} - -filter {quicktime-kioskmode} - -filter {fun} - -filter {crude-parental} - -filter {site-specifics} - -filter {js-annoyances} - -filter {html-annoyances} - +filter {refresh-tags} - -filter {unsolicited-popups} - +filter {img-reorder} - +filter {banners-by-size} - +filter {webbugs} - +filter {jumping-windows} - +filter {ie-exploits} - -filter {google} - -filter {yahoo} - -filter {msn} - -filter {blogspot} - -filter {no-ping} - -force-text-mode - -handle-as-empty-document - -handle-as-image - -hide-accept-language - -hide-content-disposition - +hide-from-header{block} - +hide-referer{forge} - -hide-user-agent - -overwrite-last-modified - +prevent-compression - -redirect - -server-header-filter{xml-to-html} - -server-header-filter{html-to-xml} - +session-cookies-only - +set-image-blocker{blank} } - / +# These aliases define combinations of actions that are useful for +# certain types of sites: +# +fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referrer +shop = -crunch-all-cookies 
allow-popups + +# Allow ads for selected useful free sites: +# +allow-ads = -block -filter{banners-by-size} -filter{banners-by-link} + +# Alias for specific file types that are text, but might have conflicting +# MIME types. We want the browser to force these to be text documents. +handle-as-text = -filter +-content-type-overwrite{text/plain} +-force-text-mode -hide-content-disposition - { +block{Path contains "ads".} +handle-as-image } - /ads - - Ooops, the /adsl/ is matching /ads in our - configuration! But we did not want this at all! Now we see why we get the - blank page. It is actually triggering two different actions here, and - the effects are aggregated so that the URL is blocked, and &my-app; is told - to treat the block as if it were an image. But this is, of course, all wrong. - We could now add a new action below this (or better in our own - user.action file) that explicitly - un blocks ( - {-block}) paths with - adsl in them (remember, last match in the configuration - wins). There are various ways to handle such exceptions. Example: + Say you have accounts on some sites that you visit regularly, and + you don't want to have to log in manually each time. So you'd like + to allow persistent cookies for these sites. The + allow-all-cookies alias defined above does exactly + that, i.e. it disables crunching of cookies in any direction, and the + processing of cookies to make them only temporary. +{ allow-all-cookies } + sourceforge.net + .yahoo.com + .msdn.microsoft.com + .redhat.com + - { -block } - /adsl - + + Your bank is allergic to some filter, but you don't know which, so you disable them all: - Now the page displays ;-) - Remember to flush your browser's caches when making these kinds of changes to - your configuration to insure that you get a freshly delivered page! Or, try - using Shift+Reload. 
+ +{ -filter } + .your-home-banking-site.com - But now what about a situation where we get no explicit matches like - we did with: + Some file types you may not want to filter for various reasons: +# Technical documentation is likely to contain strings that might +# erroneously get altered by the JavaScript-oriented filters: +# +.tldp.org +/(.*/)?selfhtml/ - { +block{Path starts with "ads".} +handle-as-image } - /ads - +# And this stupid host sends streaming video with a wrong MIME type, +# so that Privoxy thinks it is getting HTML and starts filtering: +# +stupid-server.example.com/ - That actually was very helpful and pointed us quickly to where the problem - was. If you don't get this kind of match, then it means one of the default - rules in the first section of default.action is causing - the problem. This would require some guesswork, and maybe a little trial and - error to isolate the offending rule. One likely cause would be one of the - +filter actions. - These tend to be harder to troubleshoot. - Try adding the URL for the site to one of aliases that turn off - +filter: + Example of a simple block action. Say you've + seen an ad on your favourite page on example.com that you want to get rid of. + You have right-clicked the image, selected copy image location + and pasted the URL below while removing the leading http://, into a + { +block{} } section. Note that { +handle-as-image + } need not be specified, since all URLs ending in + .gif will be tagged as images by the general rules as set + in default.action anyway: - - { shop } - .quietpc.com - .worldpay.com # for quietpc.com - .jungle.com - .scan.co.uk - .forbes.com - +{ +block{Nasty ads.} } + www.example.com/nasty-ads/sponsor\.gif + another.example.net/more/junk/here/ - { shop } is an alias that expands to - { -filter -session-cookies-only }. 
- Or you could do your own exception to negate filtering: - + The URLs of dynamically generated banners, especially from large banner + farms, often don't use the well-known image file name extensions, which + makes it impossible for Privoxy to guess + the file type just by looking at the URL. + You can use the +block-as-image alias defined above for + these cases. + Note that objects which match this rule but then turn out NOT to be an + image are typically rendered as a broken image icon by the + browser. Use cautiously. - - { -filter } - # Disable ALL filter actions for sites in this section - .forbes.com - developer.ibm.com - localhost - +{ +block-as-image } + .doubleclick.net + .fastclick.net + /Realmedia/ads/ + ar.atwola.com/ - This would turn off all filtering for these sites. This is best - put in user.action, for local site - exceptions. Note that when a simple domain pattern is used by itself (without - the subsequent path portion), all sub-pages within that domain are included - automatically in the scope of the action. + Now you noticed that the default configuration breaks Forbes Magazine, + but you were too lazy to find out which action is the culprit, and you + were again too lazy to give feedback, so + you just used the fragile alias on the site, and + -- whoa! -- it worked. The fragile + alias disables those actions that are most likely to break a site. It is also + good for testing whether it is Privoxy + that is causing the problem. We later find other regular sites + that misbehave, and add those to our personalized list of troublemakers: - Images that are inexplicably being blocked, may well be hitting the -+filter{banners-by-size} - rule, which assumes - that images of certain sizes are ad banners (works well - most of the time since these tend to be standardized). + +{ fragile } + .forbes.com + webmail.example.com + .mybank.com - { fragile } is an alias that disables most - actions that are the most likely to cause trouble.
This can be used as a - last resort for problem sites. + You like the fun text replacements in default.filter, + but it is disabled in the distributed actions file. + So you'd like to turn it on in your private, + update-safe config, once and for all: - - - { fragile } - # Handle with care: easy to break - mail.google. - mybank.example.com + + +{ +filter{fun} } + / # For ALL sites! - - Remember to flush caches! Note that the - mail.google reference lacks the TLD portion (e.g. - .com). This will effectively match any TLD with - google in it, such as mail.google.de., - just as an example. + Note that the above is not really a good idea: There are exceptions + to the filters in default.action for things that + really shouldn't be filtered, like code on CVS->Web interfaces. Since + user.action has the last word, these exceptions + won't be valid for the fun filtering specified here. + - If this still does not work, you will have to go through the remaining - actions one by one to find which one(s) is causing the problem. + You might also worry about how your favourite free websites are + funded, and find that they rely on displaying banner advertisements + to survive. So you might want to specifically allow banners for those + sites that you feel provide value to you: - - - - - - Revision 2.90 2008/09/26 16:53:09 fabiankeil - Update "What's new" section. + - Revision 2.89 2008/09/21 15:38:56 fabiankeil - Fix Portage tree sync instructions in Gentoo section. - Anonymously reported at ijbswa-developers@. + - Revision 2.88 2008/09/21 14:42:52 fabiankeil - Add documentation for change-x-forwarded-for{}, - remove documentation for hide-forwarded-for-headers. + - Revision 2.87 2008/08/30 15:37:35 fabiankeil - Update entities. + +Filter Files - Revision 2.86 2008/08/16 10:12:23 fabiankeil - Merge two sentences and move the URL to the end of the item. + + On-the-fly text substitutions need + to be defined in a filter file. Once defined, they + can then be invoked as an action. 
+ - Revision 2.85 2008/08/16 10:04:59 fabiankeil - Some more syntax fixes. This version actually builds. + + &my-app; supports three different pcrs-based filter actions: + filter to + rewrite the content that is sent to the client, + client-header-filter + to rewrite headers that are sent by the client, and + server-header-filter + to rewrite headers that are sent by the server. + - Revision 2.84 2008/08/16 09:42:45 fabiankeil - Turns out building docs works better if the syntax is valid. + + &my-app; also supports two tagger actions: + client-header-tagger + and + server-header-tagger. + Taggers and filters use the same syntax in the filter files; the difference + is that taggers don't modify the text they are filtering, but use a rewritten + version of the filtered text as a tag. The tags can then be used to change the + applying actions through sections with tag-patterns. + - Revision 2.83 2008/08/16 09:32:02 fabiankeil - Mention changes since 3.0.9 beta. + + Finally &my-app; supports the + external-filter action + to enable external filters + written in proper programming languages. + - Revision 2.82 2008/08/16 09:00:52 fabiankeil - Fix example URL pattern (once more with feeling). - Revision 2.81 2008/08/16 08:51:28 fabiankeil - Update version-related entities. + + Multiple filter files can be defined through the filterfile config directive. The filters + as supplied by the developers are located in + default.filter. It is recommended that any locally + defined or modified filters go in a separately defined file such as + user.filter. + - Revision 2.80 2008/07/18 16:54:30 fabiankeil - Remove erroneous whitespace in documentation link. - Reported by John Chronister in #2021611.
+ + Common tasks for content filters are to eliminate common annoyances in + HTML and JavaScript, such as pop-up windows, + exit consoles, crippled windows without navigation tools, the + infamous <BLINK> tag etc, to suppress images with certain + width and height attributes (standard banner sizes or web-bugs), + or just to have fun. + - Revision 2.79 2008/06/27 18:00:53 markm68k - remove outdated startup information for mac os x + + Enabled content filters are applied to any content whose + Content Type header is recognised as a sign + of text-based content, with the exception of text/plain. + Use the force-text-mode action + to also filter other content. + - Revision 2.78 2008/06/21 17:03:03 fabiankeil - Fix typo. + + Substitutions are made at the source level, so if you want to roll + your own filters, you should first be familiar with HTML syntax, + and, of course, regular expressions. + - Revision 2.77 2008/06/14 13:45:22 fabiankeil - Re-add a colon I unintentionally removed a few revisions ago. + + Just like the actions files, the + filter file is organized in sections, which are called filters + here. Each filter consists of a heading line, that starts with one of the + keywords FILTER:, + CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER: + followed by the filter's name, and a short (one line) + description of what it does. Below that line + come the jobs, i.e. lines that define the actual + text substitutions. By convention, the name of a filter + should describe what the filter eliminates. The + comment is used in the web-based + user interface. + - Revision 2.76 2008/06/14 13:21:28 fabiankeil - Prepare for the upcoming 3.0.9 beta release. + + Once a filter called name has been defined + in the filter file, it can be invoked by using an action of the form + +filter{name} + in any actions file. + - Revision 2.75 2008/06/13 16:06:48 fabiankeil - Update the "What's New in this Release" section with - the ChangeLog entries changelog2doc.pl could handle. 
+ + Filter definitions start with a header line that contains the filter + type, the filter name and the filter description. + A content filter header line for a filter called foo could look + like this: + - Revision 2.74 2008/05/26 15:55:46 fabiankeil - - Update "default profiles" table. - - Add some more pcrs redirect examples and note that - enabling debug 128 helps to get redirects working. + + FILTER: foo Replace all "foo" with "bar" + - Revision 2.73 2008/05/23 14:43:18 fabiankeil - Remove previously out-commented block that caused syntax problems. + + Below that line, and up to the next header line, come the jobs that + define what text replacements the filter executes. They are specified + in a syntax that imitates Perl's + s/// operator. If you are familiar with Perl, you + will find this to be quite intuitive, and may want to look at the + PCRS documentation for the subtle differences to Perl behaviour. + - Revision 2.72 2008/05/12 10:26:14 fabiankeil - Synchronize content filter descriptions with the ones in default.filter. + + Most notably, the non-standard option letter U is supported, + which turns the default to ungreedy matching (add ? to + quantifiers to turn them greedy again). + - Revision 2.71 2008/04/10 17:37:16 fabiankeil - Actually we use "modern" POSIX 1003.2 regular - expressions in path patterns, not PCRE. + + The non-standard option letter D (dynamic) allows + the use of the variables $host, $origin (the IP address the request came from), + $path and $url. They will be replaced with the value they refer to before + the filter is executed. + - Revision 2.70 2008/04/10 15:59:12 fabiankeil - Add another section to the client-header-tagger example that shows - how to actually change the action settings once the tag is created. + + Note that '$' is a bad choice for a delimiter in a dynamic filter as you + might end up with unintended variables if you use a variable name + directly after the delimiter.
Variables will be resolved without + escaping anything, therefore you also have to be careful not to choose + delimiters that appear in the replacement text. For example '<' should + be safe, while '?' will sooner or later cause conflicts with $url. + - Revision 2.69 2008/03/29 12:14:25 fabiankeil - Remove send-wafer and send-vanilla-wafer actions. + + The non-standard option letter T (trivial) prevents + parsing for backreferences in the substitute. Use it if you want to include + text like '$&' in your substitute without quoting. + - Revision 2.68 2008/03/28 15:13:43 fabiankeil - Remove inspect-jpegs action. + + If you are new to + Regular + Expressions, you might want to take a look at + the Appendix on regular expressions, and + see the Perl + manual for + the + s/// operator's syntax and Perl-style regular + expressions in general. + The examples below might also help to get you started. + - Revision 2.67 2008/03/27 18:31:21 fabiankeil - Remove kill-popups action. - Revision 2.66 2008/03/06 16:33:47 fabiankeil - If limit-connect isn't used, don't limit CONNECT requests to port 443. + - Revision 2.65 2008/03/04 18:30:40 fabiankeil - Remove the treat-forbidden-connects-like-blocks action. We now - use the "blocked" page for forbidden CONNECT requests by default. +Filter File Tutorial + + Now, let's complete our foo content filter. We have already defined + the heading, but the jobs are still missing. Since all it does is to replace + foo with bar, there is only one (trivial) job + needed: + - Revision 2.64 2008/03/01 14:10:28 fabiankeil - Use new block syntax. Still needs some polishing. + + s/foo/bar/ + - Revision 2.63 2008/02/22 05:50:37 markm68k - fix merge problem + + But wait! Didn't the comment say that all occurrences + of foo should be replaced? Our current job will only take + care of the first foo on each page. For global substitution, + we'll need to add the g option: + - Revision 2.62 2008/02/11 11:52:23 hal9 - Fix entity ...
s/&/& + + s/foo/bar/g + - Revision 2.61 2008/02/11 03:41:47 markm68k - more updates for mac os x + + Our complete filter now looks like this: + + + FILTER: foo Replace all "foo" with "bar" +s/foo/bar/g + - Revision 2.60 2008/02/11 03:40:25 markm68k - more updates for mac os x + + Let's look at some real filters for more interesting examples. Here you see + a filter that protects against some common annoyances that arise from JavaScript + abuse. Let's look at its jobs one after the other: + - Revision 2.59 2008/02/11 00:52:34 markm68k - reflect new changes for mac os x - Revision 2.58 2008/02/03 21:37:40 hal9 - Apply patch from Mark: s/OSX/OS X/ + + +FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse - Revision 2.57 2008/02/03 19:10:14 fabiankeil - Mention forward-socks5. +# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm +# +s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg + - Revision 2.56 2008/01/31 19:11:35 fabiankeil - Let the +client-header-filter{hide-tor-exit-notation} example apply - to all requests as "tainted" Referers aren't limited to exit TLDs. + + Following the header line and a comment, you see the job. Note that it uses + | as the delimiter instead of /, because + the pattern contains a forward slash, which would otherwise have to be escaped + by a backslash (\). + - Revision 2.55 2008/01/19 21:26:37 hal9 - Add IE7 to configuration section per Gerry. + + Now, let's examine the pattern: it starts with the text <script.* + enclosed in parentheses. Since the dot matches any character, and * + means: Match an arbitrary number of the element left of myself, this + matches <script, followed by any text, i.e. + it matches the whole page, from the start of the first <script> tag. + - Revision 2.54 2008/01/19 17:52:39 hal9 - Re-commit to fix various minor issues for new release. 
+ + That's more than we want, but the pattern continues: document\.referrer + matches only the exact string document.referrer. The dot needed to + be escaped, i.e. preceded by a backslash, to take away its + special meaning as a wildcard, and make it just a regular dot. So far, the meaning is: + Match from the start of the first <script> tag in the page, up to, and including, + the text document.referrer, if both are present + in the page (and appear in that order). + - Revision 2.53 2008/01/19 15:03:05 hal9 - Doc sources tagged for 3.0.8 release. + + But there's still more pattern to go. The next element, again enclosed in parentheses, + is .*</script>. You already know what .* + means, so the whole pattern translates to: Match from the start of the first <script> + tag in a page to the end of the last <script> tag, provided that the text + document.referrer appears somewhere in between. + - Revision 2.52 2008/01/17 01:49:51 hal9 - Change copyright notice for docs s/2007/2008/. All these will be rebuilt soon - enough. + + This is still not the whole story, since we have ignored the options and the parentheses: + The portions of the page matched by sub-patterns that are enclosed in parentheses, will be + remembered and be available through the variables $1, $2, ... in + the substitute. The U option switches to ungreedy matching, which means + that the first .* in the pattern will only eat up all + text in between <script and the first occurrence + of document.referrer, and that the second .* will + only span the text up to the first </script> + tag. Furthermore, the s option says that the match may span + multiple lines in the page, and the g option again means that the + substitution is global. + - Revision 2.51 2007/12/23 16:48:24 fabiankeil - Use more precise example descriptions for the mysterious domain patterns. + + So, to summarize, the pattern means: Match all scripts that contain the text + document.referrer.
Remember the parts of the script from + (and including) the start tag up to (and excluding) the string + document.referrer as $1, and the part following + that string, up to and including the closing tag, as $2. + - Revision 2.50 2007/12/08 12:44:36 fabiankeil - - Remove already commented out pre-3.0.7 changes. - - Update the "new log defaults" paragraph. + + Now the pattern is deciphered, but wasn't this about substituting things? So + let's look at the substitute: $1"Not Your Business!"$2 is + easy to read: The text remembered as $1, followed by + "Not Your Business!" (including + the quotation marks!), followed by the text remembered as $2. + This produces an exact copy of the original string, with the middle part + (the document.referrer) replaced by "Not Your + Business!". + - Revision 2.49 2007/12/06 18:21:55 fabiankeil - Update hide-forwarded-for-headers description. + + The whole job now reads: Replace document.referrer by + "Not Your Business!" wherever it appears inside a + <script> tag. Note that this job won't break JavaScript syntax, + since both the original and the replacement are syntactically valid + string objects. The script just won't have access to the referrer + information anymore. + - Revision 2.48 2007/11/24 19:07:17 fabiankeil - - Mention request rewriting. - - Enable the conditional-forge paragraph. - - Minor rewordings. + + We'll show you two other jobs from the JavaScript taming department, but + this time only point out the constructs of special interest: + - Revision 2.47 2007/11/18 14:59:47 fabiankeil - A few "Note to Upgraders" updates. + + +# The status bar is for displaying link targets, not pointless blahblah +# +s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig + - Revision 2.46 2007/11/17 17:24:44 fabiankeil - - Use new action defaults. - - Minor fixes and rewordings. + + \s stands for whitespace characters (space, tab, newline, + carriage return, form feed), so that \s* means: zero + or more whitespace. The ? in .*?
+ makes this matching of arbitrary text ungreedy. (Note that the U + option is not set). The ['"] construct means: a single + or a double quote. Finally, \1 is + a back-reference to the first parenthesis just like $1 above, + with the difference that in the pattern, a backslash indicates + a back-reference, whereas in the substitute, it's the dollar. + - Revision 2.45 2007/11/16 11:48:46 hal9 - Fix one typo, and add a couple of small refinements. + + So what does this job do? It replaces assignments of single- or double-quoted + strings to the window.status object with a dummy assignment + (using a variable name that is hopefully odd enough not to conflict with + real variables in scripts). Thus, it catches many cases where e.g. pointless + descriptions are displayed in the status bar instead of the link target when + you move your mouse over links. + - Revision 2.44 2007/11/15 03:30:20 hal9 - Results of spell check. + + +# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html +# +s/(<body [^>]*)onunload(.*>)/$1never$2/iU + - Revision 2.43 2007/11/14 18:45:39 fabiankeil - - Mention some more contributors in the "New in this Release" list. - - Minor rewordings. + + Including the + OnUnload + event binding in the HTML DOM was a CRIME. + When I close a browser window, I want it to close and die. Basta. + This job replaces the onunload attribute in + <body> tags with the dummy word never. + Note that the i option makes the pattern matching + case-insensitive. Also note that ungreedy matching alone doesn't always guarantee + a minimal match: In the first parenthesis, we had to use [^>]* + instead of .* to prevent the match from exceeding the + <body> tag if it doesn't contain OnUnload, but the page's + content does. + - Revision 2.42 2007/11/12 03:32:40 hal9 - Updates for "What's New" and "Notes to Upgraders". Various other changes in - preparation for new release. User Manual is almost ready. 
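The three JavaScript jobs discussed above can be tried outside Privoxy with a few lines of Python. This is only a rough sketch: Python's re module stands in for PCRS, the function names are invented for illustration, and since re has no counterpart to the U option, the quantifiers are written in their lazy form instead.

```python
import re

# Hypothetical helper names; Python's re module stands in for PCRS here.
# PCRS's U option (ungreedy by default) has no direct equivalent in re,
# so the quantifiers are written in their lazy form (.*? and [^>]*?).

def hide_referrer(page: str) -> str:
    # s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg
    # re.S mirrors the s option, the default count=0 mirrors the g option.
    return re.sub(r'(<script.*?)document\.referrer(.*?</script>)',
                  r'\1"Not Your Business!"\2', page, flags=re.S)

def neutralize_status(page: str) -> str:
    # s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
    # \1 in the pattern refers back to whichever quote opened the string.
    return re.sub(r'''window\.status\s*=\s*(['"]).*?\1''',
                  'dUmMy=1', page, flags=re.I)

def kill_onunload(page: str) -> str:
    # s/(<body [^>]*)onunload(.*>)/$1never$2/iU  (no g option: first match only)
    # [^>]* keeps the first group from running past the end of the <body> tag.
    return re.sub(r'(<body [^>]*?)onunload(.*?>)',
                  r'\1never\2', page, count=1, flags=re.I)

if __name__ == "__main__":
    print(hide_referrer('<script>var r = document.referrer;</script>'))
    print(neutralize_status("window.status = 'Click here!';"))
    print(kill_onunload('<body bgcolor="white" onUnload="popup()">'))
```

Feeding kill_onunload a page whose body tag has no onunload attribute, but whose text mentions the word, leaves the page untouched, which is the effect of using [^>]* rather than .* in the first group.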
+ + The last example is from the fun department: + - Revision 2.41 2007/11/11 16:32:11 hal9 - This is primarily syncing What's New and Note to Upgraders sections with the many - new features and changes (gleaned from memory but mostly from ChangeLog). + + +FILTER: fun Fun text replacements - Revision 2.40 2007/11/10 17:10:59 fabiankeil - In the first third of the file, mention several times that - the action editor is disabled by default in 3.0.7 beta and later. +# Spice the daily news: +# +s/microsoft(?!\.com)/MicroSuck/ig + - Revision 2.39 2007/11/05 02:34:49 hal9 - Various changes in preparation for the upcoming release. Much yet to be done. + + Note the (?!\.com) part (a so-called negative lookahead) + in the job's pattern, which means: Don't match, if the string + .com appears directly following microsoft + in the page. This prevents links to microsoft.com from being trashed, while + still replacing the word everywhere else. + - Revision 2.38 2007/09/22 16:01:42 fabiankeil - Update embedded show-url-info output. + + +# Buzzword Bingo (example for extended regex syntax) +# +s* industry[ -]leading \ +| cutting[ -]edge \ +| customer[ -]focused \ +| market[ -]driven \ +| award[ -]winning # Comments are OK, too! \ +| high[ -]performance \ +| solutions[ -]based \ +| unmatched \ +| unparalleled \ +| unrivalled \ +*<font color="red"><b>BINGO!</b></font> \ +*igx + - Revision 2.37 2007/08/27 16:09:55 fabiankeil - Fix pre-chroot-nslookup description which I failed to - copy and paste properly. Reported by Stephen Gildea. + + The x option in this job turns on extended syntax, and allows for + e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting. + - Revision 2.36 2007/08/26 16:47:14 fabiankeil - Add Stephen Gildea's pre-chroot-nslookup patch [#1276666], - extensive comments moved to user manual. + + You get the idea? + + - Revision 2.35 2007/08/26 14:59:49 fabiankeil - Minor rewordings and fixes. 
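The negative lookahead and the extended syntax used by the fun filter can be sketched in Python as well. As before, this is an approximation with Python's re module rather than PCRS, the names FUN, BINGO and spice are made up for the example, and the buzzword list is shortened.

```python
import re

# Negative lookahead: "microsoft" is replaced unless ".com" follows directly.
FUN = re.compile(r'microsoft(?!\.com)', re.I)

# re.X plays the role of the x option: whitespace and "#" comments in the
# pattern are ignored, except inside character classes such as [ -].
BINGO = re.compile(r"""
      industry[ -]leading
    | cutting[ -]edge
    | award[ -]winning      # comments are OK, too
    | unparalleled
""", re.I | re.X)

def spice(text: str) -> str:
    # Apply both jobs globally, in the order they appear in the filter.
    text = FUN.sub('MicroSuck', text)
    return BINGO.sub('<font color="red"><b>BINGO!</b></font>', text)

if __name__ == "__main__":
    print(spice('Microsoft at www.microsoft.com'))
    print(spice('Our cutting-edge, award winning tool'))
```

Note that the space inside [ -] still matches a literal space even in extended mode, which is why "award winning" and "award-winning" are both caught.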
+ - Revision 2.34 2007/08/05 15:19:50 fabiankeil - - Don't claim HTTP/1.1 compliance. - - Use $ in some of the path pattern examples. - - Use a hide-user-agent example argument without - leading and trailing space. - - Make it clear that the cookie actions work with - HTTP cookies only. - - Rephrase the inspect-jpegs text to underline - that it's only meant to protect against a single - exploit. +The Pre-defined Filters - Revision 2.33 2007/07/27 10:57:35 hal9 - Add references for user-agent strings for hide-user-agenet + - Revision 2.30 2007/04/25 15:10:36 fabiankeil - - Describe installation for FreeBSD. - - Start to document taggers and tag patterns. - - Don't confuse devils and daemons. + +The distribution default.filter file contains a selection of +pre-defined filters for your convenience: + - Revision 2.29 2007/04/05 11:47:51 fabiankeil - Some updates regarding header filtering, - handling of compressed content and redirect's - support for pcrs commands. + + + js-annoyances + + + The purpose of this filter is to get rid of particularly annoying JavaScript abuse. + To that end, it + + + + replaces JavaScript references to the browser's referrer information + with the string "Not Your Business!". This complements the hide-referrer action on the content level. + + + + + removes the bindings to the DOM's + unload + event which we feel has no right to exist and is responsible for most exit consoles, i.e. + nasty windows that pop up when you close another one. + + + + + removes code that causes new windows to be opened with undesired properties, such as being + full-screen, non-resizeable, without location, status or menu bar etc. + + + + + + Use with caution. This is an aggressive filter, and can break sites that + rely heavily on JavaScript. + + + - Revision 2.28 2006/12/10 23:42:48 hal9 - Fix various typos reported by Adam P. Thanks. + + js-events + + + This is a very radical measure.
It removes virtually all JavaScript event bindings, which + means that scripts can not react to user actions such as mouse movements or clicks, window + resizing etc, anymore. Use with caution! + + + We strongly discourage using this filter as a default since it breaks + many legitimate scripts. It is meant for use only on extra-nasty sites (should you really + need to go there). + + + - Revision 2.27 2006/11/14 01:57:47 hal9 - Dump all docs prior to 3.0.6 release. Various minor changes to faq and user - manual. + + html-annoyances + + + This filter will undo many common instances of HTML based abuse. + + + The BLINK and MARQUEE tags + are neutralized (yeah baby!), and browser windows will be created as + resizeable (as of course they should be!), and will have location, + scroll and menu bars -- even if specified otherwise. + + + - Revision 2.26 2006/10/24 11:16:44 hal9 - Add new filters. + + content-cookies + + + Most cookies are set in the HTTP dialog, where they can be intercepted + by the + crunch-incoming-cookies + and crunch-outgoing-cookies + actions. But web sites increasingly make use of HTML meta tags and JavaScript + to sneak cookies to the browser on the content level. + + + This filter disables most HTML and JavaScript code that reads or sets + cookies. It cannot detect all clever uses of these types of code, so it + should not be relied on as an absolute fix. Use it wherever you would also + use the cookie crunch actions. + + + - Revision 2.25 2006/10/18 10:50:33 hal9 - Add note that since filters are off in Cautious, compression is ON. Turn off - compression to make filters work on all sites. + + refresh-tags + + + Disable any refresh tags if the interval is greater than nine seconds (so + that redirections done via refresh tags are not destroyed). This is useful + for dial-on-demand setups, or for those who find this HTML feature + annoying. + + + - Revision 2.24 2006/10/03 11:13:54 hal9 - More references to the new filters. 
Include html this time around. + + unsolicited-popups + + + This filter attempts to prevent only unsolicited pop-up + windows from opening, yet still allow pop-up windows that the user + has explicitly chosen to open. It was added in version 3.0.1, + as an improvement over earlier such filters. + + + Technical note: The filter works by redefining the window.open JavaScript + function to a dummy function, PrivoxyWindowOpen(), + during the loading and rendering phase of each HTML page access, and + restoring the function afterward. + + + This is recommended only for browsers that cannot perform this function + reliably themselves. And be aware that some sites require such windows + in order to function normally. Use with caution. + + + - Revision 2.23 2006/10/02 22:43:53 hal9 - Contains new filter definitions from Fabian, and few other miscellaneous - touch-ups. + + all-popups + + + Attempt to prevent all pop-up windows from opening. + Note this should be used with even more discretion than the above, since + it is more likely to break some sites that require pop-ups for normal + usage. Use with caution. + + + - Revision 2.22 2006/09/22 01:27:55 hal9 - Final commit of probably various minor changes here and there. Unless - something changes this should be ready for pending release. + + img-reorder + + + This is a helper filter that has no value if used alone. It makes the + banners-by-size and banners-by-link + (see below) filters more effective and should be enabled together with them. + + + - Revision 2.21 2006/09/20 03:21:36 david__schmidt - Just the tiniest tweak. Wafer thin! + + banners-by-size + + + This filter removes image tags purely based on what size they are. Fortunately + for us, many ads and banner images tend to conform to certain standardized + sizes, which makes this filter quite effective for ad stripping purposes. 
+ + + Occasionally this filter will cause false positives on images that are not ads, + but just happen to be of one of the standard banner sizes. + + + Recommended only for those who require extreme ad blocking. The default + block rules should catch 95+% of all ads without this filter enabled. + + + - Revision 2.20 2006/09/10 14:53:54 hal9 - Results of spell check. User manual has some updates to standard.actions file - info. + + banners-by-link + + + This is an experimental filter that attempts to kill any banners if + their URLs seem to point to known or suspected click trackers. It is currently + not of much value and is not recommended for use by default. + + + - Revision 2.19 2006/09/08 12:19:02 fabiankeil - Adjust hide-if-modified-since example values - to reflect the recent changes. + + webbugs + + + Webbugs are small, invisible images (technically 1X1 GIF images), that + are used to track users across websites, and collect information on them. + As an HTML page is loaded by the browser, an embedded image tag causes the + browser to contact a third-party site, disclosing the tracking information + through the requested URL and/or cookies for that third-party domain, without + the user ever becoming aware of the interaction with the third-party site. + HTML-ized spam also uses a similar technique to verify email addresses. + + + This filter removes the HTML code that loads such webbugs. + + + - Revision 2.18 2006/09/08 02:38:57 hal9 - Various changes: - -Fix a number of broken links. - -Migrate the new Windows service command line options, and reference as - needed. - -Rebuild so that can be used with the new "user-manual" config capabilities. - -Etc. + + tiny-textforms + + + A rather special-purpose filter that can be used to enlarge textareas (those + multi-line text boxes in web forms) and turn off hard word wrap in them. + It was written for the sourceforge.net tracker system where such boxes are + a nuisance, but it can be handy on other sites, too. 
+ + + It is not recommended to use this filter as a default. + + + - Revision 2.17 2006/09/05 13:25:12 david__schmidt - Add Windows service invocation stuff (duplicated) in FAQ and in user manual under Windows startup. One probably ought to reference the other. + + jumping-windows + + + Many consider windows that move, or resize themselves to be abusive. This filter + neutralizes the related JavaScript code. Note that some sites might not display + or behave as intended when using this filter. Use with caution. + + + - Revision 2.16 2006/09/02 12:49:37 hal9 - Various small updates for new actions, filterfiles, etc. + + frameset-borders + + + Some web designers seem to assume that everyone in the world will view their + web sites using the same browser brand and version, screen resolution etc, + because only that assumption could explain why they'd use static frame sizes, + yet prevent their frames from being resized by the user, should they be too + small to show their whole content. + + + This filter removes the related HTML code. It should only be applied to sites + which need it. + + + - Revision 2.15 2006/08/30 11:15:22 hal9 - More work on the new actions, especially filter-*-headers, and What's New - section. User Manual is close to final form for 3.0.4 release. Some tinkering - and proof reading left to do. + + demoronizer + + + Many Microsoft products that generate HTML use non-standard extensions (read: + violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those + HTML documents to display with errors on standard-compliant platforms. + + + This filter translates the MS-only characters into Latin-1 equivalents. + It is not necessary when using MS products, and will cause corruption of + all documents that use 8-bit character sets other than Latin-1. It's mostly + worthwhile for Europeans on non-MS platforms, if weird garbage characters + sometimes appear on some pages, or user agents that don't correct for this on + the fly. 
+ + + + - Revision 2.14 2006/08/29 10:59:36 hal9 - Add a "Whats New in this release" Section. Further work on multiple filter - files, and assorted other minor changes. + + shockwave-flash + + + A filter for shockwave haters. As the name suggests, this filter strips code + out of web pages that is used to embed shockwave flash objects. + + + + + - Revision 2.13 2006/08/22 11:04:59 hal9 - Silence warnings and errors. This should build now. New filters were only - stubbed in. More to be done. + + quicktime-kioskmode + + + Change HTML code that embeds Quicktime objects so that kioskmode, which + prevents saving, is disabled. + + + - Revision 2.12 2006/08/14 08:40:39 fabiankeil - Documented new actions that were part of - the "minor Privoxy improvements". + + fun + + + Text replacements for subversive browsing fun. Make fun of your favorite + Monopolist or play buzzword bingo. + + + - Revision 2.11 2006/07/18 14:48:51 david__schmidt - Reorganizing the repository: swapping out what was HEAD (the old 3.1 branch) - with what was really the latest development (the v_3_0_branch branch) + + crude-parental + + + A demonstration-only filter that shows how Privoxy + can be used to delete web content on a keyword basis. + + + - Revision 1.123.2.43 2005/05/23 09:59:10 hal9 - Fix typo 'loose' + + ie-exploits + + + An experimental collection of text replacements to disable malicious HTML and JavaScript + code that exploits known security holes in Internet Explorer. + + + Presently, it only protects against Nimda and a cross-site scripting bug, and + would need active maintenance to provide more substantial protection. + + + - Revision 1.123.2.42 2004/12/04 14:39:57 hal9 - Fix two minor typos per bug SF report. + + site-specifics + + + Some web sites have very specific problems, the cure for which doesn't apply + anywhere else, or could even cause damage on other sites. 
+ + + This is a collection of such site-specific cures which should only be applied + to the sites they were intended for, which is what the supplied + default.action file does. Users shouldn't need to change + anything regarding this filter. + + + - Revision 1.123.2.41 2004/03/23 12:58:42 oes - Fixed an inaccuracy + + google + + + A CSS based block for Google text ads. Also removes a width limitation + and the toolbar advertisement. + + + - Revision 1.123.2.40 2004/02/27 12:48:49 hal9 - Add comment re: redirecting to local file system for set-image-blocker may - is dependent on browser. + + yahoo + + + Another CSS based block, this time for Yahoo text ads. And removes + a width limitation as well. + + + - Revision 1.123.2.39 2004/01/30 22:31:40 oes - Added a hint re bookmarklets to Quickstart section + + msn + + + Another CSS based block, this time for MSN text ads. And removes + tracking URLs, as well as a width limitation. + + + - Revision 1.123.2.38 2004/01/30 16:47:51 oes - Some minor clarifications + + blogspot + + + Cleans up some Blogspot blogs. Read the fine print before using this one! + + + This filter also intentionally removes some navigation stuff and sets the + page width to 100%. As a result, some rounded corners would + appear too early or not at all and as fixing this would require a browser + that understands background-size (CSS3), they are removed instead. + + + - Revision 1.123.2.37 2004/01/29 22:36:11 hal9 - Updates for no longer filtering text/plain, and demoronizer default settings, - and copyright notice dates. + + xml-to-html + + + Server-header filter to change the Content-Type from xml to html. + + + - Revision 1.123.2.36 2003/12/10 02:26:26 hal9 - Changed the demoronizer filter description. + + html-to-xml + + + Server-header filter to change the Content-Type from html to xml.
+ + + - Revision 1.123.2.35 2003/11/06 13:36:37 oes - Updated link to nightly CVS tarball + + no-ping + + + Removes the non-standard ping attribute from + anchor and area HTML tags. + + + - Revision 1.123.2.34 2003/06/26 23:50:16 hal9 - Add a small bit on filtering and problems re: source code being corrupted. + + hide-tor-exit-notation + + + Client-header filter to remove the Tor exit node notation + found in Host and Referer headers. + + + If &my-app; and Tor are chained and &my-app; + is configured to use socks4a, one can use http://www.example.org.foobar.exit/ + to access the host www.example.org through the + Tor exit node foobar. + + + As the HTTP client isn't aware of this notation, it treats the + whole string www.example.org.foobar.exit as host and uses it + for the Host and Referer headers. From the + server's point of view the resulting headers are invalid and can cause problems. + + + An invalid Referer header can trigger hot-linking + protections, an invalid Host header will make it impossible for + the server to find the right vhost (several domains hosted on the same IP address). + + + This client-header filter removes the foo.exit part in those headers + to prevent the mentioned problems. Note that it only modifies + the HTTP headers, it doesn't make it impossible for the server + to detect your Tor exit node based on the IP address + the request is coming from. + + + - Revision 1.123.2.33 2003/05/08 18:17:33 roro - Use apt-get instead of dpkg to install Debian package, which is more - solid, uses the correct and most recent Debian version automatically. + + - Revision 1.123.2.32 2003/04/11 03:13:57 hal9 - Add small note about only one filterfile (as opposed to multiple actions - files). + - Revision 1.123.2.31 2003/03/26 02:03:43 oes - Updated hard-coded copyright dates + +External filter syntax + + External filters are scripts or programs that can modify the content in + case common filters + aren't powerful enough. 
+ + + External filters can be written in any language the platform &my-app; runs + on supports. + + + They are controlled with the + external-filter action + and have to be defined in the filterfile + first. + + + The header looks like any other filter, but instead of pcrs jobs, external + filters contain a single job which can be a program or a shell script (which + may call other scripts or programs). + + + External filters read the content from STDIN and write the rewritten + content to STDOUT. The environment variables PRIVOXY_URL, PRIVOXY_PATH, + PRIVOXY_HOST, PRIVOXY_ORIGIN can be used to get some details about the + client request. + + + &my-app; will temporarily store the content to filter in the + temporary-directory. + + + +EXTERNAL-FILTER: cat Pointless example filter that doesn't actually modify the content +/bin/cat - Revision 1.123.2.30 2003/03/24 12:58:56 hal9 - Add new section on Predefined Filters. +# Incorrect reimplementation of the filter above in POSIX shell. +# +# Note that it's a single job that spans multiple lines, the line +# breaks are not passed to the shell, thus the semicolons are required. +# +# If the script isn't trivial, it is recommended to put it into an external file. +# +# In general, writing external filters entirely in POSIX shell is not +# considered a good idea. +EXTERNAL-FILTER: cat2 Pointless example filter that despite its name may actually modify the content +while read line; \ +do \ + echo "$line"; \ +done + +EXTERNAL-FILTER: rotate-image Rotate an image by 180 degree. Test filter with limited value. +/usr/local/bin/convert - -rotate 180 - + +EXTERNAL-FILTER: citation-needed Adds a "[citation needed]" tag to an image. The coordinates may need adjustment.
+/usr/local/bin/convert - -pointsize 16 -fill white -annotate +17+418 "[citation needed]" - + + - Revision 1.123.2.29 2003/03/20 02:45:29 hal9 - More problems with \-\-chroot causing markup problems :( + + + Currently external filters are executed with &my-app;'s privileges! + Only use external filters you understand and trust. + + + + External filters are experimental and the syntax may change in the future. + + - Revision 1.123.2.28 2003/03/19 00:35:24 hal9 - Manual edit of revision log because 'chroot' (even inside a comment) was - causing Docbook to hang here (due to double hyphen and the processor thinking - it was a comment). + - Revision 1.123.2.27 2003/03/18 19:37:14 oes - s/Advanced|Radical/Adventuresome/g to avoid complaints re fun filter + - Revision 1.123.2.26 2003/03/17 16:50:53 oes - Added documentation for new chroot option - Revision 1.123.2.25 2003/03/15 18:36:55 oes - Adapted to the new filters - Revision 1.123.2.24 2002/11/17 06:41:06 hal9 - Move default profiles table from FAQ to U-M, and other minor related changes. - Add faq on cookies. + - Revision 1.123.2.23 2002/10/21 02:32:01 hal9 - Updates to the user.action examples section. A few new ones. + +Privoxy's Template Files + + All Privoxy built-in pages, i.e. error pages such as the + 404 - No Such Domain + error page, the BLOCKED + page + and all pages of its web-based + user interface, are generated from templates. + (Privoxy must be running for the above links to work as + intended.) + - Revision 1.123.2.22 2002/10/12 00:51:53 hal9 - Add demoronizer to filter section. + + These templates are stored in a subdirectory of the configuration + directory called templates. On Unixish platforms, + this is typically + /etc/privoxy/templates/. + - Revision 1.123.2.21 2002/10/10 04:09:35 hal9 - s/Advanced/Radical/ and added very brief note. + + The templates are basically normal HTML files, but with place-holders (called symbols + or exports), which Privoxy fills at run time. 
It + is possible to edit the templates with a normal text editor, should you want + to customize them. (Not recommended for the casual + user). Should you create your own custom templates, you should use + the config setting templdir + to specify an alternate location, so your templates do not get overwritten + during upgrades. + + + Note that just like in configuration files, lines starting + with # are ignored when the templates are filled in. + - Revision 1.123.2.20 2002/10/10 03:49:21 hal9 - Add notes to session-cookies-only and Quickstart about pre-existing - cookies. Also, note content-cookies work differently. + + The place-holders are of the form @name@, and you will + find a list of available symbols, which vary from template to template, + in the comments at the start of each file. Note that these comments are not + always accurate, and that it's probably best to look at the existing HTML + code to find out which symbols are supported and what they are filled in with. + - Revision 1.123.2.19 2002/09/26 01:25:36 hal9 - More explanation on Privoxy patterns, more on content-cookies and SSL. + + A special application of this substitution mechanism is to make whole + blocks of HTML code disappear when a specific symbol is set. We use this + for many purposes, one of them being to include the beta warning in all + our user interface (CGI) pages when Privoxy + is in an alpha or beta development stage: + - Revision 1.123.2.18 2002/08/22 23:47:58 hal9 - Add 'Documentation' to Privoxy Menu shot in Configuration section to match - CGIs. + + +<!-- @if-unstable-start --> - Revision 1.123.2.17 2002/08/18 01:13:05 hal9 - Spell checked (only one typo this time!). + ... beta warning HTML code goes here ... 
- Revision 1.123.2.16 2002/08/09 19:20:54 david__schmidt - Update to Mac OS X startup script name +<!-- if-unstable-end@ --> + - Revision 1.123.2.15 2002/08/07 17:32:11 oes - Converted some internal links from ulink to link for PDF creation; no content changed + + If the "unstable" symbol is set, everything in between and including + @if-unstable-start and if-unstable-end@ + will disappear, leaving nothing but an empty comment: + - Revision 1.123.2.14 2002/08/06 09:16:13 oes - Nits re: actions file download + + <!-- --> + - Revision 1.123.2.13 2002/08/02 18:23:19 g_sauthoff - Just 2 small corrections to the Gentoo sections + + There's also an if-then-else construct and an #include + mechanism, but you'll sure find out if you are inclined to edit the + templates ;-) + - Revision 1.123.2.12 2002/08/02 18:17:21 g_sauthoff - Added 2 Gentoo sections + + All templates refer to a style located at + http://config.privoxy.org/send-stylesheet. + This is, of course, locally served by Privoxy + and the source for it can be found and edited in the + cgi-style.css template. + - Revision 1.123.2.11 2002/07/26 15:20:31 oes - - Added version info to title - - Added info on new filters - - Revised parts of the filter file tutorial - - Added info on where to get updated actions files + - Revision 1.123.2.10 2002/07/25 21:42:29 hal9 - Add brief notes on not proxying non-HTTP protocols. + - Revision 1.123.2.9 2002/07/11 03:40:28 david__schmidt - Updated Mac OS X sections due to installation location change - Revision 1.123.2.8 2002/06/09 16:36:32 hal9 - Clarifications on filtering and MIME. Hardcode 'latest release' in index.html. + - Revision 1.123.2.7 2002/06/09 00:29:34 hal9 - Touch ups on filtering, in actions section and Anatomy. +Contacting the Developers, Bug Reporting and Feature +Requests - Revision 1.123.2.6 2002/06/06 23:11:03 hal9 - Fix broken link. Linkchecked all docs. 
+ + &contacting; + - Revision 1.123.2.5 2002/05/29 02:01:02 hal9 - This is break out of the entire config section from u-m, so it can - eventually be used to generate the comments, etc in the main config file - so that these are in sync with each other. + - Revision 1.123.2.4 2002/05/27 03:28:45 hal9 - Ooops missed something from David. + - Revision 1.123.2.3 2002/05/27 03:23:17 hal9 - Fix FIXMEs for OS2 and Mac OS X startup. Fix Redhat typos (should be Red Hat). - That's a wrap, I think. - Revision 1.123.2.2 2002/05/26 19:02:09 hal9 - Move Amiga stuff around to take of FIXME in start up section. + +Privoxy Copyright, License and History - Revision 1.123.2.1 2002/05/26 17:04:25 hal9 - -Spellcheck, very minor edits, and sync across branches + + &copyright; + - Revision 1.123 2002/05/24 23:19:23 hal9 - Include new image (Proxy setup). More fun with guibutton. - Minor corrections/clarifications here and there. + + Privoxy is free software; you can + redistribute it and/or modify it under the terms of the + GNU General Public License, version 2, + as published by the Free Software Foundation and included in + the next section. + - Revision 1.122 2002/05/24 13:24:08 oes - Added Bookmarklet for one-click pre-filled access to show-url-info + +License + + + - Revision 1.121 2002/05/23 23:20:17 oes - - Changed more (all?) references to actions to the - style. - - Small fixes in the actions chapter - - Small clarifications in the quickstart to ad blocking - - Removed from s since the new doc CSS - renders them red (bad in TOC). +&lt;/sect2&gt; +&lt;!-- ~ End section ~ --&gt; - Revision 1.120 2002/05/23 19:16:43 roro - Correct Debian specials (installation and startup).
- Revision 1.119 2002/05/22 17:17:05 oes - Added Security hint +<!-- ~~~~~ New section ~~~~~ --> - Revision 1.118 2002/05/21 04:54:55 hal9 - -New Section: Quickstart to Ad Blocking - -Reformat Actions Anatomy to match new CGI layout +<sect2 id="history"><title>History + + &history; + + - Revision 1.117 2002/05/17 13:56:16 oes - - Reworked & extended Templates chapter - - Small changes to Regex appendix - - #included authors.sgml into (C) and hist chapter +Authors + + &p-authors; + + - Revision 1.116 2002/05/17 03:23:46 hal9 - Fixing merge conflict in Quickstart section. + - Revision 1.115 2002/05/16 16:25:00 oes - Extended the Filter File chapter & minor fixes + - Revision 1.114 2002/05/16 09:42:50 oes - More ulink->link, added some hints to Quickstart section - Revision 1.113 2002/05/15 21:07:25 oes - Extended and further commented the example actions files + +See Also + + &seealso; + + - Revision 1.112 2002/05/15 03:57:14 hal9 - Spell check. A few minor edits here and there for better syntax and - clarification. - Revision 1.111 2002/05/14 23:01:36 oes - Fixing the fixes - Revision 1.110 2002/05/14 19:10:45 oes - Restored alphabetical order of actions + +Appendix - Revision 1.109 2002/05/14 17:23:11 oes - Renamed the prevent-*-cookies actions, extended aliases section and moved it before the example AFs - Revision 1.108 2002/05/14 15:29:12 oes - Completed proofreading the actions chapter + + +Regular Expressions + + Privoxy uses Perl-style regular + expressions in its actions + files and filter file, + through the PCRE and + + PCRS libraries. + - Revision 1.107 2002/05/12 03:20:41 hal9 - Small clarifications for 127.0.0.1 vs localhost for listen-address since this - apparently an important distinction for some OS's. + + If you are reading this, you probably don't understand what regular + expressions are, or what they can do. So this will be a very brief + introduction only. 
 A full explanation would require a book ;-) + - Revision 1.106 2002/05/10 01:48:20 hal9 - This is mostly proposed copyright/licensing additions and changes. Docs - are still GPL, but licensing and copyright are more visible. Also, copyright - changed in doc header comments (eliminate references to JB except FAQ). + + Regular expressions provide a language to describe patterns that can be + run against strings of characters (letters, numbers, etc.), to see if they + match the string or not. The patterns are themselves (sometimes complex) + strings of literal characters, combined with wild-cards, and other special + characters, called meta-characters. The meta-characters have + special meanings and are used to build complex patterns to be matched against. + Perl Compatible Regular Expressions are an especially convenient + dialect of the regular expression language. + - Revision 1.105 2002/05/05 20:26:02 hal9 - Sorting out license vs copyright in these docs. + + To make a simple analogy, we do something similar when we use wild-card + characters when listing files with the dir command in DOS. + *.* matches all filenames. The special + character here is the asterisk which matches any and all characters. We can be + more specific and use ? to match just individual + characters. So dir file?.txt would match + file1.txt, file2.txt, etc. We are pattern + matching, using a similar technique to regular expressions! + - Revision 1.104 2002/05/04 08:44:45 swa - bumped version + + Regular expressions do essentially the same thing, but are much, much more + powerful. There are many more special characters and ways of + building complex patterns however. Let's look at a few of the common ones, + and then some examples: + - Revision 1.103 2002/05/04 00:40:53 hal9 - -Remove the TOC first page kludge. It's fixed proper now in ldp.dsl.in. - -Some minor additions to Quickstart. + + + . - Matches any single character, e.g. a, + A, 4, :, or @.
+ + - Revision 1.102 2002/05/03 17:46:00 oes - Further proofread & reactivated short build instructions + + + ? - The preceding character or expression is matched ZERO or ONE + times. Either/or. + + - Revision 1.101 2002/05/03 03:58:30 hal9 - Move the user-manual config directive to top of section. Add note about - Privoxy needing read permissions for configs, and write for logs. + + + + - The preceding character or expression is matched ONE or MORE + times. + + - Revision 1.100 2002/04/29 03:05:55 hal9 - Add clarification on differences of new actions files. + + + * - The preceding character or expression is matched ZERO or MORE + times. + + - Revision 1.99 2002/04/28 16:59:05 swa - more structure in starting section + + + \ - The escape character denotes that + the following character should be taken literally. This is used where one of the + special characters (e.g. .) needs to be taken literally and + not as a special meta-character. Example: example\.com, makes + sure the period is recognized only as a period (and not expanded to its + meta-character meaning of any single character). + + - Revision 1.98 2002/04/28 05:43:59 hal9 - This is the break up of configuration.html into multiple files. This - will probably break links elsewhere :( + + + [ ] - Characters enclosed in brackets will be matched if + any of the enclosed characters are encountered. For instance, [0-9] + matches any numeric digit (zero through nine). As an example, we can combine + this with + to match any digit one or more times: [0-9]+. + + - Revision 1.97 2002/04/27 21:04:42 hal9 - -Rewrite of Actions File example. - -Add section for user-manual directive in config. + + + ( ) - parentheses are used to group a sub-expression, + or multiple sub-expressions. + + - Revision 1.96 2002/04/27 05:32:00 hal9 - -Add short section to Filter Files to tie in with +filter action. - -Start rewrite of examples in Actions Examples (not finished).
+ + + | - The bar character works like an + or conditional statement. A match is successful if the + sub-expression on either side of | matches. As an example: + /(this|that) example/ uses grouping and the bar character + and would match either this example or that + example, and nothing else. + + - Revision 1.95 2002/04/26 17:23:29 swa - bookmarks cleaned, changed structure of user manual, screen and programlisting cleanups, and numerous other changes that I forgot + + These are just some of the ones you are likely to use when matching URLs with + Privoxy, and is a long way from a definitive + list. This is enough to get us started with a few simple examples which may + be more illuminating: + - Revision 1.94 2002/04/26 05:24:36 hal9 - -Add most of Andreas suggestions to Chain of Events section. - -A few other minor corrections and touch up. + + /.*/banners/.* - A simple example + that uses the common combination of . and * to + denote any character, zero or more times. In other words, any string at all. + So we start with a literal forward slash, then our regular expression pattern + (.*) another literal forward slash, the string + banners, another forward slash, and lastly another + .*. We are building + a directory path here. This will match any file with the path that has a + directory named banners in it. The .* matches + any characters, and this could conceivably be more forward slashes, so it + might expand into a much longer looking path. For example, this could match: + /eye/hate/spammers/banners/annoy_me_please.gif, or just + /banners/annoying.html, or almost an infinite number of other + possible combinations, just so it has banners in the path + somewhere. + - Revision 1.92 2002/04/25 18:55:13 hal9 - More catchups on new actions files, and new actions names. - Other assorted cleanups, and minor modifications. + + And now something a little more complex: + - Revision 1.91 2002/04/24 02:39:31 hal9 - Add 'Chain of Events' section. 
+ + /.*/adv((er)?ts?|ertis(ing|ements?))?/ - + We have several literal forward slashes again (/), so we are + building another expression that is a file path statement. We have another + .*, so we are matching against any conceivable sub-path, just so + it matches our expression. The only true literal that must + match our pattern is adv, together with + the forward slashes. What comes after the adv string is the + interesting part. + - Revision 1.90 2002/04/23 21:41:25 hal9 - Linuxconf is deprecated on RH, substitute chkconfig. + + Remember the ? means the preceding expression (either a + literal character or anything grouped with (...) in this case) + can exist or not, since this means either zero or one match. So + ((er)?ts?|ertis(ing|ements?)) is optional, as are the + individual sub-expressions: (er), + (ing|ements?), and the s. The | + means or. We have two of those. For instance, + (ing|ements?), can expand to match either ing + OR ements?. What is being done here, is an + attempt at matching as many variations of advertisement, and + similar, as possible. So this would expand to match just adv, + or advert, or adverts, or + advertising, or advertisement, or + advertisements. You get the idea. But it would not match + advertizements (with a z). We could fix that by + changing our regular expression to: + /.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/, which would then match + either spelling. + - Revision 1.89 2002/04/23 21:05:28 oes - Added hint for startup on Red Hat + + /.*/advert[0-9]+\.(gif|jpe?g) - Again + another path statement with forward slashes. Anything in the square brackets + [ ] can be matched. This is using 0-9 as a + shorthand expression to mean any digit zero through nine. It is the same as + saying 0123456789. So any digit matches. The + + means one or more of the preceding expression must be included. The preceding + expression here is what is in the square brackets -- in this case, any digit + zero through nine.
Then, at the end, we have a grouping: (gif|jpe?g). + This includes a |, so this needs to match the expression on + either side of that bar character also. A simple gif on one side, and the other + side will in turn match either jpeg or jpg, + since the ? means the letter e is optional and + can be matched once or not at all. So we are building an expression here to + match image GIF or JPEG type image file. It must include the literal + string advert, then one or more digits, and a . + (which is now a literal, and not a special character, since it is escaped + with \), and lastly either gif, or + jpeg, or jpg. Some possible matches would + include: //advert1.jpg, + /nasty/ads/advert1234.gif, + /banners/from/hell/advert99.jpg. It would not match + advert1.gif (no leading slash), or + /adverts232.jpg (the expression does not include an + s), or /advert1.jsp (jsp is not + in the expression anywhere). + - Revision 1.88 2002/04/23 05:37:54 hal9 - Add AmigaOS install stuff. + + We are barely scratching the surface of regular expressions here so that you + can understand the default Privoxy + configuration files, and maybe use this knowledge to customize your own + installation. There is much, much more that can be done with regular + expressions. Now that you know enough to get started, you can learn more on + your own :/ + - Revision 1.87 2002/04/23 02:53:15 david__schmidt - Updated Mac OS X installation section - Added a few English tweaks here an there + + More reading on Perl Compatible Regular expressions: + http://perldoc.perl.org/perlre.html + - Revision 1.86 2002/04/21 01:46:32 hal9 - Re-write actions section. + + For information on regular expression based substitutions and their applications + in filters, please see the filter file tutorial + in this manual. + + - Revision 1.85 2002/04/18 21:23:23 hal9 - Fix ugly typo (mine). + - Revision 1.84 2002/04/18 21:17:13 hal9 - Spell Redhat correctly (ie Red Hat). A few minor grammar corrections. 
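The example patterns discussed above can be tried out in any regular expression engine that supports Perl-style syntax. The following is only a rough sketch using Python's standard re module rather than Privoxy itself. The helper name path_matches and the choice to anchor a pattern at the start of the path only are assumptions made for illustration, and the sample paths under /x/ are made up.

```python
import re

# Patterns quoted from the examples in this section.
BANNERS = r"/.*/banners/.*"
ADV_ANY = r"/.*/adv((er)?ts?|ertis(ing|ements?))?/"
ADV_S_OR_Z = r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/"
ADVERT_IMAGE = r"/.*/advert[0-9]+\.(gif|jpe?g)"


def path_matches(pattern: str, path: str) -> bool:
    """Return True if pattern matches path, anchored at the start only.

    Anchoring with re.match is an assumption of this sketch, not a
    statement about how Privoxy applies its patterns internally.
    """
    return re.match(pattern, path) is not None


if __name__ == "__main__":
    # Paths taken from the text above, plus a few hypothetical /x/ paths.
    for pattern, path in [
        (BANNERS, "/eye/hate/spammers/banners/annoy_me_please.gif"),
        (ADV_ANY, "/x/advertising/"),
        (ADV_ANY, "/x/advertizements/"),
        (ADV_S_OR_Z, "/x/advertizements/"),
        (ADVERT_IMAGE, "/nasty/ads/advert1234.gif"),
        (ADVERT_IMAGE, "/adverts232.jpg"),
        (ADVERT_IMAGE, "/advert1.jsp"),
    ]:
        print(pattern, path, path_matches(pattern, path))
```

Running the script prints one line per pattern and path pair, which makes it easy to check a new pattern against a handful of URLs before adding it to an actions file.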
- Revision 1.83 2002/04/18 18:21:12 oes - Added RPM install detail + + +Privoxy's Internal Pages - Revision 1.82 2002/04/18 12:04:50 oes - Cosmetics + + Since Privoxy proxies each requested + web page, it is easy for Privoxy to + trap certain special URLs. In this way, we can talk directly to + Privoxy, and see how it is + configured, see how our rules are being applied, change these + rules and other configuration options, and even turn + Privoxy's filtering off, all with + a web browser. - Revision 1.81 2002/04/18 11:50:24 oes - Extended Install section - needs fixing by packagers + - Revision 1.80 2002/04/18 10:45:19 oes - Moved text to buildsource.sgml, renamed some filters, details + + The URLs listed below are the special ones that allow direct access + to Privoxy. Of course, + Privoxy must be running to access these. If + not, you will get a friendly error message. Internet access is not + necessary either. + - Revision 1.79 2002/04/18 03:18:06 hal9 - Spellcheck, and minor touchups. + + - Revision 1.78 2002/04/17 18:04:16 oes - Proofreading part 2 + + + Privoxy main page: + +
+ + http://config.privoxy.org/ + +
+ + There is a shortcut: http://p.p/ (But it + doesn't provide a fall-back to a real page, in case the request is not + sent through Privoxy) + +
- Revision 1.77 2002/04/17 13:51:23 oes - Proofreading, part one + + + Show information about the current configuration, including viewing and + editing of actions files: + +
+ + http://config.privoxy.org/show-status + +
+
- Revision 1.76 2002/04/16 04:25:51 hal9 - -Added 'Note to Upgraders' and re-ordered the 'Quickstart' section. - -Note about proxy may need requests to re-read config files. + + + Show the source code version numbers: + +
+ + http://config.privoxy.org/show-version + +
+
- Revision 1.75 2002/04/12 02:08:48 david__schmidt - Remove OS/2 building info... it is already in the developer-manual + + + Show the browser's request headers: + +
+ + http://config.privoxy.org/show-request + +
+
- Revision 1.74 2002/04/11 00:54:38 hal9 - Add small section on submitting actions. + + + Show which actions apply to a URL and why: + +
+ + http://config.privoxy.org/show-url-info + +
+
- Revision 1.73 2002/04/10 18:45:15 swa - generated + + + Toggle Privoxy on or off. This feature can be turned off/on in the main + config file. When toggled off, Privoxy + continues to run, but only as a pass-through proxy, with no actions taking + place: + +
+ + http://config.privoxy.org/toggle + +
+ + Short cuts. Turn off, then on: + +
+ + http://config.privoxy.org/toggle?set=disable + +
+
+ + http://config.privoxy.org/toggle?set=enable + +
+
- Revision 1.72 2002/04/10 04:06:19 hal9 - Added actions feedback to Bookmarklets section +
+
- Revision 1.71 2002/04/08 22:59:26 hal9 - Version update. Spell chkconfig correctly :) +
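The internal pages listed above are ordinary URLs, so they can also be requested from a script. The following is a minimal sketch using Python's standard urllib. The helper names internal_url and fetch_via_privoxy are hypothetical, and the proxy address http://127.0.0.1:8118 is an assumption that must match the listen-address of your own installation. Nothing is fetched unless fetch_via_privoxy is called.

```python
from urllib.parse import urlencode
from urllib.request import ProxyHandler, build_opener

# Base address of Privoxy's internal pages, as listed above.
CONFIG_BASE = "http://config.privoxy.org/"


def internal_url(page: str = "", **params: str) -> str:
    """Build the URL of an internal page, e.g. "show-status" or "toggle"."""
    url = CONFIG_BASE + page
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_via_privoxy(url: str, proxy: str = "http://127.0.0.1:8118") -> str:
    """Fetch url through the proxy and return the response body as text.

    The default proxy address is an assumption; adjust it to your setup.
    The request only reaches the internal pages if it goes through Privoxy.
    """
    opener = build_opener(ProxyHandler({"http": proxy}))
    with opener.open(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Example (requires a running Privoxy with the toggle feature enabled):
# fetch_via_privoxy(internal_url("toggle", set="disable"))
```

Keeping the URL construction separate from the network call means the first helper can be used on its own, for example to generate bookmarks for the toggle short cuts.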
- Revision 1.70 2002/04/08 20:53:56 swa - ? - Revision 1.69 2002/04/06 05:07:29 hal9 - -Add privoxy-man-page.sgml, for man page. - -Add authors.sgml for AUTHORS (and p-authors.sgml) - -Reworked various aspects of various docs. - -Added additional comments to sub-docs. + + +Chain of Events + + Let's take a quick look at how some of Privoxy's + core features are triggered, and the ensuing sequence of events when a web + page is requested by your browser: + - Revision 1.68 2002/04/04 18:46:47 swa - consistent look. reuse of copyright, history et. al. + + + + + First, your web browser requests a web page. The browser knows to send + the request to Privoxy, which will in turn, + relay the request to the remote web server after passing the following + tests: + + + + + Privoxy traps any request for its own internal CGI + pages (e.g http://p.p/) and sends the CGI page back to the browser. + + + + + Next, Privoxy checks to see if the URL + matches any +block patterns. If + so, the URL is then blocked, and the remote web server will not be contacted. + +handle-as-image + and + +handle-as-empty-document + are then checked, and if there is no match, an + HTML BLOCKED page is sent back to the browser. Otherwise, if + it does match, an image is returned for the former, and an empty text + document for the latter. The type of image would depend on the setting of + +set-image-blocker + (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere). + + + + + Untrusted URLs are blocked. If URLs are being added to the + trust file, then that is done. + + + + + If the URL pattern matches the +fast-redirects action, + it is then processed. Unwanted parts of the requested URL are stripped. + + + + + Now the rest of the client browser's request headers are processed. If any + of these match any of the relevant actions (e.g. +hide-user-agent, + etc.), headers are suppressed or forged as determined by these actions and + their parameters. 
+ + + + + Now the web server starts sending its response back (i.e. typically a web + page). + + + + + First, the server headers are read and processed to determine, among other + things, the MIME type (document type) and encoding. The headers are then + filtered as determined by the + +crunch-incoming-cookies, + +session-cookies-only, + and +downgrade-http-version + actions. + + + + + If any +filter action + or +deanimate-gifs + action applies (and the document type fits the action), the rest of the page is + read into memory (up to a configurable limit). Then the filter rules (from + default.filter and any other filter files) are + processed against the buffered content. Filters are applied in the order + they are specified in one of the filter files. Animated GIFs, if present, + are reduced to either the first or last frame, depending on the action + setting. The entire page, which is now filtered, is then sent by + Privoxy back to your browser. + + + If neither a +filter action + nor +deanimate-gifs + matches, then Privoxy passes the raw data through + to the client browser as it becomes available. + + + + + As the browser receives the (now possibly filtered) page content, it + reads and then requests any URLs that may be embedded within the page + source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g. + frames), sounds, etc. For each of these objects, the browser issues a + separate request (this is easily viewable in Privoxy's + logs). And each such request is in turn processed just as above. Note that a + complex web page will have many, many such embedded URLs. If these + secondary requests are to a different server, then quite possibly a very + different set of actions is triggered. + + - Revision 1.67 2002/04/04 17:27:57 swa - more single file to be included at multiple points. make maintaining easier + + + + NOTE: This is somewhat of a simplistic overview of what happens with each URL + request.
For the sake of brevity and simplicity, we have focused on + Privoxy's core features only. + - Revision 1.66 2002/04/04 06:48:37 hal9 - Structural changes to allow for conditional inclusion/exclusion of content - based on entity toggles, e.g. 'entity % p-not-stable "INCLUDE"'. And - definition of internal entities, e.g. 'entity p-version "2.9.13"' that will - eventually be set by Makefile. - More boilerplate text for use across multiple docs. + - Revision 1.65 2002/04/03 19:52:07 swa - enhance squid section due to user suggestion - Revision 1.64 2002/04/03 03:53:43 hal9 - A few minor bug fixes, and touch ups. Ready for review. + + +Troubleshooting: Anatomy of an Action - Revision 1.63 2002/04/01 16:24:49 hal9 - Define entities to include boilerplate text. See doc/source/*. + + The way Privoxy applies + actions and filters + to any given URL can be complex, and not always so + easy to understand what is happening. And sometimes we need to be able to + see just what Privoxy is + doing. Especially, if something Privoxy is doing + is causing us a problem inadvertently. It can be a little daunting to look at + the actions and filters files themselves, since they tend to be filled with + regular expressions whose consequences are not + always so obvious. + - Revision 1.62 2002/03/30 04:15:53 hal9 - - Fix privoxy.org/config links. - - Paste in Bookmarklets from Toggle page. - - Move Quickstart nearer top, and minor rework. + + One quick test to see if Privoxy is causing a problem + or not, is to disable it temporarily. This should be the first troubleshooting + step (be sure to flush caches afterward!). Looking at the + logs is a good idea too. (Note that both the toggle feature and logging are + enabled via config file settings, and may need to be + turned on.) + + + Another easy troubleshooting step to try is if you have done any + customization of your installation, revert back to the installed + defaults and see if that helps. 
There are times the developers get complaints + about one thing or another, and the problem is more related to a customized + configuration issue. + - Revision 1.61 2002/03/29 01:31:08 hal9 - Minor update. + + Privoxy also provides the + http://config.privoxy.org/show-url-info + page that can show us very specifically how actions + are being applied to any given URL. This is a big help for troubleshooting. + - Revision 1.60 2002/03/27 01:57:34 hal9 - Added more to Anatomy section. + + First, enter one URL (or partial URL) at the prompt, and then + Privoxy will tell us + how the current configuration will handle it. This will not + help with filtering effects (i.e. the +filter action) from + one of the filter files since this is handled very + differently and not so easy to trap! It also will not tell you about any other + URLs that may be embedded within the URL you are testing. For instance, images + such as ads are expressed as URLs within the raw page source of HTML pages. So + you will only get info for the actual URL that is pasted into the prompt area + -- not any sub-URLs. If you want to know about embedded URLs like ads, you + will have to dig those out of the HTML source. Use your browser's View + Page Source option for this. Or right click on the ad, and grab the + URL. + - Revision 1.59 2002/03/27 00:54:33 hal9 - Touch up intro for new name. + + Let's try an example, google.com, + and look at it one section at a time in a sample configuration (your real + configuration may vary): + - Revision 1.58 2002/03/26 22:29:55 swa - we have a new homepage! + + + Matches for http://www.google.com: - Revision 1.57 2002/03/24 20:33:30 hal9 - A few minor catch ups with name change. + In file: default.action [ View ] [ Edit ] - Revision 1.56 2002/03/24 16:17:06 swa - configure needs to be generated. 
+ {+change-x-forwarded-for{block} + +deanimate-gifs {last} + +fast-redirects {check-decoded-url} + +filter {refresh-tags} + +filter {img-reorder} + +filter {banners-by-size} + +filter {webbugs} + +filter {jumping-windows} + +filter {ie-exploits} + +hide-from-header {block} + +hide-referrer {forge} + +session-cookies-only + +set-image-blocker {pattern} +/ - Revision 1.55 2002/03/24 16:08:08 swa - we are too lazy to make a block-built - privoxy logo. hence removed the option. + { -session-cookies-only } + .google.com - Revision 1.54 2002/03/24 15:46:20 swa - name change related issue. + { -fast-redirects } + .google.com - Revision 1.53 2002/03/24 11:51:00 swa - name change. changed filenames. +In file: user.action [ View ] [ Edit ] +(no matches in this file) + + - Revision 1.52 2002/03/24 11:01:06 swa - name change + + This is telling us how we have defined our + actions, and + which ones match for our test case, google.com. + Displayed is all the actions that are available to us. Remember, + the + sign denotes on. - + denotes off. So some are on here, but many + are off. Each example we try may provide a slightly different + end result, depending on our configuration directives. + + + The first listing + is for our default.action file. The large, multi-line + listing, is how the actions are set to match for all URLs, i.e. our default + settings. If you look at your actions file, this would be the + section just below the aliases section near the top. This + will apply to all URLs as signified by the single forward slash at the end + of the listing -- / . + - Revision 1.51 2002/03/23 15:13:11 swa - renamed every reference to the old name with foobar. - fixed "application foobar application" tag, fixed - "the foobar" with "foobar". left junkbustser in cvs - comments and remarks to history untouched. 
+ + But we have defined additional actions that would be exceptions to these general + rules, and then we list specific URLs (or patterns) that these exceptions + would apply to. Last match wins. Just below this then are two explicit + matches for .google.com. The first is negating our previous + cookie setting, which was for +session-cookies-only + (i.e. not persistent). So we will allow persistent cookies for google, at + least that is how it is in this example. The second turns + off any +fast-redirects + action, allowing this to take place unmolested. Note that there is a leading + dot here -- .google.com. This will match any hosts and + sub-domains in the google.com domain, such as + www.google.com or mail.google.com. But it would not + match www.google.de! So, apparently, we have these two actions + defined as exceptions to the general rules at the top somewhere in the lower + part of our default.action file, and + google.com is referenced somewhere in these latter sections. + - Revision 1.50 2002/03/23 05:06:21 hal9 - Touch up. + + Then, for our user.action file, we again have no hits. + So there is nothing google-specific that we might have added to our own, local + configuration. If there were, those actions would over-rule any actions from + previously processed files, such as default.action. + user.action typically has the last word. This is the + best place to put hard and fast exceptions. + - Revision 1.49 2002/03/21 17:01:05 hal9 - New section in Appendix. + + And finally we pull it all together in the bottom section and summarize how + Privoxy is applying all its actions + to google.com: - Revision 1.48 2002/03/12 06:33:01 hal9 - Catching up to Andreas and re_filterfile changes. + - Revision 1.47 2002/03/11 13:13:27 swa - correct feedback channels + + - Revision 1.46 2002/03/10 00:51:08 hal9 - Added section on JB internal pages in Appendix.
+ Final results: - Revision 1.45 2002/03/09 17:43:53 swa - more distros + -add-header + -block + +change-x-forwarded-for{block} + -client-header-filter{hide-tor-exit-notation} + -content-type-overwrite + -crunch-client-header + -crunch-if-none-match + -crunch-incoming-cookies + -crunch-outgoing-cookies + -crunch-server-header + +deanimate-gifs {last} + -downgrade-http-version + -fast-redirects + -filter {js-events} + -filter {content-cookies} + -filter {all-popups} + -filter {banners-by-link} + -filter {tiny-textforms} + -filter {frameset-borders} + -filter {demoronizer} + -filter {shockwave-flash} + -filter {quicktime-kioskmode} + -filter {fun} + -filter {crude-parental} + -filter {site-specifics} + -filter {js-annoyances} + -filter {html-annoyances} + +filter {refresh-tags} + -filter {unsolicited-popups} + +filter {img-reorder} + +filter {banners-by-size} + +filter {webbugs} + +filter {jumping-windows} + +filter {ie-exploits} + -filter {google} + -filter {yahoo} + -filter {msn} + -filter {blogspot} + -filter {no-ping} + -force-text-mode + -handle-as-empty-document + -handle-as-image + -hide-accept-language + -hide-content-disposition + +hide-from-header {block} + -hide-if-modified-since + +hide-referrer {forge} + -hide-user-agent + -limit-connect + -overwrite-last-modified + -prevent-compression + -redirect + -server-header-filter{xml-to-html} + -server-header-filter{html-to-xml} + -session-cookies-only + +set-image-blocker {pattern} + - Revision 1.44 2002/03/09 17:08:48 hal9 - New section on Jon's actions file editor, and move some stuff around. + + Notice the only difference here to the previous listing, is to + fast-redirects and session-cookies-only, + which are activated specifically for this site in our configuration, + and thus show in the Final Results. + - Revision 1.43 2002/03/08 00:47:32 hal9 - Added imageblock{pattern}. 
+ + Now another example, ad.doubleclick.net: + - Revision 1.42 2002/03/07 18:16:55 swa - looks better + + - Revision 1.41 2002/03/07 16:46:43 hal9 - Fix a few markup problems for jade. + { +block{Domains starts with "ad"} } + ad*. - Revision 1.40 2002/03/07 16:28:39 swa - provide correct feedback channels + { +block{Domain contains "ad"} } + .ad. - Revision 1.39 2002/03/06 16:19:28 hal9 - Note on perceived filtering slowdown per FR. + { +block{Doubleclick banner server} +handle-as-image } + .[a-vx-z]*.doubleclick.net + + - Revision 1.38 2002/03/05 23:55:14 hal9 - Stupid I did it again. Double hyphen in comment breaks jade. + + We'll just show the interesting part here - the explicit matches. It is + matched three different times. Two +block{} sections, + and a +block{} +handle-as-image, + which is the expanded form of one of our aliases that had been defined as: + +block-as-image. (Aliases are defined in + the first section of the actions file and typically used to combine more + than one action.) + - Revision 1.37 2002/03/05 23:53:49 hal9 - jade barfs on '- -' embedded in comments. - -user option broke it. + + Any one of these would have done the trick and blocked this as an unwanted + image. This is unnecessarily redundant since the last case effectively + would also cover the first. No point in taking chances with these guys + though ;-) Note that if you want an ad or obnoxious + URL to be invisible, it should be defined as ad.doubleclick.net + is done here -- as both a +block{} + and a + +handle-as-image. + The custom alias +block-as-image just + simplifies the process and makes it more readable. + - Revision 1.36 2002/03/05 22:53:28 hal9 - Add new - - user option. + + One last example. Let's try http://www.example.net/adsl/HOWTO/. + This one is giving us problems. We are getting a blank page. Hmmm ... + - Revision 1.35 2002/03/05 00:17:27 hal9 - Added section on command line options.
+ + - Revision 1.34 2002/03/04 19:32:07 oes - Changed default port to 8118 + Matches for http://www.example.net/adsl/HOWTO/: - Revision 1.33 2002/03/03 19:46:13 hal9 - Emphasis on where/how to report bugs, etc + In file: default.action [ View ] [ Edit ] - Revision 1.32 2002/03/03 09:26:06 joergs - AmigaOS changes, config is now loaded from PROGDIR: instead of - AmiTCP:db/junkbuster/ if no configuration file is specified on the - command line. + {-add-header + -block + +change-x-forwarded-for{block} + -client-header-filter{hide-tor-exit-notation} + -content-type-overwrite + -crunch-client-header + -crunch-if-none-match + -crunch-incoming-cookies + -crunch-outgoing-cookies + -crunch-server-header + +deanimate-gifs + -downgrade-http-version + +fast-redirects {check-decoded-url} + -filter {js-events} + -filter {content-cookies} + -filter {all-popups} + -filter {banners-by-link} + -filter {tiny-textforms} + -filter {frameset-borders} + -filter {demoronizer} + -filter {shockwave-flash} + -filter {quicktime-kioskmode} + -filter {fun} + -filter {crude-parental} + -filter {site-specifics} + -filter {js-annoyances} + -filter {html-annoyances} + +filter {refresh-tags} + -filter {unsolicited-popups} + +filter {img-reorder} + +filter {banners-by-size} + +filter {webbugs} + +filter {jumping-windows} + +filter {ie-exploits} + -filter {google} + -filter {yahoo} + -filter {msn} + -filter {blogspot} + -filter {no-ping} + -force-text-mode + -handle-as-empty-document + -handle-as-image + -hide-accept-language + -hide-content-disposition + +hide-from-header{block} + +hide-referer{forge} + -hide-user-agent + -overwrite-last-modified + +prevent-compression + -redirect + -server-header-filter{xml-to-html} + -server-header-filter{html-to-xml} + +session-cookies-only + +set-image-blocker{blank} } + / - Revision 1.31 2002/03/02 22:45:52 david__schmidt - Just tweaking + { +block{Path contains "ads".} +handle-as-image } + /ads + + - Revision 1.30 2002/03/02 22:00:14 hal9 - Updated 'New 
Features' list. Ran through spell-checker. + + Oops, the /adsl/ is matching /ads in our + configuration! But we did not want this at all! Now we see why we get the + blank page. It is actually triggering two different actions here, and + the effects are aggregated so that the URL is blocked, and &my-app; is told + to treat the block as if it were an image. But this is, of course, all wrong. + We could now add a new action below this (or better in our own + user.action file) that explicitly + unblocks ( + {-block}) paths with + adsl in them (remember, last match in the configuration + wins). There are various ways to handle such exceptions. Example: + - Revision 1.29 2002/03/02 20:34:07 david__schmidt - Update OS/2 build section + + - Revision 1.28 2002/02/24 14:34:24 jongfoster - Formatting changes. Now changing the doctype to DocBook XML 4.1 - will work - no other changes are needed. + { -block } + /adsl + + - Revision 1.27 2002/01/11 14:14:32 hal9 - Added a very short section on Templates + + Now the page displays ;-) + Remember to flush your browser's caches when making these kinds of changes to + your configuration to ensure that you get a freshly delivered page! Or, try + using Shift+Reload. + - Revision 1.26 2002/01/09 20:02:50 hal9 - Fix bug re: auto-detect config file changes. + + But now what about a situation where we get no explicit matches like + we did with: + - Revision 1.25 2002/01/09 18:20:30 hal9 - Touch ups for *.action files. + + - Revision 1.24 2001/12/02 01:13:42 hal9 - Fix typo. + { +block{Path starts with "ads".} +handle-as-image } + /ads + + - Revision 1.23 2001/12/02 00:20:41 hal9 - Updates for recent changes. + + That actually was very helpful and pointed us quickly to where the problem + was. If you don't get this kind of match, then it means one of the default + rules in the first section of default.action is causing + the problem. This would require some guesswork, and maybe a little trial and + error to isolate the offending rule.
One likely cause would be one of the + +filter actions. + These tend to be harder to troubleshoot. + Try adding the URL for the site to one of the aliases that turn off + +filter: + - Revision 1.22 2001/11/05 23:57:51 hal9 - Minor update for startup now daemon mode. + + - Revision 1.21 2001/10/31 21:11:03 hal9 - Correct 2 minor errors + { shop } + .quietpc.com + .worldpay.com # for quietpc.com + .jungle.com + .scan.co.uk + .forbes.com + + - Revision 1.18 2001/10/24 18:45:26 hal9 - *** empty log message *** + + { shop } is an alias that expands to + { -filter -session-cookies-only }. + Or you could write your own exception to negate filtering: - Revision 1.17 2001/10/24 17:10:55 hal9 - Catching up with Jon's recent work, and a few other things. + - Revision 1.16 2001/10/21 17:19:21 swa - wrong url in documentation + + - Revision 1.15 2001/10/14 23:46:24 hal9 - Various minor changes. Fleshed out SEE ALSO section. + { -filter } + # Disable ALL filter actions for sites in this section + .forbes.com + developer.ibm.com + localhost + + - Revision 1.13 2001/10/10 17:28:33 hal9 - Very minor changes. + + This would turn off all filtering for these sites. This is best + put in user.action, for local site + exceptions. Note that when a simple domain pattern is used by itself (without + the subsequent path portion), all sub-pages within that domain are included + automatically in the scope of the action. + - Revision 1.12 2001/09/28 02:57:04 hal9 - Ditto :/ + + Images that are inexplicably being blocked may well be hitting the ++filter{banners-by-size} + rule, which assumes + that images of certain sizes are ad banners (works well + most of the time since these tend to be standardized). + - Revision 1.11 2001/09/28 02:25:20 hal9 - Ditto. + + { fragile } is an alias that disables most of the + actions that are likely to cause trouble. This can be used as a + last resort for problem sites. + + + - Revision 1.9 2001/09/27 23:50:29 hal9 - A few changes.
A short section on regular expression in appendix. + { fragile } + # Handle with care: easy to break + mail.google. + mybank.example.com + - Revision 1.8 2001/09/25 00:34:59 hal9 - Some additions, and re-arranging. - Revision 1.7 2001/09/24 14:31:36 hal9 - Diddling. + + Remember to flush caches! Note that the + mail.google reference lacks the TLD portion (e.g. + .com). This will effectively match mail.google under any + TLD, such as mail.google.de, + just as an example. + + + If this still does not work, you will have to go through the remaining + actions one by one to find which one(s) cause the problem. + - Revision 1.6 2001/09/24 14:10:32 hal9 - Including David's OS/2 installation instructions. + - Revision 1.2 2001/09/13 15:27:40 swa - cosmetics +
- Revision 1.1 2001/09/12 15:36:41 swa - source files for junkbuster documentation +