X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fsource%2Fuser-manual.sgml;h=ef21bd299b4a97f7363d096416ed4b4f83bcf755;hp=408a61281a326995a3692f6833dd2e2d47040a2f;hb=f6d1a7ca82613239a15439cc9b3613750d5f55c5;hpb=0428133610c525457cb16f7ac6a54203a2743d6c diff --git a/doc/source/user-manual.sgml b/doc/source/user-manual.sgml index 408a6128..ef21bd29 100644 --- a/doc/source/user-manual.sgml +++ b/doc/source/user-manual.sgml @@ -11,8 +11,8 @@ - - + + @@ -34,9 +34,9 @@ This file belongs into ijbswa.sourceforge.net:/home/groups/i/ij/ijbswa/htdocs/ - $Id: user-manual.sgml,v 2.134 2011/08/18 11:45:02 fabiankeil Exp $ + $Id: user-manual.sgml,v 2.159 2013/01/09 15:03:06 fabiankeil Exp $ - Copyright (C) 2001-2011 Privoxy Developers http://www.privoxy.org/ + Copyright (C) 2001-2013 Privoxy Developers http://www.privoxy.org/ See LICENSE. ======================================================================== @@ -55,12 +55,12 @@ - Copyright &my-copy; 2001-2011 by + Copyright &my-copy; 2001-2013 by Privoxy Developers -$Id: user-manual.sgml,v 2.134 2011/08/18 11:45:02 fabiankeil Exp $ +$Id: user-manual.sgml,v 2.159 2013/01/09 15:03:06 fabiankeil Exp $ Mac OS X - Unzip the downloaded file (you can either double-click on the zip file - icon from the Finder, or from the desktop if you downloaded it there). - Then, double-click on the package installer icon and follow the - installation process. + Installation instructions for the OS X platform depend upon whether + you downloaded a ready-built installation package (.pkg or .mpkg) or have + downloaded the source code. + + + +Installation from ready-built package + + The downloaded file will either be a .pkg (for OS X 10.5 upwards) or a bzipped + .mpkg file (for OS X 10.4). The former can be double-clicked as is and the + installation will start; double-clicking the latter will unzip the .mpkg file + which can then be double-clicked to commence the installation. 
+ + + The privoxy service will automatically start after a successful installation + (and thereafter every time your computer starts up); however, you will need to + configure your web browser(s) to use it. To do so, configure them to use a + proxy for HTTP and HTTPS at the address 127.0.0.1:8118. + + + To prevent the privoxy service from automatically starting when your computer + starts up, remove or rename the file /Library/LaunchDaemons/org.ijbswa.privoxy.plist + (on OS X 10.5 and higher) or the folder named + /Library/StartupItems/Privoxy (on OS X 10.4 'Tiger'). + + + To manually start or stop the privoxy service, use the scripts startPrivoxy.sh + and stopPrivoxy.sh supplied in /Applications/Privoxy. They must be run from an + administrator account, using sudo. + + + To uninstall, run /Applications/Privoxy/uninstall.command as sudo from an + administrator account. + + + +Installation from source + + To build and install the Privoxy source code on OS X, you will need to obtain + the macsetup module from the Privoxy SourceForge CVS repository (refer to + SourceForge help for details of how to set up a CVS client to have read-only + access to the repository). This module contains scripts that leverage the usual + open-source tools (available as part of Apple's free-of-charge Xcode + distribution or via the usual open-source software package managers for OS X, + such as MacPorts, Homebrew or Fink) to build and then install the privoxy binary + and associated files. The macsetup module's README file contains complete + instructions for its use. + + + The privoxy service will automatically start after a successful installation + (and thereafter every time your computer starts up); however, you will need to + configure your web browser(s) to use it. To do so, configure them to use a + proxy for HTTP and HTTPS at the address 127.0.0.1:8118. - The privoxy service will automatically start after a successful - installation (in addition to every time your computer starts up).
To - prevent the privoxy service from automatically starting when your - computer starts up, remove or rename the folder named - /Library/StartupItems/Privoxy. + To prevent the privoxy service from automatically starting when your computer + starts up, remove or rename the file /Library/LaunchDaemons/org.ijbswa.privoxy.plist + (on OS X 10.5 and higher) or the folder named + /Library/StartupItems/Privoxy (on OS X 10.4 'Tiger'). To manually start or stop the privoxy service, use the Privoxy Utility - for Mac OS X. This application controls the privoxy service (e.g. - starting and stopping the service as well as uninstalling the software). + for Mac OS X (also part of the macsetup module). This application can start + and stop the privoxy service and display its log and configuration files. + + + To uninstall, run the macsetup module's uninstall.sh as sudo from an + administrator account. @@ -402,13 +454,6 @@ How to install the binary packages depends on your operating system: Keeping your Installation Up-to-Date - - As user feedback comes in and development continues, we will make updated versions - of both the main actions file (as a separate - package) and the software itself (including the actions file) available for - download. - If you wish to receive an email notification whenever we release updates of @@ -437,642 +482,1158 @@ How to install the binary packages depends on your operating system: What's New in this Release - Privoxy 3.0.17 is a stable release. - The changes since 3.0.16 stable are: + Privoxy 3.0.19 is a stable release. + The changes since 3.0.18 stable are: - - - Fixed last-chunk-detection for responses where the content was small - enough to be read with the body, causing Privoxy to wait for the - end of the content until the server closed the connection or the - request timed out. Reported by "Karsten" in #3028326. - - - - - Responses with status code 204 weren't properly detected as body-less - like RFC2616 mandates. 
Like the previous bug, this caused Privoxy to - wait for the end of the content until the server closed the connection - or the request timed out. Fixes #3022042 and #3025553, reported by a - user with no visible name. Most likely also fixes a bunch of other - AJAX-related problem reports that got closed in the past due to - insufficient information and lack of feedback. - - - - - Fixed an ACL bug that made it impossible to build a blacklist. - Usually the ACL directives are used in a whitelist, which worked - as expected, but blacklisting is still useful for public proxies - where one only needs to deny known abusers access. - - - - - Added LOG_LEVEL_RECEIVED to log the not-yet-parsed data read from the - network. This should make debugging various parsing issues a lot easier. - - - - - The IPv6 code is enabled by default on Windows versions that support it. - Patch submitted by oCameLo in #2942729. - - - - - In mingw32 versions, the user.filter file is reachable through the - GUI, just like default.filter is. Feature request 3040263. - - - - - Added the configure option --enable-large-file-support to set a few - defines that are required by platforms like GNU/Linux to support files - larger then 2GB. Mainly interesting for users without proper logfile - management. - - - - - Logging with "debug 16" no longer stops at the first nul byte which is - pretty useless. Non-printable characters are replaced with their hex value - so the result can't span multiple lines making parsing them harder then - necessary. - - - + - Privoxy logs when reading an action, filter or trust file. + Bug fixes: + + + + Prevent a segmentation fault when de-chunking buffered content. + It could be triggered by malicious web servers if Privoxy was + configured to filter the content and running on a platform + where SIZE_T_MAX isn't larger than UINT_MAX, which probably + includes most 32-bit systems. On those platforms, all Privoxy + versions before 3.0.19 appear to be affected. 
+ To be on the safe side, this bug should be presumed to allow + code execution, as proving otherwise seems unrealistic. + + + + + Do not expect a response from the SOCKS4/4A server until it + has received something to respond to. This regression was introduced + in 3.0.18 and prevented the SOCKS4/4A negotiation from working. + Reported by qqqqqw in #3459781. + + + - Fixed incorrect regression test markup which caused a test in - 3.0.16 to fail while Privoxy itself was working correctly. - While Privoxy accepts hide-referer, too, the action name is actually - hide-referrer which is also the name used one the final results page, - where the test expected the alias. + General improvements: + + + + Fix an off-by-one in an error message about connect failures. + + + + + Use a GNUMakefile variable for the webserver root directory and + update the path. SourceForge changed it, which broke various + web-related targets. + + + + + Update the CODE_STATUS description. + + + + + + + + The following changes were made between 3.0.17 and 3.0.18: + + + + - CGI interface improvements: + Bug fixes: - In finish_http_response(), continue to add the 'Connection: close' - header if the client connection will not be kept alive. - Anonymously pointed out in #2987454. + If a generated redirect URL contains characters RFC 3986 doesn't + permit, they are (re)encoded. Not doing this makes Privoxy versions + from 3.0.5 to 3.0.17 susceptible to HTTP response splitting (CWE-113) + attacks if the +fast-redirects{check-decoded-url} action is used. - Apostrophes in block messages no longer cause parse errors - when the blocked page is viewed with JavaScript enabled. - Reported by dg1727 in #3062296. + Fix a logic bug that could cause Privoxy to reuse a server + socket after it got tainted by a server-header-tagger-induced + block that was triggered before the whole server response had + been read.
If keep-alive was enabled and the request following + the blocked one was to the same host and using the same forwarding + settings, Privoxy would send it on the tainted server socket. + While the server would simply treat it as a pipelined request, + Privoxy would later on fail to properly parse the server's + response, as it would try to parse the unread data from the + first response as server headers for the second one. + Regression introduced in 3.0.17. - Fix a bunch of anchors that used underscores instead of dashes. + When implying keep-alive in client_connection(), remember that + the client didn't. Fixes a regression introduced in 3.0.13 that + would cause Privoxy to wait for additional client requests after + receiving an HTTP/1.1 request with "Connection: close" set + and connection sharing enabled. + With clients that terminate the client connection after detecting + that the whole body has been received, it doesn't really matter, + but with clients that don't, the connection would be kept open until + it timed out. - Allow to keep the client connection alive after crunching the previous request. - Already opened server connections can be kept alive, too. + Fix a subtle race condition between prepare_csp_for_next_request() + and sweep(). A thread preparing itself for the next client request + could briefly appear to be inactive. + If all other threads were already using more recent files, + the thread could get its files swept away under its feet. + So far this has only been reproduced while stress testing in + valgrind while touching action files in a loop. It's unlikely + to have caused any actual problems in the real world. - In cgi_show_url_info(), don't forget to prefix URLs that only contain - http:// or https:// in the path. Fixes #2975765 reported by Adam Piggott. + Disable filters if SDCH compression is used unless filtering is forced.
+ If SDCH was combined with a supported compression algorithm, Privoxy + previously could try to decompress it and ditch the Content-Encoding + header even though the SDCH compression wasn't dealt with. + Reported by zebul666 in #3225863. - Show the 404 CGI page if cgi_send_user_manual() is called while - local user manual delivery is disabled. + Make a copy of the --user value and only mess with that when splitting + user and group. On some operating systems, modifying the value directly + is reflected in the output of ps and friends and can be misleading. + Reported by zepard in #3292710. - + + + If forwarded-connect-retries is set, only retry if Privoxy is actually + forwarding the request. Previously, direct connections would be retried + as well. + + + + + Fixed a small memory leak when retrying connections with IPv6 + support enabled. + + + + + Remove an incorrect assertion in compile_dynamic_pcrs_job_list(). + It could be triggered by a pcrs job with an invalid pcre + pattern (for example one that contains a lone quantifier). + + + + + If the --user argument user[.group] contains a dot, always bail out + if no group has been specified. Previously the intended, but undocumented + (and apparently untested), behaviour was to try interpreting the whole + argument as user name, but the detection was flawed and checked for '0' + instead of '\0', thus merely preventing group names beginning with a zero. + + + + + In html_code_map[], use a numeric character reference instead of &apos; + which wasn't standardized before XHTML 1.0. + + + + + Fix an invalid free when compiled with FEATURE_GRACEFUL_TERMINATION + and shut down through http://config.privoxy.org/die. + + + + + In get_actions(), fix the "temporary" backwards compatibility hack + to accept block actions without reason. + It also covered other actions that should be rejected as invalid. + Reported by Billy Crook. + + + - Action file improvements: + General improvements: - Enable user.filter by default.
Suggested by David White in #3001830. + Privoxy can (re)compress buffered content before delivering + it to the client. This is disabled by default as most users wouldn't + benefit from it. - Block .sitestat.com/. Reported by johnd16 in #3002725. + The +fast-redirects{check-decoded-url} action checks URL + segments separately. If there are other parameters behind + the redirect URL, this makes it unnecessary to cut them off + by additionally using a +redirect{} pcrs command. + Initial patch submitted by Jamie Zawinski in #3429848. - Block .atemda.com/. Reported by johnd16 in #3002723. + When loading action sections, verify that the referenced filters + exist. Currently missing filters only result in an error message, + but eventually the severity will be upgraded to fatal. - Block js.adlink.net/. Reported by johnd16 in #3002720. + Allow binding to multiple separate addresses. + Patch set submitted by Petr Pisar in #3354485. - Block .analytics.yahoo.com/. Reported by johnd16 in #3002713. + Set socket_error to errno if connecting fails in rfc2553_connect_to(). + Previously, rejected direct connections could be incorrectly reported + as DNS issues if Privoxy was compiled with IPv6 support. - Block sb.scorecardresearch.com, too. Reported by dg1727 in #2992652. + Adjust url_code_map[] so spaces are replaced with %20 instead of '+'. + While '+' can be used by clients submitting form data, this is not + actually what Privoxy is using the lookups for. This is more of a + cosmetic issue and doesn't fix any known problems. - Fix problems noticed on Yahoo mail and news pages. + When compiled without FEATURE_FAST_REDIRECTS, do not silently + ignore +fast-redirect{} directives. - Remove the too broad yahoo section, only keeping the - fast-redirects exception as discussed on ijbswa-devel@. + Added a workaround for GNU libc's strptime() reporting negative + year values when the parsed year is only specified with two digits.
+ On affected systems cookies with such a date would not be turned + into session cookies by the +session-cookies-only action. + Reported by Vaeinoe in #3403560 - Don't block adesklets.sourceforge.net. Reported in #2974204. + Fixed bind failures with certain GNU libc versions if no non-loopback + IP address has been configured on the system. This is mainly an issue + if the system is using DHCP and Privoxy is started before the network + is completely configured. + Reported by Raphael Marichez in #3349356. + Additional insight from Petr Pisar. - Block chartbeat ping tracking. Reported in #2975895. + Privoxy log messages now use the ISO 8601 date format %Y-%m-%d. + It's only slightly longer than the old format, but contains + the full date including the year and allows sorting by date + (when grepping in multiple log files) without hassle. - Tag CSS and image requests with cautious and medium settings, too. + In get_last_url(), do not bother trying to decode URLs that do + not contain at least one '%' sign. It reduces the log noise and + a number of unnecessary memory allocations. - Don't handle view.atdmt.com as image. It's used for click-throughs - so users should be able to "go there anyway". - Reported by Adam Piggott in #2975927. + In case of SOCKS5 failures, dump the socks response in the log message. - Also let the refresh-tags filter remove invalid refresh tags where - the 'url=' part is missing. Anonymously reported in #2986382. - While at it, update the description to mention the fact that only - refresh tags with refresh times above 9 seconds are covered. + Simplify the signal setup in main(). - javascript needs to be blocked with +handle-as-empty-document to - work around Firefox bug 492459. So move .js blockers from - +block{Might be a web-bug.} -handle-as-empty-document to - +block{Might be a web-bug.} +handle-as-empty-document. + Streamline socks5_connect() slightly. - ijbswa-Feature Requests-3006719 - Block 160x578 Banners. 
+ In socks5_connect(), require a complete socks response from the server. + Previously Privoxy didn't care how much data the server response + contained as long as the first two bytes contained the expected + values. While at it, shrink the buffer size so Privoxy can't read + more than a whole socks response. - Block another omniture tracking domain. + In chat(), do not bother to generate a client request in case of + direct CONNECT requests. It will not be used anyway. - Added a range-requests tagger. + Reduce server_last_modified()'s stack size. - Added two sections to get Flickr's Ajax interface working with - default pre-settings. If you change the configuration to block - cookies by default, you'll need additional exceptions. - Reported by Mathias Homann in #3101419 and by Patrick on ijbswa-users@. + Shorten get_http_time() by using strftime(). - - - - - - Documentation improvements: - - Explicitly mention how to match all URLs. + Constify the known_http_methods pointers in unknown_method(). - Consistently recommend socks5 in the Tor FAQ entry and mention - its advantage compared to socks4a. Reported by David in #2960129. + Constify the time_formats pointers in parse_header_time(). - Slightly improve the explanation of why filtering may appear - slower than it is. + Constify the formerly_valid_actions pointers in action_used_to_be_valid(). - Grammar fixes for the ACL section. + Introduce a GNUMakefile MAN_PAGE variable that defaults to privoxy.1. + The Debian package uses section 8 for the man page and this + should simplify the patch. - Fixed a link to the 'intercepting' entry and add another one. + Deduplicate the INADDR_NONE definition for Solaris by moving it to jbsockets.h - Rename the 'Other' section to 'Mailing Lists' and reword it - to make it clear that nobody is forced to use the trackers + In block_url(), ditch the obsolete workaround for ancient Netscape versions + that supposedly couldn't properly deal with status code 403. 
- Note that 'anonymously' posting on the trackers may not always - be possible. + Remove a useless NULL pointer check in load_trustfile(). - Suggest to enable debug 32768 when suspecting parsing problems. + Remove two useless NULL pointer checks in load_one_re_filterfile(). + + + + + Change url_code_map[] from an array of pointers to an array of arrays. + It removes an unnecessary layer of indirection and on 64-bit systems reduces + the size of the binary a bit. + + + + + Fix various typos. Fixes taken from Debian's 29_typos.dpatch by Roland Rosenfeld. + + + + + Add a dok-tidy GNUMakefile target to clean up the messy HTML + generated by the other dok targets. + + + + + GNUisms in the GNUMakefile have been removed. + + + + + Change the HTTP version in static responses to 1.1. + + + + + Synced config.sub and config.guess with upstream + 2011-11-11/386c7218162c145f5f9e1ff7f558a3fbb66c37c5. + + + + + Add a dedicated function to parse the values of toggles. Reduces duplicated + code in load_config() and provides better error handling. Invalid or missing + toggle values are now a fatal error instead of being silently ignored. + + + + + Terminate HTML lines in static error messages with \n instead of \r\n. + + + + + Simplify cgi_error_unknown() a bit. + + + + + In LogPutString(), don't bother looking at pszText when not + actually logging anything. - - - - - - Privoxy-Log-Parser improvements: - - Gather statistics for ressources, methods, and HTTP versions - used by the client. + Change ssplit()'s fourth parameter from int to size_t. + Fixes a clang complaint. - Also gather statistics for blocked and redirected requests. + Add a warning that the statistics currently can't be trusted. + Mention Privoxy-Log-Parser's --statistics option as + an alternative for the time being. - Provide the percentage of keep-alive offers the client accepted. + In rfc2553_connect_to(), start setting cgi->error_message on error. - Add a --url-statistics-threshold option.
+ Change the expected status code returned for http://p.p/die depending + on whether or not FEATURE_GRACEFUL_TERMINATION is available. - Add a --host-statistics-threshold option to also gather - statistics about how many request where made per host. + In cgi_die(), mark the client connection for closing. + If the client will fetch the style sheet through another connection + it gets the main thread out of the accept() state and should thus + trigger the actual shutdown. - Fix a bug in handle_loglevel_header() where a 'scan: ' got lost. + Add a proper CGI message for cgi_die(). - Add a --shorten-thread-ids option to replace the thread id with - a decimal number. + Don't enforce a logical line length limit in read_config_line(). - Accept and ignore: Looks like we got the last chunk together - with the server headers. We better stop reading. + Slightly refactor server_last_modified() to remove useless gmtime*() calls. - Accept and ignore: Continue hack in da house. + In get_content_type(), also recognize '.jpeg' as JPEG extension. - Accept and higlight: Rejecting connection from 10.0.0.2. - Maximum number of connections reached. + Add '.png' to the list of recognized file extensions in get_content_type(). - Accept and highlight: Loading actions file: /usr/local/etc/privoxy/default.action + In block_url(), consistently use the block reason "Request blocked by Privoxy" + In two places the reason was "Request for blocked URL" which hides the + fact that the request got blocked by Privoxy and isn't necessarily + correct as the block may be due to tags. - Accept and highlight: Loading filter file: /usr/local/etc/privoxy/default.filter + In listen_loop(), reload the configuration files after accepting + a new connection instead of before. + Previously the first connection that arrived after a configuration + change would still be handled with the old configuration. 
- Accept and highlight: Killed all-caps Host header line: HOST: bestproxydb.com + In chat()'s receive-data loop, skip a client socket check if + the socket will be written to right away anyway. This can + increase the transfer speed for unfiltered content on fast + network connections. - Accept and highlight: Reducing expected bytes to 0. Marking - the server socket tainted after throwing 4 bytes away. + The socket timeout is used for SOCKS negotiations as well, which + previously couldn't time out. - Accept: Merged multiple header lines to: 'X-FORWARDED-PROTO: http X-HOST: 127.0.0.1' + Don't keep the client connection alive if any configuration file + changed since the time the connection came in. This is closer to + Privoxy's behaviour before keep-alive support for client connections + was added and also less confusing in general. + + + Treat all Content-Type header values containing the pattern + 'script' as a sign of text. Reported by pribog in #3134970. + + - Code cleanups: + Action file improvements: - Remove the next member from the client_state struct. Only the main - thread needs access to all client states so give it its own struct. + Moved the site-specific block pattern section below the one for the + generic patterns so that for requests that are matched in both, the block + reason for the domain is shown, which is usually more useful than showing + the one for the generic pattern. - Garbage-collect request_contains_null_bytes(). + Remove -prevent-compression from the fragile alias. It's no longer + used anywhere by default and isn't known to break stuff anyway. - Ditch redundant code in unload_configfile(). + Add a (disabled) section to block various Facebook tracking URLs. + Reported by Dan Stahlke in #3421764. - Ditch LogGetURLUnderCursor() which doesn't seem to be used anywhere. + Add a (disabled) section to rewrite and redirect click-tracking + URLs used on news.google.com. + Reported by Dan Stahlke in #3421755.
- In write_socket(), remove the write-only variable write_len in - an ifdef __OS2__ block. Spotted by cppcheck. + Unblock linuxcounter.net/. + Reported by Dan Stahlke in #3422612. - In connect_to(), don't declare the variable 'flags' on OS/2 where - it isn't used. Spotted by cppcheck. + Block 'www91.intel.com/' which is used by Omniture. + Reported by Adam Piggott in #3167370. - Limit the scope of various variables. Spotted by cppcheck. + Disable the handle-as-empty-doc-returns-ok option and mark it as deprecated. + Reminded by tceverling in #2790091. - In add_to_iob(), turn an interestingly looking for loop into a - boring while loop. + Add ".ivwbox.de/" to the "Cross-site user tracking" section. + Reported by Nettozahler in #3172525. - Code cleanup in preparation for external filters. + Unblock and fast-redirect ".awin1.com/.*=http://". + Reported by Adam Piggott in #3170921. - In listen_loop(), mention the socket on which we accepted the - connection, not just the source IP address. + Block "b.collective-media.net/". - In write_socket(), also log the socket we're writing to. + Widen the Debian popcon exception to "qa.debian.org/popcon". + Seen in Debian's 05_default_action.dpatch by Roland Rosenfeld. - In log_error(), assert that escaped characters get logged - completely or not at all. + Block ".gemius.pl/" which only seems to be used for user tracking. + Reported by johnd16 in #3002731. Additional input from Lee and movax. - In log_error(), assert that ival and sval have reasonable values. - There's no reason not to abort() if they don't. + Disable banners-by-size filters for '.thinkgeek.com/'. + The filter only seems to catch pictures of the inventory. - Remove an incorrect cgi_error_unknown() call in a - cannot-happen-situation in send_crunch_response(). + Block requests for 'go.idmnet.bbelements.com/please/showit/'. + Reported by kacperdominik in #3372959. - Clean up white-space in http_response definition and - move the crunch_reason to the beginning. 
+ Unblock adainitiative.org/. - Turn http_response.reason into an enum and rename it - to http_response.crunch_reason. + Add a fast-redirects exception for '.googleusercontent.com/.*=cache'. - Silence a 'gcc (Debian 4.3.2-1.1) 4.3.2' warning on i686 GNU/Linux. + Add a fast-redirects exception for webcache.googleusercontent.com/. - Fix white-space in a log message in remove_chunked_transfer_coding(). - While at it, add a note that the message doesn't seem to - be entirely correct and should be improved later on. + Unblock http://adassier.wordpress.com/ and http://adassier.files.wordpress.com/. - + - GNUmakefile improvements: + Filter file improvements: - Use $(SSH) instead of ssh, so one only needs to specify a username once. + Let the yahoo filter hide '.ads'. + + + + + Let the msn filter hide overlay ads for Facebook 'likes' in search + results and elements with the id 's_notf_div'. They only seem to be + used to advertise site 'enhancements'. - Removed references to the action feedback thingy that hasn't been - working for years. + Let the js-events filter additionally disarm setInterval(). + Suggested by dg1727 in #3423775. + + + + + + + + Documentation improvements: + + + + Clarify the effect of compiling Privoxy with zlib support. + Suggested by dg1727 in #3423782. - Consistently use shell.sourceforge.net instead of shell.sf.net so - one doesn't need to check server fingerprints twice. + Point out that the SourceForge messaging system works like a black + hole and should thus not be used to contact individual developers. - Removed GNUisms in the webserver and webactions targets so they - work with standard tar. + Mention some of the problems one can experience when not explicitly + configuring an IP address as the listen address. + + + Explicitly mention that hostnames can be used instead of IP addresses + for the listen-address, that only the first address returned will be + used and what happens if the address is invalid. + Requested by Calestyo in #3302213.
+ + - - - - - - - -Note to Upgraders - - - A quick list of things to be aware of before upgrading from earlier - versions of Privoxy: - - - - - - - - The recommended way to upgrade &my-app; is to backup your old - configuration files, install the new ones, verify that &my-app; - is working correctly and finally merge back your changes using - diff and maybe patch. - - - There are a number of new features in each &my-app; release and - most of them have to be explicitly enabled in the configuration - files. Old configuration files obviously don't do that and due - to syntax changes using old configuration files with a new - &my-app; isn't always possible anyway. - - - - - Note that some installers remove earlier versions completely, - including configuration files, therefore you should really save - any important configuration files! - - - - - On the other hand, other installers don't overwrite existing configuration - files, thinking you will want to do that yourself. - - - - - standard.action has been merged into - the default.action file. - - - - - In the default configuration only fatal errors are logged now. - You can change that in the debug section - of the configuration file. You may also want to enable more verbose - logging until you verified that the new &my-app; version is working - as expected. - - - - - - Three other config file settings are now off by default: - enable-remote-toggle, - enable-remote-http-toggle, - and enable-edit-actions. - If you use or want these, you will need to explicitly enable them, and - be aware of the security issues involved. - - - - - + + +Note to Upgraders + + + A quick list of things to be aware of before upgrading from earlier + versions of Privoxy: + + + + + + + + The recommended way to upgrade &my-app; is to backup your old + configuration files, install the new ones, verify that &my-app; + is working correctly and finally merge back your changes using + diff and maybe patch. 
+ + + There are a number of new features in each &my-app; release and + most of them have to be explicitly enabled in the configuration + files. Old configuration files obviously don't do that, and due + to syntax changes, using old configuration files with a new + &my-app; isn't always possible anyway. + + + + + Note that some installers remove earlier versions completely, + including configuration files; therefore, you should really save + any important configuration files! + + + + + On the other hand, other installers don't overwrite existing configuration + files, thinking you will want to do that yourself. + + + + + standard.action has been merged into + the default.action file. + + + + + In the default configuration, only fatal errors are logged now. + You can change that in the debug section + of the configuration file. You may also want to enable more verbose + logging until you have verified that the new &my-app; version is working + as expected. + + + + + + Three other config file settings are now off by default: + enable-remote-toggle, + enable-remote-http-toggle, + and enable-edit-actions. + If you use or want these, you will need to explicitly enable them, and + be aware of the security issues involved. + + + + + + +limit-cookie-lifetime + + + + Typical use: + + Limit the lifetime of HTTP cookies to a couple of minutes or hours. + + + + + Effect: + + + Overwrites the expires field in Set-Cookie server headers if it's above the specified limit. + + + + + + Type: + + + Parameterized. + + + + + Parameter: + + + The lifetime limit in minutes, or 0. + + + + + + Notes: + + + This action reduces the lifetime of HTTP cookies coming from the + server to the specified number of minutes, starting from the time + the cookie passes Privoxy. + + + Cookies with a lifetime below the limit are not modified. + The lifetime of session cookies is set to the specified limit. + + + The effect of this action depends on the server.
+ + + In case of servers which refresh their cookies with each response + (or at least frequently), the lifetime limit set by this action + is updated as well. + Thus, a session associated with the cookie continues to work with + this action enabled, as long as a new request is made before the + last limit set is reached. + + + However, some servers send their cookies once, with a lifetime of several + years (the year 2037 is a popular choice), and do not refresh them + until a certain event in the future, for example the user logging out. + In this case this action may limit the absolute lifetime of the session, + even if requests are made frequently. + + + If the parameter is 0, this action behaves like + session-cookies-only. + + + + + + Example usages: + + + +limit-cookie-lifetime{60} + + + + + + + prevent-compression @@ -5719,6 +6415,10 @@ new action either provided as parameter, or derived by applying a single pcrs command to the original URL. + + The syntax for pcrs commands is documented in the + filter file section. + This action will be ignored if you use it together with block. @@ -6149,3740 +6849,2769 @@ example.org/instance-that-is-delivered-as-xml-but-is-not - - - -Summary - - Note that many of these actions have the potential to cause a page to - misbehave, possibly even not to display at all. There are many ways - a site designer may choose to design his site, and what HTTP header - content, and other criteria, he may depend on. There is no way to have hard - and fast rules for all sites. See the Appendix for a brief example on troubleshooting - actions. - - - - - - -Aliases - - Custom actions, known to Privoxy - as aliases, can be defined by combining other actions. - These can in turn be invoked just like the built-in actions. - Currently, an alias name can contain any character except space, tab, - =, - { and }, but we strongly - recommend that you only use a to z, - 0 to 9, +, and -. 
- Alias names are not case sensitive, and are not required to start with a - + or - sign, since they are merely textually - expanded. - - - Aliases can be used throughout the actions file, but they must be - defined in a special section at the top of the file! - And there can only be one such section per actions file. Each actions file may - have its own alias section, and the aliases defined in it are only visible - within that file. - - - There are two main reasons to use aliases: One is to save typing for frequently - used combinations of actions, the other one is a gain in flexibility: If you - decide once how you want to handle shops by defining an alias called - shop, you can later change your policy on shops in - one place, and your changes will take effect everywhere - in the actions file where the shop alias is used. Calling aliases - by their purpose also makes your actions files more readable. - - - Currently, there is one big drawback to using aliases, though: - Privoxy's built-in web-based action file - editor honors aliases when reading the actions files, but it expands - them before writing. So the effects of your aliases are of course preserved, - but the aliases themselves are lost when you edit sections that use aliases - with it. - - - - Now let's define some aliases... - - - - - # Useful custom aliases we can use later. - # - # Note the (required!) section header line and that this section - # must be at the top of the actions file! - # - {{alias}} - - # These aliases just save typing later: - # (Note that some already use other aliases!) 
- # - +crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies - -crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies - +block-as-image = +block{Blocked image.} +handle-as-image - allow-all-cookies = -crunch-all-cookies -session-cookies-only -filter{content-cookies} - - # These aliases define combinations of actions - # that are useful for certain types of sites: - # - fragile = -block -filter -crunch-all-cookies -fast-redirects -hide-referrer -prevent-compression - - shop = -crunch-all-cookies -filter{all-popups} - - # Short names for other aliases, for really lazy people ;-) - # - c0 = +crunch-all-cookies - c1 = -crunch-all-cookies - - - - ...and put them to use. These sections would appear in the lower part of an - actions file and define exceptions to the default actions (as specified further - up for the / pattern): - - - - - # These sites are either very complex or very keen on - # user data and require minimal interference to work: - # - {fragile} - .office.microsoft.com - .windowsupdate.microsoft.com - # Gmail is really mail.google.com, not gmail.com - mail.google.com - - # Shopping sites: - # Allow cookies (for setting and retrieving your customer data) - # - {shop} - .quietpc.com - .worldpay.com # for quietpc.com - mybank.example.com - - # These shops require pop-ups: - # - {-filter{all-popups} -filter{unsolicited-popups}} - .dabs.com - .overclockers.co.uk - - - - Aliases like shop and fragile are typically used for - problem sites that require more than one action to be disabled - in order to function properly. - - - - - -Actions Files Tutorial - - The above chapters have shown which actions files - there are and how they are organized, how actions are specified and applied - to URLs, how patterns work, and how to - define and use aliases. 
Now, let's look at an - example match-all.action, default.action - and user.action file and see how all these pieces come together: - - - -match-all.action - - Remember all actions are disabled when matching starts, - so we have to explicitly enable the ones we want. - - - - While the match-all.action file only contains a - single section, it is probably the most important one. It has only one - pattern, /, but this pattern - matches all URLs. Therefore, the set of - actions used in this default section will - be applied to all requests as a start. It can be partly or - wholly overridden by other actions files like default.action - and user.action, but it will still be largely responsible - for your overall browsing experience. - - - - Again, at the start of matching, all actions are disabled, so there is - no need to disable any actions here. (Remember: a + - preceding the action name enables the action, a - disables!). - Also note how this long line has been made more readable by splitting it into - multiple lines with line continuation. - - - - -{ \ - +change-x-forwarded-for{block} \ - +hide-from-header{block} \ - +set-image-blocker{pattern} \ -} -/ # Match all URLs - - - - - The default behavior is now set. - - - - -default.action - - - If you aren't a developer, there's no need for you to edit the - default.action file. It is maintained by - the &my-app; developers and if you disagree with some of the - sections, you should overrule them in your user.action. - - - - Understanding the default.action file can - help you with your user.action, though. - - - - The first section in this file is a special section for internal use - that prevents older &my-app; versions from reading the file: - - - - -########################################################################## -# Settings -- Don't change! For internal Privoxy use ONLY. 
-########################################################################## -{{settings}} -for-privoxy-version=3.0.11 - - - - After that comes the (optional) alias section. We'll use the example - section from the above chapter on aliases, - that also explains why and how aliases are used: - - - - -########################################################################## -# Aliases -########################################################################## -{{alias}} - - # These aliases just save typing later: - # (Note that some already use other aliases!) - # - +crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies - -crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies - +block-as-image = +block{Blocked image.} +handle-as-image - mercy-for-cookies = -crunch-all-cookies -session-cookies-only -filter{content-cookies} - - # These aliases define combinations of actions - # that are useful for certain types of sites: - # - fragile = -block -filter -crunch-all-cookies -fast-redirects -hide-referrer - shop = -crunch-all-cookies -filter{all-popups} - - - - The first of our specialized sections is concerned with fragile - sites, i.e. sites that require minimum interference, because they are either - very complex or very keen on tracking you (and have mechanisms in place that - make them unusable for people who avoid being tracked). We will simply use - our pre-defined fragile alias instead of stating the list - of actions explicitly: - - - - -########################################################################## -# Exceptions for sites that'll break under the default action set: -########################################################################## - -# "Fragile" Use a minimum set of actions for these sites (see alias above): -# -{ fragile } -.office.microsoft.com # surprise, surprise! 
-.windowsupdate.microsoft.com -mail.google.com - - - - Shopping sites are not as fragile, but they typically - require cookies to log in, and pop-up windows for shopping - carts or item details. Again, we'll use a pre-defined alias: - - - - -# Shopping sites: -# -{ shop } -.quietpc.com -.worldpay.com # for quietpc.com -.jungle.com -.scan.co.uk - - - - The fast-redirects - action, which may have been enabled in match-all.action, - breaks some sites. So disable it for popular sites where we know it misbehaves: - - - - -{ -fast-redirects } -login.yahoo.com -edit.*.yahoo.com -.google.com -.altavista.com/.*(like|url|link):http -.altavista.com/trans.*urltext=http -.nytimes.com - - - - It is important that Privoxy knows which - URLs belong to images, so that if they are to - be blocked, a substitute image can be sent, rather than an HTML page. - Contacting the remote site to find out is not an option, since it - would destroy the loading time advantage of banner blocking, and it - would feed the advertisers information about you. We can mark any - URL as an image with the handle-as-image action, - and marking all URLs that end in a known image file extension is a - good start: - - - - -########################################################################## -# Images: -########################################################################## - -# Define which file types will be treated as images, in case they get -# blocked further down this file: -# -{ +handle-as-image } -/.*\.(gif|jpe?g|png|bmp|ico)$ - - - - And then there are known banner sources. They often use scripts to - generate the banners, so it won't be visible from the URL that the - request is for an image. Hence we block them and - mark them as images in one go, with the help of our - +block-as-image alias defined above. (We could of - course just as well use +block - +handle-as-image here.) - Remember that the type of the replacement image is chosen by the - set-image-blocker - action. 
Since all URLs have matched the default section with its - +set-image-blocker{pattern} - action before, it still applies and needn't be repeated: - - - - -# Known ad generators: -# -{ +block-as-image } -ar.atwola.com -.ad.doubleclick.net -.ad.*.doubleclick.net -.a.yimg.com/(?:(?!/i/).)*$ -.a[0-9].yimg.com/(?:(?!/i/).)*$ -bs*.gsanet.com -.qkimg.net - - - - One of the most important jobs of Privoxy - is to block banners. Many of these can be blocked - by the filter{banners-by-size} - action, which we enabled above, and which deletes the references to banner - images from the pages while they are loaded, so the browser doesn't request - them anymore, and hence they don't need to be blocked here. But this naturally - doesn't catch all banners, and some people choose not to use filters, so we - need a comprehensive list of patterns for banner URLs here, and apply the - block action to them. - - - First come many generic patterns, which do most of the work, by - matching typical domain and path name components of banners. Then comes - a list of individual patterns for specific sites, which is omitted here - to keep the example short: - - - - -########################################################################## -# Block these fine banners: -########################################################################## -{ +block{Banner ads.} } - -# Generic patterns: -# -ad*. -.*ads. -banner?. -count*. -/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?) -/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/ - -# Site-specific patterns (abbreviated): -# -.hitbox.com - - - - It's quite remarkable how many advertisers actually call their banner - servers ads.company.com, or call the directory - in which the banners are stored simply banners. So the above - generic patterns are surprisingly effective. - - - But being very generic, they necessarily also catch URLs that we don't want - to block. The pattern .*ads. e.g.
catches - nasty-ads.nasty-corp.com as intended, - but also downloads.sourcefroge.net or - adsl.some-provider.net. So here come some - well-known exceptions to the +block - section above. - - - Note that these are exceptions to exceptions from the default! Consider the URL - downloads.sourcefroge.net: Initially, all actions are deactivated, - so it wouldn't get blocked. Then comes the defaults section, which matches the - URL, but just deactivates the block - action once again. Then it matches .*ads., an exception to the - general non-blocking policy, and suddenly - +block applies. And now, it'll match - .*loads., where -block - applies, so (unless it matches again further down) it ends up - with no block action applying. - - - - -########################################################################## -# Save some innocent victims of the above generic block patterns: -########################################################################## - -# By domain: -# -{ -block } -adv[io]*. # (for advogato.org and advice.*) -adsl. # (has nothing to do with ads) -adobe. # (has nothing to do with ads either) -ad[ud]*. # (adult.* and add.*) -.edu # (universities don't host banners (yet!)) -.*loads. # (downloads, uploads etc) - -# By path: -# -/.*loads/ - -# Site-specific: -# -www.globalintersec.com/adv # (adv = advanced) -www.ugu.com/sui/ugu/adv - - - - Filtering source code can have nasty side effects, - so make an exception for our friends at sourceforge.net, - and all paths with cvs in them. Note that - -filter - disables all filters in one fell swoop! - - - - -# Don't filter code! -# -{ -filter } -/(.*/)?cvs -bugzilla. -developer. -wiki. -.sourceforge.net - - - - The actual default.action is of course much more - comprehensive, but we hope this example made clear how it works. - - - - -user.action - - - So far we are painting with a broad brush by setting general policies, - which would be a reasonable starting point for many people. 
Now, - you might want to be more specific and have customized rules that - are more suitable to your personal habits and preferences. These would - be for narrowly defined situations like your ISP or your bank, and should - be placed in user.action, which is parsed after all other - actions files and hence has the last word, over-riding any previously - defined actions. user.action is also a - safe place for your personal settings, since - default.action is actively maintained by the - Privoxy developers and you'll probably want - to install updated versions from time to time. - - - - So let's look at a few examples of things that one might typically do in - user.action: - - - - - - - -# My user.action file. <fred@example.com> - - - - As aliases are local to the actions - file that they are defined in, you can't use the ones from - default.action, unless you repeat them here: - - - - -# Aliases are local to the file they are defined in. -# (Re-)define aliases for this file: -# -{{alias}} -# -# These aliases just save typing later, and the alias names should -# be self explanatory. -# -+crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies --crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies - allow-all-cookies = -crunch-all-cookies -session-cookies-only - allow-popups = -filter{all-popups} -+block-as-image = +block{Blocked as image.} +handle-as-image --block-as-image = -block - -# These aliases define combinations of actions that are useful for -# certain types of sites: -# -fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referrer -shop = -crunch-all-cookies allow-popups - -# Allow ads for selected useful free sites: -# -allow-ads = -block -filter{banners-by-size} -filter{banners-by-link} - -# Alias for specific file types that are text, but might have conflicting -# MIME types. We want the browser to force these to be text documents. 
-handle-as-text = -filter +-content-type-overwrite{text/plain} +-force-text-mode -hide-content-disposition - - - - - Say you have accounts on some sites that you visit regularly, and - you don't want to have to log in manually each time. So you'd like - to allow persistent cookies for these sites. The - allow-all-cookies alias defined above does exactly - that, i.e. it disables crunching of cookies in any direction, and the - processing of cookies to make them only temporary. - - - - -{ allow-all-cookies } - sourceforge.net - .yahoo.com - .msdn.microsoft.com - .redhat.com - - - - Your bank is allergic to some filter, but you don't know which, so you disable them all: - - - - -{ -filter } - .your-home-banking-site.com - - - - Some file types you may not want to filter for various reasons: - - - - -# Technical documentation is likely to contain strings that might -# erroneously get altered by the JavaScript-oriented filters: -# -.tldp.org -/(.*/)?selfhtml/ - -# And this stupid host sends streaming video with a wrong MIME type, -# so that Privoxy thinks it is getting HTML and starts filtering: -# -stupid-server.example.com/ - - - - Example of a simple block action. Say you've - seen an ad on your favourite page on example.com that you want to get rid of. - You have right-clicked the image, selected copy image location - and pasted the URL below while removing the leading http://, into a - { +block{} } section. Note that { +handle-as-image - } need not be specified, since all URLs ending in - .gif will be tagged as images by the general rules as set - in default.action anyway: - - - - -{ +block{Nasty ads.} } - www.example.com/nasty-ads/sponsor\.gif - another.example.net/more/junk/here/ - - - - The URLs of dynamically generated banners, especially from large banner - farms, often don't use the well-known image file name extensions, which - makes it impossible for Privoxy to guess - the file type just by looking at the URL. 
- You can use the +block-as-image alias defined above for - these cases. - Note that objects which match this rule but then turn out NOT to be an - image are typically rendered as a broken image icon by the - browser. Use cautiously. - - - - -{ +block-as-image } - .doubleclick.net - .fastclick.net - /Realmedia/ads/ - ar.atwola.com/ - - - - Now suppose you noticed that the default configuration breaks Forbes Magazine, - but you were too lazy to find out which action is the culprit, and you - were again too lazy to give feedback, so - you just used the fragile alias on the site, and - -- whoa! -- it worked. The fragile - alias disables those actions that are most likely to break a site. It is also - useful for testing purposes, to see whether Privoxy - is causing the problem or not. Later you find other regular sites - that misbehave, and add those to your personalized list of troublemakers: - - - - -{ fragile } - .forbes.com - webmail.example.com - .mybank.com - - - - You like the fun text replacements in default.filter, - but the fun filter is disabled in the distributed actions file. - So you'd like to turn it on in your private, - update-safe config, once and for all: - - - - -{ +filter{fun} } - / # For ALL sites! - - - - Note that the above is not really a good idea: There are exceptions - to the filters in default.action for things that - really shouldn't be filtered, like code on CVS->Web interfaces. Since - user.action has the last word, these exceptions - won't be valid for the fun filtering specified here. - - - - You might also worry about how your favourite free websites are - funded, and find that they rely on displaying banner advertisements - to survive. So you might want to specifically allow banners for those - sites that you feel provide value to you: - - - - -{ allow-ads } - .sourceforge.net - .slashdot.org - .osdn.net - - - - Note that allow-ads has been aliased to - -block, - -filter{banners-by-size}, and - -filter{banners-by-link} above.
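Since aliases are merely textually expanded and their names are not case sensitive, the effect of an alias such as allow-ads can be pictured with a short Python sketch. The helper expand_aliases and its tokenizing rule are hypothetical illustrations, not the way Privoxy itself parses actions files. The sketch assumes whitespace-separated actions, treats a {parameter} as part of the action in front of it, and does not guard against aliases that refer to themselves. The alias values are taken from the user.action alias section of this chapter.

```python
import re
from typing import Dict

# An action token is either a word followed by a {parameter} (which may
# contain spaces) or a plain run of non-whitespace characters.
TOKEN = re.compile(r"\S+\{[^}]*\}|\S+")


def expand_aliases(actions: str, aliases: Dict[str, str]) -> str:
    """Textually expand alias names (hypothetical helper, not Privoxy code).

    Assumes that no alias refers to itself, directly or indirectly.
    """
    # Alias names are not case sensitive, so look them up in lower case.
    table = {name.lower(): value for name, value in aliases.items()}
    expanded = []
    for token in TOKEN.findall(actions):
        value = table.get(token.lower())
        if value is None:
            expanded.append(token)  # not an alias: keep the action as-is
        else:
            # Aliases may use other aliases, so expand the value as well.
            expanded.append(expand_aliases(value, aliases))
    return " ".join(expanded)


aliases = {
    "-crunch-all-cookies": "-crunch-incoming-cookies -crunch-outgoing-cookies",
    "allow-popups": "-filter{all-popups}",
    "shop": "-crunch-all-cookies allow-popups",
    "allow-ads": "-block -filter{banners-by-size} -filter{banners-by-link}",
}
# prints: -block -filter{banners-by-size} -filter{banners-by-link}
print(expand_aliases("allow-ads", aliases))
# prints: -crunch-incoming-cookies -crunch-outgoing-cookies -filter{all-popups}
print(expand_aliases("SHOP", aliases))
```

The second call shows both properties at once: the upper-case SHOP still matches the shop alias, and the nested -crunch-all-cookies and allow-popups aliases are expanded in turn.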
- - - - Invoke another alias here to force an over-ride of the MIME type - application/x-sh which typically would open a download type - dialog. In my case, I want to look at the shell script, and then I can save - it should I choose to. - - - - -{ handle-as-text } - /.*\.sh$ - - - - user.action is generally the best place to define - exceptions and additions to the default policies of - default.action. Some actions are safe to have their - default policies set here, though. So let's set a default policy to have a - blank image as opposed to the checkerboard pattern for - ALL sites. / of course matches all URL - paths and patterns: - - - - -{ +set-image-blocker{blank} } -/ # ALL sites - - - - - - - - - - - - - - -Filter Files - - - On-the-fly text substitutions need - to be defined in a filter file. Once defined, they - can then be invoked as an action. - - - - &my-app; supports three different filter actions: - filter to - rewrite the content that is sent to the client, - client-header-filter - to rewrite headers that are sent by the client, and - server-header-filter - to rewrite headers that are sent by the server. - - - - &my-app; also supports two tagger actions: - client-header-tagger - and - server-header-tagger. - Taggers and filters use the same syntax in the filter files; the difference - is that taggers don't modify the text they are filtering, but use a rewritten - version of the filtered text as a tag. The tags can then be used to change the - applying actions through sections with tag-patterns. - - - - - Multiple filter files can be defined through the filterfile config directive. The filters - as supplied by the developers are located in - default.filter. It is recommended that any locally - defined or modified filters go in a separately defined file such as - user.filter.
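Each filter in such a file starts with a header line giving the filter type, the filter name and a one-line description, followed by its jobs. To make that layout concrete, here is a Python sketch that groups job lines under the header that precedes them. It is a hypothetical helper, not Privoxy's own parser. It assumes one job per line, skips blank lines and lines starting with #, assumes the tagger headers are spelled CLIENT-HEADER-TAGGER: and SERVER-HEADER-TAGGER:, and does not handle the backslash line continuations that the x option allows.

```python
import re
from typing import Dict, List, NamedTuple

# Header keywords for the three filter types and the two tagger types.
HEADER = re.compile(
    r"^(FILTER|CLIENT-HEADER-FILTER|SERVER-HEADER-FILTER"
    r"|CLIENT-HEADER-TAGGER|SERVER-HEADER-TAGGER):\s+(\S+)\s*(.*)$"
)


class Filter(NamedTuple):
    kind: str          # e.g. "FILTER" or "CLIENT-HEADER-TAGGER"
    name: str          # used in actions such as +filter{name}
    description: str   # one-line description from the header
    jobs: List[str]    # pcrs jobs such as s/foo/bar/g


def parse_filter_file(text: str) -> Dict[str, Filter]:
    """Group job lines under their header line (hypothetical helper)."""
    filters: Dict[str, Filter] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        header = HEADER.match(line)
        if header:
            kind, name, description = header.groups()
            current = Filter(kind, name, description, [])
            filters[name] = current
        elif current is not None:
            current.jobs.append(line)  # a job belongs to the last header
    return filters


sample = """
FILTER: foo Replace all "foo" with "bar"
# A comment line
s/foo/bar/g
"""
for f in parse_filter_file(sample).values():
    print(f.kind, f.name, f.jobs)  # FILTER foo ['s/foo/bar/g']
```

Keying the result by filter name mirrors how a filter is later referenced from an actions file, for example as +filter{foo}.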
- - - - Common tasks for content filters are to eliminate common annoyances in - HTML and JavaScript, such as pop-up windows, - exit consoles, crippled windows without navigation tools, the - infamous <BLINK> tag etc, to suppress images with certain - width and height attributes (standard banner sizes or web-bugs), - or just to have fun. - - - - Enabled content filters are applied to any content whose - Content Type header is recognised as a sign - of text-based content, with the exception of text/plain. - Use the force-text-mode action - to also filter other content. - - - - Substitutions are made at the source level, so if you want to roll - your own filters, you should first be familiar with HTML syntax, - and, of course, regular expressions. - - - - Just like the actions files, the - filter file is organized in sections, which are called filters - here. Each filter consists of a heading line, that starts with one of the - keywords FILTER:, - CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER: - followed by the filter's name, and a short (one line) - description of what it does. Below that line - come the jobs, i.e. lines that define the actual - text substitutions. By convention, the name of a filter - should describe what the filter eliminates. The - comment is used in the web-based - user interface. - - - - Once a filter called name has been defined - in the filter file, it can be invoked by using an action of the form - +filter{name} - in any actions file. - - - - Filter definitions start with a header line that contains the filter - type, the filter name and the filter description. - A content filter header line for a filter called foo could look - like this: - - - - FILTER: foo Replace all "foo" with "bar" - - - - Below that line, and up to the next header line, come the jobs that - define what text replacements the filter executes. They are specified - in a syntax that imitates Perl's - s/// operator. 
If you are familiar with Perl, you - will find this to be quite intuitive, and may want to look at the - PCRS documentation for the subtle differences to Perl behaviour. Most - notably, the non-standard option letter U is supported, - which turns the default to ungreedy matching. - - - - If you are new to - Regular - Expressions, you might want to take a look at - the Appendix on regular expressions, and - see the Perl - manual for - the - s/// operator's syntax and Perl-style regular - expressions in general. - The below examples might also help to get you started. - - - - - -Filter File Tutorial - - Now, let's complete our foo content filter. We have already defined - the heading, but the jobs are still missing. Since all it does is to replace - foo with bar, there is only one (trivial) job - needed: - - - - s/foo/bar/ - - - - But wait! Didn't the comment say that all occurrences - of foo should be replaced? Our current job will only take - care of the first foo on each page. For global substitution, - we'll need to add the g option: - - - - s/foo/bar/g - - - - Our complete filter now looks like this: - - - FILTER: foo Replace all "foo" with "bar" -s/foo/bar/g - - - - Let's look at some real filters for more interesting examples. Here you see - a filter that protects against some common annoyances that arise from JavaScript - abuse. Let's look at its jobs one after the other: - - - - - -FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse - -# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm -# -s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg - - - - Following the header line and a comment, you see the job. Note that it uses - | as the delimiter instead of /, because - the pattern contains a forward slash, which would otherwise have to be escaped - by a backslash (\). - - - - Now, let's examine the pattern: it starts with the text <script.* - enclosed in parentheses. 
Since the dot matches any character, and * - means: Match an arbitrary number of the element left of myself, this - matches <script, followed by any text, i.e. - it matches the whole page, from the start of the first <script> tag. - - - - That's more than we want, but the pattern continues: document\.referrer - matches only the exact string document.referrer. The dot needed to - be escaped, i.e. preceded by a backslash, to take away its - special meaning as a wildcard, and make it just a regular dot. So far, the meaning is: - Match from the start of the first <script> tag in the page, up to, and including, - the text document.referrer, if both are present - in the page (and appear in that order). - - - - But there's still more pattern to go. The next element, again enclosed in parentheses, - is .*</script>. You already know what .* - means, so the whole pattern translates to: Match from the start of the first <script> - tag in a page to the end of the last </script> tag, provided that the text - document.referrer appears somewhere in between. - - - - This is still not the whole story, since we have ignored the options and the parentheses: - The portions of the page matched by sub-patterns that are enclosed in parentheses will be - remembered and be available through the variables $1, $2, ... in - the substitute. The U option switches to ungreedy matching, which means - that the first .* in the pattern will only eat up all - text in between <script and the first occurrence - of document.referrer, and that the second .* will - only span the text up to the first </script> - tag. Furthermore, the s option says that the match may span - multiple lines in the page, and the g option again means that the - substitution is global. - - - - So, to summarize, the pattern means: Match all scripts that contain the text - document.referrer.
Remember the parts of the script from - (and including) the start tag up to (and excluding) the string - document.referrer as $1, and the part following - that string, up to and including the closing tag, as $2. - - - - Now the pattern is deciphered, but wasn't this about substituting things? So - let's look at the substitute: $1"Not Your Business!"$2 is - easy to read: The text remembered as $1, followed by - "Not Your Business!" (including - the quotation marks!), followed by the text remembered as $2. - This produces an exact copy of the original string, with the middle part - (the document.referrer) replaced by "Not Your - Business!". - - - - The whole job now reads: Replace document.referrer by - "Not Your Business!" wherever it appears inside a - <script> tag. Note that this job won't break JavaScript syntax, - since both the original and the replacement are syntactically valid - string objects. The script just won't have access to the referrer - information anymore. - - - - We'll show you two other jobs from the JavaScript taming department, but - this time only point out the constructs of special interest: - - - - -# The status bar is for displaying link targets, not pointless blahblah -# -s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig - - - - \s stands for whitespace characters (space, tab, newline, - carriage return, form feed), so that \s* means: zero - or more whitespace. The ? in .*? - makes this matching of arbitrary text ungreedy. (Note that the U - option is not set.) The ['"] construct means: a single - or a double quote. Finally, \1 is - a back-reference to the first parenthesis just like $1 above, - with the difference that in the pattern, a backslash indicates - a back-reference, whereas in the substitute, it's the dollar. - - - - So what does this job do?
It replaces assignments of single- or double-quoted - strings to the window.status object with a dummy assignment - (using a variable name that is hopefully odd enough not to conflict with - real variables in scripts). Thus, it catches many cases where e.g. pointless - descriptions are displayed in the status bar instead of the link target when - you move your mouse over links. - - - - -# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html -# -s/(<body [^>]*)onunload(.*>)/$1never$2/iU - - - - Including the - OnUnload - event binding in the HTML DOM was a CRIME. - When I close a browser window, I want it to close and die. Basta. - This job replaces the onunload attribute in - <body> tags with the dummy word never. - Note that the i option makes the pattern matching - case-insensitive. Also note that ungreedy matching alone doesn't always guarantee - a minimal match: In the first parenthesis, we had to use [^>]* - instead of .* to prevent the match from exceeding the - <body> tag if it doesn't contain OnUnload, but the page's - content does. - - - - The last example is from the fun department: - - - - -FILTER: fun Fun text replacements - -# Spice the daily news: -# -s/microsoft(?!\.com)/MicroSuck/ig - - - - Note the (?!\.com) part (a so-called negative lookahead) - in the job's pattern, which means: Don't match, if the string - .com appears directly following microsoft - in the page. This prevents links to microsoft.com from being trashed, while - still replacing the word everywhere else. - - - - -# Buzzword Bingo (example for extended regex syntax) -# -s* industry[ -]leading \ -| cutting[ -]edge \ -| customer[ -]focused \ -| market[ -]driven \ -| award[ -]winning # Comments are OK, too! \ -| high[ -]performance \ -| solutions[ -]based \ -| unmatched \ -| unparalleled \ -| unrivalled \ -*<font color="red"><b>BINGO!</b></font> \ -*igx - - - - The x option in this job turns on extended syntax, and allows for - e.g. 
the liberal use of (non-interpreted!) whitespace for nicer formatting. - - - - You get the idea? - - - - - -The Pre-defined Filters - - - - -The distribution default.filter file contains a selection of -pre-defined filters for your convenience: - - - - - js-annoyances - - - The purpose of this filter is to get rid of particularly annoying JavaScript abuse. - To that end, it - - - - replaces JavaScript references to the browser's referrer information - with the string "Not Your Business!". This complements the hide-referrer action on the content level. - - - - - removes the bindings to the DOM's - unload - event which we feel has no right to exist and is responsible for most exit consoles, i.e. - nasty windows that pop up when you close another one. - - - - - removes code that causes new windows to be opened with undesired properties, such as being - full-screen, non-resizable, without location, status or menu bar etc. - - - - - - Use with caution. This is an aggressive filter, and can break sites that - rely heavily on JavaScript. - - - - - - js-events - - - This is a very radical measure. It removes virtually all JavaScript event bindings, which - means that scripts cannot react to user actions such as mouse movements or clicks, window - resizing, etc. anymore. Use with caution! - - - We strongly discourage using this filter as a default since it breaks - many legitimate scripts. It is meant for use only on extra-nasty sites (should you really - need to go there). - - - - - - html-annoyances - - - This filter will undo many common instances of HTML-based abuse. - - - The BLINK and MARQUEE tags - are neutralized (yeah baby!), and browser windows will be created as - resizable (as of course they should be!), and will have location, - scroll and menu bars -- even if specified otherwise. - - - - - - content-cookies - - - Most cookies are set in the HTTP dialog, where they can be intercepted - by the - crunch-incoming-cookies - and crunch-outgoing-cookies - actions.
But web sites increasingly make use of HTML meta tags and JavaScript - to sneak cookies to the browser on the content level. - - - This filter disables most HTML and JavaScript code that reads or sets - cookies. It cannot detect all clever uses of these types of code, so it - should not be relied on as an absolute fix. Use it wherever you would also - use the cookie crunch actions. - - - - - - refresh tags - - - Disable any refresh tags if the interval is greater than nine seconds (so - that redirections done via refresh tags are not destroyed). This is useful - for dial-on-demand setups, or for those who find this HTML feature - annoying. - - - - - - unsolicited-popups - - - This filter attempts to prevent only unsolicited pop-up - windows from opening, yet still allow pop-up windows that the user - has explicitly chosen to open. It was added in version 3.0.1, - as an improvement over earlier such filters. - - - Technical note: The filter works by redefining the window.open JavaScript - function to a dummy function, PrivoxyWindowOpen(), - during the loading and rendering phase of each HTML page access, and - restoring the function afterward. - - - This is recommended only for browsers that cannot perform this function - reliably themselves. And be aware that some sites require such windows - in order to function normally. Use with caution. - - - - - - all-popups - - - Attempt to prevent all pop-up windows from opening. - Note this should be used with even more discretion than the above, since - it is more likely to break some sites that require pop-ups for normal - usage. Use with caution. - - - - - - img-reorder - - - This is a helper filter that has no value if used alone. It makes the - banners-by-size and banners-by-link - (see below) filters more effective and should be enabled together with them. - - - - - - banners-by-size - - - This filter removes image tags purely based on what size they are. 
Fortunately - for us, many ads and banner images tend to conform to certain standardized - sizes, which makes this filter quite effective for ad stripping purposes. - - - Occasionally this filter will cause false positives on images that are not ads, - but just happen to be of one of the standard banner sizes. - - - Recommended only for those who require extreme ad blocking. The default - block rules should catch 95+% of all ads without this filter enabled. - - - - - - banners-by-link - - - This is an experimental filter that attempts to kill any banners if - their URLs seem to point to known or suspected click trackers. It is currently - not of much value and is not recommended for use by default. - - - - - - webbugs - - - Webbugs are small, invisible images (technically 1X1 GIF images), that - are used to track users across websites, and collect information on them. - As an HTML page is loaded by the browser, an embedded image tag causes the - browser to contact a third-party site, disclosing the tracking information - through the requested URL and/or cookies for that third-party domain, without - the user ever becoming aware of the interaction with the third-party site. - HTML-ized spam also uses a similar technique to verify email addresses. - - - This filter removes the HTML code that loads such webbugs. - - - - - - tiny-textforms - - - A rather special-purpose filter that can be used to enlarge textareas (those - multi-line text boxes in web forms) and turn off hard word wrap in them. - It was written for the sourceforge.net tracker system where such boxes are - a nuisance, but it can be handy on other sites, too. - - - It is not recommended to use this filter as a default. - - - - - - jumping-windows - - - Many consider windows that move, or resize themselves to be abusive. This filter - neutralizes the related JavaScript code. Note that some sites might not display - or behave as intended when using this filter. Use with caution. 
- - - - - - frameset-borders - - - Some web designers seem to assume that everyone in the world will view their - web sites using the same browser brand and version, screen resolution etc, - because only that assumption could explain why they'd use static frame sizes, - yet prevent their frames from being resized by the user, should they be too - small to show their whole content. - - - This filter removes the related HTML code. It should only be applied to sites - which need it. - - - - - - demoronizer - - - Many Microsoft products that generate HTML use non-standard extensions (read: - violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those - HTML documents to display with errors on standard-compliant platforms. - - - This filter translates the MS-only characters into Latin-1 equivalents. - It is not necessary when using MS products, and will cause corruption of - all documents that use 8-bit character sets other than Latin-1. It's mostly - worthwhile for Europeans on non-MS platforms, if weird garbage characters - sometimes appear on some pages, or user agents that don't correct for this on - the fly. - - - - - - - shockwave-flash - - - A filter for shockwave haters. As the name suggests, this filter strips code - out of web pages that is used to embed shockwave flash objects. - - - - - - - - quicktime-kioskmode - - - Change HTML code that embeds Quicktime objects so that kioskmode, which - prevents saving, is disabled. - - - - - - fun - - - Text replacements for subversive browsing fun. Make fun of your favorite - Monopolist or play buzzword bingo. - - - - - - crude-parental - - - A demonstration-only filter that shows how Privoxy - can be used to delete web content on a keyword basis. - - - - - - ie-exploits - - - An experimental collection of text replacements to disable malicious HTML and JavaScript - code that exploits known security holes in Internet Explorer. 
- - - Presently, it only protects against Nimda and a cross-site scripting bug, and - would need active maintenance to provide more substantial protection. - - - - - - site-specifics - - - Some web sites have very specific problems, the cure for which doesn't apply - anywhere else, or could even cause damage on other sites. - - - This is a collection of such site-specific cures which should only be applied - to the sites they were intended for, which is what the supplied - default.action file does. Users shouldn't need to change - anything regarding this filter. - - - - - - google - - - A CSS-based block for Google text ads. Also removes a width limitation - and the toolbar advertisement. - - - - - - yahoo - - - Another CSS-based block, this time for Yahoo text ads. It also removes - a width limitation. - - - - - - msn - - - Another CSS-based block, this time for MSN text ads. It also removes - tracking URLs, as well as a width limitation. - - - - - - blogspot - - - Cleans up some Blogspot blogs. Read the fine print before using this one! - - - This filter also intentionally removes some navigation stuff and sets the - page width to 100%. As a result, some rounded corners would - appear too early or not at all, and as fixing this would require a browser - that understands background-size (CSS3), they are removed instead. - - - - - - xml-to-html - - - Server-header filter to change the Content-Type from xml to html. - - - - - - html-to-xml - - - Server-header filter to change the Content-Type from html to xml. - - - - - - no-ping - - - Removes the non-standard ping attribute from - anchor and area HTML tags. - - - - - - hide-tor-exit-notation - - - Client-header filter to remove the Tor exit node notation - found in Host and Referer headers. - - - If &my-app; and Tor are chained and &my-app; - is configured to use socks4a, one can use http://www.example.org.foobar.exit/ - to access the host www.example.org through the - Tor exit node foobar.
- - - As the HTTP client isn't aware of this notation, it treats the - whole string www.example.org.foobar.exit as host and uses it - for the Host and Referer headers. From the - server's point of view the resulting headers are invalid and can cause problems. - - - An invalid Referer header can trigger hot-linking - protections, an invalid Host header will make it impossible for - the server to find the right vhost (several domains hosted on the same IP address). - - - This client-header filter removes the foo.exit part in those headers - to prevent the mentioned problems. Note that it only modifies - the HTTP headers, it doesn't make it impossible for the server - to detect your Tor exit node based on the IP address - the request is coming from. - - - - - - - - - - - - - - - - - -Privoxy's Template Files - - All Privoxy built-in pages, i.e. error pages such as the - 404 - No Such Domain - error page, the BLOCKED - page - and all pages of its web-based - user interface, are generated from templates. - (Privoxy must be running for the above links to work as - intended.) - - - - These templates are stored in a subdirectory of the configuration - directory called templates. On Unixish platforms, - this is typically - /etc/privoxy/templates/. - - - - The templates are basically normal HTML files, but with place-holders (called symbols - or exports), which Privoxy fills at run time. It - is possible to edit the templates with a normal text editor, should you want - to customize them. (Not recommended for the casual - user). Should you create your own custom templates, you should use - the config setting templdir - to specify an alternate location, so your templates do not get overwritten - during upgrades. - - - Note that just like in configuration files, lines starting - with # are ignored when the templates are filled in. 
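To make the comment rule concrete, here is a small Python sketch. It skips comment lines and also fills in the @name@ place-holders described in the next paragraph. This is only an illustration under stated assumptions, not Privoxy's own (C) template code. The function fill_template() and the symbol name my-app-name are made up for this example. Only lines whose very first character is # are treated as comments, and unknown symbols are left in place.

```python
import re

# Hypothetical sketch, NOT Privoxy's real template engine: it only mimics
# two rules from this manual (comment lines and @name@ place-holders).
PLACEHOLDER = re.compile(r"@([A-Za-z0-9_-]+)@")


def fill_template(template: str, symbols: dict[str, str]) -> str:
    """Drop lines starting with '#', then replace known @name@ symbols."""
    # Assumption: only lines whose first character is '#' count as comments.
    kept = [line for line in template.splitlines() if not line.startswith("#")]
    text = "\n".join(kept)
    # Assumption: symbols missing from the table are left unchanged.
    return PLACEHOLDER.sub(lambda m: symbols.get(m.group(1), m.group(0)), text)


page = "# This comment never reaches the browser\n<title>@my-app-name@</title>"
print(fill_template(page, {"my-app-name": "Privoxy"}))  # <title>Privoxy</title>
```

The real templates also support conditional blocks, if-then-else and #include, which this sketch does not attempt to model.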
- - - - The place-holders are of the form @name@, and you will - find a list of available symbols, which vary from template to template, - in the comments at the start of each file. Note that these comments are not - always accurate, and that it's probably best to look at the existing HTML - code to find out which symbols are supported and what they are filled in with. - - - - A special application of this substitution mechanism is to make whole - blocks of HTML code disappear when a specific symbol is set. We use this - for many purposes, one of them being to include the beta warning in all - our user interface (CGI) pages when Privoxy - is in an alpha or beta development stage: - - - - -<!-- @if-unstable-start --> - - ... beta warning HTML code goes here ... - -<!-- if-unstable-end@ --> - - - - If the "unstable" symbol is set, everything in between and including - @if-unstable-start and if-unstable-end@ - will disappear, leaving nothing but an empty comment: - - - - <!-- --> - - - - There's also an if-then-else construct and an #include - mechanism, but you'll surely find out about those if you are inclined to edit the - templates ;-) - - - - All templates refer to a stylesheet located at - http://config.privoxy.org/send-stylesheet. - This is, of course, locally served by Privoxy - and the source for it can be found and edited in the - cgi-style.css template. - - - - - - - - - - -Contacting the Developers, Bug Reporting and Feature -Requests - - - &contacting; - - - - - - - - -Privoxy Copyright, License and History - - - &copyright; - - - -License - - &license; - - - - - - - -History - - &history; - - - -Authors - - &p-authors; - - - - - - - - - -See Also - - &seealso; - - - - - - -Appendix - - - - -Regular Expressions - - Privoxy uses Perl-style regular - expressions in its actions - files and filter file, - through the PCRE and - - PCRS libraries. - - - - If you are reading this, you probably don't understand what regular - expressions are, or what they can do.
So this will be a very brief - introduction only. A full explanation would require a book ;-) - - - - Regular expressions provide a language to describe patterns that can be - run against strings of characters (letters, numbers, etc.), to see if they - match the string or not. The patterns are themselves (sometimes complex) - strings of literal characters, combined with wild-cards, and other special - characters, called meta-characters. The meta-characters have - special meanings and are used to build complex patterns to be matched against. - Perl Compatible Regular Expressions are an especially convenient - dialect of the regular expression language. - - - - To make a simple analogy, we do something similar when we use wild-card - characters when listing files with the dir command in DOS. - *.* matches all filenames. The special - character here is the asterisk which matches any and all characters. We can be - more specific and use ? to match just individual - characters. So dir file?.txt would match - file1.txt, file2.txt, etc. We are pattern - matching, using a similar technique to regular expressions! - - - - Regular expressions do essentially the same thing, but are much, much more - powerful. They offer many more special characters and ways of - building complex patterns. Let's look at a few of the common ones, - and then some examples: - - - - - . - Matches any single character, e.g. a, - A, 4, :, or @. - - - - - - ? - The preceding character or expression is matched ZERO or ONE - times, i.e. it is optional. - - - - - - + - The preceding character or expression is matched ONE or MORE - times. - - - - - - * - The preceding character or expression is matched ZERO or MORE - times. - - - - - - \ - The escape character denotes that - the following character should be taken literally. This is used where one of the - special characters (e.g. .) needs to be taken literally and - not as a special meta-character.
Example: example\.com, makes - sure the period is recognized only as a period (and not expanded to its - meta-character meaning of any single character). - - - - - - [ ] - Characters enclosed in brackets will be matched if - any of the enclosed characters are encountered. For instance, [0-9] - matches any numeric digit (zero through nine). As an example, we can combine - this with + to match any digit one or more times: [0-9]+. - - - - - - ( ) - Parentheses are used to group a sub-expression, - or multiple sub-expressions. - - - - - - | - The bar character works like an - or conditional statement. A match is successful if the - sub-expression on either side of | matches. As an example: - /(this|that) example/ uses grouping and the bar character - and would match either this example or that - example, and nothing else. - - - - - These are just some of the ones you are likely to use when matching URLs with - Privoxy, and this is a long way from a definitive - list. This is enough to get us started with a few simple examples which may - be more illuminating: - - - - /.*/banners/.* - A simple example - that uses the common combination of . and * to - denote any character, zero or more times. In other words, any string at all. - So we start with a literal forward slash, then our regular expression pattern - (.*), another literal forward slash, the string - banners, another forward slash, and lastly another - .*. We are building - a directory path here. This will match any file with the path that has a - directory named banners in it (though not as the top-level directory, because - the pattern requires a slash both before and after the first .*). The .* matches - any characters, and this could conceivably be more forward slashes, so it - might expand into a much longer-looking path. For example, this could match: - /eye/hate/spammers/banners/annoy_me_please.gif, or just - /ads/banners/annoying.html, or almost an infinite number of other - possible combinations, as long as it has banners in the path - somewhere.
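If you want to experiment with the patterns above, Python's re module accepts the same Perl-style syntax for these simple cases, which makes it a handy test bench. The sketch below is a stand-in under a few assumptions, not a description of how Privoxy itself matches URLs (Privoxy uses PCRE from C). The helper matches() and the sample strings are illustrative. The slashes around /(this|that) example/ are treated as Perl-style delimiters and left out. The slashes in /.*/banners/.* are literal path separators and stay in.

```python
import re

# Python's "re" module stands in for PCRE here; for these simple patterns
# the syntax is the same.
BANNERS = re.compile(r"/.*/banners/.*")
DIGITS = re.compile(r"[0-9]+")
THIS_OR_THAT = re.compile(r"(this|that) example")  # Perl delimiters dropped


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


print(matches(BANNERS, "/eye/hate/spammers/banners/annoy_me_please.gif"))  # True
print(matches(BANNERS, "/images/logo.gif"))                                # False
print(DIGITS.search("advert1234.gif").group())                             # 1234
print(matches(THIS_OR_THAT, "that example"))                               # True
print(matches(THIS_OR_THAT, "those example"))                              # False
```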
- - - - And now something a little more complex: - - - - /.*/adv((er)?ts?|ertis(ing|ements?))?/ - - We have several literal forward slashes again (/), so we are - building another expression that is a file path statement. We have another - .*, so we are matching against any conceivable sub-path, just so - it matches our expression. The only true literal that must - match our pattern is adv, together with - the forward slashes. What comes after the adv string is the - interesting part. - - - - Remember the ? means the preceding expression (either a - literal character or anything grouped with (...) in this case) - can exist or not, since this means either zero or one match. So - ((er)?ts?|ertis(ing|ements?)) is optional, as are the - individual sub-expression (er) and the s in both - ts? and ements?. The | - means or. We have two of those. For instance, - (ing|ements?) can expand to match either ing - OR ements?. What is being done here is an - attempt at matching as many variations of advertisement, and - similar, as possible. So this would expand to match just adv, - or advert, or adverts, or - advertising, or advertisement, or - advertisements. You get the idea. But it would not match - advertizements (with a z). We could fix that by - changing our regular expression to: - /.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/, which would then match - either spelling. - - + + + +Summary - /.*/advert[0-9]+\.(gif|jpe?g) - Again - another path statement with forward slashes. Anything in the square brackets - [ ] can be matched. This is using 0-9 as a - shorthand expression to mean any digit zero through nine. It is the same as - saying 0123456789. So any digit matches. The + - means one or more of the preceding expression must be included. The preceding - expression here is what is in the square brackets -- in this case, any digit - zero through nine. Then, at the end, we have a grouping: (gif|jpe?g).
- This includes a |, so this needs to match the expression on - either side of that bar character also. A simple gif on one side, and the other - side will in turn match either jpeg or jpg, - since the ? means the letter e is optional and - can be matched once or not at all. So we are building an expression here to - match image GIF or JPEG type image file. It must include the literal - string advert, then one or more digits, and a . - (which is now a literal, and not a special character, since it is escaped - with \), and lastly either gif, or - jpeg, or jpg. Some possible matches would - include: //advert1.jpg, - /nasty/ads/advert1234.gif, - /banners/from/hell/advert99.jpg. It would not match - advert1.gif (no leading slash), or - /adverts232.jpg (the expression does not include an - s), or /advert1.jsp (jsp is not - in the expression anywhere). + Note that many of these actions have the potential to cause a page to + misbehave, possibly even not to display at all. There are many ways + a site designer may choose to design his site, and what HTTP header + content, and other criteria, he may depend on. There is no way to have hard + and fast rules for all sites. See the Appendix for a brief example on troubleshooting + actions. + + + + +Aliases - We are barely scratching the surface of regular expressions here so that you - can understand the default Privoxy - configuration files, and maybe use this knowledge to customize your own - installation. There is much, much more that can be done with regular - expressions. Now that you know enough to get started, you can learn more on - your own :/ + Custom actions, known to Privoxy + as aliases, can be defined by combining other actions. + These can in turn be invoked just like the built-in actions. + Currently, an alias name can contain any character except space, tab, + =, + { and }, but we strongly + recommend that you only use a to z, + 0 to 9, +, and -. 
+ Alias names are not case sensitive, and are not required to start with a + + or - sign, since they are merely textually + expanded. - - More reading on Perl Compatible Regular expressions: - http://perldoc.perl.org/perlre.html + Aliases can be used throughout the actions file, but they must be + defined in a special section at the top of the file! + And there can only be one such section per actions file. Each actions file may + have its own alias section, and the aliases defined in it are only visible + within that file. + + + There are two main reasons to use aliases: One is to save typing for frequently + used combinations of actions, the other one is a gain in flexibility: If you + decide once how you want to handle shops by defining an alias called + shop, you can later change your policy on shops in + one place, and your changes will take effect everywhere + in the actions file where the shop alias is used. Calling aliases + by their purpose also makes your actions files more readable. + + + Currently, there is one big drawback to using aliases, though: + Privoxy's built-in web-based action file + editor honors aliases when reading the actions files, but it expands + them before writing. So the effects of your aliases are of course preserved, + but the aliases themselves are lost when you edit sections that use aliases + with it. - For information on regular expression based substitutions and their applications - in filters, please see the filter file tutorial - in this manual. + Now let's define some aliases... - - + + + # Useful custom aliases we can use later. + # + # Note the (required!) section header line and that this section + # must be at the top of the actions file! + # + {{alias}} + # These aliases just save typing later: + # (Note that some already use other aliases!) 
+ # + +crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies + -crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies + +block-as-image = +block{Blocked image.} +handle-as-image + allow-all-cookies = -crunch-all-cookies -session-cookies-only -filter{content-cookies} - - -Privoxy's Internal Pages + # These aliases define combinations of actions + # that are useful for certain types of sites: + # + fragile = -block -filter -crunch-all-cookies -fast-redirects -hide-referrer -prevent-compression - - Since Privoxy proxies each requested - web page, it is easy for Privoxy to - trap certain special URLs. In this way, we can talk directly to - Privoxy, and see how it is - configured, see how our rules are being applied, change these - rules and other configuration options, and even turn - Privoxy's filtering off, all with - a web browser. + shop = -crunch-all-cookies -filter{all-popups} + # Short names for other aliases, for really lazy people ;-) + # + c0 = +crunch-all-cookies + c1 = -crunch-all-cookies - The URLs listed below are the special ones that allow direct access - to Privoxy. Of course, - Privoxy must be running to access these. If - not, you will get a friendly error message. Internet access is not - necessary either. + ...and put them to use. These sections would appear in the lower part of an + actions file and define exceptions to the default actions (as specified further + up for the / pattern): - - - - - Privoxy main page: - -
- - http://config.privoxy.org/ - -
- - There is a shortcut: http://p.p/ (But it - doesn't provide a fall-back to a real page, in case the request is not - sent through Privoxy) - -
- - - - Show information about the current configuration, including viewing and - editing of actions files: - -
- - http://config.privoxy.org/show-status - -
-
- - - - Show the source code version numbers: - -
- - http://config.privoxy.org/show-version - -
-
- - - - Show the browser's request headers: - -
- - http://config.privoxy.org/show-request - -
-
+ + # These sites are either very complex or very keen on + # user data and require minimal interference to work: + # + {fragile} + .office.microsoft.com + .windowsupdate.microsoft.com + # Gmail is really mail.google.com, not gmail.com + mail.google.com - - - Show which actions apply to a URL and why: - -
- - http://config.privoxy.org/show-url-info - -
-
+ # Shopping sites: + # Allow cookies (for setting and retrieving your customer data) + # + {shop} + .quietpc.com + .worldpay.com # for quietpc.com + mybank.example.com - - - Toggle Privoxy on or off. This feature can be turned off/on in the main - config file. When toggled off, Privoxy - continues to run, but only as a pass-through proxy, with no actions taking - place: - -
- - http://config.privoxy.org/toggle - -
- - Shortcuts to turn it off, then back on: -
- - http://config.privoxy.org/toggle?set=disable - -
-
- - http://config.privoxy.org/toggle?set=enable - -
-
+ # These shops require pop-ups: + # + {-filter{all-popups} -filter{unsolicited-popups}} + .dabs.com + .overclockers.co.uk
+
-
+ + Aliases like shop and fragile are typically used for + problem sites that require more than one action to be disabled + in order to function properly. + +
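Because aliases are merely textually expanded, their effect can be modelled in a few lines. The following Python sketch is hypothetical and is not Privoxy's actions file parser. The names expand() and ALIASES are illustrative, and the alias bodies are copied from the {{alias}} example section above. The sketch assumes that alias names are compared case-insensitively and that an alias body may refer to other aliases. It deliberately ignores any ordering rules the real parser may impose, and it refuses alias loops.

```python
import re

# Hypothetical model of alias expansion, NOT Privoxy's actual parser.
# Tokens are either "name{parameter with spaces}" or plain words.
TOKEN = re.compile(r"[^\s{]*\{[^}]*\}|\S+")


def expand(actions: str, aliases: dict[str, str], _seen: tuple[str, ...] = ()) -> list[str]:
    """Expand alias names (case-insensitively) into built-in actions."""
    table = {name.lower(): body for name, body in aliases.items()}
    result: list[str] = []
    for token in TOKEN.findall(actions):
        key = token.lower()
        if key not in table:
            result.append(token)  # a built-in action, keep it as written
        elif key in _seen:
            raise ValueError(f"alias loop through {token!r}")
        else:
            # Alias bodies may use other aliases, so expand recursively.
            result.extend(expand(table[key], aliases, _seen + (key,)))
    return result


# Taken from the {{alias}} example section above.
ALIASES = {
    "+crunch-all-cookies": "+crunch-incoming-cookies +crunch-outgoing-cookies",
    "-crunch-all-cookies": "-crunch-incoming-cookies -crunch-outgoing-cookies",
    "shop": "-crunch-all-cookies -filter{all-popups}",
    "c0": "+crunch-all-cookies",
}

print(expand("SHOP", ALIASES))
# ['-crunch-incoming-cookies', '-crunch-outgoing-cookies', '-filter{all-popups}']
```

Since the expansion is purely textual, changing the body of shop in one place changes every section that uses it. That is the flexibility argument for aliases made above.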
+ + + +Actions Files Tutorial + + The above chapters have shown which actions files + there are and how they are organized, how actions are specified and applied + to URLs, how patterns work, and how to + define and use aliases. Now, let's look at an + example match-all.action, default.action + and user.action file and see how all these pieces come together: + +match-all.action - These may be bookmarked for quick reference. See next. - + Remember all actions are disabled when matching starts, + so we have to explicitly enable the ones we want. - -Bookmarklets - Below are some bookmarklets to allow you to easily access a - mini version of some of Privoxy's - special pages. They are designed for MS Internet Explorer, but should work - equally well in Netscape, Mozilla, and other browsers which support - JavaScript. They are designed to run directly from your bookmarks - not by - clicking the links below (although that should work for testing). + While the match-all.action file only contains a + single section, it is probably the most important one. It has only one + pattern, /, but this pattern + matches all URLs. Therefore, the set of + actions used in this default section will + be applied to all requests as a start. It can be partly or + wholly overridden by other actions files like default.action + and user.action, but it will still be largely responsible + for your overall browsing experience. + - To save them, right-click the link and choose Add to Favorites - (IE) or Add Bookmark (Netscape). You will get a warning that - the bookmark may not be safe - just click OK. Then you can run the - Bookmarklet directly from your favorites/bookmarks. For even faster access, - you can put them on the Links bar (IE) or the Personal - Toolbar (Netscape), and run them with a single click. + Again, at the start of matching, all actions are disabled, so there is + no need to disable any actions here. 
(Remember: a + + preceding the action name enables the action, a - disables!). + Also note how this long line has been made more readable by splitting it into + multiple lines with line continuation. - - - - - Privoxy - Enable - - + +{ \ + +change-x-forwarded-for{block} \ + +hide-from-header{block} \ + +set-image-blocker{pattern} \ +} +/ # Match all URLs + + - - - Privoxy - Disable - - + + The default behavior is now set. + + - - - Privoxy - Toggle Privoxy (Toggles between enabled and disabled) - - + +default.action - - - Privoxy- View Status - - - - - - Privoxy - Why? - - - + + If you aren't a developer, there's no need for you to edit the + default.action file. It is maintained by + the &my-app; developers and if you disagree with some of the + sections, you should overrule them in your user.action. - Credit: The site which gave us the general idea for these bookmarklets is - www.bookmarklets.com. They - have more information about bookmarklets. + Understanding the default.action file can + help you with your user.action, though. + + The first section in this file is a special section for internal use + that prevents older &my-app; versions from reading the file: + - - - - + + +########################################################################## +# Settings -- Don't change! For internal Privoxy use ONLY. +########################################################################## +{{settings}} +for-privoxy-version=3.0.11 + - - -Chain of Events - Let's take a quick look at how some of Privoxy's - core features are triggered, and the ensuing sequence of events when a web - page is requested by your browser: + After that comes the (optional) alias section. We'll use the example + section from the above chapter on aliases, + that also explains why and how aliases are used: - - - - First, your web browser requests a web page. 
The browser knows to send - the request to Privoxy, which will in turn, - relay the request to the remote web server after passing the following - tests: - - - - - Privoxy traps any request for its own internal CGI - pages (e.g http://p.p/) and sends the CGI page back to the browser. - - - - - Next, Privoxy checks to see if the URL - matches any +block patterns. If - so, the URL is then blocked, and the remote web server will not be contacted. - +handle-as-image - and - +handle-as-empty-document - are then checked, and if there is no match, an - HTML BLOCKED page is sent back to the browser. Otherwise, if - it does match, an image is returned for the former, and an empty text - document for the latter. The type of image would depend on the setting of - +set-image-blocker - (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere). - - - - - Untrusted URLs are blocked. If URLs are being added to the - trust file, then that is done. - - - - - If the URL pattern matches the +fast-redirects action, - it is then processed. Unwanted parts of the requested URL are stripped. - - - - - Now the rest of the client browser's request headers are processed. If any - of these match any of the relevant actions (e.g. +hide-user-agent, - etc.), headers are suppressed or forged as determined by these actions and - their parameters. - - - - - Now the web server starts sending its response back (i.e. typically a web - page). - - - - - First, the server headers are read and processed to determine, among other - things, the MIME type (document type) and encoding. The headers are then - filtered as determined by the - +crunch-incoming-cookies, - +session-cookies-only, - and +downgrade-http-version - actions. - - - - - If any +filter action - or +deanimate-gifs - action applies (and the document type fits the action), the rest of the page is - read into memory (up to a configurable limit). 
Then the filter rules (from - default.filter and any other filter files) are - processed against the buffered content. Filters are applied in the order - they are specified in one of the filter files. Animated GIFs, if present, - are reduced to either the first or last frame, depending on the action - setting.The entire page, which is now filtered, is then sent by - Privoxy back to your browser. - - - If neither a +filter action - or +deanimate-gifs - matches, then Privoxy passes the raw data through - to the client browser as it becomes available. - - - - - As the browser receives the now (possibly filtered) page content, it - reads and then requests any URLs that may be embedded within the page - source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g. - frames), sounds, etc. For each of these objects, the browser issues a - separate request (this is easily viewable in Privoxy's - logs). And each such request is in turn processed just as above. Note that a - complex web page will have many, many such embedded URLs. If these - secondary requests are to a different server, then quite possibly a very - differing set of actions is triggered. - - + +########################################################################## +# Aliases +########################################################################## +{{alias}} - + # These aliases just save typing later: + # (Note that some already use other aliases!) 
+ # + +crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies + -crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies + +block-as-image = +block{Blocked image.} +handle-as-image + mercy-for-cookies = -crunch-all-cookies -session-cookies-only -filter{content-cookies} + + # These aliases define combinations of actions + # that are useful for certain types of sites: + # + fragile = -block -filter -crunch-all-cookies -fast-redirects -hide-referrer + shop = -crunch-all-cookies -filter{all-popups} + - NOTE: This is somewhat of a simplistic overview of what happens with each URL - request. For the sake of brevity and simplicity, we have focused on - Privoxy's core features only. + The first of our specialized sections is concerned with fragile + sites, i.e. sites that require minimum interference, because they are either + very complex or very keen on tracking you (and have mechanisms in place that + make them unusable for people who avoid being tracked). We will simply use + our pre-defined fragile alias instead of stating the list + of actions explicitly: - - - - - -Troubleshooting: Anatomy of an Action - - The way Privoxy applies - actions and filters - to any given URL can be complex, and not always so - easy to understand what is happening. And sometimes we need to be able to - see just what Privoxy is - doing. Especially, if something Privoxy is doing - is causing us a problem inadvertently. It can be a little daunting to look at - the actions and filters files themselves, since they tend to be filled with - regular expressions whose consequences are not - always so obvious. + +########################################################################## +# Exceptions for sites that'll break under the default action set: +########################################################################## + +# "Fragile" Use a minimum set of actions for these sites (see alias above): +# +{ fragile } +.office.microsoft.com # surprise, surprise! 
+.windowsupdate.microsoft.com +mail.google.com - One quick test to see if Privoxy is causing a problem - or not, is to disable it temporarily. This should be the first troubleshooting - step. See the Bookmarklets section on a quick - and easy way to do this (be sure to flush caches afterward!). Looking at the - logs is a good idea too. (Note that both the toggle feature and logging are - enabled via config file settings, and may need to be - turned on.) + Shopping sites are not as fragile, but they typically + require cookies to log in, and pop-up windows for shopping + carts or item details. Again, we'll use a pre-defined alias: + - Another easy troubleshooting step to try is if you have done any - customization of your installation, revert back to the installed - defaults and see if that helps. There are times the developers get complaints - about one thing or another, and the problem is more related to a customized - configuration issue. + +# Shopping sites: +# +{ shop } +.quietpc.com +.worldpay.com # for quietpc.com +.jungle.com +.scan.co.uk - Privoxy also provides the - http://config.privoxy.org/show-url-info - page that can show us very specifically how actions - are being applied to any given URL. This is a big help for troubleshooting. + The fast-redirects + action, which may have been enabled in match-all.action, + breaks some sites. So disable it for popular sites where we know it misbehaves: - First, enter one URL (or partial URL) at the prompt, and then - Privoxy will tell us - how the current configuration will handle it. This will not - help with filtering effects (i.e. the +filter action) from - one of the filter files since this is handled very - differently and not so easy to trap! It also will not tell you about any other - URLs that may be embedded within the URL you are testing. For instance, images - such as ads are expressed as URLs within the raw page source of HTML pages. 
So - you will only get info for the actual URL that is pasted into the prompt area - -- not any sub-URLs. If you want to know about embedded URLs like ads, you - will have to dig those out of the HTML source. Use your browser's View - Page Source option for this. Or right click on the ad, and grab the - URL. + +{ -fast-redirects } +login.yahoo.com +edit.*.yahoo.com +.google.com +.altavista.com/.*(like|url|link):http +.altavista.com/trans.*urltext=http +.nytimes.com - Let's try an example, google.com, - and look at it one section at a time in a sample configuration (your real - configuration may vary): + It is important that Privoxy knows which + URLs belong to images, so that if they are to + be blocked, a substitute image can be sent, rather than an HTML page. + Contacting the remote site to find out is not an option, since it + would destroy the loading time advantage of banner blocking, and it + would feed the advertisers information about you. We can mark any + URL as an image with the handle-as-image action, + and marking all URLs that end in a known image file extension is a + good start: - Matches for http://www.google.com: - - In file: default.action [ View ] [ Edit ] - - {+change-x-forwarded-for{block} - +deanimate-gifs {last} - +fast-redirects {check-decoded-url} - +filter {refresh-tags} - +filter {img-reorder} - +filter {banners-by-size} - +filter {webbugs} - +filter {jumping-windows} - +filter {ie-exploits} - +hide-from-header {block} - +hide-referrer {forge} - +session-cookies-only - +set-image-blocker {pattern} -/ +########################################################################## +# Images: +########################################################################## - { -session-cookies-only } - .google.com +# Define which file types will be treated as images, in case they get +# blocked further down this file: +# +{ +handle-as-image } +/.*\.(gif|jpe?g|png|bmp|ico)$ + - { -fast-redirects } - .google.com + + And then there are known banner 
sources. They often use scripts to + generate the banners, so it won't be visible from the URL that the + request is for an image. Hence we block them and + mark them as images in one go, with the help of our + +block-as-image alias defined above. (We could of + course just as well use +block + +handle-as-image here.) + Remember that the type of the replacement image is chosen by the + set-image-blocker + action. Since all URLs have matched the default section with its + +set-image-blocker{pattern} + action before, it still applies and needn't be repeated: + -In file: user.action [ View ] [ Edit ] -(no matches in this file) - + + +# Known ad generators: +# +{ +block-as-image } +ar.atwola.com +.ad.doubleclick.net +.ad.*.doubleclick.net +.a.yimg.com/(?:(?!/i/).)*$ +.a[0-9].yimg.com/(?:(?!/i/).)*$ +bs*.gsanet.com +.qkimg.net - This is telling us how we have defined our - actions, and - which ones match for our test case, google.com. - Displayed is all the actions that are available to us. Remember, - the + sign denotes on. - - denotes off. So some are on here, but many - are off. Each example we try may provide a slightly different - end result, depending on our configuration directives. + One of the most important jobs of Privoxy + is to block banners. Many of these can be blocked + by the filter{banners-by-size} + action, which we enabled above, and which deletes the references to banner + images from the pages while they are loaded, so the browser doesn't request + them anymore, and hence they don't need to be blocked here. But this naturally + doesn't catch all banners, and some people choose not to use filters, so we + need a comprehensive list of patterns for banner URLs here, and apply the + block action to them. - The first listing - is for our default.action file. The large, multi-line - listing, is how the actions are set to match for all URLs, i.e. our default - settings. 
If you look at your actions file, this would be the - section just below the aliases section near the top. This - will apply to all URLs as signified by the single forward slash at the end - of the listing -- / . + First come many generic patterns, which do most of the work, by + matching typical domain and path name components of banners. Then comes + a list of individual patterns for specific sites, which is omitted here + to keep the example short: - But we have defined additional actions that would be exceptions to these general - rules, and then we list specific URLs (or patterns) that these exceptions - would apply to. Last match wins. Just below this then are two explicit - matches for .google.com. The first is negating our previous - cookie setting, which was for +session-cookies-only - (i.e. not persistent). So we will allow persistent cookies for google, at - least that is how it is in this example. The second turns - off any +fast-redirects - action, allowing this to take place unmolested. Note that there is a leading - dot here -- .google.com. This will match any hosts and - sub-domains, in the google.com domain also, such as - www.google.com or mail.google.com. But it would not - match www.google.de! So, apparently, we have these two actions - defined as exceptions to the general rules at the top somewhere in the lower - part of our default.action file, and - google.com is referenced somewhere in these latter sections. + +########################################################################## +# Block these fine banners: +########################################################################## +{ +block{Banner ads.} } + +# Generic patterns: +# +ad*. +.*ads. +banner?. +count*. +/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?) +/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/ + +# Site-specific patterns (abbreviated): +# +.hitbox.com - Then, for our user.action file, we again have no hits.
- So there is nothing google-specific that we might have added to our own, local - configuration. If there was, those actions would over-rule any actions from - previously processed files, such as default.action. - user.action typically has the last word. This is the - best place to put hard and fast exceptions, + It's quite remarkable how many advertisers actually call their banner + servers ads.company.com, or call the directory + in which the banners are stored simply banners. So the above + generic patterns are surprisingly effective. - - And finally we pull it all together in the bottom section and summarize how - Privoxy is applying all its actions - to google.com: - + But being very generic, they necessarily also catch URLs that we don't want + to block. The pattern .*ads. e.g. catches + nasty-ads.nasty-corp.com as intended, + but also downloads.sourcefroge.net or + adsl.some-provider.net. So here come some + well-known exceptions to the +block + section above. + + + Note that these are exceptions to exceptions from the default! Consider the URL + downloads.sourcefroge.net: Initially, all actions are deactivated, + so it wouldn't get blocked. Then comes the defaults section, which matches the + URL, but just deactivates the block + action once again. Then it matches .*ads., an exception to the + general non-blocking policy, and suddenly + +block applies. And now, it'll match + .*loads., where -block + applies, so (unless it matches again further down) it ends up + with no block action applying. +########################################################################## +# Save some innocent victims of the above generic block patterns: +########################################################################## - Final results: +# By domain: +# +{ -block } +adv[io]*. # (for advogato.org and advice.*) +adsl. # (has nothing to do with ads) +adobe. # (has nothing to do with ads either) +ad[ud]*. 
# (adult.* and add.*) +.edu # (universities don't host banners (yet!)) +.*loads. # (downloads, uploads etc) - -add-header - -block - +change-x-forwarded-for{block} - -client-header-filter{hide-tor-exit-notation} - -content-type-overwrite - -crunch-client-header - -crunch-if-none-match - -crunch-incoming-cookies - -crunch-outgoing-cookies - -crunch-server-header - +deanimate-gifs {last} - -downgrade-http-version - -fast-redirects - -filter {js-events} - -filter {content-cookies} - -filter {all-popups} - -filter {banners-by-link} - -filter {tiny-textforms} - -filter {frameset-borders} - -filter {demoronizer} - -filter {shockwave-flash} - -filter {quicktime-kioskmode} - -filter {fun} - -filter {crude-parental} - -filter {site-specifics} - -filter {js-annoyances} - -filter {html-annoyances} - +filter {refresh-tags} - -filter {unsolicited-popups} - +filter {img-reorder} - +filter {banners-by-size} - +filter {webbugs} - +filter {jumping-windows} - +filter {ie-exploits} - -filter {google} - -filter {yahoo} - -filter {msn} - -filter {blogspot} - -filter {no-ping} - -force-text-mode - -handle-as-empty-document - -handle-as-image - -hide-accept-language - -hide-content-disposition - +hide-from-header {block} - -hide-if-modified-since - +hide-referrer {forge} - -hide-user-agent - -limit-connect - -overwrite-last-modified - -prevent-compression - -redirect - -server-header-filter{xml-to-html} - -server-header-filter{html-to-xml} - -session-cookies-only - +set-image-blocker {pattern} +# By path: +# +/.*loads/ + +# Site-specific: +# +www.globalintersec.com/adv # (adv = advanced) +www.ugu.com/sui/ugu/adv - Notice the only difference here to the previous listing, is to - fast-redirects and session-cookies-only, - which are activated specifically for this site in our configuration, - and thus show in the Final Results. + Filtering source code can have nasty side effects, + so make an exception for our friends at sourceforge.net, + and all paths with cvs in them. 
Note that + -filter + disables all filters in one fell swoop! - Now another example, ad.doubleclick.net: + +# Don't filter code! +# +{ -filter } +/(.*/)?cvs +bugzilla. +developer. +wiki. +.sourceforge.net - + The actual default.action is of course much more + comprehensive, but we hope this example made clear how it works. + - { +block{Domains starts with "ad"} } - ad*. + - { +block{Domain contains "ad"} } - .ad. +user.action - { +block{Doubleclick banner server} +handle-as-image } - .[a-vx-z]*.doubleclick.net - + + So far we are painting with a broad brush by setting general policies, + which would be a reasonable starting point for many people. Now, + you might want to be more specific and have customized rules that + are more suitable to your personal habits and preferences. These would + be for narrowly defined situations like your ISP or your bank, and should + be placed in user.action, which is parsed after all other + actions files and hence has the last word, over-riding any previously + defined actions. user.action is also a + safe place for your personal settings, since + default.action is actively maintained by the + Privoxy developers and you'll probably want + to install updated versions from time to time. - We'll just show the interesting part here - the explicit matches. It is - matched three different times. Two +block{} sections, - and a +block{} +handle-as-image, - which is the expanded form of one of our aliases that had been defined as: - +block-as-image. (Aliases are defined in - the first section of the actions file and typically used to combine more - than one action.) + So let's look at a few examples of things that one might typically do in + user.action: + + + - Any one of these would have done the trick and blocked this as an unwanted - image. This is unnecessarily redundant since the last case effectively - would also cover the first. 
No point in taking chances with these guys - though ;-) Note that if you want an ad or obnoxious - URL to be invisible, it should be defined as ad.doubleclick.net - is done here -- as both a +block{} - and an - +handle-as-image. - The custom alias +block-as-image just - simplifies the process and make it more readable. + +# My user.action file. <fred@example.com> - One last example. Let's try http://www.example.net/adsl/HOWTO/. - This one is giving us problems. We are getting a blank page. Hmmm ... + As aliases are local to the actions + file that they are defined in, you can't use the ones from + default.action, unless you repeat them here: +# Aliases are local to the file they are defined in. +# (Re-)define aliases for this file: +# +{{alias}} +# +# These aliases just save typing later, and the alias names should +# be self explanatory. +# ++crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies +-crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies + allow-all-cookies = -crunch-all-cookies -session-cookies-only + allow-popups = -filter{all-popups} ++block-as-image = +block{Blocked as image.} +handle-as-image +-block-as-image = -block - Matches for http://www.example.net/adsl/HOWTO/: +# These aliases define combinations of actions that are useful for +# certain types of sites: +# +fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referrer +shop = -crunch-all-cookies allow-popups - In file: default.action [ View ] [ Edit ] +# Allow ads for selected useful free sites: +# +allow-ads = -block -filter{banners-by-size} -filter{banners-by-link} - {-add-header - -block - +change-x-forwarded-for{block} - -client-header-filter{hide-tor-exit-notation} - -content-type-overwrite - -crunch-client-header - -crunch-if-none-match - -crunch-incoming-cookies - -crunch-outgoing-cookies - -crunch-server-header - +deanimate-gifs - -downgrade-http-version - +fast-redirects {check-decoded-url} - -filter {js-events} - -filter 
{content-cookies} - -filter {all-popups} - -filter {banners-by-link} - -filter {tiny-textforms} - -filter {frameset-borders} - -filter {demoronizer} - -filter {shockwave-flash} - -filter {quicktime-kioskmode} - -filter {fun} - -filter {crude-parental} - -filter {site-specifics} - -filter {js-annoyances} - -filter {html-annoyances} - +filter {refresh-tags} - -filter {unsolicited-popups} - +filter {img-reorder} - +filter {banners-by-size} - +filter {webbugs} - +filter {jumping-windows} - +filter {ie-exploits} - -filter {google} - -filter {yahoo} - -filter {msn} - -filter {blogspot} - -filter {no-ping} - -force-text-mode - -handle-as-empty-document - -handle-as-image - -hide-accept-language - -hide-content-disposition - +hide-from-header{block} - +hide-referer{forge} - -hide-user-agent - -overwrite-last-modified - +prevent-compression - -redirect - -server-header-filter{xml-to-html} - -server-header-filter{html-to-xml} - +session-cookies-only - +set-image-blocker{blank} } - / +# Alias for specific file types that are text, but might have conflicting +# MIME types. We want the browser to force these to be text documents. +handle-as-text = -filter +-content-type-overwrite{text/plain} +-force-text-mode -hide-content-disposition - { +block{Path contains "ads".} +handle-as-image } - /ads - - Ooops, the /adsl/ is matching /ads in our - configuration! But we did not want this at all! Now we see why we get the - blank page. It is actually triggering two different actions here, and - the effects are aggregated so that the URL is blocked, and &my-app; is told - to treat the block as if it were an image. But this is, of course, all wrong. - We could now add a new action below this (or better in our own - user.action file) that explicitly - un blocks ( - {-block}) paths with - adsl in them (remember, last match in the configuration - wins). There are various ways to handle such exceptions. 
Example: + Say you have accounts on some sites that you visit regularly, and + you don't want to have to log in manually each time. So you'd like + to allow persistent cookies for these sites. The + allow-all-cookies alias defined above does exactly + that, i.e. it disables crunching of cookies in any direction, and the + processing of cookies to make them only temporary. +{ allow-all-cookies } + sourceforge.net + .yahoo.com + .msdn.microsoft.com + .redhat.com + - { -block } - /adsl - + + Your bank is allergic to some filter, but you don't know which, so you disable them all: - Now the page displays ;-) - Remember to flush your browser's caches when making these kinds of changes to - your configuration to insure that you get a freshly delivered page! Or, try - using Shift+Reload. + +{ -filter } + .your-home-banking-site.com - But now what about a situation where we get no explicit matches like - we did with: + Some file types you may not want to filter for various reasons: +# Technical documentation is likely to contain strings that might +# erroneously get altered by the JavaScript-oriented filters: +# +.tldp.org +/(.*/)?selfhtml/ - { +block{Path starts with "ads".} +handle-as-image } - /ads - +# And this stupid host sends streaming video with a wrong MIME type, +# so that Privoxy thinks it is getting HTML and starts filtering: +# +stupid-server.example.com/ - That actually was very helpful and pointed us quickly to where the problem - was. If you don't get this kind of match, then it means one of the default - rules in the first section of default.action is causing - the problem. This would require some guesswork, and maybe a little trial and - error to isolate the offending rule. One likely cause would be one of the - +filter actions. - These tend to be harder to troubleshoot. - Try adding the URL for the site to one of aliases that turn off - +filter: + Example of a simple block action. 
Say you've + seen an ad on your favourite page on example.com that you want to get rid of. + You have right-clicked the image, selected copy image location + and pasted the URL below while removing the leading http://, into a + { +block{} } section. Note that { +handle-as-image + } need not be specified, since all URLs ending in + .gif will be tagged as images by the general rules as set + in default.action anyway: - - { shop } - .quietpc.com - .worldpay.com # for quietpc.com - .jungle.com - .scan.co.uk - .forbes.com - +{ +block{Nasty ads.} } + www.example.com/nasty-ads/sponsor\.gif + another.example.net/more/junk/here/ - { shop } is an alias that expands to - { -filter -session-cookies-only }. - Or you could do your own exception to negate filtering: - + The URLs of dynamically generated banners, especially from large banner + farms, often don't use the well-known image file name extensions, which + makes it impossible for Privoxy to guess + the file type just by looking at the URL. + You can use the +block-as-image alias defined above for + these cases. + Note that objects which match this rule but then turn out NOT to be an + image are typically rendered as a broken image icon by the + browser. Use cautiously. - - { -filter } - # Disable ALL filter actions for sites in this section - .forbes.com - developer.ibm.com - localhost - +{ +block-as-image } + .doubleclick.net + .fastclick.net + /Realmedia/ads/ + ar.atwola.com/ - This would turn off all filtering for these sites. This is best - put in user.action, for local site - exceptions. Note that when a simple domain pattern is used by itself (without - the subsequent path portion), all sub-pages within that domain are included - automatically in the scope of the action. 
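The "last match wins" rule is what lets the { -block } /adsl exception above override the earlier /ads block. It can be modeled in a few lines of Python. This is a simplified, hypothetical sketch and not Privoxy's code: SECTIONS and resolve() are illustrative names, only path patterns are considered, and the patterns are treated as regular expressions anchored at the start of the path, as the /adsl example implies.

```python
import re

# Hypothetical, simplified model of Privoxy's "last match wins" rule.
# Each section is (action settings, path patterns), in the order Privoxy
# would read them; real sections also involve host patterns, tags, etc.
SECTIONS = [
    ({"block": False, "handle-as-image": False}, ["/"]),  # defaults, all URLs
    ({"block": True, "handle-as-image": True}, ["/ads"]),  # +block-as-image
    ({"block": False}, ["/adsl"]),                         # local exception
]


def resolve(path, sections=SECTIONS):
    """Return the action settings that end up applying to `path`."""
    result = {}
    for actions, patterns in sections:
        # re.match() anchors each pattern at the start of the path,
        # so "/ads" also matches "/adsl/HOWTO/".
        if any(re.match(p, path) for p in patterns):
            result.update(actions)  # a later match overrides earlier ones
    return result


print(resolve("/adsl/HOWTO/"))     # {'block': False, 'handle-as-image': True}
print(resolve("/ads/banner.gif"))  # {'block': True, 'handle-as-image': True}
```

Note that the exception only flips block; handle-as-image stays enabled for /adsl, which on its own does not block anything.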
+ Say you noticed that the default configuration breaks Forbes Magazine, + but you were too lazy to find out which action is the culprit, and you + were again too lazy to give feedback, so + you just used the fragile alias on the site, and + -- whoa! -- it worked. The fragile + alias disables those actions that are most likely to break a site. It is also + good for testing purposes to see if it is Privoxy + that is causing the problem or not. We later find other regular sites + that misbehave, and add those to our personalized list of troublemakers: - Images that are inexplicably being blocked, may well be hitting the -+filter{banners-by-size} - rule, which assumes - that images of certain sizes are ad banners (works well - most of the time since these tend to be standardized). + +{ fragile } + .forbes.com + webmail.example.com + .mybank.com - { fragile } is an alias that disables most - actions that are the most likely to cause trouble. This can be used as a - last resort for problem sites. + You like the fun text replacements in default.filter, + but that filter is disabled in the distributed actions file. + So you'd like to turn it on in your private, + update-safe config, once and for all: - - - { fragile } - # Handle with care: easy to break - mail.google. - mybank.example.com + + +{ +filter{fun} } + / # For ALL sites! - - Remember to flush caches! Note that the - mail.google reference lacks the TLD portion (e.g. - .com). This will effectively match any TLD with - google in it, such as mail.google.de., - just as an example. + Note that the above is not really a good idea: There are exceptions + to the filters in default.action for things that + really shouldn't be filtered, like code on CVS->Web interfaces. Since + user.action has the last word, these exceptions + won't be valid for the fun filtering specified here. + - If this still does not work, you will have to go through the remaining - actions one by one to find which one(s) is causing the problem.
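The effect of the dots in host patterns such as .forbes.com (leading dot: the domain and its subdomains) and mail.google. (trailing dot: any top-level domain) can be illustrated with a small Python model. host_matches() is a hypothetical helper that compares dot-separated labels. It ignores the wildcards (ad*., .*ads.) and other details that real Privoxy host patterns support, so treat it as an approximation only.

```python
def host_matches(pattern, host):
    """Simplified model of a Privoxy host pattern; wildcards are not supported."""
    wanted = pattern.strip(".").lower().split(".")
    labels = host.lower().split(".")
    n = len(wanted)
    any_prefix = pattern.startswith(".")  # ".forbes.com": subdomains allowed
    any_suffix = pattern.endswith(".")    # "mail.google.": any TLD allowed
    if any_prefix and any_suffix:
        # ".example.": rough approximation, the labels may appear anywhere
        return any(labels[i:i + n] == wanted for i in range(len(labels) - n + 1))
    if any_prefix:
        return labels[-n:] == wanted  # compare the rightmost labels
    if any_suffix:
        return labels[:n] == wanted   # compare the leftmost labels
    return labels == wanted           # no dots at the ends: exact host only


for host in ("www.forbes.com", "forbes.com", "forbes.com.example.org"):
    print(host, host_matches(".forbes.com", host))
```

With these rules, .google.com matches www.google.com and mail.google.com but not www.google.de, while mail.google. matches mail.google.com as well as mail.google.de.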
+ You might also worry about how your favourite free websites are + funded, and find that they rely on displaying banner advertisements + to survive. So you might want to specifically allow banners for those + sites that you feel provide value to you: - - - - - - Revision 2.90 2008/09/26 16:53:09 fabiankeil - Update "What's new" section. + - Revision 2.89 2008/09/21 15:38:56 fabiankeil - Fix Portage tree sync instructions in Gentoo section. - Anonymously reported at ijbswa-developers@. + - Revision 2.88 2008/09/21 14:42:52 fabiankeil - Add documentation for change-x-forwarded-for{}, - remove documentation for hide-forwarded-for-headers. + - Revision 2.87 2008/08/30 15:37:35 fabiankeil - Update entities. + +Filter Files - Revision 2.86 2008/08/16 10:12:23 fabiankeil - Merge two sentences and move the URL to the end of the item. + + On-the-fly text substitutions need + to be defined in a filter file. Once defined, they + can then be invoked as an action. + - Revision 2.85 2008/08/16 10:04:59 fabiankeil - Some more syntax fixes. This version actually builds. + + &my-app; supports three different filter actions: + filter to + rewrite the content that is sent to the client, + client-header-filter + to rewrite headers that are sent by the client, and + server-header-filter + to rewrite headers that are sent by the server. + - Revision 2.84 2008/08/16 09:42:45 fabiankeil - Turns out building docs works better if the syntax is valid. + + &my-app; also supports two tagger actions: + client-header-tagger + and + server-header-tagger. + Taggers and filters use the same syntax in the filter files; the difference + is that taggers don't modify the text they are filtering, but use a rewritten + version of the filtered text as a tag. The tags can then be used to change which + actions apply, by means of sections with tag-patterns. + - Revision 2.83 2008/08/16 09:32:02 fabiankeil - Mention changes since 3.0.9 beta.
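The difference between filters and taggers described above can be sketched in Python. Both run the same substitution job on a header. The filter passes the rewritten header on, while the tagger keeps the original header and uses the rewritten text as a tag. All names and the User-Agent job are made up for illustration. Python writes the back-reference as \1 where a Privoxy job would use $1, and the rule that no tag is produced when the job does not match is an assumption of this sketch.

```python
import re


def run_job(pattern, replacement, text):
    """Apply one substitution job once; return (rewritten_text, matched)."""
    rewritten, count = re.subn(pattern, replacement, text, count=1)
    return rewritten, count > 0


def header_filter(header, pattern, replacement):
    # Filter: the rewritten header replaces the original one.
    rewritten, _ = run_job(pattern, replacement, header)
    return rewritten


def header_tagger(header, pattern, replacement):
    # Tagger: the header is passed on unchanged; the rewritten text is the tag.
    # (Assumption for this sketch: no tag if the job did not match.)
    rewritten, matched = run_job(pattern, replacement, header)
    return header, (rewritten if matched else None)


job = (r"^User-Agent:.*(Linux).*$", r"\1")  # hypothetical job
ua = "User-Agent: Mozilla/5.0 (X11; Linux x86_64)"
print(header_filter(ua, *job))  # Linux
print(header_tagger(ua, *job))  # ('User-Agent: Mozilla/5.0 (X11; Linux x86_64)', 'Linux')
```

The shared job only demonstrates the shared syntax. In practice a tagger job is written to extract something worth tagging, and a filter job is written to leave a valid header behind.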
- Revision 2.82 2008/08/16 09:00:52 fabiankeil - Fix example URL pattern (once more with feeling). + + Multiple filter files can be defined through the filterfile config directive. The filters + as supplied by the developers are located in + default.filter. It is recommended that any locally + defined or modified filters go in a separately defined file such as + user.filter. + - Revision 2.81 2008/08/16 08:51:28 fabiankeil - Update version-related entities. + + Common tasks for content filters are to eliminate common annoyances in + HTML and JavaScript, such as pop-up windows, + exit consoles, crippled windows without navigation tools, the + infamous <BLINK> tag etc, to suppress images with certain + width and height attributes (standard banner sizes or web-bugs), + or just to have fun. + - Revision 2.80 2008/07/18 16:54:30 fabiankeil - Remove erroneous whitespace in documentation link. - Reported by John Chronister in #2021611. + + Enabled content filters are applied to any content whose + Content Type header is recognised as a sign + of text-based content, with the exception of text/plain. + Use the force-text-mode action + to also filter other content. + - Revision 2.79 2008/06/27 18:00:53 markm68k - remove outdated startup information for mac os x + + Substitutions are made at the source level, so if you want to roll + your own filters, you should first be familiar with HTML syntax, + and, of course, regular expressions. + - Revision 2.78 2008/06/21 17:03:03 fabiankeil - Fix typo. + + Just like the actions files, the + filter file is organized in sections, which are called filters + here. Each filter consists of a heading line, that starts with one of the + keywords FILTER:, + CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER: + followed by the filter's name, and a short (one line) + description of what it does. Below that line + come the jobs, i.e. lines that define the actual + text substitutions. 
By convention, the name of a filter + should describe what the filter eliminates. The + comment is used in the web-based + user interface. + - Revision 2.77 2008/06/14 13:45:22 fabiankeil - Re-add a colon I unintentionally removed a few revisions ago. + + Once a filter called name has been defined + in the filter file, it can be invoked by using an action of the form + +filter{name} + in any actions file. + - Revision 2.76 2008/06/14 13:21:28 fabiankeil - Prepare for the upcoming 3.0.9 beta release. + + Filter definitions start with a header line that contains the filter + type, the filter name and the filter description. + A content filter header line for a filter called foo could look + like this: + - Revision 2.75 2008/06/13 16:06:48 fabiankeil - Update the "What's New in this Release" section with - the ChangeLog entries changelog2doc.pl could handle. + + FILTER: foo Replace all "foo" with "bar" + - Revision 2.74 2008/05/26 15:55:46 fabiankeil - - Update "default profiles" table. - - Add some more pcrs redirect examples and note that - enabling debug 128 helps to get redirects working. + + Below that line, and up to the next header line, come the jobs that + define what text replacements the filter executes. They are specified + in a syntax that imitates Perl's + s/// operator. If you are familiar with Perl, you + will find this to be quite intuitive, and may want to look at the + PCRS documentation for the subtle differences to Perl behaviour. Most + notably, the non-standard option letter U is supported, + which turns the default to ungreedy matching. + - Revision 2.73 2008/05/23 14:43:18 fabiankeil - Remove previously out-commented block that caused syntax problems. + + If you are new to + Regular + Expressions, you might want to take a look at + the Appendix on regular expressions, and + see the Perl + manual for + the + s/// operator's syntax and Perl-style regular + expressions in general. + The below examples might also help to get you started. 
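Before writing a filter, you can experiment with the s/// job syntax by approximating a job with Python's re module. pcrs_sub() is a hypothetical helper and is not part of Privoxy or PCRS. It supports only the g and i option letters and converts Perl-style $1 back-references. It does not handle the U (ungreedy) option, escaped delimiters, or the other PCRS differences mentioned above, and Python's regular expression dialect also differs from PCRE in places.

```python
import re


def pcrs_sub(job, text):
    """Approximate a Perl-style s/pattern/replacement/options job with Python's re."""
    if len(job) < 4 or job[0] != "s":
        raise ValueError("not an s/// job: %r" % job)
    delimiter = job[1]  # usually "/", but "|" or "@" work the same way
    pattern, replacement, options = job[2:].split(delimiter, 2)
    if "U" in options:
        raise NotImplementedError("the U (ungreedy) option is not modelled here")
    flags = re.IGNORECASE if "i" in options else 0
    # Perl-style $1 back-references become Python's \g<1>.
    replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    # Without "g" only the first match is replaced; count=0 means "all".
    count = 0 if "g" in options else 1
    return re.sub(pattern, replacement, text, count=count, flags=flags)


page = "foo, foo and more foo"
print(pcrs_sub("s/foo/bar/", page))   # bar, foo and more foo
print(pcrs_sub("s/foo/bar/g", page))  # bar, bar and more bar
```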
+ - Revision 2.72 2008/05/12 10:26:14 fabiankeil - Synchronize content filter descriptions with the ones in default.filter. - Revision 2.71 2008/04/10 17:37:16 fabiankeil - Actually we use "modern" POSIX 1003.2 regular - expressions in path patterns, not PCRE. + - Revision 2.70 2008/04/10 15:59:12 fabiankeil - Add another section to the client-header-tagger example that shows - how to actually change the action settings once the tag is created. +Filter File Tutorial + + Now, let's complete our foo content filter. We have already defined + the heading, but the jobs are still missing. Since all it does is to replace + foo with bar, there is only one (trivial) job + needed: + - Revision 2.69 2008/03/29 12:14:25 fabiankeil - Remove send-wafer and send-vanilla-wafer actions. + + s/foo/bar/ + - Revision 2.68 2008/03/28 15:13:43 fabiankeil - Remove inspect-jpegs action. + + But wait! Didn't the comment say that all occurrences + of foo should be replaced? Our current job will only take + care of the first foo on each page. For global substitution, + we'll need to add the g option: + - Revision 2.67 2008/03/27 18:31:21 fabiankeil - Remove kill-popups action. + + s/foo/bar/g + - Revision 2.66 2008/03/06 16:33:47 fabiankeil - If limit-connect isn't used, don't limit CONNECT requests to port 443. + + Our complete filter now looks like this: + + + FILTER: foo Replace all "foo" with "bar" +s/foo/bar/g + - Revision 2.65 2008/03/04 18:30:40 fabiankeil - Remove the treat-forbidden-connects-like-blocks action. We now - use the "blocked" page for forbidden CONNECT requests by default. + + Let's look at some real filters for more interesting examples. Here you see + a filter that protects against some common annoyances that arise from JavaScript + abuse. Let's look at its jobs one after the other: + - Revision 2.64 2008/03/01 14:10:28 fabiankeil - Use new block syntax. Still needs some polishing. 
- Revision 2.63 2008/02/22 05:50:37 markm68k - fix merge problem + + +FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse - Revision 2.62 2008/02/11 11:52:23 hal9 - Fix entity ... s/&/& +# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm +# +s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg + - Revision 2.61 2008/02/11 03:41:47 markm68k - more updates for mac os x + + Following the header line and a comment, you see the job. Note that it uses + | as the delimiter instead of /, because + the pattern contains a forward slash, which would otherwise have to be escaped + by a backslash (\). + - Revision 2.60 2008/02/11 03:40:25 markm68k - more updates for mac os x + + Now, let's examine the pattern: it starts with the text <script.* + enclosed in parentheses. Since the dot matches any character, and * + means: Match an arbitrary number of the element left of myself, this + matches <script, followed by any text, i.e. + it matches the whole page, from the start of the first <script> tag. + - Revision 2.59 2008/02/11 00:52:34 markm68k - reflect new changes for mac os x + + That's more than we want, but the pattern continues: document\.referrer + matches only the exact string document.referrer. The dot needed to + be escaped, i.e. preceded by a backslash, to take away its + special meaning as a joker, and make it just a regular dot. So far, the meaning is: + Match from the start of the first <script> tag in the page, up to, and including, + the text document.referrer, if both are present + in the page (and appear in that order). + - Revision 2.58 2008/02/03 21:37:40 hal9 - Apply patch from Mark: s/OSX/OS X/ + + But there's still more pattern to go. The next element, again enclosed in parentheses, + is .*</script>.
You already know what .* + means, so the whole pattern translates to: Match from the start of the first <script> + tag in a page to the end of the last </script> tag, provided that the text + document.referrer appears somewhere in between. + - Revision 2.57 2008/02/03 19:10:14 fabiankeil - Mention forward-socks5. + + This is still not the whole story, since we have ignored the options and the parentheses: + The portions of the page matched by sub-patterns that are enclosed in parentheses will be + remembered and be available through the variables $1, $2, ... in + the substitute. The U option switches to ungreedy matching, which means + that the first .* in the pattern will only eat up all + text in between <script and the first occurrence + of document.referrer, and that the second .* will + only span the text up to the first </script> + tag. Furthermore, the s option says that the match may span + multiple lines in the page, and the g option again means that the + substitution is global. + - Revision 2.56 2008/01/31 19:11:35 fabiankeil - Let the +client-header-filter{hide-tor-exit-notation} example apply - to all requests as "tainted" Referers aren't limited to exit TLDs. + + So, to summarize, the pattern means: Match all scripts that contain the text + document.referrer. Remember the parts of the script from + (and including) the start tag up to (and excluding) the string + document.referrer as $1, and the part following + that string, up to and including the closing tag, as $2. + - Revision 2.55 2008/01/19 21:26:37 hal9 - Add IE7 to configuration section per Gerry. + + Now the pattern is deciphered, but wasn't this about substituting things? So + let's look at the substitute: $1"Not Your Business!"$2 is + easy to read: The text remembered as $1, followed by + "Not Your Business!" (including + the quotation marks!), followed by the text remembered as $2.
+ This produces an exact copy of the original string, with the middle part + (the document.referrer) replaced by "Not Your + Business!". + - Revision 2.54 2008/01/19 17:52:39 hal9 - Re-commit to fix various minor issues for new release. + + The whole job now reads: Replace document.referrer by + "Not Your Business!" wherever it appears inside a + <script> tag. Note that this job won't break JavaScript syntax, + since both the original and the replacement are syntactically valid + string objects. The script just won't have access to the referrer + information anymore. + - Revision 2.53 2008/01/19 15:03:05 hal9 - Doc sources tagged for 3.0.8 release. + + We'll show you two other jobs from the JavaScript taming department, but + this time only point out the constructs of special interest: + - Revision 2.52 2008/01/17 01:49:51 hal9 - Change copyright notice for docs s/2007/2008/. All these will be rebuilt soon - enough. + + +# The status bar is for displaying link targets, not pointless blahblah +# +s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig + - Revision 2.51 2007/12/23 16:48:24 fabiankeil - Use more precise example descriptions for the mysterious domain patterns. + + \s stands for whitespace characters (space, tab, newline, + carriage return, form feed), so that \s* means: zero + or more whitespace. The ? in .*? + makes this matching of arbitrary text ungreedy. (Note that the U + option is not set). The ['"] construct means: a single + or a double quote. Finally, \1 is + a back-reference to the first parenthesis just like $1 above, + with the difference that in the pattern, a backslash indicates + a back-reference, whereas in the substitute, it's the dollar. + - Revision 2.50 2007/12/08 12:44:36 fabiankeil - - Remove already commented out pre-3.0.7 changes. - - Update the "new log defaults" paragraph. + + So what does this job do? 
It replaces assignments of single- or double-quoted + strings to the window.status object with a dummy assignment + (using a variable name that is hopefully odd enough not to conflict with + real variables in scripts). Thus, it catches many cases where e.g. pointless + descriptions are displayed in the status bar instead of the link target when + you move your mouse over links. + - Revision 2.49 2007/12/06 18:21:55 fabiankeil - Update hide-forwarded-for-headers description. + + +# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html +# +s/(<body [^>]*)onunload(.*>)/$1never$2/iU + - Revision 2.48 2007/11/24 19:07:17 fabiankeil - - Mention request rewriting. - - Enable the conditional-forge paragraph. - - Minor rewordings. + + Including the + OnUnload + event binding in the HTML DOM was a CRIME. + When I close a browser window, I want it to close and die. Basta. + This job replaces the onunload attribute in + <body> tags with the dummy word never. + Note that the i option makes the pattern matching + case-insensitive. Also note that ungreedy matching alone doesn't always guarantee + a minimal match: In the first parenthesis, we had to use [^>]* + instead of .* to prevent the match from exceeding the + <body> tag if it doesn't contain OnUnload, but the page's + content does. + - Revision 2.47 2007/11/18 14:59:47 fabiankeil - A few "Note to Upgraders" updates. + + The last example is from the fun department: + - Revision 2.46 2007/11/17 17:24:44 fabiankeil - - Use new action defaults. - - Minor fixes and rewordings. + + +FILTER: fun Fun text replacements - Revision 2.45 2007/11/16 11:48:46 hal9 - Fix one typo, and add a couple of small refinements. +# Spice the daily news: +# +s/microsoft(?!\.com)/MicroSuck/ig + - Revision 2.44 2007/11/15 03:30:20 hal9 - Results of spell check. 
+ + Note the (?!\.com) part (a so-called negative lookahead) + in the job's pattern, which means: Don't match, if the string + .com appears directly following microsoft + in the page. This prevents links to microsoft.com from being trashed, while + still replacing the word everywhere else. + - Revision 2.43 2007/11/14 18:45:39 fabiankeil - - Mention some more contributors in the "New in this Release" list. - - Minor rewordings. + + +# Buzzword Bingo (example for extended regex syntax) +# +s* industry[ -]leading \ +| cutting[ -]edge \ +| customer[ -]focused \ +| market[ -]driven \ +| award[ -]winning # Comments are OK, too! \ +| high[ -]performance \ +| solutions[ -]based \ +| unmatched \ +| unparalleled \ +| unrivalled \ +*<font color="red"><b>BINGO!</b></font> \ +*igx + - Revision 2.42 2007/11/12 03:32:40 hal9 - Updates for "What's New" and "Notes to Upgraders". Various other changes in - preparation for new release. User Manual is almost ready. + + The x option in this job turns on extended syntax, and allows for + e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting. + - Revision 2.41 2007/11/11 16:32:11 hal9 - This is primarily syncing What's New and Note to Upgraders sections with the many - new features and changes (gleaned from memory but mostly from ChangeLog). + + You get the idea? + + - Revision 2.40 2007/11/10 17:10:59 fabiankeil - In the first third of the file, mention several times that - the action editor is disabled by default in 3.0.7 beta and later. + - Revision 2.39 2007/11/05 02:34:49 hal9 - Various changes in preparation for the upcoming release. Much yet to be done. +The Pre-defined Filters - Revision 2.38 2007/09/22 16:01:42 fabiankeil - Update embedded show-url-info output. + - Revision 2.35 2007/08/26 14:59:49 fabiankeil - Minor rewordings and fixes. 
+ +The distribution default.filter file contains a selection of +pre-defined filters for your convenience: + - Revision 2.34 2007/08/05 15:19:50 fabiankeil - - Don't claim HTTP/1.1 compliance. - - Use $ in some of the path pattern examples. - - Use a hide-user-agent example argument without - leading and trailing space. - - Make it clear that the cookie actions work with - HTTP cookies only. - - Rephrase the inspect-jpegs text to underline - that it's only meant to protect against a single - exploit. + + + js-annoyances + + + The purpose of this filter is to get rid of particularly annoying JavaScript abuse. + To that end, it + + + + replaces JavaScript references to the browser's referrer information + with the string "Not Your Business!". This complements the hide-referrer action on the content level. + + + + + removes the bindings to the DOM's + unload + event which we feel has no right to exist and is responsible for most exit consoles, i.e. + nasty windows that pop up when you close another one. + + + + + removes code that causes new windows to be opened with undesired properties, such as being + full-screen, non-resizable, without location, status or menu bar etc. + + + + + + Use with caution. This is an aggressive filter, and can break sites that + rely heavily on JavaScript. + + + - Revision 2.33 2007/07/27 10:57:35 hal9 - Add references for user-agent strings for hide-user-agenet + + js-events + + + This is a very radical measure. It removes virtually all JavaScript event bindings, which + means that scripts cannot react to user actions such as mouse movements or clicks, window + resizing etc, anymore. Use with caution! + + + We strongly discourage using this filter as a default since it breaks + many legitimate scripts. It is meant for use only on extra-nasty sites (should you really + need to go there).
+ + + - Revision 2.32 2007/06/07 12:36:22 fabiankeil - Apply Roland's 29_usermanual.dpatch to fix a bunch - of syntax errors I collected over the last months. + + html-annoyances + + + This filter will undo many common instances of HTML based abuse. + + + The BLINK and MARQUEE tags + are neutralized (yeah baby!), and browser windows will be created as + resizeable (as of course they should be!), and will have location, + scroll and menu bars -- even if specified otherwise. + + + - Revision 2.31 2007/06/02 14:01:37 fabiankeil - Start to document forward-override{}. + + content-cookies + + + Most cookies are set in the HTTP dialog, where they can be intercepted + by the + crunch-incoming-cookies + and crunch-outgoing-cookies + actions. But web sites increasingly make use of HTML meta tags and JavaScript + to sneak cookies to the browser on the content level. + + + This filter disables most HTML and JavaScript code that reads or sets + cookies. It cannot detect all clever uses of these types of code, so it + should not be relied on as an absolute fix. Use it wherever you would also + use the cookie crunch actions. + + + - Revision 2.30 2007/04/25 15:10:36 fabiankeil - - Describe installation for FreeBSD. - - Start to document taggers and tag patterns. - - Don't confuse devils and daemons. + + refresh-tags + + + Disable any refresh tags if the interval is greater than nine seconds (so + that redirections done via refresh tags are not destroyed). This is useful + for dial-on-demand setups, or for those who find this HTML feature + annoying. + + + - Revision 2.29 2007/04/05 11:47:51 fabiankeil - Some updates regarding header filtering, - handling of compressed content and redirect's - support for pcrs commands. + + unsolicited-popups + + + This filter attempts to prevent only unsolicited pop-up + windows from opening, yet still allow pop-up windows that the user + has explicitly chosen to open. It was added in version 3.0.1, + as an improvement over earlier such filters. 
+ + + Technical note: The filter works by redefining the window.open JavaScript + function to a dummy function, PrivoxyWindowOpen(), + during the loading and rendering phase of each HTML page access, and + restoring the function afterward. + + + This is recommended only for browsers that cannot perform this function + reliably themselves. And be aware that some sites require such windows + in order to function normally. Use with caution. + + + - Revision 2.28 2006/12/10 23:42:48 hal9 - Fix various typos reported by Adam P. Thanks. + + all-popups + + + Attempt to prevent all pop-up windows from opening. + Note this should be used with even more discretion than the above, since + it is more likely to break some sites that require pop-ups for normal + usage. Use with caution. + + + - Revision 2.27 2006/11/14 01:57:47 hal9 - Dump all docs prior to 3.0.6 release. Various minor changes to faq and user - manual. + + img-reorder + + + This is a helper filter that has no value if used alone. It makes the + banners-by-size and banners-by-link + (see below) filters more effective and should be enabled together with them. + + + - Revision 2.26 2006/10/24 11:16:44 hal9 - Add new filters. + + banners-by-size + + + This filter removes image tags purely based on what size they are. Fortunately + for us, many ads and banner images tend to conform to certain standardized + sizes, which makes this filter quite effective for ad stripping purposes. + + + Occasionally this filter will cause false positives on images that are not ads, + but just happen to be of one of the standard banner sizes. + + + Recommended only for those who require extreme ad blocking. The default + block rules should catch 95+% of all ads without this filter enabled. + + + - Revision 2.25 2006/10/18 10:50:33 hal9 - Add note that since filters are off in Cautious, compression is ON. Turn off - compression to make filters work on all sites. 
+ + banners-by-link + + + This is an experimental filter that attempts to kill any banners if + their URLs seem to point to known or suspected click trackers. It is currently + not of much value and is not recommended for use by default. + + + - Revision 2.24 2006/10/03 11:13:54 hal9 - More references to the new filters. Include html this time around. + + webbugs + + + Webbugs are small, invisible images (technically 1X1 GIF images), that + are used to track users across websites, and collect information on them. + As an HTML page is loaded by the browser, an embedded image tag causes the + browser to contact a third-party site, disclosing the tracking information + through the requested URL and/or cookies for that third-party domain, without + the user ever becoming aware of the interaction with the third-party site. + HTML-ized spam also uses a similar technique to verify email addresses. + + + This filter removes the HTML code that loads such webbugs. + + + - Revision 2.23 2006/10/02 22:43:53 hal9 - Contains new filter definitions from Fabian, and few other miscellaneous - touch-ups. + + tiny-textforms + + + A rather special-purpose filter that can be used to enlarge textareas (those + multi-line text boxes in web forms) and turn off hard word wrap in them. + It was written for the sourceforge.net tracker system where such boxes are + a nuisance, but it can be handy on other sites, too. + + + It is not recommended to use this filter as a default. + + + - Revision 2.22 2006/09/22 01:27:55 hal9 - Final commit of probably various minor changes here and there. Unless - something changes this should be ready for pending release. + + jumping-windows + + + Many consider windows that move, or resize themselves to be abusive. This filter + neutralizes the related JavaScript code. Note that some sites might not display + or behave as intended when using this filter. Use with caution. + + + - Revision 2.21 2006/09/20 03:21:36 david__schmidt - Just the tiniest tweak. 
Wafer thin! + + frameset-borders + + + Some web designers seem to assume that everyone in the world will view their + web sites using the same browser brand and version, screen resolution etc, + because only that assumption could explain why they'd use static frame sizes, + yet prevent their frames from being resized by the user, should they be too + small to show their whole content. + + + This filter removes the related HTML code. It should only be applied to sites + which need it. + + + - Revision 2.20 2006/09/10 14:53:54 hal9 - Results of spell check. User manual has some updates to standard.actions file - info. + + demoronizer + + + Many Microsoft products that generate HTML use non-standard extensions (read: + violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those + HTML documents to display with errors on standard-compliant platforms. + + + This filter translates the MS-only characters into Latin-1 equivalents. + It is not necessary when using MS products, and will cause corruption of + all documents that use 8-bit character sets other than Latin-1. It's mostly + worthwhile for Europeans on non-MS platforms, if weird garbage characters + sometimes appear on some pages, or user agents that don't correct for this on + the fly. + + + + - Revision 2.19 2006/09/08 12:19:02 fabiankeil - Adjust hide-if-modified-since example values - to reflect the recent changes. + + shockwave-flash + + + A filter for shockwave haters. As the name suggests, this filter strips code + out of web pages that is used to embed shockwave flash objects. + + + + + - Revision 2.18 2006/09/08 02:38:57 hal9 - Various changes: - -Fix a number of broken links. - -Migrate the new Windows service command line options, and reference as - needed. - -Rebuild so that can be used with the new "user-manual" config capabilities. - -Etc. + + quicktime-kioskmode + + + Change HTML code that embeds Quicktime objects so that kioskmode, which + prevents saving, is disabled. 
+ + + - Revision 2.17 2006/09/05 13:25:12 david__schmidt - Add Windows service invocation stuff (duplicated) in FAQ and in user manual under Windows startup. One probably ought to reference the other. + + fun + + + Text replacements for subversive browsing fun. Make fun of your favorite + Monopolist or play buzzword bingo. + + + - Revision 2.16 2006/09/02 12:49:37 hal9 - Various small updates for new actions, filterfiles, etc. + + crude-parental + + + A demonstration-only filter that shows how Privoxy + can be used to delete web content on a keyword basis. + + + - Revision 2.15 2006/08/30 11:15:22 hal9 - More work on the new actions, especially filter-*-headers, and What's New - section. User Manual is close to final form for 3.0.4 release. Some tinkering - and proof reading left to do. + + ie-exploits + + + An experimental collection of text replacements to disable malicious HTML and JavaScript + code that exploits known security holes in Internet Explorer. + + + Presently, it only protects against Nimda and a cross-site scripting bug, and + would need active maintenance to provide more substantial protection. + + + - Revision 2.14 2006/08/29 10:59:36 hal9 - Add a "Whats New in this release" Section. Further work on multiple filter - files, and assorted other minor changes. + + site-specifics + + + Some web sites have very specific problems, the cure for which doesn't apply + anywhere else, or could even cause damage on other sites. + + + This is a collection of such site-specific cures which should only be applied + to the sites they were intended for, which is what the supplied + default.action file does. Users shouldn't need to change + anything regarding this filter. + + + - Revision 2.13 2006/08/22 11:04:59 hal9 - Silence warnings and errors. This should build now. New filters were only - stubbed in. More to be done. + + google + + + A CSS based block for Google text ads. Also removes a width limitation + and the toolbar advertisement. 
+ + + - Revision 2.12 2006/08/14 08:40:39 fabiankeil - Documented new actions that were part of - the "minor Privoxy improvements". + + yahoo + + + Another CSS-based block, this time for Yahoo text ads. It also removes + a width limitation. + + + - Revision 2.11 2006/07/18 14:48:51 david__schmidt - Reorganizing the repository: swapping out what was HEAD (the old 3.1 branch) - with what was really the latest development (the v_3_0_branch branch) + + msn + + + Another CSS-based block, this time for MSN text ads. It also removes + tracking URLs and a width limitation. + + + - Revision 1.123.2.43 2005/05/23 09:59:10 hal9 - Fix typo 'loose' + + blogspot + + + Cleans up some Blogspot blogs. Read the fine print before using this one! + + + This filter also intentionally removes some navigation stuff and sets the + page width to 100%. As a result, some rounded corners would + appear too early or not at all, and as fixing this would require a browser + that understands background-size (CSS3), they are removed instead. + + + - Revision 1.123.2.42 2004/12/04 14:39:57 hal9 - Fix two minor typos per bug SF report. + + xml-to-html + + + Server-header filter to change the Content-Type from xml to html. + + + - Revision 1.123.2.41 2004/03/23 12:58:42 oes - Fixed an inaccuracy + + html-to-xml + + + Server-header filter to change the Content-Type from html to xml. + + + - Revision 1.123.2.40 2004/02/27 12:48:49 hal9 - Add comment re: redirecting to local file system for set-image-blocker may - is dependent on browser. + + no-ping + + + Removes the non-standard ping attribute from + anchor and area HTML tags. + + + - Revision 1.123.2.39 2004/01/30 22:31:40 oes - Added a hint re bookmarklets to Quickstart section + + hide-tor-exit-notation + + + Client-header filter to remove the Tor exit node notation + found in Host and Referer headers.
+ + + If &my-app; and Tor are chained and &my-app; + is configured to use socks4a, one can use http://www.example.org.foobar.exit/ + to access the host www.example.org through the + Tor exit node foobar. + + + As the HTTP client isn't aware of this notation, it treats the + whole string www.example.org.foobar.exit as host and uses it + for the Host and Referer headers. From the + server's point of view the resulting headers are invalid and can cause problems. + + + An invalid Referer header can trigger hot-linking + protections; an invalid Host header will make it impossible for + the server to find the right vhost (several domains hosted on the same IP address). + + + This client-header filter removes the foo.exit part in those headers + to prevent the mentioned problems. Note that it only modifies + the HTTP headers; it doesn't make it impossible for the server + to detect your Tor exit node based on the IP address + the request is coming from. + + + - Revision 1.123.2.38 2004/01/30 16:47:51 oes - Some minor clarifications + + - Revision 1.123.2.37 2004/01/29 22:36:11 hal9 - Updates for no longer filtering text/plain, and demoronizer default settings, - and copyright notice dates. + + - Revision 1.123.2.36 2003/12/10 02:26:26 hal9 - Changed the demoronizer filter description. + - Revision 1.123.2.35 2003/11/06 13:36:37 oes - Updated link to nightly CVS tarball - Revision 1.123.2.34 2003/06/26 23:50:16 hal9 - Add a small bit on filtering and problems re: source code being corrupted. - Revision 1.123.2.33 2003/05/08 18:17:33 roro - Use apt-get instead of dpkg to install Debian package, which is more - solid, uses the correct and most recent Debian version automatically. + - Revision 1.123.2.32 2003/04/11 03:13:57 hal9 - Add small note about only one filterfile (as opposed to multiple actions - files). + +Privoxy's Template Files + + All Privoxy built-in pages, i.e.
error pages such as the + 404 - No Such Domain + error page, the BLOCKED + page + and all pages of its web-based + user interface, are generated from templates. + (Privoxy must be running for the above links to work as + intended.) + - Revision 1.123.2.31 2003/03/26 02:03:43 oes - Updated hard-coded copyright dates + + These templates are stored in a subdirectory of the configuration + directory called templates. On Unixish platforms, + this is typically + /etc/privoxy/templates/. + - Revision 1.123.2.30 2003/03/24 12:58:56 hal9 - Add new section on Predefined Filters. + + The templates are basically normal HTML files, but with place-holders (called symbols + or exports), which Privoxy fills at run time. It + is possible to edit the templates with a normal text editor, should you want + to customize them. (Not recommended for the casual + user). Should you create your own custom templates, you should use + the config setting templdir + to specify an alternate location, so your templates do not get overwritten + during upgrades. + + + Note that just like in configuration files, lines starting + with # are ignored when the templates are filled in. + - Revision 1.123.2.29 2003/03/20 02:45:29 hal9 - More problems with \-\-chroot causing markup problems :( + + The place-holders are of the form @name@, and you will + find a list of available symbols, which vary from template to template, + in the comments at the start of each file. Note that these comments are not + always accurate, and that it's probably best to look at the existing HTML + code to find out which symbols are supported and what they are filled in with. + - Revision 1.123.2.28 2003/03/19 00:35:24 hal9 - Manual edit of revision log because 'chroot' (even inside a comment) was - causing Docbook to hang here (due to double hyphen and the processor thinking - it was a comment). 
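Privoxy performs this substitution in its own C code. Purely to illustrate the idea, here is a minimal Python sketch. It drops template lines that begin with # and replaces each @name@ place-holder with a value from a dictionary. Several details are assumptions made for illustration and do not describe Privoxy's actual behaviour: the function name, the symbol name example-host, and the choice to leave unknown symbols untouched.

```python
import re

# Matches @name@ place-holders (assumed name characters: letters, digits, '_' and '-').
PLACEHOLDER = re.compile(r"@([A-Za-z0-9_-]+)@")


def fill_template(template: str, symbols: dict[str, str]) -> str:
    """Hypothetical illustration of filling @name@ place-holders."""
    # Lines starting with '#' are ignored when templates are filled in.
    kept = [line for line in template.splitlines(keepends=True)
            if not line.startswith("#")]
    text = "".join(kept)
    # Replace each @name@ with its value; unknown names are left as-is
    # (an assumption made for this sketch).
    return PLACEHOLDER.sub(lambda m: symbols.get(m.group(1), m.group(0)), text)


template = "# Symbols: @example-host@\n<p>Welcome to @example-host@.</p>\n"
print(fill_template(template, {"example-host": "localhost"}))
# <p>Welcome to localhost.</p>
```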
+ + A special application of this substitution mechanism is to make whole + blocks of HTML code disappear when a specific symbol is set. We use this + for many purposes, one of them being to include the beta warning in all + our user interface (CGI) pages when Privoxy + is in an alpha or beta development stage: + - Revision 1.123.2.27 2003/03/18 19:37:14 oes - s/Advanced|Radical/Adventuresome/g to avoid complaints re fun filter + + +<!-- @if-unstable-start --> - Revision 1.123.2.26 2003/03/17 16:50:53 oes - Added documentation for new chroot option + ... beta warning HTML code goes here ... - Revision 1.123.2.25 2003/03/15 18:36:55 oes - Adapted to the new filters +<!-- if-unstable-end@ --> + - Revision 1.123.2.24 2002/11/17 06:41:06 hal9 - Move default profiles table from FAQ to U-M, and other minor related changes. - Add faq on cookies. + + If the "unstable" symbol is set, everything in between and including + @if-unstable-start and if-unstable-end@ + will disappear, leaving nothing but an empty comment: + - Revision 1.123.2.23 2002/10/21 02:32:01 hal9 - Updates to the user.action examples section. A few new ones. + + <!-- --> + - Revision 1.123.2.22 2002/10/12 00:51:53 hal9 - Add demoronizer to filter section. + + There's also an if-then-else construct and an #include + mechanism, but you'll surely find out if you are inclined to edit the + templates ;-) + - Revision 1.123.2.21 2002/10/10 04:09:35 hal9 - s/Advanced/Radical/ and added very brief note. + + All templates refer to a style located at + http://config.privoxy.org/send-stylesheet. + This is, of course, locally served by Privoxy + and the source for it can be found and edited in the + cgi-style.css template. + - Revision 1.123.2.20 2002/10/10 03:49:21 hal9 - Add notes to session-cookies-only and Quickstart about pre-existing - cookies. Also, note content-cookies work differently. + - Revision 1.123.2.19 2002/09/26 01:25:36 hal9 - More explanation on Privoxy patterns, more on content-cookies and SSL.
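The block-removal trick for templates can be pictured as one more regular-expression substitution. The Python sketch below is a hypothetical illustration, not Privoxy's C implementation. It deletes everything from @if-NAME-start up to and including the matching if-NAME-end@. The function name is made up. The non-greedy match is a design choice of the sketch, so that two separate blocks for the same symbol are removed independently instead of swallowing the content between them. The if-then-else and #include mechanisms are not covered.

```python
import re


def remove_conditional_block(html: str, name: str) -> str:
    """Hypothetical sketch: drop every @if-<name>-start ... if-<name>-end@ block."""
    escaped = re.escape(name)
    block = re.compile(
        rf"@if-{escaped}-start.*?if-{escaped}-end@",
        re.DOTALL,  # the block usually spans several lines
    )
    return block.sub("", html)


page = (
    "<!-- @if-unstable-start -->\n"
    "<p>... beta warning HTML code goes here ...</p>\n"
    "<!-- if-unstable-end@ -->\n"
    "<p>Regular content</p>\n"
)
# Leaves only an empty HTML comment followed by the regular content.
print(remove_conditional_block(page, "unstable"))
```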
+ - Revision 1.123.2.18 2002/08/22 23:47:58 hal9 - Add 'Documentation' to Privoxy Menu shot in Configuration section to match - CGIs. - Revision 1.123.2.17 2002/08/18 01:13:05 hal9 - Spell checked (only one typo this time!). - Revision 1.123.2.16 2002/08/09 19:20:54 david__schmidt - Update to Mac OS X startup script name + - Revision 1.123.2.15 2002/08/07 17:32:11 oes - Converted some internal links from ulink to link for PDF creation; no content changed +Contacting the Developers, Bug Reporting and Feature +Requests - Revision 1.123.2.14 2002/08/06 09:16:13 oes - Nits re: actions file download + + &contacting; + - Revision 1.123.2.13 2002/08/02 18:23:19 g_sauthoff - Just 2 small corrections to the Gentoo sections + - Revision 1.123.2.12 2002/08/02 18:17:21 g_sauthoff - Added 2 Gentoo sections + - Revision 1.123.2.11 2002/07/26 15:20:31 oes - - Added version info to title - - Added info on new filters - - Revised parts of the filter file tutorial - - Added info on where to get updated actions files - Revision 1.123.2.10 2002/07/25 21:42:29 hal9 - Add brief notes on not proxying non-HTTP protocols. + +Privoxy Copyright, License and History - Revision 1.123.2.9 2002/07/11 03:40:28 david__schmidt + + ©right; + - Updated Mac OS X sections due to installation location change + +License + + &license; + + + - Revision 1.123.2.8 2002/06/09 16:36:32 hal9 - Clarifications on filtering and MIME. Hardcode 'latest release' in index.html. - Revision 1.123.2.7 2002/06/09 00:29:34 hal9 - Touch ups on filtering, in actions section and Anatomy. + - Revision 1.123.2.6 2002/06/06 23:11:03 hal9 - Fix broken link. Linkchecked all docs. +History + + &history; + + - Revision 1.123.2.5 2002/05/29 02:01:02 hal9 - This is break out of the entire config section from u-m, so it can - eventually be used to generate the comments, etc in the main config file - so that these are in sync with each other. 
+Authors + + &p-authors; + + - Revision 1.123.2.4 2002/05/27 03:28:45 hal9 - Ooops missed something from David. + - Revision 1.123.2.3 2002/05/27 03:23:17 hal9 - Fix FIXMEs for OS2 and Mac OS X startup. Fix Redhat typos (should be Red Hat). - That's a wrap, I think. + - Revision 1.123.2.2 2002/05/26 19:02:09 hal9 - Move Amiga stuff around to take of FIXME in start up section. - Revision 1.123.2.1 2002/05/26 17:04:25 hal9 - -Spellcheck, very minor edits, and sync across branches + +See Also + + &seealso; + + - Revision 1.123 2002/05/24 23:19:23 hal9 - Include new image (Proxy setup). More fun with guibutton. - Minor corrections/clarifications here and there. - Revision 1.122 2002/05/24 13:24:08 oes - Added Bookmarklet for one-click pre-filled access to show-url-info - Revision 1.121 2002/05/23 23:20:17 oes - - Changed more (all?) references to actions to the - style. - - Small fixes in the actions chapter - - Small clarifications in the quickstart to ad blocking - - Removed from s since the new doc CSS - renders them red (bad in TOC). +<!-- ~~~~~ New section ~~~~~ --> +<sect1 id="appendix"><title>Appendix - Revision 1.120 2002/05/23 19:16:43 roro - Correct Debian specials (installation and startup). - Revision 1.119 2002/05/22 17:17:05 oes - Added Security hint + + +Regular Expressions + + Privoxy uses Perl-style regular + expressions in its actions + files and filter file, + through the PCRE and + + PCRS libraries. + - Revision 1.118 2002/05/21 04:54:55 hal9 - -New Section: Quickstart to Ad Blocking - -Reformat Actions Anatomy to match new CGI layout + + If you are reading this, you probably don't understand what regular + expressions are, or what they can do. So this will be a very brief + introduction only. 
A full explanation would require a book ;-) + - Revision 1.117 2002/05/17 13:56:16 oes - - Reworked & extended Templates chapter - - Small changes to Regex appendix - - #included authors.sgml into (C) and hist chapter + + Regular expressions provide a language to describe patterns that can be + run against strings of characters (letters, numbers, etc.), to see if they + match the string or not. The patterns are themselves (sometimes complex) + strings of literal characters, combined with wild-cards, and other special + characters, called meta-characters. The meta-characters have + special meanings and are used to build complex patterns to be matched against. + Perl Compatible Regular Expressions are an especially convenient + dialect of the regular expression language. + - Revision 1.116 2002/05/17 03:23:46 hal9 - Fixing merge conflict in Quickstart section. + + To make a simple analogy, we do something similar when we use wild-card + characters when listing files with the dir command in DOS. + *.* matches all filenames. The special + character here is the asterisk which matches any and all characters. We can be + more specific and use ? to match just individual + characters. So dir file?.txt would match + file1.txt, file2.txt, etc. We are pattern + matching, using a similar technique to regular expressions! + - Revision 1.115 2002/05/16 16:25:00 oes - Extended the Filter File chapter & minor fixes + + Regular expressions do essentially the same thing, but are much, much more + powerful. There are many more special characters and ways of + building complex patterns, however. Let's look at a few of the common ones, + and then some examples: + - Revision 1.114 2002/05/16 09:42:50 oes - More ulink->link, added some hints to Quickstart section + + + . - Matches any single character, e.g. a, + A, 4, :, or @. + + - Revision 1.113 2002/05/15 21:07:25 oes - Extended and further commented the example actions files + + + ?
- The preceding character or expression is matched ZERO or ONE + times. Either/or. + + - Revision 1.112 2002/05/15 03:57:14 hal9 - Spell check. A few minor edits here and there for better syntax and - clarification. + + + + - The preceding character or expression is matched ONE or MORE + times. + + - Revision 1.111 2002/05/14 23:01:36 oes - Fixing the fixes + + + * - The preceding character or expression is matched ZERO or MORE + times. + + - Revision 1.110 2002/05/14 19:10:45 oes - Restored alphabetical order of actions + + + \ - The escape character denotes that + the following character should be taken literally. This is used where one of the + special characters (e.g. .) needs to be taken literally and + not as a special meta-character. Example: example\.com makes + sure the period is recognized only as a period (and not expanded to its + meta-character meaning of any single character). + + - Revision 1.109 2002/05/14 17:23:11 oes - Renamed the prevent-*-cookies actions, extended aliases section and moved it before the example AFs + + + [ ] - Characters enclosed in brackets will be matched if + any of the enclosed characters are encountered. For instance, [0-9] + matches any numeric digit (zero through nine). As an example, we can combine + this with + to match any digit one or more times: [0-9]+. + + - Revision 1.108 2002/05/14 15:29:12 oes - Completed proofreading the actions chapter + + + ( ) - parentheses are used to group a sub-expression, + or multiple sub-expressions. + + - Revision 1.107 2002/05/12 03:20:41 hal9 - Small clarifications for 127.0.0.1 vs localhost for listen-address since this - apparently an important distinction for some OS's. + + + | - The bar character works like an + or conditional statement. A match is successful if the + sub-expression on either side of | matches. As an example: + /(this|that) example/ uses grouping and the bar character + and would match either this example or that + example, and nothing else.
+ + - Revision 1.106 2002/05/10 01:48:20 hal9 - This is mostly proposed copyright/licensing additions and changes. Docs - are still GPL, but licensing and copyright are more visible. Also, copyright - changed in doc header comments (eliminate references to JB except FAQ). + + These are just some of the ones you are likely to use when matching URLs with + Privoxy, and they are a long way from a definitive + list. This is enough to get us started with a few simple examples which may + be more illuminating: + - Revision 1.105 2002/05/05 20:26:02 hal9 - Sorting out license vs copyright in these docs. + + /.*/banners/.* - A simple example + that uses the common combination of . and * to + denote any character, zero or more times. In other words, any string at all. + So we start with a literal forward slash, then our regular expression pattern + (.*), another literal forward slash, the string + banners, another forward slash, and lastly another + .*. We are building + a directory path here. This will match any file with the path that has a + directory named banners in it. The .* matches + any characters, and this could conceivably be more forward slashes, so it + might expand into a much longer-looking path. For example, this could match: + /eye/hate/spammers/banners/annoy_me_please.gif, or just + /banners/annoying.html, or almost an infinite number of other + possible combinations, as long as it has banners in the path + somewhere. + - Revision 1.104 2002/05/04 08:44:45 swa - bumped version + + And now something a little more complex: + - Revision 1.103 2002/05/04 00:40:53 hal9 - -Remove the TOC first page kludge. It's fixed proper now in ldp.dsl.in. - -Some minor additions to Quickstart. + + /.*/adv((er)?ts?|ertis(ing|ements?))?/ - + We have several literal forward slashes again (/), so we are + building another expression that is a file path statement. We have another + .*, so we are matching against any conceivable sub-path, as long as + it matches our expression. The only true literal that must
The only true literal that must + match our pattern is adv, together with + the forward slashes. What comes after the adv string is the + interesting part. + - Revision 1.102 2002/05/03 17:46:00 oes - Further proofread & reactivated short build instructions + + Remember the ? means the preceding expression (either a + literal character or anything grouped with (...) in this case) + can exist or not, since this means either zero or one match. So + ((er)?ts?|ertis(ing|ements?)) is optional, as are the + individual sub-expressions: (er), + (ing|ements?), and the s. The | + means or. We have two of those. For instance, + (ing|ements?), can expand to match either ing + OR ements?. What is being done here, is an + attempt at matching as many variations of advertisement, and + similar, as possible. So this would expand to match just adv, + or advert, or adverts, or + advertising, or advertisement, or + advertisements. You get the idea. But it would not match + advertizements (with a z). We could fix that by + changing our regular expression to: + /.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/, which would then match + either spelling. + - Revision 1.101 2002/05/03 03:58:30 hal9 - Move the user-manual config directive to top of section. Add note about - Privoxy needing read permissions for configs, and write for logs. + + /.*/advert[0-9]+\.(gif|jpe?g) - Again + another path statement with forward slashes. Anything in the square brackets + [ ] can be matched. This is using 0-9 as a + shorthand expression to mean any digit one through nine. It is the same as + saying 0123456789. So any digit matches. The + + means one or more of the preceding expression must be included. The preceding + expression here is what is in the square brackets -- in this case, any digit + one through nine. Then, at the end, we have a grouping: (gif|jpe?g). + This includes a |, so this needs to match the expression on + either side of that bar character also. 
A simple gif on one side, and the other + side will in turn match either jpeg or jpg, + since the ? means the letter e is optional and + can be matched once or not at all. So we are building an expression here to + match a GIF or JPEG image file. It must include the literal + string advert, then one or more digits, and a . + (which is now a literal, and not a special character, since it is escaped + with \), and lastly either gif, or + jpeg, or jpg. Some possible matches would + include: //advert1.jpg, + /nasty/ads/advert1234.gif, + /banners/from/hell/advert99.jpg. It would not match + advert1.gif (no leading slash), or + /adverts232.jpg (the expression does not include an + s), or /advert1.jsp (jsp is not + in the expression anywhere). + - Revision 1.100 2002/04/29 03:05:55 hal9 - Add clarification on differences of new actions files. + + We are barely scratching the surface of regular expressions here so that you + can understand the default Privoxy + configuration files, and maybe use this knowledge to customize your own + installation. There is much, much more that can be done with regular + expressions. Now that you know enough to get started, you can learn more on + your own :/ + - Revision 1.99 2002/04/28 16:59:05 swa - more structure in starting section + + More reading on Perl Compatible Regular expressions: + http://perldoc.perl.org/perlre.html + - Revision 1.98 2002/04/28 05:43:59 hal9 - This is the break up of configuration.html into multiple files. This - will probably break links elsewhere :( + + For information on regular expression based substitutions and their applications + in filters, please see the filter file tutorial + in this manual. + + - Revision 1.97 2002/04/27 21:04:42 hal9 - -Rewrite of Actions File example. - -Add section for user-manual directive in config. + - Revision 1.96 2002/04/27 05:32:00 hal9 - -Add short section to Filter Files to tie in with +filter action.
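The example patterns from the regular expression appendix can be checked quickly against the sample paths given in the text. This is a minimal sketch that assumes Python's `re` module behaves like PCRE for these simple constructs. It uses `re.search()`, which finds a match anywhere in the string, so it does not model how Privoxy anchors URL path patterns. The names BANNERS, ADV, ADV_Z, ADVERT_IMG and report are illustrative only.

```python
import re

# Patterns copied from the text. ADV_Z is the "either spelling" variant.
BANNERS = re.compile(r"/.*/banners/.*")
ADV = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
ADV_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")
ADVERT_IMG = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")


def report(pattern: re.Pattern, paths: list[str]) -> None:
    """Print whether each sample path matches the given pattern."""
    for path in paths:
        print(f"{pattern.pattern!r:50} {path!r:40} {bool(pattern.search(path))}")


report(ADV, ["/x/adv/", "/x/advertisements/", "/x/advertizements/"])
report(ADV_Z, ["/x/advertizements/"])
report(ADVERT_IMG, ["//advert1.jpg", "/nasty/ads/advert1234.gif",
                    "advert1.gif", "/adverts232.jpg", "/advert1.jsp"])
```

Trying a pattern like this against a handful of known-good and known-bad paths before putting it into an actions file is a cheap way to catch surprises. One example is the `z` spelling, which the original pattern misses.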
- -Start rewrite of examples in Actions Examples (not finished). - Revision 1.95 2002/04/26 17:23:29 swa - bookmarks cleaned, changed structure of user manual, screen and programlisting cleanups, and numerous other changes that I forgot + + +Privoxy's Internal Pages - Revision 1.94 2002/04/26 05:24:36 hal9 - -Add most of Andreas suggestions to Chain of Events section. - -A few other minor corrections and touch up. + + Since Privoxy proxies each requested + web page, it is easy for Privoxy to + trap certain special URLs. In this way, we can talk directly to + Privoxy, and see how it is + configured, see how our rules are being applied, change these + rules and other configuration options, and even turn + Privoxy's filtering off, all with + a web browser. - Revision 1.92 2002/04/25 18:55:13 hal9 - More catchups on new actions files, and new actions names. - Other assorted cleanups, and minor modifications. + - Revision 1.91 2002/04/24 02:39:31 hal9 - Add 'Chain of Events' section. + + The URLs listed below are the special ones that allow direct access + to Privoxy. Of course, + Privoxy must be running to access these. If + not, you will get a friendly error message. Internet access is not + necessary either. + - Revision 1.90 2002/04/23 21:41:25 hal9 - Linuxconf is deprecated on RH, substitute chkconfig. + + - Revision 1.89 2002/04/23 21:05:28 oes - Added hint for startup on Red Hat + + + Privoxy main page: + +
+ + http://config.privoxy.org/ + +
+ + There is a shortcut: http://p.p/ (But it + doesn't provide a fall-back to a real page, in case the request is not + sent through Privoxy) + +
- Revision 1.88 2002/04/23 05:37:54 hal9 - Add AmigaOS install stuff. + + + Show information about the current configuration, including viewing and + editing of actions files: + +
+ + http://config.privoxy.org/show-status + +
+
- Revision 1.87 2002/04/23 02:53:15 david__schmidt - Updated Mac OS X installation section - Added a few English tweaks here an there + + + Show the source code version numbers: + +
+ + http://config.privoxy.org/show-version + +
+
- Revision 1.86 2002/04/21 01:46:32 hal9 - Re-write actions section. + + + Show the browser's request headers: + +
+ + http://config.privoxy.org/show-request + +
+
- Revision 1.85 2002/04/18 21:23:23 hal9 - Fix ugly typo (mine). + + + Show which actions apply to a URL and why: + +
+ + http://config.privoxy.org/show-url-info + +
+
- Revision 1.84 2002/04/18 21:17:13 hal9 - Spell Redhat correctly (ie Red Hat). A few minor grammar corrections. + + + Toggle Privoxy on or off. This feature can be turned off/on in the main + config file. When toggled off, Privoxy + continues to run, but only as a pass-through proxy, with no actions taking + place: + +
+ + http://config.privoxy.org/toggle + +
+ + Shortcuts. Turn off, then on: +
+ + http://config.privoxy.org/toggle?set=disable + +
+
+ + http://config.privoxy.org/toggle?set=enable + +
+
- Revision 1.83 2002/04/18 18:21:12 oes - Added RPM install detail +
+
- Revision 1.82 2002/04/18 12:04:50 oes - Cosmetics + + These may be bookmarked for quick reference. See next. - Revision 1.81 2002/04/18 11:50:24 oes - Extended Install section - needs fixing by packagers + - Revision 1.80 2002/04/18 10:45:19 oes - Moved text to buildsource.sgml, renamed some filters, details + +Bookmarklets + + Below are some bookmarklets to allow you to easily access a + mini version of some of Privoxy's + special pages. They are designed for MS Internet Explorer, but should work + equally well in Netscape, Mozilla, and other browsers which support + JavaScript. They are designed to run directly from your bookmarks - not by + clicking the links below (although that should work for testing). + + + To save them, right-click the link and choose Add to Favorites + (IE) or Add Bookmark (Netscape). You will get a warning that + the bookmark may not be safe - just click OK. Then you can run the + Bookmarklet directly from your favorites/bookmarks. For even faster access, + you can put them on the Links bar (IE) or the Personal + Toolbar (Netscape), and run them with a single click. + - Revision 1.79 2002/04/18 03:18:06 hal9 - Spellcheck, and minor touchups. + + - Revision 1.78 2002/04/17 18:04:16 oes - Proofreading part 2 + + + Privoxy - Enable + + - Revision 1.77 2002/04/17 13:51:23 oes - Proofreading, part one + + + Privoxy - Disable + + - Revision 1.76 2002/04/16 04:25:51 hal9 - -Added 'Note to Upgraders' and re-ordered the 'Quickstart' section. - -Note about proxy may need requests to re-read config files. + + + Privoxy - Toggle Privoxy (Toggles between enabled and disabled) + + - Revision 1.75 2002/04/12 02:08:48 david__schmidt - Remove OS/2 building info... it is already in the developer-manual + + + Privoxy- View Status + + + + + + Privoxy - Why? + + + + - Revision 1.74 2002/04/11 00:54:38 hal9 - Add small section on submitting actions. + + Credit: The site which gave us the general idea for these bookmarklets is + www.bookmarklets.com. 
They + have more information about bookmarklets. + - Revision 1.73 2002/04/10 18:45:15 swa - generated - Revision 1.72 2002/04/10 04:06:19 hal9 - Added actions feedback to Bookmarklets section + - Revision 1.71 2002/04/08 22:59:26 hal9 - Version update. Spell chkconfig correctly :) +
- Revision 1.70 2002/04/08 20:53:56 swa - ? - Revision 1.69 2002/04/06 05:07:29 hal9 - -Add privoxy-man-page.sgml, for man page. - -Add authors.sgml for AUTHORS (and p-authors.sgml) - -Reworked various aspects of various docs. - -Added additional comments to sub-docs. + + +Chain of Events + + Let's take a quick look at how some of Privoxy's + core features are triggered, and the ensuing sequence of events when a web + page is requested by your browser: + - Revision 1.68 2002/04/04 18:46:47 swa - consistent look. reuse of copyright, history et. al. + + + + + First, your web browser requests a web page. The browser knows to send + the request to Privoxy, which will, in turn, + relay the request to the remote web server after passing the following + tests: + + + + + Privoxy traps any request for its own internal CGI + pages (e.g. http://p.p/) and sends the CGI page back to the browser. + + + + + Next, Privoxy checks to see if the URL + matches any +block patterns. If + so, the URL is then blocked, and the remote web server will not be contacted. + +handle-as-image + and + +handle-as-empty-document + are then checked, and if there is no match, an + HTML BLOCKED page is sent back to the browser. Otherwise, if + it does match, an image is returned for the former, and an empty text + document for the latter. The type of image depends on the setting of + +set-image-blocker + (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere). + + + + + Untrusted URLs are blocked. If URLs are being added to the + trust file, then that is done. + + + + + If the URL pattern matches the +fast-redirects action, + it is then processed. Unwanted parts of the requested URL are stripped. + + + + + Now the rest of the client browser's request headers are processed. If any + of these match any of the relevant actions (e.g. +hide-user-agent, + etc.), headers are suppressed or forged as determined by these actions and + their parameters.
+ + + + + Now the web server starts sending its response back (i.e. typically a web + page). + + + + + First, the server headers are read and processed to determine, among other + things, the MIME type (document type) and encoding. The headers are then + filtered as determined by the + +crunch-incoming-cookies, + +session-cookies-only, + and +downgrade-http-version + actions. + + + + + If any +filter action + or +deanimate-gifs + action applies (and the document type fits the action), the rest of the page is + read into memory (up to a configurable limit). Then the filter rules (from + default.filter and any other filter files) are + processed against the buffered content. Filters are applied in the order + they are specified in one of the filter files. Animated GIFs, if present, + are reduced to either the first or last frame, depending on the action + setting. The entire page, which is now filtered, is then sent by + Privoxy back to your browser. + + + If neither a +filter action + nor +deanimate-gifs + matches, then Privoxy passes the raw data through + to the client browser as it becomes available. + + + + + As the browser receives the now (possibly filtered) page content, it + reads and then requests any URLs that may be embedded within the page + source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g. + frames), sounds, etc. For each of these objects, the browser issues a + separate request (this is easily viewable in Privoxy's + logs). And each such request is in turn processed just as above. Note that a + complex web page will have many, many such embedded URLs. If these + secondary requests are to a different server, then quite possibly a very + different set of actions is triggered. + + - Revision 1.67 2002/04/04 17:27:57 swa - more single file to be included at multiple points. make maintaining easier + + + + NOTE: This is a somewhat simplistic overview of what happens with each URL + request.
For the sake of brevity and simplicity, we have focused on + Privoxy's core features only. + - Revision 1.66 2002/04/04 06:48:37 hal9 - Structural changes to allow for conditional inclusion/exclusion of content - based on entity toggles, e.g. 'entity % p-not-stable "INCLUDE"'. And - definition of internal entities, e.g. 'entity p-version "2.9.13"' that will - eventually be set by Makefile. - More boilerplate text for use across multiple docs. + - Revision 1.65 2002/04/03 19:52:07 swa - enhance squid section due to user suggestion - Revision 1.64 2002/04/03 03:53:43 hal9 - A few minor bug fixes, and touch ups. Ready for review. + + +Troubleshooting: Anatomy of an Action - Revision 1.63 2002/04/01 16:24:49 hal9 - Define entities to include boilerplate text. See doc/source/*. + + The way Privoxy applies + actions and filters + to any given URL can be complex, and it is not always + easy to understand what is happening. And sometimes we need to be able to + see just what Privoxy is + doing, especially if something Privoxy is doing + is inadvertently causing us a problem. It can be a little daunting to look at + the actions and filters files themselves, since they tend to be filled with + regular expressions whose consequences are not + always so obvious. + - Revision 1.62 2002/03/30 04:15:53 hal9 - - Fix privoxy.org/config links. - - Paste in Bookmarklets from Toggle page. - - Move Quickstart nearer top, and minor rework. + + One quick test to see if Privoxy is causing a problem + or not is to disable it temporarily. This should be the first troubleshooting + step. See the Bookmarklets section for a quick + and easy way to do this (be sure to flush caches afterward!). Looking at the + logs is a good idea too. (Note that both the toggle feature and logging are + enabled via config file settings, and may need to be + turned on.)
+ + Another easy troubleshooting step: if you have done any + customization of your installation, revert to the installed + defaults and see if that helps. Sometimes the developers get complaints + about one thing or another, and the problem turns out to be a customized + configuration issue. + - Revision 1.61 2002/03/29 01:31:08 hal9 - Minor update. + + Privoxy also provides the + http://config.privoxy.org/show-url-info + page that can show us very specifically how actions + are being applied to any given URL. This is a big help for troubleshooting. + - Revision 1.60 2002/03/27 01:57:34 hal9 - Added more to Anatomy section. + + First, enter one URL (or partial URL) at the prompt, and then + Privoxy will tell us + how the current configuration will handle it. This will not + help with filtering effects (i.e. the +filter action) from + one of the filter files since this is handled very + differently and not so easy to trap! It also will not tell you about any other + URLs that may be embedded within the URL you are testing. For instance, images + such as ads are expressed as URLs within the raw page source of HTML pages. So + you will only get info for the actual URL that is pasted into the prompt area + -- not any sub-URLs. If you want to know about embedded URLs like ads, you + will have to dig those out of the HTML source. Use your browser's View + Page Source option for this. Or right-click on the ad, and grab the + URL. + - Revision 1.59 2002/03/27 00:54:33 hal9 - Touch up intro for new name. + + Let's try an example, google.com, + and look at it one section at a time in a sample configuration (your real + configuration may vary): + - Revision 1.58 2002/03/26 22:29:55 swa - we have a new homepage! + + + Matches for http://www.google.com: - Revision 1.57 2002/03/24 20:33:30 hal9 - A few minor catch ups with name change. + In file: default.action [ View ] [ Edit ] - Revision 1.56 2002/03/24 16:17:06 swa - configure needs to be generated.
+ {+change-x-forwarded-for{block} + +deanimate-gifs {last} + +fast-redirects {check-decoded-url} + +filter {refresh-tags} + +filter {img-reorder} + +filter {banners-by-size} + +filter {webbugs} + +filter {jumping-windows} + +filter {ie-exploits} + +hide-from-header {block} + +hide-referrer {forge} + +session-cookies-only + +set-image-blocker {pattern} +/ - Revision 1.55 2002/03/24 16:08:08 swa - we are too lazy to make a block-built - privoxy logo. hence removed the option. + { -session-cookies-only } + .google.com - Revision 1.54 2002/03/24 15:46:20 swa - name change related issue. + { -fast-redirects } + .google.com - Revision 1.53 2002/03/24 11:51:00 swa - name change. changed filenames. +In file: user.action [ View ] [ Edit ] +(no matches in this file) + + - Revision 1.52 2002/03/24 11:01:06 swa - name change + + This is telling us how we have defined our + actions, and + which ones match for our test case, google.com. + Displayed are all the actions that are available to us. Remember, + the + sign denotes on. - + denotes off. So some are on here, but many + are off. Each example we try may provide a slightly different + end result, depending on our configuration directives. + + + The first listing + is for our default.action file. The large, multi-line + listing is how the actions are set to match for all URLs, i.e. our default + settings. If you look at your actions file, this would be the + section just below the aliases section near the top. This + will apply to all URLs as signified by the single forward slash at the end + of the listing -- / . + - Revision 1.51 2002/03/23 15:13:11 swa - renamed every reference to the old name with foobar. - fixed "application foobar application" tag, fixed - "the foobar" with "foobar". left junkbustser in cvs - comments and remarks to history untouched.
+ + But we have defined additional actions that would be exceptions to these general + rules, and then we list specific URLs (or patterns) that these exceptions + would apply to. Last match wins. Just below this then are two explicit + matches for .google.com. The first is negating our previous + cookie setting, which was for +session-cookies-only + (i.e. not persistent). So we will allow persistent cookies for google, at + least that is how it is in this example. The second turns + off any +fast-redirects + action, allowing this to take place unmolested. Note that there is a leading + dot here -- .google.com. This will match any hosts and + sub-domains in the google.com domain, such as + www.google.com or mail.google.com. But it would not + match www.google.de! So, apparently, we have these two actions + defined as exceptions to the general rules at the top somewhere in the lower + part of our default.action file, and + google.com is referenced somewhere in these latter sections. + - Revision 1.50 2002/03/23 05:06:21 hal9 - Touch up. + + Then, for our user.action file, we again have no hits. + So there is nothing google-specific that we might have added to our own, local + configuration. If there were, those actions would over-rule any actions from + previously processed files, such as default.action. + user.action typically has the last word. This is the + best place to put hard and fast exceptions. + - Revision 1.49 2002/03/21 17:01:05 hal9 - New section in Appendix. + + And finally we pull it all together in the bottom section and summarize how + Privoxy is applying all its actions + to google.com: - Revision 1.48 2002/03/12 06:33:01 hal9 - Catching up to Andreas and re_filterfile changes. + - Revision 1.47 2002/03/11 13:13:27 swa - correct feedback channels + + - Revision 1.46 2002/03/10 00:51:08 hal9 - Added section on JB internal pages in Appendix.
+ Final results: - Revision 1.45 2002/03/09 17:43:53 swa - more distros + -add-header + -block + +change-x-forwarded-for{block} + -client-header-filter{hide-tor-exit-notation} + -content-type-overwrite + -crunch-client-header + -crunch-if-none-match + -crunch-incoming-cookies + -crunch-outgoing-cookies + -crunch-server-header + +deanimate-gifs {last} + -downgrade-http-version + -fast-redirects + -filter {js-events} + -filter {content-cookies} + -filter {all-popups} + -filter {banners-by-link} + -filter {tiny-textforms} + -filter {frameset-borders} + -filter {demoronizer} + -filter {shockwave-flash} + -filter {quicktime-kioskmode} + -filter {fun} + -filter {crude-parental} + -filter {site-specifics} + -filter {js-annoyances} + -filter {html-annoyances} + +filter {refresh-tags} + -filter {unsolicited-popups} + +filter {img-reorder} + +filter {banners-by-size} + +filter {webbugs} + +filter {jumping-windows} + +filter {ie-exploits} + -filter {google} + -filter {yahoo} + -filter {msn} + -filter {blogspot} + -filter {no-ping} + -force-text-mode + -handle-as-empty-document + -handle-as-image + -hide-accept-language + -hide-content-disposition + +hide-from-header {block} + -hide-if-modified-since + +hide-referrer {forge} + -hide-user-agent + -limit-connect + -overwrite-last-modified + -prevent-compression + -redirect + -server-header-filter{xml-to-html} + -server-header-filter{html-to-xml} + -session-cookies-only + +set-image-blocker {pattern} + - Revision 1.44 2002/03/09 17:08:48 hal9 - New section on Jon's actions file editor, and move some stuff around. + + Notice that the only differences from the previous listing are in + fast-redirects and session-cookies-only, + which are deactivated specifically for this site in our configuration, + and thus show as off (-) in the Final Results. + - Revision 1.43 2002/03/08 00:47:32 hal9 - Added imageblock{pattern}.
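The leading-dot rule for .google.com can be illustrated with a tiny host check. This is a hypothetical, much-simplified sketch and not Privoxy's matcher. It assumes that a leading dot means "this domain or any host or sub-domain inside it", and that the bare domain itself also matches. Privoxy's real host patterns support further forms (such as `ad*.`) that are not modelled here.

```python
def host_matches(pattern: str, host: str) -> bool:
    """Hypothetical check of a host against a simple domain pattern.

    A leading dot (".google.com") matches the domain itself (an assumption)
    and any host or sub-domain under it. Other patterns must match exactly.
    """
    host, pattern = host.lower(), pattern.lower()
    if pattern.startswith("."):
        domain = pattern[1:]
        return host == domain or host.endswith("." + domain)
    return host == pattern


for host in ("www.google.com", "mail.google.com", "www.google.de"):
    print(host, host_matches(".google.com", host))
```

Comparing against `"." + domain` rather than the bare domain is what keeps a host such as notgoogle.com from matching by accident.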
+ + Now another example, ad.doubleclick.net: + - Revision 1.42 2002/03/07 18:16:55 swa - looks better + + - Revision 1.41 2002/03/07 16:46:43 hal9 - Fix a few markup problems for jade. + { +block{Domains starts with "ad"} } + ad*. - Revision 1.40 2002/03/07 16:28:39 swa - provide correct feedback channels + { +block{Domain contains "ad"} } + .ad. - Revision 1.39 2002/03/06 16:19:28 hal9 - Note on perceived filtering slowdown per FR. + { +block{Doubleclick banner server} +handle-as-image } + .[a-vx-z]*.doubleclick.net + + - Revision 1.38 2002/03/05 23:55:14 hal9 - Stupid I did it again. Double hyphen in comment breaks jade. + + We'll just show the interesting part here - the explicit matches. It is + matched three different times. Two +block{} sections, + and a +block{} +handle-as-image, + which is the expanded form of one of our aliases that had been defined as: + +block-as-image. (Aliases are defined in + the first section of the actions file and typically used to combine more + than one action.) + - Revision 1.37 2002/03/05 23:53:49 hal9 - jade barfs on '- -' embedded in comments. - -user option broke it. + + Any one of these would have done the trick and blocked this as an unwanted + image. This is unnecessarily redundant since the last case effectively + would also cover the first. No point in taking chances with these guys + though ;-) Note that if you want an ad or obnoxious + URL to be invisible, it should be defined as ad.doubleclick.net + is done here -- as both a +block{} + and a + +handle-as-image. + The custom alias +block-as-image just + simplifies the process and makes it more readable. + - Revision 1.36 2002/03/05 22:53:28 hal9 - Add new - - user option. + + One last example. Let's try http://www.example.net/adsl/HOWTO/. + This one is giving us problems. We are getting a blank page. Hmm ... + - Revision 1.35 2002/03/05 00:17:27 hal9 - Added section on command line options.
+ + - Revision 1.34 2002/03/04 19:32:07 oes - Changed default port to 8118 + Matches for http://www.example.net/adsl/HOWTO/: - Revision 1.33 2002/03/03 19:46:13 hal9 - Emphasis on where/how to report bugs, etc + In file: default.action [ View ] [ Edit ] - Revision 1.32 2002/03/03 09:26:06 joergs - AmigaOS changes, config is now loaded from PROGDIR: instead of - AmiTCP:db/junkbuster/ if no configuration file is specified on the - command line. + {-add-header + -block + +change-x-forwarded-for{block} + -client-header-filter{hide-tor-exit-notation} + -content-type-overwrite + -crunch-client-header + -crunch-if-none-match + -crunch-incoming-cookies + -crunch-outgoing-cookies + -crunch-server-header + +deanimate-gifs + -downgrade-http-version + +fast-redirects {check-decoded-url} + -filter {js-events} + -filter {content-cookies} + -filter {all-popups} + -filter {banners-by-link} + -filter {tiny-textforms} + -filter {frameset-borders} + -filter {demoronizer} + -filter {shockwave-flash} + -filter {quicktime-kioskmode} + -filter {fun} + -filter {crude-parental} + -filter {site-specifics} + -filter {js-annoyances} + -filter {html-annoyances} + +filter {refresh-tags} + -filter {unsolicited-popups} + +filter {img-reorder} + +filter {banners-by-size} + +filter {webbugs} + +filter {jumping-windows} + +filter {ie-exploits} + -filter {google} + -filter {yahoo} + -filter {msn} + -filter {blogspot} + -filter {no-ping} + -force-text-mode + -handle-as-empty-document + -handle-as-image + -hide-accept-language + -hide-content-disposition + +hide-from-header{block} + +hide-referer{forge} + -hide-user-agent + -overwrite-last-modified + +prevent-compression + -redirect + -server-header-filter{xml-to-html} + -server-header-filter{html-to-xml} + +session-cookies-only + +set-image-blocker{blank} } + / - Revision 1.31 2002/03/02 22:45:52 david__schmidt - Just tweaking + { +block{Path contains "ads".} +handle-as-image } + /ads + + - Revision 1.30 2002/03/02 22:00:14 hal9 - Updated 'New 
Features' list. Ran through spell-checker. + + Oops, the /adsl/ is matching /ads in our + configuration! But we did not want this at all! Now we see why we get the + blank page. It is actually triggering two different actions here, and + the effects are aggregated so that the URL is blocked, and &my-app; is told + to treat the block as if it were an image. But this is, of course, all wrong. + We could now add a new action below this (or better in our own + user.action file) that explicitly + unblocks ( + {-block}) paths with + adsl in them (remember, the last match in the configuration + wins). There are various ways to handle such exceptions. Example: + - Revision 1.29 2002/03/02 20:34:07 david__schmidt - Update OS/2 build section + + - Revision 1.28 2002/02/24 14:34:24 jongfoster - Formatting changes. Now changing the doctype to DocBook XML 4.1 - will work - no other changes are needed. + { -block } + /adsl + + - Revision 1.27 2002/01/11 14:14:32 hal9 - Added a very short section on Templates + + Now the page displays ;-) + Remember to flush your browser's caches when making these kinds of changes to + your configuration to ensure that you get a freshly delivered page! Or, try + using Shift+Reload. + - Revision 1.26 2002/01/09 20:02:50 hal9 - Fix bug re: auto-detect config file changes. + + But now what about a situation where we get no explicit matches like + we did with: + - Revision 1.25 2002/01/09 18:20:30 hal9 - Touch ups for *.action files. + + - Revision 1.24 2001/12/02 01:13:42 hal9 - Fix typo. + { +block{Path starts with "ads".} +handle-as-image } + /ads + + - Revision 1.23 2001/12/02 00:20:41 hal9 - Updates for recent changes. + + That actually was very helpful and pointed us quickly to where the problem + was. If you don't get this kind of match, then it means one of the default + rules in the first section of default.action is causing + the problem. This would require some guesswork, and maybe a little trial and + error, to isolate the offending rule.
One likely cause would be one of the + +filter actions. + These tend to be harder to troubleshoot. + Try adding the URL for the site to one of the aliases that turn off + +filter: + - Revision 1.22 2001/11/05 23:57:51 hal9 - Minor update for startup now daemon mode. + + - Revision 1.21 2001/10/31 21:11:03 hal9 - Correct 2 minor errors + { shop } + .quietpc.com + .worldpay.com # for quietpc.com + .jungle.com + .scan.co.uk + .forbes.com + + - Revision 1.18 2001/10/24 18:45:26 hal9 - *** empty log message *** + + { shop } is an alias that expands to + { -filter -session-cookies-only }. + Or you could add your own exception to disable filtering: - Revision 1.17 2001/10/24 17:10:55 hal9 - Catching up with Jon's recent work, and a few other things. + - Revision 1.16 2001/10/21 17:19:21 swa - wrong url in documentation + + - Revision 1.15 2001/10/14 23:46:24 hal9 - Various minor changes. Fleshed out SEE ALSO section. + { -filter } + # Disable ALL filter actions for sites in this section + .forbes.com + developer.ibm.com + localhost + + - Revision 1.13 2001/10/10 17:28:33 hal9 - Very minor changes. + + This would turn off all filtering for these sites. This is best + put in user.action, for local site + exceptions. Note that when a simple domain pattern is used by itself (without + the subsequent path portion), all sub-pages within that domain are included + automatically in the scope of the action. + - Revision 1.12 2001/09/28 02:57:04 hal9 - Ditto :/ + + Images that are inexplicably being blocked may well be hitting the ++filter{banners-by-size} + rule, which assumes + that images of certain sizes are ad banners (this works well + most of the time, since these tend to be standardized). + - Revision 1.11 2001/09/28 02:25:20 hal9 - Ditto. + + { fragile } is an alias that disables the + actions that are most likely to cause trouble. This can be used as a + last resort for problem sites. + + + - Revision 1.9 2001/09/27 23:50:29 hal9 - A few changes.
A short section on regular expression in appendix. + { fragile } + # Handle with care: easy to break + mail.google. + mybank.example.com + - Revision 1.8 2001/09/25 00:34:59 hal9 - Some additions, and re-arranging. - Revision 1.7 2001/09/24 14:31:36 hal9 - Diddling. + + Remember to flush caches! Note that the + mail.google reference lacks the TLD portion (e.g. + .com). This will effectively match mail.google + under any TLD, such as mail.google.de, + for example. + + + If this still does not work, you will have to go through the remaining + actions one by one to find which of them is causing the problem. + - Revision 1.6 2001/09/24 14:10:32 hal9 - Including David's OS/2 installation instructions. + - Revision 1.2 2001/09/13 15:27:40 swa - cosmetics + - Revision 1.1 2001/09/12 15:36:41 swa - source files for junkbuster documentation +
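The effect of a leading or trailing dot in a host pattern such as mail.google. or .forbes.com can be illustrated with a small Python model that works on dot-separated labels. This is a simplified sketch under stated assumptions, not Privoxy's matching code. The function name `host_matches` is hypothetical, and wildcard patterns such as ad*. or .[a-vx-z]*.doubleclick.net are deliberately not modelled.

```python
# Hypothetical, simplified model of host-pattern matching on dot-separated
# labels. A leading dot leaves the pattern open on the left (".forbes.com"
# also matches "www.forbes.com"); a trailing dot leaves it open on the
# right ("mail.google." matches mail.google under any TLD). Wildcards are
# not modelled.
def host_matches(pattern: str, host: str) -> bool:
    open_left = pattern.startswith(".")
    open_right = pattern.endswith(".")
    want = pattern.strip(".").lower().split(".")
    have = host.lower().split(".")
    n = len(want)
    if open_left and open_right:
        # Pattern labels may appear anywhere inside the host name.
        return any(have[i:i + n] == want for i in range(len(have) - n + 1))
    if open_left:
        return have[-n:] == want  # host must end with the pattern labels
    if open_right:
        return have[:n] == want   # host must start with the pattern labels
    return have == want           # neither side open: exact host only


if __name__ == "__main__":
    for host in ("mail.google.com", "mail.google.de", "www.google.com"):
        print(host, host_matches("mail.google.", host))
```

Comparing whole labels rather than raw substrings is what keeps .forbes.com from matching an unrelated host like notforbes.com.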