X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fsource%2Fuser-manual.sgml;h=ef21bd299b4a97f7363d096416ed4b4f83bcf755;hp=408a61281a326995a3692f6833dd2e2d47040a2f;hb=f6d1a7ca82613239a15439cc9b3613750d5f55c5;hpb=0428133610c525457cb16f7ac6a54203a2743d6c
diff --git a/doc/source/user-manual.sgml b/doc/source/user-manual.sgml
index 408a6128..ef21bd29 100644
--- a/doc/source/user-manual.sgml
+++ b/doc/source/user-manual.sgml
@@ -11,8 +11,8 @@
-
-
+
+
@@ -34,9 +34,9 @@
This file belongs into
ijbswa.sourceforge.net:/home/groups/i/ij/ijbswa/htdocs/
- $Id: user-manual.sgml,v 2.134 2011/08/18 11:45:02 fabiankeil Exp $
+ $Id: user-manual.sgml,v 2.159 2013/01/09 15:03:06 fabiankeil Exp $
- Copyright (C) 2001-2011 Privoxy Developers http://www.privoxy.org/
+ Copyright (C) 2001-2013 Privoxy Developers http://www.privoxy.org/
See LICENSE.
========================================================================
@@ -55,12 +55,12 @@
- Copyright &my-copy; 2001-2011 by
+ Copyright &my-copy; 2001-2013 by
Privoxy Developers
-$Id: user-manual.sgml,v 2.134 2011/08/18 11:45:02 fabiankeil Exp $
+$Id: user-manual.sgml,v 2.159 2013/01/09 15:03:06 fabiankeil Exp $
Mac OS X
- Unzip the downloaded file (you can either double-click on the zip file
- icon from the Finder, or from the desktop if you downloaded it there).
- Then, double-click on the package installer icon and follow the
- installation process.
+ Installation instructions for the OS X platform depend upon whether
+ you downloaded a ready-built installation package (.pkg or .mpkg) or have
+ downloaded the source code.
+
+
+
+Installation from ready-built package
+
+ The downloaded file will either be a .pkg (for OS X 10.5 upwards) or a bzipped
+ .mpkg file (for OS X 10.4). The former can be double-clicked as is and the
+ installation will start; double-clicking the latter will unzip the .mpkg file
+ which can then be double-clicked to commence the installation.
+
+
+ The privoxy service will automatically start after a successful installation
+ (and thereafter every time your computer starts up). However, you will need to
+ configure your web browser(s) to use it. To do so, configure them to use a
+ proxy for HTTP and HTTPS at the address 127.0.0.1:8118.
+
+
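The proxy setting described above applies to any HTTP client, not only browsers. As a minimal sketch, assuming Python's standard urllib is the client and Privoxy listens on the address given above, a script could be pointed at the proxy as follows. The helper names privoxy_proxies and build_privoxy_opener are made up for this illustration.

```python
import urllib.request

# Address taken from the installation notes above.
PRIVOXY_ADDRESS = "127.0.0.1:8118"


def privoxy_proxies(address=PRIVOXY_ADDRESS):
    """Map both schemes to the same proxy, as the manual asks for HTTP and HTTPS."""
    proxy_url = f"http://{address}"
    return {"http": proxy_url, "https": proxy_url}


def build_privoxy_opener(address=PRIVOXY_ADDRESS):
    """Return an opener whose requests are sent through the given proxy."""
    handler = urllib.request.ProxyHandler(privoxy_proxies(address))
    return urllib.request.build_opener(handler)


opener = build_privoxy_opener()
# With a running Privoxy, opener.open("http://config.privoxy.org/") should be
# answered by Privoxy itself; no request is made here so the sketch runs offline.
```

Building the opener does not contact the proxy; a connection is only attempted when a URL is opened.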
+ To prevent the privoxy service from automatically starting when your computer
+ starts up, remove or rename the file /Library/LaunchDaemons/org.ijbswa.privoxy.plist
+ (on OS X 10.5 and higher) or the folder named
+ /Library/StartupItems/Privoxy (on OS X 10.4 'Tiger').
+
+
+ To manually start or stop the privoxy service, use the scripts startPrivoxy.sh
+ and stopPrivoxy.sh supplied in /Applications/Privoxy. They must be run from an
+ administrator account, using sudo.
+
+
+ To uninstall, run /Applications/Privoxy/uninstall.command with sudo from an
+ administrator account.
+
+
+
+Installation from source
+
+ To build and install the Privoxy source code on OS X you will need to obtain
+ the macsetup module from the Privoxy SourceForge CVS repository (refer to
+ SourceForge help for details of how to set up a CVS client to have read-only
+ access to the repository). This module contains scripts that leverage the usual
+ open-source tools (available as part of Apple's free-of-charge Xcode
+ distribution or via the usual open-source software package managers for OS X
+ (MacPorts, Homebrew, Fink etc.)) to build and then install the privoxy binary
+ and associated files. The macsetup module's README file contains complete
+ instructions for its use.
+
+
+ The privoxy service will automatically start after a successful installation
+ (and thereafter every time your computer starts up). However, you will need to
+ configure your web browser(s) to use it. To do so, configure them to use a
+ proxy for HTTP and HTTPS at the address 127.0.0.1:8118.
- The privoxy service will automatically start after a successful
- installation (in addition to every time your computer starts up). To
- prevent the privoxy service from automatically starting when your
- computer starts up, remove or rename the folder named
- /Library/StartupItems/Privoxy .
+ To prevent the privoxy service from automatically starting when your computer
+ starts up, remove or rename the file /Library/LaunchDaemons/org.ijbswa.privoxy.plist
+ (on OS X 10.5 and higher) or the folder named
+ /Library/StartupItems/Privoxy (on OS X 10.4 'Tiger').
To manually start or stop the privoxy service, use the Privoxy Utility
- for Mac OS X. This application controls the privoxy service (e.g.
- starting and stopping the service as well as uninstalling the software).
+ for Mac OS X (also part of the macsetup module). This application can start
+ and stop the privoxy service and display its log and configuration files.
+
+
+ To uninstall, run the macsetup module's uninstall.sh with sudo from an
+ administrator account.
@@ -402,13 +454,6 @@ How to install the binary packages depends on your operating system:
Keeping your Installation Up-to-Date
-
- As user feedback comes in and development continues, we will make updated versions
- of both the main actions file (as a separate
- package ) and the software itself (including the actions file) available for
- download.
-
If you wish to receive an email notification whenever we release updates of
@@ -437,642 +482,1158 @@ How to install the binary packages depends on your operating system:
What's New in this Release
- Privoxy 3.0.17 is a stable release.
- The changes since 3.0.16 stable are:
+ Privoxy 3.0.19 is a stable release.
+ The changes since 3.0.18 stable are:
-
-
- Fixed last-chunk-detection for responses where the content was small
- enough to be read with the body, causing Privoxy to wait for the
- end of the content until the server closed the connection or the
- request timed out. Reported by "Karsten" in #3028326.
-
-
-
-
- Responses with status code 204 weren't properly detected as body-less
- like RFC2616 mandates. Like the previous bug, this caused Privoxy to
- wait for the end of the content until the server closed the connection
- or the request timed out. Fixes #3022042 and #3025553, reported by a
- user with no visible name. Most likely also fixes a bunch of other
- AJAX-related problem reports that got closed in the past due to
- insufficient information and lack of feedback.
-
-
-
-
- Fixed an ACL bug that made it impossible to build a blacklist.
- Usually the ACL directives are used in a whitelist, which worked
- as expected, but blacklisting is still useful for public proxies
- where one only needs to deny known abusers access.
-
-
-
-
- Added LOG_LEVEL_RECEIVED to log the not-yet-parsed data read from the
- network. This should make debugging various parsing issues a lot easier.
-
-
-
-
- The IPv6 code is enabled by default on Windows versions that support it.
- Patch submitted by oCameLo in #2942729.
-
-
-
-
- In mingw32 versions, the user.filter file is reachable through the
- GUI, just like default.filter is. Feature request 3040263.
-
-
-
-
- Added the configure option --enable-large-file-support to set a few
- defines that are required by platforms like GNU/Linux to support files
- larger then 2GB. Mainly interesting for users without proper logfile
- management.
-
-
-
-
- Logging with "debug 16" no longer stops at the first nul byte which is
- pretty useless. Non-printable characters are replaced with their hex value
- so the result can't span multiple lines making parsing them harder then
- necessary.
-
-
-
+
- Privoxy logs when reading an action, filter or trust file.
+ Bug fixes:
+
+
+
+ Prevent a segmentation fault when de-chunking buffered content.
+ It could be triggered by malicious web servers if Privoxy was
+ configured to filter the content and running on a platform
+ where SIZE_T_MAX isn't larger than UINT_MAX, which probably
+ includes most 32-bit systems. On those platforms, all Privoxy
+ versions before 3.0.19 appear to be affected.
+ To be on the safe side, this bug should be presumed to allow
+ code execution as proving that it doesn't seems unrealistic.
+
+
+
+
+ Do not expect a response from the SOCKS4/4A server until it
+ has received something to respond to. This regression was introduced
+ in 3.0.18 and prevented the SOCKS4/4A negotiation from working.
+ Reported by qqqqqw in #3459781.
+
+
+
- Fixed incorrect regression test markup which caused a test in
- 3.0.16 to fail while Privoxy itself was working correctly.
- While Privoxy accepts hide-referer, too, the action name is actually
- hide-referrer which is also the name used one the final results page,
- where the test expected the alias.
+ General improvements:
+
+
+
+ Fix an off-by-one in an error message about connect failures.
+
+
+
+
+ Use a GNUMakefile variable for the webserver root directory and
+ update the path. SourceForge changed it, which broke various
+ web-related targets.
+
+
+
+
+ Update the CODE_STATUS description.
+
+
+
+
+
+
+
+ The following changes were made between 3.0.17 and 3.0.18:
+
+
+
+
- CGI interface improvements:
+ Bug fixes:
- In finish_http_response(), continue to add the 'Connection: close'
- header if the client connection will not be kept alive.
- Anonymously pointed out in #2987454.
+ If a generated redirect URL contains characters RFC 3986 doesn't
+ permit, they are (re)encoded. Not doing this makes Privoxy versions
+ from 3.0.5 to 3.0.17 susceptible to HTTP response splitting (CWE-113)
+ attacks if the +fast-redirects{check-decoded-url} action is used.
- Apostrophes in block messages no longer cause parse errors
- when the blocked page is viewed with JavaScript enabled.
- Reported by dg1727 in #3062296.
+ Fix a logic bug that could cause Privoxy to reuse a server
+ socket after it got tainted by a server-header-tagger-induced
+ block that was triggered before the whole server response had
+ been read. If keep-alive was enabled and the request following
+ the blocked one was to the same host and using the same forwarding
+ settings, Privoxy would send it on the tainted server socket.
+ While the server would simply treat it as a pipelined request,
+ Privoxy would later on fail to properly parse the server's
+ response as it would try to parse the unread data from the
+ first response as server headers for the second one.
+ Regression introduced in 3.0.17.
- Fix a bunch of anchors that used underscores instead of dashes.
+ When implying keep-alive in client_connection(), remember that
+ the client didn't. Fixes a regression introduced in 3.0.13 that
+ would cause Privoxy to wait for additional client requests after
+ receiving an HTTP/1.1 request with "Connection: close" set
+ and connection sharing enabled.
+ With clients that terminate the client connection after detecting
+ that the whole body has been received, it doesn't really matter,
+ but with clients that don't, the connection would be kept open until
+ it timed out.
- Allow to keep the client connection alive after crunching the previous request.
- Already opened server connections can be kept alive, too.
+ Fix a subtle race condition between prepare_csp_for_next_request()
+ and sweep(). A thread preparing itself for the next client request
+ could briefly appear to be inactive.
+ If all other threads were already using more recent files,
+ the thread could get its files swept away under its feet.
+ So far this has only been reproduced while stress testing in
+ valgrind while touching action files in a loop. It's unlikely
+ to have caused any actual problems in the real world.
- In cgi_show_url_info(), don't forget to prefix URLs that only contain
- http:// or https:// in the path. Fixes #2975765 reported by Adam Piggott.
+ Disable filters if SDCH compression is used unless filtering is forced.
+ If SDCH was combined with a supported compression algorithm, Privoxy
+ previously could try to decompress it and ditch the Content-Encoding
+ header even though the SDCH compression wasn't dealt with.
+ Reported by zebul666 in #3225863.
- Show the 404 CGI page if cgi_send_user_manual() is called while
- local user manual delivery is disabled.
+ Make a copy of the --user value and only mess with that when splitting
+ user and group. On some operating systems modifying the value directly
+ is reflected in the output of ps and friends and can be misleading.
+ Reported by zepard in #3292710.
-
+
+
+ If forwarded-connect-retries is set, only retry if Privoxy is actually
+ forwarding the request. Previously direct connections would be retried
+ as well.
+
+
+
+
+ Fixed a small memory leak when retrying connections with IPv6
+ support enabled.
+
+
+
+
+ Remove an incorrect assertion in compile_dynamic_pcrs_job_list().
+ It could be triggered by a pcrs job with an invalid pcre
+ pattern (for example one that contains a lone quantifier).
+
+
+
+
+ If the --user argument user[.group] contains a dot, always bail out
+ if no group has been specified. Previously the intended, but undocumented
+ (and apparently untested), behaviour was to try interpreting the whole
+ argument as user name, but the detection was flawed and checked for '0'
+ instead of '\0', thus merely preventing group names beginning with a zero.
+
+
+
+
+ In html_code_map[], use a numeric character reference instead of &apos;
+ which wasn't standardized before XHTML 1.0.
+
+
+
+
+ Fix an invalid free when compiled with FEATURE_GRACEFUL_TERMINATION
+ and shut down through http://config.privoxy.org/die
+
+
+
+
+ In get_actions(), fix the "temporary" backwards compatibility hack
+ to accept block actions without reason.
+ It also covered other actions that should be rejected as invalid.
+ Reported by Billy Crook.
+
+
+
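The redirect URL fix in the list above can be illustrated with a short sketch. It is not Privoxy's C implementation; it only shows the idea of percent-encoding every byte that RFC 3986 does not permit, so that a CR/LF pair smuggled into a decoded URL cannot start a new response header. The function name encode_disallowed is made up for this example, and existing '%' escapes are assumed to be left alone.

```python
import string

# Characters RFC 3986 allows to appear literally in a URI:
# unreserved, gen-delims and sub-delims, plus '%' for existing escapes.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
RESERVED = ":/?#[]@!$&'()*+,;="
ALLOWED = set(UNRESERVED + RESERVED + "%")


def encode_disallowed(url: str) -> str:
    """Percent-encode every byte of url that is not permitted by RFC 3986."""
    encoded = []
    for byte in url.encode("utf-8"):
        char = chr(byte)
        if char in ALLOWED:
            encoded.append(char)
        else:
            # Control characters such as CR and LF end up as %0D and %0A.
            encoded.append(f"%{byte:02X}")
    return "".join(encoded)


# A decoded redirect target that tries to inject a header line.
malicious = "http://example.org/a b\r\nSet-Cookie: x=1"
safe = encode_disallowed(malicious)
```

After encoding, the value contains no raw line breaks and can be placed in a Location header as a single line.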
- Action file improvements:
+ General improvements:
- Enable user.filter by default. Suggested by David White in #3001830.
+ Privoxy can (re)compress buffered content before delivering
+ it to the client. Disabled by default as most users wouldn't
+ benefit from it.
- Block .sitestat.com/. Reported by johnd16 in #3002725.
+ The +fast-redirects{check-decoded-url} action checks URL
+ segments separately. If there are other parameters behind
+ the redirect URL, this makes it unnecessary to cut them off
+ by additionally using a +redirect{} pcrs command.
+ Initial patch submitted by Jamie Zawinski in #3429848.
- Block .atemda.com/. Reported by johnd16 in #3002723.
+ When loading action sections, verify that the referenced filters
+ exist. Currently missing filters only result in an error message,
+ but eventually the severity will be upgraded to fatal.
- Block js.adlink.net/. Reported by johnd16 in #3002720.
+ Allow binding to multiple separate addresses.
+ Patch set submitted by Petr Pisar in #3354485.
- Block .analytics.yahoo.com/. Reported by johnd16 in #3002713.
+ Set socket_error to errno if connecting fails in rfc2553_connect_to().
+ Previously rejected direct connections could be incorrectly reported
+ as DNS issues if Privoxy was compiled with IPv6 support.
- Block sb.scorecardresearch.com, too. Reported by dg1727 in #2992652.
+ Adjust url_code_map[] so spaces are replaced with %20 instead of '+'.
+ While '+' can be used by clients submitting form data, this is not
+ actually what Privoxy is using the lookups for. This is more of a
+ cosmetic issue and doesn't fix any known problems.
- Fix problems noticed on Yahoo mail and news pages.
+ When compiled without FEATURE_FAST_REDIRECTS, do not silently
+ ignore +fast-redirects{} directives.
- Remove the too broad yahoo section, only keeping the
- fast-redirects exception as discussed on ijbswa-devel@.
+ Added a workaround for GNU libc's strptime() reporting negative
+ year values when the parsed year is only specified with two digits.
+ On affected systems cookies with such a date would not be turned
+ into session cookies by the +session-cookies-only action.
+ Reported by Vaeinoe in #3403560
- Don't block adesklets.sourceforge.net. Reported in #2974204.
+ Fixed bind failures with certain GNU libc versions if no non-loopback
+ IP address has been configured on the system. This is mainly an issue
+ if the system is using DHCP and Privoxy is started before the network
+ is completely configured.
+ Reported by Raphael Marichez in #3349356.
+ Additional insight from Petr Pisar.
- Block chartbeat ping tracking. Reported in #2975895.
+ Privoxy log messages now use the ISO 8601 date format %Y-%m-%d.
+ It's only slightly longer than the old format, but contains
+ the full date including the year and allows sorting by date
+ (when grepping in multiple log files) without hassle.
- Tag CSS and image requests with cautious and medium settings, too.
+ In get_last_url(), do not bother trying to decode URLs that do
+ not contain at least one '%' sign. It reduces the log noise and
+ a number of unnecessary memory allocations.
- Don't handle view.atdmt.com as image. It's used for click-throughs
- so users should be able to "go there anyway".
- Reported by Adam Piggott in #2975927.
+ In case of SOCKS5 failures, dump the socks response in the log message.
- Also let the refresh-tags filter remove invalid refresh tags where
- the 'url=' part is missing. Anonymously reported in #2986382.
- While at it, update the description to mention the fact that only
- refresh tags with refresh times above 9 seconds are covered.
+ Simplify the signal setup in main().
- javascript needs to be blocked with +handle-as-empty-document to
- work around Firefox bug 492459. So move .js blockers from
- +block{Might be a web-bug.} -handle-as-empty-document to
- +block{Might be a web-bug.} +handle-as-empty-document.
+ Streamline socks5_connect() slightly.
- ijbswa-Feature Requests-3006719 - Block 160x578 Banners.
+ In socks5_connect(), require a complete socks response from the server.
+ Previously Privoxy didn't care how much data the server response
+ contained as long as the first two bytes contained the expected
+ values. While at it, shrink the buffer size so Privoxy can't read
+ more than a whole socks response.
- Block another omniture tracking domain.
+ In chat(), do not bother to generate a client request in case of
+ direct CONNECT requests. It will not be used anyway.
- Added a range-requests tagger.
+ Reduce server_last_modified()'s stack size.
- Added two sections to get Flickr's Ajax interface working with
- default pre-settings. If you change the configuration to block
- cookies by default, you'll need additional exceptions.
- Reported by Mathias Homann in #3101419 and by Patrick on ijbswa-users@.
+ Shorten get_http_time() by using strftime().
-
-
-
-
-
- Documentation improvements:
-
- Explicitly mention how to match all URLs.
+ Constify the known_http_methods pointers in unknown_method().
- Consistently recommend socks5 in the Tor FAQ entry and mention
- its advantage compared to socks4a. Reported by David in #2960129.
+ Constify the time_formats pointers in parse_header_time().
- Slightly improve the explanation of why filtering may appear
- slower than it is.
+ Constify the formerly_valid_actions pointers in action_used_to_be_valid().
- Grammar fixes for the ACL section.
+ Introduce a GNUMakefile MAN_PAGE variable that defaults to privoxy.1.
+ The Debian package uses section 8 for the man page and this
+ should simplify the patch.
- Fixed a link to the 'intercepting' entry and add another one.
+ Deduplicate the INADDR_NONE definition for Solaris by moving it to jbsockets.h
- Rename the 'Other' section to 'Mailing Lists' and reword it
- to make it clear that nobody is forced to use the trackers
+ In block_url(), ditch the obsolete workaround for ancient Netscape versions
+ that supposedly couldn't properly deal with status code 403.
- Note that 'anonymously' posting on the trackers may not always
- be possible.
+ Remove a useless NULL pointer check in load_trustfile().
- Suggest to enable debug 32768 when suspecting parsing problems.
+ Remove two useless NULL pointer checks in load_one_re_filterfile().
+
+
+
+
+ Change url_code_map[] from an array of pointers to an array of arrays.
+ It removes an unnecessary layer of indirection and on 64-bit systems reduces
+ the size of the binary a bit.
+
+
+
+
+ Fix various typos. Fixes taken from Debian's 29_typos.dpatch by Roland Rosenfeld.
+
+
+
+
+ Add a dok-tidy GNUMakefile target to clean up the messy HTML
+ generated by the other dok targets.
+
+
+
+
+ GNUisms in the GNUMakefile have been removed.
+
+
+
+
+ Change the HTTP version in static responses to 1.1
+
+
+
+
+ Synced config.sub and config.guess with upstream
+ 2011-11-11/386c7218162c145f5f9e1ff7f558a3fbb66c37c5.
+
+
+
+
+ Add a dedicated function to parse the values of toggles. Reduces duplicated
+ code in load_config() and provides better error handling. Invalid or missing
+ toggle values are now a fatal error instead of being silently ignored.
+
+
+
+
+ Terminate HTML lines in static error messages with \n instead of \r\n.
+
+
+
+
+ Simplify cgi_error_unknown() a bit.
+
+
+
+
+ In LogPutString(), don't bother looking at pszText when not
+ actually logging anything.
-
-
-
-
-
- Privoxy-Log-Parser improvements:
-
- Gather statistics for ressources, methods, and HTTP versions
- used by the client.
+ Change ssplit()'s fourth parameter from int to size_t.
+ Fixes a clang complaint.
- Also gather statistics for blocked and redirected requests.
+ Add a warning that the statistics currently can't be trusted.
+ Mention Privoxy-Log-Parser's --statistics option as
+ an alternative for the time being.
- Provide the percentage of keep-alive offers the client accepted.
+ In rfc2553_connect_to(), start setting cgi->error_message on error.
- Add a --url-statistics-threshold option.
+ Change the expected status code returned for http://p.p/die depending
+ on whether or not FEATURE_GRACEFUL_TERMINATION is available.
- Add a --host-statistics-threshold option to also gather
- statistics about how many request where made per host.
+ In cgi_die(), mark the client connection for closing.
+ If the client fetches the style sheet through another connection,
+ it gets the main thread out of the accept() state and should thus
+ trigger the actual shutdown.
- Fix a bug in handle_loglevel_header() where a 'scan: ' got lost.
+ Add a proper CGI message for cgi_die().
- Add a --shorten-thread-ids option to replace the thread id with
- a decimal number.
+ Don't enforce a logical line length limit in read_config_line().
- Accept and ignore: Looks like we got the last chunk together
- with the server headers. We better stop reading.
+ Slightly refactor server_last_modified() to remove useless gmtime*() calls.
- Accept and ignore: Continue hack in da house.
+ In get_content_type(), also recognize '.jpeg' as JPEG extension.
- Accept and higlight: Rejecting connection from 10.0.0.2.
- Maximum number of connections reached.
+ Add '.png' to the list of recognized file extensions in get_content_type().
- Accept and highlight: Loading actions file: /usr/local/etc/privoxy/default.action
+ In block_url(), consistently use the block reason "Request blocked by Privoxy".
+ In two places the reason was "Request for blocked URL", which hides the
+ fact that the request got blocked by Privoxy and isn't necessarily
+ correct as the block may be due to tags.
- Accept and highlight: Loading filter file: /usr/local/etc/privoxy/default.filter
+ In listen_loop(), reload the configuration files after accepting
+ a new connection instead of before.
+ Previously the first connection that arrived after a configuration
+ change would still be handled with the old configuration.
- Accept and highlight: Killed all-caps Host header line: HOST: bestproxydb.com
+ In chat()'s receive-data loop, skip a client socket check if
+ the socket will be written to right away anyway. This can
+ increase the transfer speed for unfiltered content on fast
+ network connections.
- Accept and highlight: Reducing expected bytes to 0. Marking
- the server socket tainted after throwing 4 bytes away.
+ The socket timeout is used for SOCKS negotiations as well, which
+ previously couldn't time out.
- Accept: Merged multiple header lines to: 'X-FORWARDED-PROTO: http X-HOST: 127.0.0.1'
+ Don't keep the client connection alive if any configuration file
+ changed since the time the connection came in. This is closer to
+ Privoxy's behaviour before keep-alive support for client connections
+ was added and also less confusing in general.
+
+
+ Treat all Content-Type header values containing the pattern
+ 'script' as a sign of text. Reported by pribog in #3134970.
+
+
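The log date change in the list above mentions that ISO 8601 dates allow sorting without hassle. A minimal sketch of why, using the same %Y-%m-%d format string with Python's strftime: because the fields run from most to least significant and are zero-padded, plain text sorting matches chronological order. The sample dates and the helper name log_date_prefix are chosen only for illustration.

```python
from datetime import date


def log_date_prefix(day: date) -> str:
    """Format a date the way the new log messages do: %Y-%m-%d."""
    return day.strftime("%Y-%m-%d")


# Sample dates in no particular order.
days = [date(2013, 1, 9), date(2011, 8, 18), date(2012, 4, 2)]

prefixes = [log_date_prefix(d) for d in days]

# Sorting the strings as text gives the same order as sorting the dates.
text_sorted = sorted(prefixes)
date_sorted = [log_date_prefix(d) for d in sorted(days)]
```

This is what makes tools such as sort usable on lines collected from several log files.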
- Code cleanups:
+ Action file improvements:
- Remove the next member from the client_state struct. Only the main
- thread needs access to all client states so give it its own struct.
+ Moved the site-specific block pattern section below the one for the
+ generic patterns. For requests that are matched in both, the block
+ reason for the domain is now shown, which is usually more useful than
+ the one for the generic pattern.
- Garbage-collect request_contains_null_bytes().
+ Remove -prevent-compression from the fragile alias. It's no longer
+ used anywhere by default and isn't known to break stuff anyway.
- Ditch redundant code in unload_configfile().
+ Add a (disabled) section to block various Facebook tracking URLs.
+ Reported by Dan Stahlke in #3421764.
- Ditch LogGetURLUnderCursor() which doesn't seem to be used anywhere.
+ Add a (disabled) section to rewrite and redirect click-tracking
+ URLs used on news.google.com.
+ Reported by Dan Stahlke in #3421755.
- In write_socket(), remove the write-only variable write_len in
- an ifdef __OS2__ block. Spotted by cppcheck.
+ Unblock linuxcounter.net/.
+ Reported by Dan Stahlke in #3422612.
- In connect_to(), don't declare the variable 'flags' on OS/2 where
- it isn't used. Spotted by cppcheck.
+ Block 'www91.intel.com/' which is used by Omniture.
+ Reported by Adam Piggott in #3167370.
- Limit the scope of various variables. Spotted by cppcheck.
+ Disable the handle-as-empty-doc-returns-ok option and mark it as deprecated.
+ Reminded by tceverling in #2790091.
- In add_to_iob(), turn an interestingly looking for loop into a
- boring while loop.
+ Add ".ivwbox.de/" to the "Cross-site user tracking" section.
+ Reported by Nettozahler in #3172525.
- Code cleanup in preparation for external filters.
+ Unblock and fast-redirect ".awin1.com/.*=http://".
+ Reported by Adam Piggott in #3170921.
- In listen_loop(), mention the socket on which we accepted the
- connection, not just the source IP address.
+ Block "b.collective-media.net/".
- In write_socket(), also log the socket we're writing to.
+ Widen the Debian popcon exception to "qa.debian.org/popcon".
+ Seen in Debian's 05_default_action.dpatch by Roland Rosenfeld.
- In log_error(), assert that escaped characters get logged
- completely or not at all.
+ Block ".gemius.pl/" which only seems to be used for user tracking.
+ Reported by johnd16 in #3002731. Additional input from Lee and movax.
- In log_error(), assert that ival and sval have reasonable values.
- There's no reason not to abort() if they don't.
+ Disable banners-by-size filters for '.thinkgeek.com/'.
+ The filter only seems to catch pictures of the inventory.
- Remove an incorrect cgi_error_unknown() call in a
- cannot-happen-situation in send_crunch_response().
+ Block requests for 'go.idmnet.bbelements.com/please/showit/'.
+ Reported by kacperdominik in #3372959.
- Clean up white-space in http_response definition and
- move the crunch_reason to the beginning.
+ Unblock adainitiative.org/.
- Turn http_response.reason into an enum and rename it
- to http_response.crunch_reason.
+ Add a fast-redirects exception for '.googleusercontent.com/.*=cache'.
- Silence a 'gcc (Debian 4.3.2-1.1) 4.3.2' warning on i686 GNU/Linux.
+ Add a fast-redirects exception for webcache.googleusercontent.com/.
- Fix white-space in a log message in remove_chunked_transfer_coding().
- While at it, add a note that the message doesn't seem to
- be entirely correct and should be improved later on.
+ Unblock http://adassier.wordpress.com/ and http://adassier.files.wordpress.com/.
-
+
- GNUmakefile improvements:
+ Filter file improvements:
- Use $(SSH) instead of ssh, so one only needs to specify a username once.
+ Let the yahoo filter hide '.ads'.
+
+
+
+
+ Let the msn filter hide overlay ads for Facebook 'likes' in search
+ results and elements with the id 's_notf_div'. They only seem to be
+ used to advertise site 'enhancements'.
- Removed references to the action feedback thingy that hasn't been
- working for years.
+ Let the js-events filter additionally disarm setInterval().
+ Suggested by dg1727 in #3423775.
+
+
+
+
+
+
+
+ Documentation improvements:
+
+
+
+ Clarify the effect of compiling Privoxy with zlib support.
+ Suggested by dg1727 in #3423782.
- Consistently use shell.sourceforge.net instead of shell.sf.net so
- one doesn't need to check server fingerprints twice.
+ Point out that the SourceForge messaging system works like a black
+ hole and should thus not be used to contact individual developers.
- Removed GNUisms in the webserver and webactions targets so they
- work with standard tar.
+ Mention some of the problems one can experience when not explicitly
+ configuring an IP address as the listen address.
+
+
+ Explicitly mention that hostnames can be used instead of IP addresses
+ for the listen-address, that only the first address returned will be
+ used and what happens if the address is invalid.
+ Requested by Calestyo in #3302213.
+
+
-
-
-
-
-
-
-
-Note to Upgraders
-
-
- A quick list of things to be aware of before upgrading from earlier
- versions of Privoxy :
-
-
-
-
-
-
-
- The recommended way to upgrade &my-app; is to backup your old
- configuration files, install the new ones, verify that &my-app;
- is working correctly and finally merge back your changes using
- diff and maybe patch .
-
-
- There are a number of new features in each &my-app; release and
- most of them have to be explicitly enabled in the configuration
- files. Old configuration files obviously don't do that and due
- to syntax changes using old configuration files with a new
- &my-app; isn't always possible anyway.
-
-
-
-
- Note that some installers remove earlier versions completely,
- including configuration files, therefore you should really save
- any important configuration files!
-
-
-
-
- On the other hand, other installers don't overwrite existing configuration
- files, thinking you will want to do that yourself.
-
-
-
-
- standard.action has been merged into
- the default.action file.
-
-
-
-
- In the default configuration only fatal errors are logged now.
- You can change that in the debug section
- of the configuration file. You may also want to enable more verbose
- logging until you verified that the new &my-app; version is working
- as expected.
-
-
-
-
-
- Three other config file settings are now off by default:
- enable-remote-toggle,
- enable-remote-http-toggle,
- and enable-edit-actions.
- If you use or want these, you will need to explicitly enable them, and
- be aware of the security issues involved.
-
-
-
-
-
+
+
+Note to Upgraders
+
+
+ A quick list of things to be aware of before upgrading from earlier
+ versions of Privoxy:
+
+
+
+
+
+
+
+ The recommended way to upgrade &my-app; is to back up your old
+ configuration files, install the new ones, verify that &my-app;
+ is working correctly and finally merge back your changes using
+ diff and maybe patch.
+
+
+ There are a number of new features in each &my-app; release and
+ most of them have to be explicitly enabled in the configuration
+ files. Old configuration files obviously don't do that and due
+ to syntax changes using old configuration files with a new
+ &my-app; isn't always possible anyway.
+
+
+
+
+ Note that some installers remove earlier versions completely,
+ including configuration files. You should therefore save
+ any important configuration files first!
+
+
+
+
+ On the other hand, other installers don't overwrite existing configuration
+ files, thinking you will want to do that yourself.
+
+
+
+
+ standard.action has been merged into
+ the default.action file.
+
+
+
+
+ In the default configuration only fatal errors are logged now.
+ You can change that in the debug section
+ of the configuration file. You may also want to enable more verbose
+ logging until you have verified that the new &my-app; version is working
+ as expected.
+
+
+
+
+
+ Three other config file settings are now off by default:
+ enable-remote-toggle,
+ enable-remote-http-toggle,
+ and enable-edit-actions.
+ If you use or want these, you will need to explicitly enable them, and
+ be aware of the security issues involved.
+
+
+
+
+
+
+limit-cookie-lifetime
+
+
+
+ Typical use:
+
+ Limit the lifetime of HTTP cookies to a couple of minutes or hours.
+
+
+
+
+ Effect:
+
+
+ Overwrites the expires field in Set-Cookie server headers if it's above the specified limit.
+
+
+
+
+
+ Type:
+
+
+ Parameterized.
+
+
+
+
+ Parameter:
+
+
+ The lifetime limit in minutes, or 0.
+
+
+
+
+
+ Notes:
+
+
+ This action reduces the lifetime of HTTP cookies coming from the
+ server to the specified number of minutes, starting from the time
+ the cookie passes Privoxy.
+
+
+ Cookies with a lifetime below the limit are not modified.
+ The lifetime of session cookies is set to the specified limit.
+
+
+ The effect of this action depends on the server.
+
+
+ In case of servers which refresh their cookies with each response
+ (or at least frequently), the lifetime limit set by this action
+ is updated as well.
+ Thus, a session associated with the cookie continues to work with
+ this action enabled, as long as a new request is made before the
+ last limit set is reached.
+
+
+ However, some servers send their cookies once, with a lifetime of several
+ years (the year 2037 is a popular choice), and do not refresh them
+ until a certain event in the future, for example the user logging out.
+ In this case this action may limit the absolute lifetime of the session,
+ even if requests are made frequently.
+
+
+ If the parameter is 0, this action behaves like
+ session-cookies-only.
+
+
+
+
+
+ Example usages:
+
+
+ +limit-cookie-lifetime{60}
+
+
+
+
+
+
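The rules in the notes above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not Privoxy's implementation; the function name `limit_cookie_lifetime` and the use of `None` to stand for a session cookie are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def limit_cookie_lifetime(expires: Optional[datetime],
                          limit_minutes: int,
                          now: datetime) -> Optional[datetime]:
    """Return the expiry time the client would see (None = session cookie).

    `expires` is the expiry requested by the server, or None if the
    server sent a session cookie. `now` is the time the cookie passes
    the proxy.
    """
    if limit_minutes == 0:
        # Documented special case: behaves like session-cookies-only.
        return None
    cap = now + timedelta(minutes=limit_minutes)
    if expires is None:
        # Session cookies get the specified limit as their lifetime.
        return cap
    # Lifetimes below the limit are left alone, longer ones are cut down.
    return min(expires, cap)
```

Under these assumptions, with `+limit-cookie-lifetime{60}` a cookie that asks to live until 2037 would expire one hour after it passed the proxy, unless the server refreshes it in a later response.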
+
prevent-compression
@@ -5719,6 +6415,10 @@ new action
either provided as parameter, or derived by applying a
single pcrs command to the original URL.
+
+ The syntax for pcrs commands is documented in the
+ filter file section.
+
This action will be ignored if you use it together with
block .
@@ -6149,3740 +6849,2769 @@ example.org/instance-that-is-delivered-as-xml-but-is-not
-
-
-
-Summary
-
- Note that many of these actions have the potential to cause a page to
- misbehave, possibly even not to display at all. There are many ways
- a site designer may choose to design his site, and what HTTP header
- content, and other criteria, he may depend on. There is no way to have hard
- and fast rules for all sites. See the Appendix for a brief example on troubleshooting
- actions.
-
-
-
-
-
-
-Aliases
-
- Custom actions, known to Privoxy
- as aliases, can be defined by combining other actions.
- These can in turn be invoked just like the built-in actions.
- Currently, an alias name can contain any character except space, tab,
- =, { and }, but we strongly
- recommend that you only use a to z,
- 0 to 9, +, and -.
- Alias names are not case sensitive, and are not required to start with a
- + or - sign, since they are merely textually
- expanded.
-
-
- Aliases can be used throughout the actions file, but they must be
- defined in a special section at the top of the file!
- And there can only be one such section per actions file. Each actions file may
- have its own alias section, and the aliases defined in it are only visible
- within that file.
-
-
- There are two main reasons to use aliases: One is to save typing for frequently
- used combinations of actions, the other one is a gain in flexibility: If you
- decide once how you want to handle shops by defining an alias called
- shop, you can later change your policy on shops in
- one place, and your changes will take effect everywhere
- in the actions file where the shop alias is used. Calling aliases
- by their purpose also makes your actions files more readable.
-
-
- Currently, there is one big drawback to using aliases, though:
- Privoxy's built-in web-based action file
- editor honors aliases when reading the actions files, but it expands
- them before writing. So the effects of your aliases are of course preserved,
- but the aliases themselves are lost when you edit sections that use aliases
- with it.
-
-
-
- Now let's define some aliases...
-
-
-
-
- # Useful custom aliases we can use later.
- #
- # Note the (required!) section header line and that this section
- # must be at the top of the actions file!
- #
- {{alias}}
-
- # These aliases just save typing later:
- # (Note that some already use other aliases!)
- #
- +crunch-all-cookies = + crunch-incoming-cookies + crunch-outgoing-cookies
- -crunch-all-cookies = - crunch-incoming-cookies - crunch-outgoing-cookies
- +block-as-image = +block{Blocked image.} +handle-as-image
- allow-all-cookies = -crunch-all-cookies - session-cookies-only - filter{content-cookies}
-
- # These aliases define combinations of actions
- # that are useful for certain types of sites:
- #
- fragile = - block - filter -crunch-all-cookies - fast-redirects - hide-referrer - prevent-compression
-
- shop = -crunch-all-cookies - filter{all-popups}
-
- # Short names for other aliases, for really lazy people ;-)
- #
- c0 = +crunch-all-cookies
- c1 = -crunch-all-cookies
-
-
-
- ...and put them to use. These sections would appear in the lower part of an
- actions file and define exceptions to the default actions (as specified further
- up for the / pattern):
-
-
-
-
- # These sites are either very complex or very keen on
- # user data and require minimal interference to work:
- #
- {fragile}
- .office.microsoft.com
- .windowsupdate.microsoft.com
- # Gmail is really mail.google.com, not gmail.com
- mail.google.com
-
- # Shopping sites:
- # Allow cookies (for setting and retrieving your customer data)
- #
- {shop}
- .quietpc.com
- .worldpay.com # for quietpc.com
- mybank.example.com
-
- # These shops require pop-ups:
- #
- {-filter{all-popups} -filter{unsolicited-popups}}
- .dabs.com
- .overclockers.co.uk
-
-
-
- Aliases like shop and fragile are typically used for
- problem sites that require more than one action to be disabled
- in order to function properly.
-
-
-
-
-
-Actions Files Tutorial
-
- The above chapters have shown which actions files
- there are and how they are organized, how actions are specified and applied
- to URLs, how patterns work, and how to
- define and use aliases. Now, let's look at an
- example match-all.action, default.action
- and user.action file and see how all these pieces come together:
-
-
-
-match-all.action
-
- Remember all actions are disabled when matching starts,
- so we have to explicitly enable the ones we want.
-
-
-
- While the match-all.action file only contains a
- single section, it is probably the most important one. It has only one
- pattern, /, but this pattern
- matches all URLs. Therefore, the set of
- actions used in this default section will
- be applied to all requests as a start. It can be partly or
- wholly overridden by other actions files like default.action
- and user.action, but it will still be largely responsible
- for your overall browsing experience.
-
-
-
- Again, at the start of matching, all actions are disabled, so there is
- no need to disable any actions here. (Remember: a +
- preceding the action name enables the action, a - disables!).
- Also note how this long line has been made more readable by splitting it into
- multiple lines with line continuation.
-
-
-
-
-{ \
- + change-x-forwarded-for{block} \
- + hide-from-header{block} \
- + set-image-blocker{pattern} \
-}
-/ # Match all URLs
-
-
-
-
- The default behavior is now set.
-
-
-
-
-default.action
-
-
- If you aren't a developer, there's no need for you to edit the
- default.action file. It is maintained by
- the &my-app; developers and if you disagree with some of the
- sections, you should overrule them in your user.action.
-
-
-
- Understanding the default.action file can
- help you with your user.action, though.
-
-
-
- The first section in this file is a special section for internal use
- that prevents older &my-app; versions from reading the file:
-
-
-
-
-##########################################################################
-# Settings -- Don't change! For internal Privoxy use ONLY.
-##########################################################################
-{{settings}}
-for-privoxy-version=3.0.11
-
-
-
- After that comes the (optional) alias section. We'll use the example
- section from the above chapter on aliases,
- that also explains why and how aliases are used:
-
-
-
-
-##########################################################################
-# Aliases
-##########################################################################
-{{alias}}
-
- # These aliases just save typing later:
- # (Note that some already use other aliases!)
- #
- +crunch-all-cookies = + crunch-incoming-cookies + crunch-outgoing-cookies
- -crunch-all-cookies = - crunch-incoming-cookies - crunch-outgoing-cookies
- +block-as-image = +block{Blocked image.} +handle-as-image
- mercy-for-cookies = -crunch-all-cookies - session-cookies-only - filter{content-cookies}
-
- # These aliases define combinations of actions
- # that are useful for certain types of sites:
- #
- fragile = - block - filter -crunch-all-cookies - fast-redirects - hide-referrer
- shop = -crunch-all-cookies - filter{all-popups}
-
-
-
- The first of our specialized sections is concerned with fragile
- sites, i.e. sites that require minimum interference, because they are either
- very complex or very keen on tracking you (and have mechanisms in place that
- make them unusable for people who avoid being tracked). We will simply use
- our pre-defined fragile alias instead of stating the list
- of actions explicitly:
-
-
-
-
-##########################################################################
-# Exceptions for sites that'll break under the default action set:
-##########################################################################
-
-# "Fragile" Use a minimum set of actions for these sites (see alias above):
-#
-{ fragile }
-.office.microsoft.com # surprise, surprise!
-.windowsupdate.microsoft.com
-mail.google.com
-
-
-
- Shopping sites are not as fragile, but they typically
- require cookies to log in, and pop-up windows for shopping
- carts or item details. Again, we'll use a pre-defined alias:
-
-
-
-
-# Shopping sites:
-#
-{ shop }
-.quietpc.com
-.worldpay.com # for quietpc.com
-.jungle.com
-.scan.co.uk
-
-
-
- The fast-redirects
- action, which may have been enabled in match-all.action ,
- breaks some sites. So disable it for popular sites where we know it misbehaves:
-
-
-
-
-{ - fast-redirects }
-login.yahoo.com
-edit.*.yahoo.com
-.google.com
-.altavista.com/.*(like|url|link):http
-.altavista.com/trans.*urltext=http
-.nytimes.com
-
-
-
- It is important that Privoxy knows which
- URLs belong to images, so that if they are to
- be blocked, a substitute image can be sent, rather than an HTML page.
- Contacting the remote site to find out is not an option, since it
- would destroy the loading time advantage of banner blocking, and it
- would feed the advertisers information about you. We can mark any
- URL as an image with the handle-as-image action,
- and marking all URLs that end in a known image file extension is a
- good start:
-
-
-
-
-##########################################################################
-# Images:
-##########################################################################
-
-# Define which file types will be treated as images, in case they get
-# blocked further down this file:
-#
-{ + handle-as-image }
-/.*\.(gif|jpe?g|png|bmp|ico)$
-
-
-
- And then there are known banner sources. They often use scripts to
- generate the banners, so it won't be visible from the URL that the
- request is for an image. Hence we block them and
- mark them as images in one go, with the help of our
- +block-as-image alias defined above. (We could of
- course just as well use + block
- + handle-as-image here.)
- Remember that the type of the replacement image is chosen by the
- set-image-blocker
- action. Since all URLs have matched the default section with its
- + set-image-blocker{pattern}
- action before, it still applies and needn't be repeated:
-
-
-
-
-# Known ad generators:
-#
-{ +block-as-image }
-ar.atwola.com
-.ad.doubleclick.net
-.ad.*.doubleclick.net
-.a.yimg.com/(?:(?!/i/).)*$
-.a[0-9].yimg.com/(?:(?!/i/).)*$
-bs*.gsanet.com
-.qkimg.net
-
-
-
- One of the most important jobs of Privoxy
- is to block banners. Many of these can be blocked
- by the filter{banners-by-size}
- action, which we enabled above, and which deletes the references to banner
- images from the pages while they are loaded, so the browser doesn't request
- them anymore, and hence they don't need to be blocked here. But this naturally
- doesn't catch all banners, and some people choose not to use filters, so we
- need a comprehensive list of patterns for banner URLs here, and apply the
- block action to them.
-
-
- First come many generic patterns, which do most of the work by
- matching typical domain and path name components of banners. Then comes
- a list of individual patterns for specific sites, which is omitted here
- to keep the example short:
-
-
-
-
-##########################################################################
-# Block these fine banners:
-##########################################################################
-{ +block{Banner ads.} }
-
-# Generic patterns:
-#
-ad*.
-.*ads.
-banner?.
-count*.
-/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?)
-/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/
-
-# Site-specific patterns (abbreviated):
-#
-.hitbox.com
-
-
-
- It's quite remarkable how many advertisers actually call their banner
- servers ads.company.com, or call the directory
- in which the banners are stored simply banners. So the above
- generic patterns are surprisingly effective.
-
-
- But being very generic, they necessarily also catch URLs that we don't want
- to block. The pattern .*ads. e.g. catches
- nasty-ads.nasty-corp.com as intended,
- but also downloads.sourceforge.net or
- adsl.some-provider.net. So here come some
- well-known exceptions to the +block
- section above.
-
-
- Note that these are exceptions to exceptions from the default! Consider the URL
- downloads.sourceforge.net: Initially, all actions are deactivated,
- so it wouldn't get blocked. Then comes the defaults section, which matches the
- URL, but just deactivates the block
- action once again. Then it matches .*ads., an exception to the
- general non-blocking policy, and suddenly
- +block applies. And now, it'll match
- .*loads., where -block
- applies, so (unless it matches again further down) it ends up
- with no block action applying.
-
-
-
-
-##########################################################################
-# Save some innocent victims of the above generic block patterns:
-##########################################################################
-
-# By domain:
-#
-{ - block }
-adv[io]*. # (for advogato.org and advice.*)
-adsl. # (has nothing to do with ads)
-adobe. # (has nothing to do with ads either)
-ad[ud]*. # (adult.* and add.*)
-.edu # (universities don't host banners (yet!))
-.*loads. # (downloads, uploads etc)
-
-# By path:
-#
-/.*loads/
-
-# Site-specific:
-#
-www.globalintersec.com/adv # (adv = advanced)
-www.ugu.com/sui/ugu/adv
-
-
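The "last matching section wins" behaviour walked through above can be sketched in Python. This is a rough model rather than Privoxy's matcher: the helper `any_label_endswith` is a hypothetical stand-in for host patterns such as .*ads. and .*loads., and the host names are made up for the example.

```python
def any_label_endswith(host: str, suffix: str) -> bool:
    """Crude stand-in for patterns like '.*ads.': some domain label ends in suffix."""
    return any(label.endswith(suffix) for label in host.split("."))


# Sections in file order: (pattern shown for reference, predicate, enable block?)
SECTIONS = [
    ("/",        lambda host: True,                              False),
    (".*ads.",   lambda host: any_label_endswith(host, "ads"),   True),
    (".*loads.", lambda host: any_label_endswith(host, "loads"), False),
]


def is_blocked(host: str) -> bool:
    blocked = False  # all actions are disabled when matching starts
    for _pattern, matches, enable in SECTIONS:
        if matches(host):
            blocked = enable  # a later matching section overrides earlier ones
    return blocked
```

In this model `downloads.example.net` is first caught by the generic ads pattern and then rescued by the later loads exception, while `nasty-ads.example.com` stays blocked.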
-
- Filtering source code can have nasty side effects,
- so make an exception for our friends at sourceforge.net,
- and all paths with cvs in them. Note that
- -filter
- disables all filters in one fell swoop!
-
-
-
-
-# Don't filter code!
-#
-{ - filter }
-/(.*/)?cvs
-bugzilla.
-developer.
-wiki.
-.sourceforge.net
-
-
-
- The actual default.action is of course much more
- comprehensive, but we hope this example made clear how it works.
-
-
-
-
-user.action
-
-
- So far we are painting with a broad brush by setting general policies,
- which would be a reasonable starting point for many people. Now,
- you might want to be more specific and have customized rules that
- are more suitable to your personal habits and preferences. These would
- be for narrowly defined situations like your ISP or your bank, and should
- be placed in user.action , which is parsed after all other
- actions files and hence has the last word, over-riding any previously
- defined actions. user.action is also a
- safe place for your personal settings, since
- default.action is actively maintained by the
- Privoxy developers and you'll probably want
- to install updated versions from time to time.
-
-
-
- So let's look at a few examples of things that one might typically do in
- user.action :
-
-
-
-
-
-
-
-# My user.action file. <fred@example.com>
-
-
-
- As aliases are local to the actions
- file that they are defined in, you can't use the ones from
- default.action , unless you repeat them here:
-
-
-
-
-# Aliases are local to the file they are defined in.
-# (Re-)define aliases for this file:
-#
-{{alias}}
-#
-# These aliases just save typing later, and the alias names should
-# be self explanatory.
-#
-+crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies
--crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies
- allow-all-cookies = -crunch-all-cookies -session-cookies-only
- allow-popups = -filter{all-popups}
-+block-as-image = +block{Blocked as image.} +handle-as-image
--block-as-image = -block
-
-# These aliases define combinations of actions that are useful for
-# certain types of sites:
-#
-fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referrer
-shop = -crunch-all-cookies allow-popups
-
-# Allow ads for selected useful free sites:
-#
-allow-ads = -block -filter{banners-by-size} -filter{banners-by-link}
-
-# Alias for specific file types that are text, but might have conflicting
-# MIME types. We want the browser to force these to be text documents.
-handle-as-text = - filter +- content-type-overwrite{text/plain} +- force-text-mode - hide-content-disposition
-
-
-
-
- Say you have accounts on some sites that you visit regularly, and
- you don't want to have to log in manually each time. So you'd like
- to allow persistent cookies for these sites. The
- allow-all-cookies alias defined above does exactly
- that, i.e. it disables crunching of cookies in any direction, and the
- processing of cookies to make them only temporary.
-
-
-
-
-{ allow-all-cookies }
- sourceforge.net
- .yahoo.com
- .msdn.microsoft.com
- .redhat.com
-
-
-
- Your bank is allergic to some filter, but you don't know which, so you disable them all:
-
-
-
-
-{ - filter }
- .your-home-banking-site.com
-
-
-
- Some file types you may not want to filter for various reasons:
-
-
-
-
-# Technical documentation is likely to contain strings that might
-# erroneously get altered by the JavaScript-oriented filters:
-#
-.tldp.org
-/(.*/)?selfhtml/
-
-# And this stupid host sends streaming video with a wrong MIME type,
-# so that Privoxy thinks it is getting HTML and starts filtering:
-#
-stupid-server.example.com/
-
-
-
- Example of a simple block action. Say you've
- seen an ad on your favourite page on example.com that you want to get rid of.
- You have right-clicked the image, selected copy image location
- and pasted the URL below while removing the leading http://, into a
- { +block{} } section. Note that { +handle-as-image
- } need not be specified, since all URLs ending in
- .gif will be tagged as images by the general rules as set
- in default.action anyway:
-
-
-
-
-{ + block{Nasty ads.} }
- www.example.com/nasty-ads/sponsor\.gif
- another.example.net/more/junk/here/
-
-
-
- The URLs of dynamically generated banners, especially from large banner
- farms, often don't use the well-known image file name extensions, which
- makes it impossible for Privoxy to guess
- the file type just by looking at the URL.
- You can use the +block-as-image alias defined above for
- these cases.
- Note that objects which match this rule but then turn out NOT to be an
- image are typically rendered as a broken image icon by the
- browser. Use cautiously.
-
-
-
-
-{ +block-as-image }
- .doubleclick.net
- .fastclick.net
- /Realmedia/ads/
- ar.atwola.com/
-
-
-
- Now you noticed that the default configuration breaks Forbes Magazine,
- but you were too lazy to find out which action is the culprit, and you
- were again too lazy to give feedback, so
- you just used the fragile alias on the site, and
- -- whoa! -- it worked. The fragile
- alias disables those actions that are most likely to break a site. It is also
- good for testing whether it is Privoxy
- that is causing the problem or not. We later find other regular sites
- that misbehave, and add those to our personalized list of troublemakers:
-
-
-
-
-{ fragile }
- .forbes.com
- webmail.example.com
- .mybank.com
-
-
-
- You like the fun text replacements in default.filter,
- but the filter is disabled in the distributed actions file.
- So you'd like to turn it on in your private,
- update-safe config, once and for all:
-
-
-
-
-{ + filter{fun} }
- / # For ALL sites!
-
-
-
- Note that the above is not really a good idea: There are exceptions
- to the filters in default.action for things that
- really shouldn't be filtered, like code on CVS->Web interfaces. Since
- user.action has the last word, these exceptions
- won't be valid for the fun filtering specified here.
-
-
-
- You might also worry about how your favourite free websites are
- funded, and find that they rely on displaying banner advertisements
- to survive. So you might want to specifically allow banners for those
- sites that you feel provide value to you:
-
-
-
-
-{ allow-ads }
- .sourceforge.net
- .slashdot.org
- .osdn.net
-
-
-
- Note that allow-ads has been aliased to
- - block ,
- - filter{banners-by-size} , and
- - filter{banners-by-link} above.
-
-
-
- Invoke another alias here to force an over-ride of the MIME type
- application/x-sh which typically would open a download type
- dialog. In my case, I want to look at the shell script, and then I can save
- it should I choose to.
-
-
-
-
-{ handle-as-text }
- /.*\.sh$
-
-
-
- user.action is generally the best place to define
- exceptions and additions to the default policies of
- default.action. Some actions are safe to have their
- default policies set here though. So let's set a default policy to have a
- blank image as opposed to the checkerboard pattern for
- ALL sites. / of course matches all URL
- paths and patterns:
-
-
-
-
-{ + set-image-blocker{blank} }
-/ # ALL sites
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Filter Files
-
-
- On-the-fly text substitutions need
- to be defined in a filter file. Once defined, they
- can then be invoked as an action.
-
-
-
- &my-app; supports three different filter actions:
- filter to
- rewrite the content that is sent to the client,
- client-header-filter
- to rewrite headers that are sent by the client, and
- server-header-filter
- to rewrite headers that are sent by the server.
-
-
-
- &my-app; also supports two tagger actions:
- client-header-tagger
- and
- server-header-tagger .
- Taggers and filters use the same syntax in the filter files, the difference
- is that taggers don't modify the text they are filtering, but use a rewritten
- version of the filtered text as tag. The tags can then be used to change the
- applying actions through sections with tag-patterns.
-
-
-
-
- Multiple filter files can be defined through the filterfile config directive. The filters
- as supplied by the developers are located in
- default.filter . It is recommended that any locally
- defined or modified filters go in a separately defined file such as
- user.filter .
-
-
-
- Common tasks for content filters are to eliminate common annoyances in
- HTML and JavaScript, such as pop-up windows,
- exit consoles, crippled windows without navigation tools, the
- infamous <BLINK> tag etc, to suppress images with certain
- width and height attributes (standard banner sizes or web-bugs),
- or just to have fun.
-
-
-
- Enabled content filters are applied to any content whose
- Content-Type header is recognised as a sign
- of text-based content, with the exception of text/plain.
- Use the force-text-mode action
- to also filter other content.
-
-
-
- Substitutions are made at the source level, so if you want to roll
- your own filters, you should first be familiar with HTML syntax,
- and, of course, regular expressions.
-
-
-
- Just like the actions files, the
- filter file is organized in sections, which are called filters
- here. Each filter consists of a heading line that starts with one of the
- keywords FILTER:,
- CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER:,
- followed by the filter's name and a short (one line)
- description of what it does. Below that line
- come the jobs, i.e. lines that define the actual
- text substitutions. By convention, the name of a filter
- should describe what the filter eliminates. The
- comment is used in the web-based
- user interface.
-
-
-
- Once a filter called name has been defined
- in the filter file, it can be invoked by using an action of the form
- + filter{name }
- in any actions file.
-
-
-
- Filter definitions start with a header line that contains the filter
- type, the filter name and the filter description.
- A content filter header line for a filter called foo could look
- like this:
-
-
-
- FILTER: foo Replace all "foo" with "bar"
-
-
-
- Below that line, and up to the next header line, come the jobs that
- define what text replacements the filter executes. They are specified
- in a syntax that imitates Perl's
- s/// operator. If you are familiar with Perl, you
- will find this to be quite intuitive, and may want to look at the
- PCRS documentation for the subtle differences to Perl behaviour. Most
- notably, the non-standard option letter U is supported,
- which turns the default to ungreedy matching.
-
-
-
- If you are new to regular expressions,
- you might want to take a look at
- the Appendix on regular expressions, and
- see the Perl manual for the
- s/// operator's syntax and Perl-style regular
- expressions in general.
- The examples below might also help to get you started.
-
-
-
-
-
-Filter File Tutorial
-
- Now, let's complete our foo content filter. We have already defined
- the heading, but the jobs are still missing. Since all it does is to replace
- foo with bar, there is only one (trivial) job
- needed:
-
-
-
- s/foo/bar/
-
-
-
- But wait! Didn't the comment say that all occurrences
- of foo should be replaced? Our current job will only take
- care of the first foo on each page. For global substitution,
- we'll need to add the g option:
-
-
-
- s/foo/bar/g
-
-
-
- Our complete filter now looks like this:
-
-
- FILTER: foo Replace all "foo" with "bar"
-s/foo/bar/g
-
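The difference between the job with and without the g option can be tried out in any regex engine. The sketch below uses Python's `re` module purely as an illustration; it is not PCRS, and the sample text is made up.

```python
import re

page = "foo one, foo two"

# Like s/foo/bar/ : only the first occurrence is replaced.
first_only = re.sub(r"foo", "bar", page, count=1)

# Like s/foo/bar/g : every occurrence is replaced.
everywhere = re.sub(r"foo", "bar", page)
```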
-
-
- Let's look at some real filters for more interesting examples. Here you see
- a filter that protects against some common annoyances that arise from JavaScript
- abuse. Let's look at its jobs one after the other:
-
-
-
-
-
-FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse
-
-# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm
-#
-s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg
-
-
-
- Following the header line and a comment, you see the job. Note that it uses
- | as the delimiter instead of /, because
- the pattern contains a forward slash, which would otherwise have to be escaped
- by a backslash (\).
-
-
-
- Now, let's examine the pattern: it starts with the text <script.*
- enclosed in parentheses. Since the dot matches any character, and *
- means: "Match an arbitrary number of the element left of myself", this
- matches <script, followed by any text, i.e.
- it matches the whole page, from the start of the first <script> tag.
-
-
-
- That's more than we want, but the pattern continues: document\.referrer
- matches only the exact string document.referrer. The dot needed to
- be escaped, i.e. preceded by a backslash, to take away its
- special meaning as a joker, and make it just a regular dot. So far, the meaning is:
- Match from the start of the first <script> tag in the page, up to, and including,
- the text document.referrer, if both are present
- in the page (and appear in that order).
-
-
-
- But there's still more pattern to go. The next element, again enclosed in parentheses,
- is .*</script>. You already know what .*
- means, so the whole pattern translates to: Match from the start of the first <script>
- tag in a page to the end of the last <script> tag, provided that the text
- document.referrer appears somewhere in between.
-
-
-
- This is still not the whole story, since we have ignored the options and the parentheses:
- The portions of the page matched by sub-patterns that are enclosed in parentheses will be
- remembered and be available through the variables $1, $2, ... in
- the substitute. The U option switches to ungreedy matching, which means
- that the first .* in the pattern will only "eat up" all
- text in between <script and the first occurrence
- of document.referrer, and that the second .* will
- only span the text up to the first </script>
- tag. Furthermore, the s option says that the match may span
- multiple lines in the page, and the g option again means that the
- substitution is global.
-
-
-
- So, to summarize, the pattern means: Match all scripts that contain the text
- document.referrer. Remember the parts of the script from
- (and including) the start tag up to (and excluding) the string
- document.referrer as $1, and the part following
- that string, up to and including the closing tag, as $2.
-
-
-
- Now the pattern is deciphered, but wasn't this about substituting things? So
- let's look at the substitute: $1"Not Your Business!"$2 is
- easy to read: The text remembered as $1, followed by
- "Not Your Business!" (including
- the quotation marks!), followed by the text remembered as $2.
- This produces an exact copy of the original string, with the middle part
- (the document.referrer) replaced by "Not Your
- Business!".
-
-
-
- The whole job now reads: Replace document.referrer by
- "Not Your Business!" wherever it appears inside a
- <script> tag. Note that this job won't break JavaScript syntax,
- since both the original and the replacement are syntactically valid
- string objects. The script just won't have access to the referrer
- information anymore.
-
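To experiment with this job outside of a filter file, it can be approximated in Python. This is a sketch under assumptions, not PCRS itself: Python's `re` has no U option, so the ungreedy quantifier `.*?` is written out explicitly, `re.DOTALL` stands in for the s option, and `re.sub` replaces all matches by default, like g. The function name `hide_js_referrer` is made up for the example.

```python
import re

# Approximation of:
#   s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg
REFERRER_JOB = re.compile(r"(<script.*?)document\.referrer(.*?</script>)",
                          re.DOTALL)


def hide_js_referrer(page: str) -> str:
    # \1 and \2 play the role of $1 and $2 in the substitute.
    return REFERRER_JOB.sub(r'\1"Not Your Business!"\2', page)
```

Text outside of script tags that merely mentions the string is left alone in this sketch, because the pattern has to start at a `<script` tag.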
-
-
- We'll show you two other jobs from the JavaScript taming department, but
- this time only point out the constructs of special interest:
-
-
-
-
-# The status bar is for displaying link targets, not pointless blahblah
-#
-s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
-
-
-
- \s stands for whitespace characters (space, tab, newline,
- carriage return, form feed), so that \s* means: "zero
- or more whitespace". The ? in .*?
- makes this matching of arbitrary text ungreedy. (Note that the U
- option is not set). The ['"] construct means: "a single
- or a double quote". Finally, \1 is
- a back-reference to the first parenthesis just like $1 above,
- with the difference that in the pattern, a backslash indicates
- a back-reference, whereas in the substitute, it's the dollar.
-
-
-
- So what does this job do? It replaces assignments of single- or double-quoted
- strings to the window.status object with a dummy assignment
- (using a variable name that is hopefully odd enough not to conflict with
- real variables in scripts). Thus, it catches many cases where e.g. pointless
- descriptions are displayed in the status bar instead of the link target when
- you move your mouse over links.
-
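The back-reference is easiest to understand by trying it. The following Python sketch mirrors the job above using the `re` module as a stand-in for PCRS; the function name `tame_status_bar` and the sample scripts are assumptions for illustration only.

```python
import re

# Approximation of: s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
# Group 1 captures the opening quote; \1 requires the same quote to close.
STATUS_JOB = re.compile(r"""window\.status\s*=\s*(['"]).*?\1""",
                        re.IGNORECASE)


def tame_status_bar(script: str) -> str:
    return STATUS_JOB.sub("dUmMy=1", script)
```

Because the closing quote must equal the opening one, an apostrophe inside a double-quoted string does not end the match early.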
-
-
-
-# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html
-#
-s/(<body [^>]*)onunload(.*>)/$1never$2/iU
-
-
-
- Including the OnUnload
- event binding in the HTML DOM was a CRIME.
- When I close a browser window, I want it to close and die. Basta.
- This job replaces the onunload attribute in
- <body> tags with the dummy word never.
- Note that the i option makes the pattern matching
- case-insensitive. Also note that ungreedy matching alone doesn't always guarantee
- a minimal match: In the first parenthesis, we had to use [^>]*
- instead of .* to prevent the match from exceeding the
- <body> tag if it doesn't contain OnUnload, but the page's
- content does.
-
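The effect of [^>]* in the OnUnload job can be checked with a short experiment. This is only a sketch in Python, not Privoxy's own code: Python has no counterpart to the U option, so each quantifier is written in its lazy form instead, and the names kill_onunload, with_handler and without_handler are invented for the example.

```python
import re

# The "U" option makes quantifiers ungreedy by default. Python lacks such a
# flag, so the lazy forms *? are spelled out explicitly here.
ONUNLOAD_JOB = re.compile(r"(<body [^>]*?)onunload(.*?>)", re.IGNORECASE)


def kill_onunload(html: str) -> str:
    """Rename the onunload attribute of a <body> tag to the dummy word 'never'."""
    # count=1 because the original job has no "g" option: only the first
    # match is replaced.
    return ONUNLOAD_JOB.sub(r"\1never\2", html, count=1)


# A body tag that carries the handler, and one where only the page text
# mentions the word.
with_handler = kill_onunload('<body bgcolor="white" onUnload="popup()">')
without_handler = kill_onunload('<body class="x"><p>onunload is mentioned here</p>')
```

In the second case [^>]* cannot cross the closing > of the tag, so the page text is left alone.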
-
-
- The last example is from the fun department:
-
-
-
-
-FILTER: fun Fun text replacements
-
-# Spice the daily news:
-#
-s/microsoft(?!\.com)/MicroSuck/ig
-
-
-
- Note the (?!\.com) part (a so-called negative lookahead)
- in the job's pattern, which means: Don't match, if the string
- .com appears directly following microsoft
- in the page. This prevents links to microsoft.com from being trashed, while
- still replacing the word everywhere else.
-
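The negative lookahead behaves the same way in Python's re module, which makes it easy to try out. A small sketch follows; the function name spice and the sample sentence are hypothetical.

```python
import re

# (?!\.com) is a negative lookahead: the match fails if ".com" follows directly.
FUN_JOB = re.compile(r"microsoft(?!\.com)", re.IGNORECASE)


def spice(text: str) -> str:
    """Replace the word everywhere except where it is part of microsoft.com."""
    return FUN_JOB.sub("MicroSuck", text)


result = spice("Microsoft news, see www.microsoft.com for details")
```

The lookahead consumes no characters, so the text after the match is never altered by it.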
-
-
-
-# Buzzword Bingo (example for extended regex syntax)
-#
-s* industry[ -]leading \
-| cutting[ -]edge \
-| customer[ -]focused \
-| market[ -]driven \
-| award[ -]winning # Comments are OK, too! \
-| high[ -]performance \
-| solutions[ -]based \
-| unmatched \
-| unparalleled \
-| unrivalled \
-*<font color="red"><b>BINGO!</b></font> \
-*igx
-
-
-
- The x option in this job turns on extended syntax, and allows for
- e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting.
-
-
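Python offers a comparable extended syntax through re.VERBOSE, so the idea can be sketched there. The pattern below is a shortened, hypothetical subset of the buzzword list; the function name bingo is likewise an assumption for the example.

```python
import re

# With re.VERBOSE, whitespace outside character classes is ignored and "#"
# starts a comment. The space inside [ -] is kept because it sits in a class.
BINGO = re.compile(
    r"""
      industry[ -]leading
    | cutting[ -]edge
    | award[ -]winning      # comments are allowed here as well
    | unparalleled
    """,
    re.IGNORECASE | re.VERBOSE,
)


def bingo(text: str) -> str:
    """Replace each buzzword with a red BINGO! marker."""
    return BINGO.sub('<font color="red"><b>BINGO!</b></font>', text)


result = bingo("Our award-winning, cutting edge product")
```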
-
- You get the idea?
-
-
-
-
-
-The Pre-defined Filters
-
-
-
-
-The distribution default.filter file contains a selection of
-pre-defined filters for your convenience:
-
-
-
-
- js-annoyances
-
-
- The purpose of this filter is to get rid of particularly annoying JavaScript abuse.
- To that end, it
-
-
-
- replaces JavaScript references to the browser's referrer information
- with the string "Not Your Business!". This complements the hide-referrer action on the content level.
-
-
-
-
- removes the bindings to the DOM's unload
- event which we feel has no right to exist and is responsible for most exit consoles, i.e.
- nasty windows that pop up when you close another one.
-
-
-
-
- removes code that causes new windows to be opened with undesired properties, such as being
- full-screen, non-resizeable, without location, status or menu bar etc.
-
-
-
-
-
- Use with caution. This is an aggressive filter, and can break sites that
- rely heavily on JavaScript.
-
-
-
-
-
- js-events
-
-
- This is a very radical measure. It removes virtually all JavaScript event bindings, which
- means that scripts can not react to user actions such as mouse movements or clicks, window
- resizing etc, anymore. Use with caution!
-
-
- We strongly discourage using this filter as a default since it breaks
- many legitimate scripts. It is meant for use only on extra-nasty sites (should you really
- need to go there).
-
-
-
-
-
- html-annoyances
-
-
- This filter will undo many common instances of HTML based abuse.
-
-
- The BLINK and MARQUEE tags
- are neutralized (yeah baby!), and browser windows will be created as
- resizeable (as of course they should be!), and will have location,
- scroll and menu bars -- even if specified otherwise.
-
-
-
-
-
- content-cookies
-
-
- Most cookies are set in the HTTP dialog, where they can be intercepted
- by the
- crunch-incoming-cookies
- and crunch-outgoing-cookies
- actions. But web sites increasingly make use of HTML meta tags and JavaScript
- to sneak cookies to the browser on the content level.
-
-
- This filter disables most HTML and JavaScript code that reads or sets
- cookies. It cannot detect all clever uses of these types of code, so it
- should not be relied on as an absolute fix. Use it wherever you would also
- use the cookie crunch actions.
-
-
-
-
-
- refresh tags
-
-
- Disable any refresh tags if the interval is greater than nine seconds (so
- that redirections done via refresh tags are not destroyed). This is useful
- for dial-on-demand setups, or for those who find this HTML feature
- annoying.
-
-
-
-
-
- unsolicited-popups
-
-
- This filter attempts to prevent only unsolicited pop-up
- windows from opening, yet still allow pop-up windows that the user
- has explicitly chosen to open. It was added in version 3.0.1,
- as an improvement over earlier such filters.
-
-
- Technical note: The filter works by redefining the window.open JavaScript
- function to a dummy function, PrivoxyWindowOpen(),
- during the loading and rendering phase of each HTML page access, and
- restoring the function afterward.
-
-
- This is recommended only for browsers that cannot perform this function
- reliably themselves. And be aware that some sites require such windows
- in order to function normally. Use with caution.
-
-
-
-
-
- all-popups
-
-
- Attempt to prevent all pop-up windows from opening.
- Note this should be used with even more discretion than the above, since
- it is more likely to break some sites that require pop-ups for normal
- usage. Use with caution.
-
-
-
-
-
- img-reorder
-
-
- This is a helper filter that has no value if used alone. It makes the
- banners-by-size and banners-by-link
- (see below) filters more effective and should be enabled together with them.
-
-
-
-
-
- banners-by-size
-
-
- This filter removes image tags purely based on what size they are. Fortunately
- for us, many ads and banner images tend to conform to certain standardized
- sizes, which makes this filter quite effective for ad stripping purposes.
-
-
- Occasionally this filter will cause false positives on images that are not ads,
- but just happen to be of one of the standard banner sizes.
-
-
- Recommended only for those who require extreme ad blocking. The default
- block rules should catch 95+% of all ads without this filter enabled.
-
-
-
-
-
- banners-by-link
-
-
- This is an experimental filter that attempts to kill any banners if
- their URLs seem to point to known or suspected click trackers. It is currently
- not of much value and is not recommended for use by default.
-
-
-
-
-
- webbugs
-
-
- Webbugs are small, invisible images (technically 1X1 GIF images), that
- are used to track users across websites, and collect information on them.
- As an HTML page is loaded by the browser, an embedded image tag causes the
- browser to contact a third-party site, disclosing the tracking information
- through the requested URL and/or cookies for that third-party domain, without
- the user ever becoming aware of the interaction with the third-party site.
- HTML-ized spam also uses a similar technique to verify email addresses.
-
-
- This filter removes the HTML code that loads such webbugs.
-
-
-
-
-
- tiny-textforms
-
-
- A rather special-purpose filter that can be used to enlarge textareas (those
- multi-line text boxes in web forms) and turn off hard word wrap in them.
- It was written for the sourceforge.net tracker system where such boxes are
- a nuisance, but it can be handy on other sites, too.
-
-
- It is not recommended to use this filter as a default.
-
-
-
-
-
- jumping-windows
-
-
- Many consider windows that move, or resize themselves to be abusive. This filter
- neutralizes the related JavaScript code. Note that some sites might not display
- or behave as intended when using this filter. Use with caution.
-
-
-
-
-
- frameset-borders
-
-
- Some web designers seem to assume that everyone in the world will view their
- web sites using the same browser brand and version, screen resolution etc,
- because only that assumption could explain why they'd use static frame sizes,
- yet prevent their frames from being resized by the user, should they be too
- small to show their whole content.
-
-
- This filter removes the related HTML code. It should only be applied to sites
- which need it.
-
-
-
-
-
- demoronizer
-
-
- Many Microsoft products that generate HTML use non-standard extensions (read:
- violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those
- HTML documents to display with errors on standard-compliant platforms.
-
-
- This filter translates the MS-only characters into Latin-1 equivalents.
- It is not necessary when using MS products, and will cause corruption of
- all documents that use 8-bit character sets other than Latin-1. It's mostly
- worthwhile for Europeans on non-MS platforms, if weird garbage characters
- sometimes appear on some pages, or user agents that don't correct for this on
- the fly.
-
-
-
-
-
-
- shockwave-flash
-
-
- A filter for shockwave haters. As the name suggests, this filter strips code
- out of web pages that is used to embed shockwave flash objects.
-
-
-
-
-
-
-
- quicktime-kioskmode
-
-
- Change HTML code that embeds Quicktime objects so that kioskmode, which
- prevents saving, is disabled.
-
-
-
-
-
- fun
-
-
- Text replacements for subversive browsing fun. Make fun of your favorite
- Monopolist or play buzzword bingo.
-
-
-
-
-
- crude-parental
-
-
- A demonstration-only filter that shows how Privoxy
- can be used to delete web content on a keyword basis.
-
-
-
-
-
- ie-exploits
-
-
- An experimental collection of text replacements to disable malicious HTML and JavaScript
- code that exploits known security holes in Internet Explorer.
-
-
- Presently, it only protects against Nimda and a cross-site scripting bug, and
- would need active maintenance to provide more substantial protection.
-
-
-
-
-
- site-specifics
-
-
- Some web sites have very specific problems, the cure for which doesn't apply
- anywhere else, or could even cause damage on other sites.
-
-
- This is a collection of such site-specific cures which should only be applied
- to the sites they were intended for, which is what the supplied
- default.action file does. Users shouldn't need to change
- anything regarding this filter.
-
-
-
-
-
- google
-
-
- A CSS based block for Google text ads. Also removes a width limitation
- and the toolbar advertisement.
-
-
-
-
-
- yahoo
-
-
- Another CSS based block, this time for Yahoo text ads. And removes
- a width limitation as well.
-
-
-
-
-
- msn
-
-
- Another CSS based block, this time for MSN text ads. And removes
- tracking URLs, as well as a width limitation.
-
-
-
-
-
- blogspot
-
-
- Cleans up some Blogspot blogs. Read the fine print before using this one!
-
-
- This filter also intentionally removes some navigation stuff and sets the
- page width to 100%. As a result, some rounded corners would
- appear too early or not at all and as fixing this would require a browser
- that understands background-size (CSS3), they are removed instead.
-
-
-
-
-
- xml-to-html
-
-
- Server-header filter to change the Content-Type from xml to html.
-
-
-
-
-
- html-to-xml
-
-
- Server-header filter to change the Content-Type from html to xml.
-
-
-
-
-
- no-ping
-
-
- Removes the non-standard ping attribute from
- anchor and area HTML tags.
-
-
-
-
-
- hide-tor-exit-notation
-
-
- Client-header filter to remove the Tor exit node notation
- found in Host and Referer headers.
-
-
- If &my-app; and Tor are chained and &my-app;
- is configured to use socks4a, one can use http://www.example.org.foobar.exit/
- to access the host www.example.org through the
- Tor exit node foobar.
-
-
- As the HTTP client isn't aware of this notation, it treats the
- whole string www.example.org.foobar.exit as host and uses it
- for the Host and Referer headers. From the
- server's point of view the resulting headers are invalid and can cause problems.
-
-
- An invalid Referer header can trigger hot-linking
- protections, an invalid Host header will make it impossible for
- the server to find the right vhost (several domains hosted on the same IP address).
-
-
- This client-header filter removes the foo.exit part in those headers
- to prevent the mentioned problems. Note that it only modifies
- the HTTP headers, it doesn't make it impossible for the server
- to detect your Tor exit node based on the IP address
- the request is coming from.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Privoxy's Template Files
-
- All Privoxy built-in pages, i.e. error pages such as the
- 404 - No Such Domain
- error page, the BLOCKED page
- and all pages of its web-based
- user interface, are generated from templates.
- (Privoxy must be running for the above links to work as
- intended.)
-
-
-
- These templates are stored in a subdirectory of the configuration
- directory called templates. On Unixish platforms,
- this is typically
- /etc/privoxy/templates/.
-
-
-
- The templates are basically normal HTML files, but with place-holders (called symbols
- or exports), which Privoxy fills at run time. It
- is possible to edit the templates with a normal text editor, should you want
- to customize them. (Not recommended for the casual
- user.) Should you create your own custom templates, you should use
- the config setting templdir
- to specify an alternate location, so your templates do not get overwritten
- during upgrades.
-
-
- Note that just like in configuration files, lines starting
- with # are ignored when the templates are filled in.
-
-
-
- The place-holders are of the form @name@, and you will
- find a list of available symbols, which vary from template to template,
- in the comments at the start of each file. Note that these comments are not
- always accurate, and that it's probably best to look at the existing HTML
- code to find out which symbols are supported and what they are filled in with.
-
-
-
- A special application of this substitution mechanism is to make whole
- blocks of HTML code disappear when a specific symbol is set. We use this
- for many purposes, one of them being to include the beta warning in all
- our user interface (CGI) pages when Privoxy
- is in an alpha or beta development stage:
-
-
-
-
-<!-- @if-unstable-start -->
-
- ... beta warning HTML code goes here ...
-
-<!-- if-unstable-end@ -->
-
-
-
- If the "unstable" symbol is set, everything in between and including
- @if-unstable-start and if-unstable-end@
- will disappear, leaving nothing but an empty comment:
-
-
-
- <!-- -->
-
-
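The block removal described above can be imitated with a few lines of Python. This is only an illustrative sketch and not how Privoxy implements its templates; the function name drop_block and the sample page are assumptions made for the example.

```python
import re


def drop_block(template: str, symbol: str) -> str:
    """Remove everything from '@if-<symbol>-start' through 'if-<symbol>-end@'.

    Both markers are removed along with the text between them.
    """
    pattern = re.compile(
        "@if-" + re.escape(symbol) + "-start.*?if-" + re.escape(symbol) + "-end@",
        re.DOTALL,  # let "." span the line breaks inside the block
    )
    return pattern.sub("", template)


# Hypothetical template fragment modelled on the example above.
page = "<!-- @if-unstable-start -->\n<p>beta warning</p>\n<!-- if-unstable-end@ -->"
result = drop_block(page, "unstable")
```

Because the markers sit inside HTML comments, what remains after the removal is the opening of the first comment joined to the end of the second one, that is, a single empty comment.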
-
- There's also an if-then-else construct and an #include
- mechanism, but you'll surely find out about those if you are inclined to edit the
- templates ;-)
-
-
-
- All templates refer to a style located at
- http://config.privoxy.org/send-stylesheet.
- This is, of course, locally served by Privoxy
- and the source for it can be found and edited in the
- cgi-style.css template.
-
-
-
-
-
-
-
-
-
-
-Contacting the Developers, Bug Reporting and Feature
-Requests
-
-
- &contacting;
-
-
-
-
-
-
-
-
-Privoxy Copyright, License and History
-
-
- &copyright;
-
-
-
-License
-
- &license;
-
-
-
-
-
-
-
-History
-
- &history;
-
-
-
-Authors
-
- &p-authors;
-
-
-
-
-
-
-
-
-
-See Also
-
- &seealso;
-
-
-
-
-
-
-Appendix
-
-
-
-
-Regular Expressions
-
- Privoxy uses Perl-style regular
- expressions in its actions
- files and filter file,
- through the PCRE and
- PCRS libraries.
-
-
-
- If you are reading this, you probably don't understand what regular
- expressions are, or what they can do. So this will be a very brief
- introduction only. A full explanation would require a book ;-)
-
-
-
- Regular expressions provide a language to describe patterns that can be
- run against strings of characters (letters, numbers, etc.), to see if they
- match the string or not. The patterns are themselves (sometimes complex)
- strings of literal characters, combined with wild-cards, and other special
- characters, called meta-characters. The meta-characters have
- special meanings and are used to build complex patterns to be matched against.
- Perl Compatible Regular Expressions are an especially convenient
- dialect of the regular expression language.
-
-
-
- To make a simple analogy, we do something similar when we use wild-card
- characters when listing files with the dir command in DOS.
- *.* matches all filenames. The special
- character here is the asterisk which matches any and all characters. We can be
- more specific and use ? to match just individual
- characters. So dir file?.txt would match
- file1.txt, file2.txt, etc. We are pattern
- matching, using a similar technique to regular expressions!
-
-
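The wild-card analogy can be reproduced with Python's fnmatch module, which implements shell-style wild-cards. They are close to, but not identical with, the DOS ones, so take this as a rough sketch; the file names are made up.

```python
from fnmatch import fnmatchcase

# "?" stands for exactly one character, "*" for any run of characters.
names = ["file1.txt", "file2.txt", "file10.txt", "notes.dat"]

# Names matched by the single-character wild-card pattern.
single_char = [n for n in names if fnmatchcase(n, "file?.txt")]

# Names matched by the catch-all pattern (anything containing a dot).
everything = [n for n in names if fnmatchcase(n, "*.*")]
```

file10.txt is not matched by file?.txt because ? stands for one character only.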
-
- Regular expressions do essentially the same thing, but are much, much more
- powerful. There are many more special characters and ways of
- building complex patterns, however. Let's look at a few of the common ones,
- and then some examples:
-
-
-
-
- . - Matches any single character, e.g. a,
- A, 4, :, or @.
-
-
-
-
-
- ? - The preceding character or expression is matched ZERO or ONE
- times. Either/or.
-
-
-
-
-
- + - The preceding character or expression is matched ONE or MORE
- times.
-
-
-
-
-
- * - The preceding character or expression is matched ZERO or MORE
- times.
-
-
-
-
-
- \ - The escape character denotes that
- the following character should be taken literally. This is used where one of the
- special characters (e.g. .) needs to be taken literally and
- not as a special meta-character. Example: example\.com makes
- sure the period is recognized only as a period (and not expanded to its
- meta-character meaning of any single character).
-
-
-
-
-
- [ ] - Characters enclosed in brackets will be matched if
- any of the enclosed characters are encountered. For instance, [0-9]
- matches any numeric digit (zero through nine). As an example, we can combine
- this with + to match any digit one or more times: [0-9]+.
-
-
-
-
-
- ( ) - parentheses are used to group a sub-expression,
- or multiple sub-expressions.
-
-
-
-
-
- | - The bar character works like an
- or conditional statement. A match is successful if the
- sub-expression on either side of | matches. As an example:
- /(this|that) example/ uses grouping and the bar character
- and would match either this example or that
- example, and nothing else.
-
-
-
-
- These are just some of the ones you are likely to use when matching URLs with
- Privoxy, and the list is a long way from definitive.
- This is enough to get us started with a few simple examples which may
- be more illuminating:
-
-
-
- /.*/banners/.* - A simple example
- that uses the common combination of . and * to
- denote any character, zero or more times. In other words, any string at all.
- So we start with a literal forward slash, then our regular expression pattern
- (.*), another literal forward slash, the string
- banners, another forward slash, and lastly another
- .*. We are building
- a directory path here. This will match any file with the path that has a
- directory named banners in it. The .* matches
- any characters, and this could conceivably be more forward slashes, so it
- might expand into a much longer looking path. For example, this could match:
- /eye/hate/spammers/banners/annoy_me_please.gif, or just
- /banners/annoying.html, or almost an infinite number of other
- possible combinations, just so it has banners in the path
- somewhere.
-
-
-
- And now something a little more complex:
-
-
-
- /.*/adv((er)?ts?|ertis(ing|ements?))?/ -
- We have several literal forward slashes again (/), so we are
- building another expression that is a file path statement. We have another
- .*, so we are matching against any conceivable sub-path, just so
- it matches our expression. The only true literal that must
- match our pattern is adv, together with
- the forward slashes. What comes after the adv string is the
- interesting part.
-
-
-
- Remember the ? means the preceding expression (either a
- literal character or anything grouped with (...) in this case)
- can exist or not, since this means either zero or one match. So
- ((er)?ts?|ertis(ing|ements?)) is optional, as are the
- individual sub-expressions: (er),
- (ing|ements?), and the s. The |
- means or. We have two of those. For instance,
- (ing|ements?) can expand to match either ing
- OR ements?. What is being done here is an
- attempt at matching as many variations of advertisement, and
- similar, as possible. So this would expand to match just adv,
- or advert, or adverts, or
- advertising, or advertisement, or
- advertisements. You get the idea. But it would not match
- advertizements (with a z). We could fix that by
- changing our regular expression to:
- /.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/, which would then match
- either spelling.
-
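The variations listed above can be verified mechanically. The following sketch uses Python's re module as a stand-in for PCRE and applies the two expressions to made-up paths; how Privoxy anchors path patterns internally is not modelled here.

```python
import re

# The expression from the text, and the variant that also accepts a "z".
ADV = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
ADV_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

# Hypothetical paths, one per spelling discussed above.
paths = [
    "/x/adv/",
    "/x/advert/",
    "/x/adverts/",
    "/x/advertising/",
    "/x/advertisement/",
    "/x/advertisements/",
    "/x/advertizements/",
]

matched = [p for p in paths if ADV.search(p)]
matched_z = [p for p in paths if ADV_Z.search(p)]
```

Only the last path is rejected by the first expression, while the second expression accepts all seven.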
-
+
+
+
+Summary
- /.*/advert[0-9]+\.(gif|jpe?g) - Again
- another path statement with forward slashes. Anything in the square brackets
- [ ] can be matched. This is using 0-9 as a
- shorthand expression to mean any digit zero through nine. It is the same as
- saying 0123456789. So any digit matches. The +
- means one or more of the preceding expression must be included. The preceding
- expression here is what is in the square brackets -- in this case, any digit
- zero through nine. Then, at the end, we have a grouping: (gif|jpe?g).
- This includes a |, so this needs to match the expression on
- either side of that bar character also. A simple gif on one side, and the other
- side will in turn match either jpeg or jpg,
- since the ? means the letter e is optional and
- can be matched once or not at all. So we are building an expression here to
- match a GIF or JPEG image file. It must include the literal
- string advert, then one or more digits, and a .
- (which is now a literal, and not a special character, since it is escaped
- with \), and lastly either gif, or
- jpeg, or jpg. Some possible matches would
- include: //advert1.jpg,
- /nasty/ads/advert1234.gif,
- /banners/from/hell/advert99.jpg. It would not match
- advert1.gif (no leading slash), or
- /adverts232.jpg (the expression does not include an
- s), or /advert1.jsp (jsp is not
- in the expression anywhere).
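The matches and non-matches named in this paragraph can be checked the same way. Again this is a sketch with Python's re module standing in for PCRE, using exactly the example paths from the text.

```python
import re

ADVERT = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

# Paths the text says should match.
expected_hits = [
    "//advert1.jpg",
    "/nasty/ads/advert1234.gif",
    "/banners/from/hell/advert99.jpg",
]

# Paths the text says should not match.
expected_misses = [
    "advert1.gif",
    "/adverts232.jpg",
    "/advert1.jsp",
]

hits = [p for p in expected_hits if ADVERT.search(p)]
misses = [p for p in expected_misses if not ADVERT.search(p)]
```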
+ Note that many of these actions have the potential to cause a page to
+ misbehave, possibly even not to display at all. There are many ways
+ a site designer may choose to design his site, and what HTTP header
+ content, and other criteria, he may depend on. There is no way to have hard
+ and fast rules for all sites. See the Appendix for a brief example on troubleshooting
+ actions.
+
+
+
+
+Aliases
- We are barely scratching the surface of regular expressions here so that you
- can understand the default Privoxy
- configuration files, and maybe use this knowledge to customize your own
- installation. There is much, much more that can be done with regular
- expressions. Now that you know enough to get started, you can learn more on
- your own :/
+ Custom actions, known to Privoxy
+ as aliases, can be defined by combining other actions.
+ These can in turn be invoked just like the built-in actions.
+ Currently, an alias name can contain any character except space, tab,
+ =, { and }, but we strongly
+ recommend that you only use a to z,
+ 0 to 9, +, and -.
+ Alias names are not case sensitive, and are not required to start with a
+ + or - sign, since they are merely textually
+ expanded.
-
- More reading on Perl Compatible Regular expressions:
- http://perldoc.perl.org/perlre.html
+ Aliases can be used throughout the actions file, but they must be
+ defined in a special section at the top of the file!
+ And there can only be one such section per actions file. Each actions file may
+ have its own alias section, and the aliases defined in it are only visible
+ within that file.
+
+
+ There are two main reasons to use aliases: One is to save typing for frequently
+ used combinations of actions, the other one is a gain in flexibility: If you
+ decide once how you want to handle shops by defining an alias called
+ shop, you can later change your policy on shops in
+ one place, and your changes will take effect everywhere
+ in the actions file where the shop alias is used. Calling aliases
+ by their purpose also makes your actions files more readable.
+
+
+ Currently, there is one big drawback to using aliases, though:
+ Privoxy's built-in web-based action file
+ editor honors aliases when reading the actions files, but it expands
+ them before writing. So the effects of your aliases are of course preserved,
+ but the aliases themselves are lost when you edit sections that use aliases
+ with it.
- For information on regular expression based substitutions and their applications
- in filters, please see the filter file tutorial
- in this manual.
+ Now let's define some aliases...
-
-
+
+
+ # Useful custom aliases we can use later.
+ #
+ # Note the (required!) section header line and that this section
+ # must be at the top of the actions file!
+ #
+ {{alias}}
+ # These aliases just save typing later:
+ # (Note that some already use other aliases!)
+ #
+ +crunch-all-cookies = + crunch-incoming-cookies + crunch-outgoing-cookies
+ -crunch-all-cookies = - crunch-incoming-cookies - crunch-outgoing-cookies
+ +block-as-image = +block{Blocked image.} +handle-as-image
+ allow-all-cookies = -crunch-all-cookies - session-cookies-only - filter{content-cookies}
-
-
-Privoxy's Internal Pages
+ # These aliases define combinations of actions
+ # that are useful for certain types of sites:
+ #
+ fragile = - block - filter -crunch-all-cookies - fast-redirects - hide-referrer - prevent-compression
-
- Since Privoxy proxies each requested
- web page, it is easy for Privoxy to
- trap certain special URLs. In this way, we can talk directly to
- Privoxy , and see how it is
- configured, see how our rules are being applied, change these
- rules and other configuration options, and even turn
- Privoxy's filtering off, all with
- a web browser.
+ shop = -crunch-all-cookies - filter{all-popups}
+ # Short names for other aliases, for really lazy people ;-)
+ #
+ c0 = +crunch-all-cookies
+ c1 = -crunch-all-cookies
- The URLs listed below are the special ones that allow direct access
- to Privoxy . Of course,
- Privoxy must be running to access these. If
- not, you will get a friendly error message. Internet access is not
- necessary either.
+ ...and put them to use. These sections would appear in the lower part of an
+ actions file and define exceptions to the default actions (as specified further
+ up for the / pattern):
-
-
-
-
- Privoxy main page:
-
-
-
- http://config.privoxy.org/
-
-
-
- There is a shortcut: http://p.p/ (But it
- doesn't provide a fall-back to a real page, in case the request is not
- sent through Privoxy )
-
-
-
-
-
- Show information about the current configuration, including viewing and
- editing of actions files:
-
-
-
- http://config.privoxy.org/show-status
-
-
-
-
-
-
- Show the source code version numbers:
-
-
-
- http://config.privoxy.org/show-version
-
-
-
-
-
-
- Show the browser's request headers:
-
-
-
- http://config.privoxy.org/show-request
-
-
-
+
+ # These sites are either very complex or very keen on
+ # user data and require minimal interference to work:
+ #
+ {fragile}
+ .office.microsoft.com
+ .windowsupdate.microsoft.com
+ # Gmail is really mail.google.com, not gmail.com
+ mail.google.com
-
-
- Show which actions apply to a URL and why:
-
-
-
- http://config.privoxy.org/show-url-info
-
-
-
+ # Shopping sites:
+ # Allow cookies (for setting and retrieving your customer data)
+ #
+ {shop}
+ .quietpc.com
+ .worldpay.com # for quietpc.com
+ mybank.example.com
-
-
- Toggle Privoxy on or off. This feature can be turned off/on in the main
- config file. When toggled off, Privoxy
- continues to run, but only as a pass-through proxy, with no actions taking
- place:
-
-
-
- http://config.privoxy.org/toggle
-
-
-
- Short cuts. Turn off, then on:
-
-
-
- http://config.privoxy.org/toggle?set=disable
-
-
-
-
- http://config.privoxy.org/toggle?set=enable
-
-
-
+ # These shops require pop-ups:
+ #
+ {-filter{all-popups} -filter{unsolicited-popups}}
+ .dabs.com
+ .overclockers.co.uk
+
-
+
+ Aliases like shop and fragile are typically used for
+ problem sites that require more than one action to be disabled
+ in order to function properly.
+
+
+
+
+
+Actions Files Tutorial
+
+ The above chapters have shown which actions files
+ there are and how they are organized, how actions are specified and applied
+ to URLs, how patterns work, and how to
+ define and use aliases. Now, let's look at an
+ example match-all.action, default.action
+ and user.action file and see how all these pieces come together:
+
+match-all.action
- These may be bookmarked for quick reference. See next.
-
+ Remember all actions are disabled when matching starts,
+ so we have to explicitly enable the ones we want.
-
-Bookmarklets
- Below are some bookmarklets to allow you to easily access a
- mini version of some of Privoxy's
- special pages. They are designed for MS Internet Explorer, but should work
- equally well in Netscape, Mozilla, and other browsers which support
- JavaScript. They are designed to run directly from your bookmarks - not by
- clicking the links below (although that should work for testing).
+ While the match-all.action file only contains a
+ single section, it is probably the most important one. It has only one
+ pattern, /, but this pattern
+ matches all URLs. Therefore, the set of
+ actions used in this default section will
+ be applied to all requests as a start. It can be partly or
+ wholly overridden by other actions files like default.action
+ and user.action, but it will still be largely responsible
+ for your overall browsing experience.
+
- To save them, right-click the link and choose Add to Favorites
- (IE) or Add Bookmark (Netscape). You will get a warning that
- the bookmark may not be safe - just click OK. Then you can run the
- Bookmarklet directly from your favorites/bookmarks. For even faster access,
- you can put them on the Links bar (IE) or the Personal
- Toolbar (Netscape), and run them with a single click.
+ Again, at the start of matching, all actions are disabled, so there is
+ no need to disable any actions here. (Remember: a +
+ preceding the action name enables the action, a - disables!)
+ Also note how this long line has been made more readable by splitting it into
+ multiple lines with line continuation.
-
-
-
-
- Privoxy - Enable
-
-
+
+{ \
+ + change-x-forwarded-for{block} \
+ + hide-from-header{block} \
+ + set-image-blocker{pattern} \
+}
+/ # Match all URLs
+
+
-
-
- Privoxy - Disable
-
-
+
+ The default behavior is now set.
+
+
-
-
- Privoxy - Toggle Privoxy (Toggles between enabled and disabled)
-
-
+
+default.action
-
-
- Privoxy - View Status
-
-
-
-
-
- Privoxy - Why?
-
-
-
+
+ If you aren't a developer, there's no need for you to edit the
+ default.action file. It is maintained by
+ the &my-app; developers and if you disagree with some of the
+ sections, you should overrule them in your user.action .
- Credit: The site which gave us the general idea for these bookmarklets is
- www.bookmarklets.com . They
- have more information about bookmarklets.
+ Understanding the default.action file can
+ help you with your user.action , though.
+
+ The first section in this file is a special section for internal use
+ that prevents older &my-app; versions from reading the file:
+
-
-
-
-
+
+
+##########################################################################
+# Settings -- Don't change! For internal Privoxy use ONLY.
+##########################################################################
+{{settings}}
+for-privoxy-version=3.0.11
+
-
-
-Chain of Events
- Let's take a quick look at how some of Privoxy's
- core features are triggered, and the ensuing sequence of events when a web
- page is requested by your browser:
+ After that comes the (optional) alias section. We'll use the example
+ section from the above chapter on aliases,
+ that also explains why and how aliases are used:
-
-
-
- First, your web browser requests a web page. The browser knows to send
- the request to Privoxy , which will in turn,
- relay the request to the remote web server after passing the following
- tests:
-
-
-
-
- Privoxy traps any request for its own internal CGI
- pages (e.g. http://p.p/) and sends the CGI page back to the browser.
-
-
-
-
- Next, Privoxy checks to see if the URL
- matches any +block
patterns. If
- so, the URL is then blocked, and the remote web server will not be contacted.
- +handle-as-image
- and
- +handle-as-empty-document
- are then checked, and if there is no match, an
- HTML BLOCKED
page is sent back to the browser. Otherwise, if
- it does match, an image is returned for the former, and an empty text
- document for the latter. The type of image would depend on the setting of
- +set-image-blocker
- (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere).
-
-
-
-
- Untrusted URLs are blocked. If URLs are being added to the
- trust file, then that is done.
-
-
-
-
- If the URL pattern matches the +fast-redirects
action,
- it is then processed. Unwanted parts of the requested URL are stripped.
-
-
-
-
- Now the rest of the client browser's request headers are processed. If any
- of these match any of the relevant actions (e.g. +hide-user-agent
,
- etc.), headers are suppressed or forged as determined by these actions and
- their parameters.
-
-
-
-
- Now the web server starts sending its response back (i.e. typically a web
- page).
-
-
-
-
- First, the server headers are read and processed to determine, among other
- things, the MIME type (document type) and encoding. The headers are then
- filtered as determined by the
- +crunch-incoming-cookies
,
- +session-cookies-only
,
- and +downgrade-http-version
- actions.
-
-
-
-
- If any +filter
action
- or +deanimate-gifs
- action applies (and the document type fits the action), the rest of the page is
- read into memory (up to a configurable limit). Then the filter rules (from
- default.filter and any other filter files) are
- processed against the buffered content. Filters are applied in the order
- they are specified in one of the filter files. Animated GIFs, if present,
- are reduced to either the first or last frame, depending on the action
- setting.The entire page, which is now filtered, is then sent by
- Privoxy back to your browser.
-
-
- If neither a +filter
action
- or +deanimate-gifs
- matches, then Privoxy passes the raw data through
- to the client browser as it becomes available.
-
-
-
-
- As the browser receives the now (possibly filtered) page content, it
- reads and then requests any URLs that may be embedded within the page
- source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g.
- frames), sounds, etc. For each of these objects, the browser issues a
- separate request (this is easily viewable in Privoxy's
- logs). And each such request is in turn processed just as above. Note that a
- complex web page will have many, many such embedded URLs. If these
- secondary requests are to a different server, then quite possibly a very
- differing set of actions is triggered.
-
-
+
+##########################################################################
+# Aliases
+##########################################################################
+{{alias}}
-
+ # These aliases just save typing later:
+ # (Note that some already use other aliases!)
+ #
+ +crunch-all-cookies = + crunch-incoming-cookies + crunch-outgoing-cookies
+ -crunch-all-cookies = - crunch-incoming-cookies - crunch-outgoing-cookies
+ +block-as-image = +block{Blocked image.} +handle-as-image
+ mercy-for-cookies = -crunch-all-cookies - session-cookies-only - filter{content-cookies}
+
+ # These aliases define combinations of actions
+ # that are useful for certain types of sites:
+ #
+ fragile = - block - filter -crunch-all-cookies - fast-redirects - hide-referrer
+ shop = -crunch-all-cookies - filter{all-popups}
+
- NOTE: This is somewhat of a simplistic overview of what happens with each URL
- request. For the sake of brevity and simplicity, we have focused on
- Privoxy's core features only.
+ The first of our specialized sections is concerned with fragile
+ sites, i.e. sites that require minimum interference, because they are either
+ very complex or very keen on tracking you (and have mechanisms in place that
+ make them unusable for people who avoid being tracked). We will simply use
+ our pre-defined fragile alias instead of stating the list
+ of actions explicitly:
-
-
-
-
-
-Troubleshooting: Anatomy of an Action
-
- The way Privoxy applies
- actions and filters
- to any given URL can be complex, and not always so
- easy to understand what is happening. And sometimes we need to be able to
- see just what Privoxy is
- doing. Especially, if something Privoxy is doing
- is causing us a problem inadvertently. It can be a little daunting to look at
- the actions and filters files themselves, since they tend to be filled with
- regular expressions whose consequences are not
- always so obvious.
+
+##########################################################################
+# Exceptions for sites that'll break under the default action set:
+##########################################################################
+
+# "Fragile" Use a minimum set of actions for these sites (see alias above):
+#
+{ fragile }
+.office.microsoft.com # surprise, surprise!
+.windowsupdate.microsoft.com
+mail.google.com
- One quick test to see if Privoxy is causing a problem
- or not, is to disable it temporarily. This should be the first troubleshooting
- step. See the Bookmarklets section on a quick
- and easy way to do this (be sure to flush caches afterward!). Looking at the
- logs is a good idea too. (Note that both the toggle feature and logging are
- enabled via config file settings, and may need to be
- turned on
.)
+ Shopping sites are not as fragile, but they typically
+ require cookies to log in, and pop-up windows for shopping
+ carts or item details. Again, we'll use a pre-defined alias:
+
- Another easy troubleshooting step to try is if you have done any
- customization of your installation, revert back to the installed
- defaults and see if that helps. There are times the developers get complaints
- about one thing or another, and the problem is more related to a customized
- configuration issue.
+
+# Shopping sites:
+#
+{ shop }
+.quietpc.com
+.worldpay.com # for quietpc.com
+.jungle.com
+.scan.co.uk
- Privoxy also provides the
- http://config.privoxy.org/show-url-info
- page that can show us very specifically how actions
- are being applied to any given URL. This is a big help for troubleshooting.
+ The fast-redirects
+ action, which may have been enabled in match-all.action ,
+ breaks some sites. So disable it for popular sites where we know it misbehaves:
- First, enter one URL (or partial URL) at the prompt, and then
- Privoxy will tell us
- how the current configuration will handle it. This will not
- help with filtering effects (i.e. the +filter
action) from
- one of the filter files since this is handled very
- differently and not so easy to trap! It also will not tell you about any other
- URLs that may be embedded within the URL you are testing. For instance, images
- such as ads are expressed as URLs within the raw page source of HTML pages. So
- you will only get info for the actual URL that is pasted into the prompt area
- -- not any sub-URLs. If you want to know about embedded URLs like ads, you
- will have to dig those out of the HTML source. Use your browser's View
- Page Source
option for this. Or right click on the ad, and grab the
- URL.
+
+{ - fast-redirects }
+login.yahoo.com
+edit.*.yahoo.com
+.google.com
+.altavista.com/.*(like|url|link):http
+.altavista.com/trans.*urltext=http
+.nytimes.com
- Let's try an example, google.com ,
- and look at it one section at a time in a sample configuration (your real
- configuration may vary):
+ It is important that Privoxy knows which
+ URLs belong to images, so that if they are to
+ be blocked, a substitute image can be sent, rather than an HTML page.
+ Contacting the remote site to find out is not an option, since it
+ would destroy the loading time advantage of banner blocking, and it
+ would feed the advertisers information about you. We can mark any
+ URL as an image with the handle-as-image action,
+ and marking all URLs that end in a known image file extension is a
+ good start:
- Matches for http://www.google.com:
-
- In file: default.action [ View ] [ Edit ]
-
- {+change-x-forwarded-for{block}
- +deanimate-gifs {last}
- +fast-redirects {check-decoded-url}
- +filter {refresh-tags}
- +filter {img-reorder}
- +filter {banners-by-size}
- +filter {webbugs}
- +filter {jumping-windows}
- +filter {ie-exploits}
- +hide-from-header {block}
- +hide-referrer {forge}
- +session-cookies-only
- +set-image-blocker {pattern}
-/
+##########################################################################
+# Images:
+##########################################################################
- { -session-cookies-only }
- .google.com
+# Define which file types will be treated as images, in case they get
+# blocked further down this file:
+#
+{ + handle-as-image }
+/.*\.(gif|jpe?g|png|bmp|ico)$
+
- { -fast-redirects }
- .google.com
+
+ And then there are known banner sources. They often use scripts to
+ generate the banners, so it won't be visible from the URL that the
+ request is for an image. Hence we block them and
+ mark them as images in one go, with the help of our
+ +block-as-image alias defined above. (We could of
+ course just as well use + block
+ + handle-as-image here.)
+ Remember that the type of the replacement image is chosen by the
+ set-image-blocker
+ action. Since all URLs have matched the default section with its
+ + set-image-blocker{pattern}
+ action before, it still applies and needn't be repeated:
+
-In file: user.action [ View ] [ Edit ]
-(no matches in this file)
-
+
+
+# Known ad generators:
+#
+{ +block-as-image }
+ar.atwola.com
+.ad.doubleclick.net
+.ad.*.doubleclick.net
+.a.yimg.com/(?:(?!/i/).)*$
+.a[0-9].yimg.com/(?:(?!/i/).)*$
+bs*.gsanet.com
+.qkimg.net
- This is telling us how we have defined our
- actions
, and
- which ones match for our test case, google.com
.
- Displayed is all the actions that are available to us. Remember,
- the + sign denotes on
. -
- denotes off
. So some are on
here, but many
- are off
. Each example we try may provide a slightly different
- end result, depending on our configuration directives.
+ One of the most important jobs of Privoxy
+ is to block banners. Many of these can be blocked
+ by the filter{banners-by-size}
+ action, which we enabled above, and which deletes the references to banner
+ images from the pages while they are loaded, so the browser doesn't request
+ them anymore, and hence they don't need to be blocked here. But this naturally
+ doesn't catch all banners, and some people choose not to use filters, so we
+ need a comprehensive list of patterns for banner URLs here, and apply the
+ block action to them.
- The first listing
- is for our default.action file. The large, multi-line
- listing, is how the actions are set to match for all URLs, i.e. our default
- settings. If you look at your actions
file, this would be the
- section just below the aliases
section near the top. This
- will apply to all URLs as signified by the single forward slash at the end
- of the listing -- /
.
+ First come many generic patterns, which do most of the work by
+ matching typical domain and path name components of banners. Then comes
+ a list of individual patterns for specific sites, which is omitted here
+ to keep the example short:
- But we have defined additional actions that would be exceptions to these general
- rules, and then we list specific URLs (or patterns) that these exceptions
- would apply to. Last match wins. Just below this then are two explicit
- matches for .google.com
. The first is negating our previous
- cookie setting, which was for +session-cookies-only
- (i.e. not persistent). So we will allow persistent cookies for google, at
- least that is how it is in this example. The second turns
- off any +fast-redirects
- action, allowing this to take place unmolested. Note that there is a leading
- dot here -- .google.com
. This will match any hosts and
- sub-domains, in the google.com domain also, such as
- www.google.com
or mail.google.com
. But it would not
- match www.google.de
! So, apparently, we have these two actions
- defined as exceptions to the general rules at the top somewhere in the lower
- part of our default.action file, and
- google.com
is referenced somewhere in these latter sections.
+
+##########################################################################
+# Block these fine banners:
+##########################################################################
+{ +block{Banner ads.} }
+
+# Generic patterns:
+#
+ad*.
+.*ads.
+banner?.
+count*.
+/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?)
+/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/
+
+# Site-specific patterns (abbreviated):
+#
+.hitbox.com
- Then, for our user.action file, we again have no hits.
- So there is nothing google-specific that we might have added to our own, local
- configuration. If there was, those actions would over-rule any actions from
- previously processed files, such as default.action .
- user.action typically has the last word. This is the
- best place to put hard and fast exceptions,
+ It's quite remarkable how many advertisers actually call their banner
+ servers ads.company .com, or call the directory
+ in which the banners are stored simply banners
. So the above
+ generic patterns are surprisingly effective.
-
- And finally we pull it all together in the bottom section and summarize how
- Privoxy is applying all its actions
- to google.com
:
-
+ But being very generic, they necessarily also catch URLs that we don't want
+ to block. The pattern .*ads. , for example, catches
+ nasty-ads .nasty-corp.com
as intended,
+ but also downloads .sourceforge.net
or
+ ads l.some-provider.net.
So here come some
+ well-known exceptions to the + block
+ section above.
+
+
+ Note that these are exceptions to exceptions from the default! Consider the URL
+ downloads.sourceforge.net
: Initially, all actions are deactivated,
+ so it wouldn't get blocked. Then comes the defaults section, which matches the
+ URL, but just deactivates the block
+ action once again. Then it matches .*ads. , an exception to the
+ general non-blocking policy, and suddenly
+ +block applies. And now, it'll match
+ .*loads. , where -block
+ applies, so (unless it matches again further down) it ends up
+ with no block action applying.
+##########################################################################
+# Save some innocent victims of the above generic block patterns:
+##########################################################################
- Final results:
+# By domain:
+#
+{ - block }
+adv[io]*. # (for advogato.org and advice.*)
+adsl. # (has nothing to do with ads)
+adobe. # (has nothing to do with ads either)
+ad[ud]*. # (adult.* and add.*)
+.edu # (universities don't host banners (yet!))
+.*loads. # (downloads, uploads etc)
- -add-header
- -block
- +change-x-forwarded-for{block}
- -client-header-filter{hide-tor-exit-notation}
- -content-type-overwrite
- -crunch-client-header
- -crunch-if-none-match
- -crunch-incoming-cookies
- -crunch-outgoing-cookies
- -crunch-server-header
- +deanimate-gifs {last}
- -downgrade-http-version
- -fast-redirects
- -filter {js-events}
- -filter {content-cookies}
- -filter {all-popups}
- -filter {banners-by-link}
- -filter {tiny-textforms}
- -filter {frameset-borders}
- -filter {demoronizer}
- -filter {shockwave-flash}
- -filter {quicktime-kioskmode}
- -filter {fun}
- -filter {crude-parental}
- -filter {site-specifics}
- -filter {js-annoyances}
- -filter {html-annoyances}
- +filter {refresh-tags}
- -filter {unsolicited-popups}
- +filter {img-reorder}
- +filter {banners-by-size}
- +filter {webbugs}
- +filter {jumping-windows}
- +filter {ie-exploits}
- -filter {google}
- -filter {yahoo}
- -filter {msn}
- -filter {blogspot}
- -filter {no-ping}
- -force-text-mode
- -handle-as-empty-document
- -handle-as-image
- -hide-accept-language
- -hide-content-disposition
- +hide-from-header {block}
- -hide-if-modified-since
- +hide-referrer {forge}
- -hide-user-agent
- -limit-connect
- -overwrite-last-modified
- -prevent-compression
- -redirect
- -server-header-filter{xml-to-html}
- -server-header-filter{html-to-xml}
- -session-cookies-only
- +set-image-blocker {pattern}
+# By path:
+#
+/.*loads/
+
+# Site-specific:
+#
+www.globalintersec.com/adv # (adv = advanced)
+www.ugu.com/sui/ugu/adv
- Notice the only difference here to the previous listing, is to
- fast-redirects
and session-cookies-only
,
- which are activated specifically for this site in our configuration,
- and thus show in the Final Results
.
+ Filtering source code can have nasty side effects,
+ so make an exception for our friends at sourceforge.net,
+ and all paths with cvs
in them. Note that
+ - filter
+ disables all filters in one fell swoop!
- Now another example, ad.doubleclick.net
:
+
+# Don't filter code!
+#
+{ - filter }
+/(.*/)?cvs
+bugzilla.
+developer.
+wiki.
+.sourceforge.net
-
+ The actual default.action is of course much more
+ comprehensive, but we hope this example made clear how it works.
+
- { +block{Domains starts with "ad"} }
- ad*.
+
- { +block{Domain contains "ad"} }
- .ad.
+user.action
- { +block{Doubleclick banner server} +handle-as-image }
- .[a-vx-z]*.doubleclick.net
-
+
+ So far we have painted with a broad brush by setting general policies,
+ which are a reasonable starting point for many people. Now,
+ you might want to be more specific and have customized rules that
+ are more suitable to your personal habits and preferences. These would
+ be for narrowly defined situations like your ISP or your bank, and should
+ be placed in user.action , which is parsed after all other
+ actions files and hence has the last word, overriding any previously
+ defined actions. user.action is also a
+ safe place for your personal settings, since
+ default.action is actively maintained by the
+ Privoxy developers and you'll probably want
+ to install updated versions from time to time.
- We'll just show the interesting part here - the explicit matches. It is
- matched three different times. Two +block{}
sections,
- and a +block{} +handle-as-image
,
- which is the expanded form of one of our aliases that had been defined as:
- +block-as-image
. (Aliases
are defined in
- the first section of the actions file and typically used to combine more
- than one action.)
+ So let's look at a few examples of things that one might typically do in
+ user.action :
+
+
+
- Any one of these would have done the trick and blocked this as an unwanted
- image. This is unnecessarily redundant since the last case effectively
- would also cover the first. No point in taking chances with these guys
- though ;-) Note that if you want an ad or obnoxious
- URL to be invisible, it should be defined as ad.doubleclick.net
- is done here -- as both a +block{}
- and an
- +handle-as-image
.
- The custom alias +block-as-image
just
- simplifies the process and make it more readable.
+
+# My user.action file. <fred@example.com>
- One last example. Let's try http://www.example.net/adsl/HOWTO/
.
- This one is giving us problems. We are getting a blank page. Hmmm ...
+ As aliases are local to the actions
+ file that they are defined in, you can't use the ones from
+ default.action , unless you repeat them here:
+# Aliases are local to the file they are defined in.
+# (Re-)define aliases for this file:
+#
+{{alias}}
+#
+# These aliases just save typing later, and the alias names should
+# be self-explanatory.
+#
++crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies
+-crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies
+ allow-all-cookies = -crunch-all-cookies -session-cookies-only
+ allow-popups = -filter{all-popups}
++block-as-image = +block{Blocked as image.} +handle-as-image
+-block-as-image = -block
- Matches for http://www.example.net/adsl/HOWTO/:
+# These aliases define combinations of actions that are useful for
+# certain types of sites:
+#
+fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referrer
+shop = -crunch-all-cookies allow-popups
- In file: default.action [ View ] [ Edit ]
+# Allow ads for selected useful free sites:
+#
+allow-ads = -block -filter{banners-by-size} -filter{banners-by-link}
- {-add-header
- -block
- +change-x-forwarded-for{block}
- -client-header-filter{hide-tor-exit-notation}
- -content-type-overwrite
- -crunch-client-header
- -crunch-if-none-match
- -crunch-incoming-cookies
- -crunch-outgoing-cookies
- -crunch-server-header
- +deanimate-gifs
- -downgrade-http-version
- +fast-redirects {check-decoded-url}
- -filter {js-events}
- -filter {content-cookies}
- -filter {all-popups}
- -filter {banners-by-link}
- -filter {tiny-textforms}
- -filter {frameset-borders}
- -filter {demoronizer}
- -filter {shockwave-flash}
- -filter {quicktime-kioskmode}
- -filter {fun}
- -filter {crude-parental}
- -filter {site-specifics}
- -filter {js-annoyances}
- -filter {html-annoyances}
- +filter {refresh-tags}
- -filter {unsolicited-popups}
- +filter {img-reorder}
- +filter {banners-by-size}
- +filter {webbugs}
- +filter {jumping-windows}
- +filter {ie-exploits}
- -filter {google}
- -filter {yahoo}
- -filter {msn}
- -filter {blogspot}
- -filter {no-ping}
- -force-text-mode
- -handle-as-empty-document
- -handle-as-image
- -hide-accept-language
- -hide-content-disposition
- +hide-from-header{block}
- +hide-referer{forge}
- -hide-user-agent
- -overwrite-last-modified
- +prevent-compression
- -redirect
- -server-header-filter{xml-to-html}
- -server-header-filter{html-to-xml}
- +session-cookies-only
- +set-image-blocker{blank} }
- /
+# Alias for specific file types that are text, but might have conflicting
+# MIME types. We want the browser to force these to be text documents.
+handle-as-text = -filter +content-type-overwrite{text/plain} +force-text-mode -hide-content-disposition
- { +block{Path contains "ads".} +handle-as-image }
- /ads
-
- Ooops, the /adsl/
is matching /ads
in our
- configuration! But we did not want this at all! Now we see why we get the
- blank page. It is actually triggering two different actions here, and
- the effects are aggregated so that the URL is blocked, and &my-app; is told
- to treat the block as if it were an image. But this is, of course, all wrong.
- We could now add a new action below this (or better in our own
- user.action file) that explicitly
- un blocks (
- {-block}
) paths with
- adsl
in them (remember, last match in the configuration
- wins). There are various ways to handle such exceptions. Example:
+ Say you have accounts on some sites that you visit regularly, and
+ you don't want to have to log in manually each time. So you'd like
+ to allow persistent cookies for these sites. The
+ allow-all-cookies alias defined above does exactly
+ that, i.e. it disables crunching of cookies in any direction, and the
+ processing of cookies to make them only temporary.
+{ allow-all-cookies }
+ sourceforge.net
+ .yahoo.com
+ .msdn.microsoft.com
+ .redhat.com
+
- { -block }
- /adsl
-
+
+ Your bank is allergic to some filter, but you don't know which, so you disable them all:
- Now the page displays ;-)
- Remember to flush your browser's caches when making these kinds of changes to
- your configuration to insure that you get a freshly delivered page! Or, try
- using Shift+Reload .
+
+{ - filter }
+ .your-home-banking-site.com
- But now what about a situation where we get no explicit matches like
- we did with:
+ Some file types you may not want to filter for various reasons:
+# Technical documentation is likely to contain strings that might
+# erroneously get altered by the JavaScript-oriented filters:
+#
+.tldp.org
+/(.*/)?selfhtml/
- { +block{Path starts with "ads".} +handle-as-image }
- /ads
-
+# And this stupid host sends streaming video with a wrong MIME type,
+# so that Privoxy thinks it is getting HTML and starts filtering:
+#
+stupid-server.example.com/
- That actually was very helpful and pointed us quickly to where the problem
- was. If you don't get this kind of match, then it means one of the default
- rules in the first section of default.action is causing
- the problem. This would require some guesswork, and maybe a little trial and
- error to isolate the offending rule. One likely cause would be one of the
- +filter
actions.
- These tend to be harder to troubleshoot.
- Try adding the URL for the site to one of aliases that turn off
- +filter
:
+ Example of a simple block action. Say you've
+ seen an ad on your favourite page on example.com that you want to get rid of.
+ You have right-clicked the image, selected copy image location
+ and pasted the URL below while removing the leading http://, into a
+ { +block{} } section. Note that { +handle-as-image
+ } need not be specified, since all URLs ending in
+ .gif will be tagged as images by the general rules as set
+ in default.action anyway:
-
- { shop }
- .quietpc.com
- .worldpay.com # for quietpc.com
- .jungle.com
- .scan.co.uk
- .forbes.com
-
+{ + block{Nasty ads.} }
+ www.example.com/nasty-ads/sponsor\.gif
+ another.example.net/more/junk/here/
- { shop }
is an alias
that expands to
- { -filter -session-cookies-only }
.
- Or you could do your own exception to negate filtering:
-
+ The URLs of dynamically generated banners, especially from large banner
+ farms, often don't use the well-known image file name extensions, which
+ makes it impossible for Privoxy to guess
+ the file type just by looking at the URL.
+ You can use the +block-as-image alias defined above for
+ these cases.
+ Note that objects which match this rule but then turn out NOT to be an
+ image are typically rendered as a broken image
icon by the
+ browser. Use cautiously.
-
- { -filter }
- # Disable ALL filter actions for sites in this section
- .forbes.com
- developer.ibm.com
- localhost
-
+{ +block-as-image }
+ .doubleclick.net
+ .fastclick.net
+ /Realmedia/ads/
+ ar.atwola.com/
- This would turn off all filtering for these sites. This is best
- put in user.action , for local site
- exceptions. Note that when a simple domain pattern is used by itself (without
- the subsequent path portion), all sub-pages within that domain are included
- automatically in the scope of the action.
+ Now you noticed that the default configuration breaks Forbes Magazine,
+ but you were too lazy to find out which action is the culprit, and you
+ were again too lazy to give feedback, so
+ you just used the fragile alias on the site, and
+ -- whoa! -- it worked. The fragile
+ alias disables those actions that are most likely to break a site. It is also
+ useful for testing whether Privoxy
+ is causing the problem. We later find other regular sites
+ that misbehave, and add those to our personalized list of troublemakers:
- Images that are inexplicably being blocked, may well be hitting the
-+filter{banners-by-size}
- rule, which assumes
- that images of certain sizes are ad banners (works well
- most of the time since these tend to be standardized).
+
+{ fragile }
+ .forbes.com
+ webmail.example.com
+ .mybank.com
- { fragile }
is an alias that disables most
- actions that are the most likely to cause trouble. This can be used as a
- last resort for problem sites.
+ You like the fun
text replacements in default.filter ,
+ but it is disabled in the distributed actions file.
+ So you'd like to turn it on in your private,
+ update-safe config, once and for all:
-
-
- { fragile }
- # Handle with care: easy to break
- mail.google.
- mybank.example.com
+
+
+{ + filter{fun} }
+ / # For ALL sites!
-
- Remember to flush caches! Note that the
- mail.google reference lacks the TLD portion (e.g.
- .com
). This will effectively match any TLD with
- google in it, such as mail.google.de. ,
- just as an example.
+ Note that the above is not really a good idea: There are exceptions
+ to the filters in default.action for things that
+ really shouldn't be filtered, like code on CVS->Web interfaces. Since
+ user.action has the last word, these exceptions
+ won't be valid for the fun
filtering specified here.
+
- If this still does not work, you will have to go through the remaining
- actions one by one to find which one(s) is causing the problem.
+ You might also worry about how your favourite free websites are
+ funded, and find that they rely on displaying banner advertisements
+ to survive. So you might want to specifically allow banners for those
+ sites that you feel provide value to you:
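+ A minimal sketch of such a section, using the allow-ads alias
+ defined earlier in this file. The host names are placeholders chosen for
+ illustration, not recommendations:

```text
# Allow banners on sites that depend on them (hypothetical hosts):
{ allow-ads }
 .example.org
 www.example.net
```

+ With the alias as defined above, only blocking and the two banner filters
+ are switched off for these hosts; all other actions still apply.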
-
-
-
-
-
- Revision 2.90 2008/09/26 16:53:09 fabiankeil
- Update "What's new" section.
+
- Revision 2.89 2008/09/21 15:38:56 fabiankeil
- Fix Portage tree sync instructions in Gentoo section.
- Anonymously reported at ijbswa-developers@.
+
- Revision 2.88 2008/09/21 14:42:52 fabiankeil
- Add documentation for change-x-forwarded-for{},
- remove documentation for hide-forwarded-for-headers.
+
- Revision 2.87 2008/08/30 15:37:35 fabiankeil
- Update entities.
+
+Filter Files
- Revision 2.86 2008/08/16 10:12:23 fabiankeil
- Merge two sentences and move the URL to the end of the item.
+
+ On-the-fly text substitutions need
+ to be defined in a filter file
. Once defined, they
+ can then be invoked as an action
.
+
- Revision 2.85 2008/08/16 10:04:59 fabiankeil
- Some more syntax fixes. This version actually builds.
+
+ &my-app; supports three different filter actions:
+ filter to
+ rewrite the content that is sent to the client,
+ client-header-filter
+ to rewrite headers that are sent by the client, and
+ server-header-filter
+ to rewrite headers that are sent by the server.
+
- Revision 2.84 2008/08/16 09:42:45 fabiankeil
- Turns out building docs works better if the syntax is valid.
+
+ &my-app; also supports two tagger actions:
+ client-header-tagger
+ and
+ server-header-tagger .
+ Taggers and filters use the same syntax in the filter files. The difference
+ is that taggers don't modify the text they are filtering, but use a rewritten
+ version of the filtered text as a tag. The tags can then be used to change
+ which actions apply, through sections with tag-patterns.
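+ As a rough sketch, a client-header tagger and a section that reacts to its
+ tags might look like this. The tagger name, its description and the
+ User-Agent value are assumptions made up for illustration:

```text
# In a filter file such as user.filter (hypothetical tagger):
CLIENT-HEADER-TAGGER: user-agent Tag the request with the client's User-Agent header
s@^(User-Agent:.*)@$1@i

# In an actions file: enable the tagger for all requests ...
{ +client-header-tagger{user-agent} }
/

# ... and change the actions for requests that carry a matching tag
# (the User-Agent value is only an example):
{ -filter }
TAG:^User-Agent: ExampleFetcher/
```

+ The job rewrites the header to itself, so the header is left untouched and
+ its text becomes the tag that the TAG: pattern is matched against.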
+
- Revision 2.83 2008/08/16 09:32:02 fabiankeil
- Mention changes since 3.0.9 beta.
- Revision 2.82 2008/08/16 09:00:52 fabiankeil
- Fix example URL pattern (once more with feeling).
+
+ Multiple filter files can be defined through the filterfile config directive. The filters
+ as supplied by the developers are located in
+ default.filter . It is recommended that any locally
+ defined or modified filters go in a separately defined file such as
+ user.filter .
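+ A minimal sketch of the corresponding lines in the main configuration file,
+ assuming both filter files live in the configuration directory:

```text
# Filters shipped by the developers:
filterfile default.filter
# Locally defined or modified filters:
filterfile user.filter
```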
+
- Revision 2.81 2008/08/16 08:51:28 fabiankeil
- Update version-related entities.
+
+ Common tasks for content filters are to eliminate common annoyances in
+ HTML and JavaScript, such as pop-up windows,
+ exit consoles, crippled windows without navigation tools, the
+ infamous <BLINK> tag etc, to suppress images with certain
+ width and height attributes (standard banner sizes or web-bugs),
+ or just to have fun.
+
- Revision 2.80 2008/07/18 16:54:30 fabiankeil
- Remove erroneous whitespace in documentation link.
- Reported by John Chronister in #2021611.
+
+ Enabled content filters are applied to any content whose
+ Content Type
header is recognised as a sign
+ of text-based content, with the exception of text/plain .
+ Use the force-text-mode action
+ to also filter other content.
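+ A sketch of how this could be combined in an actions file. The filter name
+ foo and the URL pattern are hypothetical and stand in for
+ your own filter and site:

```text
# Filter plain-text log files from one host as if they were text/html:
{ +force-text-mode +filter{foo} }
files.example.com/.*\.log$
```

+ Use this sparingly, since filtering content that is not really text can
+ corrupt it.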
+
- Revision 2.79 2008/06/27 18:00:53 markm68k
- remove outdated startup information for mac os x
+
+ Substitutions are made at the source level, so if you want to roll
+ your own
filters, you should first be familiar with HTML syntax,
+ and, of course, regular expressions.
+
- Revision 2.78 2008/06/21 17:03:03 fabiankeil
- Fix typo.
+
+ Just like the actions files, the
+ filter file is organized in sections, which are called filters
+ here. Each filter consists of a heading line that starts with one of the
+ keywords FILTER: ,
+ CLIENT-HEADER-FILTER: or SERVER-HEADER-FILTER:
+ followed by the filter's name , and a short (one line)
+ description of what it does. Below that line
+ come the jobs , i.e. lines that define the actual
+ text substitutions. By convention, the name of a filter
+ should describe what the filter eliminates . The
+ description is shown in the web-based
+ user interface .
+
- Revision 2.77 2008/06/14 13:45:22 fabiankeil
- Re-add a colon I unintentionally removed a few revisions ago.
+
+ Once a filter called name has been defined
+ in the filter file, it can be invoked by using an action of the form
+ +filter{name}
+ in any actions file.
+
- Revision 2.76 2008/06/14 13:21:28 fabiankeil
- Prepare for the upcoming 3.0.9 beta release.
+
+ Filter definitions start with a header line that contains the filter
+ type, the filter name and the filter description.
+ A content filter header line for a filter called foo could look
+ like this:
+
- Revision 2.75 2008/06/13 16:06:48 fabiankeil
- Update the "What's New in this Release" section with
- the ChangeLog entries changelog2doc.pl could handle.
+
+ FILTER: foo Replace all "foo" with "bar"
+
- Revision 2.74 2008/05/26 15:55:46 fabiankeil
- - Update "default profiles" table.
- - Add some more pcrs redirect examples and note that
- enabling debug 128 helps to get redirects working.
+
+ Below that line, and up to the next header line, come the jobs that
+ define what text replacements the filter executes. They are specified
+ in a syntax that imitates Perl's
+ s/// operator. If you are familiar with Perl, you
+ will find this to be quite intuitive, and may want to look at the
+ PCRS documentation for the subtle differences to Perl behaviour. Most
+ notably, the non-standard option letter U is supported,
+ which turns the default to ungreedy matching.
+
- Revision 2.73 2008/05/23 14:43:18 fabiankeil
- Remove previously out-commented block that caused syntax problems.
+
+ If you are new to Regular Expressions, you might want to take a look at
+ the Appendix on regular expressions, and
+ see the Perl manual for the
+ s/// operator's syntax and Perl-style regular
+ expressions in general.
+ The examples below might also help to get you started.
+
- Revision 2.72 2008/05/12 10:26:14 fabiankeil
- Synchronize content filter descriptions with the ones in default.filter.
- Revision 2.71 2008/04/10 17:37:16 fabiankeil
- Actually we use "modern" POSIX 1003.2 regular
- expressions in path patterns, not PCRE.
+
- Revision 2.70 2008/04/10 15:59:12 fabiankeil
- Add another section to the client-header-tagger example that shows
- how to actually change the action settings once the tag is created.
+Filter File Tutorial
+
+ Now, let's complete our foo content filter. We have already defined
+ the heading, but the jobs are still missing. Since all it does is to replace
+ foo with bar, there is only one (trivial) job
+ needed:
+
- Revision 2.69 2008/03/29 12:14:25 fabiankeil
- Remove send-wafer and send-vanilla-wafer actions.
+
+ s/foo/bar/
+
- Revision 2.68 2008/03/28 15:13:43 fabiankeil
- Remove inspect-jpegs action.
+
+ But wait! Didn't the comment say that all occurrences
+ of foo should be replaced? Our current job will only take
+ care of the first foo on each page. For global substitution,
+ we'll need to add the g option:
+
- Revision 2.67 2008/03/27 18:31:21 fabiankeil
- Remove kill-popups action.
+
+ s/foo/bar/g
+
- Revision 2.66 2008/03/06 16:33:47 fabiankeil
- If limit-connect isn't used, don't limit CONNECT requests to port 443.
+
+ Our complete filter now looks like this:
+
+
+ FILTER: foo Replace all "foo" with "bar"
+s/foo/bar/g
+
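+ As a rough cross-check of what the g option changes, the two jobs can be
+ imitated with Python's re module. This is only an illustration: Privoxy applies
+ its jobs through PCRS, not Python, and the sample page text is made up.

```python
import re

page = "foo and more foo"

# s/foo/bar/  : without the g option only the first match is replaced.
# In Python this corresponds to count=1.
first_only = re.sub(r"foo", "bar", page, count=1)

# s/foo/bar/g : with the g option every match is replaced.
# In Python this is the default behaviour (count=0).
everywhere = re.sub(r"foo", "bar", page)

print(first_only)
print(everywhere)
```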
- Revision 2.65 2008/03/04 18:30:40 fabiankeil
- Remove the treat-forbidden-connects-like-blocks action. We now
- use the "blocked" page for forbidden CONNECT requests by default.
+
+ Let's look at some real filters for more interesting examples. Here you see
+ a filter that protects against some common annoyances that arise from JavaScript
+ abuse. Let's look at its jobs one after the other:
+
- Revision 2.64 2008/03/01 14:10:28 fabiankeil
- Use new block syntax. Still needs some polishing.
- Revision 2.63 2008/02/22 05:50:37 markm68k
- fix merge problem
+
+
+FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse
- Revision 2.62 2008/02/11 11:52:23 hal9
- Fix entity ... s/&/&
+# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm
+#
+s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg
+
- Revision 2.61 2008/02/11 03:41:47 markm68k
- more updates for mac os x
+
+ Following the header line and a comment, you see the job. Note that it uses
+ | as the delimiter instead of /, because
+ the pattern contains a forward slash, which would otherwise have to be escaped
+ by a backslash (\).
+
- Revision 2.60 2008/02/11 03:40:25 markm68k
- more updates for mac os x
+
+ Now, let's examine the pattern: it starts with the text <script.*
+ enclosed in parentheses. Since the dot matches any character, and *
+ means "match an arbitrary number of the element left of myself", this
+ matches <script, followed by any text, i.e.
+ it matches the whole page, from the start of the first <script> tag.
+
- Revision 2.59 2008/02/11 00:52:34 markm68k
- reflect new changes for mac os x
+
+ That's more than we want, but the pattern continues: document\.referrer
+ matches only the exact string document.referrer. The dot needed to
+ be escaped, i.e. preceded by a backslash, to take away its
+ special meaning as a wildcard, and make it just a regular dot. So far, the meaning is:
+ Match from the start of the first <script> tag in the page, up to, and including,
+ the text document.referrer, if both are present
+ in the page (and appear in that order).
+
- Revision 2.58 2008/02/03 21:37:40 hal9
- Apply patch from Mark: s/OSX/OS X/
+
+ But there's still more pattern to go. The next element, again enclosed in parentheses,
+ is .*</script>. You already know what .*
+ means, so the whole pattern translates to: Match from the start of the first <script>
+ tag in a page to the end of the last </script> tag, provided that the text
+ document.referrer appears somewhere in between.
+
- Revision 2.57 2008/02/03 19:10:14 fabiankeil
- Mention forward-socks5.
+
+ This is still not the whole story, since we have ignored the options and the parentheses:
+ The portions of the page matched by sub-patterns that are enclosed in parentheses will be
+ remembered and be available through the variables $1, $2, ... in
+ the substitute. The U option switches to ungreedy matching, which means
+ that the first .* in the pattern will only eat up all
+ text in between <script and the first occurrence
+ of document.referrer, and that the second .* will
+ only span the text up to the first </script>
+ tag. Furthermore, the s option says that the match may span
+ multiple lines in the page, and the g option again means that the
+ substitution is global.
+
- Revision 2.56 2008/01/31 19:11:35 fabiankeil
- Let the +client-header-filter{hide-tor-exit-notation} example apply
- to all requests as "tainted" Referers aren't limited to exit TLDs.
+
+ So, to summarize, the pattern means: Match all scripts that contain the text
+ document.referrer. Remember the parts of the script from
+ (and including) the start tag up to (and excluding) the string
+ document.referrer as $1, and the part following
+ that string, up to and including the closing tag, as $2.
+
- Revision 2.55 2008/01/19 21:26:37 hal9
- Add IE7 to configuration section per Gerry.
+
+ Now the pattern is deciphered, but wasn't this about substituting things? So
+ let's look at the substitute: $1"Not Your Business!"$2 is
+ easy to read: The text remembered as $1, followed by
+ "Not Your Business!" (including
+ the quotation marks!), followed by the text remembered as $2.
+ This produces an exact copy of the original string, with the middle part
+ (the document.referrer) replaced by "Not Your
+ Business!".
+
- Revision 2.54 2008/01/19 17:52:39 hal9
- Re-commit to fix various minor issues for new release.
+
+ The whole job now reads: Replace document.referrer by
+ "Not Your Business!" wherever it appears inside a
+ <script> tag. Note that this job won't break JavaScript syntax,
+ since both the original and the replacement are syntactically valid
+ string objects. The script just won't have access to the referrer
+ information anymore.
+
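+ The effect of the U option can be tried out in Python, with two caveats that
+ are assumptions of this sketch: Python's re module has no U flag, so the
+ ungreedy quantifiers are spelled out as .*?, and re.DOTALL stands in for the
+ s option. The sample page is invented for the demonstration.

```python
import re

page = (
    "<script>a = document.referrer;</script>\n"
    "<p>text</p>\n"
    "<script>\nb = document.referrer;\n</script>"
)

# Ungreedy variant: what the job does with the U option set.
ungreedy = re.compile(r"(<script.*?)document\.referrer(.*?</script>)", re.DOTALL)

# Greedy variant: what the same pattern would do without the U option.
greedy = re.compile(r"(<script.*)document\.referrer(.*</script>)", re.DOTALL)

# \1 and \2 play the role of $1 and $2 in the PCRS substitute.
substitute = r'\1"Not Your Business!"\2'

with_u = ungreedy.sub(substitute, page)
without_u = greedy.sub(substitute, page)

print(with_u)
print(without_u)
```

+ With ungreedy matching each script is handled as its own match, so both
+ references are replaced. The greedy variant swallows everything up to the last
+ document.referrer in a single match and leaves the first one untouched.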
- Revision 2.53 2008/01/19 15:03:05 hal9
- Doc sources tagged for 3.0.8 release.
+
+ We'll show you two other jobs from the JavaScript taming department, but
+ this time only point out the constructs of special interest:
+
- Revision 2.52 2008/01/17 01:49:51 hal9
- Change copyright notice for docs s/2007/2008/. All these will be rebuilt soon
- enough.
+
+
+# The status bar is for displaying link targets, not pointless blahblah
+#
+s/window\.status\s*=\s*(['"]).*?\1/dUmMy=1/ig
+
- Revision 2.51 2007/12/23 16:48:24 fabiankeil
- Use more precise example descriptions for the mysterious domain patterns.
+
+ \s stands for whitespace characters (space, tab, newline,
+ carriage return, form feed), so that \s* means: zero
+ or more whitespace. The ? in .*?
+ makes this matching of arbitrary text ungreedy. (Note that the U
+ option is not set.) The ['"] construct means: a single
+ or a double quote. Finally, \1 is
+ a back-reference to the first parenthesis just like $1 above,
+ with the difference that in the pattern, a backslash indicates
+ a back-reference, whereas in the substitute, it's the dollar.
+
- Revision 2.50 2007/12/08 12:44:36 fabiankeil
- - Remove already commented out pre-3.0.7 changes.
- - Update the "new log defaults" paragraph.
+
+ So what does this job do? It replaces assignments of single- or double-quoted
+ strings to the window.status object with a dummy assignment
+ (using a variable name that is hopefully odd enough not to conflict with
+ real variables in scripts). Thus, it catches many cases where e.g. pointless
+ descriptions are displayed in the status bar instead of the link target when
+ you move your mouse over links.
+
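+ The role of the back-reference is easiest to see on a string that contains
+ both kinds of quotes. The following Python sketch reuses the job's pattern
+ unchanged (Python's re syntax agrees with PCRS for these constructs); the
+ script line is a made-up sample.

```python
import re

# Same pattern as the job above; re.IGNORECASE corresponds to the i option.
job = re.compile(r"""window\.status\s*=\s*(['"]).*?\1""", re.IGNORECASE)

script = """window.status = 'Click "here" now!'; x = 1;"""

# The match ends at the next single quote, because \1 must repeat whatever
# quote character opened the string. The inner double quotes are skipped.
result = job.sub("dUmMy=1", script)

print(result)
```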
- Revision 2.49 2007/12/06 18:21:55 fabiankeil
- Update hide-forwarded-for-headers description.
+
+
+# Kill OnUnload popups. Yummy. Test: http://www.zdnet.com/zdsubs/yahoo/tree/yfs.html
+#
+s/(<body [^>]*)onunload(.*>)/$1never$2/iU
+
- Revision 2.48 2007/11/24 19:07:17 fabiankeil
- - Mention request rewriting.
- - Enable the conditional-forge paragraph.
- - Minor rewordings.
+
+ Including the OnUnload
+ event binding in the HTML DOM was a CRIME.
+ When I close a browser window, I want it to close and die. Basta.
+ This job replaces the onunload attribute in
+ <body> tags with the dummy word never.
+ Note that the i option makes the pattern matching
+ case-insensitive. Also note that ungreedy matching alone doesn't always guarantee
+ a minimal match: In the first parenthesis, we had to use [^>]*
+ instead of .* to prevent the match from exceeding the
+ <body> tag if it doesn't contain OnUnload, but the page's
+ content does.
+
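+ The point about [^>]* can be checked with a small Python sketch. As before,
+ the U option is emulated by writing the quantifiers as ungreedy, which is an
+ assumption of this sketch, and both HTML snippets are invented.

```python
import re

# The job's pattern, and a sloppier variant that uses .* in the first group.
careful = re.compile(r"(<body [^>]*?)onunload(.*?>)", re.IGNORECASE)
sloppy = re.compile(r"(<body .*?)onunload(.*?>)", re.IGNORECASE)

substitute = r"\1never\2"

# A body tag that really carries the attribute.
tag = '<body OnUnload="popup()">'
fixed_tag = careful.sub(substitute, tag)

# A body tag without the attribute, followed by text that mentions it.
html = '<body bgcolor="white"><p>The onunload event is evil.</p>'
kept = careful.sub(substitute, html)
damaged = sloppy.sub(substitute, html)

print(fixed_tag)
print(kept)
print(damaged)
```

+ The careful pattern leaves the second page alone, while the sloppy one runs
+ past the end of the tag and rewrites ordinary page text, even though its
+ quantifier is ungreedy.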
- Revision 2.47 2007/11/18 14:59:47 fabiankeil
- A few "Note to Upgraders" updates.
+
+ The last example is from the fun department:
+
- Revision 2.46 2007/11/17 17:24:44 fabiankeil
- - Use new action defaults.
- - Minor fixes and rewordings.
+
+
+FILTER: fun Fun text replacements
- Revision 2.45 2007/11/16 11:48:46 hal9
- Fix one typo, and add a couple of small refinements.
+# Spice the daily news:
+#
+s/microsoft(?!\.com)/MicroSuck/ig
+
- Revision 2.44 2007/11/15 03:30:20 hal9
- Results of spell check.
+
+ Note the (?!\.com) part (a so-called negative lookahead)
+ in the job's pattern, which means: Don't match, if the string
+ .com appears directly following microsoft
+ in the page. This prevents links to microsoft.com from being trashed, while
+ still replacing the word everywhere else.
+
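+ The lookahead behaves the same way in Python's re module, so the job can be
+ tried out directly. The sample sentence is made up.

```python
import re

# Same pattern as the job; re.IGNORECASE corresponds to the i option,
# and re.sub replaces all matches, like the g option.
job = re.compile(r"microsoft(?!\.com)", re.IGNORECASE)

text = "Microsoft announced it on www.microsoft.com today."
result = job.sub("MicroSuck", text)

print(result)
```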
- Revision 2.43 2007/11/14 18:45:39 fabiankeil
- - Mention some more contributors in the "New in this Release" list.
- - Minor rewordings.
+
+
+# Buzzword Bingo (example for extended regex syntax)
+#
+s* industry[ -]leading \
+| cutting[ -]edge \
+| customer[ -]focused \
+| market[ -]driven \
+| award[ -]winning # Comments are OK, too! \
+| high[ -]performance \
+| solutions[ -]based \
+| unmatched \
+| unparalleled \
+| unrivalled \
+*<font color="red"><b>BINGO!</b></font> \
+*igx
+
- Revision 2.42 2007/11/12 03:32:40 hal9
- Updates for "What's New" and "Notes to Upgraders". Various other changes in
- preparation for new release. User Manual is almost ready.
+
+ The x option in this job turns on extended syntax, and allows for
+ e.g. the liberal use of (non-interpreted!) whitespace for nicer formatting.
+
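+ Python offers a comparable mode through re.VERBOSE, which gives a feel for
+ what the x option permits. The sketch below is a shortened stand-in for the
+ job above, not a copy of it: the list of buzzwords is cut down and the
+ replacement is plain text instead of the red HTML markup.

```python
import re

bingo = re.compile(
    r"""
      industry[ -]leading
    | cutting[ -]edge
    | award[ -]winning      # Comments are OK, too!
    | unparalleled
    """,
    re.IGNORECASE | re.VERBOSE,
)

text = "Our award-winning, Cutting Edge product"

# Whitespace outside character classes is ignored in the pattern, but the
# space inside [ -] still counts, so both spellings of each buzzword match.
result = bingo.sub("BINGO!", text)

print(result)
```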
- Revision 2.41 2007/11/11 16:32:11 hal9
- This is primarily syncing What's New and Note to Upgraders sections with the many
- new features and changes (gleaned from memory but mostly from ChangeLog).
+
+ You get the idea?
+
+
- Revision 2.40 2007/11/10 17:10:59 fabiankeil
- In the first third of the file, mention several times that
- the action editor is disabled by default in 3.0.7 beta and later.
+
- Revision 2.39 2007/11/05 02:34:49 hal9
- Various changes in preparation for the upcoming release. Much yet to be done.
+The Pre-defined Filters
- Revision 2.38 2007/09/22 16:01:42 fabiankeil
- Update embedded show-url-info output.
+
- Revision 2.35 2007/08/26 14:59:49 fabiankeil
- Minor rewordings and fixes.
+
+The distribution default.filter file contains a selection of
+pre-defined filters for your convenience:
+
- Revision 2.34 2007/08/05 15:19:50 fabiankeil
- - Don't claim HTTP/1.1 compliance.
- - Use $ in some of the path pattern examples.
- - Use a hide-user-agent example argument without
- leading and trailing space.
- - Make it clear that the cookie actions work with
- HTTP cookies only.
- - Rephrase the inspect-jpegs text to underline
- that it's only meant to protect against a single
- exploit.
+
+
+ js-annoyances
+
+
+ The purpose of this filter is to get rid of particularly annoying JavaScript abuse.
+ To that end, it
+
+
+
+ replaces JavaScript references to the browser's referrer information
+ with the string "Not Your Business!". This complements the hide-referrer action on the content level.
+
+
+
+
+ removes the bindings to the DOM's unload
+ event which we feel has no right to exist and is responsible for most exit consoles, i.e.
+ nasty windows that pop up when you close another one.
+
+
+
+
+ removes code that causes new windows to be opened with undesired properties, such as being
+ full-screen, non-resizeable, without location, status or menu bar etc.
+
+
+
+
+
+ Use with caution. This is an aggressive filter, and can break sites that
+ rely heavily on JavaScript.
+
+
+
- Revision 2.33 2007/07/27 10:57:35 hal9
- Add references for user-agent strings for hide-user-agenet
+
+ js-events
+
+
+ This is a very radical measure. It removes virtually all JavaScript event bindings, which
+ means that scripts can no longer react to user actions such as mouse movements or clicks,
+ window resizing etc. Use with caution!
+
+
+ We strongly discourage using this filter as a default since it breaks
+ many legitimate scripts. It is meant for use only on extra-nasty sites (should you really
+ need to go there).
+
+
+
- Revision 2.32 2007/06/07 12:36:22 fabiankeil
- Apply Roland's 29_usermanual.dpatch to fix a bunch
- of syntax errors I collected over the last months.
+
+ html-annoyances
+
+
+ This filter will undo many common instances of HTML based abuse.
+
+
+ The BLINK and MARQUEE tags
+ are neutralized (yeah baby!), and browser windows will be created as
+ resizeable (as of course they should be!), and will have location,
+ scroll and menu bars -- even if specified otherwise.
+
+
+
- Revision 2.31 2007/06/02 14:01:37 fabiankeil
- Start to document forward-override{}.
+
+ content-cookies
+
+
+ Most cookies are set in the HTTP dialog, where they can be intercepted
+ by the
+ crunch-incoming-cookies
+ and crunch-outgoing-cookies
+ actions. But web sites increasingly make use of HTML meta tags and JavaScript
+ to sneak cookies to the browser on the content level.
+
+
+ This filter disables most HTML and JavaScript code that reads or sets
+ cookies. It cannot detect all clever uses of these types of code, so it
+ should not be relied on as an absolute fix. Use it wherever you would also
+ use the cookie crunch actions.
+
+
+
- Revision 2.30 2007/04/25 15:10:36 fabiankeil
- - Describe installation for FreeBSD.
- - Start to document taggers and tag patterns.
- - Don't confuse devils and daemons.
+
+ refresh-tags
+
+
+ Disable any refresh tags if the interval is greater than nine seconds (so
+ that redirections done via refresh tags are not destroyed). This is useful
+ for dial-on-demand setups, or for those who find this HTML feature
+ annoying.
+
+
+
- Revision 2.29 2007/04/05 11:47:51 fabiankeil
- Some updates regarding header filtering,
- handling of compressed content and redirect's
- support for pcrs commands.
+
+ unsolicited-popups
+
+
+ This filter attempts to prevent only unsolicited pop-up
+ windows from opening, yet still allow pop-up windows that the user
+ has explicitly chosen to open. It was added in version 3.0.1,
+ as an improvement over earlier such filters.
+
+
+ Technical note: The filter works by redefining the window.open JavaScript
+ function to a dummy function, PrivoxyWindowOpen(),
+ during the loading and rendering phase of each HTML page access, and
+ restoring the function afterward.
+
+
+ This is recommended only for browsers that cannot perform this function
+ reliably themselves. And be aware that some sites require such windows
+ in order to function normally. Use with caution.
+
+
+
- Revision 2.28 2006/12/10 23:42:48 hal9
- Fix various typos reported by Adam P. Thanks.
+
+ all-popups
+
+
+ Attempt to prevent all pop-up windows from opening.
+ Note this should be used with even more discretion than the above, since
+ it is more likely to break some sites that require pop-ups for normal
+ usage. Use with caution.
+
+
+
- Revision 2.27 2006/11/14 01:57:47 hal9
- Dump all docs prior to 3.0.6 release. Various minor changes to faq and user
- manual.
+
+ img-reorder
+
+
+ This is a helper filter that has no value if used alone. It makes the
+ banners-by-size and banners-by-link
+ (see below) filters more effective and should be enabled together with them.
+
+
+
- Revision 2.26 2006/10/24 11:16:44 hal9
- Add new filters.
+
+ banners-by-size
+
+
+ This filter removes image tags purely based on what size they are. Fortunately
+ for us, many ads and banner images tend to conform to certain standardized
+ sizes, which makes this filter quite effective for ad stripping purposes.
+
+
+ Occasionally this filter will cause false positives on images that are not ads,
+ but just happen to be of one of the standard banner sizes.
+
+
+ Recommended only for those who require extreme ad blocking. The default
+ block rules should catch 95+% of all ads without this filter enabled.
+
+
+
- Revision 2.25 2006/10/18 10:50:33 hal9
- Add note that since filters are off in Cautious, compression is ON. Turn off
- compression to make filters work on all sites.
+
+ banners-by-link
+
+
+ This is an experimental filter that attempts to kill any banners if
+ their URLs seem to point to known or suspected click trackers. It is currently
+ not of much value and is not recommended for use by default.
+
+
+
- Revision 2.24 2006/10/03 11:13:54 hal9
- More references to the new filters. Include html this time around.
+
+ webbugs
+
+
+ Webbugs are small, invisible images (technically 1X1 GIF images), that
+ are used to track users across websites, and collect information on them.
+ As an HTML page is loaded by the browser, an embedded image tag causes the
+ browser to contact a third-party site, disclosing the tracking information
+ through the requested URL and/or cookies for that third-party domain, without
+ the user ever becoming aware of the interaction with the third-party site.
+ HTML-ized spam also uses a similar technique to verify email addresses.
+
+
+ This filter removes the HTML code that loads such webbugs.
+
+
+
- Revision 2.23 2006/10/02 22:43:53 hal9
- Contains new filter definitions from Fabian, and few other miscellaneous
- touch-ups.
+
+ tiny-textforms
+
+
+ A rather special-purpose filter that can be used to enlarge textareas (those
+ multi-line text boxes in web forms) and turn off hard word wrap in them.
+ It was written for the sourceforge.net tracker system where such boxes are
+ a nuisance, but it can be handy on other sites, too.
+
+
+ It is not recommended to use this filter as a default.
+
+
+
- Revision 2.22 2006/09/22 01:27:55 hal9
- Final commit of probably various minor changes here and there. Unless
- something changes this should be ready for pending release.
+
+ jumping-windows
+
+
+ Many consider windows that move, or resize themselves to be abusive. This filter
+ neutralizes the related JavaScript code. Note that some sites might not display
+ or behave as intended when using this filter. Use with caution.
+
+
+
- Revision 2.21 2006/09/20 03:21:36 david__schmidt
- Just the tiniest tweak. Wafer thin!
+
+ frameset-borders
+
+
+ Some web designers seem to assume that everyone in the world will view their
+ web sites using the same browser brand and version, screen resolution etc,
+ because only that assumption could explain why they'd use static frame sizes,
+ yet prevent their frames from being resized by the user, should they be too
+ small to show their whole content.
+
+
+ This filter removes the related HTML code. It should only be applied to sites
+ which need it.
+
+
+
- Revision 2.20 2006/09/10 14:53:54 hal9
- Results of spell check. User manual has some updates to standard.actions file
- info.
+
+ demoronizer
+
+
+ Many Microsoft products that generate HTML use non-standard extensions (read:
+ violations) of the ISO 8859-1 aka Latin-1 character set. This can cause those
+ HTML documents to display with errors on standard-compliant platforms.
+
+
+ This filter translates the MS-only characters into Latin-1 equivalents.
+ It is not necessary when using MS products, and will cause corruption of
+ all documents that use 8-bit character sets other than Latin-1. It's mostly
+ worthwhile for Europeans on non-MS platforms who sometimes see weird garbage
+ characters on some pages, or for user agents that don't correct for this on
+ the fly.
+
+
+
+
- Revision 2.19 2006/09/08 12:19:02 fabiankeil
- Adjust hide-if-modified-since example values
- to reflect the recent changes.
+
+ shockwave-flash
+
+
+ A filter for shockwave haters. As the name suggests, this filter strips code
+ out of web pages that is used to embed shockwave flash objects.
+
+
+
+
+
- Revision 2.18 2006/09/08 02:38:57 hal9
- Various changes:
- -Fix a number of broken links.
- -Migrate the new Windows service command line options, and reference as
- needed.
- -Rebuild so that can be used with the new "user-manual" config capabilities.
- -Etc.
+
+ quicktime-kioskmode
+
+
+ Change HTML code that embeds Quicktime objects so that kioskmode, which
+ prevents saving, is disabled.
+
+
+
- Revision 2.17 2006/09/05 13:25:12 david__schmidt
- Add Windows service invocation stuff (duplicated) in FAQ and in user manual under Windows startup. One probably ought to reference the other.
+
+ fun
+
+
+ Text replacements for subversive browsing fun. Make fun of your favorite
+ Monopolist or play buzzword bingo.
+
+
+
- Revision 2.16 2006/09/02 12:49:37 hal9
- Various small updates for new actions, filterfiles, etc.
+
+ crude-parental
+
+
+ A demonstration-only filter that shows how Privoxy
+ can be used to delete web content on a keyword basis.
+
+
+
- Revision 2.15 2006/08/30 11:15:22 hal9
- More work on the new actions, especially filter-*-headers, and What's New
- section. User Manual is close to final form for 3.0.4 release. Some tinkering
- and proof reading left to do.
+
+ ie-exploits
+
+
+ An experimental collection of text replacements to disable malicious HTML and JavaScript
+ code that exploits known security holes in Internet Explorer.
+
+
+ Presently, it only protects against Nimda and a cross-site scripting bug, and
+ would need active maintenance to provide more substantial protection.
+
+
+
- Revision 2.14 2006/08/29 10:59:36 hal9
- Add a "Whats New in this release" Section. Further work on multiple filter
- files, and assorted other minor changes.
+
+ site-specifics
+
+
+ Some web sites have very specific problems, the cure for which doesn't apply
+ anywhere else, or could even cause damage on other sites.
+
+
+ This is a collection of such site-specific cures which should only be applied
+ to the sites they were intended for, which is what the supplied
+ default.action file does. Users shouldn't need to change
+ anything regarding this filter.
+
+
+
- Revision 2.13 2006/08/22 11:04:59 hal9
- Silence warnings and errors. This should build now. New filters were only
- stubbed in. More to be done.
+
+ google
+
+
+ A CSS based block for Google text ads. Also removes a width limitation
+ and the toolbar advertisement.
+
+
+
- Revision 2.12 2006/08/14 08:40:39 fabiankeil
- Documented new actions that were part of
- the "minor Privoxy improvements".
+
+ yahoo
+
+
+ Another CSS based block, this time for Yahoo text ads. Also removes
+ a width limitation.
+
+
+
- Revision 2.11 2006/07/18 14:48:51 david__schmidt
- Reorganizing the repository: swapping out what was HEAD (the old 3.1 branch)
- with what was really the latest development (the v_3_0_branch branch)
+
+ msn
+
+
+ Another CSS based block, this time for MSN text ads. Also removes
+ tracking URLs and a width limitation.
+
+
+
- Revision 1.123.2.43 2005/05/23 09:59:10 hal9
- Fix typo 'loose'
+
+ blogspot
+
+
+ Cleans up some Blogspot blogs. Read the fine print before using this one!
+
+
+ This filter also intentionally removes some navigation stuff and sets the
+ page width to 100%. As a result, some rounded corners would
+ appear too early or not at all and as fixing this would require a browser
+ that understands background-size (CSS3), they are removed instead.
+
+
+
- Revision 1.123.2.42 2004/12/04 14:39:57 hal9
- Fix two minor typos per bug SF report.
+
+ xml-to-html
+
+
+ Server-header filter to change the Content-Type from xml to html.
+
+
+
- Revision 1.123.2.41 2004/03/23 12:58:42 oes
- Fixed an inaccuracy
+
+ html-to-xml
+
+
+ Server-header filter to change the Content-Type from html to xml.
+
+
+
- Revision 1.123.2.40 2004/02/27 12:48:49 hal9
- Add comment re: redirecting to local file system for set-image-blocker may
- is dependent on browser.
+
+ no-ping
+
+
+ Removes the non-standard ping attribute from
+ anchor and area HTML tags.
+
+
+
- Revision 1.123.2.39 2004/01/30 22:31:40 oes
- Added a hint re bookmarklets to Quickstart section
+
+ hide-tor-exit-notation
+
+
+ Client-header filter to remove the Tor exit node notation
+ found in Host and Referer headers.
+
+
+ If &my-app; and Tor are chained and &my-app;
+ is configured to use socks4a, one can use http://www.example.org.foobar.exit/
+ to access the host www.example.org through the
+ Tor exit node foobar.
+
+
+ As the HTTP client isn't aware of this notation, it treats the
+ whole string www.example.org.foobar.exit as host and uses it
+ for the Host and Referer headers. From the
+ server's point of view the resulting headers are invalid and can cause problems.
+
+
+ An invalid Referer header can trigger hot-linking
+ protections, and an invalid Host header will make it impossible for
+ the server to find the right vhost (several domains hosted on the same IP address).
+
+
+ This client-header filter removes the foo.exit part in those headers
+ to prevent the mentioned problems. Note that it only modifies
+ the HTTP headers, it doesn't make it impossible for the server
+ to detect your Tor exit node based on the IP address
+ the request is coming from.
+
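+ To make the rewriting concrete, here is a hedged Python sketch of the kind of
+ substitution described. The pattern is hypothetical and written for this
+ illustration only; it is not the job shipped in default.filter, and the header
+ values are the example names used above.

```python
import re

# Hypothetical pattern: keep everything up to the real host name, then drop
# one ".<node>.exit" suffix when it ends the host part of the header value.
exit_notation = re.compile(
    r"^((?:Host|Referer):\s*(?:https?://)?[^/\s]*?)\.[^./\s]+\.exit(?=[:/\s]|$)",
    re.IGNORECASE,
)

def strip_exit_notation(header):
    """Return the header line without the Tor exit node notation."""
    return exit_notation.sub(r"\1", header)

host = strip_exit_notation("Host: www.example.org.foobar.exit")
referer = strip_exit_notation(
    "Referer: http://www.example.org.foobar.exit/index.html"
)
untouched = strip_exit_notation("Host: www.example.org")

print(host)
print(referer)
print(untouched)
```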
+
+
- Revision 1.123.2.38 2004/01/30 16:47:51 oes
- Some minor clarifications
+
+
- Revision 1.123.2.37 2004/01/29 22:36:11 hal9
- Updates for no longer filtering text/plain, and demoronizer default settings,
- and copyright notice dates.
+
+
- Revision 1.123.2.36 2003/12/10 02:26:26 hal9
- Changed the demoronizer filter description.
+
- Revision 1.123.2.35 2003/11/06 13:36:37 oes
- Updated link to nightly CVS tarball
- Revision 1.123.2.34 2003/06/26 23:50:16 hal9
- Add a small bit on filtering and problems re: source code being corrupted.
- Revision 1.123.2.33 2003/05/08 18:17:33 roro
- Use apt-get instead of dpkg to install Debian package, which is more
- solid, uses the correct and most recent Debian version automatically.
+
- Revision 1.123.2.32 2003/04/11 03:13:57 hal9
- Add small note about only one filterfile (as opposed to multiple actions
- files).
+
+Privoxy's Template Files
+
+ All Privoxy built-in pages, i.e. error pages such as the
+ 404 - No Such Domain
+ error page, the BLOCKED page
+ and all pages of its web-based
+ user interface, are generated from templates.
+ (Privoxy must be running for the above links to work as
+ intended.)
+
- Revision 1.123.2.31 2003/03/26 02:03:43 oes
- Updated hard-coded copyright dates
+
+ These templates are stored in a subdirectory of the configuration
+ directory called templates. On Unixish platforms,
+ this is typically
+ /etc/privoxy/templates/.
+
- Revision 1.123.2.30 2003/03/24 12:58:56 hal9
- Add new section on Predefined Filters.
+
+ The templates are basically normal HTML files, but with place-holders (called symbols
+ or exports), which Privoxy fills at run time. It
+ is possible to edit the templates with a normal text editor, should you want
+ to customize them. (Not recommended for the casual
+ user.) Should you create your own custom templates, you should use
+ the config setting templdir
+ to specify an alternate location, so your templates do not get overwritten
+ during upgrades.
+
+
+ Note that just like in configuration files, lines starting
+ with # are ignored when the templates are filled in.
+
- Revision 1.123.2.29 2003/03/20 02:45:29 hal9
- More problems with \-\-chroot causing markup problems :(
+
+ The place-holders are of the form @name@, and you will
+ find a list of available symbols, which vary from template to template,
+ in the comments at the start of each file. Note that these comments are not
+ always accurate, and that it's probably best to look at the existing HTML
+ code to find out which symbols are supported and what they are filled in with.
+
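+ The substitution mechanism can be pictured with a few lines of Python. This
+ is a simplified sketch and not Privoxy's actual template code: the function
+ name fill_template, the symbol names title and host, and the choice to leave
+ unknown symbols untouched are all assumptions made for the illustration.

```python
import re

def fill_template(template, exports):
    """Fill @name@ place-holders and drop lines that start with '#'."""
    kept = [line for line in template.splitlines() if not line.startswith("#")]
    text = "\n".join(kept)
    # Symbols without an exported value are left as they are (an assumption
    # of this sketch, chosen so that nothing disappears silently).
    return re.sub(
        r"@([A-Za-z0-9_-]+)@",
        lambda m: exports.get(m.group(1), m.group(0)),
        text,
    )

template = "# Symbols: title, host\n<h1>@title@</h1>\n<p>No such domain: @host@</p>"
page = fill_template(template, {"title": "404", "host": "www.example.org"})

print(page)
```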
- Revision 1.123.2.28 2003/03/19 00:35:24 hal9
- Manual edit of revision log because 'chroot' (even inside a comment) was
- causing Docbook to hang here (due to double hyphen and the processor thinking
- it was a comment).
+
+ A special application of this substitution mechanism is to make whole
+ blocks of HTML code disappear when a specific symbol is set. We use this
+ for many purposes, one of them being to include the beta warning in all
+ our user interface (CGI) pages when Privoxy
+ is in an alpha or beta development stage:
+
- Revision 1.123.2.27 2003/03/18 19:37:14 oes
- s/Advanced|Radical/Adventuresome/g to avoid complaints re fun filter
+
+
+<!-- @if-unstable-start -->
- Revision 1.123.2.26 2003/03/17 16:50:53 oes
- Added documentation for new chroot option
+ ... beta warning HTML code goes here ...
- Revision 1.123.2.25 2003/03/15 18:36:55 oes
- Adapted to the new filters
+<!-- if-unstable-end@ -->
+
- Revision 1.123.2.24 2002/11/17 06:41:06 hal9
- Move default profiles table from FAQ to U-M, and other minor related changes.
- Add faq on cookies.
+
+ If the "unstable" symbol is set, everything in between and including
+ @if-unstable-start and if-unstable-end@
+ will disappear, leaving nothing but an empty comment:
+
- Revision 1.123.2.23 2002/10/21 02:32:01 hal9
- Updates to the user.action examples section. A few new ones.
+
+ <!-- -->
+
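+ The block removal can be imitated with a single ungreedy substitution. The
+ Python sketch below only mirrors the behaviour described here; the function
+ name kill_block is hypothetical and the code is not taken from Privoxy.

```python
import re

def kill_block(template, name):
    """Remove everything from @if-NAME-start up to and including if-NAME-end@."""
    pattern = re.compile(
        "@if-" + re.escape(name) + "-start.*?if-" + re.escape(name) + "-end@",
        re.DOTALL,  # the block normally spans several lines
    )
    return pattern.sub("", template)

page = (
    "<p>top</p>\n"
    "<!-- @if-unstable-start -->\n"
    "<p>beta warning</p>\n"
    "<!-- if-unstable-end@ -->\n"
    "<p>bottom</p>"
)
result = kill_block(page, "unstable")

print(result)
```

+ Because the markers sit inside HTML comments, the opening and closing comment
+ delimiters survive and join up into the empty comment shown above.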
- Revision 1.123.2.22 2002/10/12 00:51:53 hal9
- Add demoronizer to filter section.
+
+ There's also an if-then-else construct and an #include
+ mechanism, but you'll surely find those out if you are inclined to edit the
+ templates ;-)
+
- Revision 1.123.2.21 2002/10/10 04:09:35 hal9
- s/Advanced/Radical/ and added very brief note.
+
+ All templates refer to a style located at
+ http://config.privoxy.org/send-stylesheet.
+ This is, of course, locally served by Privoxy
+ and the source for it can be found and edited in the
+ cgi-style.css template.
+
- Revision 1.123.2.20 2002/10/10 03:49:21 hal9
- Add notes to session-cookies-only and Quickstart about pre-existing
- cookies. Also, note content-cookies work differently.
+
- Revision 1.123.2.19 2002/09/26 01:25:36 hal9
- More explanation on Privoxy patterns, more on content-cookies and SSL.
+
- Revision 1.123.2.18 2002/08/22 23:47:58 hal9
- Add 'Documentation' to Privoxy Menu shot in Configuration section to match
- CGIs.
- Revision 1.123.2.17 2002/08/18 01:13:05 hal9
- Spell checked (only one typo this time!).
- Revision 1.123.2.16 2002/08/09 19:20:54 david__schmidt
- Update to Mac OS X startup script name
+
- Revision 1.123.2.15 2002/08/07 17:32:11 oes
- Converted some internal links from ulink to link for PDF creation; no content changed
+Contacting the Developers, Bug Reporting and Feature
+Requests
- Revision 1.123.2.14 2002/08/06 09:16:13 oes
- Nits re: actions file download
+
+ &contacting;
+
- Revision 1.123.2.13 2002/08/02 18:23:19 g_sauthoff
- Just 2 small corrections to the Gentoo sections
+
- Revision 1.123.2.12 2002/08/02 18:17:21 g_sauthoff
- Added 2 Gentoo sections
+
- Revision 1.123.2.11 2002/07/26 15:20:31 oes
- - Added version info to title
- - Added info on new filters
- - Revised parts of the filter file tutorial
- - Added info on where to get updated actions files
- Revision 1.123.2.10 2002/07/25 21:42:29 hal9
- Add brief notes on not proxying non-HTTP protocols.
+
+Privoxy Copyright, License and History
- Revision 1.123.2.9 2002/07/11 03:40:28 david__schmidt
+
+ &copyright;
+
- Updated Mac OS X sections due to installation location change
+
+License
+
+ &license;
+
+
+
- Revision 1.123.2.8 2002/06/09 16:36:32 hal9
- Clarifications on filtering and MIME. Hardcode 'latest release' in index.html.
- Revision 1.123.2.7 2002/06/09 00:29:34 hal9
- Touch ups on filtering, in actions section and Anatomy.
+
- Revision 1.123.2.6 2002/06/06 23:11:03 hal9
- Fix broken link. Linkchecked all docs.
+History
+
+ &history;
+
+
- Revision 1.123.2.5 2002/05/29 02:01:02 hal9
- This is break out of the entire config section from u-m, so it can
- eventually be used to generate the comments, etc in the main config file
- so that these are in sync with each other.
+Authors
+
+ &p-authors;
+
+
- Revision 1.123.2.4 2002/05/27 03:28:45 hal9
- Ooops missed something from David.
+
- Revision 1.123.2.3 2002/05/27 03:23:17 hal9
- Fix FIXMEs for OS2 and Mac OS X startup. Fix Redhat typos (should be Red Hat).
- That's a wrap, I think.
+
- Revision 1.123.2.2 2002/05/26 19:02:09 hal9
- Move Amiga stuff around to take of FIXME in start up section.
- Revision 1.123.2.1 2002/05/26 17:04:25 hal9
- -Spellcheck, very minor edits, and sync across branches
+
+See Also
+
+ &seealso;
+
+
- Revision 1.123 2002/05/24 23:19:23 hal9
- Include new image (Proxy setup). More fun with guibutton.
- Minor corrections/clarifications here and there.
- Revision 1.122 2002/05/24 13:24:08 oes
- Added Bookmarklet for one-click pre-filled access to show-url-info
- Revision 1.121 2002/05/23 23:20:17 oes
- - Changed more (all?) references to actions to the
- style.
- - Small fixes in the actions chapter
- - Small clarifications in the quickstart to ad blocking
- - Removed from s since the new doc CSS
- renders them red (bad in TOC).
+
+Appendix
- Revision 1.120 2002/05/23 19:16:43 roro
- Correct Debian specials (installation and startup).
- Revision 1.119 2002/05/22 17:17:05 oes
- Added Security hint
+
+
+Regular Expressions
+
+ Privoxy uses Perl-style "regular expressions" in its actions
+ files and filter file, through the PCRE and PCRS libraries.
+
- Revision 1.118 2002/05/21 04:54:55 hal9
- -New Section: Quickstart to Ad Blocking
- -Reformat Actions Anatomy to match new CGI layout
+
+ If you are reading this, you probably don't understand what "regular
+ expressions" are, or what they can do. So this will be a very brief
+ introduction only. A full explanation would require a book ;-)
+
- Revision 1.117 2002/05/17 13:56:16 oes
- - Reworked & extended Templates chapter
- - Small changes to Regex appendix
- - #included authors.sgml into (C) and hist chapter
+
+ Regular expressions provide a language to describe patterns that can be
+ run against strings of characters (letters, numbers, etc.), to see if they
+ match the string or not. The patterns are themselves (sometimes complex)
+ strings of literal characters, combined with wild-cards, and other special
+ characters, called meta-characters. The "meta-characters" have
+ special meanings and are used to build complex patterns to be matched against.
+ Perl Compatible Regular Expressions are an especially convenient
+ "dialect" of the regular expression language.
+
- Revision 1.116 2002/05/17 03:23:46 hal9
- Fixing merge conflict in Quickstart section.
+
+ To make a simple analogy, we do something similar when we use wild-card
+ characters when listing files with the dir command in DOS.
+ *.* matches all filenames. The special
+ character here is the asterisk which matches any and all characters. We can be
+ more specific and use ? to match just individual
+ characters. So "dir file?.txt" would match
+ "file1.txt", "file2.txt", etc. We are pattern
+ matching, using a similar technique to "regular expressions"!
+
- Revision 1.115 2002/05/16 16:25:00 oes
- Extended the Filter File chapter & minor fixes
+
+ Regular expressions do essentially the same thing, but are much, much more
+ powerful. There are many more "special characters" and ways of
+ building complex patterns, however. Let's look at a few of the common ones,
+ and then some examples:
+
- Revision 1.114 2002/05/16 09:42:50 oes
- More ulink->link, added some hints to Quickstart section
+
+
+ . - Matches any single character, e.g. "a",
+ "A", "4", ":", or "@".
+
+
- Revision 1.113 2002/05/15 21:07:25 oes
- Extended and further commented the example actions files
+
+
+ ? - The preceding character or expression is matched ZERO or ONE
+ times. Either/or.
+
+
- Revision 1.112 2002/05/15 03:57:14 hal9
- Spell check. A few minor edits here and there for better syntax and
- clarification.
+
+
+ + - The preceding character or expression is matched ONE or MORE
+ times.
+
+
- Revision 1.111 2002/05/14 23:01:36 oes
- Fixing the fixes
+
+
+ * - The preceding character or expression is matched ZERO or MORE
+ times.
+
+
- Revision 1.110 2002/05/14 19:10:45 oes
- Restored alphabetical order of actions
+
+
+ \ - The "escape" character denotes that
+ the following character should be taken literally. This is used where one of the
+ special characters (e.g. ".") needs to be taken literally and
+ not as a special meta-character. Example: "example\.com", makes
+ sure the period is recognized only as a period (and not expanded to its
+ meta-character meaning of any single character).
+
+
- Revision 1.109 2002/05/14 17:23:11 oes
- Renamed the prevent-*-cookies actions, extended aliases section and moved it before the example AFs
+
+
+ [ ] - Characters enclosed in brackets will be matched if
+ any of the enclosed characters are encountered. For instance, "[0-9]"
+ matches any numeric digit (zero through nine). As an example, we can combine
+ this with "+" to match any digit one or more times: "[0-9]+".
+
+
- Revision 1.108 2002/05/14 15:29:12 oes
- Completed proofreading the actions chapter
+
+
+ ( ) - Parentheses are used to group a sub-expression,
+ or multiple sub-expressions.
+
+
- Revision 1.107 2002/05/12 03:20:41 hal9
- Small clarifications for 127.0.0.1 vs localhost for listen-address since this
- apparently an important distinction for some OS's.
+
+
+ | - The "bar" character works like an
+ "or" conditional statement. A match is successful if the
+ sub-expression on either side of "|" matches. As an example:
+ "/(this|that) example/" uses grouping and the bar character
+ and would match either "this example" or "that
+ example", and nothing else.
+
+
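The meta-characters listed above can be tried out with any regular expression engine. The following is a small sketch using Python's re module, which is not PCRE but treats these basic constructs the same way. The helper name matches and the sample strings (such as "colou?r") are illustrative only and are not taken from Privoxy's configuration files.

```python
import re


def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


checks = [
    (r"a.c", "a:c", True),                    # . matches any single character
    (r"colou?r", "color", True),              # ? makes the preceding "u" optional
    (r"[0-9]+", "2013", True),                # + needs one or more digits
    (r"[0-9]+", "", False),                   # ... so an empty string fails
    (r"[0-9]*", "", True),                    # * also accepts zero digits
    (r"example\.com", "exampleXcom", False),  # \. is a literal period only
    (r"(this|that) example", "that example", True),  # grouping plus |
]
for pattern, text, expected in checks:
    assert matches(pattern, text) == expected
```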
- Revision 1.106 2002/05/10 01:48:20 hal9
- This is mostly proposed copyright/licensing additions and changes. Docs
- are still GPL, but licensing and copyright are more visible. Also, copyright
- changed in doc header comments (eliminate references to JB except FAQ).
+
+ These are just some of the ones you are likely to use when matching URLs with
+ Privoxy, and are a long way from a definitive
+ list. This is enough to get us started with a few simple examples which may
+ be more illuminating:
+
- Revision 1.105 2002/05/05 20:26:02 hal9
- Sorting out license vs copyright in these docs.
+
+ /.*/banners/.* - A simple example
+ that uses the common combination of "." and "*" to
+ denote any character, zero or more times. In other words, any string at all.
+ So we start with a literal forward slash, then our regular expression pattern
+ (".*"), another literal forward slash, the string
+ "banners", another forward slash, and lastly another
+ ".*". We are building
+ a directory path here. This will match any file with the path that has a
+ directory named "banners" in it. The ".*" matches
+ any characters, and this could conceivably be more forward slashes, so it
+ might expand into a much longer looking path. For example, this could match:
+ "/eye/hate/spammers/banners/annoy_me_please.gif", or just
+ "//banners/annoying.html" (the pattern needs two literal slashes before
+ "banners"), or almost an infinite number of other
+ possible combinations, just so it has "banners" in the path
+ somewhere.
+
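To see this pattern at work outside of Privoxy, here is a minimal sketch with Python's re module. The function name is_banner_path is made up for the example, and re.match is used on the assumption that the pattern is applied from the start of the path.

```python
import re

# The pattern from the text, compiled unchanged.
BANNERS = re.compile(r"/.*/banners/.*")


def is_banner_path(path):
    """Return True if path matches the banners pattern from its first character."""
    return BANNERS.match(path) is not None


# A directory named "banners" somewhere below the root matches ...
assert is_banner_path("/eye/hate/spammers/banners/annoy_me_please.gif")
# ... while a path without such a directory does not.
assert not is_banner_path("/eye/hate/spammers/images/logo.gif")
```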
- Revision 1.104 2002/05/04 08:44:45 swa
- bumped version
+
+ And now something a little more complex:
+
- Revision 1.103 2002/05/04 00:40:53 hal9
- -Remove the TOC first page kludge. It's fixed proper now in ldp.dsl.in.
- -Some minor additions to Quickstart.
+
+ /.*/adv((er)?ts?|ertis(ing|ements?))?/ -
+ We have several literal forward slashes again ("/"), so we are
+ building another expression that is a file path statement. We have another
+ ".*", so we are matching against any conceivable sub-path, just so
+ it matches our expression. The only true literal that must
+ match our pattern is adv, together with
+ the forward slashes. What comes after the "adv" string is the
+ interesting part.
+
- Revision 1.102 2002/05/03 17:46:00 oes
- Further proofread & reactivated short build instructions
+
+ Remember the "?" means the preceding expression (either a
+ literal character or anything grouped with "(...)" in this case)
+ can exist or not, since this means either zero or one match. So
+ "((er)?ts?|ertis(ing|ements?))" is optional, as are the
+ individual sub-expressions: "(er)",
+ "(ing|ements?)", and the "s". The "|"
+ means "or". We have two of those. For instance,
+ "(ing|ements?)", can expand to match either "ing"
+ OR "ements?". What is being done here, is an
+ attempt at matching as many variations of "advertisement", and
+ similar, as possible. So this would expand to match just "adv",
+ or "advert", or "adverts", or
+ "advertising", or "advertisement", or
+ "advertisements". You get the idea. But it would not match
+ "advertizements" (with a "z"). We could fix that by
+ changing our regular expression to:
+ "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which would then match
+ either spelling.
+
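The expansion described above is easy to check by hand. The sketch below runs both versions of the expression against the spellings named in the text, using Python's re module. The "/some/site/" prefix and the helper name in_ad_directory are arbitrary choices for the example.

```python
import re

# The two expressions from the text, compiled unchanged.
ADV = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
ADV_S_OR_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")


def in_ad_directory(path, pattern=ADV):
    """Return True if path matches pattern from its first character."""
    return pattern.match(path) is not None


spellings = ["adv", "advert", "adverts", "advertising",
             "advertisement", "advertisements"]
for word in spellings:
    assert in_ad_directory("/some/site/" + word + "/")

# The "z" spelling is only caught by the extended expression.
assert not in_ad_directory("/some/site/advertizements/")
assert in_ad_directory("/some/site/advertizements/", ADV_S_OR_Z)
```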
- Revision 1.101 2002/05/03 03:58:30 hal9
- Move the user-manual config directive to top of section. Add note about
- Privoxy needing read permissions for configs, and write for logs.
+
+ /.*/advert[0-9]+\.(gif|jpe?g) - Again
+ another path statement with forward slashes. Anything in the square brackets
+ "[ ]" can be matched. This is using "0-9" as a
+ shorthand expression to mean any digit zero through nine. It is the same as
+ saying "0123456789". So any digit matches. The "+"
+ means one or more of the preceding expression must be included. The preceding
+ expression here is what is in the square brackets -- in this case, any digit
+ zero through nine. Then, at the end, we have a grouping: "(gif|jpe?g)".
+ This includes a "|", so this needs to match the expression on
+ either side of that bar character also. A simple "gif" on one side, and the other
+ side will in turn match either "jpeg" or "jpg",
+ since the "?" means the letter "e" is optional and
+ can be matched once or not at all. So we are building an expression here to
+ match GIF or JPEG image files. It must include the literal
+ string "advert", then one or more digits, and a "."
+ (which is now a literal, and not a special character, since it is escaped
+ with "\"), and lastly either "gif", or
+ "jpeg", or "jpg". Some possible matches would
+ include: "//advert1.jpg",
+ "/nasty/ads/advert1234.gif",
+ "/banners/from/hell/advert99.jpg". It would not match
+ "advert1.gif" (no leading slash), or
+ "/adverts232.jpg" (the expression does not include an
+ "s"), or "/advert1.jsp" ("jsp" is not
+ in the expression anywhere).
+
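The matches and non-matches listed above can be verified with a short script. As before, this is a sketch using Python's re module rather than Privoxy itself, and the function name is_advert_image is made up for the example.

```python
import re

# The pattern from the text, compiled unchanged.
ADVERT_IMAGE = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")


def is_advert_image(path):
    """Return True if path matches the advert image pattern from its start."""
    return ADVERT_IMAGE.match(path) is not None


# The matches named in the text.
for path in ["//advert1.jpg",
             "/nasty/ads/advert1234.gif",
             "/banners/from/hell/advert99.jpg"]:
    assert is_advert_image(path)

# The non-matches named in the text.
for path in ["advert1.gif", "/adverts232.jpg", "/advert1.jsp"]:
    assert not is_advert_image(path)
```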
- Revision 1.100 2002/04/29 03:05:55 hal9
- Add clarification on differences of new actions files.
+
+ We are barely scratching the surface of regular expressions here so that you
+ can understand the default Privoxy
+ configuration files, and maybe use this knowledge to customize your own
+ installation. There is much, much more that can be done with regular
+ expressions. Now that you know enough to get started, you can learn more on
+ your own :/
+
- Revision 1.99 2002/04/28 16:59:05 swa
- more structure in starting section
+
+ More reading on Perl Compatible Regular expressions:
+ http://perldoc.perl.org/perlre.html
+
- Revision 1.98 2002/04/28 05:43:59 hal9
- This is the break up of configuration.html into multiple files. This
- will probably break links elsewhere :(
+
+ For information on regular expression based substitutions and their applications
+ in filters, please see the filter file tutorial
+ in this manual.
+
+
- Revision 1.97 2002/04/27 21:04:42 hal9
- -Rewrite of Actions File example.
- -Add section for user-manual directive in config.
+
- Revision 1.96 2002/04/27 05:32:00 hal9
- -Add short section to Filter Files to tie in with +filter action.
- -Start rewrite of examples in Actions Examples (not finished).
- Revision 1.95 2002/04/26 17:23:29 swa
- bookmarks cleaned, changed structure of user manual, screen and programlisting cleanups, and numerous other changes that I forgot
+
+
+Privoxy's Internal Pages
- Revision 1.94 2002/04/26 05:24:36 hal9
- -Add most of Andreas suggestions to Chain of Events section.
- -A few other minor corrections and touch up.
+
+ Since Privoxy proxies each requested
+ web page, it is easy for Privoxy to
+ trap certain special URLs. In this way, we can talk directly to
+ Privoxy, and see how it is
+ configured, see how our rules are being applied, change these
+ rules and other configuration options, and even turn
+ Privoxy's filtering off, all with
+ a web browser.
- Revision 1.92 2002/04/25 18:55:13 hal9
- More catchups on new actions files, and new actions names.
- Other assorted cleanups, and minor modifications.
+
- Revision 1.91 2002/04/24 02:39:31 hal9
- Add 'Chain of Events' section.
+
+ The URLs listed below are the special ones that allow direct access
+ to Privoxy. Of course,
+ Privoxy must be running to access these. If
+ not, you will get a friendly error message. Internet access is not
+ necessary either.
+
- Revision 1.90 2002/04/23 21:41:25 hal9
- Linuxconf is deprecated on RH, substitute chkconfig.
+
+
- Revision 1.89 2002/04/23 21:05:28 oes
- Added hint for startup on Red Hat
+
+
+ Privoxy main page:
+
+
+
+ http://config.privoxy.org/
+
+
+
+ There is a shortcut: http://p.p/ (But it
+ doesn't provide a fall-back to a real page, in case the request is not
+ sent through Privoxy )
+
+
- Revision 1.88 2002/04/23 05:37:54 hal9
- Add AmigaOS install stuff.
+
+
+ Show information about the current configuration, including viewing and
+ editing of actions files:
+
+
+
+ http://config.privoxy.org/show-status
+
+
+
- Revision 1.87 2002/04/23 02:53:15 david__schmidt
- Updated Mac OS X installation section
- Added a few English tweaks here an there
+
+
+ Show the source code version numbers:
+
+
+
+ http://config.privoxy.org/show-version
+
+
+
- Revision 1.86 2002/04/21 01:46:32 hal9
- Re-write actions section.
+
+
+ Show the browser's request headers:
+
+
+
+ http://config.privoxy.org/show-request
+
+
+
- Revision 1.85 2002/04/18 21:23:23 hal9
- Fix ugly typo (mine).
+
+
+ Show which actions apply to a URL and why:
+
+
+
+ http://config.privoxy.org/show-url-info
+
+
+
- Revision 1.84 2002/04/18 21:17:13 hal9
- Spell Redhat correctly (ie Red Hat). A few minor grammar corrections.
+
+
+ Toggle Privoxy on or off. This feature can be turned off/on in the main
+ config file. When toggled "off", Privoxy
+ continues to run, but only as a pass-through proxy, with no actions taking
+ place:
+
+
+
+ http://config.privoxy.org/toggle
+
+
+
+ Shortcuts to turn Privoxy off, then on again:
+
+
+
+ http://config.privoxy.org/toggle?set=disable
+
+
+
+
+ http://config.privoxy.org/toggle?set=enable
+
+
+
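The toggle URLs above can also be requested from a script instead of a browser. The following Python sketch builds them and fetches them through the proxy. The helper names and the proxy address http://127.0.0.1:8118 are assumptions for the example (adjust the address to your own listen-address setting), and the request only does anything useful if the toggle feature is enabled in the main config file.

```python
import urllib.request
from urllib.parse import urlencode

CONFIG_BASE = "http://config.privoxy.org"


def toggle_url(state=None):
    """Build the toggle URL; state is None, "enable" or "disable"."""
    if state is None:
        return CONFIG_BASE + "/toggle"
    if state not in ("enable", "disable"):
        raise ValueError("state must be 'enable' or 'disable'")
    return CONFIG_BASE + "/toggle?" + urlencode({"set": state})


def fetch_via_privoxy(url, proxy="http://127.0.0.1:8118"):
    """Fetch url through a Privoxy instance assumed to listen on proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


print(toggle_url("disable"))
```

Calling fetch_via_privoxy(toggle_url("disable")) and later fetch_via_privoxy(toggle_url("enable")) would then mirror the two shortcut links.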
- Revision 1.83 2002/04/18 18:21:12 oes
- Added RPM install detail
+
+
- Revision 1.82 2002/04/18 12:04:50 oes
- Cosmetics
+
+ These may be bookmarked for quick reference. See next.
- Revision 1.81 2002/04/18 11:50:24 oes
- Extended Install section - needs fixing by packagers
+
- Revision 1.80 2002/04/18 10:45:19 oes
- Moved text to buildsource.sgml, renamed some filters, details
+
+Bookmarklets
+
+ Below are some "bookmarklets" to allow you to easily access a
+ "mini" version of some of Privoxy's
+ special pages. They are designed for MS Internet Explorer, but should work
+ equally well in Netscape, Mozilla, and other browsers which support
+ JavaScript. They are designed to run directly from your bookmarks - not by
+ clicking the links below (although that should work for testing).
+
+
+ To save them, right-click the link and choose "Add to Favorites"
+ (IE) or "Add Bookmark" (Netscape). You will get a warning that
+ the bookmark "may not be safe" - just click OK. Then you can run the
+ Bookmarklet directly from your favorites/bookmarks. For even faster access,
+ you can put them on the "Links" bar (IE) or the "Personal
+ Toolbar" (Netscape), and run them with a single click.
+
- Revision 1.79 2002/04/18 03:18:06 hal9
- Spellcheck, and minor touchups.
+
+
- Revision 1.78 2002/04/17 18:04:16 oes
- Proofreading part 2
+
+
+ Privoxy - Enable
+
+
- Revision 1.77 2002/04/17 13:51:23 oes
- Proofreading, part one
+
+
+ Privoxy - Disable
+
+
- Revision 1.76 2002/04/16 04:25:51 hal9
- -Added 'Note to Upgraders' and re-ordered the 'Quickstart' section.
- -Note about proxy may need requests to re-read config files.
+
+
+ Privoxy - Toggle Privoxy (Toggles between enabled and disabled)
+
+
- Revision 1.75 2002/04/12 02:08:48 david__schmidt
- Remove OS/2 building info... it is already in the developer-manual
+
+
+ Privoxy - View Status
+
+
+
+
+
+ Privoxy - Why?
+
+
+
+
- Revision 1.74 2002/04/11 00:54:38 hal9
- Add small section on submitting actions.
+
+ Credit: The site which gave us the general idea for these bookmarklets is
+ www.bookmarklets.com. They
+ have more information about bookmarklets.
+
- Revision 1.73 2002/04/10 18:45:15 swa
- generated
- Revision 1.72 2002/04/10 04:06:19 hal9
- Added actions feedback to Bookmarklets section
+
- Revision 1.71 2002/04/08 22:59:26 hal9
- Version update. Spell chkconfig correctly :)
+
- Revision 1.70 2002/04/08 20:53:56 swa
- ?
- Revision 1.69 2002/04/06 05:07:29 hal9
- -Add privoxy-man-page.sgml, for man page.
- -Add authors.sgml for AUTHORS (and p-authors.sgml)
- -Reworked various aspects of various docs.
- -Added additional comments to sub-docs.
+
+
+Chain of Events
+
+ Let's take a quick look at how some of Privoxy's
+ core features are triggered, and the ensuing sequence of events when a web
+ page is requested by your browser:
+
- Revision 1.68 2002/04/04 18:46:47 swa
- consistent look. reuse of copyright, history et. al.
+
+
+
+
+ First, your web browser requests a web page. The browser knows to send
+ the request to Privoxy, which will, in turn,
+ relay the request to the remote web server after passing the following
+ tests:
+
+
+
+
+ Privoxy traps any request for its own internal CGI
+ pages (e.g. http://p.p/) and sends the CGI page back to the browser.
+
+
+
+
+ Next, Privoxy checks to see if the URL
+ matches any "+block" patterns. If
+ so, the URL is then blocked, and the remote web server will not be contacted.
+ "+handle-as-image"
+ and
+ "+handle-as-empty-document"
+ are then checked, and if there is no match, an
+ HTML "BLOCKED" page is sent back to the browser. Otherwise, if
+ it does match, an image is returned for the former, and an empty text
+ document for the latter. The type of image would depend on the setting of
+ "+set-image-blocker"
+ (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere).
+
+
+
+
+ Untrusted URLs are blocked. If URLs are being added to the
+ trust file, then that is done.
+
+
+
+
+ If the URL pattern matches the "+fast-redirects" action,
+ it is then processed. Unwanted parts of the requested URL are stripped.
+
+
+
+
+ Now the rest of the client browser's request headers are processed. If any
+ of these match any of the relevant actions (e.g. "+hide-user-agent",
+ etc.), headers are suppressed or forged as determined by these actions and
+ their parameters.
+
+
+
+
+ Now the web server starts sending its response back (i.e. typically a web
+ page).
+
+
+
+
+ First, the server headers are read and processed to determine, among other
+ things, the MIME type (document type) and encoding. The headers are then
+ filtered as determined by the
+ "+crunch-incoming-cookies",
+ "+session-cookies-only",
+ and "+downgrade-http-version"
+ actions.
+
+
+
+
+ If any "+filter" action
+ or "+deanimate-gifs"
+ action applies (and the document type fits the action), the rest of the page is
+ read into memory (up to a configurable limit). Then the filter rules (from
+ default.filter and any other filter files) are
+ processed against the buffered content. Filters are applied in the order
+ they are specified in one of the filter files. Animated GIFs, if present,
+ are reduced to either the first or last frame, depending on the action
+ setting. The entire page, which is now filtered, is then sent by
+ Privoxy back to your browser.
+
+
+ If neither a "+filter" action
+ nor "+deanimate-gifs"
+ matches, then Privoxy passes the raw data through
+ to the client browser as it becomes available.
+
+
+
+
+ As the browser receives the (now possibly filtered) page content, it
+ reads and then requests any URLs that may be embedded within the page
+ source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g.
+ frames), sounds, etc. For each of these objects, the browser issues a
+ separate request (this is easily viewable in Privoxy's
+ logs). And each such request is in turn processed just as above. Note that a
+ complex web page will have many, many such embedded URLs. If these
+ secondary requests are to a different server, then quite possibly a very
+ different set of actions is triggered.
+
- Revision 1.67 2002/04/04 17:27:57 swa
- more single file to be included at multiple points. make maintaining easier
+
+
+
+ NOTE: This is a somewhat simplified overview of what happens with each URL
+ request. For the sake of brevity and simplicity, we have focused on
+ Privoxy's core features only.
+
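The blocking step in the chain of events above can be sketched as a small decision function. This is a simplified illustration of the description in the text, not Privoxy's real code: the function name, the plain-string return values, and the order in which the two "handle-as" actions are checked when both apply are assumptions made for the example.

```python
def response_for_request(actions):
    """Decide what a client receives, given the set of action names that apply.

    Hypothetical sketch; the returned strings only label the outcomes
    described in the text.
    """
    if "block" not in actions:
        # Not blocked: the request is relayed to the remote web server.
        return "forward to server"
    if "handle-as-image" in actions:
        # The image type would depend on +set-image-blocker.
        return "image"
    if "handle-as-empty-document" in actions:
        return "empty document"
    # Blocked, but neither handle-as action applies.
    return "HTML BLOCKED page"


print(response_for_request({"block", "handle-as-image"}))
```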
- Revision 1.66 2002/04/04 06:48:37 hal9
- Structural changes to allow for conditional inclusion/exclusion of content
- based on entity toggles, e.g. 'entity % p-not-stable "INCLUDE"'. And
- definition of internal entities, e.g. 'entity p-version "2.9.13"' that will
- eventually be set by Makefile.
- More boilerplate text for use across multiple docs.
+
- Revision 1.65 2002/04/03 19:52:07 swa
- enhance squid section due to user suggestion
- Revision 1.64 2002/04/03 03:53:43 hal9
- A few minor bug fixes, and touch ups. Ready for review.
+
+
+Troubleshooting: Anatomy of an Action
- Revision 1.63 2002/04/01 16:24:49 hal9
- Define entities to include boilerplate text. See doc/source/*.
+
+ The way Privoxy applies
+ actions and filters
+ to any given URL can be complex, and it is not always
+ easy to understand what is happening. Sometimes we need to be able to
+ see just what Privoxy is
+ doing, especially if something Privoxy is doing
+ is inadvertently causing us a problem. It can be a little daunting to look at
+ the actions and filters files themselves, since they tend to be filled with
+ regular expressions whose consequences are not
+ always so obvious.
+
- Revision 1.62 2002/03/30 04:15:53 hal9
- - Fix privoxy.org/config links.
- - Paste in Bookmarklets from Toggle page.
- - Move Quickstart nearer top, and minor rework.
+
+ One quick test to see if Privoxy is causing a problem
+ or not, is to disable it temporarily. This should be the first troubleshooting
+ step. See the Bookmarklets section for a quick
+ and easy way to do this (be sure to flush caches afterward!). Looking at the
+ logs is a good idea too. (Note that both the toggle feature and logging are
+ enabled via config file settings, and may need to be
+ "turned on".)
+
+
+ Another easy troubleshooting step, if you have done any
+ customization of your installation, is to revert to the installed
+ defaults and see if that helps. There are times the developers get complaints
+ about one thing or another, and the problem turns out to be caused by a
+ customized configuration.
+
- Revision 1.61 2002/03/29 01:31:08 hal9
- Minor update.
+
+ Privoxy also provides the
+ http://config.privoxy.org/show-url-info
+ page that can show us very specifically how actions
+ are being applied to any given URL. This is a big help for troubleshooting.
+
- Revision 1.60 2002/03/27 01:57:34 hal9
- Added more to Anatomy section.
+
+ First, enter one URL (or partial URL) at the prompt, and then
+ Privoxy will tell us
+ how the current configuration will handle it. This will not
+ help with filtering effects (i.e. the "+filter" action) from
+ one of the filter files since this is handled very
+ differently and not so easy to trap! It also will not tell you about any other
+ URLs that may be embedded within the URL you are testing. For instance, images
+ such as ads are expressed as URLs within the raw page source of HTML pages. So
+ you will only get info for the actual URL that is pasted into the prompt area
+ -- not any sub-URLs. If you want to know about embedded URLs like ads, you
+ will have to dig those out of the HTML source. Use your browser's "View
+ Page Source" option for this. Or right click on the ad, and grab the
+ URL.
+
- Revision 1.59 2002/03/27 00:54:33 hal9
- Touch up intro for new name.
+
+ Let's try an example, google.com,
+ and look at it one section at a time in a sample configuration (your real
+ configuration may vary):
+
- Revision 1.58 2002/03/26 22:29:55 swa
- we have a new homepage!
+
+
+ Matches for http://www.google.com:
- Revision 1.57 2002/03/24 20:33:30 hal9
- A few minor catch ups with name change.
+ In file: default.action [ View ] [ Edit ]
- Revision 1.56 2002/03/24 16:17:06 swa
- configure needs to be generated.
+ {+change-x-forwarded-for{block}
+ +deanimate-gifs {last}
+ +fast-redirects {check-decoded-url}
+ +filter {refresh-tags}
+ +filter {img-reorder}
+ +filter {banners-by-size}
+ +filter {webbugs}
+ +filter {jumping-windows}
+ +filter {ie-exploits}
+ +hide-from-header {block}
+ +hide-referrer {forge}
+ +session-cookies-only
+ +set-image-blocker {pattern}
+/
- Revision 1.55 2002/03/24 16:08:08 swa
- we are too lazy to make a block-built
- privoxy logo. hence removed the option.
+ { -session-cookies-only }
+ .google.com
- Revision 1.54 2002/03/24 15:46:20 swa
- name change related issue.
+ { -fast-redirects }
+ .google.com
- Revision 1.53 2002/03/24 11:51:00 swa
- name change. changed filenames.
+In file: user.action [ View ] [ Edit ]
+(no matches in this file)
+
+
- Revision 1.52 2002/03/24 11:01:06 swa
- name change
+
+ This is telling us how we have defined our
+ "actions", and
+ which ones match for our test case, "google.com".
+ Displayed are all the actions that are available to us. Remember,
+ the + sign denotes "on". -
+ denotes "off". So some are "on" here, but many
+ are "off". Each example we try may provide a slightly different
+ end result, depending on our configuration directives.
+
+
+ The first listing
+ is for our default.action file. The large, multi-line
+ listing, is how the actions are set to match for all URLs, i.e. our default
+ settings. If you look at your "actions" file, this would be the
+ section just below the "aliases" section near the top. This
+ will apply to all URLs as signified by the single forward slash at the end
+ of the listing -- "/".
+
- Revision 1.51 2002/03/23 15:13:11 swa
- renamed every reference to the old name with foobar.
- fixed "application foobar application" tag, fixed
- "the foobar" with "foobar". left junkbustser in cvs
- comments and remarks to history untouched.
+
+ But we have defined additional actions that would be exceptions to these general
+ rules, and then we list specific URLs (or patterns) that these exceptions
+ would apply to. Last match wins. Just below this then are two explicit
+ matches for ".google.com". The first is negating our previous
+ cookie setting, which was for "+session-cookies-only"
+ (i.e. not persistent). So we will allow persistent cookies for google, at
+ least that is how it is in this example. The second turns
+ off any "+fast-redirects"
+ action, allowing this to take place unmolested. Note that there is a leading
+ dot here -- ".google.com". This will match any hosts and
+ sub-domains, in the google.com domain also, such as
+ "www.google.com" or "mail.google.com". But it would not
+ match "www.google.de"! So, apparently, we have these two actions
+ defined as exceptions to the general rules at the top somewhere in the lower
+ part of our default.action file, and
+ "google.com" is referenced somewhere in these latter sections.
+
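The effect of the leading dot can be illustrated with a deliberately simplified check. Real Privoxy host patterns support more than this (wildcards, for instance), so the function below is only a sketch for patterns of the ".google.com" kind, and its name host_matches is made up for the example.

```python
def host_matches(pattern, host):
    """Simplified sketch: handle only exact hosts and leading-dot patterns."""
    if not pattern.startswith("."):
        return host == pattern
    # A leading dot accepts the domain itself and anything below it.
    return host == pattern[1:] or host.endswith(pattern)


assert host_matches(".google.com", "www.google.com")
assert host_matches(".google.com", "mail.google.com")
assert not host_matches(".google.com", "www.google.de")
```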
- Revision 1.50 2002/03/23 05:06:21 hal9
- Touch up.
+
+ Then, for our user.action file, we have no hits.
+ So there is nothing google-specific that we might have added to our own, local
+ configuration. If there was, those actions would over-rule any actions from
+ previously processed files, such as default.action.
+ user.action typically has the last word. This is the
+ best place to put hard and fast exceptions.
+
- Revision 1.49 2002/03/21 17:01:05 hal9
- New section in Appendix.
+
+ And finally we pull it all together in the bottom section and summarize how
+ Privoxy is applying all its actions
+ to "google.com":
- Revision 1.48 2002/03/12 06:33:01 hal9
- Catching up to Andreas and re_filterfile changes.
+
- Revision 1.47 2002/03/11 13:13:27 swa
- correct feedback channels
+
+
- Revision 1.46 2002/03/10 00:51:08 hal9
- Added section on JB internal pages in Appendix.
+ Final results:
- Revision 1.45 2002/03/09 17:43:53 swa
- more distros
+ -add-header
+ -block
+ +change-x-forwarded-for{block}
+ -client-header-filter{hide-tor-exit-notation}
+ -content-type-overwrite
+ -crunch-client-header
+ -crunch-if-none-match
+ -crunch-incoming-cookies
+ -crunch-outgoing-cookies
+ -crunch-server-header
+ +deanimate-gifs {last}
+ -downgrade-http-version
+ -fast-redirects
+ -filter {js-events}
+ -filter {content-cookies}
+ -filter {all-popups}
+ -filter {banners-by-link}
+ -filter {tiny-textforms}
+ -filter {frameset-borders}
+ -filter {demoronizer}
+ -filter {shockwave-flash}
+ -filter {quicktime-kioskmode}
+ -filter {fun}
+ -filter {crude-parental}
+ -filter {site-specifics}
+ -filter {js-annoyances}
+ -filter {html-annoyances}
+ +filter {refresh-tags}
+ -filter {unsolicited-popups}
+ +filter {img-reorder}
+ +filter {banners-by-size}
+ +filter {webbugs}
+ +filter {jumping-windows}
+ +filter {ie-exploits}
+ -filter {google}
+ -filter {yahoo}
+ -filter {msn}
+ -filter {blogspot}
+ -filter {no-ping}
+ -force-text-mode
+ -handle-as-empty-document
+ -handle-as-image
+ -hide-accept-language
+ -hide-content-disposition
+ +hide-from-header {block}
+ -hide-if-modified-since
+ +hide-referrer {forge}
+ -hide-user-agent
+ -limit-connect
+ -overwrite-last-modified
+ -prevent-compression
+ -redirect
+ -server-header-filter{xml-to-html}
+ -server-header-filter{html-to-xml}
+ -session-cookies-only
+ +set-image-blocker {pattern}
+
- Revision 1.44 2002/03/09 17:08:48 hal9
- New section on Jon's actions file editor, and move some stuff around.
+
+ Notice that the only differences from the previous listing are
+ fast-redirects
and session-cookies-only
,
+ which are disabled specifically for this site in our configuration,
+ and thus show in the Final Results
.
+
- Revision 1.43 2002/03/08 00:47:32 hal9
- Added imageblock{pattern}.
+
+ Now another example, ad.doubleclick.net
:
+
- Revision 1.42 2002/03/07 18:16:55 swa
- looks better
+
+
- Revision 1.41 2002/03/07 16:46:43 hal9
- Fix a few markup problems for jade.
+ { +block{Domains starts with "ad"} }
+ ad*.
- Revision 1.40 2002/03/07 16:28:39 swa
- provide correct feedback channels
+ { +block{Domain contains "ad"} }
+ .ad.
- Revision 1.39 2002/03/06 16:19:28 hal9
- Note on perceived filtering slowdown per FR.
+ { +block{Doubleclick banner server} +handle-as-image }
+ .[a-vx-z]*.doubleclick.net
+
+
- Revision 1.38 2002/03/05 23:55:14 hal9
- Stupid I did it again. Double hyphen in comment breaks jade.
+
+ We'll just show the interesting part here: the explicit matches. The URL is
+ matched three different times: by two +block{}
sections,
+ and a +block{} +handle-as-image
,
+ which is the expanded form of one of our aliases that had been defined as:
+ +block-as-image
. (Aliases
are defined in
+ the first section of the actions file and typically used to combine more
+ than one action.)
+
- Revision 1.37 2002/03/05 23:53:49 hal9
- jade barfs on '- -' embedded in comments. - -user option broke it.
+
+ Any one of these would have blocked the request; only the last one also
+ marks it as an image. The rules are redundant, since the last case alone
+ would also cover the first. No point in taking chances with these guys
+ though ;-) Note that if you want an ad or obnoxious
+ URL to be invisible, it should be defined as ad.doubleclick.net
+ is done here -- as both a +block{}
+ and a
+ +handle-as-image
.
+ The custom alias +block-as-image
just
+ simplifies the process and makes it more readable.
+
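+
+ As a rough sketch of how such an alias could be set up (the reason text
+ below is illustrative and may differ from what your version ships), the
+ alias section at the top of an actions file might contain:
+
+
+ {{alias}}
+ # Assumed definition: one name that switches on both actions
+ +block-as-image = +block{Blocked image.} +handle-as-image
+
+
+ A section headed { +block-as-image } would then behave like one that
+ lists both actions explicitly.
+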
- Revision 1.36 2002/03/05 22:53:28 hal9
- Add new - - user option.
+
+ One last example. Let's try http://www.example.net/adsl/HOWTO/
.
+ This one is giving us problems. We are getting a blank page. Hmmm ...
+
- Revision 1.35 2002/03/05 00:17:27 hal9
- Added section on command line options.
+
+
- Revision 1.34 2002/03/04 19:32:07 oes
- Changed default port to 8118
+ Matches for http://www.example.net/adsl/HOWTO/:
- Revision 1.33 2002/03/03 19:46:13 hal9
- Emphasis on where/how to report bugs, etc
+ In file: default.action [ View ] [ Edit ]
- Revision 1.32 2002/03/03 09:26:06 joergs
- AmigaOS changes, config is now loaded from PROGDIR: instead of
- AmiTCP:db/junkbuster/ if no configuration file is specified on the
- command line.
+ {-add-header
+ -block
+ +change-x-forwarded-for{block}
+ -client-header-filter{hide-tor-exit-notation}
+ -content-type-overwrite
+ -crunch-client-header
+ -crunch-if-none-match
+ -crunch-incoming-cookies
+ -crunch-outgoing-cookies
+ -crunch-server-header
+ +deanimate-gifs
+ -downgrade-http-version
+ +fast-redirects {check-decoded-url}
+ -filter {js-events}
+ -filter {content-cookies}
+ -filter {all-popups}
+ -filter {banners-by-link}
+ -filter {tiny-textforms}
+ -filter {frameset-borders}
+ -filter {demoronizer}
+ -filter {shockwave-flash}
+ -filter {quicktime-kioskmode}
+ -filter {fun}
+ -filter {crude-parental}
+ -filter {site-specifics}
+ -filter {js-annoyances}
+ -filter {html-annoyances}
+ +filter {refresh-tags}
+ -filter {unsolicited-popups}
+ +filter {img-reorder}
+ +filter {banners-by-size}
+ +filter {webbugs}
+ +filter {jumping-windows}
+ +filter {ie-exploits}
+ -filter {google}
+ -filter {yahoo}
+ -filter {msn}
+ -filter {blogspot}
+ -filter {no-ping}
+ -force-text-mode
+ -handle-as-empty-document
+ -handle-as-image
+ -hide-accept-language
+ -hide-content-disposition
+ +hide-from-header{block}
+ +hide-referer{forge}
+ -hide-user-agent
+ -overwrite-last-modified
+ +prevent-compression
+ -redirect
+ -server-header-filter{xml-to-html}
+ -server-header-filter{html-to-xml}
+ +session-cookies-only
+ +set-image-blocker{blank} }
+ /
- Revision 1.31 2002/03/02 22:45:52 david__schmidt
- Just tweaking
+ { +block{Path starts with "ads".} +handle-as-image }
+ /ads
+
+
- Revision 1.30 2002/03/02 22:00:14 hal9
- Updated 'New Features' list. Ran through spell-checker.
+
+ Oops, the /adsl/
is matching /ads
in our
+ configuration! But we did not want this at all! Now we see why we get the
+ blank page. It is actually triggering two different actions here, and
+ the effects are aggregated so that the URL is blocked, and &my-app; is told
+ to treat the block as if it were an image. But this is, of course, all wrong.
+ We could now add a new action below this (or better in our own
+ user.action file) that explicitly
+ unblocks (
+ {-block}
) paths with
+ adsl
in them (remember, last match in the configuration
+ wins). There are various ways to handle such exceptions. Example:
+
- Revision 1.29 2002/03/02 20:34:07 david__schmidt
- Update OS/2 build section
+
+
- Revision 1.28 2002/02/24 14:34:24 jongfoster
- Formatting changes. Now changing the doctype to DocBook XML 4.1
- will work - no other changes are needed.
+ { -block }
+ /adsl
+
+
- Revision 1.27 2002/01/11 14:14:32 hal9
- Added a very short section on Templates
+
+ Now the page displays ;-)
+ Remember to flush your browser's caches when making these kinds of changes to
+ your configuration to ensure that you get a freshly delivered page! Or, try
+ using Shift+Reload .
+
- Revision 1.26 2002/01/09 20:02:50 hal9
- Fix bug re: auto-detect config file changes.
+
+ But what about a situation where we get no explicit match, unlike the one
+ we got with:
+
- Revision 1.25 2002/01/09 18:20:30 hal9
- Touch ups for *.action files.
+
+
- Revision 1.24 2001/12/02 01:13:42 hal9
- Fix typo.
+ { +block{Path starts with "ads".} +handle-as-image }
+ /ads
+
+
- Revision 1.23 2001/12/02 00:20:41 hal9
- Updates for recent changes.
+
+ That actually was very helpful and pointed us quickly to where the problem
+ was. If you don't get this kind of match, then it means one of the default
+ rules in the first section of default.action is causing
+ the problem. This would require some guesswork, and maybe a little trial and
+ error to isolate the offending rule. One likely cause would be one of the
+ +filter
actions.
+ These tend to be harder to troubleshoot.
+ Try adding the URL for the site to one of the alias sections that turn off
+ +filter
:
+
- Revision 1.22 2001/11/05 23:57:51 hal9
- Minor update for startup now daemon mode.
+
+
- Revision 1.21 2001/10/31 21:11:03 hal9
- Correct 2 minor errors
+ { shop }
+ .quietpc.com
+ .worldpay.com # for quietpc.com
+ .jungle.com
+ .scan.co.uk
+ .forbes.com
+
+
- Revision 1.18 2001/10/24 18:45:26 hal9
- *** empty log message ***
+
+ { shop }
is an alias
that expands to
+ { -filter -session-cookies-only }
.
+ Or you could do your own exception to negate filtering:
- Revision 1.17 2001/10/24 17:10:55 hal9
- Catching up with Jon's recent work, and a few other things.
+
- Revision 1.16 2001/10/21 17:19:21 swa
- wrong url in documentation
+
+
- Revision 1.15 2001/10/14 23:46:24 hal9
- Various minor changes. Fleshed out SEE ALSO section.
+ { -filter }
+ # Disable ALL filter actions for sites in this section
+ .forbes.com
+ developer.ibm.com
+ localhost
+
+
- Revision 1.13 2001/10/10 17:28:33 hal9
- Very minor changes.
+
+ This would turn off all filtering for these sites. This is best
+ put in user.action , for local site
+ exceptions. Note that when a simple domain pattern is used by itself (without
+ the subsequent path portion), all sub-pages within that domain are included
+ automatically in the scope of the action.
+
- Revision 1.12 2001/09/28 02:57:04 hal9
- Ditto :/
+
+ Images that are inexplicably being blocked may well be hitting the
++filter{banners-by-size}
+ rule, which assumes
+ that images of certain sizes are ad banners (works well
+ most of the time since these tend to be standardized).
+
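+
+ If only this one filter is at fault, a narrower exception may be enough.
+ A minimal sketch for user.action, where images.example.com is a
+ hypothetical host standing in for the affected site:
+
+
+ { -filter{banners-by-size} }
+ # Hypothetical host; replace with the site whose images disappear
+ images.example.com
+
+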
- Revision 1.11 2001/09/28 02:25:20 hal9
- Ditto.
+
+ { fragile }
is an alias that disables most
+ actions that are most likely to cause trouble. This can be used as a
+ last resort for problem sites.
+
+
+
- Revision 1.9 2001/09/27 23:50:29 hal9
- A few changes. A short section on regular expression in appendix.
+ { fragile }
+ # Handle with care: easy to break
+ mail.google.
+ mybank.example.com
+
- Revision 1.8 2001/09/25 00:34:59 hal9
- Some additions, and re-arranging.
- Revision 1.7 2001/09/24 14:31:36 hal9
- Diddling.
+
+ Remember to flush caches! Note that the
+ mail.google reference lacks the TLD portion (e.g.
+ .com
). The pattern will therefore match mail.google under
+ any TLD, such as mail.google.de,
+ just as an example.
+
+
+ If this still does not work, you will have to go through the remaining
+ actions one by one to find which one(s) is causing the problem.
+
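+
+ One way to organize that search, sketched here with a hypothetical host
+ and an arbitrary selection of actions from the listings above, is to
+ switch off a few candidates in user.action and then drop them from the
+ section one at a time until the problem returns:
+
+
+ { -prevent-compression -fast-redirects -session-cookies-only }
+ # Hypothetical host; trim this list by one action per test run
+ www.example.org
+
+
+ The action whose removal brings the problem back is the likely culprit.
+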
- Revision 1.6 2001/09/24 14:10:32 hal9
- Including David's OS/2 installation instructions.
+
- Revision 1.2 2001/09/13 15:27:40 swa
- cosmetics
+
- Revision 1.1 2001/09/12 15:36:41 swa
- source files for junkbuster documentation
+