X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fwebserver%2Fuser-manual%2Factions-file.html;h=7aa0b45ffc4a7609ffc4d39589727e4e2eb18602;hp=9e8e95e6335dd936484d972bcbd36474ee564679;hb=107c84d0c43b24ad437933c75774276f67165959;hpb=69b45dc21f48175fb34a8e1e2f45d46870e37941 diff --git a/doc/webserver/user-manual/actions-file.html b/doc/webserver/user-manual/actions-file.html index 9e8e95e6..7aa0b45f 100644 --- a/doc/webserver/user-manual/actions-file.html +++ b/doc/webserver/user-manual/actions-file.html @@ -1,3615 +1,4648 @@ - - + -
-The actions files are used to define what actions Privoxy takes for which URLs, and thus determines - how ad images, cookies and various other aspects of HTTP content and - transactions are handled, and on which sites (or even parts thereof). - There are a number of such actions, with a wide range of functionality. - Each action does something a little different. These actions give us a - veritable arsenal of tools with which to exert our control, preferences - and independence. Actions can be combined so that their effects are - aggregated when applied against a given set of URLs.
- -There are three action files included with Privoxy with differing purposes:
- -match-all.action - is used to define - which "actions" relating to - banner-blocking, images, pop-ups, content modification, cookie - handling etc should be applied by default. It should be the first - actions file loaded
-default.action - defines many exceptions - (both positive and negative) from the default set of actions that's - configured in match-all.action. It is a set - of rules that should work reasonably well as-is for most users. This - file is only supposed to be edited by the developers. It should be - the second actions file loaded.
-user.action - is intended for - local site preferences and exceptions. As an example, if your ISP or - your bank has specific requirements and needs special handling, this - kind of thing should go here. This file will not be upgraded.
-Edit Set to - Cautious Set to Medium - Set to Advanced
-These have increasing levels of aggressiveness and have no influence on your browsing - unless you select them explicitly in the editor. A default - installation should be pre-set to Cautious. - New users should try this for a while before adjusting the settings - to more aggressive levels. The more aggressive the settings, the - greater the likelihood of problems such as sites not working as - they should.
-The Edit button allows you to turn - each action on/off individually for fine-tuning. The Cautious button changes the actions list to - low/safe settings which will activate ad blocking and a minimal set - of Privoxy's features, and - consequently there will be less of a chance of accidental problems. - The Medium button sets the list to a - medium level of other features and a low level set of privacy - features. The Advanced button sets the - list to a high level of ad blocking and medium level of privacy. See - the chart below. The latter three buttons override any changes made - with the Edit button. More fine-tuning - can be done in the lower sections of this internal page.
-While the actions file editor allows you to enable these settings in - all actions files, they are only supposed to be enabled in the first - one to make sure you don't unintentionally overrule earlier - rules.
- -The default profiles, and their associated actions, as pre-defined - in default.action are:
- -Table 1. Default Configurations
Feature | Cautious | Medium | Advanced
---|---|---|---
Ad-blocking Aggressiveness | medium | high | high
Ad-filtering by size | no | yes | yes
Ad-filtering by link | no | no | yes
Pop-up killing | blocks only | blocks only | blocks only
Privacy Features | low | medium | medium/high
Cookie handling | none | session-only | kill
Referer forging | no | yes | yes
GIF de-animation | no | yes | yes
Fast redirects | no | no | yes
HTML taming | no | no | yes
JavaScript taming | no | no | yes
Web-bug killing | no | yes | yes
Image tag reordering | no | yes | yes
The list of actions files to be used is defined in the main - configuration file, and the files are processed in the order they are defined (e.g. - default.action is typically processed before - user.action). The content of these can all be - viewed and edited from http://config.privoxy.org/show-status. The overriding - principle when applying actions is that the last action that matches a - given URL wins. The broadest, most general rules go first (defined in - default.action), followed by any exceptions - (typically also in default.action), which are - then followed lastly by any local preferences (typically in user.action). Generally, user.action has the last word.
- -An actions file typically has multiple sections. If you want to use - "aliases" in an actions file, you have to - place the (optional) alias - section at the top of that file. Then comes the default set of rules - which will apply universally to all sites and pages (be very careful with using such a - universal set in user.action or any other - actions file after default.action, because it - will override the result from consulting any previous file). And then - below that, exceptions to the defined universal policies. You can regard - user.action as an appendix to default.action, with the advantage that it is a separate - file, which makes preserving your personal settings across Privoxy upgrades easier.
- -Actions can be used to block anything you want, including ads, - banners, or just some obnoxious URL whose content you would rather not - see. Cookies can be accepted or rejected, or accepted only during the - current browser session (i.e. not written to disk), content can be - modified, some JavaScripts tamed, user-tracking fooled, and much more. - See below for a complete list of - actions.
- -Note that some actions, like - cookie suppression or script disabling, may render some sites unusable - that rely on these techniques to work properly. Finding the right mix - of actions is not always easy and certainly a matter of personal taste. - And, things can always change, requiring refinements in the - configuration. In general, it can be said that the more "aggressive" your default settings (in the top section - of the actions file) are, the more exceptions for "trusted" sites you will have to make later. If, for - example, you want to crunch all cookies per default, you'll have to - make exceptions from that rule for sites that you regularly use and - that require cookies for actually useful purposes, like maybe your - bank, favorite shop, or newspaper.
- -We have tried to provide you with reasonable rules to start from in - the distribution actions files. But there is no general rule of thumb - on these things. There just are too many variables, and sites are - constantly changing. Sooner or later you will want to change the rules - (and read this chapter again :).
-The easiest way to edit the actions files is with a browser by using - our browser-based editor, which can be reached from http://config.privoxy.org/show-status. Note: the config file - option enable-edit-actions must be - enabled for this to work. The editor allows both fine-grained control - over every single feature on a per-URL basis, and easy choosing from - wholesale sets of defaults like "Cautious", - "Medium" or "Advanced". Warning: the "Advanced" setting is more aggressive, and will be more - likely to cause problems for some sites. Experienced users only!
-If you prefer plain text editing to GUIs, you can of course also - directly edit the actions files with your favorite text editor. - Look at default.action, which is richly - commented with many good examples.
-Actions files are divided into sections. There are special sections, - like the "alias" sections which will be - discussed later. For now let's concentrate on regular sections: They - have a heading line (often split up to multiple lines for readability) - which consist of a list of actions, separated by whitespace and - enclosed in curly braces. Below that, there is a list of URL and tag - patterns, each on a separate line.
- -To determine which actions apply to a request, the URL of the - request is compared to all URL patterns in each "action file". Every time it matches, the list of - applicable actions for the request is incrementally updated, using the - heading of the section in which the pattern is located. The same is - done again for tags and tag patterns later on.
- -If multiple applying sections set the same action differently, the - last match wins. If not, the effects are aggregated. E.g. a URL might - match a regular section with a heading line of { - +handle-as-image - }, then later another one with just { - +block }, resulting in - both actions to - apply. And there may well be cases where you will want to combine - actions together. Such a section then might look like:
- -
- { +handle-as-image +block{Banner ads.} }
- # Block these as if they were images. Send no block page.
- banners.example.com
- media.example.com/.*banners
- .example.com/images/ads/
+Privoxy 3.0.25 User Manual
+Prev | Next
You can trace this process for URL patterns and any given URL by - visiting http://config.privoxy.org/show-url-info.
- -Examples and more detail on this is provided in the Appendix, - Troubleshooting: Anatomy of an - Action section.
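The incremental update described above can be sketched in a few lines of Python. This is only an illustration and not Privoxy's implementation: the function name `effective_actions`, the `(heading, patterns)` tuples, the third example pattern, and the shortcut of passing in the set of patterns that already matched are all assumptions made to keep the example small.

```python
def effective_actions(sections, matched_patterns):
    """Fold section headings over a request, in file order.

    sections: list of (heading, patterns) pairs, where heading is a list
              such as ["+handle-as-image", "-block"].
    matched_patterns: set of patterns assumed to match the request;
              real pattern matching is deliberately left out here.
    """
    state = {}
    for heading, patterns in sections:
        if not any(p in matched_patterns for p in patterns):
            continue  # no pattern in this section applies to the request
        for item in heading:
            sign, name = item[0], item[1:]
            state[name] = (sign == "+")  # a later match overwrites an earlier one
    return state


# Hypothetical sections; the last pattern is invented for illustration.
sections = [
    (["+handle-as-image"], ["banners.example.com"]),
    (["+block"], [".example.com/images/ads/"]),
    (["-block"], ["banners.example.com/allowed/"]),
]
```

When only the first two sections match, both actions end up enabled (the effects are aggregated). When the third section also matches, its `-block` is the last word on `block`, while `handle-as-image` is left untouched.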
-As mentioned, Privoxy uses - "patterns" to determine what actions might apply to which - sites and pages your browser attempts to access. These "patterns" use wild card type pattern matching to achieve a - high degree of flexibility. This allows one expression to be expanded - and potentially match against many similar patterns.
-Generally, a URL pattern has the form <domain><port>/<path>, where the - <domain>, the <port> and the <path> are optional. (This is why the special - / pattern matches all URLs). Note that the - protocol portion of the URL (e.g. http://) should not be included in the pattern. This is assumed - already!
- -The pattern matching syntax is different for the domain and path - parts of the URL. The domain part uses a simple globbing type matching - technique, while the path part uses more flexible "Regular Expressions" (POSIX - 1003.2).
- -The port part of a pattern is a decimal port number preceded by a - colon (:). If the domain part contains a - numerical IPv6 address, it has to be put into angle brackets - (<, >).
- -is a domain-only pattern and will match any request to - www.example.com, regardless of which - document on that server is requested. So ALL pages in this domain - would be covered by the scope of this action. Note that a simple - example.com is different and would NOT - match.
-means exactly the same. For domain-only patterns, the trailing - / may be omitted.
-matches all the documents on www.example.com whose name starts with /index.html.
-matches only the single document /index.html on www.example.com.
-matches the document /index.html, - regardless of the domain, i.e. on any web server - anywhere.
-Matches any URL because there's no requirement for either the - domain or the path to match anything.
-Matches any URL pointing to TCP port 8000.
-Matches any URL with the host address 2001:db8::1. (Note that the real URL uses plain - brackets, not angle brackets.)
-matches nothing, since it would be interpreted as a domain - name and there is no top-level domain called .html. So it's a mistake.
-The matching of the domain part offers some flexible options: if - the domain starts or ends with a dot, it becomes unanchored at that - end. For example:
- -matches any domain with first-level domain com and second-level domain example. For example www.example.com, example.com and foo.bar.baz.example.com. Note that it wouldn't - match if the second-level domain was another-example.
-matches any domain that STARTS with www. - (It also matches the domain www but - most of the time that doesn't matter.)
-matches any domain that CONTAINS .example.. And, by the way, also included would - be any files or documents that exist within that domain since - no path limitations are specified. (Correctly speaking: It - matches any FQDN that contains example - as a domain.) This might be www.example.com, news.example.de, or www.example.net/cgi/testing.pl for instance. All - these cases are matched.
-Additionally, there are wild-cards that you can use in the domain - names themselves. These work similarly to shell globbing type - wild-cards: "*" represents zero or more - arbitrary characters (this is equivalent to the "Regular Expression" based - syntax of ".*"), "?" represents any single character (this is - equivalent to the regular expression syntax of a simple "."), and you can define "character classes" in square brackets which is - similar to the same regular expression technique. All of this can be - freely mixed:
- -matches "adserver.example.com", - "ads.example.com", etc but not - "sfads.example.com"
-matches all of the above, and then some.
-matches www.ipix.com, pictures.epix.com, a.b.c.d.e.upix.com etc.
-matches www1.example.com, - www4.example.cc, wwwd.example.cy, wwwz.example.com etc., but not wwww.example.com.
-While flexible, this is not the sophistication of full regular - expression based syntax.
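The anchoring and wildcard rules for the domain part can be approximated as follows. This is a rough sketch under stated assumptions, not Privoxy's matching code: it compares dot-separated labels, uses Python's `fnmatch` for the `*`, `?` and character-class wildcards (whose class syntax may differ from Privoxy's in details), and the helper name `domain_matches` is made up for this example. Ports, IPv6 addresses and paths are ignored.

```python
from fnmatch import fnmatchcase


def domain_matches(pattern, host):
    """Approximate domain-pattern matching on dot-separated labels."""
    left_unanchored = pattern.startswith(".")   # leading dot: open on the left
    right_unanchored = pattern.endswith(".")    # trailing dot: open on the right
    pat_labels = [p for p in pattern.lower().split(".") if p]
    host_labels = host.lower().split(".")
    n, m = len(pat_labels), len(host_labels)
    if n > m:
        return False

    def labels_match(offset):
        # Compare each pattern label with the host label at the same position.
        return all(
            fnmatchcase(host_labels[offset + i], pat_labels[i]) for i in range(n)
        )

    if left_unanchored and right_unanchored:
        return any(labels_match(o) for o in range(m - n + 1))
    if left_unanchored:
        return labels_match(m - n)   # must line up with the end of the host
    if right_unanchored:
        return labels_match(0)       # must line up with the start of the host
    return n == m and labels_match(0)
```

With this sketch, `.example.com` covers both `example.com` and `www.example.com`, whereas the fully anchored `www.example.com` does not cover `example.com`.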
-Privoxy uses "modern" POSIX 1003.2 "Regular Expressions" for - matching the path portion (after the slash), and is thus more - flexible.
- -There is an Appendix with a - brief quick-start into regular expressions, you also might want to - have a look at your operating system's documentation on regular - expressions (try man re_format).
- -Note that the path pattern is automatically left-anchored at the - "/", i.e. it matches as if it would start - with a "^" (regular expression speak for - the beginning of a line).
- -Please also note that matching in the path is CASE INSENSITIVE by - default, but you can switch to case sensitive at any point in the - pattern by using the "(?-i)" switch: - www.example.com/(?-i)PaTtErN.* will match - only documents whose path starts with PaTtErN in exactly this capitalization.
- -Is equivalent to just ".example.com", since any documents within that - domain are matched with or without the ".*" regular expression. This is redundant
-Will match any page in the domain of "example.com" that is named "index.html", and that is part of some path. For - example, it matches "www.example.com/testing/index.html" but NOT - "www.example.com/index.html" because - the regular expression called for at least two "/'s", thus the path requirement. It also would - match "www.example.com/testing/index_html", because of - the special meta-character ".".
-This regular expression is conditional so it will match any - page named "index.html" regardless - of path which in this case can have one or more "/'s". And this one must contain exactly - ".html" (but does not have to end - with that!).
-This regular expression will match any path of "example.com" that contains any of the words - "ads", "banner", "banners" - (because of the "?") or "junk". The path does not have to end in these - words, just contain them.
-This is very much the same as above, except now it must end - in either ".jpg", ".jpeg", ".gif" or - ".png". So this one is limited to - common image formats.
-There are many, many good examples to be found in default.action, and more tutorials below in Appendix on regular expressions.
-Tag patterns are used to change the applying actions based on the - request's tags. Tags can be created with either the client-header-tagger or - the server-header-tagger - action.
- -Tag patterns have to start with "TAG:", - so Privoxy can tell them apart from - URL patterns. Everything after the colon including white space, is - interpreted as a regular expression with path pattern syntax, except - that tag patterns aren't left-anchored automatically (Privoxy doesn't silently add a "^", you have to do it yourself if you need it).
- -To match all requests that are tagged with "foo" your pattern line should be "TAG:^foo$", "TAG:foo" - would work as well, but it would also match requests whose tags - contain "foo" somewhere. "TAG: foo" wouldn't work as it requires white - space.
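The difference between these three pattern lines can be checked with a small Python sketch. It assumes Python's `re` module as a stand-in for Privoxy's regular expression engine (the two are not identical, and case handling is left out), and the helper name `tag_pattern_matches` is invented for illustration.

```python
import re


def tag_pattern_matches(pattern_line, tag):
    """Return True if a "TAG:" pattern line matches the given tag."""
    prefix = "TAG:"
    if not pattern_line.startswith(prefix):
        raise ValueError("not a tag pattern")
    # Everything after the colon, including white space, is the expression.
    regex = pattern_line[len(prefix):]
    # search() rather than match(): tag patterns are not left-anchored.
    return re.search(regex, tag) is not None
```

Under these assumptions, `TAG:^foo$` matches only the tag `foo`, `TAG:foo` also matches a tag such as `xfoobar`, and `TAG: foo` fails on `foo` because the expression itself starts with a space.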
- -Sections can contain URL and tag patterns at the same time, but - tag patterns are checked after the URL patterns and thus always - overrule them, even if they are located before the URL patterns.
-Once a new tag is added, Privoxy checks right away if it's matched - by one of the tag patterns and updates the action settings - accordingly. As a result tags can be used to activate other tagger - actions, as long as these other taggers look for headers that haven't - already been parsed.
- -For example you could tag client requests which use the POST method, then use this tag to activate another - tagger that adds a tag if cookies are sent, and then use a block - action based on the cookie tag. This allows the outcome of one - action, to be input into a subsequent action. However if you'd - reverse the position of the described taggers, and activated the - method tagger based on the cookie tagger, no method tags would be - created. The method tagger would look for the request line, but at - the time the cookie tag is created, the request line has already been - parsed.
- -While this is a limitation you should be aware of, this kind of - indirection is seldom needed anyway and even the example doesn't make - too much sense.
-All actions are disabled by default, until they are explicitly - enabled somewhere in an actions file. Actions are turned on if preceded - with a "+", and turned off if preceded with - a "-". So a +action - means "do that action", e.g. +block means "please block URLs that - match the following patterns", and -block means "don't block URLs that - match the following patterns, even if +block - previously applied."
- -Again, actions are invoked by placing them on a line, enclosed in - curly braces and separated by whitespace, like in {+some-action -some-other-action{some-parameter}}, - followed by a list of URL patterns, one per line, to which they apply. - Together, the actions line and the following pattern lines make up a - section of the actions file.
- -Actions fall into three categories:
- ++ The actions files are used to define what actions Privoxy takes for which URLs, and thus + determines how ad images, cookies and various other aspects of HTTP + content and transactions are handled, and on which sites (or even + parts thereof). There are a number of such actions, with a wide range + of functionality. Each action does something a little different. + These actions give us a veritable arsenal of tools with which to + exert our control, preferences and independence. Actions can be + combined so that their effects are aggregated when applied against a + given set of URLs. +
++ There are three action files included with Privoxy with differing purposes: +
++
 Boolean, i.e. the action can only be "enabled" or "disabled". - Syntax:
- -
-  +name    # enable action name
-  -name    # disable action name
-
Example: +handle-as-image
++ match-all.action - is used to define + which "actions" relating to + banner-blocking, images, pop-ups, content modification, cookie + handling etc should be applied by default. It should be the first + actions file loaded +
Parameterized, where some value is required in order to enable - this type of action. Syntax:
- -
-  +name{param}  # enable action and set parameter to param,
-                # overwriting parameter from previous match if necessary
-  -name         # disable action. The parameter can be omitted
-
Note that if the URL matches multiple positive forms of a - parameterized action, the last match wins, i.e. the params from - earlier matches are simply ignored.
- -Example: +hide-user-agent{Mozilla/5.0 (X11; - U; FreeBSD i386; en-US; rv:1.8.1.4) Gecko/20070602 - Firefox/2.0.0.4}
-Multi-value. These look exactly like parameterized actions, but - they behave differently: If the action applies multiple times to - the same URL, but with different parameters, all the parameters from - all matches - are remembered. This is used for actions that can be executed for - the same request repeatedly, like adding multiple headers, or - filtering through multiple filters. Syntax:
- -
-  +name{param}  # enable action and add param to the list of parameters
-  -name{param}  # remove the parameter param from the list of parameters
-                # If it was the last one left, disable the action.
-  -name         # disable this action completely and remove all parameters from the list
-
Examples: +add-header{X-Fun-Header: Some - text} and +filter{html-annoyances}
-If nothing is specified in any actions file, no "actions" are taken. So in this case Privoxy would just be a normal, non-blocking, - non-filtering proxy. You must specifically enable the privacy and - blocking features you need (although the provided default actions files - will give a good starting point).
-Later defined action sections always override earlier ones of the - same type. So exceptions to any rules you make should come in the - latter part of the file (or in a file that is processed later when - using multiple actions files such as user.action). For multi-valued actions, the actions are - applied in the order they are specified. Actions files are processed in - the order they are defined in config (the - default installation has three actions files). It is also quite possible - for any given URL to match more than one "pattern" (because of wildcards and regular - expressions), and thus to trigger more than one set of actions! Last - match wins.
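How the three action categories behave when several matches are merged can be sketched in Python. This is a simplified model and not Privoxy's data structures: the names `parse_item`, `merge` and `MULTI_VALUE` are assumptions, and the set of multi-value actions lists only the two mentioned as examples above.

```python
# Multi-value actions named as examples in the text; the real list is longer.
MULTI_VALUE = {"add-header", "filter"}


def parse_item(item):
    """Split "+name{param}" or "-name" into (sign, name, param)."""
    sign = item[0]
    name, _, rest = item[1:].partition("{")
    param = rest[:-1] if rest else None  # drop the closing brace
    return sign, name, param


def merge(state, item):
    """Apply one heading item to the running state, in match order."""
    sign, name, param = parse_item(item)
    if name in MULTI_VALUE:
        values = state.setdefault(name, [])
        if sign == "+":
            if param not in values:
                values.append(param)   # all parameters are remembered
        elif param is None:
            values.clear()             # "-name": remove every parameter
        elif param in values:
            values.remove(param)       # "-name{param}": remove just this one
    elif sign == "+":
        # Boolean actions store True; parameterized ones keep the last param.
        state[name] = param if param is not None else True
    else:
        state[name] = False
    return state
```

In this model an empty list stands for a disabled multi-value action, and a parameterized action simply keeps the parameter from its most recent positive match.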
- -The list of valid Privoxy actions - are:
- -Confuse log analysis, custom applications
-Sends a user defined HTTP header to the web server.
-Multi-value.
-Any string value is possible. Validity of the defined HTTP - headers is not checked. It is recommended that you use the - "X-" prefix - for custom headers.
-This action may be specified multiple times, in order to - define multiple headers. This is rarely needed for the typical - user. If you don't know what "HTTP - headers" are, you definitely don't need to worry about - this one.
- -Headers added by this action are not modified by other - actions.
-
- -+add-header{X-User-Tracking: sucks} -- |
-
Block ads or other unwanted content
-Requests for URLs to which this action applies are blocked, - i.e. the requests are trapped by Privoxy and the requested URL is never - retrieved, but is answered locally with a substitute page or - image, as determined by the handle-as-image, - set-image-blocker, - and handle-as-empty-document - actions.
-Parameterized.
-A block reason that should be given to the user.
-Privoxy sends a special - "BLOCKED" page for requests to - blocked pages. This page contains the block reason given as - parameter, a link to find out why the block action applies, and - a click-through to the blocked content (the latter only if the - force feature is available and enabled).
- -A very important exception occurs if both block and handle-as-image, - apply to the same request: it will then be replaced by an - image. If set-image-blocker - (see below) also applies, the type of image will be determined - by its parameter, if not, the standard checkerboard pattern is - sent.
- -It is important to understand this process, in order to - understand how Privoxy deals - with ads and other unwanted content. Blocking is a core - feature, and one upon which various other features depend.
- -The filter action can perform a - very similar task, by "blocking" - banner images and other content through rewriting the relevant - URLs in the document's HTML source, so they don't get requested - in the first place. Note that this is a totally different - technique, and it's easy to confuse the two.
-
-{+block{No nasty stuff for you.}}
-# Block and replace with "blocked" page
- .nasty-stuff.example.com
-
-{+block{Doubleclick banners.} +handle-as-image}
-# Block and replace with image
- .ad.doubleclick.net
- .ads.r.us/banners/
-
-{+block{Layered ads.} +handle-as-empty-document}
-# Block and then ignore
- adserver.example.net/.*\.js$
-
Improve privacy by not forwarding the source of the request - in the HTTP headers.
-Deletes the "X-Forwarded-For:" - HTTP header from the client request, or adds a new one.
-Parameterized.
-"block" to delete the - header.
-"add" to create the header - (or append the client's IP address to an already existing - one).
-It is safe and recommended to use block.
- -Forwarding the source address of the request may make sense - in some multi-user setups but is also a privacy risk.
-
- -+change-x-forwarded-for{block} -- |
-
Rewrite or remove single client headers.
-All client headers to which this action applies are filtered - on-the-fly through the specified regular expression based - substitutions.
-Parameterized.
-The name of a client-header filter, as defined in one of the - filter files.
-Client-header filters are applied to each header on its own, - not to all at once. This makes it easier to diagnose problems, - but on the downside you can't write filters that only change - header x if header y's value is z. You can do that by using - tags though.
- -Client-header filters are executed after the other header - actions have finished and use their output as input.
- -If the request URL gets changed, Privoxy will detect that and use the new - one. This can be used to rewrite the request destination behind - the client's back, for example to specify a Tor exit relay for - certain requests.
- -Please refer to the filter file - chapter to learn which client-header filters are available - by default, and how to create your own.
-
-# Hide Tor exit notation in Host and Referer Headers
-{+client-header-filter{hide-tor-exit-notation}}
-/
-
Block requests based on their headers.
-Client headers to which this action applies are filtered - on-the-fly through the specified regular expression based - substitutions, the result is used as tag.
-Parameterized.
-The name of a client-header tagger, as defined in one of the - filter files.
-Client-header taggers are applied to each header on its own, - and as the header isn't modified, each tagger "sees" the original.
- -Client-header taggers are the first actions that are - executed and their tags can be used to control every other - action.
-
-# Tag every request with the User-Agent header
-{+client-header-tagger{user-agent}}
-/
-
-# Tagging itself doesn't change the action
-# settings, sections with TAG patterns do:
-#
-# If it's a download agent, use a different forwarding proxy,
-# show the real User-Agent and make sure resume works.
-{+forward-override{forward-socks5 10.0.0.2:2222 .} \
- -hide-if-modified-since \
- -overwrite-last-modified \
- -hide-user-agent \
- -filter \
- -deanimate-gifs \
-}
-TAG:^User-Agent: NetBSD-ftp/
-TAG:^User-Agent: Novell ZYPP Installer
-TAG:^User-Agent: RPM APT-HTTP/
-TAG:^User-Agent: fetch libfetch/
-TAG:^User-Agent: Ubuntu APT-HTTP/
-TAG:^User-Agent: MPlayer/
-
Stop useless download menus from popping up, or change the - browser's rendering mode
-Replaces the "Content-Type:" HTTP - server header.
-Parameterized.
-Any string.
-The "Content-Type:" HTTP server - header is used by the browser to decide what to do with the - document. The value of this header can cause the browser to - open a download menu instead of displaying the document by - itself, even if the document's format is supported by the - browser.
-The declared content type can also affect which rendering - mode the browser chooses. If XHTML is delivered as "text/html", many browsers treat it as yet - another broken HTML document. If it is sent as "application/xml", browsers with XHTML support - will only display it if the syntax is correct.
- -If you see a web site that proudly uses XHTML buttons, but - sets "Content-Type: text/html", you - can use Privoxy to overwrite - it with "application/xml" and - validate the web master's claim inside your XHTML-supporting - browser. If the syntax is incorrect, the browser will complain - loudly.
-You can also go in the opposite direction: if your browser - prints error messages instead of rendering a document falsely - declared as XHTML, you can overwrite the content type with - "text/html" and have it rendered as - a broken HTML document.
- -By default content-type-overwrite - only replaces "Content-Type:" - headers that look like some kind of text. If you want to - overwrite it unconditionally, you have to combine it with - force-text-mode. - This limitation exists for a reason, think twice before - circumventing it.
- -Most of the time it's easier to replace this action with a - custom server-header - filter. It allows you to activate it for every - document of a certain site and it will still only replace the - content types you aimed at.
- -Of course you can apply content-type-overwrite to a whole site and then - make URL based exceptions, but it's a lot more work to get the - same precision.
-
-# Check if www.example.net/ really uses valid XHTML
-{ +content-type-overwrite{application/xml} }
-www.example.net/
-
-# but leave the content type unmodified if the URL looks like a style sheet
-{-content-type-overwrite}
-www.example.net/.*\.css$
-www.example.net/.*style
-
Remove a client header Privoxy has no dedicated action for.
-Deletes every header sent by the client that contains the - string the user supplied as parameter.
-Parameterized.
-Any string.
-This action allows you to block client headers for which no - dedicated Privoxy action - exists. Privoxy will remove - every client header that contains the string you supplied as - parameter.
- -Regular expressions are not supported and you can't use this - action to block different headers in the same request, unless - they contain the same string.
- -crunch-client-header is only meant - for quick tests. If you have to block several different - headers, or only want to modify parts of them, you should use a - client-header - filter.
- -Warning | -
- Don't block any header without understanding the - consequences. - |
-
-# Block the non-existent "Privacy-Violation:" client header
-{ +crunch-client-header{Privacy-Violation:} }
-/
-
Prevent yet another way to track the user's steps between - sessions.
-Deletes the "If-None-Match:" HTTP - client header.
-Boolean.
-N/A
-Removing the "If-None-Match:" - HTTP client header is useful for filter testing, where you want - to force a real reload instead of getting status code - "304" which would cause the browser - to use a cached copy of the page.
- -It is also useful to make sure the header isn't used as a - cookie replacement (unlikely but possible).
- -Blocking the "If-None-Match:" - header shouldn't cause any caching problems, as long as the - "If-Modified-Since:" header isn't - blocked or missing as well.
- -It is recommended to use this action together with - hide-if-modified-since - and overwrite-last-modified.
-
-# Let the browser revalidate cached documents but don't
-# allow the server to use the revalidation headers for user tracking.
-{+hide-if-modified-since{-60} \
- +overwrite-last-modified{randomize} \
- +crunch-if-none-match}
-/
-
Prevent the web server from setting HTTP cookies on your - system
-Deletes any "Set-Cookie:" HTTP - headers from server replies.
-Boolean.
-N/A
-This action is only concerned with incoming HTTP - cookies. For outgoing HTTP cookies, use crunch-outgoing-cookies. - Use both - to disable HTTP cookies completely.
- -It makes no sense - at all to use this action in conjunction with the - session-cookies-only - action, since it would prevent the session cookies from being - set. See also filter-content-cookies.
-
- -+crunch-incoming-cookies -- |
+ + Feature + | ++ Cautious + | ++ Medium + | ++ Advanced + |
---|
Remove a server header Privoxy has no dedicated action for.
-Deletes every header sent by the server that contains the - string the user supplied as parameter.
-Parameterized.
-Any string.
-This action allows you to block server headers for which no - dedicated Privoxy action - exists. Privoxy will remove - every server header that contains the string you supplied as - parameter.
- -Regular expressions are not supported and you can't use this - action to block different headers in the same request, unless - they contain the same string.
- -crunch-server-header is only meant - for quick tests. If you have to block several different - headers, or only want to modify parts of them, you should use a - custom server-header - filter.
- -Warning | -
- Don't block any header without understanding the - consequences. - |
-
- -# Crunch server headers that try to prevent caching -{ +crunch-server-header{no-cache} } -/ -+ Ad-blocking Aggressiveness |
-
Prevent the web server from reading any HTTP cookies from - your system
-Deletes any "Cookie:" HTTP - headers from client requests.
-Boolean.
-N/A
-This action is only concerned with outgoing HTTP - cookies. For incoming HTTP cookies, use crunch-incoming-cookies. - Use both - to disable HTTP cookies completely.
- -It makes no sense - at all to use this action in conjunction with the - session-cookies-only - action, since it would prevent the session cookies from being - read.
-
- -+crunch-outgoing-cookies -+ medium |
-
Stop those annoying, distracting animated GIF images.
-De-animate GIF animations, i.e. reduce them to their first - or last image.
-Parameterized.
-"last" or "first"
-This will also shrink the images considerably (in bytes, not - pixels!). If the option "first" is - given, the first frame of the animation is used as the - replacement. If "last" is given, the - last frame of the animation is used instead, which probably - makes more sense for most banner animations, but also has the - risk of not showing the entire last frame (if it is only a - delta to an earlier frame).
- -You can safely use this action with patterns that will also - match non-GIF objects, because no attempt will be made at - anything that doesn't look like a GIF.
-
- -+deanimate-gifs{last} -+ high + |
+ + high |
Work around (very rare) problems with HTTP/1.1
-Downgrades HTTP/1.1 client requests and server replies to - HTTP/1.0.
-Boolean.
-N/A
-This is a left-over from the time when Privoxy didn't support important HTTP/1.1 - features well. It is left here for the unlikely case that you - experience HTTP/1.1-related problems with some server out - there.
- -Note that enabling this action is only a workaround. It - should not be enabled for sites that work without it. While it - shouldn't break any pages, it has an (usually negative) - performance impact.
- -If you come across a site where enabling this action helps, - please report it, so the cause of the problem can be analyzed. - If the problem turns out to be caused by a bug in Privoxy it should be fixed so the - following release works without the work around.
-
- -{+downgrade-http-version} -problem-host.example.com -+ Ad-filtering by size + |
+ + no + | ++ yes + | ++ yes |
Fool some click-tracking scripts and speed up indirect - links.
-Detects redirection URLs and redirects the browser without - contacting the redirection server first.
-Parameterized.
-"simple-check" to just search - for the string "http://" to - detect redirection URLs.
-"check-decoded-url" to decode - URLs (if necessary) before searching for redirection - URLs.
-Many sites, like yahoo.com, don't just link to other sites. - Instead, they will link to some script on their own servers, - giving the destination as a parameter, which will then redirect - you to the final target. URLs resulting from this scheme - typically look like: "http://www.example.org/click-tracker.cgi?target=http%3a//www.example.net/".
- -Sometimes, there are even multiple consecutive redirects - encoded in the URL. These redirections via scripts make your - web browsing more traceable, since the server from which you - follow such a link can see where you go to. Apart from that, - valuable bandwidth and time is wasted, while your browser asks - the server for one redirect after the other. Plus, it feeds the - advertisers.
- -This feature is currently not very smart and is scheduled - for improvement. If it is enabled by default, you will have to - create some exceptions to this action. It can lead to failures - in several ways:
- -Not every URLs with other URLs as parameters is evil. Some - sites offer a real service that requires this information to - work. For example a validation service needs to know, which - document to validate. fast-redirects - assumes that every URL parameter that looks like another URL is - a redirection target, and will always redirect to the last one. - Most of the time the assumption is correct, but if it isn't, - the user gets redirected anyway.
- -Another failure occurs if the URL contains other parameters - after the URL parameter. The URL: "http://www.example.org/?redirect=http%3a//www.example.net/&foo=bar". - contains the redirection URL "http://www.example.net/", followed by another - parameter. fast-redirects doesn't know - that and will cause a redirect to "http://www.example.net/&foo=bar". Depending - on the target server configuration, the parameter will be - silently ignored or lead to a "page not - found" error. You can prevent this problem by first - using the redirect action to remove - the last part of the URL, but it requires a little effort.
- -To detect a redirection URL, fast-redirects only looks for the string - "http://", either in plain text - (invalid but often used) or encoded as "http%3a//". Some sites use their own URL - encoding scheme, encrypt the address of the target server or - replace it with a database id. In theses cases fast-redirects is fooled and the request reaches - the redirection server where it probably gets logged.
-
- - { +fast-redirects{simple-check} } - one.example.com - - { +fast-redirects{check-decoded-url} } - another.example.com/testing -+ Ad-filtering by link + |
+ + no + | ++ no + | ++ yes + | +
+ Pop-up killing + | ++ blocks only + | ++ blocks only + | ++ blocks only |
Get rid of HTML and JavaScript annoyances, banner - advertisements (by size), do fun text replacements, add - personalized effects, etc.
-All instances of text-based type, most notably HTML and - JavaScript, to which this action applies, can be filtered - on-the-fly through the specified regular expression based - substitutions. (Note: as of version 3.0.3 plain text documents - are exempted from filtering, because web servers often use the - text/plain MIME type for all files - whose type they don't know.)
-Parameterized.
-The name of a content filter, as defined in the filter file. Filters can be defined in - one or more files as defined by the filterfile - option in the config file. default.filter is the collection of filters - supplied by the developers. Locally defined filters should go - in their own file, such as user.filter.
- -When used in its negative form, and without parameters, - all - filtering is completely disabled.
-For your convenience, there are a number of pre-defined - filters available in the distribution filter file that you can - use. See the examples below for a list.
- -Filtering requires buffering the page content, which may - appear to slow down page rendering since nothing is displayed - until all content has passed the filters. (The total time until - the page is completely rendered doesn't change much, but it may - be perceived as slower since the page is not incrementally - displayed.) This effect will be more noticeable on slower - connections.
- -"Rolling your own" filters - requires a knowledge of "Regular Expressions" and - "HTML". This is very - powerful feature, and potentially very intrusive. Filters - should be used with caution, and where an equivalent - "action" is not available.
- -The amount of data that can be filtered is limited to the - buffer-limit option in the - main config file. The default is 4096 - KB (4 Megs). Once this limit is exceeded, the buffered data, - and all pending data, is passed through unfiltered.
- -Inappropriate MIME types, such as zipped files, are not - filtered at all. (Again, only text-based types except plain - text). Encrypted SSL data (from HTTPS servers) cannot be - filtered either, since this would violate the integrity of the - secure transaction. In some situations it might be necessary to - protect certain text, like source code, from filtering by - defining appropriate -filter - exceptions.
- -Compressed content can't be filtered either, but if - Privoxy is compiled with zlib - support and a supported compression algorithm is used (gzip or - deflate), Privoxy can first - decompress the content and then filter it.
- -If you use a Privoxy - version without zlib support, but want filtering to work on as - much documents as possible, even those that would normally be - sent compressed, you must use the prevent-compression - action in conjunction with filter.
- -Content filtering can achieve some of the same effects as - the block action, i.e. it can be - used to block ads and banners. But the mechanism works quite - differently. One effective use, is to block ad banners based on - their size (see below), since many of these seem to be somewhat - standardized.
- -Feedback with suggestions for new - or improved filters is particularly welcome!
- -The below list has only the names and a one-line description - of each predefined filter. There are more verbose - explanations of what these filters do in the filter file chapter.
-
- -+filter{js-annoyances} # Get rid of particularly annoying JavaScript abuse. -+ Privacy Features + |
+ + low + | ++ medium + | ++ medium/high |
- -+filter{js-events} # Kill all JS event bindings and timers (Radically destructive! Only for extra nasty sites). -+ Cookie handling + |
+ + none + | ++ session-only + | ++ kill |
- -+filter{html-annoyances} # Get rid of particularly annoying HTML abuse. -+ Referer forging + |
+ + no + | ++ yes + | ++ yes |
- -+filter{content-cookies} # Kill cookies that come in the HTML or JS content. -+ GIF de-animation + |
+ + no + | ++ yes + | ++ yes |
- -+filter{refresh-tags} # Kill automatic refresh tags (for dial-on-demand setups). -+ Fast redirects + |
+ + no + | ++ no + | ++ yes |
- -+filter{unsolicited-popups} # Disable only unsolicited pop-up windows. Useful if your browser lacks this ability. -+ HTML taming + |
+ + no + | ++ no + | ++ yes |
- -+filter{all-popups} # Kill all popups in JavaScript and HTML. Useful if your browser lacks this ability. -+ JavaScript taming + |
+ + no + | ++ no + | ++ yes + | +
+ Web-bug killing + | ++ no + | ++ yes + | ++ yes + | +
+ Image tag reordering + | ++ no + | ++ yes + | ++ yes |
+ The list of actions files to be used is defined in the main + configuration file, and the files are processed in the order they are defined + (e.g. default.action is typically processed + before user.action). The content of these + can all be viewed and edited from http://config.privoxy.org/show-status. The over-riding + principle when applying actions is that the last action that matches + a given URL wins. The broadest, most general rules go first (defined + in default.action), followed by any + exceptions (typically also in default.action), which are then followed lastly by + any local preferences (typically in user.action). + Generally, user.action has the last word. +
++ An actions file typically has multiple sections. If you want to use + "aliases" in an actions file, you have to + place the (optional) alias + section at the top of that file. Then comes the default set of + rules which will apply universally to all sites and pages (be very careful with + using such a universal set in user.action + or any other actions file after default.action, because it will override the result + from consulting any previous file). And then below that, exceptions + to the defined universal policies. You can regard user.action as an appendix to default.action, with the advantage that it is a + separate file, which makes preserving your personal settings across + Privoxy upgrades easier. +
++ Actions can be used to block anything you want, including ads, + banners, or just some obnoxious URL whose content you would rather + not see. Cookies can be accepted or rejected, or accepted only during + the current browser session (i.e. not written to disk), content can + be modified, some JavaScripts tamed, user-tracking fooled, and much + more. See below for a complete + list of actions. +
+ Note that some actions, + like cookie suppression or script disabling, may render unusable + some sites that rely on these techniques to work properly. Finding + the right mix of actions is not always easy and certainly a matter + of personal taste. And, things can always change, requiring + refinements in the configuration. In general, it can be said that + the more "aggressive" your default + settings (in the top section of the actions file) are, the more + exceptions for "trusted" sites you will + have to make later. If, for example, you want to crunch all cookies + by default, you'll have to make exceptions from that rule for + sites that you regularly use and that require cookies for actually + useful purposes, like maybe your bank, favorite shop, or newspaper. +
++ We have tried to provide you with reasonable rules to start from in + the distribution actions files. But there is no general rule of + thumb on these things. There just are too many variables, and sites + are constantly changing. Sooner or later you will want to change + the rules (and read this chapter again :). +
++ The easiest way to edit the actions files is with a browser by + using our browser-based editor, which can be reached from http://config.privoxy.org/show-status. Note: the config + file option enable-edit-actions must be + enabled for this to work. The editor allows both fine-grained + control over every single feature on a per-URL basis, and easy + choosing from wholesale sets of defaults like "Cautious", "Medium" or + "Advanced". Warning: the "Advanced" setting is more aggressive, and will be + more likely to cause problems for some sites. Experienced users + only! +
+ If you prefer plain text editing to GUIs, you can of course also + directly edit the actions files with your favorite text editor. + Look at default.action, which is richly + commented with many good examples. +
+ Actions files are divided into sections. There are special + sections, like the "alias" sections which will + be discussed later. For now let's concentrate on regular sections: + They have a heading line (often split up to multiple lines for + readability) which consists of a list of actions, separated by + whitespace and enclosed in curly braces. Below that, there is a + list of URL and tag patterns, each on a separate line. +
++ To determine which actions apply to a request, the URL of the + request is compared to all URL patterns in each "action file". Every time it matches, the list of + applicable actions for the request is incrementally updated, using + the heading of the section in which the pattern is located. The + same is done again for tags and tag patterns later on. +
+ If multiple applying sections set the same action differently, the + last match wins. If not, the effects are aggregated. E.g. a URL + might match a regular section with a heading line of { +handle-as-image }, + then later another one with just { +block }, so that both actions + apply. And there may well be cases where you will want to combine + actions together. Such a section then might look like: +
++
+
++ { +handle-as-image +block{Banner ads.} } + # Block these as if they were images. Send no block page. + banners.example.com + media.example.com/.*banners + .example.com/images/ads/ ++ |
+
+ You can trace this process for URL patterns and any given URL by + visiting http://config.privoxy.org/show-url-info. +
+ Examples and more detail on this are provided in the Appendix, Troubleshooting: Anatomy of an + Action section. +
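The interplay of aggregation and "last match wins" can be sketched with two sections. This is a minimal illustration, not taken from the distributed actions files; the host name and the /legal/ path are hypothetical.

```
# Broad rule first: block everything on this host and answer with an image.
{ +block{Example ad host.} +handle-as-image }
ads.example.com

# Narrower exception later in the file: for URLs below /legal/ the last
# match decides "block", so -block wins. +handle-as-image is not touched
# by this section and therefore still applies from the section above.
{ -block }
ads.example.com/legal/
```

Visiting http://config.privoxy.org/show-url-info with a matching URL should show which of the two sections contributed which action.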
++ As mentioned, Privoxy uses "patterns" to determine what actions might apply to + which sites and pages your browser attempts to access. These "patterns" use wild card type pattern matching to + achieve a high degree of flexibility. This allows one expression to + be expanded and potentially match against many similar patterns. +
+ Generally, a URL pattern has the form <host><port>/<path>, where the <host>, the <port> and the <path> are optional. (This is why the special + / pattern matches all URLs). Note that the + protocol portion of the URL (e.g. http://) should not be included in the pattern. This is + assumed already! +
++ The pattern matching syntax is different for the host and path + parts of the URL. The host part uses a simple globbing type + matching technique, while the path part uses more flexible "Regular Expressions" (POSIX + 1003.2). +
++ The port part of a pattern is a decimal port number preceded by a + colon (:). If the host part contains a + numerical IPv6 address, it has to be put into angle brackets (<, >). +
+ www.example.com/ is a host-only pattern and will match any request to www.example.com, regardless of which + document on that server is requested. So ALL pages in this + domain would be covered by the scope of this action. Note + that a simple example.com is + different and would NOT match. +
+ www.example.com means exactly the same. For host-only patterns, the trailing + / may be omitted. +
+ www.example.com/index.html matches all the documents on www.example.com whose name starts with /index.html. +
+ www.example.com/index.html$ matches only the single document /index.html on www.example.com. +
+ /index.html$ matches the document /index.html, + regardless of the domain, i.e. on any web server anywhere. +
+ / matches any URL because there's no requirement for either the + domain or the path to match anything. +
+ :8000/ matches any URL pointing to TCP port 8000. +
++ Matches any URL with the host address 10.0.0.1. (Note that the real URL uses plain + brackets, not angle brackets.) +
+ <2001:db8::1>/ matches any URL with the host address 2001:db8::1. (Note that the real URL uses + plain brackets, not angle brackets.) +
+ index.html matches nothing, since it would be interpreted as a domain + name and there is no top-level domain called .html. So it's a mistake. +
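The pattern forms described above can be put together in a single section. The following is only a sketch: the patterns are reconstructed from the descriptions in this list, and the block action is used purely for illustration, not as a recommendation.

```
{ +block{Illustrative URL patterns only.} }
# Host only: every document on www.example.com
www.example.com/
# Path only: exactly /index.html, on any host
/index.html$
# Port only: any host, TCP port 8000
:8000/
# Numerical IPv6 address, written in angle brackets in the pattern
<2001:db8::1>/
```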
++ The matching of the host part offers some flexible options: if + the host pattern starts or ends with a dot, it becomes unanchored + at that end. The host pattern is often referred to as domain + pattern as it is usually used to match domain names and not IP + addresses. For example: +
+ .example.com matches any domain with first-level domain com and second-level domain example. For example www.example.com, example.com and foo.bar.baz.example.com. Note that it + wouldn't match if the second-level domain was another-example. +
+ www. matches any domain that STARTS with www. (It also matches the domain www but most of the time that doesn't + matter.) +
+ .example. matches any domain that CONTAINS .example.. And, by the way, also included + would be any files or documents that exist within that + domain since no path limitations are specified. (Correctly + speaking: It matches any FQDN that contains example as a domain.) This might be www.example.com, news.example.de, or www.example.net/cgi/testing.pl for instance. + All these cases are matched. +
++ Additionally, there are wild-cards that you can use in the domain + names themselves. These work similarly to shell globbing type + wild-cards: "*" represents zero or + more arbitrary characters (this is equivalent to the "Regular Expression" based + syntax of ".*"), "?" represents any single character (this is + equivalent to the regular expression syntax of a simple "."), and you can define "character classes" in square brackets which is + similar to the same regular expression technique. All of this can + be freely mixed: +
++ matches "adserver.example.com", + "ads.example.com", etc but not + "sfads.example.com" +
++ matches all of the above, and then some. +
++ matches www.ipix.com, pictures.epix.com, a.b.c.d.e.upix.com etc. +
++ matches www1.example.com, www4.example.cc, wwwd.example.cy, wwwz.example.com etc., but not wwww.example.com. +
+ While flexible, this is not as sophisticated as full regular + expression syntax. +
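As a sketch of the wild-cards described above, the following host patterns would be consistent with the matches listed. They are assumptions reconstructed from the descriptions, and the block action is only a placeholder.

```
{ +block{Illustrative host globs only.} }
# adserver.example.com and ads.example.com, but not sfads.example.com
ad*.example.com
# The above, plus hosts such as sfads.example.com
*ad*.example.com
# Any single character before "pix": www.ipix.com, pictures.epix.com
.?pix.com
# www1, www4, wwwd and wwwz, but not wwww; any top-level domain starting with c
www[1-9a-ez].example.c*
```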
++ Privoxy uses "modern" POSIX 1003.2 "Regular Expressions" for + matching the path portion (after the slash), and is thus more + flexible. +
+ There is an Appendix with a + brief quick-start into regular expressions. You might also want + to have a look at your operating system's documentation on + regular expressions (try man re_format). +
++ Note that the path pattern is automatically left-anchored at the + "/", i.e. it matches as if it would + start with a "^" (regular expression + speak for the beginning of a line). +
++ Please also note that matching in the path is CASE INSENSITIVE by + default, but you can switch to case sensitive at any point in the + pattern by using the "(?-i)" switch: + www.example.com/(?-i)PaTtErN.* will + match only documents whose path starts with PaTtErN in exactly this capitalization. +
+ .example.com/.* is equivalent to just ".example.com", since any documents within + that domain are matched with or without the ".*" regular expression. This is redundant. +
++ Will match any page in the domain of "example.com" that is named "index.html", and that is part of some path. + For example, it matches "www.example.com/testing/index.html" but NOT + "www.example.com/index.html" + because the regular expression called for at least two + "/'s", thus the path + requirement. It also would match "www.example.com/testing/index_html", + because of the special meta-character ".". +
++ This regular expression is conditional so it will match any + page named "index.html" + regardless of path which in this case can have one or more + "/'s". And this one must contain + exactly ".html" (but does not + have to end with that!). +
++ This regular expression will match any path of "example.com" that contains any of the words + "ads", "banner", "banners" (because of the "?") or "junk". + The path does not have to end in these words, just contain + them. +
++ This is very much the same as above, except now it must end + in either ".jpg", ".jpeg", ".gif" + or ".png". So this one is + limited to common image formats. +
++ There are many, many good examples to be found in default.action, and more tutorials below in Appendix on regular expressions. +
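A few path patterns in the spirit of the descriptions above, as a sketch. These are illustrative assumptions rather than the exact patterns from the manual, and the host names are placeholders.

```
{ +block{Illustrative path patterns only.} }
# Any path on example.com that contains ads, banner, banners or junk
.example.com/.*(ads|banners?|junk)
# The same words, but only for paths ending in a common image extension
.example.com/.*(ads|banners?|junk).*\.(jpe?g|gif|png)$
# Case-sensitive from the (?-i) switch onward
www.example.com/(?-i)PaTtErN.*
```

Because path patterns are left-anchored at the "/", each pattern starts matching at the beginning of the path; the leading ".*" is what allows the words to appear anywhere in it.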
++ Request tag patterns are used to change the applying actions + based on the request's tags. Tags can be created based on HTTP + headers with either the client-header-tagger + or the server-header-tagger + action. +
++ Request tag patterns have to start with "TAG:", so Privoxy can tell them apart from other + patterns. Everything after the colon including white space, is + interpreted as a regular expression with path pattern syntax, + except that tag patterns aren't left-anchored automatically + (Privoxy doesn't silently add a + "^", you have to do it yourself if you + need it). +
+ To match all requests that are tagged with "foo", your pattern line should be "TAG:^foo$". "TAG:foo" + would work as well, but it would also match requests whose tags + contain "foo" somewhere. "TAG: foo" wouldn't work as it requires white + space. +
++ Sections can contain URL and request tag patterns at the same + time, but request tag patterns are checked after the URL patterns + and thus always overrule them, even if they are located before + the URL patterns. +
+ Once a new request tag is added, Privoxy checks right away if + it's matched by one of the request tag patterns and updates the + action settings accordingly. As a result request tags can be used + to activate other tagger actions, as long as these other taggers + look for headers that haven't already been parsed. +
+ For example you could tag client requests which use the POST method, then use this tag to activate + another tagger that adds a tag if cookies are sent, and then use + a block action based on the cookie tag. This allows the outcome + of one action to be input into a subsequent action. However, if + you reversed the position of the described taggers, and + activated the method tagger based on the cookie tagger, no method + tags would be created. The method tagger would look for the + request line, but at the time the cookie tag is created, the + request line has already been parsed. +
++ While this is a limitation you should be aware of, this kind of + indirection is seldom needed anyway and even the example doesn't + make too much sense. +
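A minimal sketch of tagging and a matching tag pattern follows. It assumes that a client-header tagger named user-agent, which tags the request with the complete User-Agent header, is defined in one of the loaded filter files; the download tool name is only an example.

```
# Tag every request with its complete User-Agent header.
{ +client-header-tagger{user-agent} }
/

# Tagging alone doesn't change the action settings; sections with
# TAG patterns do. Here the real User-Agent is shown for one client.
{ -hide-user-agent }
TAG:^User-Agent: NetBSD-ftp/
```

Since tag patterns aren't left-anchored automatically, the "^" is written explicitly.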
++ To match requests that do not have a certain request tag, specify + a negative tag pattern by prefixing the tag pattern line with + either "NO-REQUEST-TAG:" or "NO-RESPONSE-TAG:" instead of "TAG:". +
++ Negative request tag patterns created with "NO-REQUEST-TAG:" are checked after all client + headers are scanned, the ones created with "NO-RESPONSE-TAG:" are checked after all server + headers are scanned. In both cases all the created tags are + considered. +
++ Warning + | +
+ + This is an experimental feature. The syntax is likely to + change in future versions. + + |
+
+ Client tag patterns are not set based on HTTP headers but based + on the client's IP address. Users can enable them themselves, but + the Privoxy admin controls which tags are available and what + their effect is. +
+ After a client-specific tag has been defined with the client-specific-tag + directive, action sections can be activated based on the tag by + using a CLIENT-TAG pattern. The CLIENT-TAG pattern is evaluated + at the same priority as URL patterns; as a result, the last + matching pattern wins. Tags that are created based on client or + server headers are evaluated later on and can overrule CLIENT-TAG + and URL patterns! +
+ The tag is set for all requests that come from clients that + requested it to be set. Note that "clients" are differentiated by + IP address; if the IP address changes, the tag has to be requested + again. +
++ Clients can request tags to be set by using the CGI interface http://config.privoxy.org/client-tags. +
++ Example: +
++
+
++# If the admin defined the client-specific-tag circumvent-blocks, +# and the request comes from a client that previously requested +# the tag to be set, overrule all previous +block actions that +# are enabled based on URL or CLIENT-TAG patterns. +{-block} +CLIENT-TAG:^circumvent-blocks$ + +# This section is not overruled because it's located after +# the previous one. +{+block{Nobody is supposed to request this.}} +example.org/blocked-example-page ++ |
+
+ All actions are disabled by default, until they are explicitly + enabled somewhere in an actions file. Actions are turned on if + preceded with a "+", and turned off if + preceded with a "-". So a +action means "do that + action", e.g. +block means "please block URLs that match the following + patterns", and -block means "don't block URLs that match the following patterns, + even if +block previously + applied." +
++ Again, actions are invoked by placing them on a line, enclosed in + curly braces and separated by whitespace, like in {+some-action -some-other-action{some-parameter}}, + followed by a list of URL patterns, one per line, to which they + apply. Together, the actions line and the following pattern lines + make up a section of the actions file. +
++ Actions fall into three categories: +
++
+ Boolean, i.e. the action can only be "enabled" or "disabled". Syntax: +
++
+
++ +name # enable action name + -name # disable action name ++ |
+
+ Example: +handle-as-image +
++ Parameterized, where some value is required in order to enable + this type of action. Syntax: +
++
+
++ +name{param} # enable action and set parameter to param, + # overwriting parameter from previous match if necessary + -name # disable action. The parameter can be omitted ++ |
+
+ Note that if the URL matches multiple positive forms of a + parameterized action, the last match wins, i.e. the params from + earlier matches are simply ignored. +
++ Example: +hide-user-agent{Mozilla/5.0 (X11; + U; FreeBSD i386; en-US; rv:1.8.1.4) Gecko/20070602 + Firefox/2.0.0.4} +
++ Multi-value. These look exactly like parameterized actions, but + they behave differently: If the action applies multiple times + to the same URL, but with different parameters, all the parameters + from all + matches are remembered. This is used for actions that can be + executed for the same request repeatedly, like adding multiple + headers, or filtering through multiple filters. Syntax: +
++
+
++ +name{param} # enable action and add param to the list of parameters + -name{param} # remove the parameter param from the list of parameters + # If it was the last one left, disable the action. + -name # disable this action completely and remove all parameters from the list ++ |
+
+ Examples: +add-header{X-Fun-Header: Some + text} and +filter{html-annoyances} +
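The difference between parameterized and multi-value actions shows up when one URL matches two sections. The sketch below uses made-up User-Agent strings and assumes the html-annoyances and js-annoyances filters from the distributed filter file are available.

```
# Applies to every URL.
{ +hide-user-agent{Example/1.0} +filter{html-annoyances} }
/

# Also applies to example.com and its subdomains.
# Parameterized: the later parameter Example/2.0 replaces Example/1.0.
# Multi-value: js-annoyances is added to the filter list and
# html-annoyances is removed from it, leaving js-annoyances only.
{ +hide-user-agent{Example/2.0} +filter{js-annoyances} -filter{html-annoyances} }
.example.com
```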
++ If nothing is specified in any actions file, no "actions" are taken. So in this case Privoxy would just be a normal, non-blocking, + non-filtering proxy. You must specifically enable the privacy and + blocking features you need (although the provided default actions + files will give a good starting point). +
+ Later defined action sections always over-ride earlier ones of the + same type. So exceptions to any rules you make should come in the + latter part of the file (or in a file that is processed later when + using multiple actions files such as user.action). For multi-valued actions, the actions + are applied in the order they are specified. Actions files are + processed in the order they are defined in config (the default installation has three actions + files). It is also quite possible for any given URL to match more than + one "pattern" (because of wildcards and + regular expressions), and thus to trigger more than one set of + actions! Last match wins. +
++ The list of valid Privoxy actions + are: +
++ Confuse log analysis, custom applications +
++ Sends a user defined HTTP header to the web server. +
++ Multi-value. +
++ Any string value is possible. Validity of the defined HTTP + headers is not checked. It is recommended that you use the + "X-" + prefix for custom headers. +
++ This action may be specified multiple times, in order to + define multiple headers. This is rarely needed for the + typical user. If you don't know what "HTTP headers" are, you definitely don't + need to worry about this one. +
++ Headers added by this action are not modified by other + actions. +
++
+
++# Add a DNT ("Do not track") header to all requests, +# even to those that already have one. +# +# This is just an example, not a recommendation. +# +# There is no reason to believe that user-tracking websites care +# about the DNT header and depending on the User-Agent, adding the +# header may make user-tracking easier. +{+add-header{DNT: 1}} +/ ++ |
+
+ Block ads or other unwanted content +
++ Requests for URLs to which this action applies are blocked, + i.e. the requests are trapped by Privoxy and the requested URL is never + retrieved, but is answered locally with a substitute page + or image, as determined by the handle-as-image, + set-image-blocker, + and handle-as-empty-document + actions. +
++ Parameterized. +
++ A block reason that should be given to the user. +
++ Privoxy sends a special + "BLOCKED" page for requests to + blocked pages. This page contains the block reason given as + parameter, a link to find out why the block action applies, + and a click-through to the blocked content (the latter only + if the force feature is available and enabled). +
+ A very important exception occurs if both block and handle-as-image + apply to the same request: it will then be replaced by an + image. If set-image-blocker + (see below) also applies, the type of image will be + determined by its parameter; if not, the standard + checkerboard pattern is sent. +
++ It is important to understand this process, in order to + understand how Privoxy + deals with ads and other unwanted content. Blocking is a + core feature, and one upon which various other features + depend. +
++ The filter action can + perform a very similar task, by "blocking" banner images and other content + through rewriting the relevant URLs in the document's HTML + source, so they don't get requested in the first place. + Note that this is a totally different technique, and it's + easy to confuse the two. +
++
+
++{+block{No nasty stuff for you.}} +# Block and replace with "blocked" page + .nasty-stuff.example.com + +{+block{Doubleclick banners.} +handle-as-image} +# Block and replace with image + .ad.doubleclick.net + .ads.r.us/banners/ + +{+block{Layered ads.} +handle-as-empty-document} +# Block and then ignore + adserver.example.net/.*\.js$ ++ |
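The interaction between block, handle-as-image and set-image-blocker described above can be sketched as follows. This assumes that set-image-blocker accepts the parameter "blank" (see that action's own section) and uses a hypothetical ad host:

```
# Block banners on a hypothetical ad host and answer with a
# blank image instead of the standard checkerboard pattern.
{+block{Banner ads.} +handle-as-image +set-image-blocker{blank}}
ads.example.org/banners/
```

Without +set-image-blocker the same section would still block the request, but the checkerboard image would be sent.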
+
+ Improve privacy by not forwarding the source of the request + in the HTTP headers. +
++ Deletes the "X-Forwarded-For:" + HTTP header from the client request, or adds a new one. +
++ Parameterized. +
++ "block" to delete the + header. +
++ "add" to create the header + (or append the client's IP address to an already + existing one). +
++ It is safe and recommended to use block. +
++ Forwarding the source address of the request may make sense + in some multi-user setups but is also a privacy risk. +
++
+
+++change-x-forwarded-for{block} ++ |
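For the multi-user case mentioned in the notes, the "add" parameter could be limited to a destination that actually needs the client address. A sketch with a hypothetical internal host name:

```
# Forward the client address only to an internal host that needs it.
# The host name is an assumption made for illustration.
{+change-x-forwarded-for{add}}
intranet.example.org/
```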
+
+ Rewrite or remove single client headers. +
++ All client headers to which this action applies are + filtered on-the-fly through the specified regular + expression based substitutions. +
++ Multi-value. +
++ The name of a client-header filter, as defined in one of + the filter files. +
++ Client-header filters are applied to each header on its + own, not to all at once. This makes it easier to diagnose + problems, but on the downside you can't write filters that + only change header x if header y's value is z. You can do + that by using tags though. +
++ Client-header filters are executed after the other header + actions have finished and use their output as input. +
++ If the request URI gets changed, Privoxy will detect that and use the + new one. This can be used to rewrite the request + destination behind the client's back, for example to + specify a Tor exit relay for certain requests. +
++ Please refer to the filter file + chapter to learn which client-header filters are + available by default, and how to create your own. +
++
+
++# Hide Tor exit notation in Host and Referer Headers +{+client-header-filter{hide-tor-exit-notation}} +/ + ++ |
+
+ Block requests based on their headers. +
++ Client headers to which this action applies are filtered + on-the-fly through the specified regular expression based + substitutions; the result is used as a tag. +
++ Multi-value. +
++ The name of a client-header tagger, as defined in one of + the filter files. +
++ Client-header taggers are applied to each header on its + own, and as the header isn't modified, each tagger "sees" the original. +
++ Client-header taggers are the first actions that are + executed and their tags can be used to control every other + action. +
++
+
++# Tag every request with the User-Agent header +{+client-header-tagger{user-agent}} +/ + +# Tagging itself doesn't change the action +# settings, sections with TAG patterns do: +# +# If it's a download agent, use a different forwarding proxy, +# show the real User-Agent and make sure resume works. +{+forward-override{forward-socks5 10.0.0.2:2222 .} \ + -hide-if-modified-since \ + -overwrite-last-modified \ + -hide-user-agent \ + -filter \ + -deanimate-gifs \ +} +TAG:^User-Agent: NetBSD-ftp/ +TAG:^User-Agent: Novell ZYPP Installer +TAG:^User-Agent: RPM APT-HTTP/ +TAG:^User-Agent: fetch libfetch/ +TAG:^User-Agent: Ubuntu APT-HTTP/ +TAG:^User-Agent: MPlayer/ - ++ |
+
+
+
++# Tag all requests with the Range header set +{+client-header-tagger{range-requests}} +/ + +# Disable filtering for the tagged requests. +# +# With filtering enabled Privoxy would remove the Range headers +# to be able to filter the whole response. The downside is that +# it prevents clients from resuming downloads or skipping over +# parts of multimedia files. +{-filter -deanimate-gifs} +TAG:^RANGE-REQUEST$ + ++ |
+
+ Stop useless download menus from popping up, or change the + browser's rendering mode +
++ Replaces the "Content-Type:" + HTTP server header. +
++ Parameterized. +
++ Any string. +
++ The "Content-Type:" HTTP server + header is used by the browser to decide what to do with the + document. The value of this header can cause the browser to + open a download menu instead of displaying the document by + itself, even if the document's format is supported by the + browser. +
++ The declared content type can also affect which rendering + mode the browser chooses. If XHTML is delivered as "text/html", many browsers treat it as + yet another broken HTML document. If it is sent as "application/xml", browsers with XHTML + support will only display it if the syntax is correct. +
++ If you see a web site that proudly uses XHTML buttons, but + sets "Content-Type: text/html", + you can use Privoxy to + overwrite it with "application/xml" and validate the web + master's claim inside your XHTML-supporting browser. If the + syntax is incorrect, the browser will complain loudly. +
++ You can also go the opposite direction: if your browser + prints error messages instead of rendering a document + falsely declared as XHTML, you can overwrite the content + type with "text/html" and have + it rendered as a broken HTML document. +
++ By default content-type-overwrite + only replaces "Content-Type:" + headers that look like some kind of text. If you want to + overwrite it unconditionally, you have to combine it with + force-text-mode. + This limitation exists for a reason; think twice before + circumventing it. +
++ Most of the time it's easier to replace this action with a + custom server-header + filter. It allows you to activate it for every + document of a certain site and it will still only replace + the content types you aimed at. +
++ Of course you can apply content-type-overwrite to a whole site and + then make URL based exceptions, but it's a lot more work to + get the same precision. +
++
+
++# Check if www.example.net/ really uses valid XHTML +{ +content-type-overwrite{application/xml} } +www.example.net/ + +# but leave the content type unmodified if the URL looks like a style sheet +{-content-type-overwrite} +www.example.net/.*\.css$ +www.example.net/.*style ++ |
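The opposite direction described in the notes, downgrading a document falsely declared as XHTML, might look like this. The host name is hypothetical:

```
# Have a site that falsely declares its pages as XHTML
# rendered as ordinary (broken) HTML instead.
{+content-type-overwrite{text/html}}
broken-xhtml.example.org/
```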
+
+ Remove a client header Privoxy has no dedicated action for. +
++ Deletes every header sent by the client that contains the + string the user supplied as parameter. +
++ Parameterized. +
++ Any string. +
++ This action allows you to block client headers for which no + dedicated Privoxy action + exists. Privoxy will + remove every client header that contains the string you + supplied as parameter. +
++ Regular expressions are not supported and you can't use this + action to block different headers in the same request, + unless they contain the same string. +
++ crunch-client-header is only meant + for quick tests. If you have to block several different + headers, or only want to modify parts of them, you should + use a client-header + filter. +
++ Warning + | +
+ + Don't block any header without understanding the + consequences. + + |
+
+
+
++# Block the non-existent "Privacy-Violation:" client header +{ +crunch-client-header{Privacy-Violation:} } +/ -
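Since the parameter is matched as a plain substring, a single crunch-client-header section removes every client header that contains it. A sketch with made-up header names and a hypothetical host:

```
# Removes every client header containing "X-Example-", for instance
# both "X-Example-One:" and "X-Example-Two:". These header names
# are invented for this illustration.
{+crunch-client-header{X-Example-}}
test.example.org/
```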
+
+ + 8.5.8. crunch-if-none-match ++
+
+
+
+ + 8.5.9. + crunch-incoming-cookies ++
+
+
+
+ + 8.5.10. crunch-server-header ++
+
+
+
+ + 8.5.11. + crunch-outgoing-cookies ++
+
+
+
+ + 8.5.12. deanimate-gifs ++
+
+
+
+ + 8.5.13. + downgrade-http-version ++
+
+
+ + 8.5.14. external-filter ++
+
|
+
+ Fool some click-tracking scripts and speed up indirect + links. +
++ Detects redirection URLs and redirects the browser without + contacting the redirection server first. +
++ Parameterized. +
++ "simple-check" to just + search for the string "http://" to detect redirection URLs. +
++ "check-decoded-url" to + decode URLs (if necessary) before searching for + redirection URLs. +
++ Many sites, like yahoo.com, don't just link to other sites. + Instead, they will link to some script on their own + servers, giving the destination as a parameter, which will + then redirect you to the final target. URLs resulting from + this scheme typically look like: "http://www.example.org/click-tracker.cgi?target=http%3a//www.example.net/". +
++ Sometimes, there are even multiple consecutive redirects + encoded in the URL. These redirections via scripts make + your web browsing more traceable, since the server from + which you follow such a link can see where you go to. Apart + from that, valuable bandwidth and time are wasted while + your browser asks the server for one redirect after the + other. Plus, it feeds the advertisers. +
++ This feature is currently not very smart and is scheduled + for improvement. If it is enabled by default, you will have + to create some exceptions to this action. It can lead to + failures in several ways: +
++ Not every URL with other URLs as parameters is evil. Some + sites offer a real service that requires this information + to work. For example, a validation service needs to know + which document to validate. fast-redirects assumes that every URL + parameter that looks like another URL is a redirection + target, and will always redirect to the last one. Most of + the time the assumption is correct, but if it isn't, the + user gets redirected anyway. +
++ Another failure occurs if the URL contains other parameters + after the URL parameter. The URL: "http://www.example.org/?redirect=http%3a//www.example.net/&foo=bar" + contains the redirection URL "http://www.example.net/", followed by + another parameter. fast-redirects + doesn't know that and will cause a redirect to "http://www.example.net/&foo=bar". + Depending on the target server configuration, the parameter + will be silently ignored or lead to a "page not found" error. You can prevent this + problem by first using the redirect action to + remove the last part of the URL, but it requires a little + effort. +
++ To detect a redirection URL, fast-redirects only looks for the string + "http://", either in plain text + (invalid but often used) or encoded as "http%3a//". Some sites use their own URL + encoding scheme, encrypt the address of the target server + or replace it with a database id. In these cases fast-redirects is fooled and the + request reaches the redirection server where it probably + gets logged. +
++
+
++ { +fast-redirects{simple-check} } + one.example.com -
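If the action is enabled broadly, the exceptions mentioned in the notes can be added as a separate section. A sketch, assuming a hypothetical validation service that legitimately takes a URL as parameter:

```
# Detect redirection URLs everywhere, decoding them first if necessary ...
{+fast-redirects{check-decoded-url}}
/

# ... but leave requests to a (hypothetical) validation service alone,
# because its URL parameter is not a redirection target.
{-fast-redirects}
validator.example.org/
```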
+ + 8.5.16. filter ++
+
|
+
- -+filter{fun} # Text replacements for subversive browsing fun! + +
|
+
- -+filter{crude-parental} # Crude parental filtering. Note that this filter doesn't work reliably. + +
|
+
- -+filter{ie-exploits} # Disable some known Internet Explorer bug exploits. + +
|
+
- -+filter{site-specifics} # Cure for site-specific problems. Don't apply generally! + +
|
+
- -+filter{no-ping} # Removes non-standard ping attributes in <a> and <area> tags. + +
|
+
- -+filter{google} # CSS-based block for Google text ads. Also removes a width limitation and the toolbar advertisement. + +
|
+
- -+filter{yahoo} # CSS-based block for Yahoo text ads. Also removes a width limitation. + +
|
+
- -+filter{msn} # CSS-based block for MSN text ads. Also removes tracking URLs and a width limitation. + +
|
+
- -+filter{blogspot} # Cleans up some Blogspot blogs. Read the fine print before using this. + +
- 8.5.16. force-text-mode- -
-
|
+
Warning | +
+++filter{webbugs} # Squish WebBugs (1x1 invisible GIFs used for user tracking). ++ |
- Think twice before activating this action. Filtering - binary data with regular expressions can cause file - damage. + |
+++filter{tiny-textforms} # Extend those tiny textareas up to 40x80 and kill the hard wrap. + |
- -+force-text-mode + +
- 8.5.17. forward-override- -
-
|
+
+++filter{frameset-borders} # Give frames a border and make them resizable. ++ |
+
Multi-value.
-
+++filter{iframes} # Removes all detected iframes. Should only be enabled for individual sites. ++ |
+
+++filter{demoronizer} # Fix MS's non-standard use of standard charsets. ++ |
+
"forward ." to use a direct - connection without any additional proxies.
-
+++filter{shockwave-flash} # Kill embedded Shockwave Flash objects. ++ |
+
"forward 127.0.0.1:8123" to - use the HTTP proxy listening at 127.0.0.1 port 8123.
-
+++filter{quicktime-kioskmode} # Make Quicktime movies saveable. ++ |
+
"forward-socks4a 127.0.0.1:9050 - ." to use the socks4a proxy listening at 127.0.0.1 - port 9050. Replace "forward-socks4a" with "forward-socks4" to use a socks4 connection - (with local DNS resolution) instead, use "forward-socks5" for socks5 connections - (with remote DNS resolution).
-
+++filter{fun} # Text replacements for subversive browsing fun! ++ |
+
"forward-socks4a 127.0.0.1:9050 - proxy.example.org:8000" to use the socks4a proxy - listening at 127.0.0.1 port 9050 to reach the HTTP proxy - listening at proxy.example.org port 8000. Replace - "forward-socks4a" with - "forward-socks4" to use a socks4 - connection (with local DNS resolution) instead, use - "forward-socks5" for socks5 - connections (with remote DNS resolution).
-
+++filter{crude-parental} # Crude parental filtering. Note that this filter doesn't work reliably. ++ |
+
+++filter{ie-exploits} # Disable some known Internet Explorer bug exploits. ++ |
+
This action takes parameters similar to the forward directives in the - configuration file, but without the URL pattern. It can be used - as replacement, but normally it's only used in cases where - matching based on the request URL isn't sufficient.
+ +
+++filter{site-specifics} # Cure for site-specific problems. Don't apply generally! ++ |
+
Warning | +
+++filter{no-ping} # Removes non-standard ping attributes in <a> and <area> tags. ++ |
- Please read the description for the forward directives before - using this action. Forwarding to the wrong people will - reduce your privacy and increase the chances of - man-in-the-middle attacks. + |
+++filter{google} # CSS-based block for Google text ads. Also removes a width limitation and the toolbar advertisement. ++ |
+
If the ports are missing or invalid, default values - will be used. This might change in the future and you - shouldn't rely on it. Otherwise incorrect syntax causes - Privoxy to exit.
+ +
+++filter{yahoo} # CSS-based block for Yahoo text ads. Also removes a width limitation. ++ |
+
Use the show-url-info CGI page to verify that your - forward settings do what you thought the do.
+ +
+++filter{msn} # CSS-based block for MSN text ads. Also removes tracking URLs and a width limitation. + |
+++filter{blogspot} # Cleans up some Blogspot blogs. Read the fine print before using this. ++ |
+
+ Force Privoxy to treat a + document as if it was in some kind of text format. +
++ Declares a document as text, even if the "Content-Type:" isn't detected as such. +
++ Boolean. +
++ N/A +
++ As explained above, Privoxy tries to only filter files + that are in some kind of text format. The same restrictions + apply to content-type-overwrite. + force-text-mode declares a + document as text, without looking at the "Content-Type:" first. +
++ Warning + | +
+ + Think twice before activating this action. + Filtering binary data with regular expressions can + cause file damage. + + |
+
+
+
+++force-text-mode -
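In practice the action is best limited to a narrow URL pattern. A sketch, assuming a hypothetical host that serves text files with a content type Privoxy does not recognize as text:

```
# Treat mislabeled text files from one host as text.
# Host and path pattern are assumptions; do not apply this
# to binary downloads.
{+force-text-mode}
mislabeled.example.org/.*\.txt$
```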
+
-
- + 8.5.18. forward-override ++
+
-
-
-
- 8.5.18. handle-as-empty-document- -
-
-
-
-
- 8.5.19. handle-as-image- -
-
-
- 8.5.20. hide-accept-language- -
-
|
+
+ Mark URLs that should be replaced by empty documents if they get + blocked +
++ This action alone doesn't do anything noticeable. It just + marks URLs. If the block action also + applies, the presence or absence of this mark + decides whether an HTML "BLOCKED" page, or an empty document will be + sent to the client as a substitute for the blocked content. + The empty document isn't literally empty, + but actually contains a single space. +
++ Boolean. +
++ N/A +
++ Some browsers complain about syntax errors if JavaScript + documents are blocked with Privoxy's default HTML page; this + option can be used to silence them. And of course this + action can also be used to eliminate the Privoxy BLOCKED message in frames. +
++ The content type for the empty document can be specified + with content-type-overwrite{}, + but usually this isn't necessary. +
++
+
++# Block all documents on example.org that end with ".js", +# but send an empty document instead of the usual HTML message. +{+block{Blocked JavaScript} +handle-as-empty-document} +example.org/.*\.js$ - |
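As the notes say, the content type of the empty document can be set with content-type-overwrite, although this is usually unnecessary. A sketch in which the chosen type, text/javascript, is only an illustrative assumption:

```
# Same as the example above, but declare the empty document
# as JavaScript. The content type value is illustrative.
{+block{Blocked JavaScript} \
 +handle-as-empty-document \
 +content-type-overwrite{text/javascript} \
}
example.org/.*\.js$
```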
+
+ Mark URLs as belonging to images (so they'll be replaced by + images if they + do get blocked, rather than HTML pages) +
++ This action alone doesn't do anything noticeable. It just + marks URLs as images. If the block action also + applies, the presence or absence of this mark + decides whether an HTML "blocked" page, or a replacement image (as + determined by the set-image-blocker + action) will be sent to the client as a substitute for the + blocked content. +
++ Boolean. +
++ N/A +
++ The below generic example section is actually part of default.action. It marks all URLs + with well-known image file name extensions as images and + should be left intact. +
++ Users will probably only want to use the handle-as-image + action in conjunction with block, to block sources + of banners, whose URLs don't reflect the file type, like in + the second example section. +
++ Note that you cannot treat HTML pages as images in most + cases. For instance, (in-line) ad frames require an HTML + page to be sent, or they won't display properly. Forcing + handle-as-image in this situation + will not replace the ad frame with an image, but lead to + error messages. +
++
+
++# Generic image extensions: +# +{+handle-as-image} +/.*\.(gif|jpg|jpeg|png|bmp|ico)$ -
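The other case described in the notes, banner sources whose URLs don't reflect the file type, could be handled along these lines. Host and script name are hypothetical:

```
# This URL does not look like an image, but it serves banners,
# so block it and answer with an image rather than an HTML page.
{+block{Banner served by script.} +handle-as-image}
banners.example.com/serve\.cgi
```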
+ + 8.5.21. hide-accept-language ++
+
|
+
Prevent download menus for content you prefer to view inside - the browser.
-Deletes or replaces the "Content-Disposition:" HTTP header set by some - servers.
-Parameterized.
-Keyword: "block", or any user - defined value.
-Some servers set the "Content-Disposition:" HTTP header for documents - they assume you want to save locally before viewing them. The - "Content-Disposition:" header - contains the file name the browser is supposed to use by - default.
- -In most browsers that understand this header, it makes it - impossible to just - view the document, without downloading it first, - even if it's just a simple text file or an image.
- -Removing the "Content-Disposition:" header helps to prevent - this annoyance, but some browsers additionally check the - "Content-Type:" header, before they - decide if they can display a document without saving it first. - In these cases, you have to change this header as well, before - the browser stops displaying download menus.
- -It is also possible to change the server's file name - suggestion to another one, but in most cases it isn't worth the - time to set it up.
- -This action will probably be removed in the future, use - server-header filters instead.
-
- + |
+
Prevent yet another way to track the user's steps between - sessions.
-Deletes the "If-Modified-Since:" - HTTP client header or modifies its value.
-Parameterized.
-Keyword: "block", or a user - defined value that specifies a range of hours.
-Removing this header is useful for filter testing, where you - want to force a real reload instead of getting status code - "304", which would cause the browser - to use a cached copy of the page.
- -Instead of removing the header, hide-if-modified-since can also add or subtract - a random amount of time to/from the header's value. You specify - a range of minutes where the random factor should be chosen - from and Privoxy does the - rest. A negative value means subtracting, a positive value - adding.
- -Randomizing the value of the "If-Modified-Since:" makes it less likely that - the server can use the time as a cookie replacement, but you - will run into caching problems if the random range is too - high.
- -It is a good idea to only use a small negative value and let - overwrite-last-modified - handle the greater changes.
- -It is also recommended to use this action together with - crunch-if-none-match, - otherwise it's more or less pointless.
-
- + |
+
Keep your (old and ill) browser from telling web servers - your email address
-Deletes any existing "From:" HTTP - header, or replaces it with the specified string.
-Parameterized.
-Keyword: "block", or any user - defined value.
-The keyword "block" will - completely remove the header (not to be confused with the - block action).
- -Alternately, you can specify any value you prefer to be sent - to the web server. If you do, it is a matter of fairness not to - use any address that is actually used by a real person.
- -This action is rarely needed, as modern web browsers don't - send "From:" headers anymore.
-
- + |
+
Conceal which link you followed to get to a particular - site
-Deletes the "Referer:" (sic) HTTP - header from the client request, or replaces it with a forged - one.
-Parameterized.
-"conditional-block" to delete - the header completely if the host has changed.
-"conditional-forge" to forge - the header if the host has changed.
-"block" to delete the header - unconditionally.
-"forge" to pretend to be - coming from the homepage of the server we are talking - to.
-Any other string to set a user defined referrer.
-conditional-block is the only - parameter, that isn't easily detected in the server's log file. - If it blocks the referrer, the request will look like the - visitor used a bookmark or typed in the address directly.
- -Leaving the referrer unmodified for requests on the same - host allows the server owner to see the visitor's "click path", but in most cases she could also - get that information by comparing other parts of the log file: - for example the User-Agent if it isn't a very common one, or - the user's IP address if it doesn't change between different - requests.
- -Always blocking the referrer, or using a custom one, can - lead to failures on servers that check the referrer before they - answer any requests, in an attempt to prevent their content - from being embedded or linked to elsewhere.
- -Both conditional-block and - forge will work with referrer checks, - as long as content and valid referring page are on the same - host. Most of the time that's the case.
- -hide-referer is an alternate - spelling of hide-referrer and the two - can be can be freely substituted with each other. ("referrer" is the correct English spelling, - however the HTTP specification has a bug - it requires it to be - spelled as "referer".)
-
- + |