From c2a0d29a53b2852c6945df960917262ab6033c11 Mon Sep 17 00:00:00 2001 From: hal9 Date: Tue, 21 May 2002 04:55:33 +0000 Subject: [PATCH] Sync with html and source. --- doc/text/user-manual.txt | 1520 ++++++++++++++++++++++++++------------ 1 file changed, 1028 insertions(+), 492 deletions(-) diff --git a/doc/text/user-manual.txt b/doc/text/user-manual.txt index 792d4c6b..e3566a08 100644 --- a/doc/text/user-manual.txt +++ b/doc/text/user-manual.txt @@ -2,7 +2,7 @@ Privoxy User Manual Copyright © 2001, 2002 by Privoxy Developers -$Id: user-manual.sgml,v 1.111 2002/05/14 23:01:36 oes Exp $ +$Id: user-manual.sgml,v 1.117 2002/05/17 13:56:16 oes Exp $ The user manual gives users information on how to install, configure and use Privoxy. @@ -42,6 +42,9 @@ Table of Contents 3. Note to Upgraders 4. Quickstart to Using Privoxy + + 4.1. Quickstart to Ad Blocking + 5. Starting Privoxy 5.1. RedHat, Conectiva and Debian @@ -134,25 +137,29 @@ Table of Contents 8.5.21. Summary 8.6. Aliases - 8.7. Sample Actions Files - + 8.7. Actions Files Tutorial + + 8.7.1. default.action + 8.7.2. user.action + 9. The Filter File - 9.1. The +filter Action + 9.1. Filter File Tutorial 10. Templates 11. Contacting the Developers, Bug Reporting and Feature Requests 11.1. Get Support - 11.2. Report bugs - 11.3. Request new features - 11.4. Report ads or other filter problems + 11.2. Report Bugs + 11.3. Request New Features + 11.4. Report Ads or Other Actions-Related Problems 11.5. Other 12. Privoxy Copyright, License and History 12.1. License 12.2. History + 12.3. Authors 13. See Also 14. Appendix @@ -411,8 +418,8 @@ A quick list of things to be aware of before upgrading: config.privoxy.org/ (Shortcut: http://p.p/). Many aspects of configuration can be done here, including temporarily disabling Privoxy. 
- * The primary configuration file for cookie management, ad and banner - blocking, and many other aspects of Privoxy configuration is in the actions + * The primary configuration files for cookie management, ad and banner + blocking, and many other aspects of Privoxy configuration are the actions files. It is strongly recommended to become familiar with the new actions concept below, before modifying these files. Locally defined rules should go into user.action. @@ -423,33 +430,165 @@ A quick list of things to be aware of before upgrading: 4. Quickstart to Using Privoxy - * If upgrading, please back up any configuration files. See the Note to - Upgraders Section. + * If upgrading, from versions before 2.9.16, please back up any configuration + files. See the Note to Upgraders Section. - * Install Privoxy. See the Installation Section for platform specific + * Install Privoxy. See the Installation Section below for platform specific information. - * Start Privoxy, if the installation program has not done this already. See - the section Starting Privoxy. + * Advanced users and those who want to offer Privoxy service to more than + just their local machine should check the main config file, especially the + security-relevant options. These are off by default. + + * Start Privoxy, if the installation program has not done this already (may + vary according to platform). See the section Starting Privoxy. * Set your browser to use Privoxy as HTTP and HTTPS proxy by setting the proxy configuration for address of 127.0.0.1 and port 8118. (Junkbuster and earlier versions of Privoxy used port 8000.) See the section Starting - Privoxy. + Privoxy below for more details on this. - * Flush your browser's caches, to remove any cached ad images. + * Flush your browser's disk and memory caches, to remove any cached ad + images. - * Enjoy surfing with enhanced comfort and privacy. You may want to customize - the user.action file to personalize your new browsing experience. 
See the - Configuration section for more configuration options, and how to further + * A default installation should provide a reasonable starting point for most. + There will undoubtedly be occasions where you will want to adjust the + configuration, but that can be dealt with as the need arises. Little to no + initial configuration is required in most cases. + + See the Configuration section for more configuration options, and how to customize your installation. - * If you experience problems with sites that "misbehave", see the Anatomy of - an Action section in the Appendix. + * If you experience ads that slipped through, innocent images that are + blocked, or otherwise feel the need to fine-tune Privoxy's behaviour, take + a look at the actions files. As a quick start, you might find the richly + commented examples helpful. You can also view and edit the actions files + through the web-based user interface. The Appendix "Anatomy of an Action" + has hints on how to debug actions that "misbehave". * Please see the section Contacting the Developers on how to report bugs or problems with websites or to get help. + * Now enjoy surfing with enhanced comfort and privacy! + +------------------------------------------------------------------------------- + +4.1. Quickstart to Ad Blocking + +Ad blocking is but one of Privoxy's array of features. Many of these features +are for the technically minded advanced user. But, ad and banner blocking is +surely common ground for everybody. + +This section will provide a quick summary of ad blocking so you can get up to +speed quickly without having to read the more extensive information provided +below, though this is highly recommended. + +First a bit of a warning ... blocking ads is much like blocking SPAM: the more +aggressive you are about it, the more likely you are to block things that were +not intended. So there is a trade off here. 
If you want extreme ad free +browsing, be prepared to deal with more "problem" sites, and to spend more time +adjusting the configuration to solve these unintended consequences. In short, +there is not an easy way to eliminate all ads. Either take the easy way and +settle for most ads blocked with the default configuration, or jump in and +tweak it for your personal surfing habits and preferences. + +Secondly, a brief explanation of Privoxy's "actions". "Actions" in this +context, are the directives we use to tell Privoxy to perform some task +relating to HTTP transactions (i.e. web browsing). We tell Privoxy to take some +"action". Each action has a unique name and function. While there are many +potential actions in Privoxy's arsenal, only a few are used for ad blocking. +Actions, and action configuration files, are explained in depth below. + +Actions are specified in Privoxy's configuration, followed by one or more URLs +to which the action should apply. URLs can actually be URL type patterns that +use wildcards so they can apply potentially to a range of similar URLs. + +When you connect to a website, the full path of the URL will either match one +of the "actions" as defined in Privoxy's configuration, or not. If so, then +Privoxy will perform the action accordingly. If not, then nothing special +happens. Furthermore, web pages may contain embedded, secondary URLs that your +web browser will display as it parses the original page's HTML content. An ad +image for instance, is just a URL embedded in the page somewhere. The image +itself may be on the same server, or a server somewhere else on the Internet. +Complex web pages will have many such embedded URLs. + +The actions we need to know about for ad blocking are: block, handle-as-image, +and set-image-blocker: + + * block - this action stops any contact between your browser and any URL + patterns that match this action's configuration. 
It can be used for + blocking ads, but also anything that is determined to be unwanted. By + itself, it simply stops any communication with the remote server. If this + is the only action that matches for this particular URL, then Privoxy will + display its own BLOCKED page to let you know what has happened. + + * handle-as-image - forces Privoxy to treat this URL as if it were an image. + Privoxy knows about common image types (e.g. GIF), but there are many + situations where this does not apply. So we'll force it. This is + particularly important for ad blocking, since once we can treat it as an + image, we can make more intelligent decisions on how to handle it. There + are some limitations to this though. For instance, you can't just force an + image substitution for an entire HTML page in most situations. + + * set-image-blocker - tells Privoxy what to display in place of an ad image + that has hit a block rule. For this to come into play, the URL must match a + block action somewhere in the configuration. And, it must also either be of + a known image type, or match a handle-as-image action. + + The configuration options on what to display instead of the ad are: + + pattern - a checkerboard pattern, so that an ad replacement is obvious. + This is the default. + + blank - A very small empty GIF image is displayed. This is the so-called + "invisible" configuration option. + + http:// - A redirect to any URL of the user's choosing (advanced + usage). + +The quickest way to adjust any of these settings is with your browser through +the special Privoxy editor at http://config.privoxy.org/show-status (shortcut: +http://p.p/show-status). This is an internal page, and does not require +Internet access. Select the appropriate "actions" file, and click "Edit". It is +best to put personal or local preferences in user.action since this is not +meant to be overwritten during upgrades, and will over-ride the settings in +other files. 
Here you can insert new "actions", and URLs for ad blocking or +other purposes, and make other adjustments to the configuration. Privoxy will +detect these changes automatically. + +A quick and simple step by step example: + + * Right click on the ad image to be blocked, then select "Copy Link Location" + from the pop-up menu. + + * Set your browser to http://config.privoxy.org/show-status + + * Find user.action in the top section, and click on "Edit": + + Figure 1. Actions Files in Use + + Screenshot of Files in Use + + * You should have an Actions section labeled +block. If not, click the "Edit" + button just under the word "Actions". This will bring up a list of all + actions. Find block near the top, and click in the "Enabled" column, then + "Submit" just below the list. + + * Now, in the +block actions section, click the "Add" button, and paste the + URL the browser got from "Copy Link Location". Remove the http:// at the + beginning of the URL. Then, click "Submit". + + * Now go back to the original page, and press SHIFT-Reload (or flush all + browser caches). The image should be gone now. + +This is a very crude and simple example. There might be good reasons to use a +wildcard pattern match to include potentially similar images from the same +site. For a more extensive explanation of "patterns", and the entire actions +concept, see the Actions section. + +For advanced users who want to hand edit their config files, you might want to +now go to the Actions Files Tutorial. + ------------------------------------------------------------------------------- 5. Starting Privoxy @@ -763,7 +902,7 @@ Type of value: File name, relative to confdir, without the .action suffix -Default value: +Default values: standard # Internal purposes, no editing recommended @@ -811,12 +950,19 @@ Effect if unset: Notes: - The "default.filter" file contains content modification rules that use - "regular expressions". 
These rules permit powerful changes on the content - of Web pages, e.g., you could disable your favorite JavaScript annoyances, + The filter file contains content modification rules that use regular + expressions. These rules permit powerful changes on the content of Web + pages, e.g., you could disable your favorite JavaScript annoyances, re-write the actual displayed text, or just have some fun replacing "Microsoft" with "MicroSuck" wherever it appears on a Web page. + The +filter{name} actions rely on the relevant filter (name) to be defined + in the filter file! + + A pre-defined filter file called default.filter that contains a bunch of + handy filters for common problems is included in the distribution. See the + section on the filter action for a list. + ------------------------------------------------------------------------------- 7.1.5. logfile @@ -1648,12 +1794,6 @@ content and transactions are handled, and on which sites (or even parts thereof). There are three such files included with Privoxy (as of version 2.9.15), with differing purposes: - * standard.action - is used by the web based editor, to set various - pre-defined sets of rules for the default actions section in - default.action. These have increasing levels of aggressiveness and have no - influence on your browsing unless you select them explicitly in the editor. - It is not recommend to edit this file. - * default.action - is the primary action file that sets the initial values for all actions. It is intended to provide a base level of functionality for Privoxy's array of features. So it is a set of broad rules that should @@ -1665,6 +1805,12 @@ thereof). There are three such files included with Privoxy (as of version special handling, this kind of thing should go here. This file will not be upgraded. + * standard.action - is used by the web based editor, to set various + pre-defined sets of rules for the default actions section in + default.action. 
These have increasing levels of aggressiveness and have no + influence on your browsing unless you select them explicitly in the editor. + It is not recommended to edit this file. + The list of actions files to be used are defined in the main configuration file, and are processed in the order they are defined. The content of these can all be viewed and edited from http://config.privoxy.org/show-status. @@ -2507,7 +2653,7 @@ Notes: use if you don't want any filtering at all. Note that it doesn't make sense to combine it with any filter action, since as soon as one filter applies, the whole document needs to be buffered anyway, which destroys the - advantage of the kill-popups action over it's filter equivalent. + advantage of the kill-popups action over its filter equivalent. Killing all pop-ups is a dangerous business. Many shops and banks rely on pop-ups to display forms, shopping carts etc, and killing only the unwanted @@ -2844,18 +2990,20 @@ Now let's define some aliases... {{alias}} # These aliases just save typing later: + # (Note that some already use other aliases!) # +crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies -crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies - +imageblock = +block +handle-as-image + block-as-image = +block +handle-as-image + mercy-for-cookies = -crunch-all-cookies -session-cookies-only # These aliases define combinations of actions # that are useful for certain types of sites: # fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referer -kill-popups - shop = -crunch-all-cookies -fast-redirects + shop = -crunch-all-cookies -filter{popups} -kill-popups - # Aliases defined from other aliases, for really lazy people ;-) + # Short names for other aliases, for really lazy people ;-) # c0 = +crunch-all-cookies c1 = -crunch-all-cookies @@ -2891,428 +3039,696 @@ require some actions to be disabled in order to function properly. 
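The alias definitions in this hunk note that some aliases already use other aliases. As a rough illustration only, the following Python sketch expands such nested aliases into plain actions. It is not Privoxy's parser: the function name expand_alias and the dictionary layout are hypothetical, and the sketch assumes the alias table contains no cycles.

```python
# Illustrative sketch only; this is NOT how Privoxy itself parses aliases.
# The alias table mirrors the definitions shown in the patch above.
ALIASES = {
    "+crunch-all-cookies": "+crunch-incoming-cookies +crunch-outgoing-cookies",
    "-crunch-all-cookies": "-crunch-incoming-cookies -crunch-outgoing-cookies",
    "block-as-image": "+block +handle-as-image",
    "mercy-for-cookies": "-crunch-all-cookies -session-cookies-only",
    "shop": "-crunch-all-cookies -filter{popups} -kill-popups",
    "c0": "+crunch-all-cookies",
    "c1": "-crunch-all-cookies",
}


def expand_alias(name, aliases=ALIASES):
    """Return the list of plain actions that `name` stands for.

    Hypothetical helper: anything not found in the alias table is treated
    as a built-in action and returned unchanged. Assumes no cyclic aliases.
    """
    if name not in aliases:
        return [name]
    actions = []
    for part in aliases[name].split():
        actions.extend(expand_alias(part, aliases))
    return actions


if __name__ == "__main__":
    for alias in ("mercy-for-cookies", "shop", "c1"):
        print(alias, "=", " ".join(expand_alias(alias)))
```

Expanding mercy-for-cookies this way first resolves -crunch-all-cookies into its two basic cookie actions and then appends -session-cookies-only, which is the nesting the comment in the alias section refers to.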
------------------------------------------------------------------------------- -8.7. Sample Actions Files - -Remember that the meaning of each action is reversed by preceding the action -with a "-", in place of the "+". Also, that some actions are turned on in the -default section of the actions file, and require little to no additional -configuration. These are just "on". - -But, other actions that are turned on in the default section do typically -require exceptions to be listed in the latter sections of one of our actions -file. For instance, by default no URLs are "blocked" (i.e. in the default -definitions of default.action). We need exceptions to this in order to enable -ad blocking in the lower sections. But we need to be very selective about what -we do block. Thus, the default is "off" for blocking. - -Below is a liberally commented sample default.action file to demonstrate how -all the pieces come together. And to show how exceptions to the default -policies can be handled. This is followed by a brief user.action with similar -examples. - -# Sample default.action file - -# Settings -- Don't change! For internal Privoxy use ONLY. -{{settings}} -for-privoxy-version=3.0 - - -########################################################################## -# Aliases must be defined *before* they are used. These are -# easier to remember, and can combine several actions into one. Once -# defined they can be used just like any built-in action -- but within -# this file only! Aliases do not require a + or - sign. -########################################################################## -{{alias}} - -# Some useful aliases. -# Alias to turn off cookie handling, ie allow all cookies unmolested. -# -mercy-for-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies \ - -session-cookies-only - -# Alias to both block and treat as if an image for ad blocking -# purposes. 
-# -+block-as-image = +block +handle-as-image - -# Shops should be allowed to set persistent cookies -# -shop = -filter mercy-for-cookies - -# Fragile sites should receive minimum interference: -# -fragile = -block -deanimate-gifs -fast-redirects -filter -hide-referer \ - mercy-for-cookies -kill-popups - -########################################################################## -# Matching starts here. Remember that at this time, all actions are -# disabled, so we need to explicitly enable the ones we want. -# -# We begin with "default" action settings, i.e. we define a set of actions -# for a pattern ("/") that matches all URLs. This default set will be -# applied to all requests as a start, and can be partly or wholly overridden -# by later matches further down this file, or in user.action. -# -# We will show all potential actions here whether they are enabled -# or not. We could omit any disabled action if we wanted, since all -# actions are 'off' by default anyway. Shown for completeness only. -# Actions are enabled if preceded by a '+', otherwise they are disabled -# (unless an alias has been defined without this). -########################################################################## - { \ - -add-header \ - -block \ - -deanimate-gifs \ - -downgrade-http-version \ - +fast-redirects \ - +filter{html-annoyances} \ - +filter{js-annoyances} \ - -filter{content-cookies} \ - -filter{popups} \ - +filter{webbugs} \ - -filter{refresh-tags} \ - -filter{fun} \ - +filter{nimda} \ - +filter{banners-by-size} \ - -filter{shockwave-flash} \ - -filter{crude-parental} \ - +hide-forwarded-for-headers \ - +hide-from-header{block} \ - -hide-referrer \ - -hide-user-agent \ - -handle-as-image \ - +set-image-blocker{pattern} \ - -limit-connect \ - +prevent-compression \ - -session-cookies-only \ - -crunch-outgoing-cookies \ - -crunch-incoming-cookies \ - -kill-popups \ - -send-vanilla-wafer \ - -send-wafer \ - } - / # forward slash will match *all* potential URL patterns. 
- -########################################################################## -# Default behavior is now set. Now we will define some exceptions to our -# default action policies. -########################################################################## - -# These sites are very complex and require very minimal interference. -# We'll disable most actions with our 'fragile' alias: - { fragile } - .office.microsoft.com # surprise, surprise! - .windowsupdate.microsoft.com - - -# Shopping sites - not as fragile but require some special -# handling. We still want to block ads, and we will allow -# persistent cookies via the 'shop' alias: - { shop } - .quietpc.com - .worldpay.com # for quietpc.com - .jungle.com - .scan.co.uk - - -# These sites require pop-ups too :( We'll combine our 'shop' -# alias with two other actions into one rule to allow all popups. - { shop -kill-popups -filter{popups} } - .dabs.com - .overclockers.co.uk - - -# The 'Fast-redirects' action breaks some sites. Disable this action -# for these known sensitive sites: - { -fast-redirects } - login.yahoo.com - edit.europe.yahoo.com - .google.com - .altavista.com/.*(like|url|link):http - .altavista.com/trans.*urltext=http - .nytimes.com - - -# Define which file types will be treated as images. Important -# for ad blocking. - { +handle-as-image } - /.*\.(gif|jpe?g|png|bmp|ico) - - -# Now lets list some domains that are known ad generators. And -# our alias that we use here will block these as well as force -# them to be treated as images. This combination of actions is -# important for ad blocking. What the browser will show instead is -# determined by the setting of "+set-image-blocker" - { +imageblock } - ar.atwola.com - .ad.doubleclick.net - .a.yimg.com/(?:(?!/i/).)*$ - .a[0-9].yimg.com/(?:(?!/i/).)*$ - bs*.gsanet.com - bs*.einets.com - .qkimg.net - ad.*.doubleclick.net - - -# These will just simply be blocked. They will generate the BLOCKED -# banner page, if matched. 
Heavy use of wildcards and regular -# expressions in this example. Enable block action: - { +block } - ad*. - .*ads. - banner?. - count*. - /.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?) - /(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/ - .hitbox.com - - -# The above block section will probably inadvertently catch some -# sites we DO NOT want blocked via the wildcards and regular expressions. -# Now let's set exceptions to the exceptions so the good guys get better -# treatment. Disable block action: - { -block } - advogato.org - adsl. - ad[ud]*. - advice. -# Let's just trust all .edu top level domains. - .edu - www.ugu.com/sui/ugu/adv -# We'll need to access to path names containing 'download' - .*downloads. - /downloads/ -# 'adv' is for globalintersec and means advanced, not advertisement - www.globalintersec.com/adv - - -# Don't filter *anything* from our friends at sourceforge. -# Notice we don't have to name the individual filter -# identifiers -- we just turn them all off in one fell swoop. -# Disable all filters for this one site: - { -filter } - .sourceforge.net - - -So far we are painting with a broad brush by setting general policies. The -above would be a reasonable starting point for many situations. Now, we want to -be more specific and have customized rules that are more suitable to our -personal habits and preferences. These would be for narrowly defined situations -like your ISP or your bank, and should be placed in user.action, which is -parsed after all other actions files and should not be clobbered by upgrades. -So any settings here, will have the last word and over-ride any previously -defined actions. - -Now a few examples of some things that one might do with a user.action file. +8.7. Actions Files Tutorial -# Sample user.action file. 
+The above chapters have shown which actions files there are and how they are +organized, how actions are specified and applied to URLs, how patterns work, +and how to define and use aliases. Now, let's look at an example default.action +and user.action file and see how all these pieces come together: -# Any aliases you want to use need to be re-defined here. -# Alias to turn off cookie handling, ie allow all cookies unmolested. - -crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies \ - -session-cookies-only +------------------------------------------------------------------------------- -# Fragile sites should have the minimum changes: - fragile = -block -deanimate-gifs -fast-redirects -filter -hide-referer \ - -crunch-all-cookies -kill-popups +8.7.1. default.action -# Allow persistent cookies for a few regular sites that we -# trust via our above alias. These will be saved from one browser session -# to the next. We are explicitly turning off any and all cookie handling, -# even though the crunch-*-cookies settings were disabled in our above -# default.action anyway. So cookies from these domains will come through -# unmolested. - { -crunch-all-cookies } - .sun.com - .yahoo.com - .msdn.microsoft.com - .redhat.com +Every config file should start with a short comment stating its purpose: +# Sample default.action file -# My ISP uses obnoxious self promoting images on many pages. -# Nuke them :) Note that "+handle-as-image" need not be specified, -# since all URLs ending in .gif will be tagged as images by the -# general rules in default.action anyway. - { +block } - www.my-isp-example.com/logo[0-9].gif +Then, since this is the default.action file, the first section is a special +section for internal use that you needn't change or worry about: +########################################################################## +# Settings -- Don't change! For internal Privoxy use ONLY. 
+########################################################################## + +{{settings}} +for-privoxy-version=3.0 + +After that comes the (optional) alias section. We'll use the example section +from the above chapter on aliases, that also explains why and how aliases are +used: + +########################################################################## +# Aliases +########################################################################## +{{alias}} + +# These aliases just save typing later: +# (Note that some already use other aliases!) +# ++crunch-all-cookies = +crunch-incoming-cookies +crunch-outgoing-cookies +-crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies +block-as-image = +block +handle-as-image +mercy-for-cookies = -crunch-all-cookies -session-cookies-only + +# These aliases define combinations of actions +# that are useful for certain types of sites: +# +fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referer -kill-popups +shop = mercy-for-cookies -filter{popups} -kill-popups + +Now come the regular sections, i.e. sets of actions, accompanied by URL +patterns to which they apply. Remember all actions are disabled when matching +starts, so we have to explicitly enable the ones we want. + +The first regular section is probably the most important. It has only one +pattern, "/", but this pattern matches all URLs. Therefore, the set of actions +used in this "default" section will be applied to all requests as a start. It +can be partly or wholly overridden by later matches further down this file, or +in user.action, but it will still be largely responsible for your overall +browsing experience. + +Again, at the start of matching, all actions are disabled, so there is no real +need to disable any actions here, but we will do that nonetheless, to have a +complete listing for your reference. (Remember: A "+" preceding the action name +enables the action, a "-" disables!). 
Also note how this long line has been +made more readable by splitting it into multiple lines with line continuation. + +########################################################################## +# "Defaults" section: +########################################################################## + { \ + -add-header \ + -block \ + -crunch-incoming-cookies \ + -crunch-outgoing-cookies \ + +deanimate-gifs \ + -downgrade-http-version \ + +fast-redirects \ + +filter{html-annoyances} \ + +filter{js-annoyances} \ + -filter{content-cookies} \ + +filter{popups} \ + +filter{webbugs} \ + -filter{refresh-tags} \ + -filter{fun} \ + +filter{nimda} \ + +filter{banners-by-size} \ + -filter{shockwave-flash} \ + -filter{crude-parental} \ + -handle-as-image \ + +hide-forwarded-for-headers \ + +hide-from-header{block} \ + +hide-referrer{forge} \ + -hide-user-agent \ + -kill-popups \ + -limit-connect \ + +prevent-compression \ + -send-vanilla-wafer \ + -send-wafer \ + +session-cookies-only \ + +set-image-blocker{pattern} \ + } + / # forward slash will match *all* potential URL patterns. + +The default behavior is now set. Note that some actions, like not hiding the +user agent, are part of a "general policy" that applies universally and won't +get any exceptions defined later. Other choices, like not blocking (which is +understandably the default!) need exceptions, i.e. we need to specify +explicitly what we want to block in later sections. We will also want to make +exceptions from our general pop-up-killing, and use our defined aliases for +that. + +The first of our specialized sections is concerned with "fragile" sites, i.e. +sites that require minimum interference, because they are either very complex +or very keen on tracking you (and have mechanisms in place that make them +unusable for people who avoid being tracked). 
We will simply use our +pre-defined fragile alias instead of stating the list of actions explicitly: + +########################################################################## +# Exceptions for sites that'll break under the default action set: +########################################################################## + +# "Fragile" Use a minimum set of actions for these sites (see alias above): +# +{ fragile } +.office.microsoft.com # surprise, surprise! +.windowsupdate.microsoft.com + +Shopping sites are not as fragile, but they typically require cookies to log +in, and pop-up windows for shopping carts or item details. Again, we'll use a +pre-defined alias: + +# Shopping sites: +# +{ shop } +.quietpc.com +.worldpay.com # for quietpc.com +.jungle.com +.scan.co.uk + +Then, there are sites which rely on pop-up windows (yuck!) to work. Since we +made pop-up-killing our default above, we need to make exceptions now. Mozilla +users, who can turn on smart handling of unwanted pop-ups in their browsers, +can safely choose -filter{popups} (and -kill-popups) above and hence don't need +this section. Anyway, disabling an already disabled action doesn't hurt, so +we'll define our exceptions regardless of what was chosen in the defaults +section: + +# These sites require pop-ups too :( +# +{ -kill-popups -filter{popups} } +.dabs.com +.overclockers.co.uk +.deutsche-bank-24.de + +The fast-redirects action, which we enabled per default above, breaks some +sites. So disable it for popular sites where we know it misbehaves: + +{ -fast-redirects } +login.yahoo.com +edit.*.yahoo.com +.google.com +.altavista.com/.*(like|url|link):http +.altavista.com/trans.*urltext=http +.nytimes.com + +It is important that Privoxy knows which URLs belong to images, so that if they +are to be blocked, a substitute image can be sent, rather than an HTML page. 
+Contacting the remote site to find out is not an option, since it would destroy +the loading time advantage of banner blocking, and it would feed the +advertisers (in terms of money and information). We can mark any URL as an +image with the handle-as-image action, and marking all URLs that end in a known +image file extension is a good start: + +########################################################################## +# Images: +########################################################################## + +# Define which file types will be treated as images, in case they get +# blocked further down this file: +# +{ +handle-as-image } +/.*\.(gif|jpe?g|png|bmp|ico)$ + +And then there are known banner sources. They often use scripts to generate the +banners, so it won't be visible from the URL that the request is for an image. +Hence we block them and mark them as images in one go, with the help of our +block-as-image alias defined above. (We could of course just as well use +block ++handle-as-image here.) Remember that the type of the replacement image is +chosen by the set-image-blocker action. Since all URLs have matched the default +section with its +set-image-blocker{pattern} action before, it still applies +and needn't be repeated: + +# Known ad generators: +# +{ block-as-image } +ar.atwola.com +.ad.doubleclick.net +.ad.*.doubleclick.net +.a.yimg.com/(?:(?!/i/).)*$ +.a[0-9].yimg.com/(?:(?!/i/).)*$ +bs*.gsanet.com +bs*.einets.com +.qkimg.net + +One of the most important jobs of Privoxy is to block banners. A huge bunch of +them are already "blocked" by the filter{banners-by-size} action, which we +enabled above, and which deletes the references to banner images from the pages +while they are loaded, so the browser doesn't request them anymore, and hence +they don't need to be blocked here. 
But this naturally doesn't catch all +banners, and some people choose not to use filters, so we need a comprehensive +list of patterns for banner URLs here, and apply the block action to them. + +First comes a bunch of generic patterns, which do most of the work, by matching +typical domain and path name components of banners. Then comes a list of +individual patterns for specific sites, which is omitted here to keep the +example short: + +########################################################################## +# Block these fine banners: +########################################################################## +{ +block } + +# Generic patterns: +# +ad*. +.*ads. +banner?. +count*. +/.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?) +/(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/ + +# Site-specific patterns (abbreviated): +# +.hitbox.com + +You wouldn't believe how many advertisers actually call their banner servers +ads.company.com, or call the directory in which the banners are stored simply +"banners". So the above generic patterns are surprisingly effective. + +But being very generic, they necessarily also catch URLs that we don't want to +block. The pattern .*ads., for example, catches "nasty-ads.nasty-corp.com" as +intended, but also "downloads.sourceforge.net" or "adsl.some-provider.net". So +here come some well-known exceptions to the +block section above. + +Note that these are exceptions to exceptions from the default! Consider the URL +"downloads.sourceforge.net": Initially, all actions are deactivated, so it +wouldn't get blocked. Then comes the defaults section, which matches the URL, +but just deactivates the block action once again. Then it matches .*ads., an +exception to the general non-blocking policy, and suddenly +block applies. And +now, it'll match .*loads., where -block applies, so (unless it matches again +further down) it ends up with no block action applying.
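As a condensed sketch of that walk-through, the three sections that touch "downloads.sourceforge.net" can be written in file order as shown below. This is an illustration assembled from the patterns quoted in this tutorial, not an excerpt from the distributed default.action: the real defaults section sets many more actions than -block, and the numbered comments are added here for explanation only.

```
# 1. Defaults section: matches every URL and switches block off.
#    (Only -block is shown; the real section sets many more actions.)
{ -block }
/

# 2. Generic banner pattern: "downloads" ends in "ads", so +block applies.
{ +block }
.*ads.

# 3. Exception: "downloads" also ends in "loads", so -block applies again.
#    This is the last matching section, hence the URL is not blocked.
{ -block }
.*loads.
```

Because each matching section simply overrides the setting left by the previous one, the order of the sections in the file decides the outcome.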
+ +########################################################################## +# Save some innocent victims of the above generic block patterns: +########################################################################## + +# By domain: +# +{ -block } +adv[io]*. # (for advogato.org and advice.*) +adsl. # (has nothing to do with ads) +ad[ud]*. # (adult.* and add.*) +.edu # (universities don't host banners (yet!)) +.*loads. # (downloads, uploads etc) + +# By path: +# +/.*loads/ + +# Site-specific: +# +www.globalintersec.com/adv # (adv = advanced) +www.ugu.com/sui/ugu/adv -# Say the site where you do your home banking needs to open -# popup windows, but you have chosen to kill popups by -# default. This will allow it for your-example-bank.com: -# - { -filter{popups} -kill-popups } - .my-example-bank.com +Filtering source code can have nasty side effects, so make an exception for our +friends at sourceforge.net, and all paths with "cvs" in them. Note that -filter +disables all filters in one fell swoop! +# Don't filter code! +# +{ -filter } +/.*cvs +.sourceforge.net -# This site is delicate, and requires kid-glove -# treatment. - { fragile } - .forbes.com - +The actual default.action is of course more comprehensive, but we hope this +example made clear how it works. ------------------------------------------------------------------------------- -9. The Filter File +8.7.2. user.action -Any web page can be dynamically modified with the filter file. This -modification can be removal, or re-writing, of any web page content, including -tags and non-visible content. The default filter file is oddly enough -default.filter, located in the config directory. - -This is potentially a very powerful feature, and requires knowledge of both -"regular expression" and HTML in order create custom filters. But, there are a -number of useful filters included with Privoxy for many common situations. - -The included example file is divided into sections. 
Each section begins with -the FILTER keyword, followed by the identifier for that section, e.g. "FILTER: -webbugs". Each section performs a similar type of filtering, such as -"html-annoyances". - -This file uses regular expressions to alter or remove any string in the target -page. The expressions can only operate on one line at a time. Some examples -from the included default default.filter: - -Stop web pages from displaying annoying messages in the status bar by deleting -such references: +So far we are painting with a broad brush by setting general policies, which +would be a reasonable starting point for many people. Now, you'd maybe want to +be more specific and have customized rules that are more suitable to your +personal habits and preferences. These would be for narrowly defined situations +like your ISP or your bank, and should be placed in user.action, which is +parsed after all other actions files and hence has the last word, over-riding +any previously defined actions. user.action is also a safe place for your +personal settings, since default.action is actively maintained by the Privoxy +developers and you'll probably want to install updated versions from time to +time. + +So let's look at a few examples of things that one might typically do in +user.action: + +# My user.action file. + +As aliases are local to the actions file that they are defined in, you can't +use the ones from default.action, unless you repeat them here: + +# (Re-)define aliases for this file: +# +{{alias}} +-crunch-all-cookies = -crunch-incoming-cookies -crunch-outgoing-cookies +mercy-for-cookies = -crunch-all-cookies -session-cookies-only +fragile = -block -crunch-all-cookies -filter -fast-redirects -hide-referer -kill-popups +shop = mercy-for-cookies -filter{popups} -kill-popups +allow-ads = -block -filter{banners-by-size} # (see below) + +Say you have accounts on some sites that you visit regularly, and you don't +want to have to log in manually each time. 
So you'd like to allow persistent +cookies for these sites. The mercy-for-cookies alias defined above does exactly +that, i.e. it disables both the crunching of cookies in either direction and +the processing that makes cookies temporary. + +{ mercy-for-cookies } +sunsolve.sun.com +slashdot.org +.yahoo.com +.msdn.microsoft.com +.redhat.com + +Your bank needs popups and is allergic to some filter, but you don't know +which, so you disable them all: + +{ -filter -kill-popups } +.your-home-banking-site.com + +While browsing the web with Privoxy you noticed some ads that sneaked through, +but you were too lazy to report them through our fine and easy feedback system, +so you have added them here: + +{ +block } +www.a-popular-site.com/some/unobvious/path +another.popular.site.net/more/junk/here/ + +Note that, assuming the banners in the above example have regular image +extensions (most do), +handle-as-image need not be specified, since all URLs +ending in these extensions will already have been tagged as images in the +relevant section of default.action by now. + +Then you noticed that the default configuration breaks Forbes Magazine, but you +were too lazy to find out which action is the culprit, and you were again too +lazy to give feedback, so you just used the fragile alias on the site, and -- +whoa! -- it worked: + +{ fragile } +.forbes.com + +You like the "fun" text replacements in default.filter, but they are disabled +in the distributed actions file. (My colleagues on the team just don't have a +sense of humour, that's why! ;-). So you'd like to turn them on in your +private, update-safe config, once and for all: + +{ +filter{fun} } +/ # For ALL sites! + +Note that the above is not really a good idea: There are exceptions to the +filters in default.action for things that really shouldn't be filtered, like +code on CVS->Web interfaces. Since user.action has the last word, these +exceptions won't apply to the "fun" filtering specified here.
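One possible way to keep those exceptions is to repeat them in user.action after the fun section. The sketch below assumes that later sections override earlier ones within user.action, just as they do in the default.action example. The -filter patterns are copied from that example and are only placeholders for whatever exceptions your own default.action contains.

```
{ +filter{fun} }
/ # For ALL sites!

# Re-state the filter exceptions from default.action here,
# so that they still have the last word:
#
{ -filter }
/.*cvs
.sourceforge.net
```

Keep in mind that -filter disables all filters for the matching URLs, not just the fun filter.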
+ +Finally, you might think about how your favourite free websites are funded, and +find that they rely on displaying banner advertisements to survive. So you +might want to specifically allow banners for those sites that you feel provide +value to you: + +{ allow-ads } +.sourceforge.net +.slashdot.org +.osdn.net + +Note that allow-ads has been aliased to -block -filter{banners-by-size} above. - FILTER: html-annoyances +------------------------------------------------------------------------------- - # New browser windows should be resizeable and have a location and status - # bar. Make it so. - # - s/resizable="?(no|0)"?/resizable=1/ig s/noresize/yesresize/ig - s/location="?(no|0)"?/location=1/ig s/status="?(no|0)"?/status=1/ig - s/scrolling="?(no|0|Auto)"?/scrolling=1/ig - s/menubar="?(no|0)"?/menubar=1/ig +9. The Filter File - # The <blink> tag was a crime! - # - s*<blink>|</blink>**ig +All text substitutions that can be invoked through the filter action must first +be defined in the filter file, which is typically called default.filter and +which can be selected through the filterfile config option. - # Is this evil? - # - #s/framespacing="?(no|0)"?//ig - #s/margin(height|width)=[0-9]*//gi - +Typical reasons for doing such substitutions are to eliminate common annoyances +in HTML and JavaScript, such as pop-up windows, exit consoles, crippled windows +without navigation tools, the infamous <blink> tag etc., to suppress images with +certain width and height attributes (standard banner sizes or web-bugs), or +just to have fun. The possibilities are endless. -Just for kicks, replace any occurrence of "Microsoft" with "MicroSuck", and -have a little fun with topical buzzwords: +Filtering works on any text-based document type, including plain text, HTML, +JavaScript, CSS etc. (all text/* MIME types). Substitutions are made at the +source level, so if you want to "roll your own" filters, you should be familiar +with HTML syntax.
- FILTER: fun +Just like the actions files, the filter file is organized in sections, which +are called filters here. Each filter consists of a heading line, that starts +with the keyword FILTER:, followed by the filter's name, and a short (one line) +description of what it does. Below that line come the jobs, i.e. lines that +define the actual text substitutions. By convention, the name of a filter +should describe what the filter eliminates. The comment is used in the +web-based user interface. - s/microsoft(?!.com)/MicroSuck/ig +Once a filter called name has been defined in the filter file, it can be +invoked by using an action of the form +filter{name} in any actions file. - # Buzzword Bingo: - # - s/industry-leading|cutting-edge|award-winning/BINGO!/ig - +A filter header line for a filter called "foo" could look like this: -Kill those pesky little web-bugs: +FILTER: foo Replace all "foo" with "bar" - # webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking) - FILTER: webbugs +Below that line, and up to the next header line, come the jobs that define what +text replacements the filter executes. They are specified in a syntax that +imitates Perl's s/// operator. If you are familiar with Perl, you will find +this to be quite intuitive, and may want to look at the PCRS man page for the +subtle differences to Perl behaviour. Most notably, the non-standard option +letter U is supported, which turns the default to ungreedy matching. - s/]*?(width|height)\s*=\s*['"]?1\D[^>]*?(width|height)\s*=\s*['"]?1 -(\D[^>]*?)?>//sig - +If you are new to regular expressions, you might want to take a look at the +Appendix on regular expressions, and see the Perl manual for the s/// +operator's syntax and Perl-style regular expressions in general. The below +examples might also help to get you started. ------------------------------------------------------------------------------- -9.1. The +filter Action +9.1. 
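Filter File Tutorial + +Now, let's complete our "foo" filter. We have already defined the heading, but +the jobs are still missing. Since all it does is to replace "foo" with "bar", +there is only one (trivial) job needed: + +s/foo/bar/ + +But wait! Didn't the comment say that all occurrences of "foo" should be +replaced? Our current job will only take care of the first "foo" on each page. +For global substitution, we'll need to add the g option: + +s/foo/bar/g + +Our complete filter now looks like this: + +FILTER: foo Replace all "foo" with "bar" +s/foo/bar/g + +Let's look at some real filters for more interesting examples. Here you see a +filter that protects against some common annoyances that arise from JavaScript +abuse. Let's look at its jobs one after the other: + +FILTER: js-annoyances Get rid of particularly annoying JavaScript abuse + +# Get rid of JavaScript referrer tracking. Test page: http://www.randomoddness.com/untitled.htm +# +s|(<script.*)document\.referrer(.*</script>)|$1"Not Your Business!"$2|Usg + +Following the header line and a comment, you see the job. Note that it uses | +as the delimiter instead of /, because the pattern contains a forward slash, +which would otherwise have to be escaped by a backslash (\). + +Now, let's examine the pattern: it starts with the text <script.* enclosed in +parentheses. Since the dot matches any character, and * means "match an +arbitrary number of the element to its left", this matches "<script" followed +by any text, i.e. it matches the whole page from the start of the first +<script> tag. + +That's more than we want, but the pattern continues: document\.referrer matches +only the exact string "document.referrer". The dot needed to be escaped, i.e. +preceded by a backslash, to take away its special meaning as a joker, and make +it just a regular dot. So far, the meaning is: Match from the start of the +first <script> tag in a page, up to and including the text +"document.referrer", if both are present in the page (and appear in that +order). + +But there's still more pattern to go. The next element, again enclosed in +parentheses, is .*</script>. You already know what .* means, so the whole +pattern translates to: Match from the start of the first <script> tag in a page +to the end of the last </script> tag, provided that the text +"document.referrer" appears somewhere in between. + +This is still not the whole story, since we have ignored the options and the +parentheses: The portions of the page matched by sub-patterns that are enclosed +in parentheses will be remembered and be available through the variables $1, +$2, ... in the substitute. The U option switches to ungreedy matching, which +means that the first .* in the pattern will only "eat up" the text between +"<script" and the first occurrence of "document.referrer", and that the second +.* will only span the text up to the first "</script>" tag. Furthermore, the s +option says that the match may span multiple lines in the page, and the g +option again means that the substitution is global. + +So, to summarize, the pattern means: Match all scripts that contain the text +"document.referrer".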
Remember the part of the script from (and including) the +start tag up to (and excluding) the string "document.referrer" as $1, and the +part following that string, up to and including the closing tag, as $2. + +Now the pattern is deciphered, but wasn't this about substituting things? So +let's look at the substitute: $1"Not Your Business!"$2 is easy to read: The text +remembered as $1, followed by "Not Your Business!" (including the quotation +marks!), followed by the text remembered as $2. This produces an exact copy of +the original string, with the middle part (the "document.referrer") replaced by +"Not Your Business!". + +The whole job now reads: Replace "document.referrer" by "Not Your Business!" +wherever it appears inside a