X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Ftext%2Fuser-manual.txt;h=ed0f6d402f85e9d46fbf015243ad01ceb6c3998c;hp=47332afb863e0dc110f570485023151aa7b95f5d;hb=321944b1997539a18dc73184c01a81f6b89acb65;hpb=0981554c94fbb1ef7e71b00a42e76354b8a9e86b diff --git a/doc/text/user-manual.txt b/doc/text/user-manual.txt index 47332afb..ed0f6d40 100644 --- a/doc/text/user-manual.txt +++ b/doc/text/user-manual.txt @@ -1,2059 +1,3491 @@ +Privoxy User Manual + +By: Privoxy Developers + +$Id: user-manual.sgml,v 1.100 2002/04/29 03:05:55 hal9 Exp $ + +The user manual gives users information on how to install, configure and use +Privoxy. + +Privoxy is a web proxy with advanced filtering capabilities for protecting +privacy, filtering web page content, managing cookies, controlling access, and +removing ads, banners, pop-ups and other obnoxious Internet junk. Privoxy has a +very flexible configuration and can be customized to suit individual needs and +tastes. Privoxy has application for both stand-alone systems and multi-user +networks. + +Privoxy is based on Internet Junkbuster (tm). + +You can find the latest version of the user manual at http://www.privoxy.org/ +user-manual/. Please see the Contact section on how to contact the developers. + +------------------------------------------------------------------------------- + +Table of Contents + +1. Introduction + + 1.1. Features + +3. Installation + + 3.1. Red Hat and SuSE RPMs + 3.2. Debian + 3.3. Windows + 3.4. Solaris, NetBSD, FreeBSD, HP-UX + 3.5. OS/2 + 3.6. Mac OSX + 3.7. AmigaOS + +4. Note to Upgraders +5. Quickstart to Using Privoxy +6. Starting Privoxy + + 6.1. RedHat and Debian + 6.2. SuSE + 6.3. Windows + 6.4. Solaris, NetBSD, FreeBSD, HP-UX and others + 6.5. OS/2 + 6.6. Mac OSX + 6.7. AmigaOS + 6.8. Command Line Options + +7. Privoxy Configuration + + 7.1. Controlling Privoxy with Your Web Browser + 7.2. Configuration Files Overview + +8. The Main Configuration File + + 8.1. 
Configuration and Log File Locations + + 8.1.1. confdir + 8.1.2. logdir + 8.1.3. actionsfile + 8.1.4. filterfile + 8.1.5. logfile + 8.1.6. jarfile + 8.1.7. trustfile + 8.1.8. user-manual + + 8.2. Local Set-up Documentation + + 8.2.1. trust-info-url + 8.2.2. admin-address + 8.2.3. proxy-info-url + + 8.3. Debugging + + 8.3.1. debug + 8.3.2. single-threaded + + 8.4. Access Control and Security + + 8.4.1. listen-address + 8.4.2. toggle + 8.4.3. enable-remote-toggle + 8.4.4. enable-edit-actions + 8.4.5. ACLs: permit-access and deny-access + 8.4.6. buffer-limit + + 8.5. Forwarding + + 8.5.1. forward + 8.5.2. forward-socks4 and forward-socks4a + 8.5.3. Advanced Forwarding Examples + + 8.6. Windows GUI Options + +9. Actions Files + + 9.1. Finding the Right Mix + 9.2. How to Edit + 9.3. How Actions are Applied to URLs + 9.4. Patterns + + 9.4.1. The Domain Pattern + 9.4.2. The Path Pattern + + 9.5. Actions + + 9.5.1. +add-header + 9.5.2. +block + 9.5.3. +deanimate-gifs + 9.5.4. +downgrade-http-version + 9.5.5. +fast-redirects + 9.5.6. +filter + 9.5.7. +hide-forwarded-for-headers + 9.5.8. +hide-from-header + 9.5.9. +hide-referer + 9.5.10. +hide-user-agent + 9.5.11. +handle-as-image + 9.5.12. +set-image-blocker + 9.5.13. +limit-connect + 9.5.14. +prevent-compression + 9.5.15. +session-cookies-only + 9.5.16. +prevent-reading-cookies + 9.5.17. +prevent-setting-cookies + 9.5.18. +kill-popups + 9.5.19. +send-vanilla-wafer + 9.5.20. +send-wafer + 9.5.21. Summary + 9.5.22. Sample Actions Files + + 9.6. Aliases + +10. The Filter File + + 10.1. The +filter Action + +11. Templates +12. Contacting the Developers, Bug Reporting and Feature Requests + + 12.1. Get Support + 12.2. Report bugs + 12.3. Request new features + 12.4. Report ads or other filter problems + 12.5. Other + +13. Copyright and History + + 13.1. Copyright + 13.2. History + +14. See Also +15. Appendix + + 15.1. Regular Expressions + 15.2. Privoxy's Internal Pages + + 15.2.1. Bookmarklets + + 15.3. Chain of Events + 15.4. 
Anatomy of an Action + +------------------------------------------------------------------------------- + +1. Introduction + +This documentation is included with the current beta version of Privoxy, +v.2.9.14, and is mostly complete at this point. The most up to date reference +for the time being is still the comments in the source files and in the +individual configuration files. Development of version 3.0 is currently nearing +completion, and includes many significant changes and enhancements over earlier +versions. The target release date for stable v3.0 is "soon" ;-). + +Since this is a beta version, not all new features are well tested. This +documentation may be slightly out of sync as a result (especially with CVS +sources). And there may be bugs, though hopefully not many! + +------------------------------------------------------------------------------- + +1.1. Features + +In addition to Internet Junkbuster's traditional features of ad and banner +blocking and cookie management, Privoxy provides new features, some of them +currently under development: + + * FIXME: complete the list of features. change the order: most important + features to the top of the list. prefix new features with "NEW". + + * Integrated browser based configuration and control utility at http:// + config.privoxy.org/ (shortcut: http://p.p/). Browser-based tracing of rule + and filter effects. Remote toggling. + + * Blocking of annoying pop-up browser windows. + + * HTTP/1.1 compliant (but not all optional 1.1 features are supported). + + * Support for Perl Compatible Regular Expressions in the configuration files, + and generally a more sophisticated and flexible configuration syntax over + previous versions. + + * GIF de-animation. + + * Web page content filtering (removes banners based on size, invisible + "web-bugs", JavaScript and HTML annoyances, pop-ups, etc.) + + * Bypass many click-tracking scripts (avoids script redirection). + + * Multi-threaded (POSIX and native threads). 
+ + * Auto-detection and re-reading of config file changes. + + * User-customizable HTML templates (e.g. 404 error page). + + * Improved cookie management features (e.g. session based cookies). + + * Improved signal handling, and a true daemon mode (Unix). + + * Every feature now controllable on a per-site or per-location basis, + configuration more powerful and versatile over-all. + + * Many smaller new features added, limitations and bugs removed, and security + holes fixed. + +------------------------------------------------------------------------------- + +3. Installation + +Privoxy is available both in convenient pre-compiled packages for a wide range +of operating systems, and as raw source code. For most users, we recommend +using the packages, which can be downloaded from our Privoxy Project Page. For +installing and compiling the source code, please look into our Developer +Manual. + +If you like to live on the bleeding edge and are not afraid of using possibly +unstable development versions, you can check out the up-to-the-minute version +directly from the CVS repository or simply download the nightly CVS tarball. +Again, we refer you to the Developer Manual. + +At present, Privoxy is known to run on Windows(95, 98, ME, 2000, XP), Linux +(RedHat, Suse, Debian), Mac OSX, OS/2, AmigaOS, FreeBSD, NetBSD, BeOS, and many +more flavors of Unix. + +Note: If you have a previous Junkbuster or Privoxy installation on your system, +you will need to remove it. Some platforms do this for you as part of their +installation procedure. (See below for your platform). + +In any case be sure to backup your old configuration if it is valuable to you. +See the note to upgraders section below. + +------------------------------------------------------------------------------- + +3.1. Red Hat and SuSE RPMs + +RPMs can be installed with rpm -Uvh privoxy-2.9.14-1.rpm, and will use /etc/ +privoxy for the location of configuration files. 
+ +Note that on Red Hat, Privoxy will not be automatically started on system boot. +You will need to enable that using chkconfig, ntsysv, or similar methods. Note +that SuSE will automatically start Privoxy in the boot process. + +If you have problems with failed dependencies, try rebuilding the SRC RPM: rpm +--rebuild privoxy-2.9.14-1.src.rpm. This will use your locally installed +libraries and RPM version. + +Also note that if you have a Junkbuster RPM installed on your system, you need +to remove it first, because the packages conflict. Otherwise, RPM will try to +remove Junkbuster automatically, before installing Privoxy. + +------------------------------------------------------------------------------- + +3.2. Debian + +FIXME. + +------------------------------------------------------------------------------- + +3.3. Windows + +Just double-click the installer, which will guide you through the installation +process. You will find the configuration files in the same directory as you +installed Privoxy in. We do not use the registry of Windows. + +------------------------------------------------------------------------------- + +3.4. Solaris, NetBSD, FreeBSD, HP-UX + +Create a new directory, cd to it, then unzip and untar the archive. For the +most part, you'll have to figure out where things go. FIXME. + +------------------------------------------------------------------------------- + +3.5. OS/2 + +First, make sure that no previous installations of Junkbuster and / or Privoxy +are left on your system. You can do this by + +Then, just double-click the WarpIN self-installing archive, which will guide +you through the installation process. A shadow of the Privoxy executable will +be placed in your startup folder so it will start automatically whenever OS/2 +starts. + +The directory you choose to install Privoxy into will contain all of the +configuration files. + +------------------------------------------------------------------------------- + +3.6. 
Mac OSX + +Unzip the downloaded package (you can either double-click on the file in the +finder, or on the desktop if you downloaded it there). Then, double-click on +the package installer icon and follow the installation process. Privoxy will be +installed in the subdirectory /Applications/Privoxy.app. Privoxy will set +itself up to start automatically on system bring-up via /System/Library/ +StartupItems/Privoxy. + +------------------------------------------------------------------------------- + +3.7. AmigaOS + +Copy and then unpack the lha archive to a suitable location. All necessary +files will be installed into the Privoxy directory, including all configuration +and log files. To uninstall, just remove this directory. + +Start Privoxy (with RUN <>NIL:) in your startnet script (AmiTCP), in s: +user-startup (RoadShow), as startup program in your startup script (Genesis), +or as startup action (Miami and MiamiDx). Privoxy will automatically quit when +you quit your TCP/IP stack (just ignore the harmless warning your TCP/IP stack +may display that Privoxy is still running). + +------------------------------------------------------------------------------- + +4. Note to Upgraders + +There are very significant changes from older versions of Junkbuster to the +current Privoxy. Configuration is substantially changed. Junkbuster 2.0.x and +earlier configuration files will not migrate. The functionality of the old +blockfile, cookiefile and imagelist is now combined into the "actions files". +default.action is the main actions file. Local exceptions should best be put +into user.action. + +A "filter file" (typically default.filter) is new as of Privoxy 2.9.x, and +provides some of the new sophistication (explained below). config is much the +same as before. + +If upgrading from a 2.0.x version, you will have to use the new config files, +and possibly adapt any personal rules from your older files. 
When porting +personal rules over from the old blockfile to the new actions files, please +note that even the pattern syntax has changed. If upgrading from 2.9.x +development versions, it is still recommended to use the new configuration +files. + +A quick list of things to be aware of before upgrading: + + * The default listening port is now 8118 due to a conflict with another + service (NAS). + + * Some installers may remove earlier versions completely. Save any important + configuration files! + + * Privoxy is controllable with a web browser at the special URL: http:// + config.privoxy.org/ (Shortcut: http://p.p/). Many aspects of configuration + can be done here, including temporarily disabling Privoxy. + + * The primary configuration file for cookie management, ad and banner + blocking, and many other aspects of Privoxy configuration is in the actions + files. It is strongly recommended to become familiar with the new actions + concept below, before modifying these files. Locally defined rules should + go into user.action. + + * Some installers may not automatically start Privoxy after installation. + +------------------------------------------------------------------------------- + +5. Quickstart to Using Privoxy + + * Install Privoxy. See the section Installing. + + * Start Privoxy. See the section Starting Privoxy. + + * Change your browser's configuration to use the proxy localhost on port + 8118. See the section Starting Privoxy. + + * Enjoy surfing with enhanced comfort and privacy. Please see the section + Contacting the Developers on how to report bugs or problems with websites + or to get help. You may want to change the file user.action to further + tweak your new browsing experience. + +------------------------------------------------------------------------------- + +6. Starting Privoxy + +Before launching Privoxy for the first time, you will want to configure your +browser(s) to use Privoxy as a HTTP and HTTPS proxy. 
The default is localhost +for the proxy address, and port 8118 (earlier versions used port 8000). This is +the one configuration step that must be done! + +With Netscape (and Mozilla), this can be set under Edit -> Preferences -> +Advanced -> Proxies -> HTTP Proxy. For Internet Explorer: Tools -> Internet +Properties -> Connections -> LAN Setting. Then, check "Use Proxy" and fill in +the appropriate info (Address: localhost, Port: 8118). Fill in the same values +for HTTPS if you want HTTPS proxy support too. + +After doing this, flush your browser's disk and memory caches to force a +re-reading of all pages and to get rid of any ads that may be cached. You are +now ready to start enjoying the benefits of using Privoxy! + +Privoxy is typically started by specifying the main configuration file to be +used on the command line. If no configuration file is specified on the command +line, Privoxy will look for a file named config in the current directory, +except on Win32, where it will try config.txt. + +------------------------------------------------------------------------------- + +6.1. RedHat and Debian + +We use a script. Note that RedHat does not start Privoxy upon booting by +default. It will use the file /etc/privoxy/config as its main configuration +file. FIXME: Debian?? + + # /etc/rc.d/init.d/privoxy start + +------------------------------------------------------------------------------- + +6.2. SuSE + +We use a script. It will use the file /etc/privoxy/config as its main +configuration file. Note that SuSE starts Privoxy upon booting your PC. + + # rcprivoxy start + +------------------------------------------------------------------------------- + +6.3. Windows + +Click on the Privoxy Icon to start Privoxy. If no configuration file is +specified on the command line, Privoxy will look for a file named config.txt. +Note that Windows will automatically start Privoxy upon booting your PC. + +------------------------------------------------------------------------------- + +6.4. 
Solaris, NetBSD, FreeBSD, HP-UX and others + +Example Unix startup command: + + # /usr/sbin/privoxy /etc/privoxy/config + +------------------------------------------------------------------------------- + +6.5. OS/2 + +FIXME. + +------------------------------------------------------------------------------- + +6.6. Mac OSX + +FIXME. + +------------------------------------------------------------------------------- + +6.7. AmigaOS + +FIXME. + +------------------------------------------------------------------------------- + +6.8. Command Line Options + +Privoxy may be invoked with the following command-line options: + + * --version + + Print version info and exit. Unix only. + + * --help + + Print short usage info and exit. Unix only. + + * --no-daemon + + Don't become a daemon, i.e. don't fork and become process group leader, and + don't detach from controlling tty. Unix only. + + * --pidfile FILE + + On startup, write the process ID to FILE. Delete the FILE on exit. Failure + to create or delete the FILE is non-fatal. If no FILE option is given, no + PID file will be used. Unix only. + + * --user USER[.GROUP] + + After (optionally) writing the PID file, assume the user ID of USER, and if + included the GID of GROUP. Exit if the privileges are not sufficient to do + so. Unix only. + + * configfile + + If no configfile is included on the command line, Privoxy will look for a + file named "config" in the current directory (except on Win32 where it will + look for "config.txt" instead). Specify full path to avoid confusion. If no + config file is found, Privoxy will fail to start. + +------------------------------------------------------------------------------- + +7. Privoxy Configuration + +All Privoxy configuration is stored in text files. These files can be edited +with a text editor. Many important aspects of Privoxy can also be controlled +easily with a web browser. + +------------------------------------------------------------------------------- + +7.1. 
Controlling Privoxy with Your Web Browser + +Privoxy's user interface can be reached through the special URL http:// +config.privoxy.org/ (shortcut: http://p.p/), which is a built-in page and works +without Internet access. You will see the following section: + + Privoxy Menu + * View & change the current configuration + * View the source code version numbers + * View the request headers. + * Look up which actions apply to a URL and why + * Toggle Privoxy on or off + + +This should be self-explanatory. Note the first item leads to an editor for the +"actions list", which is where the ad, banner, cookie, and URL blocking magic +is configured as well as other advanced features of Privoxy. This is an easy +way to adjust various aspects of Privoxy configuration. The actions file, and +other configuration files, are explained in detail below. + +"Toggle Privoxy On or Off" is handy for sites that might have problems with +your current actions and filters. You can in fact use it as a test to see +whether it is Privoxy causing the problem or not. Privoxy continues to run as a +proxy in this case, but all filtering is disabled. There is even a toggle +Bookmarklet offered, so that you can toggle Privoxy with one click from your +browser. + +------------------------------------------------------------------------------- + +7.2. Configuration Files Overview + +For Unix, *BSD and Linux, all configuration files are located in /etc/privoxy/ +by default. For MS Windows, OS/2, and AmigaOS these are all in the same +directory as the Privoxy executable. The name and number of configuration files +has changed from previous versions, and is subject to change as development +progresses. + +The installed defaults provide a reasonable starting point, though some +settings may be aggressive by some standards. For the time being, the principal +configuration files are: + + * The main configuration file is named config on Linux, Unix, BSD, OS/2, and + AmigaOS and config.txt on Windows. 
This is a required file. + + * default.action (the main actions file) is used to define the default + settings for various "actions" relating to images, banners, pop-ups, access + restrictions and cookies. + + Multiple actions files may be defined in config. These are processed in the + order they are defined. Local customizations and locally preferred + exceptions to the default policies as defined in default.action are + probably best applied in user.action, which should be preserved across + upgrades. standard.action is also included. This is mostly for Privoxy's + internal use. + + There is also a web based editor that can be accessed from http:// + config.privoxy.org/show-status/ (Shortcut: http://p.p/show-status/) for the + various actions files. + + * default.filter (the filter file) can be used to re-write the raw page + content, including viewable text as well as embedded HTML and JavaScript, + and whatever else lurks on any given web page. The filtering jobs are only + pre-defined here; whether to apply them or not is up to the actions files. + +All files use the "#" character to denote a comment (the rest of the line will +be ignored) and understand line continuation through placing a backslash ("\") +as the very last character in a line. If the # is preceded by a backslash, it +loses its special function. Placing a # in front of an otherwise valid +configuration line to prevent it from being interpreted is called "commenting +out" that line. + +The actions files and default.filter can use Perl style regular expressions for +maximum flexibility. + +After making any changes, there is no need to restart Privoxy in order for the +changes to take effect. Privoxy detects such changes automatically. Note, +however, that it may take one or two additional requests for the change to take +effect. When changing the listening address of Privoxy, these "wake up" +requests must obviously be sent to the old listening address. 
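As an illustration of the comment and line-continuation rules described
above, a config fragment might look like the following. This is only a
sketch: the option names are taken from this manual, but the particular
values and the layout of the continued line are assumptions chosen for the
example.

    # A full-line comment: everything after the "#" is ignored.
    confdir /etc/privoxy      # a comment may also follow a value
    #trustfile trust          # "commented out": this option is not read
    actionsfile \
        user                  # the backslash joins this line to the one above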
+ +While under development, the configuration content is subject to change. The +below documentation may not be accurate by the time you read this. Also, what +constitutes a "default" setting, may change, so please check all your +configuration files on important issues. + +------------------------------------------------------------------------------- + +8. The Main Configuration File + +Again, the main configuration file is named config on Linux/Unix/BSD and OS/2, +and config.txt on Windows. Configuration lines consist of an initial keyword +followed by a list of values, all separated by whitespace (any number of spaces +or tabs). For example: + + confdir /etc/privoxy + + +Assigns the value /etc/privoxy to the option confdir and thus indicates that +the configuration directory is named "/etc/privoxy/". + +All options in the config file except for confdir and logdir are optional. +Watch out in the below description for what happens if you leave them unset. + +The main config file controls all aspects of Privoxy's operation that are not +location dependent (i.e. they apply universally, no matter where you may be +surfing). + +------------------------------------------------------------------------------- + +8.1. Configuration and Log File Locations + +Privoxy can (and normally does) use a number of other files for additional +configuration, help and logging. This section of the configuration file tells +Privoxy where to find those other files. + +------------------------------------------------------------------------------- + +8.1.1. confdir + +Specifies: + + The directory where the other configuration files are located + +Type of value: + + Path name + +Default value: + + /etc/privoxy (Unix) or Privoxy installation dir (Windows) + +Effect if unset: + + Mandatory + +Notes: + + No trailing "/", please + + When development goes modular and multi-user, the blocker, filter, and + per-user config will be stored in subdirectories of "confdir". 
For now, the + configuration directory structure is flat, except for confdir/templates, + where the HTML templates for CGI output reside (e.g. Privoxy's 404 error + page). + +------------------------------------------------------------------------------- + +8.1.2. logdir + +Specifies: + + The directory where all logging takes place (i.e. where logfile and jarfile + are located) + +Type of value: + + Path name + +Default value: + + /var/log/privoxy (Unix) or Privoxy installation dir (Windows) + +Effect if unset: + + Mandatory + +Notes: + + No trailing "/", please + +------------------------------------------------------------------------------- + +8.1.3. actionsfile + +Specifies: + + The actions file(s) to use + +Type of value: + + File name, relative to confdir + +Default value: + + standard # Internal purposes, recommended not editing + + default # Main actions file + + user # User customizations + +Effect if unset: + + No actions are taken at all. Simple neutral proxying. + +Notes: + + Multiple actionsfile lines are permitted, and are in fact recommended! + + The default values include standard.action, which is used for internal + purposes and should be loaded, default.action, which is the "main" actions + file maintained by the developers, and user.action, where you can make your + personal additions. + + Actions files are where all the per site and per URL configuration is done + for ad blocking, cookie management, privacy considerations, etc. There is + no point in using Privoxy without at least one actions file. + +------------------------------------------------------------------------------- + +8.1.4. filterfile + +Specifies: + + The filter file to use + +Type of value: + + File name, relative to confdir + +Default value: + + default.filter (Unix) or default.filter.txt (Windows) + +Effect if unset: + + No textual content filtering takes place, i.e. 
all +filter{name} actions in + the actions files are turned off + +Notes: + + The "default.filter" file contains content modification rules that use + "regular expressions". These rules permit powerful changes on the content + of Web pages, e.g., you could disable your favorite JavaScript annoyances, + re-write the actual displayed text, or just have some fun replacing + "Microsoft" with "MicroSuck" wherever it appears on a Web page. + +------------------------------------------------------------------------------- + +8.1.5. logfile + +Specifies: + + The log file to use + +Type of value: + + File name, relative to logdir + +Default value: + + logfile (Unix) or privoxy.log (Windows) + +Effect if unset: + + No log file is used, all log messages go to the console (stderr). + +Notes: + + The windows version will additionally log to the console. + + The logfile is where all logging and error messages are written. The level + of detail and number of messages are set with the debug option (see below). + The logfile can be useful for tracking down a problem with Privoxy (e.g., + it's not blocking an ad you think it should block) but in most cases you + probably will never look at it. + + Your logfile will grow indefinitely, and you will probably want to + periodically remove it. On Unix systems, you can do this with a cron job + (see "man cron"). For Red Hat, a logrotate script has been included. + + On SuSE Linux systems, you can place a line like "/var/log/privoxy.* +1024k + 644 nobody.nogroup" in /etc/logfiles, with the effect that cron.daily will + automatically archive, gzip, and empty the log, when it exceeds 1M size. + +------------------------------------------------------------------------------- + +8.1.6. jarfile + +Specifies: + + The file to store intercepted cookies in + +Type of value: + + File name, relative to logdir + +Default value: + + jarfile (Unix) or privoxy.jar (Windows) + +Effect if unset: + + Intercepted cookies are not stored at all. 
+ +Notes: + + The jarfile may grow to ridiculous sizes over time. + +------------------------------------------------------------------------------- + +8.1.7. trustfile + +Specifies: + + The trust file to use + +Type of value: + + File name, relative to confdir + +Default value: + + Unset (commented out). When activated: trust (Unix) or trust.txt (Windows) + +Effect if unset: + + The whole trust mechanism is turned off. + +Notes: + + The trust mechanism is an experimental feature for building white-lists and + should be used with care. It is NOT recommended for the casual user. + + If you specify a trust file, Privoxy will only allow access to sites that + are named in the trustfile. You can also mark sites as trusted referrers + (with +), with the effect that access to untrusted sites will be granted, + if a link from a trusted referrer was used. The link target will then be + added to the "trustfile". Possible applications include limiting Internet + access for children. + + If you use + operator in the trust file, it may grow considerably over + time. + +------------------------------------------------------------------------------- + +8.1.8. user-manual + +Specifies: + + Location of the Privoxy User Manual. + +Type of value: + + A fully qualified URI + +Default value: + + http://www.privoxy.org/user-manual/ + +Effect if unset: + + The default will be used. + +Notes: + + The User Manual is used for help hints from some of the internal CGI pages. + It is normally packaged with the binary distributions, and would make more + sense to have this pointed at a locally installed copy. + + A more useful example (Unix): + + user-manual file:///usr/share/doc/privoxy-2.9.14/user-manual/ + +------------------------------------------------------------------------------- + +8.2. 
Local Set-up Documentation + +If you intend to operate Privoxy for more users than just yourself, it might be +a good idea to let them know how to reach you, what you block and why you do +that, your policies etc. + +------------------------------------------------------------------------------- + +8.2.1. trust-info-url + +Specifies: + + A URL to be displayed in the error page that users will see if access to an + untrusted page is denied. + +Type of value: + + URL + +Default value: + + Two example URLs are provided + +Effect if unset: + + No links are displayed on the "untrusted" error page. + +Notes: + + The value of this option only matters if the experimental trust mechanism + has been activated. (See trustfile above.) + + If you use the trust mechanism, it is a good idea to write up some on-line + documentation about your trust policy and to specify the URL(s) here. Use + multiple times for multiple URLs. + + The URL(s) should be added to the trustfile as well, so users don't end up + locked out from the information on why they were locked out in the first + place! + +------------------------------------------------------------------------------- + +8.2.2. admin-address + +Specifies: + + An email address to reach the proxy administrator. + +Type of value: + + Email address + +Default value: + + Unset + +Effect if unset: + + No email address is displayed on error pages and the CGI user interface. + +Notes: + + If both admin-address and proxy-info-url are unset, the whole "Local + Privoxy Support" box on all generated pages will not be shown. + +------------------------------------------------------------------------------- + +8.2.3. proxy-info-url -Privoxy User Manual +Specifies: + + A URL to documentation about the local Privoxy setup, configuration or + policies. + +Type of value: + + URL + +Default value: + + Unset + +Effect if unset: + + No link to local documentation is displayed on error pages and the CGI user + interface. 
+ +Notes: + + If both admin-address and proxy-info-url are unset, the whole "Local + Privoxy Support" box on all generated pages will not be shown. + + This URL shouldn't be blocked ;-) + +------------------------------------------------------------------------------- + +8.3. Debugging + +These options are mainly useful when tracing a problem. Note that you might +also want to invoke Privoxy with the --no-daemon command line option when +debugging. + +------------------------------------------------------------------------------- + +8.3.1. debug + +Specifies: + + Key values that determine what information gets logged. + +Type of value: + + Integer values + +Default value: + + 12289 (i.e.: URLs plus informational and warning messages) + +Effect if unset: + + Nothing gets logged. + +Notes: + + The available debug levels are: + + debug 1 # show each GET/POST/CONNECT request + debug 2 # show each connection status + debug 4 # show I/O status + debug 8 # show header parsing + debug 16 # log all data into the logfile + debug 32 # debug force feature + debug 64 # debug regular expression filter + debug 128 # debug fast redirects + debug 256 # debug GIF de-animation + debug 512 # Common Log Format + debug 1024 # debug kill pop-ups + debug 4096 # Startup banner and warnings. + debug 8192 # Non-fatal errors + + To select multiple debug levels, you can either add them or use multiple + debug lines. + + A debug level of 1 is informative because it will show you each request as + it happens. 1, 4096 and 8192 are highly recommended so that you will notice + when things go wrong. The other levels are probably only of interest if you + are hunting down a specific problem. They can produce a hell of an output + (especially 16). + + The reporting of fatal errors (i.e. ones which crash Privoxy) is always on + and cannot be disabled. + + If you want to use CLF (Common Log Format), you should set "debug 512" ONLY + and not enable anything else. 
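The arithmetic behind combined debug levels can be checked with a small
script. The following Python sketch is not part of Privoxy; the helper names
decode_debug and combine_debug are hypothetical, and it assumes that the
levels listed above behave as independent bit flags, so that adding distinct
levels is the same as OR-ing them together.

```python
# Debug levels as listed in the notes above (value -> meaning).
DEBUG_LEVELS = {
    1: "show each GET/POST/CONNECT request",
    2: "show each connection status",
    4: "show I/O status",
    8: "show header parsing",
    16: "log all data into the logfile",
    32: "debug force feature",
    64: "debug regular expression filter",
    128: "debug fast redirects",
    256: "debug GIF de-animation",
    512: "Common Log Format",
    1024: "debug kill pop-ups",
    4096: "Startup banner and warnings",
    8192: "Non-fatal errors",
}


def decode_debug(value):
    """Return the individual debug levels contained in a combined value."""
    return [level for level in sorted(DEBUG_LEVELS) if value & level]


def combine_debug(levels):
    """Combine individual debug levels into one value for a single debug line."""
    total = 0
    for level in levels:
        total |= level  # same as addition as long as each level appears once
    return total


if __name__ == "__main__":
    # The default value splits into the three recommended levels.
    for level in decode_debug(12289):
        print(level, DEBUG_LEVELS[level])
```

Under that assumption, the default of 12289 splits into 1, 4096 and 8192,
which are the three levels recommended above, so a single "debug 12289" line
should behave like three separate debug lines with those values.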
+ +------------------------------------------------------------------------------- + +8.3.2. single-threaded + +Specifies: + + Whether to run only one server thread + +Type of value: + + None + +Default value: + + Unset + +Effect if unset: + + Multi-threaded (or, where unavailable: forked) operation, i.e. the ability + to serve multiple requests simultaneously. + +Notes: + + This option is only there for debug purposes and you should never need to + use it. It will drastically reduce performance. + +------------------------------------------------------------------------------- + +8.4. Access Control and Security + +This section of the config file controls the security-relevant aspects of +Privoxy's configuration. + +------------------------------------------------------------------------------- + +8.4.1. listen-address + +Specifies: + + The IP address and TCP port on which Privoxy will listen for client + requests. + +Type of value: + + [IP-Address]:Port + +Default value: + + localhost:8118 + +Effect if unset: + + Bind to localhost (127.0.0.1), port 8118. This is suitable and recommended + for home users who run Privoxy on the same machine as their browser. + +Notes: + + You will need to configure your browser(s) to this proxy address and port. + + If you already have another service running on port 8118, or if you want to + serve requests from other machines (e.g. on your local network) as well, + you will need to override the default. + + If you leave out the IP address, Privoxy will bind to all interfaces + (addresses) on your machine and may become reachable from the Internet. In + that case, consider using access control lists (ACL's) (see "ACLs" below), + or a firewall. + +Example: + + Suppose you are running Privoxy on a machine which has the address + 192.168.0.1 on your local private network (192.168.0.0) and has another + outside connection with a different address. 
You want it to serve requests + from inside only: + + listen-address 192.168.0.1:8118 + +------------------------------------------------------------------------------- + +8.4.2. toggle + +Specifies: + + Initial state of "toggle" status + +Type of value: + + 1 or 0 + +Default value: + + 1 + +Effect if unset: + + Act as if toggled on + +Notes: + + If set to 0, Privoxy will start in "toggled off" mode, i.e. behave like a + normal, content-neutral proxy. See enable-remote-toggle below. This is not + really useful anymore, since toggling is much easier via the web interface + than via editing the conf file. + + The windows version will only display the toggle icon in the system tray if + this option is present. + +------------------------------------------------------------------------------- + +8.4.3. enable-remote-toggle + +Specifies: + + Whether or not the web-based toggle feature may be used + +Type of value: + + 0 or 1 + +Default value: + + 1 + +Effect if unset: + + The web-based toggle feature is disabled. + +Notes: + + When toggled off, Privoxy acts like a normal, content-neutral proxy, i.e. + it acts as if none of the actions applied to any URL. + + For the time being, access to the toggle feature can not be controlled + separately by "ACLs" or HTTP authentication, so that everybody who can + access Privoxy (see "ACLs" and listen-address above) can toggle it for all + users. So this option is not recommended for multi-user environments with + untrusted users. + + Note that you must have compiled Privoxy with support for this feature, + otherwise this option has no effect. + +------------------------------------------------------------------------------- + +8.4.4. enable-edit-actions + +Specifies: + + Whether or not the web-based actions file editor may be used + +Type of value: + + 0 or 1 + +Default value: + + 1 + +Effect if unset: + + The web-based actions file editor is disabled. 
+ +Notes: + + For the time being, access to the editor cannot be controlled separately + by "ACLs" or HTTP authentication, so that everybody who can access Privoxy + (see "ACLs" and listen-address above) can modify its configuration for all + users. So this option is not recommended for multi-user environments with + untrusted users. + + Note that you must have compiled Privoxy with support for this feature, + otherwise this option has no effect. + +------------------------------------------------------------------------------- + +8.4.5. ACLs: permit-access and deny-access + +Specifies: + + Who can access what. + +Type of value: + + src_addr[/src_masklen] [dst_addr[/dst_masklen]] + + Where src_addr and dst_addr are IP addresses in dotted decimal notation or + valid DNS names, and src_masklen and dst_masklen are subnet masks in CIDR + notation, i.e. integer values from 2 to 30 representing the length (in + bits) of the network address. The masks and the whole destination part are + optional. + +Default value: + + Unset + +Effect if unset: + + Don't restrict access further than implied by listen-address + +Notes: + + Access controls are included at the request of ISPs and systems + administrators, and are not usually needed by individual users. For a + typical home user, it will normally suffice to ensure that Privoxy only + listens on the localhost or internal (home) network address by means of the + listen-address option. + + Please see the warnings in the FAQ that this proxy is not intended to be a + substitute for a firewall or to encourage anyone to defer addressing basic + security weaknesses. + + Multiple ACL lines are OK. If any ACLs are specified, then Privoxy + talks only to IP addresses that match at least one permit-access line and + don't match any subsequent deny-access line. In other words, the last match + wins, with the default being deny-access.
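The last-match-wins rule can be modelled in a few lines of Python. This is a simplified sketch, not Privoxy's actual implementation: it accepts only IP addresses (no DNS names), the function names are hypothetical, and 203.0.113.10 and 198.51.100.5 are placeholder destination addresses chosen for the example.

```python
import ipaddress


def parse_acl(line):
    """Parse 'permit-access SRC[/LEN] [DST[/LEN]]' (IP addresses only)."""
    verb, *addrs = line.split()
    if verb not in ("permit-access", "deny-access") or not 1 <= len(addrs) <= 2:
        raise ValueError(f"not an ACL line: {line!r}")
    # strict=False also accepts a host address combined with a mask length.
    nets = [ipaddress.ip_network(addr, strict=False) for addr in addrs]
    src = nets[0]
    dst = nets[1] if len(nets) == 2 else None  # None stands for "any destination"
    return verb == "permit-access", src, dst


def is_allowed(acl_lines, client, target):
    """Apply the ACL lines in order; the last matching line decides."""
    if not acl_lines:
        return True  # no ACLs set: nothing is restricted at this level
    client = ipaddress.ip_address(client)
    target = ipaddress.ip_address(target)
    allowed = False  # once any ACL exists, the default is deny
    for line in acl_lines:
        permit, src, dst = parse_acl(line)
        if client in src and (dst is None or target in dst):
            allowed = permit
    return allowed


acl = [
    "permit-access 192.168.45.64/26",
    "deny-access 192.168.45.73 203.0.113.10",
]
print(is_allowed(acl, "192.168.45.70", "203.0.113.10"))  # True
print(is_allowed(acl, "192.168.45.73", "203.0.113.10"))  # False
print(is_allowed(acl, "192.168.45.73", "198.51.100.5"))  # True
print(is_allowed(acl, "10.0.0.1", "198.51.100.5"))       # False
```

The third call shows why order matters: the deny-access line only overrides the earlier permit-access line for the one destination it names.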
+ + If Privoxy is using a forwarder (see forward below) for a particular + destination URL, the dst_addr that is examined is the address of the + forwarder and NOT the address of the ultimate target. This is necessary + because it may be impossible for the local Privoxy to determine the IP + address of the ultimate target (that's often what gateways are used for). + + You should prefer using IP addresses over DNS names, because the address + lookups take time. All DNS names must resolve! You cannot use domain + patterns like "*.org" or partial domain names. If a DNS name resolves to + multiple IP addresses, only the first one is used. + + Denying access to particular sites by ACL may have undesired side effects + if the site in question is hosted on a machine which also hosts other + sites. + +Examples: + + Explicitly define the default behavior if no ACL and listen-address are + set: "localhost" is OK. The absence of a dst_addr implies that all + destination addresses are OK: + + permit-access localhost + + Allow any host on the same class C subnet as www.privoxy.org access to + nothing but www.example.com: + + permit-access www.privoxy.org/24 www.example.com/32 + + Allow access from any host on the 26-bit subnet 192.168.45.64 to anywhere, + with the exception that 192.168.45.73 may not access + www.dirty-stuff.example.com: + + permit-access 192.168.45.64/26 + deny-access 192.168.45.73 www.dirty-stuff.example.com + +------------------------------------------------------------------------------- + +8.4.6. buffer-limit + +Specifies: + + Maximum size of the buffer for content filtering. + +Type of value: + + Size in Kbytes + +Default value: + + 4096 + +Effect if unset: + + Use a 4MB (4096 KB) limit. + +Notes: + + For content filtering, i.e. the +filter and +deanimate-gifs actions, it is + necessary that Privoxy buffers the entire document body.
This can be + potentially dangerous, since a server could just keep sending data + indefinitely and wait for your RAM to be exhausted, with nasty consequences. + Hence this option. + + When a document buffer size reaches the buffer-limit, it is flushed to the + client unfiltered and no further attempt to filter the rest of the document + is made. Remember that there may be multiple threads running, which might + require up to buffer-limit Kbytes each, unless you have enabled + "single-threaded" above. + +------------------------------------------------------------------------------- + +8.5. Forwarding + +This feature allows routing of HTTP requests through a chain of multiple +proxies. It can be used to better protect privacy and confidentiality when +accessing specific domains by routing requests to those domains through an +anonymous public proxy (see e.g. http://www.multiproxy.org/anon_list.htm), or +to use a caching proxy to speed up browsing. Chaining to a parent proxy may +also be necessary because the machine that Privoxy runs on has no direct +Internet access. + +Also specified here are SOCKS proxies. Privoxy supports the SOCKS 4 and SOCKS +4A protocols. + +------------------------------------------------------------------------------- + +8.5.1. forward + +Specifies: + + To which parent HTTP proxy specific requests should be routed. + +Type of value: + + target_domain[:port] http_parent[:port] + + Where target_domain is a domain name pattern (see the chapter on domain + matching in the default.action file), http_parent is the address of the + parent HTTP proxy as an IP address in dotted decimal notation or as a + valid DNS name (or "." to denote "no forwarding"), and the optional port + parameters are TCP ports, i.e. integer values from 1 to 65535 + +Default value: + + Unset + +Effect if unset: + + Don't use parent HTTP proxies. + +Notes: + + If http_parent is ".", then requests are not forwarded to another HTTP + proxy but are made directly to the web servers.
+ + Multiple lines are OK; they are checked in sequence, and the last match + wins. + +Examples: + + Everything goes to an example anonymizing proxy, except SSL on port 443 + (which it doesn't handle): + + forward .* anon-proxy.example.org:8080 + forward :443 . + + Everything goes to our example ISP's caching proxy, except for requests to + that ISP's sites: + + forward .*. caching-proxy.example-isp.net:8000 + forward .example-isp.net . + +------------------------------------------------------------------------------- + +8.5.2. forward-socks4 and forward-socks4a + +Specifies: + + Through which SOCKS proxy (and to which parent HTTP proxy) specific + requests should be routed. + +Type of value: + + target_domain[:port] socks_proxy[:port] http_parent[:port] + + Where target_domain is a domain name pattern (see the chapter on domain + matching in the default.action file), http_parent and socks_proxy are IP + addresses in dotted decimal notation or valid DNS names (http_parent may be + "." to denote "no HTTP forwarding"), and the optional port parameters are + TCP ports, i.e. integer values from 1 to 65535 + +Default value: + + Unset + +Effect if unset: + + Don't use SOCKS proxies. + +Notes: + + Multiple lines are OK; they are checked in sequence, and the last match + wins. + + The difference between forward-socks4 and forward-socks4a is that in the + SOCKS 4A protocol, the DNS resolution of the target hostname happens on the + SOCKS server, while in SOCKS 4 it happens locally. + + If http_parent is ".", then requests are not forwarded to another HTTP + proxy but are made (HTTP-wise) directly to the web servers, albeit through + a SOCKS proxy. + +Examples: + + From the company example.com, direct connections are made to all "internal" + domains, but everything outbound goes through their ISP's proxy by way of + example.com's corporate SOCKS 4A gateway to the Internet. + + forward-socks4a .*.
socks-gw.example.com:1080 www-cache.example-isp.net:8080 + forward .example.com . + + A rule that uses a SOCKS 4 gateway for all destinations but no HTTP parent + looks like this: + + forward-socks4 .*. socks-gw.example.com:1080 . + +------------------------------------------------------------------------------- + +8.5.3. Advanced Forwarding Examples + +If you have links to multiple ISPs that provide various special content only to +their subscribers, you can configure multiple Privoxies which have connections +to the respective ISPs to act as forwarders to each other, so that your users +can see the internal content of all ISPs. + +Assume that host-a has a PPP connection to isp-a.net and host-b has a PPP +connection to isp-b.net. Both run Privoxy. Their forwarding configuration can +look like this: + +host-a: + + forward .*. . + forward .isp-b.net host-b:8118 + +host-b: + + forward .*. . + forward .isp-a.net host-a:8118 + +Now, your users can set their browser's proxy to use either host-a or host-b +and be able to browse the internal content of both isp-a and isp-b. + +If you intend to chain Privoxy and squid locally, then chaining as browser -> +squid -> privoxy is the recommended way. + +Assuming that Privoxy and squid run on the same box, your squid configuration +could then look like this: + + # Define Privoxy as parent proxy (without ICP) + cache_peer 127.0.0.1 parent 8118 7 no-query + + # Define ACL for protocol FTP + acl ftp proto FTP + + # Do not forward FTP requests to Privoxy + always_direct allow ftp + + # Forward all the rest to Privoxy + never_direct allow all + +You would then need to change your browser's proxy settings to squid's address +and port. Squid normally uses port 3128. If unsure, consult http_port in +squid.conf. + +------------------------------------------------------------------------------- + +8.6.
Windows GUI Options + +Privoxy has a number of options specific to the Windows GUI: + +If "activity-animation" is set to 1, the Privoxy icon will animate when +"Privoxy" is active. To turn this off, set it to 0. + + activity-animation 1 + + +If "log-messages" is set to 1, Privoxy will log messages to the console window: + + log-messages 1 + + +If "log-buffer-size" is set to 1, the size of the log buffer, i.e. the amount +of memory used for the log messages displayed in the console window, will be +limited to "log-max-lines" (see below). + +Warning: Setting this to 0 will cause the buffer to grow without limit and eat +up all your memory! + + log-buffer-size 1 + + +log-max-lines is the maximum number of lines held in the log buffer. See above. + + log-max-lines 200 + + +If "log-highlight-messages" is set to 1, Privoxy will highlight portions of the +log messages with a bold-faced font: + + log-highlight-messages 1 + + +The font used in the console window: + + log-font-name Comic Sans MS + + +Font size used in the console window: + + log-font-size 8 + + +"show-on-task-bar" controls whether or not Privoxy will appear as a button on +the Task bar when minimized: + + show-on-task-bar 0 + + +If "close-button-minimizes" is set to 1, the Windows close button will minimize +Privoxy instead of closing the program (close with the exit option on the File +menu). + + close-button-minimizes 1 + + +The "hide-console" option is specific to the MS-Win console version of Privoxy. +If this option is used, Privoxy will disconnect from and hide the command +console. + + #hide-console + + +------------------------------------------------------------------------------- + +9. Actions Files + +The actions files are used to define what actions Privoxy takes for which URLs, +and thus determine how ad images, cookies and various other aspects of HTTP +content and transactions are handled, and on which sites (or even parts +thereof).
There are three such files included with Privoxy (as of version +2.9.15), with differing purposes: + + * standard.action - is used by the web based editor, to set various + pre-defined sets of rules for the default actions section in + default.action. These have increasing levels of aggressiveness. It is not + recommended to edit this file. + + * default.action - is the primary action file that sets the initial values + for all actions. It is intended to provide a base level of functionality + for Privoxy's array of features. So it is a set of broad rules that should + work reasonably well for users everywhere. This is the file that the + developers are keeping updated, and making available to users. + + * user.action - is intended to be for local site preferences and exceptions. + As an example, if your ISP or your bank has specific requirements, and needs + special handling, this kind of thing should go here. This file will not be + upgraded. + +The list of actions files to be used is defined in the main configuration +file, and the files are processed in the order they are defined. The content of these can +all be viewed and edited from http://config.privoxy.org/show-status. + +An actions file typically has sections. Near the top, "aliases" are optionally +defined (discussed below), then the default set of rules which will apply +universally to all sites and pages. And then below that, exceptions to the +defined universal policies. + +Actions can be used to block anything you want, including ads, banners, or just +some obnoxious URL that you would rather not see. Cookies can be accepted or +rejected, or accepted only during the current browser session (i.e. not written +to disk), content can be modified, JavaScripts tamed, user-tracking fooled, and +much more. See below for a complete list of actions. + +------------------------------------------------------------------------------- + +9.1.
Finding the Right Mix + +Note that some actions, like cookie suppression or script disabling, may render +unusable some sites that rely on these techniques to work properly. Finding the +right mix of actions is not always easy and certainly a matter of personal +taste. In general, it can be said that the more "aggressive" your default +settings (in the top section of the actions file) are, the more exceptions for +"trusted" sites you will have to make later. If, for example, you want to kill +popup windows by default, you'll have to make exceptions from that rule for +sites that you regularly use and that require popups for actually useful +content, like maybe your bank, favorite shop, or newspaper. + +We have tried to provide you with reasonable rules to start from in the +distribution actions files. But there is no general rule of thumb on these +things. There just are too many variables, and sites are constantly changing. +Sooner or later you will want to change the rules (and read this chapter again +:). + +------------------------------------------------------------------------------- + +9.2. How to Edit + +The easiest way to edit the "actions" files is with a browser by using our +browser-based editor, which can be reached from http://config.privoxy.org/ +show-status. + +If you prefer plain text editing to GUIs, you can of course also directly edit +the actions files. + +------------------------------------------------------------------------------- + +9.3. How Actions are Applied to URLs + +Actions files are divided into sections. There are special sections, like the +"alias" sections which will be discussed later. For now let's concentrate on +regular sections: They have a heading line (often split up to multiple lines +for readability) which consists of a list of actions, separated by whitespace +and enclosed in curly braces. Below that, there is a list of URL patterns, each +on a separate line.
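To make the layout of a regular section concrete, here is a small Python sketch that splits one section into its actions and its URL patterns. It is an illustration under simplifying assumptions (only simple actions without parameters, exactly one section per input) and not the parser Privoxy uses; parse_section is a hypothetical name, and the two patterns are sample values.

```python
def parse_section(text):
    """Split one regular section into (actions, patterns).

    Simplification: actions must not carry parameters in nested braces.
    """
    start = text.index("{")
    end = text.index("}", start)
    # The heading may span several lines; split() treats newlines as whitespace.
    actions = text[start + 1:end].split()
    # Everything after the heading: one URL pattern per non-empty line.
    patterns = [line.strip() for line in text[end + 1:].splitlines() if line.strip()]
    return actions, patterns


section = """
{ +block
  +handle-as-image }
ad*.example.com
/index.html
"""

actions, patterns = parse_section(section)
print(actions)   # ['+block', '+handle-as-image']
print(patterns)  # ['ad*.example.com', '/index.html']
```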
+ +To determine which actions apply to a request, the URL of the request is +compared to all patterns in this file. Every time it matches, the list of +applicable actions for the URL is incrementally updated, using the heading of +the section in which the pattern is located. If multiple matches for the same +URL set the same action differently, the last match wins. If not, the effects +are aggregated (e.g. a URL might match both the "+handle-as-image" and "+block" +actions). + +You can trace this process by visiting http://config.privoxy.org/show-url-info. + +More detail on this is provided in the Appendix, Anatomy of an Action. + +------------------------------------------------------------------------------- + +9.4. Patterns - By: Privoxy Developers - - $Id: user-manual.sgml,v 1.53 2002/03/24 11:51:00 swa Exp $ - - The user manual gives users information on how to install, configure - and use Privoxy. Privoxy is a web proxy with advanced filtering - capabilities for protecting privacy, filtering web page content, - managing cookies, controlling access, and removing ads, banners, - pop-ups and other obnoxious Internet Junk. Privoxy has a very flexible - configuration and can be customized to suit individual needs and - tastes. Privoxy has application for both stand-alone systems and - multi-user networks. - - You can find the latest version of the user manual at - [1]http://ijbswa.sourceforge.net/user-manual/. - _________________________________________________________________ - - Table of Contents - 1. [2]Introduction - - 1.1. [3]New Features - - 2. [4]Installation - - 2.1. [5]Source - 2.2. [6]Red Hat - 2.3. [7]SuSE - 2.4. [8]OS/2 - 2.5. [9]Windows - 2.6. [10]Other - - 3. [11]Privoxy Configuration - - 3.1. [12]Controlling Privoxy with Your Web Browser - 3.2. [13]Configuration Files Overview - 3.3. [14]The Main Configuration File - - 3.3.1. [15]Defining Other Configuration Files - 3.3.2. [16]Other Configuration Options - 3.3.3. 
[17]Access Control List (ACL) - 3.3.4. [18]Forwarding - 3.3.5. [19]Windows GUI Options - - 3.4. [20]The Actions File - - 3.4.1. [21]URL Domain and Path Syntax - 3.4.2. [22]Actions - 3.4.3. [23]Aliases - - 3.5. [24]The Filter File - 3.6. [25]Templates - - 4. [26]Quickstart to Using Privoxy - - 4.1. [27]Command Line Options - - 5. [28]Contacting the Developers, Bug Reporting and Feature Requests - 6. [29]Copyright and History - - 6.1. [30]License - 6.2. [31]History - - 7. [32]See also - 8. [33]Appendix - - 8.1. [34]Regular Expressions - - 21 - 22 - 23 - 24 - 25 - 26 - 27 - 28 - 29 - - 8.2. [35]Privoxy's Internal Pages - 8.3. [36]Anatomy of an Action - -1. Introduction +Generally, a pattern has the form <domain>/<path>, where both the <domain> and +<path> are optional. (This is why the pattern / matches all URLs). - Privoxy is a web proxy with advanced filtering capabilities for - protecting privacy, filtering and modifying web page content, managing - cookies, controlling access, and removing ads, banners, pop-ups and - other obnoxious Internet Junk. Privoxy has a very flexible - configuration and can be customized to suit individual needs and - tastes. Privoxy has application for both stand-alone systems and - multi-user networks. - - This documentation is included with the current BETA version of - Privoxy and is mostly complete at this point. The most up to date - reference for the time being is still the comments in the source files - and in the individual configuration files. Development of version 3.0 - is currently nearing completion, and includes many significant changes - and enhancements over earlier versions. The target release date for - stable v3.0 is "soon" ;-) +www.example.com/ - Since this is a BETA version, not all new features are well tested. - This documentation may be slightly out of sync as a result (especially - with CVS sources). And there may be bugs, though hopefully not many!
- _________________________________________________________________ + is a domain-only pattern and will match any request to www.example.com, + regardless of which document on that server is requested. -1.1. New Features - - In addition to Internet Junkbuster's traditional feature of ad and - banner blocking and cookie management, Privoxy provides new features, - some of them currently under development: +www.example.com - * Integrated browser based configuration and control utility - ([37]http://i.j.b). Browser-based tracing of rule and filter - effects. - * Blocking of annoying pop-up browser windows. - * HTTP/1.1 compliant (most, but not all 1.1 features are supported). - * Support for Perl Compatible Regular Expressions in the - configuration files, and generally a more sophisticated and - flexible configuration syntax over previous versions. - * GIF de-animation. - * Web page content filtering (removes banners based on size, - invisible "web-bugs", JavaScript, pop-ups, status bar abuse, etc.) - * Bypass many click-tracking scripts (avoids script redirection). - * Multi-threaded (POSIX and native threads). - * Auto-detection and re-reading of config file changes. - * User-customizable HTML templates (e.g. 404 error page). - * Improved cookie management features (e.g. session based cookies). - * Builds from source on most UNIX-like systems. Packages available - for: Linux (RedHat, SuSE, or Debian), Windows, Sun Solaris, Mac - OSX, OS/2, HP-UX 11 and AmigaOS. - * In addition, the configuration is much more powerful and versatile - over-all. - _________________________________________________________________ + means exactly the same. For domain-only patterns, the trailing / may be + omitted. -2. Installation - - Privoxy is available as raw source code, or pre-compiled binaries. See - the [38]Privoxy Home Page for binaries and current release info. - Privoxy is also available via [39]CVS. This is the recommended - approach at this time. 
But please be aware that CVS is constantly - changing, and it may break in mysterious ways. - _________________________________________________________________ +www.example.com/index.html -2.1. Source - - For gzipped tar archives, unpack the source: + matches only the single document /index.html on www.example.com. - tar xzvf ijb_source_* [.tgz or .tar.gz] - cd ijb_source_2.9.11_beta - - For retrieving the current CVS sources, you'll need the CVS package - installed first. To download CVS source: +/index.html - cvs -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa login - cvs -z3 -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa co cu -rrent - cd current - - This will create a directory named current/, which will contain the - source tree. - - Then, in either case, to build from tarball/CVS source: - - ./configure (--help to see options) - make (the make from gnu, gmake for *BSD) - su - make -n install (to see where all the files will go) - make install (to really install) - - For Redhat and SuSE Linux RPM packages, see below. - _________________________________________________________________ + matches the document /index.html, regardless of the domain, i.e. on any web + server. -2.2. Red Hat - - To build Redhat RPM packages, install source as above. Then: +index.html - autoheader [suggested for CVS source] - autoconf [suggested for CVS source] - ./configure - make redhat-dist + matches nothing, since it would be interpreted as a domain name and there + is no top-level domain called .html. + +------------------------------------------------------------------------------- + +9.4.1. The Domain Pattern - This will create both binary and src RPMs in the usual places. - Example: +The matching of the domain part offers some flexible options: if the domain +starts or ends with a dot, it becomes unanchored at that end. 
For example: + +.example.com - /usr/src/redhat/RPMS/i686/privoxy-2.9.11-1.i686.rpm + matches any domain that ENDS in .example.com - /usr/src/redhat/SRPMS/privoxy-2.9.11-1.src.rpm +www. - To install, of course: + matches any domain that STARTS with www. - rpm -Uvv /usr/src/redhat/RPMS/i686/privoxy-2.9.11-1.i686.rpm - - This will place the Privoxy configuration files in /etc/privoxy/, and - log files in /var/log/privoxy/. - _________________________________________________________________ +.example. -2.3. SuSE - - To build SuSE RPM packages, install source as above. Then: + matches any domain that CONTAINS .example. (Correctly speaking: It matches + any FQDN that contains example as a domain.) - autoheader [suggested for CVS source] - autoconf [suggested for CVS source] - ./configure - make suse-dist +Additionally, there are wild-cards that you can use in the domain names +themselves. They work pretty similar to shell wild-cards: "*" stands for zero +or more arbitrary characters, "?" stands for any single character, you can +define character classes in square brackets and all of that can be freely +mixed: - This will create both binary and src RPMs in the usual places. - Example: +ad*.example.com - /usr/src/packages/RPMS/i686/privoxy-2.9.11-1.i686.rpm + matches "adserver.example.com", "ads.example.com", etc but not + "sfads.example.com" - /usr/src/packages/SRPMS/privoxy-2.9.11-1.src.rpm +*ad*.example.com - To install, of course: + matches all of the above, and then some. - rpm -Uvv /usr/src/packages/RPMS/i686/privoxy-2.9.11-1.i686.rpm - - This will place the Privoxy configuration files in /etc/privoxy/, and - log files in /var/log/privoxy/. - _________________________________________________________________ +.?pix.com -2.4. OS/2 - - Privoxy is packaged in a WarpIN self- installing archive. The - self-installing program will be named depending on the release - version, something like: ijbos2_setup_1.2.3.exe. 
In order to install - it, simply run this executable or double-click on its icon and follow - the WarpIN installation panels. A shadow of the Privoxy executable - will be placed in your startup folder so it will start automatically - whenever OS/2 starts. + matches www.ipix.com, pictures.epix.com, a.b.c.d.e.upix.com etc. - The directory you choose to install Privoxy into will contain all of - the configuration files. +www[1-9a-ez].example.c* - If you would like to build binary images on OS/2 yourself, you will - need a few Unix-like tools: autoconf, autoheader and sh. These tools - will be used to create the required config.h file, which is not part - of the source distribution because it differs based on platform. You - will also need a compiler. The distribution has been created using IBM - VisualAge compilers, but you can use any compiler you like. GCC/EMX - has the disadvantage of needing to be single-threaded due to a - limitation of EMX's implementation of the select() socket call. + matches www1.example.com, www4.example.cc, wwwd.example.cy, + wwwz.example.com etc., but not wwww.example.com. - In addition to needing the source code distribution as outlined - earlier, you will want to extract the os2seutp directory from CVS: - cvs -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa login +------------------------------------------------------------------------------- - cvs -z3 -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa co os2 -setup +9.4.2. The Path Pattern - This will create a directory named os2setup/, which will contain the - Makefile.vac makefile and os2build.cmd which is used to completely - create the binary distribution. The sequence of events for building - the executable for yourself goes something like this: - cd current - autoheader - autoconf - sh configure - cd ..\os2setup - nmake -f Makefile.vac +Privoxy uses Perl compatible regular expressions (through the PCRE library) for +matching the path. 
- You will see this sequence laid out in os2build.cmd. - _________________________________________________________________ - -2.5. Windows +There is an Appendix with a brief quick-start into regular expressions, and +full (very technical) documentation on PCRE regex syntax is available on-line +at http://www.pcre.org/man.txt. You might also find the Perl man page on +regular expressions (man perlre) useful, which is available on-line at http:// +www.perldoc.com/perl5.6/pod/perlre.html. - Click-click. (I need help on this. Not a clue here. Also for - configuration section below. HB.) - _________________________________________________________________ - -2.6. Other +Note that the path pattern is automatically left-anchored at the "/", i.e. it +matches as if it started with a "^" (regular expression speak for the +beginning of a line). + +Please also note that matching in the path is case INSENSITIVE by default, but +you can switch to case sensitive at any point in the pattern by using the "(? +-i)" switch: www.example.com/(?-i)PaTtErN.* will match only documents whose +path starts with PaTtErN in exactly this capitalization. - Some quick notes on other Operating Systems. +------------------------------------------------------------------------------- + +9.5. Actions + +All actions are disabled by default, until they are explicitly enabled +somewhere in an actions file. Actions are turned on if preceded with a "+", and +turned off if preceded with a "-". So a "+action" means "do that action", e.g. +"+block" means "please block the following URL patterns". + +Actions are invoked by enclosing the action name in curly braces (e.g. +{+some_action}), followed by a list of URLs (or patterns that match URLs) to +which the action applies. There are three classes of actions: + + * Boolean, i.e. the action can only be "on" or "off". Examples: - For FreeBSD (and other *BSDs?), the build will require gmake instead - of the included make.
gmake is available from [40]http://www.gnu.org. - The rest should be the same as above for Linux/Unix. - _________________________________________________________________ + {+name} # enable this action + {-name} # disable this action + -3. Privoxy Configuration - - All Privoxy configuration is kept in text files. These files can be - edited with a text editor. Many important aspects of Privoxy can also - be controlled easily with a web browser. - _________________________________________________________________ + * Parameterized, e.g. "+/-hide-user-agent{ Mozilla 1.0 }", where some value + is required in order to enable this type of action. Examples: + + {+name{param}} # enable action and set parameter to "param" + {-name} # disable action (the parameter can be omitted) + + + * Multi-value, e.g. "{+/-add-header{Name: value}}" or "{+/-send-wafer{name= + value}}", where some value needs to be defined in addition to simply + enabling the action. Examples: + + {+name{param=value}} # enable action and set "param" to "value" + {-name{param=value}} # remove the parameter "param" completely + {-name} # disable this action totally and remove param too + -3.1. Controlling Privoxy with Your Web Browser +If nothing is specified in any actions file, no "actions" are taken. So in this +case Privoxy would just be a normal, non-blocking, non-anonymizing proxy. You +must specifically enable the privacy and blocking features you need (although +the provided default actions files will give a good starting point). - Privoxy can be reached by the special URL [41]http://i.j.b/ (or - alternately [42]http://ijbswa.sourceforge.net/config/), which is an - internal page. You will see the following section: - -Please choose from the following options: - - * Show information about the current configuration - * Show the source code version numbers - * Show the client's request headers.
- * Show which actions apply to a URL and why - * Toggle Privoxy on or off - * Edit the actions list - - - This should be self-explanatory. Note the last item is an editor for - the "actions list", which is where much of the ad, banner, cookie, and - URL blocking magic is configured as well as other advanced features of - Privoxy. This is an easy way to adjust various aspects of Privoxy - configuration. The actions file, and other configuration files, are - explained in detail below. Privoxy will automatically detect any - changes to these files. - - "Toggle Privoxy On or Off" is handy for sites that might have problems - with your current actions and filters, or just to test if a site - misbehaves, whether it is Privoxy causing the problem or not. Privoxy - continues to run as a proxy in this case, but all filtering is - disabled. - _________________________________________________________________ - -3.2. Configuration Files Overview - - For Unix, *BSD and Linux, all configuration files are located in - /etc/privoxy/ by default. For MS Windows, OS/2, and AmigaOS these are - all in the same directory as the Privoxy executable. The name and - number of configuration files has changed from previous versions, and - is subject to change as development progresses. - - The installed defaults provide a reasonable starting point, though - possibly aggressive by some standards. For the time being, there are - only three default configuration files (this will change in time): - - * The main configuration file is named config on Linux, Unix, BSD, - OS/2, and AmigaOS and config.txt on Windows. - * The default.action file is used to define various "actions" - relating to images, banners, pop-ups, access restrictions, banners - and cookies. There is a CGI based editor for this file that can be - accessed via [43]http://i.j.b. (Other actions files are included - as well with differing levels of filtering and blocking, e.g. - ijb-basic.action.) 
- * The default.filter file can be used to re-write the raw page - content, including viewable text as well as embedded HTML and - JavaScript, and whatever else lurks on any given web page. - - default.action and default.filter can use Perl style regular - expressions for maximum flexibility. All files use the "#" character - to denote a comment. Such lines are not processed by Privoxy. After - making any changes, there is no need to restart Privoxy in order for - the changes to take effect. Privoxy should detect such changes - automatically. +Later defined actions always override earlier ones. So exceptions to any rules +you make should come in the latter part of the file (or in a file that is +processed later when using multiple actions files). For multi-valued actions, +the actions are applied in the order they are specified. Actions files are +processed in the order they are defined in config (the default installation has +three actions files). It is also quite possible for any given URL pattern to match +more than one action! + +The valid Privoxy "actions" are: + +------------------------------------------------------------------------------- + +9.5.1. +add-header + +Type: + + Multi-value. + +Typical uses: + + Send a user defined HTTP header to the web server. - While under development, the configuration content is subject to - change. The below documentation may not be accurate by the time you - read this. Also, what constitutes a "default" setting, may change, so - please check all your configuration files on important issues. - _________________________________________________________________ - -3.3. The Main Configuration File +Possible values: + + Any value is possible. Validity of the defined HTTP headers is not checked. + +Example usage: + + {+add-header{X-User-Tracking: sucks}} + .example.com + + +Notes: + + This action may be specified multiple times, in order to define multiple + headers. This is rarely needed for the typical user.
If you don't know what + "HTTP headers" are, you definitely don't need to worry about this one. + +------------------------------------------------------------------------------- - Again, the main configuration file is named config on Linux/Unix/BSD - and OS/2, and config.txt on Windows. Configuration lines consist of an - initial keyword followed by a list of values, all separated by - whitespace (any number of spaces or tabs). For example: - - blockfile blocklist.ini +9.5.2. +block + +Type: + + Boolean. - Indicates that the blockfile is named "blocklist.ini". (A default - installation does not use this.) +Typical uses: - A "#" indicates a comment. Any part of a line following a "#" is - ignored, except if the "#" is preceded by a "\". - - Thus, by placing a "#" at the start of an existing configuration line, - you can make it a comment and it will be treated as if it weren't - there. This is called "commenting out" an option and can be useful to - turn off features: If you comment out the "logfile" line, Privoxy will - not log to a file at all. Watch for the "default:" section in each - explanation to see what happens if the option is left unset (or - commented out). - - Long lines can be continued on the next line by using a "\" as the - very last character. + Used to block a URL from reaching your browser. The URL may be anything, + but is typically used to block ads or other obnoxious content. - There are various aspects of Privoxy behavior that can be tuned. - _________________________________________________________________ - -3.3.1. Defining Other Configuration Files - - Privoxy can use a number of other files to tell it what ads to block, - what cookies to accept, etc. This section of the configuration file - tells Privoxy where to find all those other files. +Possible values: - On Windows and AmigaOS, Privoxy looks for these files in the same - directory as the executable. On Unix and OS/2, Privoxy looks for these - files in the current working directory. 
In either case, an absolute - path name can be used to avoid problems. + N/A - When development goes modular and multi-user, the blocker, filter, and - per-user config will be stored in subdirectories of "confdir". For - now, only confdir/templates is used for storing HTML templates for CGI - results. +Example usage: + + {+block} + .banners.example.com + .ads.r.us + + +Notes: + + If a URL matches one of the blocked patterns, Privoxy will intercept the + URL and display its special "BLOCKED" page instead. If there is sufficient + space, a large red banner will appear with a friendly message about why the + page was blocked, and a way to go there anyway. If there is insufficient + space a smaller "BLOCKED" page will appear without the red banner. Click + here to view the default blocked HTML page (Privoxy must be running for + this to work as intended!). + + A very important exception is if the URL matches both "+block" and + "+handle-as-image", then it will be handled by "+set-image-blocker" (see + below). It is important to understand this process, in order to understand + how Privoxy is able to deal with ads and other objectionable content. + + The "+filter" action can also perform some of the same functionality as + "+block", but by virtue of very different programming techniques, and is + most often used for different reasons. + +------------------------------------------------------------------------------- + +9.5.3. +deanimate-gifs + +Type: - The location of the configuration files: - - confdir /etc/privoxy # No trailing /, please. + Parameterized. - The directory where all logging (i.e. logfile and jarfile) takes - place. No trailing "/", please: +Typical uses: - logdir /var/log/privoxy + To stop those annoying, distracting animated GIF images. - Note that all file specifications below are relative to the above two - directories! +Possible values: - The "default.action" file contains patterns to specify the actions to - apply to requests for each site. 
Default: Cookies to and from all - destinations are kept only during the current browser session (i.e. - they are not saved to disk). Pop-ups are disabled for all sites. All - sites are filtered through selected sections of "default.filter". No - sites are blocked. The Privoxy logo is displayed for filtered ads and - other images. The syntax of this file is explained in detail - [44]below. Other "actions" files are included, and you are free to use - any of them. They have varying degrees of aggressiveness. - - actionsfile default.action - - The "default.filter" file contains content modification rules that use - "regular expressions". These rules permit powerful changes on the - content of Web pages, e.g., you could disable your favorite JavaScript - annoyances, re-write the actual displayed text, or just have some fun - replacing "Microsoft" with "MicroSuck" wherever it appears on a Web - page. Default: whatever the developers are playing with :-/ - - Filtering requires buffering the page content, which may appear to - slow down page rendering since nothing is displayed until all content - has passed the filters. (It does not really take longer, but seems - that way since the page is not incrementally displayed.) This effect - will be more noticeable on slower connections. + "last" or "first" - filterfile default.filter - - The logfile is where all logging and error messages are written. The - logfile can be useful for tracking down a problem with Privoxy (e.g., - it's not blocking an ad you think it should block) but in most cases - you probably will never look at it. - - Your logfile will grow indefinitely, and you will probably want to - periodically remove it. On Unix systems, you can do this with a cron - job (see "man cron"). For Redhat, a logrotate script has been - included. 
- - On SuSE Linux systems, you can place a line like "/var/log/privoxy.* - +1024k 644 nobody.nogroup" in /etc/logfiles, with the effect that - cron.daily will automatically archive, gzip, and empty the log, when - it exceeds 1M size. - - Default: Log to the a file named logfile. Comment out to disable - logging. - - logfile logfile - - The "jarfile" defines where Privoxy stores the cookies it intercepts. - Note that if you use a "jarfile", it may grow quite large. Default: - Don't store intercepted cookies. - - #jarfile jarfile - - If you specify a "trustfile", Privoxy will only allow access to sites - that are named in the trustfile. You can also mark sites as trusted - referrers, with the effect that access to untrusted sites will be - granted, if a link from a trusted referrer was used. The link target - will then be added to the "trustfile". This is a very restrictive - feature that typical users most probably want to leave disabled. - Default: Disabled, don't use the trust mechanism. - - #trustfile trust +Example usage: - If you use the trust mechanism, it is a good idea to write up some - on-line documentation about your blocking policy and to specify the - URL(s) here. They will appear on the page that your users receive when - they try to access untrusted content. Use multiple times for multiple - URLs. Default: Don't display links on the "untrusted" info page. + {+deanimate-gifs{last}} + .example.com + - trust-info-url http://www.your-site.com/why_we_block.html - trust-info-url http://www.your-site.com/what_we_allow.html - _________________________________________________________________ +Notes: -3.3.2. Other Configuration Options - - This part of the configuration file contains options that control how - Privoxy operates. - - "Admin-address" should be set to the email address of the proxy - administrator. It is used in many of the proxy-generated pages. - Default: fill@me.in.please. 
- - #admin-address fill@me.in.please - - "Proxy-info-url" can be set to a URL that contains more info about - this Privoxy installation, it's configuration and policies. It is used - in many of the proxy-generated pages and its use is highly recommended - in multi-user installations, since your users will want to know why - certain content is blocked or modified. Default: Don't show a link to - on-line documentation. - - proxy-info-url http://www.your-site.com/proxy.html - - "Listen-address" specifies the address and port where Privoxy will - listen for connections from your Web browser. The default is to listen - on the localhost port 8118, and this is suitable for most users. (In - your web browser, under proxy configuration, list the proxy server as - "localhost" and the port as "8118"). - - If you already have another service running on port 8118, or if you - want to serve requests from other machines (e.g. on your local - network) as well, you will need to override the default. The syntax is - "listen-address []:". If you leave out the IP - address, Privoxy will bind to all interfaces (addresses) on your - machine and may become reachable from the Internet. In that case, - consider using access control lists (acl's) (see "aclfile" above), or - a firewall. - - For example, suppose you are running Privoxy on a machine which has - the address 192.168.0.1 on your local private network (192.168.0.0) - and has another outside connection with a different address. You want - it to serve requests from inside only: - - listen-address 192.168.0.1:8118 - - If you want it to listen on all addresses (including the outside - connection): - - listen-address :8118 - - If you do this, consider using ACLs (see "aclfile" above). Note: you - will need to point your browser(s) to the address and port that you - have configured here. Default: localhost:8118 (127.0.0.1:8118). 
- - The debug option sets the level of debugging information to log in the - logfile (and to the console in the Windows version). A debug level of - 1 is informative because it will show you each request as it happens. - Higher levels of debug are probably only of interest to developers. - - debug 1 # GPC = show each GET/POST/CONNECT request - debug 2 # CONN = show each connection status - debug 4 # IO = show I/O status - debug 8 # HDR = show header parsing - debug 16 # LOG = log all data into the logfile - debug 32 # FRC = debug force feature - debug 64 # REF = debug regular expression filter - debug 128 # = debug fast redirects - debug 256 # = debug GIF de-animation - debug 512 # CLF = Common Log Format - debug 1024 # = debug kill pop-ups - debug 4096 # INFO = Startup banner and warnings. - debug 8192 # ERROR = Non-fatal errors + De-animate all animated GIF images, i.e. reduce them to their last frame. + This will also shrink the images considerably (in bytes, not pixels!). If + the option "first" is given, the first frame of the animation is used as + the replacement. If "last" is given, the last frame of the animation is + used instead, which probably makes more sense for most banner animations, + but also has the risk of not showing the entire last frame (if it is only a + delta to an earlier frame). - It is highly recommended that you enable ERROR reporting (debug 8192), - at least until v3.0 is released. +------------------------------------------------------------------------------- + +9.5.4. +downgrade-http-version + +Type: - The reporting of FATAL errors (i.e. ones which crash Privoxy) is - always on and cannot be disabled. + Boolean. - If you want to use CLF (Common Log Format), you should set "debug 512" - ONLY, do not enable anything else. +Typical uses: - Multiple "debug" directives, are OK - they're logical-OR'd together. + "+downgrade-http-version" will downgrade HTTP/1.1 client requests to HTTP/ + 1.0 and downgrade the responses as well. 
- debug 15 # same as setting the first 4 listed above +Possible values: - Default: + N/A - debug 1 # URLs - debug 4096 # Info - debug 8192 # Errors - *we highly recommended enabling this* +Example usage: - Privoxy normally uses "multi-threading", a software technique that - permits it to handle many different requests simultaneously. In some - cases you may wish to disable this -- particularly if you're trying to - debug a problem. The "single-threaded" option forces Privoxy to handle - requests sequentially. Default: Multi-threaded mode. + {+downgrade-http-version} + .example.com + - #single-threaded +Notes: - "toggle" allows you to temporarily disable all Privoxy's filtering. - Just set "toggle 0". + Use this action for servers that use HTTP/1.1 protocol features that + Privoxy doesn't handle well yet. HTTP/1.1 is only partially implemented. + Default is not to downgrade requests. This is an infrequently needed + action, and is used to help with rare problem sites only. - The Windows version of Privoxy puts an icon in the system tray, which - also allows you to change this option. If you right-click on that icon - (or select the "Options" menu), one choice is "Enable". Clicking on - enable toggles Privoxy on and off. This is useful if you want to - temporarily disable Privoxy, e.g., to access a site that requires - cookies which you would otherwise have blocked. This can also be - toggled via a web browser at the Privoxy internal address of - [45]http://i.j.b on any platform. +------------------------------------------------------------------------------- + +9.5.5. +fast-redirects + +Type: - "toggle 1" means Privoxy runs normally, "toggle 0" means that Privoxy - becomes a non-anonymizing non-blocking proxy. Default: 1 (on). + Boolean. - toggle 1 +Typical uses: - For content filtering, i.e. the "+filter" and "+deanimate-gif" - actions, it is necessary that Privoxy buffers the entire document - body. 
This can be potentially dangerous, since a server could just - keep sending data indefinitely and wait for your RAM to exhaust. With - nasty consequences. + The "+fast-redirects" action enables interception of "redirect" requests + from one server to another, which are used to track users. Privoxy can cut + off all but the last valid URL in a redirect request and send a local + redirect back to your browser without contacting the intermediate site(s). - The buffer-limit option lets you set the maximum size in Kbytes that - each buffer may use. When the documents buffer exceeds this size, it - is flushed to the client unfiltered and no further attempt to filter - the rest of it is made. Remember that there may multiple threads - running, which might require increasing the "buffer-limit" Kbytes - each, unless you have enabled "single-threaded" above. +Possible values: - buffer-limit 4069 + N/A - To enable the web-based default.action file editor set - enable-edit-actions to 1, or 0 to disable. Note that you must have - compiled Privoxy with support for this feature, otherwise this option - has no effect. This internal page can be reached at [46]http://i.j.b. +Example usage: - Security note: If this is enabled, anyone who can use the proxy can - edit the actions file, and their changes will affect all users. For - shared proxies, you probably want to disable this. Default: enabled. + {+fast-redirects} + .example.com + - enable-edit-actions 1 +Notes: - Allow Privoxy to be toggled on and off remotely, using your web - browser. Set "enable-remote-toggle"to 1 to enable, and 0 to disable. - Note that you must have compiled Privoxy with support for this - feature, otherwise this option has no effect. + Many sites, like yahoo.com, don't just link to other sites. Instead, they + will link to some script on their own server, giving the destination as a + parameter, which will then redirect you to the final target.
URLs resulting + from this scheme typically look like: http://some.place/some_script?http:// + some.where-else. - Security note: If this is enabled, anyone who can use the proxy can - toggle it on or off (see [47]http://i.j.b), and their changes will - affect all users. For shared proxies, you probably want to disable - this. Default: enabled. + Sometimes, there are even multiple consecutive redirects encoded in the + URL. These redirections via scripts make your web browsing more traceable, + since the server from which you follow such a link can see where you go to. + Apart from that, valuable bandwidth and time are wasted while your browser + asks the server for one redirect after the other. Plus, it feeds the + advertisers. - enable-remote-toggle 1 - _________________________________________________________________ + This feature is normally "on", and often requires exceptions for sites + that break when this mechanism is defeated. -3.3.3. Access Control List (ACL) +------------------------------------------------------------------------------- - Access controls are included at the request of some ISPs and systems - administrators, and are not usually needed by individual users. Please - note the warnings in the FAQ that this proxy is not intended to be a - substitute for a firewall or to encourage anyone to defer addressing - basic security weaknesses. +9.5.6. +filter + +Type: + + Parameterized. + +Typical uses: + + Apply page filtering as defined by named sections of the default.filter + file to the specified site(s). "Filtering" can be any modification of the + raw page content, including re-writing or deletion of content. - If no access settings are specified, the proxy talks to anyone that - connects. If any access settings file are specified, then the proxy - talks only to IP addresses permitted somewhere in this file and not - denied later in this file.
+Possible values: - Summary -- if using an ACL: + "+filter" must include the name of one of the section identifiers from + default.filter (or whatever filterfile is specified in config). - Client must have permission to receive service. +Example usage (from the current default.filter): - LAST match in ACL wins. + +filter{html-annoyances}: Get rid of particularly annoying HTML abuse. - Default behavior is to deny service. + +filter{js-annoyances}: Get rid of particularly annoying JavaScript abuse - The syntax for an entry in the Access Control List is: + +filter{content-cookies}: Kill cookies that come in the HTML or JS content - ACTION SRC_ADDR[/SRC_MASKLEN] [ DST_ADDR[/DST_MASKLEN] ] + +filter{popups}: Kill all popups in JS and HTML - Where the individual fields are: + +filter{frameset-borders}: Give frames a border and make them resizable - ACTION = "permit-access" or "deny-access" - SRC_ADDR = client hostname or dotted IP address - SRC_MASKLEN = number of bits in the subnet mask for the source - DST_ADDR = server or forwarder hostname or dotted IP address - DST_MASKLEN = number of bits in the subnet mask for the target + +filter{webbugs}: Squish WebBugs (1x1 invisible GIFs used for user + tracking) - The field separator (FS) is whitespace (space or tab). + +filter{refresh-tags}: Kill automatic refresh tags (for dial-on-demand + setups) - IMPORTANT NOTE: If Privoxy is using a forwarder (see below) or a - gateway for a particular destination URL, the DST_ADDR that is - examined is the address of the forwarder or the gateway and NOT the - address of the ultimate target. This is necessary because it may be - impossible for the local Privoxy to determine the address of the - ultimate target (that's often what gateways are used for). + +filter{fun}: Text replacements for subversive browsing fun! - Here are a few examples to show how the ACL features work: + +filter{nimda}: Remove Nimda (virus) code. 
- "localhost" is OK -- no DST_ADDR implies that ALL destination - addresses are OK: + +filter{banners-by-size}: Kill banners by size (very efficient!) - permit-access localhost + +filter{shockwave-flash}: Kill embedded Shockwave Flash objects - A silly example to illustrate permitting any host on the class-C - subnet with Privoxy to go anywhere: + +filter{crude-parental}: Kill all web pages that contain the words "sex" or + "warez" - permit-access www.privoxy.com/24 +Notes: - Except deny one particular IP address from using it at all: + This is potentially a very powerful feature, and it requires a knowledge of + regular expressions if you want to "roll your own". Filtering operates on a + line by line basis throughout the entire page. - deny-access ident.privoxy.com + Filtering requires buffering the page content, which may appear to slow + down page rendering since nothing is displayed until all content has passed + the filters. (It does not really take longer, but seems that way since the + page is not incrementally displayed.) This effect will be more noticeable + on slower connections. - You can also specify an explicit network address and subnet mask. - Explicit addresses do not have to be resolved to be used. + Filtering can achieve some of the same effects as the "+block" action, i.e. it + can be used to block ads and banners. In the overall scheme of things, + filtering is one of the first things "Privoxy" does with a web page. So + most other actions are applied to the already "filtered" page. - permit-access 207.153.200.0/24 +------------------------------------------------------------------------------- + +9.5.7. +hide-forwarded-for-headers + +Type: - A subnet mask of 0 matches anything, so the next line permits - everyone. + Boolean. - permit-access 0.0.0.0/0 +Typical uses: - Note, you cannot say: + Block any existing X-Forwarded-for HTTP header, and do not add a new one. - permit-access .org +Possible values: - to allow all *.org domains.
Every IP address listed must resolve - fully. + N/A - An ISP may want to provide a Privoxy that is accessible by "the world" - and yet restrict use of some of their private content to hosts on its - internal network (i.e. its own subscribers). Say, for instance the ISP - owns the Class-B IP address block 123.124.0.0 (a 16 bit netmask). This - is how they could do it: +Example usage: - permit-access 0.0.0.0/0 0.0.0.0/0 # other clients can go anywhere - # with the following exceptions - : + {+hide-forwarded-for-headers} + .example.com + - deny-access 0.0.0.0/0 123.124.0.0/16 # block all external request - s for - # sites on the ISP's network - permit 0.0.0.0/0 www.my_isp.com # except for the ISP's main - # web site - permit 123.124.0.0/16 0.0.0.0/0 # the ISP's clients can go - # anywhere +Notes: - Note that if some hostnames are listed with multiple IP addresses, the - primary value returned by DNS (via gethostbyname()) is used. Default: - Anyone can access the proxy. - _________________________________________________________________ + It is fairly safe to leave this on. It does not seem to break many sites. -3.3.4. Forwarding +------------------------------------------------------------------------------- - This feature allows chaining of HTTP requests via multiple proxies. It - can be used to better protect privacy and confidentiality when - accessing specific domains by routing requests to those domains to a - special purpose filtering proxy such as lpwa.com. Or to use a caching - proxy to speed up browsing. +9.5.8. +hide-from-header + +Type: + + Parameterized. - It can also be used in an environment with multiple networks to route - requests via multiple gateways allowing transparent access to multiple - networks without having to modify browser configurations. +Typical uses: - Also specified here are SOCKS proxies. Privoxy SOCKS 4 and SOCKS 4A. 
- The difference is that SOCKS 4A will resolve the target hostname using - DNS on the SOCKS server, not our local DNS client. + To block the browser from sending your email address in a "From:" header. - The syntax of each line is: +Possible values: - forward target_domain[:port] http_proxy_host[:port] - forward-socks4 target_domain[:port] socks_proxy_host[:port] - http_proxy_host[:port] - forward-socks4a target_domain[:port] socks_proxy_host[:port] - http_proxy_host[:port] + Keyword: "block", or any user defined value. - If http_proxy_host is ".", then requests are not forwarded to a HTTP - proxy but are made directly to the web servers. +Example usage: - Lines are checked in sequence, and the last match wins. + {+hide-from-header{block}} + .example.com + + +Notes: + + The keyword "block" will completely remove the header (not to be confused + with the "+block" action). Alternately, you can specify any value you + prefer to send to the web server. + +------------------------------------------------------------------------------- + +9.5.9. +hide-referer + +Type: - There is an implicit line equivalent to the following, which specifies - that anything not finding a match on the list is to go out without - forwarding or gateway protocol, like so: + Parameterized. - forward .* . # implicit +Typical uses: - In the following common configuration, everything goes to Lucent's - LPWA, except SSL on port 443 (which it doesn't handle): + Don't send the "Referer:" (sic) HTTP header to the web site. Or, + alternately send a forged header instead. - forward .* lpwa.com:8000 - forward :443 . +Possible values: - Some users have reported difficulties related to LPWA's use of "." as - the last element of the domain, and have said that this can be fixed - with this: + Prevent the header from being sent with the keyword, "block". Or, "forge" a + URL to one from the same server as the request. Or, set to user defined + value of your choice. - forward lpwa. 
lpwa.com:8000 +Example usage: - (NOTE: the syntax for specifying target_domain has changed since the - previous paragraph was written -- it will not work now. More - information is welcome.) + {+hide-referer{forge}} + .example.com + - In this fictitious example, everything goes via an ISP's caching - proxy, except requests to that ISP: +Notes: - forward .* caching.myisp.net:8000 - forward myisp.net . + "forge" is the preferred option here, since some servers will not send + images back otherwise. - For the @home network, we're told the forwarding configuration is - this: + "+hide-referrer" is an alternate spelling of "+hide-referer". It has the + exact same parameters, and can be freely mixed with "+hide-referer". + ("referrer" is the correct English spelling, however the HTTP specification + has a bug - it requires it to be spelled as "referer".) - forward .* proxy:8080 +------------------------------------------------------------------------------- + +9.5.10. +hide-user-agent + +Type: - Also, we're told they insist on getting cookies and JavaScript, so you - should allow cookies from home.com. We consider JavaScript a potential - security risk. Java need not be enabled. + Parameterized. - In this example direct connections are made to all "internal" domains, - but everything else goes through Lucent's LPWA by way of the company's - SOCKS gateway to the Internet. +Typical uses: - forward-socks4 .* lpwa.com:8000 firewall.my_company.com:1080 - forward my_company.com . + To change the "User-Agent:" header so web servers can't tell your browser + type. Whose business is it anyway? - This is how you could set up a site that always uses SOCKS but no - forwarders: +Possible values: - forward-socks4a .* . firewall.my_company.com:1080 + Any user defined string.
- An advanced example for network administrators: +Example usage: - If you have links to multiple ISPs that provide various special - content to their subscribers, you can configure forwarding to pass - requests to the specific host that's connected to that ISP so that - everybody can see all of the content on all of the ISPs. + {+hide-user-agent{Netscape 6.1 (X11; I; Linux 2.4.18 i686)}} + .msn.com + - This is a bit tricky, but here's an example: +Notes: - host-a has a PPP connection to isp-a.com. And host-b has a PPP - connection to isp-b.com. host-a can run a Privoxy proxy with - forwarding like this: + Warning! This breaks many web sites that depend on this header in order to + determine how the target browser will respond to various requests. Use with + caution. - forward .* . - forward isp-b.com host-b:8118 +------------------------------------------------------------------------------- + +9.5.11. +handle-as-image + +Type: - host-b can run a Privoxy proxy with forwarding like this: + Boolean. - forward .* . - forward isp-a.com host-a:8118 +Typical uses: - Now, anyone on the Internet (including users on host-a and host-b) can - set their browser's proxy to either host-a or host-b and be able to - browse the content on isp-a or isp-b. + To define what Privoxy should automatically treat as an image. This is an + important ingredient of how ads are handled. - Here's another practical example, for University of Kent at Canterbury - students with a network connection in their room, who need to use the - University's Squid web cache. +Possible values: - forward *. ssbcache.ukc.ac.uk:3128 # Use the proxy, except for: - forward .ukc.ac.uk . # Anything on the same domain as us - forward * . # Host with no domain specified - forward 129.12.*.* . # A dotted IP on our /16 network. - forward 127.*.*.* . # Loopback address - forward localhost.localdomain . # Loopback address - forward www.ukc.mirror.ac.uk .
# Specific host + N/A - If you intend to chain Privoxy and squid locally, then chain as - browser -> squid -> privoxy is the recommended way. +Example usage: - Your squid configuration could then look like this: + {+handle-as-image} + /.*\.(gif|jpg|jpeg|png|bmp|ico) + - # Define Privoxy as parent cache +Notes: - cache_peer 127.0.0.1 parent 8118 0 no-query + This only has meaning if the URL (or pattern) also is "+block"ed, in which + case a user definable image can be sent rather than an HTML page. This is + integral to the whole concept of ad blocking: the URL must match both a + "+block" rule, and "+handle-as-image". (See "+set-image-blocker" below for + control over what will actually be displayed by the browser.) - # Define ACL for protocol FTP - acl FTP proto FTP - # Do not forward ACL FTP to privoxy - always_direct allow FTP - # Do not forward ACL CONNECT (https) to privoxy - always_direct allow CONNECT - # Forward the rest to privoxy - never_direct allow all - _________________________________________________________________ + There is little reason to change the default definition for this action. -3.3.5. Windows GUI Options +------------------------------------------------------------------------------- + +9.5.12. +set-image-blocker - Privoxy has a number of options specific to the Windows GUI interface: +Type: - If "activity-animation" is set to 1, the Privoxy icon will animate - when "Privoxy" is active. To turn off, set to 0. + Parameterized. - activity-animation 1 +Typical uses: - If "log-messages" is set to 1, Privoxy will log messages to the - console window: + Decide what to do with URLs that end up tagged with both "+block" and + "+handle-as-image", e.g. an advertisement. - log-messages 1 +Possible values: - If "log-buffer-size" is set to 1, the size of the log buffer, i.e. the - amount of memory used for the log messages displayed in the console - window, will be limited to "log-max-lines" (see below).
+ There are four available options: "-set-image-blocker" will send an HTML + "blocked" page, usually resulting in a "broken image" icon. + "+set-image-blocker{blank}" will send a 1x1 transparent GIF image. + "+set-image-blocker{pattern}" will send a checkerboard type pattern (the + default). And finally, "+set-image-blocker{http://xyz.com}" will send an + HTTP temporary redirect to the specified image. This has the advantage of + the icon being cached by the browser, which will speed up the + display. - Warning: Setting this to 0 will result in the buffer to grow - infinitely and eat up all your memory! +Example usage: - log-buffer-size 1 + {+set-image-blocker{blank}} + .example.com + - log-max-lines is the maximum number of lines held in the log buffer. - See above. +Notes: - log-max-lines 200 + If you want invisible ads, their URLs need to match both "+handle-as-image" + and "+block". Then "+set-image-blocker" should be set to + "blank" for invisibility. Note you cannot treat HTML pages as images in + most cases. For instance, frames require an HTML page to display. So a + frame that is an ad typically cannot be treated as an image. Forcing an + "image" in this situation just will not work reliably. - If "log-highlight-messages" is set to 1, Privoxy will highlight - portions of the log messages with a bold-faced font: +------------------------------------------------------------------------------- + +9.5.13. +limit-connect + +Type: - log-highlight-messages 1 + Parameterized. - The font used in the console window: +Typical uses: - log-font-name Comic Sans MS + By default, Privoxy only allows HTTP CONNECT requests to port 443 (the + standard, secure HTTPS port). Use "+limit-connect" to disable this + altogether, or to allow more ports. - Font size used in the console window: +Possible values: - log-font-size 8 + Any valid port number, or port number range.
- "show-on-task-bar" controls whether or not Privoxy will appear as a - button on the Task bar when minimized: +Example usages: - show-on-task-bar 0 + +limit-connect{443} # + This is the default and need not be specified. + +limit-connect{80,443} # Ports 80 and 443 are OK. + +limit-connect{-3, 7, 20-100, 500-} # + Ports less than 3, 7, 20 to 100 and above 500 are OK. + - If "close-button-minimizes" is set to 1, the Windows close button will - minimize Privoxy instead of closing the program (close with the exit - option on the File menu). +Notes: - close-button-minimizes 1 + The CONNECT method exists in HTTP to allow access to secure websites + (https:// URLs) through proxies. It works very simply: the proxy connects + to the server on the specified port, and then short-circuits its + connections to the client and to the remote proxy. This can be a big + security hole, since CONNECT-enabled proxies can be abused as TCP relays + very easily. - The "hide-console" option is specific to the MS-Win console version of - Privoxy. If this option is used, Privoxy will disconnect from and hide - the command console. + If you want to allow CONNECT for more ports than this, or want to forbid + CONNECT altogether, you can specify a comma-separated list of ports and + port ranges (the latter using dashes, with the minimum defaulting to 0 and + max to 65K). - #hide-console - _________________________________________________________________ + If you don't know what any of this means, there probably is no reason to + change this one. -3.4. The Actions File +------------------------------------------------------------------------------- - The "default.action" file (formerly actionsfile) is used to define - what actions Privoxy takes, and thus determines how images, cookies - and various other aspects of HTTP content and transactions are - handled. Images can be anything you want, including ads, banners, or - just some obnoxious URL that you would rather not see.
Cookies can be - accepted or rejected, or accepted only during the current browser - session (i.e. not written to disk). Changes to default.action should - be immediately visible to Privoxy without the need to restart. +9.5.14. +prevent-compression + +Type: + + Boolean. + +Typical uses: + + Prevent the specified websites from compressing HTTP data. - The easiest way to edit "actions" file is with a browser by loading - [48]http://i.j.b/, and then select "Edit Actions List". A text editor - can also be used. +Possible values: - To determine which actions apply to a request, the URL of the request - is compared to all patterns in this file. Every time it matches, the - list of applicable actions for the URL is incrementally updated. You - can trace this process by visiting [49]http://i.j.b/show-url-info. + N/A - There are four types of lines in this file: comments (begin with a "#" - character), actions, aliases and patterns, all of which are explained - below, as well as the configuration file syntax that Privoxy - understands. - _________________________________________________________________ +Example usage: -3.4.1. URL Domain and Path Syntax + {+prevent-compression} + .example.com + + +Notes: + + Some websites compress their data, which can be a problem for Privoxy, since "+filter", + "+kill-popups" and "+deanimate-gifs" will not work on compressed data. This + will slow down connections to those websites, though. Default typically is + to turn "prevent-compression" on. + +------------------------------------------------------------------------------- + +9.5.15. +session-cookies-only - Generally, a pattern has the form <domain>/<path>, where both the - <domain> and <path> part are optional. If you only specify a domain - part, the "/" can be left out: +Type: - www.example.com - is a domain only pattern and will match any request - to "www.example.com". + Boolean. - www.example.com/ - means exactly the same.
+Typical uses: - www.example.com/index.html - matches only the single document - "/index.html" on "www.example.com". + Allow cookies for the current browser session only. - /index.html - matches the document "/index.html", regardless of the - domain. +Possible values: - index.html - matches nothing, since it would be interpreted as a - domain name and there is no top-level domain called ".html". + N/A - The matching of the domain part offers some flexible options: if the - domain starts or ends with a dot, it becomes unanchored at that end. - For example: +Example usage (disabling): - .example.com - matches any domain that ENDS in ".example.com". + {-session-cookies-only} + .example.com + - www. - matches any domain that STARTS with "www". +Notes: - Additionally, there are wild-cards that you can use in the domain - names themselves. They work pretty similar to shell wild-cards: "*" - stands for zero or more arbitrary characters, "?" stands for any - single character. And you can define character classes in square - brackets and they can be freely mixed: + If websites set cookies, "+session-cookies-only" will make sure they are + erased when you exit and restart your web browser. This makes profiling + cookies useless, but won't break sites which require cookies so that you + can log in for transactions. This is generally turned on for all sites, and + is the recommended setting. - ad*.example.com - matches "adserver.example.com", "ads.example.com", - etc but not "sfads.example.com". + "+prevent-*-cookies" actions should be turned off as well (see below), for + "+session-cookies-only" to work. Otherwise, no cookies will get through at + all. For "persistent" cookies that survive across browser sessions, see + below as well. - *ad*.example.com - matches all of the above, and then some. +------------------------------------------------------------------------------- + +9.5.16. +prevent-reading-cookies + +Type: + + Boolean.
+ +Typical uses: - .?pix.com - matches "www.ipix.com", "pictures.epix.com", - "a.b.c.d.e.upix.com", etc. + Explicitly prevent the web server from reading any cookies on your system. - www[1-9a-ez].example.com - matches "www1.example.com", - "www4.example.com", "wwwd.example.com", "wwwz.example.com", etc., but - not "wwww.example.com". +Possible values: - If Privoxy was compiled with "pcre" support (default), Perl compatible - regular expressions can be used. See the pcre/docs/ directory or "man - perlre" (also available on - [50]http://www.perldoc.com/perl5.6/pod/perlre.html) for details. A - brief discussion of regular expressions is in the [51]Appendix. For - instance: + N/A - /.*/advert[0-9]+\.jpe?g - would match a URL from any domain, with any - path that includes "advert" followed immediately by one or more - digits, then a "." and ending in either "jpeg" or "jpg". So we match - "example.com/ads/advert2.jpg", and - "www.example.com/ads/banners/advert39.jpeg", but not - "www.example.com/ads/banners/advert39.gif" (no gifs in the example - pattern). +Example usage: - Please note that matching in the path is case INSENSITIVE by default, - but you can switch to case sensitive at any point in the pattern by - using the "(?-i)" switch: + {+prevent-reading-cookies} + .example.com + - www.example.com/(?-i)PaTtErN.* - will match only documents whose path - starts with "PaTtErN" in exactly this capitalization. - _________________________________________________________________ +Notes: -3.4.2. Actions + Often used in conjunction with "+prevent-setting-cookies" to disable + cookies completely. Note that "+session-cookies-only" requires these to + both be disabled (or else it never gets any cookies to cache). + + For "persistent" cookies to work (i.e. they survive across browser sessions + and reboots), all three cookie settings should be "off" for the specified + sites. 
+ +------------------------------------------------------------------------------- - Actions are enabled if preceded with a "+", and disabled if preceded - with a "-". Actions are invoked by enclosing the action name in curly - braces (e.g. {+some_action}), followed by a list of URLs to which the - action applies. There are three classes of actions: +9.5.17. +prevent-setting-cookies + +Type: - * Boolean (e.g. "+/-block"): - {+name} # enable this action - {-name} # disable this action - - * parameterized (e.g. "+/-hide-user-agent"): - {+name{param}} # enable action and set parameter to "param" - {-name} # disable action - - * Multi-value (e.g. "{+/-add-header{Name: value}}", - "{+/-wafer{name=value}}"): - {+name{param}} # enable action and add parameter "param" - {-name{param}} # remove the parameter "param" - {-name} # disable this action totally - - If nothing is specified in this file, no "actions" are taken. So in - this case Privoxy would just be a normal, non-blocking, - non-anonymizing proxy. You must specifically enable the privacy and - blocking features you need (although the provided default - default.action file will give a good starting point). + Boolean. - Later defined actions always over-ride earlier ones. For multi-valued - actions, the actions are applied in the order they are specified. +Typical uses: - The list of valid Privoxy "actions" are: + Explicitly block the web server from storing cookies on your system. - * Add the specified HTTP header, which is not checked for validity. - You may specify this many times to specify many different headers: - +add-header{Name: value} - - * Block this URL totally. In a default installation, a "blocked" URL - will result in bright red banner that says "BLOCKED", with a - reason why it is being blocked. - +block - - * De-animate all animated GIF images, i.e. reduce them to their last - frame. This will also shrink the images considerably (in bytes, - not pixels!). 
If the option "first" is given, the first frame of - the animation is used as the replacement. If "last" is given, the - last frame of the animation is used instead, which probably makes - more sense for most banner animations, but also has the risk of - not showing the entire last frame (if it is only a delta to an - earlier frame). - +deanimate-gifs{last} - +deanimate-gifs{first} - - * "+downgrade" will downgrade HTTP/1.1 client requests to HTTP/1.0 - and downgrade the responses as well. Use this action for servers - that use HTTP/1.1 protocol features that Privoxy doesn't handle - well yet. HTTP/1.1 is only partially implemented. Default is not - to downgrade requests. - +downgrade - - * Many sites, like yahoo.com, don't just link to other sites. - Instead, they will link to some script on their own server, giving - the destination as a parameter, which will then redirect you to - the final target. URLs resulting from this scheme typically look - like: http://some.place/some_script?http://some.where-else. - Sometimes, there are even multiple consecutive redirects encoded - in the URL. These redirections via scripts make your web browsing - more traceable, since the server from which you follow such a link - can see where you go to. Apart from that, valuable bandwidth and - time is wasted, while your browser ask the server for one redirect - after the other. Plus, it feeds the advertisers. - The "+fast-redirects" option enables interception of these - requests by Privoxy, who will cut off all but the last valid URL - in the request and send a local redirect back to your browser - without contacting the remote site. - +fast-redirects - - * Apply the filters in the section_header section of the - default.filter file to the site(s). default.filter sections are - grouped according to like functionality. 
- +filter{section_header} - - Filter sections that are pre-defined in the supplied - default.filter include: - - html-annoyances: Get rid of particularly annoying HTML abuse. +Possible values: - js-annoyances: Get rid of particularly annoying JavaScript abuse + N/A - no-poups: Kill all popups in JS and HTML +Example usage: - frameset-borders: Give frames a border + {+prevent-setting-cookies} + .example.com + - webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking) +Notes: - no-refresh: Automatic refresh sucks on auto-dialup lines + Often used in conjunction with "+prevent-reading-cookies" to disable + cookies completely (see above). - fun: Text replacements for subversive browsing fun! +------------------------------------------------------------------------------- + +9.5.18. +kill-popups + +Type: - nimda: Remove (virus) Nimda code. + Boolean. - banners-by-size: Kill banners by size +Typical uses: - crude-parental: Kill all web pages that contain the words "sex" or - "warez" + Stop those annoying JavaScript pop-up windows! - * Block any existing X-Forwarded-for header, and do not add a new - one: - +hide-forwarded - - * If the browser sends a "From:" header containing your e-mail - address, this either completely removes the header ("block"), or - changes it to the specified e-mail address. - +hide-from{block} - +hide-from{spam@sittingduck.xqq} - - * Don't send the "Referer:" (sic) header to the web site. You can - block it, forge a URL to the same server as the request (which is - preferred because some sites will not send images otherwise) or - set it to a constant string of your choice. - +hide-referer{block} - +hide-referer{forge} - +hide-referer{http://nowhere.com} - - * Alternative spelling of "+hide-referer". It has the same - parameters, and can be freely mixed with, "+hide-referer". - ("referrer" is the correct English spelling, however the HTTP - specification has a bug - it requires it to be spelled "referer".) 
- +hide-referrer{...} - - * Change the "User-Agent:" header so web servers can't tell your - browser type. Warning! This breaks many web sites. Specify the - user-agent value you want. Example, pretend to be using Netscape - on Linux: - +hide-user-agent{Mozilla (X11; I; Linux 2.0.32 i586)} - - * Treat this URL as an image. This only matters if it's also - "+block"ed, in which case a "blocked" image can be sent rather - than a HTML page. See "+image-blocker{}" below for the control - over what is actually sent. If you want invisible ads, they should - be defined as images and blocked. And also, "image-blocker" should - be set to "blank". - +image - - * Decides what to do with URLs that end up tagged with "{+block - +image}", e.g an advertizement. There are five options. - "-image-blocker" will send a HTML "blocked" page, usually - resulting in a "broken image" icon. "+image-blocker{logo}" will - send a Privoxy logo image. "+image-blocker{blank}" will send a 1x1 - transparent GIF image. And finally, - "+image-blocker{http://xyz.com}" will send a HTTP temporary - redirect to the specified image. This has the advantage of the - icon being being cached by the browser, which will speed up the - display. "+image-blocker{pattern}" will send a checkboard type - pattern, which scales better than the logo (which can get blocky - if the browser enlarges it too much). - +image-blocker{logo} - +image-blocker{blank} - +image-blocker{pattern} - +image-blocker{http://i.j.b/send-banner} - - * By default (i.e. in the absence of a "+limit-connect" action), - Privoxy will only allow CONNECT requests to port 443, which is the - standard port for https as a precaution. - The CONNECT methods exists in HTTP to allow access to secure - websites (https:// URLs) through proxies. It works very simply: - the proxy connects to the server on the specified port, and then - short-circuits its connections to the client and to the remote - proxy. 
This can be a big security hole, since CONNECT-enabled - proxies can be abused as TCP relays very easily. - If you want to allow CONNECT for more ports than this, or want to - forbid CONNECT altogether, you can specify a comma separated list - of ports and port ranges (the latter using dashes, with the - minimum defaulting to 0 and max to 65K): - +limit-connect{443} # This is the default and need no be - specified. - +limit-connect{80,443} # Ports 80 and 443 are OK. - +limit-connect{-3, 7, 20-100, 500-} # Port less than 3, 7, 20 to - 100 - #and above 500 are OK. - - * "+no-compression" prevents the website from compressing the data. - Some websites do this, which can be a problem for Privoxy, since - "+filter", "+no-popup" and "+gif-deanimate" will not work on - compressed data. This will slow down connections to those - websites, though. Default is "nocompression" is turned on. - +nocompression - - * If the website sets cookies, "no-cookies-keep" will make sure they - are erased when you exit and restart your web browser. This makes - profiling cookies useless, but won't break sites which require - cookies so that you can log in for transactions. Default: on. - +no-cookies-keep - - * Prevent the website from reading cookies: - +no-cookies-read - - * Prevent the website from setting cookies: - +no-cookies-set - - * Filter the website through a built-in filter to disable those - obnoxious JavaScript pop-up windows via window.open(), etc. The - two alternative spellings are equivalent. - +no-popup - +no-popups - - * This action only applies if you are using a jarfile for saving - cookies. It sends a cookie to every site stating that you do not - accept any copyright on cookies sent to you, and asking them not - to track you. Of course, this is a (relatively) unique header they - could use to track you. - +vanilla-wafer - - * This allows you to add an arbitrary cookie. It can be specified - multiple times in order to add as many cookies as you like. 
- +wafer{name=value} - - The meaning of any of the above is reversed by preceding the action - with a "-", in place of the "+". - - Some examples: - - Turn off cookies by default, then allow a few through for specified - sites: - - # Turn off all persistent cookies - { +no-cookies-read } - { +no-cookies-set } - # Allow cookies for this browser session ONLY - { +no-cookies-keep } - # Exceptions to the above, sites that benefit from persistent cookies - { -no-cookies-read } - { -no-cookies-set } - { -no-cookies-keep } - .javasoft.com - .sun.com - .yahoo.com - .msdn.microsoft.com - .redhat.com - # Alternative way of saying the same thing - {-no-cookies-set -no-cookies-read -no-cookies-keep} - .sourceforge.net - .sf.net - - Now turn off "fast redirects", and then we allow two exceptions: - - # Turn them off! - {+fast-redirects} - - # Reverse it for these two sites, which don't work right without it. - {-fast-redirects} - www.ukc.ac.uk/cgi-bin/wac\.cgi\? - login.yahoo.com - - Turn on page filtering according to rules in the defined sections of - refilterfile, and make one exception for sourceforge: - - # Run everything through the filter file, using only the - # specified sections: - +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}\ - +filter{webbugs} +filter{nimda} +filter{banners-by-size} - - # Then disable filtering of code from sourceforge! - {-filter} - .cvs.sourceforge.net - - Now some URLs that we want "blocked", ie we won't see them. 
Many of - these use regular expressions that will expand to match multiple URLs: - - # Blocklist: - {+block} - /.*/(.*[-_.])?ads?[0-9]?(/|[-_.].*|\.(gif|jpe?g)) - /.*/(.*[-_.])?count(er)?(\.cgi|\.dll|\.exe|[?/]) - /.*/(ng)?adclient\.cgi - /.*/(plain|live|rotate)[-_.]?ads?/ - /.*/(sponsor)s?[0-9]?/ - /.*/_?(plain|live)?ads?(-banners)?/ - /.*/abanners/ - /.*/ad(sdna_image|gifs?)/ - /.*/ad(server|stream|juggler)\.(cgi|pl|dll|exe) - /.*/adbanners/ - /.*/adserver - /.*/adstream\.cgi - /.*/adv((er)?ts?|ertis(ing|ements?))?/ - /.*/banner_?ads/ - /.*/banners?/ - /.*/banners?\.cgi/ - /.*/cgi-bin/centralad/getimage - /.*/images/addver\.gif - /.*/images/marketing/.*\.(gif|jpe?g) - /.*/popupads/ - /.*/siteads/ - /.*/sponsor.*\.gif - /.*/sponsors?[0-9]?/ - /.*/advert[0-9]+\.jpg - /Media/Images/Adds/ - /ad_images/ - /adimages/ - /.*/ads/ - /bannerfarm/ - /grafikk/annonse/ - /graphics/defaultAd/ - /image\.ng/AdType - /image\.ng/transactionID - /images/.*/.*_anim\.gif # alvin brattli - /ip_img/.*\.(gif|jpe?g) - /rotateads/ - /rotations/ - /worldnet/ad\.cgi - /cgi-bin/nph-adclick.exe/ - /.*/Image/BannerAdvertising/ - /.*/ad-bin/ - /.*/adlib/server\.cgi - /autoads/ - - Note that many of these actions have the potential to cause a page to - misbehave, possibly even not to display at all. There are many ways a - site designer may choose to design his site, and what HTTP header - content he may depend on. There is no way to have hard and fast rules - for all sites. See the [52]Appendix for a brief example on - troubleshooting actions. - _________________________________________________________________ - -3.4.3. Aliases - - Custom "actions", known to Privoxy as "aliases", can be defined by - combining other "actions". These can in turn be invoked just like the - built-in "actions". Currently, an alias can contain any character - except space, tab, "=", "{" or "}". But please use only "a"- "z", - "0"-"9", "+", and "-". 
Alias names are not case sensitive, and must be - defined before anything else in the default.actionfile ! And there can - only be one set of "aliases" defined. - - Now let's define a few aliases: - - # Useful customer aliases we can use later. These must come first! - {{alias}} - +no-cookies = +no-cookies-set +no-cookies-read - -no-cookies = -no-cookies-set -no-cookies-read - fragile = -block -no-cookies -filter -fast-redirects -hide-refere - r -no-popups - shop = -no-cookies -filter -fast-redirects - +imageblock = +block +image - #For people who don't like to type too much: ;-) - c0 = +no-cookies - c1 = -no-cookies - c2 = -no-cookies-set +no-cookies-read - c3 = +no-cookies-set -no-cookies-read - #... etc. Customize to your heart's content. - - Some examples using our "shop" and "fragile" aliases from above: - - # These sites are very complex and require - # minimal interference. - {fragile} - .office.microsoft.com - .windowsupdate.microsoft.com - .nytimes.com - # Shopping sites - still want to block ads. - {shop} - .quietpc.com - .worldpay.com # for quietpc.com - .jungle.com - .scan.co.uk - # These shops require pop-ups - {shop -no-popups} - .dabs.com - .overclockers.co.uk - _________________________________________________________________ - -3.5. The Filter File - - Any web page can be dynamically modified with the filter file. This - modification can be removal, or re-writing, of any web page content, - including tags and non-visible content. The default filter file is - default.filter, located in the config directory. - - The included example file is divided into sections. Each section - begins with the FILTER keyword, followed by the identifier for that - section, e.g. "FILTER: webbugs". Each section performs a similar type - of filtering, such as "html-annoyances". - - This file uses regular expressions to alter or remove any string in - the target page. The expressions can only operate on one line at a - time. 
Some examples from the included default default.filter: - - Stop web pages from displaying annoying messages in the status bar by - deleting such references: - - FILTER: html-annoyances - # New browser windows should be resizeable and have a location and st - atus - # bar. Make it so. - # - s/resizable="?(no|0)"?/resizable=1/ig s/noresize/yesresize/ig - s/location="?(no|0)"?/location=1/ig s/status="?(no|0)"?/status=1/ig - s/scrolling="?(no|0|Auto)"?/scrolling=1/ig - s/menubar="?(no|0)"?/menubar=1/ig - # The tag was a crime! - # - s*|**ig - # Is this evil? - # - #s/framespacing="?(no|0)"?//ig - #s/margin(height|width)=[0-9]*//gi - - Just for kicks, replace any occurrence of "Microsoft" with - "MicroSuck", and have a little fun with topical buzzwords: - - FILTER: fun - s/microsoft(?!.com)/MicroSuck/ig - # Buzzword Bingo: - # - s/industry-leading|cutting-edge|award-winning/BING - O!/ig - - Kill those pesky little web-bugs: - - # webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking) - FILTER: webbugs - s/]*?(width|height)\s*=\s*['"]?1\D[^>]*?(width|height)\s*=\ - s*['"]?1(\D[^>]*?)?>//sig - _________________________________________________________________ - -3.6. Templates - - When Privoxy displays one of its internal pages, such as a 404 Not - Found error page, it uses the appropriate template. On Linux, BSD, and - Unix, these are located in /etc/privoxy/templates by default. These - may be customized, if desired. - _________________________________________________________________ - -4. Quickstart to Using Privoxy - - Install package, then run and enjoy! Privoxy is typically started by - specifying the main configuration file to be used on the command line. - Example Unix startup command: - - - # /usr/sbin/privoxy /etc/privoxy/config - - - An init script is provided for SuSE and Redhat. 
- - For for SuSE: /etc/rc.d/privoxy start - - For RedHat: /etc/rc.d/init.d/privoxy start - - If no configuration file is specified on the command line, Privoxy - will look for a file named config in the current directory. Except on - Win32 where it will try config.txt. If no file is specified on the - command line and no default configuration file can be found, Privoxy - will fail to start. - - Be sure your browser is set to use the proxy which is by default at - localhost, port 8118. With Netscape (and Mozilla), this can be set - under Edit -> Preferences -> Advanced -> Proxies -> HTTP Proxy. For - Internet Explorer: Tools > Internet Properties -> Connections -> LAN - Setting. Then, check "Use Proxy" and fill in the appropriate info - (Address: localhost, Port: 8118). Include if HTTPS proxy support too. - - The included default configuration files should give a reasonable - starting point, though may be somewhat aggressive in blocking junk. - You will probably want to keep an eye out for sites that require - persistent cookies, and add these to default.action as needed. By - default, most of these will be accepted only during the current - browser session, until you add them to the configuration. If you want - the browser to handle this instead, you will need to edit - default.action and disable this feature. If you use more than one - browser, it would make more sense to let Privoxy handle this. In which - case, the browser(s) should be set to accept all cookies. - - If a particular site shows problems loading properly, try adding it to - the {fragile} section of default.action. This will turn off most - actions for this site. - - Privoxy is HTTP/1.1 compliant, but not all 1.1 features are as yet - implemented. If browsers that support HTTP/1.1 (like Mozilla or recent - versions of I.E.) experience problems, you might try to force HTTP/1.0 - compatibility. For Mozilla, look under Edit -> Preferences -> Debug -> - Networking. 
Or set the "+downgrade" config option in default.action. - - After running Privoxy for a while, you can start to fine tune the - configuration to suit your personal, or site, preferences and - requirements. There are many, many aspects that can be customized. - "Actions" (as specified in default.action) can be adjusted by pointing - your browser to [53]http://i.j.b/, and then follow the link to "edit - the actions list". (This is an internal page and does not require - Internet access.) - - In fact, various aspects of Privoxy configuration can be viewed from - this page, including current configuration parameters, source code - version numbers, the browser's request headers, and "actions" that - apply to a given URL. In addition to the default.action file editor - mentioned above, Privoxy can also be turned "on" and "off" from this - page. - - If you encounter problems, please verify it is a Privoxy bug, by - disabling Privoxy, and then trying the same page. Also, try another - browser if possible to eliminate browser or site problems. Before - reporting it as a bug, see if there is not a configuration option that - is enabled that is causing the page not to load. You can then add an - exception for that page or site. If a bug, please report it to the - developers (see below). - _________________________________________________________________ - -4.1. Command Line Options - - Privoxy may be invoked with the following command-line options: - - * --version - Print version info and exit, Unix only. - * --help - Print a short usage info and exit, Unix only. - * --no-daemon - Don't become a daemon, i.e. don't fork and become process group - leader, don't detach from controlling tty. Unix only. - * --pidfile FILE - On startup, write the process ID to FILE. Delete the FILE on exit. - Failiure to create or delete the FILE is non-fatal. If no FILE - option is given, no PID file will be used. Unix only. 
- * --user USER[.GROUP] - After (optionally) writing the PID file, assume the user ID of - USER, and if included the GID of GROUP. Exit if the privileges are - not sufficient to do so. Unix only. - * configfile - If no configfile is included on the command line, Privoxy will - look for a file named "config" in the current directory (except on - Win32 where it will look for "config.txt" instead). Specify full - path to avoid confusion. - _________________________________________________________________ - -5. Contacting the Developers, Bug Reporting and Feature Requests - - We value your feedback. However, to provide you with the best support, - please note: - - * Use the [54]Sourceforge support forum to get help. - * Submit bugs only thru our [55]Sourceforge bug forum. Make sure - that the bug has not already been submitted. Please try to verify - that it is a Privoxy bug, and not a browser or site bug first. If - you are using your own custom configuration, please try the stock - configs to see if the problem is a configuration related bug. And - if not using the latest development snapshot, please try the - latest one. Or even better, CVS sources. - * Submit feature requests only thru our [56]Sourceforge feature - request forum. - - For any other issues, feel free to use the [57]mailing lists. +Possible values: + + N/A - Anyone interested in actively participating in development and related - discussions can join the appropriate mailing list [58]here. Archives - are available here too. - _________________________________________________________________ +Example usage: -6. Copyright and History + {+kill-popups} + .example.com + + +Notes: + + "+kill-popups" uses a built in filter to disable pop-ups that use the + window.open() function, etc. This is one of the first actions processed by + Privoxy as it contacts the remote web server. This action is not always + 100% reliable, and is supplemented by "+filter{popups}". 
+ +------------------------------------------------------------------------------- -6.1. License +9.5.19. +send-vanilla-wafer - Privoxy is free software; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by the - Free Software Foundation; either version 2 of the License, or (at your - option) any later version. +Type: - This program is distributed in the hope that it will be useful, but - WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - General Public License for more details, which is available from - [59]the Free Software Foundation, Inc, 59 Temple Place - Suite 330, - Boston, MA 02111-1307, USA. - _________________________________________________________________ + Boolean. -6.2. History - - Junkbuster was originally written by Anonymous Coders and - [60]Junkbuster's Corporation, and was released as free open-source - software under the GNU GPL. [61]Stefan Waldherr made many - improvements, and started the [62]SourceForge project Privoxy to - rekindle development. There are now several active developers - contributing. The last stable release was v2.0.2, which has now grown - whiskers ;-). - _________________________________________________________________ +Typical uses: -7. See also + Sends a cookie for every site stating that you do not accept any copyright + on cookies sent to you, and asking them not to track you. + +Possible values: + + N/A + +Example usage: + + {+send-vanilla-wafer} + .example.com + + +Notes: + + This action only applies if you are using a jarfile for saving cookies. Of + course, this is a (relatively) unique header and could conceivably be used + to track you. + +------------------------------------------------------------------------------- + +9.5.20. +send-wafer - [63]http://sourceforge.net/projects/ijbswa +Type: - [64]http://ijbswa.sourceforge.net/ + Multi-value. 
- [65]http://i.j.b/ +Typical uses: - [66]http://www.junkbusters.com/ht/en/cookies.html - - [67]http://www.waldherr.org/junkbuster/ + This allows you to send an arbitrary, user definable cookie. - [68]http://privacy.net/analyze/ - - [69]http://www.squid-cache.org/ - _________________________________________________________________ - -8. Appendix - -8.1. Regular Expressions - - Privoxy can use "regular expressions" in various config files. - Assuming support for "pcre" (Perl Compatible Regular Expressions) is - compiled in, which is the default. Such configuration directives do - not require regular expressions, but they can be used to increase - flexibility by matching a pattern with wild-cards against URLs. - - If you are reading this, you probably don't understand what "regular - expressions" are, or what they can do. So this will be a very brief - introduction only. A full explanation would require a book ;-) - - "Regular expressions" is a way of matching one character expression - against another to see if it matches or not. One of the "expressions" - is a literal string of readable characters (letter, numbers, etc), and - the other is a complex string of literal characters combined with - wild-cards, and other special characters, called meta-characters. The - "meta-characters" have special meanings and are used to build the - complex pattern to be matched against. Perl Compatible Regular - Expressions is an enhanced form of the regular expression language - with backward compatibility. - - To make a simple analogy, we do something similar when we use - wild-card characters when listing files with the dir command in DOS. - *.* matches all filenames. The "special" character here is the - asterisk which matches any and all characters. We can be more specific - and use ? to match just individual characters. So "dir file?.text" - would match "file1.txt", "file2.txt", etc. We are pattern matching, - using a similar technique to "regular expressions"! 
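The wildcard analogy above can be tried out with a short Python sketch. It uses the standard fnmatch module, which follows shell-style rules rather than the exact behaviour of the DOS dir command, and the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only to illustrate the "?" wildcard.
names = ["file1.txt", "file2.txt", "file10.txt", "notes.txt"]

# "?" stands for exactly one character, so "file10.txt" is not selected.
matches = [name for name in names if fnmatchcase(name, "file?.txt")]
print(matches)
```

Only the two names with a single character between "file" and ".txt" are selected.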
- - Regular expressions do essentially the same thing, but are much, much - more powerful. There are many more "special characters" and ways of - building complex patterns however. Let's look at a few of the common - ones, and then some examples: - - . - Matches any single character, e.g. "a", "A", "4", ":", or "@". - - ? - The preceding character or expression is matched ZERO or ONE - times. Either/or. - - + - The preceding character or expression is matched ONE or MORE - times. - - * - The preceding character or expression is matched ZERO or MORE - times. - - \ - The "escape" character denotes that the following character should - be taken literally. This is used where one of the special characters - (e.g. ".") needs to be taken literally and not as a special - meta-character. - - [] - Characters enclosed in brackets will be matched if any of the - enclosed characters are encountered. - - () - parentheses are used to group a sub-expression, or multiple - sub-expressions. - - | - The "bar" character works like an "or" conditional statement. A - match is successful if the sub-expression on either side of "|" - matches. - - s/string1/string2/g - This is used to rewrite strings of text. - "string1" is replaced by "string2" in this example. - - These are just some of the ones you are likely to use when matching - URLs with Privoxy, and is a long way from a definitive list. This is - enough to get us started with a few simple examples which may be more - illuminating: - - /.*/banners/.* - A simple example that uses the common combination of - "." and "*" to denote any character, zero or more times. In other - words, any string at all. So we start with a literal forward slash, - then our regular expression pattern (".*") another literal forward - slash, the string "banners", another forward slash, and lastly another - ".*". We are building a directory path here. This will match any file - with the path that has a directory named "banners" in it. 
The ".*" - matches any characters, and this could conceivably be more forward - slashes, so it might expand into a much longer looking path. For - example, this could match: - "/eye/hate/spammers/banners/annoy_me_please.gif", or just - "/banners/annoying.html", or almost an infinite number of other - possible combinations, just so it has "banners" in the path somewhere. - - A now something a little more complex: - - /.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal - forward slashes again ("/"), so we are building another expression - that is a file path statement. We have another ".*", so we are - matching against any conceivable sub-path, just so it matches our - expression. The only true literal that must match our pattern is adv, - together with the forward slashes. What comes after the "adv" string - is the interesting part. - - Remember the "?" means the preceding expression (either a literal - character or anything grouped with "(...)" in this case) can exist or - not, since this means either zero or one match. So - "((er)?ts?|ertis(ing|ements?))" is optional, as are the individual - sub-expressions: "(er)", "(ing|ements?)", and the "s". The "|" means - "or". We have two of those. For instance, "(ing|ements?)", can expand - to match either "ing" OR "ements?". What is being done here, is an - attempt at matching as many variations of "advertisement", and - similar, as possible. So this would expand to match just "adv", or - "advert", or "adverts", or "advertising", or "advertisement", or - "advertisements". You get the idea. But it would not match - "advertizements" (with a "z"). We could fix that by changing our - regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", - which would then match either spelling. - - /.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with - forward slashes. Anything in the square brackets "[]" can be matched. 
- This is using "0-9" as a shorthand expression to mean any digit zero - through nine. It is the same as saying "0123456789". So any digit - matches. The "+" means one or more of the preceding expression must be - included. The preceding expression here is what is in the square - brackets -- in this case, any digit zero through nine. Then, at the - end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this - needs to match the expression on either side of that bar character. - A simple "gif" on one side, and the other side will in turn - match either "jpeg" or "jpg", since the "?" means the letter "e" is - optional and can be matched once or not at all. So we are building an - expression here to match a GIF or JPEG image file. It must - include the literal string "advert", then one or more digits, and a - "." (which is now a literal, and not a special character, since it is - escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some - possible matches would include: "//advert1.jpg", - "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It - would not match "advert1.gif" (no leading slash), or "/adverts232.jpg" - (the expression does not include an "s"), or "/advert1.jsp" ("jsp" is - not in the expression anywhere). - - s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck" - will replace any occurrence of "microsoft". The "i" at the end of the - expression means ignore case. The "(?!.com)" means the match should - fail if "microsoft" is followed by ".com". In other words, this acts - like a "NOT" modifier. In case this is a hyperlink, we don't want to - break it ;-). - - We are barely scratching the surface of regular expressions here so - that you can understand the default Privoxy configuration files, and - maybe use this knowledge to customize your own installation. There is - much, much more that can be done with regular expressions.
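As a rough check, the expressions discussed above can be tried in Python's re module, which uses the same syntax as PCRE for these constructs. This is only a sketch: anchoring the match at the start of the path, the helper name path_matches, and the sample paths under /images/ are assumptions made for illustration, not a description of how Privoxy matches URLs internally.

```python
import re

# Patterns copied from the examples above.
ADV_DIR = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
ADVERT_IMAGE = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")


def path_matches(pattern, path):
    # Assumption: the pattern is anchored at the start of the path.
    return pattern.match(path) is not None


# Hypothetical sample paths; only the directory name varies.
words = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements", "advertizements"]
adv_results = {w: path_matches(ADV_DIR, "/images/" + w + "/") for w in words}

image_paths = ["//advert1.jpg", "/nasty/ads/advert1234.gif",
               "/banners/from/hell/advert99.jpg",
               "advert1.gif", "/adverts232.jpg", "/advert1.jsp"]
image_results = {p: path_matches(ADVERT_IMAGE, p) for p in image_paths}

# The substitution example. re.sub replaces every occurrence, which
# corresponds to adding the "g" option to the s/// expression.
fun = re.compile(r"microsoft(?!.com)", re.IGNORECASE)
rewritten = fun.sub("MicroSuck", "Microsoft news at www.microsoft.com")

print(adv_results)
print(image_results)
print(rewritten)
```

Every spelling except "advertizements" matches the first pattern, the first three image paths match the second, and the host name in "www.microsoft.com" is left alone by the substitution.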
Now that - you know enough to get started, you can learn more on your own :/ - - More reading on Perl Compatible Regular expressions: - [70]http://www.perldoc.com/perl5.6/pod/perlre.html - _________________________________________________________________ - -8.2. Privoxy's Internal Pages - - Since Privoxy proxies each requested web page, it is easy for Privoxy - to trap certain URLs. In this way, we can talk directly to Privoxy, - and see how it is configured, see how our rules are being applied, - change these rules and other configuration options, and even turn - Privoxy's filtering off, all with a web browser. - - The URLs listed below are the special ones that allow direct access to - Privoxy. Of course, Privoxy must be running to access these. If not, - you will get a friendly error message. Internet access is not - necessary either. - - * Privoxy main page: - - [71]http://ijbswa.sourceforge.net/config/ - Alternately, this may be reached at [72]http://i.j.b/, but this - variation may not work as reliably as the above in some - configurations. - * Show information about the current configuration: - - [73]http://ijbswa.sourceforge.net/config/show-status - * Show the source code version numbers: - - [74]http://ijbswa.sourceforge.net/config/show-version - * Show the client's request headers: - - [75]http://ijbswa.sourceforge.net/config/show-request - * Show which actions apply to a URL and why: - - [76]http://ijbswa.sourceforge.net/config/show-url-info - * Toggle Privoxy on or off: - - [77]http://ijbswa.sourceforge.net/config/toggle - Short cuts. Turn off, then on: - - [78]http://ijbswa.sourceforge.net/config/toggle?set=disable - - [79]http://ijbswa.sourceforge.net/config/toggle?set=enable - * Edit the actions list file: - - [80]http://ijbswa.sourceforge.net/config/edit-actions - - These may be bookmarked for quick reference. - _________________________________________________________________ - -8.3. 
Anatomy of an Action - - The way Privoxy applies "actions" to any given URL can be complex, and - not always so easy to understand what is happening. And sometimes we - need to be able to see just what Privoxy is doing. Especially, if - something Privoxy is doing is causing us a problem inadvertantly. It - can be a little daunting to look at the actions files themselves, - since they tend to be filled with "regular expressions" whose - consequences are not always so obvious. Privoxy provides the - [81]http://ijbswa.sourceforge.net/config/show-url-info page that can - show us very specifically how actions are being applied to any given - URL. This is a big help for troubleshooting. - - First, enter one URL (or partial URL) at the prompt, and then Privoxy - will tell us how current configuration will handle it. This will not - help with filtering effects from the default.filter! It also will not - tell you about any other URLs that may be embedded within the URL you - are testing. For instance, images such as ads are expressed as URLs - within the raw page source of HTML pages. So you will only get info - for the actual URL that is pasted into the prompt area -- not any - sub-URLs. If you want to know about embedded URLs like ads, you will - have to dig those out of the HTML source. Use your browser's "View - Page Source" option for this. - - Let's look at an example, [82]google.com, one section at a time: - - System default actions: - - { -add-header -block -deanimate-gifs -downgrade -fast-redirects -filter - -hide-forwarded -hide-from -hide-referer -hide-user-agent -image - -image-blocker -limit-connect -no-compression -no-cookies-keep - -no-cookies-read -no-cookies-set -no-popups -vanilla-wafer -wafer } - - - This is the top section, and only tells us of the compiled in - defaults. This is basically what Privoxy would do if there were not - any "actions" defined, i.e. it does nothing. Every action is disabled. 
- This is not particularly informative for our purposes here. OK, next - section: - - Matches for http://google.com: - - { -add-header -block +deanimate-gifs -downgrade +fast-redirects - +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} - +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} - +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} - -hide-user-agent -image +image-blocker{blank} +no-compression - +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups - -vanilla-wafer -wafer } - / - - { -no-cookies-keep -no-cookies-read -no-cookies-set } - .google.com +Possible values: + + User-specified cookie name and corresponding value. + +Example usage: + + {+send-wafer{name=value}} + .example.com + + +Notes: + + This can be specified multiple times in order to add as many cookies as you + like. + +------------------------------------------------------------------------------- +9.5.21. Summary + +Note that many of these actions have the potential to cause a page to +misbehave, possibly even not to display at all. A site designer may design a +site in many ways, and may depend on particular HTTP header content or other +criteria. There is no way to have hard and fast rules for all +sites. See the Appendix for a brief example on troubleshooting actions. + +------------------------------------------------------------------------------- + +9.5.22. Sample Actions Files + +Remember that the meaning of any of the above references is reversed by +preceding the action with a "-", in place of the "+". Also, some actions +are turned on in the default section of the actions file, and require little to +no additional configuration. These are just "on". + +But other actions that are set in the default section typically +require exceptions to be listed in the later sections of one of our actions +files. For instance, by default no URLs are "blocked" (i.e.
in the default +definitions of default.action). We need exceptions to this in order to enable +ad blocking in the lower sections. But we need to be very selective about what +we do block. Thus, the default is "off" for blocking. + +Below is a liberally commented sample default.action file to demonstrate how +all the pieces come together. And to show how exceptions to the default +policies can be handled. This is followed by a brief user.action with similar +examples. + +# Sample default.action file + +# Settings -- Don't change! For internal Privoxy use ONLY. +{{settings}} +for-privoxy-version=3.0 + + +########################################################################## +# Aliases must be defined *before* they are used. These are +# easier to remember, and can combine several actions into one. Once +# defined they can be used just like any built-in action -- but within +# this file only! Aliases do not require a + or - sign. +########################################################################## + +# Some useful aliases. +# Alias to turn off cookie handling, ie allow all cookies unmolested. + -prevent-cookies = -prevent-setting-cookies -prevent-reading-cookies \ + -session-cookies-only + +# Alias to both block and treat as if an image for ad blocking +# purposes. + +imageblock = +block +handle-as-image + +# Fragile sites should have the minimum changes: + fragile = -block -deanimate-gifs -fast-redirects -filter -hide-referer \ + -prevent-cookies -kill-popups + +# Shops should be allowed to set persistent cookies + shop = -filter -prevent-cookies -session-cookies-only + + +########################################################################## +# Begin default action settings. Anything in this section will match +# all URLs -- UNLESS we have exceptions that also match, defined below this +# section. We will show all potential actions here whether they are on +# or off. 
We could omit any disabled action if we wanted, since all +# actions are 'off' by default anyway. Shown for completeness only. +# Actions are enabled if preceded by a '+', otherwise they are disabled +# (unless an alias has been defined without this). +########################################################################## + { \ + -add-header \ + -block \ + -deanimate-gifs \ + -downgrade-http-version \ + +fast-redirects \ + +filter{html-annoyances} \ + +filter{js-annoyances} \ + -filter{content-cookies} \ + -filter{popups} \ + +filter{webbugs} \ + -filter{refresh-tags} \ + -filter{fun} \ + +filter{nimda} \ + +filter{banners-by-size} \ + -filter{shockwave-flash} \ + -filter{crude-parental} \ + +hide-forwarded-for-headers \ + +hide-from-header{block} \ + -hide-referer \ + -hide-user-agent \ + -handle-as-image \ + +set-image-blocker{pattern} \ + -limit-connect \ + +prevent-compression \ + -session-cookies-only \ + -prevent-reading-cookies \ + -prevent-setting-cookies \ + -kill-popups \ + -send-vanilla-wafer \ + -send-wafer \ + } + / # forward slash will match *all* potential URL patterns. + +########################################################################## +# Default behavior is now set. Now we will define some exceptions to our +# default action policies. +########################################################################## + +# These sites are very complex and require very minimal interference. +# We'll disable most actions with our 'fragile' alias: + { fragile } + .office.microsoft.com # surprise, surprise! + .windowsupdate.microsoft.com + + +# Shopping sites - not as fragile but require some special +# handling. We still want to block ads, and we will allow +# persistent cookies via the 'shop' alias: + { shop } + .quietpc.com + .worldpay.com # for quietpc.com + .jungle.com + .scan.co.uk + + +# These sites require pop-ups too :( We'll combine our 'shop' +# alias with two other actions into one rule to allow all popups.
+ { shop -kill-popups -filter{popups} } + .dabs.com + .overclockers.co.uk + + +# The 'Fast-redirects' action breaks some sites. Disable this action +# for these known sensitive sites: { -fast-redirects } - .google.com + login.yahoo.com + edit.europe.yahoo.com + .google.com + .altavista.com/.*(like|url|link):http + .altavista.com/trans.*urltext=http + .nytimes.com + + +# Define which file types will be treated as images. Important +# for ad blocking. + { +handle-as-image } + /.*\.(gif|jpe?g|png|bmp|ico) + + +# Now let's list some domains that are known ad generators. And +# our alias that we use here will block these as well as force +# them to be treated as images. This combination of actions is +# important for ad blocking. What the browser will show instead is +# determined by the setting of "+set-image-blocker" + { +imageblock } + ar.atwola.com + .ad.doubleclick.net + .a.yimg.com/(?:(?!/i/).)*$ + .a[0-9].yimg.com/(?:(?!/i/).)*$ + bs*.gsanet.com + bs*.einets.com + .qkimg.net + ad.*.doubleclick.net + + +# These will just simply be blocked. They will generate the BLOCKED +# banner page, if matched. Heavy use of wildcards and regular +# expressions in this example. Enable block action: + { +block } + ad*. + .*ads. + banner?. + count*. + /.*count(er)?\.(pl|cgi|exe|dll|asp|php[34]?) + /(?:.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?)/ + .hitbox.com + + +# The above block section will probably inadvertently catch some +# sites we DO NOT want blocked via the wildcards and regular expressions. +# Now let's set exceptions to the exceptions so the good guys get better +# treatment. Disable block action: + { -block } + advogato.org + adsl. + ad[ud]*. + advice. +# Let's just trust all .edu top level domains. + .edu + www.ugu.com/sui/ugu/adv +# We'll need access to path names containing 'download' + .*downloads.
+ /downloads/ +# 'adv' is for globalintersec and means advanced, not advertisement + www.globalintersec.com/adv + + +# Don't filter *anything* from our friends at sourceforge. +# Notice we don't have to name the individual filter +# identifiers -- we just turn them all off in one fell swoop. +# Disable all filters for this one site: + { -filter } + .sourceforge.net + + +So far we are painting with a broad brush by setting general policies. The +above would be a reasonable starting point for many situations. Now, we want to +be more specific and have customized rules that are more suitable to our +personal habits and preferences. These would be for narrowly defined situations +like your ISP or your bank, and should be placed in user.action, which is +parsed after all other actions files and should not be clobbered by upgrades. +So any settings here will have the last word and override any previously +defined actions. + +Now a few examples of some things that one might do with a user.action file. + +# Sample user.action file. + +# Any aliases you want to use need to be re-defined here. +# Alias to turn off cookie handling, i.e. allow all cookies unmolested. + -prevent-cookies = -prevent-setting-cookies -prevent-reading-cookies \ + -session-cookies-only + +# Fragile sites should have the minimum changes: + fragile = -block -deanimate-gifs -fast-redirects -filter -hide-referer \ + -prevent-cookies -kill-popups + +# Allow persistent cookies for a few regular sites that we +# trust via our above alias. These will be saved from one browser session +# to the next. We are explicitly turning off any and all cookie handling, +# even though the prevent-*-cookie settings were disabled in our above +# default.action anyway. So cookies from these domains will come through +# unmolested. + { -prevent-cookies } + .sun.com + .yahoo.com + .msdn.microsoft.com + .redhat.com + + +# My ISP uses obnoxious self-promoting images on many pages.
+# Nuke them :) Note that "+handle-as-image" need not be specified, +# since all URLs ending in .gif will be tagged as images by the +# general rules in default.action anyway. + { +block } + www.my-isp-example.com/logo[0-9].gif + + +# Say the site where you do your homebanking needs to open +# popup windows, but you have chosen to kill popups by +# default. This will allow it for my-example-bank.com: +# + { -filter{popups} -kill-popups } + .my-example-bank.com + + +# This site is delicate, and requires kid-glove +# treatment. + { fragile } + .forbes.com + + +------------------------------------------------------------------------------- + +9.6. Aliases + +Custom "actions", known to Privoxy as "aliases", can be defined by combining +other "actions". These can in turn be invoked just like the built-in "actions". +Currently, an alias can contain any character except space, tab, "=", "{" or "}". +But please use only "a"-"z", "0"-"9", "+", and "-". Alias names are not +case sensitive, and must be defined before other actions in the actions file! +And there can only be one set of "aliases" defined per file. Each actions file +may have its own aliases, but they are only visible within that file. Aliases +do not require a "+" or "-" sign in front, since they are merely expanded. + +Now let's define a few aliases: + + # Useful custom aliases we can use later. These must come first! + {{alias}} + +prevent-cookies = +prevent-setting-cookies +prevent-reading-cookies + -prevent-cookies = -prevent-setting-cookies -prevent-reading-cookies + fragile = -block -prevent-cookies -filter -fast-redirects -hide-referer -kill-popups + shop = -prevent-cookies -filter -fast-redirects + +imageblock = +block +handle-as-image + + # Aliases defined from other aliases, for people who don't like to type + # too much: ;-) + c0 = +prevent-cookies + c1 = -prevent-cookies + #... etc. Customize to your heart's content. + +Some examples using our "shop" and "fragile" aliases from above.
These would +appear in the lower sections of an actions file as exceptions to the default +actions (as defined in the upper section): - This is much more informative, and tells us how we have defined our - "actions", and which ones match for our example, "google.com". The - first grouping shows our default settings, which would apply to all - URLs. If you look at your "actions" file, this would be the section - just below the "aliases" section near the top. This applies to all - URLs as signified by the single forward slash -- "/". + # These sites are very complex and require + # minimal interference. + {fragile} + .office.microsoft.com + .windowsupdate.microsoft.com + .nytimes.com + + # Shopping sites - but we still want to block ads. + {shop} + .quietpc.com + .worldpay.com # for quietpc.com + .scan.co.uk + + # These shops require pop-ups also + {shop -kill-popups} + .dabs.com + .overclockers.co.uk - These are the default actions we have enabled. But we can define - additional actions that would be exceptions to these general rules, - and then list specific URLs that these exceptions would apply to. Last - match wins. Just below this then are two explict matches for - ".google.com". The first is negating our various cookie blocking - actions (i.e. we will allow cookies here). The second is allowing - "fast-redirects". Note that there is a leading dot here -- - ".google.com". This will match any hosts and sub-domains, in the - google.com domain also, such as "www.google.com". So, apparently, we - have these actions defined somewhere in the lower part of our actions - file, and "google.com" is referenced in these sections. + +The "shop" and "fragile" aliases are often used for "problem" sites that +require most actions to be disabled in order to function properly. + +------------------------------------------------------------------------------- + +10. The Filter File + +Any web page can be dynamically modified with the filter file. 
This +modification can be removal, or re-writing, of any web page content, including +tags and non-visible content. The default filter file is, oddly enough, +default.filter, located in the config directory. + +This is potentially a very powerful feature, and requires knowledge of both +"regular expressions" and HTML in order to create custom filters. But, there are a +number of useful filters included with Privoxy for many common situations. + +The included example file is divided into sections. Each section begins with +the FILTER keyword, followed by the identifier for that section, e.g. "FILTER: +webbugs". Each section performs a similar type of filtering, such as +"html-annoyances". + +This file uses regular expressions to alter or remove any string in the target +page. The expressions can only operate on one line at a time. Some examples +from the included default.filter: + +Stop web pages from displaying annoying messages in the status bar by deleting +such references: + + FILTER: html-annoyances + + # New browser windows should be resizable and have a location and status + # bar. Make it so. + # + s/resizable="?(no|0)"?/resizable=1/ig s/noresize/yesresize/ig + s/location="?(no|0)"?/location=1/ig s/status="?(no|0)"?/status=1/ig + s/scrolling="?(no|0|Auto)"?/scrolling=1/ig + s/menubar="?(no|0)"?/menubar=1/ig + + # The <blink> tag was a crime! + # + s*<blink>|</blink>**ig + + # Is this evil?
+ # + #s/framespacing="?(no|0)"?//ig + #s/margin(height|width)=[0-9]*//gi - And now we pull it altogether in the bottom section and summarize how - Privoxy is appying all its "actions" to "google.com": + +Just for kicks, replace any occurrence of "Microsoft" with "MicroSuck", and +have a little fun with topical buzzwords: + + FILTER: fun + + s/microsoft(?!.com)/MicroSuck/ig + + # Buzzword Bingo: + # + s/industry-leading|cutting-edge|award-winning/BINGO!/ig - Final results: - -add-header -block -deanimate-gifs -downgrade -fast-redirects - +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} - +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} - +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} - -hide-user-agent -image +image-blocker{blank} -limit-connect +no-compression - -no-cookies-keep -no-cookies-read -no-cookies-set +no-popups -vanilla-wafer - -wafer +Kill those pesky little web-bugs: + # webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking) + FILTER: webbugs - Now another example, "ad.doubleclick.net": + s/]*?(width|height)\s*=\s*['"]?1\D[^>]*?(width|height)\s*=\s*['"]?1 +(\D[^>]*?)?>//sig - { +block +image } - .ad.doubleclick.net - { +block +image } - ad*. +------------------------------------------------------------------------------- + +10.1. The +filter Action + +Filters are enabled with the "+filter" action from within one of the actions +files. "+filter" requires one parameter, which should match one of the section +identifiers in the filter file itself. Example: + + +filter{html-annoyances} + +This would activate that particular filter. Similarly, "+filter" can be turned +off for selected sites as: "-filter{html-annoyances}". Remember too, all +actions are off by default, unless they are explicitly enabled in one of the +actions files. + +------------------------------------------------------------------------------- + +11.
Templates + +When Privoxy displays one of its internal pages, such as a 404 Not Found error +page (Privoxy must be running for link to work as intended), it uses the +appropriate template. On Linux, BSD, and Unix, these are located in /etc/ +privoxy/templates by default. These may be customized, if desired. +cgi-style.css is used to control the HTML attributes (fonts, etc). + +The default Blocked (Privoxy needs to be running for page to display) banner +page with the bright red top banner, is called just "blocked". This may be +customized or replaced with something else if desired. + +------------------------------------------------------------------------------- + +12. Contacting the Developers, Bug Reporting and Feature Requests + +We value your feedback. However, to provide you with the best support, please +note the following sections. + +------------------------------------------------------------------------------- + +12.1. Get Support + +To get support, use the Sourceforge Support Forum: - { +block +image } - .doubleclick.net + http://sourceforge.net/tracker/?group_id=11118&atid=211118 +------------------------------------------------------------------------------- - We'll just show the interesting part here, the explicit matches. It is - matched three different times. Each as an "+block +image", which is - the expanded form of one of our aliases that had been defined as: - "+imageblock". ("Aliases" are defined in the first section of the - actions file and typically used to combine more than one action.) +12.2. Report bugs + +To submit bugs, use the Sourceforge Bug Forum: + + http://sourceforge.net/tracker/?group_id=11118&atid=111118. + +Make sure that the bug has not already been submitted. Please try to verify +that it is a Privoxy bug, and not a browser or site bug first. If you are using +your own custom configuration, please try the stock configs to see if the +problem is a configuration related bug. 
And if not using the latest development +snapshot, please try the latest one. Or even better, CVS sources. Please be +sure to include the Privoxy version, platform, browser, any pertinent log data, +any other relevant details (please be specific) and, if possible, some way to +reproduce the bug. + +------------------------------------------------------------------------------- + +12.3. Request new features + +To submit ideas on new features, use the Sourceforge feature request forum: + + http://sourceforge.net/tracker/?atid=361118&group_id=11118&func=browse. + +------------------------------------------------------------------------------- + +12.4. Report ads or other filter problems + +You can also send feedback on websites that Privoxy has problems with. Please +bookmark the following link: "Privoxy - Submit Filter Feedback". Once you surf +to a page with problems, use the bookmark to send us feedback. We will look +into the issue as soon as possible. + +New, improved default.action files will occasionally be made available based on +your feedback. These will be announced on the ijbswa-announce list. + +------------------------------------------------------------------------------- + +12.5. Other + +For any other issues, feel free to use the mailing lists: + + http://sourceforge.net/mail/?group_id=11118. + +Anyone interested in actively participating in development and related +discussions can also join the appropriate mailing list. Archives are available, +too. See the page on Sourceforge. + +------------------------------------------------------------------------------- + +13. Copyright and History + +13.1. Copyright + +Privoxy is free software; you can redistribute it and/or modify it under the +terms of the GNU General Public License as published by the Free Software +Foundation; either version 2 of the License, or (at your option) any later +version. 
+ +This program is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A +PARTICULAR PURPOSE. See the GNU General Public License for more details, which +is available from the Free Software Foundation, Inc, 59 Temple Place - Suite +330, Boston, MA 02111-1307, USA. + +You should have received a copy of the GNU General Public License along with +this program; if not, write to the Free Software Foundation, Inc., 59 Temple +Place, Suite 330, Boston, MA 02111-1307 USA. + +------------------------------------------------------------------------------- + +13.2. History + +Privoxy evolved from, and is derived from, the Internet Junkbuster, with many +improvements and enhancements over the original. + +Junkbuster was originally written by Anonymous Coders and Junkbusters +Corporation, and was released as free open-source software under the GNU GPL. +Stefan Waldherr made many improvements, and started the SourceForge project +Privoxy to rekindle development. There are now several active developers +contributing. The last stable release of Junkbuster was v2.0.2, which has now +grown whiskers ;-). + +------------------------------------------------------------------------------- + +14. See Also + +Other references and sites of interest to Privoxy users: + +http://www.privoxy.org/, The Privoxy Home page. + +http://sourceforge.net/projects/ijbswa, the Project Page for Privoxy on +Sourceforge. + +http://p.p/, access Privoxy from your browser. Alternately, http:// +config.privoxy.org may work in some situations where the first does not. + +http://p.p/, and select "Privoxy - Submit Filter Feedback" to submit "misses" +to the developers. + +http://www.junkbusters.com/ht/en/cookies.html + +http://www.waldherr.org/junkbuster/ + +http://privacy.net/analyze/ + +http://www.squid-cache.org/ + + + +------------------------------------------------------------------------------- + +15. Appendix + +15.1.
Regular Expressions + +Privoxy can use "regular expressions" in various config files, assuming support +for "pcre" (Perl Compatible Regular Expressions) is compiled in, which is the +default. Such configuration directives do not require regular expressions, but +they can be used to increase flexibility by matching a pattern with wild-cards +against URLs. + +If you are reading this, you probably don't understand what "regular +expressions" are, or what they can do. So this will be a very brief +introduction only. A full explanation would require a book ;-) + +"Regular expressions" is a way of matching one character expression against +another to see if it matches or not. One of the "expressions" is a literal +string of readable characters (letters, numbers, etc.), and the other is a +complex string of literal characters combined with wild-cards, and other +special characters, called meta-characters. The "meta-characters" have special +meanings and are used to build the complex pattern to be matched against. Perl +Compatible Regular Expressions is an enhanced form of the regular expression +language with backward compatibility. + +To make a simple analogy, we do something similar when we use wild-card +characters when listing files with the dir command in DOS. *.* matches all +filenames. The "special" character here is the asterisk which matches any and +all characters. We can be more specific and use ? to match just individual +characters. So "dir file?.txt" would match "file1.txt", "file2.txt", etc. We +are pattern matching, using a similar technique to "regular expressions"! + +Regular expressions do essentially the same thing, but are much, much more +powerful. There are many more "special characters" and ways of building complex +patterns however. Let's look at a few of the common ones, and then some +examples: + +. - Matches any single character, e.g. "a", "A", "4", ":", or "@". + +? - The preceding character or expression is matched ZERO or ONE times.
Either/ +or. + ++ - The preceding character or expression is matched ONE or MORE times. + +* - The preceding character or expression is matched ZERO or MORE times. + +\ - The "escape" character denotes that the following character should be taken +literally. This is used where one of the special characters (e.g. ".") needs to +be taken literally and not as a special meta-character. Example: "example +\.com", makes sure the period is recognized only as a period (and not expanded +to its meta-character meaning of any single character). + +[] - Characters enclosed in brackets will be matched if any of the enclosed +characters are encountered. For instance, "[0-9]" matches any numeric digit +(zero through nine). As an example, we can combine this with "+" to match any +digit one or more times: "[0-9]+". + +() - parentheses are used to group a sub-expression, or multiple +sub-expressions. + +| - The "bar" character works like an "or" conditional statement. A match is +successful if the sub-expression on either side of "|" matches. As an example: +"/(this|that) example/" uses grouping and the bar character and would match +either "this example" or "that example", and nothing else. + +s/string1/string2/g - This is used to rewrite strings of text. "string1" is +replaced by "string2" in this example. There must of course be a match on +"string1" first. + +These are just some of the ones you are likely to use when matching URLs with +Privoxy, and are a long way from a definitive list. This is enough to get us +started with a few simple examples which may be more illuminating: + +/.*/banners/.* - A simple example that uses the common combination of "." and " +*" to denote any character, zero or more times. In other words, any string at +all. So we start with a literal forward slash, then our regular expression +pattern (".*") another literal forward slash, the string "banners", another +forward slash, and lastly another ".*". We are building a directory path here.
+This will match any file with the path that has a directory named "banners" in +it. The ".*" matches any characters, and this could conceivably be more forward +slashes, so it might expand into a much longer looking path. For example, this +could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or just "/ +banners/annoying.html", or almost an infinite number of other possible +combinations, just so it has "banners" in the path somewhere. + +And now something a little more complex: + +/.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal forward +slashes again ("/"), so we are building another expression that is a file path +statement. We have another ".*", so we are matching against any conceivable +sub-path, just so it matches our expression. The only true literal that must +match our pattern is adv, together with the forward slashes. What comes after +the "adv" string is the interesting part. + +Remember the "?" means the preceding expression (either a literal character or +anything grouped with "(...)" in this case) can exist or not, since this means +either zero or one match. So "((er)?ts?|ertis(ing|ements?))" is optional, as +are the individual sub-expressions: "(er)", "(ing|ements?)", and the "s". The " +|" means "or". We have two of those. For instance, "(ing|ements?)", can expand +to match either "ing" OR "ements?". What is being done here, is an attempt at +matching as many variations of "advertisement", and similar, as possible. So +this would expand to match just "adv", or "advert", or "adverts", or +"advertising", or "advertisement", or "advertisements". You get the idea. But +it would not match "advertizements" (with a "z"). We could fix that by changing +our regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which +would then match either spelling. + +/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward +slashes. Anything in the square brackets "[]" can be matched.
This is using +"0-9" as a shorthand expression to mean any digit zero through nine. It is the +same as saying "0123456789". So any digit matches. The "+" means one or more of +the preceding expression must be included. The preceding expression here is +what is in the square brackets -- in this case, any digit zero through nine. +Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so +this needs to match the expression on either side of that bar character also. A +simple "gif" on one side, and the other side will in turn match either "jpeg" +or "jpg", since the "?" means the letter "e" is optional and can be matched +once or not at all. So we are building an expression here to match GIF or +JPEG image files. It must include the literal string "advert", then one or +more digits, and a "." (which is now a literal, and not a special character, +since it is escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". +Some possible matches would include: "//advert1.jpg", "/nasty/ads/ +advert1234.gif", "/banners/from/hell/advert99.jpg". It would not match +"advert1.gif" (no leading slash), or "/adverts232.jpg" (the expression does not +include an "s"), or "/advert1.jsp" ("jsp" is not in the expression anywhere). + +s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck" will +replace any occurrence of "microsoft". The "i" at the end of the expression +means ignore case. The "(?!.com)" means the match should fail if "microsoft" is +followed by ".com". In other words, this acts like a "NOT" modifier. In case +this is a hyperlink, we don't want to break it ;-). + +We are barely scratching the surface of regular expressions here so that you +can understand the default Privoxy configuration files, and maybe use this +knowledge to customize your own installation. There is much, much more that can +be done with regular expressions.
Now that you know enough to get started, you +can learn more on your own :/ + +More reading on Perl Compatible Regular expressions: http://www.perldoc.com/ +perl5.6/pod/perlre.html + +------------------------------------------------------------------------------- + +15.2. Privoxy's Internal Pages + +Since Privoxy proxies each requested web page, it is easy for Privoxy to trap +certain special URLs. In this way, we can talk directly to Privoxy, and see how +it is configured, see how our rules are being applied, change these rules and +other configuration options, and even turn Privoxy's filtering off, all with a +web browser. + +The URLs listed below are the special ones that allow direct access to Privoxy. +Of course, Privoxy must be running to access these. If not, you will get a +friendly error message. Internet access is not necessary either. + + * Privoxy main page: + + http://config.privoxy.org/ + + Alternately, this may be reached at http://p.p/, but this variation may not + work as reliably as the above in some configurations. + + * Show information about the current configuration, including viewing and + editing of actions files: + + http://config.privoxy.org/show-status + + * Show the source code version numbers: + + http://config.privoxy.org/show-version + + * Show the browser's request headers: + + http://config.privoxy.org/show-request + + * Show which actions apply to a URL and why: - Any one of these would have done the trick and blocked this as an - unwanted image. This is unnecessarily redundant since the last case - effectively would also cover the first. No point in taking chances - with these guys though ;-) Note that if you want an ad or obnoxious - URL to be invisible, it should be defined as "ad.doubleclick.net" is - done here -- as both a "+block" and an "+image". The custom alias - "+imageblock" does this for us. + http://config.privoxy.org/show-url-info + + * Toggle Privoxy on or off. 
In this case, "Privoxy" continues to run, but + only as a pass-through proxy, with no actions taking place: - One last example. Let's try "http://www.rhapsodyk.net/adsl/HOWTO/". - This one is giving us problems. We are getting a blank page. Hmmm... + http://config.privoxy.org/toggle + + Short cuts. Turn off, then on: - Matches for http://www.rhapsodyk.net/adsl/HOWTO/: + http://config.privoxy.org/toggle?set=disable + + http://config.privoxy.org/toggle?set=enable + +These may be bookmarked for quick reference. See next. + +------------------------------------------------------------------------------- - { -add-header -block +deanimate-gifs -downgrade +fast-redirects - +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} - +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} - +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} - -hide-user-agent -image +image-blocker{blank} +no-compression - +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups - -vanilla-wafer -wafer } - / +15.2.1. Bookmarklets - { +block +image } - /ads +Below are some "bookmarklets" to allow you to easily access a "mini" version of +some of Privoxy's special pages. They are designed for MS Internet Explorer, +but should work equally well in Netscape, Mozilla, and other browsers which +support JavaScript. They are designed to run directly from your bookmarks - not +by clicking the links below (although that should work for testing). +To save them, right-click the link and choose "Add to Favorites" (IE) or "Add +Bookmark" (Netscape). You will get a warning that the bookmark "may not be +safe" - just click OK. Then you can run the Bookmarklet directly from your +favorites/bookmarks. For even faster access, you can put them on the "Links" +bar (IE) or the "Personal Toolbar" (Netscape), and run them with a single +click. - Ooops, the "/adsl/" is matching "/ads"! But we did not want this at - all! Now we see why we get the blank page. 
We could now add a new - action below this that explictly does not block (-block) pages with - "adsl". There are various ways to handle such exceptions. Example: + * Privoxy - Enable - { -block } - /adsl - - - Now the page displays ;-) + * Privoxy - Disable + + * Privoxy - Toggle Privoxy (Toggles between enabled and disabled) + + * Privoxy- View Status + + * Privoxy - Submit Filter Feedback + +Credit: The site which gave me the general idea for these bookmarklets is +www.bookmarklets.com. They have more information about bookmarklets. + +------------------------------------------------------------------------------- + +15.3. Chain of Events + +Let's take a quick look at the basic sequence of events when a web page is +requested by your browser and Privoxy is on duty: + + * First, your web browser requests a web page. The browser knows to send the + request to Privoxy, which will in turn, relay the request to the remote web + server after passing the following tests: + + * Privoxy traps any request for its own internal CGI pages (e.g http://p.p/) + and sends the CGI page back to the browser. + + * Next, Privoxy checks to see if the URL matches any "+block" patterns. If + so, the URL is then blocked, and the remote web server will not be + contacted. "+handle-as-image" is then checked and if it does not match, an + HTML "BLOCKED" page is sent back. Otherwise, if it does match, an image is + returned. The type of image depends on the setting of "+set-image-blocker" + (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere). + + * Untrusted URLs are blocked. If URLs are being added to the trust file, then + that is done. + + * If the URL pattern matches the "+fast-redirects" action, it is then + processed. Unwanted parts of the requested URL are stripped. + + * Now the rest of the client browser's request headers are processed. If any + of these match any of the relevant actions (e.g. "+hide-user-agent", etc.), + headers are suppressed or forged as determined by these actions and their + parameters. + + * Now the web server starts sending its response back (i.e. typically a web + page and related data).
+ + * First, the server headers are read and processed to determine, among other + things, the MIME type (document type) and encoding. The headers are then + filtered as determined by the "+prevent-setting-cookies", + "+session-cookies-only", and "+downgrade-http-version" actions. + + * If the "+kill-popups" action applies, and it is an HTML or JavaScript + document, the popup-code in the response is filtered on-the-fly as it is + received. + + * If a "+filter" or "+deanimate-gifs" action applies (and the document type + fits the action), the rest of the page is read into memory (up to a + configurable limit). Then the filter rules (from default.filter) are + processed against the buffered content. Filters are applied in the order + they are specified in the default.filter file. Animated GIFs, if present, + are reduced to either the first or last frame, depending on the action + setting. The entire page, which is now filtered, is then sent by Privoxy + back to your browser. + + If neither "+filter" nor "+deanimate-gifs" matches, then Privoxy passes the + raw data through to the client browser as it becomes available. + + * As the browser receives the now (probably filtered) page content, it reads + and then requests any URLs that may be embedded within the page source, + e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g. + frames), sounds, etc. For each of these objects, the browser issues a new + request. And each such request is in turn processed as above. Note that a + complex web page may have many such embedded URLs. + +------------------------------------------------------------------------------- + +15.4. Anatomy of an Action + +The way Privoxy applies "actions" and "filters" to any given URL can be +complex, and it is not always easy to understand what is happening. And sometimes +we need to be able to see just what Privoxy is doing, especially if something +Privoxy is doing is inadvertently causing us a problem.
It can be a little +daunting to look at the actions and filters files themselves, since they tend +to be filled with "regular expressions" whose consequences are not always so +obvious. + +One quick test to see if Privoxy is causing a problem or not, is to disable it +temporarily. This should be the first troubleshooting step. See the +Bookmarklets section on a quick and easy way to do this (be sure to flush +caches afterward!). + +Privoxy also provides the http://config.privoxy.org/show-url-info page that can +show us very specifically how actions are being applied to any given URL. This +is a big help for troubleshooting. + +First, enter one URL (or partial URL) at the prompt, and then Privoxy will tell +us how the current configuration will handle it. This will not help with +filtering effects (i.e. the "+filter" action) from the default.filter file +since this is handled very differently and not so easy to trap! It also will +not tell you about any other URLs that may be embedded within the URL you are +testing. For instance, images such as ads are expressed as URLs within the raw +page source of HTML pages. So you will only get info for the actual URL that is +pasted into the prompt area -- not any sub-URLs. If you want to know about +embedded URLs like ads, you will have to dig those out of the HTML source. Use +your browser's "View Page Source" option for this. Or right click on the ad, +and grab the URL. 
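If digging through the page source by hand gets tedious, a few lines of script can list the embedded URLs for us. What follows is only a rough sketch in Python and is not part of Privoxy: the EmbeddedURLCollector class, the embedded_urls helper, and the sample page (including the banner.gif and menu.js URLs) are all made up for illustration. It simply collects the values of src, href and background attributes.

```python
from html.parser import HTMLParser


class EmbeddedURLCollector(HTMLParser):
    """Hypothetical helper: collect URLs found in src/href/background attributes."""

    URL_ATTRS = {"src", "href", "background"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                self.urls.append((tag, value))


def embedded_urls(html_text):
    """Return (tag, url) pairs in the order they appear in the page source."""
    collector = EmbeddedURLCollector()
    collector.feed(html_text)
    collector.close()
    return collector.urls


# Made-up page source, standing in for what "View Page Source" would show.
sample = """<html><body>
<img src="http://ad.doubleclick.net/banner.gif" width="468" height="60">
<a href="/adsl/HOWTO/">ADSL HOWTO</a>
<script src="/js/menu.js"></script>
</body></html>"""

for tag, url in embedded_urls(sample):
    print(tag, url)
```

Each URL printed this way can then be pasted into the show-url-info prompt, one at a time, to see which actions apply to it.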
+ +Let's try an example, google.com, and look at it one section at a time: + + Matches for http://google.com: + +--- File standard --- +(no matches in this file) + +--- File default --- + +{ -add-header -block +deanimate-gifs{last} -downgrade-http-version +fast-redirects + -filter{popups} -filter{fun} -filter{shockwave-flash} -filter{crude-parental} + +filter{html-annoyances} +filter{js-annoyances} +filter{content-cookies} + +filter{webbugs} +filter{refresh-tags} +filter{nimda} +filter{banners-by-size} + +hide-forwarded-for-headers +hide-from-header{block} +hide-referer{forge} + -hide-user-agent -handle-as-image +set-image-blocker{pattern} -limit-connect + +prevent-compression +session-cookies-only -prevent-reading-cookies + -prevent-setting-cookies -kill-popups -send-vanilla-wafer -send-wafer } +/ + + { -session-cookies-only } + .google.com + + { -fast-redirects } + .google.com + +--- File user --- +(no matches in this file) + +This tells us how we have defined our "actions", and which ones match for our +example, "google.com". The first listing is any matches for the standard.action +file. No hits at all here on "standard". Then next is "default", or our +default.action file. The large, multi-line listing, is how the actions are set +to match for all URLs, i.e. our default settings. If you look at your "actions" +file, this would be the section just below the "aliases" section near the top. +This will apply to all URLs as signified by the single forward slash at the end +of the listing -- "/". + +But we can define additional actions that would be exceptions to these general +rules, and then list specific URLs (or patterns) that these exceptions would +apply to. Last match wins. Just below this then are two explicit matches for +".google.com". The first is negating our previous cookie setting, which was for +"+session-cookies-only" (i.e. not persistent). So we will allow persistent +cookies for google. 
The second turns off any "+fast-redirects" action, allowing +redirects to take place unmolested. Note that there is a leading dot here -- +".google.com". This will also match any hosts and sub-domains in the google.com +domain, such as "www.google.com". So, apparently, we have these two +actions defined somewhere in the lower part of our default.action file, and +"google.com" is referenced somewhere in these latter sections. + +Then, for our user.action file, we again have no hits. + +And finally we pull it all together in the bottom section and summarize how +Privoxy is applying all its "actions" to "google.com": + + Final results: + -add-header -block +deanimate-gifs{last} -downgrade-http-version -fast-redirects + -filter{popups} -filter{fun} -filter{shockwave-flash} -filter{crude-parental} + +filter{html-annoyances} +filter{js-annoyances} +filter{content-cookies} + +filter{webbugs} +filter{refresh-tags} +filter{nimda} +filter{banners-by-size} + +hide-forwarded-for-headers +hide-from-header{block} +hide-referer{forge} + -hide-user-agent -handle-as-image +set-image-blocker{pattern} -limit-connect + +prevent-compression -session-cookies-only -prevent-reading-cookies + -prevent-setting-cookies -kill-popups -send-vanilla-wafer -send-wafer + +Notice that the only differences from the previous listing are in "fast-redirects" +and "session-cookies-only". + +Now another example, "ad.doubleclick.net": + + { +block +handle-as-image } + .ad.doubleclick.net + + { +block +handle-as-image } + ad*. + + { +block +handle-as-image } + .doubleclick.net + +We'll just show the interesting part here, the explicit matches. It is matched +three different times, each time as "+block +handle-as-image", which is the +expanded form of one of our aliases that had been defined as: "+imageblock". ( +"Aliases" are defined in the first section of the actions file and typically +used to combine more than one action.) + +Any one of these would have done the trick and blocked this as an unwanted +image.
This is unnecessarily redundant since the last case effectively would +also cover the first. No point in taking chances with these guys though ;-) +Note that if you want an ad or obnoxious URL to be invisible, it should be +defined as "ad.doubleclick.net" is done here -- as both a "+block" and an +"+handle-as-image". The custom alias "+imageblock" just simplifies the process +and makes it more readable. + +One last example. Let's try "http://www.rhapsodyk.net/adsl/HOWTO/". This one is +giving us problems. We are getting a blank page. Hmmm... + + Matches for http://www.rhapsodyk.net/adsl/HOWTO/: + + { -add-header -block +deanimate-gifs -downgrade-http-version +fast-redirects + +filter{html-annoyances} +filter{js-annoyances} +filter{kill-popups} + +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} + +filter{fun} +hide-forwarded-for-headers +hide-from-header{block} + +hide-referer{forge} -hide-user-agent -handle-as-image +set-image-blocker{blank} + +prevent-compression +session-cookies-only -prevent-setting-cookies + -prevent-reading-cookies +kill-popups -send-vanilla-wafer -send-wafer } + / + + { +block +handle-as-image } + /ads + +Ooops, the "/adsl/" is matching "/ads"! But we did not want this at all! Now we +see why we get the blank page. We could now add a new action below this that +explicitly does not block ("{-block}") paths with "adsl". There are various +ways to handle such exceptions. Example: + + { -block } + /adsl + +Now the page displays ;-) Be sure to flush your browser's caches when making +such changes. Or, try using Shift+Reload. + +But now what about a situation where we get no explicit matches like we did +with: + + { +block +handle-as-image } + /ads + +That actually was very telling and pointed us quickly to where the problem was. +If you don't get this kind of match, then it means one of the default rules in +the first section is causing the problem.
This would require some guesswork, +and maybe a little trial and error to isolate the offending rule. One likely +cause would be one of the "{+filter}" actions. Try adding the URL for the site +to one of the aliases that turn off "+filter": + + {shop} + .quietpc.com + .worldpay.com # for quietpc.com + .jungle.com + .scan.co.uk + .forbes.com + +"{shop}" is an "alias" that expands to "{ -filter -session-cookies-only }". Or +you could write your own exception to negate filtering: + + {-filter} + .forbes.com + +This would probably be most appropriately put in user.action, for local site +exceptions. + +"{fragile}" is an alias that disables most actions. This can be used as a last +resort for problem sites. Remember to flush caches! If this still does not +work, you will have to go through the remaining actions one by one to find +which one(s) is causing the problem. +
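The "last match wins" behaviour seen in the examples above can be illustrated with a small Python model. This is a simplified sketch, not Privoxy's actual matching code: it assumes a path pattern matches whenever the path starts with it, it ignores domain patterns and action parameters, and the function names (path_matches, apply_sections, final_actions) are invented for illustration.

```python
def path_matches(pattern, path):
    # Simplifying assumption: a path pattern matches any path that
    # begins with it, which is why "/ads" also catches "/adsl/HOWTO/".
    return path.startswith(pattern)


def apply_sections(sections):
    """Merge matched sections in order; later settings override earlier ones."""
    final = {}
    for actions in sections:
        for action in actions.split():
            sign, name = action[0], action[1:]
            final[name] = (sign == "+")
    return final


def final_actions(path, rules):
    """Return the resulting on/off state of each action for the given path."""
    matched = [actions for pattern, actions in rules if path_matches(pattern, path)]
    return apply_sections(matched)


if __name__ == "__main__":
    # Hypothetical, heavily trimmed rule list in file order.
    rules = [
        ("/", "-block -handle-as-image"),
        ("/ads", "+block +handle-as-image"),
    ]
    print(final_actions("/adsl/HOWTO/", rules))   # blocked by the "/ads" rule

    # Adding the exception below the "/ads" section switches blocking off again.
    rules.append(("/adsl", "-block"))
    print(final_actions("/adsl/HOWTO/", rules))
```

The order of the rule list matters here in the same way the order of sections matters in the actions files: the "{ -block }" exception only takes effect because it comes after the "/ads" section.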