X-Git-Url: http://www.privoxy.org/gitweb/?p=privoxy.git;a=blobdiff_plain;f=doc%2Fsource%2Fuser-manual.sgml;h=1e1c9d2710e51310cbd2701484aab33d72286a06;hp=a7fe4aaf26a166a4da7aa6fd8aad40c8b144fe2d;hb=354e3dc6f1e2091e190238b0129aa962deff3472;hpb=2fdcad5d38e5a63296391ab509c72181b6a1d04a
diff --git a/doc/source/user-manual.sgml b/doc/source/user-manual.sgml
index a7fe4aaf..1e1c9d27 100644
--- a/doc/source/user-manual.sgml
+++ b/doc/source/user-manual.sgml
@@ -1,4 +1,22 @@
- + + + + + + + + + + + + + + + + + +]> -
-Junkbuster User Manual +Privoxy User Manual -$Id: user-manual.sgml,v 1.35 2002/03/05 00:17:27 hal9 Exp $ +$Id: user-manual.sgml,v 1.82 2002/04/18 12:04:50 oes Exp $ - By: Junkbuster Developers + By: Privoxy Developers + - The user manual gives the users information on how to install and configure - Internet Junkbuster. Internet - Junkbuster is an application that provides privacy and - security to users of the World Wide Web. + + This is here to keep vim syntax file from breaking :/ + If I knew enough to fix it, I would. + PLEASE DO NOT REMOVE! HB: hal@foobox.net + +]]> + -You can find the latest version of the user manual at http://ijbswa.sourceforge.net/user-manual/. - + The user manual gives users information on how to install, configure and use + Privoxy. + + + + &p-intro; + - Feel free to send a note to the developers at ijbswa-developers@lists.sourceforge.net. - + You can find the latest version of the user manual at http://www.privoxy.org/user-manual/. + Please see the Contact section on how to + contact the developers. + + + + + + + + + + + -Introduction - - Internet Junkbuster is a web proxy with advanced - filtering capabilities for protecting privacy, filtering and modifying web - page content, managing cookies, controlling access, and removing ads, - banners, pop-ups and other obnoxious Internet Junk. - Junkbuster has a very flexible configuration and - can be customized to suit individual needs and tastes. Internet - Junkbuster has application for both stand-alone systems and - multi-user networks. - +Introduction - This documentation is included with the current BETA version of - Internet Junkbuster and mostly complete at this - point. The most up to date reference for the time being is still the comments - in the source files and in the individual configuration files. Development - of version 3.0 is currently nearing completion, and includes many significant - changes and enhancements over earlier versions. 
The target release date for - stable v3.0 is soon ;-) + This documentation is included with the current &p-status; version of + Privoxy, v.&p-version;soon ;-)]]>. + - Since this is a BETA version, not all new features are well tested. This - documentation may be slightly out of sync as a result. And there - may be bugs, though hopefully not many! + Since this is a &p-status; version, not all new features are well tested. This + documentation may be slightly out of sync as a result (especially with + CVS sources). And there may be bugs, though hopefully + not many! - +]]> - + New Features - In addition to Junkbuster's traditional features - of ad and banner blocking and cookie management, this is a list of new - features currently under development: + In addition to Internet Junkbuster's traditional + features of ad and banner blocking and cookie management, + Privoxy provides new features: + - - - - - - Integrated browser based configuration and control utility (http://i.j.b). Browser-based tracing of rule - and filter effects. - - - - - - Modularized configuration that will allow for system wide settings, and - individual user settings. (not implemented yet, probably a 3.1 feature) - - - - - - Blocking of annoying pop-up browser windows. - - + + &newfeatures; + + - - - HTTP/1.1 compliant (most, but not all 1.1 features are supported). - - + - - - Support for Perl Compatible Regular Expressions in the configuration files, and - generally a more sophisticated and flexible configuration syntax over - previous versions. - - + - - - GIF de-animation. - - - - - - Web page content filtering (removes banners based on size, - invisible web-bugs, JavaScript, pop-ups, status bar abuse, - etc.) - - - - - - Bypass many click-tracking scripts (avoids script redirection). - - - - - - - Multi-threaded (POSIX and native threads). - - - - - Auto-detection and re-reading of config file changes. - - + +Installation - - - User-customizable HTML templates (e.g. 404 error page). 
- - + + Privoxy is available both in convenient pre-compiled + packages for a wide range of operating systems, and as raw source code. + For most users, we recommend using the packages, which can be downloaded from our + Privoxy Project Page. + - - - Improved cookie management features (e.g. session based cookies). - - + + If you like to live on the bleeding edge and are not afraid of using + possibly unstable development versions, you can check out the up-to-the-minute + version directly from the + CVS repository or simply download the nightly CVS + tarball. + - - - Builds from source on most UNIX-like systems. Packages available for: Linux - (RedHat, SuSE, or Debian), Windows, Sun Solaris, Mac OSX, OS/2. - - - + + &supported; + - - - In addition, the configuration is much more powerful and versatile over-all. - - + +Binary Packages - + + Note: If you have a previous Junkbuster or + Privoxy installation on your system, you + will either need to remove it, or that might be done by the setup + procedure. (See below for your platform). - + + In any case be sure to back up your old configuration + if it is valuable to you. In that case, also see the + note to upgraders. + - + + How to install the binary packages depends on your operating system: + - + +Redhat and SuSE RPMs + + RPMs can be installed with rpm -Uvh <name-of-rpm.rpm>, + and will use /etc/privoxy for configuration files. + - -Installation - Junkbuster is available as raw source code, or - pre-compiled binaries. See the Junkbuster Home Page - for current release info. Junkbuster is also available - via CVS. - This is the recommended approach at this time. But please be aware that CVS - is constantly changing, and it may break in mysterious ways. + Note that if you have a Junkbuster RPM installed + on your system, you need to remove it first, because the packages conflict. + Otherwise, RPM will try removing Junkbuster automatically, before installing + privoxy.
+ -Source +Debian - For gzipped tar archives, unpack the source: + FIXME. + + + +Windows - - tar xzvf ijb_source_* [.tgz or .tar.gz] - cd ijb_source_2.9.10_beta - + Just double-click the installer, which will guide you through + the installation process. + + + +Solaris, NetBSD, FreeBSD, HP-UX - For retrieving the current CVS sources, you'll need the CVS - package installed first. To download CVS source: + Create a new directory, cd to it, then unzip and + untar the archive. For the most part, you'll have to figure out where + things go. FIXME. + + + +OS/2 - - cvs -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa login - cvs -z3 -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa co current - cd current - + First, make sure that no previous installations of + Junkbuster and / or + Privoxy are left on your + system. - This will create a directory named current/, which will - contain the source tree. + Then, just double-click the WarpIN self-installing archive, which will + guide you through the installation process. A shadow of the + Privoxy executable will be placed in your + startup folder so it will start automatically whenever OS/2 starts. - Then, in either case, to build from tarball/CVS source: + The directory you choose to install Privoxy + into will contain all of the configuration files. + + +Mac OSX - - ./configure (--help to see options) - make (the make from gnu, gmake for *BSD) - su - make -n install (to see where all the files will go) - make install (to really install) - + FIXME. + + +AmigaOS - For Redhat and SuSE Linux RPM packages, see below. + Unpack the .lha archive, then FIXME. + + + + +Building from Source + +&buildsource; + + + + + -Red Hat - - To build Redhat RPM packages, install source as above.
Then: - +Quickstart to Using <application>Privoxy</application> + + + + +Note to Upgraders - - autoheader [suggested for CVS source] - autoconf [suggested for CVS source] - ./configure - make redhat-dist - + There are very significant changes from older versions of + Junkbuster to the current + Privoxy. Configuration is substantially + changed. Junkbuster 2.0.x and earlier + configuration files will not migrate. The functionality of the old + blockfile, cookiefile and + imagelist, are now combined into the + actions file (default.action + for most installations). - - This will create both binary and src RPMs in the usual places. Example: + A filter file (typically default.filter) + is new with Privoxy 2.9.x, and provides some + of the new sophistication (explained below). config is + much the same as before. - -    /usr/src/redhat/RPMS/i686/junkbuster-2.9.11-1.i686.rpm + If upgrading from a 2.0.x version, you will have to use the new config + files, and possibly adapt any personal rules from your older files. + When porting personal rules over from the old blockfile + to the new actions file, please note that even the pattern syntax has + changed. + If upgrading from 2.9.x development versions, it is still recommended + to use the new configuration files. -    /usr/src/redhat/SRPMS/junkbuster-2.9.11-1.src.rpm + A quick list of things to be aware of before upgrading: - To install, of course: + + + + + The default listening port is now 8118 due to a conflict with another + service (NAS). + + + + + Some installers may remove earlier versions completely. Save any + important configuration files! + + + + + Privoxy is controllable with a web browser + at the special URL: http://config.privoxy.org/ + (Shortcut: http://p.p/). Many + aspects of configuration can be done here, including temporarily disabling + Privoxy. + + + + + The primary configuration file for cookie management, ad and banner + blocking, and many other aspects of Privoxy + configuration is default.action. 
It is strongly + recommended to become familiar with the new actions concept below, + before modifying this file. + + + + + + + Some installers may not automatically start + Privoxy after installation. + + + + + + + + +Starting <application>Privoxy</application> - - rpm -Uvv /usr/src/redhat/RPMS/i686/junkbuster-2.9.11-1.i686.rpm - + Before launching Privoxy for the first time, you + will want to configure your browser(s) to use Privoxy + as an HTTP and HTTPS proxy. The default is localhost for the proxy address, + and port 8118 (earlier versions used port 8000). This is the one + configuration step that must be done! + + + + With Netscape (and + Mozilla), this can be set under Edit + -> Preferences -> Advanced -> Proxies -> HTTP Proxy. + For Internet Explorer: Tools -> + Internet Properties -> Connections -> LAN Setting. Then, + check Use Proxy and fill in the appropriate info (Address: + localhost, Port: 8118). Do the same for HTTPS if you want HTTPS proxy support too. - This will place the Junkbuster configuration - files in /etc/junkbuster/, and log files in - /var/log/junkbuster/. + After doing this, flush your browser's disk and memory caches to force a + re-reading of all pages and to get rid of any ads that may be cached. You + are now ready to start enjoying the benefits of using + Privoxy. - - -SuSE - To build SuSE RPM packages, install source as above. Then: + Privoxy is typically started by specifying the + main configuration file to be used on the command line. Example Unix startup + command: - autoheader [suggested for CVS source] - autoconf [suggested for CVS source] - ./configure - make suse-dist + + # /usr/sbin/privoxy /etc/privoxy/config + - This will create both binary and src RPMs in the usual places. Example: + An init script is provided for SuSE and Redhat.
-    /usr/src/packages/RPMS/i686/junkbuster-2.9.11-1.i686.rpm + For SuSE: rcprivoxy start + -    /usr/src/packages/SRPMS/junkbuster-2.9.11-1.src.rpm + For RedHat: /etc/rc.d/init.d/privoxy start + - To install, of course: + If no configuration file is specified on the command line, + Privoxy will look for a file named + config in the current directory, except on Win32, where + it will try config.txt. If no file is specified on the + command line and no default configuration file can be found, + Privoxy will fail to start. + - - rpm -Uvv /usr/src/packages/RPMS/i686/junkbuster-2.9.11-1.i686.rpm - + The included default configuration files should give a reasonable starting + point. Most of the per site configuration is done in the + actions files. These are where cookie actions, + ad and banner blocking, and other aspects of + Privoxy configuration are defined. There are several such + files included, with varying levels of aggressiveness. - This will place the Junkbuster configuration - files in /etc/junkbuster/, and log files in - /var/log/junkbuster/. + You will probably want to keep an eye out for sites that require persistent + cookies, and add these to default.action as needed. By + default, most of these will be accepted only during the current browser + session (aka session cookies), until you add them to the + configuration. If you want the browser to handle this instead, you will need + to edit default.action and disable this feature. If you + use more than one browser, it would make more sense to let + Privoxy handle this. In that case, the + browser(s) should be set to accept all cookies. - - - - -OS/2 - - - - Junkbuster is packaged in a WarpIN self- - installing archive. The self-installing program will be named depending - on the release version, something like: - ijbos2_setup_1.2.3.exe. In order to install it, simply - run this executable or double-click on its icon and follow the WarpIN - installation panels.
A shadow of the Junkbuster - executable will be placed in your startup folder so it will start - automatically whenever OS/2 starts. + Another feature where you will probably want to define exceptions for trusted + sites is the popup-killing (through the +popup and + +filter{popups} actions), because your favorite shopping, + banking, or leisure site may need popups. - The directory you choose to install Junkbuster - into will contain all of the configuration files. + Privoxy is HTTP/1.1 compliant, but not all of + the optional 1.1 features are as yet supported. In the unlikely event that + you experience inexplicable problems with browsers that use HTTP/1.1 per default + (like Mozilla or recent versions of I.E.), you might + try to force HTTP/1.0 compatibility. For Mozilla, look under Edit -> + Preferences -> Debug -> Networking. + Alternatively, set the +downgrade config option in + default.action which will downgrade your browser's HTTP + requests from HTTP/1.1 to HTTP/1.0 before processing them. - If you would like to build binary images on OS/2 yourself, you will need - a few Unix-like tools: autoconf, autoheader and sh. These tools will be - used to create the required config.h file, which is not part of the - source distribution because it differs based on platform. You will also - need a compiler. - The distribution has been created using IBM VisualAge compilers, but you - can use any compiler you like. GCC/EMX has the disadvantage of needing - to be single-threaded due to a limitation of EMX's implementation of the - select() socket call. + After running Privoxy for a while, you can + start to fine tune the configuration to suit your personal, or site, + preferences and requirements. There are many, many aspects that can + be customized. Actions (as specified in default.action) + can be adjusted by pointing your browser to + http://config.privoxy.org/ + (shortcut: http://p.p/), + and then follow the link to edit the actions list. 
+ (This is an internal page and does not require Internet access.) - In addition to needing the source code distribution as outlined earlier, - you will want to extract the os2seutp directory from CVS: - - cvs -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa login - cvs -z3 -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa co os2setup - - This will create a directory named os2setup/, which will contain the - Makefile.vac makefile and os2build.cmd - which is used to completely create the binary distribution. The sequence - of events for building the executable for yourself goes something like this: - - cd current - autoheader - autoconf - sh configure - cd ..\os2setup - nmake -f Makefile.vac - - You will see this sequence laid out in os2build.cmd. + In fact, various aspects of Privoxy + configuration can be viewed from this page, including + current configuration parameters, source code version numbers, + the browser's request headers, and actions that apply + to a given URL. In addition to the default.action file + editor mentioned above, Privoxy can also + be turned on and off (toggled) from this page. - - - - -Windows -Click-click. (I need help on this. Not a clue here. Also for -configuration section below. HB.) + + If you encounter problems, try loading the page without + Privoxy. If that helps, enter the URL where + you have the problems into the browser + based rule tracing utility. See which rules apply and why, and + then try turning them off for that site one after the other, until the problem + is gone. When you have found the culprit, you might want to turn the rest on + again. - - -Other - Some quick notes on other Operating Systems. + If the above paragraph sounds gibberish to you, you might want to read more about the actions concept + or even dive deep into the Appendix + on actions. - For FreeBSD (and other *BSDs?), the build will require gmake - instead of the included make. gmake is - available from http://www.gnu.org. 
- The rest should be the same as above for Linux/Unix. + If you can't get rid of the problem at all, think you've found a bug in + Privoxy, want to propose a new feature or smarter rules, please see the + chapter "Contacting the Developers, .." below. - - - - -Invoking and Configuring JunkBuster - - For Unix, *BSD and Linux, all configuration files are located in - /etc/junkbuster/ by default. For MS Windows, OS/2, and - AmigaOS these are all in the same directory as the - Junkbuster executable. The name and number of - configuration files has changed from previous versions, and is subject to - change as development progresses. - - + +Command Line Options - The installed defaults provide a reasonable starting point, though possibly - aggressive by some standards. For the time being, there are only three - default configuration files (this will change in time): - - - - - - - - The main configuration file is named config - on Linux, Unix, BSD, OS/2, and AmigaOS and config.txt - on Windows. - - - - - - The ijb.action file is used to define various - actions relating to images, banners, pop-ups, access - restrictions, banners and cookies. There is a CGI based editor for this - file that can be accessed via http://i.j.b. This is the easiest method of - configuring actions. (Other actions - files are included as well with differing levels of filtering - and blocking, e.g. ijb-basic.action.) - - - - - - The re_filterfile file can be used to rewrite the raw - page content, including text as well as embedded HTML and JavaScript. - - - - - - - - ijb.action and re_filterfile - can use Perl style regular expressions for maximum flexibility. All files use - the # character to denote a comment. Such - lines are not processed by Junkbuster. After - making any changes, there is no need to restart - Junkbuster in order for the changes to take - effect. Junkbuster should detect such changes - automatically. 
- - - - While under development, the configuration content is subject to change. - The below documentation may not be accurate by the time you read this. - Also, what constitutes a default setting, may change, so - please check all your configuration files on important issues. - - - - - - - -Command Line Options - - JunkBuster may be invoked with the following - command-line options: + Privoxy may be invoked with the following + command-line options: @@ -560,20 +562,20 @@ configuration section below. HB.) On startup, write the process ID to FILE. Delete the - FILE on exit. Failiure to create or delete the + FILE on exit. Failure to create or delete the FILE is non-fatal. If no FILE option is given, no PID file will be used. Unix only. - --user USER + --user USER[.GROUP] After (optionally) writing the PID file, assume the user ID of - USER. Exit if the privileges are not sufficient to do - so. Unix only. + USER, and if included the GID of GROUP. Exit if the + privileges are not sufficient to do so. Unix only. @@ -582,1104 +584,1567 @@ configuration section below. HB.) If no configfile is included on the command line, - JunkBuster will look for a file named + Privoxy will look for a file named config in the current directory (except on Win32 where it will look for config.txt instead). Specify full path to avoid confusion. - + + + +<application>Privoxy</application> Configuration + + All Privoxy configuration is stored + in text files. These files can be edited with a text editor. + Many important aspects of Privoxy can + also be controlled easily with a web browser. + + + + -The Main Configuration File +Controlling <application>Privoxy</application> with Your Web Browser - Again, the main configuration file is named config on - Linux/Unix/BSD and OS/2, and config.txt on Windows. - Configuration lines consist of an initial keyword followed by a list of - values, all separated by whitespace (any number of spaces or tabs). 
For - example: - + Privoxy's user interface can be reached through the special + URL http://config.privoxy.org/ + (shortcut: http://p.p/), + which is a built-in page and works without Internet access. + You will see the following section: - - - - - blockfile blocklist.ini - - - - Indicates that the blockfile is named blocklist.ini. - + - - A # indicates a comment. Any part of a - line following a # is ignored, except if - the # is preceded by a - \. - +Please choose from the following options: - - Thus, by placing a # at the start of an - existing configuration line, you can make it a comment and it will be treated - as if it weren't there. This is called commenting out an - option and can be useful to turn off features: If you comment out the - logfile line, junkbuster will not - log to a file at all. Watch for the default: section in each - explanation to see what happens if the option is left unset (or commented - out). + * Privoxy main page + * Show information about the current configuration + * Show the source code version numbers + * Show the request headers. + * Show which actions apply to a URL and why + * Toggle Privoxy on or off + * Edit the actions list + + - Long lines can be continued on the next line by using a - \ as the very last character. + This should be self-explanatory. Note the last item is an editor for the + actions list, which is where much of the ad, banner, cookie, + and URL blocking magic is configured as well as other advanced features of + Privoxy. This is an easy way to adjust various + aspects of Privoxy configuration. The actions + file, and other configuration files, are explained in detail below. - There are various aspects of Junkbuster behavior - that can be tuned. - + Toggle Privoxy On or Off is handy for sites that might + have problems with your current actions and filters. You can in fact use + it as a test to see whether it is Privoxy + causing the problem or not. 
Privoxy continues + to run as a proxy in this case, but all filtering is disabled. There + is even a toggle Bookmarklet offered, so that you can toggle + Privoxy with one click from your browser. + - + - -Defining Other Configuration Files + - - Junkbuster can use a number of other files to tell it - what ads to block, what cookies to accept, etc. This section of the - configuration file tells Junkbuster where to find - all those other files. - - - On Windows and AmigaOS, - Junkbuster looks for these files in the same - directory as the executable. On Unix and OS/2, - Junkbuster looks for these files in the current - working directory. In either case, an absolute path name can be used to - avoid problems. - - - When development goes modular and multi-user, the blocker, filter, and - per-user config will be stored in subdirectories of confdir. - For now, only confdir/templates is used for storing HTML - templates for CGI results. - - - The location of the configuration files: - + + +Configuration Files Overview - - - - confdir /etc/junkbuster # No trailing /, please. - - - + For Unix, *BSD and Linux, all configuration files are located in + /etc/privoxy/ by default. For MS Windows, OS/2, and + AmigaOS these are all in the same directory as the + Privoxy executable. - The directory where all logging (i.e. logfile and - jarfile) takes place. No trailing - /, please: + The installed defaults provide a reasonable starting point, though possibly + aggressive by some standards. For the time being, there are only three + default configuration files (this may change in time): - - - - logdir /var/log/junkbuster - - - - + - - Note that all file specifications below are relative to - the above two directories! - + + + The main configuration file is named config + on Linux, Unix, BSD, OS/2, and AmigaOS and config.txt + on Windows. + + - - The ijb.action file contains patterns to specify the actions to - apply to requests for each site. 
Default: Cookies to and from all - destinations are kept only during the current browser session (i.e. they - are not saved to disk). Pop-ups are disabled for all sites. All sites are - filtered if re_filterfile specified. No sites are blocked. An - empty image is displayed for filtered ads and other images (formerly - tinygif). The syntax of this file is explained in detail below. - + + + default.action (the actions file) is used to define + which of a set of various actions relating to images, banners, + pop-ups, access restrictions and cookies are to be applied where. + There is a web based editor for this file that can be accessed at http://config.privoxy.org/edit-actions/ + (Shortcut: http://p.p/edit-actions/). + (Other actions files are included as well with differing levels of filtering + and blocking, e.g. basic.action.) + + - - - - - actionsfile ijb.action - - - - + + + default.filter (the filter file) can be used to re-write the raw + page content, including viewable text as well as embedded HTML and JavaScript, + and whatever else lurks on any given web page. The filtering jobs are only + pre-defined here; whether to apply them or not is up to the actions file. + + - - The re_filterfile file contains content modification rules. - These rules permit powerful changes on the content of Web pages, e.g., you - could disable your favorite JavaScript annoyances, rewrite the actual - content, or just have some fun replacing Microsoft with - MicroSuck wherever it appears on a Web page. Default: No - content modification, or whatever the developers are playing with :-/ + - - - - re_filterfile re_filterfile - - - + All files use the # character to denote a + comment (the rest of the line will be ignored) and understand line continuation + through placing a backslash ("\") as the very last character + in a line. If the # is preceded by a backslash, it loses + its special function.
Placing a # in front of an otherwise + valid configuration line to prevent it from being interpreted is called "commenting + out" that line. - The logfile is where all logging and error messages are written. The logfile - can be useful for tracking down a problem with - Junkbuster (e.g., it's not blocking an ad you - think it should block) but in most cases you probably will never look at it. + default.action and default.filter + can use Perl style regular expressions for maximum flexibility. - Your logfile will grow indefinitely, and you will probably want to - periodically remove it. On Unix systems, you can do this with a cron job - (see man cron). For Redhat, a logrotate - script has been included. + After making any changes, there is no need to restart + Privoxy in order for the changes to take + effect. Privoxy detects such changes + automatically. Note, however, that it may take one or two additional + requests for the change to take effect. When changing the listening address + of Privoxy, these wake up requests + must obviously be sent to the old listening address. + - On SuSE Linux systems, you can place a line like /var/log/junkbuster.* - +1024k 644 nobody.nogroup in /etc/logfiles, with - the effect that cron.daily will automatically archive, gzip, and empty the - log, when it exceeds 1M size. + While under development, the configuration content is subject to change. + The below documentation may not be accurate by the time you read this. + Also, what constitutes a default setting, may change, so + please check all your configuration files on important issues. +]]> + - - Default: Log to the a file named logfile. - Comment out to disable logging. - - - - - - logfile logfile - - - - + + +The Main Configuration File - The jarfile defines where - Junkbuster stores the cookies it intercepts. Note - that if you use a jarfile, it may grow quite large. Default: - Don't store intercepted cookies. 
+ Again, the main configuration file is named config on + Linux/Unix/BSD and OS/2, and config.txt on Windows. + Configuration lines consist of an initial keyword followed by a list of + values, all separated by whitespace (any number of spaces or tabs). For + example: - #jarfile jarfile + confdir /etc/privoxy - - + + - If you specify a trustfile, - Junkbuster will only allow access to sites that - are named in the trustfile. You can also mark sites as trusted referrers, - with the effect that access to untrusted sites will be granted, if a link - from a trusted referrer was used. The link target will then be added to the - trustfile. This is a very restrictive feature that typical - users most probably want to leave disabled. Default: Disabled, don't use the - trust mechanism. + Assigns the value /etc/privoxy to the option + confdir and thus indicates that the configuration + directory is named /etc/privoxy/. - - - - #trustfile trust - - - - - - - If you use the trust mechanism, it is a good idea to write up some on-line - documentation about your blocking policy and to specify the URL(s) here. They - will appear on the page that your users receive when they try to access - untrusted content. Use multiple times for multiple URLs. Default: Don't - display links on the untrusted info page. + All options in the config file except for confdir and + logdir are optional. Watch out in the below description + for what happens if you leave them unset. - - - - trust-info-url http://www.your-site.com/why_we_block.html - trust-info-url http://www.your-site.com/what_we_allow.html - - - + The main config file controls all aspects of Privoxy's + operation that are not location dependent (i.e. they apply universally, no matter + where you may be surfing). - - - - - -Other Configuration Options +Configuration and Log File Locations - This part of the configuration file contains options that control how - Junkbuster operates. 
+ Privoxy can (and normally does) use a number of + other files for additional configuration and logging. + This section of the configuration file tells Privoxy + where to find those other files. - - Admin-address should be set to the email address of the proxy - administrator. It is used in many of the proxy-generated pages. Default: - fill@me.in.please. - - - - - - #admin-address fill@me.in.please - - - - +confdir - - Proxy-info-url can be set to a URL that contains more info - about this Junkbuster installation, it's - configuration and policies. It is used in many of the proxy-generated pages - and its use is highly recommended in multi-user installations, since your - users will want to know why certain content is blocked or modified. Default: - Don't show a link to on-line documentation. - + + + Specifies: + + The directory where the other configuration files are located + + + + Type of value: + + Path name + + + + Default value: + + /etc/privoxy (Unix) or Privoxy installation dir (Windows) + + + + Effect if unset: + + Mandatory + + + + Notes: + + + No trailing /, please + + + When development goes modular and multi-user, the blocker, filter, and + per-user config will be stored in subdirectories of confdir. + For now, the configuration directory structure is flat, except for + confdir/templates, where the HTML templates for CGI + output reside (e.g. Privoxy's 404 error page). + + + + + - - - - - proxy-info-url http://www.your-site.com/proxy.html - - - - - - Listen-address specifies the address and port where - Junkbuster will listen for connections from your - Web browser. The default is to listen on the localhost port 8118, and - this is suitable for most users. (In your web browser, under proxy - configuration, list the proxy server as localhost and the - port as 8118). - +logdir - - If you already have another service running on port 8118, or if you want to - serve requests from other machines (e.g. 
on your local network) as well, you - will need to override the default. The syntax is - listen-address [<ip-address>]:<port>. If you leave - out the IP address, junkbuster will bind to all - interfaces (addresses) on your machine and may become reachable from the - Internet. In that case, consider using access control lists (acl's) (see - aclfile above), or a firewall. - + + + Specifies: + + + The directory where all logging takes place (i.e. where logfile and + jarfile are located) + + + + + Type of value: + + Path name + + + + Default value: + + /var/log/privoxy (Unix) or Privoxy installation dir (Windows) + + + + Effect if unset: + + Mandatory + + + + Notes: + + + No trailing /, please + + + + + - - For example, suppose you are running Junkbuster on - a machine which has the address 192.168.0.1 on your local private network - (192.168.0.0) and has another outside connection with a different address. - You want it to serve requests from inside only: - +actionsfile - - - - - listen-address 192.168.0.1:8118 - - - - + + + Specifies: + + + The actions file to use + + + + + Type of value: + + File name, relative to confdir + + + + Default value: + + default.action (Unix) or default.action.txt (Windows) + + + + Effect if unset: + + + No action is taken at all. Simple neutral proxying. + + + + + Notes: + + + There is no point in using Privoxy without + an actions file. There are three different actions files included in the + distribution, with varying degrees of aggressiveness: + default.action, intermediate.action and + advanced.action. + + + + + - - If you want it to listen on all addresses (including the outside - connection): - +filterfile - - - - - listen-address :8118 - - - - + + + Specifies: + + + The filter file to use + + + + + Type of value: + + File name, relative to confdir + + + + Default value: + + default.filter (Unix) or default.filter.txt (Windows) + + + + Effect if unset: + + + No textual content filtering takes place, i.e. 
all + +filter{name} + actions in the actions file are turned off + + + + + Notes: + + + The default.filter file contains content modification rules + that use regular expressions. These rules permit powerful + changes on the content of Web pages, e.g., you could disable your favorite + JavaScript annoyances, re-write the actual displayed text, or just have some + fun replacing Microsoft with MicroSuck wherever + it appears on a Web page. + + + + + - - If you do this, consider using ACLs (see aclfile above). Note: - you will need to point your browser(s) to the address and port that you have - configured here. Default: localhost:8118 (127.0.0.1:8118). - +logfile - - The debug option sets the level of debugging information to log in the - logfile (and to the console in the Windows version). A debug level of 1 is - informative because it will show you each request as it happens. Higher - levels of debug are probably only of interest to developers. - - - - - - - debug 1 # GPC = show each GET/POST/CONNECT request - debug 2 # CONN = show each connection status - debug 4 # IO = show I/O status - debug 8 # HDR = show header parsing - debug 16 # LOG = log all data into the logfile - debug 32 # FRC = debug force feature - debug 64 # REF = debug regular expression filter - debug 128 # = debug fast redirects - debug 256 # = debug GIF de-animation - debug 512 # CLF = Common Log Format - debug 1024 # = debug kill pop-ups - debug 4096 # INFO = Startup banner and warnings. - debug 8192 # ERROR = Non-fatal errors - - - - - - - It is highly recommended that you enable ERROR - reporting (debug 8192), at least until the next stable release. - - - - The reporting of FATAL errors (i.e. ones which crash - JunkBuster) is always on and cannot be disabled. - - - - If you want to use CLF (Common Log Format), you should set debug - 512 ONLY, do not enable anything else. - - - - Multiple debug directives, are OK - they're logical-OR'd - together. 
- - - - - - - debug 15 # same as setting the first 4 listed above - - - - - - - Default: - - - - - - - debug 1 # URLs - debug 4096 # Info - debug 8192 # Errors - *we highly recommended enabling this* - - - - + + + Specifies: + + + The log file to use + + + + + Type of value: + + File name, relative to logdir + + + + Default value: + + logfile (Unix) or privoxy.log (Windows) + + + + Effect if unset: + + + No log file is used, all log messages go to the console (stderr). + + + + + Notes: + + + The windows version will additionally log to the console. + + + The logfile is where all logging and error messages are written. The level + of detail and number of messages are set with the debug + option (see below). The logfile can be useful for tracking down a problem with + Privoxy (e.g., it's not blocking an ad you + think it should block) but in most cases you probably will never look at it. + + + Your logfile will grow indefinitely, and you will probably want to + periodically remove it. On Unix systems, you can do this with a cron job + (see man cron). For Redhat, a logrotate + script has been included. + + + On SuSE Linux systems, you can place a line like /var/log/privoxy.* + +1024k 644 nobody.nogroup in /etc/logfiles, with + the effect that cron.daily will automatically archive, gzip, and empty the + log, when it exceeds 1M size. + + + + + - - Junkbuster normally uses - multi-threading, a software technique that permits it to - handle many different requests simultaneously. In some cases you may wish to - disable this -- particularly if you're trying to debug a problem. The - single-threaded option forces - Junkbuster to handle requests sequentially. - Default: Multi-threaded mode. 
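The debug values discussed in this hunk are bit flags: the old text describes multiple debug lines as OR'd together, and the new text says levels can be added. Because each level is a distinct power of two, both give the same result. A minimal Python sketch of that arithmetic follows; the names DEBUG_LEVELS, combine and explain, and the short labels, are illustrative assumptions and not identifiers used by Privoxy.

```python
# Debug levels as listed in the manual text. The short labels used as keys
# are illustrative assumptions, not names that Privoxy itself defines.
DEBUG_LEVELS = {
    "requests": 1,          # show each GET/POST/CONNECT request
    "connections": 2,       # show each connection status
    "io": 4,                # show I/O status
    "headers": 8,           # show header parsing
    "all_data": 16,         # log all data into the logfile
    "force": 32,            # debug force feature
    "regex_filter": 64,     # debug regular expression filter
    "fast_redirects": 128,  # debug fast redirects
    "gif_deanimation": 256, # debug GIF de-animation
    "clf": 512,             # Common Log Format
    "kill_popups": 1024,    # debug kill pop-ups
    "info": 4096,           # startup banner and warnings
    "errors": 8192,         # non-fatal errors
}


def combine(*labels):
    """OR the selected levels into the single integer a debug line takes."""
    value = 0
    for label in labels:
        value |= DEBUG_LEVELS[label]
    return value


def explain(value):
    """List the labels whose bit is set in value, in table order."""
    return [label for label, bit in DEBUG_LEVELS.items() if value & bit]


if __name__ == "__main__":
    print(combine("requests", "info", "errors"))  # 1 + 4096 + 8192
    print(explain(15))                            # the first four levels
```

Under these assumptions, combining the three recommended levels gives 12289, the default value quoted for the debug option, and 15 decodes to the first four levels, matching the "debug 15" remark in the old text.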
- +jarfile - - - - - #single-threaded - - - - + + + Specifies: + + + The file to store intercepted cookies in + + + + + Type of value: + + File name, relative to logdir + + + + Default value: + + jarfile (Unix) or privoxy.jar (Windows) + + + + Effect if unset: + + + Intercepted cookies are not stored at all. + + + + + Notes: + + + The jarfile may grow to ridiculous sizes over time. + + + + + - - toggle allows you to temporarily disable all - Junkbuster's filtering. Just set toggle - 0. - +trustfile - - The Windows version of Junkbuster puts an icon in - the system tray, which also allows you to change this option. If you - right-click on that icon (or select the Options menu), one - choice is Enable. Clicking on enable toggles - Junkbuster on and off. This is useful if you want - to temporarily disable Junkbuster, e.g., to access - a site that requires cookies which you would otherwise have blocked. This can also - be toggled via a web browser at the Junkbuster - internal address of http://i.j.b on - any platform. - + + + Specifies: + + + The trust file to use + + + + + Type of value: + + File name, relative to confdir + + + + Default value: + + Unset (commented out). When activated: trust (Unix) or trust.txt (Windows) + + + + Effect if unset: + + + The whole trust mechanism is turned off. + + + + + Notes: + + + The trust mechanism is an experimental feature for building white-lists and should + be used with care. It is NOT recommended for the casual user. + + + If you specify a trust file, Privoxy will only allow + access to sites that are named in the trustfile. + You can also mark sites as trusted referrers (with +), with + the effect that access to untrusted sites will be granted, if a link from a + trusted referrer was used. + The link target will then be added to the trustfile. + Possible applications include limiting Internet access for children. + + + If you use + operator in the trust file, it may grow considerably over time. 
+ + + + + - - toggle 1 means Junkbuster runs - normally, toggle 0 means that - Junkbuster becomes a non-anonymizing non-blocking - proxy. Default: 1 (on). - + - - - - - toggle 1 - - - - + - - For content filtering, i.e. the +filter and - +deanimate-gif actions, it is necessary that - Junkbuster buffers the entire document body. - This can be potentially dangerous, since a server could just keep sending - data indefinitely and wait for your RAM to exhaust. With nasty consequences. - - - The buffer-limit option lets you set the maximum - size in Kbytes that each buffer may use. When the documents buffer exceeds - this size, it is flushed to the client unfiltered and no further attempt to - filter the rest of it is made. Remember that there may multiple threads - running, which might require increasing the buffer-limit - Kbytes each, unless you have enabled - single-threaded above. - - - - - - buffer-limit 4069 - - - - + - - To enable the web-based ijb.action file editor set - enable-edit-actions to 1, or 0 to disable. Note - that you must have compiled JunkBuster with - support for this feature, otherwise this option has no effect. This - internal page can be reached at http://i.j.b. - + +Local Set-up Documentation - - Security note: If this is enabled, anyone who can use the proxy - can edit the actions file, and their changes will affect all users. - For shared proxies, you probably want to disable this. Default: enabled. - + + If you intend to operate Privoxy for more users + than just yourself, it might be a good idea to let them know how to reach + you, what you block and why you do that, your policies etc. + - - - - - enable-edit-actions 1 - - - - +trust-info-url - - Allow JunkBuster to be toggled on and off - remotely, using your web browser. Set enable-remote-toggleto - 1 to enable, and 0 to disable. Note that you must have compiled - JunkBuster with support for this feature, - otherwise this option has no effect. 
- + + + Specifies: + + + A URL to be displayed in the error page that users will see if access to an untrusted page is denied. + + + + + Type of value: + + URL + + + + Default value: + + Two example URLs are provided + + + + Effect if unset: + + + No links are displayed on the "untrusted" error page. + + + + + Notes: + + + The value of this option only matters if the experimental trust mechanism has been + activated. (See trustfile above.) + + + If you use the trust mechanism, it is a good idea to write up some on-line + documentation about your trust policy and to specify the URL(s) here. + Use multiple times for multiple URLs. + + + The URL(s) should be added to the trustfile as well, so users don't end up + locked out from the information on why they were locked out in the first place! + + + + + - - Security note: If this is enabled, anyone who can use the proxy can toggle - it on or off (see http://i.j.b), and - their changes will affect all users. For shared proxies, you probably want to - disable this. Default: enabled. - +admin-address - - - - - enable-remote-toggle 1 - - - - + + + Specifies: + + + An email address to reach the proxy administrator. + + + + + Type of value: + + Email address + + + + Default value: + + Unset + + + + Effect if unset: + + + No email address is displayed on error pages and the CGI user interface. + + + + + Notes: + + + If both admin-address and proxy-info-url + are unset, the whole "Local Privoxy Support" box on all generated pages will + not be shown. + + + + + + +proxy-info-url + + + + Specifies: + + + A URL to documentation about the local Privoxy setup, + configuration or policies. + + + + + Type of value: + + URL + + + + Default value: + + Unset + + + + Effect if unset: + + + No link to local documentation is displayed on error pages and the CGI user interface. + + + + + Notes: + + + If both admin-address and proxy-info-url + are unset, the whole "Local Privoxy Support" box on all generated pages will + not be shown. 
+ + + This URL shouldn't be blocked ;-) + + + + + - - -Access Control List (ACL) - - Access controls are included at the request of some ISPs and systems - administrators, and are not usually needed by individual users. Please note - the warnings in the FAQ that this proxy is not intended to be a substitute - for a firewall or to encourage anyone to defer addressing basic security - weaknesses. - +Debugging - - If no access settings are specified, the proxy talks to anyone that - connects. If any access settings file are specified, then the proxy - talks only to IP addresses permitted somewhere in this file and not - denied later in this file. - - - - Summary -- if using an ACL: - - - - - Client must have permission to receive service. - - - - - LAST match in ACL wins. - - - - - Default behavior is to deny service. - - - - - The syntax for an entry in the Access Control List is: - - - - - - - ACTION SRC_ADDR[/SRC_MASKLEN] [ DST_ADDR[/DST_MASKLEN] ] - - - - - - - Where the individual fields are: - - - - - - - ACTION = permit-access or deny-access - - SRC_ADDR = client hostname or dotted IP address - SRC_MASKLEN = number of bits in the subnet mask for the source - - DST_ADDR = server or forwarder hostname or dotted IP address - DST_MASKLEN = number of bits in the subnet mask for the target - - - - - - - - The field separator (FS) is whitespace (space or tab). - - - - IMPORTANT NOTE: If the junkbuster is using a - forwarder (see below) or a gateway for a particular destination URL, the - DST_ADDR that is examined is the address of the forwarder - or the gateway and NOT the address of the ultimate - target. This is necessary because it may be impossible for the local - Junkbuster to determine the address of the - ultimate target (that's often what gateways are used for). 
- - - - Here are a few examples to show how the ACL features work: - - - - localhost is OK -- no DST_ADDR implies that - ALL destination addresses are OK: - - - - - - - permit-access localhost - - - - - - - A silly example to illustrate permitting any host on the class-C subnet with - Junkbuster to go anywhere: - - - - - - - permit-access www.junkbusters.com/24 - - - - - - - Except deny one particular IP address from using it at all: - - - - - - - deny-access ident.junkbusters.com - - - - - - - You can also specify an explicit network address and subnet mask. - Explicit addresses do not have to be resolved to be used. - - - - - - - permit-access 207.153.200.0/24 - - - - - - - A subnet mask of 0 matches anything, so the next line permits everyone. - - - - - - - permit-access 0.0.0.0/0 - - - - - - - Note, you cannot say: - - - - - - - permit-access .org - - - - - - - to allow all *.org domains. Every IP address listed must resolve fully. - + + These options are mainly useful when tracing a problem. + Note that you might also want to invoke + Privoxy with the --no-daemon + command line option when debugging. + - - An ISP may want to provide a Junkbuster that is - accessible by the world and yet restrict use of some of their - private content to hosts on its internal network (i.e. its own subscribers). - Say, for instance the ISP owns the Class-B IP address block 123.124.0.0 (a 16 - bit netmask). This is how they could do it: - +debug - - - - - permit-access 0.0.0.0/0 0.0.0.0/0 # other clients can go anywhere - # with the following exceptions: - - deny-access 0.0.0.0/0 123.124.0.0/16 # block all external requests for - # sites on the ISP's network - - permit 0.0.0.0/0 www.my_isp.com # except for the ISP's main - # web site + + + Specifies: + + + Key values that determine what information gets logged. 
+ + + + + Type of value: + + Integer values + + + + Default value: + + 12289 (i.e.: URLs plus informational and warning messages) + + + + Effect if unset: + + + Nothing gets logged. + + + + + Notes: + + + The available debug levels are: + + + + debug 1 # show each GET/POST/CONNECT request + debug 2 # show each connection status + debug 4 # show I/O status + debug 8 # show header parsing + debug 16 # log all data into the logfile + debug 32 # debug force feature + debug 64 # debug regular expression filter + debug 128 # debug fast redirects + debug 256 # debug GIF de-animation + debug 512 # Common Log Format + debug 1024 # debug kill pop-ups + debug 4096 # Startup banner and warnings. + debug 8192 # Non-fatal errors + + + + To select multiple debug levels, you can either add them or use + multiple debug lines. + + + A debug level of 1 is informative because it will show you each request + as it happens. 1, 4096 and 8192 are highly recommended + so that you will notice when things go wrong. The other levels are probably + only of interest if you are hunting down a specific problem. They can produce + a hell of an output (especially 16). + + + + The reporting of fatal errors (i.e. ones which crash + Privoxy) is always on and cannot be disabled. + + + If you want to use CLF (Common Log Format), you should set debug + 512 ONLY and not enable anything else. + + + + + - permit 123.124.0.0/16 0.0.0.0/0 # the ISP's clients can go - # anywhere - - - - +single-threaded - - Note that if some hostnames are listed with multiple IP addresses, - the primary value returned by DNS (via gethostbyname()) is used. Default: - Anyone can access the proxy. - + + + Specifies: + + + Whether to run only one server thread + + + + + Type of value: + + None + + + + Default value: + + Unset + + + + Effect if unset: + + + Multi-threaded (or, where unavailable: forked) operation, i.e. the ability to + serve multiple requests simultaneously. 
+ + + + + Notes: + + + This option is only there for debug purposes and you should never + need to use it. It will drastically reduce performance. + + + + + - - - -Forwarding - - - This feature allows chaining of HTTP requests via multiple proxies. - It can be used to better protect privacy and confidentiality when - accessing specific domains by routing requests to those domains - to a special purpose filtering proxy such as lpwa.com. Or to use - a caching proxy to speed up browsing. - - - - It can also be used in an environment with multiple networks to route - requests via multiple gateways allowing transparent access to multiple - networks without having to modify browser configurations. - - - - Also specified here are SOCKS proxies. Junkbuster - SOCKS 4 and SOCKS 4A. The difference is that SOCKS 4A will resolve the target - hostname using DNS on the SOCKS server, not our local DNS client. - - - - The syntax of each line is: - - - - - - - forward target_domain[:port] http_proxy_host[:port] - forward-socks4 target_domain[:port] socks_proxy_host[:port] http_proxy_host[:port] - forward-socks4a target_domain[:port] socks_proxy_host[:port] http_proxy_host[:port] - - - - - - - If http_proxy_host is ., then requests are not forwarded to a - HTTP proxy but are made directly to the web servers. - +Access Control and Security - - Lines are checked in sequence, and the last match wins. - + + This section of the config file controls the security-relevant aspects + of Privoxy's configuration. + - - There is an implicit line equivalent to the following, which specifies that - anything not finding a match on the list is to go out without forwarding - or gateway protocol, like so: - +listen-address - - - - - forward .* . # implicit - - - - + + + Specifies: + + + The IP address and TCP port on which Privoxy will + listen for client requests. 
+ + + + + Type of value: + + [IP-Address]:Port + + + + Default value: + + localhost:8118 + + + + Effect if unset: + + + Bind to localhost (127.0.0.1), port 8118. This is suitable and recommended for + home users who run Privoxy on the same machine as + their browser. + + + + + Notes: + + + You will need to configure your browser(s) to use this proxy address and port. + + + If you already have another service running on port 8118, or if you want to + serve requests from other machines (e.g. on your local network) as well, you + will need to override the default. + + + If you leave out the IP address, Privoxy will + bind to all interfaces (addresses) on your machine and may become reachable + from the Internet. In that case, consider using access control lists (ACLs) + (see ACLs below), or a firewall. + + + + + Example: + + + Suppose you are running Privoxy on + a machine which has the address 192.168.0.1 on your local private network + (192.168.0.0) and has another outside connection with a different address. + You want it to serve requests from inside only: + + + + listen-address 192.168.0.1:8118 + + + + + + - - In the following common configuration, everything goes to Lucent's LPWA, - except SSL on port 443 (which it doesn't handle): - +toggle - - - - - forward .* lpwa.com:8000 - forward :443 . - - - - + + + Specifies: + + + Initial state of "toggle" status + + + + + Type of value: + + 1 or 0 + + + + Default value: + + 1 + + + + Effect if unset: + + + Act as if toggled on + + + + + Notes: + + + If set to 0, Privoxy will start in + toggled off mode, i.e. behave like a normal, content-neutral + proxy. See enable-remote-toggle + below. This is not really useful anymore, since toggling is much easier + via the web + interface than via editing the conf file. + + + The Windows version will only display the toggle icon in the system tray + if this option is present. + + + + + - - See the FAQ for instructions on how to automate the login procedure for LPWA. 
- Some users have reported difficulties related to LPWA's use of - . as the last element of the domain, and have said that this - can be fixed with this: - - - - - - - forward lpwa. lpwa.com:8000 - - - - - - - (NOTE: the syntax for specifying target_domain has changed since the - previous paragraph was written -- it will not work now. More information - is welcome.) - - - In this fictitious example, everything goes via an ISP's caching proxy, - except requests to that ISP: - +enable-remote-toggle + + + Specifies: + + + Whether or not the web-based toggle + feature may be used + + + + + Type of value: + + 0 or 1 + + + + Default value: + + 1 + + + + Effect if unset: + + + The web-based toggle feature is disabled. + + + + + Notes: + + + When toggled off, Privoxy acts like a normal, + content-neutral proxy, i.e. it acts as if none of the actions applied to + any URL. + + + For the time being, access to the toggle feature can not be + controlled separately by ACLs or HTTP authentication, + so that everybody who can access Privoxy (see + ACLs and listen-address above) can + toggle it for all users. So this option is not recommended + for multi-user environments with untrusted users. + + + Note that you must have compiled Privoxy with + support for this feature, otherwise this option has no effect. + + + + + - - - - - forward .* caching.myisp.net:8000 - forward myisp.net . - - - - - - For the @home network, we're told the forwarding configuration is this: - +enable-edit-actions + + + Specifies: + + + Whether or not the web-based actions + file editor may be used + + + + + Type of value: + + 0 or 1 + + + + Default value: + + 1 + + + + Effect if unset: + + + The web-based actions file editor is disabled. + + + + + Notes: + + + For the time being, access to the editor can not be + controlled separately by ACLs or HTTP authentication, + so that everybody who can access Privoxy (see + ACLs and listen-address above) can + modify its configuration for all users. 
So this option is not + recommended for multi-user environments with untrusted users. + + + Note that you must have compiled Privoxy with + support for this feature, otherwise this option has no effect. + + + + + + +ACLs: permit-access and deny-access + + + Specifies: + + + Who can access what. + + + + + Type of value: + + + src_addr[/src_masklen] + [dst_addr[/dst_masklen]] + + + Where src_addr and + dst_addr are IP addresses in dotted decimal notation or valid + DNS names, and src_masklen and + dst_masklen are subnet masks in CIDR notation, i.e. integer + values from 0 to 32 representing the length (in bits) of the network address. The masks and the whole + destination part are optional. + + + + + Default value: + + Unset + + + + Effect if unset: + + + Don't restrict access further than implied by listen-address + + + + + Notes: + + + Access controls are included at the request of ISPs and systems + administrators, and are not usually needed by individual users. + For a typical home user, it will normally suffice to ensure that + Privoxy only listens on the localhost or internal (home) + network address by means of the listen-address option. + + + Please see the warnings in the FAQ that this proxy is not intended to be a substitute + for a firewall or to encourage anyone to defer addressing basic security + weaknesses. + + + Multiple ACL lines are OK. + If any ACLs are specified, then Privoxy + talks only to IP addresses that match at least one permit-access line + and do not match any subsequent deny-access line. In other words, the + last match wins, with the default being deny-access. + + + If Privoxy is using a forwarder (see forward below) + for a particular destination URL, the dst_addr + that is examined is the address of the forwarder and NOT the address + of the ultimate target. This is necessary because it may be impossible for the local + Privoxy to determine the IP address of the + ultimate target (that's often what gateways are used for). 
+ + + You should prefer using IP addresses over DNS names, because the address lookups take + time. All DNS names must resolve! You can not use domain patterns + like *.org or partial domain names. If a DNS name resolves to multiple + IP addresses, only the first one is used. + + + Denying access to particular sites by ACL may have undesired side effects + if the site in question is hosted on a machine which also hosts other sites. + + + + + Examples: + + + Explicitly define the default behavior if no ACL and + listen-address are set: localhost + is OK. The absence of a dst_addr implies that + all destination addresses are OK: + + + + permit-access localhost + + + + Allow any host on the same class C subnet as www.privoxy.org access to + nothing but www.example.com: + + + + permit-access www.privoxy.org/24 www.example.com/32 + + + + Allow access from any host on the 26-bit subnet 192.168.45.64 to anywhere, + with the exception that 192.168.45.73 may not access www.dirty-stuff.example.com: + + + + permit-access 192.168.45.64/26 + deny-access 192.168.45.73 www.dirty-stuff.example.com + + + + + + +buffer-limit - - - - - forward .* proxy:8080 - - - - + + + Specifies: + + + Maximum size of the buffer for content filtering. + + + + + Type of value: + + Size in Kbytes + + + + Default value: + + 4096 + + + + Effect if unset: + + + Use a 4MB (4096 KB) limit. + + + + + Notes: + + + For content filtering, i.e. the +filter and + +deanimate-gif actions, it is necessary that + Privoxy buffers the entire document body. + This can be potentially dangerous, since a server could just keep sending + data indefinitely and wait for your RAM to exhaust -- with nasty consequences. + Hence this option. + + + When a document buffer size reaches the buffer-limit, it is + flushed to the client unfiltered and no further attempt to + filter the rest of the document is made. 
Remember that there may be multiple threads + running, which might require up to buffer-limit Kbytes + each, unless you have enabled single-threaded + above. + + + + + - - Also, we're told they insist on getting cookies and JavaScript, so you should - add home.com to the cookie file. We consider JavaScript a security risk. - Java need not be enabled. - + - - In this example direct connections are made to all internal - domains, but everything else goes through Lucent's LPWA by way of the - company's SOCKS gateway to the Internet. - + - - - - - forward-socks4 .* lpwa.com:8000 firewall.my_company.com:1080 - forward my_company.com . - - - - - - This is how you could set up a site that always uses SOCKS but no forwarders: - + - - - - - forward-socks4a .* . firewall.my_company.com:1080 - - - - + +Forwarding - An advanced example for network administrators: + This feature allows routing of HTTP requests through a chain of + multiple proxies. + It can be used to better protect privacy and confidentiality when + accessing specific domains by routing requests to those domains + through an anonymous public proxy (see e.g. http://www.multiproxy.org/anon_list.htm) + Or to use a caching proxy to speed up browsing. Or chaining to a parent + proxy may be necessary because the machine that Privoxy + runs on has no direct Internet access. - If you have links to multiple ISPs that provide various special content to - their subscribers, you can configure forwarding to pass requests to the - specific host that's connected to that ISP so that everybody can see all - of the content on all of the ISPs. + Also specified here are SOCKS proxies. Privoxy + supports the SOCKS 4 and SOCKS 4A protocols. - - This is a bit tricky, but here's an example: - +forward + + + Specifies: + + + To which parent HTTP proxy specific requests should be routed. 
+ + + + + Type of value: + + + target_domain[:port] + http_parent[:port] + + + Where target_domain is a domain name pattern (see the + chapter on domain matching in the actions file), + http_parent is the address of the parent HTTP proxy + as an IP address in dotted decimal notation or as a valid DNS name (or . to denote + no forwarding), and the optional + port parameters are TCP ports, i.e. integer + values from 1 to 65535. + + + + + Default value: + + Unset + + + + Effect if unset: + + + Don't use parent HTTP proxies. + + + + + Notes: + + + If http_parent is ., then requests are not + forwarded to another HTTP proxy but are made directly to the web servers. + + + Multiple lines are OK; they are checked in sequence, and the last match wins. + + + + + Examples: + + + Everything goes to an example anonymizing proxy, except SSL on port 443 (which it doesn't handle): + + + + forward .* anon-proxy.example.org:8080 + forward :443 . + + + + Everything goes to our example ISP's caching proxy, except for requests + to that ISP's sites: + + + + forward .*. caching-proxy.example-isp.net:8000 + forward .example-isp.net . + + + + + + + +forward-socks4 and forward-socks4a + + + Specifies: + + + Through which SOCKS proxy (and to which parent HTTP proxy) specific requests should be routed. + + + + + Type of value: + + + target_domain[:port] + socks_proxy[:port] + http_parent[:port] + + + Where target_domain is a domain name pattern (see the + chapter on domain matching in the actions file), + http_parent and socks_proxy + are IP addresses in dotted decimal notation or valid DNS names (http_parent + may be . to denote no HTTP forwarding), and the optional + port parameters are TCP ports, i.e. integer values from 1 to 65535. + + + + + Default value: + + Unset + + + + Effect if unset: + + + Don't use SOCKS proxies. + + + + + Notes: + + + Multiple lines are OK; they are checked in sequence, and the last match wins. 
+ + + The difference between forward-socks4 and forward-socks4a + is that in the SOCKS 4A protocol, the DNS resolution of the target hostname happens on the SOCKS + server, while in SOCKS 4 it happens locally. + + + If http_parent is ., then requests are not + forwarded to another HTTP proxy but are made (HTTP-wise) directly to the web servers, albeit through + a SOCKS proxy. + + + + + Examples: + + + From the company example.com, direct connections are made to all + internal domains, but everything outbound goes through + their ISP's proxy by way of example.com's corporate SOCKS 4A gateway to + the Internet. + + + + forward-socks4a .*. socks-gw.example.com:1080 www-cache.example-isp.net:8080 + forward .example.com . + + + + A rule that uses a SOCKS 4 gateway for all destinations but no HTTP parent looks like this: + + + + forward-socks4 .*. socks-gw.example.com:1080 . + + + + + + +Advanced Forwarding Examples - host-a has a PPP connection to isp-a.com. And host-b has a PPP connection to - isp-b.com. host-a can run a Junkbuster proxy with - forwarding like this: + If you have links to multiple ISPs that provide various special content + only to their subscribers, you can configure multiple Privoxies + which have connections to the respective ISPs to act as forwarders to each other, so that + your users can see the internal content of all ISPs. - - - - forward .* . - forward isp-b.com host-b:8118 - - - + Assume that host-a has a PPP connection to isp-a.net. And host-b has a PPP connection to + isp-b.net. Both run Privoxy. Their forwarding + configuration can look like this: - host-b can run a Junkbuster proxy with forwarding - like this: + host-a: - - - - forward .* . - forward isp-a.com host-a:8118 - - - + + forward .*. . + forward .isp-b.net host-b:8118 + - Now, anyone on the Internet (including users on host-a - and host-b) can set their browser's proxy to either - host-a or host-b and be able to browse the content on isp-a or isp-b. 
+ host-b: - Here's another practical example, for University of Kent at - Canterbury students with a network connection in their room, who - need to use the University's Squid web cache. + + forward .*. . + forward .isp-a.net host-a:8118 + - - - - forward *. ssbcache.ukc.ac.uk:3128 # Use the proxy, except for: - forward .ukc.ac.uk . # Anything on the same domain as us - forward * . # Host with no domain specified - forward 129.12.*.* . # A dotted IP on our /16 network. - forward 127.*.*.* . # Loopback address - forward localhost.localdomain . # Loopback address - forward www.ukc.mirror.ac.uk . # Specific host - - - + Now, your users can set their browser's proxy to use either + host-a or host-b and be able to browse the internal content + of both isp-a and isp-b. - If you intend to chain Junkbuster and + If you intend to chain Privoxy and squid locally, then chain as - browser -> squid -> junkbuster is the recommended way. + browser -> squid -> privoxy is the recommended way. - Your squid configuration could then look like this: + Assuming that Privoxy and squid + run on the same box, your squid configuration could then look like this: - - - - # Define junkbuster as parent cache - - cache_peer 127.0.0.1 parent 8118 0 no-query - - # Define ACL for protocol FTP - acl FTP proto FTP + + # Define Privoxy as parent proxy (without ICP) + cache_peer 127.0.0.1 parent 8118 7 no-query - # Do not forward ACL FTP to junkbuster - always_direct allow FTP + # Define ACL for protocol FTP + acl ftp proto FTP - # Do not forward ACL CONNECT (https) to junkbuster - always_direct allow CONNECT + # Do not forward FTP requests to Privoxy + always_direct allow ftp - # Forward the rest to junkbuster + # Forward all the rest to Privoxy never_direct allow all - - - + + + + + You would then need to change your browser's proxy settings to squid's address and port. + Squid normally uses port 3128. If unsure consult http_port in squid.conf. + + @@ -1693,14 +2158,14 @@ configuration section below. 
HB.) Removed references to Win32. HB 09/23/01 --> - Junkbuster has a number of options specific to the + Privoxy has a number of options specific to the Windows GUI interface: If activity-animation is set to 1, the - Junkbuster icon will animate when - Junkbuster is active. To turn off, set to 0. + Privoxy icon will animate when + Privoxy is active. To turn off, set to 0. @@ -1715,7 +2180,7 @@ Removed references to Win32. HB 09/23/01 If log-messages is set to 1, - Junkbuster will log messages to the console + Privoxy will log messages to the console window: @@ -1767,7 +2232,7 @@ Removed references to Win32. HB 09/23/01 If log-highlight-messages is set to 1, - Junkbuster will highlight portions of the log + Privoxy will highlight portions of the log messages with a bold-faced font: @@ -1811,7 +2276,7 @@ Removed references to Win32. HB 09/23/01 show-on-task-bar controls whether or not - Junkbuster will appear as a button on the Task bar + Privoxy will appear as a button on the Task bar when minimized: @@ -1827,7 +2292,7 @@ Removed references to Win32. HB 09/23/01 If close-button-minimizes is set to 1, the Windows close - button will minimize Junkbuster instead of closing + button will minimize Privoxy instead of closing the program (close with the exit option on the File menu). @@ -1843,8 +2308,8 @@ Removed references to Win32. HB 09/23/01 The hide-console option is specific to the MS-Win console - version of JunkBuster. If this option is used, - Junkbuster will disconnect from and hide the + version of Privoxy. If this option is used, + Privoxy will disconnect from and hide the command console. @@ -1869,153 +2334,269 @@ Removed references to Win32. HB 09/23/01 The Actions File - The ijb.action file (formerly - actionsfile) is used to define what actions - Junkbuster takes, and thus determines how images, - cookies and various other aspects of HTTP content and transactions are - handled. 
Images can be anything you want, including ads, banners, or just - some obnoxious image that you would rather not see. Cookies can be accepted - or rejected, or accepted only during the current browser session (i.e. - not written to disk). Changes to ijb.action should - be immediately visible to Junkbuster without - the need to restart. + The actions file (default.action, formerly: + actionsfile or ijb.action) is used + to define what actions Privoxy takes for which + URLs, and thus determines how ad images, cookies and various other aspects + of HTTP content and transactions are handled on which sites (or even parts + thereof). - - To determine which actions apply to a request, the URL of the request is - compared to all patterns in this file. Every time it matches, the list of - applicable actions for the URL is incrementally updated. You can trace - this process by visiting http://i.j.b/show-url-info. + + Anything you want can be blocked, including ads, banners, or just some obnoxious + URL that you would rather not see. Cookies can be accepted or rejected, or + accepted only during the current browser session (i.e. not written to disk), + content can be modified, JavaScripts tamed, user-tracking fooled, and much more. + See below for a complete list of available actions. + + +Finding the Right Mix - The actions file can be edited with a browser by loading - http://i.j.b/, and then select - Edit Actions. + Note that some actions, like cookie suppression or script disabling, may + render unusable those sites that rely on these techniques to work properly. + Finding the right mix of actions is not easy and certainly a matter of personal + taste. In general, it can be said that the more aggressive + your default settings (in the top section of the actions file) are, + the more exceptions for trusted sites you will have to + make later.
If, for example, you want to kill popup windows per default, you'll + have to make exceptions from that rule for sites that you regularly use + and that require popups for actually useful content, like maybe your bank, + favorite shop, or newspaper. - There are four types of lines in this file: comments (begin with a - # character), actions, aliases and patterns, all of which are - explained below, as well as the configuration file syntax that - Junkbuster understands. - + We have tried to provide you with reasonable rules to start from in the + distribution actions file. But there is no general rule of thumb on these + things. There just are too many variables, and sites are constantly changing. + Sooner or later you will want to change the rules (and read this chapter). - + -URL Domain and Path Syntax +How to Edit - Generally, a pattern has the form <domain>/<path>, where both the - <domain> and <path> part are optional. If you only specify a - domain part, the / can be left out: + The easiest way to edit the actions file is with a browser by + using our browser-based editor, which is available at http://config.privoxy.org/edit-actions. - www.example.com - is a domain only pattern and will match any request to - www.example.com. + If you prefer plain text editing to GUIs, you can of course also directly edit the + default.action file. + - - www.example.com/ - means exactly the same. - + +How Actions are Applied to URLs - www.example.com/index.html - matches only the single - document /index.html on www.example.com. + The actions file is divided into sections. There are special sections, + like the alias sections which will be discussed later. For now let's + concentrate on regular sections: They have a heading line (often split + up to multiple lines for readability) which consist of a list of actions, + separated by whitespace and enclosed in curly braces. Below that, there + is a list of URL patterns, each on a separate line. 
- /index.html - matches the document /index.html, regardless of - the domain. + To determine which actions apply to a request, the URL of the request is + compared to all patterns in this file. Every time it matches, the list of + applicable actions for the URL is incrementally updated, using the heading + of the section in which the pattern is located. If multiple matches for + the same URL set the same action differently, the last match wins. - index.html - matches nothing, since it would be - interpreted as a domain name and there is no top-level domain called - .html. + You can trace this process by visiting http://config.privoxy.org/show-url-info. - The matching of the domain part offers some flexible options: if the - domain starts or ends with a dot, it becomes unanchored at that end. - For example: + More detail on this is provided in the Appendix, + Anatomy of an Action. + + + +Patterns - .example.com - matches any domain that ENDS in - .example.com. + Generally, a pattern has the form <domain>/<path>, + where both the <domain> and <path> + are optional. (This is why the pattern / matches all URLs). + + + www.example.com/ + + + is a domain-only pattern and will match any request to www.example.com, + regardless of which document on that server is requested. + + + + + www.example.com + + + means exactly the same. For domain-only patterns, the trailing / may + be omitted. + + + + + www.example.com/index.html + + + matches only the single document /index.html + on www.example.com. + + + + + /index.html + + + matches the document /index.html, regardless of the domain, + i.e. on any web server. + + + + + index.html + + + matches nothing, since it would be interpreted as a domain name and + there is no top-level domain called .html. + + + + + +The Domain Pattern + - www. - matches any domain that STARTS with - www. + The matching of the domain part offers some flexible options: if the + domain starts or ends with a dot, it becomes unanchored at that end. 
+ For example: + + + .example.com + + + matches any domain that ENDS in + .example.com + + + + + www. + + + matches any domain that STARTS with + www. + + + + + .example. + + + matches any domain that CONTAINS .example. + (Correctly speaking: It matches any FQDN that contains example as a domain.) + + + + + Additionally, there are wild-cards that you can use in the domain names themselves. They work pretty similar to shell wild-cards: * stands for zero or more arbitrary characters, ? stands for - any single character. And you can define character classes in square - brackets and they can be freely mixed: + any single character, you can define character classes in square + brackets and all of that can be freely mixed: - - ad*.example.com - matches adserver.example.com, - ads.example.com, etc but not sfads.example.com. - + + + ad*.example.com + + + matches adserver.example.com, + ads.example.com, etc but not sfads.example.com + + + + + *ad*.example.com + + + matches all of the above, and then some. + + + + + .?pix.com + + + matches www.ipix.com, + pictures.epix.com, a.b.c.d.e.upix.com etc. + + + + + www[1-9a-ez].example.c* + + + matches www1.example.com, + www4.example.cc, wwwd.example.cy, + wwwz.example.com etc., but not + wwww.example.com. + + + + - - *ad*.example.com - matches all of the above, and then some. - + - - .?pix.com - matches www.ipix.com, - pictures.epix.com, a.b.c.d.e.upix.com, etc. - +The Path Pattern - www[1-9a-ez].example.com - matches www1.example.com, - www4.example.com, wwwd.example.com, - wwwz.example.com, etc., but not - wwww.example.com. + Privoxy uses Perl compatible regular expressions + (through the PCRE library) for + matching the path. - If Junkbuster was compiled with - pcre support (default), Perl compatible regular expressions - can be used. See the pcre/docs/ directory or man - perlre (also available on http://www.perldoc.com/perl5.6/pod/perlre.html) - for details. A brief discussion of regular expressions is in the - Appendix. 
For instance: + There is an Appendix with a brief quick-start into regular + expressions, and full (very technical) documentation on PCRE regex syntax is available on-line + at http://www.pcre.org/man.txt. + You might also find the Perl man page on regular expressions (man perlre) + useful, which is available on-line at http://www.perldoc.com/perl5.6/pod/perlre.html. - /.*/advert[0-9]+\.jpe?g - would match a URL from any - domain, with any path that includes advert followed - immediately by one or more digits, then a . and ending in - either jpeg or jpg. So we match - example.com/ads/advert2.jpg, and - www.example.com/ads/banners/advert39.jpeg, but not - www.example.com/ads/banners/advert39.gif (no gifs in the - example pattern). + Note that the path pattern is automatically left-anchored at the /, + i.e. it matches as if it would start with a ^. - Please note that matching in the path is case + Please also note that matching in the path is case INSENSITIVE by default, but you can switch to case sensitive at any point in the pattern by using the (?-i) switch: - - - - www.example.com/(?-i)PaTtErN.* - will match only - documents whose path starts with PaTtErN in + www.example.com/(?-i)PaTtErN.* will match only + documents whose path starts with PaTtErN in exactly this capitalization. + @@ -2092,20 +2673,22 @@ Removed references to Win32. HB 09/23/01 If nothing is specified in this file, no actions are taken. - So in this case JunkBuster would just be a + So in this case Privoxy would just be a normal, non-blocking, non-anonymizing proxy. You must specifically enable the privacy and blocking features you need (although the - provided default ijb.action file will + provided default default.action file will give a good starting point). - Later defined actions always over-ride earlier ones. For multi-valued - actions, the actions are applied in the order they are specified. + Later defined actions always over-ride earlier ones. 
So exceptions + to any rules you make, should come in the latter part of the file. For + multi-valued actions, the actions are applied in the order they are + specified. - The list of valid Junkbuster actions are: + The list of valid Privoxy actions are: @@ -2130,7 +2713,11 @@ Removed references to Win32. HB 09/23/01 - Block this URL totally. + Block this URL totally. In a default installation, a blocked + URL will result in bright red banner that says BLOCKED, + with a reason why it is being blocked, and an option to see it anyway. + The page displayed for this is the blocked template + file. @@ -2171,7 +2758,7 @@ Removed references to Win32. HB 09/23/01 +downgrade will downgrade HTTP/1.1 client requests to HTTP/1.0 and downgrade the responses as well. Use this action for servers that use HTTP/1.1 protocol features that - Junkbuster doesn't handle well yet. HTTP/1.1 + Privoxy doesn't handle well yet. HTTP/1.1 is only partially implemented. Default is not to downgrade requests. @@ -2191,7 +2778,7 @@ Removed references to Win32. HB 09/23/01 will link to some script on their own server, giving the destination as a parameter, which will then redirect you to the final target. URLs resulting from this scheme typically look like: - http://some.place/some_script?http://some.where-else. + http://some.place/some_script?http://some.where-else. Sometimes, there are even multiple consecutive redirects encoded in the @@ -2203,9 +2790,9 @@ Removed references to Win32. HB 09/23/01 The +fast-redirects option enables interception of these - requests by Junkbuster, who will cut off all but - the last valid URL in the request and send a local redirect back to your - browser without contacting the remote site. + types of requests by Privoxy, who will cut off + all but the last valid URL in the request and send a local redirect back to + your browser without contacting the intermediate site(s). @@ -2220,17 +2807,101 @@ Removed references to Win32. 
HB 09/23/01 - Filter the website through the re_filterfile: - + Apply the filters in the section_header + section of the default.filter file to the site(s). + default.filter sections are grouped according to like + functionality. Filters can be used to + re-write any of the raw page content. This is potentially a + very powerful feature! + + - +filter{filename} + +filter{section_header} + + + Filter sections that are pre-defined in the supplied + default.filter include: + +
+ + + html-annoyances: Get rid of particularly annoying HTML abuse. + + + + + js-annoyances: Get rid of particularly annoying JavaScript abuse + + + + + content-cookies: Kill cookies that come in the HTML or JS content + + + + + popups: Kill all popups in JS and HTML + + + + + frameset-borders: Give frames a border and make them resizable + + + + + webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking) + + + + + refresh-tags: Kill automatic refresh tags (for dial-on-demand setups) + + + + + fun: Text replacements for subversive browsing fun! + + + + + nimda: Remove Nimda (virus) code. + + + + + banners-by-size: Kill banners by size (very efficient!) + + + + + shockwave-flash: Kill embedded Shockwave Flash objects + + + + + crude-parental: Kill all web pages that contain the words "sex" or "warez" + + +
+ + + + Note: Filtering requires buffering the page content, which may appear to slow down + page rendering since nothing is displayed until all content has passed + the filters. (It does not really take longer, but seems that way since + the page is not incrementally displayed.) This effect will be more noticeable + on slower connections. + +
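The buffering described in the note can be pictured with a short sketch. It is written in Python purely for illustration and is not how Privoxy is implemented: `FILTERS` and `apply_filters` are hypothetical names, and the sketch assumes a filter section is nothing more than a list of case-insensitive global substitutions, such as the `s*<blink>|</blink>**ig` job from the supplied default.filter. It also ignores any line-by-line handling and treats the body as one string.

```python
import re

# Hypothetical stand-in for one filter section: (pattern, replacement) pairs.
# The single rule mirrors the s*<blink>|</blink>**ig job from default.filter.
FILTERS = [
    (re.compile(r"<blink>|</blink>", re.IGNORECASE), ""),
]


def apply_filters(chunks):
    """Buffer every chunk of the body, then run each substitution once."""
    body = "".join(chunks)  # nothing can be passed on until the body is complete
    for pattern, replacement in FILTERS:
        body = pattern.sub(replacement, body)
    return body


# A tag split across two network chunks only becomes visible after buffering.
filtered = apply_filters(["<p><BLI", "NK>Buy now!</blink></p>"])
print(filtered)
```

Because the opening tag arrives in two pieces, a filter applied chunk by chunk would miss it; this is one way to see why the page is held back until all content has passed the filters.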
@@ -2271,7 +2942,7 @@ Removed references to Win32. HB 09/23/01 Don't send the Referer: (sic) header to the web site. You can block it, forge a URL to the same server as the request (which is preferred because some sites will not send images otherwise) or set it to a - constant string of your choice. + constant, user-defined string of your choice.
@@ -2322,13 +2993,13 @@ Removed references to Win32. HB 09/23/01 + + ++image-blocker{blank} will send a 1x1 transparent GIF +image. And finally, +image-blocker{http://xyz.com} will send an +HTTP temporary redirect to the specified image. This has the advantage of the +icon being cached by the browser, which will speed up the display. ++image-blocker{pattern} will send a checkerboard type pattern: + +
- +image-blocker{logo} + +image-blocker{blank} - +image-blocker{http://i.j.b/send-banner} + +image-blocker{pattern} + +image-blocker{http://p.p/send-banner} @@ -2393,7 +3077,7 @@ Removed references to Win32. HB 09/23/01 By default (i.e. in the absence of a +limit-connect - action), Junkbuster will only allow CONNECT + action), Privoxy will only allow CONNECT requests to port 443, which is the standard port for https as a precaution. @@ -2433,10 +3117,10 @@ Removed references to Win32. HB 09/23/01 +no-compression prevents the website from compressing the data. Some websites do this, which can be a problem for - Junkbuster, since +filter, + Privoxy, since +filter, +no-popup and +gif-deanimate will not work on compressed data. This will slow down connections to those websites, - though. Default is nocompression is turned on. + though. Default is no-compression is turned on. @@ -2617,17 +3301,21 @@ Removed references to Win32. HB 09/23/01 - Turn on page filtering, with one exception for sourceforge: - + Turn on page filtering according to rules in the defined sections + of default.filter, and make one exception for + Sourceforge: + - # Run everything through the default filter file (re_filterfile): - {+filter} - - # But please don't re_filter code from sourceforge! + # Run everything through the filter file, using only the + # specified sections: + +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}\ + +filter{webbugs} +filter{nimda} +filter{banners-by-size} + + # Then disable filtering of code from sourceforge! {-filter} .cvs.sourceforge.net @@ -2636,9 +3324,9 @@ Removed references to Win32. HB 09/23/01 - Now some URLs that we want blocked, ie we won't see them. - Many of these use regular expressions that will expand to match multiple - URLs: + Now some URLs that we want blocked (normally generates + the blocked banner). Many of these use regular expressions + that will expand to match multiple URLs: @@ -2695,6 +3383,15 @@ Removed references to Win32. 
HB 09/23/01 + + Note that many of these actions have the potential to cause a page to + misbehave, possibly even not to display at all. There are many ways + a site designer may choose to design his site, and what HTTP header + content he may depend on. There is no way to have hard and fast rules + for all sites. See the Appendix + for a brief example on troubleshooting actions. + + @@ -2704,7 +3401,7 @@ Removed references to Win32. HB 09/23/01 Aliases - Custom actions, known to Junkbuster + Custom actions, known to Privoxy as aliases, can be defined by combining other actions. These can in turn be invoked just like the built-in actions. Currently, an alias can contain any character except space, tab, =, @@ -2712,7 +3409,7 @@ Removed references to Win32. HB 09/23/01 z, 0-9, +, and -. Alias names are not case sensitive, and must be defined before anything else in the - ijb.actionfile ! And there can only be one set of + default.actionfile! And there can only be one set of aliases defined. @@ -2724,7 +3421,7 @@ Removed references to Win32. HB 09/23/01 - # Useful customer aliases we can use later. These must come first! + # Useful custom aliases we can use later. These must come first! {{alias}} +no-cookies = +no-cookies-set +no-cookies-read -no-cookies = -no-cookies-set -no-cookies-read @@ -2775,6 +3472,13 @@ Removed references to Win32. HB 09/23/01
+ + The shop and fragile aliases are often used for + problem sites that require most actions to be disabled + in order to function properly. + + +
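As a rough illustration of how an alias stands in for a list of actions, the following Python sketch expands alias names into the actions they were defined from. It is not Privoxy's implementation: `ALIASES` and `expand` are hypothetical names, and only the two no-cookies definitions from the alias example above are taken from the actions file.

```python
# Hypothetical alias table, keyed by lower-cased name because alias names
# are not case sensitive. The two entries copy the {{alias}} example.
ALIASES = {
    "+no-cookies": ["+no-cookies-set", "+no-cookies-read"],
    "-no-cookies": ["-no-cookies-set", "-no-cookies-read"],
}


def expand(actions):
    """Replace each alias with its actions; pass other actions through as-is."""
    expanded = []
    for action in actions:
        expanded.extend(ALIASES.get(action.lower(), [action]))
    return expanded


print(expand(["+downgrade", "+No-Cookies"]))
```

The point of the sketch is only that an alias is shorthand: after expansion, the list contains nothing but ordinary actions, in the order they were written.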
@@ -2785,66 +3489,437 @@ Removed references to Win32. HB 09/23/01 The Filter File - The filter file defines what filtering of web pages - Junkbuster does. The default filter file is - re_filterfile, located in the config directory. In this - file, any document content, whether viewable text or - embedded non-visible content, can be changed. + Any web page can be dynamically modified with the filter file. This + modification can be removal, or re-writing, of any web page content, + including tags and non-visible content. The default filter file is + default.filter, located in the config directory. + + + + This is potentially a very powerful feature, and requires knowledge of both + regular expressions and HTML in order to create custom + filters. But there are a number of useful filters included with + Privoxy for many common situations. + + + + The included example file is divided into sections. Each section begins + with the FILTER keyword, followed by the identifier + for that section, e.g. FILTER: webbugs. Each section performs + a similar type of filtering, such as html-annoyances. + + + + This file uses regular expressions to alter or remove any string in the + target page. The expressions can only operate on one line at a time. Some + examples from the included default default.filter: + + + + Stop web pages from displaying annoying messages in the status bar by + deleting such references: + + + + + + + FILTER: html-annoyances + + # New browser windows should be resizeable and have a location and status + # bar. Make it so. + # + s/resizable="?(no|0)"?/resizable=1/ig s/noresize/yesresize/ig + s/location="?(no|0)"?/location=1/ig s/status="?(no|0)"?/status=1/ig + s/scrolling="?(no|0|Auto)"?/scrolling=1/ig + s/menubar="?(no|0)"?/menubar=1/ig + + # The <BLINK> tag was a crime! + # + s*<blink>|</blink>**ig + + # Is this evil?
+ # + #s/framespacing="?(no|0)"?//ig + #s/margin(height|width)=[0-9]*//gi + + + + + + + Just for kicks, replace any occurrence of Microsoft with + MicroSuck, and have a little fun with topical buzzwords: + + + + + + + FILTER: fun + + s/microsoft(?!.com)/MicroSuck/ig + + # Buzzword Bingo: + # + s/industry-leading|cutting-edge|award-winning/<font color=red><b>BINGO!</b></font>/ig + + + + + + + Kill those pesky little web-bugs: + + + + + + + # webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking) + FILTER: webbugs + + s/<img\s+[^>]*?(width|height)\s*=\s*['"]?1\D[^>]*?(width|height)\s*=\s*['"]?1(\D[^>]*?)?>/<!-- Squished WebBug -->/sig + + + + + + + + + + + + + + +Templates + + When Privoxy displays one of its internal + pages, such as a 404 Not Found error page, it uses the appropriate template. + On Linux, BSD, and Unix, these are located in + /etc/privoxy/templates by default. These may be + customized, if desired. cgi-style.css is + used to control the HTML attributes (fonts, etc). + + + The default Blocked banner page with the bright red top + banner, is called just blocked. This + may be customized or replaced with something else if desired. + + + + + + + + + + + + +Contacting the Developers, Bug Reporting and Feature +Requests + + + &contacting; + + + + + +Submitting Ads and <quote>Action</quote> Problems + + Ads and banners that are not stopped by Privoxy + can be submitted to the developers by accessing a special page and filling + out the brief, required form. Conversely, you can also report pages, images, + etc. that Privoxy is blocking, but should not. + The form itself does require Internet access. + + + To do this, point your browser to Privoxy + at http://config.privoxy.org/ + (shortcut: http://p.p/), and then select + Actions file feedback system, + near the bottom of the page. Paste in the URL that is the cause of the + unwanted behavior, and follow the prompts. 
The developers will + try to incorporate a fix for the problem you reported into future versions. + + + + New default.action files will occasionally be made + available based on your feedback. These + will be announced on the + ijbswa-announce + list. + + + + + + + +Copyright and History + +Copyright + + &copyright; + + + + + + + + +History + + &history; + + + + + +See Also + + &seealso; + + + + + + +Appendix + + + + +Regular Expressions + + Privoxy can use regular expressions + in various config files, assuming support for pcre (Perl + Compatible Regular Expressions) is compiled in, which is the default. Such + configuration directives do not require regular expressions, but they can be + used to increase flexibility by matching a pattern with wild-cards against + URLs. + + + + If you are reading this, you probably don't understand what regular + expressions are, or what they can do. So this will be a very brief + introduction only. A full explanation would require a book ;-) + + + + Regular expressions are a way of matching one character + expression against another to see if it matches or not. One of the + expressions is a literal string of readable characters + (letters, numbers, etc.), and the other is a complex string of literal + characters combined with wild-cards, and other special characters, called + meta-characters. The meta-characters have special meanings and + are used to build the complex pattern to be matched against. Perl Compatible + Regular Expressions is an enhanced form of the regular expression language + with backward compatibility. + + + + To make a simple analogy, we do something similar when we use wild-card + characters when listing files with the dir command in DOS. + *.* matches all filenames. The special + character here is the asterisk which matches any and all characters. We can be + more specific and use ? to match just individual + characters. So dir file?.txt would match + file1.txt, file2.txt, etc.
We are pattern + matching, using a similar technique to regular expressions! + + + + Regular expressions do essentially the same thing, but are much, much more + powerful. There are many more special characters and ways of + building complex patterns however. Let's look at a few of the common ones, + and then some examples: + + + + + . - Matches any single character, e.g. a, + A, 4, :, or @. + + + + + + ? - The preceding character or expression is matched ZERO or ONE + times. Either/or. + + + + + + + - The preceding character or expression is matched ONE or MORE + times. + + + + + + * - The preceding character or expression is matched ZERO or MORE + times. + + + + + + \ - The escape character denotes that + the following character should be taken literally. This is used where one of the + special characters (e.g. .) needs to be taken literally and + not as a special meta-character. + + + + + + [] - Characters enclosed in brackets will be matched if + any of the enclosed characters are encountered. + + + + + + () - parentheses are used to group a sub-expression, + or multiple sub-expressions. + + + + + + | - The bar character works like an + or conditional statement. A match is successful if the + sub-expression on either side of | matches. + + + + + + s/string1/string2/g - This is used to rewrite strings of text. + string1 is replaced by string2 in this + example. + + + + + These are just some of the ones you are likely to use when matching URLs with + Privoxy, and is a long way from a definitive + list. This is enough to get us started with a few simple examples which may + be more illuminating: + + + + /.*/banners/.* - A simple example + that uses the common combination of . and * to + denote any character, zero or more times. In other words, any string at all. + So we start with a literal forward slash, then our regular expression pattern + (.*) another literal forward slash, the string + banners, another forward slash, and lastly another + .*. 
We are building + a directory path here. This will match any file with the path that has a + directory named banners in it. The .* matches + any characters, and this could conceivably be more forward slashes, so it + might expand into a much longer looking path. For example, this could match: + /eye/hate/spammers/banners/annoy_me_please.gif, or just + /ads/banners/annoying.html, or almost an infinite number of other + possible combinations, just so it has banners in the path + somewhere. - This file uses regular expressions to alter or remove any string in the - target page. Some examples from the included default re_filterfile: + And now something a little more complex: - Stop web pages from displaying annoying messages in the status bar by - deleting such references: + /.*/adv((er)?ts?|ertis(ing|ements?))?/ - + We have several literal forward slashes again (/), so we are + building another expression that is a file path statement. We have another + .*, so we are matching against any conceivable sub-path, just so + it matches our expression. The only true literal that must + match our pattern is adv, together with + the forward slashes. What comes after the adv string is the + interesting part. - - - - # The status bar is for displaying link targets, not pointless buzzwords. - # Again, check it out on http://www.airport-cgn.de/. - s/status='.*?';*//ig - - - + Remember the ? means the preceding expression (either a + literal character or anything grouped with (...) in this case) + can exist or not, since this means either zero or one match. So + ((er)?ts?|ertis(ing|ements?)) is optional, as are the + individual sub-expressions: (er), + (ing|ements?), and the s. The | + means or. We have two of those. For instance, + (ing|ements?) can expand to match either ing + OR ements?. What is being done here is an + attempt at matching as many variations of advertisement, and + similar, as possible.
So this would expand to match just adv, + or advert, or adverts, or + advertising, or advertisement, or + advertisements. You get the idea. But it would not match + advertizements (with a z). We could fix that by + changing our regular expression to: + /.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/, which would then match + either spelling. - Just for kicks, replace any occurrence of Microsoft with - MicroSuck: + /.*/advert[0-9]+\.(gif|jpe?g) - Again + another path statement with forward slashes. Anything in the square brackets + [] can be matched. This is using 0-9 as a + shorthand expression to mean any digit zero through nine. It is the same as + saying 0123456789. So any digit matches. The + + means one or more of the preceding expression must be included. The preceding + expression here is what is in the square brackets -- in this case, any digit + zero through nine. Then, at the end, we have a grouping: (gif|jpe?g). + This includes a |, so this needs to match the expression on + either side of that bar character also. A simple gif on one side, and the other + side will in turn match either jpeg or jpg, + since the ? means the letter e is optional and + can be matched once or not at all. So we are building an expression here to + match a GIF or JPEG image file. It must include the literal + string advert, then one or more digits, and a . + (which is now a literal, and not a special character, since it is escaped + with \), and lastly either gif, or + jpeg, or jpg. Some possible matches would + include: //advert1.jpg, + /nasty/ads/advert1234.gif, + /banners/from/hell/advert99.jpg. It would not match + advert1.gif (no leading slash), or + /adverts232.jpg (the expression does not include an + s), or /advert1.jsp (jsp is not + in the expression anywhere). - - - - s/microsoft(?!.com)/MicroSuck/ig - - - + s/microsoft(?!.com)/MicroSuck/i - This is + a substitution. MicroSuck will replace any occurrence of + microsoft.
The i at the end of the expression + means ignore case. The (?!.com) means + the match should fail if microsoft is followed by + .com. In other words, this acts like a NOT + modifier. In case this is a hyperlink, we don't want to break it ;-). - Kill those auto-refresh tags: + We are barely scratching the surface of regular expressions here so that you + can understand the default Privoxy + configuration files, and maybe use this knowledge to customize your own + installation. There is much, much more that can be done with regular + expressions. Now that you know enough to get started, you can learn more on + your own :/ - - - - # Kill refresh tags. I like to refresh myself. Manually. - # check it out on http://www.airport-cgn.de/ and go to the arrivals page. - # - s/<meta[^>]*http-equiv[^>]*refresh.*URL=([^>]*?)"?>/<link rev="x-refresh" href=$1>/i - s/<meta[^>]*http-equiv="?page-enter"?[^>]*content=[^>]*>/<!--no page enter for me-->/i - - - + More reading on Perl Compatible Regular expressions: + http://www.perldoc.com/perl5.6/pod/perlre.html @@ -2852,500 +3927,479 @@ Removed references to Win32. HB 09/23/01 - - -Templates +<application>Privoxy</application>'s Internal Pages + - When Junkbuster displays one of its internal - pages, such as a 404 Not Found error page, it uses the appropriate template. - On Linux, BSD, and Unix, these are locate in - /etc/junkbuster/templates by default. These may be - customized, if desired. + Since Privoxy proxies each requested + web page, it is easy for Privoxy to + trap certain special URLs. In this way, we can talk directly to + Privoxy, and see how it is + configured, see how our rules are being applied, change these + rules and other configuration options, and even turn + Privoxy's filtering off, all with + a web browser. - - - - - - - - -Quickstart to Using Junkbuster - Install package, then run and enjoy! JunkBuster - accepts only one command line option -- the configuration file to be - used. 
Example Unix startup command: + The URLs listed below are the special ones that allow direct access + to Privoxy. Of course, + Privoxy must be running to access these. If + not, you will get a friendly error message. Internet access is not + necessary either. - + + + + + Privoxy main page: + +
+ + http://config.privoxy.org/ + +
+ + Alternately, this may be reached at http://p.p/, but this + variation may not work as reliably as the above in some configurations. + +
+ + + + Show information about the current configuration: + +
+ + http://config.privoxy.org/show-status + +
+
+ + + + Show the source code version numbers: + +
+ + http://config.privoxy.org/show-version + +
+
- # /usr/sbin/junkbuster /etc/junkbuster/config + + + Show the client's request headers: + +
+ + http://config.privoxy.org/show-request + +
+
-
-
+ + + Show which actions apply to a URL and why: + +
+ + http://config.privoxy.org/show-url-info + +
+
+ + + + Toggle Privoxy on or off. In this case, Privoxy continues + to run, but only as a pass-through proxy, with no actions taking place: + +
+ + http://config.privoxy.org/toggle + +
+ + Short cuts. Turn off, then on: + +
+ + http://config.privoxy.org/toggle?set=disable + +
+
+ + http://config.privoxy.org/toggle?set=enable + +
+
- - An init script is provided for SuSE and Redhat. + + + Edit the actions list file: + +
+ + http://config.privoxy.org/edit-actions + +
+
+ +
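The special URLs above follow a simple pattern, so they are easy to build and request from a script as well as from a browser. The following is only a minimal sketch: the helper names (page_url, toggle_url, fetch) are made up for illustration, and the proxy address http://localhost:8118 is an assumption that must be changed if Privoxy listens elsewhere.

```python
import urllib.request

# Base URL and page names are copied from the list above.
CONFIG_BASE = "http://config.privoxy.org/"
PAGES = ("show-status", "show-version", "show-request",
         "show-url-info", "toggle", "edit-actions")


def page_url(page=""):
    """Return the URL of an internal page (hypothetical helper).

    An empty page name gives the main page.
    """
    if page and page not in PAGES:
        raise ValueError("unknown internal page: %r" % page)
    return CONFIG_BASE + page


def toggle_url(state):
    """Return the short-cut URL that switches filtering on or off."""
    if state not in ("enable", "disable"):
        raise ValueError("state must be 'enable' or 'disable'")
    return page_url("toggle") + "?set=" + state


def fetch(url, proxy="http://localhost:8118", timeout=10):
    """Request url through the proxy and return the body as text.

    The proxy address is an assumption; the request only reaches the
    internal pages if it actually passes through a running Privoxy.
    """
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the URLs; call fetch() on one of them to talk to Privoxy.
    for name in ("",) + PAGES:
        print(page_url(name))
    print(toggle_url("disable"))
    print(toggle_url("enable"))
```

Building the URLs is kept separate from fetching them, so the URL helpers can be used and checked even when no proxy is running.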
-For for SuSE: /etc/rc.d/junkbuster start - + These may be bookmarked for quick reference. - -For RedHat: /etc/rc.d/init.d/junkbuster start - + +Bookmarklets - If no configuration file is specified on the command line, - Junkbuster will look for a file named - config in the current directory. Except on Win32 where - it will try config.txt. If no file is specified on the - command line and no default configuration file can be found, - Junkbuster will fail to start. + Below are some bookmarklets to allow you to easily access a + mini version of some of Privoxy's + special pages. They are designed for MS Internet Explorer, but should work + equally well in Netscape, Mozilla, and other browsers which support + JavaScript. They are designed to run directly from your bookmarks - not by + clicking the links below (although that should work for testing). - - Be sure your browser is set to use the proxy which is by default at - localhost, port 8118. With Netscape (and - Mozilla), this can be set under Edit - -> Preferences -> Advanced -> Proxies -> HTTP Proxy. - For Internet Explorer: Tools > - Internet Properties -> Connections -> LAN Setting. Then, - check Use Proxy and fill in the appropriate info (Address: - localhost, Port: 8118). Include if HTTPS proxy support too. + To save them, right-click the link and choose Add to Favorites + (IE) or Add Bookmark (Netscape). You will get a warning that + the bookmark may not be safe - just click OK. Then you can run the + Bookmarklet directly from your favorites/bookmarks. For even faster access, + you can put them on the Links bar (IE) or the Personal + Toolbar (Netscape), and run them with a single click. - The included default configuration files should give a reasonable starting - point, though may be somewhat aggressive in blocking junk. You will probably - want to keep an eye out for sites that require persistent cookies, and add these to - ijb.action as needed. 
By default, most of these will - be accepted only during the current browser session, until you add them to - the configuration. If you want the browser to handle this instead, you will - need to edit ijb.action and disable this feature. If you - use more than one browser, it would make more sense to let - Junkbuster handle this. In which case, the - browser(s) should be set to accept all cookies. - + - - If a particular site shows problems loading properly, try adding it - to the {fragile} section of - ijb.action. This will turn off most actions for - this site. - + + + Enable Privoxy + + - - Junkbuster is HTTP/1.1 compliant, but not all 1.1 - features are as yet implemented. If browsers that support HTTP/1.1 (like - Mozilla or recent versions of I.E.) experience - problems, you might try to force HTTP/1.0 compatibility. For Mozilla, look - under Edit -> Preferences -> Debug -> Networking. - Or set the +downgrade config option in - ijb.action. - + + + Disable Privoxy + + - - After running Junkbuster for a while, you can - start to fine tune the configuration to suit your personal, or site, - preferences and requirements. There are many, many aspects that can - be customized. Actions (as specified in ijb.action) - can be adjusted by pointing your browser to - http://i.j.b/, - and then follow the link to edit the actions list. - (This is an internal page and does not require Internet access.) - + + + Toggle Privoxy (Toggles between enabled and disabled) + + - - In fact, various aspects of Junkbuster - configuration can be viewed from this page, including - current configuration parameters, source code version numbers, - the browser's request headers, and actions that apply - to a given URL. In addition to the ijb.action file - editor mentioned above, Junkbuster can also - be turned on and off from this page. 
+ + + View Privoxy Status + + + + + + Actions file feedback system + + + + + + - If you encounter problems, please verify it is a - Junkbuster bug, by disabling - Junkbuster, and then trying the same page. - Also, try another browser if possible to eliminate browser or site - problems. Before reporting it as a bug, see if there is not a configuration - option that is enabled that is causing the page not to load. You can - then add an exception for that page or site. If a bug, please report it to - the developers (see below). + Credit: The site which gave me the general idea for these bookmarklets is + www.bookmarklets.com. They + have more information about bookmarklets. -
+ - -Contacting the Developers, Bug Reporting and Feature -Requests - - Please do not use the mailing lists for feature requests or - bug reports. They are not as easily tracked this way! + - - - - Feature requests and other questions should be posted to the Feature - request page at SourceForge. There is also an archive there. - + + +Anatomy of an Action + + + The way Privoxy applies actions + and filters to any given URL can be complex, and it is not always + easy to understand what is happening. Sometimes we need to be able to + see just what Privoxy is + doing, especially if something Privoxy is doing + is inadvertently causing us a problem. It can be a little daunting to look at + the actions and filters files themselves, since they tend to be filled with + regular expressions whose consequences are not always + obvious. Privoxy provides the + http://config.privoxy.org/show-url-info + page that can show us very specifically how actions + are being applied to any given URL. This is a big help for troubleshooting. + - Anyone interested in actively participating in development and related - discussions can join the appropriate mailing list - here. - Archives are available here too. + First, enter one URL (or partial URL) at the prompt, and then + Privoxy will tell us + how the current configuration will handle it. This will not + help with filtering effects from the default.filter file! It + also will not tell you about any other URLs that may be embedded within the + URL you are testing (i.e. a web page). For instance, images such as ads are expressed as URLs + within the raw page source of HTML pages. So you will only get info for the + actual URL that is pasted into the prompt area -- not any sub-URLs. If you + want to know about embedded URLs like ads, you will have to dig those out of + the HTML source. Use your browser's View Page Source option + for this. Or right-click on the ad and grab the URL. 
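Digging embedded URLs out of the HTML source by hand gets tedious on large pages. As a rough sketch only, the standard Python HTML parser can list them so that each one can be pasted into the show-url-info prompt. The class and function names, the choice of tags, and the sample page in the usage section are all illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedURLCollector(HTMLParser):
    """Collect URLs referenced by a few common tags (illustrative list)."""

    # Which attribute carries the URL for each tag we look at.
    URL_ATTRIBUTE = {"img": "src", "script": "src", "iframe": "src",
                     "a": "href", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRIBUTE.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page's own URL.
                self.urls.append(urljoin(self.base_url, value))


def embedded_urls(page_url, html_source):
    """Return the URLs embedded in html_source, in document order."""
    collector = EmbeddedURLCollector(page_url)
    collector.feed(html_source)
    collector.close()
    return collector.urls


if __name__ == "__main__":
    # Made-up page source; the banner path is hypothetical.
    sample = ('<html><body>'
              '<img src="http://ad.doubleclick.net/banner.gif">'
              '<a href="/adsl/HOWTO/">ADSL HOWTO</a>'
              '</body></html>')
    for url in embedded_urls("http://www.rhapsodyk.net/", sample):
        print(url)
```

Each URL printed this way is a candidate to test individually, since show-url-info only reports on the one URL it is given.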
- Please report bugs, using the form at - Sourceforge. - Please try to verify that it is a Junkbuster bug, - and not a browser or site bug first. Also, check to make sure this is not - already a known bug. If you are using your own custom configuration, please - try the stock configs to see if the problem is a configuration related bug. + Let's look at an example, google.com, + one section at a time: - - + + + System default actions: - -Copyright and History + { -add-header -block -deanimate-gifs -downgrade -fast-redirects -filter + -hide-forwarded -hide-from -hide-referer -hide-user-agent -image + -image-blocker -limit-connect -no-compression -no-cookies-keep + -no-cookies-read -no-cookies-set -no-popups -vanilla-wafer -wafer } + + + - -License - Internet Junkbuster is free software; you can - redistribute it and/or modify it under the terms of the GNU General Public - License as published by the Free Software Foundation; either version 2 of the - License, or (at your option) any later version. + This is the top section, and only tells us of the compiled in defaults. This + is basically what Privoxy would do if there + were not any actions defined, i.e. it does nothing. Every action + is disabled. This is not particularly informative for our purposes here. OK, + next section: - This program is distributed in the hope that it will be useful, but WITHOUT - ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS - FOR A PARTICULAR PURPOSE. See the GNU General Public License for more - details, which is available from the Free Software Foundation, - Inc, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
- + - + Matches for http://google.com: - + { -add-header -block +deanimate-gifs -downgrade +fast-redirects + +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} + +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} + +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} + -hide-user-agent -image +image-blocker{blank} +no-compression + +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups + -vanilla-wafer -wafer } + / + { -no-cookies-keep -no-cookies-read -no-cookies-set } + .google.com - + { -fast-redirects } + .google.com + + + - -History - Junkbuster was originally written by Anonymous - Coders and Junkbuster's - Corporation, and was released as free open-source software under the - GNU GPL. Stefan - Waldherr made many improvements, and started the SourceForge project to - rekindle development. There are now several active developers contributing. - The last stable release was v2.0.2, which has now grown whiskers ;-). + This is much more informative, and tells us how we have defined our + actions, and which ones match for our example, + google.com. The first grouping shows our default + settings, which would apply to all URLs. If you look at your actions + file, this would be the section just below the aliases section + near the top. This applies to all URLs as signified by the single forward + slash -- /. + - + + These are the default actions we have enabled. But we can define additional + actions that would be exceptions to these general rules, and then list + specific URLs that these exceptions would apply to. Last match wins. + Just below this then are two explicit matches for .google.com. + The first is negating our various cookie blocking actions (i.e. we will allow + cookies here). The second turns off the fast-redirects action for this site. Note + that there is a leading dot here -- .google.com. This will + also match any hosts and sub-domains in the google.com domain, such as + www.google.com. 
So, apparently, we have these actions defined + somewhere in the lower part of our actions file, and + google.com is referenced in these sections. - + - -See also - - - -   http://sourceforge.net/projects/ijbswa - - - - -   http://ijbswa.sourceforge.net/ - - - - -   http://i.j.b/ - - - - -   http://www.junkbusters.com/ht/en/cookies.html - - - - -   http://www.waldherr.org/junkbuster/ - - - - -   http://privacy.net/analyze/ - - - - -  http://www.squid-cache.org/ - - + And now we pull it all together in the bottom section and summarize how + Privoxy is applying all its actions + to google.com: - + + + Final results: - -Appendix + -add-header -block -deanimate-gifs -downgrade -fast-redirects + +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} + +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} + +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} + -hide-user-agent -image +image-blocker{blank} -limit-connect +no-compression + -no-cookies-keep -no-cookies-read -no-cookies-set +no-popups -vanilla-wafer + -wafer + + - - -Regular Expressions - Junkbuster can use regular expressions - in various config files. Assuming support for pcre (Perl - Compatible Regular Expressions) is compiled in, which is the default. Such - configuration directives do not require regular expressions, but they can be - used to increase flexibility by matching a pattern with wild-cards against - URLs. + Now another example, ad.doubleclick.net: - If you are reading this, you probably don't understand what regular - expressions are, or what they can do. So this will be a very brief - introduction only. A full explanation would require a book ;-) + + + { +block +image } + .ad.doubleclick.net + + { +block +image } + ad*. + + { +block +image } + .doubleclick.net + + - Regular expressions is a way of matching one character - expression against another to see if it matches or not. 
One of the - expressions is a literal string of readable characters - (letter, numbers, etc), and the other is a complex string of literal - characters combined with wild-cards, and other special characters, called - meta-characters. The meta-characters have special meanings and - are used to build the complex pattern to be matched against. Perl Compatible - Regular Expressions is an enhanced form of the regular expression language - with backward compatibility. + We'll just show the interesting part here, the explicit matches. It is + matched three different times. Each as an +block +image, + which is the expanded form of one of our aliases that had been defined as: + +imageblock. (Aliases are defined in the + first section of the actions file and typically used to combine more + than one action.) - To make a simple analogy, we do something similar when we use wild-card - characters when listing files with the dir command in DOS. - *.* matches all filenames. The special - character here is the asterisk which matches any and all characters. We can be - more specific and use ? to match just individual - characters. So dir file?.text would match - file1.txt, file2.txt, etc. We are pattern - matching, using a similar technique to regular expressions! + Any one of these would have done the trick and blocked this as an unwanted + image. This is unnecessarily redundant since the last case effectively + would also cover the first. No point in taking chances with these guys + though ;-) Note that if you want an ad or obnoxious + URL to be invisible, it should be defined as ad.doubleclick.net + is done here -- as both a +block and an + +image. The custom alias +imageblock does this + for us. - Regular expressions do essentially the same thing, but are much, much more - powerful. There are many more special characters and ways of - building complex patterns however. Let's look at a few of the common ones, - and then some examples: + One last example. 
Let's try http://www.rhapsodyk.net/adsl/HOWTO/. + This one is giving us problems. We are getting a blank page. Hmmm... - - - . - Matches any single character, e.g. a, - A, 4, :, or @. - - - - - - ? - The preceding character or expression is matched ZERO or ONE - times. Either/or. - - - - - - + - The preceding character or expression is matched ONE or MORE - times. - - - - - - * - The preceding character or expression is matched ZERO or MORE - times. - - + + - - - \ - The escape character denotes that - the following character should be taken literally. This is used where one of the - special characters (e.g. .) needs to be taken literally and - not as a special meta-character. - - + Matches for http://www.rhapsodyk.net/adsl/HOWTO/: - - - [] - Characters enclosed in brackets will be matched if - any of the enclosed characters are encountered. - - + { -add-header -block +deanimate-gifs -downgrade +fast-redirects + +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups} + +filter{webbugs} +filter{nimda} +filter{banners-by-size} +filter{hal} + +filter{fun} +hide-forwarded +hide-from{block} +hide-referer{forge} + -hide-user-agent -image +image-blocker{blank} +no-compression + +no-cookies-keep -no-cookies-read -no-cookies-set +no-popups + -vanilla-wafer -wafer } + / - - - () - parentheses are used to group a sub-expression, - or multiple sub-expressions. - - + { +block +image } + /ads - - - | - The bar character works like an - or conditional statement. A match is successful if the - sub-expression on either side of | matches. - - + + - - - s/string1/string2/g - This is used to rewrite strings of text. - string1 is replaced by string2 in this - example. - - + + Ooops, the /adsl/ is matching /ads! But + we did not want this at all! Now we see why we get the blank page. We could + now add a new action below this that explicitly does not + block (-block) pages with adsl. There are various ways to + handle such exceptions. 
Example: + - These are just some of the ones you are likely to use when matching URLs with - Junkbuster, and is a long way from a definitive - list. This is enough to get us started with a few simple examples which may - be more illuminating: + + + { -block } + /adsl + + - /.*/banners/.* - A simple example - that uses the common combination of . and * to - denote any character, zero or more times. In other words, any string at all. - So we start with a literal forward slash, then our regular expression pattern - (.*) another literal forward slash, the string - banners, another forward slash, and lastly another - .*. We are building - a directory path here. This will match any file with the path that has a - directory named banners in it. The .* matches - any characters, and this could conceivably be more forward slashes, so it - might expand into a much longer looking path. For example, this could match: - /eye/hate/spammers/banners/annoy_me_please.gif, or just - /banners/annoying.html, or almost an infinite number of other - possible combinations, just so it has banners in the path - somewhere. + Now the page displays ;-) Be sure to flush your browser's caches when + making such changes. Or, try using Shift+Reload. - A now something a little more complex: + But now what about a situation where we get no explicit matches like + we did with: - /.*/adv((er)?ts?|ertis(ing|ements?))?/ - - We have several literal forward slashes again (/), so we are - building another expression that is a file path statement. We have another - .*, so we are matching against any conceivable sub-path, just so - it matches our expression. The only true literal that must - match our pattern is adv, together with - the forward slashes. What comes after the adv string is the - interesting part. + + + { -block } + /adsl + + - Remember the ? means the preceding expression (either a - literal character or anything grouped with (...) 
in this case) - can exist or not, since this means either zero or one match. So - ((er)?ts?|ertis(ing|ements?)) is optional, as are the - individual sub-expressions: (er), - (ing|ements?), and the s. The | - means or. We have two of those. For instance, - (ing|ements?), can expand to match either ing - OR ements?. What is being done here, is an - attempt at matching as many variations of advertisement, and - similar, as possible. So this would expand to match just adv, - or advert, or adverts, or - advertising, or advertisement, or - advertisements. You get the idea. But it would not match - advertizements (with a z). We could fix that by - changing our regular expression to: - /.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/, which would then match - either spelling. + That actually was very telling and pointed us quickly to where the problem + was. If you don't get this kind of match, then it means one of the default + rules in the first section is causing the problem. This would require some + guesswork, and maybe a little trial and error to isolate the offending rule. + One likely cause would be one of the {+filter} actions. Try + adding the URL for the site to one of the aliases that turn off +filter: - /.*/advert[0-9]+\.(gif|jpe?g) - Again - another path statement with forward slashes. Anything in the square brackets - [] can be matched. This is using 0-9 as a - shorthand expression to mean any digit one through nine. It is the same as - saying 0123456789. So any digit matches. The + - means one or more of the preceding expression must be included. The preceding - expression here is what is in the square brackets -- in this case, any digit - one through nine. Then, at the end, we have a grouping: (gif|jpe?g). - This includes a |, so this needs to match the expression on - either side of that bar character also. A simple gif on one side, and the other - side will in turn match either jpeg or jpg, - since the ? 
means the letter e is optional and - can be matched once or not at all. So we are building an expression here to - match image GIF or JPEG type image file. It must include the literal - string advert, then one or more digits, and a . - (which is now a literal, and not a special character, since it is escaped - with \), and lastly either gif, or - jpeg, or jpg. Some possible matches would - include: //advert1.jpg, - /nasty/ads/advert1234.gif, - /banners/from/hell/advert99.jpg. It would not match - advert1.gif (no leading slash), or - /adverts232.jpg (the expression does not include an - s), or /advert1.jsp (jsp is not - in the expression anywhere). + + + {shop} + .quietpc.com + .worldpay.com # for quietpc.com + .jungle.com + .scan.co.uk + .forbes.com + + - s/microsoft(?!.com)/MicroSuck/i - This is - a substitution. MicroSuck will replace any occurrence of - microsoft. The i at the end of the expression - means ignore case. The (?!.com) means - the match should fail if microsoft is followed by - .com. In other words, this acts like a NOT - modifier. In case this is a hyperlink, we don't want to break it ;-). + {shop} is an alias that expands to + { -filter -no-cookies -no-cookies-keep }. Or you could do + your own exception to negate filtering: + - We are barely scratching the surface of regular expressions here so that you - can understand the default Junkbuster - configuration files, and maybe use this knowledge to customize your own - installation. There is much, much more that can be done with regular - expressions. Now that you know enough to get started, you can learn more on - your own :/ + + + {-filter} + .forbes.com + + - More reading on Perl Compatible Regular expressions: - http://www.perldoc.com/perl5.6/pod/perlre.html + {fragile} is an alias that disables most actions. This can be + used as a last resort for problem sites. Remember to flush caches! 
If this + still does not work, you will have to go through the remaining actions one by + one to find which one(s) is causing the problem. @@ -3373,6 +4427,161 @@ communication (bugs, feature requests, etc.) Temple Place - Suite 330, Boston, MA 02111-1307, USA. $Log: user-manual.sgml,v $ + Revision 1.82 2002/04/18 12:04:50 oes + Cosmetics + + Revision 1.81 2002/04/18 11:50:24 oes + Extended Install section - needs fixing by packagers + + Revision 1.80 2002/04/18 10:45:19 oes + Moved text to buildsource.sgml, renamed some filters, details + + Revision 1.79 2002/04/18 03:18:06 hal9 + Spellcheck, and minor touchups. + + Revision 1.78 2002/04/17 18:04:16 oes + Proofreading part 2 + + Revision 1.77 2002/04/17 13:51:23 oes + Proofreading, part one + + Revision 1.76 2002/04/16 04:25:51 hal9 + -Added 'Note to Upgraders' and re-ordered the 'Quickstart' section. + -Note about proxy may need requests to re-read config files. + + Revision 1.75 2002/04/12 02:08:48 david__schmidt + Remove OS/2 building info... it is already in the developer-manual + + Revision 1.74 2002/04/11 00:54:38 hal9 + Add small section on submitting actions. + + Revision 1.73 2002/04/10 18:45:15 swa + generated + + Revision 1.72 2002/04/10 04:06:19 hal9 + Added actions feedback to Bookmarklets section + + Revision 1.71 2002/04/08 22:59:26 hal9 + Version update. Spell chkconfig correctly :) + + Revision 1.70 2002/04/08 20:53:56 swa + ? + + Revision 1.69 2002/04/06 05:07:29 hal9 + -Add privoxy-man-page.sgml, for man page. + -Add authors.sgml for AUTHORS (and p-authors.sgml) + -Reworked various aspects of various docs. + -Added additional comments to sub-docs. + + Revision 1.68 2002/04/04 18:46:47 swa + consistent look. reuse of copyright, history et. al. + + Revision 1.67 2002/04/04 17:27:57 swa + more single file to be included at multiple points. 
make maintaining easier + + Revision 1.66 2002/04/04 06:48:37 hal9 + Structural changes to allow for conditional inclusion/exclusion of content + based on entity toggles, e.g. 'entity % p-not-stable "INCLUDE"'. And + definition of internal entities, e.g. 'entity p-version "2.9.13"' that will + eventually be set by Makefile. + More boilerplate text for use across multiple docs. + + Revision 1.65 2002/04/03 19:52:07 swa + enhance squid section due to user suggestion + + Revision 1.64 2002/04/03 03:53:43 hal9 + A few minor bug fixes, and touch ups. Ready for review. + + Revision 1.63 2002/04/01 16:24:49 hal9 + Define entities to include boilerplate text. See doc/source/*. + + Revision 1.62 2002/03/30 04:15:53 hal9 + - Fix privoxy.org/config links. + - Paste in Bookmarklets from Toggle page. + - Move Quickstart nearer top, and minor rework. + + Revision 1.61 2002/03/29 01:31:08 hal9 + Minor update. + + Revision 1.60 2002/03/27 01:57:34 hal9 + Added more to Anatomy section. + + Revision 1.59 2002/03/27 00:54:33 hal9 + Touch up intro for new name. + + Revision 1.58 2002/03/26 22:29:55 swa + we have a new homepage! + + Revision 1.57 2002/03/24 20:33:30 hal9 + A few minor catch ups with name change. + + Revision 1.56 2002/03/24 16:17:06 swa + configure needs to be generated. + + Revision 1.55 2002/03/24 16:08:08 swa + we are too lazy to make a block-built + privoxy logo. hence removed the option. + + Revision 1.54 2002/03/24 15:46:20 swa + name change related issue. + + Revision 1.53 2002/03/24 11:51:00 swa + name change. changed filenames. + + Revision 1.52 2002/03/24 11:01:06 swa + name change + + Revision 1.51 2002/03/23 15:13:11 swa + renamed every reference to the old name with foobar. + fixed "application foobar application" tag, fixed + "the foobar" with "foobar". left junkbustser in cvs + comments and remarks to history untouched. + + Revision 1.50 2002/03/23 05:06:21 hal9 + Touch up. + + Revision 1.49 2002/03/21 17:01:05 hal9 + New section in Appendix. 
+ + Revision 1.48 2002/03/12 06:33:01 hal9 + Catching up to Andreas and re_filterfile changes. + + Revision 1.47 2002/03/11 13:13:27 swa + correct feedback channels + + Revision 1.46 2002/03/10 00:51:08 hal9 + Added section on JB internal pages in Appendix. + + Revision 1.45 2002/03/09 17:43:53 swa + more distros + + Revision 1.44 2002/03/09 17:08:48 hal9 + New section on Jon's actions file editor, and move some stuff around. + + Revision 1.43 2002/03/08 00:47:32 hal9 + Added imageblock{pattern}. + + Revision 1.42 2002/03/07 18:16:55 swa + looks better + + Revision 1.41 2002/03/07 16:46:43 hal9 + Fix a few markup problems for jade. + + Revision 1.40 2002/03/07 16:28:39 swa + provide correct feedback channels + + Revision 1.39 2002/03/06 16:19:28 hal9 + Note on perceived filtering slowdown per FR. + + Revision 1.38 2002/03/05 23:55:14 hal9 + Stupid I did it again. Double hyphen in comment breaks jade. + + Revision 1.37 2002/03/05 23:53:49 hal9 + jade barfs on '- -' embedded in comments. - -user option broke it. + + Revision 1.36 2002/03/05 22:53:28 hal9 + Add new - - user option. + Revision 1.35 2002/03/05 00:17:27 hal9 Added section on command line options.