From: hal9 Date: Mon, 24 Sep 2001 01:27:56 +0000 (+0000) Subject: Draft. Unfinished. X-Git-Tag: v_2_9_9~38 X-Git-Url: http://www.privoxy.org/gitweb/?a=commitdiff_plain;h=e5e8de7741d7605bbf0419df7e2638ff58401c61;p=privoxy.git Draft. Unfinished. --- diff --git a/doc/source/user-manual.sgml b/doc/source/user-manual.sgml index 9ad3f1a8..e7c6334d 100644 --- a/doc/source/user-manual.sgml +++ b/doc/source/user-manual.sgml @@ -17,6 +17,15 @@ Junkbusters Corporation. http://www.junkbusters.com --> + +
Junkbuster User Manual @@ -33,86 +42,2409 @@ - The user manual gives the users information on how to install and -configure the Internet Junkbuster. The Internet Junkbuster is an application -that provides privacy and security to the user of the world wide web. + The user manual gives the users information on how to install and configure + Internet Junkbuster. Internet + Junkbuster is an application that provides privacy and + security to users of the World Wide Web. -You can find the latest version of the user manual at http://ijbswa.sourceforge.net/user-manual/. +You can find the latest version of the user manual at http://ijbswa.sourceforge.net/doc/user-manual/. Feel free to send a note to the developers at ijbswa-developers@lists.sourceforge.net. + + + Introduction -To be filled. + + Internet Junkbuster is a web proxy with advanced + filtering capabilities for protecting privacy, filtering web page content, + managing cookies and removing ads, banners, pop-ups and other obnoxious + Internet Junk. Junkbuster has a very flexible + configuration and can be customized to suit individual needs and tastes. + Internet Junkbuster has application for both + stand-alone systems and multi-user networks. + + + + This documentation is included with the current development version of + Internet Junkbuster and is incomplete at this + point. The most up-to-date reference for the time being is still the comments + in the source files and in the individual configuration files. Development + of version 3.0 is currently underway, and includes significant changes and + enhancements over earlier versions. + + + + Since this is a development version, there are bugs! - -Quickstart to Using Junkbuster -To be filled. + + +License + + Internet Junkbuster is free software; you can + redistribute it and/or modify it under the terms of the GNU General Public + License as published by the Free Software Foundation; either version 2 of the + License, or (at your option) any later version. 
+ + + + This program is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for more + details, which is available from the Free Software Foundation, + Inc, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + + + + + + + + + + +History + + Junkbuster was originally written by JunkBusters + Corporation, and was released as free open-source software under the + GNU GPL. Stefan + Waldherr made many improvements, and started the SourceForge project to + rekindle development. + + + + + + Installation -To be filled. + + Junkbuster is available as raw source code, or + pre-compiled binaries. See the Junkbuster Home Page + for current releases. Junkbuster is also available + via CVS. + This is the recommended approach at this time. + + + +Source + + For gzipped tar archives, unpack the source: + + + + + tar zxvf ijb_source_2.9* + cd ijb_source_2.9* + + + + + For retrieving the current CVS sources, you'll need the CVS + package installed first. To download CVS source: + + + + + cvs -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa login + cvs -z3 -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa co current + cd current + + + + + This will create a directory named current/, which will + contain the source tree. + + + + Then, in either case, to build from source: + + + + + ./configure + make + su + make install + + + For Redhat and SuSE Linux RPM packages, see below. + + + + + Red Hat -To be filled. + + To build Redhat RPM packages, install source as above. Then: + + + + + ./configure + make redhat-dist + + + + + This will create both binary and src RPMs in the usual places. 
Example: + + + +    /usr/src/redhat/RPMS/i686/junkbuster-2.9.8-1.i686.rpm + + +    /usr/src/redhat/SRPMS/junkbuster-2.9.8-1.src.rpm + + + + To install, of course: + + + + + rpm -Uvv /usr/src/redhat/RPMS/i686/junkbuster-2.9.8-1.i686.rpm + + + + + This will place the Junkbuster configuration + files in /etc/junkbuster/, and log files in + /var/log/junkbuster/. + SuSE -To be filled. + + To build SuSE RPM packages, install source as above. Then: + + + + + ./configure + make suse-dist + + + + + This will create both binary and src RPMs in the usual places. Example: + + + +    /usr/src/suse/RPMS/i686/junkbuster-2.9.8-1.i686.rpm + + +    /usr/src/suse/SRPMS/junkbuster-2.9.8-1.src.rpm + + + + To install, of course: + + + + + rpm -Uvv /usr/src/suse/RPMS/i686/junkbuster-2.9.8-1.i686.rpm + + + + This will place the Junkbuster configuration + files in /etc/junkbuster/, and log files in + /var/log/junkbuster/. + + + Windows -To be filled. +I need help on this. Not a clue here. Also for +configuration section below. Other -To be filled. +I need help on this too. OS/2? What others? + + + -Configuration -To be filled. +Junkbuster Configuration + + For Unix and Linux, all configuration files are located in + /etc/junkbuster/ by default. For MS Windows, these + are all in the same directory as the Junkbuster + executable. The names and number of configuration files have changed from + previous versions, and are subject to change as development progresses. - + + + The installed defaults provide a reasonable starting point. For the + time being, there are only three default configuration files (this will + change in time): + + + + + + + + The main configuration file is named config + on Linux and Unix, and junkbustr.txt on Windows. + + + + + + The actionsfile file is used to define various + actions relating to images, banners, pop-ups and cookies. 
+ + + + + + The re_filterfile file can be used to rewrite the raw + page content, including text as well as embedded HTML and JavaScript. + + + + + + + + actionsfile and re_filterfile + can use Perl style regular expressions for maximum flexibility. All files use + the # character to denote a comment. Such + lines are not processed by Junkbuster. After + making any changes, restart Junkbuster in order + for the changes to take effect. + + -Contact the developers -To be filled. mention the support forums as the primary channel of -communication (bugs, feature requests, etc.) + + +The Main Configuration File + + Again, the main configuration file is named config on + Linux and Unix, and junkbustr.txt on Windows. + Configuration lines consist of an initial keyword followed by a list of + values, all separated by whitespace (any number of spaces or tabs). For + example: + + + + + + + blockfile blocklist.ini + + + + + + + Indicates that the blockfile is named blocklist.ini. + + + + The # indicates a comment. Any part of a + line following a # is ignored, except if + the # is preceded by a + \. - + + + Thus, by placing a # at the start of an + existing configuration line, you can make it a comment and it will be treated + as if it weren't there. This is called commenting out an + option and can be useful to turn off features: If you comment out the + logfile line, junkbuster will not + log to a file at all. Watch for the default: section in each + explanation to see what happens if the option is left unset (or commented + out). + + + + Long lines can be continued on the next line by using a + \ as the very last character. + + + + There are various aspects of Junkbuster behavior + that can be adjusted. + + -Copyright and History -To be filled. + + +Defining Other Configuration Files + + + Junkbuster can use a number of other files to tell it + what ads to block, what cookies to accept, etc. 
This section of the + configuration file tells Junkbuster where to find + all those other files. + + + + On Windows, Junkbuster + looks for these files in the same directory as the executable. On Unix, + Junkbuster looks for these files in the current + working directory. In either case, an absolute path name can be used to + avoid problems. + + + + When development goes modular and multiuser, the blocker, filter, and + per-user config will be stored in subdirectories of confdir. + For now, only confdir/templates is used for storing HTML + templates for CGI results. - + + + The location of the configuration files: + + + + + + + confdir /etc/junkbuster # No trailing /, please. + + + + + + + The directory where all logging (i.e. logfile and + jarfile) takes place. No trailing + /, please: + + + + + + + logdir /var/log/junkbuster + + + + + + + Note that all file specifications below are relative to + the above two directories! + + + + The actionsfile contains patterns to specify the actions to + apply to requests for each site. Default: Cookies to and from all + destinations are filtered. Popups are disabled for all sites. All sites are + filtered if re_filterfile specified. No sites are blocked. An empty image is + displayed for filtered ads and other images (formerly + tinygif). The syntax of this file is explained in detail + below. + + + + + + + actionsfile actionsfile + + + + + + + The re_filterfile file contains content modification rules. + These rules permit powerful changes on the content of Web pages, e.g., you + could disable your favourite JavaScript annoyances, rewrite the actual + content, or just have some fun replacing Microsoft with + MicroSuck wherever it appears on a Web page. Default: No + content modification, or whatever the developers are playing with :-/ + + + + + + + re_filterfile re_filterfile + + + + + + + The logfile is where all logging and error messages are written. 
The logfile + can be useful for tracking down a problem with + Junkbuster (e.g., it's not blocking an ad you + think it should block) but in most cases you probably will never look at it. + + + + Your logfile will grow indefinitely, and you will probably want to + periodically remove it. On Unix systems, you can do this with a cron job + (see man cron). For Redhat, a logrotate + script has been included. + + + + On SuSE Linux systems, you can place a line like /var/log/junkbuster.* + +1024k 644 nobody.nogroup in /etc/logfiles, with + the effect that cron.daily will automatically archive, gzip, and empty the + log when it exceeds 1M in size. + + + + Default: Log to a file named logfile. + Comment out to disable logging. + + + + + + + logfile logfile + + + + + + + The jarfile defines where + Junkbuster stores the cookies it intercepts. Note + that if you use a jarfile, it may grow quite large. Default: + Don't store intercepted cookies. + + + + + + + #jarfile jarfile + + + + + + + If you specify a trustfile, + Junkbuster will only allow access to sites that + are named in the trustfile. You can also mark sites as trusted referrers, + with the effect that access to untrusted sites will be granted if a link + from a trusted referrer was used. The link target will then be added to the + trustfile. This is a very restrictive feature that typical + users most probably want to leave disabled. Default: Disabled, don't use the + trust mechanism. + + + + + + + #trustfile trust + + + + + + + If you use the trust mechanism, it is a good idea to write up some online + documentation about your blocking policy and to specify the URL(s) here. They + will appear on the page that your users receive when they try to access + untrusted content. Use multiple times for multiple URLs. Default: Don't + display links on the untrusted info page. 
+ + + + + + + trust-info-url http://www.your-site.com/why_we_block.html + trust-info-url http://www.your-site.com/what_we_allow.html + + + + + + + + + + -See also -To be filled. + + +Other Configuration Options + + + This part of the configuration file contains options that control how + Junkbuster operates. + + + + Admin-address should be set to the email address of the proxy + administrator. It is used in many of the proxy-generated pages. Default: + fill@me.in.please. + + + + + + + #admin-address fill@me.in.please + + + + + + + Proxy-info-url can be set to a URL that contains more info + about this Junkbuster installation, its + configuration and policies. It is used in many of the proxy-generated pages + and its use is highly recommended in multi-user installations, since your + users will want to know why certain content is blocked or modified. Default: + Don't show a link to online documentation. + + + + + + + proxy-info-url http://www.your-site.com/proxy.html + + + + + + + Listen-address specifies the address and port where + Junkbuster will listen for connections from your + Web browser. The default is to listen on the localhost port 8000, and + this is suitable for most users. (In your web browser, under proxy + configuration, list the proxy server as localhost and the + port as 8000). + + + + If you already have another service running on port 8000, or if you want to + serve requests from other machines (e.g. on your local network) as well, you + will need to override the default. The syntax is + listen-address [<ip-address>]:<port>. If you leave + out the IP address, junkbuster will bind to all + interfaces (addresses) on your machine and may become reachable from the + internet. In that case, consider using access control lists (ACLs) (see + aclfile above). 
+ + + + For example, suppose you are running Junkbuster on + a machine which has the address 192.168.0.1 on your local private network + (192.168.0.0) and has another outside connection with a different address. + You want it to serve requests from inside only: + + + + + + + listen-address 192.168.0.1:8000 + + + + + + + If you want it to listen on all addresses (including the outside + connection): + + + + + + + listen-address :8000 + + + + + + + If you do this, consider using ACLs (see aclfile above). Note: + you will need to point your browser(s) to the address and port that you have + configured here. Default: localhost:8000 (127.0.0.1:8000). + + + + The debug option sets the level of debugging information to log in the + logfile (and to the console in the Windows version). A debug level of 1 is + informative because it will show you each request as it happens. Higher + levels of debug are probably only of interest to developers. + + + + + + + debug 1 # GPC = show each GET/POST/CONNECT request + debug 2 # CONN = show each connection status + debug 4 # IO = show I/O status + debug 8 # HDR = show header parsing + debug 16 # LOG = log all data into the logfile + debug 32 # FRC = debug force feature + debug 64 # REF = debug regular expression filter + debug 128 # = debug fast redirects + debug 256 # = debug GIF deanimation + debug 512 # CLF = Common Log Format + debug 1024 # = debug kill popups + debug 4096 # INFO = Startup banner and warnings. + debug 8192 # ERROR = Non-fatal errors + + + + + + + It is highly recommended that you enable ERROR + reporting (debug 8192), at least until the next stable release. + + + + The reporting of FATAL errors (i.e. ones which crash + JunkBuster) is always on and cannot be disabled. + + + + If you want to use CLF (Common Log Format), you should set debug + 512 ONLY, do not enable anything else. + + + + Multiple debug directives, are OK - they're logical-OR'd + together. 
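To make the OR-ing concrete, here is a minimal Python sketch (not part of Junkbuster itself) that combines some of the bit values from the table above. The dictionary and the `combine` helper are purely illustrative names; only the numeric values and the short labels come from the table.

```python
# Bit values copied from the debug table above. The dict and helper are
# illustrative only; Junkbuster does not expose anything by these names.
DEBUG_FLAGS = {
    "GPC": 1,       # show each GET/POST/CONNECT request
    "CONN": 2,      # show each connection status
    "IO": 4,        # show I/O status
    "HDR": 8,       # show header parsing
    "LOG": 16,      # log all data into the logfile
    "CLF": 512,     # Common Log Format
    "INFO": 4096,   # startup banner and warnings
    "ERROR": 8192,  # non-fatal errors
}


def combine(*names):
    """Return the single debug value equivalent to several debug lines."""
    level = 0
    for name in names:
        level |= DEBUG_FLAGS[name]  # logical OR, as the manual describes
    return level


if __name__ == "__main__":
    print(combine("GPC", "CONN", "IO", "HDR"))  # 15
    print(combine("GPC", "INFO", "ERROR"))      # 12289
```

Because each level is a distinct bit, writing several `debug` lines and writing one line with their sum come to the same thing.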
+ + + + + + + debug 15 # same as setting the first 4 listed above + + + + + + + Default: + + + + + + + debug 1 # URLs + debug 4096 # Info + debug 8192 # Errors - *we highly recommend enabling this* + + + + + + + Junkbuster normally uses + multi-threading, a software technique that permits it to + handle many different requests simultaneously. In some cases you may wish to + disable this -- particularly if you're trying to debug a problem. The + single-threaded option forces + Junkbuster to handle requests sequentially. + Default: Multi-threaded mode. + + + + + + + #single-threaded + + + + + + + toggle allows you to temporarily disable all + Junkbuster's filtering. Just set toggle + 0. + + + + The Windows version of Junkbuster puts an icon in + the system tray, which allows you to change this option without having to + edit this file. If you right-click on that icon (or select the + Options menu), one choice is Enable. Clicking + on enable toggles Junkbuster on and off. This is + useful if you want to temporarily disable + Junkbuster, e.g., to access a site that requires + cookies which you normally have blocked. + + + + toggle 1 means Junkbuster runs + normally, toggle 0 means that + Junkbuster becomes a non-anonymizing non-blocking + proxy. Default: 1. + + + + + + toggle 1 + + + + + + + + + + + + + +Access Control List (ACL) + + Access controls are included at the request of some ISPs and systems + administrators, and are not usually needed by individual users. Please note + the warnings in the FAQ that this proxy is not intended to be a substitute + for a firewall or to encourage anyone to defer addressing basic security + weaknesses. + + + + If no access settings are specified, the proxy talks to anyone that + connects. If any access settings are specified, then the proxy + talks only to IP addresses permitted somewhere in this file and not + denied later in this file. 
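The rule just stated (serve everyone when there is no ACL; otherwise deny by default and let the last matching entry decide) can be sketched in a few lines of Python. This is only an illustration of the decision logic, not Junkbuster's code: the `allowed` function and the tuple layout are made up for the example, it looks at the source address only, and it expects networks already written as dotted addresses with a mask length.

```python
import ipaddress


def allowed(client_ip, rules):
    """Decide whether a client may use the proxy.

    rules is a list of (action, network) tuples in file order, where
    action is "permit-access" or "deny-access". Destination addresses
    are ignored in this simplified sketch.
    """
    if not rules:
        return True   # no access settings at all: talk to anyone
    verdict = False   # an ACL exists, so the default is to deny
    addr = ipaddress.ip_address(client_ip)
    for action, network in rules:
        if addr in ipaddress.ip_network(network):
            # A later match overrides any earlier one.
            verdict = (action == "permit-access")
    return verdict


if __name__ == "__main__":
    acl = [
        ("permit-access", "207.153.200.0/24"),
        ("deny-access", "207.153.200.7/32"),
    ]
    for ip in ("207.153.200.5", "207.153.200.7", "10.0.0.1"):
        print(ip, allowed(ip, acl))
```

Swapping the two entries would let 207.153.200.7 back in, since the permit line would then be the last one to match.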
+ + + + Summary -- if using an ACL: + + + + + Client must have permission to receive service. + + + + + LAST match in ACL wins. + + + + + Default behavior is to deny service. + + + + + The syntax for an entry in the Access Control List is: + + + + + + + ACTION SRC_ADDR[/SRC_MASKLEN] [ DST_ADDR[/DST_MASKLEN] ] + + + + + + + Where the individual fields are: + + + + + + + ACTION = permit-access or deny-access + + SRC_ADDR = client hostname or dotted IP address + SRC_MASKLEN = number of bits in the subnet mask for the source + + DST_ADDR = server or forwarder hostname or dotted IP address + DST_MASKLEN = number of bits in the subnet mask for the target + + + + + + + + The field separator (FS) is whitespace (space or tab). + + + + IMPORTANT NOTE: If the junkbuster is using a + forwarder (see below) or a gateway for a particular destination URL, the + DST_ADDR that is examined is the address of the forwarder + or the gateway and NOT the address of the ultimate + target. This is necessary because it may be impossible for the local + Junkbuster to determine the address of the + ultimate target (that's often what gateways are used for). + + + + Here are a few examples to show how the ACL features work: + + + + localhost is OK -- no DST_ADDR implies that + ALL destination addresses are OK: + + + + + + + permit-access localhost + + + + + + + A silly example to illustrate permitting any host on the class-C subnet with + Junkbuster to go anywhere: + + + + + + + permit-access www.junkbusters.com/24 + + + + + + + Except deny one particular IP address from using it at all: + + + + + + + deny-access ident.junkbusters.com + + + + + + + You can also specify an explicit network address and subnet mask. + Explicit addresses do not have to be resolved to be used. + + + + + + + permit-access 207.153.200.0/24 + + + + + + + A subnet mask of 0 matches anything, so the next line permits everyone. 
+ + + + + + + permit-access 0.0.0.0/0 + + + + + + + Note, you cannot say: + + + + + + + permit-access .org + + + + + + + to allow all *.org domains. Every IP address listed must resolve fully. + + + + An ISP may want to provide a Junkbuster that is + accessible by the world and yet restrict use of some of their + private content to hosts on its internal network (i.e. its own subscribers). + Say, for instance, the ISP owns the Class-B IP address block 123.124.0.0 (a 16 + bit netmask). This is how they could do it: + + + + + + + permit-access 0.0.0.0/0 0.0.0.0/0 # other clients can go anywhere + # with the following exceptions: + + deny-access 0.0.0.0/0 123.124.0.0/16 # block all external requests for + # sites on the ISP's network + + permit-access 0.0.0.0/0 www.my_isp.com # except for the ISP's main + # web site + + permit-access 123.124.0.0/16 0.0.0.0/0 # the ISP's clients can go + # anywhere + + + + + + + Note that if some hostnames are listed with multiple IP addresses, + the primary value returned by DNS (via gethostbyname()) is used. Default: + Anyone can access the proxy. + + + + + + + + + + +Forwarding + + + This feature allows routing of HTTP requests via multiple proxies. + It can be used to better protect privacy and confidentiality when + accessing specific domains by routing requests to those domains + to a special purpose filtering proxy such as lpwa.com. + + + + It can also be used in an environment with multiple networks to route + requests via multiple gateways allowing transparent access to multiple + networks without having to modify browser configurations. + + + + Also specified here are SOCKS proxies. Junkbuster + supports SOCKS 4 and SOCKS 4A. The difference is that SOCKS 4A will resolve the target + hostname using DNS on the SOCKS server, not our local DNS client. 
+ + + + The syntax of each line is: + + + + + + + forward target_domain[:port] http_proxy_host[:port] + forward-socks4 target_domain[:port] socks_proxy_host[:port] http_proxy_host[:port] + forward-socks4a target_domain[:port] socks_proxy_host[:port] http_proxy_host[:port] + + + + + + + If http_proxy_host is ., then requests are not forwarded to an + HTTP proxy but are made directly to the web servers. + + + + Lines are checked in sequence, and the last match wins. + + + + There is an implicit line equivalent to the following, which specifies that + anything not finding a match on the list is to go out without forwarding + or gateway protocol, like so: + + + + + + + forward .* . # implicit + + + + + + + In the following common configuration, everything goes to Lucent's LPWA, + except SSL on port 443 (which it doesn't handle): + + + + + + + forward .* lpwa.com:8000 + forward :443 . + + + + + + + See the FAQ for instructions on how to automate the login procedure for LPWA. + Some users have reported difficulties related to LPWA's use of + . as the last element of the domain, and have said that this + can be fixed with this: + + + + + + + forward lpwa. lpwa.com:8000 + + + + + + + (NOTE: the syntax for specifying target_domain has changed since the + previous paragraph was written -- it will not work now. More information + is welcome.) + + + + In this fictitious example, everything goes via an ISP's caching proxy, + except requests to that ISP: + + + + + + + forward .* caching.myisp.net:8000 + forward myisp.net . + + + + + + + For the @home network, we're told the forwarding configuration is this: + + + + + + + + forward .* proxy:8080 + + + + + + + Also, we're told they insist on getting cookies and JavaScript, so you need + to add home.com to the cookie file. We consider JavaScript a security risk. + Java need not be enabled. 
+ + + + In this example direct connections are made to all internal + domains, but everything else goes through Lucent's LPWA by way of the + company's SOCKS gateway to the Internet. + + + + + + + forward-socks4 .* lpwa.com:8000 firewall.my_company.com:1080 + forward my_company.com . + + + + + + + This is how you could set up a site that always uses SOCKS but no forwarders: + + + + + + + forward-socks4a .* . firewall.my_company.com:1080 + + + + + + + An advanced example for network administrators: + + + + If you have links to multiple ISPs that provide various special content to + their subscribers, you can configure forwarding to pass requests to the + specific host that's connected to that ISP so that everybody can see all + of the content on all of the ISPs. + + + + This is a bit tricky, but here's an example: + + + + + host-a has a PPP connection to isp-a.com. And host-b has a PPP connection to + isp-b.com. host-a can run a Junkbuster proxy with + forwarding like this: + + + + + + + forward .* . + forward isp-b.com host-b:8000 + + + + + + + host-b can run a Junkbuster proxy with forwarding + like this: + + + + + + + forward .* . + forward isp-a.com host-a:8000 + + + + + + + Now, anyone on the Internet (including users on host-a + and host-b) can set their browser's proxy to either + host-a or host-b and be able to browse the content on isp-a or isp-b. + + + + Here's another practical example, for University of Kent at + Canterbury students with a network connection in their room, who + need to use the University's Squid web cache. + + + + + + + forward *. ssbcache.ukc.ac.uk:3128 # Use the proxy, except for: + forward .ukc.ac.uk . # Anything on the same domain as us + forward * . # Host with no domain specified + forward 129.12.*.* . # A dotted IP on our /16 network. + forward 127.*.*.* . # Loopback address + forward localhost.localdomain . + forward www.ukc.mirror.ac.uk . 
# Specific host + + + + + + + If you intend to chain Junkbuster and + squid locally, then chain as + browser -> squid -> junkbuster is the recommended way. + + + + Your squid configuration could then look like this: + + + + + + + # Define junkbuster as parent cache + cache_peer 127.0.0.1 8000 parent 0 no-query + + # Define ACL for protocol FTP + acl FTP proto FTP + + # Do not forward ACL FTP to junkbuster + always_direct allow FTP + + # Do not forward ACL CONNECT (https) to junkbuster + always_direct allow CONNECT + + # Forward the rest to junkbuster + never_direct allow all + + + + + + + + + + + + + +Windows GUI Options + + + Junkbuster has a number of options specific to the + Windows GUI interface: + + + + If activity-animation is set to 1, the + Junkbuster icon will animate when + Junkbuster is active. To turn off, set to 0. + + + + + + + activity-animation 1 + + + + + + + If log-messages is set to 1, + Junkbuster will log messages to the console + window: + + + + + + + log-messages 1 + + + + + + + If log-buffer-size is set to 1, the size of the log buffer, + i.e. the amount of memory used for the log messages displayed in the + console window, will be limited to log-max-lines (see below). + + + + Warning: Setting this to 0 will result in the buffer to grow infinitely and + eat up all your memory! + + + + + + + log-buffer-size 1 + + + + + + + log-max-lines is the maximum number of lines held + in the log buffer. See above. 
+ + + + + + + log-max-lines 200 + + + + + + + If log-highlight-messages is set to 1, + Junkbuster will highlight portions of the log + messages with a bold-faced font: + + + + + + + log-highlight-messages 1 + + + + + + + The font used in the console window: + + + + + + + log-font-name Comic Sans MS + + + + + + + Font size used in the console window: + + + + + + + log-font-size 8 + + + + + + + show-on-task-bar controls whether or not + Junkbuster will appear as a button on the Task bar + when minimized: + + + + + + + show-on-task-bar 0 + + + + + + + If close-button-minimizes is set to 1, the Windows close + button will minimize Junkbuster instead of closing + the program (close with the exit option on the File menu). + + + + + + + close-button-minimizes 1 + + + + + + + The hide-console option is specific to the MS-Win console + version of JunkBuster. If this option is used, + Junkbuster will disconnect from and hide the + command console. + + + + + + + #hide-console + + + + + + + + + + + + + +The Actions File + + + The actionsfile is used to define what actions + Junkbuster takes, and thus determines how images, + cookies and various other aspects of HTTP content and transactions are + handled. Images can be anything you want, including ads, banners, or just + some obnoxious image that you would rather not see. Cookies can be accepted + or rejected. The default file is in fact named actionsfile. + + + + To determine which actions apply to a request, the URL of the request is + compared to all patterns in this file. Every time it matches, the list of + applicable actions for the URL is incrementally updated. You can trace + this process by visiting http://i.j.b/show-url-info. + + + + There are four types of lines in this file: comments (begin with a + # character), actions, aliases and patterns, all of which are + explained below. 
+ + + + + +URL Domain and Path Syntax + + Generally, a pattern has the form <domain>/<path>, where both the + <domain> and <path> part are optional. If you only specify a + domain part, the / can be left out: + + + + www.example.com - is a domain only pattern and will match any request to + www.example.com. + + + + www.example.com/ - means exactly the same. + + + + www.example.com/index.html - matches only the single + document /index.html on www.example.com. + + + + /index.html - matches the document /index.html, regardless of + the domain. + + + + index.html - matches nothing, since it would be + interpreted as a domain name and there is no top-level domain called + .html. + + + + The matching of the domain part offers some flexible options: if the + domain starts or ends with a dot, it becomes unanchored at that end. + For example: + + + + .example.com - matches any domain that ENDS in + .example.com. + + + + www. - matches any domain that STARTS with + www. + + + + Additionally, there are wildcards that you can use in the domain names + themselves. They work much like shell wildcards: * + stands for zero or more arbitrary characters, ? stands for + any single character. You can also define character classes in square + brackets, and all of these can be freely mixed: + + + + ad*.example.com - matches adserver.example.com, + ads.example.com, etc., but not sfads.example.com. + + + + *ad*.example.com - matches all of the above, and then some. + + + + .?pix.com - matches www.ipix.com, + pictures.epix.com, a.b.c.d.e.upix.com, etc. + + + + www[1-9a-ez].example.com - matches www1.example.com, + www4.example.com, wwwd.example.com, + wwwz.example.com, etc., but not + wwww.example.com. + + + + If Junkbuster was compiled with + pcre support (default), Perl compatible regular expressions + can be used. See the pcre/docs/ directory or man + perlre (also available on http://www.perldoc.com/perl5.6/pod/perlre.html) + for details. 
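The shell-style domain wildcards shown above can be tried out with a short Python sketch. It assumes Python's `fnmatch` module as a stand-in for Junkbuster's own matcher, which this manual does not show, and it only handles fully anchored patterns (no leading or trailing dot). The `domain_matches` helper is a made-up name for the example.

```python
from fnmatch import fnmatchcase


def domain_matches(pattern, host):
    """Compare a whole host name against an anchored wildcard pattern.

    Supports *, ? and [...] classes, the three constructs described in
    the text. Both sides are lowercased because host names are
    case-insensitive.
    """
    return fnmatchcase(host.lower(), pattern.lower())


if __name__ == "__main__":
    checks = [
        ("ad*.example.com", "adserver.example.com"),
        ("ad*.example.com", "ads.example.com"),
        ("ad*.example.com", "sfads.example.com"),
        ("*ad*.example.com", "sfads.example.com"),
        ("www[1-9a-ez].example.com", "www4.example.com"),
        ("www[1-9a-ez].example.com", "wwww.example.com"),
    ]
    for pattern, host in checks:
        print(pattern, host, domain_matches(pattern, host))
```

The dot-prefixed and dot-suffixed forms (`.example.com`, `www.`) are deliberately left out, since how Junkbuster splits and compares the individual domain components is not described here.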
A brief discussion of regular expressions is in the + Appendix. For instance: + + + + /.*/advert[0-9]+\.jpe?g - would match a URL from any + domain, with any path that includes advert followed + immediately by one or more digits, then a . and ending in + either jpeg or jpg. So we match + example.com/ads/advert2.jpg, and + www.example.com/ads/banners/advert39.jpeg, but not + www.example.com/ads/banners/advert39.gif (no gifs in the + example pattern). + + + + Please note that matching in the path is case + INSENSITIVE by default, but you can switch to case + sensitive at any point in the pattern by using the + (?-i) switch: + + + + www.example.com/(?-i)PaTtErN.* - will match only + documents whose path starts with PaTtErN in + exactly this capitalization. + + + + + + + + + + + +Actions + + Actions are enabled if preceded with a +, and disabled if + preceded with a -. Actions are invoked by enclosing the + action name in curly braces (e.g. {+some_action}), followed by a list of + URLs to which the action applies. There are three classes of actions: + + + + + + + + Boolean (e.g. +/-block): + + + + + + {+name} # enable this action + {-name} # disable this action + + + + + + + + + + Parameterized (e.g. +/-hide-user-agent): + + + + + + {+name{param}} # enable action and set parameter to param + {-name} # disable action + + + + + + + + + Multi-value (e.g. {+/-add-header{Name: value}}, {+/-wafer{name=value}}): + + + + + + {+name{param}} # enable action and add parameter param + {-name{param}} # remove the parameter param + {-name} # disable this action totally + + + + + + + + + + + If nothing is specified in this file, no actions are taken. + So in this case JunkBuster would just be a + normal, non-blocking, non-anonymizing proxy. You must specifically + enable the privacy and blocking features you need (although the + provided default actionsfile file will + give a good starting point). + + + + Later defined actions always over-ride earlier ones. 
For multi-valued + actions, the actions are applied in the order they are specified. + + + + The valid Junkbuster actions are: + + + + + + + + Add the specified HTTP header, which is not checked for validity. + You may use this action multiple times to add several different headers: + + + + + + +add-header{Name: value} + + + + + + + + + + Block this URL totally. + + + + + + +block + + + + + + + + + + De-animate all animated GIF images, i.e. reduce them to a single frame. + This will also shrink the images considerably (in bytes, not pixels!). If + the option first is given, the first frame of the animation + is used as the replacement. If last is given, the last frame + of the animation is used instead, which probably makes more sense for most + banner animations, but also has the risk of not showing the entire last + frame (if it is only a delta to an earlier frame). + + + + + + +deanimate-gifs{last} + +deanimate-gifs{first} + + + + + + + + + Many sites, like yahoo.com, don't just link to other sites. Instead, they + will link to some script on their own server, giving the destination as a + parameter, which will then redirect you to the final target. URLs resulting + from this scheme typically look like: + http://some.place/some_script?http://some.where-else. + + + Sometimes, there are even multiple consecutive redirects encoded in the + URL. These redirections via scripts make your web browsing more traceable, + since the server from which you follow such a link can see where you go to. + Apart from that, valuable bandwidth and time are wasted while your browser + asks the server for one redirect after the other. Plus, it feeds the + advertisers. + + + The +fast-redirects option enables interception of these + requests by Junkbuster, which will cut off all but + the last valid URL in the request and send a local redirect back to your + browser without contacting the remote site. 
+ + + + + + +fast-redirects + + + + + + + + + Filter the website through the re_filterfile: + + + + + + +filter{filename} + + + + + + + + + Block any existing X-Forwarded-for header, and do not add a new one: + + + + + + +hide-forwarded + + + + + + + + + If the browser sends a From: header containing your e-mail + address, this either completely removes the header (block), or + changes it to the specified e-mail address. + + + + + + +hide-from{block} + +hide-from{spam@sittingduck.xqq} + + + + + + + + + Don't send the Referer: (sic) header to the web site. You + can block it, forge a URL to the same server as the request (which is + preferred because some sites will not send images otherwise) or set it to a + constant string of your choice. + + + + + + +hide-referer{block} + +hide-referer{forge} + +hide-referer{http://nowhere.com} + + + + + + + + + Alternative spelling of +hide-referer. It has the same + parameters as, and can be freely mixed with, +hide-referer. + (referrer is the correct English spelling, however the HTTP + specification has a bug - it requires it to be spelled referer.) + + + + + + +hide-referrer{...} + + + + + + + + + Change the User-Agent: header so web servers can't tell your + browser type. Warning! This breaks many web sites. Specify the + user-agent value you want. For example, to pretend to be using Netscape on + Linux: + + + + + + +hide-user-agent{Mozilla (X11; I; Linux 2.0.32 i586)} + + + + + + + + + + Treat this URL as an image. This only matters if it's also +blocked, + in which case a blocked image can be sent rather than an HTML page. + See +image-blocker{} below for the control over what is actually sent. + + + + + + +image + + + + + + + + + Decides what to do with URLs that end up tagged with {+block + +image}. There are four options. -image-blocker will + send an HTML blocked page, usually resulting in a + broken image icon. +image-blocker{logo} will + send a Junkbuster image. + +image-blocker{blank} will send a 1x1 transparent GIF image. 
And finally, +image-blocker{http://xyz.com} will send an HTTP + temporary redirect to the specified image. This has the advantage of the + icon being cached by the browser, which will speed up the display. + + + + + + +image-blocker{logo} + +image-blocker{blank} + +image-blocker{http://i.j.b/send-banner} + + + + + + + + + Prevent the website from reading cookies: + + + + + + +no-cookies-read + + + + + + + + + Prevent the website from setting cookies: + + + + + + +no-cookies-set + + + + + + + + + Filter the website through a built-in filter to disable those obnoxious + JavaScript pop-up windows via window.open(), etc. The two alternative + spellings are equivalent. + + + + + + +no-popup + +no-popups + + + + + + + + + This action only applies if you are using a jarfile + for saving cookies. It sends a cookie to every site stating that you do not + accept any copyright on cookies sent to you, and asking them not to track + you. Of course, this is a (relatively) unique header they could use to + track you. + + + + + + +vanilla-wafer + + + + + + + + + This allows you to add an arbitrary cookie. It can be specified multiple + times in order to add as many cookies as you like. + + + + + + +wafer{name=value} + + + + + + + + + + + The meaning of any of the above is reversed by preceding the action with a + -, in place of the +. + + + + Some examples: + + + + Turn off cookies by default, then allow a few through for specified sites: + + + + + + + # Turn off all cookies + { +no-cookies-read } + { +no-cookies-set } + + # Exceptions to the above, sites that need cookies + { -no-cookies-read } + { -no-cookies-set } + .javasoft.com + .sun.com + .yahoo.com + .msdn.microsoft.com + .redhat.com + + # Alternative way of saying the same thing + {-no-cookies-set -no-cookies-read} + .sourceforge.net + .sf.net + + + + + + + Now turn on fast redirects (that is, turn off script-based redirection), and then allow two exceptions: + + + + + + + # Turn redirect scripts off! 
+ {+fast-redirects} + + # Reverse it for these two sites, which don't work right without their redirect scripts. + {-fast-redirects} + www.ukc.ac.uk/cgi-bin/wac\.cgi\? + login.yahoo.com + + + + + + + Turn on page filtering, with one exception for SourceForge: + + + + + + + # Run everything through the default filter file (re_filterfile): + {+filter} + + # But please don't re_filter code from sourceforge! + {-filter} + .cvs.sourceforge.net + + + + + + + Now some URLs that we want blocked, i.e. we won't see them. + Many of these use regular expressions that will expand to match multiple + URLs: + + + + + + + # Blocklist: + {+block} + /.*/(.*[-_.])?ads?[0-9]?(/|[-_.].*|\.(gif|jpe?g)) + /.*/(.*[-_.])?count(er)?(\.cgi|\.dll|\.exe|[?/]) + /.*/(ng)?adclient\.cgi + /.*/(plain|live|rotate)[-_.]?ads?/ + /.*/(sponsor)s?[0-9]?/ + /.*/_?(plain|live)?ads?(-banners)?/ + /.*/abanners/ + /.*/ad(sdna_image|gifs?)/ + /.*/ad(server|stream|juggler)\.(cgi|pl|dll|exe) + /.*/adbanners/ + /.*/adserver + /.*/adstream\.cgi + /.*/adv((er)?ts?|ertis(ing|ements?))?/ + /.*/banner_?ads/ + /.*/banners?/ + /.*/banners?\.cgi/ + /.*/cgi-bin/centralad/getimage + /.*/images/addver\.gif + /.*/images/marketing/.*\.(gif|jpe?g) + /.*/popupads/ + /.*/siteads/ + /.*/sponsor.*\.gif + /.*/sponsors?[0-9]?/ + /.*/advert[0-9]+\.jpg + /Media/Images/Adds/ + /ad_images/ + /adimages/ + /.*/ads/ + /bannerfarm/ + /grafikk/annonse/ + /graphics/defaultAd/ + /image\.ng/AdType + /image\.ng/transactionID + /images/.*/.*_anim\.gif # alvin brattli + /ip_img/.*\.(gif|jpe?g) + /rotateads/ + /rotations/ + /worldnet/ad\.cgi + /cgi-bin/nph-adclick.exe/ + /.*/Image/BannerAdvertising/ + /.*/ad-bin/ + /.*/adlib/server\.cgi + /autoads/ + + + + + + + + + + + + +Aliases + + Custom actions, known to Junkbuster + as aliases, can be defined by combining other actions. + These can in turn be invoked just like the built-in actions. + Currently, an alias can contain any character except space, tab, =, + { or }. But please use only a- + z, 0-9, +, and + -. 
Alias names are not case sensitive, and must be defined + before they are used. + + + + Now let's define a few aliases: + + + + + + + # Aliases + {{alias}} + + # Useful aliases + +no-cookies = +no-cookies-set +no-cookies-read + -no-cookies = -no-cookies-set -no-cookies-read + fragile = -block -no-cookies -filter -fast-redirects -hide-referer -no-popups + shop = -no-cookies -filter -fast-redirects + +imageblock = +block +image + + #For people who don't like to type too much: ;-) + c0 = +no-cookies + c1 = -no-cookies + c2 = -no-cookies-set +no-cookies-read + c3 = +no-cookies-set -no-cookies-read + #... etc. Customize to your heart's content. + + + + + + + Some examples using our shop and fragile + aliases from above: + + + + + + + # These sites are very complex and require + # minimal interference. + {fragile} + .office.microsoft.com + .windowsupdate.microsoft.com + + # Shopping sites - still want to block ads. + {shop} + .quietpc.com + .worldpay.com # for quietpc.com + .jungle.com + .scan.co.uk + + # These shops require pop-ups + {shop -no-popups} + .dabs.com + .overclockers.co.uk + + + + + + + + + + + + + +The Filter File + + The filter file defines what filtering of web pages + Junkbuster does. The default filter file is + re_filterfile, located in the config directory. In this + file, any document content, whether viewable text or + embedded non-visible content, can be changed. + + + + This file uses regular expressions to alter or remove any string in the + target page. Some examples from the included default re_filterfile: + + + + Stop web pages from displaying annoying messages in the status bar by + deleting such references: + + + + + + + # The status bar is for displaying link targets, not pointless buzzwords. + # Again, check it out on http://www.airport-cgn.de/. 
+ s/status='.*?';*//ig + + + + + + + Just for kicks, replace any occurrence of Microsoft with + MicroSuck: + + + + + + + s/microsoft(?!\.com)/MicroSuck/ig + + + + + + + Kill those auto-refresh tags: + + + + + + + # Kill refresh tags. I like to refresh myself. Manually. + # Check it out on http://www.airport-cgn.de/ and go to the arrivals page. + # + s/<meta[^>]*http-equiv[^>]*refresh.*URL=([^>]*?)"?>/<link rev="x-refresh" href=$1>/i + s/<meta[^>]*http-equiv="?page-enter"?[^>]*content=[^>]*>/<!--no page enter for me-->/i + + + + + + + + + + +Quickstart to Using Junkbuster +To be filled. + + + + + +Contact the developers +To be filled. Mention the support forums as the primary channel of +communication (bugs, feature requests, etc.) + + + + +Copyright and History +To be filled. + + + + +See also +To be filled. + + + + + + +Appendix + + + + +Regular Expressions + + WIP + + + +
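Until this appendix is written, the following is a minimal sketch of how two of the regular expressions quoted in this manual behave. It uses Python's re module rather than pcre, so it only approximates what Junkbuster does. The helper names path_matches and rename_microsoft are hypothetical and exist only for this illustration, and the assumption that a path pattern is anchored at the start of the path (but not at its end) is ours, not something the manual states.

```python
import re

# Path pattern quoted in the URL pattern section. The manual says path
# matching is case-insensitive by default; re.IGNORECASE approximates that.
ADVERT_PATH = re.compile(r"/.*/advert[0-9]+\.jpe?g", re.IGNORECASE)


def path_matches(url: str) -> bool:
    """Hypothetical helper: test the path part of a 'domain/path' URL."""
    # Everything from the first slash onward is treated as the path.
    _domain, slash, rest = url.partition("/")
    # Assumption: the pattern is anchored at the start of the path only.
    return ADVERT_PATH.match(slash + rest) is not None


def rename_microsoft(text: str) -> str:
    """Hypothetical helper modelled on the MicroSuck filter rule."""
    # The negative lookahead leaves "microsoft.com" host names untouched.
    return re.sub(r"microsoft(?!\.com)", "MicroSuck", text, flags=re.IGNORECASE)


if __name__ == "__main__":
    for url in (
        "example.com/ads/advert2.jpg",
        "www.example.com/ads/banners/advert39.jpeg",
        "www.example.com/ads/banners/advert39.gif",
    ):
        print(url, path_matches(url))
    print(rename_microsoft("Microsoft news at www.microsoft.com"))
```

The first two URLs match and the .gif URL does not, in line with the description of the advert pattern. The substitution rewrites the bare word but leaves the host name alone because of the lookahead.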