4 By: Junkbuster Developers
6 $Id: user-manual.sgml,v 1.48 2002/03/12 06:33:01 hal9 Exp $
8 The user manual gives users information on how to install, configure
9 and use Internet Junkbuster. Internet Junkbuster is a web proxy with
10 advanced filtering capabilities for protecting privacy, filtering web
11 page content, managing cookies, controlling access, and removing ads,
12 banners, pop-ups and other obnoxious Internet Junk. Junkbuster has a
13 very flexible configuration and can be customized to suit individual
14 needs and tastes. Internet Junkbuster has application for both
15 stand-alone systems and multi-user networks.
17 You can find the latest version of the user manual at
18 [1]http://ijbswa.sourceforge.net/user-manual/.
19 _________________________________________________________________
35 3. [11]JunkBuster Configuration
37 3.1. [12]Controlling Junkbuster with Your Web Browser
38 3.2. [13]Configuration Files Overview
39 3.3. [14]The Main Configuration File
41 3.3.1. [15]Defining Other Configuration Files
42 3.3.2. [16]Other Configuration Options
43 3.3.3. [17]Access Control List (ACL)
45 3.3.5. [19]Windows GUI Options
47 3.4. [20]The Actions File
49 3.4.1. [21]URL Domain and Path Syntax
53 3.5. [24]The Filter File
56 4. [26]Quickstart to Using Junkbuster
58 4.1. [27]Command Line Options
60 5. [28]Contacting the Developers, Bug Reporting and Feature Requests
61 6. [29]Copyright and History
69 8.1. [34]Regular Expressions
81 8.2. [35]JunkBuster's Internal Pages
85 Internet Junkbuster is a web proxy with advanced filtering
86 capabilities for protecting privacy, filtering and modifying web page
87 content, managing cookies, controlling access, and removing ads,
88 banners, pop-ups and other obnoxious Internet Junk. Junkbuster has a
89 very flexible configuration and can be customized to suit individual
90 needs and tastes. Internet Junkbuster has application for both
91 stand-alone systems and multi-user networks.
93 This documentation is included with the current BETA version of
94 Internet Junkbuster and is mostly complete at this point. The most up
95 to date reference for the time being is still the comments in the
96 source files and in the individual configuration files. Development of
97 version 3.0 is currently nearing completion, and includes many
98 significant changes and enhancements over earlier versions. The target
99 release date for stable v3.0 is "soon" ;-)
101 Since this is a BETA version, not all new features are well tested.
102 This documentation may be slightly out of sync as a result (especially
103 with CVS sources). And there may be bugs, though hopefully not many!
104 _________________________________________________________________
108 In addition to Junkbuster's traditional features of ad and banner
109 blocking and cookie management, this is a list of new features
110 currently under development:
112 * Integrated browser based configuration and control utility
113 ([36]http://i.j.b). Browser-based tracing of rule and filter
115 * Modularized configuration that will allow for system wide
116 settings, and individual user settings. (not implemented yet,
117 probably a 3.1 feature)
118 * Blocking of annoying pop-up browser windows.
119 * HTTP/1.1 compliant (most, but not all 1.1 features are supported).
120 * Support for Perl Compatible Regular Expressions in the
121 configuration files, and generally a more sophisticated and
122 flexible configuration syntax over previous versions.
124 * Web page content filtering (removes banners based on size,
125 invisible "web-bugs", JavaScript, pop-ups, status bar abuse, etc.)
126 * Bypass many click-tracking scripts (avoids script redirection).
127 * Multi-threaded (POSIX and native threads).
128 * Auto-detection and re-reading of config file changes.
129 * User-customizable HTML templates (e.g. 404 error page).
130 * Improved cookie management features (e.g. session based cookies).
131 * Builds from source on most UNIX-like systems. Packages available
132 for: Linux (RedHat, SuSE, or Debian), Windows, Sun Solaris, Mac
133 OSX, OS/2, HP-UX 11 and AmigaOS.
134 * In addition, the configuration is much more powerful and versatile
136 _________________________________________________________________
140 Junkbuster is available as raw source code, or pre-compiled binaries.
141 See the [37]Junkbuster Home Page for binaries and current release
142 info. Junkbuster is also available via [38]CVS. This is the
143 recommended approach at this time. But please be aware that CVS is
144 constantly changing, and it may break in mysterious ways.
145 _________________________________________________________________
149 For gzipped tar archives, unpack the source:
151 tar xzvf ijb_source_* [.tgz or .tar.gz]
152 cd ijb_source_2.9.11_beta
154 For retrieving the current CVS sources, you'll need the CVS package
155 installed first. To download CVS source:
157 cvs -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa login
158 cvs -z3 -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa co cu
162 This will create a directory named current/, which will contain the
165 Then, in either case, to build from tarball/CVS source:
167 ./configure (--help to see options)
168 make (the make from gnu, gmake for *BSD)
170 make -n install (to see where all the files will go)
171 make install (to really install)
173 For Redhat and SuSE Linux RPM packages, see below.
174 _________________________________________________________________
178 To build Redhat RPM packages, install source as above. Then:
180 autoheader [suggested for CVS source]
181 autoconf [suggested for CVS source]
185 This will create both binary and src RPMs in the usual places.
188 /usr/src/redhat/RPMS/i686/junkbuster-2.9.11-1.i686.rpm
190 /usr/src/redhat/SRPMS/junkbuster-2.9.11-1.src.rpm
192 To install, of course:
194 rpm -Uvv /usr/src/redhat/RPMS/i686/junkbuster-2.9.11-1.i686.rpm
196 This will place the Junkbuster configuration files in
197 /etc/junkbuster/, and log files in /var/log/junkbuster/.
198 _________________________________________________________________
202 To build SuSE RPM packages, install source as above. Then:
204 autoheader [suggested for CVS source]
205 autoconf [suggested for CVS source]
209 This will create both binary and src RPMs in the usual places.
212 /usr/src/packages/RPMS/i686/junkbuster-2.9.11-1.i686.rpm
214 /usr/src/packages/SRPMS/junkbuster-2.9.11-1.src.rpm
216 To install, of course:
218 rpm -Uvv /usr/src/packages/RPMS/i686/junkbuster-2.9.11-1.i686.rpm
220 This will place the Junkbuster configuration files in
221 /etc/junkbuster/, and log files in /var/log/junkbuster/.
222 _________________________________________________________________
Junkbuster is packaged in a WarpIN self-installing archive. The
227 self-installing program will be named depending on the release
228 version, something like: ijbos2_setup_1.2.3.exe. In order to install
229 it, simply run this executable or double-click on its icon and follow
230 the WarpIN installation panels. A shadow of the Junkbuster executable
231 will be placed in your startup folder so it will start automatically
232 whenever OS/2 starts.
234 The directory you choose to install Junkbuster into will contain all
235 of the configuration files.
237 If you would like to build binary images on OS/2 yourself, you will
238 need a few Unix-like tools: autoconf, autoheader and sh. These tools
239 will be used to create the required config.h file, which is not part
240 of the source distribution because it differs based on platform. You
241 will also need a compiler. The distribution has been created using IBM
242 VisualAge compilers, but you can use any compiler you like. GCC/EMX
243 has the disadvantage of needing to be single-threaded due to a
244 limitation of EMX's implementation of the select() socket call.
In addition to needing the source code distribution as outlined
earlier, you will want to extract the os2setup directory from CVS:
248 cvs -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa login
250 cvs -z3 -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa co os2
253 This will create a directory named os2setup/, which will contain the
254 Makefile.vac makefile and os2build.cmd which is used to completely
255 create the binary distribution. The sequence of events for building
256 the executable for yourself goes something like this:
262 nmake -f Makefile.vac
264 You will see this sequence laid out in os2build.cmd.
265 _________________________________________________________________
269 Click-click. (I need help on this. Not a clue here. Also for
270 configuration section below. HB.)
271 _________________________________________________________________
275 Some quick notes on other Operating Systems.
277 For FreeBSD (and other *BSDs?), the build will require gmake instead
278 of the included make. gmake is available from [39]http://www.gnu.org.
279 The rest should be the same as above for Linux/Unix.
280 _________________________________________________________________
282 3. JunkBuster Configuration
284 All JunkBuster configuration is kept in text files. These files can be
285 edited with a text editor. Many important aspects of JunkBuster can
286 also be controlled easily with a web browser.
287 _________________________________________________________________
289 3.1. Controlling Junkbuster with Your Web Browser
JunkBuster can be reached by the special URL [40]http://i.j.b/ (or
alternately [41]http://ijbswa.sourceforge.net/config/), which is an
internal page. You will see the following section:
295 Please choose from the following options:
297 * Show information about the current configuration
298 * Show the source code version numbers
299 * Show the client's request headers.
300 * Show which actions apply to a URL and why
301 * Toggle JunkBuster on or off
302 * Edit the actions list
305 This should be self-explanatory. Note the last item is an editor for
306 the "actions list", which is where much of the ad, banner, cookie, and
307 URL blocking magic is configured as well as other advanced features of
308 Junkbuster. This is an easy way to adjust various aspects of
309 Junkbuster configuration. The actions file, and other configuration
310 files, are explained in detail below. Junkbuster will automatically
311 detect any changes to these files.
"Toggle JunkBuster On or Off" is handy for sites that might have
problems with your current actions and filters, or for testing
whether JunkBuster is the cause when a site misbehaves.
316 Junkbuster continues to run as a proxy in this case, but all filtering
318 _________________________________________________________________
320 3.2. Configuration Files Overview
322 For Unix, *BSD and Linux, all configuration files are located in
323 /etc/junkbuster/ by default. For MS Windows, OS/2, and AmigaOS these
324 are all in the same directory as the Junkbuster executable. The name
325 and number of configuration files has changed from previous versions,
326 and is subject to change as development progresses.
328 The installed defaults provide a reasonable starting point, though
329 possibly aggressive by some standards. For the time being, there are
330 only three default configuration files (this will change in time):
332 * The main configuration file is named config on Linux, Unix, BSD,
333 OS/2, and AmigaOS and config.txt on Windows.
* The ijb.action file is used to define various "actions" relating
to images, banners, pop-ups, access restrictions, and
cookies. There is a CGI based editor for this file that can be
337 accessed via [42]http://i.j.b. (Other actions files are included
338 as well with differing levels of filtering and blocking, e.g.
340 * The re_filterfile file can be used to re-write the raw page
341 content, including viewable text as well as embedded HTML and
342 JavaScript, and whatever else lurks on any given web page.
344 ijb.action and re_filterfile can use Perl style regular expressions
345 for maximum flexibility. All files use the "#" character to denote a
346 comment. Such lines are not processed by Junkbuster. After making any
347 changes, there is no need to restart Junkbuster in order for the
348 changes to take effect. Junkbuster should detect such changes
351 While under development, the configuration content is subject to
352 change. The below documentation may not be accurate by the time you
353 read this. Also, what constitutes a "default" setting, may change, so
354 please check all your configuration files on important issues.
355 _________________________________________________________________
357 3.3. The Main Configuration File
359 Again, the main configuration file is named config on Linux/Unix/BSD
360 and OS/2, and config.txt on Windows. Configuration lines consist of an
361 initial keyword followed by a list of values, all separated by
362 whitespace (any number of spaces or tabs). For example:
364 blockfile blocklist.ini
366 Indicates that the blockfile is named "blocklist.ini".
368 A "#" indicates a comment. Any part of a line following a "#" is
369 ignored, except if the "#" is preceded by a "\".
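The line format described above can be illustrated with a small parser. This is only a sketch in Python, not Junkbuster's own code: the function name parse_config_line is hypothetical, and the handling of an escaped "\#" (emitting a literal "#") is an assumption about what "ignored, except if preceded by a backslash" means in practice.

```python
def parse_config_line(line):
    """Split one configuration line into (keyword, values), or None if empty."""
    kept = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] == "#":
            kept.append("#")  # assumption: "\#" yields a literal "#"
            i += 2
            continue
        if ch == "#":
            break  # unescaped "#": the rest of the line is a comment
        kept.append(ch)
        i += 1
    fields = "".join(kept).split()  # any run of spaces or tabs separates fields
    if not fields:
        return None  # blank line, or a line that is entirely a comment
    return fields[0], fields[1:]
```

A commented-out line such as "#logfile logfile" yields None here, which matches the idea that the option is treated as if it were not there.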
371 Thus, by placing a "#" at the start of an existing configuration line,
372 you can make it a comment and it will be treated as if it weren't
373 there. This is called "commenting out" an option and can be useful to
374 turn off features: If you comment out the "logfile" line, junkbuster
375 will not log to a file at all. Watch for the "default:" section in
376 each explanation to see what happens if the option is left unset (or
379 Long lines can be continued on the next line by using a "\" as the
382 There are various aspects of Junkbuster behavior that can be tuned.
383 _________________________________________________________________
385 3.3.1. Defining Other Configuration Files
387 Junkbuster can use a number of other files to tell it what ads to
388 block, what cookies to accept, etc. This section of the configuration
389 file tells Junkbuster where to find all those other files.
391 On Windows and AmigaOS, Junkbuster looks for these files in the same
392 directory as the executable. On Unix and OS/2, Junkbuster looks for
393 these files in the current working directory. In either case, an
394 absolute path name can be used to avoid problems.
396 When development goes modular and multi-user, the blocker, filter, and
397 per-user config will be stored in subdirectories of "confdir". For
398 now, only confdir/templates is used for storing HTML templates for CGI
401 The location of the configuration files:
403 confdir /etc/junkbuster # No trailing /, please.
405 The directory where all logging (i.e. logfile and jarfile) takes
406 place. No trailing "/", please:
408 logdir /var/log/junkbuster
410 Note that all file specifications below are relative to the above two
413 The "ijb.action" file contains patterns to specify the actions to
414 apply to requests for each site. Default: Cookies to and from all
415 destinations are kept only during the current browser session (i.e.
416 they are not saved to disk). Pop-ups are disabled for all sites. All
417 sites are filtered through selected sections of "re_filterfile". No
418 sites are blocked. The JunkBuster logo is displayed for filtered ads
and other images. The syntax of this file is explained in detail
422 actionsfile ijb.action
424 The "re_filterfile" file contains content modification rules that use
425 "regular expressions". These rules permit powerful changes on the
426 content of Web pages, e.g., you could disable your favorite JavaScript
427 annoyances, re-write the actual displayed text, or just have some fun
428 replacing "Microsoft" with "MicroSuck" wherever it appears on a Web
429 page. Default: whatever the developers are playing with :-/
431 Filtering requires buffering the page content, which may appear to
432 slow down page rendering since nothing is displayed until all content
433 has passed the filters. (It does not really take longer, but seems
434 that way since the page is not incrementally displayed.) This effect
435 will be more noticeable on slower connections.
437 re_filterfile re_filterfile
439 The logfile is where all logging and error messages are written. The
440 logfile can be useful for tracking down a problem with Junkbuster
441 (e.g., it's not blocking an ad you think it should block) but in most
442 cases you probably will never look at it.
444 Your logfile will grow indefinitely, and you will probably want to
445 periodically remove it. On Unix systems, you can do this with a cron
446 job (see "man cron"). For Redhat, a logrotate script has been
449 On SuSE Linux systems, you can place a line like
450 "/var/log/junkbuster.* +1024k 644 nobody.nogroup" in /etc/logfiles,
451 with the effect that cron.daily will automatically archive, gzip, and
452 empty the log, when it exceeds 1M size.
Default: Log to a file named logfile. Comment out to disable
459 The "jarfile" defines where Junkbuster stores the cookies it
460 intercepts. Note that if you use a "jarfile", it may grow quite large.
461 Default: Don't store intercepted cookies.
465 If you specify a "trustfile", Junkbuster will only allow access to
466 sites that are named in the trustfile. You can also mark sites as
467 trusted referrers, with the effect that access to untrusted sites will
468 be granted, if a link from a trusted referrer was used. The link
469 target will then be added to the "trustfile". This is a very
470 restrictive feature that typical users most probably want to leave
471 disabled. Default: Disabled, don't use the trust mechanism.
475 If you use the trust mechanism, it is a good idea to write up some
476 on-line documentation about your blocking policy and to specify the
477 URL(s) here. They will appear on the page that your users receive when
478 they try to access untrusted content. Use multiple times for multiple
479 URLs. Default: Don't display links on the "untrusted" info page.
481 trust-info-url http://www.your-site.com/why_we_block.html
482 trust-info-url http://www.your-site.com/what_we_allow.html
483 _________________________________________________________________
485 3.3.2. Other Configuration Options
487 This part of the configuration file contains options that control how
490 "Admin-address" should be set to the email address of the proxy
491 administrator. It is used in many of the proxy-generated pages.
492 Default: fill@me.in.please.
494 #admin-address fill@me.in.please
496 "Proxy-info-url" can be set to a URL that contains more info about
this Junkbuster installation, its configuration and policies. It is
498 used in many of the proxy-generated pages and its use is highly
499 recommended in multi-user installations, since your users will want to
500 know why certain content is blocked or modified. Default: Don't show a
501 link to on-line documentation.
503 proxy-info-url http://www.your-site.com/proxy.html
505 "Listen-address" specifies the address and port where Junkbuster will
506 listen for connections from your Web browser. The default is to listen
507 on the localhost port 8118, and this is suitable for most users. (In
508 your web browser, under proxy configuration, list the proxy server as
509 "localhost" and the port as "8118").
511 If you already have another service running on port 8118, or if you
512 want to serve requests from other machines (e.g. on your local
513 network) as well, you will need to override the default. The syntax is
514 "listen-address [<ip-address>]:<port>". If you leave out the IP
515 address, junkbuster will bind to all interfaces (addresses) on your
516 machine and may become reachable from the Internet. In that case,
consider using access control lists (ACLs) (see "aclfile" above), or
520 For example, suppose you are running Junkbuster on a machine which has
521 the address 192.168.0.1 on your local private network (192.168.0.0)
522 and has another outside connection with a different address. You want
523 it to serve requests from inside only:
525 listen-address 192.168.0.1:8118
527 If you want it to listen on all addresses (including the outside
532 If you do this, consider using ACLs (see "aclfile" above). Note: you
533 will need to point your browser(s) to the address and port that you
534 have configured here. Default: localhost:8118 (127.0.0.1:8118).
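To make the "[<ip-address>]:<port>" syntax concrete, here is a minimal Python sketch of how such a value could be split. It is illustrative only and not taken from the Junkbuster source; parse_listen_address is a hypothetical name, and returning None for a missing address is simply this sketch's way of saying "bind to all interfaces".

```python
def parse_listen_address(value):
    """Split '[<ip-address>]:<port>' into (host, port).

    host is None when the address part is left out, which the manual
    describes as binding to all interfaces.
    """
    host, sep, port_text = value.rpartition(":")
    if not sep:
        raise ValueError("expected [<ip-address>]:<port>")
    port = int(port_text)  # raises ValueError if the port is not a number
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return (host or None), port
```

For example, "192.168.0.1:8118" gives the inside address and port from the example above, while ":8118" gives no address at all, which is the case where ACLs deserve a second look.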
536 The debug option sets the level of debugging information to log in the
537 logfile (and to the console in the Windows version). A debug level of
538 1 is informative because it will show you each request as it happens.
539 Higher levels of debug are probably only of interest to developers.
541 debug 1 # GPC = show each GET/POST/CONNECT request
542 debug 2 # CONN = show each connection status
543 debug 4 # IO = show I/O status
544 debug 8 # HDR = show header parsing
545 debug 16 # LOG = log all data into the logfile
546 debug 32 # FRC = debug force feature
547 debug 64 # REF = debug regular expression filter
548 debug 128 # = debug fast redirects
549 debug 256 # = debug GIF de-animation
550 debug 512 # CLF = Common Log Format
551 debug 1024 # = debug kill pop-ups
552 debug 4096 # INFO = Startup banner and warnings.
553 debug 8192 # ERROR = Non-fatal errors
555 It is highly recommended that you enable ERROR reporting (debug 8192),
556 at least until the next stable release.
558 The reporting of FATAL errors (i.e. ones which crash JunkBuster) is
559 always on and cannot be disabled.
561 If you want to use CLF (Common Log Format), you should set "debug 512"
562 ONLY, do not enable anything else.
Multiple "debug" directives are OK - they're logical-OR'd together.
566 debug 15 # same as setting the first 4 listed above
debug 8192 # Errors - *we highly recommend enabling this*
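Because the debug values are powers of two, OR-ing several directives together is the same as adding distinct values. The Python sketch below shows the arithmetic only; the table copies the abbreviations listed above (values 128, 256 and 1024 have no abbreviation in the list, so they are left out), and the function names are hypothetical.

```python
# Abbreviations and values as listed in the debug table above.
DEBUG_FLAGS = {
    "GPC": 1, "CONN": 2, "IO": 4, "HDR": 8, "LOG": 16, "FRC": 32, "REF": 64,
    "CLF": 512, "INFO": 4096, "ERROR": 8192,
}

def combine_debug(values):
    """OR together the values of several 'debug' directives."""
    level = 0
    for value in values:
        level |= value
    return level

def enabled_flags(level):
    """Return the abbreviations of the named flags set in a combined level."""
    return [name for name, bit in DEBUG_FLAGS.items() if level & bit]
```

Here combine_debug([1, 2, 4, 8]) gives 15, the "first 4 listed above", and enabled_flags(15) names them again.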
574 Junkbuster normally uses "multi-threading", a software technique that
575 permits it to handle many different requests simultaneously. In some
576 cases you may wish to disable this -- particularly if you're trying to
577 debug a problem. The "single-threaded" option forces Junkbuster to
578 handle requests sequentially. Default: Multi-threaded mode.
582 "toggle" allows you to temporarily disable all Junkbuster's filtering.
585 The Windows version of Junkbuster puts an icon in the system tray,
586 which also allows you to change this option. If you right-click on
587 that icon (or select the "Options" menu), one choice is "Enable".
588 Clicking on enable toggles Junkbuster on and off. This is useful if
589 you want to temporarily disable Junkbuster, e.g., to access a site
590 that requires cookies which you would otherwise have blocked. This can
591 also be toggled via a web browser at the Junkbuster internal address
592 of [44]http://i.j.b on any platform.
594 "toggle 1" means Junkbuster runs normally, "toggle 0" means that
595 Junkbuster becomes a non-anonymizing non-blocking proxy. Default: 1
600 For content filtering, i.e. the "+filter" and "+deanimate-gif"
601 actions, it is necessary that Junkbuster buffers the entire document
602 body. This can be potentially dangerous, since a server could just
603 keep sending data indefinitely and wait for your RAM to exhaust. With
606 The buffer-limit option lets you set the maximum size in Kbytes that
each buffer may use. When the document's buffer exceeds this size, it
is flushed to the client unfiltered and no further attempt to filter
the rest of it is made. Remember that there may be multiple threads
running, each of which might require up to "buffer-limit" Kbytes,
unless you have enabled "single-threaded" above.
615 To enable the web-based ijb.action file editor set enable-edit-actions
616 to 1, or 0 to disable. Note that you must have compiled JunkBuster
617 with support for this feature, otherwise this option has no effect.
618 This internal page can be reached at [45]http://i.j.b.
620 Security note: If this is enabled, anyone who can use the proxy can
621 edit the actions file, and their changes will affect all users. For
622 shared proxies, you probably want to disable this. Default: enabled.
624 enable-edit-actions 1
626 Allow JunkBuster to be toggled on and off remotely, using your web
browser. Set "enable-remote-toggle" to 1 to enable, and 0 to disable.
628 Note that you must have compiled JunkBuster with support for this
629 feature, otherwise this option has no effect.
631 Security note: If this is enabled, anyone who can use the proxy can
632 toggle it on or off (see [46]http://i.j.b), and their changes will
633 affect all users. For shared proxies, you probably want to disable
634 this. Default: enabled.
636 enable-remote-toggle 1
637 _________________________________________________________________
639 3.3.3. Access Control List (ACL)
641 Access controls are included at the request of some ISPs and systems
642 administrators, and are not usually needed by individual users. Please
643 note the warnings in the FAQ that this proxy is not intended to be a
644 substitute for a firewall or to encourage anyone to defer addressing
645 basic security weaknesses.
647 If no access settings are specified, the proxy talks to anyone that
connects. If any access settings are specified, then the proxy
649 talks only to IP addresses permitted somewhere in this file and not
650 denied later in this file.
652 Summary -- if using an ACL:
654 Client must have permission to receive service.
656 LAST match in ACL wins.
658 Default behavior is to deny service.
660 The syntax for an entry in the Access Control List is:
662 ACTION SRC_ADDR[/SRC_MASKLEN] [ DST_ADDR[/DST_MASKLEN] ]
664 Where the individual fields are:
666 ACTION = "permit-access" or "deny-access"
667 SRC_ADDR = client hostname or dotted IP address
668 SRC_MASKLEN = number of bits in the subnet mask for the source
669 DST_ADDR = server or forwarder hostname or dotted IP address
670 DST_MASKLEN = number of bits in the subnet mask for the target
672 The field separator (FS) is whitespace (space or tab).
IMPORTANT NOTE: If Junkbuster is using a forwarder (see below) or
675 a gateway for a particular destination URL, the DST_ADDR that is
676 examined is the address of the forwarder or the gateway and NOT the
677 address of the ultimate target. This is necessary because it may be
678 impossible for the local Junkbuster to determine the address of the
679 ultimate target (that's often what gateways are used for).
681 Here are a few examples to show how the ACL features work:
683 "localhost" is OK -- no DST_ADDR implies that ALL destination
686 permit-access localhost
688 A silly example to illustrate permitting any host on the class-C
689 subnet with Junkbuster to go anywhere:
691 permit-access www.junkbusters.com/24
693 Except deny one particular IP address from using it at all:
695 deny-access ident.junkbusters.com
697 You can also specify an explicit network address and subnet mask.
698 Explicit addresses do not have to be resolved to be used.
700 permit-access 207.153.200.0/24
702 A subnet mask of 0 matches anything, so the next line permits
705 permit-access 0.0.0.0/0
707 Note, you cannot say:
711 to allow all *.org domains. Every IP address listed must resolve
714 An ISP may want to provide a Junkbuster that is accessible by "the
715 world" and yet restrict use of some of their private content to hosts
716 on its internal network (i.e. its own subscribers). Say, for instance
717 the ISP owns the Class-B IP address block 123.124.0.0 (a 16 bit
718 netmask). This is how they could do it:
720 permit-access 0.0.0.0/0 0.0.0.0/0 # other clients can go anywhere
721 # with the following exceptions
724 deny-access 0.0.0.0/0 123.124.0.0/16 # block all external request
726 # sites on the ISP's network
permit-access 0.0.0.0/0 www.my_isp.com # except for the ISP's main
permit-access 123.124.0.0/16 0.0.0.0/0 # the ISP's clients can go
732 Note that if some hostnames are listed with multiple IP addresses, the
733 primary value returned by DNS (via gethostbyname()) is used. Default:
734 Anyone can access the proxy.
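The ACL rules summarized in this section (no ACL means everyone is served, otherwise the LAST matching entry wins and the default is to deny) can be sketched in Python as follows. This is an illustration of the stated rules, not Junkbuster's implementation: the function names are hypothetical, only dotted IP addresses are handled (hostnames would first have to be resolved), and when a forwarder is in use the destination passed in would be the forwarder's address, as noted above.

```python
import ipaddress

def parse_acl_entry(line):
    """Parse 'ACTION SRC_ADDR[/LEN] [DST_ADDR[/LEN]]' (dotted IPs only)."""
    fields = line.split("#", 1)[0].split()
    action, src = fields[0], fields[1]
    # No DST_ADDR means the entry applies to all destinations.
    dst = fields[2] if len(fields) > 2 else "0.0.0.0/0"
    if action not in ("permit-access", "deny-access"):
        raise ValueError("unknown action: " + action)
    return (action == "permit-access",
            ipaddress.ip_network(src, strict=False),
            ipaddress.ip_network(dst, strict=False))

def is_permitted(acl, client_ip, dest_ip):
    """Apply the ACL: an empty list permits everyone, else last match wins."""
    if not acl:
        return True
    client = ipaddress.ip_address(client_ip)
    dest = ipaddress.ip_address(dest_ip)
    allowed = False  # default behavior once any ACL exists is to deny
    for permit, src_net, dst_net in acl:
        if client in src_net and dest in dst_net:
            allowed = permit  # a later matching entry overrides earlier ones
    return allowed
```

With the three address-only entries from the ISP example (permit everything, deny everything to 123.124.0.0/16, permit 123.124.0.0/16 to anywhere), an outside client is refused for addresses on the ISP's network while the ISP's own clients are not.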
735 _________________________________________________________________
739 This feature allows chaining of HTTP requests via multiple proxies. It
740 can be used to better protect privacy and confidentiality when
741 accessing specific domains by routing requests to those domains to a
742 special purpose filtering proxy such as lpwa.com. Or to use a caching
743 proxy to speed up browsing.
745 It can also be used in an environment with multiple networks to route
746 requests via multiple gateways allowing transparent access to multiple
747 networks without having to modify browser configurations.
Also specified here are SOCKS proxies. Junkbuster supports SOCKS 4 and SOCKS
750 4A. The difference is that SOCKS 4A will resolve the target hostname
751 using DNS on the SOCKS server, not our local DNS client.
753 The syntax of each line is:
755 forward target_domain[:port] http_proxy_host[:port]
756 forward-socks4 target_domain[:port] socks_proxy_host[:port]
757 http_proxy_host[:port]
758 forward-socks4a target_domain[:port] socks_proxy_host[:port]
759 http_proxy_host[:port]
If http_proxy_host is ".", then requests are not forwarded to an HTTP
762 proxy but are made directly to the web servers.
764 Lines are checked in sequence, and the last match wins.
766 There is an implicit line equivalent to the following, which specifies
767 that anything not finding a match on the list is to go out without
768 forwarding or gateway protocol, like so:
770 forward .* . # implicit
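The "last match wins" rule, together with the implicit default, can be sketched in Python like this. It only illustrates the selection order for plain "forward" lines. The pattern matching is an assumption made for this sketch (".*" matches every host, anything else matches that domain and its subdomains) and should not be read as the real target_domain syntax, which this manual notes has changed. The function names are hypothetical.

```python
def matches(pattern, host):
    """Assumed matching for this sketch only, not Junkbuster's real syntax."""
    if pattern == ".*":
        return True
    pattern = pattern.lstrip(".")
    return host == pattern or host.endswith("." + pattern)

def select_forwarder(rules, host):
    """Return the proxy for host, or None for a direct connection."""
    chosen = "."  # the implicit "forward .* ." rule: go direct
    for pattern, proxy in rules:
        if matches(pattern, host):
            chosen = proxy  # later matching lines override earlier ones
    return None if chosen == "." else chosen
```

For instance, with the rules [(".*", "lpwa.com:8000"), ("my_company.com", ".")], a request for www.my_company.com goes direct and everything else goes to lpwa.com:8000.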
772 In the following common configuration, everything goes to Lucent's
773 LPWA, except SSL on port 443 (which it doesn't handle):
775 forward .* lpwa.com:8000
778 See the FAQ for instructions on how to automate the login procedure
779 for LPWA. Some users have reported difficulties related to LPWA's use
780 of "." as the last element of the domain, and have said that this can
783 forward lpwa. lpwa.com:8000
785 (NOTE: the syntax for specifying target_domain has changed since the
786 previous paragraph was written -- it will not work now. More
787 information is welcome.)
789 In this fictitious example, everything goes via an ISP's caching
790 proxy, except requests to that ISP:
792 forward .* caching.myisp.net:8000
795 For the @home network, we're told the forwarding configuration is
798 forward .* proxy:8080
800 Also, we're told they insist on getting cookies and JavaScript, so you
801 should add home.com to the cookie file. We consider JavaScript a
802 security risk. Java need not be enabled.
804 In this example direct connections are made to all "internal" domains,
805 but everything else goes through Lucent's LPWA by way of the company's
806 SOCKS gateway to the Internet.
808 forward-socks4 .* lpwa.com:8000 firewall.my_company.com:1080
809 forward my_company.com .
811 This is how you could set up a site that always uses SOCKS but no
814 forward-socks4a .* . firewall.my_company.com:1080
816 An advanced example for network administrators:
818 If you have links to multiple ISPs that provide various special
819 content to their subscribers, you can configure forwarding to pass
820 requests to the specific host that's connected to that ISP so that
821 everybody can see all of the content on all of the ISPs.
823 This is a bit tricky, but here's an example:
825 host-a has a PPP connection to isp-a.com. And host-b has a PPP
826 connection to isp-b.com. host-a can run a Junkbuster proxy with
827 forwarding like this:
830 forward isp-b.com host-b:8118
832 host-b can run a Junkbuster proxy with forwarding like this:
835 forward isp-a.com host-a:8118
837 Now, anyone on the Internet (including users on host-a and host-b) can
838 set their browser's proxy to either host-a or host-b and be able to
839 browse the content on isp-a or isp-b.
841 Here's another practical example, for University of Kent at Canterbury
842 students with a network connection in their room, who need to use the
843 University's Squid web cache.
845 forward *. ssbcache.ukc.ac.uk:3128 # Use the proxy, except for:
846 forward .ukc.ac.uk . # Anything on the same domain as us
847 forward * . # Host with no domain specified
848 forward 129.12.*.* . # A dotted IP on our /16 network.
849 forward 127.*.*.* . # Loopback address
850 forward localhost.localdomain . # Loopback address
851 forward www.ukc.mirror.ac.uk . # Specific host
If you intend to chain Junkbuster and squid locally, the recommended
order is browser -> squid -> junkbuster.
856 Your squid configuration could then look like this:
858 # Define junkbuster as parent cache
860 cache_peer 127.0.0.1 parent 8118 0 no-query
862 # Define ACL for protocol FTP
864 # Do not forward ACL FTP to junkbuster
865 always_direct allow FTP
866 # Do not forward ACL CONNECT (https) to junkbuster
867 always_direct allow CONNECT
868 # Forward the rest to junkbuster
869 never_direct allow all
870 _________________________________________________________________
872 3.3.5. Windows GUI Options
874 Junkbuster has a number of options specific to the Windows GUI
877 If "activity-animation" is set to 1, the Junkbuster icon will animate
878 when "Junkbuster" is active. To turn off, set to 0.
882 If "log-messages" is set to 1, Junkbuster will log messages to the
887 If "log-buffer-size" is set to 1, the size of the log buffer, i.e. the
888 amount of memory used for the log messages displayed in the console
889 window, will be limited to "log-max-lines" (see below).
Warning: Setting this to 0 will cause the buffer to grow without
limit and eat up all your memory!
896 log-max-lines is the maximum number of lines held in the log buffer.
901 If "log-highlight-messages" is set to 1, Junkbuster will highlight
902 portions of the log messages with a bold-faced font:
904 log-highlight-messages 1
906 The font used in the console window:
908 log-font-name Comic Sans MS
910 Font size used in the console window:
914 "show-on-task-bar" controls whether or not Junkbuster will appear as a
915 button on the Task bar when minimized:
919 If "close-button-minimizes" is set to 1, the Windows close button will
920 minimize Junkbuster instead of closing the program (close with the
921 exit option on the File menu).
923 close-button-minimizes 1
925 The "hide-console" option is specific to the MS-Win console version of
926 JunkBuster. If this option is used, Junkbuster will disconnect from
927 and hide the command console.
930 _________________________________________________________________
932 3.4. The Actions File
934 The "ijb.action" file (formerly actionsfile) is used to define what
935 actions Junkbuster takes, and thus determines how images, cookies and
936 various other aspects of HTTP content and transactions are handled.
937 Images can be anything you want, including ads, banners, or just some
938 obnoxious image that you would rather not see. Cookies can be accepted
939 or rejected, or accepted only during the current browser session (i.e.
940 not written to disk). Changes to ijb.action should be immediately
941 visible to Junkbuster without the need to restart.
943 To determine which actions apply to a request, the URL of the request
944 is compared to all patterns in this file. Every time it matches, the
945 list of applicable actions for the URL is incrementally updated. You
946 can trace this process by visiting [47]http://i.j.b/show-url-info.
The actions file can be edited with a browser by loading
[48]http://i.j.b/ and then selecting "Edit Actions".
951 There are four types of lines in this file: comments (begin with a "#"
952 character), actions, aliases and patterns, all of which are explained
953 below, as well as the configuration file syntax that Junkbuster
955 _________________________________________________________________
957 3.4.1. URL Domain and Path Syntax
959 Generally, a pattern has the form <domain>/<path>, where both the
960 <domain> and <path> part are optional. If you only specify a domain
961 part, the "/" can be left out:
963 www.example.com - is a domain only pattern and will match any request
964 to "www.example.com".
966 www.example.com/ - means exactly the same.
968 www.example.com/index.html - matches only the single document
969 "/index.html" on "www.example.com".
971 /index.html - matches the document "/index.html", regardless of the
974 index.html - matches nothing, since it would be interpreted as a
975 domain name and there is no top-level domain called ".html".
977 The matching of the domain part offers some flexible options: if the
978 domain starts or ends with a dot, it becomes unanchored at that end.
981 .example.com - matches any domain that ENDS in ".example.com".
983 www. - matches any domain that STARTS with "www".
Additionally, there are wild-cards that you can use in the domain
names themselves. They work much like shell wild-cards: "*" stands
for zero or more arbitrary characters, and "?" stands for any single
character. You can also define character classes in square brackets,
and all of these can be freely mixed:
991 ad*.example.com - matches "adserver.example.com", "ads.example.com",
992 etc but not "sfads.example.com".
994 *ad*.example.com - matches all of the above, and then some.
996 .?pix.com - matches "www.ipix.com", "pictures.epix.com",
997 "a.b.c.d.e.upix.com", etc.
999 www[1-9a-ez].example.com - matches "www1.example.com",
1000 "www4.example.com", "wwwd.example.com", "wwwz.example.com", etc., but
1001 not "wwww.example.com".
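The domain matching rules above can be approximated with shell-style
globbing. The following Python sketch is illustrative only and is not
Junkbuster's actual matcher: the function name domain_matches is made
up for this example, and it assumes that a leading or trailing dot can
be modeled by adding a "*" at that end of the pattern.

```python
from fnmatch import fnmatchcase


def domain_matches(pattern, host):
    """Approximate the domain part of a pattern with shell-style globbing.

    Assumption for this sketch: a leading or trailing dot un-anchors the
    pattern at that end, which is modeled here by adding a "*" there.
    """
    pattern = pattern.lower()
    host = host.lower()
    if pattern.startswith("."):
        pattern = "*" + pattern      # ".example.com" -> "*.example.com"
    if pattern.endswith("."):
        pattern = pattern + "*"      # "www." -> "www.*"
    return fnmatchcase(host, pattern)
```

For instance, domain_matches("ad*.example.com", "ads.example.com") is
true, while domain_matches("ad*.example.com", "sfads.example.com") is
false, in line with the examples given above.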
1003 If Junkbuster was compiled with "pcre" support (default), Perl
1004 compatible regular expressions can be used. See the pcre/docs/
1005 directory or "man perlre" (also available on
1006 [49]http://www.perldoc.com/perl5.6/pod/perlre.html) for details. A
1007 brief discussion of regular expressions is in the [50]Appendix. For
1010 /.*/advert[0-9]+\.jpe?g - would match a URL from any domain, with any
1011 path that includes "advert" followed immediately by one or more
1012 digits, then a "." and ending in either "jpeg" or "jpg". So we match
1013 "example.com/ads/advert2.jpg", and
1014 "www.example.com/ads/banners/advert39.jpeg", but not
1015 "www.example.com/ads/banners/advert39.gif" (no gifs in the example
1018 Please note that matching in the path is case INSENSITIVE by default,
1019 but you can switch to case sensitive at any point in the pattern by
1020 using the "(?-i)" switch:
1022 www.example.com/(?-i)PaTtErN.* - will match only documents whose path
1023 starts with "PaTtErN" in exactly this capitalization.
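The path examples above can be tried out with Python's re module,
which is close to, but not identical with, the "pcre" library that
Junkbuster uses. This is a sketch under assumptions: the names ADVERT,
CASE_SENSITIVE and path_matches are invented here, and the pattern is
assumed to be matched from the start of the path. Python only accepts
the scoped form "(?-i:...)", so the "(?-i)" switch is written as a
group.

```python
import re

# Path matching is case INSENSITIVE by default, hence re.IGNORECASE.
ADVERT = re.compile(r"/.*/advert[0-9]+\.jpe?g", re.IGNORECASE)

# Python's re module needs the scoped form "(?-i:...)" to switch case
# sensitivity back on for part of a pattern.
CASE_SENSITIVE = re.compile(r"/(?-i:PaTtErN).*", re.IGNORECASE)


def path_matches(regex, path):
    """Return True if the compiled regex matches from the start of path."""
    return regex.match(path) is not None
```

Here path_matches(ADVERT, "/ads/banners/advert39.jpeg") is true and
path_matches(ADVERT, "/ads/banners/advert39.gif") is false, while
CASE_SENSITIVE accepts "/PaTtErN/page.html" but not
"/pattern/page.html".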
1024 _________________________________________________________________
1028 Actions are enabled if preceded with a "+", and disabled if preceded
1029 with a "-". Actions are invoked by enclosing the action name in curly
1030 braces (e.g. {+some_action}), followed by a list of URLs to which the
1031 action applies. There are three classes of actions:
1033 * Boolean (e.g. "+/-block"):
1034 {+name} # enable this action
1035 {-name} # disable this action
1037 * parameterized (e.g. "+/-hide-user-agent"):
1038 {+name{param}} # enable action and set parameter to "param"
1039 {-name} # disable action
1041 * Multi-value (e.g. "{+/-add-header{Name: value}}",
1042 "{+/-wafer{name=value}}"):
1043 {+name{param}} # enable action and add parameter "param"
1044 {-name{param}} # remove the parameter "param"
1045 {-name} # disable this action totally
1047 If nothing is specified in this file, no "actions" are taken. So in
1048 this case JunkBuster would just be a normal, non-blocking,
1049 non-anonymizing proxy. You must specifically enable the privacy and
1050 blocking features you need (although the provided default ijb.action
1051 file will give a good starting point).
1053 Later defined actions always over-ride earlier ones. For multi-valued
1054 actions, the actions are applied in the order they are specified.
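As a rough illustration of how matching sections update the list of
actions for a URL, here is a minimal Python sketch. It is not
Junkbuster's code: the names MULTI_VALUE, apply_action and resolve are
invented for this example, patterns are reduced to shell-style host
globs, and representing each action as a (sign, name, parameter) tuple
is an assumption.

```python
from fnmatch import fnmatchcase

# Hypothetical set of multi-value action names, taken from the text above.
MULTI_VALUE = {"add-header", "wafer"}


def apply_action(state, sign, name, param=None):
    """Update the per-URL state with one {+name{param}} or {-name} entry."""
    if name in MULTI_VALUE:
        values = state.setdefault(name, [])
        if sign == "+":
            if param is not None and param not in values:
                values.append(param)   # keep the order in which values appear
        elif param is None:
            values.clear()             # {-name}: disable the action totally
        elif param in values:
            values.remove(param)       # {-name{param}}: remove one parameter
    elif sign == "+":
        state[name] = param if param is not None else True
    else:
        state[name] = False
    return state


def resolve(sections, host):
    """Walk all sections in file order; later matches override earlier ones."""
    state = {}
    for actions, patterns in sections:
        if any(fnmatchcase(host, pattern) for pattern in patterns):
            for action in actions:
                apply_action(state, *action)
    return state
```

With a first section that applies ("+", "block") to "*" and a later
section that applies ("-", "block") to "*.example.com", resolve()
reports block as enabled for other hosts and disabled for
www.example.com, because the later section wins.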
The valid Junkbuster "actions" are:
1058 * Add the specified HTTP header, which is not checked for validity.
1059 You may specify this many times to specify many different headers:
1060 +add-header{Name: value}
1062 * Block this URL totally.
1065 * De-animate all animated GIF images, i.e. reduce them to their last
1066 frame. This will also shrink the images considerably (in bytes,
1067 not pixels!). If the option "first" is given, the first frame of
1068 the animation is used as the replacement. If "last" is given, the
1069 last frame of the animation is used instead, which probably makes
1070 more sense for most banner animations, but also has the risk of
1071 not showing the entire last frame (if it is only a delta to an
1073 +deanimate-gifs{last}
1074 +deanimate-gifs{first}
1076 * "+downgrade" will downgrade HTTP/1.1 client requests to HTTP/1.0
1077 and downgrade the responses as well. Use this action for servers
1078 that use HTTP/1.1 protocol features that Junkbuster doesn't handle
1079 well yet. HTTP/1.1 is only partially implemented. Default is not
1080 to downgrade requests.
1083 * Many sites, like yahoo.com, don't just link to other sites.
1084 Instead, they will link to some script on their own server, giving
1085 the destination as a parameter, which will then redirect you to
1086 the final target. URLs resulting from this scheme typically look
1087 like: http://some.place/some_script?http://some.where-else.
1088 Sometimes, there are even multiple consecutive redirects encoded
1089 in the URL. These redirections via scripts make your web browsing
1090 more traceable, since the server from which you follow such a link
can see where you go to. Apart from that, valuable bandwidth and
time are wasted while your browser asks the server for one redirect
after the other. Plus, it feeds the advertisers.
The "+fast-redirects" option enables interception of these
requests by Junkbuster, which will cut off all but the last valid
URL in the request and send a local redirect back to your browser
without contacting the remote site.
1100 * Apply the filters in the section_header section of the
1101 re_filterfile file to the site(s). Re_filterfile sections are
1102 grouped according to like functionality.
1103 +filter{section_header}
1105 Filter sections that are pre-defined in the supplied re_filterfile
1108 html-annoyances: Get rid of particularly annoying HTML abuse.
1110 js-annoyances: Get rid of particularly annoying JavaScript abuse
no-popups: Kill all popups in JS and HTML
1114 frameset-borders: Give frames a border
1116 webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking)
1118 no-refresh: Automatic refresh sucks on auto-dialup lines
1120 fun: Text replacements for subversive browsing fun!
1122 nimda: Remove (virus) Nimda code.
1124 banners-by-size: Kill banners by size
1126 crude-parental: Kill all web pages that contain the words "sex" or
1129 * Block any existing X-Forwarded-for header, and do not add a new
1133 * If the browser sends a "From:" header containing your e-mail
1134 address, this either completely removes the header ("block"), or
1135 changes it to the specified e-mail address.
1137 +hide-from{spam@sittingduck.xqq}
1139 * Don't send the "Referer:" (sic) header to the web site. You can
1140 block it, forge a URL to the same server as the request (which is
1141 preferred because some sites will not send images otherwise) or
1142 set it to a constant string of your choice.
1143 +hide-referer{block}
1144 +hide-referer{forge}
1145 +hide-referer{http://nowhere.com}
* Alternative spelling of "+hide-referer". It has the same
parameters as, and can be freely mixed with, "+hide-referer".
("referrer" is the correct English spelling; however, the HTTP
specification has a bug: it requires it to be spelled "referer".)
1153 * Change the "User-Agent:" header so web servers can't tell your
1154 browser type. Warning! This breaks many web sites. Specify the
1155 user-agent value you want. Example, pretend to be using Netscape
1157 +hide-user-agent{Mozilla (X11; I; Linux 2.0.32 i586)}
1159 * Treat this URL as an image. This only matters if it's also
1160 "+block"ed, in which case a "blocked" image can be sent rather
1161 than a HTML page. See "+image-blocker{}" below for the control
1162 over what is actually sent.
* Decides what to do with URLs that end up tagged with "{+block
+image}", e.g. an advertisement. There are five options.
"-image-blocker" will send a HTML "blocked" page, usually
resulting in a "broken image" icon. "+image-blocker{logo}" will
send a "JunkBuster" logo image. "+image-blocker{blank}" will send
a 1x1 transparent GIF image. "+image-blocker{http://xyz.com}"
will send a HTTP temporary redirect to the specified image. This
has the advantage of the icon being cached by the browser, which
will speed up the display. And finally, "+image-blocker{pattern}"
will send a checkerboard type pattern, which scales better than
the logo (which can get blocky if the browser enlarges it too
much).
1177 +image-blocker{logo}
1178 +image-blocker{blank}
1179 +image-blocker{pattern}
1180 +image-blocker{http://i.j.b/send-banner}
* By default (i.e. in the absence of a "+limit-connect" action),
Junkbuster will only allow CONNECT requests to port 443, which is
the standard port for https, as a precaution.
The CONNECT method exists in HTTP to allow access to secure
1186 websites (https:// URLs) through proxies. It works very simply:
1187 the proxy connects to the server on the specified port, and then
1188 short-circuits its connections to the client and to the remote
1189 proxy. This can be a big security hole, since CONNECT-enabled
1190 proxies can be abused as TCP relays very easily.
1191 If you want to allow CONNECT for more ports than this, or want to
1192 forbid CONNECT altogether, you can specify a comma separated list
1193 of ports and port ranges (the latter using dashes, with the
1194 minimum defaulting to 0 and max to 65K):
+limit-connect{443} # This is the default and need not be
1197 +limit-connect{80,443} # Ports 80 and 443 are OK.
1198 +limit-connect{-3, 7, 20-100, 500-} # Port less than 3, 7, 20 to
1200 #and above 500 are OK.
* "+no-compression" prevents the website from compressing the data.
Some websites do this, which can be a problem for Junkbuster,
since "+filter", "+no-popup" and "+deanimate-gifs" will not work on
compressed data. This will slow down connections to those
websites, though. By default, "+no-compression" is turned on.
1209 * If the website sets cookies, "no-cookies-keep" will make sure they
1210 are erased when you exit and restart your web browser. This makes
1211 profiling cookies useless, but won't break sites which require
1212 cookies so that you can log in for transactions. Default: on.
1215 * Prevent the website from reading cookies:
1218 * Prevent the website from setting cookies:
1221 * Filter the website through a built-in filter to disable those
1222 obnoxious JavaScript pop-up windows via window.open(), etc. The
1223 two alternative spellings are equivalent.
1227 * This action only applies if you are using a jarfile for saving
1228 cookies. It sends a cookie to every site stating that you do not
1229 accept any copyright on cookies sent to you, and asking them not
1230 to track you. Of course, this is a (relatively) unique header they
1231 could use to track you.
1234 * This allows you to add an arbitrary cookie. It can be specified
1235 multiple times in order to add as many cookies as you like.
1238 The meaning of any of the above is reversed by preceding the action
1239 with a "-", in place of the "+".
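Two of the actions described above, "+fast-redirects" and
"+limit-connect", amount to small algorithms. The Python sketch below
shows one way they could work; it is not taken from the Junkbuster
sources. The function names are invented, only "http://" targets are
recognized, and the upper port limit of 65535 is an assumption for the
"65K" mentioned in the text.

```python
def fast_redirect_target(url):
    """Return the last URL embedded in url, or None if there is none."""
    index = url.rfind("http://")
    if index > 0:                # index 0 is the scheme of the outer URL
        return url[index:]
    return None


def connect_allowed(spec, port):
    """Check port against a list such as "-3, 7, 20-100, 500-"."""
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            low, high = item.split("-", 1)
            first = int(low) if low.strip() else 0        # minimum defaults to 0
            last = int(high) if high.strip() else 65535   # assumed maximum
        else:
            first = last = int(item)
        if first <= port <= last:
            return True
    return False
```

For the URL shown in the "+fast-redirects" description,
fast_redirect_target() returns "http://some.where-else", and
connect_allowed("80,443", 443) is true while
connect_allowed("443", 80) is false.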
1243 Turn off cookies by default, then allow a few through for specified
1246 # Turn off all persistent cookies
1247 { +no-cookies-read }
1249 # Allow cookies for this browser session ONLY
1250 { +no-cookies-keep }
1251 # Exceptions to the above, sites that benefit from persistent cookies
1252 { -no-cookies-read }
1254 { -no-cookies-keep }
1260 # Alternative way of saying the same thing
1261 {-no-cookies-set -no-cookies-read -no-cookies-keep}
1265 Now turn off "fast redirects", and then we allow two exceptions:
1270 # Reverse it for these two sites, which don't work right without it.
1272 www.ukc.ac.uk/cgi-bin/wac\.cgi\?
Turn on page filtering according to rules in the defined sections of
re_filterfile, and make one exception for sourceforge:
1278 # Run everything through the filter file, using only the
1279 # specified sections:
1280 +filter{html-annoyances} +filter{js-annoyances} +filter{no-popups}\
1281 +filter{webbugs} +filter{nimda} +filter{banners-by-size}
1283 # Then disable filtering of code from sourceforge!
1285 .cvs.sourceforge.net
Now some URLs that we want "blocked", i.e. we won't see them. Many of
these use regular expressions that will expand to match multiple URLs:
1292 /.*/(.*[-_.])?ads?[0-9]?(/|[-_.].*|\.(gif|jpe?g))
1293 /.*/(.*[-_.])?count(er)?(\.cgi|\.dll|\.exe|[?/])
1294 /.*/(ng)?adclient\.cgi
1295 /.*/(plain|live|rotate)[-_.]?ads?/
1296 /.*/(sponsor)s?[0-9]?/
1297 /.*/_?(plain|live)?ads?(-banners)?/
1299 /.*/ad(sdna_image|gifs?)/
1300 /.*/ad(server|stream|juggler)\.(cgi|pl|dll|exe)
1304 /.*/adv((er)?ts?|ertis(ing|ements?))?/
1308 /.*/cgi-bin/centralad/getimage
1309 /.*/images/addver\.gif
1310 /.*/images/marketing/.*\.(gif|jpe?g)
1314 /.*/sponsors?[0-9]?/
1315 /.*/advert[0-9]+\.jpg
1322 /graphics/defaultAd/
1324 /image\.ng/transactionID
1325 /images/.*/.*_anim\.gif # alvin brattli
1326 /ip_img/.*\.(gif|jpe?g)
1330 /cgi-bin/nph-adclick.exe/
1331 /.*/Image/BannerAdvertising/
1333 /.*/adlib/server\.cgi
1335 _________________________________________________________________
Custom "actions", known to Junkbuster as "aliases", can be defined by
combining other "actions". These can in turn be invoked just like the
built-in "actions". Currently, an alias can contain any character
except space, tab, "=", "{" or "}". But please use only "a"-"z",
"0"-"9", "+", and "-". Alias names are not case sensitive, and must be
defined before anything else in the ijb.action file! And there can
only be one set of "aliases" defined.
1347 Now let's define a few aliases:
# Useful custom aliases we can use later. These must come first!
1351 +no-cookies = +no-cookies-set +no-cookies-read
1352 -no-cookies = -no-cookies-set -no-cookies-read
1353 fragile = -block -no-cookies -filter -fast-redirects -hide-refere
1355 shop = -no-cookies -filter -fast-redirects
1356 +imageblock = +block +image
1357 #For people who don't like to type too much: ;-)
1360 c2 = -no-cookies-set +no-cookies-read
1361 c3 = +no-cookies-set -no-cookies-read
1362 #... etc. Customize to your heart's content.
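Because an alias such as "shop" is itself written in terms of another
alias ("-no-cookies"), resolving it means expanding names until only
built-in actions remain. The Python sketch below illustrates that idea
under assumptions: parse_aliases and expand are hypothetical helpers,
not Junkbuster functions, and how Junkbuster really resolves nested
aliases is not verified here.

```python
def parse_aliases(lines):
    """Read "name = action action ..." lines into a dictionary."""
    aliases = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop comments
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        aliases[name.strip().lower()] = value.split()  # names ignore case
    return aliases


def expand(name, aliases, seen=()):
    """Replace an alias by the actions it stands for, recursively."""
    key = name.lower()
    if key not in aliases:
        return [name]                          # a built-in action
    if key in seen:
        raise ValueError("alias %r refers to itself" % name)
    expanded = []
    for part in aliases[key]:
        expanded.extend(expand(part, aliases, seen + (key,)))
    return expanded
```

Given the definitions above, expand("shop", aliases) yields
-no-cookies-set, -no-cookies-read, -filter and -fast-redirects.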
1364 Some examples using our "shop" and "fragile" aliases from above:
1366 # These sites are very complex and require
1367 # minimal interference.
1369 .office.microsoft.com
1370 .windowsupdate.microsoft.com
1372 # Shopping sites - still want to block ads.
1375 .worldpay.com # for quietpc.com
1378 # These shops require pop-ups
1382 _________________________________________________________________
1384 3.5. The Filter File
1386 Any web page can be dynamically modified with the filter file. This
1387 modification can be removal, or re-writing, of any web page content,
1388 including tags and non-visible content. The default filter file is
1389 re_filterfile, located in the config directory.
1391 The included example file is divided into sections. Each section
1392 begins with the FILTER keyword, followed by the identifier for that
1393 section, e.g. "FILTER: webbugs". Each section performs a similar type
1394 of filtering, such as "html-annoyances".
1396 This file uses regular expressions to alter or remove any string in
1397 the target page. The expressions can only operate on one line at a
1398 time. Some examples from the included default re_filterfile:
1400 Stop web pages from displaying annoying messages in the status bar by
1401 deleting such references:
1403 FILTER: html-annoyances
1404 # New browser windows should be resizeable and have a location and st
1408 s/resizable="?(no|0)"?/resizable=1/ig s/noresize/yesresize/ig
1409 s/location="?(no|0)"?/location=1/ig s/status="?(no|0)"?/status=1/ig
1410 s/scrolling="?(no|0|Auto)"?/scrolling=1/ig
1411 s/menubar="?(no|0)"?/menubar=1/ig
1412 # The <BLINK> tag was a crime!
1414 s*<blink>|</blink>**ig
1417 #s/framespacing="?(no|0)"?//ig
1418 #s/margin(height|width)=[0-9]*//gi
1420 Just for kicks, replace any occurrence of "Microsoft" with
1421 "MicroSuck", and have a little fun with topical buzzwords:
1424 s/microsoft(?!.com)/MicroSuck/ig
1427 s/industry-leading|cutting-edge|award-winning/<font color=red><b>BING
1430 Kill those pesky little web-bugs:
1432 # webbugs: Squish WebBugs (1x1 invisible GIFs used for user tracking)
1434 s/<img\s+[^>]*?(width|height)\s*=\s*['"]?1\D[^>]*?(width|height)\s*=\
1435 s*['"]?1(\D[^>]*?)?>/<!-- Squished WebBug -->/sig
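To experiment with such substitutions outside of Junkbuster, the
"s/pattern/replacement/flags" rules can be mimicked with Python's
re.sub(). This is only a sketch: FILTERS and apply_filters are names
made up for this example, Python's regular expressions differ from
pcre in some details, and the sketch works on a whole string rather
than line by line.

```python
import re

# (pattern, replacement, flags) triples mirroring three rules quoted above.
# The "g" flag needs no counterpart: re.sub() replaces every occurrence.
FILTERS = [
    (r"<blink>|</blink>", "", re.I),
    (r"microsoft(?!.com)", "MicroSuck", re.I),
    (r"""<img\s+[^>]*?(width|height)\s*=\s*['"]?1\D"""
     r"""[^>]*?(width|height)\s*=\s*['"]?1(\D[^>]*?)?>""",
     "<!-- Squished WebBug -->", re.I | re.S),
]


def apply_filters(text):
    """Apply every substitution in FILTERS to text, in order."""
    for pattern, replacement, flags in FILTERS:
        text = re.sub(pattern, replacement, text, flags=flags)
    return text
```

A 1x1 image tag such as <img src="b.gif" width=1 height=1> is replaced
by the "Squished WebBug" comment, while an image with width=100 and
height=100 is left alone.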
1436 _________________________________________________________________
1440 When Junkbuster displays one of its internal pages, such as a 404 Not
1441 Found error page, it uses the appropriate template. On Linux, BSD, and
1442 Unix, these are located in /etc/junkbuster/templates by default. These
1443 may be customized, if desired.
1444 _________________________________________________________________
1446 4. Quickstart to Using Junkbuster
1448 Install package, then run and enjoy! JunkBuster is typically started
1449 by specifying the main configuration file to be used on the command
1450 line. Example Unix startup command:
1453 # /usr/sbin/junkbuster /etc/junkbuster/config
1456 An init script is provided for SuSE and Redhat.
For SuSE: /etc/rc.d/junkbuster start
1460 For RedHat: /etc/rc.d/init.d/junkbuster start
If no configuration file is specified on the command line, Junkbuster
will look for a file named config in the current directory (except on
Win32, where it will try config.txt). If no file is specified on the
command line and no default configuration file can be found,
Junkbuster will fail to start.
Be sure your browser is set to use the proxy, which is by default at
localhost, port 8118. With Netscape (and Mozilla), this can be set
under Edit -> Preferences -> Advanced -> Proxies -> HTTP Proxy. For
Internet Explorer: Tools -> Internet Properties -> Connections -> LAN
Setting. Then, check "Use Proxy" and fill in the appropriate info
(Address: localhost, Port: 8118). Include HTTPS as well if you want
HTTPS proxy support too.
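The same proxy settings can also be used from a script, which is a
quick way to check that Junkbuster is reachable. The following Python
sketch uses the standard urllib module; build_proxy_opener is a
hypothetical helper name, and the default address localhost:8118 is
taken from the text above.

```python
import urllib.request


def build_proxy_opener(host="localhost", port=8118):
    """Return a urllib opener that sends HTTP requests through the proxy."""
    proxy_url = "http://%s:%d" % (host, port)
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)
```

With Junkbuster running, calling
build_proxy_opener().open("http://i.j.b/") should fetch the internal
start page mentioned elsewhere in this manual; without a running proxy
the call fails with a connection error.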
1475 The included default configuration files should give a reasonable
1476 starting point, though may be somewhat aggressive in blocking junk.
1477 You will probably want to keep an eye out for sites that require
1478 persistent cookies, and add these to ijb.action as needed. By default,
1479 most of these will be accepted only during the current browser
1480 session, until you add them to the configuration. If you want the
1481 browser to handle this instead, you will need to edit ijb.action and
1482 disable this feature. If you use more than one browser, it would make
1483 more sense to let Junkbuster handle this. In which case, the
1484 browser(s) should be set to accept all cookies.
1486 If a particular site shows problems loading properly, try adding it to
1487 the {fragile} section of ijb.action. This will turn off most actions
1490 Junkbuster is HTTP/1.1 compliant, but not all 1.1 features are as yet
1491 implemented. If browsers that support HTTP/1.1 (like Mozilla or recent
1492 versions of I.E.) experience problems, you might try to force HTTP/1.0
1493 compatibility. For Mozilla, look under Edit -> Preferences -> Debug ->
1494 Networking. Or set the "+downgrade" config option in ijb.action.
1496 After running Junkbuster for a while, you can start to fine tune the
1497 configuration to suit your personal, or site, preferences and
1498 requirements. There are many, many aspects that can be customized.
1499 "Actions" (as specified in ijb.action) can be adjusted by pointing
your browser to [51]http://i.j.b/ and then following the link to "edit
1501 the actions list". (This is an internal page and does not require
1504 In fact, various aspects of Junkbuster configuration can be viewed
1505 from this page, including current configuration parameters, source
1506 code version numbers, the browser's request headers, and "actions"
1507 that apply to a given URL. In addition to the ijb.action file editor
1508 mentioned above, Junkbuster can also be turned "on" and "off" from
1511 If you encounter problems, please verify it is a Junkbuster bug, by
1512 disabling Junkbuster, and then trying the same page. Also, try another
1513 browser if possible to eliminate browser or site problems. Before
1514 reporting it as a bug, see if there is not a configuration option that
1515 is enabled that is causing the page not to load. You can then add an
1516 exception for that page or site. If a bug, please report it to the
1517 developers (see below).
1518 _________________________________________________________________
1520 4.1. Command Line Options
1522 JunkBuster may be invoked with the following command-line options:
1525 Print version info and exit, Unix only.
Print short usage info and exit, Unix only.
1529 Don't become a daemon, i.e. don't fork and become process group
1530 leader, don't detach from controlling tty. Unix only.
1532 On startup, write the process ID to FILE. Delete the FILE on exit.
Failure to create or delete the FILE is non-fatal. If no FILE
1534 option is given, no PID file will be used. Unix only.
1535 * --user USER[.GROUP]
1536 After (optionally) writing the PID file, assume the user ID of
1537 USER, and if included the GID of GROUP. Exit if the privileges are
1538 not sufficient to do so. Unix only.
1540 If no configfile is included on the command line, JunkBuster will
1541 look for a file named "config" in the current directory (except on
1542 Win32 where it will look for "config.txt" instead). Specify full
1543 path to avoid confusion.
1544 _________________________________________________________________
1546 5. Contacting the Developers, Bug Reporting and Feature Requests
1548 We value your feedback. However, to provide you with the best support,
1551 * Use the [52]Sourceforge support forum to get help.
* Submit bugs only through our [53]Sourceforge bug forum. Make sure
1553 that the bug has not already been submitted. Please try to verify
1554 that it is a Junkbuster bug, and not a browser or site bug first.
1555 If you are using your own custom configuration, please try the
1556 stock configs to see if the problem is a configuration related
1557 bug. And if not using the latest development snapshot, please try
1558 the latest one. Or even better, CVS sources.
* Submit feature requests only through our [54]Sourceforge feature
1562 For any other issues, feel free to use the [55]mailing lists.
1564 Anyone interested in actively participating in development and related
1565 discussions can join the appropriate mailing list [56]here. Archives
1566 are available here too.
1567 _________________________________________________________________
1569 6. Copyright and History
1573 Internet Junkbuster is free software; you can redistribute it and/or
1574 modify it under the terms of the GNU General Public License as
1575 published by the Free Software Foundation; either version 2 of the
1576 License, or (at your option) any later version.
1578 This program is distributed in the hope that it will be useful, but
1579 WITHOUT ANY WARRANTY; without even the implied warranty of
1580 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
1581 General Public License for more details, which is available from
1582 [57]the Free Software Foundation, Inc, 59 Temple Place - Suite 330,
1583 Boston, MA 02111-1307, USA.
1584 _________________________________________________________________
1588 Junkbuster was originally written by Anonymous Coders and
1589 [58]Junkbuster's Corporation, and was released as free open-source
1590 software under the GNU GPL. [59]Stefan Waldherr made many
1591 improvements, and started the [60]SourceForge project to rekindle
1592 development. There are now several active developers contributing. The
1593 last stable release was v2.0.2, which has now grown whiskers ;-).
1594 _________________________________________________________________
1598 [61]http://sourceforge.net/projects/ijbswa
1600 [62]http://ijbswa.sourceforge.net/
1604 [64]http://www.junkbusters.com/ht/en/cookies.html
1606 [65]http://www.waldherr.org/junkbuster/
1608 [66]http://privacy.net/analyze/
1610 [67]http://www.squid-cache.org/
1611 _________________________________________________________________
1615 8.1. Regular Expressions
Junkbuster can use "regular expressions" in various config files,
assuming support for "pcre" (Perl Compatible Regular Expressions) is
compiled in, which is the default. Such configuration directives do
not require regular expressions, but they can be used to increase
flexibility by matching a pattern with wild-cards against URLs.
1623 If you are reading this, you probably don't understand what "regular
1624 expressions" are, or what they can do. So this will be a very brief
1625 introduction only. A full explanation would require a book ;-)
"Regular expressions" are a way of matching one character expression
against another to see if it matches or not. One of the "expressions"
is a literal string of readable characters (letters, numbers, etc.),
and the other is a complex string of literal characters combined with
wild-cards and other special characters, called meta-characters. The
1632 "meta-characters" have special meanings and are used to build the
1633 complex pattern to be matched against. Perl Compatible Regular
1634 Expressions is an enhanced form of the regular expression language
1635 with backward compatibility.
1637 To make a simple analogy, we do something similar when we use
1638 wild-card characters when listing files with the dir command in DOS.
1639 *.* matches all filenames. The "special" character here is the
1640 asterisk which matches any and all characters. We can be more specific
and use ? to match just individual characters. So "dir file?.txt"
1642 would match "file1.txt", "file2.txt", etc. We are pattern matching,
1643 using a similar technique to "regular expressions"!
1645 Regular expressions do essentially the same thing, but are much, much
1646 more powerful. There are many more "special characters" and ways of
1647 building complex patterns however. Let's look at a few of the common
1648 ones, and then some examples:
1650 . - Matches any single character, e.g. "a", "A", "4", ":", or "@".
1652 ? - The preceding character or expression is matched ZERO or ONE
1655 + - The preceding character or expression is matched ONE or MORE
1658 * - The preceding character or expression is matched ZERO or MORE
1661 \ - The "escape" character denotes that the following character should
1662 be taken literally. This is used where one of the special characters
1663 (e.g. ".") needs to be taken literally and not as a special
1666 [] - Characters enclosed in brackets will be matched if any of the
1667 enclosed characters are encountered.
1669 () - parentheses are used to group a sub-expression, or multiple
1672 | - The "bar" character works like an "or" conditional statement. A
1673 match is successful if the sub-expression on either side of "|"
1676 s/string1/string2/g - This is used to rewrite strings of text.
1677 "string1" is replaced by "string2" in this example.
These are just some of the ones you are likely to use when matching
URLs with Junkbuster, and are a long way from a definitive list. This
1681 is enough to get us started with a few simple examples which may be
/.*/banners/.* - A simple example that uses the common combination of
"." and "*" to denote any character, zero or more times. In other
words, any string at all. So we start with a literal forward slash,
then our regular expression pattern (".*"), another literal forward
slash, the string "banners", another forward slash, and lastly another
".*". We are building a directory path here. This will match any file
whose path has a directory named "banners" in it. The ".*" matches any
characters, and this could conceivably include more forward slashes,
so it might expand into a much longer looking path. For example, this
could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or just
"/ads/banners/annoying.html", or an almost infinite number of other
possible combinations, just so it has "banners" in the path somewhere.
Read strictly as a regular expression, the pattern needs two slashes
ahead of "banners" (one on each side of the first ".*"), so a bare
"/banners/annoying.html" would not match.
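
The banners expression can be tried out with Python's re module. This
is only a sketch: we assume the pattern is applied from the start of
the path (re.match), and Junkbuster's own matcher may differ in
details. Under Python's rules a single slash ahead of "banners" is not
enough, since the pattern has a literal slash on both sides of the
first ".*".

```python
import re

BANNERS = re.compile(r"/.*/banners/.*")

def path_matches(path):
    """True if the pattern matches starting at the beginning of path."""
    return BANNERS.match(path) is not None

# Sample paths; the second and fourth are invented for illustration.
SAMPLES = [
    "/eye/hate/spammers/banners/annoy_me_please.gif",
    "/ads/banners/annoying.html",
    "/banners/annoying.html",          # only one slash before "banners"
    "/eye/hate/spammers/annoy.gif",    # no "banners" directory at all
]

for path in SAMPLES:
    print(path, path_matches(path))
```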
And now something a little more complex:
/.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal
forward slashes again ("/"), so we are building another expression
that is a file path statement. We have another ".*", so we are
matching against any conceivable sub-path, just so it matches our
expression. The only true literal that must match our pattern is
"adv", together with the forward slashes. What comes after the "adv"
string is the interesting part.

Remember the "?" means the preceding expression (either a literal
character or anything grouped with "(...)" in this case) can exist or
not, since this means either zero or one match. So the whole group
"((er)?ts?|ertis(ing|ements?))" is optional, as are the sub-expression
"(er)" and the final "s" in both "ts?" and "ements?". The "|" means
"or". We have two of those. For instance, "(ing|ements?)" can expand
to match either "ing" OR "ements?". Note that this last group has no
"?" after it, so once "ertis" has matched, one of its two alternatives
must follow. What is being done here is an attempt at matching as many
variations of "advertisement", and similar, as possible. So this would
expand to match just "adv", or "advert", or "adverts", or
"advertising", or "advertisement", or "advertisements". You get the
idea. But it would not match "advertizements" (with a "z"). We could
fix that by changing our regular expression to:
"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which would then match
either spelling.
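
Here is a quick way to check which spellings the two expressions
accept, again using Python's re module as a stand-in for Junkbuster's
matcher. The "/site/.../pic.gif" wrapper path is invented so that each
word appears as a directory name between slashes.

```python
import re

ADV = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
ADV_SZ = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

def dir_matches(pattern, name):
    """Match a hypothetical path that has `name` as a directory."""
    return pattern.match("/site/" + name + "/pic.gif") is not None

WORDS = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements", "advertizements"]

ORIGINAL = {word: dir_matches(ADV, word) for word in WORDS}
WITH_Z = {word: dir_matches(ADV_SZ, word) for word in WORDS}

for word in WORDS:
    print(word, ORIGINAL[word], WITH_Z[word])
```

Only the "z" spelling should differ between the two columns.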
/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with
forward slashes. Anything in the square brackets "[]" can be matched.
This is using "0-9" as a shorthand expression to mean any digit zero
through nine. It is the same as saying "0123456789". So any digit
matches. The "+" means one or more of the preceding expression must be
included. The preceding expression here is what is in the square
brackets, in this case any digit zero through nine. Then, at the end,
we have a grouping: "(gif|jpe?g)". This includes a "|", so the
expression on one side or the other of that bar character must match.
A simple "gif" is on one side, and the other side will in turn match
either "jpeg" or "jpg", since the "?" means the letter "e" is optional
and can be matched once or not at all. So we are building an
expression here to match a GIF or JPEG type image file. It must
include the literal string "advert", then one or more digits, and a
"." (which is now a literal, and not a special character, since it is
escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some
possible matches would include: "//advert1.jpg",
"/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It
would not match "advert1.gif" (no leading slash), or "/adverts232.jpg"
(the expression does not include an "s"), or "/advert1.jsp" ("jsp" is
not in the expression anywhere).
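
The matches and non-matches listed above can be confirmed with a short
sketch in Python's re module. As before, this assumes the expression
is applied from the start of the path; Junkbuster itself may behave
differently in details. The "/ads/..." sample is an extra path added
for illustration.

```python
import re

ADVERT = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

def path_matches(path):
    """True if the pattern matches starting at the beginning of path."""
    return ADVERT.match(path) is not None

SHOULD_MATCH = [
    "//advert1.jpg",
    "/nasty/ads/advert1234.gif",
    "/banners/from/hell/advert99.jpg",
]

SHOULD_NOT_MATCH = [
    "advert1.gif",           # no leading slash
    "/adverts232.jpg",       # "s" is not part of the expression
    "/advert1.jsp",          # "jsp" is not one of the extensions
    "/ads/adverts232.jpg",   # still fails with two slashes, due to "s"
]

for path in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(path, path_matches(path))
```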
s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck"
will replace any occurrence of "microsoft". The "i" at the end of the
expression means ignore case. The "(?!.com)" means the match should
fail if "microsoft" is followed by ".com". In other words, this acts
like a "NOT" modifier. In case this is a hyperlink, we don't want to
break it. (Strictly speaking, the unescaped "." in "(?!.com)" stands
for any single character; "(?!\.com)" would test for a literal dot.)
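
The substitution can be imitated with re.sub in Python. This is a
sketch under two assumptions of ours: the function names are invented,
and count=1 is used because the expression has no trailing "g", which
in Perl syntax means at most one replacement per string. How
Junkbuster itself applies the expression may differ.

```python
import re

def rewrite(text):
    """Apply s/microsoft(?!.com)/MicroSuck/i to one string."""
    return re.sub(r"microsoft(?!.com)", "MicroSuck", text,
                  count=1, flags=re.IGNORECASE)

def rewrite_strict(text):
    """Same idea, but with the dot escaped so only ".com" is spared."""
    return re.sub(r"microsoft(?!\.com)", "MicroSuck", text,
                  count=1, flags=re.IGNORECASE)

PLAIN = rewrite("I use Microsoft Word.")
LINK = rewrite("See http://www.microsoft.com/ for details")

# The unescaped "." accepts any character, so "-com" is spared too.
LOOSE = rewrite("microsoft-com")
STRICT = rewrite_strict("microsoft-com")

print(PLAIN)
print(LINK)
print(LOOSE, STRICT)
```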
We are barely scratching the surface of regular expressions here so
that you can understand the default Junkbuster configuration files,
and maybe use this knowledge to customize your own installation. There
is much, much more that can be done with regular expressions. Now that
you know enough to get started, you can learn more on your own :/

More reading on Perl Compatible Regular Expressions:
[68]http://www.perldoc.com/perl5.6/pod/perlre.html
_________________________________________________________________

8.2. JunkBuster's Internal Pages

Since JunkBuster proxies each requested web page, it is easy for
JunkBuster to trap certain URLs. In this way, we can talk directly to
JunkBuster, and see how it is configured, see how our rules are being
applied, change these rules and other configuration options, and even
turn JunkBuster's filtering off, all with a web browser.

The URLs listed below are the special ones that allow direct access to
JunkBuster. Of course, JunkBuster must be running to access these. If
not, you will get a friendly error message.

  * Junkbuster main page:
       [69]http://ijbswa.sourceforge.net/config/
    Alternately, this may be reached at [70]http://i.j.b/, but this
    variation may not work as reliably as the above in some
    configurations.
  * Show information about the current configuration:
       [71]http://ijbswa.sourceforge.net/config/show-status
  * Show the source code version numbers:
       [72]http://ijbswa.sourceforge.net/config/show-version
  * Show the client's request headers:
       [73]http://ijbswa.sourceforge.net/config/show-request
  * Show which actions apply to a URL and why:
       [74]http://ijbswa.sourceforge.net/config/show-url-info
  * Toggle JunkBuster on or off:
       [75]http://ijbswa.sourceforge.net/config/toggle
    Shortcuts to turn it off, then on:
       [76]http://ijbswa.sourceforge.net/config/toggle?set=disable
       [77]http://ijbswa.sourceforge.net/config/toggle?set=enable
  * Edit the actions list file:
       [78]http://ijbswa.sourceforge.net/config/edit-actions

These may be bookmarked for quick reference.
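
For those who would rather script these pages than bookmark them, here
is a minimal sketch in Python. The function names are our own
invention, and the proxy address "http://127.0.0.1:8000" is an
assumption; substitute whatever address and port your JunkBuster
actually listens on. The request has to go through the proxy, since it
is JunkBuster that traps these URLs.

```python
import urllib.request
from urllib.parse import urlencode

CONFIG_BASE = "http://ijbswa.sourceforge.net/config/"

def internal_url(page="", **params):
    """Build one of the internal page URLs listed above."""
    url = CONFIG_BASE + page
    if params:
        url += "?" + urlencode(params)
    return url

def fetch_internal_page(url, proxy="http://127.0.0.1:8000"):
    """Fetch a page through the proxy (the address is an assumption)."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=10) as response:
        return response.read().decode("latin-1")

# Building the URLs needs no running proxy.
STATUS_URL = internal_url("show-status")
DISABLE_URL = internal_url("toggle", set="disable")
ENABLE_URL = internal_url("toggle", set="enable")
print(STATUS_URL)
print(DISABLE_URL)
print(ENABLE_URL)
```

With JunkBuster running, fetch_internal_page(STATUS_URL) should return
the status page as text.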
   1. http://ijbswa.sourceforge.net/user-manual/
   2. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INTRODUCTION
   3. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN27
   4. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION
   5. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-SOURCE
   6. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-RH
   7. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-SUSE
   8. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-OS2
   9. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-WIN
  10. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#INSTALLATION-OTHER
  11. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#CONFIGURATION
  12. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN146
  13. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN163
  14. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN194
  15. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN227
  16. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN318
  17. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN455
  18. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN543
  19. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN652
  20. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#ACTIONSFILE
  21. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN749
  22. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN823
  23. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1134
  24. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#FILTERFILE
  25. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1193
  26. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#QUICKSTART
  27. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1248
  28. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#CONTACT
  29. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#COPYRIGHT
  30. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1307
  31. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1313
  32. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#SEEALSO
  33. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#APPENDIX
  34. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#REGEX
  35. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1497
  37. http://sourceforge.net/projects/ijbswa/
  38. http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ijbswa/current/
  39. http://www.gnu.org/
  41. http://ijbswa.sourceforge.net/config/
  43. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#ACTIONSFILE
  47. http://i.j.b/show-url-info
  49. http://www.perldoc.com/perl5.6/pod/perlre.html
  50. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#REGEX
  52. http://sourceforge.net/tracker/?group_id=11118&atid=211118
  53. http://sourceforge.net/tracker/?group_id=11118&atid=111118
  54. http://sourceforge.net/tracker/?atid=361118&group_id=11118&func=browse
  55. http://sourceforge.net/mail/?group_id=11118
  56. http://sourceforge.net/mail/?group_id=11118
  57. http://www.gnu.org/copyleft/gpl.html
  58. http://www.junkbusters.com/ht/en/ijbfaq.html
  59. http://www.waldherr.org/junkbuster/
  60. http://sourceforge.net/projects/ijbswa/
  61. http://sourceforge.net/projects/ijbswa
  62. http://ijbswa.sourceforge.net/
  64. http://www.junkbusters.com/ht/en/cookies.html
  65. http://www.waldherr.org/junkbuster/
  66. http://privacy.net/analyze/
  67. http://www.squid-cache.org/
  68. http://www.perldoc.com/perl5.6/pod/perlre.html
  69. http://ijbswa.sourceforge.net/config/
  71. http://ijbswa.sourceforge.net/config/show-status
  72. http://ijbswa.sourceforge.net/config/show-version
  73. http://ijbswa.sourceforge.net/config/show-request
  74. http://ijbswa.sourceforge.net/config/show-url-info
  75. http://ijbswa.sourceforge.net/config/toggle
  76. http://ijbswa.sourceforge.net/config/toggle?set=disable
  77. http://ijbswa.sourceforge.net/config/toggle?set=enable
  78. http://ijbswa.sourceforge.net/config/edit-actions
  79. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1369
  80. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1377
  81. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1380
  82. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1383
  83. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1386
  84. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1391
  85. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1394
  86. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1397
  87. file://localhost/home/swa/sf/current-org/doc/source/tmp.html#AEN1403